How to Generate an llms.txt File for Your Astro Website - Complete Guide

Every day, more people are using ChatGPT, Perplexity, and other AI tools to search and summarize content. As discussed in one of my earlier posts on SEO to SAO, it is only a matter of time before the majority of visits to websites and blogs come from AI agents.

When it comes to making your content accessible to language models, there's a simpler way than complex crawling solutions, especially for statically generated sites built with Astro. In this post, I'll walk you through creating an llms.txt file that exposes your blog content to LLMs in a clean, structured format.

What is llms.txt?

An llms.txt file is conceptually similar to robots.txt but designed specifically for language models. It provides a structured, text-based representation of your content that’s easy for LLMs to parse and understand.

LLMs are not well suited to navigating across multiple pages via links. They are, however, very good at ingesting the content of a single page into their context. This is where llms.txt becomes invaluable.

Why Crawling Tools Like Crawl4AI May Be Overkill

Tools like Crawl4AI offer powerful website crawling capabilities for LLMs. While they are well suited to generating llms.txt for dynamic sites, they are overkill for static sites.

For Astro sites especially, where content is typically stored as markdown files with frontmatter, you already have perfectly structured content ready to be exposed directly.
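To see why, consider what "markdown with frontmatter" already gives you. The sketch below is a deliberately minimal, illustrative frontmatter parser (Astro's content collections do this for you via getCollection(); the field names here are just sample data):

```typescript
// Minimal sketch: markdown with frontmatter already carries the metadata
// an llms.txt file needs. Illustrative only; Astro parses this for you.
interface PostMeta {
    [key: string]: string;
}

function parseFrontmatter(markdown: string): { meta: PostMeta; body: string } {
    // Match a leading `--- ... ---` block.
    const match = markdown.match(/^---\n([\s\S]*?)\n---\n?/);
    if (!match) return { meta: {}, body: markdown };
    const meta: PostMeta = {};
    for (const line of match[1].split("\n")) {
        const idx = line.indexOf(":");
        if (idx > -1) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
    }
    return { meta, body: markdown.slice(match[0].length) };
}

const sample = `---
title: Hello World
date: 2024-01-01
---
Post body here.`;

const { meta, body } = parseFrontmatter(sample);
console.log(meta.title); // "Hello World"
console.log(body.trim()); // "Post body here."
```

The point is that title, date, and body are already cleanly separated on disk, so no crawler is needed to recover them.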

Implementing an llms.txt Endpoint in Astro

Here's how to generate an llms.txt file for your Astro site.

Create a file at src/pages/llms.txt.ts (Astro maps this route to /llms.txt; a file at src/pages/api/llms.txt.ts would serve it at /api/llms.txt instead) and add the following code:

import { getCollection } from "astro:content";
import type { APIRoute } from "astro";

export const GET: APIRoute = async () => {
    try {
        // Fetch all content collections
        const [blogs, features, transcriptions, alternatives, help] = await Promise.all([
            getCollection("blog"),
            getCollection("features"), 
            getCollection("transcription"),
            getCollection("alternatives"),
            getCollection("help")
        ]);

        // Sort blogs by date (newest first)
        const sortedBlogs = blogs
            .filter(post => !post.data.draft)
            .sort((a, b) => new Date(b.data.date).getTime() - new Date(a.data.date).getTime());

        // Filter non-draft content (the transcription, alternatives, and help
        // collections would be rendered the same way as features below; the
        // template omits them for brevity)
        const activeFeatures = features.filter(item => !item.data.draft);
        const activeTranscriptions = transcriptions.filter(item => !item.data.draft);
        const activeAlternatives = alternatives.filter(item => !item.data.draft);

        const content = `# Your Website Name - Complete Content Guide

This document contains the complete content from your website.

Website: https://yourwebsite.com
Last Updated: ${new Date().toISOString().split('T')[0]}

## Blog Content

${sortedBlogs
    .map((post) => `### ${post.data.title}

URL: https://yourwebsite.com/blog/${post.data.slug || post.id}
Published: ${post.data.date}
Category: ${post.data.category}
Author: ${post.data.author}
Description: ${post.data.description}

${post.body}

---`).join('\n\n')}

## Additional Content Sections

${activeFeatures
    .map((feature) => `### ${feature.data.title}

URL: https://yourwebsite.com/${feature.data.slug}
Category: ${feature.data.category}
${feature.data.description}

${feature.body}

---`).join('\n\n')}

---

This content is provided to help AI assistants understand your website's offerings and provide accurate information.`;

        return new Response(content, {
            headers: { 
                "Content-Type": "text/plain; charset=utf-8",
                "Cache-Control": "public, max-age=3600" // Cache for 1 hour
            },
        });
    } catch (error) {
        console.error('Error generating llms.txt:', error);
        return new Response('Error generating llms.txt', { status: 500 });
    }
};

How This Code Works

This code creates an API endpoint that:

  1. Fetches all posts from your content collections using getCollection()
  2. Filters out draft content to only include published articles
  3. Sorts content by date to show the most recent content first
  4. Creates a structured text file starting with your site title
  5. Includes for each post:
    • The post title as a heading
    • A direct link to the post
    • Metadata (date, category, author, description)
    • The full post content
  6. Handles errors gracefully with proper error logging
  7. Sets response headers, including a Cache-Control header (note that with Astro's default static output, the endpoint runs once at build time and the file is served statically, so these headers only take effect when the route is server-rendered)
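The filtering and sorting in steps 2 and 3 can be factored into a small pure function and checked against sample data (the draft and date field names match the schema assumed by the endpoint above):

```typescript
interface PostLike {
    data: { title: string; date: string; draft?: boolean };
}

// Drop drafts, then sort newest-first by publication date.
function publishedNewestFirst<T extends PostLike>(posts: T[]): T[] {
    return posts
        .filter((post) => !post.data.draft)
        .sort(
            (a, b) =>
                new Date(b.data.date).getTime() - new Date(a.data.date).getTime()
        );
}

const samplePosts: PostLike[] = [
    { data: { title: "Old", date: "2023-01-01" } },
    { data: { title: "Hidden", date: "2024-06-01", draft: true } },
    { data: { title: "New", date: "2024-01-01" } },
];

console.log(publishedNewestFirst(samplePosts).map((p) => p.data.title)); // ["New", "Old"]
```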

Customizing for Your Content Structure

Adjusting Collection Names

The example above uses multiple collections. If you only have a blog collection, simplify it:

export const GET: APIRoute = async () => {
    const posts = await getCollection("blog"); // Adjust to your collection name
    
    const publishedPosts = posts
        .filter(post => !post.data.draft)
        .sort((a, b) => new Date(b.data.date).getTime() - new Date(a.data.date).getTime());

    const content = `# Your Blog Name

${publishedPosts
    .map((post) => `## ${post.data.title}

https://yourwebsite.com/blog/${post.data.slug || post.id}

${post.data.description}

${post.body}
`).join('\n\n')}`;

    return new Response(content, {
        headers: { "Content-Type": "text/plain; charset=utf-8" },
    });
};

Adding Content Filtering

You might want to exclude certain categories or include only featured content:

// Only include featured posts
const featuredPosts = posts.filter(post => post.data.featured && !post.data.draft);

// Exclude specific categories
const filteredPosts = posts.filter(post => 
    !post.data.draft && !['internal', 'private'].includes(post.data.category)
);
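The same category filter can be exercised end to end on sample data (the field names follow the schema used throughout this post):

```typescript
interface Entry {
    data: { title: string; category: string; draft?: boolean };
}

const entries: Entry[] = [
    { data: { title: "Public guide", category: "guides" } },
    { data: { title: "Team notes", category: "internal" } },
    { data: { title: "Unfinished", category: "guides", draft: true } },
];

// Exclude drafts and any categories you don't want exposed to LLMs.
const filteredPosts = entries.filter(
    (post) =>
        !post.data.draft && !["internal", "private"].includes(post.data.category)
);

console.log(filteredPosts.map((p) => p.data.title)); // ["Public guide"]
```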

Benefits of Using llms.txt

1. Improved AI Discoverability

AI tools can quickly understand your entire content library without crawling multiple pages.

2. Better Context for AI Responses

When users ask questions related to your content, AI models have access to comprehensive, structured information.

3. SEO for the AI Era

As search evolves toward AI-powered results, having structured content for AI consumption becomes crucial.

4. Performance Benefits

Static generation means your llms.txt file is created at build time, providing fast response times.

Testing Your llms.txt Implementation

After implementing the endpoint, test it by:

  1. Visit the endpoint directly: https://yoursite.com/llms.txt
  2. Check the content structure to ensure all posts are included
  3. Verify the formatting is clean and readable
  4. Test with AI tools by asking them to analyze your llms.txt content
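For steps 2 and 3, a small script can sanity-check the generated file's structure. The heading and separator patterns below assume the per-post ### headings and --- separators produced by the endpoint code earlier:

```typescript
// Count per-post headings and separators in a generated llms.txt body,
// so you can compare against the number of published posts you expect.
function auditLlmsTxt(text: string): { posts: number; separators: number } {
    const posts = (text.match(/^### /gm) ?? []).length;
    const separators = (text.match(/^---$/gm) ?? []).length;
    return { posts, separators };
}

const generated = `# My Blog

### First Post

Body one.

---

### Second Post

Body two.

---`;

console.log(auditLlmsTxt(generated)); // { posts: 2, separators: 2 }
```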

Best Practices for llms.txt

Keep Content Fresh

Update your llms.txt regularly by rebuilding your site when you publish new content.

Include Relevant Metadata

Add publication dates, categories, and descriptions to help AI understand context.

Structure Content Clearly

Use consistent heading formats and clear separators between sections.

Monitor File Size

For sites with hundreds of posts, consider paginating or filtering content to keep the file manageable.
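One simple way to enforce such a limit is a byte budget: include sections newest-first until adding another would exceed the cap. This is a sketch, and the specific cap you choose is up to you:

```typescript
// Greedily keep sections (assumed already sorted newest-first)
// until the UTF-8 byte budget would be exceeded.
function underBudget(sections: string[], maxBytes: number): string[] {
    const kept: string[] = [];
    let used = 0;
    for (const section of sections) {
        const size = new TextEncoder().encode(section).length;
        if (used + size > maxBytes) break;
        kept.push(section);
        used += size;
    }
    return kept;
}

const demoSections = ["a".repeat(300), "b".repeat(300), "c".repeat(300)];
console.log(underBudget(demoSections, 650).length); // 2
```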

Conclusion

Creating an llms.txt file for your Astro website is a straightforward way to make your content accessible to AI language models. This approach leverages Astro’s content collections to create a structured, comprehensive view of your site’s content.

As AI becomes increasingly important for content discovery, implementing llms.txt positions your website for better visibility in the AI-powered search landscape. The implementation is simple, performant, and easy to maintain as part of your Astro build process.

Start implementing llms.txt today to ensure your content is ready for the future of AI-powered search and discovery.

Frequently Asked Questions

What’s the difference between llms.txt and sitemap.xml?

While sitemap.xml lists your pages for search engine crawlers, llms.txt provides the actual content in a format optimized for language models to understand and process.

How often should I update my llms.txt file?

Your llms.txt file updates automatically when you rebuild your Astro site, so it stays current with your content publishing schedule.

Can I include images and media in llms.txt?

llms.txt is text-based, so include descriptions of images and media rather than the files themselves. Focus on textual content that AI can process effectively.

Will llms.txt affect my SEO?

No, llms.txt won’t negatively impact traditional SEO. It’s designed to complement your existing SEO strategy by making content accessible to AI tools.

How large should my llms.txt file be?

There’s no strict limit, but keep it reasonable. For sites with hundreds of posts, consider filtering to include only your most important or recent content.

Andre Smith

Expert in technology, productivity, and software solutions. Passionate about helping teams work more efficiently through innovative tools and strategies.
