Every day, more people are using ChatGPT, Perplexity, and other AI tools to search and summarize content. As discussed in one of my earlier posts on SEO to SAO, it is only a matter of time before the majority of visits to websites and blogs come from AI agents.
When it comes to making your content accessible to language models, there’s a simpler approach than complex crawling solutions, especially for statically generated sites built with Astro. In this post, I’ll walk you through creating an llms.txt file that exposes your blog content to LLMs in a clean, structured format.
What is llms.txt?
An llms.txt file is conceptually similar to robots.txt but designed specifically for language models. It provides a structured, text-based representation of your content that’s easy for LLMs to parse and understand.
LLMs are not very good at navigating across multiple pages by following links. They are, however, very good at ingesting the content of a single page into their context. This is where llms.txt becomes invaluable.
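To make this concrete, here is an illustrative excerpt of the kind of file we will generate below (the site name, URLs, and dates are placeholders):

# Example Site - Complete Content Guide

This document contains the complete content from the website.
Website: https://example.com
Last Updated: 2025-06-01

## Blog Content

### Getting Started with Astro
URL: https://example.com/blog/getting-started-with-astro
Published: 2025-05-20
Category: Tutorials
Description: A beginner-friendly introduction to building with Astro.

The full body of each post follows its metadata, so an AI tool can read the entire site in a single pass.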
Why Crawling Tools Like Crawl4AI May Be Overkill
Tools like Crawl4AI offer powerful website crawling capabilities for LLMs. While they are ideal for generating llms.txt files for dynamic sites, they can be overkill for static ones.
For Astro sites especially, where content is typically stored as markdown files with frontmatter, you already have perfectly structured content ready to be exposed directly.
Implementing an llms.txt Endpoint in Astro
Here’s how you can generate an llms.txt file for your Astro site. Create a file at src/pages/llms.txt.ts (or src/pages/api/llms.txt.ts, depending on your Astro configuration) and add the following code:
import { getCollection } from "astro:content";
import type { APIRoute } from "astro";

export const GET: APIRoute = async () => {
  try {
    // Fetch all content collections in parallel
    const [blogs, features, transcriptions, alternatives, help] = await Promise.all([
      getCollection("blog"),
      getCollection("features"),
      getCollection("transcription"),
      getCollection("alternatives"),
      getCollection("help"),
    ]);

    // Sort blog posts by date (newest first), excluding drafts
    const sortedBlogs = blogs
      .filter((post) => !post.data.draft)
      .sort((a, b) => new Date(b.data.date).getTime() - new Date(a.data.date).getTime());

    // Filter drafts out of the other collections; render any of these the
    // same way as activeFeatures below to include them in the output
    const activeFeatures = features.filter((item) => !item.data.draft);
    const activeTranscriptions = transcriptions.filter((item) => !item.data.draft);
    const activeAlternatives = alternatives.filter((item) => !item.data.draft);
    const activeHelp = help.filter((item) => !item.data.draft);

    // Note: these examples read the slug from frontmatter (post.data.slug);
    // depending on your collection setup you may want post.slug or post.id instead
    const content = `# Your Website Name - Complete Content Guide

This document contains the complete content from your website.
Website: https://yourwebsite.com
Last Updated: ${new Date().toISOString().split("T")[0]}

## Blog Content

${sortedBlogs
  .map(
    (post) => `### ${post.data.title}
URL: https://yourwebsite.com/blog/${post.data.slug || post.id}
Published: ${post.data.date}
Category: ${post.data.category}
Author: ${post.data.author}
Description: ${post.data.description}

${post.body}

---`
  )
  .join("\n\n")}

## Additional Content Sections

${activeFeatures
  .map(
    (feature) => `### ${feature.data.title}
URL: https://yourwebsite.com/${feature.data.slug}
Category: ${feature.data.category}

${feature.data.description}

${feature.body}

---`
  )
  .join("\n\n")}

---

This content is provided to help AI assistants understand your website's offerings and provide accurate information.`;

    return new Response(content, {
      headers: {
        "Content-Type": "text/plain; charset=utf-8",
        "Cache-Control": "public, max-age=3600", // Cache for 1 hour
      },
    });
  } catch (error) {
    console.error("Error generating llms.txt:", error);
    return new Response("Error generating llms.txt", { status: 500 });
  }
};
How This Code Works
This code creates an API endpoint that:
- Fetches all posts from your content collections using getCollection()
- Filters out draft content to only include published articles
- Sorts content by date to show the most recent content first
- Creates a structured text file starting with your site title
- Includes, for each post:
  - The post title as a heading
  - A direct link to the post
  - Metadata (date, category, author, description)
  - The full post content
- Handles errors gracefully with proper error logging
- Sets appropriate headers including caching for performance
Customizing for Your Content Structure
Adjusting Collection Names
The example above uses multiple collections. If you only have a blog collection, simplify it:
import { getCollection } from "astro:content";
import type { APIRoute } from "astro";

export const GET: APIRoute = async () => {
  const posts = await getCollection("blog"); // Adjust to your collection name

  const publishedPosts = posts
    .filter((post) => !post.data.draft)
    .sort((a, b) => new Date(b.data.date).getTime() - new Date(a.data.date).getTime());

  const content = `# Your Blog Name

${publishedPosts
  .map(
    (post) => `## ${post.data.title}
https://yourwebsite.com/blog/${post.data.slug || post.id}

${post.data.description}

${post.body}`
  )
  .join("\n\n")}`;

  return new Response(content, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
};
Adding Content Filtering
You might want to exclude certain categories or include only featured content:
// Only include featured posts
const featuredPosts = posts.filter((post) => post.data.featured && !post.data.draft);

// Exclude specific categories
const filteredPosts = posts.filter(
  (post) => !post.data.draft && !["internal", "private"].includes(post.data.category)
);
Benefits of Using llms.txt
1. Improved AI Discoverability
AI tools can quickly understand your entire content library without crawling multiple pages.
2. Better Context for AI Responses
When users ask questions related to your content, AI models have access to comprehensive, structured information.
3. SEO for the AI Era
As search evolves toward AI-powered results, having structured content for AI consumption becomes crucial.
4. Performance Benefits
Static generation means your llms.txt file is created at build time, providing fast response times.
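A note for non-static setups: if your project uses Astro’s server or hybrid output, you can still have this one route generated at build time by opting it into prerendering with Astro’s standard prerender route option. Add this at the top of the endpoint file:

// In src/pages/llms.txt.ts: render this route at build time
export const prerender = true;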
Testing Your llms.txt Implementation
After implementing the endpoint, test it:
- Visit the endpoint directly at https://yoursite.com/llms.txt
- Check the content structure to ensure all posts are included
- Verify the formatting is clean and readable
- Test with AI tools by asking them to analyze your llms.txt content (a scripted version of these checks is sketched below)
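For a repeatable check, a minimal fetch-based smoke test might look like the following sketch. It assumes your dev server is running on Astro’s default port 4321 and that you run the script with a TypeScript runner such as tsx; the file name is just an example:

// smoke-test.ts: verify the endpoint returns plain text with the expected structure
const res = await fetch("http://localhost:4321/llms.txt");
console.log("Status:", res.status); // expect 200
console.log("Content-Type:", res.headers.get("content-type")); // expect text/plain

const text = await res.text();
console.log("Starts with a title heading:", text.startsWith("# "));
console.log("Has a blog section:", text.includes("## Blog Content"));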
Best Practices for llms.txt
Keep Content Fresh
Update your llms.txt regularly by rebuilding your site when you publish new content.
Include Relevant Metadata
Add publication dates, categories, and descriptions to help AI understand context.
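If your frontmatter does not already guarantee these fields, you can enforce them with a content collection schema. Here is a sketch using Astro’s defineCollection and zod, assuming the same field names the endpoint above reads; adjust the types to match your frontmatter:

// src/content/config.ts: require the metadata fields the llms.txt endpoint reads
import { defineCollection, z } from "astro:content";

const blog = defineCollection({
  schema: z.object({
    title: z.string(),
    date: z.date(),
    category: z.string(),
    author: z.string(),
    description: z.string(),
    draft: z.boolean().default(false),
  }),
});

export const collections = { blog };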
Structure Content Clearly
Use consistent heading formats and clear separators between sections.
Monitor File Size
For sites with hundreds of posts, consider paginating or filtering content to keep the file manageable.
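For example, you can cap how many posts go into the file, reusing publishedPosts from the simplified example above (the limit of 50 is arbitrary):

// Bound the file size by keeping only the 50 most recent published posts
const MAX_POSTS = 50;
const recentPosts = publishedPosts.slice(0, MAX_POSTS);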
Conclusion
Creating an llms.txt file for your Astro website is a straightforward way to make your content accessible to AI language models. This approach leverages Astro’s content collections to create a structured, comprehensive view of your site’s content.
As AI becomes increasingly important for content discovery, implementing llms.txt positions your website for better visibility in the AI-powered search landscape. The implementation is simple, performant, and easy to maintain as part of your Astro build process.
Start implementing llms.txt today to ensure your content is ready for the future of AI-powered search and discovery.
Frequently Asked Questions
What’s the difference between llms.txt and sitemap.xml?
While sitemap.xml lists your pages for search engine crawlers, llms.txt provides the actual content in a format optimized for language models to understand and process.
How often should I update my llms.txt file?
Your llms.txt file updates automatically when you rebuild your Astro site, so it stays current with your content publishing schedule.
Can I include images and media in llms.txt?
llms.txt is text-based, so include descriptions of images and media rather than the files themselves. Focus on textual content that AI can process effectively.
Will llms.txt affect my SEO?
No, llms.txt won’t negatively impact traditional SEO. It’s designed to complement your existing SEO strategy by making content accessible to AI tools.
How large should my llms.txt file be?
There’s no strict limit, but keep it reasonable. For sites with hundreds of posts, consider filtering to include only your most important or recent content.