George Song

Building Better RSS/Atom Feeds in Astro with the `unified` Ecosystem

Migrating this site from Gatsby v2 to Astro v5 was a big success! 🙌 But one area required significant custom work: generating RSS and Atom feeds. Feeds are crucial for content syndication, yet handling MDX content within them presented unexpected challenges.

Requirements for Feed Generation

My goal was to generate both RSS and Atom feeds that accurately represented the site’s articles, which are written in MDX. This meant addressing several key requirements:

  1. Dual Format Support: Provide both RSS 2.0 and Atom 1.0 feeds.
  2. MDX Content Handling: Process MDX source, stripping out JavaScript (e.g., import statements) and dynamic React components that wouldn’t work in a feed.
  3. Consistent HTML: Ensure the HTML output within feed entries (<content:encoded> or <content>) was clean, valid, and accurately reflected the article structure.
  4. Valid Markup: Maintain valid HTML, avoiding issues like unclosed tags often caused by aggressive minification.
  5. Absolute URLs: Convert all relative links within the content to absolute URLs based on the site’s domain.
  6. Proper Encoding: Ensure correct character encoding throughout the process.

The Standard Approach: Astro’s Built-in RSS Package

Astro provides an official @astrojs/rss package for generating feeds. Initially, this seemed like the obvious solution.

// Example using @astrojs/rss (conceptual)
import rss from "@astrojs/rss";
import { getCollection } from "astro:content";
import sanitizeHtml from "sanitize-html";
import MarkdownIt from "markdown-it";

const parser = new MarkdownIt();

export async function GET(context) {
  const posts = await getCollection("blog");
  return rss({
    // ... feed options
    items: posts.map((post) => ({
      // ... item options
      content: sanitizeHtml(parser.render(post.body)), // Problematic for MDX
    })),
  });
}

However, this approach quickly proved insufficient for several reasons:

  1. MDX Incompatibility: @astrojs/rss relies on tools like markdown-it and sanitize-html, which are designed for standard Markdown (.md), not MDX (.mdx). These tools don’t understand JSX or imports/components embedded within the content (see the sketch after this list). My attempts to render MDX to HTML using Astro’s internal tools specifically for the feed proved complex and unreliable.
  2. Limited Transformation: The package lacked built-in mechanisms to easily transform relative URLs to absolute ones within the rendered HTML content.
  3. Insufficient Control: There wasn’t a straightforward way to selectively remove specific elements generated by MDX processing (like automatically generated Tables of Contents) or handle custom syntax (like <mark> tags used for highlighting) correctly for feed output.
  4. No Atom Support: The package focuses on RSS, lacking direct support for generating Atom feeds.
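
To make the first point concrete, here is a small sketch (the component name and path are hypothetical) of what a Markdown-only parser does with MDX source: markdown-it has no notion of imports or JSX, so both leak into the rendered output as ordinary text instead of being executed or stripped.

import MarkdownIt from "markdown-it";

// A hypothetical MDX snippet fed to a Markdown-only parser
const mdxSource = `
import Chart from "../components/Chart.astro";

Some intro text.

<Chart data={[1, 2, 3]} />
`;

const parser = new MarkdownIt();
// The import line becomes an ordinary paragraph, and the JSX is escaped or
// passed through as text; it is never evaluated or removed.
console.log(parser.render(mdxSource));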

A More Flexible Solution: The unified Ecosystem

To gain the necessary control over the MDX-to-HTML transformation pipeline specifically for feeds, I turned to the unified ecosystem. This framework processes content through Abstract Syntax Trees (ASTs), giving precise control at each transformation stage: remark plugins operate on the Markdown AST (MDAST), rehype plugins operate on the HTML AST (HAST), and bridge plugins like remark-rehype convert between the two.

For structuring and generating the final XML output, I chose the feed library. It offers a clean and effective API for creating both RSS 2.0 and Atom 1.0 feeds once the content is properly prepared.
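
For a rough sense of that API (the titles, URLs, and dates below are placeholders, not this site’s actual metadata), the library is driven by one options object, per-item addItem calls, and a serializer per format:

import { Feed } from "feed";

// Placeholder feed metadata; the real values live in the site configuration.
const feed = new Feed({
  title: "Example Site",
  description: "Articles from an example site",
  id: "https://example.com/",
  link: "https://example.com/",
  copyright: "All rights reserved",
});

// Each article becomes one item; content is pre-rendered HTML.
feed.addItem({
  title: "Hello, feeds",
  id: "https://example.com/articles/hello-feeds",
  link: "https://example.com/articles/hello-feeds",
  date: new Date(),
  content: "<p>Prepared HTML goes here.</p>",
});

const atomXml = feed.atom1(); // Atom 1.0
const rssXml = feed.rss2(); // RSS 2.0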

Building a Custom MDX-to-HTML Pipeline

The core task was creating a function, mdxToHtml, that converts raw MDX (both the article body and its summary) into clean, valid, feed-friendly HTML. Let’s look at how this pipeline works:

feeds/utils.ts
import type { Root as HastRoot, RootContent } from "hast";
import type { Root as MdastRoot } from "mdast";
import type { Plugin } from "unified";
import { Buffer } from "node:buffer";
import minifyHtml from "@minify-html/node";
import rehypeStringify from "rehype-stringify";
import remarkMarkers from "remark-flexible-markers";
import remarkMdx from "remark-mdx";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

type UrlLike = URL | string;

export async function mdxToHtml(
  mdxContent: string,
  site: UrlLike,
): Promise<string> {
  const result = await unified()
    .use(remarkParse) // 1. Parse Markdown/MDX text -> MDAST
    .use(remarkMdx) // 2. Handle MDX-specific syntax (JSX, imports/exports)
    .use(remarkMarkers, { markerClassName: () => [] }) // 3. Handle <mark> correctly
    .use(remarkRemoveToc) // 4. Custom: remove "Table of Contents" headings
    .use(remarkRemoveImports) // 5. Custom: remove JS import/export statements
    .use(remarkRehype) // 6. Bridge: convert MDAST -> HAST
    .use(rehypeAbsoluteUrls, site) // 7. Custom: convert relative URLs -> absolute
    .use(rehypeStringify) // 8. Convert HAST -> HTML string
    .process(mdxContent);

  // 9. Minify the HTML safely
  return minifyHtml
    .minify(Buffer.from(result.toString()), { keep_closing_tags: true })
    .toString();
}

// Custom remark plugin to remove JS imports/exports
const remarkRemoveImports: Plugin<[], MdastRoot> = () => {
  return (tree) => {
    tree.children = tree.children.filter((node) => node.type !== "mdxjsEsm");
    return tree;
  };
};

// Custom remark plugin to remove "Table of Contents" headings
const remarkRemoveToc: Plugin<[], MdastRoot> = () => {
  // Filter out h2 headings whose text matches "Table of Contents"
  return (tree: MdastRoot) => {
    tree.children = tree.children.filter((node) => {
      if (node.type === "heading" && node.depth === 2) {
        const text = node.children
          .filter((child) => child.type === "text")
          .map((child) => child.value)
          .join("")
          .trim();
        const tocRegex = /(table[ -]of[ -])?contents?|toc/i;
        return !tocRegex.test(text);
      }
      return true;
    });
    return tree;
  };
};

// Custom rehype plugin to make URLs absolute
const rehypeAbsoluteUrls: Plugin<[UrlLike], HastRoot> = (baseUrl) => {
  // Traverse the HAST and rewrite link hrefs to absolute URLs
  return (tree) => {
    const visit = (node: RootContent | HastRoot) => {
      if (node.type === "element") {
        if (node.tagName === "a" && node.properties?.href) {
          node.properties.href = createUrl(
            node.properties.href as string,
            baseUrl,
          );
        }
        // Note: a complete implementation would also handle `img[src]`, etc.
      }
      if ("children" in node) {
        node.children.forEach(visit);
      }
    };
    visit(tree);
    return tree;
  };
};

// Helper to create absolute URLs
export function createUrl(path: string, baseUrl: UrlLike): string | null {
  try {
    const fullUrl = new URL(path, baseUrl);
    return fullUrl.href;
  } catch (error) {
    console.error("Invalid path or base URL:", error);
    return null;
  }
}

Key Plugins and Customizations:

  1. remark-parse and remark-mdx parse the raw source into an MDAST, including MDX-specific syntax such as JSX and import/export statements.
  2. remark-flexible-markers handles the highlighting syntax used in the articles so it renders as plain <mark> elements (markerClassName: () => [] keeps the output free of the plugin’s default classes).
  3. remarkRemoveToc and remarkRemoveImports are small custom plugins that drop “Table of Contents” headings and mdxjsEsm (import/export) nodes before conversion.
  4. remark-rehype and rehype-stringify bridge the MDAST to a HAST and serialize it to an HTML string.
  5. rehypeAbsoluteUrls is a custom plugin that rewrites relative href values to absolute URLs via the createUrl helper.
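
To make the pipeline’s effect concrete, here is a rough before-and-after using a made-up article snippet (the component path is hypothetical, and the exact output markup may vary slightly):

import { mdxToHtml } from "./utils";

// A made-up article snippet: an import statement, a "Table of Contents"
// heading, and a relative link.
const input = `
import Demo from "../components/Demo.astro";

## Table of Contents

Read the [first article](/articles/hello) for context.
`;

// The import and the ToC heading are stripped, and the relative link becomes
// absolute. The result is roughly:
//   <p>Read the <a href="https://example.com/articles/hello">first article</a> for context.</p>
const html = await mdxToHtml(input, "https://example.com");
console.log(html);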

Overcoming Processing Challenges

Even with the unified ecosystem in place, I encountered two challenges that needed solving:

  1. Typography and Encoding Issues: An attempt to use remark-smartypants for typographic enhancements (curly quotes, em-dashes) resulted in mangled characters in the final feed output. Unable to properly debug the encoding conflicts, I omitted remark-smartypants from the feed generation pipeline to ensure feed validity.
  2. Minification Complications: Initial attempts used rehype-preset-minify to reduce HTML size. However, its default settings proved too aggressive, sometimes removing optional closing tags (like </p>) which, while valid in browsers, could break stricter XML parsers used by some feed readers. Switching to @minify-html/node with the keep_closing_tags: true option provided safe and effective minification.
feeds/utils.ts
export async function mdxToHtml(
  mdxContent: string,
  site: UrlLike,
): Promise<string> {
  const result = await unified()
    // ... remark/rehype pipeline
    .use(rehypeStringify)
    .process(mdxContent);

  return minifyHtml
    .minify(Buffer.from(result.toString()), { keep_closing_tags: true })
    .toString();
}
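
As a quick aside on why that flag matters (a standalone sketch, not part of the feed code): the minifier’s default settings can drop optional closing tags like </p>, which browsers accept but strict XML parsers in some feed readers may not.

import { Buffer } from "node:buffer";
import minifyHtml from "@minify-html/node";

const html = "<p>First paragraph.</p>\n<p>Second paragraph.</p>";

// Default settings may drop optional closing tags such as </p>.
const aggressive = minifyHtml.minify(Buffer.from(html), {}).toString();

// keep_closing_tags preserves them, keeping the markup feed-safe.
const feedSafe = minifyHtml
  .minify(Buffer.from(html), { keep_closing_tags: true })
  .toString();

console.log(aggressive);
console.log(feedSafe);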

Assembling the Complete Solution

With the mdxToHtml utility handling the complex content transformation, the final feed generation logic became much cleaner and more maintainable.

The generateFeed function in feeds/index.ts orchestrates the process:

feeds/index.ts
import type { APIContext } from "astro";
import type { Author, FeedOptions } from "feed";
import { getCollection } from "astro:content";
import { Feed } from "feed"; // Use the 'feed' library
import { createUrl, mdxToHtml } from "./utils"; // Import our custom utils

// ... SiteAuthor interface

export async function generateFeed(context: APIContext): Promise<Feed> {
  const site = context.site!.toString();
  const author: SiteAuthor = {
    /* ... author details */
  };
  const feed = createFeedInstance(site, author); // Initialize Feed object
  await addArticlesToFeed(feed, site, author); // Add processed articles
  return feed;
}

function createFeedInstance(site: string, author: SiteAuthor): Feed {
  const feedOptions: FeedOptions = {
    /* ... feed metadata */
  };
  return new Feed(feedOptions);
}

async function addArticlesToFeed(
  feed: Feed,
  site: string,
  author: SiteAuthor,
): Promise<void> {
  const articles = (await getCollection("articles")).sort(
    (a, b) => b.data.published.valueOf() - a.data.published.valueOf(),
  );

  for (const article of articles) {
    const link = createUrl(`/articles/${article.slug}`, site) as string;
    feed.addItem({
      title: article.data.title,
      id: link,
      link,
      published: article.data.published,
      date: article.data.updated || article.data.published, // Use updated date if available
      author: [author],
      // Process description and body using our custom mdxToHtml
      description: await mdxToHtml(article.data.summary, site),
      content: await mdxToHtml(article.body || "", site),
    });
  }
}

Finally, simple Astro API endpoints (pages/atom.xml.ts and pages/rss.xml.ts) call generateFeed and use the appropriate methods from the feed library to return the XML:

pages/atom.xml.ts
import type { APIContext } from "astro";
import { generateFeed } from "@/data/feeds";

export async function GET(context: APIContext) {
  const feed = await generateFeed(context);
  return new Response(feed.atom1(), {
    headers: { "Content-Type": "application/atom+xml" },
  });
}

pages/rss.xml.ts

import type { APIContext } from "astro";
import { generateFeed } from "@/data/feeds";

export async function GET(context: APIContext) {
  const feed = await generateFeed(context);
  return new Response(feed.rss2(), {
    headers: { "Content-Type": "application/xml" },
  });
}

Conclusion: A Robust Feed Solution for Complex Content

While Astro’s standard RSS package works well for simpler sites using standard Markdown, the complexities of MDX content required a more tailored approach. The unified toolchain, combined with custom remark and rehype plugins, provided the granular control needed to strip MDX-specific syntax, remove unwanted elements like Table of Contents headings, convert relative URLs to absolute ones, and emit clean, valid HTML for each feed entry.

The primary job of our feed generation pipeline is to transform complex MDX content into clean, valid HTML that works reliably in feed readers while preserving the essence of our articles.

Pairing this custom pipeline with the robust feed library resulted in valid, clean, and content-rich Atom and RSS feeds that work reliably across feed readers. This solution successfully addressed one of the key challenges in my migration from Gatsby to Astro, ensuring content syndication remained a first-class feature of the site.