Restore blog RSS post rendering with resilient /api/rss fallback parsing #27
jaypatrick merged 4 commits into main
Agent-Logs-Url: https://github.com/jaypatrick/jk.com/sessions/3d187a85-ea0c-4f4f-9728-202cda228346 Co-authored-by: jaypatrick <1800595+jaypatrick@users.noreply.github.com>
Pull request overview
Improves the resilience of the blog RSS ingestion endpoint (/api/rss): when the upstream XML feed is blocked or invalid, the endpoint falls back to the WordPress REST posts API so feed-backed UI can still render posts.
Changes:
- Added numeric XML entity decoding (`&#...;`, `&#x...;`) to improve rendering of WordPress-style excerpts/titles.
- Added runtime-compatible request timeout signal creation (uses `AbortSignal.timeout` when available, otherwise an `AbortController` fallback).
- Added a WordPress `/wp-json/wp/v2/posts` fallback path when RSS/Atom fetch/parsing fails or yields no items, plus targeted tests.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/pages/api/rss.ts | Adds numeric entity decoding, timeout signal helper, and WordPress fallback parsing/response paths for /api/rss. |
| src/pages/api/rss.test.ts | Adds tests for numeric entity decoding and the HTML-response WordPress fallback behavior. |
| const extractTag = (block: string, tags: string[]): string => { | ||
| for (const tag of tags) { | ||
| const match = block.match(new RegExp(`<${tag}(?:\\s[^>]*)?>([\\s\\S]*?)<\\/${tag}>`, 'i')); | ||
| const match = block.match(new RegExp(`<${tag}(?:\s[^>]*)?>([\s\S]*?)<\/${tag}>`, 'i')); |
extractTag builds a RegExp from a template literal that contains \s and \S, but the literal uses single backslashes. In an untagged JS/TS string or template literal, \s is not a recognized escape and evaluates to a plain s, so the pattern no longer matches whitespace or newlines and tag extraction breaks. Either double the backslashes (e.g. \\s, \\S) or use String.raw so the intended regex escapes reach the RegExp constructor.
| const match = block.match(new RegExp(`<${tag}(?:\s[^>]*)?>([\s\S]*?)<\/${tag}>`, 'i')); | |
| const match = block.match(new RegExp(String.raw`<${tag}(?:\s[^>]*)?>([\s\S]*?)<\/${tag}>`, 'i')); |
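
The difference described in this comment can be reproduced in isolation. The following is a minimal sketch, not code from the PR: `buildBroken`, `buildRaw`, and the sample `block` are hypothetical names for illustration, and only the two pattern strings mirror the lines under review.

```typescript
// Hypothetical helpers for illustration; only the patterns mirror the review.

// Untagged template literal: `\s` collapses to "s" and `\S` to "S" before the
// string ever reaches the RegExp constructor.
const buildBroken = (tag: string): RegExp =>
  new RegExp(`<${tag}(?:\s[^>]*)?>([\s\S]*?)<\/${tag}>`, 'i');

// String.raw keeps the backslashes, so the RegExp sees \s and [\s\S] as intended.
const buildRaw = (tag: string): RegExp =>
  new RegExp(String.raw`<${tag}(?:\s[^>]*)?>([\s\S]*?)<\/${tag}>`, 'i');

// Sample input with an attribute and a newline inside the element body.
const block = '<title type="html">Hello\nworld</title>';

const brokenMatch = block.match(buildBroken('title')); // attribute is not matched
const rawMatch = block.match(buildRaw('title'));       // captures the body

console.log(brokenMatch === null, rawMatch?.[1]);
```

With the sample input, the non-raw pattern fails as soon as the tag carries an attribute, while the String.raw version captures the multi-line body.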
| const safeFromCodePoint = (codePoint: number): string => { | ||
| if (codePoint >= 0 && codePoint <= 0x10ffff) { | ||
| return String.fromCodePoint(codePoint); | ||
| } | ||
| return '\uFFFD'; | ||
| }; |
safeFromCodePoint currently treats any value in 0..0x10FFFF as valid, which includes surrogate code points (0xD800–0xDFFF). Decoding numeric entities to lone surrogates can produce invalid/unpaired UTF-16 in JS strings and unexpected rendering. Consider rejecting surrogate ranges (and non-finite values) and returning U+FFFD for those as well.
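
One way to apply this suggestion is sketched below. It is an assumption about how the guard could look, not the merged implementation; `decodeNumericEntities` is a hypothetical caller added only to show the guard in use.

```typescript
// Sketch of a stricter guard: rejects non-integers (including NaN/Infinity),
// out-of-range values, and the surrogate block 0xD800-0xDFFF.
const safeFromCodePoint = (codePoint: number): string => {
  const isInteger = Number.isInteger(codePoint);
  const inRange = codePoint >= 0 && codePoint <= 0x10ffff;
  const isSurrogate = codePoint >= 0xd800 && codePoint <= 0xdfff;
  return isInteger && inRange && !isSurrogate
    ? String.fromCodePoint(codePoint)
    : '\uFFFD'; // replacement character for anything invalid
};

// Hypothetical caller: decodes &#NNN; and &#xHHH; numeric entities.
const decodeNumericEntities = (text: string): string =>
  text
    .replace(/&#x([0-9a-f]+);/gi, (_match, hex: string) =>
      safeFromCodePoint(Number.parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_match, dec: string) =>
      safeFromCodePoint(Number.parseInt(dec, 10)));

console.log(decodeNumericEntities('It&#8217;s here &#xD800;'));
```

Checking `Number.isInteger` first covers the non-finite case mentioned in the comment, so a separate `Number.isFinite` test is not needed.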
| if (typeof AbortSignal.timeout === 'function') { | ||
| return fetch(input, { ...init, signal: AbortSignal.timeout(timeoutMs) }); |
fetchWithTimeout assumes AbortSignal is always defined and references AbortSignal.timeout directly. In runtimes where AbortSignal is missing or not global, this will throw before falling back to AbortController. Consider guarding via globalThis.AbortSignal (e.g., typeof globalThis.AbortSignal?.timeout === 'function') to keep the timeout wrapper truly runtime-safe.
| if (typeof AbortSignal.timeout === 'function') { | |
| return fetch(input, { ...init, signal: AbortSignal.timeout(timeoutMs) }); | |
| const abortSignalCtor = globalThis.AbortSignal; | |
| if (typeof abortSignalCtor?.timeout === 'function') { | |
| return fetch(input, { ...init, signal: abortSignalCtor.timeout(timeoutMs) }); |
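
For context, a complete wrapper built around this guard might look like the sketch below. The helper name `createTimeoutSignal`, the parameter names, and the 10-second default are assumptions for illustration; the PR's actual helper may be structured differently.

```typescript
// Sketch only: helper name, parameter names, and default timeout are assumed.
const createTimeoutSignal = (
  timeoutMs: number,
): { signal: AbortSignal | null; cancel: () => void } => {
  const abortSignalCtor = globalThis.AbortSignal;
  // Preferred path: the runtime provides AbortSignal.timeout.
  if (typeof abortSignalCtor?.timeout === 'function') {
    return { signal: abortSignalCtor.timeout(timeoutMs), cancel: () => {} };
  }
  // Fallback path: emulate the timeout with AbortController and a timer.
  if (typeof globalThis.AbortController === 'function') {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    return { signal: controller.signal, cancel: () => clearTimeout(timer) };
  }
  // Last resort: no abort support at all, so the request runs without a timeout.
  return { signal: null, cancel: () => {} };
};

const fetchWithTimeout = async (
  input: string | URL,
  init: RequestInit = {},
  timeoutMs = 10_000,
): Promise<Response> => {
  const { signal, cancel } = createTimeoutSignal(timeoutMs);
  try {
    return await fetch(input, { ...init, signal });
  } finally {
    cancel(); // clear the fallback timer once the request settles
  }
};
```

Reading the constructor through `globalThis` means a missing global evaluates to `undefined` instead of throwing a ReferenceError, which is the failure mode the comment describes.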
| const tryWordPressFallback = async (): Promise<FeedItem[] | null> => { | ||
| try { | ||
| const wpApiUrl = new URL('/wp-json/wp/v2/posts', `${parsedFeedUrl.origin}/`); | ||
| wpApiUrl.searchParams.set('per_page', String(max)); | ||
| wpApiUrl.searchParams.set('_fields', 'link,title.rendered,excerpt.rendered,date,date_gmt'); | ||
| const response = await fetchWithTimeout(wpApiUrl, { | ||
| headers: { | ||
| Accept: 'application/json', | ||
| 'User-Agent': 'Mozilla/5.0 (compatible; JKcom-RSSBot/1.0; +https://jaysonknight.com)', | ||
| }, | ||
| cache: 'no-store', | ||
| }); |
The WordPress fallback reuses parsedFeedUrl.origin derived from the user-controlled url parameter, which means the endpoint will now make an additional request to /<origin>/wp-json/wp/v2/posts for arbitrary hosts. This increases SSRF blast radius (extra internal endpoints hit on the same host) compared to the RSS-only fetch. Consider restricting allowed origins (e.g., a known blog host), enforcing http/https only, and/or blocking private IP ranges before issuing either fetch.
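
A possible shape for such a pre-fetch check is sketched below. Everything here is hypothetical: the allow-list contents (`blog.example.com` is a placeholder), the function names, and the choice to reject all IPv6 literals are assumptions, not code from the PR. Hostname checks like this also do not protect against a public name that resolves to a private address.

```typescript
// Hypothetical allow-list; the hostname is a placeholder, not from the PR.
const ALLOWED_FEED_HOSTS = new Set<string>(['blog.example.com']);

// True for loopback, link-local, "this network", and RFC 1918 IPv4 literals.
const isPrivateIpv4 = (hostname: string): boolean => {
  const parts = hostname.split('.').map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    return false; // not an IPv4 literal
  }
  const [a = 0, b = 0] = parts;
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
};

const isAllowedFeedUrl = (raw: string): boolean => {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // unparseable input
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  const host = url.hostname.toLowerCase();
  // Simplification: reject localhost and every IPv6 literal ("[...]") outright.
  if (host === 'localhost' || host.startsWith('[') || isPrivateIpv4(host)) return false;
  return ALLOWED_FEED_HOSTS.has(host);
};

console.log(isAllowedFeedUrl('https://blog.example.com/feed'));
```

Running the same check before both the RSS fetch and the WordPress fallback keeps the two requests under one policy.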
| if (!response.ok) { | ||
| const fallbackItems = await tryWordPressFallback(); | ||
| if (fallbackItems) { | ||
| return respondWithItems(fallbackItems); | ||
| } | ||
| console.error('[api/rss] Failed to fetch feed with non-OK status:', response.status, 'for URL:', sanitizedFeedUrl); | ||
| return new Response(JSON.stringify({ error: `Failed to fetch feed (${response.status}).` }), { |
Fallback behavior is now triggered in several new cases (non-OK status, invalid feed XML, empty parsed items, thrown errors), but the new tests only cover the text/html trigger. Add at least one more test that demonstrates a successful fallback for another trigger (e.g., upstream 503 or invalid feed body) and a test that asserts no fallback request is made when the primary RSS/Atom parse yields items.
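
The two requested cases can be sketched against a simplified stand-in for the handler. `loadItems`, `makeCountingFallback`, and the result types below are hypothetical and exist only to show what the assertions would look like; the real tests would exercise /api/rss with a mocked fetch instead.

```typescript
type FeedItem = { title: string; link: string };
type PrimaryResult = { ok: boolean; status: number; items: FeedItem[] };

// Hypothetical stand-in for the handler's decision: use the primary items when
// the fetch succeeded and produced items, otherwise ask the fallback.
const loadItems = async (
  primary: () => Promise<PrimaryResult>,
  fallback: () => Promise<FeedItem[] | null>,
): Promise<{ items: FeedItem[]; usedFallback: boolean }> => {
  const result = await primary();
  if (result.ok && result.items.length > 0) {
    return { items: result.items, usedFallback: false };
  }
  const fallbackItems = await fallback();
  return { items: fallbackItems ?? [], usedFallback: fallbackItems !== null };
};

// Builds a fallback stub that records how many times it was called.
const makeCountingFallback = (items: FeedItem[] | null) => {
  const state = { calls: 0 };
  const fallback = async (): Promise<FeedItem[] | null> => {
    state.calls += 1;
    return items;
  };
  return { fallback, state };
};

const post: FeedItem = { title: 'Hello', link: 'https://blog.example.com/hello' };

// Case 1: upstream 503 should trigger exactly one fallback request.
const onUpstream503 = async () => {
  const { fallback, state } = makeCountingFallback([post]);
  const out = await loadItems(async () => ({ ok: false, status: 503, items: [] }), fallback);
  return { out, calls: state.calls };
};

// Case 2: a successful primary parse should make no fallback request.
const onPrimarySuccess = async () => {
  const { fallback, state } = makeCountingFallback([post]);
  const out = await loadItems(async () => ({ ok: true, status: 200, items: [post] }), fallback);
  return { out, calls: state.calls };
};
```

Counting calls on the stub is what distinguishes "fallback returned nothing" from "fallback was never requested", which is the property the second test needs to assert.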
Blog feed cards were failing to populate because `/api/rss` depended on a single upstream XML feed path that can return challenge HTML or invalid payloads in production. This change makes feed ingestion tolerant of those upstream failure modes so posts still render.

Feed retrieval hardening
- Falls back to `/{origin}/wp-json/wp/v2/posts` when the primary RSS/Atom fetch returns:
  - `text/html` (challenge/redirect pages)

Parsing quality improvements
- Decodes numeric XML entities (`&#...;`, `&#x...;`) so excerpts/titles from WordPress-style content render correctly.
- Creates the request timeout signal in a runtime-compatible way (`AbortSignal.timeout` when present, otherwise an `AbortController` fallback) to avoid environment-specific timeout regressions.

Targeted API behavior coverage