feat: add IPFS CID serving via Kubo sidecar (PE-9067) #682
vilenarios wants to merge 11 commits into develop from
Conversation
Add opt-in IPFS content serving to the AR.IO Gateway. Operators enable
IPFS_ENABLED=true and start a Kubo Docker sidecar (--profile ipfs) to
serve IPFS content via path-based (/ipfs/{CID}) and subdomain-based
({CID}.{gateway-host}) access patterns.
Features:
- Kubo HTTP gateway integration with connection + stall timeouts
- LRU bounded filesystem cache (streams to disk, not memory)
- File-based CID blocklist with hot-reload
- Separate rate limiter pool for IPFS traffic
- x402 payment protection (same as Arweave data endpoints)
- HTTPSIG response signing for verifiable IPFS responses
- CIDv0 to CIDv1 base32 redirect (DNS-safe, works with wildcard certs)
- Cross-CID redirect for directory listing navigation
- Path traversal protection
- Docker Compose profile with TCP+UDP swarm ports
Default off (IPFS_ENABLED=false). Zero runtime impact when disabled.
Designed as foundation for Phase 2 ArNS-to-CID resolution.
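As a quick illustration of the two access patterns, the subdomain-style URL that a path-style request redirects to can be sketched like this (helper name, host, and CID value are hypothetical, not code from this PR):

```typescript
// Hypothetical helper, not the PR's actual code. A CIDv1 base32 string is
// lowercase and DNS-safe, so it fits in a single subdomain label and works
// under a wildcard certificate, per the feature list above.
function buildSubdomainUrl(
  cidV1Base32: string,
  rootHost: string,
  pathSuffix = '',
  protocol = 'https',
): string {
  return `${protocol}://${cidV1Base32}.${rootHost}${pathSuffix}`;
}

// Path-based:      https://gateway.example/ipfs/{CID}
// Subdomain-based: https://{CID}.gateway.example
const url = buildSubdomainUrl('bafyexamplecid', 'gateway.example', '/index.html');
```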
Fix strict-boolean-expressions (explicit nullish checks) and prettier formatting issues caught by CI eslint.
Add OpenTelemetry span tracing to the IPFS request lifecycle:
- IpfsService.getContent span (cache check, blocklist, delegation)
- KuboDataSource.getContent span (HTTP fetch with latency attributes)
- Spans record cache hit/miss, content size, errors, and content type
Also fix strict-boolean-expressions and prettier formatting for CI.
- Fix prettier line-length formatting (5 errors from CI)
- Add IPFS cache volume mount to docker-compose (data persists)
- Add IPFS field to /ar-io/info endpoint (network discovery)
- Add OTEL span tracing to IpfsService and KuboDataSource
Remove custom text-file blocklist in favor of the existing PUT /ar-io/admin/block-data API. Operators block IPFS CIDs the same way they block Arweave TX IDs — unified moderation, single API. Removed: IpfsBlocklist class, IPFS_BLOCKLIST_PATH config, blocklist volume mount. IpfsService now uses DataBlockListValidator (SQLite).
…or moderation (PE-9067)
- IPFS rate limiter defaults now match Arweave (100K IP tokens, 20/s refill,
1M resource tokens, 100/s refill)
- Remove custom text-file blocklist — use existing PUT /ar-io/admin/block-data
API for CID moderation (unified with Arweave content moderation)
- Remove IPFS_BLOCKLIST_PATH config and volume mount
- Add IPFS cache volume mount to docker-compose for persistence
…it defaults (PE-9067)
- Add IPFS Grafana dashboard example (requests/sec, cache hit rate,
latency percentiles, content size, blocked requests, route type)
- Update OpenAPI spec: block-data endpoint accepts IPFS CIDs
- Update ipfs-integration.md: admin API for moderation (not text file)
- Match IPFS rate limiter defaults to Arweave (100K IP, 1M resource)
- Enforce IPFS_MAX_RESPONSE_SIZE_BYTES (reject with 413 when Content-Length exceeds limit)
- Destroy response stream on 404/408/504 from Kubo (prevents socket leak)
- Fix docs/envs.md rate limiter defaults to match config.ts
- Add Grafana dashboard example for IPFS metrics
- Update OpenAPI spec for CID content moderation
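The Content-Length rejection in the first bullet can be sketched as follows (a minimal sketch assuming the check runs before streaming; the PR's actual handler may differ):

```typescript
// Sketch only: decide whether to reject with 413 based on the upstream
// Content-Length header and a configured byte limit.
function shouldReject413(
  contentLengthHeader: string | undefined,
  maxResponseSizeBytes: number,
): boolean {
  const parsed = parseInt(contentLengthHeader ?? '', 10);
  // A missing or malformed header cannot be rejected up front; any
  // enforcement for those cases would have to happen while streaming.
  if (!Number.isFinite(parsed)) return false;
  return parsed > maxResponseSizeBytes;
}
```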
📝 Walkthrough
Adds an opt-in IPFS gateway: Kubo sidecar in compose, config/env vars, CID utilities, filesystem LRU cache, Kubo streaming data source, IpfsService with tracing/limits/cache, Express subdomain/path middleware and routes, metrics, docs, tests, and dashboards.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ExpressApp as Express App
    participant IpfsHandler as IPFS Handler
    participant IpfsService as IpfsService
    participant Cache as IpfsFsCache
    participant Kubo as KuboDataSource
    participant KuboNode as Kubo Node
    Client->>ExpressApp: GET /ipfs/{CID} or {CID}.{host}
    ExpressApp->>IpfsHandler: dispatch request
    IpfsHandler->>IpfsService: getContent({cid, path, signal})
    IpfsService->>IpfsService: normalize CID -> v1 base32
    IpfsService->>IpfsService: check blocklist
    IpfsService->>Cache: has/get(cid, path)
    alt Cache hit
        Cache-->>IpfsService: stream + metadata
        IpfsService-->>IpfsHandler: return cached stream
    else Cache miss
        IpfsService->>Kubo: getContent(cid, path)
        Kubo->>KuboNode: HTTP GET /ipfs/{cid}/{path}
        KuboNode-->>Kubo: stream response
        Kubo-->>IpfsService: stream + metadata
        IpfsService->>Cache: write temp file & finalize (async)
        IpfsService-->>IpfsHandler: return live stream
    end
    IpfsHandler->>IpfsHandler: check payment & rate limits
    IpfsHandler->>Client: set headers (ETag, Cache-Control, X-Ipfs-Path) and stream response
```
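In code terms, the cache-hit/miss branch in the diagram might look like the following sketch (all types and names are hypothetical stand-ins, not the PR's actual IpfsService):

```typescript
// Hypothetical shapes, for illustration only.
interface ContentResult {
  stream: string;
  cached: boolean;
}

interface CacheLike {
  get(key: string): ContentResult | undefined;
  set(key: string, value: ContentResult): void;
}

async function getContentSketch(
  cid: string,
  cache: CacheLike,
  isBlocked: (cid: string) => boolean,
  fetchFromKubo: (cid: string) => Promise<string>,
): Promise<ContentResult> {
  // Blocklist check happens before any cache or network work.
  if (isBlocked(cid)) throw new Error('CID is blocked');
  const hit = cache.get(cid);
  if (hit !== undefined) return { ...hit, cached: true };
  const stream = await fetchFromKubo(cid);
  // The PR writes the cache entry asynchronously while streaming to the
  // client; this sketch compresses that into a synchronous set.
  cache.set(cid, { stream, cached: false });
  return { stream, cached: false };
}
```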
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 6
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🟡 Minor comments (17)
test-ipfs.sh-55-71 (1)
55-71: ⚠️ Potential issue | 🟡 Minor
`test_header` with an empty `expected_value` passes even when the header is absent.
`echo "$actual" | grep -qi ""` matches any string (including the empty string), so "X-Cache header present" and "Content-Type is set" will report PASS even if the header is missing from the response. Check for a non-empty `actual` explicitly when the intent is presence-only.
🔧 Proposed fix

```diff
   actual=$(curl -s -I --max-time 30 "$url" 2>&1 | grep -i "^$header:" | head -1 | sed 's/^[^:]*: //' | tr -d '\r')
-  if echo "$actual" | grep -qi "$expected_value"; then
+  if [ -n "$expected_value" ]; then
+    match_ok=$(echo "$actual" | grep -qi "$expected_value" && echo yes || echo no)
+  else
+    match_ok=$([ -n "$actual" ] && echo yes || echo no)
+  fi
+  if [ "$match_ok" = "yes" ]; then
     echo -e "  ${GREEN}PASS${NC} $name ($actual)"
     ((pass++))
   else
```

Also applies to: 136-142
test-ipfs.sh-93-101 (1)
93-101: ⚠️ Potential issue | 🟡 Minor
Hardcoded container name `ar-io-node-kubo-1` is brittle.
The container name is derived from the Compose project name (default = directory name). If the operator runs Compose from a different path or sets COMPOSE_PROJECT_NAME, both `docker exec` calls fail and the script aborts without a clear diagnostic. Resolve the container by service label instead.
🔧 Proposed fix

```diff
-FILE_CID=$(echo "Hello from AR.IO IPFS integration test!" | docker exec -i ar-io-node-kubo-1 ipfs add -q 2>&1)
+KUBO_CONTAINER=$(docker ps --filter "label=com.docker.compose.service=kubo" --format '{{.Names}}' | head -1)
+if [ -z "$KUBO_CONTAINER" ]; then
+  echo -e "  ${RED}FAIL${NC} No running kubo container found"
+  exit 1
+fi
+FILE_CID=$(echo "Hello from AR.IO IPFS integration test!" | docker exec -i "$KUBO_CONTAINER" ipfs add -q 2>&1)
```
docs/envs.md-376-376 (1)
376-376: ⚠️ Potential issue | 🟡 Minor
Fix the documented cache cleanup env var name.
The implementation exports IPFS_CACHE_CLEANUP_THRESHOLD_SECONDS, but the docs list IPFS_CACHE_CLEANUP_THRESHOLD, so operators would configure a no-op variable.
Proposed doc fix

```diff
-| IPFS_CACHE_CLEANUP_THRESHOLD | Number | 3600 | Age in seconds before cached files become eviction candidates |
+| IPFS_CACHE_CLEANUP_THRESHOLD_SECONDS | Number | 3600 | Age in seconds before cached files become eviction candidates |
```

Based on learnings, update relevant documentation files in docs/ in the same PR when changing behavior affecting documented contracts.
src/routes/ar-io-info-builder.ts-141-141 (1)
141-141: ⚠️ Potential issue | 🟡 Minor
Document the new /ar-io/info ipfs field in OpenAPI.
This builder can now emit ipfs: { enabled: true }, but the provided /ar-io/info OpenAPI schema does not expose that property, so generated clients won't see the advertised capability.
Based on learnings, update relevant documentation files in docs/ in the same PR when changing behavior affecting documented contracts.
Also applies to: 325-327
docs/openapi.yaml-2812-2818 (1)
2812-2818: ⚠️ Potential issue | 🟡 Minor
Update the request schema to match the expanded accepted IDs.
The prose now documents IPFS CIDs and content hashes, but the id schema description below still says "TX ID for a transaction". Please update that field description and consider adding a CID example so generated docs don't contradict this section.
Suggested OpenAPI doc adjustment

```diff
-          description: TX ID for a transaction you want to block.
+          description: TX ID, data-item ID, IPFS CIDv1 base32 string, or SHA-256 content hash you want to block.
```
src/ipfs/kubo-data-source.test.ts-33-65 (1)
33-65: ⚠️ Potential issue | 🟡 Minor
Make these tests assert URL construction, or remove them.
Because the signal is aborted before getContent() runs, signal.throwIfAborted() exits before the Kubo URL is built. These two tests also catch any error without asserting the URL, so they provide false confidence and duplicate the explicit abort test below.
docs/INDEX.md-26-30 (1)
26-30: ⚠️ Potential issue | 🟡 Minor
Add IPFS, CID, and Kubo definitions to docs/glossary.md.
These terms are introduced in docs/ipfs-integration.md but are missing from the glossary. Following the documentation guidelines, add entries under a new "IPFS Integration" section covering:
- IPFS - InterPlanetary File System
- CID - Content Identifier (IPFS hash/address)
- Kubo - The IPFS node implementation used as the gateway sidecar
docs/ipfs-integration.md-504-511 (1)
504-511: ⚠️ Potential issue | 🟡 Minor
Correct the documented Prometheus labels.
ipfs_requests_total is defined with route_type, status, and ipfs_request_duration_seconds uses route_type, cache_status; the table currently documents method and source.
src/routes/ipfs.ts-216-221 (1)
216-221: ⚠️ Potential issue | 🟡 Minor
Increment the dedicated IPFS cache/block metrics.
ipfsCacheHitTotal, ipfsCacheMissTotal, and ipfsBlockedTotal are exported but never updated here, so dashboards/alerts will stay flat even when IPFS traffic is active.
Proposed metric updates

```diff
     if (result.size > 0) {
       metrics.ipfsContentSizeHistogram.observe(result.size);
     }
+    if (result.cached) {
+      metrics.ipfsCacheHitTotal.inc();
+    } else {
+      metrics.ipfsCacheMissTotal.inc();
+    }
 ...
     if (error instanceof IpfsBlockedError) {
+      metrics.ipfsBlockedTotal.inc();
       metrics.ipfsRequestsTotal.inc({
```

Also applies to: 259-268
docs/ipfs-integration.md-58-58 (1)
58-58: ⚠️ Potential issue | 🟡 Minor
Add languages to fenced code blocks.
markdownlint flags these fences; use text for diagrams/span trees and an appropriate language where applicable.
Also applies to: 113-113, 531-531
docs/ipfs-integration.md-63-64 (1)
63-64: ⚠️ Potential issue | 🟡 Minor
Align subdomain docs with the implemented host pattern.
These sections describe {CID}.ipfs.gateway.io / a distinguishing .ipfs. label, but the implementation and PR objective use {CID}.{root_host}. This can lead operators to configure DNS/TLS incorrectly.
Also applies to: 141-144, 263-266, 362-365
docs/ipfs-integration.md-113-120 (1)
113-120: ⚠️ Potential issue | 🟡 Minor
Remove stale file-based blocklist/config references and fix defaults.
The IPFS moderation path now uses the existing PUT /ar-io/admin/block-data flow, and src/config.ts does not define IPFS_BLOCKLIST_PATH. Also update the rate-limit defaults here to match src/config.ts (100000 tokens with 20/s refill per-IP, and 1000000 tokens with 100/s refill per-resource).
Also applies to: 198-208, 431-445
src/routes/ipfs.ts-118-121 (1)
118-121: ⚠️ Potential issue | 🟡 Minor
Use the configured public protocol for CIDv0 redirects.
As in the subdomain middleware, req.protocol may produce http:// redirects behind TLS termination. Use config.SANDBOX_PROTOCOL ?? req.protocol.
Proposed redirect fix

```diff
+  const protocol = config.SANDBOX_PROTOCOL ?? req.protocol;
   res.redirect(
     302,
-    `${req.protocol}://${v1Base32}.${rootHost}${pathSuffix}`,
+    `${protocol}://${v1Base32}.${rootHost}${pathSuffix}`,
   );
```
src/middleware/ipfs.ts-85-88 (1)
85-88: ⚠️ Potential issue | 🟡 Minor
Use the configured public protocol for CID redirects.
req.protocol can be http behind TLS termination unless proxy trust is perfectly configured, causing downgrade redirects. Prefer config.SANDBOX_PROTOCOL ?? req.protocol when building the external Location.
Proposed redirect fix

```diff
+  const protocol = config.SANDBOX_PROTOCOL ?? req.protocol;
   res.redirect(
     302,
-    `${req.protocol}://${targetCid}.${rootHost}${pathSuffix}`,
+    `${protocol}://${targetCid}.${rootHost}${pathSuffix}`,
   );
```
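One subtlety of the suggested config.SANDBOX_PROTOCOL ?? req.protocol fallback, shown in isolation: ?? only falls back on null/undefined, so an empty-string configuration value would still win (function name illustrative):

```typescript
// Illustrative: the configured public protocol wins unless it is
// null/undefined. Note that '' passes through (`??` is not `||`),
// so config validation still matters.
function publicProtocol(
  configured: string | undefined,
  reqProtocol: string,
): string {
  return configured ?? reqProtocol;
}
```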
src/ipfs/ipfs-cache.ts-49-52 (1)
49-52: ⚠️ Potential issue | 🟡 Minor
Normalize CIDs before deriving cache keys.
Using the raw input means the same content addressed as CIDv0 and CIDv1 creates separate cache entries, despite the docs promising encoding-independent keys.
Proposed key normalization

```diff
+import { cidToV1Base32 } from '../lib/ipfs-cid.js';
+
   private cacheKey(cidString: string, path?: string): string {
+    const normalizedCid = cidToV1Base32(cidString);
     const raw =
-      path !== undefined && path !== '' ? `${cidString}/${path}` : cidString;
+      path !== undefined && path !== ''
+        ? `${normalizedCid}/${path}`
+        : normalizedCid;
     return crypto.createHash('sha256').update(raw).digest('hex');
   }
```
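Independent of the normalization question, the sha256 keying scheme itself can be exercised on its own with Node's stdlib (the CID input here is assumed to be pre-normalized; cidToV1Base32 is the project's helper and is not reproduced):

```typescript
import { createHash } from 'node:crypto';

// Mirrors the cacheKey shape discussed above: key = sha256 of
// "cid" or "cid/path", hex-encoded. Input CID is assumed pre-normalized.
function cacheKey(normalizedCid: string, path?: string): string {
  const raw =
    path !== undefined && path !== ''
      ? `${normalizedCid}/${path}`
      : normalizedCid;
  return createHash('sha256').update(raw).digest('hex');
}
```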
src/ipfs/kubo-data-source.ts-131-136 (1)
131-136: ⚠️ Potential issue | 🟡 Minor
parseInt without a Number.isFinite / NaN guard.
parseInt('0', 10) when the header is absent returns 0, but a malformed content-length header (e.g., "garbage") yields NaN, which then propagates as ipfs.content_length = NaN into the span and as size = NaN into IpfsService, where the comparison result.size > 0 is false (so the size limit is skipped) and the value is surfaced to the caller. Coerce with a fallback:

```diff
-      const contentLength = parseInt(
-        response.headers['content-length'] ?? '0',
-        10,
-      );
+      const parsed = parseInt(response.headers['content-length'] ?? '', 10);
+      const contentLength = Number.isFinite(parsed) && parsed >= 0 ? parsed : 0;
```
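The NaN propagation is easy to reproduce in isolation; the guard below mirrors the shape of the suggested fix:

```typescript
// parseInt returns NaN for a malformed header value...
const malformed = parseInt('garbage', 10);

// ...so a finite, non-negative coercion is needed before the value is
// used in comparisons like `size > 0`.
function safeContentLength(header: string | undefined): number {
  const parsed = parseInt(header ?? '', 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : 0;
}
```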
src/ipfs/kubo-data-source.ts-57-59 (1)
57-59: ⚠️ Potential issue | 🟡 Minor
Encode path segments when interpolating into the Kubo URL.
path is user-controlled (from the HTTP request) and may contain characters that change URL meaning — ?, #, spaces, %. Raw interpolation into the URL can produce a different upstream path than intended (query strings being cut off, fragments being dropped by the client). Encode each non-empty segment:

```diff
-    const ipfsPath =
-      path !== undefined && path !== '' ? `${cidString}/${path}` : cidString;
-    const url = `${this.kuboUrl}/ipfs/${ipfsPath}`;
+    const encodedPath =
+      path !== undefined && path !== ''
+        ? path.split('/').map(encodeURIComponent).join('/')
+        : undefined;
+    const ipfsPath = encodedPath ? `${cidString}/${encodedPath}` : cidString;
+    const url = `${this.kuboUrl}/ipfs/${ipfsPath}`;
```

(cidString itself is safe — it has already been normalized to base32 v1 by IpfsService.getContent.)
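The per-segment encoding proposed above behaves like this in isolation (helper name illustrative):

```typescript
// Encode each path segment so '?', '#', spaces, and '%' cannot change the
// meaning of the upstream URL, while '/' separators are preserved.
function encodeIpfsPath(path: string): string {
  return path.split('/').map(encodeURIComponent).join('/');
}
```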
🧹 Nitpick comments (10)
test-ipfs.sh (1)
42-43: Unused status=$? (shellcheck SC2034).
status is captured but never used; curl -s already swallows transport errors into an empty body, and pass/fail is driven by grep -q below. Drop it.

```diff
-  body=$(curl -s --max-time 30 "$url" 2>&1)
-  local status=$?
+  body=$(curl -s --max-time 30 "$url" 2>&1)
```
monitoring/grafana/dashboards/examples/ipfs-example.json (1)
14-22: Cache hit-rate panel can divide by zero at idle.
When there is no IPFS traffic in the window (hit rate + miss rate both 0), the expression yields NaN / "No data". Consider guarding so the panel shows 0 (or hides gracefully) at idle, e.g.:
📊 Proposed fix

```diff
-  "targets": [{ "expr": "sum(rate(ipfs_cache_hit_total[5m])) / (sum(rate(ipfs_cache_hit_total[5m])) + sum(rate(ipfs_cache_miss_total[5m])))", "legendFormat": "hit rate" }]
+  "targets": [{ "expr": "sum(rate(ipfs_cache_hit_total[5m])) / clamp_min(sum(rate(ipfs_cache_hit_total[5m])) + sum(rate(ipfs_cache_miss_total[5m])), 1)", "legendFormat": "hit rate" }]
```
src/routes/ipfs.ts (1)
35-70: Add TSDoc for the new exported route factories.
These are new exported integration points; please document mount order expectations and the middleware/path split.
As per coding guidelines, src/**/*.{ts,tsx}: "Always add or improve TSDoc comments on code you touch when changing behavior".
src/ipfs/ipfs-cache.ts (1)
19-24: Add class-level TSDoc for the new cache abstraction.
This is a new exported service used by system wiring; document keying, persistence, and eviction semantics at the class boundary.
As per coding guidelines, src/**/*.{ts,tsx}: "Always add or improve TSDoc comments on code you touch when changing behavior".
src/ipfs/kubo-data-source.ts (3)
155-173: Span may be ended twice on late stream error.
On success you register stream.on('end' | 'error') at L155-159 to end the span. If the returned stream later errors after getContent resolved, the 'error' handler records the exception and ends the span — good. But if an error occurs during response processing after attaching these listeners and control falls into the catch at L166 (it won't in the current code, but the arrangement is brittle), span.end() would run twice. Using a local ended flag or try/finally around the span would make this robust to future edits.
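The suggested ended-flag pattern can be sketched as follows (SpanLike is a stand-in shape, not the OpenTelemetry API):

```typescript
// Stand-in for the subset of the span API this sketch needs.
interface SpanLike {
  end(): void;
  recordException(e: unknown): void;
}

// Returns an idempotent closer: safe to call from 'end', 'error', and a
// catch/finally block without ending the span twice.
function makeSpanCloser(span: SpanLike): (error?: unknown) => void {
  let ended = false;
  return (error?: unknown) => {
    if (ended) return;
    ended = true;
    if (error !== undefined) span.recordException(error);
    span.end();
  };
}
```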
203-236: Minor: consolidate repeated error-subclass boilerplate. Five `Error` subclasses with identical shape. A small factory or a single base class would reduce the boilerplate and guarantee consistent `name` assignment / prototype chain handling under downlevel transpilation:

```ts
class IpfsError extends Error {
  constructor(message: string, name: string) {
    super(message);
    this.name = name;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
```

Not blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/kubo-data-source.ts` around lines 203 - 236, Replace the five nearly identical subclasses (IpfsNotFoundError, IpfsTimeoutError, IpfsUnavailableError, IpfsBlockedError, IpfsSizeLimitError) with a single base class (e.g., IpfsError) that accepts (message, name) and calls Object.setPrototypeOf(this, new.target.prototype) after super(message), then either export thin subclasses that extend IpfsError and pass the specific name or provide a small factory to create them; update usages to construct the specific named error via new IpfsNotFoundError(...) or via the factory so name/prototype are consistently set.
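A sketch of the full consolidation under the prompt's assumptions (error names are the ones listed in the review; module layout and exports in the real file may differ):

```typescript
// Shared base: sets name and repairs the prototype chain so instanceof keeps
// working under downlevel (ES5) transpilation.
class IpfsError extends Error {
  constructor(message: string, name: string) {
    super(message);
    this.name = name;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class IpfsNotFoundError extends IpfsError {
  constructor(message: string) {
    super(message, 'IpfsNotFoundError');
  }
}

class IpfsTimeoutError extends IpfsError {
  constructor(message: string) {
    super(message, 'IpfsTimeoutError');
  }
}

// ...IpfsUnavailableError, IpfsBlockedError, and IpfsSizeLimitError would
// follow the same two-line pattern.
```

Call sites keep constructing the specific class (`new IpfsNotFoundError(...)`) while `name` and the prototype chain are set in one place.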
100-103: Consider `maxRedirects: 0` for local Kubo gateway. For a trusted local service (kubo:8080), redirects are atypical. Setting `maxRedirects: 0` simplifies the configuration and tightens the SSRF surface in case `kuboUrl` is ever misconfigured. The current validateStatus already handles non-2xx responses appropriately, so this change is safe.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/kubo-data-source.ts` around lines 100 - 103, the current HTTP client config in KuboDataSource sets maxRedirects to 5 which is unnecessary for a trusted local kuboUrl and increases SSRF surface; change the axios config (where maxRedirects is set in the KuboDataSource constructor or createHttpClient function) to maxRedirects: 0, keeping the existing validateStatus logic intact so non-2xx responses are still handled by validateStatus.

src/ipfs/ipfs-service.ts (3)
93-96: Path traversal check is both too strict and too loose. `path.includes('..')` rejects legitimate filenames (e.g. `report..v2.pdf`, `..hidden` inside a directory listing) while still allowing variants like percent-encoded `%2E%2E` to reach Kubo unchanged. Prefer a segment-based check after splitting on `/`:

```diff
- if (path !== undefined && (path.includes('..') || path.startsWith('/'))) {
-   throw new IpfsNotFoundError('Invalid IPFS path');
+ if (path !== undefined) {
+   const segments = path.split('/');
+   if (
+     path.startsWith('/') ||
+     segments.some((s) => s === '..' || s === '.' || s === '')
+   ) {
+     throw new IpfsNotFoundError('Invalid IPFS path');
+   }
  }
```

Also consider normalizing percent-encoding before the check.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-service.ts` around lines 93 - 96, Replace the current naive path check with a normalized, segment-based check: first reject absolute paths (path.startsWith('/')) as before, then run a percent-decoding (e.g. decodeURIComponent) on path and split by '/' into segments and reject if any segment is exactly '..' or '.' (but allow names like '..hidden' or 'report..v2.pdf'); when rejecting throw the same IpfsNotFoundError. Update the code that references the variable path and IpfsNotFoundError so it decodes and inspects segments rather than using path.includes('..').
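The decode-then-split check can be sketched as a standalone validator. `assertSafeIpfsPath` is a hypothetical helper for illustration; the real check lives inline in `IpfsService.getContent` and throws `IpfsNotFoundError` rather than a plain `Error`:

```typescript
// Reject absolute paths and any '.'/'..'/empty segment, after percent-decoding
// so %2E%2E cannot smuggle a traversal past the check. Names that merely
// contain dots (report..v2.pdf, ..hidden) are allowed.
function assertSafeIpfsPath(path: string): void {
  let decoded: string;
  try {
    decoded = decodeURIComponent(path);
  } catch {
    // Malformed percent-encoding (e.g. a lone '%') is also rejected.
    throw new Error('Invalid IPFS path');
  }
  if (decoded.startsWith('/')) {
    throw new Error('Invalid IPFS path');
  }
  const segments = decoded.split('/');
  if (segments.some((s) => s === '..' || s === '.' || s === '')) {
    throw new Error('Invalid IPFS path');
  }
}
```

Note the empty-segment rule also rejects `a//b` and trailing slashes, which is usually the desired strictness for cache keys.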
144-158: Tee via `'data'` listener is fragile; prefer `PassThrough` + `pipeline`. Attaching the `'data'` listener inside `streamToCache` synchronously flips `result.stream` into flowing mode before the caller attaches `pipe(res)`. It is safe only because consumers pipe in the same tick; an `await` on the caller side between `getContent` returning and piping would drop chunks silently. Using `stream.pipeline(source, new PassThrough())` and handing the `PassThrough` to the caller (while also piping the source through a second `PassThrough` into the cache file) preserves paused mode until the caller is ready and gets you proper error propagation and backpressure for free.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-service.ts` around lines 144 - 158, The current implementation in streamToCache attaches a 'data' listener to result.stream which forces it into flowing mode and can drop chunks; instead, create two PassThrough streams and use stream.pipeline to pipe the original result.stream into both PassThroughs (one returned to the caller and one piped into the cache write stream) so the source stays in paused mode until the consumer pipes; ensure you replace the 'data' listener logic in streamToCache with pipeline(result.stream, teePassThrough1) and pipeline(result.stream, teePassThrough2 -> cacheWriteStream) (or use a single pipeline with a tee operator) and propagate errors to the span by listening to pipeline callbacks so span.recordException/ span.end() run on failures/completion for result.stream.
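A minimal sketch of the two-branch tee, under the assumption that the caller pipes the `consumer` branch to the HTTP response and the `cache` branch into the cache write stream (`teeStream` is an illustrative name, not the PR's code). Because each `PassThrough` buffers until it is read, an `await` between obtaining the streams and piping no longer drops chunks:

```typescript
import { PassThrough, Readable } from 'node:stream';

// Tee a source into a consumer branch (handed to the caller) and a cache
// branch. pipe() propagates backpressure from the slower branch back to the
// source, and neither branch emits data until someone reads it.
function teeStream(source: Readable): {
  consumer: PassThrough;
  cache: PassThrough;
} {
  const consumer = new PassThrough();
  const cache = new PassThrough();
  source.pipe(consumer);
  source.pipe(cache);
  // pipe() does not forward errors; propagate them to both branches.
  source.on('error', (err) => {
    consumer.destroy(err);
    cache.destroy(err);
  });
  return { consumer, cache };
}
```

One design note: with a shared source, the overall pace is set by the slower branch, so a slow cache disk will also slow the client; that trade-off is discussed further in the backpressure comment below on `src/routes/ipfs.ts`.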
153-158: Duplicate span termination between service and data source. `KuboDataSource.getContent` already registers `stream.on('end' | 'error')` to end its own child span (kubo-data-source.ts L155-159). `IpfsService.getContent` here does the same on the `IpfsService.getContent` span — fine — but note that if `streamToCache` later destroys the stream on a cache-write error, the `'error'` path here will record that exception on the service span even though it is a cache-side failure unrelated to the user-visible response. Consider gating the `recordException` on whether the error surfaced to the caller.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-service.ts` around lines 153 - 158, The service span in IpfsService.getContent is recording exceptions for stream errors that may originate from cache writes; to avoid logging cache-side failures as service errors, add a small error gating flag in IpfsService.getContent (e.g., let cacheWriteFailed = false) and set it to true if streamToCache(...) rejects or reports a cache-specific error; in the existing result.stream.on('error', ...) handler only call span.recordException(err) when cacheWriteFailed is false (i.e., the error actually surfaced to the caller). Adjust the callsite that invokes streamToCache to catch/reject and flip cacheWriteFailed so KuboDataSource.getContent and IpfsService.getContent no longer duplicate-record cache-originated exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/config.ts`:
- Around line 2159-2210: Several IPFS numeric constants
(IPFS_KUBO_REQUEST_TIMEOUT_MS, IPFS_STREAM_STALL_TIMEOUT_MS,
IPFS_CACHE_MAX_SIZE_BYTES, IPFS_CACHE_CLEANUP_THRESHOLD_SECONDS,
IPFS_RATE_LIMITER_IP_TOKENS_PER_BUCKET, IPFS_RATE_LIMITER_IP_REFILL_PER_SEC,
IPFS_RATE_LIMITER_RESOURCE_TOKENS_PER_BUCKET,
IPFS_RATE_LIMITER_RESOURCE_REFILL_PER_SEC, IPFS_MAX_RESPONSE_SIZE_BYTES)
currently use the unary + on env.varOrDefault which can produce NaN on malformed
input; replace those uses with the typed helper positiveIntOrDefault (the same
pattern used elsewhere in src/config.ts) to validate and throw on invalid values
at startup so timeouts, rate limits and sizes are enforced reliably. Ensure you
pass the same environment variable names and default string values into
positiveIntOrDefault for each constant and keep non-numeric keys (e.g.,
IPFS_CACHE_PATH) unchanged.
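The shape of such a helper, sketched under the assumption that it mirrors the `positiveIntOrDefault` pattern the prompt references — the actual signature in `src/config.ts` may differ, and the default value shown is illustrative:

```typescript
// Parse an env value as a strictly positive integer, throwing at startup on
// malformed input instead of silently producing NaN the way unary + does.
function positiveIntOrDefault(
  value: string | undefined,
  defaultValue: string,
): number {
  const raw = (value ?? defaultValue).trim();
  const parsed = Number.parseInt(raw, 10);
  // Round-tripping through String rejects trailing junk like '30s' or '1.5'.
  if (!Number.isInteger(parsed) || parsed <= 0 || String(parsed) !== raw) {
    throw new Error(`Expected a positive integer, got '${raw}'`);
  }
  return parsed;
}

// Hypothetical wiring (env var name from the PR, default illustrative):
// export const IPFS_KUBO_REQUEST_TIMEOUT_MS = positiveIntOrDefault(
//   process.env.IPFS_KUBO_REQUEST_TIMEOUT_MS,
//   '30000',
// );
```

The benefit over `+env.varOrDefault(...)` is that a typo like `IPFS_KUBO_REQUEST_TIMEOUT_MS=30s` fails loudly at startup rather than disabling the timeout via `NaN`.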
In `@src/ipfs/ipfs-cache.ts`:
- Around line 149-179: The put() and putFromFile() flows must clean up temp
files on pipeline failure and ensure cache finalization is atomic: on failure of
pipeline() delete the tempPath (use tempDir()/createTempPath() to locate it);
before calling index.set(key, ...) ensure any existing index entry is removed
(index.delete(key)) and its data/meta files unlinked to avoid eviction disposing
wrong files; perform metadata write to a temporary meta file (metaPath(key) +
'.tmp') and only after both meta temp write and index.set succeed atomically
rename temp data into dataPath(key) and rename the temp meta to metaPath(key);
if any step after rename fails, unlink the new dataPath and metaPath to avoid
orphans. Use createTempPath(), dataDir(), dataPath(), metaPath(), pipeline(),
put(), and putFromFile() to locate and implement these fixes.
In `@src/ipfs/ipfs-service.ts`:
- Around line 194-266: The issue is that stream.on('end') takes the !writeStream
branch and calls cleanup() without setting failed, allowing the pending
mkdir().then() to later create a dangling writeStream; fix by setting failed =
true before calling cleanup() in the stream.on('end') handler when writeStream
is not present (and ensure you return immediately after cleanup). Reference
symbols: stream.on('end'), failed, writeStream, cleanup(),
fs.promises.mkdir(...).then().
- Around line 230-239: The current data handler in the stream.on('data')
callback ignores write-side backpressure by not checking the boolean return of
writeStream.write and therefore allows unbounded buffering; update the logic in
the stream.on('data') handler (and related pendingChunks logic) to respect
backpressure: check the return value of writeStream.write(chunk) and if it
returns false, call stream.pause(), attach a writeStream.once('drain') to resume
the source, and flush any pendingChunks after resume; alternatively replace the
manual tee with a pipeline() or a PassThrough-driven pipeline to let Node handle
backpressure; if you prefer a bounded-memory fallback, enforce a high-water mark
on pendingChunks and set failed = true (or switch to no-cache) when that limit
is exceeded so you avoid unbounded memory growth.
- Around line 132-142: The size-limit check in ipfs-service.ts currently skips
enforcement when result.size is 0 (no Content-Length), letting large responses
bypass the cap; update the fetch/streaming path to enforce maxResponseSizeBytes
during streaming by wiring the existing bytesWritten counter in streamToCache
(or equivalent stream consumer) to abort the stream and throw IpfsSizeLimitError
as soon as bytesWritten exceeds this.maxResponseSizeBytes, and remove/adjust the
guard that only applies when result.size > 0 so the mid-stream enforcement
always runs; reference result.size, this.maxResponseSizeBytes,
IpfsSizeLimitError, and streamToCache/bytesWritten (and KuboDataSource behavior)
when making the change.
In `@src/routes/ipfs.ts`:
- Around line 172-187: The code currently treats unknown result.size as 1024
bytes when calling checkPaymentAndRateLimits, which underestimates large chunked
responses and can bypass byte-based limits; change the initial placeholder to a
conservative upper bound (e.g., a configured MAX_UNKNOWN_SIZE) when computing
contentSize for checkPaymentAndRateLimits (the call that passes id:
cidToV1Base32(cidString) and contentType: result.contentType), and implement
tracking of actual streamed bytes during the IPFS response stream so you can
call the final adjustment with the real byte count instead of the 1024 fallback;
update usages around contentSize, result.size, and the final adjustment block
(also where extractAllClientIPs(req) and rateLimiter/paymentProcessor are
passed) to use the conservative initial bound and then replace it with the
tracked streamedBytes for final billing/rate decisions.
---
Minor comments:
In `@docs/envs.md`:
- Line 376: The docs list the wrong environment variable name — update all
documentation entries that reference IPFS_CACHE_CLEANUP_THRESHOLD to the actual
exported name IPFS_CACHE_CLEANUP_THRESHOLD_SECONDS, keeping the same default
value (3600) and description ("Age in seconds before cached files become
eviction candidates"), and search other files under docs/ for any remaining
occurrences of IPFS_CACHE_CLEANUP_THRESHOLD to replace them so docs match the
implementation.
In `@docs/INDEX.md`:
- Around line 26-30: Add a new "IPFS Integration" section to docs/glossary.md
and create three glossary entries: "IPFS" — define as InterPlanetary File
System; "CID" — define as Content Identifier and note it is the IPFS
hash/address; and "Kubo" — define as the IPFS node implementation used as the
gateway sidecar; ensure each entry follows existing glossary formatting (term
header and short definition) and link or reference the ipfs-integration.md page
if your glossary style uses cross-links.
In `@docs/ipfs-integration.md`:
- Around line 504-511: The Prometheus label documentation is incorrect: update
the `ipfs_requests_total` row to list labels `route_type`,`status` instead of
`method`, and update the `ipfs_request_duration_seconds` row to list labels
`route_type`,`cache_status` instead of `source`; ensure the table entries for
`ipfs_cache_hit_total`, `ipfs_cache_miss_total`, `ipfs_content_size_bytes`, and
`ipfs_blocked_total` remain unchanged and remove the incorrect `method`/`source`
mentions so the documented labels match the actual metric definitions.
- Line 58: Update the three fenced code blocks flagged by markdownlint that
currently lack language identifiers: add language="text" for diagram/span-tree
blocks and add an appropriate language (e.g., json, js, sh, or html) for any
code examples so the fences are like ```text or ```json; locate the exact fences
by searching for triple-backtick blocks in the IPFS integration doc that are
used for diagrams/span trees and example snippets and add the matching language
tag to each opening fence.
- Around line 63-64: Update the docs to match the implemented host pattern by
replacing instances of the illustrative host "{CID}.ipfs.gateway.io" and
references to a distinguishing ".ipfs." label with the actual pattern
"{CID}.{root_host}" (and adjust explanatory text accordingly); ensure examples,
DNS and TLS guidance, and any sample configs that mention the ".ipfs." sublabel
are changed to use "{root_host}"-based hostnames and clarify how to configure
DNS/TLS for wildcard or per-CID certificates; apply the same change to the other
occurrences called out in the comment so all examples and instructions
consistently use "{CID}.{root_host}".
- Around line 113-120: Remove references to a file-based IPFS blocklist and any
mention of IPFS_BLOCKLIST_PATH in the docs and replace them with a note that
moderation uses the PUT /ar-io/admin/block-data flow; update the diagram text
near ipfs-service/ipfs-blocklist and any prose to reflect that blocklist is
managed via that admin API (not a local file). Also change the rate-limit
defaults in the documentation to match src/config.ts: per-IP 100000 requests per
20 seconds and per-resource 1000000 requests per 100 seconds. Apply these same
edits where the stale blocklist/config references and rate-limit numbers appear
elsewhere in the document (the other two affected sections noted in the
comment).
In `@docs/openapi.yaml`:
- Around line 2812-2818: Update the request schema for the block endpoint so the
"id" field description reflects that it accepts transaction IDs, data-item IDs,
IPFS CIDv1 base32 strings, or sha-256 content hashes (not just "TX ID for a
transaction"); update the "id" schema description text in docs/openapi.yaml (the
schema for the block request body / property named "id") and add an example
value for an IPFS CIDv1 base32 string (e.g.
bafkreigbk3hjz6oyiywqf7eknthwc2osvt5xi6b6igwljn2qrxkthqgrp4) so generated docs
show a CID example alongside existing TX/hash examples.
In `@src/ipfs/ipfs-cache.ts`:
- Around line 49-52: The cacheKey function currently hashes the raw cidString
(plus path) which treats semantically identical CIDs in different encodings as
distinct; update cacheKey to normalize the CID string first by parsing the input
CID (use the project's CID library e.g., CID.parse or new CID(...)), convert it
to a canonical form such as CID.toV1().toString('base32'), then append the path
(when path !== undefined && path !== '') and finally hash that normalized
string; ensure errors parsing invalid CIDs are handled or propagated
consistently from cacheKey.
In `@src/ipfs/kubo-data-source.test.ts`:
- Around line 33-65: These two tests currently abort the signal before calling
kuboDataSource.getContent so signal.throwIfAborted prevents URL construction;
either remove the tests or change them to assert URL construction by not
aborting the signal and instead stubbing/spy-ing the HTTP call (e.g.,
fetch/axios used inside kuboDataSource.getContent) to throw/return, or call an
exposed helper that builds the Kubo URL (e.g., buildKuboUrl or the internal
URL-construction logic) and assert its return; ensure you reference
kuboDataSource.getContent and avoid pre-aborting the signal so the URL is
actually created.
In `@src/ipfs/kubo-data-source.ts`:
- Around line 131-136: The parsed contentLength can be NaN for malformed
headers; update the parsing logic around response.headers to coerce to a safe
finite number (e.g., parseInt or Number.parseInt result) and fallback to 0 when
the parsed value is NaN or not finite so ipfs.content_length and downstream
result.size never become NaN. Locate the variables/content around contentLength
and response.headers['content-length'] in kubo-data-source.ts and replace the
direct parseInt usage with a guarded parse + Number.isFinite/Number.isNaN check
and a default of 0.
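A guarded parse along those lines (the helper name is illustrative; the real code may inline the check where the header is read):

```typescript
// Coerce a possibly-missing or malformed Content-Length header to a safe
// non-negative finite number, defaulting to 0 (meaning "unknown size").
function safeContentLength(header: string | string[] | undefined): number {
  const raw = Array.isArray(header) ? header[0] : header;
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : 0;
}
```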
- Around line 57-59: Raw interpolation of the user-controlled path into the Kubo
URL can break or change the upstream request; update the code that builds
ipfsPath (and thus the URL using this.kuboUrl) to URL-encode each non-empty path
segment instead of inserting path raw. Specifically, when composing ipfsPath
from cidString and path (used in ipfsPath and
`${this.kuboUrl}/ipfs/${ipfsPath}`), split path on '/', run encodeURIComponent
on each segment, rejoin with '/', and then join with cidString (which is already
safe per IpfsService.getContent) so query chars/fragments/spaces are preserved
as path segments.
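A sketch of the segment-wise encoding — `buildIpfsPath` is a hypothetical helper; the real code composes `ipfsPath` inline when building `${this.kuboUrl}/ipfs/${ipfsPath}`:

```typescript
// Encode each path segment so '?', '#', and spaces survive as path
// characters in the upstream Kubo URL instead of being reinterpreted as
// query strings or fragments.
function buildIpfsPath(cidString: string, path?: string): string {
  if (path === undefined || path === '') {
    return cidString;
  }
  const encoded = path
    .split('/')
    .map((segment) => encodeURIComponent(segment))
    .join('/');
  return `${cidString}/${encoded}`;
}
```

Splitting before encoding preserves the `/` separators themselves, while everything inside a segment is escaped.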
In `@src/middleware/ipfs.ts`:
- Around line 85-88: The redirect currently builds the external Location using
req.protocol which may be incorrect behind TLS termination; update the redirect
in the middleware where res.redirect is called (the line constructing
`${req.protocol}://${targetCid}.${rootHost}${pathSuffix}`) to use the configured
protocol instead: replace req.protocol with config.SANDBOX_PROTOCOL ??
req.protocol so the Location uses the intended public protocol while still
falling back to req.protocol; keep the rest of the URL components (targetCid,
rootHost, pathSuffix) and the 302 status unchanged.
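The fallback in isolation, as a pure helper for illustration (the real middleware builds the Location string inline):

```typescript
// Prefer the operator-configured public protocol (e.g. SANDBOX_PROTOCOL)
// over req.protocol, which reports 'http' behind TLS termination unless
// trust proxy / X-Forwarded-Proto are configured.
function redirectLocation(
  configuredProtocol: string | undefined,
  reqProtocol: string,
  targetCid: string,
  rootHost: string,
  pathSuffix: string,
): string {
  const protocol = configuredProtocol ?? reqProtocol;
  return `${protocol}://${targetCid}.${rootHost}${pathSuffix}`;
}
```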
In `@src/routes/ar-io-info-builder.ts`:
- Line 141: The OpenAPI schema for the /ar-io/info response needs to include the
new optional ipfs property so generated clients see ipfs: { enabled: true };
update the response model in the OpenAPI spec (the schema that maps to the
TypeScript builder which now includes ipfs?: IpfsInfo in
src/routes/ar-io-info-builder.ts) to add an ipfs property (object with
enabled:boolean) and ensure any components/schemas referenced are added or
updated accordingly; also update the docs/ pages that describe the /ar-io/info
contract (and the related places called out around lines 325-327) in the same PR
so documentation and generated clients remain in sync.
In `@src/routes/ipfs.ts`:
- Around line 216-221: The IPFS route currently increments only
metrics.ipfsRequestsTotal and content size but never updates the dedicated
counters; update the logic in the IPFS handler (the block using
metrics.ipfsRequestsTotal and cacheStatus) to increment
metrics.ipfsCacheHitTotal when result.cached is true, metrics.ipfsCacheMissTotal
when false, and metrics.ipfsBlockedTotal when result.blocked (or result.blocked
=== true) is set; ensure you make the same updates in the second analogous block
around the code that mirrors lines 259-268 so both request paths consistently
increment ipfsCacheHitTotal/ipfsCacheMissTotal and ipfsBlockedTotal in addition
to ipfsRequestsTotal and ipfsContentSizeHistogram.
- Around line 118-121: The redirect uses req.protocol which can be wrong behind
TLS termination; update the redirect construction in the CIDv0 handling to use
the configured public protocol by replacing req.protocol with
(config.SANDBOX_PROTOCOL ?? req.protocol) when building the URL for v1Base32,
rootHost and pathSuffix so the redirect uses the sandbox/public protocol
consistently.
In `@test-ipfs.sh`:
- Around line 55-71: The test_header function incorrectly treats an empty
expected_value as a match because echo "$actual" | grep -qi "" always succeeds;
update test_header to first check whether expected_value is empty and in that
case assert that actual is non-empty (i.e., the header exists), otherwise
perform the current case-insensitive substring match; refer to the test_header
function and use the existing local variables actual and expected_value to
implement the conditional branch so presence-only checks fail when the header is
absent.
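A sketch of the conditional branch (function and variable names follow the review; the real `test_header` also prints pass/fail output and fetches `actual` with curl):

```shell
check_header_value() {
  local actual="$1" expected_value="$2"
  if [ -z "$expected_value" ]; then
    # Presence-only check: fail when the header is absent/empty instead of
    # letting `grep -qi ""` match anything.
    [ -n "$actual" ]
  else
    echo "$actual" | grep -qi "$expected_value"
  fi
}
```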
- Around line 93-101: Replace the hardcoded container name used in the two
docker exec calls (currently "ar-io-node-kubo-1") by resolving the running
container ID via its Compose service label before assigning FILE_CID and
DIR_CID: run a docker ps lookup filtering on the Compose service label (e.g.
"label=com.docker.compose.service=kubo") and fall back to the Compose
project+service label if needed, store that container ID in a variable (use that
variable in the two docker exec invocations that populate FILE_CID and DIR_CID)
so the script works regardless of COMPOSE_PROJECT_NAME or working directory.
---
Nitpick comments:
In `@monitoring/grafana/dashboards/examples/ipfs-example.json`:
- Around line 14-22: The "IPFS Cache Hit Rate" stat panel expression can produce
NaN when both hit and miss rates are zero; update the Prometheus expression used
in the panel's targets (the existing expr string) to guard the denominator by
replacing the raw division with a safe variant that returns 0 at idle (for
example use a clamp_min on the denominator or append an "or vector(0)" fallback
so the query yields 0 instead of NaN when sum(rate(ipfs_cache_hit_total[5m])) +
sum(rate(ipfs_cache_miss_total[5m])) == 0). Ensure you edit the expr value in
the target for the "IPFS Cache Hit Rate" panel to the guarded expression.
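One guarded form of the expression, sketched (metric names from the panel above): the `> 0` comparison empties the denominator vector at idle, so the division returns no data and `or vector(0)` supplies 0 instead of NaN.

```text
(
  sum(rate(ipfs_cache_hit_total[5m]))
    /
  (sum(rate(ipfs_cache_hit_total[5m])) + sum(rate(ipfs_cache_miss_total[5m])) > 0)
) or vector(0)
```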
In `@src/ipfs/ipfs-cache.ts`:
- Around line 19-24: Add a class-level TSDoc block above the exported
IpfsFsCache class describing its purpose, keying, persistence model, and
eviction semantics: explain that IpfsFsCache stores CacheEntry objects on-disk
under baseDir with keys derived from the IPFS CID or provided keying scheme,
that it exposes an in-memory LRU index (index: LRUCache<string, CacheEntry>) to
track recency, and that eviction follows the LRU policy causing entries to be
removed from the index and optionally from disk; also document constructor
parameters and any persistence/initialization behavior so callers wiring this
service understand lifecycle and capacity limits.
In `@src/ipfs/ipfs-service.ts`:
- Around line 93-96: Replace the current naive path check with a normalized,
segment-based check: first reject absolute paths (path.startsWith('/')) as
before, then run a percent-decoding (e.g. decodeURIComponent) on path and split
by '/' into segments and reject if any segment is exactly '..' or '.' (but allow
names like '..hidden' or 'report..v2.pdf'); when rejecting throw the same
IpfsNotFoundError. Update the code that references the variable path and
IpfsNotFoundError so it decodes and inspects segments rather than using
path.includes('..').
- Around line 144-158: The current implementation in streamToCache attaches a
'data' listener to result.stream which forces it into flowing mode and can drop
chunks; instead, create two PassThrough streams and use stream.pipeline to pipe
the original result.stream into both PassThroughs (one returned to the caller
and one piped into the cache write stream) so the source stays in paused mode
until the consumer pipes; ensure you replace the 'data' listener logic in
streamToCache with pipeline(result.stream, teePassThrough1) and
pipeline(result.stream, teePassThrough2 -> cacheWriteStream) (or use a single
pipeline with a tee operator) and propagate errors to the span by listening to
pipeline callbacks so span.recordException/ span.end() run on
failures/completion for result.stream.
- Around line 153-158: The service span in IpfsService.getContent is recording
exceptions for stream errors that may originate from cache writes; to avoid
logging cache-side failures as service errors, add a small error gating flag in
IpfsService.getContent (e.g., let cacheWriteFailed = false) and set it to true
if streamToCache(...) rejects or reports a cache-specific error; in the existing
result.stream.on('error', ...) handler only call span.recordException(err) when
cacheWriteFailed is false (i.e., the error actually surfaced to the caller).
Adjust the callsite that invokes streamToCache to catch/reject and flip
cacheWriteFailed so KuboDataSource.getContent and IpfsService.getContent no
longer duplicate-record cache-originated exceptions.
In `@src/ipfs/kubo-data-source.ts`:
- Around line 155-173: The span may be ended twice by the stream 'end'/'error'
handlers and the catch block in getContent; add a local boolean flag (e.g.,
ended = false) and a small helper (e.g., endSpan(error?)) that checks the flag
before calling span.recordException(...) and span.end(), then use that helper
from the stream.on('end'), stream.on('error'), and the catch/finally cleanup
(also before clearTimeout and removing onClientAbort) so span.recordException/
span.end are only invoked once; reference span, stream.on('end'|'error'), the
catch block around getContent, connectionTimer, and onClientAbort when making
the change.
- Around line 203-236: Replace the five nearly identical subclasses
(IpfsNotFoundError, IpfsTimeoutError, IpfsUnavailableError, IpfsBlockedError,
IpfsSizeLimitError) with a single base class (e.g., IpfsError) that accepts
(message, name) and calls Object.setPrototypeOf(this, new.target.prototype)
after super(message), then either export thin subclasses that extend IpfsError
and pass the specific name or provide a small factory to create them; update
usages to construct the specific named error via new IpfsNotFoundError(...) or
via the factory so name/prototype are consistently set.
- Around line 100-103: The current HTTP client config in KuboDataSource sets
maxRedirects to 5 which is unnecessary for a trusted local kuboUrl and increases
SSRF surface; change the axios config (where maxRedirects is set in the
KuboDataSource constructor or createHttpClient function) to maxRedirects: 0,
keeping the existing validateStatus logic intact so non-2xx responses are still
handled by validateStatus.
In `@src/routes/ipfs.ts`:
- Around line 35-70: Add TSDoc comments to the exported route factory functions
createIpfsRouter and createIpfsHandler explaining their purpose, expected mount
order, and how middleware vs path handling should be wired: document that
createIpfsRouter returns an Express Router that mounts handlers for '/ipfs/:cid'
and '/ipfs/:cid/*' and should be mounted before any catch-all routes, and that
createIpfsHandler returns a single Handler meant to be used directly or attached
to route paths; include notes about interaction with rateLimiter and
paymentProcessor parameters and any ordering requirements for middleware (e.g.,
auth/rate limiting should run before the handler). Ensure the TSDoc appears
directly above each exported function signature with param and returns
descriptions.
In `@test-ipfs.sh`:
- Around line 42-43: Remove the unused capture of curl's exit status: the
assignment to local status after running curl (the variable named status) is
never referenced and triggers shellcheck SC2034; drop the line "local status=$?"
and leave the body assignment via body=$(curl -s --max-time 30 "$url" 2>&1)
(grep -q is used later to determine success).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 27655ba0-996a-4afc-8f9d-1c24d8e0007f
📒 Files selected for processing (26)
.dockerignore, CLAUDE.md, docker-compose.yaml, docs/INDEX.md, docs/envs.md, docs/ipfs-integration.md, docs/openapi.yaml, monitoring/grafana/dashboards/examples/ipfs-example.json, package.json, src/app.ts, src/config.ts, src/ipfs/ipfs-cache.ts, src/ipfs/ipfs-cid.test.ts, src/ipfs/ipfs-rate-limiter.ts, src/ipfs/ipfs-service.ts, src/ipfs/kubo-data-source.test.ts, src/ipfs/kubo-data-source.ts, src/lib/httpsig.ts, src/lib/ipfs-cid.ts, src/metrics.ts, src/middleware/ipfs.ts, src/routes/ar-io-info-builder.ts, src/routes/ar-io.ts, src/routes/ipfs.ts, src/system.ts, test-ipfs.sh
```ts
try {
  await fs.promises.mkdir(this.tempDir(), { recursive: true });
  const tempPath = this.createTempPath();
  const writeStream = fs.createWriteStream(tempPath);

  await pipeline(stream, writeStream);

  // Move to final location
  const dataDir = this.dataDir(key);
  await fs.promises.mkdir(dataDir, { recursive: true });
  await fs.promises.rename(tempPath, this.dataPath(key));

  // Write metadata
  const meta: CacheEntry = { size, contentType };
  await fs.promises.writeFile(
    this.metaPath(key),
    JSON.stringify(meta),
    'utf-8',
  );

  // Update index
  this.index.set(key, meta);

  this.log.debug('Cached IPFS content', { cidString, path, key, size });
} catch (error: any) {
  this.log.error('Failed to cache IPFS content', {
    cidString,
    path,
    message: error.message,
  });
}
```
🧩 Analysis chain

🏁 Script executed:

```sh
#!/bin/bash
# Inspect other LRU-backed filesystem caches for overwrite/dispose patterns to mirror.
rg -n -C4 "new LRUCache|dispose:|putFromFile|rename\(" --type=ts
```

Repository: ar-io/ar-io-node
Length of output: 22088

🏁 Script executed:

```sh
cat -n src/ipfs/ipfs-cache.ts | head -230
```

Repository: ar-io/ar-io-node
Length of output: 7773
Add cleanup for temp files and ensure atomic cache finalization.
Two related failures exist:

- In `put()` (lines 149–179): If `pipeline()` fails at line 154, the temp file is never cleaned up.
- In both `put()` and `putFromFile()`: After the `rename()` succeeds, if metadata write or index update fails, the data file is left orphaned on disk. Additionally, if a key already exists in the index, calling `index.set()` on it without first deleting the old entry can cause the dispose callback to unlink the wrong files if eviction occurs.
Prevent temp files from accumulating, ensure metadata and index updates complete before marking a cache entry live, and clean stale disk files when overwriting. The suggested changes track temp and final paths to enable selective cleanup on failure.
Proposed safer finalization flow:

```diff
+ let tempPath: string | undefined;
+ let finalPath: string | undefined;
  try {
    await fs.promises.mkdir(this.tempDir(), { recursive: true });
-   const tempPath = this.createTempPath();
+   tempPath = this.createTempPath();
    const writeStream = fs.createWriteStream(tempPath);
    await pipeline(stream, writeStream);
-   // Move to final location
    const dataDir = this.dataDir(key);
+   finalPath = this.dataPath(key);
    await fs.promises.mkdir(dataDir, { recursive: true });
-   await fs.promises.rename(tempPath, this.dataPath(key));
+   this.index.delete(key);
+   await fs.promises.unlink(finalPath).catch(() => {});
+   await fs.promises.unlink(this.metaPath(key)).catch(() => {});
+   await fs.promises.rename(tempPath, finalPath);
+   tempPath = undefined;
    ...
  } catch (error: any) {
+   if (tempPath !== undefined) {
+     fs.promises.unlink(tempPath).catch(() => {});
+   }
+   if (finalPath !== undefined) {
+     fs.promises.unlink(finalPath).catch(() => {});
+     fs.promises.unlink(this.metaPath(key)).catch(() => {});
+   }
    this.log.error('Failed to cache IPFS content', {
```

Also applies to: putFromFile() (lines 195–223)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ipfs/ipfs-cache.ts` around lines 149 - 179, The put() and putFromFile()
flows must clean up temp files on pipeline failure and ensure cache finalization
is atomic: on failure of pipeline() delete the tempPath (use
tempDir()/createTempPath() to locate it); before calling index.set(key, ...)
ensure any existing index entry is removed (index.delete(key)) and its data/meta
files unlinked to avoid eviction disposing wrong files; perform metadata write
to a temporary meta file (metaPath(key) + '.tmp') and only after both meta temp
write and index.set succeed atomically rename temp data into dataPath(key) and
rename the temp meta to metaPath(key); if any step after rename fails, unlink
the new dataPath and metaPath to avoid orphans. Use createTempPath(), dataDir(),
dataPath(), metaPath(), pipeline(), put(), and putFromFile() to locate and
implement these fixes.
stream.on('data', (chunk: Buffer) => {
  if (failed) return;
  bytesWritten += chunk.length;
  if (writeStream) {
    writeStream.write(chunk);
  } else {
    // Buffer until writeStream is ready (typically only first 1-2 chunks)
    pendingChunks.push(chunk);
  }
});
Missing write-side backpressure contradicts the "no memory buffering" guarantee.
writeStream.write(chunk) ignores its boolean return; if the disk is slower than the incoming network/Kubo stream (or than the HTTP client consuming the tee), chunks accumulate in the WriteStream internal buffer unboundedly for large files. The PR description explicitly claims streaming-to-disk without memory buffering, so this should pause the source on backpressure and resume on 'drain' — or use pipeline()/a PassThrough tee driven by pipeline.
Note: pausing the source for disk backpressure will also pause delivery to the HTTP client. If that trade-off is undesirable, a bounded high-water mark plus degrade-to-no-cache (set failed = true when the buffer grows past a limit) is a reasonable middle ground.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ipfs/ipfs-service.ts` around lines 230 - 239, The current data handler in
the stream.on('data') callback ignores write-side backpressure by not checking
the boolean return of writeStream.write and therefore allows unbounded
buffering; update the logic in the stream.on('data') handler (and related
pendingChunks logic) to respect backpressure: check the return value of
writeStream.write(chunk) and if it returns false, call stream.pause(), attach a
writeStream.once('drain') to resume the source, and flush any pendingChunks
after resume; alternatively replace the manual tee with a pipeline() or a
PassThrough-driven pipeline to let Node handle backpressure; if you prefer a
bounded-memory fallback, enforce a high-water mark on pendingChunks and set
failed = true (or switch to no-cache) when that limit is exceeded so you avoid
unbounded memory growth.
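The pause/drain handling the comment recommends can be sketched as a standalone tee. The names here (teeWithBackpressure, source, sink) are illustrative; the real code would integrate this with its failed/cleanup() state rather than a bare promise.

```typescript
import { Readable, Writable } from 'node:stream';

// Minimal sketch of a manual tee that honors write-side backpressure.
// `source` stands in for the Kubo response stream and `sink` for the
// cache WriteStream; neither name comes from the actual codebase.
export function teeWithBackpressure(
  source: Readable,
  sink: Writable,
): Promise<number> {
  return new Promise((resolve, reject) => {
    let bytes = 0;
    source.on('data', (chunk: Buffer) => {
      bytes += chunk.length;
      // write() returns false when the sink's internal buffer is full;
      // pause the source and resume once the sink drains, so the
      // WriteStream buffer stays bounded for arbitrarily large files.
      if (!sink.write(chunk)) {
        source.pause();
        sink.once('drain', () => source.resume());
      }
    });
    source.on('end', () => sink.end(() => resolve(bytes)));
    source.on('error', reject);
    sink.on('error', reject);
  });
}
```

Note the trade-off the comment raises still applies: pausing the source also pauses delivery to any other consumer of the same stream.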
// Check payment and rate limits (x402 + rate limiting in one call).
// Content size is needed for token calculation and payment pricing.
const contentSize = result.size > 0 ? result.size : 1024; // min 1KB for pricing
const limitCheck = await checkPaymentAndRateLimits({
  req,
  res,
  id: cidToV1Base32(cidString),
  contentSize,
  contentType: result.contentType,
  requestAttributes: {
    hops: 0,
    clientIps: extractAllClientIPs(req).clientIps,
  },
  rateLimiter,
  paymentProcessor,
});
Do not charge/rate-limit unknown-size responses as only 1KB.
When result.size is unknown/zero, the initial check and final adjustment both use 1024, so large chunked IPFS responses can bypass byte-based payment/rate limiting. Use a conservative bound for the initial check and track actual streamed bytes for final adjustment.
Also applies to: 234-239
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routes/ipfs.ts` around lines 172 - 187, The code currently treats unknown
result.size as 1024 bytes when calling checkPaymentAndRateLimits, which
underestimates large chunked responses and can bypass byte-based limits; change
the initial placeholder to a conservative upper bound (e.g., a configured
MAX_UNKNOWN_SIZE) when computing contentSize for checkPaymentAndRateLimits (the
call that passes id: cidToV1Base32(cidString) and contentType:
result.contentType), and implement tracking of actual streamed bytes during the
IPFS response stream so you can call the final adjustment with the real byte
count instead of the 1024 fallback; update usages around contentSize,
result.size, and the final adjustment block (also where extractAllClientIPs(req)
and rateLimiter/paymentProcessor are passed) to use the conservative initial
bound and then replace it with the tracked streamedBytes for final billing/rate
decisions.
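One way to implement the suggested byte tracking is a PassThrough counter between the Kubo body and the HTTP response. In this sketch, settle is a stand-in for adjustRateLimitTokens/final payment adjustment and is not the gateway's actual API.

```typescript
import { PassThrough, Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Sketch: count the bytes actually sent to the client, then settle
// billing/rate-limit tokens with the real total instead of the initial
// conservative estimate used when Content-Length is unknown.
export async function streamAndSettle(
  body: Readable,
  dest: NodeJS.WritableStream,
  settle: (actualBytes: number) => void,
): Promise<number> {
  let streamedBytes = 0;
  const counter = new PassThrough();
  counter.on('data', (chunk: Buffer) => {
    streamedBytes += chunk.length;
  });
  await pipeline(body, counter, dest);
  settle(streamedBytes); // final adjustment uses real bytes, not an estimate
  return streamedBytes;
}
```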
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
##           develop     #682      +/-   ##
===========================================
- Coverage    77.14%   77.13%   -0.02%
===========================================
  Files          116      118       +2
  Lines        39692    40124     +432
  Branches      2933     2947      +14
===========================================
+ Hits         30621    30950     +329
- Misses        9034     9137     +103
  Partials        37       37

☔ View full report in Codecov by Sentry.
- Use positiveIntOrDefault for all IPFS numeric configs (prevents NaN)
- URL-encode IPFS path segments (prevents request injection)
- Guard contentLength parse against NaN (fallback to 0)
- Use SANDBOX_PROTOCOL for redirects (correct behind TLS termination)
- Enforce size limit during streaming (catches chunked responses)
- Set failed=true before cleanup in stream end handler (race fix)
- Use conservative size estimate for x402 when Content-Length unknown
- Increment cache hit/miss counters in route handler
- Fix env var name: IPFS_CACHE_CLEANUP_THRESHOLD_SECONDS
- Guard Grafana cache hit rate expr against division by zero
- Destroy response stream on 404/408/504 from Kubo (socket leak)
IPFS cached content now triggers the same data-cached event as Arweave, feeding into the existing webhook system. No new configuration needed — operators' existing webhook consumers see IPFS content automatically.
Actionable comments posted: 5
♻️ Duplicate comments (4)
src/ipfs/ipfs-service.ts (2)
230-248: ⚠️ Potential issue | 🟠 Major

Keep streaming size-limit enforcement independent of cache-write state.
When cache setup/write fails, failed becomes true and the data handler returns before incrementing bytesWritten or enforcing maxResponseSizeBytes. That lets unknown-length oversized responses bypass the streaming cap whenever the cache path is unavailable. Track total bytes in a separate listener/path that always runs, and gate only cache writes on failed.

Suggested structure

- let bytesWritten = 0;
+ let bytesSeen = 0;
+ let bytesWritten = 0;
  ...
  stream.on('data', (chunk: Buffer) => {
-   if (failed) return;
-   bytesWritten += chunk.length;
+   bytesSeen += chunk.length;
    // Enforce size limit during streaming (catches chunked responses
    // that lack Content-Length)
    if (
      this.maxResponseSizeBytes > 0 &&
-     bytesWritten > this.maxResponseSizeBytes
+     bytesSeen > this.maxResponseSizeBytes
    ) {
      failed = true;
      stream.destroy(
      ...
      return;
    }
+
+   if (failed) return;
+   bytesWritten += chunk.length;

Also applies to: 280-283
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-service.ts` around lines 230 - 248, The data handler currently bails out early when `failed` is true, which prevents `bytesWritten` from being incremented and the `this.maxResponseSizeBytes` check from running; change the logic in the `stream.on('data', ...)` handler so that tracking `bytesWritten` and enforcing the size limit (throwing an `IpfsSizeLimitError` and calling `cleanup()`) always executes regardless of `failed`, while only the cache write path is gated by `failed` (i.e., update references to `bytesWritten`, `this.maxResponseSizeBytes`, `IpfsSizeLimitError`, `cleanup()`, and the cache write block so the size enforcement is independent of cache state).
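The separation this prompt asks for — size enforcement always runs, only the cache write is gated — can be sketched as a pure chunk handler. All names here (TeeState, handleChunk, writeToCache, onTooLarge) are illustrative, not the actual ipfs-service.ts code.

```typescript
// Sketch of a data handler where the size limit runs on every chunk,
// while only the cache write is gated on the cacheFailed flag.
interface TeeState {
  bytesSeen: number;   // always advances; drives the size limit
  bytesCached: number; // only advances while caching is healthy
  cacheFailed: boolean;
}

export function handleChunk(
  state: TeeState,
  chunk: Buffer,
  maxResponseSizeBytes: number,
  writeToCache: (c: Buffer) => void,
  onTooLarge: () => void,
): void {
  state.bytesSeen += chunk.length;
  // Enforce the streaming cap first, independent of cache health.
  if (maxResponseSizeBytes > 0 && state.bytesSeen > maxResponseSizeBytes) {
    state.cacheFailed = true;
    onTooLarge();
    return;
  }
  if (state.cacheFailed) return; // only the cache write is skipped
  state.bytesCached += chunk.length;
  writeToCache(chunk);
}
```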
217-255: ⚠️ Potential issue | 🟠 Major

Respect write-side backpressure when teeing to the cache file.
writeStream.write(chunk) is still unchecked and pendingChunks can grow without bound while the temp directory/write stream is not ready. Pause/resume on 'drain', or switch to a pipeline()/PassThrough tee with bounded buffering.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-service.ts` around lines 217 - 255, When teeing stream data into the cache, respect writable backpressure: in the stream.on('data') handler (and when flushing pendingChunks), check the boolean return of writeStream.write(chunk) and if it returns false, pause the source stream (the `stream` readable) and resume it on writeStream's 'drain' event; also enforce a bound on the `pendingChunks` buffer to avoid unbounded growth if the temp dir/writeStream isn't ready, or alternatively replace the manual tee logic with a bounded `PassThrough`/`pipeline` that handles backpressure automatically. Ensure this integrates with existing error/cleanup handling (see `failed`, `cleanup()`, `IpfsSizeLimitError`, and `maxResponseSizeBytes`) so pause/resume and pending buffer limits honor abort/failure states.

src/routes/ipfs.ts (1)
172-177: ⚠️ Potential issue | 🟠 Major

Use actual streamed bytes for the final unknown-size adjustment.
The initial check now uses a conservative max-size estimate, but the final adjustment still reuses that estimate when Content-Length is unknown. That can overcharge payments or over-consume rate-limit tokens for smaller chunked responses; count bytes during streaming and pass the real total to adjustRateLimitTokens.

Also applies to: 241-244
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/ipfs.ts` around lines 172 - 177, The current logic uses a conservative contentSize (result.size || config.IPFS_MAX_RESPONSE_SIZE_BYTES) for both the initial limit check and the final token adjustment, which overcharges for chunked responses; during streaming (the code that pipes/iterates the IPFS response) accumulate the actual number of bytes transmitted and, after the stream finishes, call adjustRateLimitTokens (and any equivalent call around lines 241-244) with that real byte count instead of contentSize so payments and rate-limits are adjusted using the true streamed size; update any references to contentSize used for final adjustments to use the accumulated streamedBytes variable.

src/ipfs/ipfs-cache.ts (1)
157-178: ⚠️ Potential issue | 🟠 Major

Make cache finalization cleanup-safe and atomic.
This flow still leaves temp files behind when pipeline() fails in put(), and both put()/putFromFile() can leave finalized data without valid metadata/index state if later steps fail. Delete stale entries before overwrite, write metadata via a temp file, and clean up both temp/final paths on any finalization failure.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-cache.ts` around lines 157 - 178, The put()/putFromFile() finalization is not atomic and can leave temp files or finalized data without metadata/index; update the flow in methods like tempDir(), createTempPath(), pipeline(...), dataDir(), dataPath(key), metaPath(key) so you delete any stale final paths before overwrite, write the metadata to a temp meta file and atomically rename it to metaPath(key), and ensure any failure after creating tempPath or moving data cleans up both tempPath and the partially-written dataPath; wrap the sequence (write temp data, rename to dataPath, write temp meta, rename to metaPath, then index.set(key, meta)) in try/catch/finally and remove temp/final artifacts on errors to keep state consistent and atomic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ipfs/ipfs-cache.ts`:
- Around line 43-46: When building the LRUCache in the constructor (this.index =
new LRUCache<...>), the sizeCalculation callback can return 0 for empty IPFS
objects which breaks lru-cache v11; update the sizeCalculation used in the new
LRUCache initialization to return Math.max(1, entry.size) (or equivalent) so the
cache always receives a positive weight while still keeping the real entry.size
stored on the CacheEntry metadata, leaving dispose and other logic unchanged.
In `@src/ipfs/ipfs-service.ts`:
- Around line 86-95: The current blocklist check only calls
blockListValidator.isIdBlocked(normalizedCid) and can miss alternative CID forms
or a resolved leaf CID for directory paths; update the logic around
normalizedCid and path to (1) check blockListValidator.isIdBlocked for both the
raw submitted CID (e.g., rawCid or the incoming id) and the normalizedCid alias,
and (2) when path is provided and the request will serve a file from a
directory, resolve the content via Kubo to obtain the leaf/resolved CID and call
blockListValidator.isIdBlocked on that leaf CID before serving; keep the
metrics.ipfsBlockedTotal.inc(), span.setAttribute('ipfs.blocked', true) and
throw IpfsBlockedError when any check returns true, and retain the existing path
traversal validation for '..' and leading '/'.
- Around line 144-151: The call to this.streamToCache(normalizedCid, path,
result.stream, result.contentType) on the IPFS response is done too early and
attaches a 'data' listener that forces the Kubo stream into flowing mode,
causing chunks to be consumed before payment/rate-limit and access checks
complete; move the streamToCache invocation out of the initial response path and
call it only after the route handler's payment/rateLimit/access checks succeed
(i.e., after the checks in the ipfs route handler) so caching/teeing only starts
for authorized requests, or alternatively modify streamToCache to tee into a
bounded PassThrough that keeps the original stream paused until the handler
explicitly starts piping, ensuring the Kubo stream isn't switched to flowing
mode prematurely; update references to streamToCache and any uses of
result.stream accordingly.
In `@src/ipfs/kubo-data-source.ts`:
- Around line 102-111: The axios.get call in KuboDataSource is leaving 5xx
response streams open because validateStatus currently rejects 5xx and axios
throws before your status handling; update the request to use validateStatus: ()
=> true so all responses (including 5xx) are returned for explicit handling in
the code path that processes response.status, and also add a defensive cleanup
in the catch block to call .destroy() (or .close() where applicable) on
error.response?.data if it is a stream before re-throwing; locate the axios.get
invocation (the call that passes responseType: 'stream', signal:
controller.signal, headers and maxRedirects) and the surrounding try/catch in
KuboDataSource to apply both changes.
In `@src/routes/ipfs.ts`:
- Around line 218-225: The route is double-counting cache metrics: remove the
redundant increments in the route handler (delete the calls to
metrics.ipfsCacheHitTotal.inc() and metrics.ipfsCacheMissTotal.inc() in the
block that sets cacheStatus and updates metrics) and rely on
IpfsService.getContent() to emit ipfsCacheHitTotal/ipfsCacheMissTotal; keep the
ipfsRequestsTotal.inc(...) and the cacheStatus variable logic intact so
route_type and success metrics remain reported.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e13ad19b-4fd0-417c-9af1-2f61c76e84c2
📒 Files selected for processing (10)
- docker-compose.yaml
- docs/envs.md
- monitoring/grafana/dashboards/examples/ipfs-example.json
- src/config.ts
- src/ipfs/ipfs-cache.ts
- src/ipfs/ipfs-service.ts
- src/ipfs/kubo-data-source.ts
- src/middleware/ipfs.ts
- src/routes/ipfs.ts
- src/system.ts
✅ Files skipped from review due to trivial changes (3)
- docs/envs.md
- src/config.ts
- monitoring/grafana/dashboards/examples/ipfs-example.json
🚧 Files skipped from review as they are similar to previous changes (2)
- src/middleware/ipfs.ts
- docker-compose.yaml
| this.index = new LRUCache<string, CacheEntry>({ | ||
| maxSize: maxSizeBytes, | ||
| sizeCalculation: (entry) => entry.size, | ||
| dispose: (_entry, key) => { |
Guard zero-byte entries before inserting into LRUCache.
Empty IPFS objects can produce entry.size === 0. lru-cache v11 requires sizeCalculation to return a positive integer when maxSize is configured; passing 0 will throw TypeError: "size must be positive integer" at runtime. Use a minimum weight of 1 while preserving the real size in metadata.
Suggested adjustment
maxSize: maxSizeBytes,
- sizeCalculation: (entry) => entry.size,
+ sizeCalculation: (entry) => Math.max(1, entry.size),

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
this.index = new LRUCache<string, CacheEntry>({
  maxSize: maxSizeBytes,
  sizeCalculation: (entry) => Math.max(1, entry.size),
  dispose: (_entry, key) => {
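The suggested guard can be isolated into a one-line sizeCalculation helper. CacheEntry here is a simplified stand-in for the real metadata type, so the behavior is easy to verify on its own.

```typescript
// Simplified stand-in for the cache's entry metadata.
interface CacheEntry {
  size: number;
}

// lru-cache v11 throws TypeError ("size must be positive integer") when a
// sizeCalculation result is <= 0 and maxSize is configured, so empty IPFS
// objects (size 0) must be given a minimum weight of 1. The true size
// stays on the entry itself for accounting.
export const sizeCalculation = (entry: CacheEntry): number =>
  Math.max(1, Math.floor(entry.size));
```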
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ipfs/ipfs-cache.ts` around lines 43 - 46, When building the LRUCache in
the constructor (this.index = new LRUCache<...>), the sizeCalculation callback
can return 0 for empty IPFS objects which breaks lru-cache v11; update the
sizeCalculation used in the new LRUCache initialization to return Math.max(1,
entry.size) (or equivalent) so the cache always receives a positive weight while
still keeping the real entry.size stored on the CacheEntry metadata, leaving
dispose and other logic unchanged.
// Check blocklist (uses the same admin API as Arweave data moderation)
if (await this.blockListValidator.isIdBlocked(normalizedCid)) {
  metrics.ipfsBlockedTotal.inc();
  span.setAttribute('ipfs.blocked', true);
  throw new IpfsBlockedError(`CID is blocked: ${normalizedCid}`);
}

// Reject path traversal attempts
if (path !== undefined && (path.includes('..') || path.startsWith('/'))) {
  throw new IpfsNotFoundError('Invalid IPFS path');
Check moderation against all CID forms that can serve the content.
Only normalizedCid is checked. If operators block a CIDv0 string as submitted to the existing admin API, or block a child file CID that is served through a directory CID/path, this root-only normalized check can miss it. Check both raw and normalized CID aliases, and consider validating the resolved leaf CID from Kubo response metadata before serving path-based directory content.
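A sketch of the broader moderation check, with injected stand-ins for the validator and Kubo resolution — isIdBlocked and resolveLeafCid here mirror the described behavior but are not the actual gateway APIs.

```typescript
// Check every CID form that can serve the content: the raw submitted
// form, the normalized alias, and (for directory paths) the resolved
// leaf file's own CID.
export async function assertNotBlocked(
  rawCid: string,
  normalizedCid: string,
  path: string | undefined,
  isIdBlocked: (id: string) => Promise<boolean>,
  resolveLeafCid: (cid: string, path: string) => Promise<string>,
): Promise<void> {
  const candidates = new Set([rawCid, normalizedCid]);
  if (path !== undefined && path.length > 0) {
    // An operator may have blocked the child file's CID directly.
    candidates.add(await resolveLeafCid(normalizedCid, path));
  }
  for (const id of candidates) {
    if (await isIdBlocked(id)) {
      throw new Error(`CID is blocked: ${id}`);
    }
  }
}
```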
// Stream directly to the client while writing to a temp file on disk
// for caching. No memory buffering — handles files of any size.
this.streamToCache(
  normalizedCid,
  path,
  result.stream,
  result.contentType,
);
Move streamToCache() call to after payment/rate-limit checks to prevent data loss.
streamToCache() attaches a data listener (line 230 in ipfs-service.ts) which immediately switches the Kubo stream from paused mode to flowing mode. This occurs before the route handler completes payment/rate-limit checks (ipfs.ts:177-189). Chunks emitted during the check window are consumed by the cache listener and will not reach the pipe consumer at ipfs.ts:231, causing truncated responses.
Additionally, move the call after access checks to avoid pre-authorization Kubo/disk work on blocked or rate-limited requests. Either defer cache setup until after checks, or wrap the tee in a bounded PassThrough that defers flowing mode.
Also applies to: 230-256
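One shape for the "keep the source paused until checks pass" option: a tee whose listeners are only attached when start() is called, so nothing switches the Kubo stream into flowing mode before authorization. This sketch omits backpressure handling for brevity and uses illustrative names (deferredTee, client, cache).

```typescript
import { PassThrough, Readable } from 'node:stream';

// Nothing is attached to `source` at construction time, so it stays in
// paused mode and no chunks can be consumed before payment/rate-limit
// checks complete. Calling start() begins the tee.
export function deferredTee(source: Readable): {
  client: PassThrough;
  cache: PassThrough;
  start: () => void;
} {
  const client = new PassThrough();
  const cache = new PassThrough();
  return {
    client,
    cache,
    start: () => {
      source.on('data', (chunk: Buffer) => {
        client.write(chunk);
        cache.write(chunk);
      });
      source.on('end', () => {
        client.end();
        cache.end();
      });
      source.on('error', (err) => {
        client.destroy(err);
        cache.destroy(err);
      });
    },
  };
}
```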
const response = await axios.get(url, {
  responseType: 'stream',
  signal: controller.signal,
  headers: {
    'Accept-Encoding': 'identity',
  },
  maxRedirects: 5,
  // Accept non-2xx so we can handle 404/408/504 ourselves
  validateStatus: (status) => status < 500 || status === 504,
});
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Axios responseType stream validateStatus rejected response includes response.data stream destroy
💡 Result:
When using Axios with `responseType: 'stream'` in Node.js, if the server returns a non-2xx status code (causing rejection unless `validateStatus` is customized), the rejected promise's `error.response.data` is a Node.js stream containing the full response body. By default, Axios rejects promises for status codes outside the 2xx range via `validateStatus`. For stream responses, the stream in `error.response.data` is not automatically consumed or buffered into a string or object; you must read it manually (e.g., collect chunks and parse if JSON). Example from real-world usage (Axios issue #6468):

```js
try {
  const response = await axios.post(url, data, { responseType: 'stream' });
} catch (error) {
  if (
    error.config.responseType === 'stream' &&
    error.response?.data &&
    typeof error.response.data.read === 'function'
  ) {
    const chunks = [];
    for await (const chunk of error.response.data) {
      chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
    }
    error.response.data = JSON.parse(Buffer.concat(chunks).toString('utf8'));
  }
  throw error;
}
```

Regarding "stream destroy": the stream won't auto-destroy on rejection or scope exit. If unconsumed, it may leak resources or hang. Always consume it (read to end) or explicitly call `error.response.data.destroy()` in catch blocks after handling. Recent fixes (v1.13.1, Oct 2025) ensure streams aren't interrupted prematurely for non-OK statuses, preserving access. To avoid rejection for expected error streams, use `validateStatus: (status) => true` and check `response.status` manually.
Citations:
- 1: Different responseType for error vs response axios/axios#6468
- 2: https://github.com/axios/axios/blob/main/CHANGELOG.md
- 3: fix(http): fixed a regression that caused the data stream to be interrupted for responses with non-OK HTTP statuses axios/axios#7193
🏁 Script executed (repository: ar-io/ar-io-node):
`wc -l src/ipfs/kubo-data-source.ts`
Length of output: 92
🏁 Script executed:
`cat -n src/ipfs/kubo-data-source.ts | sed -n '95,125p'`
Length of output: 1232
🏁 Script executed:
`cat -n src/ipfs/kubo-data-source.ts | sed -n '170,215p'`
Length of output: 1503
🏁 Script executed:
`rg "axios\.(get|post|request)" src/ipfs/kubo-data-source.ts -n`
Length of output: 109
🏁 Script executed:
`cat -n src/ipfs/kubo-data-source.ts | sed -n '1,100p'`
Length of output: 3347
🏁 Script executed:
`cat -n src/ipfs/kubo-data-source.ts | grep -A 30 "async.*{" | head -80`
Length of output: 1102
🏁 Script executed:
`rg "validateStatus|catch" src/ipfs/kubo-data-source.ts -B 3 -A 5`
Length of output: 600
Destroy rejected 5xx response streams from Axios.
With the current validateStatus, Kubo 500/502/503 responses are rejected by Axios before the explicit status handling. The stream in error.response?.data is never destroyed and remains open in the catch block, causing a resource leak.
Handle this by changing to validateStatus: () => true and managing all statuses explicitly, or destroy error.response.data before re-throwing in the catch block:
Suggested fix
- // Accept non-2xx so we can handle 404/408/504 ourselves
- validateStatus: (status) => status < 500 || status === 504,
+ // Accept all statuses so streamed error bodies can be destroyed here.
+ validateStatus: () => true,
});

Then extend the existing status handling:
if (response.status === 408 || response.status === 504) {
(response.data as Readable).destroy();
throw new IpfsTimeoutError(
`Kubo timed out resolving: /ipfs/${ipfsPath}`,
);
}
+
+ if (response.status >= 500) {
+ (response.data as Readable).destroy();
+ throw new IpfsUnavailableError(
+ `Kubo returned ${response.status} for /ipfs/${ipfsPath}`,
+ );
+ }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ipfs/kubo-data-source.ts` around lines 102 - 111, The axios.get call in
KuboDataSource is leaving 5xx response streams open because validateStatus
currently rejects 5xx and axios throws before your status handling; update the
request to use validateStatus: () => true so all responses (including 5xx) are
returned for explicit handling in the code path that processes response.status,
and also add a defensive cleanup in the catch block to call .destroy() (or
.close() where applicable) on error.response?.data if it is a stream before
re-throwing; locate the axios.get invocation (the call that passes responseType:
'stream', signal: controller.signal, headers and maxRedirects) and the
surrounding try/catch in KuboDataSource to apply both changes.
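The defensive catch-block cleanup suggested above can be illustrated without axios or a network call; here a plain Node `Readable` stands in for `error.response.data`, and `AxiosLikeError` is a local stand-in type, not axios's real error class:

```typescript
import { Readable } from 'node:stream';

interface AxiosLikeError {
  response?: { status: number; data?: unknown };
}

// Destroy a streamed error body so it cannot leak sockets or file handles.
function destroyErrorStream(error: AxiosLikeError): void {
  const body = error.response?.data;
  if (body instanceof Readable && !body.destroyed) {
    body.destroy();
  }
}

const fakeError: AxiosLikeError = {
  response: { status: 502, data: Readable.from([Buffer.from('bad gateway')]) },
};
destroyErrorStream(fakeError);
console.log((fakeError.response!.data as Readable).destroyed); // true
```

In the real catch block this would run on `error.response?.data` before re-throwing, so 5xx bodies rejected by `validateStatus` are closed instead of lingering.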
```ts
// Track metrics
const cacheStatus = result.cached ? 'hit' : 'miss';
metrics.ipfsRequestsTotal.inc({ route_type: routeType, status: 'success' });
if (result.cached) {
  metrics.ipfsCacheHitTotal.inc();
} else {
  metrics.ipfsCacheMissTotal.inc();
}
```
Avoid double-counting IPFS cache hit/miss metrics.
IpfsService.getContent() already increments ipfsCacheHitTotal/ipfsCacheMissTotal on lines 98-117 of src/ipfs/ipfs-service.ts; incrementing them again here reports every request twice. Keep the counters in one layer only.
Suggested route-side removal
const cacheStatus = result.cached ? 'hit' : 'miss';
metrics.ipfsRequestsTotal.inc({ route_type: routeType, status: 'success' });
- if (result.cached) {
- metrics.ipfsCacheHitTotal.inc();
- } else {
- metrics.ipfsCacheMissTotal.inc();
- }
if (result.size > 0) {
metrics.ipfsContentSizeHistogram.observe(result.size);
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routes/ipfs.ts` around lines 218 - 225, The route is double-counting
cache metrics: remove the redundant increments in the route handler (delete the
calls to metrics.ipfsCacheHitTotal.inc() and metrics.ipfsCacheMissTotal.inc() in
the block that sets cacheStatus and updates metrics) and rely on
IpfsService.getContent() to emit ipfsCacheHitTotal/ipfsCacheMissTotal; keep the
ipfsRequestsTotal.inc(...) and the cacheStatus variable logic intact so
route_type and success metrics remain reported.
…IPFS cache (PE-9067)
- IPFS blocked response now matches Arweave format (plain text with ID)
- Emit data-cached webhook event when IPFS content is cached (same event
as Arweave, feeds into content scanner pipeline automatically)
♻️ Duplicate comments (2)
src/routes/ipfs.ts (2)
221-225: ⚠️ Potential issue | 🟡 Minor
Duplicate cache metrics increment (still unresolved).
`IpfsService.getContent()` already emits `ipfsCacheHitTotal`/`ipfsCacheMissTotal`, so these route-level increments double-count every request. Keep them in the service layer only.
♻️ Proposed removal
  const cacheStatus = result.cached ? 'hit' : 'miss';
  metrics.ipfsRequestsTotal.inc({ route_type: routeType, status: 'success' });
- if (result.cached) {
-   metrics.ipfsCacheHitTotal.inc();
- } else {
-   metrics.ipfsCacheMissTotal.inc();
- }
  if (result.size > 0) {
    metrics.ipfsContentSizeHistogram.observe(result.size);
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/ipfs.ts` around lines 221 - 225, Remove the duplicate cache metric increments from the route handler: the block that checks result.cached and calls metrics.ipfsCacheHitTotal.inc() / metrics.ipfsCacheMissTotal.inc() should be deleted because IpfsService.getContent() already emits those metrics; keep metrics emission only inside IpfsService.getContent() and rely on its instrumentation (refer to result.cached and metrics.ipfsCacheHitTotal / metrics.ipfsCacheMissTotal in the route to locate the code to remove).
231-251: ⚠️ Potential issue | 🟠 Major
Final rate-limit adjustment still uses the max-size fallback instead of actual streamed bytes.
The initial check now correctly uses `IPFS_MAX_RESPONSE_SIZE_BYTES` as a conservative upper bound (good), but the `res.on('finish')` adjustment at line 243 reuses the same `contentSize` fallback when `result.size` is unknown. For chunked/streaming responses the adjustment should reconcile against the bytes actually written to the socket, otherwise clients are charged/throttled at the maximum for every unknown-length response (e.g., directory listings, small files over chunked transport).
Track streamed bytes via a pass-through or by counting bytes as they flow into `res`, then pass that count to `adjustRateLimitTokens`.
♻️ Sketch of the fix
+ let streamedBytes = 0;
+ result.stream.on('data', (chunk: Buffer) => {
+   streamedBytes += chunk.length;
+ });
  result.stream.pipe(res);
  res.on('finish', () => {
    const durationSec = (Date.now() - startTime) / 1000;
    metrics.ipfsRequestDurationHistogram.observe(
      { route_type: routeType, cache_status: cacheStatus },
      durationSec,
    );
    adjustRateLimitTokens({
      req,
-     responseSize: result.size > 0 ? result.size : contentSize,
+     responseSize: result.size > 0 ? result.size : streamedBytes,
      initialResult: limitCheck,
      rateLimiter,
    }).catch((error) => {

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/ipfs.ts` around lines 231 - 251, The final rate-limit adjustment is still using the conservative contentSize fallback instead of the actual number of bytes streamed; change the piping to count bytes as they are sent and pass that count into adjustRateLimitTokens. Specifically, when handling result.stream.pipe(res) (in the res.on('finish') block), create a streamedBytes counter and increment it for each chunk (either by wrapping result.stream with a PassThrough and counting chunk.length on its 'data' events or by listening to result.stream's 'data' events), then call adjustRateLimitTokens with responseSize: result.size > 0 ? result.size : streamedBytes; ensure you attach error handlers/cleanup for the stream and preserve the existing initialResult and rateLimiter parameters when calling adjustRateLimitTokens.
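The byte-counting idea from this comment can be exercised in isolation with a `PassThrough` that tallies chunk lengths. Here `res` is stood in for by a discarding `PassThrough`, and `adjustRateLimitTokens` (the PR's helper) is only referenced in a comment, not reimplemented:

```typescript
import { PassThrough, Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

async function streamWithByteCount(
  source: Readable,
  res: NodeJS.WritableStream,
): Promise<number> {
  let streamedBytes = 0;
  const counter = new PassThrough();
  counter.on('data', (chunk: Buffer) => {
    streamedBytes += chunk.length;
  });
  await pipeline(source, counter, res);
  // Caller would pass this as responseSize when result.size is unknown,
  // e.g. to adjustRateLimitTokens in the route's finish handler.
  return streamedBytes;
}

(async () => {
  const sink = new PassThrough();
  sink.resume(); // discard output, like a client draining the socket
  const n = await streamWithByteCount(
    Readable.from([Buffer.from('hello '), Buffer.from('world')]),
    sink,
  );
  console.log(n); // 11
})();
```

Using `pipeline` rather than bare `.pipe()` also gives the error propagation and cleanup the AI prompt asks for.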
🧹 Nitpick comments (2)
src/routes/ipfs.ts (2)
35-45: Add TSDoc to the exported/public functions.
The four top-level functions (`createIpfsRouter`, `createIpfsHandler`, `createIpfsPathHandler`, `handleIpfsRequest`) have no TSDoc. Since this is a newly introduced public surface, brief docstrings describing the routing contract (what the returned router mounts, which request fields `createIpfsHandler` expects from middleware, the `routeType` semantics) would make consumer wiring from `src/app.ts` and `src/middleware/ipfs.ts` clearer.
As per coding guidelines: "Add or improve TSDoc comments on code you touch".
Also applies to: 60-70, 88-98, 139-159
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/ipfs.ts` around lines 35 - 45, Add TSDoc comments to the four exported functions createIpfsRouter, createIpfsHandler, createIpfsPathHandler, and handleIpfsRequest: for each function document its purpose, parameters (including types like log, ipfsService, rateLimiter, paymentProcessor and expected middleware-populated request fields), what routes or path patterns the returned Router/handler mounts or handles, the meaning/semantics of routeType and any other enum/flag used, and the return value (Router or RequestHandler) and possible thrown errors; keep entries brief (one-line summary + param/@returns/@throws tags) so consumers of src/app.ts and src/middleware/ipfs.ts can wire these functions correctly.
71-86: Avoid `(req as any)`: extend the Express `Request` type.
`createIpfsHandler` reads `ipfsCid`/`ipfsPath` populated by the subdomain middleware via untyped casts. A small declaration-merging module gives both sides a typed contract and prevents silent drift if the middleware renames these fields.
♻️ Example declaration

```ts
// src/types/express.d.ts (or similar)
declare global {
  namespace Express {
    interface Request {
      ipfsCid?: string;
      ipfsPath?: string;
    }
  }
}
export {};
```

Then:

- const cidString = (req as any).ipfsCid as string;
- const path = (req as any).ipfsPath as string | undefined;
+ const cidString = req.ipfsCid!;
+ const path = req.ipfsPath;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/ipfs.ts` around lines 71 - 86, The code casts req to any to access ipfsCid/ipfsPath in createIpfsHandler/asyncHandler before calling handleIpfsRequest; add an Express declaration-merging file (e.g., src/types/express.d.ts) that extends Express.Request with optional ipfsCid?: string and ipfsPath?: string, export an empty object to make it a module, then remove the (req as any) casts in createIpfsHandler and reference req.ipfsCid and req.ipfsPath directly so the types are enforced and the middleware/handler contract is explicit.
Summary
Add opt-in IPFS content serving to AR.IO Gateway via a Kubo Docker sidecar. Operators enable with `IPFS_ENABLED=true` and `docker compose --profile ipfs up -d`.
- Path-based access: `GET /ipfs/{CID}` and `GET /ipfs/{CID}/{path}`
- Subdomain access: `{CID}.{gateway-host}/` (same wildcard cert as ArNS)
- CID blocking via the existing `PUT /ar-io/admin/block-data` API
- CIDv0 to CIDv1 base32 redirect (works with wildcard `*.{host}` certs)
- `/ar-io/info` advertises IPFS capability for network discovery

Default off (`IPFS_ENABLED=false`). Zero runtime impact when disabled. No new ports required: Kubo fetches content outbound.
Foundation for Phase 2: ArNS names resolving to IPFS CIDs (~10 line middleware change).
Operator Quick Start
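The quick-start commands themselves did not survive extraction here; based on the flags named in the summary, an operator bring-up might look like the following sketch (the `.env` layout and gateway port are assumptions, not taken from this PR):

```shell
# Enable the opt-in IPFS feature (flag name from the PR summary)
echo "IPFS_ENABLED=true" >> .env

# Start the gateway with the Kubo sidecar profile
docker compose --profile ipfs up -d

# Verify: /ar-io/info should now advertise IPFS capability
# (port 3000 is an assumed default; adjust for your deployment)
curl -s http://localhost:3000/ar-io/info
```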
Architecture
Full architecture doc: `docs/ipfs-integration.md`

Test Plan
- `GET /ipfs/{CIDv1}` serves content (tested: BAYC NFTs, Vitalik blog, 10MB files)
- `{CIDv1}.{host}/` serves via subdomain