feat: add IPFS CID serving via Kubo sidecar (PE-9067) #682
vilenarios wants to merge 11 commits into develop from
Conversation
Add opt-in IPFS content serving to the AR.IO Gateway. Operators enable
IPFS_ENABLED=true and start a Kubo Docker sidecar (--profile ipfs) to
serve IPFS content via path-based (/ipfs/{CID}) and subdomain-based
({CID}.{gateway-host}) access patterns.
Features:
- Kubo HTTP gateway integration with connection + stall timeouts
- LRU bounded filesystem cache (streams to disk, not memory)
- File-based CID blocklist with hot-reload
- Separate rate limiter pool for IPFS traffic
- x402 payment protection (same as Arweave data endpoints)
- HTTPSIG response signing for verifiable IPFS responses
- CIDv0 to CIDv1 base32 redirect (DNS-safe, works with wildcard certs)
- Cross-CID redirect for directory listing navigation
- Path traversal protection
- Docker Compose profile with TCP+UDP swarm ports
Default off (IPFS_ENABLED=false). Zero runtime impact when disabled.
Designed as foundation for Phase 2 ArNS-to-CID resolution.
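As a quick illustration of the two access patterns, the subdomain-style URL that a path-style request redirects to can be sketched like this (helper name, host, and CID value are hypothetical, not code from this PR):

```typescript
// Hypothetical helper, not the PR's actual code. A CIDv1 base32 string is
// lowercase and DNS-safe, so it fits in a single subdomain label and works
// under a wildcard certificate, per the feature list above.
function buildSubdomainUrl(
  cidV1Base32: string,
  rootHost: string,
  pathSuffix = '',
  protocol = 'https',
): string {
  return `${protocol}://${cidV1Base32}.${rootHost}${pathSuffix}`;
}

// Path-based:      https://gateway.example/ipfs/{CID}
// Subdomain-based: https://{CID}.gateway.example
const url = buildSubdomainUrl('bafyexamplecid', 'gateway.example', '/index.html');
```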
Fix strict-boolean-expressions (explicit nullish checks) and prettier formatting issues caught by CI eslint.
Add OpenTelemetry span tracing to the IPFS request lifecycle:
- IpfsService.getContent span (cache check, blocklist, delegation)
- KuboDataSource.getContent span (HTTP fetch with latency attributes)
- Spans record cache hit/miss, content size, errors, and content type
Also fix strict-boolean-expressions and prettier formatting for CI.
- Fix prettier line-length formatting (5 errors from CI)
- Add IPFS cache volume mount to docker-compose (data persists)
- Add IPFS field to /ar-io/info endpoint (network discovery)
- Add OTEL span tracing to IpfsService and KuboDataSource
Remove custom text-file blocklist in favor of the existing PUT /ar-io/admin/block-data API. Operators block IPFS CIDs the same way they block Arweave TX IDs — unified moderation, single API. Removed: IpfsBlocklist class, IPFS_BLOCKLIST_PATH config, blocklist volume mount. IpfsService now uses DataBlockListValidator (SQLite).
…or moderation (PE-9067)
- IPFS rate limiter defaults now match Arweave (100K IP tokens, 20/s refill,
1M resource tokens, 100/s refill)
- Remove custom text-file blocklist — use existing PUT /ar-io/admin/block-data
API for CID moderation (unified with Arweave content moderation)
- Remove IPFS_BLOCKLIST_PATH config and volume mount
- Add IPFS cache volume mount to docker-compose for persistence
…it defaults (PE-9067)
- Add IPFS Grafana dashboard example (requests/sec, cache hit rate,
latency percentiles, content size, blocked requests, route type)
- Update OpenAPI spec: block-data endpoint accepts IPFS CIDs
- Update ipfs-integration.md: admin API for moderation (not text file)
- Match IPFS rate limiter defaults to Arweave (100K IP, 1M resource)
- Enforce IPFS_MAX_RESPONSE_SIZE_BYTES (reject with 413 when Content-Length exceeds limit)
- Destroy response stream on 404/408/504 from Kubo (prevents socket leak)
- Fix docs/envs.md rate limiter defaults to match config.ts
- Add Grafana dashboard example for IPFS metrics
- Update OpenAPI spec for CID content moderation
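The Content-Length rejection in the first bullet can be sketched as follows (a minimal sketch assuming the check runs before streaming; the PR's actual handler may differ):

```typescript
// Sketch only: decide whether to reject with 413 based on the upstream
// Content-Length header and a configured byte limit.
function shouldReject413(
  contentLengthHeader: string | undefined,
  maxResponseSizeBytes: number,
): boolean {
  const parsed = parseInt(contentLengthHeader ?? '', 10);
  // A missing or malformed header cannot be rejected up front; any
  // enforcement for those cases would have to happen while streaming.
  if (!Number.isFinite(parsed)) return false;
  return parsed > maxResponseSizeBytes;
}
```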
📝 Walkthrough
Adds an opt-in IPFS gateway: Kubo sidecar in compose, config/env vars, CID utilities, filesystem LRU cache, Kubo streaming data source, IpfsService with tracing/limits/cache, Express subdomain/path middleware and routes, metrics, docs, tests, and dashboards.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ExpressApp as Express App
    participant IpfsHandler as IPFS Handler
    participant IpfsService as IpfsService
    participant Cache as IpfsFsCache
    participant Kubo as KuboDataSource
    participant KuboNode as Kubo Node
    Client->>ExpressApp: GET /ipfs/{CID} or {CID}.{host}
    ExpressApp->>IpfsHandler: dispatch request
    IpfsHandler->>IpfsService: getContent({cid, path, signal})
    IpfsService->>IpfsService: normalize CID -> v1 base32
    IpfsService->>IpfsService: check blocklist
    IpfsService->>Cache: has/get(cid, path)
    alt Cache hit
        Cache-->>IpfsService: stream + metadata
        IpfsService-->>IpfsHandler: return cached stream
    else Cache miss
        IpfsService->>Kubo: getContent(cid, path)
        Kubo->>KuboNode: HTTP GET /ipfs/{cid}/{path}
        KuboNode-->>Kubo: stream response
        Kubo-->>IpfsService: stream + metadata
        IpfsService->>Cache: write temp file & finalize (async)
        IpfsService-->>IpfsHandler: return live stream
    end
    IpfsHandler->>IpfsHandler: check payment & rate limits
    IpfsHandler->>Client: set headers (ETag, Cache-Control, X-Ipfs-Path) and stream response
```
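In code terms, the cache-hit/miss branch in the diagram might look like the following sketch (all types and names are hypothetical stand-ins, not the PR's actual IpfsService):

```typescript
// Hypothetical shapes, for illustration only.
interface ContentResult {
  stream: string;
  cached: boolean;
}

interface CacheLike {
  get(key: string): ContentResult | undefined;
  set(key: string, value: ContentResult): void;
}

async function getContentSketch(
  cid: string,
  cache: CacheLike,
  isBlocked: (cid: string) => boolean,
  fetchFromKubo: (cid: string) => Promise<string>,
): Promise<ContentResult> {
  // Blocklist check happens before any cache or network work.
  if (isBlocked(cid)) throw new Error('CID is blocked');
  const hit = cache.get(cid);
  if (hit !== undefined) return { ...hit, cached: true };
  const stream = await fetchFromKubo(cid);
  // The PR writes the cache entry asynchronously while streaming to the
  // client; this sketch compresses that into a synchronous set.
  cache.set(cid, { stream, cached: false });
  return { stream, cached: false };
}
```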
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 6
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🟡 Minor comments (17)
test-ipfs.sh-55-71 (1)
55-71: ⚠️ Potential issue | 🟡 Minor
`test_header` with an empty `expected_value` passes even when the header is absent.
`echo "$actual" | grep -qi ""` matches any string (including the empty string), so "X-Cache header present" and "Content-Type is set" will report PASS even if the header is missing from the response. Check for a non-empty `actual` explicitly when the intent is presence-only.
🔧 Proposed fix

```diff
   actual=$(curl -s -I --max-time 30 "$url" 2>&1 | grep -i "^$header:" | head -1 | sed 's/^[^:]*: //' | tr -d '\r')
-  if echo "$actual" | grep -qi "$expected_value"; then
+  if [ -n "$expected_value" ]; then
+    match_ok=$(echo "$actual" | grep -qi "$expected_value" && echo yes || echo no)
+  else
+    match_ok=$([ -n "$actual" ] && echo yes || echo no)
+  fi
+  if [ "$match_ok" = "yes" ]; then
     echo -e "  ${GREEN}PASS${NC} $name ($actual)"
     ((pass++))
   else
```

Also applies to: 136-142
test-ipfs.sh-93-101 (1)
93-101: ⚠️ Potential issue | 🟡 Minor
Hardcoded container name `ar-io-node-kubo-1` is brittle.
The container name is derived from the Compose project name (default = directory name). If the operator runs Compose from a different path or sets COMPOSE_PROJECT_NAME, both `docker exec` calls fail and the script aborts without a clear diagnostic. Resolve the container by service label instead.
🔧 Proposed fix

```diff
-FILE_CID=$(echo "Hello from AR.IO IPFS integration test!" | docker exec -i ar-io-node-kubo-1 ipfs add -q 2>&1)
+KUBO_CONTAINER=$(docker ps --filter "label=com.docker.compose.service=kubo" --format '{{.Names}}' | head -1)
+if [ -z "$KUBO_CONTAINER" ]; then
+  echo -e "  ${RED}FAIL${NC} No running kubo container found"
+  exit 1
+fi
+FILE_CID=$(echo "Hello from AR.IO IPFS integration test!" | docker exec -i "$KUBO_CONTAINER" ipfs add -q 2>&1)
```
docs/envs.md-376-376 (1)
376-376: ⚠️ Potential issue | 🟡 Minor
Fix the documented cache cleanup env var name.
The implementation exports IPFS_CACHE_CLEANUP_THRESHOLD_SECONDS, but the docs list IPFS_CACHE_CLEANUP_THRESHOLD, so operators would configure a no-op variable.
Proposed doc fix

```diff
-| IPFS_CACHE_CLEANUP_THRESHOLD | Number | 3600 | Age in seconds before cached files become eviction candidates |
+| IPFS_CACHE_CLEANUP_THRESHOLD_SECONDS | Number | 3600 | Age in seconds before cached files become eviction candidates |
```

Based on learnings, update relevant documentation files in docs/ in the same PR when changing behavior affecting documented contracts.
src/routes/ar-io-info-builder.ts-141-141 (1)
141-141: ⚠️ Potential issue | 🟡 Minor
Document the new /ar-io/info ipfs field in OpenAPI.
This builder can now emit ipfs: { enabled: true }, but the provided /ar-io/info OpenAPI schema does not expose that property, so generated clients won't see the advertised capability.
Based on learnings, update relevant documentation files in docs/ in the same PR when changing behavior affecting documented contracts.
Also applies to: 325-327
docs/openapi.yaml-2812-2818 (1)
2812-2818: ⚠️ Potential issue | 🟡 Minor
Update the request schema to match the expanded accepted IDs.
The prose now documents IPFS CIDs and content hashes, but the id schema description below still says "TX ID for a transaction". Please update that field description and consider adding a CID example so generated docs don't contradict this section.
Suggested OpenAPI doc adjustment

```diff
-          description: TX ID for a transaction you want to block.
+          description: TX ID, data-item ID, IPFS CIDv1 base32 string, or SHA-256 content hash you want to block.
```
src/ipfs/kubo-data-source.test.ts-33-65 (1)
33-65: ⚠️ Potential issue | 🟡 Minor
Make these tests assert URL construction, or remove them.
Because the signal is aborted before getContent() runs, signal.throwIfAborted() exits before the Kubo URL is built. These two tests also catch any error without asserting the URL, so they provide false confidence and duplicate the explicit abort test below.
docs/INDEX.md-26-30 (1)
26-30: ⚠️ Potential issue | 🟡 Minor
Add IPFS, CID, and Kubo definitions to docs/glossary.md.
These terms are introduced in docs/ipfs-integration.md but are missing from the glossary. Following the documentation guidelines, add entries under a new "IPFS Integration" section covering:
- IPFS - InterPlanetary File System
- CID - Content Identifier (IPFS hash/address)
- Kubo - The IPFS node implementation used as the gateway sidecar
docs/ipfs-integration.md-504-511 (1)
504-511: ⚠️ Potential issue | 🟡 Minor
Correct the documented Prometheus labels.
ipfs_requests_total is defined with route_type, status, and ipfs_request_duration_seconds uses route_type, cache_status; the table currently documents method and source.
src/routes/ipfs.ts-216-221 (1)
216-221: ⚠️ Potential issue | 🟡 Minor
Increment the dedicated IPFS cache/block metrics.
ipfsCacheHitTotal, ipfsCacheMissTotal, and ipfsBlockedTotal are exported but never updated here, so dashboards/alerts will stay flat even when IPFS traffic is active.
Proposed metric updates

```diff
     if (result.size > 0) {
       metrics.ipfsContentSizeHistogram.observe(result.size);
     }
+    if (result.cached) {
+      metrics.ipfsCacheHitTotal.inc();
+    } else {
+      metrics.ipfsCacheMissTotal.inc();
+    }
 ...
     if (error instanceof IpfsBlockedError) {
+      metrics.ipfsBlockedTotal.inc();
       metrics.ipfsRequestsTotal.inc({
```

Also applies to: 259-268
docs/ipfs-integration.md-58-58 (1)
58-58: ⚠️ Potential issue | 🟡 Minor
Add languages to fenced code blocks.
markdownlint flags these fences; use text for diagrams/span trees and an appropriate language where applicable.
Also applies to: 113-113, 531-531
docs/ipfs-integration.md-63-64 (1)
63-64: ⚠️ Potential issue | 🟡 Minor
Align subdomain docs with the implemented host pattern.
These sections describe {CID}.ipfs.gateway.io / a distinguishing .ipfs. label, but the implementation and PR objective use {CID}.{root_host}. This can lead operators to configure DNS/TLS incorrectly.
Also applies to: 141-144, 263-266, 362-365
docs/ipfs-integration.md-113-120 (1)
113-120: ⚠️ Potential issue | 🟡 Minor
Remove stale file-based blocklist/config references and fix defaults.
The IPFS moderation path now uses the existing PUT /ar-io/admin/block-data flow, and src/config.ts does not define IPFS_BLOCKLIST_PATH. Also update the rate-limit defaults here to match src/config.ts (100000 tokens with 20/s refill per-IP, and 1000000 tokens with 100/s refill per-resource).
Also applies to: 198-208, 431-445
src/routes/ipfs.ts-118-121 (1)
118-121: ⚠️ Potential issue | 🟡 Minor
Use the configured public protocol for CIDv0 redirects.
As in the subdomain middleware, req.protocol may produce http:// redirects behind TLS termination. Use config.SANDBOX_PROTOCOL ?? req.protocol.
Proposed redirect fix

```diff
+  const protocol = config.SANDBOX_PROTOCOL ?? req.protocol;
   res.redirect(
     302,
-    `${req.protocol}://${v1Base32}.${rootHost}${pathSuffix}`,
+    `${protocol}://${v1Base32}.${rootHost}${pathSuffix}`,
   );
```
src/middleware/ipfs.ts-85-88 (1)
85-88: ⚠️ Potential issue | 🟡 Minor
Use the configured public protocol for CID redirects.
req.protocol can be http behind TLS termination unless proxy trust is perfectly configured, causing downgrade redirects. Prefer config.SANDBOX_PROTOCOL ?? req.protocol when building the external Location.
Proposed redirect fix

```diff
+  const protocol = config.SANDBOX_PROTOCOL ?? req.protocol;
   res.redirect(
     302,
-    `${req.protocol}://${targetCid}.${rootHost}${pathSuffix}`,
+    `${protocol}://${targetCid}.${rootHost}${pathSuffix}`,
   );
```
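One subtlety of the suggested config.SANDBOX_PROTOCOL ?? req.protocol fallback, shown in isolation: ?? only falls back on null/undefined, so an empty-string configuration value would still win (function name illustrative):

```typescript
// Illustrative: the configured public protocol wins unless it is
// null/undefined. Note that '' passes through (`??` is not `||`),
// so config validation still matters.
function publicProtocol(
  configured: string | undefined,
  reqProtocol: string,
): string {
  return configured ?? reqProtocol;
}
```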
src/ipfs/ipfs-cache.ts-49-52 (1)
49-52: ⚠️ Potential issue | 🟡 Minor
Normalize CIDs before deriving cache keys.
Using the raw input means the same content addressed as CIDv0 and CIDv1 creates separate cache entries, despite the docs promising encoding-independent keys.
Proposed key normalization

```diff
+import { cidToV1Base32 } from '../lib/ipfs-cid.js';
+
   private cacheKey(cidString: string, path?: string): string {
+    const normalizedCid = cidToV1Base32(cidString);
     const raw =
-      path !== undefined && path !== '' ? `${cidString}/${path}` : cidString;
+      path !== undefined && path !== ''
+        ? `${normalizedCid}/${path}`
+        : normalizedCid;
     return crypto.createHash('sha256').update(raw).digest('hex');
   }
```
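Independent of the normalization question, the sha256 keying scheme itself can be exercised on its own with Node's stdlib (the CID input here is assumed to be pre-normalized; cidToV1Base32 is the project's helper and is not reproduced):

```typescript
import { createHash } from 'node:crypto';

// Mirrors the cacheKey shape discussed above: key = sha256 of
// "cid" or "cid/path", hex-encoded. Input CID is assumed pre-normalized.
function cacheKey(normalizedCid: string, path?: string): string {
  const raw =
    path !== undefined && path !== ''
      ? `${normalizedCid}/${path}`
      : normalizedCid;
  return createHash('sha256').update(raw).digest('hex');
}
```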
src/ipfs/kubo-data-source.ts-131-136 (1)
131-136: ⚠️ Potential issue | 🟡 Minor
parseInt without a Number.isFinite / NaN guard.
parseInt('0', 10) when the header is absent returns 0, but a malformed content-length header (e.g., "garbage") yields NaN, which then propagates as ipfs.content_length = NaN into the span and as size = NaN into IpfsService, where the comparison result.size > 0 is false (so the size limit is skipped) and the value is surfaced to the caller. Coerce with a fallback:

```diff
-      const contentLength = parseInt(
-        response.headers['content-length'] ?? '0',
-        10,
-      );
+      const parsed = parseInt(response.headers['content-length'] ?? '', 10);
+      const contentLength = Number.isFinite(parsed) && parsed >= 0 ? parsed : 0;
```
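The NaN propagation is easy to reproduce in isolation; the guard below mirrors the shape of the suggested fix:

```typescript
// parseInt returns NaN for a malformed header value...
const malformed = parseInt('garbage', 10);

// ...so a finite, non-negative coercion is needed before the value is
// used in comparisons like `size > 0`.
function safeContentLength(header: string | undefined): number {
  const parsed = parseInt(header ?? '', 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : 0;
}
```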
src/ipfs/kubo-data-source.ts-57-59 (1)
57-59: ⚠️ Potential issue | 🟡 Minor
Encode path segments when interpolating into the Kubo URL.
path is user-controlled (from the HTTP request) and may contain characters that change URL meaning — ?, #, spaces, %. Raw interpolation into the URL can produce a different upstream path than intended (query strings being cut off, fragments being dropped by the client). Encode each non-empty segment:

```diff
-    const ipfsPath =
-      path !== undefined && path !== '' ? `${cidString}/${path}` : cidString;
-    const url = `${this.kuboUrl}/ipfs/${ipfsPath}`;
+    const encodedPath =
+      path !== undefined && path !== ''
+        ? path.split('/').map(encodeURIComponent).join('/')
+        : undefined;
+    const ipfsPath = encodedPath ? `${cidString}/${encodedPath}` : cidString;
+    const url = `${this.kuboUrl}/ipfs/${ipfsPath}`;
```

(cidString itself is safe — it has already been normalized to base32 v1 by IpfsService.getContent.)
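The per-segment encoding proposed above behaves like this in isolation (helper name illustrative):

```typescript
// Encode each path segment so '?', '#', spaces, and '%' cannot change the
// meaning of the upstream URL, while '/' separators are preserved.
function encodeIpfsPath(path: string): string {
  return path.split('/').map(encodeURIComponent).join('/');
}
```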
🧹 Nitpick comments (10)
test-ipfs.sh (1)
42-43: Unused status=$? (shellcheck SC2034).
status is captured but never used; curl -s already swallows transport errors into an empty body, and pass/fail is driven by grep -q below. Drop it.

```diff
-  body=$(curl -s --max-time 30 "$url" 2>&1)
-  local status=$?
+  body=$(curl -s --max-time 30 "$url" 2>&1)
```
monitoring/grafana/dashboards/examples/ipfs-example.json (1)
14-22: Cache hit-rate panel can divide by zero at idle.
When there is no IPFS traffic in the window (hit rate + miss rate both 0), the expression yields NaN / "No data". Consider guarding so the panel shows 0 (or hides gracefully) at idle, e.g.:
📊 Proposed fix

```diff
-  "targets": [{ "expr": "sum(rate(ipfs_cache_hit_total[5m])) / (sum(rate(ipfs_cache_hit_total[5m])) + sum(rate(ipfs_cache_miss_total[5m])))", "legendFormat": "hit rate" }]
+  "targets": [{ "expr": "sum(rate(ipfs_cache_hit_total[5m])) / clamp_min(sum(rate(ipfs_cache_hit_total[5m])) + sum(rate(ipfs_cache_miss_total[5m])), 1)", "legendFormat": "hit rate" }]
```
src/routes/ipfs.ts (1)
35-70: Add TSDoc for the new exported route factories.
These are new exported integration points; please document mount order expectations and the middleware/path split.
As per coding guidelines, src/**/*.{ts,tsx}: "Always add or improve TSDoc comments on code you touch when changing behavior".
src/ipfs/ipfs-cache.ts (1)
19-24: Add class-level TSDoc for the new cache abstraction.
This is a new exported service used by system wiring; document keying, persistence, and eviction semantics at the class boundary.
As per coding guidelines, src/**/*.{ts,tsx}: "Always add or improve TSDoc comments on code you touch when changing behavior".
src/ipfs/kubo-data-source.ts (3)
155-173: Span may be ended twice on late stream error.
On success you register stream.on('end' | 'error') at L155-159 to end the span. If the returned stream later errors after getContent resolved, the 'error' handler records the exception and ends the span — good. But if an error occurs during response processing after attaching these listeners and control falls into the catch at L166 (it won't in the current code, but the arrangement is brittle), span.end() would run twice. Using a local ended flag or try/finally around the span would make this robust to future edits.
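The suggested ended-flag pattern can be sketched as follows (SpanLike is a stand-in shape, not the OpenTelemetry API):

```typescript
// Stand-in for the subset of the span API this sketch needs.
interface SpanLike {
  end(): void;
  recordException(e: unknown): void;
}

// Returns an idempotent closer: safe to call from 'end', 'error', and a
// catch/finally block without ending the span twice.
function makeSpanCloser(span: SpanLike): (error?: unknown) => void {
  let ended = false;
  return (error?: unknown) => {
    if (ended) return;
    ended = true;
    if (error !== undefined) span.recordException(error);
    span.end();
  };
}
```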
203-236: Minor: consolidate repeated error-subclass boilerplate. Five `Error` subclasses with identical shape. A small factory or a single base class would reduce the boilerplate and guarantee consistent `name` assignment / prototype chain handling under downlevel transpilation:

```ts
class IpfsError extends Error {
  constructor(message: string, name: string) {
    super(message);
    this.name = name;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
```

Not blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/kubo-data-source.ts` around lines 203 - 236, Replace the five nearly identical subclasses (IpfsNotFoundError, IpfsTimeoutError, IpfsUnavailableError, IpfsBlockedError, IpfsSizeLimitError) with a single base class (e.g., IpfsError) that accepts (message, name) and calls Object.setPrototypeOf(this, new.target.prototype) after super(message), then either export thin subclasses that extend IpfsError and pass the specific name or provide a small factory to create them; update usages to construct the specific named error via new IpfsNotFoundError(...) or via the factory so name/prototype are consistently set.
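A sketch of the full consolidation under the prompt's assumptions (error names are the ones listed in the review; module layout and exports in the real file may differ):

```typescript
// Shared base: sets name and repairs the prototype chain so instanceof keeps
// working under downlevel (ES5) transpilation.
class IpfsError extends Error {
  constructor(message: string, name: string) {
    super(message);
    this.name = name;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class IpfsNotFoundError extends IpfsError {
  constructor(message: string) {
    super(message, 'IpfsNotFoundError');
  }
}

class IpfsTimeoutError extends IpfsError {
  constructor(message: string) {
    super(message, 'IpfsTimeoutError');
  }
}

// ...IpfsUnavailableError, IpfsBlockedError, and IpfsSizeLimitError would
// follow the same two-line pattern.
```

Call sites keep constructing the specific class (`new IpfsNotFoundError(...)`) while `name` and the prototype chain are set in one place.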
100-103: Consider `maxRedirects: 0` for local Kubo gateway. For a trusted local service (kubo:8080), redirects are atypical. Setting `maxRedirects: 0` simplifies the configuration and tightens the SSRF surface in case `kuboUrl` is ever misconfigured. The current validateStatus already handles non-2xx responses appropriately, so this change is safe.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/kubo-data-source.ts` around lines 100 - 103, the current HTTP client config in KuboDataSource sets maxRedirects to 5 which is unnecessary for a trusted local kuboUrl and increases SSRF surface; change the axios config (where maxRedirects is set in the KuboDataSource constructor or createHttpClient function) to maxRedirects: 0, keeping the existing validateStatus logic intact so non-2xx responses are still handled by validateStatus.

src/ipfs/ipfs-service.ts (3)
93-96: Path traversal check is both too strict and too loose. `path.includes('..')` rejects legitimate filenames (e.g. `report..v2.pdf`, `..hidden` inside a directory listing) while still allowing variants like percent-encoded `%2E%2E` to reach Kubo unchanged. Prefer a segment-based check after splitting on `/`:

```diff
- if (path !== undefined && (path.includes('..') || path.startsWith('/'))) {
-   throw new IpfsNotFoundError('Invalid IPFS path');
+ if (path !== undefined) {
+   const segments = path.split('/');
+   if (
+     path.startsWith('/') ||
+     segments.some((s) => s === '..' || s === '.' || s === '')
+   ) {
+     throw new IpfsNotFoundError('Invalid IPFS path');
+   }
  }
```

Also consider normalizing percent-encoding before the check.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-service.ts` around lines 93 - 96, Replace the current naive path check with a normalized, segment-based check: first reject absolute paths (path.startsWith('/')) as before, then run a percent-decoding (e.g. decodeURIComponent) on path and split by '/' into segments and reject if any segment is exactly '..' or '.' (but allow names like '..hidden' or 'report..v2.pdf'); when rejecting throw the same IpfsNotFoundError. Update the code that references the variable path and IpfsNotFoundError so it decodes and inspects segments rather than using path.includes('..').
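The decode-then-split check can be sketched as a standalone validator. `assertSafeIpfsPath` is a hypothetical helper for illustration; the real check lives inline in `IpfsService.getContent` and throws `IpfsNotFoundError` rather than a plain `Error`:

```typescript
// Reject absolute paths and any '.'/'..'/empty segment, after percent-decoding
// so %2E%2E cannot smuggle a traversal past the check. Names that merely
// contain dots (report..v2.pdf, ..hidden) are allowed.
function assertSafeIpfsPath(path: string): void {
  let decoded: string;
  try {
    decoded = decodeURIComponent(path);
  } catch {
    // Malformed percent-encoding (e.g. a lone '%') is also rejected.
    throw new Error('Invalid IPFS path');
  }
  if (decoded.startsWith('/')) {
    throw new Error('Invalid IPFS path');
  }
  const segments = decoded.split('/');
  if (segments.some((s) => s === '..' || s === '.' || s === '')) {
    throw new Error('Invalid IPFS path');
  }
}
```

Note the empty-segment rule also rejects `a//b` and trailing slashes, which is usually the desired strictness for cache keys.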
144-158: Tee via `'data'` listener is fragile; prefer `PassThrough` + `pipeline`. Attaching the `'data'` listener inside `streamToCache` synchronously flips `result.stream` into flowing mode before the caller attaches `pipe(res)`. It is safe only because consumers pipe in the same tick; an `await` on the caller side between `getContent` returning and piping would drop chunks silently. Using `stream.pipeline(source, new PassThrough())` and handing the `PassThrough` to the caller (while also piping the source through a second `PassThrough` into the cache file) preserves paused mode until the caller is ready and gets you proper error propagation and backpressure for free.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-service.ts` around lines 144 - 158, The current implementation in streamToCache attaches a 'data' listener to result.stream which forces it into flowing mode and can drop chunks; instead, create two PassThrough streams and use stream.pipeline to pipe the original result.stream into both PassThroughs (one returned to the caller and one piped into the cache write stream) so the source stays in paused mode until the consumer pipes; ensure you replace the 'data' listener logic in streamToCache with pipeline(result.stream, teePassThrough1) and pipeline(result.stream, teePassThrough2 -> cacheWriteStream) (or use a single pipeline with a tee operator) and propagate errors to the span by listening to pipeline callbacks so span.recordException/ span.end() run on failures/completion for result.stream.
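A minimal sketch of the two-branch tee, under the assumption that the caller pipes the `consumer` branch to the HTTP response and the `cache` branch into the cache write stream (`teeStream` is an illustrative name, not the PR's code). Because each `PassThrough` buffers until it is read, an `await` between obtaining the streams and piping no longer drops chunks:

```typescript
import { PassThrough, Readable } from 'node:stream';

// Tee a source into a consumer branch (handed to the caller) and a cache
// branch. pipe() propagates backpressure from the slower branch back to the
// source, and neither branch emits data until someone reads it.
function teeStream(source: Readable): {
  consumer: PassThrough;
  cache: PassThrough;
} {
  const consumer = new PassThrough();
  const cache = new PassThrough();
  source.pipe(consumer);
  source.pipe(cache);
  // pipe() does not forward errors; propagate them to both branches.
  source.on('error', (err) => {
    consumer.destroy(err);
    cache.destroy(err);
  });
  return { consumer, cache };
}
```

One design note: with a shared source, the overall pace is set by the slower branch, so a slow cache disk will also slow the client; that trade-off is discussed further in the backpressure comment below on `src/routes/ipfs.ts`.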
153-158: Duplicate span termination between service and data source. `KuboDataSource.getContent` already registers `stream.on('end' | 'error')` to end its own child span (kubo-data-source.ts L155-159). `IpfsService.getContent` here does the same on the `IpfsService.getContent` span — fine — but note that if `streamToCache` later destroys the stream on a cache-write error, the `'error'` path here will record that exception on the service span even though it is a cache-side failure unrelated to the user-visible response. Consider gating the `recordException` on whether the error surfaced to the caller.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-service.ts` around lines 153 - 158, The service span in IpfsService.getContent is recording exceptions for stream errors that may originate from cache writes; to avoid logging cache-side failures as service errors, add a small error gating flag in IpfsService.getContent (e.g., let cacheWriteFailed = false) and set it to true if streamToCache(...) rejects or reports a cache-specific error; in the existing result.stream.on('error', ...) handler only call span.recordException(err) when cacheWriteFailed is false (i.e., the error actually surfaced to the caller). Adjust the callsite that invokes streamToCache to catch/reject and flip cacheWriteFailed so KuboDataSource.getContent and IpfsService.getContent no longer duplicate-record cache-originated exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/config.ts`:
- Around line 2159-2210: Several IPFS numeric constants
(IPFS_KUBO_REQUEST_TIMEOUT_MS, IPFS_STREAM_STALL_TIMEOUT_MS,
IPFS_CACHE_MAX_SIZE_BYTES, IPFS_CACHE_CLEANUP_THRESHOLD_SECONDS,
IPFS_RATE_LIMITER_IP_TOKENS_PER_BUCKET, IPFS_RATE_LIMITER_IP_REFILL_PER_SEC,
IPFS_RATE_LIMITER_RESOURCE_TOKENS_PER_BUCKET,
IPFS_RATE_LIMITER_RESOURCE_REFILL_PER_SEC, IPFS_MAX_RESPONSE_SIZE_BYTES)
currently use the unary + on env.varOrDefault which can produce NaN on malformed
input; replace those uses with the typed helper positiveIntOrDefault (the same
pattern used elsewhere in src/config.ts) to validate and throw on invalid values
at startup so timeouts, rate limits and sizes are enforced reliably. Ensure you
pass the same environment variable names and default string values into
positiveIntOrDefault for each constant and keep non-numeric keys (e.g.,
IPFS_CACHE_PATH) unchanged.
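The shape of such a helper, sketched under the assumption that it mirrors the `positiveIntOrDefault` pattern the prompt references — the actual signature in `src/config.ts` may differ, and the default value shown is illustrative:

```typescript
// Parse an env value as a strictly positive integer, throwing at startup on
// malformed input instead of silently producing NaN the way unary + does.
function positiveIntOrDefault(
  value: string | undefined,
  defaultValue: string,
): number {
  const raw = (value ?? defaultValue).trim();
  const parsed = Number.parseInt(raw, 10);
  // Round-tripping through String rejects trailing junk like '30s' or '1.5'.
  if (!Number.isInteger(parsed) || parsed <= 0 || String(parsed) !== raw) {
    throw new Error(`Expected a positive integer, got '${raw}'`);
  }
  return parsed;
}

// Hypothetical wiring (env var name from the PR, default illustrative):
// export const IPFS_KUBO_REQUEST_TIMEOUT_MS = positiveIntOrDefault(
//   process.env.IPFS_KUBO_REQUEST_TIMEOUT_MS,
//   '30000',
// );
```

The benefit over `+env.varOrDefault(...)` is that a typo like `IPFS_KUBO_REQUEST_TIMEOUT_MS=30s` fails loudly at startup rather than disabling the timeout via `NaN`.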
In `@src/ipfs/ipfs-cache.ts`:
- Around line 149-179: The put() and putFromFile() flows must clean up temp
files on pipeline failure and ensure cache finalization is atomic: on failure of
pipeline() delete the tempPath (use tempDir()/createTempPath() to locate it);
before calling index.set(key, ...) ensure any existing index entry is removed
(index.delete(key)) and its data/meta files unlinked to avoid eviction disposing
wrong files; perform metadata write to a temporary meta file (metaPath(key) +
'.tmp') and only after both meta temp write and index.set succeed atomically
rename temp data into dataPath(key) and rename the temp meta to metaPath(key);
if any step after rename fails, unlink the new dataPath and metaPath to avoid
orphans. Use createTempPath(), dataDir(), dataPath(), metaPath(), pipeline(),
put(), and putFromFile() to locate and implement these fixes.
In `@src/ipfs/ipfs-service.ts`:
- Around line 194-266: The issue is that stream.on('end') takes the !writeStream
branch and calls cleanup() without setting failed, allowing the pending
mkdir().then() to later create a dangling writeStream; fix by setting failed =
true before calling cleanup() in the stream.on('end') handler when writeStream
is not present (and ensure you return immediately after cleanup). Reference
symbols: stream.on('end'), failed, writeStream, cleanup(),
fs.promises.mkdir(...).then().
- Around line 230-239: The current data handler in the stream.on('data')
callback ignores write-side backpressure by not checking the boolean return of
writeStream.write and therefore allows unbounded buffering; update the logic in
the stream.on('data') handler (and related pendingChunks logic) to respect
backpressure: check the return value of writeStream.write(chunk) and if it
returns false, call stream.pause(), attach a writeStream.once('drain') to resume
the source, and flush any pendingChunks after resume; alternatively replace the
manual tee with a pipeline() or a PassThrough-driven pipeline to let Node handle
backpressure; if you prefer a bounded-memory fallback, enforce a high-water mark
on pendingChunks and set failed = true (or switch to no-cache) when that limit
is exceeded so you avoid unbounded memory growth.
- Around line 132-142: The size-limit check in ipfs-service.ts currently skips
enforcement when result.size is 0 (no Content-Length), letting large responses
bypass the cap; update the fetch/streaming path to enforce maxResponseSizeBytes
during streaming by wiring the existing bytesWritten counter in streamToCache
(or equivalent stream consumer) to abort the stream and throw IpfsSizeLimitError
as soon as bytesWritten exceeds this.maxResponseSizeBytes, and remove/adjust the
guard that only applies when result.size > 0 so the mid-stream enforcement
always runs; reference result.size, this.maxResponseSizeBytes,
IpfsSizeLimitError, and streamToCache/bytesWritten (and KuboDataSource behavior)
when making the change.
In `@src/routes/ipfs.ts`:
- Around line 172-187: The code currently treats unknown result.size as 1024
bytes when calling checkPaymentAndRateLimits, which underestimates large chunked
responses and can bypass byte-based limits; change the initial placeholder to a
conservative upper bound (e.g., a configured MAX_UNKNOWN_SIZE) when computing
contentSize for checkPaymentAndRateLimits (the call that passes id:
cidToV1Base32(cidString) and contentType: result.contentType), and implement
tracking of actual streamed bytes during the IPFS response stream so you can
call the final adjustment with the real byte count instead of the 1024 fallback;
update usages around contentSize, result.size, and the final adjustment block
(also where extractAllClientIPs(req) and rateLimiter/paymentProcessor are
passed) to use the conservative initial bound and then replace it with the
tracked streamedBytes for final billing/rate decisions.
---
Minor comments:
In `@docs/envs.md`:
- Line 376: The docs list the wrong environment variable name — update all
documentation entries that reference IPFS_CACHE_CLEANUP_THRESHOLD to the actual
exported name IPFS_CACHE_CLEANUP_THRESHOLD_SECONDS, keeping the same default
value (3600) and description ("Age in seconds before cached files become
eviction candidates"), and search other files under docs/ for any remaining
occurrences of IPFS_CACHE_CLEANUP_THRESHOLD to replace them so docs match the
implementation.
In `@docs/INDEX.md`:
- Around line 26-30: Add a new "IPFS Integration" section to docs/glossary.md
and create three glossary entries: "IPFS" — define as InterPlanetary File
System; "CID" — define as Content Identifier and note it is the IPFS
hash/address; and "Kubo" — define as the IPFS node implementation used as the
gateway sidecar; ensure each entry follows existing glossary formatting (term
header and short definition) and link or reference the ipfs-integration.md page
if your glossary style uses cross-links.
In `@docs/ipfs-integration.md`:
- Around line 504-511: The Prometheus label documentation is incorrect: update
the `ipfs_requests_total` row to list labels `route_type`,`status` instead of
`method`, and update the `ipfs_request_duration_seconds` row to list labels
`route_type`,`cache_status` instead of `source`; ensure the table entries for
`ipfs_cache_hit_total`, `ipfs_cache_miss_total`, `ipfs_content_size_bytes`, and
`ipfs_blocked_total` remain unchanged and remove the incorrect `method`/`source`
mentions so the documented labels match the actual metric definitions.
- Line 58: Update the three fenced code blocks flagged by markdownlint that
currently lack language identifiers: add language="text" for diagram/span-tree
blocks and add an appropriate language (e.g., json, js, sh, or html) for any
code examples so the fences are like ```text or ```json; locate the exact fences
by searching for triple-backtick blocks in the IPFS integration doc that are
used for diagrams/span trees and example snippets and add the matching language
tag to each opening fence.
- Around line 63-64: Update the docs to match the implemented host pattern by
replacing instances of the illustrative host "{CID}.ipfs.gateway.io" and
references to a distinguishing ".ipfs." label with the actual pattern
"{CID}.{root_host}" (and adjust explanatory text accordingly); ensure examples,
DNS and TLS guidance, and any sample configs that mention the ".ipfs." sublabel
are changed to use "{root_host}"-based hostnames and clarify how to configure
DNS/TLS for wildcard or per-CID certificates; apply the same change to the other
occurrences called out in the comment so all examples and instructions
consistently use "{CID}.{root_host}".
- Around line 113-120: Remove references to a file-based IPFS blocklist and any
mention of IPFS_BLOCKLIST_PATH in the docs and replace them with a note that
moderation uses the PUT /ar-io/admin/block-data flow; update the diagram text
near ipfs-service/ipfs-blocklist and any prose to reflect that blocklist is
managed via that admin API (not a local file). Also change the rate-limit
defaults in the documentation to match src/config.ts: per-IP 100000 requests per
20 seconds and per-resource 1000000 requests per 100 seconds. Apply these same
edits where the stale blocklist/config references and rate-limit numbers appear
elsewhere in the document (the other two affected sections noted in the
comment).
In `@docs/openapi.yaml`:
- Around line 2812-2818: Update the request schema for the block endpoint so the
"id" field description reflects that it accepts transaction IDs, data-item IDs,
IPFS CIDv1 base32 strings, or sha-256 content hashes (not just "TX ID for a
transaction"); update the "id" schema description text in docs/openapi.yaml (the
schema for the block request body / property named "id") and add an example
value for an IPFS CIDv1 base32 string (e.g.
bafkreigbk3hjz6oyiywqf7eknthwc2osvt5xi6b6igwljn2qrxkthqgrp4) so generated docs
show a CID example alongside existing TX/hash examples.
In `@src/ipfs/ipfs-cache.ts`:
- Around line 49-52: The cacheKey function currently hashes the raw cidString
(plus path) which treats semantically identical CIDs in different encodings as
distinct; update cacheKey to normalize the CID string first by parsing the input
CID (use the project's CID library e.g., CID.parse or new CID(...)), convert it
to a canonical form such as CID.toV1().toString('base32'), then append the path
(when path !== undefined && path !== '') and finally hash that normalized
string; ensure errors parsing invalid CIDs are handled or propagated
consistently from cacheKey.
In `@src/ipfs/kubo-data-source.test.ts`:
- Around line 33-65: These two tests currently abort the signal before calling
kuboDataSource.getContent so signal.throwIfAborted prevents URL construction;
either remove the tests or change them to assert URL construction by not
aborting the signal and instead stubbing/spy-ing the HTTP call (e.g.,
fetch/axios used inside kuboDataSource.getContent) to throw/return, or call an
exposed helper that builds the Kubo URL (e.g., buildKuboUrl or the internal
URL-construction logic) and assert its return; ensure you reference
kuboDataSource.getContent and avoid pre-aborting the signal so the URL is
actually created.
In `@src/ipfs/kubo-data-source.ts`:
- Around line 131-136: The parsed contentLength can be NaN for malformed
headers; update the parsing logic around response.headers to coerce to a safe
finite number (e.g., parseInt or Number.parseInt result) and fallback to 0 when
the parsed value is NaN or not finite so ipfs.content_length and downstream
result.size never become NaN. Locate the variables/content around contentLength
and response.headers['content-length'] in kubo-data-source.ts and replace the
direct parseInt usage with a guarded parse + Number.isFinite/Number.isNaN check
and a default of 0.
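A guarded parse along those lines (the helper name is illustrative; the real code may inline the check where the header is read):

```typescript
// Coerce a possibly-missing or malformed Content-Length header to a safe
// non-negative finite number, defaulting to 0 (meaning "unknown size").
function safeContentLength(header: string | string[] | undefined): number {
  const raw = Array.isArray(header) ? header[0] : header;
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : 0;
}
```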
- Around line 57-59: Raw interpolation of the user-controlled path into the Kubo
URL can break or change the upstream request; update the code that builds
ipfsPath (and thus the URL using this.kuboUrl) to URL-encode each non-empty path
segment instead of inserting path raw. Specifically, when composing ipfsPath
from cidString and path (used in ipfsPath and
`${this.kuboUrl}/ipfs/${ipfsPath}`), split path on '/', run encodeURIComponent
on each segment, rejoin with '/', and then join with cidString (which is already
safe per IpfsService.getContent) so query chars/fragments/spaces are preserved
as path segments.
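A sketch of the segment-wise encoding — `buildIpfsPath` is a hypothetical helper; the real code composes `ipfsPath` inline when building `${this.kuboUrl}/ipfs/${ipfsPath}`:

```typescript
// Encode each path segment so '?', '#', and spaces survive as path
// characters in the upstream Kubo URL instead of being reinterpreted as
// query strings or fragments.
function buildIpfsPath(cidString: string, path?: string): string {
  if (path === undefined || path === '') {
    return cidString;
  }
  const encoded = path
    .split('/')
    .map((segment) => encodeURIComponent(segment))
    .join('/');
  return `${cidString}/${encoded}`;
}
```

Splitting before encoding preserves the `/` separators themselves, while everything inside a segment is escaped.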
In `@src/middleware/ipfs.ts`:
- Around line 85-88: The redirect currently builds the external Location using
req.protocol which may be incorrect behind TLS termination; update the redirect
in the middleware where res.redirect is called (the line constructing
`${req.protocol}://${targetCid}.${rootHost}${pathSuffix}`) to use the configured
protocol instead: replace req.protocol with config.SANDBOX_PROTOCOL ??
req.protocol so the Location uses the intended public protocol while still
falling back to req.protocol; keep the rest of the URL components (targetCid,
rootHost, pathSuffix) and the 302 status unchanged.
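The fallback in isolation, as a pure helper for illustration (the real middleware builds the Location string inline):

```typescript
// Prefer the operator-configured public protocol (e.g. SANDBOX_PROTOCOL)
// over req.protocol, which reports 'http' behind TLS termination unless
// trust proxy / X-Forwarded-Proto are configured.
function redirectLocation(
  configuredProtocol: string | undefined,
  reqProtocol: string,
  targetCid: string,
  rootHost: string,
  pathSuffix: string,
): string {
  const protocol = configuredProtocol ?? reqProtocol;
  return `${protocol}://${targetCid}.${rootHost}${pathSuffix}`;
}
```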
In `@src/routes/ar-io-info-builder.ts`:
- Line 141: The OpenAPI schema for the /ar-io/info response needs to include the
new optional ipfs property so generated clients see ipfs: { enabled: true };
update the response model in the OpenAPI spec (the schema that maps to the
TypeScript builder which now includes ipfs?: IpfsInfo in
src/routes/ar-io-info-builder.ts) to add an ipfs property (object with
enabled:boolean) and ensure any components/schemas referenced are added or
updated accordingly; also update the docs/ pages that describe the /ar-io/info
contract (and the related places called out around lines 325-327) in the same PR
so documentation and generated clients remain in sync.
In `@src/routes/ipfs.ts`:
- Around line 216-221: The IPFS route currently increments only
metrics.ipfsRequestsTotal and content size but never updates the dedicated
counters; update the logic in the IPFS handler (the block using
metrics.ipfsRequestsTotal and cacheStatus) to increment
metrics.ipfsCacheHitTotal when result.cached is true, metrics.ipfsCacheMissTotal
when false, and metrics.ipfsBlockedTotal when result.blocked (or result.blocked
=== true) is set; ensure you make the same updates in the second analogous block
around the code that mirrors lines 259-268 so both request paths consistently
increment ipfsCacheHitTotal/ipfsCacheMissTotal and ipfsBlockedTotal in addition
to ipfsRequestsTotal and ipfsContentSizeHistogram.
- Around line 118-121: The redirect uses req.protocol which can be wrong behind
TLS termination; update the redirect construction in the CIDv0 handling to use
the configured public protocol by replacing req.protocol with
(config.SANDBOX_PROTOCOL ?? req.protocol) when building the URL for v1Base32,
rootHost and pathSuffix so the redirect uses the sandbox/public protocol
consistently.
In `@test-ipfs.sh`:
- Around line 55-71: The test_header function incorrectly treats an empty
expected_value as a match because echo "$actual" | grep -qi "" always succeeds;
update test_header to first check whether expected_value is empty and in that
case assert that actual is non-empty (i.e., the header exists), otherwise
perform the current case-insensitive substring match; refer to the test_header
function and use the existing local variables actual and expected_value to
implement the conditional branch so presence-only checks fail when the header is
absent.
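A sketch of the conditional branch (function and variable names follow the review; the real `test_header` also prints pass/fail output and fetches `actual` with curl):

```shell
check_header_value() {
  local actual="$1" expected_value="$2"
  if [ -z "$expected_value" ]; then
    # Presence-only check: fail when the header is absent/empty instead of
    # letting `grep -qi ""` match anything.
    [ -n "$actual" ]
  else
    echo "$actual" | grep -qi "$expected_value"
  fi
}
```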
- Around line 93-101: Replace the hardcoded container name used in the two
docker exec calls (currently "ar-io-node-kubo-1") by resolving the running
container ID via its Compose service label before assigning FILE_CID and
DIR_CID: run a docker ps lookup filtering on the Compose service label (e.g.
"label=com.docker.compose.service=kubo") and fall back to the Compose
project+service label if needed, store that container ID in a variable (use that
variable in the two docker exec invocations that populate FILE_CID and DIR_CID)
so the script works regardless of COMPOSE_PROJECT_NAME or working directory.
---
Nitpick comments:
In `@monitoring/grafana/dashboards/examples/ipfs-example.json`:
- Around line 14-22: The "IPFS Cache Hit Rate" stat panel expression can produce
NaN when both hit and miss rates are zero; update the Prometheus expression used
in the panel's targets (the existing expr string) to guard the denominator by
replacing the raw division with a safe variant that returns 0 at idle (for
example use a clamp_min on the denominator or append an "or vector(0)" fallback
so the query yields 0 instead of NaN when sum(rate(ipfs_cache_hit_total[5m])) +
sum(rate(ipfs_cache_miss_total[5m])) == 0). Ensure you edit the expr value in
the target for the "IPFS Cache Hit Rate" panel to the guarded expression.
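One guarded form of the expression, sketched (metric names from the panel above): the `> 0` comparison empties the denominator vector at idle, so the division returns no data and `or vector(0)` supplies 0 instead of NaN.

```text
(
  sum(rate(ipfs_cache_hit_total[5m]))
    /
  (sum(rate(ipfs_cache_hit_total[5m])) + sum(rate(ipfs_cache_miss_total[5m])) > 0)
) or vector(0)
```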
In `@src/ipfs/ipfs-cache.ts`:
- Around line 19-24: Add a class-level TSDoc block above the exported
IpfsFsCache class describing its purpose, keying, persistence model, and
eviction semantics: explain that IpfsFsCache stores CacheEntry objects on-disk
under baseDir with keys derived from the IPFS CID or provided keying scheme,
that it exposes an in-memory LRU index (index: LRUCache<string, CacheEntry>) to
track recency, and that eviction follows the LRU policy causing entries to be
removed from the index and optionally from disk; also document constructor
parameters and any persistence/initialization behavior so callers wiring this
service understand lifecycle and capacity limits.
In `@src/ipfs/ipfs-service.ts`:
- Around line 93-96: Replace the current naive path check with a normalized,
segment-based check: first reject absolute paths (path.startsWith('/')) as
before, then run a percent-decoding (e.g. decodeURIComponent) on path and split
by '/' into segments and reject if any segment is exactly '..' or '.' (but allow
names like '..hidden' or 'report..v2.pdf'); when rejecting throw the same
IpfsNotFoundError. Update the code that references the variable path and
IpfsNotFoundError so it decodes and inspects segments rather than using
path.includes('..').
- Around line 144-158: The current implementation in streamToCache attaches a
'data' listener to result.stream which forces it into flowing mode and can drop
chunks; instead, create two PassThrough streams and use stream.pipeline to pipe
the original result.stream into both PassThroughs (one returned to the caller
and one piped into the cache write stream) so the source stays in paused mode
until the consumer pipes; ensure you replace the 'data' listener logic in
streamToCache with pipeline(result.stream, teePassThrough1) and
pipeline(result.stream, teePassThrough2 -> cacheWriteStream) (or use a single
pipeline with a tee operator) and propagate errors to the span by listening to
pipeline callbacks so span.recordException/ span.end() run on
failures/completion for result.stream.
- Around line 153-158: The service span in IpfsService.getContent is recording
exceptions for stream errors that may originate from cache writes; to avoid
logging cache-side failures as service errors, add a small error gating flag in
IpfsService.getContent (e.g., let cacheWriteFailed = false) and set it to true
if streamToCache(...) rejects or reports a cache-specific error; in the existing
result.stream.on('error', ...) handler only call span.recordException(err) when
cacheWriteFailed is false (i.e., the error actually surfaced to the caller).
Adjust the callsite that invokes streamToCache to catch/reject and flip
cacheWriteFailed so KuboDataSource.getContent and IpfsService.getContent no
longer duplicate-record cache-originated exceptions.
In `@src/ipfs/kubo-data-source.ts`:
- Around line 155-173: The span may be ended twice by the stream 'end'/'error'
handlers and the catch block in getContent; add a local boolean flag (e.g.,
ended = false) and a small helper (e.g., endSpan(error?)) that checks the flag
before calling span.recordException(...) and span.end(), then use that helper
from the stream.on('end'), stream.on('error'), and the catch/finally cleanup
(also before clearTimeout and removing onClientAbort) so span.recordException/
span.end are only invoked once; reference span, stream.on('end'|'error'), the
catch block around getContent, connectionTimer, and onClientAbort when making
the change.
- Around line 203-236: Replace the five nearly identical subclasses
(IpfsNotFoundError, IpfsTimeoutError, IpfsUnavailableError, IpfsBlockedError,
IpfsSizeLimitError) with a single base class (e.g., IpfsError) that accepts
(message, name) and calls Object.setPrototypeOf(this, new.target.prototype)
after super(message), then either export thin subclasses that extend IpfsError
and pass the specific name or provide a small factory to create them; update
usages to construct the specific named error via new IpfsNotFoundError(...) or
via the factory so name/prototype are consistently set.
- Around line 100-103: The current HTTP client config in KuboDataSource sets
maxRedirects to 5 which is unnecessary for a trusted local kuboUrl and increases
SSRF surface; change the axios config (where maxRedirects is set in the
KuboDataSource constructor or createHttpClient function) to maxRedirects: 0,
keeping the existing validateStatus logic intact so non-2xx responses are still
handled by validateStatus.
In `@src/routes/ipfs.ts`:
- Around line 35-70: Add TSDoc comments to the exported route factory functions
createIpfsRouter and createIpfsHandler explaining their purpose, expected mount
order, and how middleware vs path handling should be wired: document that
createIpfsRouter returns an Express Router that mounts handlers for '/ipfs/:cid'
and '/ipfs/:cid/*' and should be mounted before any catch-all routes, and that
createIpfsHandler returns a single Handler meant to be used directly or attached
to route paths; include notes about interaction with rateLimiter and
paymentProcessor parameters and any ordering requirements for middleware (e.g.,
auth/rate limiting should run before the handler). Ensure the TSDoc appears
directly above each exported function signature with param and returns
descriptions.
In `@test-ipfs.sh`:
- Around line 42-43: Remove the unused capture of curl's exit status: the
assignment to local status after running curl (the variable named status) is
never referenced and triggers shellcheck SC2034; drop the line "local status=$?"
and leave the body assignment via body=$(curl -s --max-time 30 "$url" 2>&1)
(grep -q is used later to determine success).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 27655ba0-996a-4afc-8f9d-1c24d8e0007f
📒 Files selected for processing (26)
.dockerignore, CLAUDE.md, docker-compose.yaml, docs/INDEX.md, docs/envs.md, docs/ipfs-integration.md, docs/openapi.yaml, monitoring/grafana/dashboards/examples/ipfs-example.json, package.json, src/app.ts, src/config.ts, src/ipfs/ipfs-cache.ts, src/ipfs/ipfs-cid.test.ts, src/ipfs/ipfs-rate-limiter.ts, src/ipfs/ipfs-service.ts, src/ipfs/kubo-data-source.test.ts, src/ipfs/kubo-data-source.ts, src/lib/httpsig.ts, src/lib/ipfs-cid.ts, src/metrics.ts, src/middleware/ipfs.ts, src/routes/ar-io-info-builder.ts, src/routes/ar-io.ts, src/routes/ipfs.ts, src/system.ts, test-ipfs.sh
```ts
try {
  await fs.promises.mkdir(this.tempDir(), { recursive: true });
  const tempPath = this.createTempPath();
  const writeStream = fs.createWriteStream(tempPath);

  await pipeline(stream, writeStream);

  // Move to final location
  const dataDir = this.dataDir(key);
  await fs.promises.mkdir(dataDir, { recursive: true });
  await fs.promises.rename(tempPath, this.dataPath(key));

  // Write metadata
  const meta: CacheEntry = { size, contentType };
  await fs.promises.writeFile(
    this.metaPath(key),
    JSON.stringify(meta),
    'utf-8',
  );

  // Update index
  this.index.set(key, meta);

  this.log.debug('Cached IPFS content', { cidString, path, key, size });
} catch (error: any) {
  this.log.error('Failed to cache IPFS content', {
    cidString,
    path,
    message: error.message,
  });
}
```
🧩 Analysis chain

🏁 Script executed:

```sh
#!/bin/bash
# Inspect other LRU-backed filesystem caches for overwrite/dispose patterns to mirror.
rg -n -C4 "new LRUCache|dispose:|putFromFile|rename\(" --type=ts
```

Repository: ar-io/ar-io-node
Length of output: 22088

🏁 Script executed:

```sh
cat -n src/ipfs/ipfs-cache.ts | head -230
```

Repository: ar-io/ar-io-node
Length of output: 7773
Add cleanup for temp files and ensure atomic cache finalization.
Two related failures exist:

- In `put()` (lines 149–179): If `pipeline()` fails at line 154, the temp file is never cleaned up.
- In both `put()` and `putFromFile()`: After the `rename()` succeeds, if metadata write or index update fails, the data file is left orphaned on disk. Additionally, if a key already exists in the index, calling `index.set()` on it without first deleting the old entry can cause the dispose callback to unlink the wrong files if eviction occurs.
Prevent temp files from accumulating, ensure metadata and index updates complete before marking a cache entry live, and clean stale disk files when overwriting. The suggested changes track temp and final paths to enable selective cleanup on failure.
Proposed safer finalization flow:

```diff
+ let tempPath: string | undefined;
+ let finalPath: string | undefined;
  try {
    await fs.promises.mkdir(this.tempDir(), { recursive: true });
-   const tempPath = this.createTempPath();
+   tempPath = this.createTempPath();
    const writeStream = fs.createWriteStream(tempPath);
    await pipeline(stream, writeStream);
-   // Move to final location
    const dataDir = this.dataDir(key);
+   finalPath = this.dataPath(key);
    await fs.promises.mkdir(dataDir, { recursive: true });
-   await fs.promises.rename(tempPath, this.dataPath(key));
+   this.index.delete(key);
+   await fs.promises.unlink(finalPath).catch(() => {});
+   await fs.promises.unlink(this.metaPath(key)).catch(() => {});
+   await fs.promises.rename(tempPath, finalPath);
+   tempPath = undefined;
    ...
  } catch (error: any) {
+   if (tempPath !== undefined) {
+     fs.promises.unlink(tempPath).catch(() => {});
+   }
+   if (finalPath !== undefined) {
+     fs.promises.unlink(finalPath).catch(() => {});
+     fs.promises.unlink(this.metaPath(key)).catch(() => {});
+   }
    this.log.error('Failed to cache IPFS content', {
```

Also applies to: putFromFile() (lines 195–223)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ipfs/ipfs-cache.ts` around lines 149 - 179, The put() and putFromFile()
flows must clean up temp files on pipeline failure and ensure cache finalization
is atomic: on failure of pipeline() delete the tempPath (use
tempDir()/createTempPath() to locate it); before calling index.set(key, ...)
ensure any existing index entry is removed (index.delete(key)) and its data/meta
files unlinked to avoid eviction disposing wrong files; perform metadata write
to a temporary meta file (metaPath(key) + '.tmp') and only after both meta temp
write and index.set succeed atomically rename temp data into dataPath(key) and
rename the temp meta to metaPath(key); if any step after rename fails, unlink
the new dataPath and metaPath to avoid orphans. Use createTempPath(), dataDir(),
dataPath(), metaPath(), pipeline(), put(), and putFromFile() to locate and
implement these fixes.
stream.on('data', (chunk: Buffer) => {
  if (failed) return;
  bytesWritten += chunk.length;
  if (writeStream) {
    writeStream.write(chunk);
  } else {
    // Buffer until writeStream is ready (typically only first 1-2 chunks)
    pendingChunks.push(chunk);
  }
});
Missing write-side backpressure contradicts the "no memory buffering" guarantee.
writeStream.write(chunk) ignores its boolean return; if the disk is slower than the incoming network/Kubo stream (or than the HTTP client consuming the tee), chunks accumulate in the WriteStream internal buffer unboundedly for large files. The PR description explicitly claims streaming-to-disk without memory buffering, so this should pause the source on backpressure and resume on 'drain' — or use pipeline()/a PassThrough tee driven by pipeline.
Note: pausing the source for disk backpressure will also pause delivery to the HTTP client. If that trade-off is undesirable, a bounded high-water mark plus degrade-to-no-cache (set failed = true when the buffer grows past a limit) is a reasonable middle ground.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ipfs/ipfs-service.ts` around lines 230 - 239, The current data handler in
the stream.on('data') callback ignores write-side backpressure by not checking
the boolean return of writeStream.write and therefore allows unbounded
buffering; update the logic in the stream.on('data') handler (and related
pendingChunks logic) to respect backpressure: check the return value of
writeStream.write(chunk) and if it returns false, call stream.pause(), attach a
writeStream.once('drain') to resume the source, and flush any pendingChunks
after resume; alternatively replace the manual tee with a pipeline() or a
PassThrough-driven pipeline to let Node handle backpressure; if you prefer a
bounded-memory fallback, enforce a high-water mark on pendingChunks and set
failed = true (or switch to no-cache) when that limit is exceeded so you avoid
unbounded memory growth.
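The pause/drain handling the comment recommends can be sketched as a standalone tee. The names here (teeWithBackpressure, source, sink) are illustrative; the real code would integrate this with its failed/cleanup() state rather than a bare promise.

```typescript
import { Readable, Writable } from 'node:stream';

// Minimal sketch of a manual tee that honors write-side backpressure.
// `source` stands in for the Kubo response stream and `sink` for the
// cache WriteStream; neither name comes from the actual codebase.
export function teeWithBackpressure(
  source: Readable,
  sink: Writable,
): Promise<number> {
  return new Promise((resolve, reject) => {
    let bytes = 0;
    source.on('data', (chunk: Buffer) => {
      bytes += chunk.length;
      // write() returns false when the sink's internal buffer is full;
      // pause the source and resume once the sink drains, so the
      // WriteStream buffer stays bounded for arbitrarily large files.
      if (!sink.write(chunk)) {
        source.pause();
        sink.once('drain', () => source.resume());
      }
    });
    source.on('end', () => sink.end(() => resolve(bytes)));
    source.on('error', reject);
    sink.on('error', reject);
  });
}
```

Note the trade-off the comment raises still applies: pausing the source also pauses delivery to any other consumer of the same stream.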
// Check payment and rate limits (x402 + rate limiting in one call).
// Content size is needed for token calculation and payment pricing.
const contentSize = result.size > 0 ? result.size : 1024; // min 1KB for pricing
const limitCheck = await checkPaymentAndRateLimits({
  req,
  res,
  id: cidToV1Base32(cidString),
  contentSize,
  contentType: result.contentType,
  requestAttributes: {
    hops: 0,
    clientIps: extractAllClientIPs(req).clientIps,
  },
  rateLimiter,
  paymentProcessor,
});
Do not charge/rate-limit unknown-size responses as only 1KB.
When result.size is unknown/zero, the initial check and final adjustment both use 1024, so large chunked IPFS responses can bypass byte-based payment/rate limiting. Use a conservative bound for the initial check and track actual streamed bytes for final adjustment.
Also applies to: 234-239
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routes/ipfs.ts` around lines 172 - 187, The code currently treats unknown
result.size as 1024 bytes when calling checkPaymentAndRateLimits, which
underestimates large chunked responses and can bypass byte-based limits; change
the initial placeholder to a conservative upper bound (e.g., a configured
MAX_UNKNOWN_SIZE) when computing contentSize for checkPaymentAndRateLimits (the
call that passes id: cidToV1Base32(cidString) and contentType:
result.contentType), and implement tracking of actual streamed bytes during the
IPFS response stream so you can call the final adjustment with the real byte
count instead of the 1024 fallback; update usages around contentSize,
result.size, and the final adjustment block (also where extractAllClientIPs(req)
and rateLimiter/paymentProcessor are passed) to use the conservative initial
bound and then replace it with the tracked streamedBytes for final billing/rate
decisions.
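One way to implement the suggested byte tracking is a PassThrough counter between the Kubo body and the HTTP response. In this sketch, settle is a stand-in for adjustRateLimitTokens/final payment adjustment and is not the gateway's actual API.

```typescript
import { PassThrough, Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Sketch: count the bytes actually sent to the client, then settle
// billing/rate-limit tokens with the real total instead of the initial
// conservative estimate used when Content-Length is unknown.
export async function streamAndSettle(
  body: Readable,
  dest: NodeJS.WritableStream,
  settle: (actualBytes: number) => void,
): Promise<number> {
  let streamedBytes = 0;
  const counter = new PassThrough();
  counter.on('data', (chunk: Buffer) => {
    streamedBytes += chunk.length;
  });
  await pipeline(body, counter, dest);
  settle(streamedBytes); // final adjustment uses real bytes, not an estimate
  return streamedBytes;
}
```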
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
##           develop     #682      +/-   ##
===========================================
- Coverage    77.14%   77.13%   -0.02%
===========================================
  Files          116      118       +2
  Lines        39692    40124     +432
  Branches      2933     2947      +14
===========================================
+ Hits         30621    30950     +329
- Misses        9034     9137     +103
  Partials        37       37

☔ View full report in Codecov by Sentry.
- Use positiveIntOrDefault for all IPFS numeric configs (prevents NaN)
- URL-encode IPFS path segments (prevents request injection)
- Guard contentLength parse against NaN (fallback to 0)
- Use SANDBOX_PROTOCOL for redirects (correct behind TLS termination)
- Enforce size limit during streaming (catches chunked responses)
- Set failed=true before cleanup in stream end handler (race fix)
- Use conservative size estimate for x402 when Content-Length unknown
- Increment cache hit/miss counters in route handler
- Fix env var name: IPFS_CACHE_CLEANUP_THRESHOLD_SECONDS
- Guard Grafana cache hit rate expr against division by zero
- Destroy response stream on 404/408/504 from Kubo (socket leak)
IPFS cached content now triggers the same data-cached event as Arweave, feeding into the existing webhook system. No new configuration needed — operators' existing webhook consumers see IPFS content automatically.
Actionable comments posted: 5
♻️ Duplicate comments (4)
src/ipfs/ipfs-service.ts (2)
230-248: ⚠️ Potential issue | 🟠 Major

Keep streaming size-limit enforcement independent of cache-write state.
When cache setup/write fails, failed becomes true and the data handler returns before incrementing bytesWritten or enforcing maxResponseSizeBytes. That lets unknown-length oversized responses bypass the streaming cap whenever the cache path is unavailable. Track total bytes in a separate listener/path that always runs, and gate only cache writes on failed.

Suggested structure

- let bytesWritten = 0;
+ let bytesSeen = 0;
+ let bytesWritten = 0;
  ...
  stream.on('data', (chunk: Buffer) => {
-   if (failed) return;
-   bytesWritten += chunk.length;
+   bytesSeen += chunk.length;
    // Enforce size limit during streaming (catches chunked responses
    // that lack Content-Length)
    if (
      this.maxResponseSizeBytes > 0 &&
-     bytesWritten > this.maxResponseSizeBytes
+     bytesSeen > this.maxResponseSizeBytes
    ) {
      failed = true;
      stream.destroy(
      ...
      return;
    }
+
+   if (failed) return;
+   bytesWritten += chunk.length;

Also applies to: 280-283
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-service.ts` around lines 230 - 248, The data handler currently bails out early when `failed` is true, which prevents `bytesWritten` from being incremented and the `this.maxResponseSizeBytes` check from running; change the logic in the `stream.on('data', ...)` handler so that tracking `bytesWritten` and enforcing the size limit (throwing an `IpfsSizeLimitError` and calling `cleanup()`) always executes regardless of `failed`, while only the cache write path is gated by `failed` (i.e., update references to `bytesWritten`, `this.maxResponseSizeBytes`, `IpfsSizeLimitError`, `cleanup()`, and the cache write block so the size enforcement is independent of cache state).
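The separation this prompt asks for — size enforcement always runs, only the cache write is gated — can be sketched as a pure chunk handler. All names here (TeeState, handleChunk, writeToCache, onTooLarge) are illustrative, not the actual ipfs-service.ts code.

```typescript
// Sketch of a data handler where the size limit runs on every chunk,
// while only the cache write is gated on the cacheFailed flag.
interface TeeState {
  bytesSeen: number;   // always advances; drives the size limit
  bytesCached: number; // only advances while caching is healthy
  cacheFailed: boolean;
}

export function handleChunk(
  state: TeeState,
  chunk: Buffer,
  maxResponseSizeBytes: number,
  writeToCache: (c: Buffer) => void,
  onTooLarge: () => void,
): void {
  state.bytesSeen += chunk.length;
  // Enforce the streaming cap first, independent of cache health.
  if (maxResponseSizeBytes > 0 && state.bytesSeen > maxResponseSizeBytes) {
    state.cacheFailed = true;
    onTooLarge();
    return;
  }
  if (state.cacheFailed) return; // only the cache write is skipped
  state.bytesCached += chunk.length;
  writeToCache(chunk);
}
```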
217-255: ⚠️ Potential issue | 🟠 Major

Respect write-side backpressure when teeing to the cache file.
writeStream.write(chunk) is still unchecked and pendingChunks can grow without bound while the temp directory/write stream is not ready. Pause/resume on 'drain', or switch to a pipeline()/PassThrough tee with bounded buffering.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-service.ts` around lines 217 - 255, When teeing stream data into the cache, respect writable backpressure: in the stream.on('data') handler (and when flushing pendingChunks), check the boolean return of writeStream.write(chunk) and if it returns false, pause the source stream (the `stream` readable) and resume it on writeStream's 'drain' event; also enforce a bound on the `pendingChunks` buffer to avoid unbounded growth if the temp dir/writeStream isn't ready, or alternatively replace the manual tee logic with a bounded `PassThrough`/`pipeline` that handles backpressure automatically. Ensure this integrates with existing error/cleanup handling (see `failed`, `cleanup()`, `IpfsSizeLimitError`, and `maxResponseSizeBytes`) so pause/resume and pending buffer limits honor abort/failure states.

src/routes/ipfs.ts (1)
172-177: ⚠️ Potential issue | 🟠 Major

Use actual streamed bytes for the final unknown-size adjustment.
The initial check now uses a conservative max-size estimate, but the final adjustment still reuses that estimate when Content-Length is unknown. That can overcharge payments or over-consume rate-limit tokens for smaller chunked responses; count bytes during streaming and pass the real total to adjustRateLimitTokens.

Also applies to: 241-244
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/ipfs.ts` around lines 172 - 177, The current logic uses a conservative contentSize (result.size || config.IPFS_MAX_RESPONSE_SIZE_BYTES) for both the initial limit check and the final token adjustment, which overcharges for chunked responses; during streaming (the code that pipes/iterates the IPFS response) accumulate the actual number of bytes transmitted and, after the stream finishes, call adjustRateLimitTokens (and any equivalent call around lines 241-244) with that real byte count instead of contentSize so payments and rate-limits are adjusted using the true streamed size; update any references to contentSize used for final adjustments to use the accumulated streamedBytes variable.

src/ipfs/ipfs-cache.ts (1)
157-178: ⚠️ Potential issue | 🟠 Major

Make cache finalization cleanup-safe and atomic.
This flow still leaves temp files behind when pipeline() fails in put(), and both put()/putFromFile() can leave finalized data without valid metadata/index state if later steps fail. Delete stale entries before overwrite, write metadata via a temp file, and clean up both temp/final paths on any finalization failure.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ipfs/ipfs-cache.ts` around lines 157 - 178, The put()/putFromFile() finalization is not atomic and can leave temp files or finalized data without metadata/index; update the flow in methods like tempDir(), createTempPath(), pipeline(...), dataDir(), dataPath(key), metaPath(key) so you delete any stale final paths before overwrite, write the metadata to a temp meta file and atomically rename it to metaPath(key), and ensure any failure after creating tempPath or moving data cleans up both tempPath and the partially-written dataPath; wrap the sequence (write temp data, rename to dataPath, write temp meta, rename to metaPath, then index.set(key, meta)) in try/catch/finally and remove temp/final artifacts on errors to keep state consistent and atomic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ipfs/ipfs-cache.ts`:
- Around line 43-46: When building the LRUCache in the constructor (this.index =
new LRUCache<...>), the sizeCalculation callback can return 0 for empty IPFS
objects which breaks lru-cache v11; update the sizeCalculation used in the new
LRUCache initialization to return Math.max(1, entry.size) (or equivalent) so the
cache always receives a positive weight while still keeping the real entry.size
stored on the CacheEntry metadata, leaving dispose and other logic unchanged.
In `@src/ipfs/ipfs-service.ts`:
- Around line 86-95: The current blocklist check only calls
blockListValidator.isIdBlocked(normalizedCid) and can miss alternative CID forms
or a resolved leaf CID for directory paths; update the logic around
normalizedCid and path to (1) check blockListValidator.isIdBlocked for both the
raw submitted CID (e.g., rawCid or the incoming id) and the normalizedCid alias,
and (2) when path is provided and the request will serve a file from a
directory, resolve the content via Kubo to obtain the leaf/resolved CID and call
blockListValidator.isIdBlocked on that leaf CID before serving; keep the
metrics.ipfsBlockedTotal.inc(), span.setAttribute('ipfs.blocked', true) and
throw IpfsBlockedError when any check returns true, and retain the existing path
traversal validation for '..' and leading '/'.
- Around line 144-151: The call to this.streamToCache(normalizedCid, path,
result.stream, result.contentType) on the IPFS response is done too early and
attaches a 'data' listener that forces the Kubo stream into flowing mode,
causing chunks to be consumed before payment/rate-limit and access checks
complete; move the streamToCache invocation out of the initial response path and
call it only after the route handler's payment/rateLimit/access checks succeed
(i.e., after the checks in the ipfs route handler) so caching/teeing only starts
for authorized requests, or alternatively modify streamToCache to tee into a
bounded PassThrough that keeps the original stream paused until the handler
explicitly starts piping, ensuring the Kubo stream isn't switched to flowing
mode prematurely; update references to streamToCache and any uses of
result.stream accordingly.
In `@src/ipfs/kubo-data-source.ts`:
- Around line 102-111: The axios.get call in KuboDataSource is leaving 5xx
response streams open because validateStatus currently rejects 5xx and axios
throws before your status handling; update the request to use validateStatus: ()
=> true so all responses (including 5xx) are returned for explicit handling in
the code path that processes response.status, and also add a defensive cleanup
in the catch block to call .destroy() (or .close() where applicable) on
error.response?.data if it is a stream before re-throwing; locate the axios.get
invocation (the call that passes responseType: 'stream', signal:
controller.signal, headers and maxRedirects) and the surrounding try/catch in
KuboDataSource to apply both changes.
In `@src/routes/ipfs.ts`:
- Around line 218-225: The route is double-counting cache metrics: remove the
redundant increments in the route handler (delete the calls to
metrics.ipfsCacheHitTotal.inc() and metrics.ipfsCacheMissTotal.inc() in the
block that sets cacheStatus and updates metrics) and rely on
IpfsService.getContent() to emit ipfsCacheHitTotal/ipfsCacheMissTotal; keep the
ipfsRequestsTotal.inc(...) and the cacheStatus variable logic intact so
route_type and success metrics remain reported.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e13ad19b-4fd0-417c-9af1-2f61c76e84c2
📒 Files selected for processing (10)
- docker-compose.yaml
- docs/envs.md
- monitoring/grafana/dashboards/examples/ipfs-example.json
- src/config.ts
- src/ipfs/ipfs-cache.ts
- src/ipfs/ipfs-service.ts
- src/ipfs/kubo-data-source.ts
- src/middleware/ipfs.ts
- src/routes/ipfs.ts
- src/system.ts
✅ Files skipped from review due to trivial changes (3)
- docs/envs.md
- src/config.ts
- monitoring/grafana/dashboards/examples/ipfs-example.json
🚧 Files skipped from review as they are similar to previous changes (2)
- src/middleware/ipfs.ts
- docker-compose.yaml
| this.index = new LRUCache<string, CacheEntry>({ | ||
| maxSize: maxSizeBytes, | ||
| sizeCalculation: (entry) => entry.size, | ||
| dispose: (_entry, key) => { |
Guard zero-byte entries before inserting into LRUCache.
Empty IPFS objects can produce entry.size === 0. lru-cache v11 requires sizeCalculation to return a positive integer when maxSize is configured; passing 0 will throw TypeError: "size must be positive integer" at runtime. Use a minimum weight of 1 while preserving the real size in metadata.
Suggested adjustment
maxSize: maxSizeBytes,
- sizeCalculation: (entry) => entry.size,
+ sizeCalculation: (entry) => Math.max(1, entry.size),

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
this.index = new LRUCache<string, CacheEntry>({
  maxSize: maxSizeBytes,
  sizeCalculation: (entry) => Math.max(1, entry.size),
  dispose: (_entry, key) => {
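The suggested guard can be isolated into a one-line sizeCalculation helper. CacheEntry here is a simplified stand-in for the real metadata type, so the behavior is easy to verify on its own.

```typescript
// Simplified stand-in for the cache's entry metadata.
interface CacheEntry {
  size: number;
}

// lru-cache v11 throws TypeError ("size must be positive integer") when a
// sizeCalculation result is <= 0 and maxSize is configured, so empty IPFS
// objects (size 0) must be given a minimum weight of 1. The true size
// stays on the entry itself for accounting.
export const sizeCalculation = (entry: CacheEntry): number =>
  Math.max(1, Math.floor(entry.size));
```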
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ipfs/ipfs-cache.ts` around lines 43 - 46, When building the LRUCache in
the constructor (this.index = new LRUCache<...>), the sizeCalculation callback
can return 0 for empty IPFS objects which breaks lru-cache v11; update the
sizeCalculation used in the new LRUCache initialization to return Math.max(1,
entry.size) (or equivalent) so the cache always receives a positive weight while
still keeping the real entry.size stored on the CacheEntry metadata, leaving
dispose and other logic unchanged.
// Check blocklist (uses the same admin API as Arweave data moderation)
if (await this.blockListValidator.isIdBlocked(normalizedCid)) {
  metrics.ipfsBlockedTotal.inc();
  span.setAttribute('ipfs.blocked', true);
  throw new IpfsBlockedError(`CID is blocked: ${normalizedCid}`);
}

// Reject path traversal attempts
if (path !== undefined && (path.includes('..') || path.startsWith('/'))) {
  throw new IpfsNotFoundError('Invalid IPFS path');
Check moderation against all CID forms that can serve the content.
Only normalizedCid is checked. If operators block a CIDv0 string as submitted to the existing admin API, or block a child file CID that is served through a directory CID/path, this root-only normalized check can miss it. Check both raw and normalized CID aliases, and consider validating the resolved leaf CID from Kubo response metadata before serving path-based directory content.
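A sketch of the broader moderation check, with injected stand-ins for the validator and Kubo resolution — isIdBlocked and resolveLeafCid here mirror the described behavior but are not the actual gateway APIs.

```typescript
// Check every CID form that can serve the content: the raw submitted
// form, the normalized alias, and (for directory paths) the resolved
// leaf file's own CID.
export async function assertNotBlocked(
  rawCid: string,
  normalizedCid: string,
  path: string | undefined,
  isIdBlocked: (id: string) => Promise<boolean>,
  resolveLeafCid: (cid: string, path: string) => Promise<string>,
): Promise<void> {
  const candidates = new Set([rawCid, normalizedCid]);
  if (path !== undefined && path.length > 0) {
    // An operator may have blocked the child file's CID directly.
    candidates.add(await resolveLeafCid(normalizedCid, path));
  }
  for (const id of candidates) {
    if (await isIdBlocked(id)) {
      throw new Error(`CID is blocked: ${id}`);
    }
  }
}
```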
// Stream directly to the client while writing to a temp file on disk
// for caching. No memory buffering — handles files of any size.
this.streamToCache(
  normalizedCid,
  path,
  result.stream,
  result.contentType,
);
Move streamToCache() call to after payment/rate-limit checks to prevent data loss.
streamToCache() attaches a data listener (line 230 in ipfs-service.ts) which immediately switches the Kubo stream from paused mode to flowing mode. This occurs before the route handler completes payment/rate-limit checks (ipfs.ts:177-189). Chunks emitted during the check window are consumed by the cache listener and will not reach the pipe consumer at ipfs.ts:231, causing truncated responses.
Additionally, move the call after access checks to avoid pre-authorization Kubo/disk work on blocked or rate-limited requests. Either defer cache setup until after checks, or wrap the tee in a bounded PassThrough that defers flowing mode.
Also applies to: 230-256
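One shape for the "keep the source paused until checks pass" option: a tee whose listeners are only attached when start() is called, so nothing switches the Kubo stream into flowing mode before authorization. This sketch omits backpressure handling for brevity and uses illustrative names (deferredTee, client, cache).

```typescript
import { PassThrough, Readable } from 'node:stream';

// Nothing is attached to `source` at construction time, so it stays in
// paused mode and no chunks can be consumed before payment/rate-limit
// checks complete. Calling start() begins the tee.
export function deferredTee(source: Readable): {
  client: PassThrough;
  cache: PassThrough;
  start: () => void;
} {
  const client = new PassThrough();
  const cache = new PassThrough();
  return {
    client,
    cache,
    start: () => {
      source.on('data', (chunk: Buffer) => {
        client.write(chunk);
        cache.write(chunk);
      });
      source.on('end', () => {
        client.end();
        cache.end();
      });
      source.on('error', (err) => {
        client.destroy(err);
        cache.destroy(err);
      });
    },
  };
}
```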
const response = await axios.get(url, {
  responseType: 'stream',
  signal: controller.signal,
  headers: {
    'Accept-Encoding': 'identity',
  },
  maxRedirects: 5,
  // Accept non-2xx so we can handle 404/408/504 ourselves
  validateStatus: (status) => status < 500 || status === 504,
});
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Axios responseType stream validateStatus rejected response includes response.data stream destroy
💡 Result:
When using Axios with `responseType: 'stream'` in Node.js, if the server returns a non-2xx status code (causing rejection unless `validateStatus` is customized), the rejected promise's `error.response.data` is a Node.js stream containing the full response body. By default, Axios rejects promises for status codes outside the 2xx range via `validateStatus`. For stream responses, the stream in `error.response.data` is not automatically consumed or buffered into a string or object; you must read it manually (e.g., collect chunks and parse if JSON). Example from real-world usage (Axios issue #6468):

```js
try {
  const response = await axios.post(url, data, { responseType: 'stream' });
} catch (error) {
  if (
    error.config.responseType === 'stream' &&
    error.response?.data &&
    typeof error.response.data.read === 'function'
  ) {
    const chunks = [];
    for await (const chunk of error.response.data) {
      chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
    }
    error.response.data = JSON.parse(Buffer.concat(chunks).toString('utf8'));
  }
  throw error;
}
```

Regarding "stream destroy": the stream won't auto-destroy on rejection or scope exit. If unconsumed, it may leak resources or hang. Always consume it (read to end) or explicitly call `error.response.data.destroy()` in catch blocks after handling. Recent fixes (v1.13.1, Oct 2025) ensure streams aren't interrupted prematurely for non-OK statuses, preserving access. To avoid rejection for expected error streams, use `validateStatus: (status) => true` and check `response.status` manually.
Citations:
- 1: Different responseType for error vs response axios/axios#6468
- 2: https://github.com/axios/axios/blob/main/CHANGELOG.md
- 3: fix(http): fixed a regression that caused the data stream to be interrupted for responses with non-OK HTTP statuses axios/axios#7193
🏁 Script executed (repository: ar-io/ar-io-node):
`wc -l src/ipfs/kubo-data-source.ts`
Length of output: 92
🏁 Script executed:
`cat -n src/ipfs/kubo-data-source.ts | sed -n '95,125p'`
Length of output: 1232
🏁 Script executed:
`cat -n src/ipfs/kubo-data-source.ts | sed -n '170,215p'`
Length of output: 1503
🏁 Script executed:
`rg "axios\.(get|post|request)" src/ipfs/kubo-data-source.ts -n`
Length of output: 109
🏁 Script executed:
`cat -n src/ipfs/kubo-data-source.ts | sed -n '1,100p'`
Length of output: 3347
🏁 Script executed:
`cat -n src/ipfs/kubo-data-source.ts | grep -A 30 "async.*{" | head -80`
Length of output: 1102
🏁 Script executed:
`rg "validateStatus|catch" src/ipfs/kubo-data-source.ts -B 3 -A 5`
Length of output: 600
Destroy rejected 5xx response streams from Axios.
With the current validateStatus, Kubo 500/502/503 responses are rejected by Axios before the explicit status handling. The stream in error.response?.data is never destroyed and remains open in the catch block, causing a resource leak.
Handle this by changing to validateStatus: () => true and managing all statuses explicitly, or destroy error.response.data before re-throwing in the catch block:
Suggested fix
- // Accept non-2xx so we can handle 404/408/504 ourselves
- validateStatus: (status) => status < 500 || status === 504,
+ // Accept all statuses so streamed error bodies can be destroyed here.
+ validateStatus: () => true,
});

Then extend the existing status handling:
if (response.status === 408 || response.status === 504) {
(response.data as Readable).destroy();
throw new IpfsTimeoutError(
`Kubo timed out resolving: /ipfs/${ipfsPath}`,
);
}
+
+ if (response.status >= 500) {
+ (response.data as Readable).destroy();
+ throw new IpfsUnavailableError(
+ `Kubo returned ${response.status} for /ipfs/${ipfsPath}`,
+ );
+ }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ipfs/kubo-data-source.ts` around lines 102 - 111, The axios.get call in
KuboDataSource is leaving 5xx response streams open because validateStatus
currently rejects 5xx and axios throws before your status handling; update the
request to use validateStatus: () => true so all responses (including 5xx) are
returned for explicit handling in the code path that processes response.status,
and also add a defensive cleanup in the catch block to call .destroy() (or
.close() where applicable) on error.response?.data if it is a stream before
re-throwing; locate the axios.get invocation (the call that passes responseType:
'stream', signal: controller.signal, headers and maxRedirects) and the
surrounding try/catch in KuboDataSource to apply both changes.
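The defensive catch-block cleanup suggested above can be illustrated without axios or a network call; here a plain Node `Readable` stands in for `error.response.data`, and `AxiosLikeError` is a local stand-in type, not axios's real error class:

```typescript
import { Readable } from 'node:stream';

interface AxiosLikeError {
  response?: { status: number; data?: unknown };
}

// Destroy a streamed error body so it cannot leak sockets or file handles.
function destroyErrorStream(error: AxiosLikeError): void {
  const body = error.response?.data;
  if (body instanceof Readable && !body.destroyed) {
    body.destroy();
  }
}

const fakeError: AxiosLikeError = {
  response: { status: 502, data: Readable.from([Buffer.from('bad gateway')]) },
};
destroyErrorStream(fakeError);
console.log((fakeError.response!.data as Readable).destroyed); // true
```

In the real catch block this would run on `error.response?.data` before re-throwing, so 5xx bodies rejected by `validateStatus` are closed instead of lingering.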
```ts
// Track metrics
const cacheStatus = result.cached ? 'hit' : 'miss';
metrics.ipfsRequestsTotal.inc({ route_type: routeType, status: 'success' });
if (result.cached) {
  metrics.ipfsCacheHitTotal.inc();
} else {
  metrics.ipfsCacheMissTotal.inc();
}
```
Avoid double-counting IPFS cache hit/miss metrics.
IpfsService.getContent() already increments ipfsCacheHitTotal/ipfsCacheMissTotal on lines 98-117 of src/ipfs/ipfs-service.ts; incrementing them again here reports every request twice. Keep the counters in one layer only.
Suggested route-side removal
const cacheStatus = result.cached ? 'hit' : 'miss';
metrics.ipfsRequestsTotal.inc({ route_type: routeType, status: 'success' });
- if (result.cached) {
- metrics.ipfsCacheHitTotal.inc();
- } else {
- metrics.ipfsCacheMissTotal.inc();
- }
if (result.size > 0) {
metrics.ipfsContentSizeHistogram.observe(result.size);
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routes/ipfs.ts` around lines 218 - 225, The route is double-counting
cache metrics: remove the redundant increments in the route handler (delete the
calls to metrics.ipfsCacheHitTotal.inc() and metrics.ipfsCacheMissTotal.inc() in
the block that sets cacheStatus and updates metrics) and rely on
IpfsService.getContent() to emit ipfsCacheHitTotal/ipfsCacheMissTotal; keep the
ipfsRequestsTotal.inc(...) and the cacheStatus variable logic intact so
route_type and success metrics remain reported.
…IPFS cache (PE-9067)
- IPFS blocked response now matches Arweave format (plain text with ID)
- Emit data-cached webhook event when IPFS content is cached (same event
as Arweave, feeds into content scanner pipeline automatically)
♻️ Duplicate comments (2)
src/routes/ipfs.ts (2)
221-225: ⚠️ Potential issue | 🟡 Minor
Duplicate cache metrics increment (still unresolved).
`IpfsService.getContent()` already emits `ipfsCacheHitTotal`/`ipfsCacheMissTotal`, so these route-level increments double-count every request. Keep them in the service layer only.
♻️ Proposed removal
  const cacheStatus = result.cached ? 'hit' : 'miss';
  metrics.ipfsRequestsTotal.inc({ route_type: routeType, status: 'success' });
- if (result.cached) {
-   metrics.ipfsCacheHitTotal.inc();
- } else {
-   metrics.ipfsCacheMissTotal.inc();
- }
  if (result.size > 0) {
    metrics.ipfsContentSizeHistogram.observe(result.size);
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/ipfs.ts` around lines 221 - 225, Remove the duplicate cache metric increments from the route handler: the block that checks result.cached and calls metrics.ipfsCacheHitTotal.inc() / metrics.ipfsCacheMissTotal.inc() should be deleted because IpfsService.getContent() already emits those metrics; keep metrics emission only inside IpfsService.getContent() and rely on its instrumentation (refer to result.cached and metrics.ipfsCacheHitTotal / metrics.ipfsCacheMissTotal in the route to locate the code to remove).
231-251: ⚠️ Potential issue | 🟠 Major
Final rate-limit adjustment still uses the max-size fallback instead of actual streamed bytes.
The initial check now correctly uses `IPFS_MAX_RESPONSE_SIZE_BYTES` as a conservative upper bound (good), but the `res.on('finish')` adjustment at line 243 reuses the same `contentSize` fallback when `result.size` is unknown. For chunked/streaming responses the adjustment should reconcile against the bytes actually written to the socket, otherwise clients are charged/throttled at the maximum for every unknown-length response (e.g., directory listings, small files over chunked transport).
Track streamed bytes via a pass-through or by counting bytes as they flow into `res`, then pass that count to `adjustRateLimitTokens`.
♻️ Sketch of the fix
+ let streamedBytes = 0;
+ result.stream.on('data', (chunk: Buffer) => {
+   streamedBytes += chunk.length;
+ });
  result.stream.pipe(res);
  res.on('finish', () => {
    const durationSec = (Date.now() - startTime) / 1000;
    metrics.ipfsRequestDurationHistogram.observe(
      { route_type: routeType, cache_status: cacheStatus },
      durationSec,
    );
    adjustRateLimitTokens({
      req,
-     responseSize: result.size > 0 ? result.size : contentSize,
+     responseSize: result.size > 0 ? result.size : streamedBytes,
      initialResult: limitCheck,
      rateLimiter,
    }).catch((error) => {

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/ipfs.ts` around lines 231 - 251, The final rate-limit adjustment is still using the conservative contentSize fallback instead of the actual number of bytes streamed; change the piping to count bytes as they are sent and pass that count into adjustRateLimitTokens. Specifically, when handling result.stream.pipe(res) (in the res.on('finish') block), create a streamedBytes counter and increment it for each chunk (either by wrapping result.stream with a PassThrough and counting chunk.length on its 'data' events or by listening to result.stream's 'data' events), then call adjustRateLimitTokens with responseSize: result.size > 0 ? result.size : streamedBytes; ensure you attach error handlers/cleanup for the stream and preserve the existing initialResult and rateLimiter parameters when calling adjustRateLimitTokens.
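The byte-counting idea from this comment can be exercised in isolation with a `PassThrough` that tallies chunk lengths. Here `res` is stood in for by a discarding `PassThrough`, and `adjustRateLimitTokens` (the PR's helper) is only referenced in a comment, not reimplemented:

```typescript
import { PassThrough, Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

async function streamWithByteCount(
  source: Readable,
  res: NodeJS.WritableStream,
): Promise<number> {
  let streamedBytes = 0;
  const counter = new PassThrough();
  counter.on('data', (chunk: Buffer) => {
    streamedBytes += chunk.length;
  });
  await pipeline(source, counter, res);
  // Caller would pass this as responseSize when result.size is unknown,
  // e.g. to adjustRateLimitTokens in the route's finish handler.
  return streamedBytes;
}

(async () => {
  const sink = new PassThrough();
  sink.resume(); // discard output, like a client draining the socket
  const n = await streamWithByteCount(
    Readable.from([Buffer.from('hello '), Buffer.from('world')]),
    sink,
  );
  console.log(n); // 11
})();
```

Using `pipeline` rather than bare `.pipe()` also gives the error propagation and cleanup the AI prompt asks for.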
🧹 Nitpick comments (2)
src/routes/ipfs.ts (2)
35-45: Add TSDoc to the exported/public functions.
The four top-level functions (`createIpfsRouter`, `createIpfsHandler`, `createIpfsPathHandler`, `handleIpfsRequest`) have no TSDoc. Since this is a newly introduced public surface, brief docstrings describing the routing contract (what the returned router mounts, which request fields `createIpfsHandler` expects from middleware, the `routeType` semantics) would make consumer wiring from `src/app.ts` and `src/middleware/ipfs.ts` clearer.
As per coding guidelines: "Add or improve TSDoc comments on code you touch".
Also applies to: 60-70, 88-98, 139-159
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/ipfs.ts` around lines 35 - 45, Add TSDoc comments to the four exported functions createIpfsRouter, createIpfsHandler, createIpfsPathHandler, and handleIpfsRequest: for each function document its purpose, parameters (including types like log, ipfsService, rateLimiter, paymentProcessor and expected middleware-populated request fields), what routes or path patterns the returned Router/handler mounts or handles, the meaning/semantics of routeType and any other enum/flag used, and the return value (Router or RequestHandler) and possible thrown errors; keep entries brief (one-line summary + param/@returns/@throws tags) so consumers of src/app.ts and src/middleware/ipfs.ts can wire these functions correctly.
71-86: Avoid `(req as any)`: extend the Express `Request` type.
`createIpfsHandler` reads `ipfsCid`/`ipfsPath` populated by the subdomain middleware via untyped casts. A small declaration-merging module gives both sides a typed contract and prevents silent drift if the middleware renames these fields.
♻️ Example declaration

```ts
// src/types/express.d.ts (or similar)
declare global {
  namespace Express {
    interface Request {
      ipfsCid?: string;
      ipfsPath?: string;
    }
  }
}
export {};
```

Then:

- const cidString = (req as any).ipfsCid as string;
- const path = (req as any).ipfsPath as string | undefined;
+ const cidString = req.ipfsCid!;
+ const path = req.ipfsPath;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routes/ipfs.ts` around lines 71 - 86, The code casts req to any to access ipfsCid/ipfsPath in createIpfsHandler/asyncHandler before calling handleIpfsRequest; add an Express declaration-merging file (e.g., src/types/express.d.ts) that extends Express.Request with optional ipfsCid?: string and ipfsPath?: string, export an empty object to make it a module, then remove the (req as any) casts in createIpfsHandler and reference req.ipfsCid and req.ipfsPath directly so the types are enforced and the middleware/handler contract is explicit.
Summary
Add opt-in IPFS content serving to AR.IO Gateway via a Kubo Docker sidecar. Operators enable with `IPFS_ENABLED=true` and `docker compose --profile ipfs up -d`.
- Path-based access: `GET /ipfs/{CID}` and `GET /ipfs/{CID}/{path}`
- Subdomain access: `{CID}.{gateway-host}/` (same wildcard cert as ArNS)
- CID blocking via the existing `PUT /ar-io/admin/block-data` API
- CIDv0 to CIDv1 base32 redirect (works with wildcard `*.{host}` certs)
- `/ar-io/info` advertises IPFS capability for network discovery

Default off (`IPFS_ENABLED=false`). Zero runtime impact when disabled. No new ports required: Kubo fetches content outbound.
Foundation for Phase 2: ArNS names resolving to IPFS CIDs (~10 line middleware change).
Operator Quick Start
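The quick-start commands themselves did not survive extraction here; based on the flags named in the summary, an operator bring-up might look like the following sketch (the `.env` layout and gateway port are assumptions, not taken from this PR):

```shell
# Enable the opt-in IPFS feature (flag name from the PR summary)
echo "IPFS_ENABLED=true" >> .env

# Start the gateway with the Kubo sidecar profile
docker compose --profile ipfs up -d

# Verify: /ar-io/info should now advertise IPFS capability
# (port 3000 is an assumed default; adjust for your deployment)
curl -s http://localhost:3000/ar-io/info
```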
Architecture
Full architecture doc: `docs/ipfs-integration.md`

Test Plan
- `GET /ipfs/{CIDv1}` serves content (tested: BAYC NFTs, Vitalik blog, 10MB files)
- `{CIDv1}.{host}/` serves via subdomain