
feat: Phase 2A — graphile-presigned-url-plugin (requestUploadUrl, confirmUpload, downloadUrl)#895

Open
pyramation wants to merge 3 commits into main from devin/1774477393-presigned-url-plugin

Conversation


@pyramation pyramation commented Mar 25, 2026

Summary

New PostGraphile v5 plugin package (graphile/graphile-presigned-url-plugin/) implementing Phase 2A of the presigned URL upload pipeline. Adds three capabilities to the GraphQL schema:

  • requestUploadUrl mutation — Accepts a client-computed SHA-256 content hash, validates bucket access (RLS), checks MIME type / file size against bucket config, deduplicates against existing files, creates a pending file record, generates a presigned PUT URL, and tracks the request in upload_requests.
  • confirmUpload mutation — Verifies the file exists in S3 via HEAD, checks content-type matches, transitions file status pending → ready, marks upload_request as confirmed. Idempotent for already-confirmed files.
  • downloadUrl computed field — Detected via @storageFiles smart tag on the codec (table). Returns public URL for public files, presigned GET URL for private files.

Uses extendSchema + grafast lambda/object plans (same pattern as PublicKeySignature), GraphQLObjectType_fields hook for the computed field (same pattern as graphile-search), and an LRU cache for per-database storage module config resolution.

Note: This package is scaffolded and builds successfully, but it is not yet wired into ConstructivePreset and has no integration tests. Both are planned as follow-up commits.
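For context, a minimal client-side sketch of the hash requestUploadUrl expects (the helper name hashContent is illustrative, not part of the plugin):

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: requestUploadUrl accepts a client-computed SHA-256
// content hash, hex-encoded. The plugin then derives the S3 key from it.
export function hashContent(bytes: Buffer | Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}
```

The client computes this digest over the file bytes before calling requestUploadUrl; the same value drives dedup and key generation server-side.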

Updates since last revision

  • jwt_private.current_database_id() for database resolution (plugin.ts): resolveDatabaseId() now calls SELECT jwt_private.current_database_id() — a cheap function reading the JWT context already set by the server middleware — instead of querying metaschema_public.database WHERE db_name = current_database() on every request. The storage module config LRU cache (storage-module-cache.ts) still runs its heavier metaschema join query only on cache miss.
  • Replaced duck-typing with @storageFiles smart tag (download-url-field.ts): File table detection now checks codec.extensions.tags.storageFiles instead of heuristically matching 6 column names. The storage module generator in constructive-db emits this tag on the generated files table (see constructive-db#687).
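The tag check described above can be sketched as follows (codec shape simplified; real Graphile codecs carry many more fields, and the function name is hypothetical):

```typescript
// Simplified view of a Graphile codec for illustration only.
type CodecLike = { extensions?: { tags?: Record<string, unknown> } };

// Detect a storage-module files table via the @storageFiles smart tag,
// replacing the earlier heuristic that matched 6 specific column names.
export function isStorageFilesCodec(codec: CodecLike): boolean {
  return Boolean(codec.extensions?.tags?.storageFiles);
}
```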

Review & Testing Checklist for Human

  • jwt_private.current_database_id() availability — resolveDatabaseId() assumes the server middleware sets jwt.claims.database_id in pgSettings. Verify this is always set for authenticated requests. If a request lacks this claim (e.g. misconfigured middleware), the plugin returns DATABASE_NOT_FOUND. Check graphile.ts:159 in the server.
  • SQL injection surface in dynamic table names — storage-module-cache.ts:89-91 builds qualified table names via string interpolation from metaschema rows. These are then interpolated into raw SQL in plugin.ts (lines 193, 232, 247, 263, 294, 342, 375, 388, 396). Values come from trusted metaschema, but verify this is acceptable vs. parameterized identifiers.
  • @storageFiles smart tag cross-repo contract — download-url-field.ts reads codec.extensions.tags.storageFiles. This must match the smart_tags.storageFiles key emitted by storage_module.sql in constructive-db#687. If renamed in one repo, it must be renamed in both.
  • Race condition on dedup + INSERT — Between the dedup SELECT and the file INSERT in requestUploadUrl, a concurrent request with the same hash could cause a unique constraint violation. No explicit conflict handling (e.g., ON CONFLICT).
  • No tests — jest.config.js exists with --passWithNoTests. Verify the plan for integration tests against MinIO before merge.
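One possible shape for the conflict handling flagged in the dedup bullet, assuming a unique index on (bucket_id, content_hash) exists (the helper name and column list are illustrative, not the plugin's actual code):

```typescript
// Hypothetical sketch: a single upsert avoids the unique-violation race
// between the dedup SELECT and the file INSERT. On conflict the INSERT
// returns no row, so the caller would fall back to re-running the dedup
// SELECT. filesQualifiedName comes from trusted metaschema config, as in
// the plugin's existing queries.
export function buildFileUpsert(filesQualifiedName: string): string {
  return `
    INSERT INTO ${filesQualifiedName}
      (bucket_id, key, content_type, content_hash, size, status)
    VALUES ($1, $2, $3, $4, $5, 'pending')
    ON CONFLICT (bucket_id, content_hash) DO NOTHING
    RETURNING id`;
}
```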

Suggested test plan:

  1. Stand up MinIO locally and wire the plugin into a test PostGraphile server
  2. Call requestUploadUrl → verify presigned PUT URL is returned and upload_requests row is created
  3. PUT a file to the presigned URL, then call confirmUpload → verify file status transitions to ready
  4. Query a file row → verify downloadUrl field appears (requires @storageFiles tag on the table)
  5. Test dedup: call requestUploadUrl with the same content hash → verify deduplicated: true response

Notes

  • Companion PR: constructive-db#687 adds the upload_requests table, filename column, and @storageFiles smart tag to the storage module.
  • The pg client pattern uses withPgClient(pgSettings, ...) (not null like PublicKeySignature) because RLS enforcement requires the user's JWT context for bucket/file access checks.
  • Multiple as any casts exist for Graphile scope types and AWS SDK type mismatches — these mirror patterns in existing plugins (graphile-search, graphile-upload-plugin) but are worth noting.
  • The downloadUrl resolver is async for private files (calls generatePresignedGetUrl per row), which means N+1 S3 signing calls on list queries. Acceptable for v1 but may need batching later.
  • S3 key = raw content hash with no prefix (buildS3Key() in plugin.ts:52-54). If multiple logical buckets map to the same physical S3 bucket, keys could collide. Intentional simplification for v1.
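If key collisions across logical buckets ever become a concern, a prefixed variant of the key builder is a small change (sketch only; the v1 plugin deliberately uses the raw hash, and this function name is hypothetical):

```typescript
// Hypothetical collision-safe variant of buildS3Key(): prefix the raw
// content hash with the logical bucket key so two logical buckets sharing
// one physical S3 bucket cannot collide on identical content.
export function buildPrefixedS3Key(bucketKey: string, contentHash: string): string {
  return `${bucketKey}/${contentHash}`;
}
```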

Link to Devin session: https://app.devin.ai/sessions/4c882ba2dfbf4045adf85fb83cde6f77
Requested by: @pyramation



… mutations + downloadUrl field

Phase 2A Step 2: Presigned URL plugin for PostGraphile v5

- requestUploadUrl mutation: presigned PUT URL generation with SHA-256 content-hash keys,
  bucket validation (RLS), MIME type enforcement, file size limits, deduplication
- confirmUpload mutation: S3 HEAD verification, content-type check, status transition
  (pending -> ready), upload_request tracking
- downloadUrl computed field: presigned GET URLs for private files, public URL prefix for public
- StorageModuleCache: per-database LRU cache for storage module config (TTL-based)
- S3 signer wrapper: AWS SDK v3 presigned URL generation (PUT/GET/HEAD)
- PresignedUrlPreset factory for easy integration into ConstructivePreset

Follows existing patterns:
- extendSchema + grafast plans (same as PublicKeySignature)
- GraphQLObjectType_fields hook for downloadUrl (same as graphile-search)
- LRU cache with TTL (same as graphile-cache)

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

… detection

Uses codec.extensions.tags.storageFiles instead of checking for 6 specific
column names. The storage module generator in constructive-db will set this
tag on the generated files table via a smart comment.

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.


Comment on lines +179 to +307
return withPgClient(pgSettings, async (pgClient: any) => {
  // --- Resolve storage module config ---
  const databaseId = await resolveDatabaseId(pgClient);
  if (!databaseId) {
    throw new Error('DATABASE_NOT_FOUND');
  }

  const storageConfig = await getStorageModuleConfig(pgClient, databaseId);
  if (!storageConfig) {
    throw new Error('STORAGE_MODULE_NOT_PROVISIONED');
  }

  // --- Look up the bucket (RLS enforced) ---
  const bucketResult = await pgClient.query(
    `SELECT id, type, is_public, owner_id, allowed_mime_types, max_file_size
       FROM ${storageConfig.bucketsQualifiedName}
      WHERE key = $1
      LIMIT 1`,
    [bucketKey],
  );

  if (bucketResult.rows.length === 0) {
    throw new Error('BUCKET_NOT_FOUND');
  }

  const bucket = bucketResult.rows[0];

  // --- Validate content type against bucket's allowed_mime_types ---
  if (bucket.allowed_mime_types && bucket.allowed_mime_types.length > 0) {
    const allowed = bucket.allowed_mime_types as string[];
    const isAllowed = allowed.some((pattern: string) => {
      if (pattern === '*/*') return true;
      if (pattern.endsWith('/*')) {
        const prefix = pattern.slice(0, -1);
        return contentType.startsWith(prefix);
      }
      return contentType === pattern;
    });
    if (!isAllowed) {
      throw new Error(`CONTENT_TYPE_NOT_ALLOWED: ${contentType} not in bucket allowed types`);
    }
  }

  // --- Validate size against bucket's max_file_size ---
  if (bucket.max_file_size && size > bucket.max_file_size) {
    throw new Error(`FILE_TOO_LARGE: exceeds bucket max of ${bucket.max_file_size} bytes`);
  }

  const s3Key = buildS3Key(contentHash);

  // --- Dedup check: look for existing file with same content_hash in this bucket ---
  const dedupResult = await pgClient.query(
    `SELECT id, status
       FROM ${storageConfig.filesQualifiedName}
      WHERE content_hash = $1
        AND bucket_id = $2
        AND status IN ('ready', 'processed')
      LIMIT 1`,
    [contentHash, bucket.id],
  );

  if (dedupResult.rows.length > 0) {
    const existingFile = dedupResult.rows[0];
    log.info(`Dedup hit: file ${existingFile.id} for hash ${contentHash}`);

    // Track the dedup request
    await pgClient.query(
      `INSERT INTO ${storageConfig.uploadRequestsQualifiedName}
         (file_id, bucket_id, key, content_type, content_hash, size, status, expires_at)
       VALUES ($1, $2, $3, $4, $5, $6, 'confirmed', NOW())`,
      [existingFile.id, bucket.id, s3Key, contentType, contentHash, size],
    );

    return {
      uploadUrl: null,
      fileId: existingFile.id,
      key: s3Key,
      deduplicated: true,
      expiresAt: null,
    };
  }

  // --- Create file record (status=pending) ---
  const fileResult = await pgClient.query(
    `INSERT INTO ${storageConfig.filesQualifiedName}
       (bucket_id, key, content_type, content_hash, size, filename, owner_id, is_public, status)
     VALUES ($1, $2, $3, $4, $5, $6, $7, $8, 'pending')
     RETURNING id`,
    [
      bucket.id,
      s3Key,
      contentType,
      contentHash,
      size,
      filename || null,
      bucket.owner_id,
      bucket.is_public,
    ],
  );

  const fileId = fileResult.rows[0].id;

  // --- Generate presigned PUT URL ---
  const uploadUrl = await generatePresignedPutUrl(
    s3,
    s3Key,
    contentType,
    size,
    urlExpirySeconds,
  );

  const expiresAt = new Date(Date.now() + urlExpirySeconds * 1000).toISOString();

  // --- Track the upload request ---
  await pgClient.query(
    `INSERT INTO ${storageConfig.uploadRequestsQualifiedName}
       (file_id, bucket_id, key, content_type, content_hash, size, status, expires_at)
     VALUES ($1, $2, $3, $4, $5, $6, 'issued', $7)`,
    [fileId, bucket.id, s3Key, contentType, contentHash, size, expiresAt],
  );

  return {
    uploadUrl,
    fileId,
    key: s3Key,
    deduplicated: false,
    expiresAt,
  };
});


🔴 Missing explicit transaction management for multi-step mutations

Both requestUploadUrl and confirmUpload perform multiple sequential write operations without wrapping them in an explicit BEGIN/COMMIT/ROLLBACK transaction. The existing PublicKeySignature plugin (graphile/graphile-settings/src/plugins/PublicKeySignature.ts:113-140) — which uses the identical withPgClient pattern — explicitly manages transactions because withPgClient does not automatically wrap callbacks in a transaction.

In confirmUpload, the file status is updated to 'ready' (line 387-391) before the upload request is updated to 'confirmed' (line 395-399). If the second UPDATE fails or the process crashes, the file is 'ready' but the upload_request remains 'issued' — a data inconsistency. Similarly in requestUploadUrl, if the file INSERT (line 262) succeeds but the upload_request INSERT (line 293) fails, an orphaned 'pending' file record exists with no tracking entry.

Expected pattern from PublicKeySignature
return withPgClient(pgSettings, async (pgClient: any) => {
  await pgClient.query('BEGIN');
  try {
    // ... all queries ...
    await pgClient.query('COMMIT');
    return result;
  } catch (err) {
    await pgClient.query('ROLLBACK');
    throw err;
  }
});
Prompt for agents
In graphile/graphile-presigned-url-plugin/src/plugin.ts, both mutation handlers need explicit transaction management matching the PublicKeySignature pattern.

For requestUploadUrl (line 179): Wrap the entire withPgClient callback body in a BEGIN/COMMIT/ROLLBACK try-catch block. After line 179 add: await pgClient.query('BEGIN'); then wrap the remaining logic (lines 180-306) in a try block that ends with await pgClient.query('COMMIT'); and a catch block with await pgClient.query('ROLLBACK'); throw err;

For confirmUpload (line 328): Same pattern. After line 328 add: await pgClient.query('BEGIN'); then wrap the remaining logic (lines 329-406) in a try block with COMMIT before returning and a catch block with ROLLBACK and re-throw.

This ensures that the multiple write operations (file INSERT + upload_request INSERT, or file UPDATE + upload_request UPDATE) are atomic.

Comment on lines +69 to +71
if (filename) {
  params.ResponseContentDisposition = `attachment; filename="${filename}"`;
}


🟡 Unsanitized filename in Content-Disposition header allows header value injection

User-provided filename is interpolated directly into the ResponseContentDisposition header value without sanitizing special characters. A filename containing " characters (e.g., evil"; malicious=true) produces a malformed Content-Disposition: attachment; filename="evil"; malicious=true" header in the S3 download response. The filename flows from user input (plugin.ts:156) → database storage (plugin.ts:273) → download URL generation (download-url-field.ts:81,101) → presigned URL header (s3-signer.ts:70).

Suggested change
if (filename) {
  params.ResponseContentDisposition = `attachment; filename="${filename}"`;
}
if (filename) {
  const sanitized = filename.replace(/["\\\r\n]/g, '_');
  params.ResponseContentDisposition = `attachment; filename="${sanitized}"`;
}

…hema_public.database

The server middleware already sets jwt.claims.database_id in pgSettings,
so jwt_private.current_database_id() is a cheap function call vs the
previous approach of querying metaschema_public.database on every request.
