Skip to content

Intermittent server startup failures due to missing credentials in startup race #2670

@rtyler

Description

@rtyler

I have seen this issue crop up periodically and it really seems like a race condition but I'm not sure whether it's triggered in OpenDAL or in sccache. The issue seems to stem from the propagation of environment variables when the server is started

This is an example of a failure to start when using AWS_PROFILE

[2026-04-08T12:56:10Z DEBUG sccache::config] Attempting to read config file at "/home/tyler/.config/sccache/config"                                                                                                                                                                                    12:56:10 [47/960]
[2026-04-08T12:56:10Z INFO  sccache::server] start_server: 127.0.0.1:4226
[2026-04-08T12:56:10Z INFO  sccache::server] No scheduler address configured, disabling distributed sccache
[2026-04-08T12:56:10Z DEBUG sccache::cache::cache] Configuring single cache from CacheType
[2026-04-08T12:56:10Z DEBUG sccache::cache::cache] Init s3 cache with bucket cache, endpoint Some("REDACTED")
[2026-04-08T12:56:10Z DEBUG opendal::services::s3::backend] backend build started: S3Builder { config: S3Config { root: Some("sccache"), bucket: "cache", endpoint: Some("REDACTED"), region: Some("garage"), .. }, .. }
[2026-04-08T12:56:10Z DEBUG opendal::services::s3::backend] backend use root /sccache/
[2026-04-08T12:56:10Z DEBUG opendal::services::s3::backend] backend use bucket cache
[2026-04-08T12:56:10Z DEBUG opendal::services::s3::backend] backend use region: garage
[2026-04-08T12:56:10Z DEBUG opendal::services::s3::backend] backend use endpoint: REDACTED/cache
[2026-04-08T12:56:10Z DEBUG opendal::services] service=s3 name=cache path=.sccache_check: read started
[2026-04-08T12:56:10Z DEBUG reqsign::aws::v4] calculated scope: 20260408/garage/s3/aws4_request
[2026-04-08T12:56:10Z DEBUG reqsign::aws::v4] calculated string to sign: AWS4-HMAC-SHA256
    20260408T125610Z
    20260408/garage/s3/aws4_request
    11ef4c50c2e4794468652060a188043f042dd3cb7b574de816d20696c232e333
[2026-04-08T12:56:10Z DEBUG reqwest::connect] starting new connection: REDACTED
[2026-04-08T12:56:10Z WARN  opendal::services] service=s3 name=cache path=.sccache_check: read failed PermissionDenied (permanent) at read, context: { uri: REDACTED/cache/sccache/.sccache_check, response: Parts { status: 403, version: HTTP/1.1, headers: {"server": "nginx/1.28.0", "date": "We
d, 08 Apr 2026 12:56:10 GMT", "content-type": "application/xml", "content-length": "194", "connection": "keep-alive", "access-control-allow-origin": "*"} }, service: s3, path: .sccache_check, range: 0- } => S3Error { code: "AccessDenied", message: "Forbidden: No such key: ", resource: "/cache/sccache/.sccache_c
heck", request_id: "" }
[2026-04-08T12:56:10Z DEBUG opendal::services] service=s3 name=cache path=.sccache_check: read failed
[2026-04-08T12:56:10Z ERROR sccache::server] storage check failed for: cache storage failed to read: PermissionDenied (permanent) at read => S3Error { code: "AccessDenied", message: "Forbidden: No such key: ", resource: "/cache/sccache/.sccache_check", request_id: "" }

    Context:
       uri:REDACTED/cache/sccache/.sccache_check
       response: Parts { status: 403, version: HTTP/1.1, headers: {"server": "nginx/1.28.0", "date": "Wed, 08 Apr 2026 12:56:10 GMT", "content-type": "application/xml", "content-length": "194", "connection": "keep-alive", "access-control-allow-origin": "*"} }
       service: s3
       path: .sccache_check
       range: 0-


    Stack backtrace:
       0: <unknown>
       1: <unknown>
       2: <unknown>
       3: <unknown>
       4: <unknown>
       5: <unknown>
       6: <unknown>
       7: <unknown>
       8: <unknown>
       9: __libc_start_call_main
                 at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
      10: __libc_start_main_impl
                 at ./csu/../csu/libc-start.c:360:3
      11: <unknown>
[2026-04-08T12:56:10Z DEBUG sccache::server] notify_server_startup(Err { reason: "cache storage failed to read: PermissionDenied (permanent) at read => S3Error { code: \"AccessDenied\", message: \"Forbidden: No such key: \", resource: \"/cache/sccache/.sccache_check\", request_id: \"\" }\n\nContext:\n   uri: REDACTED/cache/sccache/.sccache_check\n   response: Parts { status: 403, version: HTTP/1.1, headers: {\"server\": \"nginx/1.28.0\", \"date\": \"Wed, 08 Apr 2026 12:56:10 GMT\", \"content-type\": \"application/xml\", \"content-length\": \"194\", \"connection\": \"keep-alive\", \"access-control-
allow-origin\": \"*\"} }\n   service: s3\n   path: .sccache_check\n   range: 0-\n" })
sccache: error: Server startup failed: cache storage failed to read: PermissionDenied (permanent) at read => S3Error { code: "AccessDenied", message: "Forbidden: No such key: ", resource: "/cache/sccache/.sccache_check", request_id: "" }

Context:
   uri: REDACTED/cache/sccache/.sccache_check
   response: Parts { status: 403, version: HTTP/1.1, headers: {"server": "nginx/1.28.0", "date": "Wed, 08 Apr 2026 12:56:10 GMT", "content-type": "application/xml", "content-length": "194", "connection": "keep-alive", "access-control-allow-origin": "*"} }
   service: s3
   path: .sccache_check
   range: 0-

sccache: error: cache storage failed to read: PermissionDenied (permanent) at read => S3Error { code: "AccessDenied", message: "Forbidden: No such key: ", resource: "/cache/sccache/.sccache_check", request_id: "" }

Context:
   uri: REDACTED/cache/sccache/.sccache_check
   response: Parts { status: 403, version: HTTP/1.1, headers: {"server": "nginx/1.28.0", "date": "Wed, 08 Apr 2026 12:56:10 GMT", "content-type": "application/xml", "content-length": "194", "connection": "keep-alive", "access-control-allow-origin": "*"} }
   service: s3
   path: .sccache_check
   range: 0-

When starting a debug instance with environment-set keys: I can get the check to work and the server to start

SCCACHE_NO_DAEMON=1 SCCACHE_LOG=debug SCCACHE_IDLE_TIMEOUT=0 sccache --start-server
[2026-04-08T13:06:33Z DEBUG sccache::config] Attempting to read config file at "/home/tyler/.config/sccache/config"
sccache: Starting the server...
[2026-04-08T13:06:33Z DEBUG sccache::config] Attempting to read config file at "/home/tyler/.config/sccache/config"
[2026-04-08T13:06:33Z INFO  sccache::server] start_server: 127.0.0.1:4226
[2026-04-08T13:06:33Z INFO  sccache::server] No scheduler address configured, disabling distributed sccache
[2026-04-08T13:06:33Z DEBUG sccache::cache::cache] Configuring single cache from CacheType
[2026-04-08T13:06:33Z DEBUG sccache::cache::cache] Init s3 cache with bucket cache, endpoint Some("REDACTED")
[2026-04-08T13:06:33Z DEBUG opendal::services::s3::backend] backend build started: S3Builder { config: S3Config { root: Some("sccache"), bucket: "cache", endpoint: Some("REDACTED"), region: Some("garage"), .. }, .. }
[2026-04-08T13:06:33Z DEBUG opendal::services::s3::backend] backend use root /sccache/
[2026-04-08T13:06:33Z DEBUG opendal::services::s3::backend] backend use bucket cache
[2026-04-08T13:06:33Z DEBUG opendal::services::s3::backend] backend use region: garage
[2026-04-08T13:06:33Z DEBUG opendal::services::s3::backend] backend use endpoint: REDACTED/cache
[2026-04-08T13:06:33Z DEBUG opendal::services] service=s3 name=cache path=.sccache_check: read started
[2026-04-08T13:06:33Z DEBUG reqsign::aws::v4] calculated scope: 20260408/garage/s3/aws4_request
[2026-04-08T13:06:33Z DEBUG reqsign::aws::v4] calculated string to sign: AWS4-HMAC-SHA256
    20260408T130633Z
    20260408/garage/s3/aws4_request
    e39f5fa87b14bb2fa198b119e02dc3b351d4d115a90636ae2b9c018842820b2e
[2026-04-08T13:06:33Z DEBUG reqwest::connect] starting new connection: REDACTED/
[2026-04-08T13:06:33Z DEBUG opendal::services] service=s3 name=cache path=.sccache_check: read created reader
[2026-04-08T13:06:33Z DEBUG opendal::services] service=s3 name=cache path=.sccache_check read=13 size=0: read finished
[2026-04-08T13:06:33Z DEBUG opendal::services] service=s3 name=cache path=.sccache_check: write started
[2026-04-08T13:06:33Z DEBUG opendal::services] service=s3 name=cache path=.sccache_check: write created writer
[2026-04-08T13:06:33Z DEBUG reqsign::aws::v4] calculated scope: 20260408/garage/s3/aws4_request
[2026-04-08T13:06:33Z DEBUG reqsign::aws::v4] calculated string to sign: AWS4-HMAC-SHA256
    20260408T130633Z
    20260408/garage/s3/aws4_request
    940f13630330f4b11a789891ef29976a3fb86388ae22ccc933fcb696f970185b
[2026-04-08T13:06:34Z DEBUG opendal::services] service=s3 name=cache path=.sccache_check written=13: write close succeeded
[2026-04-08T13:06:34Z DEBUG sccache::cache::cache] storage check result: ReadWrite
[2026-04-08T13:06:34Z INFO  sccache::server] server has setup with ReadWrite
[2026-04-08T13:06:34Z INFO  sccache::server] server started, listening on 127.0.0.1:4226
[2026-04-08T13:06:34Z DEBUG sccache::server] notify_server_startup(Ok { addr: "127.0.0.1:4226" })
sccache: Listening on address 127.0.0.1:4226

And then when I subsequently run a build:

error: process didn't exit successfully: `sccache /home/tyler/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc -vV` (exit status: 2)
--- stderr
sccache: error: Server startup failed: cache storage failed to read: Unexpected (temporary) at read => loading credential to sign http request

Context:
   called: reqsign::LoadCredential
   service: s3
   path: .sccache_check
   range: 0-

Source:
   error sending request for url (http://169.254.169.254/latest/api/token): operation timed out

Backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: <unknown>
  16: __libc_start_call_main
             at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  17: __libc_start_main_impl
             at ./csu/../csu/libc-start.c:360:3
  18: <unknown>

This feels like variables are not being propagated successfully but I see it only sometimes and haven't figured out why it only sometimes will manifest in the same shell!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions