Troubleshooting

Common failure modes

"Failed to load …" on every page

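The api is most likely unreachable. Check its readiness endpoint:

curl -sS -w '\nHTTP %{http_code}\n' http://localhost:18080/readyz

Don't add -f here: it makes curl discard the response body on a 5xx. If the connection is refused, the api itself is down. If it returns 503, the api is up but Postgres or Redis isn't ready; the response body lists which: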

{"status":"unready","checks":{"postgres":"ping failed: dial tcp ..."}}
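To tell "api down" from "a dependency isn't ready" in one step, the helpers below record the HTTP status with curl's -w and map it to the likely cause. This is a minimal sketch that assumes the api is published on localhost:18080 as in the examples here; the function names are hypothetical and not part of secrets-bridge.

```shell
# Hypothetical helpers (not shipped with secrets-bridge): classify /readyz.
# Assumes the api is published on localhost:18080.

# Map an HTTP status code to its likely cause.
readyz_verdict() {
  case "$1" in
    200) echo "ready" ;;
    503) echo "api up, a dependency is not ready" ;;
    000) echo "api unreachable" ;;   # curl got no HTTP response at all
    *)   echo "unexpected status $1" ;;
  esac
}

# Fetch /readyz without -f so the 503 body is kept, then print verdict and body.
check_readyz() {
  url="${1:-http://localhost:18080/readyz}"
  body=$(mktemp)
  code=$(curl -sS -o "$body" -w '%{http_code}' "$url" 2>/dev/null)
  readyz_verdict "${code:-000}"
  cat "$body"
  rm -f "$body"
}

# Usage: check_readyz
```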

If sign-in itself fails, look for denied login attempts in the audit log:

TOKEN=...  # an admin token from another, already signed-in session
curl -fsS "http://localhost:18080/api/v1/audit-events?action=auth.login" \
  -H "Authorization: Bearer $TOKEN" \
  | jq '.[] | select(.status=="denied")'

The metadata.error_kind field of each denied event tells you why:

`error_kind`	Meaning
`user_not_found`	No user with that email (matching is case-insensitive). Check `SB_BOOTSTRAP_ADMIN_EMAIL`
`wrong_password`	Password mismatch
`user_disabled`	`local_users.disabled = TRUE`
`missing_fields`	Request body is missing required fields
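To see which failure dominates, a small sketch can tally denied logins by error_kind and print the next step for each kind from the table. It assumes jq is installed, TOKEN holds an admin token, and each audit event carries .status and .metadata.error_kind as in the query above; both function names are hypothetical.

```shell
# Hypothetical helpers; assumes jq, an admin $TOKEN, and the event shape
# used in the audit-log query above (.status, .metadata.error_kind).

# Count denied logins per error_kind, most frequent first.
denied_login_kinds() {
  curl -fsS "http://localhost:18080/api/v1/audit-events?action=auth.login" \
    -H "Authorization: Bearer $TOKEN" \
    | jq -r '.[] | select(.status=="denied") | .metadata.error_kind // "none"' \
    | sort | uniq -c | sort -rn
}

# Print the next step for one error_kind (mirrors the table above).
login_hint() {
  case "$1" in
    user_not_found) echo "no such email; check SB_BOOTSTRAP_ADMIN_EMAIL" ;;
    wrong_password) echo "password mismatch" ;;
    user_disabled)  echo "local_users.disabled is TRUE" ;;
    missing_fields) echo "request body is missing required fields" ;;
    *)              echo "unknown error_kind: $1" ;;
  esac
}
```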

Agent job fails immediately ("agent reported failure")

Read the agent's log:

docker logs secrets-bridge-agent-1 2>&1 | grep -iE "job|error|fail" | tail

The most common cause is missing provider config. For Vault, you can leave the Provider config field blank in the Submit drawer, because the UI defaults to kvMount=secret. For AWS Secrets Manager, the region is required.

For a deeper trace, set the agent's log level to debug (this assumes the compose file passes SB_LOG_LEVEL from your shell into the agent container):

SB_LOG_LEVEL=debug docker compose up -d agent
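docker logs replays the container's stderr on its own stderr, so piping it straight into grep can miss error lines. A minimal sketch that merges both streams before filtering; agent_job_errors, filter_job_errors, and the AGENT_CONTAINER override are hypothetical names, and the default container name is the one used above.

```shell
# Keep only job/error/failure lines from a log stream; last N lines (default 20).
filter_job_errors() {
  grep -iE 'job|error|fail' | tail -n "${1:-20}"
}

# Hypothetical wrapper: merge stderr into stdout first, because docker logs
# replays the container's stderr on stderr, where a plain pipe would miss it.
agent_job_errors() {
  docker logs "${AGENT_CONTAINER:-secrets-bridge-agent-1}" 2>&1 | filter_job_errors "$1"
}

# Usage: agent_job_errors 50
```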

Wraps card shows "No wraps issued yet" after the agent ran

Two possibilities:

You're signed in as the wrong user. The Wraps card only fetches when the viewer is the requester. Sign in as the user who submitted the request.
The agent failed before posting the wrap. Check GET /api/v1/audit-events?correlation_id=<request-uuid> for the full chain; on success it includes wrap.create events.
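To check the second possibility without reading the whole chain, a sketch can list the actions for one request and test for wrap.create. It assumes jq, an admin TOKEN, and that each audit event has an .action field, as in the audit-log queries in this guide; request_actions and has_wrap_create are hypothetical names.

```shell
# Hypothetical helpers; assumes jq, $TOKEN, and an .action field per event.

# List the actions recorded for one request ($1 = request UUID).
request_actions() {
  curl -fsS "http://localhost:18080/api/v1/audit-events?correlation_id=$1" \
    -H "Authorization: Bearer $TOKEN" | jq -r '.[].action'
}

# Succeed if a wrap.create line appears on stdin (-F: match the dot literally).
has_wrap_create() {
  grep -qxF 'wrap.create'
}

# Usage:
#   request_actions <request-uuid> | has_wrap_create && echo "wrap issued"
```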

"Failed to load audit events"

Most likely you lack the audit.read permission. The seed admin role has it; any other user needs an admin to grant audit.read through one of their roles.

# With the token of the user who sees the error:
curl -fsS http://localhost:18080/api/v1/audit-events \
  -H "Authorization: Bearer $TOKEN"

Expected: 200 with an array. If 403, your role doesn't carry audit.read.
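A 401 and a 403 point to different fixes, so a sketch that reports the status in words can save a round trip. It assumes the usual HTTP convention (401: token rejected, 403: token accepted but permission missing), which this guide confirms only for 403; check_audit_read and audit_read_verdict are hypothetical names.

```shell
# Hypothetical helper: turn the audit-events status code into a next step.
# Assumes 401 = token rejected, 403 = role lacks audit.read.
audit_read_verdict() {
  case "$1" in
    200) echo "ok: audit.read granted" ;;
    401) echo "token rejected: sign in again" ;;
    403) echo "forbidden: role lacks audit.read" ;;
    000) echo "api unreachable" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Probe with the current $TOKEN; body is discarded, only the status is kept.
check_audit_read() {
  code=$(curl -sS -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $TOKEN" \
    http://localhost:18080/api/v1/audit-events 2>/dev/null)
  audit_read_verdict "${code:-000}"
}
```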

Agent stays "stale" forever

Two things to check:

The agent can reach the control plane (CP). From inside the agent container, run:

docker exec secrets-bridge-agent-1 wget -qO- http://api:8080/healthz

The agent secret is correct. A wrong secret makes the heartbeat fail with 401; the response message is generic, but the audit log records error_kind=unauthorized.
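The two checks can be run in order by a small sketch: if the CP is reachable from the agent container, the remaining suspect is the secret. It assumes the container name and in-network URL used above; probe_cp is a hypothetical name.

```shell
# Hypothetical helper: reachability first, then point at the agent secret.
# Defaults are the compose container name and in-network api URL used above.
probe_cp() {
  container="${1:-secrets-bridge-agent-1}"
  url="${2:-http://api:8080/healthz}"
  if docker exec "$container" wget -qO- "$url" >/dev/null 2>&1; then
    echo "reachable: if still stale, check the agent secret (401s, error_kind=unauthorized)"
  else
    echo "unreachable: fix networking between $container and the api"
  fi
}

# Usage: probe_cp
```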

Vault read returns "permission denied"

The agent authenticates with the token in SB_VAULT_TOKEN or through the configured Kubernetes auth role. First confirm that the secret path exists and is readable with the dev root token:

docker exec secrets-bridge-vault-1 sh -c \
  'VAULT_ADDR=http://127.0.0.1:8200 VAULT_TOKEN=devroot \
   vault kv get secret/<path>'

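If that works but the agent is still denied, the agent's credentials have narrower permissions than the dev root token. Grant read on the path in the agent's Vault policy; for a KV v2 mount such as secret, the policy path is secret/data/<path>.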
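To test the agent's own token rather than the dev root token, Vault's vault token capabilities TOKEN PATH command prints what a token may do on a path, comma-separated. This sketch assumes a KV v2 mount named secret and that the agent's token is in a hypothetical AGENT_VAULT_TOKEN variable; agent_can_read and caps_allow_read are hypothetical names.

```shell
# Succeed if a capabilities list on stdin (e.g. "create, read, update")
# includes read or root.
caps_allow_read() {
  tr ',' '\n' | tr -d ' ' | grep -qxE 'read|root'
}

# Hypothetical wrapper: ask the dev Vault what AGENT_VAULT_TOKEN may do on a
# KV v2 path ($1 is the secret path without the "secret/" prefix).
agent_can_read() {
  docker exec secrets-bridge-vault-1 sh -c \
    "VAULT_ADDR=http://127.0.0.1:8200 VAULT_TOKEN=devroot \
     vault token capabilities '$AGENT_VAULT_TOKEN' 'secret/data/$1'" \
    | caps_allow_read
}

# Usage: agent_can_read <path> && echo "agent token can read"
```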

Useful queries against the audit log

Find every action for one request

curl -fsS "http://localhost:18080/api/v1/audit-events?correlation_id=<uuid>" \
  -H "Authorization: Bearer $TOKEN" | jq '.[] | {time:.occurred_at, actor, action, resource}'

Find denied logins from the last hour

SINCE=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)  # GNU date; on macOS: date -u -v-1H +%Y-%m-%dT%H:%M:%SZ
curl -fsS "http://localhost:18080/api/v1/audit-events?action=auth.login&since=$SINCE" \
  -H "Authorization: Bearer $TOKEN" \
  | jq '.[] | select(.status=="denied")'

Find every wrap retrieval (who saw which value when)

curl -fsS "http://localhost:18080/api/v1/audit-events?action=wrap.retrieve" \
  -H "Authorization: Bearer $TOKEN" \
  | jq '.[] | {time:.occurred_at, actor, resource, status}'

This is the SOC 2-friendly query: it lists every plaintext reveal across the platform from the append-only audit log, with the actor and timestamp.

Reading the metrics

The api and worker both expose Prometheus metrics at /metrics. The most useful ones:

Metric	Description
`worker_scheduler_runs_total{task,outcome}`	Runs per sweeper task, by outcome (success, failure, skipped_lock)
`worker_scheduler_run_duration_seconds{task}`	Sweeper latency histogram
`worker_scheduler_lock_skipped_total{task}`	Leader-election losses (expected for N-1 of N replicas)

For HTTP latency, derive metrics from the request logs for now; explicit histograms are planned for a future PR.
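To spot a failing sweeper without a Prometheus server, a sketch can pull the failure counters straight out of the text exposition format. It assumes samples carry no trailing timestamp (so the value is the last field) and that you know the worker's /metrics host port, which depends on your compose file; sweeper_failures is a hypothetical name.

```shell
# Hypothetical helper: from Prometheus text-format metrics on stdin, print
# "<task> <count>" for every worker_scheduler_runs_total sample with
# outcome="failure". Label order does not matter.
sweeper_failures() {
  awk '
    /^worker_scheduler_runs_total[{]/ && /outcome="failure"/ {
      if (match($0, /task="[^"]*"/)) {
        # Drop the leading task=" (6 chars) and the closing quote.
        print substr($0, RSTART + 6, RLENGTH - 7), $NF
      }
    }'
}

# Usage (replace <worker-port> with the port your compose file publishes):
#   curl -fsS http://localhost:<worker-port>/metrics | sweeper_failures
```

A counter that keeps rising for one task while worker_scheduler_lock_skipped_total rises on the other replicas means the leader is running that sweeper and failing.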

When in doubt

Check the audit log first (GET /audit-events)
Then the api logs (docker logs secrets-bridge-api-1)
Then the agent logs (docker logs secrets-bridge-agent-1)
Then Postgres (docker exec -it secrets-bridge-postgres-1 psql -U secrets_bridge)
File an issue at secrets-bridge/.github with the correlation_id and the audit chain
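When filing the issue, it helps to attach the audit chain and the logs in one place. A minimal sketch, assuming an admin TOKEN and the compose container names used in this guide; collect_bundle and the sb-debug-<id> directory name are hypothetical. Logs can contain sensitive details, so read the files before attaching them.

```shell
# Hypothetical helper: gather the audit chain for one correlation_id plus the
# api and agent logs into a directory; prints the directory path.
collect_bundle() {
  cid="$1"
  out="${2:-sb-debug-$cid}"
  mkdir -p "$out" || return 1
  curl -fsS "http://localhost:18080/api/v1/audit-events?correlation_id=$cid" \
    -H "Authorization: Bearer $TOKEN" > "$out/audit-chain.json"
  for c in secrets-bridge-api-1 secrets-bridge-agent-1; do
    # Merge stderr so error lines land in the file too.
    docker logs "$c" > "$out/$c.log" 2>&1
  done
  echo "$out"
}

# Usage: collect_bundle <correlation_id>
```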