Self-hosted API gateway for Claude Code on Amazon Bedrock
CCAG is a self-hosted API gateway that routes Claude Code through your Amazon Bedrock account. It translates the Anthropic Messages API into Bedrock API calls, presenting as the Anthropic Direct API so Claude Code enables extended thinking and web search. It also provides centralized management for teams: virtual API keys, budgets, rate limits, OIDC SSO, and an admin portal for observability.
When you set CLAUDE_CODE_USE_BEDROCK=1, Claude Code connects to Bedrock directly, and in that mode extended thinking and web search are not available. CCAG sits between Claude Code and Bedrock and translates requests so that Claude Code behaves as if it were talking to the Anthropic API, which keeps those features working. CCAG also adds team management, spend tracking, and centralized configuration that direct Bedrock usage does not provide.
By presenting as the Anthropic Direct API, CCAG enables extended thinking and web search in Claude Code. Additionally, it provides:
CCAG supports Claude 4+ models on Bedrock, with hardcoded mappings for Claude Opus 4.6, Claude Sonnet 4.6, Claude Opus 4.5, Claude Sonnet 4.5, Claude Sonnet 4, and Claude Haiku 4.5. Additional models can be mapped through the admin portal. Model IDs are automatically mapped from Anthropic format (e.g., claude-sonnet-4-20250514) to Bedrock format (e.g., cross-region inference profile IDs).
Yes. CCAG is released under the MIT license.
You need:
See Getting Started for the full walkthrough.
The primary costs are:
Total infrastructure overhead is approximately $100-120/month for a minimal staging deployment. Production with 2 tasks adds roughly $15-30/month more.
CCAG works in any region where Bedrock is available. Model routing prefixes are auto-detected from the region:
| Region Pattern | Routing Prefix | Example |
|---|---|---|
| `us-*`, `ca-*` | us | US cross-region inference |
| `eu-*` | eu | EU cross-region inference |
| `ap-southeast-2`, `ap-southeast-4` | au | Australia |
| `ap-*`, `me-*` | apac | Asia Pacific |
| `us-gov-*` | us-gov | GovCloud |
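The prefix lookup in the table can be sketched as a small function. This is an illustration rather than CCAG's actual code: the function names are hypothetical, and the matching order is an assumption. Some patterns in the table overlap (`us-gov-*` also matches `us-*`, and the two Australian regions also match `ap-*`), so this sketch checks the more specific patterns first.

```python
def routing_prefix(region: str) -> str:
    """Return the cross-region routing prefix for an AWS region name.

    Hypothetical helper: the ordering below is an assumption made so that
    the narrower patterns win over the broader ones they overlap with.
    """
    if region.startswith("us-gov-"):
        return "us-gov"
    if region in ("ap-southeast-2", "ap-southeast-4"):
        return "au"
    if region.startswith(("us-", "ca-")):
        return "us"
    if region.startswith("eu-"):
        return "eu"
    if region.startswith(("ap-", "me-")):
        return "apac"
    raise ValueError(f"no routing prefix known for region {region!r}")


def inference_profile_id(region: str, bedrock_model_id: str) -> str:
    """Prepend the routing prefix to a Bedrock model ID (illustrative only)."""
    return f"{routing_prefix(region)}.{bedrock_model_id}"


if __name__ == "__main__":
    for name in ("us-east-1", "us-gov-west-1", "ap-southeast-2", "me-central-1"):
        print(name, "->", routing_prefix(name))
```

Checking `us-gov-` before `us-` matters: with the order reversed, a GovCloud region would be routed with the commercial `us` prefix.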
Yes. CCAG supports multi-endpoint routing, where a single gateway instance can route requests to Bedrock in different AWS accounts or regions. Configure endpoints through the admin portal’s Endpoints section.
Yes. CCAG requires a PostgreSQL database for virtual keys, spend tracking, settings, and user management. In production, the CDK stack creates an RDS instance automatically. For local development, use Docker: docker run -d --name ccag-db -e POSTGRES_DB=proxy -e POSTGRES_USER=proxy -e POSTGRES_PASSWORD=devpass -p 5432:5432 postgres:16.
Edit ~/.ccag/config.json to set domain_name and hosted_zone_name, then redeploy. The CDK stack creates a Route53 DNS record and an ACM certificate (auto-validated via DNS). If you manage certificates externally, set certificate_arn to your existing certificate.
CCAG supports any OpenID Connect provider that issues RS256-signed JWTs. Tested providers include Okta, Azure AD (Microsoft Entra ID), Google Workspace, Auth0, and Keycloak. See Authentication for provider-specific setup guides.
Yes. You can configure one provider via environment variables (OIDC_ISSUER) and additional providers through the admin API or portal. All configured providers are active simultaneously, and each JWT is validated against the matching issuer.
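One way to picture "validated against the matching issuer" is to read the `iss` claim from the token before verification and use it only to pick which provider's keys to verify against. The sketch below assumes that approach; the function names and the shape of the `providers` dictionary are hypothetical, and the RS256 signature check itself is deliberately left out.

```python
import base64
import json


def unverified_issuer(jwt: str) -> str:
    """Read the `iss` claim from a compact JWT WITHOUT verifying it.

    The result is only safe to use for choosing a provider; the signature
    must still be verified against that provider's keys afterwards.
    """
    parts = jwt.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWT")
    payload_b64 = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["iss"]


def select_provider(jwt: str, providers: dict) -> dict:
    """Return the configured provider whose issuer matches the token."""
    issuer = unverified_issuer(jwt)
    try:
        return providers[issuer]
    except KeyError:
        raise PermissionError(f"unknown issuer: {issuer}") from None
```

Rejecting unknown issuers before any key lookup keeps a token from an unconfigured identity provider from reaching the verification step at all.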
Virtual keys are API keys managed by CCAG (prefixed with sk-proxy-). They are stored as SHA-256 hashes in the database and cached in memory for fast validation. Each key can have a name, rate limit, user assignment, and team assignment. The gateway uses its own AWS credentials for Bedrock; virtual keys only authenticate the client to CCAG.
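The hash-then-cache idea can be sketched as follows. Only the `sk-proxy-` prefix and the use of SHA-256 come from the description above; the key length, the hex encoding of the digest, and the `KeyCache` class are assumptions made for illustration.

```python
import hashlib
import secrets

KEY_PREFIX = "sk-proxy-"


def new_virtual_key() -> str:
    """Generate a random key with the documented prefix (length is assumed)."""
    return KEY_PREFIX + secrets.token_urlsafe(32)


def hash_key(key: str) -> str:
    """SHA-256 digest of the key; only this value would be persisted."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class KeyCache:
    """Hypothetical in-memory cache mapping key hashes to key metadata."""

    def __init__(self) -> None:
        self._by_hash = {}

    def add(self, key: str, metadata: dict) -> None:
        self._by_hash[hash_key(key)] = metadata

    def lookup(self, presented: str):
        """Return metadata for a presented key, or None if it is unknown."""
        if not presented.startswith(KEY_PREFIX):
            return None
        return self._by_hash.get(hash_key(presented))
```

Because only the digest is stored, a copy of the database or the cache does not reveal usable keys; the plaintext exists only at creation time and on the client.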
Use the apiKeyHelper mechanism in Claude Code. Add to ~/.claude/settings.json:
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://ccag.example.com",
    "CLAUDE_CODE_API_KEY_HELPER_TTL_MS": "840000"
  },
  "apiKeyHelper": "bash ~/.claude/proxy-login.sh"
}
This opens a browser for SSO login and passes the resulting token back to Claude Code. The proxy-login.sh script is served by the gateway at /auth/setup/token-script. See Authentication for details.
ADMIN_PASSWORD is the bootstrap and break-glass recovery credential. It lets you log in to the admin portal even if OIDC is misconfigured or the database is corrupted. Set it to a strong value in production and store it securely. See Configuration for more.
Yes. Streaming is fully supported. CCAG translates Bedrock’s binary event stream into Anthropic’s SSE (Server-Sent Events) format in real time:
Claude Code --[POST stream:true]--> CCAG --[InvokeModelWithResponseStream]--> Bedrock
Claude Code <--[SSE text/event-stream]-- CCAG <--[AWS binary event stream]--- Bedrock
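The translation step in the diagram can be approximated in a few lines. The sketch assumes events shaped the way boto3 yields them from `invoke_model_with_response_stream` (dictionaries with a `chunk` entry whose `bytes` hold one Anthropic-format JSON event); `to_sse` is a hypothetical name, and the real gateway is written in Rust, so this only illustrates the framing.

```python
import json
from typing import Iterable, Iterator


def to_sse(bedrock_events: Iterable[dict]) -> Iterator[str]:
    """Re-frame decoded Bedrock stream events as Anthropic-style SSE frames."""
    for event in bedrock_events:
        chunk = event.get("chunk")
        if chunk is None:
            # Non-chunk events (for example stream errors) are skipped in
            # this sketch; a real gateway would need to surface them.
            continue
        payload = json.loads(chunk["bytes"])
        data = json.dumps(payload, separators=(",", ":"))
        # Each SSE frame carries the event type plus the JSON body and is
        # terminated by a blank line.
        yield f"event: {payload['type']}\ndata: {data}\n\n"


if __name__ == "__main__":
    sample = [{"chunk": {"bytes": b'{"type": "message_stop"}'}}]
    for frame in to_sse(sample):
        print(frame, end="")
```

Writing it as a generator means each frame can be flushed to the client as soon as the corresponding Bedrock chunk arrives, which is what keeps the stream real-time.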
Bedrock does not support Anthropic’s web_search server tool. CCAG intercepts web search tool use requests, executes the search via DuckDuckGo, and translates the results back into the server_tool_use/web_search_tool_result format that Claude Code expects. This is transparent to the client.
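As a rough picture of the result translation, the sketch below wraps a list of search hits in a `server_tool_use` block followed by a `web_search_tool_result` block. The field names reflect one reading of Anthropic's public web search tool format and should be checked against the current API reference; the function name, the shape of `hits`, and the empty `encrypted_content` placeholder are assumptions, not CCAG's implementation.

```python
def web_search_blocks(tool_use_id: str, query: str, hits: list) -> list:
    """Build the pair of content blocks for one intercepted web search.

    `hits` is a hypothetical list of {"title": ..., "url": ...} dictionaries
    produced by whatever search backend the gateway called.
    """
    results = [
        {
            "type": "web_search_result",
            "title": hit["title"],
            "url": hit["url"],
            # Placeholder value: how the gateway fills this is not documented here.
            "encrypted_content": "",
        }
        for hit in hits
    ]
    return [
        {
            "type": "server_tool_use",
            "id": tool_use_id,
            "name": "web_search",
            "input": {"query": query},
        },
        {
            "type": "web_search_tool_result",
            "tool_use_id": tool_use_id,
            "content": results,
        },
    ]
```

The shared `tool_use_id` is what ties the result block back to the request block, so the client can pair them the same way it would with the Anthropic API.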
Yes. The POST /v1/messages/count_tokens endpoint is supported and proxied to Bedrock.
Extended thinking works through CCAG. The gateway handles the thinking beta flag and passes it through to Bedrock where supported.
CCAG exposes Prometheus metrics at /metrics and supports OTLP export. Key metrics include:
Set OTEL_EXPORTER_OTLP_ENDPOINT to export metrics to your observability stack (Grafana, Datadog, etc.).
In ECS deployments, logs are sent to CloudWatch Logs. The CDK stack configures a log group automatically. Use the AWS Console or CLI:
aws logs tail /ecs/CCAG --follow
Locally, logs are written to stderr. Set RUST_LOG=debug for verbose logging including request/response bodies.
CCAG is stateless (aside from the in-memory key cache, which is rebuilt on startup and synced via database polling). Scale by editing desired_count in ~/.ccag/config.json and redeploying. The ALB distributes traffic across tasks.
The in-memory key cache and settings are eventually consistent across instances (5-second polling interval). Rate limiting is per-instance (in-memory sliding window), not distributed. With N instances, the effective rate limit is approximately N times the configured limit.
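To see why per-instance limiting multiplies the effective limit, consider a minimal sliding-window limiter. The class below is a sketch under assumptions (a 60-second window, timestamps passed in by the caller), not the gateway's Rust implementation.

```python
from collections import deque


class SlidingWindowLimiter:
    """In-memory sliding-window limiter; each instance has its own state."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits = {}  # key -> deque of accepted request timestamps

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits.setdefault(key, deque())
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    # Two instances behind a load balancer, each configured with limit=2.
    a, b = SlidingWindowLimiter(2), SlidingWindowLimiter(2)
    accepted = sum(inst.allow("key-1", now=0.0) for inst in (a, b, a, b, a, b))
    print(accepted)  # 4: each instance admits 2 before rejecting
```

Neither instance sees the other's counters, so with round-robin distribution the key gets through twice the configured limit before any request is rejected.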
CCAG tracks token usage per request and aggregates it by user, team, and model. View spend data in the admin portal’s Analytics section or via the API:
curl https://ccag.example.com/admin/analytics \
-H "authorization: Bearer $TOKEN"
Export spend data as CSV:
curl https://ccag.example.com/admin/analytics/export \
-H "authorization: Bearer $TOKEN" > spend.csv
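The same calls can be made from a script. The sketch below uses only the Python standard library and the two endpoints shown above; the helper names are hypothetical, and the CSV column layout is not specified here, so rows are returned as plain lists.

```python
import csv
import io
import urllib.request


def analytics_request(base_url: str, token: str, export: bool = False) -> urllib.request.Request:
    """Build an authenticated request for the analytics or CSV export endpoint."""
    path = "/admin/analytics/export" if export else "/admin/analytics"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"authorization": f"Bearer {token}"},
    )


def fetch_spend_rows(base_url: str, token: str) -> list:
    """Download the CSV export and return it as a list of rows."""
    request = analytics_request(base_url, token, export=True)
    with urllib.request.urlopen(request) as response:
        text = response.read().decode("utf-8")
    return list(csv.reader(io.StringIO(text)))
```

A call such as `fetch_spend_rows("https://ccag.example.com", token)` would then return the export with the header row first, assuming the export includes one.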
Pre-built images and binaries are published to GitHub Releases on every release. No compilation required.
docker compose pull && docker compose up -d (or pin with CCAG_VERSION=1.0.2)
npx cdk deploy -c environment=prod -c imageTag=1.0.2
ccag update
Database migrations run automatically on startup. See Upgrading for details.
CCAG adds 1-5ms of latency for request translation. The gateway is written in Rust (axum/tokio) and processes requests asynchronously, so end-to-end latency is dominated by Bedrock’s inference time (typically hundreds of milliseconds).
Common causes:
proxy-login.sh
api.anthropic.com
This is a non-blocking warning at startup. It means the gateway could not reach Bedrock. Check:
bedrock:InvokeModel* permissions (the CDK stack configures this automatically)
The portal is embedded in the binary at compile time (include_str!). If you see a blank page:
/health)
Check:
DATABASE_URL is set correctly (or DATABASE_HOST + DB_PASSWORD)
Rate limiting is per-instance (in-memory sliding window). With multiple ECS tasks, the effective limit per key is approximately configured_rpm * number_of_tasks because requests are distributed across instances by the ALB. For strict rate limiting, set the per-key limit to desired_limit / desired_count.
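The desired_limit / desired_count rule can be worked through with a small helper. The function name is hypothetical, and rounding down is a choice made here so that the combined limit across tasks never exceeds the target.

```python
def per_task_limit(desired_limit: int, desired_count: int) -> int:
    """Per-key limit to configure so that all tasks together stay near the target.

    Rounds down, but never below 1; if desired_limit is smaller than
    desired_count, the combined limit will therefore exceed the target.
    """
    if desired_count < 1:
        raise ValueError("desired_count must be at least 1")
    return max(1, desired_limit // desired_count)


if __name__ == "__main__":
    tasks = 3
    configured = per_task_limit(100, tasks)
    print(configured, configured * tasks)  # 33 per task, 99 requests/minute overall
```

This is still approximate: it assumes the ALB spreads a key's requests evenly across tasks, which short bursts may not do.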
The admin password is set via the ADMIN_PASSWORD environment variable. To reset it:
There is no password stored in the database. The admin password is always read from the environment variable at startup.
Set RUST_LOG=debug to see full request and response bodies in the logs. This shows the original Anthropic-format request, the translated Bedrock request, and the translated response.
RUST_LOG=debug cargo run
In ECS, update the RUST_LOG environment variable in the task definition and redeploy.