Configuration
agate-server (the Docker entrypoint) is configured by a single agate.toml
file, layered with environment overrides. Mount the file into the container and
point AGATE_CONFIG at it. A ready-to-edit template lives at
agate.example.toml.
Sources and precedence
Lowest to highest; a later layer overrides an earlier one:
- Built-in defaults.
- agate.toml: path from AGATE_CONFIG (default /etc/agate/agate.toml). A missing file is fine; the defaults apply.
- Environment: AGATE__SECTION__KEY (uppercase, __ between levels). Prefer env for secrets.
So [audit].database_url is overridden by AGATE__AUDIT__DATABASE_URL.
docker run --rm \
-v "$PWD/agate.toml:/etc/agate/agate.toml:ro" \
-e AGATE_CONFIG=/etc/agate/agate.toml \
-e AGATE__AUDIT__DATABASE_URL='postgres://agate:secret@db:5432/agate' \
ghcr.io/c3equalzz/agate
A missing required value (proxy.agent_endpoint, audit.database_url)
aborts startup — fail fast on misconfiguration rather than running degraded.
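As a sketch of the smallest file that should pass the startup check, the fragment below sets only the two required values and leaves every other key at its built-in default. The endpoint and connection string are placeholders borrowed from the full example at the end of this page, not recommendations.

```toml
# Minimal agate.toml: only the required keys are set.
[proxy]
agent_endpoint = "http://agent:8000/run"   # placeholder upstream AG-UI agent URL

[audit]
# Placeholder without a password. AGATE__AUDIT__DATABASE_URL overrides this
# value when it is set, which keeps the secret out of the file.
database_url = "postgres://agate@postgres:5432/agate"
```

Removing either key (without supplying it through the environment) is expected to abort startup, per the rule above.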
[proxy]
| Key | Required | Default | Meaning |
|---|---|---|---|
| agent_endpoint | yes | (none) | URL of the upstream AG-UI agent that Agate forwards inspected traffic to. |
| bind | no | 0.0.0.0:8080 | Address/port Agate listens on for incoming AG-UI traffic. |
| connect_timeout_secs | no | 5 | Fail-fast connect timeout to the upstream agent. |
| read_timeout_secs | no | 60 | Idle timeout between upstream SSE chunks. Not an overall deadline; a healthy stream runs on. |
| max_body_bytes | no | 1048576 | Maximum accepted request body size (1 MiB). Oversized requests get 413. |
| max_concurrent_requests | no | 256 | Maximum concurrently in-flight proxied runs. Each holds an upstream connection for its full stream; requests over the cap are shed with 503 (not queued), so a flood cannot exhaust memory/connections. |
| api_key | no | (none) | If set, required on the X-API-Key header (else 401). Blank/absent leaves the proxy open. Prefer AGATE__PROXY__API_KEY for the secret. |
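To illustrate how the limits in the table combine, here is a sketch of a more conservative [proxy] section for a small deployment. The specific numbers are illustrative assumptions, not tuned recommendations.

```toml
[proxy]
agent_endpoint = "http://agent:8000/run"   # placeholder upstream URL
connect_timeout_secs = 2                   # give up sooner on an unreachable agent
read_timeout_secs = 120                    # tolerate longer gaps between SSE chunks
max_body_bytes = 262144                    # 256 KiB; larger requests get 413
max_concurrent_requests = 64               # runs beyond this are shed with 503
# api_key is deliberately left out of the file; supply it through
# AGATE__PROXY__API_KEY so the secret stays in the environment.
```

Because read_timeout_secs is an idle timeout rather than an overall deadline, raising it does not cap how long a healthy stream may run.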
Liveness vs readiness probes
/healthz (liveness) returns 200 whenever the process is up. /readyz
(readiness) returns 200 only when the transparency-log store is
reachable, else 503 — point your orchestrator's readiness probe at it so
traffic is held until Agate can record. The reachability check is behind a
HealthCheck port, so it stays correct if the store backend changes. Both
probes bypass the API-key and body-size guards.
[audit]
| Key | Required | Default | Meaning |
|---|---|---|---|
| database_url | yes | (none) | PostgreSQL connection string for the Merkle transparency log. Migrations run on startup. Prefer AGATE__AUDIT__DATABASE_URL for the password. |
| max_connections | no | 10 | Maximum pooled database connections. |
| acquire_timeout_secs | no | 30 | How long to wait for a free pooled connection before erroring. |
| connect_max_retries | no | 10 | Initial-connect retries before startup gives up (0 = try once). Rides out a database still starting beside Agate (compose/Kubernetes) instead of crashing on the first failed connect. |
| connect_backoff_secs | no | 1 | Base backoff between connect attempts (doubled each retry, capped). |
The transparency log to append to is pinned by the AUDIT_LOG_ID environment
variable (a UUID). If unset, a fresh log is created on startup and its id is
printed so you can pin it on the next run.
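As an example of the retry keys in use, the sketch below assumes a database that can take a while to accept connections after Agate starts. The values are illustrative, and the comment on the backoff sequence follows the "doubled each retry, capped" rule without assuming what the cap is.

```toml
[audit]
database_url = "postgres://agate@postgres:5432/agate"   # placeholder; password via AGATE__AUDIT__DATABASE_URL
max_connections = 20
acquire_timeout_secs = 10
connect_max_retries = 20      # 0 would mean a single attempt with no retries
connect_backoff_secs = 2      # waits grow 2 s, 4 s, 8 s, ... until the cap is reached

# AUDIT_LOG_ID is an environment variable, not a key in this file.
# Set it on the container to keep appending to the same transparency log.
```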
[policy.tools] and [policy]
All policy keys are optional. With none set, every tool is permitted and nothing is redacted.
| Key | Format | Meaning |
|---|---|---|
| [policy.tools].mode | allow-all, allowlist, or denylist | How tool calls are authorized. Default allow-all. |
| [policy.tools].names | array of tool names | Tools governed by mode (ignored when allow-all). |
| [policy].redact | array of literal markers | Substrings masked (case-insensitive) in emitted text before it reaches the client. |
| [policy].fail_mode | open or closed | What to do if a policy decision times out: forward (open) or block (closed). Default closed (safety over availability). |
| [policy].decision_timeout_ms | integer (ms) | Deadline for one policy decision. Default 5000; must be > 0. |
Invalid policy aborts startup
A blank or invalid tool name, or an empty redaction pattern, aborts startup — a typo must never silently weaken enforcement.
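The full example at the end of this page uses an allowlist. For contrast, here is a sketch of a denylist policy that favors availability over safety. The tool names and the redaction marker are hypothetical placeholders.

```toml
[policy.tools]
mode = "denylist"
names = ["shell", "delete_file"]   # hypothetical tool names; every other tool stays permitted

[policy]
redact = ["password="]             # literal marker, matched case-insensitively
fail_mode = "open"                 # forward the event if a decision times out
decision_timeout_ms = 250          # must be greater than 0
```

With fail_mode = "open", a policy decision that misses its deadline is forwarded rather than blocked, so a short decision_timeout_ms trades enforcement strictness for latency.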
[observability.logging]
| Key | Default | Meaning |
|---|---|---|
| enabled | true | Install a log subscriber at all; false silences logs. |
| format | pretty | pretty (console) or json (one object per line, for log shippers). |
| level | info | Filter directive (e.g. agate_proxy=debug,info). RUST_LOG overrides it when set. |
At info you see lifecycle events: startup, each proxied run, policy denials and
redactions, and transparency-log creation. Raise to debug (e.g.
level = "agate_proxy=debug,info") for per-event detail (each forwarded/buffered
event, every appended audit record).
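A sketch of a logging section aimed at a log shipper, with per-event detail for the proxy and info for everything else. The environment variable named in the comment is inferred from the AGATE__SECTION__KEY rule and assumes each nested table level maps to one __ separator.

```toml
[observability.logging]
enabled = true
format = "json"                      # one object per line
level = "agate_proxy=debug,info"     # RUST_LOG, when set, takes precedence over this

# Assumed environment equivalent for the level key:
#   AGATE__OBSERVABILITY__LOGGING__LEVEL
```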
[observability.metrics]
A Prometheus scrape endpoint on its own port, kept off the public data-plane port (scrape it from inside the network).
| Key | Default | Meaning |
|---|---|---|
| enabled | false | Install the metrics recorder + exporter. When off, metrics are no-ops. |
| exporter | prometheus | prometheus (a /metrics endpoint) or none. |
| bind | 0.0.0.0:9090 | Address the /metrics endpoint listens on. |
Exposed metrics:
- agate_runs_total: runs proxied.
- agate_events_inspected_total{outcome="forward|buffer|transform|deny|terminate"}: inspected events by outcome (the verdict breakdown).
- agate_upstream_errors_total: upstream agent request/stream failures.
- agate_audit_records_appended_total / agate_audit_records_dropped_total: transparency-log writes vs. drops (a non-zero drop rate means audit is falling behind; alert on it).
A ready-to-run Prometheus + Grafana stack with a pre-built dashboard lives in
deploy/observability/.
[observability.tracing]
OTLP trace export — the third observability pillar beside logs and metrics. When
tracing is off, spans are still created but not exported; they are visible in
logs only when [observability.logging].enabled = true (the subscriber that
both renders spans to logs and exports them is installed only then).
| Key | Default | Meaning |
|---|---|---|
| enabled | false | Export spans to an OTLP collector over gRPC. |
| endpoint | http://localhost:4317 | OTLP gRPC endpoint of the collector. |
| service_name | agate-server | service.name reported on exported spans. |
Spans cover the request path end to end:
- proxy_run: one per proxied run on the data plane.
- audit.request: one per dispatched audit command/query. A TracingBehavior wraps the whole mediator pipeline (outermost, enclosing the metrics and transaction behaviors), so every use case is traced uniformly, and new use cases get a span for free.
- db.log.load / db.log.save / db.proof.inclusion / db.proof.consistency: one per SQL statement, nested under the audit.request span that issued it.
When tracing is enabled (and [observability.logging].enabled = true, so the
subscriber is installed), spans are flushed on graceful shutdown. Point
endpoint at an OpenTelemetry Collector (or any OTLP/gRPC backend) to collect
per-run traces.
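Since span export depends on the logging subscriber being installed, a working tracing setup touches two sections. The sketch below shows both; the collector hostname is a hypothetical placeholder.

```toml
[observability.logging]
enabled = true                            # required: the subscriber that exports spans is installed only then

[observability.tracing]
enabled = true
endpoint = "http://otel-collector:4317"   # hypothetical OTLP/gRPC collector address
service_name = "agate-server"             # reported as service.name on exported spans
```

Setting tracing enabled = true while logging is disabled would, by the description above, leave the spans unexported.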
[tls]
Terminate TLS at Agate's own listener. Off by default — Agate then serves plain HTTP, which is only sensible behind a TLS-terminating gateway (an ingress, load balancer, or service mesh). Enable it to serve HTTPS directly (e.g. for a zero-trust deployment with no separate terminator).
| Key | Default | Meaning |
|---|---|---|
| enabled | false | Serve HTTPS instead of plain HTTP. When false, cert/key are ignored. |
| cert | (none) | Path to the PEM certificate chain (leaf certificate first). Required when enabled. |
| key | (none) | Path to the PEM private key for cert. Required when enabled. |
When enabled, a missing or invalid cert/key aborts startup (fail fast on
a misconfigured listener). Both probes and the data plane are then served over
the same TLS listener on proxy.bind.
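A sketch of the section with TLS switched on, assuming the certificate and key are mounted read-only into the container at the paths used in the full example.

```toml
[tls]
enabled = true
cert = "/etc/agate/tls/cert.pem"   # PEM chain, leaf certificate first
key = "/etc/agate/tls/key.pem"     # PEM private key matching cert
```

With this in place, orchestrator probes for /healthz and /readyz need to use HTTPS as well, since they share the listener on proxy.bind.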
Full example
[proxy]
agent_endpoint = "http://agent:8000/run"
bind = "0.0.0.0:8080"
connect_timeout_secs = 5
read_timeout_secs = 60
max_body_bytes = 1048576
max_concurrent_requests = 256
# api_key = "change-me" # prefer AGATE__PROXY__API_KEY
[audit]
# Prefer AGATE__AUDIT__DATABASE_URL for the password.
database_url = "postgres://agate@postgres:5432/agate"
max_connections = 10
acquire_timeout_secs = 30
connect_max_retries = 10
connect_backoff_secs = 1
[policy.tools]
mode = "allowlist"
names = ["search", "fetch"]
[policy]
redact = ["sk-", "AKIA"]
fail_mode = "closed"
decision_timeout_ms = 5000
[observability.logging]
enabled = true
format = "pretty"
level = "info"
[observability.metrics]
enabled = true
exporter = "prometheus"
bind = "0.0.0.0:9090"
[observability.tracing]
enabled = false
endpoint = "http://localhost:4317"
service_name = "agate-server"
[tls]
enabled = false
cert = "/etc/agate/tls/cert.pem"
key = "/etc/agate/tls/key.pem"