The local quickstart gets you running on a laptop. Production adds TLS, real secret management, durable backups, and observability. This page covers the standard production setup.
What you need
- A Linux host with Docker Engine (or Kubernetes — the patterns are the same).
- PostgreSQL 16+ (managed or self-hosted).
- An HTTPS-terminating reverse proxy upstream of the orchestrator.
- Outbound HTTPS to your LLM providers from each engine container.
- A secret manager (Vault, AWS Secrets Manager, GCP Secret Manager, k8s Secrets).
1. Build and tag the engine image
The orchestrator launches engines from `ORCH_ENGINE_IMAGE`. Build
once, tag deliberately: pin an explicit version (e.g. `2.3.0`), not
`latest`, so canary upgrades work. See Engine deploy for the upgrade
flow.
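The build-and-tag step can be sketched as follows. The Dockerfile path, registry, and image names are placeholders, not documented values; substitute your own:

```shell
# Build the engine image and pin an explicit version tag.
docker build -t september-engine:2.3.0 -f Dockerfile.engine .
docker tag september-engine:2.3.0 registry.internal/september-engine:2.3.0
docker push registry.internal/september-engine:2.3.0

# Then point the orchestrator at the pinned tag:
# ORCH_ENGINE_IMAGE=registry.internal/september-engine:2.3.0
```

Pinning the exact tag is what makes a canary meaningful: two orchestrators can run two known versions side by side.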
2. Build and tag the orchestrator image
The `github_token` build secret is needed once, at pip-install time,
to pull `september-engine` from the private repo.
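One conventional way to pass a one-time build secret is Docker BuildKit's `--secret` flag, a sketch under the assumption that the Dockerfile mounts the secret during its `pip install` step; the file path and image name are placeholders:

```shell
# BuildKit forwards the token only at build time; it never lands in an image layer.
# The Dockerfile side would read it with:
#   RUN --mount=type=secret,id=github_token pip install ...
printf '%s' "$GITHUB_TOKEN" > /tmp/github_token
docker build \
  --secret id=github_token,src=/tmp/github_token \
  -t bap-orchestrator:2.3.0 .
rm /tmp/github_token
```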
3. Provision Postgres
bap-engine reads and writes one database. Don’t share it with anything else.

- Database name: `orchestrator`.
- User: `orch_user` with full DDL on its own database.
- Storage: 50 GB+ for moderate fleets. The bulk is `audit_log`.
- Backups: daily snapshot, 7-day retention. Hourly point-in-time recovery if your provider supports it.
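A minimal provisioning sketch using the names above; the host and password are placeholders, and your managed provider may handle this through its own console instead:

```shell
psql -h pg.internal -U postgres <<'SQL'
CREATE USER orch_user WITH PASSWORD 'change-me';
CREATE DATABASE orchestrator OWNER orch_user;
-- As owner of its own database, orch_user already has full DDL there.
SQL
```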
4. Generate secrets
Two critical orchestrator secrets: `ORCH_MASTER_KEY` and `ORCH_ADMIN_KEY`.
Don’t rotate ORCH_MASTER_KEY without a re-encrypt step. Rotating
it without re-encrypting every engine’s engine_key_enc invalidates
every stored engine key.
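Both keys are opaque random strings; one conventional way to generate them, assuming `openssl` is available:

```shell
# 32 random bytes each, hex-encoded. Store both in your secret manager,
# never in the deployment manifest or image.
ORCH_MASTER_KEY="$(openssl rand -hex 32)"
ORCH_ADMIN_KEY="$(openssl rand -hex 32)"
```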
5. Configure environment
Minimum production environment: the Postgres connection settings plus the variables named on this page (`ORCH_ENGINE_IMAGE`, `ORCH_MASTER_KEY`, `ORCH_ADMIN_KEY`, `ORCH_DATA_ROOT_PATH`, and the engine port range). Inject the two keys from your secret manager rather than baking them into images.

6. Run the orchestrator

The orchestrator container needs:
- The Docker socket — to create/start/stop engine containers.
- The catalog directory mounted at `/data/catalog`, read-only — passed through to engine containers.
- The data root mounted at `/data/engine-data` — engine brain volumes live here.
- Membership in `engine_net` — to reach engine `/health` endpoints.
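Putting the mounts and network together, a launch sketch. The listening port, image tags, and registry names are placeholders, not documented values:

```shell
docker run -d --name bap-orchestrator \
  --network engine_net \
  -p 127.0.0.1:8000:8000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /data/catalog:/data/catalog:ro \
  -v /data/engine-data:/data/engine-data \
  -e ORCH_ENGINE_IMAGE=registry.internal/september-engine:2.3.0 \
  -e ORCH_MASTER_KEY -e ORCH_ADMIN_KEY \
  bap-orchestrator:2.3.0
```

Note the `-e ORCH_MASTER_KEY` form with no value: it forwards the variable from the launching environment, so the key never appears in shell history or the manifest.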
7. TLS termination
Run an HTTPS-terminating reverse proxy upstream. The orchestrator and
engine ports bind to 127.0.0.1 by default; keep them there. The product’s
traffic to the engine goes over the same internal network as the orchestrator.
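A minimal reverse-proxy sketch in nginx; the server name, certificate paths, and upstream port are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name orch.example.internal;
    ssl_certificate     /etc/nginx/tls/fullchain.pem;
    ssl_certificate_key /etc/nginx/tls/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;  # orchestrator stays bound to localhost
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```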
8. Register the first product
Once bap-engine is up, register your first product. The response
includes a `platform_api_key` — that’s the credential your product
uses for every subsequent call.
9. Health checks and probes
Wire the upstream proxy and any orchestrator monitoring to:

- `GET /health` on the orchestrator. Returns `{"status":"ok"}`.
- `GET /status` (with platform key) — fleet snapshot.
- Postgres connectivity test (your monitoring’s standard check).
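A probe sketch matching the checks above: it treats anything other than an HTTP 200 with the exact body `{"status":"ok"}` as unhealthy. The URL is a placeholder:

```shell
# Body check split out so it can be validated on its own.
health_body_ok() { [ "$1" = '{"status":"ok"}' ]; }

check_orch_health() {
  local body
  body="$(curl -fsS --max-time 2 "$1")" || return 1
  health_body_ok "$body"
}

# Example: check_orch_health "http://127.0.0.1:8000/health" && echo ok
```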
10. Backups
Two volumes matter.

Postgres

The orchestrator’s source of truth. Back up daily; verify monthly. Restore drill: bring up a fresh Postgres, restore the snapshot, point a fresh orchestrator at it, confirm `/status` returns the expected
fleet.
Engine data volumes
Each user’s brain lives on disk under `ORCH_DATA_ROOT_PATH`. Back up
nightly:

- Volume snapshots (EBS, GCP PD, k8s VolumeSnapshot) — fastest.
- File-level (`tar` of the brain directory) — fallback.
- Per-brain export via the engine’s `GET /memory/export` — slowest, but portable.
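The file-level fallback can be sketched as a small function. It assumes one brain per subdirectory of the data root, which is a guess about the layout, so verify against your deployment:

```shell
# Tar each brain directory under $1 into a date-stamped archive under $2.
backup_brains() {
  local data_root="$1" dest="$2"
  local stamp
  stamp="$(date +%Y%m%d)"
  mkdir -p "$dest"
  for brain in "$data_root"/*/; do
    [ -d "$brain" ] || continue
    local name
    name="$(basename "$brain")"
    tar -czf "$dest/${name}-${stamp}.tar.gz" -C "$data_root" "$name"
  done
}

# Example: backup_brains /data/engine-data /backups/engine-data
```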
11. Observability
For each layer, ship telemetry to your stack:

- Orchestrator logs — structured JSON to stdout. Ship via your log pipeline.
- Audit log (`audit_log` table) — periodic export to a long-term store for compliance.
- Fleet metrics — scrape `GET /metrics` or query the audit table directly for provisions/restarts/crashes.
- Engine logs — each engine container writes to its own stdout. Ship per container.
- Postgres — your provider’s standard metrics.
| Signal | Sustained for | Severity |
|---|---|---|
| Orchestrator /health non-ok | 1 min | Sev1 |
| Postgres unreachable | 1 min | Sev1 |
| Engine restart rate > 5/min | 5 min | Sev2 |
| Engines failed count > 0 sustained | 5 min | Sev2 |
| Port allocation usage > 80% | 1 hour | Sev3 |
| Audit log table size > N GB | 1 day | Sev3 |
12. Rolling out a new orchestrator version
Standard zero-downtime swap:

- Push the new image.
- Update the deployment manifest with the new image tag.
- Drain the old container with `docker stop --time 60` so in-flight requests complete.
- Start the new container. The new orchestrator picks up state from Postgres on boot — no migration of in-memory state.
- Confirm `/health` is ok and `/status` returns the expected fleet.
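The swap above as a sketch; container names, tags, and the health URL are placeholders:

```shell
# Pull the new tag, drain the old container, start the new one.
docker pull registry.internal/bap-orchestrator:2.4.0
docker stop --time 60 bap-orchestrator   # drain: in-flight requests get 60 s
docker rm bap-orchestrator
docker run -d --name bap-orchestrator \
  registry.internal/bap-orchestrator:2.4.0   # plus the same mounts/env as step 6
curl -fsS https://orch.example.internal/health   # confirm before declaring done
```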
What goes wrong
| Symptom | Likely cause |
|---|---|
| Orchestrator can’t start engines | Docker socket not mounted, or ORCH_ENGINE_IMAGE not pulled. |
| Engine health_failures climbing | Network issue between orchestrator and engine container, or the engine itself unhealthy. Check engine logs. |
| PORT_EXHAUSTION errors | Port range too narrow. Increase ORCH_PORT_MAX. |
| INVALID_PLATFORM_KEY for known products | Postgres restored from a backup older than the latest product registrations. |
| Engine API keys reject after restart | ORCH_MASTER_KEY changed; encrypted keys can’t be decrypted. |
See also
- Quickstart — the local version.
- Engine contract — what the engine has to do.
- Health — auto-restart specifics.
- Security — keys, encryption, rotation.

