The local quickstart gets you running on a laptop. Production adds TLS, real secret management, durable backups, and observability. This page covers the standard production setup.

What you need

  • A Linux host with Docker Engine (or Kubernetes; the patterns are the same).
  • PostgreSQL 16+ (managed or self-hosted).
  • An HTTPS-terminating reverse proxy upstream of the orchestrator.
  • Outbound HTTPS to your LLM providers from each engine container.
  • A secret manager (Vault, AWS Secrets Manager, GCP Secret Manager, k8s Secrets).

1. Build and tag the engine image

The orchestrator launches engines from ORCH_ENGINE_IMAGE. Build once, tag deliberately:
cd engine
docker build -t registry.your-domain.com/september-engine:2.3.0 --target prod .
docker push registry.your-domain.com/september-engine:2.3.0
Pin to a specific version (2.3.0), not latest, so canary upgrades work. See Engine deploy for upgrade flow.
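
The pinning rule above can be enforced before a deploy. The following is a minimal sketch, not part of the orchestrator: is_pinned is a hypothetical helper that treats a reference as pinned when it carries a digest or an explicit tag other than latest.

```python
def is_pinned(image_ref: str) -> bool:
    """Return True if image_ref has a digest or an explicit tag other than 'latest'.

    Hypothetical helper for illustration; the orchestrator may validate
    ORCH_ENGINE_IMAGE differently or not at all.
    """
    # A digest reference (name@sha256:...) always identifies one exact image.
    if "@" in image_ref:
        return True
    # Look for the tag only after the last '/', so a registry port such as
    # registry:5000/name is not mistaken for a tag.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag at all: Docker falls back to 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


print(is_pinned("registry.your-domain.com/september-engine:2.3.0"))
print(is_pinned("registry.your-domain.com/september-engine:latest"))
```

A check like this fits in CI or in the script that writes ORCH_ENGINE_IMAGE into the environment file.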

2. Build and tag the orchestrator image

cd bap-engine
docker build -t registry.your-domain.com/bap-engine:0.2.0 --target prod \
  --secret id=github_token,src=$HOME/.github_token \
  .
docker push registry.your-domain.com/bap-engine:0.2.0
The github_token build secret is needed once, at pip-install time, to pull september-engine from the private repo. The --secret flag requires BuildKit.

3. Provision Postgres

bap-engine reads and writes a single database. Don’t share it with anything else.
  • Database name: orchestrator.
  • User: orch_user with full DDL on its own database.
  • Storage: 50 GB+ for moderate fleets. The bulk is audit_log.
  • Backups: daily snapshot, 7-day retention. Hourly point-in-time recovery if your provider supports it.
The orchestrator applies the migration on first boot. If you’re managing Postgres outside the orchestrator’s reach, apply it once manually:
psql "$ORCH_DATABASE_URL" \
  < orchestrator/migrations/001_registry.sql

4. Generate secrets

Two critical orchestrator secrets:
# Master key for Fernet-encrypting engine API keys at rest.
# LOSE THIS = LOSE EVERY ENGINE'S KEY.
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

# Admin key for /products/register and /products/{id}/policy.
python -c "import secrets; print(secrets.token_urlsafe(48))"
Store both in your secret manager. They’re injected at runtime as ORCH_MASTER_KEY and ORCH_ADMIN_KEY. Never rotate ORCH_MASTER_KEY without first re-encrypting every engine’s engine_key_enc under the new key; otherwise every stored engine key becomes undecryptable.
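
A re-encrypt step for a master-key rotation might look like the sketch below. It assumes each engine_key_enc value is a plain Fernet token encrypted directly with ORCH_MASTER_KEY; how the orchestrator actually stores these values, and whether it ships its own rotation tooling, is not stated here. reencrypt_tokens is a hypothetical helper, and reading and writing the rows in Postgres is left out.

```python
from cryptography.fernet import Fernet, MultiFernet


def reencrypt_tokens(old_key: bytes, new_key: bytes, tokens: list) -> list:
    """Re-encrypt Fernet tokens from old_key to new_key (hypothetical helper).

    MultiFernet decrypts with any key in its list and encrypts with the
    first one, so the new key goes first and the old key second.
    """
    rotator = MultiFernet([Fernet(new_key), Fernet(old_key)])
    return [rotator.rotate(token) for token in tokens]


# Stand-in data: two engine keys encrypted under the old master key.
old_key = Fernet.generate_key()
new_key = Fernet.generate_key()
stored = [Fernet(old_key).encrypt(b"engine-key-1"),
          Fernet(old_key).encrypt(b"engine-key-2")]

rotated = reencrypt_tokens(old_key, new_key, stored)

# After rotation the new key alone is enough to decrypt every token.
recovered = [Fernet(new_key).decrypt(token) for token in rotated]
print(recovered)
```

Keep the old key available until every row has been rewritten and verified; only then replace ORCH_MASTER_KEY in the secret manager.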

5. Configure environment

Minimum production environment:
# Database
ORCH_DATABASE_URL=postgresql://orch_user:<secret>@<host>:5432/orchestrator

# Secrets (from secret manager)
ORCH_MASTER_KEY=<from secret manager>
ORCH_ADMIN_KEY=<from secret manager>

# Engine image (pinned version)
ORCH_ENGINE_IMAGE=registry.your-domain.com/september-engine:2.3.0
ORCH_ENGINE_BACKEND=docker
ORCH_ENGINE_NETWORK=engine_net
ORCH_ENGINE_ENV_PASSTHROUGH=LLM_API_KEY,OPENAI_API_KEY,ANTHROPIC_API_KEY,GEMINI_API_KEY

# Per-engine LLM provider keys (forwarded to containers)
LLM_API_KEY=<from secret manager>
OPENAI_API_KEY=<from secret manager>
ANTHROPIC_API_KEY=<from secret manager>
GEMINI_API_KEY=<from secret manager>

# Listen
ORCH_HOST=0.0.0.0
ORCH_PORT=8000

# Health and recovery
ORCH_HEALTH_CHECK_INTERVAL_S=30
ORCH_HEALTH_CHECK_TIMEOUT_S=10
ORCH_HEALTH_MAX_FAILURES=3
ORCH_RESTART_BACKOFF_BASE_S=5
ORCH_RESTART_BACKOFF_MAX_S=300
ORCH_RESTART_MAX_ATTEMPTS=8

# Idle behavior
ORCH_IDLE_SLEEP_THRESHOLD_S=3600

# Port range for engine containers
ORCH_PORT_MIN=9001
ORCH_PORT_MAX=9999

# Volume root for engine brains
ORCH_DATA_ROOT_PATH=/data/engine-data

# Catalog mount (read-only, shared)
ORCH_CATALOG_MOUNT_PATH=/data/catalog
For the full list see Environment variables.
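
To get a feel for the restart settings, the sketch below computes the wait before each restart attempt. It assumes the orchestrator doubles the delay on every attempt, starting at ORCH_RESTART_BACKOFF_BASE_S and capping at ORCH_RESTART_BACKOFF_MAX_S; the actual growth factor and any jitter are not documented on this page, so treat the numbers as an illustration only. restart_delays is a hypothetical name.

```python
def restart_delays(base_s: int = 5, max_s: int = 300, max_attempts: int = 8) -> list:
    """Delay in seconds before each restart attempt.

    Assumption: plain exponential doubling with a cap. The orchestrator's
    real schedule may differ.
    """
    return [min(base_s * 2 ** attempt, max_s) for attempt in range(max_attempts)]


delays = restart_delays()
print(delays)       # [5, 10, 20, 40, 80, 160, 300, 300]
print(sum(delays))  # 915
```

Under that assumption, an engine that never recovers exhausts its eight attempts after roughly 15 minutes of waiting, which is a useful number to compare against your alert thresholds.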

6. Run the orchestrator

docker run -d \
  --name bap-engine \
  --restart unless-stopped \
  -p 127.0.0.1:8000:8000 \
  --env-file /etc/bap-engine/env \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /data/catalog:/data/catalog:ro \
  -v /data/engine-data:/data/engine-data \
  --network engine_net \
  registry.your-domain.com/bap-engine:0.2.0
The orchestrator needs:
  • The Docker socket — to create/start/stop engine containers.
  • The catalog directory mounted at /data/catalog, read-only — passed through to engine containers.
  • The data root mounted at /data/engine-data — engine brain volumes live here.
  • Membership in engine_net — to reach engine /health endpoints.
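
Before starting the container, it can help to sanity-check the environment file. The following is a rough preflight sketch, not an orchestrator feature: parse_env_file and preflight are hypothetical helpers, and the choice of which variables count as required is an assumption based on the minimum environment listed earlier.

```python
# Assumption: these four are the variables the orchestrator cannot start without.
REQUIRED = ("ORCH_DATABASE_URL", "ORCH_MASTER_KEY", "ORCH_ADMIN_KEY", "ORCH_ENGINE_IMAGE")


def parse_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines; blank lines and '#' comment lines are skipped."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def preflight(env: dict) -> list:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = [f"{name} is missing or empty" for name in REQUIRED if not env.get(name)]
    # Every variable named in the passthrough list should itself be set,
    # otherwise engine containers receive nothing for it.
    passthrough = env.get("ORCH_ENGINE_ENV_PASSTHROUGH", "")
    for name in (n.strip() for n in passthrough.split(",")):
        if name and not env.get(name):
            problems.append(f"{name} is listed in ORCH_ENGINE_ENV_PASSTHROUGH but not set")
    # The engine port range must be numeric and non-empty.
    lo, hi = env.get("ORCH_PORT_MIN"), env.get("ORCH_PORT_MAX")
    if lo is not None and hi is not None:
        if not (lo.isdigit() and hi.isdigit()) or int(lo) > int(hi):
            problems.append("ORCH_PORT_MIN/ORCH_PORT_MAX must be integers with MIN <= MAX")
    return problems
```

A typical use is to read /etc/bap-engine/env, pass its contents through parse_env_file and preflight, and refuse to run docker run if the returned list is non-empty.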

7. TLS termination

Run an HTTPS-terminating reverse proxy upstream:
upstream bap_engine {
    server 127.0.0.1:8000;
}

server {
    listen 443 ssl http2;
    server_name bap-engine.your-domain.com;

    ssl_certificate ...;
    ssl_certificate_key ...;

    location / {
        proxy_pass http://bap_engine;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 60s;
    }
}
The orchestrator itself speaks HTTP only; TLS belongs upstream. Don’t expose the engine container ports to the internet: they bind to 127.0.0.1 by default, so keep them there. The product’s traffic to the engine travels over the same internal network the orchestrator uses.

8. Register the first product

Once bap-engine is up:
curl -X POST https://bap-engine.your-domain.com/products/register \
  -H "X-Admin-Key: $ORCH_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "production-app",
    "display_name": "Production application",
    "policy": {
      "max_engines": 100,
      "rate_limit_rpm": 600
    }
  }'
Save the returned platform_api_key — that’s the credential your product uses for every subsequent call.
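
The same registration call can be made from a provisioning script. The sketch below builds the request with the Python standard library, mirroring the curl command above. build_register_request is a hypothetical helper. The response is only known to contain platform_api_key; where that field sits in the JSON body is an assumption, so the sending part is left as comments.

```python
import json
import urllib.request


def build_register_request(base_url: str, admin_key: str, slug: str,
                           display_name: str, max_engines: int,
                           rate_limit_rpm: int) -> urllib.request.Request:
    """Build (but do not send) the POST /products/register request."""
    payload = {
        "slug": slug,
        "display_name": display_name,
        "policy": {"max_engines": max_engines, "rate_limit_rpm": rate_limit_rpm},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/products/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Admin-Key": admin_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_register_request(
    "https://bap-engine.your-domain.com", "example-admin-key",
    "production-app", "Production application", 100, 600,
)

# Sending it (left commented out; assumes platform_api_key is a top-level field):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     platform_api_key = json.loads(resp.read())["platform_api_key"]
```

Read the admin key from your secret manager at call time rather than hard-coding it, and write the returned platform key straight back into the secret manager.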

9. Health checks and probes

Wire the upstream proxy and any orchestrator monitoring to:
  • GET /health on the orchestrator. Returns {"status":"ok"}.
For deeper checks:
  • GET /status (with platform key) — fleet snapshot.
  • Postgres connectivity test (your monitoring’s standard check).
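
If your monitoring needs a custom probe, a minimal sketch could look like this. It relies only on the documented {"status":"ok"} body; is_healthy and probe are hypothetical names, and the 10-second default timeout is an arbitrary choice.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True only if the body is a JSON object whose status is 'ok'."""
    try:
        doc = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(doc, dict) and doc.get("status") == "ok"


def probe(base_url: str, timeout_s: float = 10.0) -> bool:
    """GET /health and report whether the orchestrator says it is ok."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/health",
                                    timeout=timeout_s) as resp:
            return is_healthy(resp.read())
    except OSError:  # connection errors, timeouts, and HTTP error statuses
        return False
```

Treating every failure mode as "not healthy" keeps the probe safe to call from alerting code, which usually only needs a boolean.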

10. Backups

Two volumes matter:

Postgres

The orchestrator’s source of truth. Back up daily; verify monthly. Restore drill: bring up a fresh Postgres, restore the snapshot, point a fresh orchestrator at it, confirm /status returns the expected fleet.

Engine data volumes

Each user’s brain lives on disk under ORCH_DATA_ROOT_PATH. Back up nightly:
  • Volume snapshots (EBS, GCP PD, k8s VolumeSnapshot) — fastest.
  • File-level (tar of the brain directory) — fallback.
  • Per-brain export via the engine’s GET /memory/export — slowest, but portable.
Test brain restore quarterly. A backup you’ve never restored isn’t a backup.
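
For the file-level fallback, a sketch of archiving and checking one brain directory is shown below. It assumes each brain is a self-contained directory under ORCH_DATA_ROOT_PATH, which this page implies but does not spell out; archive_brain and verify_archive are hypothetical helpers. Copying files from a running engine can capture a half-written state, so quiescing the engine first is a sensible precaution.

```python
import tarfile
from pathlib import Path


def archive_brain(brain_dir: Path, dest: Path) -> Path:
    """Write a gzip-compressed tar of brain_dir to dest and return dest."""
    with tarfile.open(dest, "w:gz") as tar:
        # Store paths relative to the brain directory's own name.
        tar.add(brain_dir, arcname=brain_dir.name)
    return dest


def verify_archive(archive: Path, brain_dir: Path) -> bool:
    """Check that the archive lists exactly the regular files found on disk."""
    expected = {
        p.relative_to(brain_dir.parent).as_posix()
        for p in brain_dir.rglob("*") if p.is_file()
    }
    with tarfile.open(archive, "r:gz") as tar:
        archived = {m.name for m in tar.getmembers() if m.isfile()}
    return archived == expected
```

This only confirms that the file list matches. A real restore drill still means unpacking the archive and starting an engine against it.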

11. Observability

Ship telemetry from each layer to your observability stack:
  • Orchestrator logs — structured JSON to stdout. Ship via your log pipeline.
  • Audit log (audit_log table) — periodic export to a long-term store for compliance.
  • Fleet metrics — scrape GET /metrics or query the audit table directly for provisions/restarts/crashes.
  • Engine logs — each engine container writes to its own stdout. Ship per container.
  • Postgres — your provider’s standard metrics.
Alerts worth wiring:
| Signal | Threshold | Severity |
| --- | --- | --- |
| Orchestrator /health non-ok | 1 min | Sev1 |
| Postgres unreachable | 1 min | Sev1 |
| Engine restart rate > 5/min | 5 min | Sev2 |
| Engines failed count > 0 sustained | 5 min | Sev2 |
| Port allocation usage > 80% | 1 hour | Sev3 |
| Audit log table size > N GB | 1 day | Sev3 |
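
The port allocation alert reduces to simple arithmetic. The sketch below assumes the range from ORCH_PORT_MIN to ORCH_PORT_MAX is inclusive and that each running engine holds exactly one port; neither is confirmed on this page. port_usage and port_alert are hypothetical names.

```python
def port_usage(ports_in_use: int, port_min: int = 9001, port_max: int = 9999) -> float:
    """Fraction of the engine port range currently allocated."""
    capacity = port_max - port_min + 1  # assumption: both ends are usable
    if capacity <= 0:
        raise ValueError("port_max must be >= port_min")
    return ports_in_use / capacity


def port_alert(ports_in_use: int, threshold: float = 0.80) -> bool:
    """True when usage exceeds the alert threshold (80% by default)."""
    return port_usage(ports_in_use) > threshold


print(port_alert(799))  # False: 799 / 999 is just under 80%
print(port_alert(800))  # True: 800 / 999 is just over 80%
```

With the example range of 9001 to 9999 there are 999 ports, so the alert fires from the 800th concurrent engine onward.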

12. Rolling out a new orchestrator version

Standard zero-downtime swap:
  1. Push new image.
  2. Update the deployment manifest with the new image tag.
  3. The new orchestrator picks up state from Postgres on boot — no migration of in-memory state.
  4. Drain the old container with docker stop -t 60 so in-flight requests have up to 60 seconds to complete.
  5. Start the new container.
  6. Confirm /health ok and /status returns the expected fleet.
In-flight engines are unaffected — the orchestrator restarting doesn’t restart them.
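
The confirmation in the last step can be automated by polling until the new container reports healthy. The following is a small sketch: wait_until_healthy is a hypothetical helper, the probe it receives is any zero-argument callable returning a boolean (for example, one that calls GET /health), and the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 12,
                       delay_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe up to `attempts` times, sleeping delay_s between tries.

    Returns True as soon as probe() succeeds, False if every attempt fails.
    The sleep function is injectable so the loop can be tested without waiting.
    """
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:  # no point sleeping after the final try
            sleep(delay_s)
    return False
```

If it returns False, roll back to the previous image tag before investigating; Postgres still holds the fleet state either way.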

What goes wrong

| Symptom | Likely cause |
| --- | --- |
| Orchestrator can’t start engines | Docker socket not mounted, or ORCH_ENGINE_IMAGE not pulled. |
| Engine health_failures climbing | Network issue between orchestrator and engine container, or the engine itself is unhealthy. Check engine logs. |
| PORT_EXHAUSTION errors | Port range too narrow. Increase ORCH_PORT_MAX. |
| INVALID_PLATFORM_KEY for known products | Postgres restored from a backup older than the latest product registrations. |
| Engine API keys rejected after restart | ORCH_MASTER_KEY changed; encrypted keys can’t be decrypted. |
