docs/gitops.md

GitOps and Promotion

Repo split

  • App repo (anchor-msp) builds and signs images, then opens a PR in the ops repo that bumps the dev release.
  • Ops repo (anchor-msp-ops) is the GitOps source for Argo CD.

Environment overlays (ops repo)

  • infra/helm/platform/environments/dev/values.yaml
  • infra/helm/platform/environments/staging/values.yaml
  • infra/helm/platform/environments/prod/values.yaml

Notable defaults:

  • dev can run with AUTH_MODE=disabled for rapid iteration.
  • staging/prod require JWT auth and production signing/encryption secrets.
  • status-mock is disabled for staging/prod overlays.
  • Worker deployments (anchor-outbox-worker, anchor-job-worker) run from the gateway image.
  • Staging/prod Argo applications must run with syncPolicy.automated.prune=true and syncPolicy.automated.selfHeal=true so that stale resources are removed and manual drift is reverted.
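
The sync policy requirement above could be checked before merge with something like the following. This is a minimal sketch, not the ops repo's actual validation: the function name `sync_policy_problems` and the application name `anchor-msp-staging` are hypothetical, and the manifest is assumed to be already parsed into a dict.

```python
from typing import Any, Dict, List


def sync_policy_problems(app: Dict[str, Any]) -> List[str]:
    """Return problems with an Argo CD Application's automated sync policy."""
    sync_policy = (app.get("spec") or {}).get("syncPolicy") or {}
    automated = sync_policy.get("automated")
    if automated is None:
        # Without an `automated` block, Argo CD only syncs on request.
        return ["syncPolicy.automated is not set"]
    problems = []
    for flag in ("prune", "selfHeal"):
        if automated.get(flag) is not True:
            problems.append(f"syncPolicy.automated.{flag} must be true")
    return problems


# Hypothetical staging Application with selfHeal left off by mistake.
staging_app = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "anchor-msp-staging"},  # hypothetical name
    "spec": {"syncPolicy": {"automated": {"prune": True, "selfHeal": False}}},
}

print(sync_policy_problems(staging_app))
# ['syncPolicy.automated.selfHeal must be true']
```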

Promotion model

  1. A merge to main in the app repo builds and publishes images.
  2. App workflow opens ops PR updating environments/dev/release.yaml.
  3. Ops sync workflow validates release contract alignment with Helm overlays.
  4. Argo CD reconciles ops repo manifests and deploys dev.
  5. Ops promotion workflow creates PRs for dev -> staging -> prod.
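
The promotion chain in step 5 can be pictured as a walk over adjacent environment pairs. The sketch below is illustrative only: the helper names and the `sha-*` tags are hypothetical, and it assumes each environment pins exactly one image tag, which the real release contract may not.

```python
from typing import Dict, List, Optional, Tuple

# Promotion order taken from the model above: dev -> staging -> prod.
PROMOTION_ORDER = ("dev", "staging", "prod")


def next_environment(current: str) -> Optional[str]:
    """Return the environment that `current` promotes into, or None for prod."""
    index = PROMOTION_ORDER.index(current)  # raises ValueError for unknown names
    if index + 1 < len(PROMOTION_ORDER):
        return PROMOTION_ORDER[index + 1]
    return None


def pending_promotions(tags: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """List (source, target, tag) for each adjacent pair whose tags differ."""
    pending = []
    for source, target in zip(PROMOTION_ORDER, PROMOTION_ORDER[1:]):
        if tags[source] != tags[target]:
            pending.append((source, target, tags[source]))
    return pending


# Hypothetical state: dev is one release ahead of staging and prod.
tags = {"dev": "sha-c3", "staging": "sha-b2", "prod": "sha-b2"}
print(next_environment("dev"))   # staging
print(pending_promotions(tags))  # [('dev', 'staging', 'sha-c3')]
```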

Required GitHub settings

App repo (anchor-msp) variables/secrets:

  • ENGINE_ID (variable)
  • OPS_REPO (variable, <org>/anchor-msp-ops)
  • STATUS_API_URL (secret)
  • STATUS_API_TOKEN (secret)
  • OPS_REPO_PAT (secret; repo write for ops repo PR creation)

Ops repo (anchor-msp-ops) variables/secrets:

  • ENGINE_ID (variable)
  • STATUS_API_URL (secret)
  • STATUS_API_TOKEN (secret)

Runtime profile (4GB cost target)

  • prod is always on, with minimum stateless replicas = 1.
  • dev and staging are on-demand in low-cost environments.
  • Stateful dependencies should use managed services (PostgreSQL/Redis/NATS-compatible).
  • Rollouts use a constrained strategy (maxSurge: 0, maxUnavailable: 1) to protect single-node capacity.
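
To make the trade-off of the constrained rollout strategy concrete, the sketch below computes the pod bounds Kubernetes allows during a rolling update. It assumes absolute integer values for maxSurge and maxUnavailable (percentages are rounded differently), and the function name `rollout_pod_bounds` is hypothetical.

```python
from typing import Tuple


def rollout_pod_bounds(
    replicas: int, max_surge: int = 0, max_unavailable: int = 1
) -> Tuple[int, int]:
    """Return (min_available, max_total) pods during a rolling update."""
    # At least replicas - maxUnavailable pods stay available.
    min_available = max(replicas - max_unavailable, 0)
    # At most replicas + maxSurge pods exist at any moment.
    max_total = replicas + max_surge
    return min_available, max_total


print(rollout_pod_bounds(1))  # (0, 1)
print(rollout_pod_bounds(2))  # (1, 2)
```

With maxSurge: 0 the rollout never asks the node for extra capacity, which is the point of the setting. The cost is that a service running a single replica can have no available pod while its replacement starts.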

Rollback

  • Revert the environment values PR so the overlay points at the prior image tag.
  • Argo CD automated sync reconciles the cluster to the previous known-good state, and self-heal keeps it there.
  • Emit a rollback.executed status event from the promotion pipeline.

Production No-Mock Cutover

  1. Ensure statusMock.enabled=false in environments/staging/values.yaml and environments/prod/values.yaml.
  2. Ensure staging/prod Argo applications are configured with:
    • syncPolicy.automated.prune=true
    • syncPolicy.automated.selfHeal=true
  3. Set production control-center endpoint in gateway env:
    • EGI_CONTROL_CENTER_URL must be https://... and must not include status-mock.
  4. Set production AgentField endpoint in gateway env:
    • AGENTFIELD_URL must be https://... and must not include agentfield-mock.
  5. Run a hard refresh followed by a sync with pruning enabled on the staging and prod Argo applications to remove stale mock resources.
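
The endpoint rules in steps 3 and 4 lend themselves to a small pre-deploy check. The following is a sketch under assumptions: the function name `endpoint_problems` and the example hostnames are hypothetical, and the gateway environment is assumed to be available as a plain dict.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Variable name -> substring that must not appear in its value.
FORBIDDEN_SUBSTRINGS = {
    "EGI_CONTROL_CENTER_URL": "status-mock",
    "AGENTFIELD_URL": "agentfield-mock",
}


def endpoint_problems(env: Dict[str, str]) -> List[str]:
    """Return violations of the no-mock endpoint rules for the gateway env."""
    problems = []
    for name, forbidden in FORBIDDEN_SUBSTRINGS.items():
        value = env.get(name, "")
        if urlparse(value).scheme != "https":
            problems.append(f"{name} must use https")
        if forbidden in value:
            problems.append(f"{name} must not include {forbidden}")
    return problems


# Hypothetical gateway env that still points AgentField at the mock.
gateway_env = {
    "EGI_CONTROL_CENTER_URL": "https://control.example.com",
    "AGENTFIELD_URL": "http://agentfield-mock:8080",
}
for problem in endpoint_problems(gateway_env):
    print(problem)
```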

Required secret keys (staging/prod)

  • DATABASE_URL
  • JWT_SIGNING_SECRET (required when AUTH_JWKS_URL is not used)
  • AUTH_JWKS_URL (optional, recommended for IdP-managed JWT verification)
  • AUDIT_SIGNING_SECRET
  • AUDIT_SIGNING_KEY_ID
  • RESOURCE_ENCRYPTION_KEY_B64
  • AGENTFIELD_API_KEY
  • AGENTFIELD_WEBHOOK_SECRET
  • ANCHOR_OPERATOR_JWT
  • Optional web guardrail overrides:
    • ANCHOR_REQUIRE_OPERATOR_JWT (defaults to required in production)
    • ANCHOR_ALLOW_WILDCARD_WORKSPACE_IDS (defaults to false in production)
  • Optional integration secrets: GITHUB_WEBHOOK_SECRET, QUICKBOOKS_API_TOKEN, XERO_API_TOKEN
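
The key list above includes one conditional rule: JWT_SIGNING_SECRET is only required when AUTH_JWKS_URL is not used. A minimal sketch of a check that encodes this is shown below. The function name `missing_secret_keys` is hypothetical, the secret values are placeholders, and the sketch ignores the optional guardrail overrides and integration secrets.

```python
from typing import Dict, List

# Keys listed above as unconditionally required for staging/prod.
REQUIRED_KEYS = (
    "DATABASE_URL",
    "AUDIT_SIGNING_SECRET",
    "AUDIT_SIGNING_KEY_ID",
    "RESOURCE_ENCRYPTION_KEY_B64",
    "AGENTFIELD_API_KEY",
    "AGENTFIELD_WEBHOOK_SECRET",
    "ANCHOR_OPERATOR_JWT",
)


def missing_secret_keys(secrets: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or empty in `secrets`."""
    missing = [key for key in REQUIRED_KEYS if not secrets.get(key)]
    # JWT verification needs either a JWKS URL or a signing secret.
    if not secrets.get("AUTH_JWKS_URL") and not secrets.get("JWT_SIGNING_SECRET"):
        missing.append("JWT_SIGNING_SECRET or AUTH_JWKS_URL")
    return missing


# Placeholder secret set with DATABASE_URL and both JWT options missing.
example_secrets = {key: "placeholder" for key in REQUIRED_KEYS if key != "DATABASE_URL"}
print(missing_secret_keys(example_secrets))
# ['DATABASE_URL', 'JWT_SIGNING_SECRET or AUTH_JWKS_URL']
```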