docs/telemetry.md
Telemetry and Endpoint Automation
Event schema types
- `HostHeartbeat`
- `PatchCompliance`
- `BackupFailure`
- `ServiceDegraded`
- `PipelineFailure`
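As a rough illustration, the event type names above could be modeled as a string union with a runtime guard. Only the five type names come from this document; the `TelemetryEvent` envelope, its `type` and `payload` fields, and the helper names are hypothetical, since the actual schema contract is not described here.

```typescript
// Event type names as listed in this document.
const EVENT_TYPES = [
  "HostHeartbeat",
  "PatchCompliance",
  "BackupFailure",
  "ServiceDegraded",
  "PipelineFailure",
] as const;

type EventType = (typeof EVENT_TYPES)[number];

// Hypothetical envelope: the real payload fields are not specified in this
// document, so the payload is kept as an opaque record.
interface TelemetryEvent {
  type: EventType;
  payload: Record<string, unknown>;
}

// Narrow an arbitrary string to a known event type.
function isEventType(value: string): value is EventType {
  return (EVENT_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Build an event only when the type name is recognized; otherwise return null
// so the caller can count it as a schema rejection.
function toTelemetryEvent(
  type: string,
  payload: Record<string, unknown>
): TelemetryEvent | null {
  return isEventType(type) ? { type, payload } : null;
}
```

A caller would typically drop or log any input for which `toTelemetryEvent` returns `null` rather than forwarding it.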
Ingest path
- Edge agent buffers telemetry locally.
- Agent flushes payload to `POST /api/v1/events/ingest` over HTTPS with:
  - `Authorization: Bearer <edge enrollment token>`
  - `x-workspace-id: <workspace uuid>`
  - `Idempotency-Key: <uuid>`
- Gateway validates JWT role/scope, idempotency key, and event schema contract.
- Routed actions are persisted to `outbox_events`.
- Outbox dispatcher forwards actions to AgentField reasoners.
- AgentField completion callbacks persist workflow and audit evidence records.
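The flush step in the ingest path can be sketched as a pure function that assembles the request without sending it. The path and the three headers come from the list above; the `IngestRequest` shape, the `buildIngestRequest` name, the `Content-Type` header, and the `{ events: [...] }` body layout are assumptions made for illustration.

```typescript
// Hypothetical description of an outgoing request; not a real client type.
interface IngestRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildIngestRequest(
  baseUrl: string,
  enrollmentToken: string,
  workspaceId: string,
  idempotencyKey: string,
  events: unknown[]
): IngestRequest {
  // The document states that the flush happens over HTTPS.
  if (baseUrl.indexOf("https://") !== 0) {
    throw new Error("ingest base URL must use https://");
  }
  return {
    method: "POST",
    // Strip trailing slashes so the path is not doubled up.
    url: baseUrl.replace(/\/+$/, "") + "/api/v1/events/ingest",
    headers: {
      Authorization: "Bearer " + enrollmentToken,
      "x-workspace-id": workspaceId,
      "Idempotency-Key": idempotencyKey,
      // Assumption: JSON body; the content type is not stated in this document.
      "Content-Type": "application/json",
    },
    // Assumption: the body layout is illustrative only.
    body: JSON.stringify({ events: events }),
  };
}
```

Keeping request construction separate from sending makes it easy to reuse the same idempotency key when a buffered batch is replayed.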
Note:
- mTLS is supported at ingress/network layer when enabled by platform infra policy.
- Gateway route auth contract is currently bearer JWT + workspace header.
Enrollment and visibility APIs
- `GET /api/v1/edge-agent/enrollment`: effective enrollment mode/policy metadata and ingest URL.
- `POST /api/v1/edge-agent/enrollment/rotate`: rotates workspace enrollment token with idempotency and returns token/expiry metadata.
- `GET /api/v1/edge-agent/ingest-status`: 24h ingest batch/event totals, schema rejection count, and outbox pending/failed counts.
Admin/Ops UI expectations
- Admin Settings contains enrollment rotation UX and token display window for secure handoff.
- Ops exposes ingest and routing health as first-class queue/latency context.
- Alert policy controls (mode, approval, retry bounds) are managed in structured settings forms, not raw JSON by default.
Operational closure flow:
- `alert.triggered` actions can be linked to tickets via `POST /api/v1/ops/alerts/{id}/link-ticket`.
- Tickets can be linked to runbooks/assets and escalated/resolved from Ops routes.
- Workflow retries/cancellations are exposed through workflow control interfaces.
Reliability controls
- Offline-safe replay from local buffer.
- Idempotent mutation semantics.
- Correlation IDs across ingest and workflow pipelines.
- Outbox retry/backoff with terminal `failed` status after max attempts.
- Operator retry controls via `/api/v1/agent-runtime/outbox-failures/{id}/retry`.
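The outbox retry behavior listed above can be illustrated with a small state-transition sketch. The exponential backoff curve, the constants `MAX_ATTEMPTS`, `BASE_DELAY_MS`, and `MAX_DELAY_MS`, the `pending` status name, and the `OutboxRow` fields are all assumptions; this document only states that retries back off and end in a terminal `failed` status after a maximum number of attempts.

```typescript
type OutboxStatus = "pending" | "failed";

// Hypothetical row shape; real column names are not given in this document.
interface OutboxRow {
  id: string;
  status: OutboxStatus;
  attempts: number;
  nextAttemptAtMs: number;
}

// Assumed tuning values for illustration only.
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 60000;

// Exponential backoff: 1s, 2s, 4s, ... capped at MAX_DELAY_MS.
function backoffDelayMs(attempts: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempts - 1), MAX_DELAY_MS);
}

// Record one failed dispatch attempt and return the updated row.
function recordFailure(row: OutboxRow, nowMs: number): OutboxRow {
  const attempts = row.attempts + 1;
  if (attempts >= MAX_ATTEMPTS) {
    // Terminal state: the dispatcher stops retrying automatically.
    return { ...row, status: "failed", attempts: attempts };
  }
  return {
    ...row,
    status: "pending",
    attempts: attempts,
    nextAttemptAtMs: nowMs + backoffDelayMs(attempts),
  };
}

// Operator retry: reset a terminal row so it is eligible for dispatch again.
function operatorRetry(row: OutboxRow, nowMs: number): OutboxRow {
  return { ...row, status: "pending", attempts: 0, nextAttemptAtMs: nowMs };
}
```

Capping the delay keeps a long-failing row from being scheduled arbitrarily far in the future before it reaches the terminal state.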
How metrics are measured
Telemetry and runtime metrics in Admin/Ops are measured from persisted database state:
- `events24h` / `batches24h`: aggregated from `telemetry_ingest_batches`.
- `pendingOutbox` / `failedOutbox`: aggregated from `outbox_events` by status.
- Worker runtime health: heartbeat recency and status in `worker_runtime_status`.
- Ops failure buckets: failed rows from `outbox_events`, `background_jobs`, and `workflow_executions`.
- Console/Portal KPIs (tickets, alerts, invoices): computed from PSA resource records.
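To make the aggregation concrete, here is an in-memory sketch of how the first two metric pairs could be derived from rows. The metric names come from the list above; the row fields (`receivedAtMs`, `eventCount`, `status`), the `pending` status value, and the function names are hypothetical, and a real implementation would presumably run these as database queries.

```typescript
// Hypothetical projections of the persisted tables named above.
interface IngestBatchRow {
  receivedAtMs: number;
  eventCount: number;
}

interface OutboxEventRow {
  status: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Count batches and events received within the last 24 hours.
function ingestTotals24h(
  rows: IngestBatchRow[],
  nowMs: number
): { batches24h: number; events24h: number } {
  const totals = { batches24h: 0, events24h: 0 };
  for (const row of rows) {
    if (row.receivedAtMs >= nowMs - DAY_MS) {
      totals.batches24h += 1;
      totals.events24h += row.eventCount;
    }
  }
  return totals;
}

// Group outbox rows by status; other statuses are ignored.
function outboxCounts(
  rows: OutboxEventRow[]
): { pendingOutbox: number; failedOutbox: number } {
  const counts = { pendingOutbox: 0, failedOutbox: 0 };
  for (const row of rows) {
    if (row.status === "pending") counts.pendingOutbox += 1;
    else if (row.status === "failed") counts.failedOutbox += 1;
  }
  return counts;
}
```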
How a system hooks into Anchor
- In `Admin Settings`, rotate edge enrollment token for the target workspace.
- Install/configure endpoint agent with:
  - API base URL
  - workspace ID
  - rotated enrollment token
- Agent pulls `GET /api/v1/edge-agent/policy` and starts collection loops.
- Agent posts batches to `POST /api/v1/events/ingest` with idempotency keys.
- Verify ingest and routing in UI:
  - `Admin Settings` -> Telemetry Enrollment stats
  - `Ops` -> Failed Outbox/Jobs and runtime health
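On the agent side, posting batches with idempotency keys pairs naturally with a local buffer that replays unsent batches. The sketch below is a guess at one way to do that, not the agent's actual design: `LocalBuffer`, `BufferedBatch`, and the synchronous `SendFn` callback are hypothetical, and a real agent would send asynchronously over HTTPS.

```typescript
// Hypothetical buffered batch: the key is fixed when the batch is created so
// that a replay after an outage reuses it and the gateway can deduplicate.
interface BufferedBatch {
  idempotencyKey: string;
  events: unknown[];
}

// Returns true when the gateway accepted the batch. Synchronous for brevity.
type SendFn = (batch: BufferedBatch) => boolean;

class LocalBuffer {
  private queue: BufferedBatch[] = [];

  enqueue(idempotencyKey: string, events: unknown[]): void {
    this.queue.push({ idempotencyKey: idempotencyKey, events: events });
  }

  size(): number {
    return this.queue.length;
  }

  // Send batches in order; stop at the first failure and keep the rest
  // buffered so the next flush replays them with the same keys.
  flush(send: SendFn): number {
    let sent = 0;
    while (this.queue.length > 0) {
      const batch = this.queue[0];
      let ok = false;
      try {
        ok = send(batch);
      } catch (e) {
        ok = false; // treat a thrown error like a rejected send
      }
      if (!ok) break;
      this.queue.shift();
      sent += 1;
    }
    return sent;
  }
}
```

Stopping at the first failure preserves ordering, at the cost of one stuck batch delaying everything behind it.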
Control-center telemetry integration
- Gateway telemetry client uses `@egintegrations/telemetry`.
- Required env vars:
  - `ENGINE_ID`
  - `ENGINE_SKU`
  - `EGI_CONTROL_CENTER_URL`
  - `EGI_TELEMETRY_TOKEN` (optional if control center allows anonymous writes)
  - `EGI_TELEMETRY_ENABLED`
- Runtime health compatibility endpoint: `GET /.well-known/engine-status`
Production note:
- Staging/production enforce explicit control-center URLs and reject `status-mock` endpoints.
- Staging/production require `EGI_CONTROL_CENTER_URL` to use `https://`.
- Run Argo prune sync after disabling `statusMock` so stale mock resources are removed.
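The staging/production rules above could be checked at startup with something like the following sketch. The `Environment` values, the function name, and the substring test for `status-mock` are assumptions; how the gateway actually detects mock endpoints is not described here.

```typescript
// Assumed environment names for illustration.
type Environment = "local" | "staging" | "production";

// Return a list of violations for the EGI_CONTROL_CENTER_URL value.
// An empty list means the value is acceptable for the given environment.
function validateControlCenterUrl(
  env: Environment,
  url: string | undefined
): string[] {
  const errors: string[] = [];
  if (env === "local") {
    // Local/dev may point at the status mock, so nothing is enforced.
    return errors;
  }
  if (!url) {
    errors.push("EGI_CONTROL_CENTER_URL must be set explicitly");
    return errors;
  }
  if (url.indexOf("https://") !== 0) {
    errors.push("EGI_CONTROL_CENTER_URL must use https://");
  }
  // Assumption: a mock endpoint is recognized by "status-mock" in the URL.
  if (url.indexOf("status-mock") !== -1) {
    errors.push("status-mock endpoints are rejected in " + env);
  }
  return errors;
}
```

Returning all violations at once, instead of throwing on the first, lets a deploy log show every misconfiguration in a single pass.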
Local/dev note:
- `apps/status-mock` remains available for local telemetry/status contract tests.