Observability
Request correlation, error envelope, health endpoints, metrics, and how to triage a production incident.
Hatched bakes three primitives into every request so that a log line, an error returned to the client, and a metric emitted to Prometheus can always be correlated.
Request correlation
- apps/api/src/common/interceptors/request-id.interceptor.ts either honors an incoming X-Request-Id header or generates a UUID v4.
- The id is:
  - stored on request.requestId for downstream handlers,
  - echoed back on the response as the X-Request-Id header,
  - included in every log line produced by LoggingInterceptor,
  - surfaced to the client as error.requestId inside the canonical error envelope whenever an exception reaches GlobalExceptionFilter.
- SDK clients (@hatched/sdk-js) expose it as HatchedError.requestId so downstream consumers can paste it directly into a support ticket or log search.
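
The honor-or-generate rule above can be sketched as a small framework-free function. This is an illustration only, not the code in request-id.interceptor.ts: the helper name resolveRequestId and the choice to accept any non-empty incoming value are assumptions.

```typescript
import { randomUUID } from "node:crypto";

// Header values as Node exposes them: a string, a list of strings, or absent.
type HeaderValue = string | string[] | undefined;

// Hypothetical helper, not the actual interceptor: picks the id for one request.
function resolveRequestId(headers: Record<string, HeaderValue>): string {
  // Node lower-cases incoming header names, so look up the lower-case key.
  const incoming = headers["x-request-id"];
  const candidate = Array.isArray(incoming) ? incoming[0] : incoming;

  // Assumption: any non-empty value is honored as-is; the real interceptor
  // may validate or length-limit it.
  if (candidate !== undefined && candidate.trim() !== "") {
    return candidate.trim();
  }

  // No usable header: generate a fresh UUID v4.
  return randomUUID();
}
```

In an interceptor, the returned value would then be stored on request.requestId and written back as the X-Request-Id response header, as the list above describes.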
Error envelope
Every HTTP error (HatchedException, HttpException, or an unexpected
exception) is serialized by GlobalExceptionFilter into:
{
  "error": {
    "code": "stable_snake_case_code",
    "message": "Human-readable message",
    "details": { "_": "optional structured context" },
    "requestId": "uuid-matching-X-Request-Id-header"
  }
}

See apps/api/src/common/exceptions/hatched.exception.ts for the typed
exception hierarchy. Prefer throwing a specific subclass
(ResourceNotFoundException, AuthException, RateLimitException,
ValidationException, UpstreamImageException,
ConfigVersionMismatchException) over HttpException so the envelope carries
a stable code.
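
To illustrate why a typed exception yields a stable code, here is a minimal sketch of the envelope mapping. CodedError is a hypothetical stand-in for the real hierarchy in hatched.exception.ts, and the fallback code internal_error for unexpected exceptions is an assumption; the actual filter may behave differently.

```typescript
// Shape of the envelope described above.
interface ErrorEnvelope {
  error: {
    code: string;
    message: string;
    details?: Record<string, unknown>;
    requestId: string;
  };
}

// Hypothetical stand-in for the typed hierarchy; real class names and codes may differ.
class CodedError extends Error {
  readonly code: string;
  readonly details: Record<string, unknown> | undefined;

  constructor(code: string, message: string, details?: Record<string, unknown>) {
    super(message);
    this.code = code;
    this.details = details;
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Assumption: anything that is not a CodedError gets a generic code and
// message so internal details are not leaked to the client.
function toEnvelope(err: unknown, requestId: string): ErrorEnvelope {
  if (err instanceof CodedError) {
    return {
      error: {
        code: err.code,
        message: err.message,
        ...(err.details ? { details: err.details } : {}),
        requestId,
      },
    };
  }
  return {
    error: { code: "internal_error", message: "Internal server error", requestId },
  };
}
```

A client can branch on error.code without parsing the message, which is the practical benefit of throwing a specific subclass.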
Health endpoints
| Endpoint | Status codes | Consumer |
|---|---|---|
| GET /health | 200 | Human-readable status dashboard |
| GET /health/ready | 200 when all deps up, 503 otherwise | Load balancer / Fly.io readiness probe |
| GET /health/live | 200 as long as the process is alive | Load balancer liveness probe |
/health/ready checks Postgres, Redis, BullMQ wait/active depths, and the
primary image provider. A 503 response removes the instance from rotation.
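
The all-up-or-503 rule can be sketched as an aggregation over named checks. The Check type and the readiness function are hypothetical names for illustration; how the real endpoint probes each dependency is not shown here.

```typescript
// Hypothetical dependency check: resolves true when the dependency is healthy.
type Check = () => Promise<boolean>;

interface ReadinessResult {
  status: 200 | 503;
  checks: Record<string, "up" | "down">;
}

// A check that throws or rejects is treated as down rather than crashing the probe.
async function runCheck(check: Check): Promise<boolean> {
  try {
    return await check();
  } catch {
    return false;
  }
}

// Runs every check concurrently and reports 200 only when all are up.
async function readiness(checks: Record<string, Check>): Promise<ReadinessResult> {
  const entries = Object.entries(checks);
  const results = await Promise.all(entries.map(([, check]) => runCheck(check)));

  const report: Record<string, "up" | "down"> = {};
  entries.forEach(([name], i) => {
    report[name] = results[i] ? "up" : "down";
  });

  const allUp = results.every(Boolean);
  return { status: allUp ? 200 : 503, checks: report };
}
```

In this sketch, the Postgres, Redis, BullMQ depth, and image provider probes would each be passed in as one named Check.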
Metrics
GET /metrics emits Prometheus text-format counters and gauges. It is protected
by an X-Internal-Service-Token header that must match the INTERNAL_SERVICE_TOKEN
env var: requests without the token receive 401, or 403 when
INTERNAL_SERVICE_TOKEN is not configured. Never expose this endpoint publicly;
scrape it from a trusted network or a Prometheus instance that can attach the header.
Logs
All HTTP access logs are emitted as single-line JSON by LoggingInterceptor
with shape:
{
  "requestId": "...",
  "method": "POST",
  "path": "/api/v1/events",
  "statusCode": 200,
  "duration": 42,
  "ip": "...",
  "userAgent": "..."
}

Error logs additionally carry an error field with the exception message.
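
For consumers that parse these lines, the shape can be written down as a type. This is a sketch derived from the example above, not the interceptor's own definition; formatAccessLog is a hypothetical helper, and the unit of duration is assumed to be milliseconds since it is not stated.

```typescript
// Access log entry as shown in the example above.
interface AccessLog {
  requestId: string;
  method: string;
  path: string;
  statusCode: number;
  duration: number; // assumption: milliseconds
  ip: string;
  userAgent: string;
  error?: string; // present on error logs only
}

// Serializes one entry as a single line of JSON.
function formatAccessLog(entry: AccessLog): string {
  // JSON.stringify without an indent argument escapes newlines inside
  // strings and emits no line breaks of its own.
  return JSON.stringify(entry);
}
```

Keeping each entry on one line is what lets line-oriented tools such as grep match a whole request by its id.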
Diagnosing a production incident
- Grab the X-Request-Id the client received (or the requestId inside the error envelope / HatchedError).
- Grep logs for that id: you will find the access log, any error stack, and any downstream service calls that forwarded the id.
- Cross-reference with /metrics via Prometheus to see whether the request was part of a broader spike (check hatched_http_requests_total and queue depth gauges).
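
When logs have been exported to a file, the grep step can also be done programmatically. The sketch below uses a hypothetical findByRequestId helper and matches only JSON lines with a requestId field, so plain-text stack traces are skipped; a text search remains the simpler way to catch those.

```typescript
type LogRecord = Record<string, unknown>;

// Returns the parsed JSON log lines whose requestId equals the given id.
function findByRequestId(lines: string[], requestId: string): LogRecord[] {
  const matches: LogRecord[] = [];
  for (const line of lines) {
    try {
      const parsed: unknown = JSON.parse(line);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        (parsed as LogRecord)["requestId"] === requestId
      ) {
        matches.push(parsed as LogRecord);
      }
    } catch {
      // Not JSON (for example a stack trace line): skip it.
    }
  }
  return matches;
}
```

The returned records keep their original order, so the access log and any error entry for the same request can be read as a timeline.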