Monitoring

This guide covers monitoring and observability for the TradeX platform.

Metrics

Prometheus

All services expose Prometheus metrics at /metrics:

Service health metrics
Request latency
Error rates
Business metrics

Key Metrics

Request Rate: Requests per second
Latency: P50, P95, and P99 request latency
Error Rate: Percentage of requests that fail
Kafka Lag: Consumer lag per topic
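The arithmetic behind these key metrics can be sketched in a few functions. In production they would typically be computed by Prometheus queries over exported counters and histograms (for example with `histogram_quantile`, which approximates percentiles from buckets); this standalone version uses exact nearest-rank percentiles on raw samples instead. The topic names and the treatment of partitions without a committed offset are assumptions for illustration.

```python
"""Sketch of how the key metrics could be derived from raw observations."""
import math
from typing import Dict, List, Tuple


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def request_rate(request_count: int, window_seconds: float) -> float:
    """Requests per second over the observation window."""
    return request_count / window_seconds


def error_rate(error_count: int, request_count: int) -> float:
    """Percentage of requests that failed (0 when there were no requests)."""
    return 100.0 * error_count / request_count if request_count else 0.0


def kafka_lag_per_topic(
    end_offsets: Dict[Tuple[str, int], int],
    committed_offsets: Dict[Tuple[str, int], int],
) -> Dict[str, int]:
    """Sum of (log-end offset - committed offset) over each topic's partitions.

    Both dicts are keyed by (topic, partition). ASSUMPTION: a partition with no
    committed offset is treated as unconsumed from offset 0.
    """
    lag: Dict[str, int] = {}
    for (topic, partition), end in end_offsets.items():
        committed = committed_offsets.get((topic, partition), 0)
        lag[topic] = lag.get(topic, 0) + max(end - committed, 0)
    return lag
```

For example, with hypothetical `orders` and `trades` topics, a partition whose log-end offset is 120 and whose committed offset is 100 contributes a lag of 20 messages to `orders`.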

Logging

Structured Logging

All services use structured logging with:

Service name
Timestamp
Log level
Context information
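A structured log entry with these fields can be produced with Python's standard `logging` module and a custom formatter that emits one JSON object per line. This is a minimal sketch: the `JsonFormatter` and `build_logger` names, the JSON field names, and the `order-service` service name are hypothetical, and TradeX services may use a different logging library or schema.

```python
import json
import logging
import sys
from datetime import datetime, timezone
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object (hypothetical schema)."""

    def __init__(self, service_name: str) -> None:
        super().__init__()
        self.service_name = service_name

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "service": self.service_name,
            # record.created is a Unix timestamp; emit it as ISO 8601 in UTC.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Context passed via extra={"context": {...}} is kept as a nested object
        # so it cannot overwrite the standard fields above.
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            entry["context"] = context
        return json.dumps(entry)


def build_logger(service_name: str, stream: TextIO = sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines for the given service."""
    logger = logging.getLogger(service_name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service_name))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


if __name__ == "__main__":
    log = build_logger("order-service")
    log.info("order accepted", extra={"context": {"order_id": "o-123"}})
```

Keeping context in a nested object makes it easy for the aggregation system to index request-specific fields without colliding with the service, timestamp, and level fields.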

Log Aggregation

Logs are aggregated in a centralized logging system for:

Search and analysis
Alerting
Debugging
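Because entries are structured, searching aggregated logs reduces to filtering on fields rather than matching free text. The sketch below assumes each aggregated line is a JSON object with `service`, `level`, `message`, and an optional `context` object; that schema, the `search_logs` function, and the service names in the usage note are hypothetical, and a real centralized logging system would provide its own query language.

```python
import json
from typing import Iterable, Iterator, Optional

# Standard Python logging level names mapped to their numeric severities.
LEVEL_ORDER = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def search_logs(
    lines: Iterable[str],
    service: Optional[str] = None,
    min_level: str = "DEBUG",
    **context: str,
) -> Iterator[dict]:
    """Yield entries matching the service, minimum level, and all context fields."""
    threshold = LEVEL_ORDER[min_level]
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        if service is not None and entry.get("service") != service:
            continue
        if LEVEL_ORDER.get(entry.get("level", ""), 0) < threshold:
            continue
        entry_context = entry.get("context")
        if not isinstance(entry_context, dict):
            entry_context = {}
        if all(entry_context.get(key) == value for key, value in context.items()):
            yield entry
```

For debugging a single request, a call such as `search_logs(lines, service="order-service", order_id="o-2")` would return every entry for that order, while `min_level="ERROR"` narrows the results to failures across all services.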

Tracing

OpenTelemetry

OpenTelemetry provides distributed tracing across services for:

Request correlation
Span tracking
Performance analysis
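Request correlation works by propagating a trace context between services in request headers. OpenTelemetry supports the W3C Trace Context `traceparent` header (and uses it in its default propagator configuration); its version `00` form is `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`. The sketch below parses and forwards that header with the standard library to show how each hop keeps the trace ID and creates a new span ID. It is illustrative only: the `SpanContext` class and helper names are hypothetical, the root-span sampling choice is an assumption, and real services should rely on the OpenTelemetry SDK's propagators.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# Version 00 layout: version-traceid-parentid-flags, lowercase hex only.
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # shared by every span in one request
    span_id: str   # unique per span
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the incoming span context, or None if the header is invalid."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if not match:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C Trace Context spec.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: SpanContext) -> str:
    """Serialize a span context for the outgoing request header."""
    return f"00-{ctx.trace_id}-{ctx.span_id}-{'01' if ctx.sampled else '00'}"


def start_child_span(parent: Optional[SpanContext]) -> SpanContext:
    """Keep the parent's trace ID; start a new trace if there is no parent."""
    if parent is None:
        # ASSUMPTION: root spans are always sampled in this sketch.
        return SpanContext(secrets.token_hex(16), secrets.token_hex(8), sampled=True)
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

Because every service reuses the incoming trace ID, a tracing backend can join spans from different services into one timeline, which is what enables span tracking and per-hop performance analysis.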

Dashboards

Grafana

Pre-configured dashboards for:

Service health
Infrastructure metrics
Business metrics

Alerts

Alert Rules

Configure alerts for:

High error rates
High latency
Service downtime
Resource exhaustion
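Alert rules of this kind usually combine a threshold with a duration, so that a brief spike does not page anyone. Prometheus models this with a rule's `for` clause and the alert states inactive, pending, and firing. The sketch below reproduces that state logic in plain Python; the `AlertRule` class, metric names, and thresholds are hypothetical placeholders, not TradeX's configured values.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AlertRule:
    name: str
    metric: str
    threshold: float
    for_seconds: float  # condition must hold continuously this long before firing
    above: bool = True  # fire when value > threshold; if False, when value < threshold
    _pending_since: Optional[float] = field(default=None, init=False, repr=False)

    def evaluate(self, metrics: Dict[str, float], now: float) -> str:
        """Return "inactive", "pending", or "firing" for the current sample."""
        value = metrics.get(self.metric)
        if value is None:
            breached = False  # ASSUMPTION: missing data does not trigger the alert
        else:
            breached = value > self.threshold if self.above else value < self.threshold
        if not breached:
            self._pending_since = None  # any recovery resets the timer
            return "inactive"
        if self._pending_since is None:
            self._pending_since = now
        return "firing" if now - self._pending_since >= self.for_seconds else "pending"


def default_rules() -> list:
    """Hypothetical rules, one per alert category above."""
    return [
        AlertRule("HighErrorRate", "error_rate_percent", threshold=5.0, for_seconds=300),
        AlertRule("HighLatencyP99", "latency_p99_seconds", threshold=1.0, for_seconds=300),
        AlertRule("ServiceDown", "up", threshold=1.0, for_seconds=60, above=False),
        AlertRule("DiskAlmostFull", "disk_used_percent", threshold=90.0, for_seconds=600),
    ]
```

The `ServiceDown` rule uses `above=False` because Prometheus records `up` as 1 for a successful scrape and 0 for a failed one.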