Infrastructure

This document describes the infrastructure components and architecture of the TradeX platform.

Purpose: Primary database for document storage

Used By:

  • Backend Service (user data, preferences)
  • User Service (profiles, preferences)
  • Matching Engine (trade history, order history)

Characteristics:

  • Document-based storage
  • Flexible schema
  • Horizontal scaling
  • Replication support

Purpose: Relational data storage

Used By:

  • Auth Service (user authentication, sessions)
  • User Service (KYC data, account settings)
  • Wallet Service (balances, transactions)
  • Metadata Service (instruments, configuration)
  • Market Data Service (trades, candles, order books)

Characteristics:

  • ACID compliance
  • Strong consistency
  • Complex queries
  • Transactions
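The ACID and transaction guarantees matter most for balance updates. Below is a minimal sketch of an atomic balance transfer of the kind the Wallet Service might perform. It uses Python's built-in `sqlite3` as a stand-in for the production relational database. The `balances`/`transactions` table layouts and the `transfer`, `balance`, and `make_demo_db` helpers are hypothetical, not TradeX's actual schema or code.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical wallet tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL);
        CREATE TABLE transactions (
            id INTEGER PRIMARY KEY,
            src TEXT NOT NULL, dst TEXT NOT NULL, amount INTEGER NOT NULL
        );
        INSERT INTO balances VALUES ('alice', 100), ('bob', 0);
        """
    )
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` (integer minor units) from src to dst atomically.

    `with conn:` commits if the block succeeds and rolls back if it raises,
    so a failed transfer leaves every balance and the ledger untouched.
    """
    with conn:
        # Debit only if the source exists and has enough funds.
        cur = conn.execute(
            "UPDATE balances SET amount = amount - ? WHERE account = ? AND amount >= ?",
            (amount, src, amount),
        )
        if cur.rowcount != 1:
            raise ValueError("insufficient funds or unknown source account")
        cur = conn.execute(
            "UPDATE balances SET amount = amount + ? WHERE account = ?",
            (amount, dst),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown destination account")  # triggers rollback of the debit
        # Record the transfer in the same transaction as the balance changes.
        conn.execute(
            "INSERT INTO transactions (src, dst, amount) VALUES (?, ?, ?)",
            (src, dst, amount),
        )

def balance(conn: sqlite3.Connection, account: str) -> int:
    return conn.execute(
        "SELECT amount FROM balances WHERE account = ?", (account,)
    ).fetchone()[0]

if __name__ == "__main__":
    db = make_demo_db()
    transfer(db, "alice", "bob", 30)
    print(balance(db, "alice"), balance(db, "bob"))
```

Both updates and the ledger insert run in one transaction. A failure at any step, such as an unknown destination account, therefore rolls back the debit as well.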

Purpose: Time-series data storage

Used By:

  • Market Data Service (OHLCV candles, trades)

Characteristics:

  • Optimized for time-series data
  • Automatic data retention policies
  • Compression
  • Continuous aggregates
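Continuous aggregates precompute rollups such as the OHLCV candles used by the Market Data Service. The sketch below shows, in plain Python, the bucketing that such an aggregate performs. It illustrates the computation only, not how the time-series database executes it. The `(timestamp, price, quantity)` trade tuple and the `build_candles` name are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass
class Candle:
    start: int      # bucket start, epoch seconds
    open: float
    high: float
    low: float
    close: float
    volume: float

def build_candles(trades: Iterable[Tuple[int, float, float]], interval: int) -> List[Candle]:
    """Aggregate (ts, price, qty) trades into fixed-width OHLCV buckets."""
    candles = {}
    # Sort by timestamp so "open" and "close" are the first and last trade per bucket.
    for ts, price, qty in sorted(trades, key=lambda t: t[0]):
        start = ts - ts % interval  # floor the timestamp to its bucket boundary
        c = candles.get(start)
        if c is None:
            candles[start] = Candle(start, price, price, price, price, qty)
        else:
            c.high = max(c.high, price)
            c.low = min(c.low, price)
            c.close = price
            c.volume += qty
    return [candles[k] for k in sorted(candles)]

if __name__ == "__main__":
    trades = [(0, 10.0, 1.0), (30, 12.0, 2.0), (59, 9.0, 1.0), (60, 11.0, 5.0)]
    for candle in build_candles(trades, interval=60):
        print(candle)
```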

Purpose: Analytics and historical data

Used By:

  • Market Data Service (historical trades, analytics)

Characteristics:

  • Columnar storage
  • High compression
  • Fast analytical queries
  • Horizontal scaling
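Columnar storage and high compression are linked. Each column's values are stored contiguously, so an aggregate reads only the columns it needs. Sorted, low-cardinality columns also compress well. The sketch below illustrates both ideas with hypothetical trade rows and simple run-length encoding. Production columnar engines use more elaborate codecs than this.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row dicts (all with the same keys) into one list per column."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

if __name__ == "__main__":
    # Hypothetical rows, sorted by symbol as a columnar store might sort them.
    trades = [
        {"symbol": "BTC-USD", "price": 64000.0, "qty": 0.5},
        {"symbol": "BTC-USD", "price": 64010.0, "qty": 0.25},
        {"symbol": "BTC-USD", "price": 63990.0, "qty": 1.0},
        {"symbol": "ETH-USD", "price": 3100.0, "qty": 2.0},
        {"symbol": "ETH-USD", "price": 3105.0, "qty": 4.0},
    ]
    cols = to_columns(trades)
    # An aggregate over qty touches only the qty column, not whole rows.
    print(sum(cols["qty"]))
    # The sorted symbol column shrinks to one entry per distinct run.
    print(run_length_encode(cols["symbol"]))
```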

Purpose: Caching and session management

Used By:

  • All services (caching)
  • Market Data Service (order book caching)
  • Metadata Service (configuration caching)
  • Auth Service (session storage)

Characteristics:

  • In-memory storage
  • Sub-millisecond latency
  • Pub/sub support
  • Persistence options
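A common way for services to use this cache is the cache-aside pattern. A read checks the cache first. On a miss, the service loads the value from the source of truth and stores it with a time-to-live (TTL). This sketch uses a hypothetical in-process `TTLCache` class as a stand-in for the real cache client, with an injectable clock so expiry can be tested. It does not model pub/sub or persistence.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

class TTLCache:
    """In-process stand-in for a key/value cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry on read
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)

def get_or_load(cache: TTLCache, key: str, loader: Callable[[str], Any], ttl: float) -> Any:
    """Cache-aside read. Note: a loader result of None is never cached here."""
    value = cache.get(key)
    if value is None:
        value = loader(key)        # miss: fetch from the source of truth
        cache.set(key, value, ttl)
    return value
```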

Purpose: Event streaming and messaging

Configuration:

  • Brokers: 3-node cluster (kafka-1, kafka-2, kafka-3)
  • Replication: 3x replication factor
  • Partitions: 3 partitions per topic (default)
  • Schema Registry: Confluent Schema Registry

Topics:

  • engine.event.v1 - Matching engine events
  • engine.snapshot.v1 - Order book snapshots
  • md.instrument.* - Metadata events
  • wallet.* - Wallet events
  • auth.* - Authentication events
  • user.* - User events

Characteristics:

  • High throughput
  • Ordering guaranteed within a partition (not across partitions)
  • At-least-once delivery
  • Schema evolution support
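Two of the characteristics above shape how producers and consumers are written. First, ordering holds only within a partition. Events that must stay ordered, such as all events for one instrument, should therefore share a key that maps to a single partition. Second, at-least-once delivery means a consumer can receive the same event twice, so handlers should be idempotent.

The sketch below is a minimal in-memory illustration of both points. The CRC32 hash is a stand-in, since Kafka's default partitioner uses murmur2. The event IDs, `produce`, and `IdempotentConsumer` are hypothetical. A production consumer would persist processed IDs together with its side effects rather than keep them in memory.

```python
import zlib
from typing import Any, List, Tuple

NUM_PARTITIONS = 3  # the documented default partition count per topic

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash.

    Illustrative only: Kafka's default partitioner uses murmur2, not CRC32.
    The property that matters is the same: equal keys land on equal partitions.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def produce(log: List[List[Tuple[str, Any]]], key: str, value: Any) -> None:
    """Append to an in-memory stand-in for a topic (one list per partition)."""
    log[partition_for(key, len(log))].append((key, value))

class IdempotentConsumer:
    """Applies each event at most once, even if it is redelivered."""

    def __init__(self) -> None:
        self.seen: set = set()         # processed event IDs (in memory for the sketch)
        self.applied: List[Any] = []   # stands in for the handler's side effects

    def handle(self, event_id: str, payload: Any) -> bool:
        if event_id in self.seen:
            return False  # duplicate from an at-least-once redelivery: skip it
        self.applied.append(payload)
        self.seen.add(event_id)
        return True

if __name__ == "__main__":
    topic = [[] for _ in range(NUM_PARTITIONS)]
    for i, key in enumerate(["BTC-USD", "ETH-USD", "BTC-USD", "BTC-USD"]):
        produce(topic, key, i)
    # All BTC-USD events sit in one partition in production order (other keys may share it).
    print(topic[partition_for("BTC-USD")])
```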

Each service is:

  • Independent: Can be deployed independently
  • Scalable: Supports horizontal scaling
  • Resilient: Designed to tolerate faults
  • Observable: Exposes metrics, logs, and traces

Communication:

  • REST: External APIs
  • gRPC: Internal APIs
  • Kafka: Event-driven communication
  • WebSocket: Real-time client updates

Purpose: Metrics collection and storage

Metrics:

  • Service metrics (latency, throughput, errors)
  • Infrastructure metrics (CPU, memory, disk)
  • Business metrics (orders, trades, users)
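Latency is usually recorded as a histogram rather than an average, so that percentiles can be estimated later. This sketch assumes a Prometheus-style collector; the document does not name the metrics backend. In that style, a histogram keeps a cumulative count per upper bound (`le`), plus a running sum and count. The `LatencyHistogram` class and its bucket bounds are hypothetical. Real services would use their metrics client library instead.

```python
import bisect
from typing import List, Tuple

class LatencyHistogram:
    """Cumulative-bucket histogram: each bucket counts observations <= its bound."""

    def __init__(self, bounds: List[float]) -> None:
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # First bound >= value is the smallest bucket whose "le" covers it.
        idx = bisect.bisect_left(self.bounds, value)
        self.counts[idx] += 1
        self.sum += value
        self.count += 1

    def cumulative(self) -> List[Tuple[float, int]]:
        """(upper_bound, observations <= bound) pairs, ending with +Inf == count."""
        out, running = [], 0
        for bound, n in zip(self.bounds + [float("inf")], self.counts):
            running += n
            out.append((bound, running))
        return out

if __name__ == "__main__":
    h = LatencyHistogram([0.01, 0.1, 1.0])  # seconds; hypothetical bucket bounds
    for seconds in (0.005, 0.01, 0.05, 2.0):
        h.observe(seconds)
    print(h.cumulative(), h.sum, h.count)
```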

Purpose: Metrics visualization and dashboards

Dashboards:

  • Service health dashboards
  • Infrastructure dashboards
  • Business metrics dashboards

Purpose: Distributed tracing

Features:

  • Request tracing across services
  • Span correlation
  • Performance analysis
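Span correlation across services depends on propagating a trace context with each request. This sketch assumes the W3C Trace Context `traceparent` header; the document does not state which propagation format is used. Each hop keeps the incoming trace ID and mints a new span ID for its outgoing calls. The helper names are hypothetical, and only header version `00` is handled.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version-trace_id-parent_span_id-trace_flags, all lowercase hex
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    flags: str

def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the parsed context, or None if the header is malformed."""
    m = _TRACEPARENT.match(header.strip())
    if m is None:
        return None
    trace_id, span_id, flags = m.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return SpanContext(trace_id, span_id, flags)

def child_traceparent(incoming: Optional[str]) -> str:
    """Header for a downstream call: same trace ID, fresh span ID."""
    parent = parse_traceparent(incoming) if incoming else None
    trace_id = parent.trace_id if parent else secrets.token_hex(16)  # start a new trace
    flags = parent.flags if parent else "01"  # assumption: sample new root traces
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"

if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print(child_traceparent(incoming))
```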

Purpose: Structured logging

Features:

  • Centralized logging
  • Log aggregation
  • Search and analysis

Deployment:

  • Docker: Container runtime
  • Docker Compose: Local development
  • Kubernetes: Production deployment (optional)

Service discovery:

  • DNS-based: Service discovery via DNS
  • Environment variables: Service URLs
  • Service mesh: Optional (Istio, Linkerd)

Load balancing:

  • Gateway: API gateway for external traffic
  • Service mesh: Internal load balancing
  • Kafka: Partition-based load balancing

Authentication:

  • JWT Tokens: Stateless authentication
  • API Keys: Service-to-service authentication
  • mTLS: Mutual TLS for gRPC (optional)

Authorization:

  • RBAC: Role-based access control
  • Service-level permissions: Per-service permissions
  • Resource-level permissions: Per-resource permissions

Network security:

  • TLS: Encryption in transit
  • Network policies: Service-to-service restrictions
  • Firewall rules: External access restrictions

Scalability:

  • Stateless services: Easy horizontal scaling
  • Database sharding: For high-volume data
  • Kafka partitioning: Parallel processing

Resource management:

  • Resource limits: Per-service resource limits
  • Auto-scaling: Based on metrics
  • Resource optimization: Efficient resource usage

High availability:

  • Database replication: Master-replica setup
  • Kafka replication: 3x replication factor
  • Service replication: Multiple service instances

Fault tolerance:

  • Automatic failover: Database failover
  • Service restart: Automatic service restart
  • Circuit breakers: Stop calls to failing dependencies

Backup:

  • Regular backups: Automated backups
  • Point-in-time recovery: Time-based recovery
  • Backup storage: Off-site backup storage

Disaster recovery:

  • Recovery procedures: Documented procedures
  • Recovery testing: Regular testing
  • RTO/RPO: Recovery time objective and recovery point objective
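The circuit-breaker item above names a pattern without describing it. Below is a minimal sketch of the usual closed/open/half-open state machine. The thresholds, the `CircuitBreaker` class, and the injectable clock are assumptions for illustration, and the class is not thread-safe.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""

class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; once `reset_timeout`
    seconds pass, it goes half-open and lets a trial call through."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None while closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the breaker.
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the breaker and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the breaker is open, callers fail fast instead of waiting on a dependency that is already failing. Once `reset_timeout` has elapsed, the next call acts as a trial. If it succeeds, the breaker closes; if it fails, the breaker reopens and the timer restarts.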