Chapter 0.2 — System Architecture

chapter: 00-introduction/02-system-architecture
version: 1.0.0
status: stable
last_reviewed: 2026-05-26
owners: [platform-engineering]

1. Purpose

This chapter is the architectural reference for the platform. It explains how the moving parts fit together physically and logically, where the trust boundaries are, and how a request flows from a browser or API client all the way to a persisted journal entry.

2. Architectural style

travoBooks is a modular monolith with strictly bounded modules, deployed as a small fleet of processes. We chose this over microservices for three reasons:

  1. Transactional integrity. The accounting promise — "an operational change and its journal entry are inseparable" — is enforced inside a single ACID database transaction. Distributing that across services creates a class of consistency bugs we refuse to ship.
  2. Domain coupling is high. Booking, ticketing, invoicing, ledger, and commission are all variations of the same event. Splitting them into services would replace cheap in-process calls with chatty inter-service network traffic.
  3. Operational simplicity for the partner. A travel agency in Dhaka cannot operate a 30-service Kubernetes mesh. The deployment surface must remain understandable.

Inside the monolith, modules follow strict import rules (Layer 5 may import Layer 4, but never the reverse), which keep the codebase extractable into services when scale demands it.
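
The import rule can be checked mechanically, for example as a CI step. The following is a minimal Python sketch. It assumes that each top-level package belongs to exactly one numbered layer and that a package may import only from its own layer or from lower-numbered layers. The package names and layer numbers are hypothetical.

```python
import ast
from typing import List, Set

# Hypothetical package-to-layer map; higher numbers sit higher in the stack.
LAYERS = {"shared": 1, "ledger": 4, "booking": 5}


def imported_packages(source: str) -> Set[str]:
    """Top-level package names imported by a Python source file."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])  # relative imports are skipped
    return names


def layer_violations(package: str, source: str) -> List[str]:
    """Imports that reach upward, i.e. into a higher-numbered layer."""
    own_layer = LAYERS[package]
    return sorted(
        name for name in imported_packages(source)
        if name in LAYERS and LAYERS[name] > own_layer
    )


print(layer_violations("booking", "from ledger.posting import post_entry"))  # []
print(layer_violations("ledger", "import booking.models"))                   # ['booking']
```

A non-empty result would fail the build. Because the check parses source with ast instead of importing it, it runs without the modules being installed.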

3. High-level deployment topology

flowchart TB
  subgraph Edge[Edge]
    CDN[CDN / Static Assets]
    WAF[WAF + Rate Limiting]
  end
  subgraph App[Application Tier]
    LB[Load Balancer]
    WEB1[Web Worker 1]
    WEB2[Web Worker 2]
    WEBN[Web Worker N]
    API1[API Worker 1]
    APIN[API Worker N]
    JOB[Job / Cron Worker]
    WH[Webhook Dispatcher]
  end
  subgraph Data[Data Tier]
    DB[(Primary DB MySQL/MariaDB)]
    DBR[(Read Replica)]
    CACHE[(Redis Cache)]
    Q[(Job Queue Redis)]
    SEARCH[(Search Index)]
    S3[Object Storage]
  end
  subgraph External[External Services]
    GDS[GDS / Suppliers]
    PSP[Payment Gateways]
    SES[Email — SES]
    SMS[SMS Gateway]
    BANK[Bank Feeds]
    BSP[IATA BSP]
  end
  CDN --> WAF --> LB
  LB --> WEB1 & WEB2 & WEBN & API1 & APIN
  WEB1 & WEB2 & WEBN & API1 & APIN --> DB
  WEB1 & WEB2 & WEBN & API1 & APIN --> CACHE
  WEB1 & WEB2 & WEBN & API1 & APIN --> Q
  JOB --> Q
  JOB --> DB
  WH --> Q
  WH --> External
  API1 & APIN --> GDS & PSP
  JOB --> BANK & BSP
  WEB1 & WEB2 & WEBN --> SES & SMS
  WEB1 & WEB2 & WEBN & API1 & APIN --> S3
  DB --> DBR
  DBR --> WEB1 & WEB2 & WEBN

3.1 Tier responsibilities

| Tier | Responsibility | Scaling |
|---|---|---|
| Edge | TLS termination, WAF, DDoS protection, static caching | Handled by CDN provider |
| Load balancer | Round-robin + sticky sessions for UI workers | Horizontal |
| Web workers | Render UI; thin controllers | Horizontal, stateless |
| API workers | JSON API, authenticated, rate-limited | Horizontal, stateless |
| Job worker | Cron, async jobs, BSP imports, FX rate fetch, dunning | Vertical first, then sharded by job class |
| Webhook dispatcher | Sign + deliver webhooks with retry | Horizontal |
| Primary DB | All writes, strongly consistent reads | Vertical + planned partner sharding |
| Read replica | Reports, large reads, search-page queries | Add replicas as needed |
| Redis cache | Hot lookups: FX rates, permission cache, idempotency keys | Vertical, then cluster |
| Redis queue | Job queue; runs on a Redis instance separate from the cache | Vertical |
| Search index | Customer / supplier / booking text search | OpenSearch or MeiliSearch |
| Object storage | Invoice PDFs, ticket PDFs, ID documents, imports | S3-compatible |
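
The webhook dispatcher's "sign" step is commonly an HMAC over the payload that the receiver can recompute. The following is a minimal sketch. It assumes HMAC-SHA256 over "<timestamp>.<body>", a shared secret per endpoint, and a five-minute replay window. The message layout, secret, and tolerance are hypothetical and are not the platform's published scheme.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_webhook(secret: bytes, body: bytes, timestamp: int) -> str:
    """Hex HMAC-SHA256 over b"<timestamp>.<body>" (hypothetical message layout)."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, timestamp: int, signature: str,
                   tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Receiver-side check: timestamp is recent and signature matches."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False  # outside the window: treat as a possible replay
    expected = sign_webhook(secret, body, timestamp)
    return hmac.compare_digest(expected, signature)  # constant-time comparison


secret = b"per-endpoint-secret"
body = b'{"event": "booking.created"}'
sig = sign_webhook(secret, body, 1_700_000_000)
print(verify_webhook(secret, body, 1_700_000_000, sig, now=1_700_000_060))   # True
print(verify_webhook(secret, b"{}", 1_700_000_000, sig, now=1_700_000_060))  # False
```

Signing the timestamp together with the body means an attacker cannot replay an old delivery by changing only its timestamp.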

4. Request lifecycle (UI write)

The canonical lifecycle for an authenticated UI write — for example, "agent creates a booking":

sequenceDiagram
  autonumber
  participant U as User Browser
  participant E as Edge / WAF
  participant L as Load Balancer
  participant W as Web Worker
  participant A as Auth Middleware
  participant C as Controller
  participant S as Domain Service
  participant DB as Primary DB
  participant Q as Job Queue
  participant AUD as Audit Logger
  U->>E: POST /bookings (cookie + CSRF)
  E->>L: Forward (after WAF + rate limit)
  L->>W: Route to worker
  W->>A: Validate session, CSRF, permissions
  A->>C: Inject {actor, partner, perms}
  C->>S: createBooking(payload)
  S->>DB: BEGIN
  S->>DB: INSERT booking
  S->>DB: INSERT booking_segments
  S->>DB: INSERT invoice (draft)
  S->>DB: INSERT journal_entry + lines
  S->>DB: INSERT audit_log row
  S->>DB: COMMIT
  S->>Q: enqueue notify_supplier
  S->>Q: enqueue send_confirmation_email
  S-->>C: BookingDTO
  C-->>W: Render success
  W-->>U: 200 OK + redirect
  AUD-->>AUD: (async) ship audit_log to long-term store

Two architectural commitments are visible here:

  1. Steps 7–13 are a single database transaction. The booking, its segments, the draft invoice, the journal entry, and the audit log are written together or not at all. This is the structural enforcement of the "double-entry by default" pillar.
  2. Side effects are deferred (steps 14–15). Sending an email or notifying a supplier is not allowed inside the transaction, because rolling back the transaction cannot rewind an email.
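
The following sketch shows the two commitments together. Python's sqlite3 module stands in for MySQL/MariaDB, and a plain list stands in for the Redis job queue. The schema is heavily simplified: journal lines carry an entry reference instead of a separate journal_entry row. Table, account, and reference names are hypothetical. Every row is written inside one transaction, and the side effects are enqueued only after COMMIT.

```python
import sqlite3

SCHEMA = """
CREATE TABLE booking (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total_cents INTEGER NOT NULL);
CREATE TABLE invoice (id INTEGER PRIMARY KEY, booking_id INTEGER NOT NULL, status TEXT NOT NULL);
CREATE TABLE journal_line (
    entry_ref TEXT NOT NULL, account TEXT NOT NULL,
    debit_cents INTEGER NOT NULL CHECK (debit_cents >= 0),
    credit_cents INTEGER NOT NULL CHECK (credit_cents >= 0));
CREATE TABLE audit_log (action TEXT NOT NULL, booking_id INTEGER NOT NULL);
"""


def create_booking(conn: sqlite3.Connection, queue: list, customer: str, total_cents: int) -> int:
    with conn:  # one transaction: COMMIT on success, ROLLBACK on any exception
        booking_id = conn.execute(
            "INSERT INTO booking (customer, total_cents) VALUES (?, ?)",
            (customer, total_cents)).lastrowid
        conn.execute("INSERT INTO invoice (booking_id, status) VALUES (?, 'draft')", (booking_id,))
        ref = f"BK-{booking_id}"  # hypothetical entry reference
        lines = [(ref, "accounts_receivable", total_cents, 0),
                 (ref, "sales_revenue", 0, total_cents)]
        if sum(line[2] for line in lines) != sum(line[3] for line in lines):
            raise ValueError("unbalanced journal entry")  # debits must equal credits
        conn.executemany("INSERT INTO journal_line VALUES (?, ?, ?, ?)", lines)
        conn.execute("INSERT INTO audit_log VALUES ('booking.created', ?)", (booking_id,))
    # Reached only after COMMIT: side effects are enqueued, never executed in the transaction.
    queue.append(("notify_supplier", booking_id))
    queue.append(("send_confirmation_email", booking_id))
    return booking_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
queue: list = []
create_booking(conn, queue, "Example Travels", 50_000)
print(queue)  # [('notify_supplier', 1), ('send_confirmation_email', 1)]
```

Suppose any insert fails; here the CHECK constraint rejects a negative amount. The with block then rolls back the booking and invoice rows already written in that call, and nothing is enqueued.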

5. Request lifecycle (API write)

API requests use the same pipeline with three differences:

  • Auth is by Bearer token (PAT or OAuth) instead of session cookie.
  • CSRF middleware is bypassed; the token is the proof.
  • Responses are JSON; errors follow the Error Code Catalog.

sequenceDiagram
  autonumber
  participant Cli as API Client
  participant E as Edge
  participant L as Load Balancer
  participant API as API Worker
  participant A as Token Middleware
  participant RL as Per-Token Rate Limiter
  participant C as Controller
  participant S as Domain Service
  participant DB as DB
  participant Q as Queue
  Cli->>E: POST /v1/bookings (Bearer + Idempotency-Key)
  E->>L: Forward
  L->>API: Route
  API->>A: Validate token, scopes, partner
  A->>RL: Check rate limit (token+route)
  RL->>C: Pass with {token, partner, scopes}
  C->>C: Idempotency lookup (Redis)
  alt Replay
    C-->>Cli: 200 + cached body
  else New
    C->>S: createBooking(payload)
    S->>DB: BEGIN ... COMMIT
    S->>Q: enqueue side-effects
    C->>C: Store idempotency record (24h)
    C-->>Cli: 201 + BookingDTO
  end
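
The per-token rate-limit check in step 5 can be sketched as a sliding-window log keyed by (token, route). This in-memory class is an illustration only. In production the counters live in Redis (the caching table lists API rate limit counters with a rolling window). The limit and window values are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class SlidingWindowLimiter:
    """Sliding-window-log rate limiter keyed by (token, route); in-memory stand-in for Redis."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        # (token, route) -> timestamps of the requests accepted inside the current window
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, token: str, route: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[(token, route)]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # forget requests that have slid out of the window
        if len(hits) >= self.limit:
            return False  # the API layer would reject this request (typically HTTP 429)
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window_s=60)
route = "POST /v1/bookings"
print([limiter.allow("pat_A", route, now=t) for t in (0, 1, 2, 61)])  # [True, True, False, True]
print(limiter.allow("pat_B", route, now=2))                            # True
```

Because the key includes the token, one integrator exhausting its quota does not affect another token, even on the same route.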

6. Trust boundaries

flowchart LR
  subgraph Untrusted
    BR[Browser]
    EXT[External integrator]
    SUPP[Supplier APIs]
  end
  subgraph SemiTrusted[Semi-trusted]
    EDGE[Edge / WAF]
  end
  subgraph Trusted[Trusted — application network]
    APP[Application workers]
    JOBS[Job workers]
  end
  subgraph Restricted[Highly restricted]
    DB[(Database)]
    S3
    SECRETS[Secrets manager]
  end
  BR --> EDGE --> APP
  EXT --> EDGE --> APP
  SUPP -.-> APP
  APP --> DB & S3 & SECRETS
  JOBS --> DB & S3 & SECRETS

Rules at each boundary:

  • Untrusted → Semi-trusted: TLS, WAF, rate limit, geo-blocking optional per route.
  • Semi-trusted → Trusted: authentication required; CSRF on cookie-auth routes; idempotency required on POST/PUT for API; per-partner quotas.
  • Trusted → Restricted: the database is reachable only from app/job networks; credentials are scoped least-privilege; secrets are short-lived and rotated.
  • Trusted → External (supplier): outbound traffic flows through a network egress proxy with allowlist; supplier credentials live in the secrets manager and are loaded per-partner.
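
The egress proxy enforces the allowlist at the network layer. The matching rule itself can be sketched as a host check that a connector could also apply before dialing out. This minimal sketch assumes two kinds of entry, exact hosts and "*."-prefixed subdomain wildcards, and accepts HTTPS URLs only. The hostnames are hypothetical placeholders.

```python
from typing import Iterable
from urllib.parse import urlsplit

# Hypothetical allowlist entries; a leading "*." admits subdomains only.
EGRESS_ALLOWLIST = ("api.supplier-a.example", "*.gds-b.example")


def egress_allowed(url: str, allowlist: Iterable[str] = EGRESS_ALLOWLIST) -> bool:
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is lower-cased and ignores any user@ prefix
    if parts.scheme != "https" or not host:
        return False  # outbound supplier calls must use TLS
    for entry in allowlist:
        if entry.startswith("*."):
            if host.endswith(entry[1:]):  # "*.gds-b.example" matches "fares.gds-b.example"
                return True
        elif host == entry:
            return True
    return False


print(egress_allowed("https://api.supplier-a.example/v2/fares"))       # True
print(egress_allowed("https://api.supplier-a.example@evil.example/"))  # False
```

Matching on the parsed hostname, not on a string prefix of the URL, defeats the "allowed-host@attacker-host" trick shown in the second call.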

7. Data flow taxonomy

The platform handles four distinct shapes of data movement:

| Shape | Example | Latency profile | Failure semantics |
|---|---|---|---|
| Synchronous in-transaction | Booking + journal entry write | <300 ms | All-or-nothing |
| Asynchronous out-of-band | Send confirmation email | Seconds–minutes | At-least-once with retry |
| Batch / scheduled | BSP file import, FX rate refresh | Minutes–hours | Idempotent reruns |
| Streaming / real-time | Supplier inventory updates | Sub-second | Lossy with replay endpoint |

Module chapters specify which shape each operation uses.

8. Concurrency control

The platform uses three concurrency techniques in deliberate combination:

| Technique | Where used | Why |
|---|---|---|
| Optimistic locking (row_version) | Mutable operational rows: bookings, invoices (while in draft), customers | High read / low conflict; minimal contention |
| Pessimistic row-level locks (SELECT ... FOR UPDATE) | Posting a journal entry; advancing an invoice from draft to issued | Strong serialisation for invariant-critical paths |
| Distributed lock (Redis) | Cron jobs, BSP import, FX refresh | Prevents duplicate execution across job workers |
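
The following is a minimal sketch of the optimistic row_version check, with sqlite3 standing in for MySQL/MariaDB. The customer table and the update_customer_email helper are hypothetical. The UPDATE succeeds only if the row is still at the version the caller read. On StaleWriteError, the caller is expected to re-read and retry, or to surface the conflict to the user.

```python
import sqlite3


class StaleWriteError(Exception):
    """The row changed (or vanished) since the caller read it."""


def update_customer_email(conn: sqlite3.Connection, customer_id: int,
                          new_email: str, expected_version: int) -> int:
    with conn:
        cur = conn.execute(
            "UPDATE customer SET email = ?, row_version = row_version + 1 "
            "WHERE id = ? AND row_version = ?",
            (new_email, customer_id, expected_version))
        if cur.rowcount == 0:
            # Zero rows matched: another writer bumped row_version first (or the row is gone).
            raise StaleWriteError(f"customer {customer_id} is not at version {expected_version}")
    return expected_version + 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT, row_version INTEGER NOT NULL)")
conn.execute("INSERT INTO customer VALUES (1, 'old@example.com', 1)")
conn.commit()
print(update_customer_email(conn, 1, "new@example.com", expected_version=1))  # 2
```

No lock is held between the read and the write. The version comparison inside the UPDATE's WHERE clause detects the conflict, which is why this suits rows that are read often and rarely contended.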

9. Idempotency

Every state-changing API endpoint requires an Idempotency-Key header. The server records (partner_id, route, key) → response_body for 24 hours. A replay returns the cached response without re-executing.
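
The replay rule can be sketched as a wrapper around the request handler. This in-memory class is a stand-in for the Redis-backed store; the tier table lists idempotency keys among the Redis cache's hot lookups. The class and method names are hypothetical. The clock is injectable only so that the 24-hour expiry can be exercised.

```python
import time
from typing import Any, Callable, Dict, Tuple

TTL_SECONDS = 24 * 60 * 60  # records are kept for 24 hours

RecordKey = Tuple[int, str, str]  # (partner_id, route, idempotency key)


class IdempotencyStore:
    """In-memory stand-in for the Redis-backed store: (partner_id, route, key) -> response."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._records: Dict[RecordKey, Tuple[float, Any]] = {}  # key -> (expires_at, response)

    def execute(self, partner_id: int, route: str, key: str,
                handler: Callable[[], Any]) -> Any:
        record_key = (partner_id, route, key)
        now = self._clock()
        cached = self._records.get(record_key)
        if cached is not None and cached[0] > now:
            return cached[1]  # replay: return the stored response, do not re-execute
        response = handler()
        self._records[record_key] = (now + TTL_SECONDS, response)
        return response
```

This sketch does not handle two concurrent requests carrying the same key: both could miss and both execute. A shared store needs an atomic claim on the key before running the handler, for example Redis SET with the NX option.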

For internal jobs, idempotency is achieved by deterministic external keys (e.g. a BSP file is keyed by BSP-{period}-{partner_id}; reprocessing the same file is a no-op).
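
A deterministic key makes a rerun safe only if the key and the postings commit together. The following sketch records the key in a table with a primary key, inside the same transaction as the payables. It uses sqlite3, whose INSERT OR IGNORE plays the role of MySQL/MariaDB's INSERT IGNORE. The table names, the period format, and the ticket numbers are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple

SCHEMA = """
CREATE TABLE bsp_import (file_key TEXT PRIMARY KEY);
CREATE TABLE supplier_payable (file_key TEXT NOT NULL, ticket_no TEXT NOT NULL,
                               amount_cents INTEGER NOT NULL);
"""


def bsp_file_key(period: str, partner_id: int) -> str:
    return f"BSP-{period}-{partner_id}"


def import_bsp_file(conn: sqlite3.Connection, period: str, partner_id: int,
                    payables: Iterable[Tuple[str, int]]) -> bool:
    """Post a BSP file's payables once; a rerun with the same key changes nothing."""
    key = bsp_file_key(period, partner_id)
    with conn:  # marker row and postings commit or roll back together
        cur = conn.execute("INSERT OR IGNORE INTO bsp_import (file_key) VALUES (?)", (key,))
        if cur.rowcount == 0:
            return False  # key already recorded: this file was processed before
        conn.executemany(
            "INSERT INTO supplier_payable VALUES (?, ?, ?)",
            [(key, ticket_no, amount) for ticket_no, amount in payables])
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = [("0981234567890", 42_000), ("0981234567891", 38_500)]
print(import_bsp_file(conn, "2026-05", 17, rows))  # True
print(import_bsp_file(conn, "2026-05", 17, rows))  # False
```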

10. Caching strategy

| Cache | TTL | Invalidation |
|---|---|---|
| FX rates | 1 hour (intraday); historical rates cached forever | TTL + explicit refresh job |
| Permission set per user | 5 minutes | Invalidated on role change |
| Chart-of-accounts tree | 15 minutes | Invalidated on CoA edit |
| Tax profile per partner | 15 minutes | Invalidated on profile edit |
| Customer / supplier list (paginated, page 1) | 30 seconds | TTL only |
| API rate limit counters | Rolling window | TTL only |

We deliberately do not cache anything in the ledger path: ledger reads always go directly to the primary DB.
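
Most rows in the caching table combine a TTL with explicit invalidation on edit. The following is a minimal sketch of that pattern, using the permission-set row (5 minutes, invalidated on role change) as the example. The class, the key shape, and the permission name are hypothetical. The clock is injectable only to make expiry testable.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Per-entry TTL plus explicit invalidation."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit
        value = loader()  # miss or expired: reload from the source of truth
        self._entries[key] = (now + self.ttl_s, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        self._entries.pop(key, None)


perms = TTLCache(ttl_s=300)  # permission sets: 5 minutes
user_perms = perms.get_or_load(("perms", 42), lambda: {"bookings.create"})
perms.invalidate(("perms", 42))  # e.g. called when the user's role changes
```

Several web and API workers run side by side. Invalidation therefore has to target the shared Redis cache, since per-process entries would stay stale on the other workers. The class only shows the TTL-plus-invalidation logic.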

11. Background jobs and schedules

The platform runs the following scheduled jobs. The full listing, with SLAs, is in 08-system-features/03-automation.md.

| Job | Frequency | Purpose |
|---|---|---|
| fx_rates.refresh | Hourly | Pull rates from configured provider |
| bsp.import | Daily 04:00 partner-local | Pull BSP settlement file, post supplier payables |
| invoices.send_due_reminders | Daily 09:00 partner-local | Dunning emails |
| subscriptions.renew | Daily 02:00 UTC | Subscription renewal billing |
| gl.close_period_preview | Daily 23:30 partner-local | Pre-compute period close artefacts |
| audit.archive | Daily 01:00 UTC | Ship audit logs older than 90 days to cold storage |
| webhooks.retry | Every 5 minutes | Retry failed webhook deliveries with exponential backoff |
| notifications.flush | Every 1 minute | Send queued notifications |
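
The backoff applied by webhooks.retry can be sketched as a function that computes when a failed delivery becomes eligible again. The base delay, cap, and attempt limit below are hypothetical values chosen for illustration; the source does not specify them.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

BASE_DELAY = timedelta(minutes=1)  # hypothetical first delay
MAX_DELAY = timedelta(hours=6)     # hypothetical ceiling
MAX_ATTEMPTS = 12                  # hypothetical; afterwards the delivery is parked


def next_attempt_at(failed_at: datetime, attempts_so_far: int) -> Optional[datetime]:
    """When a failed delivery becomes eligible again; None means stop retrying."""
    if attempts_so_far >= MAX_ATTEMPTS:
        return None
    delay = min(BASE_DELAY * 2 ** (attempts_so_far - 1), MAX_DELAY)  # 1, 2, 4, 8 ... minutes
    return failed_at + delay


failed = datetime(2026, 5, 26, 12, 0, tzinfo=timezone.utc)
for n in (1, 2, 3, 11, 12):
    print(n, next_attempt_at(failed, n))
```

The job runs every 5 minutes, so a delivery actually goes out on the first run at or after its eligible time. Random jitter is often added to spread retries across endpoints; it is left out here to keep the schedule deterministic.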

12. Observability

Three signal classes:

  • Metrics (Prometheus-style): request rate, error rate, p50/p95/p99 latency per route; job durations; queue depth; DB connection pool usage; FX cache hit ratio.
  • Logs: structured JSON, one event per significant action, correlated by request_id and actor_id. Logs are not the audit log — audit lives in the database.
  • Traces: optional OpenTelemetry traces for slow-path debugging.

A redaction filter strips PII (passport numbers, card PANs, full names of passengers) from logs before shipping.
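
The redaction filter can be sketched as a recursive pass over each structured log event before it is serialised. This minimal sketch assumes two detection rules: PII is identified by field name, and PAN-shaped digit runs are caught by a pattern in free text. The key names and the log_event helper are hypothetical. The event envelope carries request_id and actor_id, following the Logs bullet above.

```python
import json
import re
from typing import Any

# Hypothetical field names; the real redaction list would be centrally owned.
PII_KEYS = {"passport_number", "card_pan", "passenger_name"}
# 13-19 digits, optionally separated by single spaces or hyphens: PAN-shaped.
PAN_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def redact(value: Any, key: str = "") -> Any:
    if key in PII_KEYS:
        return "[REDACTED]"
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return PAN_PATTERN.sub("[REDACTED-PAN]", value)
    return value


def log_event(event: str, request_id: str, actor_id: int, **fields: Any) -> str:
    """One structured JSON log line, correlated by request_id and actor_id."""
    record = {"event": event, "request_id": request_id, "actor_id": actor_id, **fields}
    return json.dumps(redact(record), sort_keys=True)


print(log_event("booking.created", "req-7f3a", 42,
                passenger_name="A. Rahman", note="paid with 4111 1111 1111 1111"))
```

The digit pattern deliberately errs toward over-redaction: any standalone run of 13 to 19 digits, such as a long reference number, is masked too.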

13. Disaster recovery

| Metric | Target |
|---|---|
| RPO (data loss tolerated) | ≤ 5 minutes |
| RTO (time to restore) | ≤ 60 minutes |
| Backup frequency | Continuous binlog + hourly logical dump |
| Backup retention | 35 days hot, 1 year cold |
| Geo-redundancy | Primary + warm standby in second region |
| Quarterly restore drill | Mandatory; documented in 12-compliance/03-audit-readiness.md |

14. Security controls (cross-reference)

This section summarises; full detail in Volume XII.

  • At rest: AES-256 for DB and object storage; column-level encryption for PAN-like fields.
  • In transit: TLS 1.3 only; HSTS; certificate pinning for supplier connectors where supported.
  • Auth: session cookies (httpOnly, SameSite=Strict, Secure) for UI; Bearer tokens for API; optional MFA per partner policy.
  • Secrets: managed vault; no secrets in environment variables in production.
  • Dependencies: locked, scanned weekly; CVE budget enforced in CI.
  • AppSec: parameterised queries everywhere; output encoding by default; CSRF tokens on all cookie-auth writes; strict CSP.
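
The cookie attributes in the Auth bullet map directly onto a Set-Cookie header. The following is a minimal sketch using Python's standard http.cookies module; SameSite support needs Python 3.8+. The cookie name is hypothetical. The session ID comes from secrets.token_urlsafe so that it cannot be guessed.

```python
import secrets
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Value for a Set-Cookie header carrying the UI session ID."""
    cookie = SimpleCookie()
    cookie["session"] = session_id   # cookie name is hypothetical
    morsel = cookie["session"]
    morsel["httponly"] = True        # not readable from JavaScript
    morsel["secure"] = True          # sent over HTTPS only
    morsel["samesite"] = "Strict"    # withheld on cross-site requests
    morsel["path"] = "/"
    return morsel.OutputString()


print("Set-Cookie: " + session_cookie_header(secrets.token_urlsafe(32)))
```

SameSite=Strict narrows CSRF exposure, but it does not replace the CSRF tokens the AppSec bullet requires on cookie-auth writes.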

15. Scaling beyond Phase 1

The architecture is deliberately conservative. Known evolution paths:

| Pressure | Response |
|---|---|
| Single DB write hot-spot | Partner-sharded DB cluster; shard key = partner_id |
| Long-tail GDS latencies | Pull supplier search into a separate "shopping" service |
| Report generation contention | Materialised views + warehouse offload (Snowflake/BigQuery) |
| Webhook fan-out | Dedicated dispatch service with per-partner rate isolation |
| Real-time AI agent traffic | gRPC bidirectional surface backed by the same domain services |
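
Shard routing by partner_id can be sketched as a stable hash of the partner ID over a fixed shard list. The sketch assumes that every row a booking transaction touches carries the partner's partner_id. Under that assumption, all of a partner's writes land on one shard, and the single-transaction guarantee from section 4 survives sharding. The shard names are hypothetical. The technique shown is plain hash-modulo; it is not a settled design.

```python
import hashlib
from typing import Sequence

SHARDS = ("db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3")  # hypothetical


def shard_for_partner(partner_id: int, shards: Sequence[str] = SHARDS) -> str:
    """Map a partner to one shard, identically in every process and runtime."""
    digest = hashlib.sha256(str(partner_id).encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


print(shard_for_partner(17))
```

SHA-256 gives the same answer in every process and language, unlike Python's per-process randomised string hash(). Plain modulo has one drawback: changing the shard count reassigns most partners. A shard directory table or consistent hashing avoids that mass move.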

Next: 03-glossary.md — terms and abbreviations used throughout the documentation.