Chapter 0.2 — System Architecture

chapter: 00-introduction/02-system-architecture
version: 1.0.0
status: stable
last_reviewed: 2026-05-26
owners: [platform-engineering]

1. Purpose

This chapter is the architectural reference for the platform. It explains how the moving parts fit together physically and logically, where the trust boundaries are, and how a request flows from a browser or API client all the way to a persisted journal entry.

2. Architectural style

travoBooks is a modular monolith with strictly bounded modules, deployed as a small fleet of processes. We chose this over microservices for three reasons:

Transactional integrity. The accounting promise — "an operational change and its journal entry are inseparable" — is enforced inside a single ACID database transaction. Distributing that across services creates a class of consistency bugs we refuse to ship.
Domain coupling is high. Booking, ticketing, invoicing, ledger, and commission are all variations of the same event. Splitting them into services would turn cheap in-process calls into a large volume of inter-service chatter.
Operational simplicity for the partner. A travel agency in Dhaka cannot operate a 30-service Kubernetes mesh. The deployment surface must remain understandable.

Inside the monolith, modules follow strict import rules (Layer 5 may import Layer 4, but never the reverse), which keep the codebase service-extractable when scale demands it.
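
One way to keep the import rule honest is a check in CI. The sketch below assumes a hypothetical module-to-layer map (the module names and layer numbers are illustrative, not the platform's real layout) and assumes same-layer imports are allowed; it flags any module that imports from a higher layer.

```python
# Hypothetical module-to-layer map; the real assignments live in the codebase.
LAYERS: dict[str, int] = {
    "ledger": 4,
    "invoicing": 5,
    "booking": 5,
    "ui": 6,
}


def check_imports(imports: list[tuple[str, str]]) -> list[str]:
    """Return one violation message per (importer, imported) pair in which
    a lower layer imports a higher one. Same-layer imports are allowed."""
    violations = []
    for importer, imported in imports:
        if LAYERS[importer] < LAYERS[imported]:
            violations.append(
                f"{importer} (L{LAYERS[importer]}) may not import "
                f"{imported} (L{LAYERS[imported]})"
            )
    return violations


if __name__ == "__main__":
    edges = [("invoicing", "ledger"), ("ledger", "invoicing")]
    for violation in check_imports(edges):
        print(violation)  # prints: ledger (L4) may not import invoicing (L5)
```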

3. High-level deployment topology

flowchart TB
    subgraph Edge[Edge]
        CDN[CDN / Static Assets]
        WAF[WAF + Rate Limiting]
    end
    subgraph App[Application Tier]
        LB[Load Balancer]
        WEB1[Web Worker 1]
        WEB2[Web Worker 2]
        WEBN[Web Worker N]
        API1[API Worker 1]
        APIN[API Worker N]
        JOB[Job / Cron Worker]
        WH[Webhook Dispatcher]
    end
    subgraph Data[Data Tier]
        DB[(Primary DB MySQL/MariaDB)]
        DBR[(Read Replica)]
        CACHE[(Redis Cache)]
        Q[(Job Queue Redis)]
        SEARCH[(Search Index)]
        S3[Object Storage]
    end
    subgraph External[External Services]
        GDS[GDS / Suppliers]
        PSP[Payment Gateways]
        SES[Email — SES]
        SMS[SMS Gateway]
        BANK[Bank Feeds]
        BSP[IATA BSP]
    end
    CDN --> WAF --> LB
    LB --> WEB1 & WEB2 & WEBN & API1 & APIN
    WEB1 & WEB2 & WEBN & API1 & APIN --> DB
    WEB1 & WEB2 & WEBN & API1 & APIN --> CACHE
    WEB1 & WEB2 & WEBN & API1 & APIN --> Q
    JOB --> Q
    JOB --> DB
    WH --> Q
    WH --> External
    API1 & APIN --> GDS & PSP
    JOB --> BANK & BSP
    WEB1 & WEB2 & WEBN --> SES & SMS
    WEB1 & WEB2 & WEBN & API1 & APIN --> S3
    DB --> DBR
    DBR --> WEB1 & WEB2 & WEBN

3.1 Tier responsibilities

Tier	Responsibility	Scaling
Edge	TLS termination, WAF, DDoS protection, static caching	Handled by the CDN provider
Load balancer	Round-robin + sticky sessions for UI workers	Horizontal
Web workers	Render UI; thin controllers	Horizontal, stateless
API workers	JSON API, authenticated, rate-limited	Horizontal, stateless
Job worker	Cron, async jobs, BSP imports, FX rate fetch, dunning	Vertical first, then sharded by job class
Webhook dispatcher	Sign + deliver webhooks with retry	Horizontal
Primary DB	All writes, strongly-consistent reads	Vertical + planned partner sharding
Read replica	Reports, large reads, search-page queries	Add replicas as needed
Redis cache	Hot lookups: FX rates, permission cache, idempotency keys	Vertical, then cluster
Redis queue	Job queue, on a Redis instance separate from the cache	Vertical
Search index	Customer / supplier / booking text search	Engine-dependent (OpenSearch or Meilisearch)
Object storage	Invoice PDFs, ticket PDFs, ID documents, imports	S3-compatible
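
The webhook dispatcher signs each delivery, but this chapter does not specify the scheme. The sketch below assumes an HMAC-SHA256 over a timestamp plus the raw body, carried in a hypothetical `t=<unix>,v1=<hex>` header value, with a receiver-side check that rejects stale timestamps. Header format, tolerance, and function names are all illustrative.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_payload(secret: bytes, body: bytes, timestamp: Optional[int] = None) -> str:
    """Return a hypothetical signature header value of the form 't=<unix>,v1=<hex>'."""
    ts = int(time.time()) if timestamp is None else timestamp
    # Binding the timestamp into the MAC lets receivers reject replayed deliveries.
    message = f"{ts}.".encode() + body
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"t={ts},v1={digest}"


def verify_signature(secret: bytes, body: bytes, header: str,
                     tolerance_s: int = 300, now: Optional[int] = None) -> bool:
    """Receiver-side check: recompute the MAC and compare in constant time."""
    parts = dict(item.split("=", 1) for item in header.split(","))
    ts = int(parts["t"])
    current = int(time.time()) if now is None else now
    if abs(current - ts) > tolerance_s:
        return False  # outside the tolerance window: treat as a replay
    expected = sign_payload(secret, body, ts).split("v1=", 1)[1]
    return hmac.compare_digest(expected, parts["v1"])
```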

4. Request lifecycle (UI write)

The canonical lifecycle for an authenticated UI write — for example, "agent creates a booking":

sequenceDiagram
    autonumber
    participant U as User Browser
    participant E as Edge / WAF
    participant L as Load Balancer
    participant W as Web Worker
    participant A as Auth Middleware
    participant C as Controller
    participant S as Domain Service
    participant DB as Primary DB
    participant Q as Job Queue
    participant AUD as Audit Logger
    U->>E: POST /bookings (cookie + CSRF)
    E->>L: Forward (after WAF + rate limit)
    L->>W: Route to worker
    W->>A: Validate session, CSRF, permissions
    A->>C: Inject {actor, partner, perms}
    C->>S: createBooking(payload)
    S->>DB: BEGIN
    S->>DB: INSERT booking
    S->>DB: INSERT booking_segments
    S->>DB: INSERT invoice (draft)
    S->>DB: INSERT journal_entry + lines
    S->>DB: INSERT audit_log row
    S->>DB: COMMIT
    S->>Q: enqueue notify_supplier
    S->>Q: enqueue send_confirmation_email
    S-->>C: BookingDTO
    C-->>W: Render success
    W-->>U: 200 OK + redirect
    AUD-->>AUD: (async) ship audit_log to long-term store

Two architectural commitments are visible here:

Steps 7–13 are a single database transaction. The booking, its segments, the draft invoice, the journal entry, and the audit log are written together or not at all. This is the structural enforcement of the "double-entry by default" pillar.
Side effects are deferred (steps 14–15). Sending an email or notifying a supplier is not allowed inside the transaction, because rolling back the transaction cannot rewind an email.
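
Both commitments can be sketched with Python's built-in `sqlite3` standing in for MySQL/MariaDB. The table layout, the account names, and the in-memory `queue` list are hypothetical; the point is that every row is written inside one transaction and the side-effect jobs are enqueued only after it commits.

```python
import sqlite3

SCHEMA = """
CREATE TABLE bookings (id INTEGER PRIMARY KEY, ref TEXT UNIQUE NOT NULL);
CREATE TABLE invoices (id INTEGER PRIMARY KEY, booking_id INTEGER, status TEXT, amount INTEGER);
CREATE TABLE journal_lines (
    id INTEGER PRIMARY KEY, booking_id INTEGER, account TEXT,
    debit INTEGER CHECK (debit >= 0), credit INTEGER CHECK (credit >= 0));
CREATE TABLE audit_log (id INTEGER PRIMARY KEY, booking_id INTEGER, action TEXT);
"""


def create_booking(conn: sqlite3.Connection, queue: list, ref: str, amount: int) -> int:
    """Write booking, draft invoice, journal lines and audit row as one unit,
    then enqueue side effects only once COMMIT has succeeded."""
    # `with conn:` commits if the block finishes and rolls back if anything raises.
    with conn:
        booking_id = conn.execute(
            "INSERT INTO bookings (ref) VALUES (?)", (ref,)).lastrowid
        conn.execute(
            "INSERT INTO invoices (booking_id, status, amount) VALUES (?, 'draft', ?)",
            (booking_id, amount))
        # Balanced entry: one debit line and one credit line of equal amount.
        conn.executemany(
            "INSERT INTO journal_lines (booking_id, account, debit, credit) "
            "VALUES (?, ?, ?, ?)",
            [(booking_id, "accounts_receivable", amount, 0),
             (booking_id, "sales", 0, amount)])
        conn.execute(
            "INSERT INTO audit_log (booking_id, action) VALUES (?, 'booking.created')",
            (booking_id,))
    # Only reached after a successful commit, so a rollback never sends an email.
    queue.append(("notify_supplier", booking_id))
    queue.append(("send_confirmation_email", booking_id))
    return booking_id
```

One consequence of enqueueing after COMMIT: if the process dies between the commit and the enqueue, those jobs are lost. A transactional outbox (writing the jobs to a table inside the same transaction and relaying them afterwards) closes that gap, should the platform need it.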

5. Request lifecycle (API write)

API requests use the same pipeline with three differences:

Authentication uses a Bearer token (PAT or OAuth) instead of a session cookie.
CSRF middleware is bypassed; the token is the proof.
Responses are JSON; errors follow the Error Code Catalog.

sequenceDiagram
    autonumber
    participant Cli as API Client
    participant E as Edge
    participant L as Load Balancer
    participant API as API Worker
    participant A as Token Middleware
    participant RL as Per-Token Rate Limiter
    participant C as Controller
    participant S as Domain Service
    participant DB as DB
    participant Q as Queue
    Cli->>E: POST /v1/bookings (Bearer + Idempotency-Key)
    E->>L: Forward
    L->>API: Route
    API->>A: Validate token, scopes, partner
    A->>RL: Check rate limit (token+route)
    RL->>C: Pass with {token, partner, scopes}
    C->>C: Idempotency lookup (Redis)
    alt Replay
        C-->>Cli: 200 + cached body
    else New
        C->>S: createBooking(payload)
        S->>DB: BEGIN ... COMMIT
        S->>Q: enqueue side-effects
        C->>C: Store idempotency record (24h)
        C-->>Cli: 201 + BookingDTO
    end
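
Step 5 checks a per-token rate limit keyed on token and route. A sliding-log limiter is one way to implement that; the sketch below keeps timestamps in process memory, whereas the platform presumably keeps its rolling-window counters in Redis so all API workers share them. The class name, limit, and window are placeholders.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """In-memory sliding-log limiter keyed by (token, route).

    A sketch only: a multi-worker deployment would keep these counters in a
    shared store so every API worker sees the same window.
    """

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, token: str, route: str) -> bool:
        now = self.clock()
        hits = self._hits[(token, route)]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected; the caller would answer e.g. HTTP 429
        hits.append(now)
        return True
```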

6. Trust boundaries

flowchart LR
    subgraph Untrusted
        BR[Browser]
        EXT[External integrator]
        SUPP[Supplier APIs]
    end
    subgraph SemiTrusted[Semi-trusted]
        EDGE[Edge / WAF]
    end
    subgraph Trusted[Trusted — application network]
        APP[Application workers]
        JOBS[Job workers]
    end
    subgraph Restricted[Highly restricted]
        DB[(Database)]
        S3
        SECRETS[Secrets manager]
    end
    BR --> EDGE --> APP
    EXT --> EDGE --> APP
    SUPP -.-> APP
    APP --> DB & S3 & SECRETS
    JOBS --> DB & S3 & SECRETS

Rules at each boundary:

Untrusted → Semi-trusted: TLS, WAF, rate limiting, and optional per-route geo-blocking.
Semi-trusted → Trusted: authentication required; CSRF on cookie-auth routes; idempotency required on POST/PUT for API; per-partner quotas.
Trusted → Restricted: the database is reachable only from app/job networks; credentials are scoped least-privilege; secrets are short-lived and rotated.
Trusted → External (supplier): outbound traffic flows through a network egress proxy with allowlist; supplier credentials live in the secrets manager and are loaded per-partner.
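
The egress allowlist rule can be illustrated as a host check performed before any outbound supplier call. In the design described above this check lives in the egress proxy rather than in application code, and the hostnames are made up; the sketch only shows the matching rule (HTTPS only, exact host match, so look-alike suffixes do not pass).

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real list belongs to the egress proxy configuration.
EGRESS_ALLOWLIST = frozenset({"api.supplier-a.example", "gds.example.net"})


def egress_allowed(url: str, allowlist: frozenset = EGRESS_ALLOWLIST) -> bool:
    """Allow only HTTPS requests whose exact host is on the allowlist."""
    parts = urlsplit(url)
    # .hostname strips userinfo and port and lower-cases the host.
    host = parts.hostname or ""
    return parts.scheme == "https" and host in allowlist
```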

7. Data flow taxonomy

The platform handles four distinct shapes of data movement:

Shape	Example	Latency profile	Failure semantics
Synchronous in-transaction	Booking + journal entry write	<300 ms	All-or-nothing
Asynchronous out-of-band	Send confirmation email	Seconds–minutes	At-least-once with retry
Batch / scheduled	BSP file import, FX rate refresh	Minutes–hours	Idempotent reruns
Streaming / real-time	Supplier inventory updates	Sub-second	Lossy with replay endpoint

Module chapters specify which shape each operation uses.

8. Concurrency control

The platform uses three concurrency techniques in deliberate combination:

Technique	Where used	Why
Optimistic locking (`row_version`)	Mutable operational rows: `bookings`, `invoices` (draft only), `customers`	Read-heavy with rare write conflicts, so contention stays minimal.
Pessimistic row-level locks (`SELECT ... FOR UPDATE`)	Posting a journal entry, advancing an invoice from draft to issued	Strong serialisation for invariant-critical paths.
Distributed lock (Redis)	Cron jobs, BSP import, FX refresh	Prevents duplicate execution across job workers.
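
Optimistic locking with `row_version` reduces to a compare-and-set `UPDATE`. The sketch uses `sqlite3` in place of MySQL/MariaDB and a hypothetical `email` column on `customers`; the function and exception names are illustrative. A zero row count means another writer bumped the version first.

```python
import sqlite3


class StaleWriteError(Exception):
    """Raised when another writer changed the row since it was read."""


def update_customer_email(conn: sqlite3.Connection, customer_id: int,
                          expected_version: int, email: str) -> int:
    """Compare-and-set on row_version; returns the new version on success."""
    with conn:
        cur = conn.execute(
            "UPDATE customers SET email = ?, row_version = row_version + 1 "
            "WHERE id = ? AND row_version = ?",
            (email, customer_id, expected_version),
        )
    if cur.rowcount == 0:
        # No row matched: the version moved on (or the row no longer exists).
        raise StaleWriteError(
            f"customer {customer_id} changed since version {expected_version}")
    return expected_version + 1
```

A caller that catches `StaleWriteError` would typically reload the row and then either retry or report a conflict to the user; the right response depends on the operation.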

9. Idempotency

Every state-changing API endpoint requires an Idempotency-Key header. The server records (partner_id, route, key) → response_body for 24 hours. A replay returns the cached response without re-executing.
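
A minimal sketch of the replay logic, using an in-memory dict as a stand-in for Redis. The class and function names are hypothetical; the status codes mirror the API lifecycle diagram (201 on first execution, 200 on replay).

```python
import time
from typing import Callable, Dict, Optional, Tuple

IDEMPOTENCY_TTL_S = 24 * 60 * 60  # records are kept for 24 hours

Key = Tuple[int, str, str]  # (partner_id, route, idempotency_key)


class IdempotencyStore:
    """In-memory stand-in for the (partner_id, route, key) -> response_body map."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._records: Dict[Key, Tuple[float, str]] = {}

    def get(self, key: Key) -> Optional[str]:
        record = self._records.get(key)
        if record is None:
            return None
        stored_at, body = record
        if self._clock() - stored_at >= IDEMPOTENCY_TTL_S:
            del self._records[key]  # expired: the key may be reused
            return None
        return body

    def put(self, key: Key, body: str) -> None:
        self._records[key] = (self._clock(), body)


def handle_post(store: IdempotencyStore, key: Key,
                execute: Callable[[], str]) -> Tuple[int, str]:
    """Return (status, body): a cached replay, or a fresh execution that is recorded."""
    cached = store.get(key)
    if cached is not None:
        return 200, cached  # replay: cached body, handler not re-run
    body = execute()
    store.put(key, body)
    return 201, body
```

This check-then-execute sketch lets two concurrent requests carrying the same key both run. With Redis, claiming the key atomically before executing (for example `SET <key> <placeholder> NX EX 86400`) avoids that race.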

For internal jobs, idempotency is achieved by deterministic external keys (e.g. a BSP file is keyed by BSP-{period}-{partner_id}; reprocessing the same file is a no-op).
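
The BSP case can be sketched by claiming the deterministic key in the same transaction as the postings. `sqlite3` stands in for the real database; the `processed_imports` and `supplier_payables` tables, the function names, and the `2026-05` period format are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def bsp_file_key(period: str, partner_id: int) -> str:
    """Deterministic job key in the BSP-{period}-{partner_id} form."""
    return f"BSP-{period}-{partner_id}"


def import_bsp_file(conn: sqlite3.Connection, period: str, partner_id: int,
                    payables: Iterable[Tuple[str, int]]) -> bool:
    """Post supplier payables once per file; return False if already imported."""
    key = bsp_file_key(period, partner_id)
    with conn:
        # The key is claimed in the same transaction as the postings, so a crash
        # mid-import leaves neither behind and a rerun starts cleanly.
        claimed = conn.execute(
            "INSERT OR IGNORE INTO processed_imports (job_key) VALUES (?)", (key,))
        if claimed.rowcount == 0:
            return False  # same file seen before: reprocessing is a no-op
        conn.executemany(
            "INSERT INTO supplier_payables (job_key, supplier, amount) VALUES (?, ?, ?)",
            [(key, supplier, amount) for supplier, amount in payables])
    return True
```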

10. Caching strategy

Cache	TTL	Invalidation
FX rates	1 hour (intraday); historical rates cached forever	TTL + explicit refresh job
Permission set per user	5 minutes	Invalidated on role change
Chart-of-accounts tree	15 minutes	Invalidated on CoA edit
Tax profile per partner	15 minutes	Invalidated on profile edit
Customer / supplier list (paginated page 1)	30 seconds	TTL only
API rate limit counters	Rolling window	TTL only

We deliberately cache nothing in the ledger path: ledger reads go directly to the primary DB.
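
For the cached entries in the table (never ledger data), a TTL combined with explicit invalidation can be sketched as follows. This is an in-process stand-in for the Redis cache, shown for the 5-minute permission set; the loader function and permission names are hypothetical.

```python
import time
from typing import Callable, Dict, FrozenSet, Tuple

PERMISSION_TTL_S = 5 * 60  # the 5-minute TTL for per-user permission sets


class PermissionCache:
    """TTL cache with explicit invalidation, keyed by user id."""

    def __init__(self, load: Callable[[int], FrozenSet[str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._clock = clock
        self._entries: Dict[int, Tuple[float, FrozenSet[str]]] = {}

    def get(self, user_id: int) -> FrozenSet[str]:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < PERMISSION_TTL_S:
            return entry[1]  # fresh hit
        perms = self._load(user_id)  # miss or expired: reload from the source
        self._entries[user_id] = (now, perms)
        return perms

    def invalidate(self, user_id: int) -> None:
        """Call on role change so the next read reloads immediately."""
        self._entries.pop(user_id, None)
```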

11. Background jobs and schedules

The platform runs the following scheduled jobs. The full listing and SLAs are in 08-system-features/03-automation.md.

Job	Frequency	Purpose
`fx_rates.refresh`	Hourly	Pull rates from configured provider
`bsp.import`	Daily 04:00 partner-local	Pull BSP settlement file, post supplier payables
`invoices.send_due_reminders`	Daily 09:00 partner-local	Dunning emails
`subscriptions.renew`	Daily 02:00 UTC	Subscription renewal billing
`gl.close_period_preview`	Daily 23:30 partner-local	Pre-compute period close artefacts
`audit.archive`	Daily 01:00 UTC	Ship audit logs older than 90 days to cold storage
`webhooks.retry`	Every 5 minutes	Retry failed webhook deliveries with exponential backoff
`notifications.flush`	Every 1 minute	Send queued notifications
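
The `webhooks.retry` job retries with exponential backoff. The base delay, cap, and attempt limit below are hypothetical, not the platform's configured policy; the sketch only shows how the next attempt time doubles after each failure.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical policy values, not taken from the platform's configuration.
BASE_DELAY = timedelta(minutes=5)
MAX_DELAY = timedelta(hours=12)
MAX_ATTEMPTS = 10


def next_attempt_at(last_attempt: datetime, failures: int) -> Optional[datetime]:
    """Exponential backoff after `failures` failed deliveries (failures >= 1).

    Delays double each time (5m, 10m, 20m, ...) up to MAX_DELAY; returns None
    once MAX_ATTEMPTS is reached so the delivery can be marked as failed.
    """
    if failures >= MAX_ATTEMPTS:
        return None
    delay = min(BASE_DELAY * 2 ** max(failures - 1, 0), MAX_DELAY)
    return last_attempt + delay
```

Because the sweep runs every 5 minutes, a retry actually fires on the first sweep at or after the computed time. Adding random jitter to each delay is a common refinement that avoids synchronised retry bursts.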

12. Observability

Three signal classes:

Metrics (Prometheus-style): request rate, error rate, p50/p95/p99 latency per route; job durations; queue depth; DB connection pool usage; FX cache hit ratio.
Logs: structured JSON, one event per significant action, correlated by request_id and actor_id. Logs are not the audit log — audit lives in the database.
Traces: optional OpenTelemetry traces for slow-path debugging.

A redaction filter strips PII (passport numbers, card PANs, full names of passengers) from logs before shipping.
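
A redaction pass over structured log events might look like the sketch below. The field names are hypothetical. Passenger names cannot be detected reliably by pattern, so they are masked by field name; PAN-like digit runs are masked wherever they appear in string values.

```python
import re
from typing import Any

# Hypothetical field names; the real list would follow the data classification.
PII_KEYS = {"passport_number", "card_pan", "passenger_full_name"}
# 13 to 19 digits, optionally separated by single spaces or dashes (PAN shapes).
PAN_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
MASK = "[REDACTED]"


def redact(value: Any) -> Any:
    """Return a copy of a log event with PII keys and PAN-like runs masked."""
    if isinstance(value, dict):
        return {k: MASK if k in PII_KEYS else redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return PAN_PATTERN.sub(MASK, value)
    return value
```

The digit-run pattern errs toward over-redaction: any 13 to 19 digit number, such as a long numeric reference, is masked as well.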

13. Disaster recovery

Metric	Target
RPO (data loss tolerated)	≤ 5 minutes
RTO (time to restore)	≤ 60 minutes
Backup frequency	Continuous binlog + hourly logical dump
Backup retention	35 days hot, 1 year cold
Geo-redundancy	Primary + warm standby in second region
Quarterly restore drill	Mandatory; documented in `12-compliance/03-audit-readiness.md`

14. Security controls (cross-reference)

This section is a summary; full detail is in Volume XII.

At rest: AES-256 for DB and object storage; column-level encryption for PAN-like fields.
In transit: TLS 1.3 only; HSTS; certificate pinning for supplier connectors where supported.
Auth: session cookies (httpOnly, SameSite=Strict, Secure) for UI; Bearer tokens for API; optional MFA per partner policy.
Secrets: managed vault; no secrets in environment variables in production.
Dependencies: locked, scanned weekly; CVE budget enforced in CI.
AppSec: parameterised queries everywhere; output encoding by default; CSRF tokens on all cookie-auth writes; strict CSP.
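
The UI session cookie attributes map directly onto Python's standard `http.cookies` module (the `samesite` attribute needs Python 3.8 or later). The cookie name `session` and the helper function are hypothetical.

```python
import secrets
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Build a Set-Cookie header carrying HttpOnly, Secure and SameSite=Strict."""
    cookie = SimpleCookie()
    cookie["session"] = session_id  # hypothetical cookie name
    morsel = cookie["session"]
    morsel["httponly"] = True        # not readable from JavaScript
    morsel["secure"] = True          # sent over HTTPS only
    morsel["samesite"] = "Strict"    # not sent on cross-site requests
    morsel["path"] = "/"
    return morsel.output()           # a single "Set-Cookie: session=<id>; ..." line


if __name__ == "__main__":
    print(session_cookie_header(secrets.token_urlsafe(32)))
```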

15. Scaling beyond Phase 1

The architecture is deliberately conservative. Known evolution paths:

Pressure	Response
Single DB write hot-spot	Partner-sharded DB cluster; shard key = `partner_id`
Long-tail GDS latencies	Pull supplier search into a separate "shopping" service
Report generation contention	Materialised views + warehouse offload (Snowflake/BigQuery)
Webhook fan-out	Dedicated dispatch service with per-partner rate isolation
Real-time AI agent traffic	gRPC bidirectional surface backed by the same domain services

Next: 03-glossary.md — terms and abbreviations used throughout the documentation.