Chapter 0.2 — System Architecture
1. Purpose
This chapter explains the high-level architecture of travoBooks — the architectural pattern, the tiers, the logical services, the principles that govern data flow, and the failure modes the platform is designed to survive. Chapter 0.4 covers the specific technology choices that implement this architecture; this chapter explains the architecture itself.
2. Architectural Pattern — Service-Oriented Modular Architecture
travoBooks is built as a service-oriented modular architecture: multiple cleanly bounded Go services share a common transactional data store, communicate over well-defined APIs, and are deployed and scaled independently. This is a deliberate position between a pure microservice mesh and a single monolith.
2.1 Why not pure microservices
| Constraint | Implication |
|---|---|
| Booking issuance and journal entry posting must commit in the same database transaction | A separate ledger service with its own database would force eventual consistency on the ledger — unacceptable |
| Financial integrity is more important than independent deployability | Saga patterns and compensating transactions add risk we will not accept on money flows |
| Operational complexity grows with service count | A small focused engineering team should not maintain dozens of independently deployed services |
2.2 Why not a pure monolith
| Constraint | Implication |
|---|---|
| Reporting workloads must not affect booking latency | Services must be independently scalable |
| A failure in messaging must not bring down booking | Failure isolation per logical service is required |
| Frontend and backend teams ship at different cadences | Independent deployment is required |
2.3 What we get instead
- A small set of Go services, each owning a clear domain
- One shared transactional database (MySQL Cloud Enterprise) so that operations and ledger commit atomically when they need to
- Redis as the common cache, queue, and coordination layer
- A React SPA as the primary client surface
3. The Four Tiers
┌──────────────────────────────────────────────────────────────────────┐
│ CLIENT TIER │
│ React.js SPA • Mobile Web • Partner Embedded UIs │
└─────────────────────────────────┬────────────────────────────────────┘
│ HTTPS · TLS 1.3 · REST · WebSocket
▼
┌──────────────────────────────────────────────────────────────────────┐
│ APPLICATION TIER (Go) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ API Gateway │ │ Auth │ │ Booking │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Ledger │ │ Messaging │ │ Reporting │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ mTLS service-to-service · OpenTelemetry tracing │
└────────────┬──────────────────────────────────────────────┬──────────┘
│ TLS 1.3 TLS 1.3 │
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────┐
│ DATA TIER │ │ CACHE & QUEUE TIER │
│ MySQL Cloud Enterprise │ │ Redis │
│ Primary + Read replicas │ │ Cluster + Sentinel │
│ Multi-AZ · PITR · TDE │ │ AOF persistence · ACL │
└──────────────────────────┘ └──────────────────────────┘
Every hop between tiers uses TLS 1.3. Inter-service traffic between Go services uses mTLS with service identities issued by SPIFFE / Vault.
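The transport rules above can be expressed as a Go listener configuration. This is a minimal sketch using only the standard library; the function names are illustrative, and it assumes the service certificate and the trust bundle have already been obtained from the identity provider (how SPIFFE / Vault deliver them is not shown here).

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// serverTLSConfig builds a listener config for service-to-service traffic.
// It refuses protocol versions below TLS 1.3 and rejects any caller that does
// not present a certificate chaining to one of the CAs in clientCAs.
func serverTLSConfig(cert tls.Certificate, clientCAs *x509.CertPool) *tls.Config {
	return &tls.Config{
		MinVersion:   tls.VersionTLS13,
		Certificates: []tls.Certificate{cert},
		ClientAuth:   tls.RequireAndVerifyClientCert,
		ClientCAs:    clientCAs,
	}
}

// demoConfig uses placeholder inputs (assumption: a real service loads its
// certificate and trust bundle at startup and rotates them).
func demoConfig() *tls.Config {
	return serverTLSConfig(tls.Certificate{}, x509.NewCertPool())
}

// describe reports whether cfg enforces TLS 1.3 and mutual authentication.
func describe(cfg *tls.Config) (tls13Only, mutual bool) {
	return cfg.MinVersion == tls.VersionTLS13,
		cfg.ClientAuth == tls.RequireAndVerifyClientCert
}

func main() {
	tls13Only, mutual := describe(demoConfig())
	fmt.Println(tls13Only, mutual)
}
```

Setting ClientAuth to RequireAndVerifyClientCert is what turns plain TLS into mTLS on the server side: a handshake without a valid client certificate fails before any application code runs.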
4. Logical Services
| Service | Responsibility | Primary Stores |
|---|---|---|
| API Gateway | Request routing, authentication proxy, rate limiting, idempotency-key handling, request logging | Redis (rate buckets, idempotency keys) |
| Auth | Password, MFA, PAT, session management, RBAC enforcement | MySQL (users, roles, permissions); Redis (sessions) |
| Booking | Booking lifecycle, GDS/NDC orchestration, holds, issuance, voids, refunds | MySQL (bookings, tickets, segments); Redis (distributed locks, idempotency) |
| Ledger | Journal entries, account management, period state, reconciliation, recognition runs | MySQL (journals, accounts, periods) |
| Messaging | E-tickets, invoices, receipts, vouchers; email / SMS / in-app inbox / webhooks | Redis Streams (queue); MySQL (outbox, templates) |
| Reporting | Trial Balance, Profit and Loss, Balance Sheet, Cash Flow, aging, deferred revenue roll-forward, audit packs | MySQL read replicas; Redis (computed report cache) |
| Integration | GDS / NDC clients, supplier adapters, BSP file ingestion | MySQL (supplier credentials, request log); Redis (response dedup) |
Each service owns its tables. Cross-service reads happen over APIs, not by reaching into another service's tables. The one exception is on the write side: the Booking service writes journal entries (JEs) directly, in the same transaction as the booking write. This is the only cross-domain write pattern, and it exists because financial integrity demands it.
5. Core Architectural Decisions
5.1 Transactional integrity — Booking and Ledger commit together
Every operational event that has a financial consequence (ticket issued, payment captured, refund processed, memo accepted) opens one MySQL transaction, writes the operational rows, writes the balanced JE lines, and commits atomically. If any part fails, all parts roll back. This is the single most important property of the platform.
The pattern is implemented in Go using sql.Tx:
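tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
if err != nil {
    return err
}
defer tx.Rollback() // returns sql.ErrTxDone and does nothing once Commit has succeeded
// ... insert ticket
// ... update booking state
// ... insert JE header
// ... insert JE lines
return tx.Commit()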
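One way to make the all-or-nothing rule hard to get wrong is to wrap it in a helper that owns the commit and rollback decisions. The sketch below is illustrative only: withTx, issue, and the stub driver are hypothetical names, and the stub exists purely so the example runs without a MySQL server.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// withTx runs fn inside one SERIALIZABLE transaction. Any error from fn rolls
// everything back; only a nil return reaches Commit.
func withTx(ctx context.Context, db *sql.DB, fn func(tx *sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
	if err != nil {
		return fmt.Errorf("begin transaction: %w", err)
	}
	if err := fn(tx); err != nil {
		_ = tx.Rollback() // best effort; the original error is what the caller needs
		return err
	}
	return tx.Commit()
}

// stubDB is a do-nothing driver that only counts commits and rollbacks.
// It stands in for MySQL here and is not part of the pattern itself.
type stubDB struct{ commits, rollbacks int }

func (s *stubDB) Connect(context.Context) (driver.Conn, error) { return &stubConn{s}, nil }
func (s *stubDB) Driver() driver.Driver                        { return s }
func (s *stubDB) Open(string) (driver.Conn, error)             { return &stubConn{s}, nil }

type stubConn struct{ db *stubDB }

func (c *stubConn) Prepare(string) (driver.Stmt, error) { return nil, errors.New("not supported") }
func (c *stubConn) Close() error                        { return nil }
func (c *stubConn) Begin() (driver.Tx, error)           { return &stubTx{c.db}, nil }
func (c *stubConn) BeginTx(context.Context, driver.TxOptions) (driver.Tx, error) {
	return &stubTx{c.db}, nil
}

type stubTx struct{ db *stubDB }

func (t *stubTx) Commit() error   { t.db.commits++; return nil }
func (t *stubTx) Rollback() error { t.db.rollbacks++; return nil }

// issue simulates one issuance; failJE forces the last write to fail.
func issue(failJE bool) (commits, rollbacks int, err error) {
	stub := &stubDB{}
	db := sql.OpenDB(stub)
	defer db.Close()
	err = withTx(context.Background(), db, func(tx *sql.Tx) error {
		// ... insert ticket, update booking state, insert JE header ...
		if failJE {
			return errors.New("insert JE lines failed")
		}
		return nil
	})
	return stub.commits, stub.rollbacks, err
}

func main() {
	fmt.Println(issue(false)) // 1 0 <nil>
	fmt.Println(issue(true))  // 0 1 insert JE lines failed
}
```

Because the operational rows and the JE lines are written inside the same fn, there is no code path on which one set commits without the other.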
5.2 GDS calls happen outside the DB transaction
External supplier calls are slow and can hang; made inside a transaction, they would hold database locks for tens of seconds. The platform therefore records intent in a supplier_request_log row, commits it, and then calls the GDS from a separate goroutine. Only after the GDS responds does a second transaction post the ticket and JE.
The cost of this design is the possibility of "orphan tickets" (GDS succeeded, our commit failed) — handled by a Reconciler service that runs every five minutes and reconciles supplier_request_log against booking_tickets.
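The core of the reconciliation step is a set difference: supplier requests that the GDS confirmed, minus tickets that were actually recorded. The following is a rough sketch of that diff over in-memory data; the struct fields and function name are assumptions for illustration, not the real supplier_request_log schema.

```go
package main

import "fmt"

// SupplierRequest mirrors one supplier_request_log row.
// The field names are hypothetical; the real columns are not specified here.
type SupplierRequest struct {
	RequestID    string
	TicketNumber string // empty until the GDS has answered
	Succeeded    bool   // true when the GDS confirmed issuance
}

// findOrphans returns requests that the GDS confirmed but that have no
// matching ticket number among the recorded booking_tickets rows.
func findOrphans(requests []SupplierRequest, ticketNumbers []string) []SupplierRequest {
	known := make(map[string]struct{}, len(ticketNumbers))
	for _, n := range ticketNumbers {
		known[n] = struct{}{}
	}
	var orphans []SupplierRequest
	for _, r := range requests {
		if !r.Succeeded || r.TicketNumber == "" {
			continue // pending or failed at the supplier: nothing to reconcile
		}
		if _, ok := known[r.TicketNumber]; !ok {
			orphans = append(orphans, r)
		}
	}
	return orphans
}

func main() {
	requests := []SupplierRequest{
		{RequestID: "req-1", TicketNumber: "TKT-001", Succeeded: true},
		{RequestID: "req-2", TicketNumber: "TKT-002", Succeeded: true}, // our commit failed
		{RequestID: "req-3"}, // GDS never answered
	}
	for _, o := range findOrphans(requests, []string{"TKT-001"}) {
		fmt.Println("orphan:", o.RequestID, o.TicketNumber)
	}
}
```

In practice the diff would more likely be a SQL anti-join scoped by partner_id and a time window; the in-memory version only shows the logic.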
5.3 Multi-tenancy by partition
Every business-meaningful table carries partner_id. Every query filters on partner_id. Composite foreign keys (partner_id, *_id) mean the database itself rejects any reference to a row belonging to a different partner. Redis keys are prefixed partner:{id}:.... S3 objects live under s3://travobooks-prod/partner-{id}/....
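For the Redis and S3 conventions, a small set of helpers keeps key construction in one place so that no caller builds a tenant-scoped name by hand. This is a sketch under the assumption that partner IDs are integers; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// redisKey builds a tenant-scoped Redis key: partner:{id}:part1:part2...
func redisKey(partnerID int64, parts ...string) string {
	return fmt.Sprintf("partner:%d:%s", partnerID, strings.Join(parts, ":"))
}

// s3Prefix returns the object prefix under which a partner's files live.
func s3Prefix(partnerID int64) string {
	return fmt.Sprintf("s3://travobooks-prod/partner-%d/", partnerID)
}

// ownsKey reports whether key belongs to partnerID. The trailing colon in the
// prefix matters: without it, partner 4 would match keys of partner 42.
func ownsKey(partnerID int64, key string) bool {
	return strings.HasPrefix(key, fmt.Sprintf("partner:%d:", partnerID))
}

func main() {
	key := redisKey(42, "session", "abc")
	fmt.Println(key)
	fmt.Println(s3Prefix(42))
	fmt.Println(ownsKey(42, key), ownsKey(4, key))
}
```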
5.4 Multi-currency at line level
Every JE line carries transaction_currency, functional_currency, optional reporting_currency, and the snapshot FX rate used at posting time. The ledger never retranslates historical postings.
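A line that stores its own rate might look like the sketch below. The field names, the use of integer minor units, and the rounding rule (half away from zero) are all assumptions for illustration; the example also assumes both currencies use two decimal places, and the rate shown is an arbitrary illustrative value.

```go
package main

import (
	"fmt"
	"math/big"
)

// JELine carries every currency fact needed to read the posting later
// without consulting a rates table. Field names are hypothetical.
type JELine struct {
	TransactionCurrency string
	FunctionalCurrency  string
	ReportingCurrency   string   // optional; empty when unused
	TransactionMinor    int64    // amount in minor units of TransactionCurrency
	FXRate              *big.Rat // functional units per transaction unit, fixed at posting
	FunctionalMinor     int64    // derived once at posting and stored, never recomputed
}

// roundHalfAwayFromZero rounds an exact rational to the nearest integer.
// It assumes the result fits in an int64.
func roundHalfAwayFromZero(r *big.Rat) int64 {
	twice := new(big.Int).Mul(r.Num(), big.NewInt(2))
	if r.Sign() >= 0 {
		twice.Add(twice, r.Denom())
	} else {
		twice.Sub(twice, r.Denom())
	}
	den := new(big.Int).Mul(r.Denom(), big.NewInt(2))
	return new(big.Int).Quo(twice, den).Int64() // Quo truncates toward zero
}

// newLine snapshots the rate and the converted amount at posting time.
func newLine(txCur, fnCur string, amountMinor int64, rate string) (JELine, error) {
	r, ok := new(big.Rat).SetString(rate)
	if !ok || r.Sign() <= 0 {
		return JELine{}, fmt.Errorf("invalid FX rate %q", rate)
	}
	converted := new(big.Rat).Mul(new(big.Rat).SetInt64(amountMinor), r)
	return JELine{
		TransactionCurrency: txCur,
		FunctionalCurrency:  fnCur,
		TransactionMinor:    amountMinor,
		FXRate:              r,
		FunctionalMinor:     roundHalfAwayFromZero(converted),
	}, nil
}

func main() {
	// 100.00 USD at an illustrative rate of 117.25 BDT per USD.
	line, err := newLine("USD", "BDT", 10000, "117.25")
	if err != nil {
		panic(err)
	}
	fmt.Println(line.FunctionalMinor) // 1172500, i.e. 11725.00 BDT
}
```

Storing FunctionalMinor alongside the rate is what lets reports read historical lines as posted, whatever the current rate is.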
5.5 Double-entry by default
There is no path to write financial state that does not pass through a balanced JE. There is no "quick adjust" that bypasses the ledger. Total debits equal total credits — always.
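The balance rule is simple enough to check before any row is written. The sketch below shows one possible validation, assuming amounts are held as integer minor units in the functional currency; the type, function, and account names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Line is one side of a journal entry. Exactly one of DebitMinor or
// CreditMinor is expected to be positive.
type Line struct {
	Account     string
	DebitMinor  int64
	CreditMinor int64
}

// validateBalanced rejects any entry whose debits and credits differ.
func validateBalanced(lines []Line) error {
	if len(lines) < 2 {
		return errors.New("a journal entry needs at least two lines")
	}
	var debits, credits int64
	for i, l := range lines {
		if l.DebitMinor < 0 || l.CreditMinor < 0 {
			return fmt.Errorf("line %d: negative amount", i)
		}
		if (l.DebitMinor == 0) == (l.CreditMinor == 0) {
			return fmt.Errorf("line %d: exactly one of debit or credit must be set", i)
		}
		debits += l.DebitMinor
		credits += l.CreditMinor
	}
	if debits != credits {
		return fmt.Errorf("unbalanced entry: debits %d, credits %d", debits, credits)
	}
	return nil
}

func main() {
	entry := []Line{
		{Account: "accounts-receivable", DebitMinor: 50000},
		{Account: "ticket-revenue", CreditMinor: 45000},
		{Account: "tax-payable", CreditMinor: 5000},
	}
	fmt.Println(validateBalanced(entry)) // <nil>
}
```

Using integers rather than floating point means the equality test is exact; there is no tolerance to tune.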
5.6 Append-only financial state
journal_entries, journal_entry_lines, and audit_logs accept INSERT only. Corrections happen via reversing JEs. Hash chaining on audit_logs makes tampering detectable.
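Hash chaining works by having each row commit to the hash of the row before it, so changing any historical row invalidates every hash that follows. The sketch below illustrates the idea with SHA-256; the record layout and the exact bytes being hashed are assumptions, not the actual audit_logs format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// AuditEntry is a simplified audit_logs row (hypothetical layout).
type AuditEntry struct {
	Payload  string
	PrevHash string // Hash of the previous entry; empty for the first one
	Hash     string // SHA-256 over PrevHash and Payload
}

func hashEntry(prevHash, payload string) string {
	sum := sha256.Sum256([]byte(prevHash + "\n" + payload))
	return hex.EncodeToString(sum[:])
}

// appendEntry links a new entry to the current tail of the chain.
func appendEntry(chain []AuditEntry, payload string) []AuditEntry {
	prev := ""
	if n := len(chain); n > 0 {
		prev = chain[n-1].Hash
	}
	return append(chain, AuditEntry{Payload: payload, PrevHash: prev, Hash: hashEntry(prev, payload)})
}

// firstBroken returns the index of the first entry that fails verification,
// or -1 when the whole chain is intact.
func firstBroken(chain []AuditEntry) int {
	prev := ""
	for i, e := range chain {
		if e.PrevHash != prev || e.Hash != hashEntry(e.PrevHash, e.Payload) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var chain []AuditEntry
	for _, p := range []string{"ticket issued", "payment captured", "refund processed"} {
		chain = appendEntry(chain, p)
	}
	fmt.Println(firstBroken(chain)) // -1

	chain[1].Payload = "payment captured (edited)"
	fmt.Println(firstBroken(chain)) // 1
}
```

Detection is all this gives: the chain shows that a row changed and where, but recovering the original content still depends on backups.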
5.7 API-first
The UI itself is the first consumer of the public API. No internal-only endpoints exist in Phase 1. Anything the React SPA can do, a partner integration can do.
5.8 Stateless application tier
Every Go service is stateless — all state lives in MySQL or Redis. Any service instance can serve any request. Horizontal scaling is purely additive.
6. End-to-End Data Flow — Ticket Issuance
┌──────┐ 1. POST /v1/bookings/{id}/issue ┌────────────────┐
│React │ ───────────────────────────────────────▶│ API Gateway │
│ SPA │ │ (Go) │
└──────┘ └───────┬────────┘
│ 2. session lookup
▼
┌──────────┐
│ Redis │
└────┬─────┘
│ 3. authn ok, route
▼
┌────────────────┐
│ Booking │
│ Service (Go) │
└────┬───────────┘
4. acquire lock ◀────────┤
5. idempotency check
6. write "pending" row, commit
│
7. GDS call (goroutine, outside TX)
▼
┌──────────┐
│ GDS │
└────┬─────┘
│ 8. ticket number
▼
9. open TX, INSERT ticket + JE lines,
UPDATE booking → ISSUED, COMMIT
│
10. publish booking.issued event
▼
┌──────────────────┐
│ Redis Streams │
└────┬─────────────┘
│ 11. consume
▼
┌──────────────────┐
│ Messaging worker │
└────┬─────────────┘
│ 12. e-ticket → SES
▼
👤 Customer
This single flow exercises every tier and the platform's core mechanisms: stateless services, distributed locks, idempotency, GDS-outside-transaction, atomic ledger commit, and async messaging via Redis Streams.
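The idempotency check in step 5 can be pictured as "run once per key, replay the stored result afterwards". The sketch below uses an in-memory map guarded by a mutex as a stand-in for Redis; the type and method names are hypothetical, and a real store would also need expiry and cross-instance atomicity, which this version does not attempt.

```go
package main

import (
	"fmt"
	"sync"
)

// IdempotencyStore remembers the result of each successfully completed key.
// An in-memory map stands in for Redis here (assumption for illustration).
type IdempotencyStore struct {
	mu      sync.Mutex
	results map[string]string
}

func NewIdempotencyStore() *IdempotencyStore {
	return &IdempotencyStore{results: make(map[string]string)}
}

// Do runs fn at most once per key. A repeated key returns the stored result
// with replayed set to true. Failures are not stored, so a retry with the
// same key gets another attempt.
func (s *IdempotencyStore) Do(key string, fn func() (string, error)) (result string, replayed bool, err error) {
	s.mu.Lock()
	defer s.mu.Unlock() // held across fn: simple, but serialises all callers
	if r, ok := s.results[key]; ok {
		return r, true, nil
	}
	r, err := fn()
	if err != nil {
		return "", false, err
	}
	s.results[key] = r
	return r, false, nil
}

func main() {
	store := NewIdempotencyStore()
	calls := 0
	issue := func() (string, error) {
		calls++
		return "TKT-001", nil
	}
	for i := 0; i < 2; i++ {
		ticket, replayed, _ := store.Do("issue:booking-7:key-abc", issue)
		fmt.Println(ticket, replayed)
	}
	fmt.Println("issuance ran", calls, "time(s)")
}
```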
7. Failure Modes and Resilience
| Failure | Detection | Recovery |
|---|---|---|
| GDS timeout | Go context deadline exceeded | Client retry with same idempotency key; reconciler catches orphans |
| GDS success, our commit fails | Reconciler diff between supplier_request_log and booking_tickets | Ops resolves manually with audit trail; ticket row inserted, JE posted, alert raised |
| MySQL primary failure | Health check via cloud-provider probe | Multi-AZ failover < 60s; in-flight transactions retried by clients |
| Redis primary failure | Sentinel detects loss of the primary | Automatic failover to replica; brief read-only window |
| Whole-region outage | External synthetic monitoring | Phase 1: manual failover to standby region; Phase 2: active-active |
| Webhook consumer down (partner side) | Delivery attempt fails | Exponential backoff in Streams; max 8 attempts; dead-letter |
| Background worker crash | Streams consumer group sees no ACK | Message returns to the pending list; another consumer picks up |
| Bad deployment | Canary metrics regress | Automatic rollback via deployment pipeline |
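The webhook retry policy in the table (exponential backoff, at most 8 attempts, then dead-letter) can be sketched as a single pure function. Only the attempt limit comes from the table; the one-second base delay and one-minute ceiling below are assumed values, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const (
	maxAttempts = 8           // from the failure-mode table
	baseDelay   = time.Second // assumption: delay after the first failure
	maxDelay    = time.Minute // assumption: ceiling on any single wait
)

// nextDelay returns how long to wait after the given failed attempt
// (1-based). ok is false once the attempt budget is spent, which is the
// signal to move the message to the dead-letter stream.
func nextDelay(attempt int) (delay time.Duration, ok bool) {
	if attempt < 1 {
		attempt = 1
	}
	if attempt >= maxAttempts {
		return 0, false
	}
	delay = baseDelay << (attempt - 1) // 1s, 2s, 4s, ...
	if delay > maxDelay {
		delay = maxDelay
	}
	return delay, true
}

func main() {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if d, ok := nextDelay(attempt); ok {
			fmt.Printf("attempt %d failed: retry in %s\n", attempt, d)
		} else {
			fmt.Printf("attempt %d failed: dead-letter\n", attempt)
		}
	}
}
```

A production version would normally add random jitter to each delay so that many failing deliveries do not retry in lockstep.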
8. Deployment Topology
8.1 Phase 1 — Single primary region, multi-AZ
- One primary region (e.g. AWS Singapore for Bangladesh/India partners; AWS Frankfurt for EU)
- Three availability zones
- MySQL Cloud Enterprise: primary + 2 multi-AZ standbys + read replicas
- Redis Cluster: 6 nodes spread across AZs
- Go services: deployed on Kubernetes, replicas spread across AZs by pod anti-affinity rules
8.2 Phase 1 disaster recovery
- Standby region with continuous backup restore
- RTO target: 2 hours
- RPO target: < 5 minutes (PITR + Redis AOF)
8.3 Phase 2 — Active-active multi-region (roadmap)
- Two or more active regions
- Partner-pinned routing (each partner's reads land in their nearest region)
- Cross-region replication via MySQL Group Replication or Vitess
- RTO target: < 60 seconds
- RPO target: ~0
9. Observability
Every Go service emits:
- Structured JSON logs with partner_id, request_id, event_id correlation
- Prometheus metrics: RED (Rate, Errors, Duration) per endpoint; USE (Utilisation, Saturation, Errors) per node
- OpenTelemetry traces: every request carries a trace ID propagated across all hops including outbound GDS calls
- Health probes: /healthz (liveness) and /readyz (readiness, checks DB and Redis connectivity)
Audit logs are a separate, tamper-evident stream — see Chapter 8.2.
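The difference between the two health probes is easiest to see in code: liveness answers "is the process running", readiness answers "can it serve traffic right now". The sketch below uses only the standard library; the Check type, the helper names, and the two-second timeout are assumptions, and the dependency checks are stubs rather than real MySQL or Redis pings.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Check probes one dependency, for example a database ping.
type Check func(ctx context.Context) error

// newProbeMux wires /healthz (always 200 while the process is up) and
// /readyz (200 only when every dependency check passes).
func newProbeMux(checks map[string]Check) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		for name, check := range checks {
			if err := check(ctx); err != nil {
				http.Error(w, name+" not ready", http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// demoMux builds a mux with stubbed dependencies (illustration only).
func demoMux(redisUp bool) *http.ServeMux {
	return newProbeMux(map[string]Check{
		"mysql": func(context.Context) error { return nil },
		"redis": func(context.Context) error {
			if !redisUp {
				return errors.New("connection refused")
			}
			return nil
		},
	})
}

// probe issues an in-process GET and returns the HTTP status code.
func probe(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(demoMux(true), "/healthz"), probe(demoMux(true), "/readyz"))
	fmt.Println(probe(demoMux(false), "/healthz"), probe(demoMux(false), "/readyz"))
}
```

Keeping dependency checks out of /healthz matters on Kubernetes: a Redis outage should take instances out of rotation through the readiness probe, not trigger restarts through the liveness probe.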
10. Security Architecture (in Brief)
The full treatment is in Chapter 12.5 (Data Protection) and Chapter 0.4 §10. At an architectural level:
- TLS 1.3 between every hop
- mTLS between every Go service
- Secrets in HashiCorp Vault / AWS Secrets Manager — never in code or logs
- MySQL TDE at rest; Redis with disk encryption; S3 with SSE-KMS
- Append-only hash-chained audit_logs
- Maker-checker required for sensitive operations
11. Architectural Principles — Summary
- Booking + ledger commit together. Always.
- GDS calls happen outside the database transaction. Reconciler handles orphans.
- Multi-tenancy is enforced at three layers: application, database, storage.
- Financial state is append-only. Reversals, never updates.
- Every service is stateless. State lives in MySQL or Redis.
- The UI is the first API consumer. No internal-only endpoints.
- Failure of one service does not cascade. Bulkheads via Redis-backed queues and timeouts.
- Every change is traceable. Audit logs with hash chaining.
These principles, taken together, are what make travoBooks suitable for enterprise-grade global travel accounting at scale.