Chapter 0.2 — System Architecture

1. Purpose

This chapter explains the high-level architecture of travoBooks — the architectural pattern, the tiers, the logical services, the principles that govern data flow, and the failure modes the platform is designed to survive. Chapter 0.4 covers the specific technology choices that implement this architecture; this chapter explains the architecture itself.

2. Architectural Pattern — Service-Oriented Modular Architecture

travoBooks is built as a service-oriented modular architecture: multiple cleanly bounded Go services share a common transactional data store, communicate over well-defined APIs, and are deployed and scaled independently. This is a deliberate position between a pure microservice mesh and a single monolith.

2.1 Why not pure microservices

Constraint | Implication
-----------|------------
Booking issuance and journal entry posting must commit in the same database transaction | A separate ledger service with its own database would force eventual consistency on the ledger, which is unacceptable
Financial integrity is more important than independent deployability | Saga patterns and compensating transactions add risk we will not accept on money flows
Operational complexity grows with service count | A small focused engineering team should not maintain dozens of independently-deployed services

2.2 Why not a pure monolith

Constraint | Implication
-----------|------------
Reporting workloads must not affect booking latency | Services must be independently scalable
A failure in messaging must not bring down booking | Failure isolation per logical service is required
Frontend and backend teams ship at different cadences | Independent deployment is required

2.3 What we get instead

  • A small set of Go services, each owning a clear domain
  • One shared transactional database (MySQL Cloud Enterprise) so that operations and ledger commit atomically when they need to
  • Redis as the common cache, queue, and coordination layer
  • A React SPA as the primary client surface

3. The Four Tiers

┌──────────────────────────────────────────────────────────────────────┐
│                          CLIENT TIER                                 │
│  React.js SPA  •  Mobile Web  •  Partner Embedded UIs                │
└─────────────────────────────────┬────────────────────────────────────┘
                                  │  HTTPS · TLS 1.3 · REST · WebSocket
                                  ▼
┌──────────────────────────────────────────────────────────────────────┐
│                       APPLICATION TIER  (Go)                         │
│                                                                      │
│   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐               │
│   │ API Gateway  │  │     Auth     │  │   Booking    │               │
│   └──────────────┘  └──────────────┘  └──────────────┘               │
│   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐               │
│   │    Ledger    │  │  Messaging   │  │  Reporting   │               │
│   └──────────────┘  └──────────────┘  └──────────────┘               │
│                                                                      │
│           mTLS service-to-service · OpenTelemetry tracing            │
└────────────┬──────────────────────────────────────────────┬──────────┘
             │ TLS 1.3                              TLS 1.3 │
             ▼                                              ▼
┌──────────────────────────┐                  ┌──────────────────────────┐
│       DATA TIER          │                  │   CACHE & QUEUE TIER     │
│  MySQL Cloud Enterprise  │                  │         Redis            │
│  Primary + Read replicas │                  │  Cluster + Sentinel      │
│  Multi-AZ · PITR · TDE   │                  │  AOF persistence · ACL   │
└──────────────────────────┘                  └──────────────────────────┘

Every hop between tiers uses TLS 1.3. Inter-service traffic between Go services uses mTLS with service identities issued by SPIFFE / Vault.
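
As an illustration of the transport rules above, the following is a minimal sketch of a listener configuration using Go's standard crypto/tls package. The function names are hypothetical, and the way certificates and the trust bundle are obtained from SPIFFE / Vault is not shown here; the sketch only demonstrates pinning TLS 1.3 and requiring a verified client certificate.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// newServerTLSConfig returns a configuration for an internal service listener.
// It refuses anything below TLS 1.3 and requires every caller to present a
// certificate that chains to trustBundle (mutual TLS).
// How certs and trustBundle are loaded is outside this sketch.
func newServerTLSConfig(certs []tls.Certificate, trustBundle *x509.CertPool) *tls.Config {
	return &tls.Config{
		MinVersion:   tls.VersionTLS13,
		Certificates: certs,
		ClientCAs:    trustBundle,
		ClientAuth:   tls.RequireAndVerifyClientCert,
	}
}

// enforcesMutualTLS13 is a hypothetical startup check that a config has not
// been weakened by later edits.
func enforcesMutualTLS13(cfg *tls.Config) bool {
	return cfg.MinVersion == tls.VersionTLS13 &&
		cfg.ClientAuth == tls.RequireAndVerifyClientCert
}

func main() {
	cfg := newServerTLSConfig(nil, x509.NewCertPool())
	fmt.Println("mutual TLS 1.3 enforced:", enforcesMutualTLS13(cfg))
}
```

A check like enforcesMutualTLS13 could run at service start so that a misconfigured listener fails fast instead of silently accepting weaker connections.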

4. Logical Services

Service | Responsibility | Primary Stores
--------|----------------|---------------
API Gateway | Request routing, authentication proxy, rate limiting, idempotency-key handling, request logging | Redis (rate buckets, idempotency keys)
Auth | Password, MFA, PAT, session management, RBAC enforcement | MySQL (users, roles, permissions); Redis (sessions)
Booking | Booking lifecycle, GDS/NDC orchestration, holds, issuance, voids, refunds | MySQL (bookings, tickets, segments); Redis (distributed locks, idempotency)
Ledger | Journal entries, account management, period state, reconciliation, recognition runs | MySQL (journals, accounts, periods)
Messaging | E-tickets, invoices, receipts, vouchers; email / SMS / in-app inbox / webhooks | Redis Streams (queue); MySQL (outbox, templates)
Reporting | Trial Balance, P&L, BS, CF, aging, deferred revenue roll-forward, audit packs | MySQL read replicas; Redis (computed report cache)
Integration | GDS / NDC clients, supplier adapters, BSP file ingestion | MySQL (supplier credentials, request log); Redis (response dedup)

Each service owns its tables. Cross-service reads happen over APIs, not by reaching into another service's tables. The one exception on the write side is the Booking service, which writes journal entries (JEs) directly within the same transaction as the booking write. That is the only cross-domain write pattern, and it exists because financial integrity demands it.

5. Core Architectural Decisions

5.1 Transactional integrity — Booking and Ledger commit together

Every operational event that has a financial consequence (ticket issued, payment captured, refund processed, memo accepted) opens one MySQL transaction, writes the operational rows, writes the balanced JE lines, and commits atomically. If any part fails, all parts roll back. This is the single most important property of the platform.

The pattern is implemented in Go using sql.Tx:

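tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
if err != nil {
	return err
}
defer tx.Rollback() // no-op after a successful Commit
// ... insert ticket
// ... update booking state
// ... insert JE header
// ... insert JE lines; return on any error so the deferred Rollback runs
return tx.Commit()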

5.2 GDS calls happen outside the DB transaction

External supplier calls are slow, can hang, and would otherwise hold database locks for tens of seconds. The platform records intent in a supplier_request_log row, commits that, and then calls the GDS in a separate goroutine with no database transaction open. Only after the GDS responds does a second transaction post the ticket and JE.
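
A minimal sketch of the "call the GDS in its own goroutine, bounded by a deadline" step might look like the following. The gdsCall type, the slowSupplier stand-in, and the timesOut helper are assumptions made for illustration; the real supplier client and its timeout values are not described in this chapter.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// gdsCall is a hypothetical signature for one supplier request.
type gdsCall func(ctx context.Context) (ticketNumber string, err error)

// issueWithDeadline runs call in its own goroutine and stops waiting once
// timeout elapses. No database transaction is open while it waits.
func issueWithDeadline(parent context.Context, timeout time.Duration, call gdsCall) (string, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	type result struct {
		ticket string
		err    error
	}
	done := make(chan result, 1) // buffered: the goroutine can finish even if nobody is listening
	go func() {
		t, err := call(ctx)
		done <- result{t, err}
	}()

	select {
	case r := <-done:
		return r.ticket, r.err
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// slowSupplier stands in for a real GDS client: it answers after delay
// unless the context is cancelled first.
func slowSupplier(delay time.Duration) gdsCall {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(delay):
			return "TKT-EXAMPLE", nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// timesOut reports whether a supplier taking delay is abandoned under timeout.
func timesOut(delay, timeout time.Duration) bool {
	_, err := issueWithDeadline(context.Background(), timeout, slowSupplier(delay))
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow supplier abandoned:", timesOut(200*time.Millisecond, 20*time.Millisecond))
	fmt.Println("fast supplier abandoned:", timesOut(time.Millisecond, 500*time.Millisecond))
}
```

A timeout here does not prove the supplier did nothing: the GDS may still have issued the ticket, which is why the request log row is committed before the call is made.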

The cost of this design is the possibility of "orphan tickets" (GDS succeeded, our commit failed) — handled by a Reconciler service that runs every five minutes and reconciles supplier_request_log against booking_tickets.
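
The reconciliation described above is essentially a set difference. The sketch below shows that core comparison in memory, assuming hypothetical field names for the supplier_request_log columns; how the Reconciler actually loads rows and raises alerts is not covered here.

```go
package main

import (
	"fmt"
	"sort"
)

// SupplierRequest mirrors the supplier_request_log columns this diff needs.
// The field names are assumptions, not the real schema.
type SupplierRequest struct {
	RequestID    string
	TicketNumber string // empty until the GDS has answered
	Succeeded    bool
}

// findOrphans returns ticket numbers the GDS confirmed that have no matching
// row in booking_tickets (represented here as a set of ticket numbers).
func findOrphans(log []SupplierRequest, bookedTickets map[string]bool) []string {
	var orphans []string
	for _, r := range log {
		if !r.Succeeded || r.TicketNumber == "" {
			continue // nothing was issued, so nothing can be orphaned
		}
		if !bookedTickets[r.TicketNumber] {
			orphans = append(orphans, r.TicketNumber)
		}
	}
	sort.Strings(orphans) // stable order for alerts and audit output
	return orphans
}

func main() {
	log := []SupplierRequest{
		{RequestID: "req-1", TicketNumber: "TKT-100", Succeeded: true},
		{RequestID: "req-2", TicketNumber: "TKT-101", Succeeded: true},
		{RequestID: "req-3"}, // GDS never confirmed
	}
	booked := map[string]bool{"TKT-100": true}
	fmt.Println(findOrphans(log, booked)) // prints [TKT-101]
}
```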

5.3 Multi-tenancy by partition

Every business-meaningful table carries partner_id, and every query filters on it. Composite foreign keys (partner_id, *_id) mean the database itself rejects any reference to a row belonging to a different partner, even if application code gets the filter wrong. Redis keys are prefixed partner:{id}:.... S3 objects live under s3://travobooks-prod/partner-{id}/....
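
One way to keep the Redis and S3 conventions consistent is to build every key through a small helper rather than formatting strings ad hoc. The following is a sketch under assumptions: the helper names are hypothetical, and partner_id is assumed to be a positive integer.

```go
package main

import (
	"errors"
	"fmt"
)

var errInvalidPartner = errors.New("partner id must be positive")

// redisKey builds a partner-scoped key of the form partner:{id}:{suffix}.
func redisKey(partnerID int64, suffix string) (string, error) {
	if partnerID <= 0 {
		return "", errInvalidPartner
	}
	return fmt.Sprintf("partner:%d:%s", partnerID, suffix), nil
}

// objectURI builds s3://travobooks-prod/partner-{id}/{name}.
func objectURI(partnerID int64, name string) (string, error) {
	if partnerID <= 0 {
		return "", errInvalidPartner
	}
	return fmt.Sprintf("s3://travobooks-prod/partner-%d/%s", partnerID, name), nil
}

func main() {
	key, _ := redisKey(42, "session:abc")
	uri, _ := objectURI(42, "invoices/inv-1.pdf")
	fmt.Println(key) // prints partner:42:session:abc
	fmt.Println(uri) // prints s3://travobooks-prod/partner-42/invoices/inv-1.pdf

	_, err := redisKey(0, "session:abc")
	fmt.Println(err) // prints partner id must be positive
}
```

Rejecting a missing or zero partner ID at this point means a forgotten tenant scope surfaces as an error instead of a key shared across partners.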

5.4 Multi-currency at line level

Every JE line carries transaction_currency, functional_currency, optional reporting_currency, and the snapshot FX rate used at posting time. The ledger never retranslates historical postings.
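
To make the snapshot idea concrete, here is a sketch of a JE line that converts once at posting time and stores both the rate and the result. The struct shape, the use of integer minor units, the half-up rounding rule, and the assumption that both currencies have two decimal places are all illustrative choices, not the documented schema.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

// JELine is a hypothetical shape for one journal entry line.
type JELine struct {
	TransactionCurrency string
	FunctionalCurrency  string
	TransactionMinor    int64    // amount in minor units of the transaction currency
	FXRate              *big.Rat // functional units per transaction unit, fixed at posting
	FunctionalMinor     int64    // converted once at posting, then stored as-is
}

// roundHalfUp rounds a rational to the nearest integer, ties toward +infinity.
func roundHalfUp(r *big.Rat) int64 {
	n := new(big.Int).Mul(r.Num(), big.NewInt(2))
	n.Add(n, r.Denom())
	d := new(big.Int).Mul(r.Denom(), big.NewInt(2))
	return new(big.Int).Div(n, d).Int64() // floor((2n + d) / 2d) = floor(r + 1/2)
}

// newLine snapshots the rate rateNum/rateDen and the converted amount.
// Assumes both currencies use the same number of decimal places.
func newLine(txnCur, funcCur string, txnMinor, rateNum, rateDen int64) (JELine, error) {
	if rateNum <= 0 || rateDen <= 0 {
		return JELine{}, errors.New("fx rate must be positive")
	}
	rate := big.NewRat(rateNum, rateDen)
	converted := new(big.Rat).Mul(new(big.Rat).SetInt64(txnMinor), rate)
	return JELine{
		TransactionCurrency: txnCur,
		FunctionalCurrency:  funcCur,
		TransactionMinor:    txnMinor,
		FXRate:              rate,
		FunctionalMinor:     roundHalfUp(converted),
	}, nil
}

func main() {
	// 33.33 USD at an illustrative rate of 109.75 BDT per USD.
	line, err := newLine("USD", "BDT", 3333, 10975, 100)
	if err != nil {
		panic(err)
	}
	fmt.Println(line.FXRate.FloatString(4), line.FunctionalMinor) // prints 109.7500 365797
}
```

Because FunctionalMinor is stored rather than derived on read, a later change in market rates has no effect on what the line reports, which matches the "never retranslates" rule.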

5.5 Double-entry by default

There is no path to write financial state that does not pass through a balanced JE. There is no "quick adjust" that bypasses the ledger. Total debits equal total credits — always.
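
The balancing rule can be expressed as a small validation step that runs before any JE is written. This is a sketch only: the Line type, the error values, and the choice to compare integer minor units of the functional currency are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Line is a hypothetical journal entry line. Amounts are integer minor units
// so that sums are exact.
type Line struct {
	Account     string
	DebitMinor  int64
	CreditMinor int64
}

var (
	errTooFewLines = errors.New("journal entry needs at least two lines")
	errBadLine     = errors.New("each line must carry exactly one positive amount, debit or credit")
	errUnbalanced  = errors.New("total debits do not equal total credits")
)

// validateBalanced rejects any entry whose debits and credits differ.
func validateBalanced(lines []Line) error {
	if len(lines) < 2 {
		return errTooFewLines
	}
	var debits, credits int64
	for _, l := range lines {
		hasDebit, hasCredit := l.DebitMinor > 0, l.CreditMinor > 0
		if l.DebitMinor < 0 || l.CreditMinor < 0 || hasDebit == hasCredit {
			return errBadLine
		}
		debits += l.DebitMinor
		credits += l.CreditMinor
	}
	if debits != credits {
		return errUnbalanced
	}
	return nil
}

func main() {
	// Account names are illustrative.
	sale := []Line{
		{Account: "Accounts Receivable", DebitMinor: 50000},
		{Account: "Ticket Revenue", CreditMinor: 45000},
		{Account: "Tax Payable", CreditMinor: 5000},
	}
	fmt.Println(validateBalanced(sale))     // prints <nil>
	fmt.Println(validateBalanced(sale[:2])) // prints total debits do not equal total credits
}
```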

5.6 Append-only financial state

journal_entries, journal_entry_lines, and audit_logs accept INSERT only. Corrections happen via reversing JEs. Hash chaining on audit_logs makes tampering detectable.
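
Hash chaining can be illustrated with a few lines of Go. The sketch below assumes SHA-256 and a simple "previous hash, newline, payload" input; the actual hash function, record layout, and storage of audit_logs are not specified in this chapter.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// AuditEntry is a hypothetical in-memory view of one audit_logs row.
type AuditEntry struct {
	Payload  string
	PrevHash string // hash of the previous entry, empty for the first
	Hash     string
}

// hashEntry binds an entry to its predecessor. The hex-encoded previous hash
// contains no newline, so the separator keeps the input unambiguous.
func hashEntry(prevHash, payload string) string {
	sum := sha256.Sum256([]byte(prevHash + "\n" + payload))
	return hex.EncodeToString(sum[:])
}

// appendEntry adds a new entry linked to the current tail of the chain.
func appendEntry(chain []AuditEntry, payload string) []AuditEntry {
	prev := ""
	if n := len(chain); n > 0 {
		prev = chain[n-1].Hash
	}
	return append(chain, AuditEntry{Payload: payload, PrevHash: prev, Hash: hashEntry(prev, payload)})
}

// verifyChain returns the index of the first entry that fails verification,
// or -1 if the whole chain is intact.
func verifyChain(chain []AuditEntry) int {
	prev := ""
	for i, e := range chain {
		if e.PrevHash != prev || e.Hash != hashEntry(prev, e.Payload) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var chain []AuditEntry
	for _, p := range []string{"ticket issued", "payment captured", "refund processed"} {
		chain = appendEntry(chain, p)
	}
	fmt.Println(verifyChain(chain)) // prints -1

	chain[1].Payload = "payment captured (edited)"
	fmt.Println(verifyChain(chain)) // prints 1
}
```

An attacker who edits one row must also recompute every later hash, so anchoring the latest hash somewhere outside the database is what makes such a rewrite detectable in practice.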

5.7 API-first

The UI itself is the first consumer of the public API. No internal-only endpoints exist in Phase 1. Anything the React SPA can do, a partner integration can do.

5.8 Stateless application tier

Every Go service is stateless — all state lives in MySQL or Redis. Any service instance can serve any request. Horizontal scaling is purely additive.

6. End-to-End Data Flow — Ticket Issuance

┌──────┐  1. POST /v1/bookings/{id}/issue        ┌────────────────┐
│React │ ───────────────────────────────────────▶│  API Gateway   │
│ SPA  │                                         │     (Go)       │
└──────┘                                         └───────┬────────┘
                                                         │ 2. session lookup
                                                         ▼
                                                   ┌──────────┐
                                                   │  Redis   │
                                                   └────┬─────┘
                                                         │ 3. authn ok, route
                                                         ▼
                                                   ┌────────────────┐
                                                   │    Booking     │
                                                   │   Service (Go) │
                                                   └────┬───────────┘
                              4. acquire lock  ◀────────┤
                              5. idempotency check
                              6. write "pending" row, commit
                                                         │
                              7. GDS call (goroutine, outside TX)
                                                         ▼
                                                   ┌──────────┐
                                                   │   GDS    │
                                                   └────┬─────┘
                                                         │ 8. ticket number
                                                         ▼
                              9. open TX, INSERT ticket + JE lines,
                                 UPDATE booking → ISSUED, COMMIT
                                                         │
                              10. publish booking.issued event
                                                         ▼
                                                   ┌──────────────────┐
                                                   │  Redis Streams   │
                                                   └────┬─────────────┘
                                                         │ 11. consume
                                                         ▼
                                                   ┌──────────────────┐
                                                   │ Messaging worker │
                                                   └────┬─────────────┘
                                                         │ 12. e-ticket → SES
                                                         ▼
                                                       👤 Customer

This single flow touches every tier and the platform's core mechanisms: stateless services, distributed locks, idempotency, GDS-outside-transaction, atomic ledger commit, and async messaging via Redis Streams.
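
Step 5 of the flow, the idempotency check, is what makes a client retry safe. The sketch below uses an in-memory map guarded by a mutex as a stand-in for the Redis-backed store; the type and method names are hypothetical, and a production version would also need key expiry and partner scoping.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errInFlight = errors.New("a request with this idempotency key is still in progress")

// idempotencyStore is an in-memory stand-in for the Redis-backed store.
type idempotencyStore struct {
	mu       sync.Mutex
	results  map[string]string
	inFlight map[string]bool
}

func newIdempotencyStore() *idempotencyStore {
	return &idempotencyStore{results: map[string]string{}, inFlight: map[string]bool{}}
}

// do runs fn at most once per key. A repeated key returns the stored result;
// a concurrent duplicate is rejected while the first call is still running.
func (s *idempotencyStore) do(key string, fn func() (string, error)) (string, error) {
	s.mu.Lock()
	if res, ok := s.results[key]; ok {
		s.mu.Unlock()
		return res, nil
	}
	if s.inFlight[key] {
		s.mu.Unlock()
		return "", errInFlight
	}
	s.inFlight[key] = true
	s.mu.Unlock()

	res, err := fn() // runs without holding the mutex

	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.inFlight, key)
	if err != nil {
		return "", err // failures are not cached, so the client may retry
	}
	s.results[key] = res
	return res, nil
}

func main() {
	store := newIdempotencyStore()
	calls := 0
	issue := func() (string, error) {
		calls++
		return "TKT-EXAMPLE", nil
	}
	first, _ := store.do("key-123", issue)
	second, _ := store.do("key-123", issue) // a client retry with the same key
	fmt.Println(first, second, calls)       // prints TKT-EXAMPLE TKT-EXAMPLE 1
}
```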

7. Failure Modes and Resilience

Failure | Detection | Recovery
--------|-----------|---------
GDS timeout | Go context deadline exceeded | Client retry with same idempotency key; reconciler catches orphans
GDS success, our commit fails | Reconciler diff between supplier_request_log and booking_tickets | Ops resolves manually with audit trail; ticket row inserted, JE posted, alert raised
MySQL primary failure | Health check via cloud-provider probe | Multi-AZ failover < 60s; in-flight transactions retried by clients
Redis primary failure | Sentinel detects | Automatic failover to replica; brief read-only window
Whole-region outage | External synthetic monitoring | Phase 1: manual failover to standby region; Phase 2: active-active
Webhook consumer down (partner side) | Delivery attempt fails | Exponential backoff in Streams; max 8 attempts; dead-letter
Background worker crash | Streams consumer group sees no ACK | Message stays in the group's pending entries list; another consumer claims it and retries
Bad deployment | Canary metrics regress | Automatic rollback via deployment pipeline
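
The webhook row above combines exponential backoff with an attempt cap. The sketch below shows one possible schedule: the limit of 8 attempts comes from the table, while the one-second base delay, the five-minute ceiling, and the function name are assumptions. Jitter is omitted for brevity.

```go
package main

import (
	"fmt"
	"time"
)

const (
	maxAttempts = 8               // from the failure table
	baseDelay   = time.Second     // assumption
	maxDelay    = 5 * time.Minute // assumption
)

// nextStep decides what happens after delivery attempt number attempt
// (1-based) has failed: either wait delay and retry, or dead-letter.
func nextStep(attempt int) (delay time.Duration, deadLetter bool) {
	if attempt < 1 {
		attempt = 1
	}
	if attempt >= maxAttempts {
		return 0, true
	}
	delay = baseDelay << (attempt - 1) // 1x, 2x, 4x, ... the base delay
	if delay > maxDelay {
		delay = maxDelay
	}
	return delay, false
}

func main() {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		delay, dead := nextStep(attempt)
		if dead {
			fmt.Printf("attempt %d failed: move to dead-letter\n", attempt)
			continue
		}
		fmt.Printf("attempt %d failed: retry in %s\n", attempt, delay)
	}
}
```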

8. Deployment Topology

8.1 Phase 1 — Single primary region, multi-AZ

  • One primary region (e.g. AWS Singapore for Bangladesh/India partners; AWS Frankfurt for EU)
  • Three availability zones
  • MySQL Cloud Enterprise: primary + 2 multi-AZ standbys + read replicas
  • Redis Cluster: 6 nodes spread across AZs
  • Go services: deployed on Kubernetes, replicas spread across AZs by pod anti-affinity rules

8.2 Phase 1 disaster recovery

  • Standby region with continuous backup restore
  • RTO target: 2 hours
  • RPO target: < 5 minutes (PITR + Redis AOF)

8.3 Phase 2 — Active-active multi-region (roadmap)

  • Two or more active regions
  • Partner-pinned routing (each partner's reads land in their nearest region)
  • Cross-region replication via MySQL Group Replication or Vitess
  • RTO target: < 60 seconds
  • RPO target: ~0

9. Observability

Every Go service emits:

  • Structured JSON logs with partner_id, request_id, event_id correlation
  • Prometheus metrics: RED (Rate, Errors, Duration) per endpoint; USE (Utilisation, Saturation, Errors) per node
  • OpenTelemetry traces: every request carries a trace ID propagated across all hops including outbound GDS calls
  • Health probes: /healthz (liveness) and /readyz (readiness, checks DB and Redis connectivity)
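
The two probes differ in what they check: liveness only says the process is running, while readiness also pings dependencies. The following sketch uses net/http from the standard library. The pinger interface, the fakeDep stand-in, the two-second timeout, and the statusOf helper are assumptions for illustration; *sql.DB already satisfies PingContext, whereas a Redis client would likely need a small adapter.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// pinger is the only capability readiness needs from a dependency.
type pinger interface {
	PingContext(ctx context.Context) error
}

// fakeDep stands in for MySQL or Redis in this sketch.
type fakeDep struct{ err error }

func (f fakeDep) PingContext(context.Context) error { return f.err }

var errDown = errors.New("dependency down")

// newProbeMux wires /healthz (liveness) and /readyz (readiness).
func newProbeMux(db, cache pinger) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // the process is up; dependencies are not consulted
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		for _, dep := range []pinger{db, cache} {
			if err := dep.PingContext(ctx); err != nil {
				http.Error(w, "not ready", http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// statusOf issues an in-process GET and returns the response status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	healthy := newProbeMux(fakeDep{}, fakeDep{})
	degraded := newProbeMux(fakeDep{}, fakeDep{err: errDown})
	fmt.Println(statusOf(healthy, "/readyz"))   // prints 200
	fmt.Println(statusOf(degraded, "/readyz"))  // prints 503
	fmt.Println(statusOf(degraded, "/healthz")) // prints 200
}
```

Keeping liveness independent of MySQL and Redis avoids restarting healthy pods during a dependency outage; readiness alone takes the instance out of rotation.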

Audit logs are a separate, tamper-evident stream — see Chapter 8.2.

10. Security Architecture (in Brief)

The full treatment is in Chapter 12.5 (Data Protection) and Chapter 0.4 §10. At an architectural level:

  • TLS 1.3 between every hop
  • mTLS between every Go service
  • Secrets in HashiCorp Vault / AWS Secrets Manager — never in code or logs
  • MySQL TDE at rest; Redis with disk encryption; S3 with SSE-KMS
  • Append-only hash-chained audit_logs
  • Maker-checker required for sensitive operations

11. Architectural Principles — Summary

  1. Booking + ledger commit together. Always.
  2. GDS calls happen outside the database transaction. Reconciler handles orphans.
  3. Multi-tenancy is enforced at three layers: application, database, storage.
  4. Financial state is append-only. Reversals, never updates.
  5. Every service is stateless. State lives in MySQL or Redis.
  6. The UI is the first API consumer. No internal-only endpoints.
  7. Failure of one service does not cascade. Bulkheads via Redis-backed queues and timeouts.
  8. Every change is traceable. Audit logs with hash chaining.

These principles, taken together, are what make travoBooks suitable for enterprise-grade global travel accounting at scale.