In this volume · VOLUME 11
Data Imports
Master Input Catalog BSP / ARC Import

Chapter 11.1 — Master Input Catalog

1. Purpose

This chapter is the single canonical inventory of everything travoBooks ingests — every file format, API feed, scheduled pull, manual upload, partner-initiated import, and integration push. For each input it lists the source, format, frequency, validation, target tables, security posture, and operational ownership. Detailed parsers for each input live in subsequent chapters (11.2 CSV, 11.3 BSP/ARC, 11.4 Bank, 11.5 Supplier, 11.6 Tax); this chapter is the index.

2. Why it matters

Imports are the highest-risk surface for data quality. A malformed BSP file mis-categorises a million in commission; a duplicate bank statement double-counts cash; a stale FX-rate file mis-prices new bookings for a day. Treating imports as first-class, versioned, idempotent, auditable artefacts — rather than ad-hoc cron jobs — is what separates a reliable platform from one that quietly breaks.

3. Import principles

Every import follows the same skeleton:

  1. Source identification — partner, supplier, gateway, or platform.
  2. Idempotency — file checksum or message-ID deduplication; re-import is safe.
  3. Staging — raw rows go into a *_staging table unmodified before any business interpretation.
  4. Validation — schema and business rules; failures quarantined.
  5. Application — committed to operational/financial tables.
  6. Audit — every import produces an import_run record with stats.
  7. Notification — operational owners notified of success / exceptions.

No import bypasses staging. No import silently drops rows. No import is one-shot — replay must produce the same end state.

4. The catalog

Input Source Format Frequency Target Notes
Bulk customer onboarding Partner upload CSV / Excel Ad-hoc customers, customer_addresses, customer_contacts At-onboarding migration
Customer credit-limit batch update Partner upload CSV Ad-hoc customers.credit_limit + history Maker-checker; threshold rules
Customer status sync Partner ERP via API JSON Daily customers.status Optional partner integration
Sanctions watchlist refresh Compliance feed (OFAC, UN) CSV / XML Daily compliance_watchlists Triggers re-screening
Input Source Format Frequency Target
Supplier master onboarding Partner upload CSV Ad-hoc suppliers, supplier_addresses, supplier_bank_details
Commission rate updates Supplier email + partner upload CSV / PDF Ad-hoc supplier_commission_rules
Supplier statement of account Supplier email / portal PDF / Excel / CSV Monthly supplier_statement_filessupplier_statement_lines
Net rate agreements Negotiated contracts PDF (manual capture) Per contract supplier_pricing_profiles

4.3 Operational imports

Input Source Format Frequency Target
GDS booking pull Amadeus / Sabre / Travelport XML / NDC JSON On-demand + queued bookings, booking_segments, booking_pnrs
LCC booking pull Aggregator / direct API JSON On-demand bookings, etc.
NDC offer creation Supplier OfferPriceRQ JSON On-demand booking_tsts
Hotel availability cache Hotel aggregator JSON Cache TTL hotel_availability_cache
Schedule-change feed GDS PSS XML Daily booking_segments.status updates
Flown-status updates GDS / Airline NDC XML / JSON Daily booking_segments.status = FLOWN

4.4 Financial imports

Input Source Format Frequency Target
BSP/BSPlink billing analysis BSPlink (per BSP) HOT / AGTDATA / CSV Daily/weekly bsp_import_files → various
BSP memos (ADM/ACM) BSPlink CSV Daily memos
ARC report ARC (US BSP equivalent) CSV Weekly analogous to BSP
Bank statement Bank MT940 / OFX / CSV / BAI2 Daily bank_statement_filesbank_statement_lines
Gateway settlement Stripe / SSLCommerz / Razorpay CSV / JSON Daily gateway_settlement_files
FX rate feed Provider (XE, ECB, OANDA) JSON / CSV Hourly fx_rates
Manual journal upload Accountant CSV / Excel Ad-hoc journal_entries (staging-validated)
Trial balance import (migration) Predecessor system CSV / Excel One-time Opening balance JEs
Bank charges & interest Bank statement parsing derived Daily JE adjustments
Input Source Format Frequency Target
Tax rate table refresh Vetted tax-counsel update CSV / partner-managed As needed tax_rules (with maker-checker)
Withholding tax rates Government publications manual Annual withholding_tax_rates
VAT return submission receipts Tax authority portal PDF After filing tax_returns.filed_reference

4.6 Reference data imports

Input Source Format Frequency Target
IATA airport codes IATA / OAG CSV Quarterly airport_codes (~9,800 rows)
City / location codes OAG / curated CSV Quarterly location_codes (~81,000 rows)
IATA airline directory IATA CSV Quarterly airlines
ISO currency table ISO + corrections CSV Annually currencies
ISO country table ISO CSV Annually countries
Equipment / aircraft type IATA CSV Annually aircraft_types

4.7 Configuration imports

Input Source Format Frequency Target
Chart of accounts template Partner setup CSV At onboarding chart_of_accounts
Approval thresholds Partner config YAML/JSON Ad-hoc partner_config
Notification templates Partner customisation HTML/Markdown Ad-hoc notification_templates

4.8 Integration push-ins (webhooks received)

Input Source Format Trigger
Gateway webhook (payment_intent.succeeded) Stripe etc. JSON HTTPS Customer payment
Gateway webhook (charge.refunded) Stripe etc. JSON Refund settled
Gateway webhook (chargeback.created) Stripe etc. JSON Chargeback raised
Supplier NDC push (order.changed) NDC supplier JSON Order change
BSPlink notification BSPlink Email/portal New file available

4.9 Manual operational imports

Input Source Format Frequency
Opening balance migration Predecessor system Excel Once at onboarding
Historical booking import Predecessor system CSV Once at onboarding
Document attachments Operations PDF, image Ad-hoc

5. Common file format families

5.1 CSV / Excel

  • UTF-8 with BOM expected; UTF-8 without BOM accepted; other encodings rejected with hint.
  • Header row mandatory; column names matched case-insensitive against expected schema.
  • Date format ISO-8601 by default (YYYY-MM-DD); partner-config override allowed for locale-specific.
  • Numeric: dot decimal separator default; partner-config override for comma-decimal.

5.2 XML (GDS)

  • Schema validation against captured XSDs (per supplier version).
  • Namespace-aware parsing.
  • Raw payload always archived in S3 for replay/audit.

5.3 JSON (NDC / REST APIs)

  • JSON Schema validation per endpoint version.
  • Schema versions captured; deprecation warnings logged.

5.4 MT940 / BAI2 (banking)

  • Standard banking formats; parser per format.
  • Per-bank quirks handled by adapter pattern.

5.5 HOT / AGTDATA (BSP)

  • IATA-defined fixed-width formats.
  • Per-country BSP variants handled.

6. Idempotency strategy

Input type Dedup key
File-based SHA-256 of file content + partner_id
API call replay Idempotency-Key header
Webhook Provider message ID + partner_id
GDS pull (supplier_ref, partner_id, snapshot_at quantised to minute)

The idempotency_keys table holds 30-90 days of recent keys depending on category.

7. Validation tiers

Tier Failure handling
Schema (shape, types) Reject entire file; alert operations
Referential (FK exists, partner-scope) Quarantine row; continue file
Business (amount range, status valid) Quarantine row; continue file
Cross-row (sum matches header) Reject file or flag for review per config
Cross-file (matches expected period) Warn; allow override

8. Security & permissions

Permission Allows
import.upload.partner Upload file
import.validate.partner Run validation pass
import.commit.partner Commit staged data
import.commit_financial.partner Commit financial imports (higher threshold; maker-checker)
import.history.read.partner View import runs

Uploaded files are encrypted at rest in S3 with KMS keys per partner; access logged.

9. Validation rules (cross-cutting)

Code Condition
IMPORT_FILE_DUPLICATE Same checksum already imported
IMPORT_FILE_TOO_LARGE Exceeds per-type size limit
IMPORT_ENCODING_INVALID Not UTF-8
IMPORT_SCHEMA_MISMATCH Headers don't match expected
IMPORT_PARTNER_MISMATCH File appears to belong to different partner
IMPORT_PERIOD_CLOSED Targets a closed accounting period
IMPORT_TOTAL_MISMATCH Header total ≠ sum of lines

10. Operational ownership

Input type Operational owner
BSP / ADM / ACM BSP Operations Lead
Bank statements Treasury / Accountant
Gateway settlement Treasury
GDS / Booking imports Travel Ops
Customer / Supplier master uploads Account Management
Reference data (IATA, currencies) Platform Engineering
FX rates Platform Engineering (automated)
Tax rate updates Tax Compliance

Each input has a documented runbook in the operations wiki referencing this catalog row.

11. Real-world examples

A — Daily BSP file processing

04:00 — BSPlink SFTP poll picks up the day's file. 04:01 — Checksum verified; idempotent; new file. 04:02 — Parsed into bsp_staging_lines. 04:05 — Schema validation passes. 04:06 — Business validation: 99.7% match expected tickets; 12 orphans quarantined. 04:07 — Committed; memo entries created; import_run summary email to BSP-ops. 04:30 — BSP-ops works orphan queue.

B — Bank statement of small partner

08:30 — Accountant downloads MT940 from bank portal. 08:31 — Uploads via UI; idempotent check passes. 08:32 — Parser handles bank-specific quirks via adapter. 08:33 — Auto-match against payments / payouts: 92% matched. 08:40 — Accountant works exceptions: links three unmatched credits to invoices, posts one bank-charge adjustment. 08:55 — Reconciliation report shows balanced; saved to period record.

C — Onboarding migration

New partner imports 12,000 historical customers, 2,400 suppliers, 86,000 bookings, and an opening trial balance. - Files reviewed by data team for schema fit. - Per-batch staging upload + validation. - Final commit by partner_admin under maker-checker. - One-time "migration period" — opening JE posted with effective date of cutover.

12. Step-by-step generic import flow

flowchart TD A[Source produces file / event] --> B{Idempotent?} B -->|Already processed| C[Return prior result] B -->|New| D[Persist to S3 + import_runs row] D --> E[Parse → Staging Table] E --> F{Schema valid?} F -->|No| G[Reject; alert ops] F -->|Yes| H[Row-level validation] H --> I{All rows valid?} I -->|All| J[Commit to live tables] I -->|Some| K[Quarantine invalid; commit valid; alert ops] J --> L[Update materialised balances] K --> L L --> M[Emit webhooks / notifications] M --> N[Mark import_run complete with stats]

13. Database tables touched

Table Role
import_runs One row per import attempt
import_files Uploaded file metadata + S3 ref
*_staging tables (per input type) Raw parsed rows pre-commit
*_quarantine tables Failed rows with reason
audit_logs Every import event

14. Future scalability

  • Universal import UI — drag-and-drop with auto-detect of format and supplier.
  • Streaming imports — for partners with high-volume real-time events.
  • AI-assisted column mapping — accept arbitrary CSV layouts and map to canonical schema.
  • Self-service partner imports — extend beyond admin-only to selected partner users with thresholds.
  • Compliance evidence linking — auto-attach evidence to relevant audit records.

15. Common pitfalls

  • ⚠️ Skipping staging. Direct-to-live commits make replay impossible.
  • ⚠️ Trusting header totals. Cross-validate sum of lines.
  • ⚠️ Mixed-encoding files. Reject early; don't half-import.
  • ⚠️ Manual fix-up after a bad import without a corresponding correcting import. Always go through the same path.
  • 🔒 Imported files with PII must inherit the same encryption + retention as primary data.
  • ⚠️ Closed-period imports. Block at validation tier; never silently shift dates.