Chapter 6

Cloud Architecture: Designing for Scale on AWS

Technical deep-dive into cloud-native core banking architecture using AWS services, including event sourcing, CQRS, multi-tenancy, and high availability design.

Architecture Principles

Building a core banking platform requires architectural decisions that will shape the product for years. The following principles guide the design of modern, scalable banking systems:

  • Cloud-Native: Built for AWS from the ground up, leveraging managed services
  • Microservices: Independent, loosely-coupled services that scale independently
  • Event-Driven: Asynchronous communication via events for loose coupling and resilience
  • API-First: All functionality exposed through well-documented RESTful APIs
  • Multi-Tenant: True multi-tenancy with row-level security and data isolation
  • Security by Design: Defense in depth with encryption, authentication, and audit logging

Why These Principles Matter

Each principle addresses specific banking requirements: cloud-native enables cost efficiency; microservices enable independent scaling during peak loads; event-driven ensures audit trails for compliance; API-first enables ecosystem integration; multi-tenancy enables SaaS economics; security by design satisfies regulators.

Architecture Layers

| Layer | Components | AWS Services |
| --- | --- | --- |
| Presentation | Web Portal, Mobile Apps, Admin Console | CloudFront, S3, Amplify |
| API Gateway | REST APIs, Authentication, Rate Limiting | API Gateway, WAF, Cognito |
| Application | Microservices, Business Logic, Workflows | ECS Fargate, Lambda, Step Functions |
| Integration | Event Bus, Message Queues, Streaming | EventBridge, SQS, MSK (Kafka) |
| Data | Databases, Cache, Search, Data Lake | RDS, DynamoDB, ElastiCache, OpenSearch |
| Analytics | Data Warehouse, BI, ML Models | Redshift, QuickSight, SageMaker |
| Infrastructure | Networking, Security, Monitoring | VPC, IAM, CloudWatch, X-Ray |

The Ledger Engine: Event Sourcing

The ledger is the heart of any banking system. A modern approach uses event sourcing—where every state change is captured as an immutable event.

Why Event Sourcing for Banking

| Concept | Description | Banking Benefit |
| --- | --- | --- |
| Event Store | Append-only log of all state changes | Complete audit trail, regulatory compliance |
| Event Replay | Reconstruct state by replaying events | Point-in-time balances, debugging, recovery |
| Immutability | Events cannot be modified or deleted | Tamper-proof records, legal evidence |
| Temporal Queries | Query state at any historical point | Month-end reporting, dispute resolution |
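
The concepts in this table can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production ledger: the `LedgerEvent` and `EventStore` classes and the `WithdrawalMade` event type are hypothetical names introduced for the example, and a real system would persist events durably.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Iterator, Optional


@dataclass(frozen=True)  # frozen: an event cannot be mutated after creation
class LedgerEvent:
    event_type: str      # "DepositReceived" or "WithdrawalMade" (hypothetical)
    account_id: str
    timestamp: datetime
    amount: Decimal


class EventStore:
    """In-memory, append-only log; a stand-in for a durable event store."""

    def __init__(self) -> None:
        self._events: list[LedgerEvent] = []

    def append(self, event: LedgerEvent) -> None:
        # The only write operation: there is no update or delete.
        self._events.append(event)

    def events_for(self, account_id: str,
                   as_of: Optional[datetime] = None) -> Iterator[LedgerEvent]:
        for e in self._events:
            if e.account_id == account_id and (as_of is None or e.timestamp <= as_of):
                yield e


def balance(store: EventStore, account_id: str,
            as_of: Optional[datetime] = None) -> Decimal:
    """Replay events to rebuild the balance, optionally at a past instant."""
    total = Decimal("0")
    for e in store.events_for(account_id, as_of):
        if e.event_type == "DepositReceived":
            total += e.amount
        elif e.event_type == "WithdrawalMade":
            total -= e.amount
    return total


def at_hour(hour: int) -> datetime:
    return datetime(2026, 1, 15, hour, 0, tzinfo=timezone.utc)


store = EventStore()
store.append(LedgerEvent("DepositReceived", "acc_123456", at_hour(11), Decimal("1000.00")))
store.append(LedgerEvent("WithdrawalMade", "acc_123456", at_hour(15), Decimal("250.00")))

print(balance(store, "acc_123456"))                     # 750.00
print(balance(store, "acc_123456", as_of=at_hour(12)))  # 1000.00
```

The second call is the temporal query: the same replay, cut off at noon, yields the balance as it stood before the withdrawal.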

CQRS (Command Query Responsibility Segregation)

CQRS separates read and write operations for optimal performance:

  • Command Side (Write): Processes transactions, validates business rules, emits events
  • Query Side (Read): Optimized projections for fast balance queries, statements, reports
  • Event Bus: Asynchronous propagation from write to read models
  • Multiple Projections: Different views for different use cases (real-time, reporting, analytics)
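
The split described above might look roughly like the following sketch. All class names here (`EventBus`, `BalanceProjection`, `LedgerCommandHandler`) and the `WithdrawalMade` event type are hypothetical, and the bus is synchronous and in-process for simplicity; in the architecture described in this chapter, propagation would be asynchronous, so the read model would be eventually consistent.

```python
from decimal import Decimal
from typing import Callable


class EventBus:
    """Synchronous in-process bus; a stand-in for asynchronous delivery."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: dict) -> None:
        for handler in self._subscribers:
            handler(event)


class BalanceProjection:
    """Query side: a pre-computed read model keyed by account."""

    def __init__(self) -> None:
        self._balances: dict[str, Decimal] = {}

    def apply(self, event: dict) -> None:
        current = self._balances.get(event["accountId"], Decimal("0"))
        if event["eventType"] == "DepositReceived":
            self._balances[event["accountId"]] = current + event["amount"]
        elif event["eventType"] == "WithdrawalMade":
            self._balances[event["accountId"]] = current - event["amount"]

    def get_balance(self, account_id: str) -> Decimal:
        # A dictionary lookup: cost does not grow with account history.
        return self._balances.get(account_id, Decimal("0"))


class LedgerCommandHandler:
    """Command side: validates business rules, records events, publishes them."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._log: list[dict] = []                 # append-only write model
        self._available: dict[str, Decimal] = {}   # state used for validation only

    def deposit(self, account_id: str, amount: Decimal) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._record("DepositReceived", account_id, amount, amount)

    def withdraw(self, account_id: str, amount: Decimal) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self._available.get(account_id, Decimal("0")):
            raise ValueError("insufficient funds")
        self._record("WithdrawalMade", account_id, amount, -amount)

    def _record(self, event_type: str, account_id: str,
                amount: Decimal, delta: Decimal) -> None:
        self._available[account_id] = self._available.get(account_id, Decimal("0")) + delta
        event = {"eventType": event_type, "accountId": account_id, "amount": amount}
        self._log.append(event)
        self._bus.publish(event)


bus = EventBus()
projection = BalanceProjection()
bus.subscribe(projection.apply)
commands = LedgerCommandHandler(bus)

commands.deposit("acc_123456", Decimal("1000.00"))
commands.withdraw("acc_123456", Decimal("250.00"))
print(projection.get_balance("acc_123456"))  # 750.00
```

Additional projections (statements, reporting, analytics) would subscribe to the same bus without any change to the command side.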

Ledger Event Example

// Example event types for a banking ledger
{
  "eventType": "AccountOpened",
  "accountId": "acc_123456",
  "tenantId": "tenant_abc",
  "timestamp": "2026-01-15T10:30:00Z",
  "data": {
    "accountType": "CURRENT",
    "currency": "EUR",
    "ownerId": "cust_789"
  }
}

{
  "eventType": "DepositReceived",
  "accountId": "acc_123456",
  "tenantId": "tenant_abc",
  "timestamp": "2026-01-15T11:00:00Z",
  "data": {
    "amount": 1000.00,
    "currency": "EUR",
    "reference": "SALARY-JAN"
  }
}
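
As an illustration of how a consumer might fold these two events into account state, here is a small Python sketch. The `apply_event` function and the shape of the state dictionary are assumptions made for the example. Amounts are parsed as `Decimal` via `json.loads(..., parse_float=Decimal)` so that monetary values are not converted to binary floating point.

```python
import json
from decimal import Decimal
from functools import reduce

# The two example events above, wrapped in a JSON array.
RAW_EVENTS = """
[
  {"eventType": "AccountOpened", "accountId": "acc_123456",
   "tenantId": "tenant_abc", "timestamp": "2026-01-15T10:30:00Z",
   "data": {"accountType": "CURRENT", "currency": "EUR", "ownerId": "cust_789"}},
  {"eventType": "DepositReceived", "accountId": "acc_123456",
   "tenantId": "tenant_abc", "timestamp": "2026-01-15T11:00:00Z",
   "data": {"amount": 1000.00, "currency": "EUR", "reference": "SALARY-JAN"}}
]
"""

# parse_float=Decimal keeps "1000.00" exact instead of producing a float.
events = json.loads(RAW_EVENTS, parse_float=Decimal)


def apply_event(state: dict, event: dict) -> dict:
    """Return the new account state after applying one event (hypothetical shape)."""
    kind, data = event["eventType"], event["data"]
    if kind == "AccountOpened":
        return {
            "accountId": event["accountId"],
            "currency": data["currency"],
            "balance": Decimal("0"),
        }
    if kind == "DepositReceived":
        if data["currency"] != state["currency"]:
            raise ValueError("currency mismatch")
        return {**state, "balance": state["balance"] + data["amount"]}
    raise ValueError(f"unknown event type: {kind}")


state = reduce(apply_event, events, {})
print(state["balance"])  # 1000.00
```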

Performance Target

Event sourcing with CQRS targets 1M+ TPS by separating the write path (event append) from the read path (pre-computed projections). Because balances are served from projections rather than recomputed by replaying events, real-time balance queries are targeted to return in under 10 ms regardless of account history length.

Multi-Tenant Architecture

Row-Level Security (RLS)

True multi-tenancy uses database-level tenant isolation:

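-- PostgreSQL Row-Level Security example
-- A policy has no effect until RLS is enabled on the table
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
-- Table owners bypass RLS by default; FORCE applies the policy to them too
ALTER TABLE accounts FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON accounts
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Every query is automatically filtered by tenant
SELECT * FROM accounts WHERE account_type = 'CURRENT';
-- Behaves like: SELECT * FROM accounts
--               WHERE account_type = 'CURRENT'
--               AND tenant_id = current_setting('app.current_tenant')::uuid;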
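
The policy relies on the application setting `app.current_tenant` before each query. One way to do that from application code is sketched below. The `run_as_tenant` helper is hypothetical, and it assumes a Python DB-API cursor that uses `%s` placeholders (as psycopg does) and is already inside a transaction. PostgreSQL's `set_config(name, value, is_local)` with `is_local = true` scopes the setting to the current transaction, which matters when connections are pooled across tenants.

```python
import uuid


def run_as_tenant(cursor, tenant_id: str, sql: str, params: tuple = ()):
    """Run one query with app.current_tenant set for this transaction only.

    `cursor` is assumed to be a DB-API cursor with %s placeholders.
    """
    # Reject malformed IDs early; uuid.UUID raises ValueError on bad input.
    tenant = str(uuid.UUID(tenant_id))
    # is_local=true behaves like SET LOCAL: the value is discarded at
    # commit or rollback, so it cannot leak to the next request that
    # reuses this pooled connection.
    cursor.execute("SELECT set_config('app.current_tenant', %s, true)", (tenant,))
    cursor.execute(sql, params)
    return cursor.fetchall()
```

A caller would then issue ordinary queries, such as `run_as_tenant(cur, tenant_id, "SELECT * FROM accounts WHERE account_type = %s", ("CURRENT",))`, and leave the tenant filter to the database.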

Multi-Tenancy Benefits

| Approach | Cost per Customer | Deployment Speed | Upgrade Complexity |
| --- | --- | --- | --- |
| Single-Tenant | High (dedicated infra) | Weeks-months | Per-customer upgrades |
| True Multi-Tenant | Low (shared infra) | Hours-days | Single upgrade for all |

AWS Services Architecture

Compute

  • ECS Fargate: Containerized microservices, serverless, auto-scaling
  • Lambda: Event handlers, integrations, scheduled tasks
  • Step Functions: Orchestration for complex workflows (loan origination, KYC)

Data

  • RDS PostgreSQL: Primary transactional database with RLS
  • DynamoDB: High-throughput NoSQL for session data, rate limiting
  • ElastiCache Redis: Session cache, rate limiting, real-time analytics
  • OpenSearch: Full-text search, log analytics, transaction search

Security

  • WAF: Web application firewall, OWASP rules, rate limiting
  • KMS: Key management, customer-managed keys option
  • Cognito: User authentication, OAuth 2.0, MFA
  • GuardDuty: Threat detection, anomaly monitoring

Performance Specifications

| Metric | Target | Measurement Method |
| --- | --- | --- |
| Peak Throughput | 1M+ TPS | Sustained load test, 1 hour |
| API Latency (p99) | Under 200 ms | End-to-end response time |
| Availability | 99.99% | Uptime over the year (about 4.3 min downtime per 30-day month) |
| Recovery Time (RTO) | Under 15 minutes | Full system recovery |
| Recovery Point (RPO) | Under 5 minutes | Maximum data loss window |
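
The arithmetic behind the availability and throughput rows can be checked with a short calculation. The function name below is introduced only for this example.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Minutes of allowed downtime in a period at a given availability."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100)


# 99.99% availability over a 30-day month and over a 365-day year.
print(round(downtime_budget_minutes(99.99, 30), 2))   # 4.32
print(round(downtime_budget_minutes(99.99, 365), 2))  # 52.56

# A one-hour sustained test at 1M TPS processes this many transactions.
print(1_000_000 * 60 * 60)  # 3600000000
```

At 99.99%, a single 15-minute recovery (the RTO target) would consume more than three months of downtime budget, which is why the high availability measures aim to avoid full recoveries in the first place.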

Security Architecture

Banking-grade security requires defense in depth:

| Layer | Controls | AWS Services |
| --- | --- | --- |
| Network | VPC isolation, security groups, NACLs | VPC, WAF, Shield |
| Transport | TLS 1.3 encryption, certificate management | ACM, CloudFront |
| Application | OAuth 2.0, JWT tokens, rate limiting | Cognito, API Gateway |
| Data | AES-256 encryption at rest, field-level encryption | KMS, RDS encryption |
| Audit | Immutable logs, tamper detection | CloudTrail, CloudWatch |
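
To make the "tamper detection" control in the Audit row concrete, the sketch below shows one common technique, a SHA-256 hash chain, in which each log entry commits to the hash of the previous one. It is an illustration of the general idea only and is not a description of how any AWS service implements it; the function names and entry layout are assumptions for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> None:
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash,
                "hash": _digest(prev_hash, record)})


def verify(log: list) -> bool:
    """Recompute the chain; any edited or removed entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list = []
append_entry(audit_log, {"actor": "cust_789", "action": "login"})
append_entry(audit_log, {"actor": "cust_789", "action": "transfer", "amount": "250.00"})
print(verify(audit_log))  # True

audit_log[0]["record"]["action"] = "logout"  # simulate tampering
print(verify(audit_log))  # False
```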

High Availability Design

99.99% availability requires eliminating single points of failure:

  • Multi-AZ Deployment: All services deployed across 3 availability zones
  • Database Replication: Synchronous replication with automatic failover
  • Load Balancing: Application load balancers distribute traffic across healthy instances
  • Auto-Scaling: Automatically add/remove capacity based on demand
  • Circuit Breakers: Prevent cascade failures when dependencies fail
  • Health Checks: Continuous monitoring with automatic instance replacement
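
Of the measures listed above, the circuit breaker is the one implemented in application code, so a sketch may help. The version below is deliberately minimal and single-threaded; the class names, default thresholds, and the injectable `clock` parameter are assumptions made for the example, and production services would more likely use an established resilience library.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock=time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                # Open: fail fast instead of waiting on a broken dependency.
                raise CircuitOpenError("dependency unavailable; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A service would wrap each outbound dependency call, for example `breaker.call(fetch_fx_rate, "EUR", "USD")` with a hypothetical `fetch_fx_rate` function, and treat `CircuitOpenError` as a signal to return a fallback or a fast error rather than queueing more work behind a failing dependency.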

Disaster Recovery

Banking regulations require robust DR capabilities. The targets are an RTO (Recovery Time Objective) under 15 minutes and an RPO (Recovery Point Objective) under 5 minutes, achieved through a multi-region active-passive setup with automated failover.

Key Takeaways

1. Event sourcing is ideal for banking. Immutable audit trails, point-in-time queries, and regulatory compliance make event sourcing the natural choice for ledger design.

2. True multi-tenancy enables SaaS economics. Row-level security provides logical data isolation on shared infrastructure, enabling an estimated 40-50% cost advantage over single-tenant competitors.

3. AWS provides banking-grade infrastructure. Multi-AZ deployment, managed services, and compliance attestations and certifications (SOC 2, ISO 27001) accelerate time-to-market while helping to meet regulatory requirements.
