Building a Scalable Postgres RDS Architecture for an E-Commerce Platform

Introduction

A rapidly expanding e-commerce platform serving more than 1 million users daily faced severe challenges with its PostgreSQL database hosted on EC2. The monolithic database could not scale with rising traffic, and performance degraded during flash sales and other high-traffic campaigns. Manual backups were unreliable and error-prone, and the lack of monitoring tools slowed the team's response when issues occurred.

The goal was to design and implement a robust, scalable, and highly available database solution that could handle peak traffic loads, improve read/write throughput, automate backup and recovery, and enhance monitoring and security. 

Architecture Overview

The new architecture replaced the existing PostgreSQL on EC2 with Amazon RDS for PostgreSQL using the following components: 

  1. Primary RDS instance with Provisioned IOPS to handle write-heavy transactions. 
  2. Read replicas deployed across two additional AWS regions for read scalability and geo-redundancy. 
  3. Automated backups and snapshot retention with AWS Backup. 
  4. Enhanced Monitoring and Performance Insights for DB performance diagnostics. 
  5. SNS notifications for alerting on performance anomalies.
  6. Secrets Manager for secure and automated credential management.
  7. IAM roles and policies to control and audit access.
  8. PoC of Aurora PostgreSQL prepared for future serverless scaling and multi-master replication.

The PostgreSQL RDS setup served as the central transactional database. It integrated with multiple services and systems across the application landscape:

Upstream Data Sources: 

  1. Web and Mobile Apps: Order placement, user registrations, product search queries 
  2. Microservices: Inventory, payment gateway, recommendation engine 
  3. Third-party APIs: Logistics partners, fraud detection tools 

Downstream Consumers: 

  1. Analytics and BI: Data pushed to S3 using DMS and transformed via Glue for reporting in QuickSight 
  2. Email/Notification Systems: Data triggers email confirmations and real-time status alerts using SES and SNS 
  3. CRM and Customer Support: PostgreSQL views exposed through APIs for customer query handling 

AWS Services Architecture

A highly modular and scalable architecture was designed, involving: 

  1. Amazon RDS for PostgreSQL as the primary database with Provisioned IOPS (30,000 IOPS) 
  2. Two Read Replicas in secondary AWS regions (us-east-2, eu-west-1) with cross-region replication 
  3. AWS Secrets Manager integrated with Lambda for automatic rotation of DB credentials every 30 days 
  4. CloudWatch for log collection and metric alarms 
  5. Enhanced Monitoring enabled with 1-second granularity for CPU, memory, disk I/O, etc. 
  6. SNS topics subscribed by DevOps for critical thresholds 
  7. AWS Backup managing daily backups with 35-day retention 
  8. IAM roles attached to EC2, Lambda, and Glue for secure access 
  9. DMS (Database Migration Service) for real-time CDC (Change Data Capture) to analytics layer 
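
The primary instance and its cross-region replicas could be provisioned roughly as follows. This is a minimal sketch rather than the platform's actual provisioning code: it only builds the keyword arguments that boto3's `create_db_instance` and `create_db_instance_read_replica` calls accept. The instance class, allocated storage, identifiers, and KMS key IDs are hypothetical, and credential and networking arguments are left out.

```python
from typing import Any, Dict, List, Tuple

REPLICA_REGIONS = ["us-east-2", "eu-west-1"]  # regions named in the list above


def primary_params(identifier: str, monitoring_role_arn: str) -> Dict[str, Any]:
    """Keyword arguments for rds.create_db_instance (credentials/networking omitted)."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "postgres",
        "DBInstanceClass": "db.r6g.4xlarge",  # hypothetical instance size
        "AllocatedStorage": 1000,              # GiB, hypothetical
        "StorageType": "io1",                  # Provisioned IOPS SSD
        "Iops": 30000,
        "StorageEncrypted": True,              # encryption at rest via KMS
        "MonitoringInterval": 1,               # Enhanced Monitoring granularity, seconds
        "MonitoringRoleArn": monitoring_role_arn,
        "EnablePerformanceInsights": True,
        "PubliclyAccessible": False,
    }


def replica_params(
    source_arn: str, kms_key_ids: Dict[str, str]
) -> List[Tuple[str, Dict[str, Any]]]:
    """(region, kwargs) pairs for rds.create_db_instance_read_replica.

    Each call would be issued by a client created in the replica's own region.
    """
    source_name = source_arn.rsplit(":", 1)[-1]
    source_region = source_arn.split(":")[3]
    return [
        (
            region,
            {
                "DBInstanceIdentifier": f"{source_name}-replica-{region}",
                "SourceDBInstanceIdentifier": source_arn,  # ARN form is needed across regions
                "SourceRegion": source_region,
                "KmsKeyId": kms_key_ids[region],  # key must exist in the replica's region
            },
        )
        for region in REPLICA_REGIONS
    ]
```

With boto3 installed, each pair would be applied with something like `boto3.client("rds", region_name=region).create_db_instance_read_replica(**kwargs)`.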

Data Flow Process

Step-by-Step Flow: 

  1. Order placed via App: Triggers API Gateway → Lambda function → RDS PostgreSQL (Primary) 
  2. Inventory check and update: Another service reads from and writes to RDS 
  3. Read queries (e.g., product browsing, order status): Routed to nearest Read Replica 
  4. Analytics extract: A Glue job runs hourly to extract data from RDS into S3 
  5. DMS streams changes to Redshift for near-real-time reporting 
  6. QuickSight dashboards pull data from Redshift for internal teams 
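
The read/write split in steps 1 to 3 can be illustrated with a small routing helper. This is a sketch under assumptions: the source does not say how the "nearest" replica is chosen, so the example simply matches the caller's region and falls back to the primary. The `Endpoints` type, the `choose_endpoint` function, and the hostnames in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Endpoints:
    primary: str               # writer endpoint
    replicas: Dict[str, str]   # region name -> read replica endpoint


def choose_endpoint(endpoints: Endpoints, is_write: bool, caller_region: str) -> str:
    """Send writes to the primary and reads to the replica in the caller's region.

    Reads from a region without a replica fall back to the primary.
    """
    if is_write:
        return endpoints.primary
    return endpoints.replicas.get(caller_region, endpoints.primary)


# Usage with placeholder hostnames
endpoints = Endpoints(
    primary="primary.db.example.internal",
    replicas={
        "us-east-2": "replica-use2.db.example.internal",
        "eu-west-1": "replica-euw1.db.example.internal",
    },
)
order_write_host = choose_endpoint(endpoints, is_write=True, caller_region="eu-west-1")
browse_read_host = choose_endpoint(endpoints, is_write=False, caller_region="eu-west-1")
```

Because RDS read replicas replicate asynchronously, a read that must observe a write made moments earlier (for example, an order status check right after checkout) may need to be routed to the primary instead.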

Error Handling and Monitoring

To minimize downtime and ensure proactive alerting: 

  1. CloudWatch Alarms configured for: 
       a. CPU utilization > 70% 
       b. Free storage < 20% 
       c. Replication lag > 1 minute 
       d. Query latency > threshold per endpoint 
  2. SNS Alerts sent to Slack channels and DevOps email alias 
  3. Enhanced Monitoring visualized via CloudWatch Dashboards 
  4. RDS Events monitored for snapshot failures, failovers, and maintenance 
  5. Automated Lambda parses logs and sends alerts on deadlocks or failed connections 
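
The CPU, storage, and replication-lag thresholds above map onto standard `AWS/RDS` CloudWatch metrics. The sketch below builds argument dictionaries for CloudWatch's `put_metric_alarm` call; it is an illustration, not the team's actual configuration. The alarm names, the 60-second period, and the five evaluation periods are assumptions. `FreeStorageSpace` is reported in bytes, so the 20% threshold is derived from the allocated storage, and `ReplicaLag` is reported in seconds. The per-endpoint query latency alarm is omitted because it would depend on an application-level metric the source does not describe.

```python
from typing import Any, Dict, List

GIB = 1024 ** 3


def rds_alarms(
    primary_id: str,
    replica_ids: List[str],
    allocated_gib: int,
    sns_topic_arn: str,
) -> List[Dict[str, Any]]:
    """Return one kwargs dict per alarm for cloudwatch.put_metric_alarm."""

    def alarm(name: str, instance_id: str, metric: str, operator: str, threshold: int) -> Dict[str, Any]:
        return {
            "AlarmName": f"{instance_id}-{name}",   # hypothetical naming scheme
            "Namespace": "AWS/RDS",
            "MetricName": metric,
            "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
            "Statistic": "Average",
            "Period": 60,                           # seconds, assumed
            "EvaluationPeriods": 5,                 # assumed
            "ComparisonOperator": operator,
            "Threshold": threshold,
            "AlarmActions": [sns_topic_arn],        # notify the DevOps SNS topic
        }

    alarms = [
        # CPU utilization above 70 percent
        alarm("cpu-high", primary_id, "CPUUtilization", "GreaterThanThreshold", 70),
        # Free storage below 20 percent of allocated storage, expressed in bytes
        alarm("storage-low", primary_id, "FreeStorageSpace", "LessThanThreshold",
              allocated_gib * GIB * 20 // 100),
    ]
    # Replication lag above 1 minute, one alarm per replica
    alarms += [
        alarm("replica-lag", replica_id, "ReplicaLag", "GreaterThanThreshold", 60)
        for replica_id in replica_ids
    ]
    return alarms
```

Each dictionary could then be passed as `cloudwatch.put_metric_alarm(**kwargs)` using a CloudWatch client in the region of the instance it watches.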

Data Access and Security

Measures Implemented: 

  1. IAM policies with least privilege principle for EC2, Lambda, and developers 
  2. Secrets Manager auto-rotates credentials; audit logs track access 
  3. SSL enforcement for all database connections 
  4. Encryption at rest enabled using AWS KMS 
  5. Security Groups restricted to private subnets and whitelisted IPs 
  6. Activity Logs (CloudTrail + PostgreSQL logs) stored in S3 for compliance and forensic auditing
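
On the client side, measures 2 and 3 could combine as in the following sketch: the application reads the rotated secret and builds a connection URI that requires TLS. The function name and CA bundle path are hypothetical. The JSON keys follow the structure Secrets Manager uses for RDS database secrets, and `sslmode=verify-full` is an assumption that goes slightly beyond plain SSL enforcement by also verifying the server certificate and hostname.

```python
import json
from urllib.parse import quote


def dsn_from_secret(secret_string: str, ca_bundle_path: str) -> str:
    """Build a libpq-style connection URI from a Secrets Manager SecretString."""
    secret = json.loads(secret_string)
    # Percent-encode credentials so characters such as '@' or '/' cannot break the URI.
    user = quote(secret["username"], safe="")
    password = quote(secret["password"], safe="")
    return (
        f"postgresql://{user}:{password}@{secret['host']}:{secret['port']}"
        f"/{secret['dbname']}"
        f"?sslmode=verify-full&sslrootcert={quote(ca_bundle_path, safe='/')}"
    )


# Usage with a made-up secret; in practice the string would come from
# secretsmanager.get_secret_value(SecretId=...)["SecretString"].
example_secret = json.dumps({
    "username": "app_user",
    "password": "p@ss/word",
    "host": "primary.db.example.internal",
    "port": 5432,
    "dbname": "orders",
})
dsn = dsn_from_secret(example_secret, "/etc/ssl/certs/rds-ca-bundle.pem")
```

Fetching the secret at connection time, rather than caching it indefinitely, lets the application pick up new credentials after each rotation.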

Best Practices Implementation 

Practice                  Description
------------------------  ------------------------------------------------------------
Connection Pooling        PgBouncer used to manage hundreds of concurrent connections efficiently
Maintenance Windows       Automatic minor version upgrades during off-peak hours
Cross-region Replication  Ensures availability even if one region fails
Automated Snapshots       Taken daily, retained for 35 days, and tested quarterly for DR
Query Optimization        Slow queries identified using Performance Insights and optimized with indexing and partitioning
Parameter Group Tuning    Increased max_connections, tuned work_mem, wal_buffers, etc.
Failover Testing          Conducted bi-monthly to simulate region-level outage and failover scenarios
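
The parameter group tuning could be expressed as input to the RDS `modify_db_parameter_group` call, as sketched below. The values are purely illustrative because the source does not state the ones actually used. Static PostgreSQL parameters such as `max_connections` and `wal_buffers` only take effect after a restart, so they are marked `pending-reboot`, while `work_mem` can be applied immediately.

```python
from typing import Dict, List

# Parameters that PostgreSQL only reads at server start.
STATIC_PARAMETERS = {"max_connections", "wal_buffers"}

# Hypothetical values, not taken from the source.
EXAMPLE_VALUES = {
    "max_connections": "1000",
    "work_mem": "16384",    # in kB, i.e. 16 MB per sort/hash operation
    "wal_buffers": "2048",  # in 8 kB WAL blocks, i.e. 16 MB
}


def parameter_changes(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Build the Parameters list for rds.modify_db_parameter_group."""
    return [
        {
            "ParameterName": name,
            "ParameterValue": value,
            "ApplyMethod": "pending-reboot" if name in STATIC_PARAMETERS else "immediate",
        }
        for name, value in values.items()
    ]


changes = parameter_changes(EXAMPLE_VALUES)
```

The list would be passed as `rds.modify_db_parameter_group(DBParameterGroupName=..., Parameters=changes)`, with the reboot scheduled inside the maintenance window.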

Conclusion

The migration to Amazon RDS for PostgreSQL delivered measurable improvements: 

  1. 50% reduction in average query response time 
  2. 3x increase in read throughput due to replicas 
  3. Zero data loss incidents since cutover 
  4. Seamless scaling during Black Friday and similar spikes 
  5. Better observability through Enhanced Monitoring and SNS alerts 

The architecture supports the current load and leaves room for further scaling through Aurora PostgreSQL, serverless options, or horizontal sharding as the business grows.