This tutorial will guide you through building and understanding a database read scaling system using PostgreSQL master-replica architecture, automatic load balancing, and Redis caching. By the end, you'll understand how to distribute read operations across multiple database instances while maintaining data consistency and improving performance.
- What is Database Read Scaling?
- Core Concepts
- System Architecture
- Prerequisites
- Step-by-Step Setup
- Understanding the Code
- Practical Examples
- Troubleshooting
- Advanced Topics
Database read scaling is a technique to handle high-volume read operations by distributing them across multiple database instances. Instead of all queries hitting a single database, read operations are load-balanced across replica databases while write operations remain centralized on a master database.
- Performance: Multiple replicas can handle more concurrent read requests
- Availability: System continues working even if one replica fails
- Cost Efficiency: Replicas can use less powerful hardware than the master
- User Experience: Faster response times for read-heavy applications
Read scaling is essential for:
- Social media platforms (timeline reads)
- E-commerce sites (product catalog browsing)
- Analytics dashboards (report generation)
- Content management systems (article viewing)
Master Database: The single source of truth that handles all write operations (INSERT, UPDATE, DELETE). It maintains the authoritative copy of all data.
Replica Databases: Read-only copies of the master that receive data changes through streaming replication. They handle SELECT queries and are automatically kept in sync.
Load balancing distributes read requests across multiple replica databases to prevent any single instance from becoming a bottleneck. Our implementation uses Npgsql's built-in load balancing, which automatically routes queries to available replicas.
Redis caching stores frequently accessed data in memory for ultra-fast retrieval. This reduces database load and improves response times for popular content.
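The cache-aside flow behind this can be sketched in plain shell. Here a bash associative array stands in for Redis, and fetch_album is a hypothetical database read — neither name comes from the project; in the real system, redis-cli GET and SETEX would play these roles:

```shell
#!/usr/bin/env bash
# Simulated cache-aside: the "cache" array stands in for Redis.
declare -A cache

fetch_album() {            # pretend database query
  echo "album-$1-data"
}

get_with_cache() {
  local key="album:$1"
  if [[ -n "${cache[$key]:-}" ]]; then
    echo "HIT ${cache[$key]}"
  else
    local val
    val=$(fetch_album "$1")
    cache[$key]=$val       # with Redis this would be SETEX <key> <ttl> <value>
    echo "MISS $val"
  fi
}

get_with_cache 1   # first read: cache miss, populates the cache
get_with_cache 1   # second read: served from the cache
```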
Application → (reads) → Load Balancer → Replica 1 (5433) / Replica 2 (5434) + Redis Cache
Application → (writes) → Master Database (5432)
- Write Operations: Application → Master Database
- Read Operations: Application → Load Balancer → Replica 1/2 (with Redis cache)
- Replication: Master → Replicas (automatic streaming)
Before starting, ensure you have:
- .NET 10.0 SDK

  # Download from https://dotnet.microsoft.com/download/dotnet/10.0
  dotnet --version   # Should show 10.0.x

- Docker Desktop

  # Download from https://www.docker.com/products/docker-desktop
  docker --version   # Should show version info

- Git (for cloning the repository)

  git --version
# Clone the repository
git clone <repository-url>
cd ex-scalingreads
# Or if you already have it, navigate to the directory
cd /path/to/ex-scalingreads

Our system uses Docker Compose to run the PostgreSQL master-replica setup and the Redis cache.
# Start all services
docker-compose up -d
# Check that all services are running
docker-compose ps

You should see output like:
Name Command State Ports
-------------------------------------------------------------------------------------
pg-master docker-entrypoint.sh postgres Up 0.0.0.0:5432->5432/tcp
pg-replica-1 docker-entrypoint.sh postgres Up 0.0.0.0:5433->5432/tcp
pg-replica-2 docker-entrypoint.sh postgres Up 0.0.0.0:5434->5432/tcp
redis-cache docker-entrypoint.sh redis ... Up 0.0.0.0:6379->6379/tcp
Let's check that our databases are properly set up and replicating.
# Check master is ready
docker exec -it pg-master pg_isready -U admin -d appdb
# Check replication status (should show 2 replicas)
docker exec -it pg-master psql -U admin -d appdb -c "SELECT * FROM pg_stat_replication;"

The replication query should show 2 connected replicas.
# Navigate to the application directory
cd ScalingReads.Core
# Create and apply database migrations
dotnet ef database update -c AppDbContext
# Verify tables were created
docker exec -it pg-master psql -U admin -d appdb -c "\dt"

You should see tables like "Albums" and "Songs" created.
# Run the .NET application
dotnet run
# Or for HTTPS development
dotnet run --launch-profile https

The application will start on:
- HTTP: http://localhost:5291
- HTTPS: https://localhost:7062
- API Documentation: http://localhost:5291/swagger
This section provides an in-depth explanation of the PostgreSQL streaming replication setup used in this tutorial, covering every configuration aspect from parameters to security.
The master database is configured with specific parameters in docker-compose.yaml to enable streaming replication:
- wal_level=replica: Sets the Write-Ahead Logging level to 'replica', which includes all information needed for archiving and replication. This parameter is essential for streaming replication: it ensures the WAL contains sufficient data for replicas to reconstruct the master's state.
- max_wal_senders=10: Specifies the maximum number of concurrent connections from standby servers or streaming base backup clients. Set to 10 to allow multiple replicas and backup operations, providing headroom for scaling.
- max_replication_slots=10: Defines the maximum number of replication slots the server can support. Replication slots ensure that the master retains WAL segments until all replicas have received them, preventing premature cleanup that could break replication.
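These parameters are typically passed as server flags in docker-compose.yaml. The sketch below shows what the master service might look like; the image tag and overall shape are illustrative assumptions, not copied from the repository (only the three replication parameters and the admin/appdb credentials come from this tutorial):

```yaml
services:
  pg-master:
    image: postgres:17          # illustrative tag
    command: >
      postgres
      -c wal_level=replica
      -c max_wal_senders=10
      -c max_replication_slots=10
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin_password
      POSTGRES_DB: appdb
    ports:
      - "5432:5432"
```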
The init/01-replication.sql script creates a dedicated replication user:
CREATE ROLE replicator
WITH REPLICATION
LOGIN
PASSWORD 'repl_password';

This user is specifically created for replication purposes and has the REPLICATION privilege, which allows it to connect for replication operations. Using a separate user follows security best practices by limiting permissions to only what's necessary for replication, reducing the attack surface compared to using administrative accounts.
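To confirm the role was created with the right privileges, you can query the pg_roles catalog from psql (a verification step, not part of the init scripts):

```sql
-- Expect one row with rolreplication = t and rolcanlogin = t
SELECT rolname, rolreplication, rolcanlogin
FROM pg_roles
WHERE rolname = 'replicator';
```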
The init/02-pg-hba.sh script configures client authentication by appending entries to pg_hba.conf:
echo "host replication replicator 0.0.0.0/0 scram-sha-256" >> "$PGDATA/pg_hba.conf"
echo "host all all 0.0.0.0/0 scram-sha-256" >> "$PGDATA/pg_hba.conf"

- The first line allows the replicator user to connect for replication using SCRAM-SHA-256 authentication. Note that 0.0.0.0/0 matches any IPv4 address; it is Docker's network isolation, not this rule, that limits which hosts can actually reach the database.
- The second line allows all users to connect to any database using SCRAM-SHA-256 authentication.
- SCRAM-SHA-256 (Salted Challenge Response Authentication Mechanism) is a modern, secure password-based authentication method that provides better security than older methods like MD5 by using salted hashing and preventing replay attacks.
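For anything beyond a local demo, you would normally scope these rules to the compose network's subnet rather than 0.0.0.0/0. The subnet below is an assumption — check yours with docker network inspect:

```
# pg_hba.conf: TYPE  DATABASE   USER        ADDRESS         METHOD
host   replication   replicator  172.18.0.0/16   scram-sha-256
host   appdb         admin       172.18.0.0/16   scram-sha-256
```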
The replica.sh script orchestrates the complete replica initialization and startup process:
1. Directory Preparation: Creates and sets proper permissions on the PostgreSQL data directory, ensuring the postgres user has ownership and secure access (700 permissions).
2. Master Readiness Verification: Uses a loop with pg_isready to wait for the master database to become available, preventing startup failures.
3. Base Backup Execution: When the data directory is empty, performs an initial base backup from the master using pg_basebackup:
   - -h pg-master: Specifies the master host
   - -U replicator: Authenticates using the replication user
   - -D $DATA_DIR: Sets the destination data directory
   - -Fp: Uses plain format for easier inspection
   - -Xs: Includes required WAL segments in the backup for consistency
   - -P: Displays progress information
   - -R: Automatically creates standby.signal and configures recovery settings
4. PostgreSQL Startup: Launches PostgreSQL in hot standby mode with hot_standby=on, enabling read-only queries while maintaining replication capability.
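The readiness loop in step 2 is a generic retry pattern; a minimal standalone version looks like this. The wait_for helper is illustrative (not taken from replica.sh); in the real script the probe command would be pg_isready -h pg-master -U replicator:

```shell
# Retry a probe command until it succeeds or the attempt budget runs out.
wait_for() {
  local attempts=$1
  shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0          # probe succeeded
    fi
    sleep 1             # back off before the next try
  done
  return 1              # gave up
}

# In replica.sh: wait_for 30 pg_isready -h pg-master -U replicator
wait_for 5 true && echo "master is ready"
```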
Docker Compose manages service dependencies and health verification:
- depends_on with condition: service_healthy: Replicas explicitly wait for the master to pass its health check before starting, ensuring proper initialization order.
- Master Health Check: Uses pg_isready to verify the master database is accepting connections, with 5-second intervals, 5-second timeouts, and 10 retries for robust startup.
- Sequential Startup: This dependency chain ensures the master is fully operational before replicas attempt to connect, preventing replication setup failures.
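In compose syntax, the dependency and health check described above look roughly like this — a sketch mirroring the intervals stated in the text, with service names matching the docker-compose ps output shown earlier:

```yaml
services:
  pg-master:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U admin -d appdb"]
      interval: 5s
      timeout: 5s
      retries: 10

  pg-replica-1:
    depends_on:
      pg-master:
        condition: service_healthy
```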
Data flows from master to replicas through PostgreSQL's streaming replication:
1. WAL Generation: The master generates Write-Ahead Log (WAL) records for every database change, providing a sequential record of all modifications.
2. Real-time Streaming: WAL records are transmitted to replicas via dedicated TCP connections established by the replication user, ensuring minimal latency.
3. Continuous Replay: Replicas receive WAL records and replay them in order, applying changes to maintain an exact copy of the master's data.
4. Hot Standby Operation: While receiving updates, replicas can simultaneously serve read-only queries, enabling load distribution without interrupting replication.
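You can observe this pipeline from the master side: pg_stat_replication reports, per replica, how far the replayed WAL position trails the master's current position:

```sql
-- Per-replica replication progress, in bytes behind the master
SELECT application_name,
       state,
       pg_current_wal_lsn()                              AS master_lsn,
       replay_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```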
The configuration implements multiple security layers:
- Strong Authentication: SCRAM-SHA-256 provides modern cryptographic protection for passwords.
- Network Segmentation: Docker's internal networking limits database access to containerized services only.
- Principle of Least Privilege: The replication user has only the minimum permissions required for its function.
- Access Control: pg_hba.conf entries restrict connection types and authentication methods appropriately.
- Isolated Communication: Replication traffic is authenticated and confined to Docker's internal network; note that SCRAM-SHA-256 authenticates connections but does not encrypt them, so encryption in transit would require enabling PostgreSQL's SSL/TLS support.
Our application uses two Entity Framework contexts to enforce separation of concerns:
AppDbContext (for writes):
public class AppDbContext : DbContext
{
public AppDbContext(DbContextOptions<AppDbContext> options) : base(options)
{
}
public DbSet<Album> Albums => Set<Album>();
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Album>(e =>
{
e.HasIndex(a => a.Id);
e.Property(a => a.Title).IsRequired();
e.OwnsMany(a => a.Songs, sa =>
{
sa.WithOwner().HasForeignKey("AlbumId");
sa.Property(s => s.Title).IsRequired();
});
});
}
}

ReadOnlyDbContext (for reads):
public class ReadOnlyDbContext : AppDbContext
{
public ReadOnlyDbContext(DbContextOptions<AppDbContext> options) : base(options)
{
ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;
}
public override int SaveChanges() => ThrowReadOnlyException();
public override int SaveChanges(bool acceptAllChangesOnSuccess) => ThrowReadOnlyException();
public override Task<int> SaveChangesAsync(CancellationToken cancellationToken = default) => Task.FromResult(ThrowReadOnlyException());
public override Task<int> SaveChangesAsync(bool acceptAllChangesOnSuccess, CancellationToken cancellationToken = default) => Task.FromResult(ThrowReadOnlyException());
private static int ThrowReadOnlyException()
=> throw new InvalidOperationException("ReadOnlyDbContext is read-only; route writes through AppDbContext.");
}

In appsettings.json:
{
"ConnectionStrings": {
"MasterConnection": "Host=localhost;Port=5432;Database=appdb;Username=admin;Password=admin_password",
"ReplicaConnection": "Host=localhost,localhost;Port=5433,5434;Database=appdb;Username=postgres;Password=admin_password;Load Balance Hosts=true",
"RedisConnection": "localhost:6379"
}
}

The Load Balance Hosts=true setting tells Npgsql to automatically distribute read queries across the replica databases.
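Npgsql's multi-host support also understands a Target Session Attributes keyword. As a variant — shown as a sketch; verify the keyword against your Npgsql version's documentation — prefer-standby routes queries to standby servers when one is available and falls back to any reachable host otherwise:

```
Host=localhost,localhost;Port=5433,5434;Database=appdb;Username=postgres;Password=admin_password;Load Balance Hosts=true;Target Session Attributes=prefer-standby
```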
In Program.cs:
var masterCs = builder.Configuration.GetConnectionString("MasterConnection");
var replicaCs = builder.Configuration.GetConnectionString("ReplicaConnection");
builder.Services.AddDbContext<AppDbContext>(options => options.UseNpgsql(masterCs));
builder.Services.AddDbContext<ReadOnlyDbContext>(options => options.UseNpgsql(replicaCs));
builder.Services.AddStackExchangeRedisCache(options =>
{
options.Configuration = "localhost:6379";
options.InstanceName = "app-cache:";
});

The AlbumController demonstrates both write and read patterns:
[Route("api/[controller]")]
[ApiController]
public class AlbumController : ControllerBase
{
[HttpPost]
public async Task<ActionResult<PostAlbumOutput>> Post([FromServices] AppDbContext dbContext, [FromBody] PostAlbumInput input)
{
if (!ModelState.IsValid)
return BadRequest(ModelState);
using var transaction = await dbContext.Database.BeginTransactionAsync();
var newAlbum = new Album()
{
Title = input.Title,
Songs = [.. input.Songs.Select(s => new Song(s.Title))]
};
await dbContext.Albums.AddAsync(newAlbum);
await dbContext.SaveChangesAsync();
await transaction.CommitAsync();
var result = new PostAlbumOutput(newAlbum.Id);
return Ok(result);
}
[HttpGet("{id}")]
[Cache(ttlSeconds: 120)]
public async Task<ActionResult<GetAlbumOutput>> Get([FromServices] ReadOnlyDbContext dbContext, [FromRoute] int id)
{
var album = await dbContext.Albums
.Where(a => a.Id == id)
.Select(a => new GetAlbumOutput(
a.Id,
a.Title,
a.Songs.Select(s => new GetAlbumSongOutput(s.Title)).ToList()
)).FirstOrDefaultAsync();
if (album is null)
return NotFound();
return Ok(album);
}
}

The CacheAttribute provides automatic Redis caching:
[AttributeUsage(AttributeTargets.Method)]
public class CacheAttribute(int ttlSeconds = 60) : Attribute, IAsyncActionFilter
{
public async Task OnActionExecutionAsync(ActionExecutingContext context, ActionExecutionDelegate next)
{
var cache = context.HttpContext.RequestServices.GetRequiredService<IDistributedCache>();
var cacheKey = BuildCacheKey(context);
var cached = await cache.GetStringAsync(cacheKey);
if (cached is not null)
{
var result = JsonSerializer.Deserialize<object>(cached);
context.Result = new OkObjectResult(result);
return;
}
var executed = await next();
if (executed.Result is ObjectResult objectResult && objectResult.Value is not null)
{
var data = JsonSerializer.Serialize(objectResult.Value);
await cache.SetStringAsync(
cacheKey,
data,
new DistributedCacheEntryOptions
{
AbsoluteExpirationRelativeToNow =
TimeSpan.FromSeconds(ttlSeconds)
});
}
}
private static string BuildCacheKey(ActionExecutingContext context)
{
var route = context.HttpContext.Request.Path.Value ?? "";
if (context.ActionDescriptor is not ControllerActionDescriptor controllerAction)
return $"endpoint:{route}";
var relevantArgs = new Dictionary<string, object?>();
foreach (var param in controllerAction.MethodInfo.GetParameters())
{
if (param.GetCustomAttributes(typeof(FromServicesAttribute), false).Length != 0)
continue;
if (context.ActionArguments.TryGetValue(param.Name!, out var value))
{
relevantArgs[param.Name!] = value;
}
}
var argsJson = JsonSerializer.Serialize(relevantArgs);
var hashBytes = SHA256.HashData(Encoding.UTF8.GetBytes(argsJson));
var hash = Convert.ToHexString(hashBytes);
return $"endpoint:{route}:{hash}";
}
}

# Create a new album (writes to master)
curl -X POST http://localhost:5291/api/album \
-H "Content-Type: application/json" \
-d '{
"title": "Test Album",
"songs": [
{"title": "Song 1"},
{"title": "Song 2"}
]
}'

Response:
{
"id": 1
}

# First read (goes to database, then cached)
curl http://localhost:5291/api/album/1
# Second read (served from Redis cache)
curl http://localhost:5291/api/album/1

Both requests return:
{
"id": 1,
"title": "Test Album",
"songs": [
{"title": "Song 1"},
{"title": "Song 2"}
]
}

# Make multiple read requests
for i in {1..10}; do
curl -s http://localhost:5291/api/album/1 > /dev/null
echo "Request $i completed"
sleep 0.5
done

Check which replica handled the requests:
# Check replica 1 connections
docker exec -it pg-replica-1 psql -U postgres -d appdb -c "SELECT count(*) FROM pg_stat_activity WHERE datname='appdb';"
# Check replica 2 connections
docker exec -it pg-replica-2 psql -U postgres -d appdb -c "SELECT count(*) FROM pg_stat_activity WHERE datname='appdb';"

# View all cache keys
docker exec -it redis-cache redis-cli keys "*"
# Get cached album data; keys follow the pattern app-cache:endpoint:/api/album/1:<hash>,
# so copy the exact key from the "keys" output above
docker exec -it redis-cache redis-cli get "<key-from-keys-output>"

Problem: Can't connect to databases
# Check container status
docker-compose ps
# View logs
docker-compose logs pg-master
docker-compose logs pg-replica-1

Solution:
# Restart services
docker-compose down
docker-compose up -d
# Wait for health checks
sleep 30

Problem: Replicas not receiving data
# Check replication status
docker exec -it pg-master psql -U admin -d appdb -c "SELECT * FROM pg_stat_replication;"

Solution:
# Restart replicas
docker-compose restart pg-replica-1 pg-replica-2
# Check replica logs
docker-compose logs pg-replica-1

Problem: Database migrations fail
# Ensure master is ready
docker exec -it pg-master pg_isready -U admin -d appdb
# Reset if needed (WARNING: destroys data)
dotnet ef database drop -c AppDbContext
dotnet ef database update -c AppDbContext

Problem: Redis caching not functioning
# Test Redis connection
docker exec -it redis-cache redis-cli ping # Should return PONG
# Clear cache if needed
docker exec -it redis-cache redis-cli FLUSHALL

Problem: Ports already in use
# Find conflicting processes
netstat -ano | findstr :5432
netstat -ano | findstr :5433
netstat -ano | findstr :5434
netstat -ano | findstr :6379
# Kill process (replace PID)
taskkill /PID <PID> /F

(These netstat/taskkill commands are Windows-specific; on macOS/Linux use lsof -i :5432 to find the process and kill to stop it.)

# Check replication lag
docker exec -it pg-master psql -U admin -d appdb -c "
SELECT
application_name,
replay_lag,
write_lag,
flush_lag
FROM pg_stat_replication;
"

- Connection Pooling: Npgsql automatically pools connections
- No-Tracking Queries: ReadOnlyDbContext disables change tracking
- Strategic Caching: Cache frequently accessed data with appropriate TTL (Time-to-live)
- Index Optimization: Ensure proper indexes on frequently queried columns
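As a concrete instance of the index advice, a search-style lookup on album titles could be backed by an index like the one below. The table and column names assume EF Core's default naming for the mapping shown earlier — verify with \d in psql before relying on them (note that EF Core already creates foreign-key indexes by convention):

```sql
-- Example: support lookups/searches by title (names assume EF Core defaults)
CREATE INDEX IF NOT EXISTS "IX_Albums_Title" ON "Albums" ("Title");
```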
- Add More Replicas: Scale horizontally by adding more replica containers
- Read/Write Split: Extend to route different query types appropriately
- Multi-Region: Deploy replicas across geographic regions
- Cache Warming: Pre-populate cache with popular data
- Backup Strategy: Regular backups of master database
- Failover: Automatic promotion of replica to master if needed
- Security: Use proper authentication and network isolation
- Monitoring: Implement comprehensive monitoring and alerting
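For the failover point above: on PostgreSQL 12 and later, promoting a standby to primary is a single function call executed on the replica itself (after which the old master must not be allowed to accept writes):

```sql
-- Run on the standby you want to promote (PostgreSQL 12+)
SELECT pg_promote();
```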
You've now built and understood a complete database read scaling system. The architecture demonstrates:
- Separation of concerns between read and write operations
- Automatic load balancing across replica databases
- High-performance caching with Redis
- Data consistency through streaming replication
- Production-ready patterns for scalable applications
This foundation can be extended for more complex scenarios and serves as a solid starting point for understanding enterprise-grade database scaling patterns.
- Experiment with adding more replicas
- Implement cache invalidation strategies
- Add monitoring and metrics
- Explore advanced PostgreSQL features like logical replication
- Consider implementing write-through caching
The API documentation is available at http://localhost:5291/swagger for interactive testing.