This repo contains a distributed computation engine built to explore concepts such as sharding, coordination, fault tolerance, gRPC, TLS, and idempotent execution.
The core idea is to distribute matrix multiplication across multiple worker processes, aggregate the results, and tolerate worker failures via lease-based retries.
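
To make the core idea concrete, here is a minimal single-process sketch of sharding a matrix multiplication by row ranges and reassembling the partial results. The `Shard` type and the `splitRows`, `computeShard`, and `aggregate` functions are hypothetical names chosen for illustration, and row-range sharding is an assumed strategy; the actual code in this repo may split work differently.

```go
package main

import "fmt"

// Shard is a hypothetical unit of work: rows [RowStart, RowEnd) of the result.
type Shard struct {
	ID       int
	RowStart int
	RowEnd   int
}

// splitRows divides n result rows into at most `parts` contiguous shards.
func splitRows(n, parts int) []Shard {
	if parts > n {
		parts = n
	}
	shards := make([]Shard, 0, parts)
	for i := 0; i < parts; i++ {
		shards = append(shards, Shard{ID: i, RowStart: i * n / parts, RowEnd: (i + 1) * n / parts})
	}
	return shards
}

// computeShard returns the rows of A*B that the shard covers.
func computeShard(a, b [][]float64, s Shard) [][]float64 {
	cols := len(b[0])
	out := make([][]float64, 0, s.RowEnd-s.RowStart)
	for i := s.RowStart; i < s.RowEnd; i++ {
		row := make([]float64, cols)
		for k := range b {
			for j := 0; j < cols; j++ {
				row[j] += a[i][k] * b[k][j]
			}
		}
		out = append(out, row)
	}
	return out
}

// aggregate writes each partial result into the rows its shard owns, so the
// final matrix does not depend on the order in which shards finished.
func aggregate(n int, shards []Shard, partials map[int][][]float64) [][]float64 {
	result := make([][]float64, n)
	for _, s := range shards {
		copy(result[s.RowStart:s.RowEnd], partials[s.ID])
	}
	return result
}

func main() {
	a := [][]float64{{1, 2}, {3, 4}, {5, 6}}
	b := [][]float64{{1, 0}, {0, 1}}
	shards := splitRows(len(a), 2)

	// Simulate workers finishing out of order.
	partials := make(map[int][][]float64)
	for i := len(shards) - 1; i >= 0; i-- {
		partials[shards[i].ID] = computeShard(a, b, shards[i])
	}
	fmt.Println(aggregate(len(a), shards, partials))
}
```

Because every shard owns a fixed row range, recomputing a shard after a retry produces the same rows, which is what makes retries safe in this sketch.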
- Coordinator
  - Owns global job state
  - Splits jobs into shards
  - Assigns shards to workers using leases
  - Aggregates shard results deterministically
- Workers
  - Stateless compute nodes
  - Pull shards from the coordinator
  - Compute partial matrix results
  - Report results back
- Client
  - Submits jobs to the coordinator
distributed-system-learning/
├── proto/ # Protobuf definitions and generated code
│ ├── main.proto
│ ├── main.pb.go
│ └── main_grpc.pb.go
│
├── coordinator/ # Coordinator (scheduler + aggregator)
│
├── worker/ # Worker gRPC client and compute logic
│
├── client/ # Job submission client
│
├── certs/ # TLS certificates (examples only)
│ └── README.md
│
├── .env.example # Example environment variables
├── .gitignore
└── README.md
- Shard execution: at-least-once
- Final result: exactly-once
- Safe retries on worker failure
- Deterministic aggregation
- gRPC (Protocol Buffers) for all internal communication
- Unary RPCs only
- Workers pull work instead of the coordinator pushing it
- gRPC over TLS
- Coordinator presents a server certificate
- Clients/workers verify server identity using a CA certificate
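
As a rough illustration of the last point, a client or worker could build its TLS settings from the CA certificate using only the Go standard library. The function name `newClientTLSConfig`, the path `certs/ca.crt`, and the server name `coordinator.local` are assumptions made for this sketch, not values taken from the repo.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// newClientTLSConfig builds a TLS config that trusts only the given CA and
// checks that the server certificate matches serverName.
func newClientTLSConfig(caPEM []byte, serverName string) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid CA certificate found in PEM data")
	}
	return &tls.Config{
		RootCAs:    pool,
		ServerName: serverName,
		MinVersion: tls.VersionTLS12,
	}, nil
}

func main() {
	// Hypothetical path; adjust to wherever the CA certificate actually lives.
	caPEM, err := os.ReadFile("certs/ca.crt")
	if err != nil {
		fmt.Println("could not read CA certificate:", err)
		return
	}
	cfg, err := newClientTLSConfig(caPEM, "coordinator.local")
	if err != nil {
		fmt.Println("could not build TLS config:", err)
		return
	}
	fmt.Println("TLS config ready for server name", cfg.ServerName)
}
```

With grpc-go, a config like this would typically be wrapped with `credentials.NewTLS(cfg)` and passed to the dial call through `grpc.WithTransportCredentials`; that wiring is left out so the sketch has no external dependencies.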
