A Go-based service for ingesting logs into ClickHouse, with migration and REST API support.
The service uses a ClickHouse table to store ingested logs. Below is an example schema:
CREATE TABLE logs (
    id Int64,
    user_id Int64,
    title String,
    ingested_at DateTime,
    body String,
    source String
) ENGINE = MergeTree()
ORDER BY id;

- id: Unique identifier for each log entry.
- user_id: ID of the user associated with the post.
- title: The title of the record.
- ingested_at: Time the post was ingested.
- body: Content body of the post.
- source: Source of the log (e.g., placeholder_api).

Prerequisites:
- Docker
- Docker Compose
- Go 1.24+ (for local development)
- Clone the repository:
    git clone git@github.com:Ankush405/data-ingestion.git
    cd data-ingestion
- Start the services using Docker Compose:
    docker-compose up --build
This will:
- Start a ClickHouse server
- Run database migrations
- Start the ingestion server on http://localhost:8080
- API Endpoints:
    - GET /health: Health check endpoint
    - GET /logs: Retrieve ingested logs
Note: Add unit/integration tests as needed.
- Run tests locally:
    go test ./...
- (Optional) Run tests inside Docker:
    - Add a test stage to your Dockerfile or use a separate test container.
- Build Docker images:
    docker build -f Dockerfile.server -t yourrepo/ingestion-server:latest .
    docker build -f Dockerfile.migrate -t yourrepo/ingestion-migrate:latest .
- Push images to your container registry:
    docker push yourrepo/ingestion-server:latest
    docker push yourrepo/ingestion-migrate:latest
- Provision a ClickHouse instance (e.g., Altinity.Cloud, Aiven, or self-hosted).
- Set environment variables for your deployment (e.g., CLICKHOUSE_URL).
- Deploy using your preferred orchestrator (e.g., Kubernetes, ECS, or GCP Cloud Run), referencing the pushed images and environment variables.
- Code Structure:
    - cmd/server/: Main server application
    - cmd/migrate/: Database migration tool
    - internal/log/: Log ingestion business logic
- Configuration:
    - Environment variable: CLICKHOUSE_URL (e.g., clickhouse://default:password123@clickhouse:9000/default)
- Migrations:
    - SQL migration files are in cmd/migrate/migrations/
- Extending:
    - Add new endpoints in cmd/server/server.go
    - Add new migrations in cmd/migrate/migrations/
- Simplicity vs. Flexibility:
  The service is designed with a simple schema and API to enable quick ingestion and querying. More flexible schemas (e.g., supporting arbitrary fields) were avoided to keep the implementation straightforward and performant.
- ClickHouse as the Storage Engine:
  ClickHouse was chosen for its high performance with analytical queries and large-scale log data. However, this comes at the cost of more complex setup and less transactional support compared to traditional relational databases.
- Dockerized Deployment:
  Using Docker and Docker Compose simplifies local development and deployment, but may not reflect all production nuances (e.g., security, scaling, persistent storage).
- Database Migrations:
  Ensuring that migrations run reliably and idempotently, especially when deploying to new environments, required careful scripting and testing.
- Error Handling for Edge Cases:
  Handling API timeouts, invalid responses, and database errors robustly required simulating a variety of failure scenarios.

- Enhanced Schema Flexibility:
  Support for dynamic fields or a more flexible schema to accommodate different post formats.
- Observability:
  Add structured logging, metrics, and tracing to improve monitoring and debugging in production.
- Automated CI/CD:
  Integrate automated testing, linting, and deployment pipelines for faster and safer releases.
- Security Enhancements:
  Implement authentication, authorization, and secure handling of secrets and environment variables.
For more details, see the source code and comments in each file.