Industrial-grade NLG system for Abstract Wikipedia, Wikifunctions, and standalone API use.
Semantik Architect is a data-driven Natural Language Generation (NLG) system built around a modular Python backend, GF-based grammar assets, language-specific schema/config data, and a separate Next.js frontend.
Instead of maintaining one monolithic renderer per language, the project combines:
- A modular Python backend for API, orchestration, persistence, and generation
- GF grammars and PGF artifacts for high-precision generation
- Family-oriented engines and morphology modules for rule-based text realization
- Frame schemas and structured payloads for language-agnostic input
- A background worker for long-running build and onboarding tasks
- A separate frontend for tools, dashboards, and operator workflows
The result is a practical architecture for rule-based NLG that can run as a standalone service while staying aligned with the broader Abstract Wikipedia workflow.
Semantik Architect is best understood as a modular monolith on the Python side, with a separate frontend application.
SemantiK_Architect/
├── app/ # Python backend
│ ├── core/ # Domain logic, constructions, ports, use cases
│ ├── adapters/ # API, worker, engines, persistence, messaging
│ └── shared/ # Config, DI container, observability, utilities
│
├── architect_frontend/ # Next.js frontend
├── gf/ # GF source grammars and compiled artifacts
├── schemas/ # JSON schemas for frame payloads
├── tools/ # Diagnostics, audits, QA, indexing, health tools
├── builder/ # Grammar/build orchestration
├── ai_services/ # Optional AI-assisted services
└── docs/ # Architecture and operational documentation
- app/core/ contains the domain model, semantic constructions, ports, and use cases
- app/adapters/ contains FastAPI routes, generation engines, repositories, and infrastructure adapters
- app/shared/ contains configuration, dependency injection, logging, and observability helpers
- architect_frontend/ is a separate Node/Next.js application
- It talks to the backend through the canonical API under /api/v1
- In local development it usually runs on :3000, while the backend runs on :8000
In containerized mode, the stack is split into:
- Redis
- API backend
- ARQ worker
- Next.js frontend
- Nginx reverse proxy
The reverse proxy exposes the app under:
- UI: /semantik_architect/
- API: /semantik_architect/api/v1
Semantic frames are the language-agnostic inputs to the generator. They describe what should be said before any language-specific realization happens.
Examples include:
- biographical facts
- entity descriptions
- relational/classification payloads
- event payloads
Example:
{
"frame_type": "bio",
"subject": { "name": "Marie Curie", "qid": "Q7186" },
"properties": {
"profession": "physicist",
"nationality": "polish"
}
}
The backend contains reusable constructions for common meaning patterns, such as:
- copular classification
- transitive events
- passive events
- topic-comment structures
- relative clauses
- possession and existential forms
These constructions let the system stay semantic-first rather than language-script-first.
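As a rough illustration of what "semantic-first" means here, the sketch below maps the bio frame shown earlier onto a copular-classification construction and then realizes it in English. The class and function names (`CopularClassification`, `from_bio_frame`, `realize_english`) are hypothetical and are not taken from the codebase; the realizer is a toy that ignores real morphology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CopularClassification:
    """Language-neutral meaning: SUBJECT is a (MODIFIER) CLASS. Hypothetical type."""
    subject: str
    class_noun: str
    modifier: Optional[str] = None


def from_bio_frame(frame: dict) -> CopularClassification:
    """Map a bio frame onto the construction before any language is chosen."""
    props = frame.get("properties", {})
    return CopularClassification(
        subject=frame["subject"]["name"],
        class_noun=props["profession"],
        modifier=props.get("nationality"),
    )


def realize_english(c: CopularClassification) -> str:
    """Toy English realizer; a real engine would handle agreement and morphology."""
    phrase = f"{c.modifier.capitalize()} {c.class_noun}" if c.modifier else c.class_noun
    article = "an" if phrase[0].lower() in "aeiou" else "a"
    return f"{c.subject} is {article} {phrase}."


frame = {
    "frame_type": "bio",
    "subject": {"name": "Marie Curie", "qid": "Q7186"},
    "properties": {"profession": "physicist", "nationality": "polish"},
}
print(realize_english(from_bio_frame(frame)))  # Marie Curie is a Polish physicist.
```

The point of the split is that only `realize_english` would need a counterpart per language; the construction itself stays the same.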
Semantik Architect supports multiple realization strategies:
- GF-backed generation for higher-precision, grammar-driven output
- Python family/morphology engines for rule-based realization paths and fallback strategies
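One way to picture the two strategies working together is a simple fallback chain. The sketch below assumes the GF path is tried first and a Python engine is used when no grammar is available; that ordering, the `Engine` signature, and both stand-in engines are assumptions for illustration, not the project's actual interfaces.

```python
from typing import Callable, Optional, Sequence

# An engine takes (lang, frame) and returns text, or None if it cannot handle the input.
Engine = Callable[[str, dict], Optional[str]]


def gf_engine(lang: str, frame: dict) -> Optional[str]:
    """Stand-in for GF-backed generation; pretends only 'eng' has a compiled grammar."""
    if lang != "eng":
        return None
    return f"[gf:{lang}] {frame['subject']['name']}"


def family_engine(lang: str, frame: dict) -> Optional[str]:
    """Stand-in for a Python family/morphology engine used as a fallback."""
    return f"[family:{lang}] {frame['subject']['name']}"


def generate(
    lang: str,
    frame: dict,
    engines: Sequence[Engine] = (gf_engine, family_engine),
) -> str:
    """Return the output of the first engine that can realize the frame."""
    for engine in engines:
        text = engine(lang, frame)
        if text is not None:
            return text
    raise LookupError(f"no engine could realize language {lang!r}")


frame = {"frame_type": "bio", "subject": {"name": "Marie Curie"}}
print(generate("eng", frame))  # [gf:eng] Marie Curie
print(generate("zul", frame))  # [family:zul] Marie Curie
```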
The repository includes a dedicated schemas/ directory with JSON Schemas for structured frame payloads. These schemas are separate from the Python packaging layer and act as contracts for incoming content and tooling.
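To make the "schemas as contracts" idea concrete, here is a minimal, stdlib-only sketch that checks a frame against a trimmed-down schema. The schema below is hypothetical (the real files live in schemas/), and the checker only understands `type`, `required`, and `properties`; it is not a full JSON Schema validator.

```python
import json

# Hypothetical, trimmed-down schema for a bio frame; not copied from schemas/.
BIO_FRAME_SCHEMA = {
    "type": "object",
    "required": ["frame_type", "subject"],
    "properties": {
        "frame_type": {"type": "string"},
        "subject": {
            "type": "object",
            "required": ["name"],
            "properties": {"name": {"type": "string"}, "qid": {"type": "string"}},
        },
        "properties": {"type": "object"},
    },
}

_TYPES = {"object": dict, "string": str}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value passed."""
    expected = _TYPES.get(schema.get("type"))
    if expected and not isinstance(value, expected):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub_schema, f"{path}.{key}"))
    return errors


good = json.loads('{"frame_type": "bio", "subject": {"name": "Marie Curie", "qid": "Q7186"}}')
bad = {"frame_type": "bio", "subject": {}}
print(validate(good, BIO_FRAME_SCHEMA))  # []
print(validate(bad, BIO_FRAME_SCHEMA))   # ['$.subject.name: missing required field']
```

In practice a dedicated JSON Schema library would replace the hand-written `validate`; the sketch only shows where such a check sits relative to the payload.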
Long-running operations such as onboarding or building language resources are handled asynchronously through the worker stack rather than blocking the API process.
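The enqueue-and-return pattern described above can be sketched with the standard library alone. Here an in-process `asyncio.Queue` stands in for the Redis-backed ARQ queue, and `build_language`, `enqueue`, and `worker` are hypothetical names; the real worker runs as a separate process.

```python
import asyncio
import uuid


async def build_language(lang: str) -> str:
    """Stand-in for a slow build or onboarding task."""
    await asyncio.sleep(0.01)
    return f"built {lang}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    """Consume jobs until cancelled, recording each result under its job id."""
    while True:
        job_id, lang = await queue.get()
        results[job_id] = await build_language(lang)
        queue.task_done()


async def enqueue(queue: asyncio.Queue, lang: str) -> str:
    """What an API handler would do: queue the job and return its id immediately."""
    job_id = uuid.uuid4().hex
    await queue.put((job_id, lang))
    return job_id


async def main() -> dict:
    queue, results = asyncio.Queue(), {}
    task = asyncio.create_task(worker(queue, results))
    job_id = await enqueue(queue, "zul")  # returns before the build has run
    pending = job_id not in results
    await queue.join()                    # wait here only so the demo can show the result
    task.cancel()
    return {"was_pending": pending, "result": results[job_id]}


print(asyncio.run(main()))
```

The handler never awaits `build_language` directly, which is what keeps the API process responsive while the job runs.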
The easiest way to run the full stack is Docker Compose.
docker compose up --build
- Reverse-proxied UI: http://localhost:4000/semantik_architect/
- Backend docs: http://localhost:8000/docs
- Direct frontend container port: http://localhost:3000
- Redis: localhost:6379
curl http://localhost:8000/api/v1/health/ready
You can also use the direct health route outside the API prefix:
curl http://localhost:8000/health/ready
The canonical backend contract lives under /api/v1.
Path-style language selection:
curl -X POST http://localhost:8000/api/v1/generate/eng \
-H "x-api-key: secret" \
-H "Content-Type: application/json" \
-d '{
"frame_type": "bio",
"subject": {"name": "Marie Curie"},
"properties": {"profession": "physicist", "nationality": "polish"}
}'
There is also a payload-driven generation route for clients that send language inside the request body.
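For clients that prefer Python over curl, the path-style request above can be built with the standard library. This is a sketch under the same local defaults as the curl example (port 8000, the placeholder key `secret`); `build_generate_request` is a hypothetical helper, and the response shape is not assumed, so the sending step is left commented out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # local default used in the curl examples


def build_generate_request(lang: str, frame: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the path-style generation route."""
    return urllib.request.Request(
        url=f"{BASE_URL}/generate/{lang}",
        data=json.dumps(frame).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


frame = {
    "frame_type": "bio",
    "subject": {"name": "Marie Curie"},
    "properties": {"profession": "physicist", "nationality": "polish"},
}
request = build_generate_request("eng", frame, api_key="secret")

# To send it against a running backend:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```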
curl http://localhost:8000/api/v1/languages
Administrative language lifecycle operations live under the management layer and are intended to be protected.
Example:
curl -X POST http://localhost:8000/api/v1/languages/ \
-H "x-api-key: secret" \
-H "Content-Type: application/json" \
-d '{"code": "zul", "name": "Zulu", "family": "Bantu"}'
Keep local development simple:
- one Python environment at the repo root
- one Node environment in architect_frontend/
Do not create a separate Python virtual environment for every subdirectory unless you intentionally refactor the repo into multiple installable Python packages.
Run the backend in WSL/Linux or Docker. The GF runtime and related dependencies are not a good fit for native Windows execution.
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cd architect_frontend
npm install
npm run dev
Backend:
source .venv/bin/activate
uvicorn app.adapters.api.main:create_app --factory --host 0.0.0.0 --port 8000 --reload
Worker:
source .venv/bin/activate
arq app.workers.worker.WorkerSettings --watch app
Frontend:
cd architect_frontend
npm run dev
For lifecycle operations such as build, doctor, align, and service startup, manage.py is the canonical orchestrator.
Examples:
python manage.py doctor
python manage.py align --force
python manage.py build --langs en fr
The test suite is organized into:
- unit tests for core/domain behavior
- integration tests for adapters and infrastructure-aware flows
- e2e tests for API behavior
Typical commands:
pytest
pytest tests/unit
pytest tests/integration
Semantik Architect includes a GF build/orchestration pipeline and a compiled PGF runtime artifact.
Important directories:
- gf/ for grammar sources and compiled output
- builder/ for build orchestration
- tools/ for audits, inventory, diagnostics, and health checks
If you are working on the grammar layer, prefer the documented build flow through manage.py and the builder/orchestrator tooling rather than ad hoc commands.
The backend also exposes a protected Tools API used by the frontend dashboard.
This layer is designed around:
- an allowlisted registry of tool commands
- repo-root path confinement
- argument allowlisting
- output truncation and timeouts
- optional AI-gated tools
This keeps operational tooling usable from the UI without turning the backend into a generic remote shell.
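The safeguards listed above can be sketched in a few lines. Everything here is illustrative: the `TOOLS` registry, its single `echo` entry, the limits, and the `confine` and `run_tool` helpers are hypothetical and do not mirror the actual Tools API; AI gating is omitted.

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence

REPO_ROOT = Path.cwd()  # assumption: the process is started at the repo root
MAX_OUTPUT = 2000       # characters of stdout returned to the caller
TIMEOUT_S = 10          # seconds before a tool run is aborted

# Hypothetical registry: only commands listed here can ever be executed.
TOOLS = {
    "echo": {
        "cmd": [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"],
        "allowed_flags": {"--verbose", "--json"},
    },
}


def confine(path: str, root: Path = REPO_ROOT) -> Path:
    """Resolve a path argument against the root and reject anything that escapes it."""
    root = root.resolve()
    resolved = (root / path).resolve()
    if resolved != root and root not in resolved.parents:
        raise PermissionError(f"path escapes repo root: {path}")
    return resolved


def run_tool(tool_id: str, flags: Sequence[str] = ()) -> str:
    """Run an allowlisted tool with allowlisted flags; return truncated stdout."""
    spec = TOOLS.get(tool_id)
    if spec is None:
        raise KeyError(f"unknown tool: {tool_id}")
    rejected = [flag for flag in flags if flag not in spec["allowed_flags"]]
    if rejected:
        raise ValueError(f"flags not allowed: {rejected}")
    # Arguments are passed as a list and no shell is involved, so nothing is
    # interpolated into a command string. A timeout raises subprocess.TimeoutExpired.
    proc = subprocess.run(
        spec["cmd"] + list(flags),
        cwd=REPO_ROOT,
        capture_output=True,
        text=True,
        timeout=TIMEOUT_S,
    )
    return proc.stdout[:MAX_OUTPUT]


print(run_tool("echo", ["--json"]).strip())  # --json
```

Because the caller can only pick a tool id and a subset of pre-approved flags, the endpoint never accepts a free-form command line.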
Current repository direction:
- canonical FastAPI backend under app/adapters/api/main.py
- canonical API contract under /api/v1
- separate Next.js frontend under architect_frontend/
- async worker for background processing
- Dockerized multi-service deployment
- schema-driven input contracts
- observability hooks for structured logging and tracing
- Backend: app/
- Frontend: architect_frontend/
- Grammars: gf/
- Schemas: schemas/
- Operational tools: tools/
- Build orchestration: builder/
- AI helpers: ai_services/
- Docs: docs/
- Repository: README.md, docs/, and the Docker files in this repo are the best current source of truth
- Setup guide: docs/00-SETUP_AND_DEPLOYMENT.md
- Tools inventory: docs/17-TOOLS_AND_TESTS_INVENTORY.md
- API/UI unification notes: docs/APIUI Unification Update.txt