Semantik Architect

Industrial-grade NLG system for Abstract Wikipedia, Wikifunctions, and standalone API use.

Semantik Architect is a data-driven Natural Language Generation (NLG) system built around a modular Python backend, GF-based grammar assets, language-specific schema/config data, and a separate Next.js frontend.

Instead of maintaining one monolithic renderer per language, the project combines:

A modular Python backend for API, orchestration, persistence, and generation
GF grammars and PGF artifacts for high-precision generation
Family-oriented engines and morphology modules for rule-based text realization
Frame schemas and structured payloads for language-agnostic input
A background worker for long-running build and onboarding tasks
A separate frontend for tools, dashboards, and operator workflows

The result is a practical architecture for rule-based NLG that can run as a standalone service while staying aligned with the broader Abstract Wikipedia workflow.

Architecture Overview

Semantik Architect is best understood as a modular monolith on the Python side, with a separate frontend application.

SemantiK_Architect/
├── app/                       # Python backend
│   ├── core/                 # Domain logic, constructions, ports, use cases
│   ├── adapters/             # API, worker, engines, persistence, messaging
│   └── shared/               # Config, DI container, observability, utilities
│
├── architect_frontend/       # Next.js frontend
├── gf/                       # GF source grammars and compiled artifacts
├── schemas/                  # JSON schemas for frame payloads
├── tools/                    # Diagnostics, audits, QA, indexing, health tools
├── builder/                  # Grammar/build orchestration
├── ai_services/              # Optional AI-assisted services
└── docs/                     # Architecture and operational documentation

Backend responsibilities

app/core/ contains the domain model, semantic constructions, ports, and use cases
app/adapters/ contains FastAPI routes, generation engines, repositories, and infrastructure adapters
app/shared/ contains configuration, dependency injection, logging, and observability helpers

Frontend responsibilities

architect_frontend/ is a separate Node/Next.js application
It talks to the backend through the canonical API under /api/v1
In local development it usually runs on :3000, while the backend runs on :8000

Runtime topology

In containerized mode, the stack is split into:

Redis
API backend
ARQ worker
Next.js frontend
Nginx reverse proxy

The reverse proxy exposes the app under:

UI: /semantik_architect/
API: /semantik_architect/api/v1

Core Concepts

1. Semantic Frames

Semantic frames are the language-agnostic inputs to the generator. They describe what should be said before any language-specific realization happens.

Examples include:

biographical facts
entity descriptions
relational/classification payloads
event payloads

Example:

{
  "frame_type": "bio",
  "subject": { "name": "Marie Curie", "qid": "Q7186" },
  "properties": {
    "profession": "physicist",
    "nationality": "polish"
  }
}
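The frame above maps naturally onto a small typed structure. The following is a minimal Python sketch that assumes only the three top-level keys shown. `Subject`, `Frame`, and `parse_frame` are hypothetical names for illustration and are not the project's actual domain classes in app/core/.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Subject:
    name: str
    qid: Optional[str] = None  # Wikidata item ID such as "Q7186"; optional here


@dataclass(frozen=True)
class Frame:
    frame_type: str  # e.g. "bio"; selects which constructions apply
    subject: Subject
    properties: Dict[str, Any] = field(default_factory=dict)


def parse_frame(raw: str) -> Frame:
    """Hypothetical helper: turn a JSON frame payload into typed objects.

    Unknown keys inside "subject" raise TypeError, which surfaces
    malformed payloads early instead of silently ignoring them.
    """
    data = json.loads(raw)
    return Frame(
        frame_type=data["frame_type"],
        subject=Subject(**data["subject"]),
        properties=dict(data.get("properties", {})),
    )


payload = """{
  "frame_type": "bio",
  "subject": { "name": "Marie Curie", "qid": "Q7186" },
  "properties": { "profession": "physicist", "nationality": "polish" }
}"""
frame = parse_frame(payload)
print(frame.subject.name, frame.properties["profession"])  # Marie Curie physicist
```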

2. Constructions

The backend contains reusable constructions for common meaning patterns, such as:

copular classification
transitive events
passive events
topic-comment structures
relative clauses
possession and existential forms

These constructions let the system stay semantic-first: a meaning pattern is modeled once and realized per language, rather than each language maintaining its own generation script.
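As an illustration of the semantic-first idea, the sketch below models copular classification ("X is a Y") as a language-neutral record and keeps every English-specific decision in a separate realizer. `CopulaClassification` and `realize_eng` are hypothetical; the project's real constructions live in app/core/ and may be shaped quite differently, and the article rule here is deliberately naive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CopulaClassification:
    """Language-neutral meaning: SUBJECT belongs to CLASS (optionally with a NATIONALITY)."""
    subject: str
    class_noun: str
    nationality: Optional[str] = None


def realize_eng(c: CopulaClassification) -> str:
    """Toy English realizer: copula, article, word order, capitalization."""
    words = []
    if c.nationality:
        # English capitalizes nationality adjectives: "polish" -> "Polish".
        words.append(c.nationality.capitalize())
    words.append(c.class_noun)
    noun_phrase = " ".join(words)
    # Naive a/an choice by first letter; real morphology needs phonology ("an hour").
    article = "an" if noun_phrase[0].lower() in "aeiou" else "a"
    return f"{c.subject} is {article} {noun_phrase}."


print(realize_eng(CopulaClassification("Marie Curie", "physicist", "polish")))
# Marie Curie is a Polish physicist.
```

A realizer for another language would consume the same `CopulaClassification` value but make its own choices about agreement, article use, and word order.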

3. Generation Engines

Semantik Architect supports multiple realization strategies:

GF-backed generation for higher-precision, grammar-driven output
Python family/morphology engines for rule-based realization paths and fallback strategies
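The README does not spell out how the two strategies are combined at runtime. One plausible arrangement, shown here as a minimal sketch, tries engines in priority order and falls back when one cannot handle the request. `Engine`, `EngineUnavailable`, `generate_with_fallback`, and the stub engines are hypothetical names.

```python
from typing import List, Protocol, Sequence, Tuple


class EngineUnavailable(Exception):
    """Raised by an engine that cannot realize a given frame/language."""


class Engine(Protocol):
    name: str

    def realize(self, frame: dict, lang: str) -> str: ...


def generate_with_fallback(
    frame: dict, lang: str, engines: Sequence[Engine]
) -> Tuple[str, str]:
    """Return (engine_name, text) from the first engine that succeeds."""
    errors: List[str] = []
    for engine in engines:
        try:
            return engine.name, engine.realize(frame, lang)
        except EngineUnavailable as exc:
            errors.append(f"{engine.name}: {exc}")  # remember why, then fall back
    raise EngineUnavailable("; ".join(errors) or "no engines configured")


class StubGF:
    """Stand-in for a GF-backed engine that only knows some languages."""
    name = "gf"

    def __init__(self, languages: set) -> None:
        self.languages = languages

    def realize(self, frame: dict, lang: str) -> str:
        if lang not in self.languages:
            raise EngineUnavailable(f"no concrete grammar for {lang}")
        return f"[gf:{lang}] {frame['subject']['name']}"


class StubRules:
    """Stand-in for a Python family/morphology engine."""
    name = "rules"

    def realize(self, frame: dict, lang: str) -> str:
        return f"[rules:{lang}] {frame['subject']['name']}"


frame = {"frame_type": "bio", "subject": {"name": "Marie Curie"}}
engines = [StubGF({"eng"}), StubRules()]
print(generate_with_fallback(frame, "eng", engines))  # ('gf', '[gf:eng] Marie Curie')
print(generate_with_fallback(frame, "zul", engines))  # ('rules', '[rules:zul] Marie Curie')
```

Ordering the list puts the higher-precision engine first, and collecting per-engine errors makes it visible in logs why a fallback happened.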

4. Schemas and Configuration

The repository includes a dedicated schemas/ directory with JSON Schemas for structured frame payloads. These schemas are separate from the Python packaging layer and act as contracts for incoming content and tooling.
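To show what "schema as contract" means in practice, here is a deliberately tiny, stdlib-only checker for a simplified bio-frame contract. It supports only `required`, `properties`, and two `type` values, so it is an illustration rather than a JSON Schema implementation; for the real files in schemas/ a full validator such as the `jsonschema` package would be the natural choice. `BIO_FRAME_CONTRACT` and `check` are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical, heavily simplified contract; the real schemas in schemas/ are richer.
BIO_FRAME_CONTRACT: Dict[str, Any] = {
    "required": ["frame_type", "subject"],
    "properties": {
        "frame_type": {"type": "string"},
        "subject": {"type": "object", "required": ["name"]},
        "properties": {"type": "object"},
    },
}

_TYPES = {"string": str, "object": dict}


def check(payload: Dict[str, Any], schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return a list of violations; an empty list means the payload passes."""
    errors: List[str] = []
    for key in schema.get("required", []):
        if key not in payload:
            errors.append(f"{path}: missing required key '{key}'")
    for key, sub in schema.get("properties", {}).items():
        if key not in payload:
            continue  # absence is handled by "required" above
        value = payload[key]
        expected = _TYPES.get(sub.get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"{path}.{key}: expected {sub['type']}")
            continue
        if isinstance(value, dict):
            errors.extend(check(value, sub, f"{path}.{key}"))  # recurse into objects
    return errors


print(check({"frame_type": "bio", "subject": {}}, BIO_FRAME_CONTRACT))
# ["$.subject: missing required key 'name'"]
```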

5. Background Work

Long-running operations such as onboarding or building language resources are handled asynchronously through the worker stack rather than blocking the API process.
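The pattern is: the API records a job and returns immediately, and a separate worker picks it up. The sketch below shows that shape with an in-process `asyncio.Queue` standing in for Redis and ARQ, so it runs without any services; `JobQueue` and `build_language` are hypothetical. With ARQ itself, the enqueue side would typically call `enqueue_job` on a pool obtained from `arq.create_pool`, and the work would run in the separate worker process.

```python
import asyncio
import uuid
from typing import Dict


async def build_language(code: str) -> str:
    """Hypothetical long-running job (e.g. a grammar build)."""
    await asyncio.sleep(0.01)  # placeholder for slow work
    return f"built {code}"


class JobQueue:
    """In-process stand-in for a Redis-backed job queue."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self.status: Dict[str, str] = {}
        self.results: Dict[str, str] = {}

    def enqueue(self, code: str) -> str:
        """API side: record the job and return its id without waiting."""
        job_id = uuid.uuid4().hex
        self.status[job_id] = "queued"
        self._queue.put_nowait((job_id, code))
        return job_id

    async def worker(self) -> None:
        """Worker side: process jobs forever (a separate process in the real stack)."""
        while True:
            job_id, code = await self._queue.get()
            self.status[job_id] = "running"
            try:
                self.results[job_id] = await build_language(code)
                self.status[job_id] = "complete"
            except Exception as exc:  # record failures instead of killing the worker
                self.results[job_id] = repr(exc)
                self.status[job_id] = "failed"
            finally:
                self._queue.task_done()

    async def drain(self) -> None:
        """Wait until every enqueued job has been processed."""
        await self._queue.join()


async def main() -> None:
    jobs = JobQueue()  # created inside the running event loop
    worker = asyncio.create_task(jobs.worker())
    job_id = jobs.enqueue("zul")
    print(jobs.status[job_id])  # queued
    await jobs.drain()
    print(jobs.status[job_id], jobs.results[job_id])  # complete built zul
    worker.cancel()


if __name__ == "__main__":
    asyncio.run(main())
```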

Quick Start (Docker)

The easiest way to run the full stack is with Docker Compose.

Start everything

docker compose up --build

Main endpoints

Reverse-proxied UI: http://localhost:4000/semantik_architect/
Backend docs: http://localhost:8000/docs
Direct frontend container port: http://localhost:3000
Redis: localhost:6379

Health check

curl http://localhost:8000/api/v1/health/ready

You can also use the direct health route outside the API prefix:

curl http://localhost:8000/health/ready
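For scripts and CI jobs that need to wait for the stack to come up, a small poller avoids guessing at sleep durations. This stdlib-only sketch assumes the readiness route returns HTTP 200 once the service is ready; that status-code behavior is an assumption, not something this README specifies. `wait_until_ready` is a hypothetical helper.

```python
import http.client
import time
import urllib.request


def wait_until_ready(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout` seconds pass.

    Assumes (hypothetically) that the readiness route returns 200 when ready.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError and HTTPError (connection refused, 503, ...) subclass OSError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage once `docker compose up` is running:
# wait_until_ready("http://localhost:8000/api/v1/health/ready")
```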

API Usage

The canonical backend contract lives under /api/v1.

Generate text

Path-style language selection:

curl -X POST http://localhost:8000/api/v1/generate/eng \
  -H "x-api-key: secret" \
  -H "Content-Type: application/json" \
  -d '{
    "frame_type": "bio",
    "subject": {"name": "Marie Curie"},
    "properties": {"profession": "physicist", "nationality": "polish"}
  }'
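The same call from Python, using only the standard library. This is a sketch: it assumes the response body is JSON, whose exact shape this README does not document, and `build_generate_request`/`generate` are hypothetical helper names. Separating request construction from sending keeps the request testable without a running server.

```python
import json
import urllib.request


def build_generate_request(
    base_url: str, lang: str, frame: dict, api_key: str
) -> urllib.request.Request:
    """Build the same POST as the curl example (language in the URL path)."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/generate/{lang}",
        data=json.dumps(frame).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, lang: str, frame: dict, api_key: str) -> dict:
    """Send the request; assumes (hypothetically) a JSON response body."""
    req = build_generate_request(base_url, lang, frame, api_key)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Usage against a local stack ("secret" is the placeholder key from the curl example):
# frame = {"frame_type": "bio", "subject": {"name": "Marie Curie"},
#          "properties": {"profession": "physicist", "nationality": "polish"}}
# print(generate("http://localhost:8000", "eng", frame, "secret"))
```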

There is also a payload-driven generation route for clients that send language inside the request body.

Supported languages

curl http://localhost:8000/api/v1/languages

Onboard or manage languages

Administrative language lifecycle operations live under the management layer and are intended to be protected.

Example:

curl -X POST http://localhost:8000/api/v1/languages/ \
  -H "x-api-key: secret" \
  -H "Content-Type: application/json" \
  -d '{"code": "zul", "name": "Zulu", "family": "Bantu"}'

Local Development

Recommended mental model

Keep local development simple:

one Python environment at the repo root
one Node environment in architect_frontend/

Do not create a separate Python virtual environment for every subdirectory unless you intentionally refactor the repo into multiple installable Python packages.

Backend setup

Run the backend in WSL/Linux or Docker. The GF runtime and related dependencies are not a good fit for native Windows execution.

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Frontend setup

cd architect_frontend
npm install
npm run dev

Run services manually

Backend:

source .venv/bin/activate
uvicorn app.adapters.api.main:create_app --factory --host 0.0.0.0 --port 8000 --reload

Worker:

source .venv/bin/activate
arq app.workers.worker.WorkerSettings --watch app

Frontend:

cd architect_frontend
npm run dev

Unified orchestration

For lifecycle operations such as build, doctor, align, and service startup, manage.py is the canonical orchestrator.

Examples:

python manage.py doctor
python manage.py align --force
python manage.py build --langs en fr
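If you need to add a lifecycle command, it helps to see how the invocations above decompose into subcommands and flags. The sketch below is a hypothetical argparse layout that accepts exactly those three examples; the real manage.py may be organized differently, and the handlers here only echo their arguments.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="manage.py")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("doctor")

    align = sub.add_parser("align")
    align.add_argument("--force", action="store_true")

    build = sub.add_parser("build")
    # nargs="+" accepts space-separated codes, as in "--langs en fr".
    build.add_argument("--langs", nargs="+", default=[], metavar="LANG")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # Placeholder dispatch; a real orchestrator would invoke the build tooling here.
    if args.command == "doctor":
        print("doctor")
    elif args.command == "align":
        print(f"align force={args.force}")
    elif args.command == "build":
        print(f"build langs={args.langs}")
    return 0


main(["build", "--langs", "en", "fr"])  # prints: build langs=['en', 'fr']
```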

Testing

The test suite is organized into:

unit tests for core/domain behavior
integration tests for adapters and infrastructure-aware flows
e2e tests for API behavior

Typical commands:

pytest
pytest tests/unit
pytest tests/integration

Build and Grammar Layer

Semantik Architect includes a GF build/orchestration pipeline and a compiled PGF runtime artifact.

Important directories:

gf/ for grammar sources and compiled output
builder/ for build orchestration
tools/ for audits, inventory, diagnostics, and health checks

If you are working on the grammar layer, prefer the documented build flow through manage.py and the builder/orchestrator tooling rather than ad hoc commands.

Tools and Operator Workflows

The backend also exposes a protected Tools API used by the frontend dashboard.

This layer is designed around:

an allowlisted registry of tool commands
repo-root path confinement
argument allowlisting
output truncation and timeouts
optional AI-gated tools

This keeps operational tooling usable from the UI without turning the backend into a generic remote shell.
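A minimal sketch of those guardrails, assuming a Python backend that calls `subprocess.run`. The registry contents, `ToolSpec`, `run_tool`, and `confine` are hypothetical, and the AI-gating layer is left out. The key choices are that callers can only name a registered tool, never supply a command line, and that no string ever reaches a shell.

```python
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ToolSpec:
    argv: Tuple[str, ...]           # fixed command prefix, never user-supplied
    allowed_flags: FrozenSet[str]   # the only extra arguments a caller may pass
    timeout_s: float = 30.0
    max_output: int = 10_000        # characters kept from stdout + stderr


# Hypothetical registry; the real tool IDs and commands live in the backend.
REGISTRY = {
    "python_version": ToolSpec(argv=(sys.executable, "--version"), allowed_flags=frozenset()),
}


class ToolError(Exception):
    pass


def confine(repo_root: Path, user_path: str) -> Path:
    """Resolve a user-supplied path (following symlinks) and reject escapes."""
    root = repo_root.resolve()
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ToolError(f"path escapes repo root: {user_path}")
    return candidate


def run_tool(tool_id: str, args: List[str], repo_root: Path) -> str:
    spec = REGISTRY.get(tool_id)
    if spec is None:
        raise ToolError(f"unknown tool: {tool_id}")  # allowlisted registry only
    for arg in args:
        if arg not in spec.allowed_flags:
            raise ToolError(f"argument not allowed: {arg}")
    proc = subprocess.run(
        [*spec.argv, *args],
        cwd=repo_root,
        capture_output=True,
        text=True,
        timeout=spec.timeout_s,  # raises subprocess.TimeoutExpired on overrun
        shell=False,             # argument list, no shell interpretation
    )
    output = proc.stdout + proc.stderr
    if len(output) > spec.max_output:
        output = output[: spec.max_output] + "\n[truncated]"
    return output
```

A tool that accepts a path argument would pass it through `confine` before adding it to the argument list.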

Status

Current repository direction:

canonical FastAPI backend under app/adapters/api/main.py
canonical API contract under /api/v1
separate Next.js frontend under architect_frontend/
async worker for background processing
Dockerized multi-service deployment
schema-driven input contracts
observability hooks for structured logging and tracing

Repository Map

Backend: app/
Frontend: architect_frontend/
Grammars: gf/
Schemas: schemas/
Operational tools: tools/
Build orchestration: builder/
AI helpers: ai_services/
Docs: docs/

Links

Source of truth: README.md, docs/, and the Docker files in this repository
Setup guide: docs/00-SETUP_AND_DEPLOYMENT.md
Tools inventory: docs/17-TOOLS_AND_TESTS_INVENTORY.md
API/UI unification notes: docs/APIUI Unification Update.txt
