Merged
18 changes: 8 additions & 10 deletions .github/workflows/gh-pages.yml
@@ -25,27 +25,25 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
       - name: Setup Pages
-        uses: actions/configure-pages@v5
+        uses: actions/configure-pages@v6
       - name: Setup Node.js
-        uses: actions/setup-node@v4
+        uses: actions/setup-node@v6
         with:
-          node-version: '22'
+          node-version: '24'
           cache: 'npm'
           cache-dependency-path: website/package-lock.json
       - name: Install npm dependencies
         working-directory: website
         run: npm ci
       - name: Build
         working-directory: website
-        run: npx ng build --base-href /arabterm/
-      - name: Create 404.html file
-        run: cp website/dist/website/browser/index.html website/dist/website/browser/404.html
+        run: npm run build
       - name: Upload artifact
-        uses: actions/upload-pages-artifact@v3
+        uses: actions/upload-pages-artifact@v5
         with:
-          path: website/dist/website/browser/
+          path: website/dist/

   # Deployment job
   deploy:
@@ -57,4 +55,4 @@ jobs:
     steps:
       - name: Deploy to GitHub Pages
         id: deployment
-        uses: actions/deploy-pages@v4
+        uses: actions/deploy-pages@v5
14 changes: 12 additions & 2 deletions CLAUDE.md
@@ -6,6 +6,8 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 `arabterm` is a curated SQLite database of Arabic/English/French multilingual dictionaries (~400k terms across ~50 dictionaries). It is the upstream data source for [Wiki Term Base](https://wikitermbase.toolforge.org/). The Python package is a thin layer over SQLAlchemy models and migration scripts — there is no application, library API, or test suite. Most "work" in this repo is producing the SQL dumps in [db/](db/) from the canonical `arabterm.db`.

+A small Astro static site under [website/](website/) is also derived from `arabterm.db` at build time and deployed to GitHub Pages — see "Website" below.
+
 ## Common commands

```sh
@@ -44,8 +46,16 @@ The FULLTEXT index is the *reason* MariaDB exists in this project: SQLite has no

 [`.github/workflows/notify-wikitermbase.yml`](.github/workflows/notify-wikitermbase.yml) fires only when `db/mariadb/arabterm.sql.gz` changes on `main`. It uses the `WIKITERMBASE_DISPATCH_PAT` secret to dispatch an `arabterm-data-updated` repository_dispatch event to `forzagreen/wikitermbase`, which auto-opens a PR there. SQLite-only changes do *not* trigger the notification — if you intend to publish a data change, regenerate **both** dumps.

+### Website
+
+[website/](website/) is an [Astro](https://astro.build/) static site (deployed to <https://forzagreen.github.io/arabterm/>) that reads `arabterm.db` at build time via `better-sqlite3` and emits one HTML page per dictionary (paginated 1000 terms / page) plus a per-dict JSON download. It's a third derived view of the DB alongside the SQLite and MariaDB dumps — no JSON is committed.
+
+Legacy unprefixed URLs from the original Angular site (e.g. `/water_engineering/`) are preserved as static HTML redirects to the canonical `name_tech` URL (`/arabterm_water_engineering/`). The legacy slug list lives in `LEGACY_SLUGS` in [website/src/lib/db.ts](website/src/lib/db.ts) — never remove a legacy slug from this list, even if its underlying dictionary changes.
+
+[`.github/workflows/gh-pages.yml`](.github/workflows/gh-pages.yml) runs `npm run build` inside `website/` on every push to `main` and uploads `website/dist/` to GitHub Pages.
+
 ## Conventions

-- Adding a new dictionary: insert into the `dictionary` table with a unique `name_tech` slug, then bulk-insert terms with the matching `dictionary_id`. Run `make regenerate_dumps` before opening a PR so the MariaDB dump is in sync.
-- The repo contains large notebooks (`V2.ipynb`, `MigrateDB.ipynb`, etc.) and scratch directories (`playground/`, `data/`, `exports/`, `samples/`) used for historical scraping/ingestion. They are not part of the published pipeline — don't edit them as part of routine changes.
+- Adding a new dictionary: insert into the `dictionary` table with a unique `name_tech` slug, then bulk-insert terms with the matching `dictionary_id`. Run `make regenerate_dumps` before opening a PR so the MariaDB dump is in sync. The website will pick up the new dictionary automatically on the next deploy.
+- The repo contains large notebooks (`V2.ipynb`, `MigrateDB.ipynb`, etc.) and scratch directories (`playground/`, `samples/`) used for historical scraping/ingestion. They are not part of the published pipeline — don't edit them as part of routine changes.
 - Python 3.10+, SQLAlchemy 2.x style (`Mapped[...]`, `mapped_column`).
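
The pagination and legacy-redirect behaviour described in the CLAUDE.md changes can be sketched roughly as follows. This is an illustrative TypeScript sketch, not code from this PR: `paginate`, `canonicalPath`, `redirectHtml`, and `LEGACY_SLUGS_EXAMPLE` are hypothetical names, and the `arabterm_` prefix rule is only inferred from the single `/water_engineering/` example. The real slug list lives in `LEGACY_SLUGS` in `website/src/lib/db.ts`.

```typescript
// Illustrative sketch only: names and shapes are assumptions, not the PR's code.

/** Page size stated in CLAUDE.md: 1000 terms per page. */
const PAGE_SIZE = 1000;

/** Split a list into consecutive pages of at most `size` items. */
function paginate<T>(items: readonly T[], size: number = PAGE_SIZE): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const pages: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    pages.push(items.slice(start, start + size));
  }
  return pages;
}

/** Hypothetical stand-in for LEGACY_SLUGS (the real list is in website/src/lib/db.ts). */
const LEGACY_SLUGS_EXAMPLE: readonly string[] = ["water_engineering"];

/**
 * Canonical path for a legacy slug. Assumes the canonical `name_tech` slug is
 * the legacy slug prefixed with `arabterm_`, as in the CLAUDE.md example.
 */
function canonicalPath(legacySlug: string, base: string = "/"): string {
  return `${base}arabterm_${legacySlug}/`;
}

/** Minimal static HTML redirect page: meta refresh plus a canonical link. */
function redirectHtml(target: string): string {
  return [
    "<!doctype html>",
    '<meta charset="utf-8">',
    `<meta http-equiv="refresh" content="0; url=${target}">`,
    `<link rel="canonical" href="${target}">`,
    `<a href="${target}">Redirecting</a>`,
  ].join("\n");
}

/** One redirect page per legacy slug, keyed by the legacy path it replaces. */
function buildRedirects(slugs: readonly string[]): Map<string, string> {
  const out = new Map<string, string>();
  for (const slug of slugs) {
    out.set(`/${slug}/`, redirectHtml(canonicalPath(slug)));
  }
  return out;
}
```

Keeping the redirect pages as plain static HTML means they work on GitHub Pages, which serves files only and cannot issue server-side redirects.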
10 changes: 9 additions & 1 deletion README.md
@@ -7,7 +7,9 @@ The dictionaries are available as an [SQLite](https://www.sqlite.org/) database

 - [SQLite database](#sqlite-database)
 - [Dictionaries](#dictionaries)
-- [Other datasets](#other-datasets)
+- [Website](#website)
+- [Generating database dumps](#generating-database-dumps)
+- [History](#history)
 - [References](#references)


@@ -106,6 +108,12 @@ It contains 2 tables: `dictionary` and `term`.
 | معجم المصطلحات الطبية (ج.3، 1997) | | 3098 | [Q124465892](https://www.wikidata.org/wiki/Q124465892) |


+## Website
+
+A static reference site is published at [forzagreen.github.io/arabterm](https://forzagreen.github.io/arabterm/), with one page per dictionary. Live search is delegated to [Wikitermbase](https://wikitermbase.toolforge.org/); the site itself is purely a static index plus paginated term tables and per-dictionary JSON downloads.
+
+The site is built with [Astro](https://astro.build/) and reads `arabterm.db` directly at build time, so it stays in sync with the data on every push. See [website/README.md](website/README.md) for the build setup.
+
 ## Generating database dumps

 This is an internal development workflow for generating the SQL dumps in [db/](db/) for SQLite and MariaDB. This requires Docker and a python venv.
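
The per-dictionary JSON downloads mentioned in the README's Website section could be assembled along these lines once the rows have been read from `arabterm.db`. This is a hedged sketch, not the site's actual code: only the `dictionary` and `term` tables, the `name_tech` slug, and `dictionary_id` come from the text. The `id`, `arabic`, `english`, and `french` fields, and the function name `buildJsonDownloads`, are assumptions made for illustration.

```typescript
// Illustrative sketch only. Field names other than `name_tech` and
// `dictionary_id` are assumptions, not the actual arabterm schema.

interface DictionaryRow {
  id: number; // assumed primary key
  name_tech: string;
}

interface TermRow {
  dictionary_id: number;
  arabic: string; // assumed column
  english: string; // assumed column
  french: string; // assumed column
}

/** Build one JSON document per dictionary, keyed by its `name_tech` slug. */
function buildJsonDownloads(
  dictionaries: readonly DictionaryRow[],
  terms: readonly TermRow[],
): Map<string, string> {
  // Group terms by the dictionary they belong to in a single pass.
  const byDictionary = new Map<number, TermRow[]>();
  for (const term of terms) {
    const bucket = byDictionary.get(term.dictionary_id) ?? [];
    bucket.push(term);
    byDictionary.set(term.dictionary_id, bucket);
  }

  // Serialize each dictionary; one with no terms gets an empty list.
  const downloads = new Map<string, string>();
  for (const dict of dictionaries) {
    const rows = byDictionary.get(dict.id) ?? [];
    const payload = {
      name_tech: dict.name_tech,
      terms: rows.map(({ arabic, english, french }) => ({ arabic, english, french })),
    };
    downloads.set(dict.name_tech, JSON.stringify(payload));
  }
  return downloads;
}
```

Because the JSON is produced from the database at build time, nothing generated needs to be committed, which matches the "no JSON is committed" note in CLAUDE.md.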