Maintenance Report - ovos-localize

[2026-03-23] - Request a New Language Workflow

  • AI Model: Claude Sonnet 4.6
  • Actions Taken:
    • Added config/enabled_languages.txt — allowlist of explicitly enabled BCP-47 codes; empty by default, managed by automation.
    • Added _load_enabled_languages() in scripts/generate_data.py — reads the allowlist and seeds all_langs before scanning, so those codes appear in coverage.json/stats.json at 0 % when no locale files exist yet.
    • Added .github/ISSUE_TEMPLATE/new_language.yml — issue form for requesting a new language.
    • Added .github/workflows/enable_new_language.yml — two-job workflow: (1) on issue open, validates the BCP-47 code and opens a PR for maintainer review; (2) on PR merge into dev, triggers update_data.yml for a data refresh.
    • Updated index.html submitLangRequest() — BCP-47 code is now required, validated client-side, and embedded as a <!-- NEW_LANGUAGE_META ... --> block in the issue body so the workflow can parse it reliably.
  • Oversight: Human review required (PR must be approved before language is enabled).
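
The allowlist seeding described in this entry might look roughly like the sketch below. It is not the code in scripts/generate_data.py: the one-code-per-line file format, the `#` comment handling, and the `seed_coverage` helper are assumptions made for illustration.

```python
from pathlib import Path


def load_enabled_languages(path):
    """Return the set of language codes listed in the allowlist file.

    Assumed format: one BCP-47 code per line; blank lines and lines
    starting with '#' are ignored. A missing file yields an empty set.
    """
    p = Path(path)
    if not p.exists():
        return set()
    codes = set()
    for line in p.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            codes.add(line)
    return codes


def seed_coverage(enabled, scanned):
    """Merge allowlisted codes into scanned coverage (hypothetical helper).

    `scanned` maps language code -> coverage percentage found on disk.
    Allowlisted codes with no locale files are reported at 0.0; codes
    that were actually scanned keep their measured value.
    """
    coverage = {lang: 0.0 for lang in enabled}
    coverage.update(scanned)
    return coverage
```

Seeding before the scan results are merged means a newly enabled language shows up at 0% without overwriting the coverage of any language that already has locale files.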

[2026-03-20] - Four New Open Data Dataset Generators

  • AI Model: Claude Sonnet 4.6
  • Actions Taken:
    • Added ovos_localize/datasets/slot_filling.py — slot-filling / NER dataset: intent templates + slot names + known entity values from .entity files.
    • Added ovos_localize/datasets/response_pairs.py — intent→dialog response pairs derived from context.triggers_dialog (AST-extracted handler analysis); no string heuristics.
    • Added ovos_localize/datasets/tts_corpus.py — TTS training corpus from all .dialog files across all languages; template-expanded and deduplicated.
    • Added ovos_localize/datasets/skill_metadata.py — multilingual skill name/description/examples/tags from skill.json files.
    • Updated ovos_localize/datasets/__init__.py to export all six generators.
    • Rewrote scripts/generate_datasets.py to wire all generators; outputs to data/datasets/{slot_filling,response_pairs,tts,skill_metadata}/.
    • Added test/unittests/test_datasets.py — 28 unit tests covering all four generators.
    • Updated FAQ.md with dataset table and AST-pairing explanation.
  • Oversight: 168 unit tests passing; generator verified against live data/skills/ corpus.
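
The template expansion and deduplication mentioned for the TTS corpus can be illustrated with a minimal sketch. It assumes templates use the `(a|b)` alternation and `[optional]` syntax; the function names are hypothetical, and the whitespace collapsing is an added assumption rather than documented behaviour of tts_corpus.py.

```python
import re

# Matches an innermost "(a|b)" or "[optional]" group, i.e. one that
# contains no further parentheses or brackets.
_GROUP = re.compile(r"\(([^()\[\]]*)\)|\[([^()\[\]]*)\]")


def expand_template(template):
    """Expand (a|b) alternations and [optional] parts into all variants."""
    match = _GROUP.search(template)
    if match is None:
        return [template]
    if match.group(1) is not None:
        options = match.group(1).split("|")
    else:
        # An optional group may also be left out entirely.
        options = match.group(2).split("|") + [""]
    head, tail = template[:match.start()], template[match.end():]
    results = []
    for option in options:
        results.extend(expand_template(head + option + tail))
    return results


def expand_and_dedupe(templates):
    """Expand every template, collapse whitespace, drop duplicates in order."""
    seen = set()
    sentences = []
    for template in templates:
        for variant in expand_template(template):
            sentence = " ".join(variant.split())
            if sentence and sentence not in seen:
                seen.add(sentence)
                sentences.append(sentence)
    return sentences
```

For example, `expand_and_dedupe(["turn [the] light (on|off)"])` yields four sentences: the `on` and `off` variants, each with and without `the`.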

[2026-03-20] - Fix Entity Create Mode Crash + Improve UX

  • AI Model: Claude Sonnet 4.6
  • Actions Taken:
    • Fixed TypeError: Cannot read properties of undefined (reading 'type') crash in renderEditor() (index.html:1986) — replaced fileData.type with already-computed fileType variable; fileData is undefined in create mode.
    • In entity create mode, source panel now shows intent files that use {slotName} (derived from skill.files — no extra fetch). Panel header changes from "Source" to "Used in intents".
    • fileHelp message in create mode now names the slot and intent count for context.
    • Source language <select> hidden in create mode (no source langs exist).
    • Updated FAQ.md.
  • Oversight: 140 unit tests passing; JS syntax clean via node.

[2026-03-19] - Fix Frontend Onboarding Guard

  • AI Model: Claude Sonnet 4.6
  • Actions Taken:
    • Extended public pages list to include #/stats, #/entities, #/open-data so they render without a saved profile.
    • Removed permanent accent styling on Open Data nav link (index.html:96).
    • Updated FAQ.md.
  • Oversight: Verified via Chromium CDP — all three pages render without a profile.

[2026-03-19] - Dataset Cleanup After BCP-47 Normalization

  • AI Model: Claude Sonnet 4.6
  • Actions Taken:
    • Deleted stale dataset files using deprecated lang codes (eu-EU.jsonl, eu.jsonl, es-LM.jsonl and translation counterparts).
    • Added regenerated datasets with normalized codes (eu-ES.jsonl, es-419.jsonl).
    • Staged and committed all modified skill JSON, coverage, stats, repos, entities, and TSV files.
    • Updated FAQ.md to explain the file removal.
  • Oversight: 140 unit tests passing.
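
A cleanup of this kind could be driven by a small lookup from deprecated to normalized codes. The sketch below is hypothetical: the mapping is inferred from the file names listed in this entry, not taken from the project, and `find_stale_datasets` is an illustrative name.

```python
from pathlib import Path

# Assumed mapping, inferred from the deleted and regenerated file names
# in this entry; illustrative only.
DEPRECATED_TO_NORMALIZED = {
    "eu-EU": "eu-ES",
    "eu": "eu-ES",
    "es-LM": "es-419",
}


def find_stale_datasets(filenames, mapping=DEPRECATED_TO_NORMALIZED):
    """Return {stale_filename: replacement_filename} for deprecated codes.

    Only .jsonl files whose stem is a deprecated code are reported;
    files that already use a normalized code are left alone.
    """
    stale = {}
    for name in filenames:
        p = Path(name)
        if p.suffix == ".jsonl" and p.stem in mapping:
            stale[name] = str(p.with_name(mapping[p.stem] + ".jsonl"))
    return stale
```

Reporting the replacement name alongside each stale file makes it easy to confirm that a regenerated counterpart exists before anything is deleted.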

[2026-03-19] - Dependency Fixes & Test Validation

  • AI Model: Gemini 2.0 Flash
  • Actions Taken:
    • Added language_data>=1.1 to pyproject.toml to resolve ModuleNotFoundError in langcodes during name lookups.
    • Added PyYAML to pyproject.toml to support parsing of settingsmeta.yml files.
    • Synced local .venv using uv.
    • Verified all 139 unit tests pass with 90% coverage.
  • Oversight: Automated verification via pytest.

[2026-03-19] - Dataset Generator (Open Data)

  • AI Model: Gemini 2.0 Flash
  • Actions Taken:
    • Created ovos_localize.datasets package for generating ML datasets from parsed skills.
    • Implemented classification.py for NLU intent datasets.
    • Implemented translation.py for parallel-corpus machine translation datasets.
    • Created pipeline script scripts/generate_datasets.py to auto-generate JSONL files.
    • Updated .github/workflows/update_data.yml to run the dataset generation in CI.
    • Updated docs/index.md to document the Open Data datasets.
  • Oversight: Manual code review and local execution verified dataset generation success.

[2026-03-19] - Dataset Refactoring & File Splitting

  • AI Model: Gemini 2.0 Flash
  • Actions Taken:
    • Refactored generate_data.py and generate_datasets.py to enforce a 48MB limit per file.
    • Implemented chunked JSON loading for per-skill detail files (e.g., ovos-skill-days-in-history.json split into 2 chunks).
    • Updated index.html with a new fetchSkill helper to seamlessly handle multi-chunk skill data.
    • Updated ML dataset generators to expand all sentence templates ((a|b), [optional]) into unique utterances.
    • Implemented data cleaning for ML datasets: lowercase, remove extra whitespace, and deduplicate.
    • Refactored dataset.tsv generation to use template expansion and file splitting (output now exceeds 100MB in total and is split into 3 files).
    • Removed JSON indentation across all generated data to optimize file size.
  • Oversight: Verified file sizes are < 50MB and content is expanded/cleaned via local execution.
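
The per-file size limit and the unindented JSON output can be sketched together. This is a minimal illustration under stated assumptions, not the logic in generate_data.py: `chunk_records` is a hypothetical name, records are assumed to be JSON-serializable, and each chunk is assumed to be written as one compact JSON array.

```python
import json


def chunk_records(records, max_bytes):
    """Split records into chunks whose compact-JSON size stays within max_bytes.

    The size estimate is a slight overcount (one comma per record plus the
    enclosing brackets), so a chunk never exceeds the limit unless a single
    record is larger than max_bytes on its own; such a record gets a chunk
    to itself.
    """
    chunks, current, current_size = [], [], 2  # 2 bytes for the enclosing "[]"
    for record in records:
        # Compact separators drop the spaces that json.dumps adds by default.
        encoded = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
        size = len(encoded.encode("utf-8")) + 1  # +1 for the separating comma
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 2
        current.append(record)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

With the limit from this entry, a call might pass `max_bytes=48 * 1000 * 1000`; whether the project counts a megabyte as 10^6 or 2^20 bytes is not stated here, so that value is an assumption.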