This repository contains a comprehensive, open-source Kazakh-Russian-English linguistic database. The project focuses on etymological documentation, morphological analysis, and the implementation of various Latin transcription standards.
The primary goal of this initiative is to create a structured, machine-readable resource for the Kazakh language that:
- Documents word origins (etymology) and historical development.
- Maps morphological relationships between root words and their derivatives.
- Provides accurate phonetic transcriptions (IPA) and multiple Latin script implementations.
- Serves as a public domain (CC0) foundation for linguistic research, educational tools, and software development.
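The root-to-derivative mapping can be walked as a tree. The sketch below is hypothetical. It assumes entries have already been loaded from YAML into Python dicts. It also assumes that `parent_id` is `null` (`None`) for a root word, as in the sample entry further down, and otherwise holds the `id` of the entry the word derives from. The helper name `derivatives` and the ids of the two derived words are illustrative and do not come from the repository.

```python
from collections import defaultdict


def derivatives(entries, root_id):
    """Return (depth, word) pairs for root_id and every word derived from it.

    Hypothetical helper. Assumes each entry is a dict with "id", "word" and
    "parent_id" keys, parent_id is None for root words, and the parent links
    form a tree (no cycles).
    """
    words = {e["id"]: e["word"] for e in entries}

    # Invert the parent_id links: parent id -> list of child ids.
    children = defaultdict(list)
    for e in entries:
        if e.get("parent_id") is not None:
            children[e["parent_id"]].append(e["id"])

    # Iterative depth-first walk starting at the root word.
    result = []
    stack = [(root_id, 0)]
    while stack:
        node, depth = stack.pop()
        result.append((depth, words[node]))
        # Push children in reverse so they are visited in ascending id order.
        for child in sorted(children[node], reverse=True):
            stack.append((child, depth + 1))
    return result


# Illustrative data: ids 2 and 3 are made up for this example.
entries = [
    {"id": 1, "word": "кітап", "parent_id": None},
    {"id": 2, "word": "кітапхана", "parent_id": 1},
    {"id": 3, "word": "кітапшы", "parent_id": 1},
]
for depth, word in derivatives(entries, 1):
    print("  " * depth + word)
```

A depth-first walk keeps each derivative directly under the word it comes from. This matches how a word family is usually read.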
Data is stored in standardized YAML files, which are straightforward to parse programmatically and easy to read by hand. Each entry includes:
- Trilingual Definitions: Full support for Kazakh, Russian, and English.
- Phonetics: International Phonetic Alphabet (IPA) transcriptions.
- Transcription Systems: Support for the 2017 and 2021 official standards, as well as the AnmiTaliDev Latin proposal.
- Semantic Relations: Comprehensive mapping of synonyms and antonyms.
- Contextual Data: Real-world usage examples for every definition.
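Every definition carries Russian and English translations, so the data can also be searched in reverse: from a Russian or English word to the Kazakh headwords it translates. This is a minimal sketch. It assumes entries are loaded into dicts that use the field names of the sample entry (`definitions`, `translation_ru`, `translation_en`). `build_reverse_index` is a hypothetical helper and is not part of the repository.

```python
from collections import defaultdict


def build_reverse_index(entries, lang):
    """Map a lower-cased translation to the set of Kazakh headwords it translates.

    Hypothetical helper. lang is "ru" or "en"; the field names follow the
    sample entry (definitions -> translation_ru / translation_en).
    """
    field = f"translation_{lang}"
    index = defaultdict(set)
    for entry in entries:
        for definition in entry.get("definitions", []):
            translation = definition.get(field)
            if translation:
                index[translation.strip().lower()].add(entry["word"])
    return index


# The sample entry, reduced to the fields the index uses.
entries = [
    {
        "word": "кітап",
        "definitions": [{"translation_ru": "Книга", "translation_en": "Book"}],
    },
]
en_index = build_reverse_index(entries, "en")
print(sorted(en_index["book"]))  # ['кітап']
```

Lower-casing the keys lets "Book" and "book" resolve to the same headword. Storing sets means a translation shared by several headwords lists each one only once.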
Entries are organized hierarchically within the dictionary/ directory, sorted by the initial Cyrillic character of the headword.
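Sorting by initial letter needs the Kazakh alphabet order, not raw Unicode order. In lowercase, letters such as ә, ғ, қ, ң, ө, ұ, ү, һ, і and ё all have code points above я, so Python's default string comparison puts them at the end. This is a minimal sketch that assumes the conventional 42-letter Kazakh Cyrillic order. The helper names are hypothetical, and the repository's actual grouping and file layout may differ.

```python
# Kazakh Cyrillic alphabet (42 letters) in its conventional order.
KAZAKH_ALPHABET = "аәбвгғдеёжзийкқлмнңоөпрстуұүфхһцчшщъыіьэюя"
ORDER = {letter: i for i, letter in enumerate(KAZAKH_ALPHABET)}


def sort_key(word):
    """Collation key; characters outside the alphabet sort after all letters."""
    return [ORDER.get(ch, len(ORDER) + ord(ch)) for ch in word.lower()]


def group_by_initial(words):
    """Group non-empty headwords by first letter, in Kazakh alphabetical order."""
    groups = {}
    for word in sorted((w for w in words if w), key=sort_key):
        groups.setdefault(word[0].lower(), []).append(word)
    return groups


words = ["қала", "ана", "ірі", "кітап", "әке", "ұл"]
print(sorted(words))                  # ['ана', 'кітап', 'ірі', 'қала', 'ұл', 'әке']
print(sorted(words, key=sort_key))    # ['ана', 'әке', 'кітап', 'қала', 'ұл', 'ірі']
print(list(group_by_initial(words)))  # ['а', 'ә', 'к', 'қ', 'ұ', 'і']
```

Because the words are sorted before grouping, the dict preserves alphabetical order both for its keys and for the words inside each group.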
A sample entry:

```yaml
- id: 1
  word: "кітап"
  parent_id: null
  type: "noun"
  transcription: "kɪˈtɑp"
  writing_systems:
    latin_2017: "kitap"
    latin_2021: "kitap"
    latin_my: "kıtap"
  root_word: "кітап"
  etymology: "Arabic origin"
  history: "Derived from Arabic 'kitab' (کتاب)..."
  definitions:
    - meaning: "Басылып шыққан әдеби, ғылыми немесе оқу туындысы"
      translation_ru: "Книга"
      translation_en: "Book"
      examples:
        - kk: "Бұл кітап өте қызықты."
          ru: "Эта книга очень интересная."
          en: "This book is very interesting."
```

Contributions are governed by strict quality standards to maintain the integrity of the database.
- Review the technical specifications in CONTRIBUTING.md.
- For AI-assisted contributions, consult the mandatory instructions in AGENTS.md.
- Ensure all fields (including English translations and IPA) are completed.
- Submit changes via a Pull Request.
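The completeness requirement can be checked mechanically before opening a Pull Request. The sketch below is hypothetical. Its required-field lists are inferred from the sample entry above rather than taken from CONTRIBUTING.md, which remains the authoritative specification. `parent_id` is deliberately not required, because `null` is a valid value for root words.

```python
# Required fields inferred from the sample entry (an assumption; see
# CONTRIBUTING.md for the authoritative list). parent_id may be null.
REQUIRED_TOP = ["id", "word", "type", "transcription", "writing_systems",
                "root_word", "etymology", "history", "definitions"]
REQUIRED_SCRIPTS = ["latin_2017", "latin_2021", "latin_my"]
REQUIRED_DEFINITION = ["meaning", "translation_ru", "translation_en", "examples"]
REQUIRED_EXAMPLE = ["kk", "ru", "en"]


def is_blank(value):
    """Treat None, whitespace-only strings and empty lists/dicts as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, dict)):
        return not value
    return False


def missing_fields(entry):
    """Return dotted paths of required fields that are absent or blank."""
    problems = [f for f in REQUIRED_TOP if is_blank(entry.get(f))]
    scripts = entry.get("writing_systems") or {}
    problems += [f"writing_systems.{s}" for s in REQUIRED_SCRIPTS
                 if is_blank(scripts.get(s))]
    for i, definition in enumerate(entry.get("definitions") or []):
        problems += [f"definitions[{i}].{f}" for f in REQUIRED_DEFINITION
                     if is_blank(definition.get(f))]
        for j, example in enumerate(definition.get("examples") or []):
            problems += [f"definitions[{i}].examples[{j}].{f}"
                         for f in REQUIRED_EXAMPLE if is_blank(example.get(f))]
    return problems


# The sample entry as it would look after loading the YAML.
sample = {
    "id": 1, "word": "кітап", "parent_id": None, "type": "noun",
    "transcription": "kɪˈtɑp",
    "writing_systems": {"latin_2017": "kitap", "latin_2021": "kitap",
                        "latin_my": "kıtap"},
    "root_word": "кітап", "etymology": "Arabic origin",
    "history": "Derived from Arabic 'kitab' (کتاب)...",
    "definitions": [{
        "meaning": "Басылып шыққан әдеби, ғылыми немесе оқу туындысы",
        "translation_ru": "Книга", "translation_en": "Book",
        "examples": [{"kk": "Бұл кітап өте қызықты.",
                      "ru": "Эта книга очень интересная.",
                      "en": "This book is very interesting."}],
    }],
}
print(missing_fields(sample))  # []
```

Reporting dotted paths rather than a single pass/fail flag shows a contributor exactly which translation or example still needs filling in.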
Note: The LIST.md file is managed automatically via GitHub Actions. Manual modifications to this file will be overwritten.
This project is dedicated to the public domain under the CC0 1.0 Universal License.
- No copyright protection is claimed.
- You may copy, modify, distribute, and perform the work, even for commercial purposes, without asking permission.
- Attribution is not required but is appreciated for the continued growth of the project.
- Maintainer: AnmiTaliDev
- Communication: anmitali198@gmail.com
- Technical Issues: GitHub Issue Tracker
Қазақ тілі мәңгі жасай берсін! (May the Kazakh language live forever!)