add latin #1053

Open
beackers wants to merge 7 commits into sspanak:master from beackers:latin

@beackers

I saw TT9 has Greek in it, so since I'm a Latin nerd, I was like "hmm, let me see". I use TT9 enough for typing Latin assignments as well, so... :)

I did get Codex by OpenAI to write this, and its notes are in laWordlistReadme.txt. Let me know if something needs to get changed.

@sspanak
Owner

sspanak commented Feb 28, 2026

Thanks for contributing!

Just to clarify, the Greek dictionary is not Ancient Greek. It is modern Greek used nowadays in Greece.

Either way, I am OK with adding dead or fictional languages, just for fun. But, if possible for these languages, I would like to add big dictionaries that permit typing a lot of words in different contexts. For Latin, this should be easy: a lot of dictionaries exist today, and they can be easily incorporated in TT9. Unfortunately, I am quite busy now, so I cannot spend time on this myself. On the other hand, you seem to have a lot of free time, and if you are willing to make some more effort, I will include Latin in the app. 🙂

I suggest that we use this wordlist. It contains more than 1 million words, so it should produce a very nice typing experience. It may require some cleaning, e.g. removing the single letters and any words with corrupted or non-Latin letters, but it should mostly be fine.
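
The cleaning described above could be sketched roughly as follows. This is only an illustration, not the script used in this PR: the function name `clean_wordlist` and the choice of plain a-z as the allowed alphabet are assumptions (the word list is said to contain no macrons, so none are allowed here).

```python
import unicodedata

# Assumption: only plain a-z counts as a valid letter, since the source
# word list is reported to contain no macrons.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz")


def clean_wordlist(lines):
    """Return sorted unique words, dropping single letters and
    words that contain anything outside the allowed alphabet."""
    kept = set()
    for line in lines:
        word = unicodedata.normalize("NFC", line.strip())
        if len(word) < 2:
            continue  # empty lines and single letters
        if not set(word.lower()) <= ALLOWED:
            continue  # corrupted or non-Latin characters
        kept.add(word)
    return sorted(kept)


if __name__ == "__main__":
    sample = ["a", "amo", "amo", "rosa\n", "ro$a", "λόγος", ""]
    print(clean_wordlist(sample))
```

Using a set also removes exact duplicates as a side effect, so the same pass doubles as a duplicate check.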

As for the macrons, the word list from Winedt doesn't contain any. I would recommend actually installing and using the latin-macronizer tool you have found to make the dictionary nicer.

After that, I can build an APK for you to do some real-world testing, and if it feels alright, I'll merge it and publish it.

Go for it!

@sspanak sspanak self-requested a review February 28, 2026 11:11
Co-authored-by: Dimo Karaivanov <doftor.livain@gmail.com>
@beackers
Author

... wow I should have caught 90% of that. My apologies for submitting that.

I do indeed have quite a bit of free time, I'll get right to it 👍🏻

@beackers
Author

@sspanak
Okay, big fork in the road. I had a look at the Winedt Latin dictionary... and you're right, it doesn't contain macrons. Which, Chat is telling me a lot of Latin people don't type macrons, so I won't bother with that.

The bigger question is that the dict is also huge. 29.517 MB, and 1243950 words. For its size, it does have the Latin I study and a lot of it that I haven't...

It doesn't appear to have duplicates, I haven't looked the thing over completely, but it does look like it's not unnecessarily duplicated (for instance: -que is a suffix basically meaning "and <that word>"; que and *que appear once in the dictionary).

Do you want me to include the whole 29 MB or try to trim it down to what most classical Latin folks will actually use?

@sspanak
Owner

sspanak commented Mar 10, 2026

> it doesn't contain macrons. Which, Chat is telling me a lot of Latin people don't type macrons, so I won't bother with that.

From what I know, Romans were trying to find ways to mark long vowels. They felt the need to mark them, and I can see how adding macrons with modern-day fonts is useful, that's why I suggested using that tool to macronize the words. Also, from what I know, people who refuse to use macrons are hardcore fans who think they know everything. 😄

> The bigger question is that the dict is also huge. 29.517 MB, and 1243950 words. For its size, it does have the Latin I study and a lot of it that I haven't...
> Do you want me to include the whole 29 MB or try to trim it down to what most classical Latin folks will actually use?

TT9 can handle even 2 million words, don't worry about it. See, in T9 keyboards, you must have as many words as possible, otherwise, you have to compose them letter by letter, which is painfully slow. And even if you don't know some word, maybe someone else knows it and may want to type it. Also, you may want to quote or copy some text, even if you don't know all the words. You need to have them for a nicer typing experience.
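
To illustrate the point about dictionary size, here is a minimal sketch of a T9-style lookup. It assumes the standard phone keypad layout (2 = abc ... 9 = wxyz) and is not TT9's actual implementation; `to_digits` and `build_index` are hypothetical names.

```python
from collections import defaultdict

# Standard phone keypad layout; assumed here, not taken from TT9's code.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_digits(word):
    """Translate a word into the key sequence that types it."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(words):
    """Map each key sequence to every dictionary word it can produce."""
    index = defaultdict(list)
    for word in words:
        index[to_digits(word)].append(word)
    return dict(index)


if __name__ == "__main__":
    index = build_index(["amo", "ano", "rosa"])
    print(index.get("266"))    # candidates sharing one key sequence
    print(index.get("78352"))  # None: a missing word must be spelled out
```

A key sequence only yields suggestions for words that are in the index, which is why a word missing from the dictionary has to be composed letter by letter.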

So, bring it on! The more, the better.

> It doesn't appear to have duplicates, I haven't looked the thing over completely, but it does look like it's not unnecessarily duplicated

Yes, these dictionaries are quite good. They rarely contain misspelled words, garbage words and whatnot.

> -que is a suffix basically meaning "and <that word>"; que and *que appear once in the dictionary).

I am not quite sure what you mean by that. There are quite a few words that end with "-que". Maybe, I misunderstood you...

@beackers
Author

I think I've got the ducks in a row now.

Unfortunately I'll have to leave out macrons, at least for now. Turns out that the macronizers out there are tuned for making historical documents more readable, not dictionaries (even if I put in ~25000 words at a time), and they do such a great job at it based on contextual clues. With a dictionary, the only context for the nth word is the n-1 words behind it... and accordingly, all the tools I tried (especially on edge cases

@beackers beackers closed this Mar 11, 2026
@beackers beackers reopened this Mar 11, 2026
@beackers
Author

Whoops misclick.

especially on edge cases like malo and mālō, "with evil" and "I prefer") either macronized incorrectly or didn't at all. So for the sake of preserving accuracy (and not having to go through all 1.24m words by hand) I'll leave it out for now and keep my eyes out for a macronizer for dictionaries.

@beackers beackers requested a review from sspanak March 11, 2026 18:29
@sspanak
Owner

sspanak commented Mar 12, 2026

I see. I had a similar problem a couple of times already. The most notable case was Russian, where they have "е" and "ё", but most people use them inconsistently, instead of following the grammar rules. Naturally, this means most word lists found on the Internet contain misspelled words. So, I've found a tool that knew how to correct these two letters, and in the ambiguous cases, where two different words could be written with "е" or "ё" (just like "malo" and "mālō"), I kept both variants, because both are valid words. In my case, it was easy, because the tool had the option of producing one of the variants or both, and it didn't require context, it just processed according to the configuration. But, as I understand, this is not the case with the Latin macronizers.

So, in summary, if you can't make the macronizer tool(s) produce all valid word variants, ignoring the context, then we can go without macrons, I guess. They would have been valuable though. Dropping them is a bit sad.
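
The "keep both variants" idea could look something like the sketch below. It assumes a list that already contains macronized and plain spellings (which no tool in this thread produces yet) and only shows how variants such as "malo" and "mālō" would be grouped under one plain form; `strip_macrons` and `group_variants` are hypothetical helper names.

```python
import unicodedata
from collections import defaultdict

COMBINING_MACRON = "\u0304"


def strip_macrons(word):
    """Remove macrons by decomposing and dropping the combining mark."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(COMBINING_MACRON, ""))


def group_variants(words):
    """Group spellings by their macron-free form, keeping every distinct variant."""
    groups = defaultdict(list)
    for word in words:
        plain = strip_macrons(word)
        if word not in groups[plain]:
            groups[plain].append(word)
    return dict(groups)


if __name__ == "__main__":
    print(group_variants(["malo", "mālō", "rosa"]))
```

Since both spellings reduce to the same plain letters, listing both in the dictionary would let either be offered as a candidate, with no context needed.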
