added malayalam language #838

Draft
shyjun wants to merge 1 commit into sspanak:master from shyjun:shyjun/added_malayalam_language_dictionary

Conversation

@shyjun shyjun commented Jul 8, 2025

added malayalam language

@shyjun shyjun marked this pull request as draft July 8, 2025 05:19
@shyjun shyjun force-pushed the shyjun/added_malayalam_language_dictionary branch from dab02aa to fddd135 Compare July 8, 2025 16:14
Owner

@sspanak sspanak left a comment

I can't accept this pull request as-is. The resulting language will not be usable at all. See my previous comments.

Author

shyjun commented Jul 21, 2025

Yes, I understand that it can't be merged. That's why I kept it as a draft.

The Malayalam keyboard/word list I am referring to is like an extension of the English words.

Probably what I have to do is take an English word list, add Malayalam words to it, and publish it as a Manglish keyboard.

Owner

sspanak commented Jul 24, 2025

> Yes, I understand that it can't be merged. That's why I kept it as a draft.
>
> The Malayalam keyboard/word list I am referring to is like an extension of the English words.
>
> Probably what I have to do is take an English word list, add Malayalam words to it, and publish it as a Manglish keyboard.

It sounds like a good idea. You may choose British, American, or even Indian English as the word base, depending on what's more appropriate. Still, just 800 Malayalam words sounds like very little. Shouldn't there be at least tens of thousands of words? Looking at the Hinglish dictionary in TT9 for comparison, I can see that maybe half of the words are Hindi and half are English.

Perhaps you can make use of some word lists from the Indic Project? Or, if there is an official Malayalam dictionary, maybe use that instead? Of course, the words would need to be converted to Latin script, but I believe that shouldn't be a problem.
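
To illustrate what converting Malayalam-script words to Latin could involve, here is a minimal Python sketch. It is not part of TT9 or the Indic Project: the function name `to_latin`, the tiny character table, and the romanization choices (for example, writing the long vowel as "aa") are all assumptions made for the example. A real conversion would need a complete table and an agreed spelling convention.

```python
# Hypothetical, deliberately tiny mapping; a real table would cover the whole script.
CONSONANTS = {
    "\u0d15": "k",  # MALAYALAM LETTER KA
    "\u0d2e": "m",  # MALAYALAM LETTER MA
    "\u0d2f": "y",  # MALAYALAM LETTER YA
    "\u0d32": "l",  # MALAYALAM LETTER LA
    "\u0d33": "l",  # MALAYALAM LETTER LLA, flattened to a plain "l" (assumption)
}
VOWEL_SIGNS = {
    "\u0d3e": "aa",  # MALAYALAM VOWEL SIGN AA
    "\u0d3f": "i",   # MALAYALAM VOWEL SIGN I
    "\u0d41": "u",   # MALAYALAM VOWEL SIGN U
}
VIRAMA = "\u0d4d"    # suppresses the inherent vowel of the preceding consonant
ANUSVARA = "\u0d02"  # written as "m" in this sketch (assumption)


def to_latin(word: str) -> str:
    """Romanize a word made only of the characters in the tables above."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:
                # An explicit vowel sign replaces the inherent "a".
                out.append(VOWEL_SIGNS[nxt])
                i += 1
            elif nxt == VIRAMA:
                # Virama: bare consonant, no vowel at all.
                i += 1
            else:
                out.append("a")
        elif ch == ANUSVARA:
            out.append("m")
        else:
            raise ValueError(f"character not in the sketch table: {ch!r}")
        i += 1
    return "".join(out)
```

The main point is that each consonant carries an inherent "a" unless a vowel sign or a virama follows it, so a plain character-for-character substitution is not enough.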

Author

shyjun commented Jul 24, 2025

>> Yes, I understand that it can't be merged. That's why I kept it as a draft.
>> The Malayalam keyboard/word list I am referring to is like an extension of the English words.
>> Probably what I have to do is take an English word list, add Malayalam words to it, and publish it as a Manglish keyboard.
>
> It sounds like a good idea. You may choose British, American, or even Indian English as the word base, depending on what's more appropriate. Still, just 800 Malayalam words sounds like very little. Shouldn't there be at least tens of thousands of words? Looking at the Hinglish dictionary in TT9 for comparison, I can see that maybe half of the words are Hindi and half are English.
>
> Perhaps you can make use of some word lists from the Indic Project? Or, if there is an official Malayalam dictionary, maybe use that instead? Of course, the words would need to be converted to Latin script, but I believe that shouldn't be a problem.

Yes, there are thousands of words in Malayalam, but these are the words that I use regularly. The word list is maintained at https://github.com/shyjun/manglish_word_list/blob/main/words.txt

In fact, I created this word list using the export feature of TT9 and the TouchPal keyboard.

Going forward, whenever I type a Malayalam word that is not in the user word list, I will immediately add it to my user words. Occasionally (once in 6 months or so), I will export them to a CSV file and update https://github.com/shyjun/manglish_word_list/blob/main/words.txt with the new words. Probably once a year or so, I will create a PR with a dictionary update.

I hope you are OK with this approach.
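
The periodic update described above (export the user words to CSV, then fold the new ones into words.txt) could be scripted roughly as follows. This is only a sketch in Python. The helper name `merge_words` is hypothetical, and the export layout is an assumption: the code expects one word in the first column of each CSV row, which may not match what TT9 or TouchPal actually write.

```python
import csv
import io


def merge_words(existing_lines, exported_csv_text):
    """Return the sorted, de-duplicated union of a word list and exported words.

    existing_lines: iterable of lines from the current word list, one word per line.
    exported_csv_text: text of the exported CSV; the word is ASSUMED to be in column 0.
    """
    words = {line.strip() for line in existing_lines if line.strip()}
    for row in csv.reader(io.StringIO(exported_csv_text)):
        if row and row[0].strip():
            words.add(row[0].strip())
    return sorted(words)


# Example with made-up data: "amma" is already present, "chetta" is new.
current = ["amma", "veedu"]
export = "amma,5\nchetta,2\n"
print("\n".join(merge_words(current, export)))
```

Sorting the result keeps diffs between successive versions of the word list small, which makes the yearly pull request easier to review.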

@shyjun shyjun force-pushed the shyjun/added_malayalam_language_dictionary branch from fddd135 to d4f51f9 Compare July 26, 2025 10:27
Author

shyjun commented Jul 26, 2025

Thanks for the comments. As you suggested, I took words from the Indic Project (https://gitlab.com/indicproject/dictionaries), added my regularly used words, and updated the PR.

Please confirm whether this approach is OK. If so, I will make further changes and mark the PR as ready.

Thanks

Owner

sspanak commented Jul 31, 2025

> Yes, there are thousands of words in Malayalam, but these are the words that I use regularly

Please don't think of this keyboard as being for you only. My philosophy is that TT9 should be as convenient as possible for everyone. This is why it is important to include as big a dictionary as possible.

Anyway, I can't find many Latin-script Malayalam word lists, so I guess this is the best we can do. However, I don't see many English words in the list you included. Shouldn't there be some more, like in the Hinglish dictionary? Perhaps we should include English words from here for convenience? Although I am not sure whether it makes sense and, if it does, whether it is better to use the British or the American variant.

Comment on lines +1 to +3
locale: ml
dictionaryFile: ml-utf8.csv
abcString: xyz
Owner

@sspanak sspanak Jul 31, 2025

This is not truly Malayalam but rather a mixture of English and Malayalam. It should have a different locale and name, something like Hinglish, which is a mixture of Hindi and English. Based on the discussion below, "Malayalam" should be fine, as this is going to be Malayalam written in the Latin alphabet.

Either way, the locale must consist of four letters, not two. English is the only exception.

Let's do the following:

Suggested change
- locale: ml
- dictionaryFile: ml-utf8.csv
- abcString: xyz
+ locale: ml-GB
+ name: Malayalam
+ dictionaryFile: malayalam-latin-utf8.csv

Let's also rename the .yml file to MalayalamLatin.yml for consistency with Tamazight Latin.

Also, icons must be added.
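
The locale rule from this review (four letters rather than two, with English as the only exception) can be expressed as a small check. The Python sketch below is an illustration only, not code from TT9. The function name `is_valid_locale` is hypothetical, and it assumes that "four letters" means a two-letter language code plus a two-letter country code joined by a hyphen, as in the suggested `ml-GB`, and that the English exception is the bare locale `en`.

```python
import re

# Assumed shape: two lowercase letters, a hyphen, two uppercase letters (e.g. "ml-GB").
LOCALE_RE = re.compile(r"[a-z]{2}-[A-Z]{2}")


def is_valid_locale(locale: str) -> bool:
    """Check a language definition's locale against the rule described in the review."""
    if locale == "en":
        # Assumption: the bare "en" is the single exception mentioned in the review.
        return True
    return LOCALE_RE.fullmatch(locale) is not None


for candidate in ("ml", "ml-GB", "en"):
    print(candidate, is_valid_locale(candidate))
```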

@@ -0,0 +1,5 @@
Malayalam word list by: Shyju N
Owner

This file should also include a link to the Indic project.

Author

shyjun commented Jul 31, 2025

My plan is to create 2 dictionaries: the first for Malayalam, and the second for Manglish (Malayalam + English).

The word "Manglish" is commonly used to mean writing Malayalam in English letters.

The Malayalam language will contain only Malayalam words, and Manglish will contain Malayalam + English words.

Let me know if this approach is OK. If so, I will update the PR.

Owner

sspanak commented Aug 1, 2025

> My plan is to create 2 dictionaries: the first for Malayalam, and the second for Manglish (Malayalam + English).
>
> The word "Manglish" is commonly used to mean writing Malayalam in English letters.
>
> The Malayalam language will contain only Malayalam words, and Manglish will contain Malayalam + English words.
>
> Let me know if this approach is OK. If so, I will update the PR.

"Manglish" is also used for Malay (the language of Malaysia) + English. Since TT9 is used by people from all around the world, I would like to avoid confusion and not use "Manglish". Or perhaps, use "Manglish / IN" and "Manglish / MS" to distinguish them in the app?

Other than that, it sounds good!
