dab02aa to fddd135
sspanak
left a comment
I can't accept this pull request as-is. The resulting language will not be usable at all. See my previous comments.
|
Yes, I understand that it can't be merged. That's why I kept it as a Draft. The Malayalam keyboard/words I am referring to are like an extension to the English words. Probably what I have to do is take an English word list, add Malayalam words to it, and publish it as a Manglish keyboard. |
It sounds like a good idea. You may choose to use a British English, an American English, or even an Indian English word base, depending on what's more appropriate. Still, just 800 Malayalam words sounds like very little. Shouldn't there be at least tens of thousands of words? Compared to the Hinglish dictionary in TT9, I can see maybe half of the words are in Hindi and half are in English. Perhaps you can make use of some word lists from the Indic Project? Or, if there is an official Malayalam dictionary, maybe use that instead? Of course, the words would need to be converted to Latin, but it shouldn't be a problem, I believe. |
Yes, there are thousands of words in Malayalam, but these are the words that I use regularly. The word list is maintained at https://github.com/shyjun/manglish_word_list/blob/main/words.txt. In fact, I created this word list using the export feature of TT9 and the TouchPal keyboard. Going forward, whenever I type a Malayalam word that is not in the user words list, I will immediately add it to the user words. Occasionally (once every 6 months or so), I will export the list to a CSV file and update https://github.com/shyjun/manglish_word_list/blob/main/words.txt with the new words. Probably once a year or so, I will create a PR with a dictionary update. Hope you are OK with this approach. |
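For illustration, the periodic update described above could be sketched roughly as follows. This is only a sketch under assumptions: it treats both the existing list and the export as plain sequences of words (the actual TT9 export format is not shown in this thread), the function name `merge_word_lists` is hypothetical, and the case-insensitive de-duplication is a choice made for the example, not something TT9 requires.

```python
def merge_word_lists(existing, exported):
    """Append newly exported words to an existing list, skipping duplicates.

    Assumptions (not from TT9): each item is one word, and two words that
    differ only in letter case count as the same word.
    """
    seen = set()
    merged = []
    for word in list(existing) + list(exported):
        cleaned = word.strip()
        key = cleaned.lower()
        # Skip blank entries and anything already collected.
        if cleaned and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged


# Illustrative sample words only; not taken from the real words.txt.
current = ["namaskaram", "nanni"]
export = ["Nanni", " veedu ", ""]
updated = merge_word_lists(current, export)
```

The order of the existing list is preserved and new words are appended at the end, which keeps the diff of each dictionary update small.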
fddd135 to d4f51f9
|
Thanks for the comments. As you suggested, I took words from the Indic Project (https://gitlab.com/indicproject/dictionaries), added my regularly used words, and updated the PR. Please confirm whether this approach is OK. If so, I will make further changes and mark the PR as ready. Thanks |
Please don't think of this keyboard as being for you only. My philosophy is that TT9 should be as convenient as possible for everyone. This is why it is important to include a dictionary that is as big as possible. Anyway, I can't find many Latin Malayalam word lists, so I guess it's the best we can do. However, I don't see many English words in the list you included. Shouldn't there be some more, like in the Hinglish dictionary? Perhaps we should include English words from here for convenience? Although, I am not sure whether it makes sense and, if it does, whether it is better to use the British or the American variant. |
locale: ml
dictionaryFile: ml-utf8.csv
abcString: xyz
This is not truly Malayalam but rather a mixture of English and Malayalam. It should have a different locale and name, something like Hinglish, which is a mixture of Hindi and English. Based on the discussion below, "Malayalam" should be fine, as this is going to be Malayalam using the Latin alphabet.
Either way, the locale must consist of four letters, not two. English is the only exception.
Let's do the following:
- locale: ml
- dictionaryFile: ml-utf8.csv
- abcString: xyz
+ locale: ml-GB
+ name: Malayalam
+ dictionaryFile: malayalam-latin-utf8.csv
Let's also rename the .yml file to MalayalamLatin.yml for consistency with Tamazight Latin.
Also, icons must be added.
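As a rough illustration of the locale rule mentioned in this comment (four letters rather than two, with English as the only exception), a check could look like the sketch below. It assumes the `xx-YY` shape seen in the suggested `ml-GB` value; the function name `is_valid_locale` and the pattern are hypothetical and are not taken from the TT9 code base.

```python
import re

# Assumed shape: two lowercase language letters, a hyphen, two uppercase
# country letters, as in the suggested "ml-GB". This is an illustration,
# not TT9's actual validation logic.
LOCALE_PATTERN = re.compile(r"[a-z]{2}-[A-Z]{2}")


def is_valid_locale(locale: str) -> bool:
    """Return True for four-letter locales; plain "en" is the one exception."""
    if locale == "en":
        return True
    return LOCALE_PATTERN.fullmatch(locale) is not None
```

With this sketch, the original `ml` value would be rejected, while the suggested `ml-GB` would pass.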
@@ -0,0 +1,5 @@
Malayalam word list by: Shyju N
This file should also include a link to the Indic project.
|
My plan is to create two dictionaries. The first is for Malayalam, and the second is for Manglish (Malayalam + English). The word "Manglish" is commonly used to mean writing Malayalam in English letters. The Malayalam language will have only Malayalam words, and Manglish will have Malayalam + English words. Let me know if this approach is OK. If so, I will update the PR. |
"Manglish" is also used for Malay (the language of Malaysia) + English. Since TT9 is used by people from all around the world, I would like to avoid confusion and not use "Manglish". Or perhaps, use "Manglish / IN" and "Manglish / MS" to distinguish them in the app? Other than that, it sounds good! |
added malayalam language