Text Prediction

Dependencies

Files

The code consists of three files:

utils.py: Contains functions for cleaning and parsing text and for loading pickled models. It also holds a dictionary of nearby keys, used to handle likely mistypes and bad letter predictions.
train.py: Builds word-tuple models from the selected dataset and saves the file paths of the resulting models.
- To train the model:
  1. Load a corpus and split words into an array.
  2. Create a bigram (2-tuple) model of word-pair frequencies and a unigram (single-word) model of word frequencies.
  3. Save the models to a pickle file.
predict.py: Contains functions for next-word prediction and an interactive loop for testing predictions.
- To test a model:
  1. Run: python predict.py -m model_test3.pk1
  2. Follow the prompts in the terminal.
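
The training steps listed under train.py could be sketched roughly as follows. This is a minimal illustration, not the code in train.py: the function names tokenize, build_models and save_models, the regex tokenizer, and the dictionary layout of the pickle are all assumptions made for the example.

```python
import pickle
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed cleaning rule: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def build_models(words):
    # Unigram model: word -> frequency.
    unigrams = Counter(words)
    # Bigram model: previous word -> Counter of the words that follow it.
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1
    return unigrams, dict(bigrams)


def save_models(path, unigrams, bigrams):
    # Hypothetical pickle layout: a dict holding both models.
    with open(path, "wb") as f:
        pickle.dump({"unigrams": unigrams, "bigrams": bigrams}, f)


words = tokenize("The whale and the sea and the whale")
unigrams, bigrams = build_models(words)
```

Storing the bigrams as a mapping from the previous word to a counter of followers keeps the lookup for "what comes after this word" to a single dictionary access at prediction time.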

Corpora

text.txt: Project Gutenberg's Moby Dick; or The Whale

213 533 words
Published 1851
American
Open Source

tv_text.txt: Sample of the TV Corpus

21 000 000 words in linear text, a completely random sample of the full corpus.
British and American
Works from 1950-2018
Every 200 words, ten words are removed and replaced with ten "@" characters.
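
Because the "@" placeholders mark removed text, a bigram that spans a gap would pair two words that were never adjacent in the original. One way to avoid this, shown here as a sketch only, is to split the text at each run of placeholders and count bigrams within each segment. The function name split_segments and the assumption that the placeholders appear as whitespace-separated "@" characters are hypothetical; the actual format in tv_text.txt may differ.

```python
import re


def split_segments(text):
    # Split wherever one or more "@" placeholders occur in a row.
    parts = re.split(r"(?:@\s*)+", text)
    # Tokenize each piece; assumed rule: lowercase letters and apostrophes.
    segments = [re.findall(r"[a-z']+", part.lower()) for part in parts]
    # Drop pieces that contain no words at all.
    return [seg for seg in segments if seg]


sample = "We sail at dawn @ @ @ @ @ @ @ @ @ @ the whale rose"
segments = split_segments(sample)
```

Bigrams would then be counted per segment, so "dawn" and "the" in the sample are never treated as a pair.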

Model

Bigram with Markov Chains.

This model learns the frequencies of words and word pairs (bigrams) in a corpus, then autocompletes by offering the 3 most likely words that match the letters typed so far.

Markov chains reduce the set of possible next words by assuming that only the current (most recent) word is needed to predict the next one; this is the Markov property.
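
Written out, the Markov property and the usual count-based estimate for a bigram model look like this. The notation is introduced here for illustration: C(u, w) is the number of times word w follows word u in the corpus. The plain relative-frequency estimate is an assumption, since this README does not say whether any smoothing is applied.

```latex
% Markov (bigram) assumption: only the previous word matters.
P(w_n \mid w_1, \dots, w_{n-1}) \approx P(w_n \mid w_{n-1})

% Relative-frequency estimate from bigram counts.
\hat{P}(w_n \mid w_{n-1}) = \frac{C(w_{n-1}, w_n)}{\sum_{w} C(w_{n-1}, w)}
```

For ranking the top 3 candidates after a given previous word, the denominator is the same for every candidate, so sorting by the raw count C(w_{n-1}, w_n) gives the same order.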

A large and diverse corpus is assumed to asymptotically approximate everyday English.

Features

Autocomplete:

Autocompletes an incomplete word from the prefix characters typed so far.
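
A prefix-based autocomplete over the unigram model might look like the sketch below. The function name autocomplete, the parameter k, and the toy counts are made up for illustration and are not taken from predict.py or from either corpus.

```python
def autocomplete(prefix, unigrams, k=3):
    # Keep only words that start with the typed prefix.
    matches = [word for word in unigrams if word.startswith(prefix)]
    # Most frequent first; ties broken alphabetically for a stable order.
    matches.sort(key=lambda word: (-unigrams[word], word))
    return matches[:k]


# Toy counts, invented for this example.
toy_unigrams = {"he": 50, "her": 30, "help": 20, "hello": 10, "held": 5, "world": 8}

top_h = autocomplete("h", toy_unigrams)
top_hel = autocomplete("hel", toy_unigrams)
```

A linear scan over the vocabulary is the simplest option; a trie or a sorted word list would be a natural replacement if the lookup needs to be faster on a large vocabulary.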

Next Word Prediction

Autocompletes the next word given the previous word and the prefix of an incomplete word.

Example:

Incomplete Word	User Character Selection	Autocomplete Options
'h'	'e'	(1. he , 2. her , 3. help)
'he'	'l'	(1. he , 2. her , 3. help)
'hel'	'2'	(1. help, 2. hello, 3. held)
'hello '	'w'	( 1. alex , 2. bonjour , 3. hi )
'hello w'	'1'	( 1. world , 2. weekend , 3. windchill )
'hello world '	''	()
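
The behaviour in the table could be reproduced with a function along these lines. It is a sketch under stated assumptions: the name predict_next, the toy counts, and the fallback to unigram frequencies when the bigram model has too few matches are all illustrative choices, not a description of predict.py.

```python
def predict_next(prev_word, prefix, unigrams, bigrams, k=3):
    # Words seen after prev_word, filtered by the typed prefix.
    followers = bigrams.get(prev_word, {})
    ranked = sorted(
        (w for w in followers if w.startswith(prefix)),
        key=lambda w: (-followers[w], w),
    )
    if len(ranked) < k:
        # Assumed fallback: fill remaining slots from overall word frequency.
        extra = sorted(
            (w for w in unigrams if w.startswith(prefix) and w not in ranked),
            key=lambda w: (-unigrams[w], w),
        )
        ranked.extend(extra)
    return ranked[:k]


# Toy counts, invented for this example.
toy_unigrams = {"world": 8, "weekend": 6, "windchill": 1, "was": 40}
toy_bigrams = {
    "hello": {"alex": 9, "bonjour": 8, "hi": 7,
              "world": 5, "weekend": 3, "windchill": 1},
}

no_prefix = predict_next("hello", "", toy_unigrams, toy_bigrams)
with_w = predict_next("hello", "w", toy_unigrams, toy_bigrams)
unseen = predict_next("goodbye", "w", toy_unigrams, toy_bigrams)
```

With an empty prefix the function returns the most frequent followers of the previous word; once a letter is typed, the same ranking is restricted to followers that start with it.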
