Language models assign probabilities to sequences of words. They are widely used in many natural language processing applications.
The probability of a sequence can be modeled as a product of local probabilities:

P(w1, ..., wN) = P(w1 | h1) * P(w2 | h2) * ... * P(wN | hN),

where 'wi' is the i-th word and 'hi' = w1, ..., w(i-1) is the word history preceding 'wi'. The index 'i' runs from 1 to N, with h1 being the empty history.
Therefore the task of language modeling reduces to estimating a set of conditional distributions {P(w|h)}.
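As a minimal sketch of this decomposition, the snippet below estimates each conditional P(w|h) by counting on a toy corpus, truncating the history to the single previous word (a bigram model); real language models condition on longer histories, and the corpus and function names here are illustrative, not from the original text.

```python
from collections import defaultdict

# Toy corpus used to estimate conditional probabilities P(w | h).
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows each one-word history.
counts = defaultdict(lambda: defaultdict(int))
for prev, word in zip(corpus, corpus[1:]):
    counts[prev][word] += 1

def p(word, history):
    """Maximum-likelihood estimate of P(word | history)."""
    total = sum(counts[history].values())
    return counts[history][word] / total if total else 0.0

def sequence_probability(words):
    """Probability of a sequence as a product of local probabilities."""
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= p(word, prev)
    return prob

print(sequence_probability(["the", "cat", "sat"]))  # P(cat|the) * P(sat|cat)
```

Note that with longer histories the number of distinct (w, h) pairs explodes, which is exactly the normalization cost mentioned below for complex models.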
For complex models, this poses a computational challenge for learning, because the resulting objective functions are expensive to normalize.
Subsampling is a simple way to work around limited computing resources. For language modeling, it amounts to training the model on only part of the text corpus. For complex models, subsampling has been shown to speed up training greatly at the cost of some degradation in predictive performance, allowing a trade-off between computational cost and language model quality.
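A sketch of corpus-level subsampling, assuming a corpus represented as a list of sentences (the function name and fraction are illustrative choices, not from the original text):

```python
import random

def subsample_corpus(sentences, fraction=0.1, seed=0):
    """Keep a random fraction of the corpus for training.

    'fraction' controls the trade-off: smaller values train faster
    but typically yield a worse language model.
    """
    rng = random.Random(seed)
    return [s for s in sentences if rng.random() < fraction]

corpus = [f"sentence {i}" for i in range(10000)]
subset = subsample_corpus(corpus, fraction=0.1)
print(len(subset))  # roughly 1000 of the 10000 sentences
```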
Subsampling is also one of the techniques used when building word pairs: it removes the most frequently used words. Generally these words are:
- Prepositions (e.g. of, on, for)
- Articles (a, an, the)
Removing these frequent words:
- Reduces the training time for the model
- Reduces the bias of the model towards the training data
- May reduce the overall model accuracy
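One common way to implement this frequent-word subsampling is the heuristic from word2vec (Mikolov et al.): each word w is kept with probability sqrt(t / f(w)), where f(w) is the word's relative frequency and t is a small threshold (typically around 1e-5). A sketch, with names chosen for illustration:

```python
import math
import random
from collections import Counter

def keep_probability(freq, t=1e-5):
    """Probability of keeping a word under the word2vec subsampling
    heuristic: rare words (freq <= t) are always kept, frequent words
    are kept with probability sqrt(t / freq)."""
    return min(1.0, math.sqrt(t / freq))

def subsample_words(words, t=1e-5, seed=0):
    """Drop frequent words from a token list before building word pairs."""
    rng = random.Random(seed)
    total = len(words)
    freqs = {w: c / total for w, c in Counter(words).items()}
    return [w for w in words if rng.random() < keep_probability(freqs[w], t)]
```

With this rule, words like "the" or "of" (very high f(w)) are discarded most of the time, while content words survive, shrinking the training set and the number of word pairs dominated by function words.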