The tokenize method of the nltk library handles only part of our vocabulary. Some of the sample text documents are processed successfully, but the tokenizer seems to have trouble with other, less common vocabulary. The program can build a knowledge bank for one category of files, provided the phrases that cause the error are excluded. The attached error log was generated when we tried to process specific text files; the error does not occur in every case.

When we ran the same code on other files, it produced the expected result.
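
Since only some files fail, one way to narrow this down is to tokenize each file separately and keep the full traceback for the ones that raise. The following is only a rough sketch, not our actual code: `build_vocabulary` and `nltk_tokenize` are hypothetical helper names, and it assumes the files are UTF-8 text and that the "tokenize method" is `nltk.tokenize.word_tokenize`.

```python
import traceback
from collections import Counter
from pathlib import Path


def nltk_tokenize(text):
    # Assumption: the tokenizer in question is nltk.tokenize.word_tokenize,
    # which needs NLTK's Punkt tokenizer data to be downloaded.
    # Imported lazily so the rest of the sketch works with any tokenizer.
    from nltk.tokenize import word_tokenize
    return word_tokenize(text)


def build_vocabulary(paths, tokenize=nltk_tokenize):
    """Tokenize each file; return (token counts, {path: traceback text})."""
    vocabulary = Counter()
    failures = {}
    for path in paths:
        try:
            # Assumption: files are UTF-8; a different encoding raises here.
            text = Path(path).read_text(encoding="utf-8")
            vocabulary.update(tokenize(text))
        except Exception:
            # Record the error per file instead of aborting the whole run.
            failures[str(path)] = traceback.format_exc()
    return vocabulary, failures
```

Calling `build_vocabulary(list_of_paths)` and printing the `failures` dictionary would show exactly which files trigger the error and whether it comes from reading the file or from the tokenizer itself.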
