Problem with nltk.word_tokenize

`nltk.word_tokenize` processes some of our vocabulary but not all of it. Some of the sample text documents are processed successfully, but the tokenizer seems to have trouble with other, less common vocabulary. The program can build a knowledge bank for one category of files, provided the error-causing phrases are left out. The error log below was generated when we tried to process specific text files. Note that **this does not happen in every case**.
![image](https://cloud.githubusercontent.com/assets/14812246/12750096/e5eaac42-c9da-11e5-8020-bcd4a61ca5a4.png)
When we ran the same code on other files, the expected result was generated:
![image](https://cloud.githubusercontent.com/assets/14812246/12750185/809151a6-c9db-11e5-9958-471905ad7312.png)
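
To narrow down which phrases trigger the error, something like the following could be used. This is only a minimal sketch: the helper names `find_failing_lines` and `report` are hypothetical, the files are assumed to be UTF-8 text, and the exception type is not assumed, so every exception is caught and recorded.

```python
from typing import Callable, Iterable, List, Tuple


def find_failing_lines(
    lines: Iterable[str],
    tokenize: Callable[[str], List[str]],
) -> List[Tuple[int, str, str]]:
    """Return (line_number, line, error) for every line the tokenizer rejects."""
    failures = []
    for number, line in enumerate(lines, start=1):
        try:
            tokenize(line)
        except Exception as exc:  # broad on purpose: the error type is unknown
            failures.append((number, line, f"{type(exc).__name__}: {exc}"))
    return failures


def report(paths: Iterable[str]) -> None:
    """Print every failing line of each file (hypothetical helper)."""
    # Imported here so find_failing_lines can be used without NLTK installed.
    from nltk import word_tokenize

    for path in paths:
        # Assumption: files are UTF-8; undecodable bytes are replaced so that
        # only tokenizer failures are reported.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line, error in find_failing_lines(handle, word_tokenize):
                print(f"{path}:{number}: {error}")
                print(f"    {line.rstrip()}")
```

Calling `report` with the paths of the files that fail would list the offending lines together with the exception raised for each, which should make it easier to see what the problem phrases have in common.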

