include method to remove stopwords #163

Open
mynameisvinn wants to merge 1 commit into sloria:dev from mynameisvinn:dev
mynameisvinn commented May 14, 2017

Included a method to remove stopwords from text.

Here is an example using the blob text:

>>> blob = TextBlob(text)
>>> blob.remove_stopwords

This should produce the following output:

'\nThe titular threat of The Blob has always struck me as the ultimate movie\n
monster: an insatiably hungry, amoeba-like mass able to penetrate\n
virtually any safeguard, capable of--as a doomed doctor chillingly\n
describes it--"assimilating flesh on contact.\n
Snide comparisons to gelatin be damned, it\'s a concept with the most\n
devastating of potential consequences, not unlike the grey goo scenario\n
proposed by technological theorists fearful of\n
artificial intelligence run rampant.\n'

JiwaniZakir left a comment

The @cached_property decorator in blob.py is incompatible with the fname parameter on remove_stopwords. Cached properties are accessed as attributes (e.g., blob.remove_stopwords), not called as methods, so a caller can never actually pass fname and the default "stopwords.txt" is effectively hard-coded.

The relative path "stopwords.txt" will also raise a FileNotFoundError for any caller whose working directory isn't the project root. It should resolve to an absolute path using os.path.join(os.path.dirname(__file__), ...) or leverage NLTK's built-in nltk.corpus.stopwords, which TextBlob already depends on.

Additionally, the return type is a plain str rather than a TextBlob instance, which breaks method chaining and is inconsistent with the rest of the API (e.g., words returns a WordList).

Finally, if this stays a property, its name should be a noun (e.g., clean_text or filtered_words) to match the property convention used throughout the class.
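To make the points above concrete, here is a minimal sketch of one possible shape for the change. It is not the code from this PR and not TextBlob's API: `Blob` is a hypothetical stand-in class, `load_stopwords` and its `base_dir` parameter are illustrative names, and the one-word-per-line file format is an assumption. The sketch shows a regular method that accepts its stopwords as an argument, returns an instance of the same class so calls can be chained, and resolves the stopword file against an explicit directory rather than the current working directory.

```python
import os


def load_stopwords(fname="stopwords.txt", base_dir=None):
    """Read one stopword per line (assumed format) from base_dir/fname.

    When base_dir is omitted, the file is looked up next to this module,
    so the result does not depend on the caller's working directory.
    """
    if base_dir is None:
        base_dir = os.path.dirname(os.path.abspath(__file__))
    path = os.path.join(base_dir, fname)
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


class Blob:
    """Hypothetical stand-in for TextBlob, reduced to what the sketch needs."""

    def __init__(self, text):
        self.raw = text

    def remove_stopwords(self, stopwords):
        """Return a new Blob with the given stopwords dropped.

        A plain method (not a cached property) so the caller can pass
        arguments. Matching is case-insensitive on whitespace-separated
        tokens; punctuation handling is deliberately left out.
        """
        stop = {word.lower() for word in stopwords}
        kept = [tok for tok in self.raw.split() if tok.lower() not in stop]
        return type(self)(" ".join(kept))

    def __repr__(self):
        return "Blob({!r})".format(self.raw)
```

A caller would then write something like `Blob(text).remove_stopwords(load_stopwords())`, or pass any iterable of words directly. Whether the real method should tokenize with TextBlob's own tokenizer or take its list from nltk.corpus.stopwords is left open here.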
