anlp21

Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley).

Syllabus: http://people.ischool.berkeley.edu/~dbamman/info256.html

| Notebook | Description |
| --- | --- |
| 1.words/EvaluateTokenizationForSentiment | Evaluate the impact of tokenization choices on sentiment classification |
| 1.words/ExploreTokenization | Compare different methods for tokenizing texts (whitespace, NLTK, spaCy, regex) |
| 1.words/TokenizePrintedBooks | Design a better tokenizer for printed books |
| 1.words/Text_Complexity | Implement type-token ratio and Flesch-Kincaid Grade Level scores for text |
| 2.compare/ChiSquare, Mann-Whitney Tests | Explore two tests for finding distinctive terms |
| 2.compare/Log-odds ratio with priors | Implement the log-odds ratio with an informative (and uninformative) Dirichlet prior |
| 3.dictionaries/DictionaryTimeSeries | Plot sentiment over time using human-defined dictionaries |
| 3.dictionaries/Empath | Explore using Empath dictionaries to characterize texts |
| 4.embeddings/DistributionalSimilarity | Explore the distributional hypothesis to build high-dimensional, sparse representations for words |
| 4.embeddings/WordEmbeddings | Explore word embeddings using Gensim |
| 4.embeddings/Semaxis | Implement SemAxis for scoring terms along a user-defined axis (e.g., positive-negative, concrete-abstract, hot-cold) |
| 4.embeddings/BERT | Explore the basics of token representations in BERT and use them to find token nearest neighbors |
| 4.embeddings/SequenceEmbeddings | Use sequence embeddings to find TV episode summaries most similar to a short description |
| 5.eda/WordSenseClustering | Infer distinct word senses using KMeans clustering over BERT representations |
| 5.eda/Haiku KMeans | Explore text representation in clustering by trying to group haiku and non-haiku poems into two distinct clusters |
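
For a flavor of what the 1.words/Text_Complexity notebook covers, here is a minimal sketch of the two measures it names. This is not the notebook's own code. The function names, the regex tokenizer, and the vowel-group syllable counter are all assumptions made for illustration; only the standard Flesch-Kincaid Grade Level coefficients (0.39, 11.8, 15.59) are fixed by the formula itself.

```python
import re


def tokenize(text):
    """Lowercased word tokens; a simple regex stand-in for a real tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def count_syllables(word):
    """Rough heuristic (an assumption, not a dictionary lookup):
    count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = tokenize(text)
    if not sentences or not tokens:
        return 0.0
    syllables = sum(count_syllables(t) for t in tokens)
    return (0.39 * len(tokens) / len(sentences)
            + 11.8 * syllables / len(tokens)
            - 15.59)
```

Type-token ratio is sensitive to text length, so it is most meaningful when comparing samples with the same number of tokens. The syllable heuristic is approximate, so scores may differ slightly from tools that use a pronunciation dictionary.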