diff --git a/experiment/aim.md b/experiment/aim.md
index 063e045..5d4e4ae 100644
--- a/experiment/aim.md
+++ b/experiment/aim.md
@@ -1,12 +1,3 @@
-To understand and demonstrate the application of the Viterbi algorithm for Part-of-Speech (POS) tagging in Natural Language Processing. This experiment provides hands-on experience with the Viterbi decoding process, which is a fundamental dynamic programming algorithm used to find the most likely sequence of hidden states (POS tags) given observable sequences (words) in Hidden Markov Models.
+**To understand and practice sequence decoding for Part-of-Speech (POS) tagging using the Viterbi algorithm in Natural Language Processing.**
-The Viterbi algorithm is crucial in statistical NLP for solving the decoding problem: given a sequence of words and pre-computed emission and transition probabilities from a training corpus, determine the most probable sequence of POS tags that generated those words. This experiment allows learners to practice filling Viterbi tables step-by-step and understand how dynamic programming efficiently finds optimal tag sequences.
-
-For example, given the sentence "Book a park", the algorithm determines whether "Book" should be tagged as a noun or verb, considering both:
-
-- **Emission probabilities**: How likely each word is to be generated by each POS tag
-- **Transition probabilities**: How likely each POS tag is to follow another in sequence
-
-Through interactive simulation, learners will master the mathematical foundations of the Viterbi algorithm and its practical application in modern POS tagging systems.
-
-
+This experiment aims to help students develop proficiency in applying the Viterbi algorithm to find the most probable sequence of POS tags for a given sentence, using emission and transition probabilities. Through interactive exercises, learners will gain hands-on experience with dynamic programming and sequence labeling in NLP.
diff --git a/experiment/assignment.md b/experiment/assignment.md
index e12b0d4..281461b 100644
--- a/experiment/assignment.md
+++ b/experiment/assignment.md
@@ -14,21 +14,21 @@
**Emission Matrix P(word|tag):**
-```
+
        The   dog   runs
Noun    0.1   0.6   0.1
Verb    0.0   0.1   0.8
Det     0.9   0.0   0.0
-```
+
**Transition Matrix P(tag_j|tag_i):**
-```
+
        Noun  Verb  Det
Noun    0.3   0.4   0.1
Verb    0.4   0.1   0.2
Det     0.7   0.2   0.1
-```
+
Assume equal initial probabilities π[tag] = 1/3 for all tags.
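
As a way to check a hand-filled table, the matrices above can be run through a small Viterbi routine. The sketch below is illustrative only: the names `TAGS`, `PI`, `EMIT`, `TRANS`, and `viterbi` are assumptions introduced here, the transition rows are used exactly as given (they are not renormalized), and plain probabilities are used rather than log-space because the sentence is short.

```python
# Illustrative sketch: names and data layout are assumptions, not part of the assignment.
TAGS = ["Noun", "Verb", "Det"]

# Equal initial probabilities, as stated in the assignment.
PI = {tag: 1 / 3 for tag in TAGS}

# EMIT[tag][word] = P(word | tag)
EMIT = {
    "Noun": {"The": 0.1, "dog": 0.6, "runs": 0.1},
    "Verb": {"The": 0.0, "dog": 0.1, "runs": 0.8},
    "Det": {"The": 0.9, "dog": 0.0, "runs": 0.0},
}

# TRANS[prev_tag][next_tag] = P(next_tag | prev_tag)
TRANS = {
    "Noun": {"Noun": 0.3, "Verb": 0.4, "Det": 0.1},
    "Verb": {"Noun": 0.4, "Verb": 0.1, "Det": 0.2},
    "Det": {"Noun": 0.7, "Verb": 0.2, "Det": 0.1},
}


def viterbi(words):
    """Return (best tag path, probability of that path) for a list of words."""
    # First column: initial probability times emission probability.
    table = [{tag: PI[tag] * EMIT[tag][words[0]] for tag in TAGS}]
    back = []  # back[t][tag] = best previous tag for `tag` at word t + 1

    for word in words[1:]:
        prev = table[-1]
        column, pointers = {}, {}
        for tag in TAGS:
            # Pick the predecessor that maximizes previous score times transition.
            best_prev = max(TAGS, key=lambda p: prev[p] * TRANS[p][tag])
            column[tag] = prev[best_prev] * TRANS[best_prev][tag] * EMIT[tag][word]
            pointers[tag] = best_prev
        table.append(column)
        back.append(pointers)

    # Backtrack from the best final tag to recover the full path.
    last = max(TAGS, key=lambda t: table[-1][t])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, table[-1][last]


best_path, best_prob = viterbi(["The", "dog", "runs"])
print(best_path, best_prob)
```

With these numbers the best path should come out as Det, Noun, Verb, with probability 1/3 × 0.9 × 0.7 × 0.6 × 0.4 × 0.8 ≈ 0.0403, which can be compared cell by cell against a manually filled table.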
diff --git a/experiment/extended-study.md b/experiment/extended-study.md
index fbd93a7..42b27ae 100644
--- a/experiment/extended-study.md
+++ b/experiment/extended-study.md
@@ -1,490 +1,97 @@
-### Advanced Topics in Viterbi Decoding and Dynamic Programming
+### Advanced Topics in Viterbi Decoding
-### 1. Mathematical Foundations of Viterbi Algorithm
+#### 1. Sequence Decoding Techniques
-**Dynamic Programming Principles:**
+- **Viterbi Algorithm**: Study the dynamic programming approach for finding the most probable sequence of hidden states (POS tags) in Hidden Markov Models.
+- **Forward-Backward Algorithm**: Learn about parameter estimation and marginal probabilities in HMMs.
+- **Beam Search and Approximations**: Explore faster, memory-efficient alternatives to full Viterbi decoding.
-The Viterbi algorithm exemplifies dynamic programming with two key properties:
+#### 2. Applications Across Domains
-- **Optimal Substructure:** The optimal solution contains optimal solutions to subproblems
-- **Overlapping Subproblems:** The same subproblems are solved multiple times
+- Speech recognition and error correction
+- Bioinformatics (gene/protein sequence analysis)
+- Financial modeling and time series analysis
+- Named Entity Recognition and Information Extraction
-**Mathematical Formulation:**
+#### 3. Computational Implementation
-For a sequence of words w₁, w₂, ..., wₙ and tags t₁, t₂, ..., tₘ:
+- Efficient storage and computation for large tagsets
+- Log-space computation for numerical stability
+- Handling data sparsity and smoothing techniques
-V[i,j] = max(V[k,j-1] × P(tᵢ|tₖ)) × P(wⱼ|tᵢ)
-          k
-
+#### 4. Research Papers
-Where:
+1. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition" (Rabiner, 1989)
+2. "The Viterbi Algorithm" by G.D. Forney Jr. (1973)
+3. "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF" (Ma & Hovy)
-- V[i,j] = maximum probability of any tag sequence ending in tag i at position j
-- P(tᵢ|tₖ) = transition probability from tag k to tag i
-- P(wⱼ|tᵢ) = emission probability of word j given tag i
+#### 5. Online Resources
-**Complexity Analysis:**
+1. **Video Lectures**
-- **Time Complexity:** O(N × T²) where N = sentence length, T = number of tags
-- **Space Complexity:** O(N × T) for the Viterbi table
-- **Without Dynamic Programming:** O(T^N) - exponentially worse!
+   - Stanford CS224N: Sequence Models and HMMs
+   - NPTEL: Hidden Markov Models in NLP
+   - Coursera: Sequence Models in NLP
-### 2. Advanced Viterbi Implementations
+2. **Interactive Tools**
-**Numerical Stability:**
+   - Online HMM POS Taggers
+   - Viterbi algorithm visualizers
+   - Sequence labeling simulators
-Real implementations must handle extremely small probabilities:
+3. **Code Repositories**
+   - Open-source HMM and Viterbi implementations (Python, Java)
+   - Sequence labeling datasets
+   - Tutorials for building POS taggers
-**Log-Space Computation:**
+#### 6. Practical Exercises
-
-log V[i,j] = max(log V[k,j-1] + log P(tᵢ|tₖ)) + log P(wⱼ|tᵢ)
-              k
-
+1. **Basic Exercises**
-**Advantages of Log-Space:**
+   - Implement a simple Viterbi POS tagger
+   - Calculate emission and transition probabilities
+   - Visualize state transitions in Markov chains
-- Avoids numerical underflow
-- Converts multiplications to additions
-- More computationally stable
+2. **Advanced Projects**
-**Memory Optimization:**
+   - Build a domain-adapted POS tagger
+   - Compare Viterbi with neural sequence models
+   - Analyze tagging errors and confusion matrices
-- **Online Algorithm:** Only store previous column, not entire table
-- **Beam Search:** Keep only top-K paths instead of all paths
-- **Sparse Representations:** Skip impossible transitions
+3. **Research Projects**
+   - Study the impact of smoothing on tagging accuracy
+   - Explore multilingual POS tagging with HMMs
+   - Integrate morphological features into sequence models
-**Parallel Computation:**
+#### 7. Further Reading
-- Each cell in a column can be computed independently
-- GPU implementations can process thousands of words simultaneously
-- SIMD instructions optimize matrix operations
+##### Books
-### 3. Variants of the Viterbi Algorithm
+1. "Speech and Language Processing" by Jurafsky & Martin (Chapters on HMMs and Viterbi)
+2. "Pattern Recognition and Machine Learning" by Bishop (Sequence models section)
+3. "Foundations of Statistical Natural Language Processing" by Manning & Schütze
-**Forward-Backward Algorithm:**
+##### Journals
-Unlike Viterbi (which finds the single best path), Forward-Backward computes:
+1. Computational Linguistics
+2. Natural Language Engineering
+3. Journal of Machine Learning Research
-- **Forward:** Probability of observation sequence up to time t
-- **Backward:** Probability of observation sequence from time t+1 onwards
-- **Purpose:** Parameter estimation and computing marginal probabilities
+#### 8. Tools and Software
-**Viterbi vs. Forward-Backward:**
+1. **Analysis Tools**
-- Viterbi: "What's the best tag sequence?"
-- Forward-Backward: "What's the probability of each tag at each position?"
+   - NLTK HMM Tagger
+   - Stanford POS Tagger
+   - spaCy sequence labeling modules
-**Beam Search Approximation:**
+2. **Development Frameworks**
-- Keep only the top-B best paths at each step
-- Trades accuracy for speed and memory
-- Essential for very large tag sets or long sequences
+   - scikit-learn HMM modules
+   - CRF++ toolkit
+   - TensorFlow/Keras for neural sequence models
-**Constrained Viterbi:**
-
-- Add external constraints (e.g., named entity boundaries)
-- Force certain tags at specific positions
-- Useful for semi-supervised learning
-
-### 4. Viterbi in Other Domains
-
-**Speech Recognition:**
-
-- **Observation:** Acoustic features (MFCCs, spectrograms)
-- **Hidden States:** Phonemes or words
-- **Challenge:** Continuous observations require Gaussian mixture models
-
-**Bioinformatics Applications:**
-
-- **Gene Prediction:** Find protein-coding regions in DNA
-- **Sequence Alignment:** Align biological sequences optimally
-- **Hidden States:** Exon, intron, non-coding regions
-
-**Part-of-Speech vs. Gene Prediction:**
-
-
-POS:  [Noun] [Verb] [Det] [Noun]
-      "Cat"  "ate"  "the" "fish"
-
-Gene: [Exon] [Intron] [Exon] [Stop]
-      ATGC   GTAAGT   CGTT   TAG
-
-
-**Financial Modeling:**
-
-- **Hidden States:** Market regimes (bull, bear, volatile)
-- **Observations:** Price movements, trading volumes
-- **Applications:** Algorithmic trading, risk management
-
-### 5. Modern Alternatives to Viterbi
-
-**Neural Sequence Models:**
-
-**CRF (Conditional Random Fields):**
-
-- Discriminative models vs. HMM's generative approach
-- Can incorporate overlapping features
-- Still use Viterbi for inference!
-
-**LSTM-CRF Models:**
-
-- LSTM encodes sequence context
-- CRF layer ensures valid tag transitions
-- Viterbi decoding finds optimal path
-
-**Transformer Models:**
-
-- Self-attention mechanisms
-- Can process entire sequence simultaneously
-- Often use greedy decoding instead of Viterbi
-
-**When Viterbi Still Matters:**
-
-- Neural models often use Viterbi in final layer
-- Structured prediction requires path optimization
-- Interpretability and guaranteed optimality
-
-**When to Use HMMs:**
-
-- Limited computational resources
-- Need for model interpretability
-- Educational purposes
-- Quick prototyping
-
-### 6. Debugging and Optimizing Viterbi
-
-**Common Implementation Errors:**
-
-**Probability Underflow:**
-
-- Problem: Probabilities become too small (approach 0)
-- Solution: Use log-space computation
-- Detection: Results become NaN or infinite
-
-**Incorrect Backtracking:**
-
-- Problem: Path reconstruction gives wrong sequence
-- Solution: Verify pointer array construction
-- Testing: Compare with ground truth on small examples
-
-**Matrix Indexing Errors:**
-
-- Problem: Off-by-one errors in array access
-- Solution: Consistent 0-based or 1-based indexing
-- Prevention: Unit tests for each function
-
-**Performance Optimization:**
-
-**Memory Access Patterns:**
-
-- Store matrices in row-major or column-major order
-- Optimize cache usage for large vocabularies
-- Use sparse matrices for limited tag sets
-
-**Vectorization:**
-
-- Use SIMD instructions for parallel computation
-- NumPy/BLAS operations for matrix multiplication
-- GPU kernels for massive parallelization
-
-**Profiling Tips:**
-
-- Measure actual bottlenecks, not assumed ones
-- Profile on realistic data sizes
-- Consider both time and memory usage
-
-### 7. Advanced Viterbi Extensions
-
-**Higher-Order Models:**
-
-**Second-Order Viterbi:**
-
-- Consider two previous tags: P(tag₃|tag₁, tag₂)
-- Complexity increases to O(N × T³)
-- Better linguistic modeling at computational cost
-
-**Maximum Entropy Markov Models:**
-
-- Combine Viterbi with feature-based models
-- Can incorporate arbitrary features
-- More flexible than pure HMMs
-
-**Semi-CRF Models:**
-
-- Segments of variable length
-- Each segment has a single label
-- Applications: Named entity recognition, chunking
-
-**Approximate Viterbi Methods:**
-
-**Pruning Strategies:**
-
-- Beam search: Keep top-K candidates
-- Threshold pruning: Discard low-probability paths
-- Forward-backward pruning: Use forward probabilities to guide search
-
-**Hierarchical Decoding:**
-
-- First pass: Coarse tag categories
-- Second pass: Fine-grained tags within categories
-- Reduces computational complexity
-- Consistent POS tag definitions
-- Enables cross-lingual model development
-
-**Language-Specific Considerations:**
-
-- **Agglutinative Languages:** Complex morphology requires sub-word analysis
-- **Isolating Languages:** Fewer morphological variations
-- **Fusional Languages:** Multiple grammatical features per word
-
-### 8. Practical Viterbi Implementation
-
-**Data Structures:**
-
-**Viterbi Table Storage:**
-
-
-# 2D array: viterbi[tag][position]
-viterbi = [[0.0] * sentence_length for _ in range(num_tags)]
-
-# Backpointer array for path reconstruction
-backpointer = [[0] * sentence_length for _ in range(num_tags)]
-
-
-**Memory-Efficient Implementation:**
-
-
-# Only store current and previous columns
-current_column = [0.0] * num_tags
-previous_column = [0.0] * num_tags
-
-
-**Handling Edge Cases:**
-
-**Zero Probabilities:**
-
-- Replace with small epsilon value (e.g., 1e-10)
-- Use smoothing for unseen word-tag combinations
-- Graceful degradation for OOV words
-
-**Sentence Boundaries:**
-
-- Special START and END tokens
-- Initialize first column with start probabilities
-- Terminate at END token
-
-**Efficiency Considerations:**
-
-**Sparse Matrices:**
-
-- Many transition probabilities are zero
-- Use compressed sparse row (CSR) format
-- Skip impossible transitions during computation
-
-**Parallel Processing:**
-
-- Each tag in a column can be computed independently
-- Multi-threading for large vocabularies
-- GPU implementations for massive datasets
-
-### 9. Research and Applications
-
-**Current Research Areas:**
-
-**Neural-Symbolic Integration:**
-
-- Combining neural networks with Viterbi inference
-- Differentiable dynamic programming
-- End-to-end learning with structured output
-
-**Structured Attention:**
-
-- Attention mechanisms that mimic Viterbi paths
-- Soft vs. hard alignment in sequence models
-- Interpretable neural sequence models
-
-**Online Learning:**
-
-- Updating Viterbi models with streaming data
-- Incremental parameter estimation
-- Concept drift adaptation
-
-**Emerging Applications:**
-
-**Computational Biology:**
-
-- Protein structure prediction
-- Gene regulatory network inference
-- Phylogenetic analysis using HMMs
-
-**Signal Processing:**
-
-- Speech enhancement and denoising
-- Gesture recognition from sensor data
-- Financial time series analysis
-
-**Computer Vision:**
-
-- Object tracking in video sequences
-- Action recognition in temporal data
-- Medical image sequence analysis
-
-### 10. Hands-on Viterbi Projects
-
-**Beginner Projects:**
-
-1. **Pure Viterbi Implementation**
-
-   - Code the algorithm from scratch in Python
-   - Implement both probability and log-space versions
-   - Test on the experiment's corpus data
-
-2. **Viterbi Visualization**
-
-   - Create animated visualizations of table filling
-   - Show path probability evolution
-   - Highlight optimal path discovery
-
-3. **Performance Analysis**
-   - Compare execution times for different sentence lengths
-   - Measure memory usage growth
-   - Analyze complexity empirically
-
-**Intermediate Projects:**
-
-1. **Multi-Domain Viterbi**
-
-   - Build taggers for different text domains
-   - Compare transition matrix patterns
-   - Implement domain adaptation techniques
-
-2. **Approximate Viterbi**
-
-   - Implement beam search variants
-   - Compare accuracy vs. speed trade-offs
-   - Analyze when approximations fail
-
-3. **Parallel Viterbi**
-   - Multi-threaded implementation
-   - GPU acceleration using CUDA/OpenCL
-   - Benchmark parallel efficiency
-
-**Advanced Projects:**
-
-1. **Neural-Viterbi Hybrid**
-
-   - Use neural networks for emission probabilities
-   - Keep Viterbi for structured inference
-   - Compare with end-to-end neural models
-
-2. **Structured Perceptron with Viterbi**
-
-   - Implement discriminative training
-   - Use Viterbi for loss-augmented inference
-   - Compare with CRF models
-
-3. **Real-Time Viterbi System**
-   - Build streaming POS tagger
-   - Handle partial observations
-   - Optimize for low latency
-
-### 11. Resources for Further Learning
-
-**Core Algorithms and Theory:**
-
-**Essential Papers:**
-
-- "The Viterbi Algorithm" by G.D. Forney Jr. (1973) - Original IEEE paper
-- "A Tutorial on Hidden Markov Models and Selected Applications" by Rabiner (1989)
-- "Dynamic Programming and the Viterbi Algorithm" by Viterbi (1967)
-
-**Textbooks:**
-
-- "Introduction to Algorithms" by Cormen et al. - Dynamic Programming chapter
-- "Speech and Language Processing" by Jurafsky & Martin - HMM and Viterbi sections
-- "Pattern Recognition and Machine Learning" by Bishop - Sequence models
-
-**Advanced Topics:**
-
-**Structured Prediction:**
-
-- "Structured Prediction Models via the Matrix-Tree Theorem" by Koo et al.
-- "Discriminative Training Methods for Hidden Markov Models" by Povey & Woodland
-
-**Modern Applications:**
-
-- "Neural Architectures for Named Entity Recognition" by Lample et al.
-- "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF" by Ma & Hovy
-
-**Implementation Resources:**
-
-**Programming Libraries:**
-
-- **Python:** NLTK, scikit-learn, TensorFlow Probability
-- **Java:** OpenNLP, Stanford CoreNLP
-- **C++:** HTK, Julius (speech recognition)
-- **R:** HMM package, RHmm
-
-**Datasets for Practice:**
-
-- Penn Treebank (English POS tagging)
-- Universal Dependencies (multilingual)
-- CoNLL shared tasks (various sequence labeling tasks)
-
-**Online Tutorials:**
-
-- Interactive Viterbi visualization: https://web.stanford.edu/~jurafsky/slp3/
-- Dynamic programming tutorials with Viterbi examples
-- YouTube lectures on HMMs and dynamic programming
-
-### 12. Career Applications
-
-**Industry Roles Utilizing Viterbi:**
-
-**Algorithm Engineer:**
-
-- Implementing efficient Viterbi variants for production systems
-- Optimizing dynamic programming algorithms for specific hardware
-- Developing domain-specific sequence models
-
-**Machine Learning Engineer:**
-
-- Integrating Viterbi into neural architectures
-- Building hybrid statistical-neural models
-- Optimizing inference pipelines for real-time applications
-
-**Research Scientist:**
-
-- Developing new structured prediction algorithms
-- Exploring applications beyond NLP (biology, finance, robotics)
-- Publishing on algorithmic innovations and theoretical advances
-
-**Application Domains:**
-
-**Healthcare:**
-
-- Electronic health record processing
-- Medical image sequence analysis
-- Drug discovery sequence modeling
-
-**Autonomous Systems:**
-
-- Robot navigation and path planning
-- Sensor fusion for state estimation
-- Behavior prediction in dynamic environments
-
-**Financial Technology:**
-
-- Algorithmic trading with regime detection
-- Risk modeling with hidden state models
-- Market sentiment analysis from text streams
-
-**Telecommunications:**
-
-- Error correction in digital communications
-- Network state monitoring and optimization
-- Speech compression and enhancement
-
-This extended study demonstrates how mastering the Viterbi algorithm opens doors to diverse applications across computer science and provides a solid foundation for understanding modern structured prediction methods in machine learning.
+3. **Evaluation Tools**
+   - POS tagging accuracy metrics
+   - Confusion matrix generators
+   - Error analysis scripts
diff --git a/experiment/glossary.md b/experiment/glossary.md
index 29f70a7..1f592d5 100644
--- a/experiment/glossary.md
+++ b/experiment/glossary.md
@@ -1,131 +1,75 @@
-### A
+### Core Viterbi Decoding and Sequence Labeling Terms
-**Algorithm** - A step-by-step procedure for solving a problem or completing a task, such as the Viterbi algorithm for finding the most likely POS tag sequence.
+**Viterbi Algorithm**:
+A dynamic programming algorithm for finding the most probable sequence of hidden states (such as POS tags) in a Hidden Markov Model.
-**Ambiguity** - The property of words that can belong to multiple grammatical categories depending on context (e.g., "run" can be a noun or verb).
+**Hidden Markov Model (HMM)**:
+A statistical model where the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
-### B
+**Part-of-Speech (POS) Tagging**:
+The process of assigning grammatical categories (noun, verb, adjective, etc.) to each word in a sentence.
-**Backtracking** - The final step in the Viterbi algorithm where the most likely path is traced backwards to determine the complete POS tag sequence.
+**Transition Probability**:
+The probability of moving from one state (POS tag) to another in a sequence, e.g., P(VERB | NOUN).
-**Beam Search** - An approximation to Viterbi that keeps only the top-K most probable paths at each step, trading accuracy for computational efficiency.
+**Emission Probability**:
+The probability of observing a word given a particular POS tag, e.g., P(dog | NOUN).
-**Bigram** - A sequence of two adjacent elements, in HMM context referring to consecutive POS tags used in transition probabilities.
+**Sequence Decoding**:
+The process of finding the most likely sequence of hidden states given a sequence of observations.
-### C
+**Dynamic Programming**:
+An algorithmic technique that solves complex problems by breaking them down into simpler subproblems, used in the Viterbi algorithm.
-**Corpus** - A large collection of written or spoken texts used for linguistic analysis and training statistical models.
+**Ambiguity**:
+The property of words that can belong to multiple grammatical categories depending on context (e.g., "can" as a verb or noun).
-**Conditional Probability** - The probability of an event occurring given that another event has occurred, fundamental to HMM calculations.
+**Training Data**:
+Annotated sentences used to learn the parameters of a statistical model.
-### D
+**Corpus**:
+A large collection of written or spoken texts used for linguistic analysis and training statistical models.
-**Decoding** - The process of finding the most likely sequence of hidden states (POS tags) given the observed sequence (words).
+**N-gram**:
+A contiguous sequence of n items (words or tags) from a given sequence of text or speech.
-**Dynamic Programming** - An algorithmic technique that solves complex problems by breaking them down into simpler subproblems, used in the Viterbi algorithm.
+**Unigram, Bigram, Trigram**:
+A single word/tag, a sequence of two, or a sequence of three, respectively.
-### E
+**Smoothing**:
+Techniques used to handle zero probabilities in statistical models by redistributing probability mass.
-**Emission Probability** - The probability of observing a particular word given a specific POS tag, denoted as P(word|tag).
+**Sequence Labeling**:
+The task of assigning labels (such as POS tags) to elements in a sequence.
-**End of Sentence (EOS)** - A special marker used to denote sentence boundaries in corpus annotation and HMM training.
+**Observation**:
+The visible outputs (words) generated by the hidden states (POS tags) in an HMM.
-### F
+**State**:
+A condition or situation in a system, in HMM referring to the hidden grammatical categories.
-**First-order Markov Model** - A model where the probability of the next state depends only on the current state, not on the entire history.
+**Lexical Category**:
+The grammatical class of a word (noun, verb, adjective, etc.).
-**Forward Algorithm** - An algorithm for computing the probability of an observation sequence in an HMM.
+**Word Tokenization**:
+The process of breaking text into individual words or tokens.
-### G
+**Out-of-Vocabulary (OOV)**:
+Words that appear in test data but were not seen during training.
-**Grammar** - The set of structural rules governing the composition of clauses, phrases, and words in a language.
+**Probability Matrix**:
+A matrix containing probability values, such as transition or emission probabilities in an HMM.
-**Grammatical Category** - A class of words that have similar grammatical properties (noun, verb, adjective, etc.).
+**Statistical Model**:
+A mathematical model that uses probability distributions to represent data and make predictions.
-### H
+**Decoding**:
+The process of finding the most likely sequence of hidden states (POS tags) given the observed sequence (words).
-**Hidden Markov Model (HMM)** - A statistical model where the system being modeled is assumed to be a Markov process with unobserved states.
+**Maximum Likelihood Estimation**:
+A method of estimating model parameters by finding values that maximize the likelihood of the observed data.
-**Hidden States** - The unobserved states in an HMM, which in POS tagging correspond to the grammatical categories.
+**Natural Language Processing (NLP)**:
+A field of computer science and artificial intelligence concerned with interactions between computers and human language.
-### I
-
-**Independence Assumption** - The assumption that the probability of observing a word depends only on its POS tag, not on other words or tags.
-
-**Initial State Distribution** - The probability distribution over the possible starting states in an HMM.
-
-### L
-
-**Lexical Category** - Another term for part-of-speech or grammatical category of a word.
-
-**Likelihood** - The probability of observing the given data under a particular model or set of parameters.
-
-**Log-Space Computation** - A numerical technique used in Viterbi algorithm to prevent underflow by working with logarithms of probabilities instead of probabilities themselves.
-
-### M
-
-**Markov Assumption** - The assumption that the probability of the next state depends only on the current state.
-
-**Markov Chain** - A mathematical system that undergoes transitions from one state to another according to certain probabilistic rules.
-
-**Maximum Likelihood Estimation** - A method of estimating model parameters by finding values that maximize the likelihood of the observed data.
-
-### N
-
-**Natural Language Processing (NLP)** - A field of computer science and artificial intelligence concerned with interactions between computers and human language.
-
-**N-gram** - A contiguous sequence of n items from a given sequence of text or speech.
-
-### O
-
-**Observation** - In HMM context, the visible outputs (words) that are generated by the hidden states (POS tags).
-
-**Optimal Substructure** - A key property of dynamic programming problems where optimal solutions contain optimal solutions to subproblems, enabling the Viterbi algorithm's efficiency.
-
-**Out-of-Vocabulary (OOV)** - Words that appear in test data but were not seen during training.
-
-### P
-
-**Part-of-Speech (POS)** - A category of words that have similar grammatical properties (noun, verb, adjective, etc.).
-
-**POS Tagging** - The process of assigning part-of-speech tags to words in a sentence.
-
-**Probability Matrix** - A matrix containing probability values, such as transition or emission probabilities in an HMM.
-
-**Pruning** - Optimization techniques in Viterbi algorithm that discard low-probability paths to reduce computational complexity.
-
-### S
-
-**Sequence Labeling** - The task of assigning labels to elements in a sequence, such as POS tags to words.
-
-**Smoothing** - Techniques used to handle zero probabilities in statistical models by redistributing probability mass.
-
-**State** - A condition or situation in a system, in HMM referring to the hidden grammatical categories.
-
-**Statistical Model** - A mathematical model that uses probability distributions to represent data and make predictions.
-
-### T
-
-**Transition Probability** - The probability of moving from one state to another in a sequence, denoted as P(tag₂|tag₁).
-
-**Training Data** - Annotated data used to learn the parameters of a statistical model.
-
-### U
-
-**Unigram** - A single word or token, used in calculating base probabilities for words.
-
-**Unsupervised Learning** - Machine learning where the algorithm learns patterns from data without labeled examples.
-
-### V
-
-**Viterbi Algorithm** - A dynamic programming algorithm for finding the most likely sequence of hidden states in an HMM.
-
-**Viterbi Table** - The matrix used to store intermediate probability calculations during Viterbi decoding, where each cell represents the maximum probability of any path ending at a specific state and time.
-
-**Vocabulary** - The set of all unique words in a corpus or dataset.
-
-### W
-
-**Word Sense Disambiguation** - The process of determining which meaning of a word is used in a particular context.
-
-**Word Tokenization** - The process of breaking text into individual words or tokens.
+---
diff --git a/experiment/posttest.json b/experiment/posttest.json
index 9019954..2eb10e7 100644
--- a/experiment/posttest.json
+++ b/experiment/posttest.json
@@ -19,7 +19,7 @@
    "difficulty": "beginner"
  },
  {
-    "question": "In the simulation, when filling the Viterbi table for the second word onwards, what is the correct formula for computing V[i][j]?",
+    "question": "In the simulation, when filling the Viterbi table for the second word onwards,
diff --git a/experiment/trivia.md b/experiment/trivia.md
index 9e29e34..0f72a1a 100644
--- a/experiment/trivia.md
+++ b/experiment/trivia.md
@@ -1,79 +1,21 @@
-### Historical Facts
+### Fun Facts About Viterbi Decoding
-🔍 **Did you know?** The Viterbi algorithm was named after Andrew Viterbi, who developed it in 1967 for decoding convolutional codes in digital communications before it revolutionized sequence analysis in bioinformatics and NLP!
+1. **Algorithm Origins**: The Viterbi algorithm was invented by Andrew Viterbi in 1967 for decoding signals in digital communications, but it is now a cornerstone in NLP for sequence labeling tasks like POS tagging.
-🎯 **Nobel Connection:** Andrew Viterbi was awarded the 2006 Marconi Prize for his contributions to telecommunications, and his algorithm now powers everything from speech recognition to gene sequencing.
+2. **Efficiency Breakthrough**: For a sentence of N words and T possible tags there are T^N candidate tag sequences; Viterbi decoding finds the best one in O(N × T²) time using dynamic programming.
-📚 **Cross-Domain Impact:** Originally designed for error correction in noisy communication channels, the Viterbi algorithm found its way into computational linguistics in the 1980s and became fundamental to statistical POS tagging.
+3. **Cross-Disciplinary Impact**: Beyond language, Viterbi is used in speech recognition, gene sequencing, error correction in telecommunications, and even financial modeling.
-### Technical Insights
+4. **Optimal Path Guarantee**: Unlike heuristic algorithms, Viterbi always finds the most probable sequence of tags given the model’s probabilities.
-⚡ **Complexity Magic:** The Viterbi algorithm reduces the complexity of finding the best POS sequence from exponential O(T^N) to polynomial O(N×T²), making real-time tagging possible!
+5. **Memory Magic**: The algorithm only needs to remember the best path to each state at each step, making it both fast and memory-efficient.
-🧠 **Dynamic Programming Genius:** The algorithm's brilliance lies in the optimal substructure property - the best path to any state contains the best paths to all previous states.
+6. **Tiny Probabilities**: Viterbi often works with extremely small probabilities (like 10⁻¹⁵), so implementations use logarithms to avoid underflow errors.
-🔢 **Memory Efficiency:** Despite evaluating millions of possible tag sequences, Viterbi only needs to remember the best path to each state at each time step, dramatically reducing memory requirements.
+7. **Real-Time Applications**: Viterbi-style decoding has long been used in speech recognition and text-input systems, where the most likely word sequence must be found with low latency.
-### Viterbi Algorithm Specifics
+8. **Ambiguous Words**: Words like "book," "can," and "round" can be tagged as different parts of speech depending on context—Viterbi helps resolve these ambiguities.
-� **Decoding Challenge:** For a 10-word sentence with 45 possible POS tags, there are 45^10 = 2.8 trillion possible tag sequences! Viterbi finds the best one efficiently.
+9. **Educational Value**: Understanding Viterbi decoding is foundational for learning about more advanced neural sequence models like LSTMs and Transformers.
-🔍 **Backtracking Beauty:** The algorithm fills the probability table forward but traces the optimal path backward - like solving a maze by remembering the best route to each junction.
-
-**Probability Precision:** Viterbi calculations often involve very small probabilities (like 10^-15), requiring careful numerical handling to avoid underflow errors in implementations.
-
-### Computational Curiosities
-
-💻 **Matrix Operations:** Each cell in the Viterbi table requires T multiplications and comparisons, where T is the number of POS tags - the algorithm is essentially a smart matrix multiplication!
-
-🔍 **Path Optimization:** Unlike other algorithms that might find "good enough" solutions, Viterbi is guaranteed to find the globally optimal POS tag sequence given the HMM parameters.
-
-📊 **Training vs. Decoding:** Training an HMM requires counting occurrences in the corpus, but Viterbi decoding uses those probabilities to make optimal predictions on new sentences.
-
-### Practical Applications
-
-🌐 **Beyond POS Tagging:** The Viterbi algorithm is used in speech recognition, bioinformatics (gene sequencing), and even predicting stock market trends!
-
-📱 **Real-Time Processing:** Modern smartphones use Viterbi-based algorithms for autocorrect and voice-to-text conversion, processing speech in real-time.
-
-🔤 **Error Correction:** The algorithm's original purpose in telecommunications - correcting transmission errors - shares the same mathematical foundation as finding optimal POS sequences.
-
-### Fun Challenges
-
-🎯 **Tricky Words:** Words like "that," "will," and "can" are among the most challenging for POS taggers due to their multiple grammatical roles.
-
-🔀 **Context Matters:** The word "book" can be a noun ("read a book") or a verb ("book a flight"), showing why sequential context is important.
-
-📝 **Rare Phenomena:** Some words can function as almost any part of speech - "round" can be a noun, verb, adjective, adverb, or preposition!
-
-### Educational Insights
-
-🎓 **Learning Challenge:** Students often confuse forward probability (likelihood of observations) with Viterbi probability (likelihood of the best path) - they're related but different!
-
-📈 **Debugging Tip:** When Viterbi gives unexpected results, check if emission and transition probabilities sum correctly and whether the corpus represents the test domain.
-
-🔬 **Foundation Importance:** Understanding Viterbi is useful background for modern neural sequence models like LSTMs and Transformers, which tackle the same sequence labeling problems with learned representations rather than dynamic programming.
-
-### Algorithm Surprises
-
-📊 **Optimality Guarantee:** The Viterbi algorithm is guaranteed to find the most probable tag sequence - no heuristic approximation needed!
-
-🎲 **Probability Precision:** The algorithm handles probabilities so small that standard floating-point arithmetic fails - logarithmic computation is essential in practice.
-
-🔄 **Table Filling Magic:** Each cell calculation in the Viterbi table depends only on the previous column, enabling efficient parallel computation and memory optimization.
-
-### Cultural and Linguistic Notes
-
-🌍 **Language Variation:** Different languages have varying numbers of POS categories - Chinese has fewer than English, while agglutinative languages like Turkish have many more.
-
-📚 **Historical Change:** The parts of speech we use today were largely codified by ancient Greek and Latin grammarians over 2,000 years ago.
-
-🎨 **Creative Usage:** Poets often deliberately violate POS conventions (like using nouns as verbs) to create artistic effects, challenging automatic taggers.
-
-### Technology Evolution
-
-🔧 **Implementation Evolution:** Early Viterbi implementations used lookup tables and required careful memory management; modern versions leverage GPU parallel processing.
-
-⚙️ **From Telecommunications to NLP:** The same mathematical principles that decode satellite communications now help computers understand human language structure.
-
-🚀 **Neural Integration:** Some neural taggers place a CRF layer on top of the network and decode it with Viterbi, showing the algorithm's enduring influence.
+10. **Language Diversity**: The number of POS tags varies widely across languages—some have just a few, while others, like Turkish, have many due to rich morphology.
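
Several of the facts above (the optimal path guarantee, keeping only the best path to each state, and working with logarithms) can be seen together in a small sketch. The following is a minimal illustration, not a reference implementation: the three-tag tag set and every probability in `START`, `TRANS`, and `EMIT` are hypothetical values invented for this example, and `viterbi` and `brute_force` are illustrative helper names.

```python
import math
from itertools import product

TAGS = ["NOUN", "VERB", "DET"]

# Hypothetical toy parameters, invented for illustration only.
START = {"NOUN": 0.3, "VERB": 0.5, "DET": 0.2}
TRANS = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.5, "DET": 0.2},
    "VERB": {"NOUN": 0.3, "VERB": 0.1, "DET": 0.6},
    "DET": {"NOUN": 0.8, "VERB": 0.1, "DET": 0.1},
}
EMIT = {
    "NOUN": {"book": 0.4, "a": 0.01, "flight": 0.59},
    "VERB": {"book": 0.6, "a": 0.01, "flight": 0.39},
    "DET": {"book": 0.01, "a": 0.98, "flight": 0.01},
}


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words):
    """Return (best tag sequence, its log-probability) for a list of words."""
    # best[t][tag]: log-probability of the best path that ends in `tag` at step t.
    best = [{tag: log(START[tag]) + log(EMIT[tag][words[0]]) for tag in TAGS}]
    back = []  # back[t-1][tag]: predecessor of `tag` on that best path
    for word in words[1:]:
        scores, pointers = {}, {}
        for tag in TAGS:
            # Only the previous column is needed to fill the current one.
            prev = max(TAGS, key=lambda p: best[-1][p] + log(TRANS[p][tag]))
            scores[tag] = best[-1][prev] + log(TRANS[prev][tag]) + log(EMIT[tag][word])
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    last = max(TAGS, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def brute_force(words):
    """Score every possible tag sequence; exponential, used only as a check."""
    def score(tags):
        s = log(START[tags[0]]) + log(EMIT[tags[0]][words[0]])
        for prev, tag, word in zip(tags, tags[1:], words[1:]):
            s += log(TRANS[prev][tag]) + log(EMIT[tag][word])
        return s

    best_tags = max(product(TAGS, repeat=len(words)), key=score)
    return list(best_tags), score(best_tags)


path, log_prob = viterbi(["book", "a", "flight"])
print(path)  # ['VERB', 'DET', 'NOUN']
print(brute_force(["book", "a", "flight"])[0] == path)  # True
```

With these made-up numbers, "book" is tagged as a verb because a determiner is much more likely to follow a verb than a noun. The brute-force check enumerates all 3³ = 27 sequences and agrees with the table-based result, which is the optimality guarantee in miniature; summing logs rather than multiplying probabilities keeps the scores well away from underflow.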