A Python port of the R Shiny app for analyzing Indian Independence Day speeches delivered annually on August 15th since 1947.
Note: The Streamlit app was created entirely with Claude Code.
## Features

This interactive web application provides the following visualizations:
- Speech Length: Word count trends over time
- Most Frequent Words: Top words after removing stopwords (with optional faceting)
- Most Important Words: TF-IDF analysis to identify distinctive words by year
- +/- Sentiment Words: Most frequent positive and negative words using NLTK opinion lexicon
- Net Sentiment: Difference between positive and negative word counts over time
- Specific Word Trend: Track any word's frequency across speeches
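As a concrete illustration of the net-sentiment idea, here is a minimal sketch. The tiny inline word lists are hypothetical; the app itself uses NLTK's full opinion lexicon.

```python
# Hypothetical mini-lexicons for illustration only; the app loads the
# full NLTK opinion lexicon (thousands of words) instead.
positive = {"free", "great", "progress", "peace"}
negative = {"struggle", "poverty", "fear"}

def net_sentiment(tokens):
    """Positive word count minus negative word count for one speech."""
    pos = sum(t in positive for t in tokens)
    neg = sum(t in negative for t in tokens)
    return pos - neg

tokens = "our great progress against poverty".split()
print(net_sentiment(tokens))  # 2 positive - 1 negative -> 1
```

Plotting this value per year gives the Net Sentiment chart described above.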
## Requirements

- Python 3.10 or higher
- pip (Python package installer)
## Installation

Create and enter a virtual environment:

```bash
cd /path/to/aug15/python
python3 -m venv venv
```

Activate it on macOS/Linux:

```bash
source venv/bin/activate
```

Or on Windows:

```bash
venv\Scripts\activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

The app will automatically download the required NLTK data (stopwords and opinion lexicon) on first run. If you encounter issues, you can download it manually:

```python
import nltk
nltk.download('stopwords')
nltk.download('opinion_lexicon')
```

## Running the App

```bash
streamlit run app.py
```

The app will open automatically in your default web browser at http://localhost:8501.
## Usage

**Filters (sidebar):**

- Adjust the year range with the slider
- Select Prime Ministers to include
- Select political parties to include
- Click "Reset All Inputs" to restore defaults

**Plot Options:**

- Choose a plot type from the dropdown
- Conditional inputs appear based on the plot type:
  - Number of words (for word frequency/importance plots)
  - Facet variable (for breaking down by year/PM/party)
  - Word to track (for the specific word trend)

**Main Display:**

- Interactive plot with hover tooltips
- Explanation text below each plot
## Project Structure

```
python/
├── app.py                  # Main Streamlit application
├── utils/
│   ├── __init__.py
│   ├── data_prep.py        # Data loading and filtering
│   ├── text_analysis.py    # Tokenization, TF-IDF, sentiment analysis
│   └── plotting.py         # Plotly visualization functions
├── data/
│   └── corpus.csv          # Symlink to ../inst/final_csv/corpus.csv
├── tests/
│   ├── __init__.py
│   ├── test_data_prep.py
│   └── test_text_analysis.py
├── requirements.txt        # Python dependencies
├── .python-version         # Python version specification
├── .gitignore              # Python-specific ignores
└── README.md               # This file
```
## Testing

The project includes unit tests using pytest:

```bash
# Install pytest if not already installed
pip install pytest

# Run all tests
pytest tests/

# Run with verbose output
pytest -v tests/

# Run a specific test file
pytest tests/test_data_prep.py
```

## Implementation Notes

- Tokenization: Regular expression-based word extraction
- Stopwords: NLTK English stopword list
- TF-IDF: scikit-learn's `TfidfVectorizer`
- Sentiment: NLTK opinion lexicon (positive/negative word lists)
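A minimal sketch of the tokenization and TF-IDF steps in plain Python (the app itself uses scikit-learn's `TfidfVectorizer`; the two-speech corpus below is invented for illustration):

```python
import math
import re
from collections import Counter

# Hypothetical two-speech corpus, invented for illustration only
speeches = {
    1947: "freedom at midnight a tryst with destiny freedom",
    1991: "economic reform and growth reform",
}

def tokenize(text):
    """Regex-based word extraction, as in the app's text analysis step."""
    return re.findall(r"[a-z']+", text.lower())

def tf_idf(speeches):
    """Score each word in each year by term frequency * inverse document frequency."""
    docs = {year: Counter(tokenize(text)) for year, text in speeches.items()}
    df = Counter()  # document frequency: in how many speeches each word appears
    for counts in docs.values():
        df.update(counts.keys())
    n_docs = len(docs)
    return {
        year: {
            w: (c / sum(counts.values())) * math.log(n_docs / df[w])
            for w, c in counts.items()
        }
        for year, counts in docs.items()
    }

scores = tf_idf(speeches)
top_1947 = max(scores[1947], key=scores[1947].get)
print(top_1947)  # "freedom": appears twice, and only in 1947
```

Words that appear in every speech get an IDF of zero, so the top-scoring words are the ones distinctive to a given year, which is what the Most Important Words plot shows.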
- All plots use Plotly for interactivity
- Color scheme extracted from ggplot2 defaults to match R Shiny app:
- BJP: #F8766D (red-ish)
- INC: #7CAE00 (green)
- Janata Dal: #00BFC4 (cyan)
- Janata Party: #C77CFF (purple)
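In code, the palette above might live in a dictionary passed to Plotly Express via its `color_discrete_map` argument (`PARTY_COLORS` is a hypothetical name):

```python
# Hypothetical constant mirroring the party colors listed above
PARTY_COLORS = {
    "BJP": "#F8766D",
    "INC": "#7CAE00",
    "Janata Dal": "#00BFC4",
    "Janata Party": "#C77CFF",
}

# e.g. px.line(df, x="year", y="n", color="party",
#              color_discrete_map=PARTY_COLORS)
```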
- Corpus data is cached using `@st.cache_data`
- Text processing is performed on demand based on the selected filters
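A sketch of what the filter step might look like, assuming hypothetical `year` and `pm` column names (the real corpus.csv schema may differ); in the app, the expensive loading step would additionally be wrapped in `@st.cache_data`:

```python
import pandas as pd

# Hypothetical schema for illustration; the real corpus.csv may differ
corpus = pd.DataFrame({
    "year": [1947, 1991, 2020],
    "pm": ["Nehru", "Rao", "Modi"],
    "text": ["speech one", "speech two", "speech three"],
})

def filter_corpus(df, year_range, pms):
    """Apply the sidebar filters before any text processing runs."""
    mask = df["year"].between(*year_range) & df["pm"].isin(pms)
    return df[mask]

subset = filter_corpus(corpus, (1947, 2000), ["Nehru", "Rao"])
print(len(subset))  # -> 2
```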
## Differences from the R Shiny App

This Python port aims to replicate the R Shiny app as closely as possible, with these minor differences:
- Sentiment Lexicon: Uses NLTK opinion lexicon instead of Bing lexicon (both provide positive/negative word classifications)
- Reset Button: Simplified to "Clear Filters" functionality rather than full session state management
- Styling: Close approximation of R Shiny theme using Streamlit's CSS customization
## Troubleshooting

If port 8501 is already in use:

```bash
streamlit run app.py --server.port 8502
```

If the automatic NLTK download fails, fetch the data manually:

```bash
python -c "import nltk; nltk.download('stopwords'); nltk.download('opinion_lexicon')"
```

Ensure the corpus symlink is set up correctly:

```bash
ls -la data/corpus.csv
# Should show: corpus.csv -> ../../inst/final_csv/corpus.csv
```

If it is broken, recreate it:

```bash
rm data/corpus.csv
ln -s "../../inst/final_csv/corpus.csv" "data/corpus.csv"
```

## Development

The project follows Python best practices:
- PEP 8 formatting (can use `black` for auto-formatting)
- Type hints for function signatures
- Google-style docstrings

When adding features:

- Data processing functions → `utils/data_prep.py` or `utils/text_analysis.py`
- New plot types → `utils/plotting.py`
- UI changes → `app.py`
```bash
# Install black
pip install black

# Format all Python files
black python/
```

## Credits

- Original R package and Shiny app: github.com/seanangio/aug15
- Python port: Streamlit implementation with equivalent functionality

## License

Same as parent repository.