Indian Independence Day Speeches - Streamlit App

A Python port of the R Shiny app for analyzing Indian Independence Day speeches delivered annually on August 15th since 1947.

Note: The Streamlit app was created entirely with Claude Code.

Features

This interactive web application provides visualizations including:

Speech Length: Word count trends over time
Most Frequent Words: Top words after removing stopwords (with optional faceting)
Most Important Words: TF-IDF analysis to identify distinctive words by year
+/- Sentiment Words: Most frequent positive and negative words, identified with the NLTK opinion lexicon
Net Sentiment: Difference between positive and negative word counts over time
Specific Word Trend: Track any word's frequency across speeches
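
As a rough illustration of the Net Sentiment measure, the sketch below counts positive and negative words in a text and takes the difference. The two small word sets are placeholders standing in for the NLTK opinion lexicon, and the function name `net_sentiment` is hypothetical rather than taken from the app's code.

```python
import re

# Placeholder lexicons (assumption): the app uses the NLTK opinion lexicon,
# which holds far more words per polarity than these toy sets.
POSITIVE = {"freedom", "progress", "peace", "proud"}
NEGATIVE = {"poverty", "war", "corruption", "fear"}


def net_sentiment(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return positive - negative


# Toy sentence, not a quotation from any speech.
sample = "We are proud of our freedom, yet poverty remains."
print(net_sentiment(sample))
```

A positive result means positive words outnumber negative ones; plotting this value per year gives the kind of trend line the feature describes.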

Prerequisites

Python 3.10 or higher
pip (Python package installer)

Installation

1. Clone or Navigate to Repository

cd /path/to/aug15/python

2. Create Virtual Environment

python3 -m venv venv

3. Activate Virtual Environment

On macOS/Linux:

source venv/bin/activate

On Windows:

venv\Scripts\activate

4. Install Dependencies

pip install -r requirements.txt

5. NLTK Data (Auto-downloaded)

The app automatically downloads the required NLTK data (stopwords and opinion lexicon) on first run. If that fails, download the data manually from a Python session:

import nltk
nltk.download('stopwords')
nltk.download('opinion_lexicon')
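
One way an automatic download like this could work is to look each resource up first and fetch it only when the lookup fails. The sketch below is an assumption about the approach, not the app's actual code; `ensure_resources` and `ensure_nltk_data` are hypothetical names. It relies on `nltk.data.find` raising `LookupError` for a missing resource.

```python
from typing import Callable, Dict, List

# Lookup path inside the NLTK data directory -> name passed to the downloader.
REQUIRED: Dict[str, str] = {
    "corpora/stopwords": "stopwords",
    "corpora/opinion_lexicon": "opinion_lexicon",
}


def ensure_resources(
    find: Callable[[str], object],
    download: Callable[[str], object],
    required: Dict[str, str] = REQUIRED,
) -> List[str]:
    """Download every resource that `find` cannot locate; return their names."""
    fetched = []
    for path, name in required.items():
        try:
            find(path)
        except LookupError:  # nltk.data.find raises this when data is missing
            download(name)
            fetched.append(name)
    return fetched


def ensure_nltk_data() -> List[str]:
    """Wire the helper to NLTK; imported lazily so the helper stays testable."""
    import nltk

    return ensure_resources(nltk.data.find, nltk.download)
```

Calling `ensure_nltk_data()` once at startup would skip the network entirely when both resources are already installed.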

Running the App

Start the Streamlit Server

streamlit run app.py

The app will open automatically in your default web browser at http://localhost:8501.

Using the App

Filters (Sidebar):
- Adjust year range with the slider
- Select Prime Ministers to include
- Select political parties to include
- Click "Reset All Inputs" to restore defaults
Plot Options:
- Choose plot type from dropdown
- Conditional inputs appear based on plot type:
  - Number of words (for word frequency/importance plots)
  - Facet variable (for breaking down by year/PM/party)
  - Word to track (for specific word trend)
Main Display:
- Interactive plot with hover tooltips
- Explanation text below each plot
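
The three sidebar filters combine as a logical AND: a speech is kept only if it falls in the year range and matches a selected Prime Minister and a selected party. A minimal sketch of that logic follows; the `Speech` fields and the `filter_speeches` name are assumptions for illustration, since the real column names in corpus.csv are not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Speech:
    # Hypothetical record layout; the actual corpus columns may differ.
    year: int
    pm: str
    party: str
    text: str


def filter_speeches(
    speeches: Iterable[Speech],
    year_range: Tuple[int, int],
    pms: Set[str],
    parties: Set[str],
) -> List[Speech]:
    """Keep speeches inside the inclusive year range with a selected PM and party."""
    first, last = year_range
    return [
        s
        for s in speeches
        if first <= s.year <= last and s.pm in pms and s.party in parties
    ]


# Speech text is a placeholder, not real corpus content.
corpus = [
    Speech(1947, "Jawaharlal Nehru", "INC", "placeholder"),
    Speech(1998, "Atal Bihari Vajpayee", "BJP", "placeholder"),
    Speech(2014, "Narendra Modi", "BJP", "placeholder"),
]
selected = filter_speeches(
    corpus, (1990, 2020), {"Atal Bihari Vajpayee", "Narendra Modi"}, {"BJP"}
)
print([s.year for s in selected])
```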

Project Structure

python/
├── app.py                   # Main Streamlit application
├── utils/
│   ├── __init__.py
│   ├── data_prep.py        # Data loading and filtering
│   ├── text_analysis.py    # Tokenization, TF-IDF, sentiment analysis
│   └── plotting.py         # Plotly visualization functions
├── data/
│   └── corpus.csv          # Symlink to ../../inst/final_csv/corpus.csv
├── tests/
│   ├── __init__.py
│   ├── test_data_prep.py
│   └── test_text_analysis.py
├── requirements.txt        # Python dependencies
├── .python-version        # Python version specification
├── .gitignore            # Python-specific ignores
└── README.md             # This file

Running Tests

The project includes unit tests using pytest:

# Install pytest if not already installed
pip install pytest

# Run all tests
pytest tests/

# Run with verbose output
pytest -v tests/

# Run specific test file
pytest tests/test_data_prep.py

Technical Details

Data Processing

Tokenization: Regular expression-based word extraction
Stopwords: NLTK English stopword list
TF-IDF: scikit-learn's TfidfVectorizer
Sentiment: NLTK opinion lexicon (positive/negative word lists)
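
To make the pipeline concrete, here is a hand-rolled sketch of regex tokenization, stopword removal, and TF-IDF scoring. It is not the app's code: the app uses scikit-learn's TfidfVectorizer, whereas this version only mirrors that class's default smoothed IDF, ln((1 + n) / (1 + df)) + 1, and L2 normalization, with its own simpler tokenizer. The stopword set is a small placeholder for the NLTK list, and the two documents are toy sentences.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Placeholder stopwords (assumption): the app uses the NLTK English list.
STOPWORDS = {"the", "a", "of", "and", "is", "our", "we", "to", "in"}


def tokenize(text: str) -> List[str]:
    """Lowercase, extract alphabetic runs, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf(docs: Dict[int, str]) -> Dict[int, Dict[str, float]]:
    """Score each word per document with smoothed, L2-normalized TF-IDF."""
    tokens = {key: tokenize(text) for key, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for words in tokens.values() for word in set(words))
    scores: Dict[int, Dict[str, float]] = {}
    for key, words in tokens.items():
        raw = {
            word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in Counter(words).items()
        }
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        scores[key] = {word: value / norm for word, value in raw.items()}
    return scores


# Toy texts keyed by year; not real speech content.
toy = {1947: "freedom and the nation", 1948: "the nation and progress"}
result = tfidf(toy)
print(max(result[1947], key=result[1947].get))
```

Words that appear in every document, such as "nation" here, receive the lowest weight, which is why TF-IDF surfaces the words distinctive to a given year.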

Visualization

All plots use Plotly for interactivity
Party colors follow the ggplot2 default palette to match the R Shiny app:
- BJP: #F8766D (salmon red)
- INC: #7CAE00 (green)
- Janata Dal: #00BFC4 (cyan)
- Janata Party: #C77CFF (purple)
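
The palette above can be kept in a single mapping and handed to Plotly Express through its `color_discrete_map` argument. The sketch below assumes that approach; the `PARTY_COLORS` and `speech_length_figure` names and the column labels are hypothetical, not taken from utils/plotting.py.

```python
from typing import Dict, List

# Party -> hex color, copied from the list above.
PARTY_COLORS: Dict[str, str] = {
    "BJP": "#F8766D",
    "INC": "#7CAE00",
    "Janata Dal": "#00BFC4",
    "Janata Party": "#C77CFF",
}


def speech_length_figure(
    years: List[int], word_counts: List[int], parties: List[str]
):
    """Build a scatter plot of word count by year, colored by party."""
    import plotly.express as px  # lazy import keeps the mapping usable alone

    data = {"year": years, "words": word_counts, "party": parties}
    return px.scatter(
        data,
        x="year",
        y="words",
        color="party",
        color_discrete_map=PARTY_COLORS,
    )
```

Keeping the mapping in one place means every plot type assigns the same color to the same party, regardless of which parties survive the current filters.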

Performance

Corpus data is cached using @st.cache_data
Text processing is performed on-demand based on selected filters
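
A minimal sketch of the caching pattern, assuming a plain CSV reader as the loader (the app's real loader in utils/data_prep.py may differ, and both function names here are hypothetical). Wrapping the loader with `st.cache_data` has the same effect as decorating it with `@st.cache_data`: the file is read once and later reruns reuse the cached result.

```python
import csv
from typing import Dict, List


def load_corpus(path: str) -> List[Dict[str, str]]:
    """Read the corpus CSV into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def cached_loader():
    """Return `load_corpus` wrapped in Streamlit's data cache."""
    import streamlit as st  # lazy import so load_corpus works without Streamlit

    # Equivalent to placing @st.cache_data above the load_corpus definition.
    return st.cache_data(load_corpus)
```

Filtering and text processing would then run on the cached rows each time a sidebar input changes, which matches the on-demand behavior described above.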

Differences from R Shiny App

This Python port aims to replicate the R Shiny app as closely as possible, with these minor differences:

Sentiment Lexicon: Uses the NLTK opinion lexicon instead of the Bing lexicon (both provide positive/negative word classifications and derive from Hu and Liu's opinion lexicon)
Reset Button: Simplified to "Clear Filters" functionality rather than full session state management
Styling: Close approximation of R Shiny theme using Streamlit's CSS customization

Troubleshooting

Port Already in Use

If port 8501 is already in use:

streamlit run app.py --server.port 8502

NLTK Download Issues

If auto-download fails, manually download required data:

python -c "import nltk; nltk.download('stopwords'); nltk.download('opinion_lexicon')"

Data File Not Found

Ensure the symlink is correctly set up:

ls -la data/corpus.csv
# Should show: corpus.csv -> ../../inst/final_csv/corpus.csv

If broken, recreate:

rm data/corpus.csv
ln -s "../../inst/final_csv/corpus.csv" "data/corpus.csv"
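
The same check can be done from Python, which helps where `ls` output is hard to read. This is a small sketch using only the standard library; `describe_corpus_link` is a hypothetical helper, not part of the project.

```python
import os
from pathlib import Path


def describe_corpus_link(path: str) -> str:
    """Report whether `path` is a working symlink, a broken one, or something else."""
    link = Path(path)
    if link.is_symlink():
        target = os.readlink(link)
        # Path.exists() follows the link, so it is False when the target is gone.
        state = "ok" if link.exists() else "broken"
        return f"{state} symlink -> {target}"
    if link.exists():
        return "regular file"
    return "missing"
```

Running `describe_corpus_link("data/corpus.csv")` from the python/ directory should report an "ok symlink" when the setup is correct.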

Development

Code Style

The project follows Python best practices:

PEP 8 formatting (black can be used for auto-formatting)
Type hints for function signatures
Google-style docstrings

Adding Features

Data processing functions → utils/data_prep.py or utils/text_analysis.py
New plot types → utils/plotting.py
UI changes → app.py

Formatting Code

# Install black
pip install black

# Format all Python files (run from the repository root; inside python/, use "black ." instead)
black python/

Credits

Original R package and Shiny app: github.com/seanangio/aug15
Python port: Streamlit implementation with equivalent functionality

License

Same as parent repository.
