An end-to-end machine learning pipeline forecasting the 140 constituencies of the 2026 Kerala Legislative Assembly Elections.
This project simulates and predicts the electoral outcomes by fusing historical results, parliamentary momentum, local body trends, demographic data, and regional political issues.
The pipeline consists of two main components:
- create_dataset.py: A heuristic engine that synthesizes a comprehensive 43-feature dataset for all 140 constituencies. It combines baseline 2021 results with 2024 Lok Sabha momentum, 2025 Local Body trends, demographic makeup, and constituency-specific issue impacts to generate projected vote shares.
- train.py: A robust Neural Network pipeline that trains on these features to predict the winning alliance (LDF, UDF, NDA, OTHERS) and exact vote shares.
Predicting elections with data is challenging, especially in Kerala. There are only 140 constituencies (which means a very small dataset), and the two dominant alliances (LDF and UDF) win almost every seat, which makes it hard for an AI to learn how third fronts like the NDA or independent candidates might win.
To tackle these unique challenges, our approach uses a few clever strategies:
Because our dataset is very small (only 140 rows), training just one AI model is risky. It might memorize the data instead of learning real patterns. To fix this, we train 15 separate models, each on a different random subset of the constituencies. When predicting the final results, we ask all 15 models to vote on the outcome. By averaging their predictions together, we get a more stable forecast that depends less on the quirks of any single training split.
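The averaging step can be sketched in a few lines of plain Python. This is a minimal illustration and is not taken from train.py: the function names, the 80% subset size, and the example probabilities are all assumptions made for the sketch.

```python
import random

ALLIANCES = ["LDF", "UDF", "NDA", "OTHERS"]

def random_slice(n_rows, seed, fraction=0.8):
    """Pick a random subset of row indices for one ensemble member.

    The 80% fraction is an assumption for illustration only.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), int(n_rows * fraction)))

def average_probabilities(member_probs):
    """Average the per-alliance probabilities across ensemble members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]

def ensemble_predict(member_probs):
    """Return the alliance with the highest averaged probability, plus the averages."""
    avg = average_probabilities(member_probs)
    best = max(range(len(avg)), key=lambda c: avg[c])
    return ALLIANCES[best], avg

# Made-up outputs of three members for a single constituency.
member_probs = [
    [0.50, 0.40, 0.10, 0.00],
    [0.30, 0.60, 0.10, 0.00],
    [0.45, 0.45, 0.10, 0.00],
]
winner, avg = ensemble_predict(member_probs)
print(winner)  # UDF
```

One member on its own would have called this seat for LDF. The averaged forecast goes to UDF because the other two members lean that way.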
Historically, the NDA rarely wins seats in Kerala. If we only ask the AI to predict "Who wins?", it will almost never see enough examples to effectively learn what an NDA victory looks like.
Instead, we ask the AI to do two things at once:
- Predict the winning party.
- Predict the exact vote share percentage for every party.
Because every party gets some vote share in every constituency, the AI constantly learns what makes a party perform well, even in places where they ultimately lose. By learning how to calculate vote shares, the model organically figures out how traditional strongholds might tip toward a third party in extremely close races.
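One common way to train on both targets at once is to add a classification loss and a regression loss into a single number. The sketch below assumes cross-entropy for the winner and mean squared error for the vote shares. The function names and the `share_weight` parameter are hypothetical, and the actual loss used in train.py may differ.

```python
import math

def softmax(logits):
    """Turn raw scores into positive values that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(winner_logits, true_index):
    """Classification loss: penalize low probability on the true winner."""
    return -math.log(softmax(winner_logits)[true_index])

def mean_squared_error(pred, target):
    """Regression loss: average squared gap between predicted and target shares."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def multitask_loss(winner_logits, true_index, share_logits, true_shares,
                   share_weight=1.0):
    """Combined loss for the winner head and the vote-share head.

    share_weight (hypothetical) balances the two tasks.
    """
    pred_shares = softmax(share_logits)  # predicted shares sum to 1
    return (cross_entropy(winner_logits, true_index)
            + share_weight * mean_squared_error(pred_shares, true_shares))

# Made-up example: alliance order is LDF, UDF, NDA, OTHERS.
true_shares = [0.45, 0.40, 0.12, 0.03]
loss = multitask_loss(
    winner_logits=[2.0, 1.5, 0.1, -1.0], true_index=0,
    share_logits=[1.0, 1.0, 0.0, -1.0], true_shares=true_shares,
)
```

The vote-share term is non-zero in every constituency, so the model gets a learning signal about NDA and OTHERS performance even where they never win.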
If left to its own devices, an AI will naturally ignore rare events (like an independent candidate winning a seat) to focus on the big, common patterns. During training, we use specialized math techniques that force the AI to pay extra attention to these incredibly rare scenarios, keeping it from taking the easy way out and predicting LDF/UDF every single time.
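The text above does not name the exact technique, so the following shows just one widely used option, chosen here as an assumption: weighting the classification loss by inverse class frequency. The seat counts are invented for illustration and are not real or projected results.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels, classes):
    """Weight each class by n / (k * count), so rare classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(classes)
    # A class missing from the labels is treated as count 1 to avoid dividing by zero.
    return {c: n / (k * max(counts[c], 1)) for c in classes}

def weighted_cross_entropy(probs, true_class, weights):
    """Cross-entropy for one example, scaled by the weight of its true class."""
    return -weights[true_class] * math.log(probs[true_class])

classes = ["LDF", "UDF", "NDA", "OTHERS"]
# Invented label distribution over 140 seats, for illustration only.
labels = ["LDF"] * 80 + ["UDF"] * 55 + ["NDA"] * 4 + ["OTHERS"] * 1
weights = inverse_frequency_weights(labels, classes)

# The same predicted probability (0.25) costs far more when the true class is rare.
probs = {"LDF": 0.25, "UDF": 0.25, "NDA": 0.25, "OTHERS": 0.25}
loss_common = weighted_cross_entropy(probs, "LDF", weights)
loss_rare = weighted_cross_entropy(probs, "NDA", weights)
```

With these invented counts, a mistake on an NDA seat is penalized 20 times more heavily than the same mistake on an LDF seat, which discourages the model from predicting LDF/UDF everywhere.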
It's important to understand what this AI is actually doing behind the scenes.
Usually, to build a "true" predictive AI, you feed it historical data (like 2011 election factors) and ask it to predict the 2016 outcome. Once it learns those rules against hard historical truth, you use it to predict the future.
However, because we don't have perfectly paired historical data stretching back decades, we had to be creative. Our dataset builder (create_dataset.py) acts as a human logic engine: it takes the most recent available data (2021 results, 2024 parliamentary momentum, etc.) and uses documented political science formulas to estimate a "projected truth."
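To make the idea of a "projected truth" concrete, here is a deliberately simplified sketch of such a heuristic: start from the 2021 vote shares, add weighted swings from later elections, then renormalize to 100%. The function name, the weights `w_ls` and `w_lb`, and all numbers are hypothetical. The actual formulas in create_dataset.py use many more inputs and may look quite different.

```python
def project_vote_shares(baseline_2021, swing_2024, swing_2025, w_ls=0.5, w_lb=0.3):
    """Blend baseline shares with weighted swings and renormalize to 100%.

    w_ls and w_lb are hypothetical weights for the Lok Sabha and
    local body swings (in percentage points).
    """
    raw = {}
    for alliance, base in baseline_2021.items():
        adjusted = (base
                    + w_ls * swing_2024.get(alliance, 0.0)
                    + w_lb * swing_2025.get(alliance, 0.0))
        raw[alliance] = max(adjusted, 0.0)  # a vote share cannot go negative
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("all adjusted shares are zero")
    return {a: 100.0 * v / total for a, v in raw.items()}

# Invented inputs for a single constituency (percent and percentage points).
baseline = {"LDF": 45.0, "UDF": 40.0, "NDA": 12.0, "OTHERS": 3.0}
swing_2024 = {"LDF": -6.0, "UDF": 4.0, "NDA": 2.0}
swing_2025 = {"LDF": -2.0, "UDF": 2.0}

projected = project_vote_shares(baseline, swing_2024, swing_2025)
projected_winner = max(projected, key=projected.get)
print(projected_winner)  # UDF
```

The projected shares and winner produced this way become the training targets, which is why the neural network ends up learning the heuristic and not historical outcomes.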
Our neural network then trains on this data. What it's actually doing is learning to mimic that human political logic, smoothing out the hard-coded rules and picking up relationships between demographics, geography, and political momentum. It acts as a digital strategist applying that logic statewide, rather than as an independent crystal ball.
Generate the dataset:
python create_dataset.py
Train the ensemble and output predictions:
python train.py
The final output is saved to predictions_2026.csv.