Developed for the UIDAI Data Hackathon 2026, this project introduces a predictive, data-driven framework to transform Aadhaar from a static identity program into a dynamic, lifecycle-based service. LAIF enables the UIDAI to anticipate update demands, optimize center capacity, and proactively respond to societal shifts.
- Team Lead: Ayush Singh
- Institution: Guru Nanak Institute of Technology (GNIT), Kolkata
The current Aadhaar update system is largely reactive, leading to predictable overcrowding at centers during specific months (e.g., March and July). Resources are often unevenly distributed across states, and enrollment/update cycles lack a planned, anticipatory approach.
Because GitHub rejects individual files larger than 100 MB, the raw and processed Aadhaar datasets are hosted externally, which also keeps the Git history lightweight.
- Download the dataset from Kaggle: https://www.kaggle.com/datasets/ayushs1ngh/uidai-2026-data-hackathon/
- Extract the downloaded `.zip` file.
- Place the extracted `raw/` and `processed/` folders directly into the `data/` directory of this repository.
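After completing the setup steps, a quick check like the one below can confirm that the folders landed in the right place. This is a minimal sketch, not part of the repository: the helper name `missing_data_dirs` is hypothetical, and it only assumes the `data/raw/` and `data/processed/` layout described above.

```python
from pathlib import Path

# Folder names taken from the setup steps above.
EXPECTED_SUBDIRS = ("raw", "processed")


def missing_data_dirs(repo_root):
    """Return the expected sub-folders that are absent under <repo_root>/data."""
    data_dir = Path(repo_root) / "data"
    return [name for name in EXPECTED_SUBDIRS if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    # Run from the repository root (hypothetical usage).
    missing = missing_data_dirs(".")
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

An empty list means both folders are present; anything else names the folder that still needs to be extracted.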
Instead of merely tracking historical volumes, LAIF asks: "WHO updated WHAT, WHY, and WHEN—and what will happen NEXT?"
- Lifecycle Segmentation: Categorizes behavior by age (0-5: Enrollment; 5 & 15: Mandatory Biometric Updates; 17+: Demographic corrections).
- Temporal Demand Forecasting: Links update spikes to real-world triggers like fiscal deadlines in March and school admissions in September.
- Operational Efficiency Index: Introduces "Average Entries per Center per Day" as a National KPI to identify and manage overloaded regional hotspots.
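The lifecycle bands and the "Average Entries per Center per Day" KPI listed above can be illustrated with a short sketch. The function names, segment labels, and sample figures here are hypothetical, not the project's actual code. Two assumptions are made where the list leaves room for interpretation: ages exactly 5 and 15 are treated as mandatory biometric updates (so enrollment covers ages below 5), and ages not named in the list are returned as `"unclassified"`.

```python
def lifecycle_segment(age):
    """Map an age in years to a lifecycle segment (labels are illustrative)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age in (5, 15):
        # Assumption: the boundary age 5 counts as a biometric update, not enrollment.
        return "mandatory_biometric_update"
    if age < 5:
        return "enrollment"
    if age >= 17:
        return "demographic_correction"
    # Ages 6-14 and 16 are not assigned a category in the list above.
    return "unclassified"


def avg_entries_per_center_per_day(total_entries, centers, days):
    """Total entries divided by (number of centers * number of days)."""
    if centers <= 0 or days <= 0:
        raise ValueError("centers and days must be positive")
    return total_entries / (centers * days)


def overloaded_regions(kpi_by_region, threshold):
    """Return regions whose KPI exceeds a caller-chosen threshold, sorted by name."""
    return sorted(region for region, kpi in kpi_by_region.items() if kpi > threshold)


# Made-up numbers, purely for illustration.
kpis = {
    "Region A": avg_entries_per_center_per_day(12000, 10, 30),  # 40.0
    "Region B": avg_entries_per_center_per_day(4500, 10, 30),   # 15.0
}
print(lifecycle_segment(3), lifecycle_segment(15), lifecycle_segment(42))
print(overloaded_regions(kpis, threshold=25))  # ['Region A']
```

The threshold is left to the caller because the text does not state at what load a center counts as overloaded.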
- Language: Python (Pandas, NumPy, Pathlib)
- Visualization: Power BI, Matplotlib
- Data Cleaning: Standardized inconsistent date formats and rigorously validated 766+ districts against official government references.
- Scope: Analyzed massive-scale enrollment, biometric, and demographic datasets from March to December 2025.
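The two cleaning steps mentioned above, standardizing dates and validating district names against a reference list, might look roughly like the following. This is a sketch under stated assumptions rather than the project's cleaning scripts: it uses only the standard library, the candidate date formats are guesses at what "inconsistent" could mean, and the reference list in the usage lines is a one-item placeholder, not the official government source.

```python
from datetime import datetime

# Assumed input formats; the real datasets may use others.
CANDIDATE_FORMATS = ("%d-%m-%Y", "%d/%m/%Y", "%Y-%m-%d")


def standardize_date(raw):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_name(name):
    """Collapse whitespace and ignore case so trivial variants compare equal."""
    return " ".join(name.split()).casefold()


def unknown_districts(names, reference):
    """Return the input names that do not match any reference district."""
    known = {normalize_name(ref) for ref in reference}
    return sorted({name for name in names if normalize_name(name) not in known})


print(standardize_date("15/07/2025"))  # 2025-07-15
print(unknown_districts(["Kolkata ", "kolkatta"], ["Kolkata"]))  # ['kolkatta']
```

Returning `None` for unparseable dates, instead of raising, lets the caller count and inspect bad rows before deciding how to handle them.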
UIDAI-DATA-HACKATHON-26/
├── dashboards/                  # Power BI dashboard for trend analysis
│   └── data_visualization.pbix
├── data/                        # Hosted externally on Kaggle (See Data Setup)
│   ├── processed/               # Cleaned & merged output CSVs
│   └── raw/                     # Immutable original datasets
├── docs/                        # Presentations and analytical reports
│   └── data_cleaning_insights_report.xlsx
├── notebooks/                   # Jupyter notebooks for EDA
│   └── biometric_demographic_analysis.ipynb
├── src/                         # Production Python scripts
│   ├── __init__.py
│   ├── biometric/
│   │   ├── cleaning_3_states_biometric.py
│   │   ├── cleaning_4_districts_biometric.py
│   │   └── implementation_5_biometric_merging.py
│   ├── demography/
│   │   ├── cleaning_5_states_demography.py
│   │   ├── cleaning_6_districts_demography.py
│   │   └── implementation_6_demography_merging.py
│   └── enrolment/
│       ├── cleaning_1_states_enrolment.py
│       ├── cleaning_2_districts_enrolment.py
│       └── implementation_4_enrolment_merging.py
├── .gitignore                   # Ignores large data files and system caches
├── LICENSE                      # MIT License
├── README.md                    # Project documentation
└── requirements.txt             # Python dependencies