No-code CSV and Excel data quality audits for analysts, auditors, and teams that need a shareable report before analysis, submission, or review.
AuditIQ scans a tabular dataset, flags common quality issues, writes a plain-language narrative, and generates a downloadable PDF report. It works without an AI key by using a deterministic local narrative. If you add a Gemini API key, the report narrative can be generated by Gemini.
- Missing values, empty rows, and high-null columns
- Duplicate rows and inferred duplicate keys
- Mixed types, invalid dates, and malformed identifiers
- Future dates and impossible numeric values
- Inconsistent casing, whitespace, and suspicious formats
- Outliers and high value concentration
- Cross-field issues such as start date after end date
- Optional schema violations for required columns, types, ranges, and formats
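Two of the checks listed above, blank values and exact duplicate rows, can be sketched with the standard library alone. This is a minimal illustration, not AuditIQ's implementation: the function names `count_blank_values` and `count_duplicate_rows` and the inline sample data are hypothetical, and the real checks live under `engine/checks/`.

```python
import csv
import io
from collections import Counter


# Hypothetical helpers for illustration; not part of AuditIQ's API.
def count_blank_values(rows, columns):
    """Return {column: number of cells that are missing or whitespace-only}."""
    blanks = Counter()
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or value.strip() == "":
                blanks[col] += 1
    return dict(blanks)


def count_duplicate_rows(rows, columns):
    """Count rows that exactly repeat an earlier row."""
    seen = Counter(tuple(row.get(col) for col in columns) for row in rows)
    return sum(n - 1 for n in seen.values() if n > 1)


# Small made-up dataset: one repeated row, two blank ages, one blank name.
sample = io.StringIO(
    "id,name,age\n"
    "1,Ann,34\n"
    "2,Bob,\n"
    "2,Bob,\n"
    "3, ,41\n"
)
reader = csv.DictReader(sample)
rows = list(reader)
columns = reader.fieldnames

blank_counts = count_blank_values(rows, columns)
duplicate_count = count_duplicate_rows(rows, columns)
print(blank_counts)
print(duplicate_count)
```

Treating whitespace-only cells as blank is a design choice in this sketch; a real check may distinguish them from true nulls.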
The GIF below shows the local Streamlit flow: upload data, run checks, review the findings, and download a PDF.
View a sample generated report: Titanic PDF report.
On Windows:
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
streamlit run app.py
On macOS or Linux:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py
Then click Try sample data to audit samples/Titanic-Dataset.csv.
AuditIQ does not require an AI key to run. Without a key, it generates a deterministic narrative from the check results.
To enable Gemini-generated narratives, create a .env file in the project root:
GEMINI_API_KEY=your_key_here
The app uses gemini-2.5-flash-lite by default.
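The key-or-fallback behavior described above might look roughly like the following. This is a sketch under assumptions: it presumes `GEMINI_API_KEY` has already been loaded into the process environment (for example by a `.env` loader), and the function names `choose_narrative_mode` and `deterministic_narrative` are hypothetical, not taken from `ai/narrator.py`.

```python
import os


# Hypothetical helper: decide which narrative path to use.
def choose_narrative_mode(env=None):
    """Return 'gemini' when a non-empty GEMINI_API_KEY is set, else 'deterministic'."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    return "gemini" if key else "deterministic"


# Hypothetical helper: the same input always yields the same text.
def deterministic_narrative(findings):
    """Build a narrative from (column, blank_count) pairs without any API call."""
    if not findings:
        return "No issues were found."
    parts = [f"{column} has {count} blank or null values" for column, count in findings]
    return "Findings: " + "; ".join(parts) + "."


print(choose_narrative_mode({}))
print(deterministic_narrative([("Age", 177), ("Cabin", 687)]))
```

Passing the environment in as an argument keeps the decision easy to test without touching real credentials.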
A schema file is optional. Upload a CSV or Excel file with this shape:
column_name,expected_type,min_value,max_value,allowed_formats,notes
Age,numeric,0,120,,
Embarked,text,,,,
PassengerId,identifier,,,^\d+$,Expected numeric identifier
Supported expected_type values are:
- text
- numeric
- date
- identifier
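To make the schema shape concrete, here is a minimal sketch of how such a file could be parsed and applied to single values. It assumes `allowed_formats` holds one regular expression and only handles the range check for `numeric` columns; `load_schema` and `validate_value` are hypothetical names, and the loader in `engine/schema.py` may behave differently.

```python
import csv
import io
import re

# The example schema from above, as a raw string so \d stays intact.
SCHEMA_CSV = r"""column_name,expected_type,min_value,max_value,allowed_formats,notes
Age,numeric,0,120,,
Embarked,text,,,,
PassengerId,identifier,,,^\d+$,Expected numeric identifier
"""


def load_schema(text):
    """Parse schema rows into a dict keyed by column name."""
    return {row["column_name"]: row for row in csv.DictReader(io.StringIO(text))}


def validate_value(rule, value):
    """Return a list of violation messages for one cell value."""
    problems = []
    if rule["expected_type"] == "numeric":
        try:
            number = float(value)
        except ValueError:
            return ["not numeric"]
        if rule["min_value"] and number < float(rule["min_value"]):
            problems.append("below min_value")
        if rule["max_value"] and number > float(rule["max_value"]):
            problems.append("above max_value")
    # Assumption: allowed_formats is a single regex the whole value must match.
    pattern = rule["allowed_formats"]
    if pattern and not re.fullmatch(pattern, value):
        problems.append("format mismatch")
    return problems


schema = load_schema(SCHEMA_CSV)
print(validate_value(schema["Age"], "150"))
print(validate_value(schema["PassengerId"], "A12"))
```

Empty `min_value`, `max_value`, and `allowed_formats` cells are treated as "no constraint" in this sketch.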
For the included Titanic sample, the engine loads 891 rows and 12 columns and returns findings such as:
Age: 177 blank or null values
Cabin: 687 blank or null values
Embarked: 2 blank or null values
The app then presents a quality score, issue summaries, a narrative explanation, and a downloadable PDF report.
pip install -r requirements-dev.txt
pytest -q
AuditIQ/
|-- app.py                      # Streamlit app
|-- ai/
|   `-- narrator.py             # Gemini and deterministic narratives
|-- engine/
|   |-- parser.py               # CSV/XLSX loading
|   |-- aggregator.py           # Registry-driven check runner
|   |-- schema.py               # Optional schema loader
|   `-- checks/                 # 18 inference checks plus schema checks
|-- report/
|   `-- pdf_builder.py          # PDF report generation
|-- samples/
|   `-- Titanic-Dataset.csv     # Sample dataset
|-- styles/
|   `-- app.css                 # Streamlit styling
|-- tests/
|   `-- engine/checks/          # Unit tests for check behavior
|-- requirements.txt
`-- requirements-dev.txt
Each audit produces a score from 0 to 100:
score = 100 - sum((affected_rows / total_rows) * severity_weight)
Severity weights:
- bad = 1.0
- warn = 0.6
- info = 0.2
Scores above 85 usually indicate minor issues. Scores below 60 indicate data quality problems that should be addressed before downstream use.
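The scoring formula can be written out as a short function. This sketch applies the formula exactly as stated, with each finding given as a hypothetical `(affected_rows, severity)` pair; the clamp to the 0 to 100 range is an assumption, and the engine may scale or cap penalties in ways not described here. Read literally, each finding subtracts at most its severity weight, since `affected_rows / total_rows` never exceeds 1.

```python
SEVERITY_WEIGHTS = {"bad": 1.0, "warn": 0.6, "info": 0.2}


# Hypothetical function name; findings are (affected_rows, severity) pairs.
def quality_score(findings, total_rows):
    """Apply score = 100 - sum((affected_rows / total_rows) * severity_weight)."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    penalty = sum(
        (affected_rows / total_rows) * SEVERITY_WEIGHTS[severity]
        for affected_rows, severity in findings
    )
    # Assumption: clamp so the result stays within the documented 0-100 range.
    return max(0.0, min(100.0, 100.0 - penalty))


# Made-up example: 50 of 100 rows hit a "bad" issue, 25 of 100 a "warn" issue.
score = quality_score([(50, "bad"), (25, "warn")], total_rows=100)
print(round(score, 2))
```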
Supported input formats: .csv and .xlsx
- Tested with Python 3.11+
- Streamlit upload limit is configured as 200 MB in the UI copy
- AuditIQ loads the full dataset into memory after chunked CSV reading, so very large files still need enough RAM.
- The schema format is intentionally simple and does not replace full data contracts.
- AI narratives are optional and depend on external Gemini API availability when enabled.
- Results are heuristics for triage and review; they are not a substitute for domain-specific validation.
MIT License. See LICENSE.
