What Makes a Crash Fatal? — A FARS Traffic-Fatality Analysis

An exploratory and predictive analysis of 37,951 fatal U.S. traffic crashes from the federal FARS (Fatality Analysis Reporting System) dataset — asking which conditions (weather, lighting, road type, collision manner) actually drive fatal outcomes, and whether crash severity can be predicted.

Originally a collaborative University of Tennessee Data Analytics course project. This repo packages the analysis as a reproducible data-storytelling case study.

The question

It's tempting to assume bad weather and darkness cause the deadliest crashes. The data tells a more nuanced story — so the analysis works through descriptive breakdowns toward a single framing question: single-vehicle vs. multi-vehicle — what separates them, and can we predict it?

Key findings

Human error > environment. Clear weather accounts for the majority of fatal crashes, and single-vehicle crashes are ~7.4% more common than multi-vehicle crashes, which suggests that driver error, rather than weather, is the dominant factor.
Darkness matters, but not simply. Night conditions (lit + unlit) make up ~49% of fatal crashes vs. ~44.6% in daylight. Yet multi-vehicle crashes peak in daylight (56.6%), while single-vehicle crashes dominate the dark (64.4% on unlit roads) — likely because unlit hazards (trees, obstacles, pedestrians) are invisible at night, while other vehicles carry lights.
Road class shapes collision type. Arterial roads (53% of crashes) are the only class where multi-vehicle crashes outnumber single-vehicle crashes; as speeds drop toward local roads, the single-vehicle share rises.
Severity scales with vehicles involved. As fatalities climb 1 → 7, the share involving multiple vehicles rises from 44.4% to 100% — near-linearly.
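
The percentage breakdowns above are row-normalized cross-tabulations. A minimal sketch of how such a share table can be computed with pandas follows. The column names (`light_condition`, `vehicle_count`, `crash_type`) and the eight sample rows are hypothetical stand-ins, not the FARS schema or the project's data.

```python
import pandas as pd

# Tiny synthetic sample; column names and values are hypothetical,
# chosen only to illustrate the calculation.
crashes = pd.DataFrame({
    "light_condition": [
        "daylight", "daylight", "daylight",
        "dark_unlit", "dark_unlit", "dark_unlit",
        "dark_lit", "dark_lit",
    ],
    "vehicle_count": [2, 3, 1, 1, 1, 2, 1, 2],
})

# Label each crash as single- or multi-vehicle.
crashes["crash_type"] = (
    crashes["vehicle_count"].gt(1).map({True: "multi", False: "single"})
)

# normalize="index" divides each row by its total, so every row reads as
# "within this lighting condition, what share is single vs. multi".
shares = pd.crosstab(
    crashes["light_condition"], crashes["crash_type"], normalize="index"
)
print(shares.round(3))
```

Swapping the row variable for road class or fatality count would give the other breakdowns in the same way, assuming those columns exist in the cleaned data.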

The model

A Decision Tree classifier predicts single- vs. multi-vehicle crashes. The honest result:

Metric	Score
Accuracy	0.61
Precision	0.14
Recall	0.67
F1	0.23
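
As a quick consistency check on the table above, the F1 score can be recomputed from precision and recall. The same three numbers also imply roughly how rare the positive class is. The sketch below assumes all four metrics describe one positive class on one test set. The function names are illustrative, and the rounded inputs make the result approximate.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_positive_share(accuracy: float, precision: float, recall: float) -> float:
    """Positive-class share implied by the three metrics.

    With pi the positive share: TP = recall * pi, FN = (1 - recall) * pi,
    FP = TP * (1 - precision) / precision, and 1 - accuracy = FN + FP.
    Solving for pi gives the expression below.
    """
    errors_per_positive = (1 - recall) + recall * (1 - precision) / precision
    return (1 - accuracy) / errors_per_positive


f1 = f1_from_precision_recall(0.14, 0.67)
share = implied_positive_share(0.61, 0.14, 0.67)

print(round(f1, 2))     # 0.23, matching the table
print(round(share, 3))  # 0.088, i.e. the positive class is under 10% of cases
```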

The low precision exposes the real lesson. A precision of 0.14 alongside a recall of 0.67 means the tree catches about two-thirds of the positive class, but roughly six of every seven positive predictions are wrong. The data is heavily imbalanced (far more single-vehicle crashes), which limits a vanilla decision tree. Harmful-event type and road classification were the strongest predictors, consistent with the descriptive findings. Documenting why the model underperforms is the point: knowing the limits of a result matters as much as the result.
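
One common way to probe the imbalance issue described above is to compare an unweighted tree with one trained using `class_weight="balanced"`. The sketch below is not the notebook's code: it runs on synthetic data from `make_classification` with an assumed 10% positive class, and the depth and feature counts are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the crash table (assumption: ~10% positive class).
X, y = make_classification(
    n_samples=5000,
    n_features=8,
    n_informative=4,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def evaluate(class_weight):
    """Fit a shallow tree and report precision and recall on the test split."""
    tree = DecisionTreeClassifier(
        max_depth=4, class_weight=class_weight, random_state=0
    )
    tree.fit(X_train, y_train)
    pred = tree.predict(X_test)
    return {
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
    }


results = {
    name: evaluate(weight)
    for name, weight in [("unweighted", None), ("balanced", "balanced")]
}
for name, scores in results.items():
    print(name, {metric: round(value, 2) for metric, value in scores.items()})
```

Class weighting usually trades precision for recall rather than improving both, so the two rows are best read side by side instead of picking a winner on a single metric.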

What this demonstrates

End-to-end workflow: cleaning → EDA → interpretation → modeling → honest evaluation
Turning 37k messy rows into a decision-relevant narrative, not just charts
Intellectual honesty about imbalance, correlation vs. causation, and model limits

Tech stack

Python · pandas · NumPy · scikit-learn (Decision Tree, metrics) · Matplotlib · Jupyter

Run it

pip install pandas numpy scikit-learn matplotlib jupyter
jupyter notebook fars_analysis.ipynb

Data: fars_data.csv (public FARS data — crash conditions only, no personal information).

Austin Stevens (Applied AI & Data Analytics, University of Tennessee, Knoxville). Originally a team course project; published as a portfolio case study.
