Static APK Analysis → Feature Engineering → ML Malware Detection → Visual Dashboard
This project is a complete end-to-end pipeline to analyze Android applications (APKs), extract static features, classify malware, and display a detailed security report through a modern UI.
Built using:
- Python
- Streamlit Dashboard
- Drebin Malware Dataset
- RandomForest (best model)
- Aho–Corasick Feature Scanner
- Static Manifest + Permission Parser
- Custom Feature Engineering Pipeline
This system performs end-to-end Android malware analysis, fully automated and fully visual.
Everything runs inside one Streamlit application:
streamlit run streamlit_app.py
The dashboard has two pages. On the Pull & Analyze page you can:
- Connect your Android device
- Pull installed APKs over USB debugging
- Run static analysis
- Run feature engineering
- Run ML malware prediction (Drebin-trained RandomForest)
- Generate a complete JSON report for the device
On the App Deep Analysis page you can:
- Select a device, then select an app
- View metadata (package name, version, size, SDK)
- View the ML malware probability (colored risk bar)
- See permissions categorized by risk
- See the dangerous APIs detected
- View the full JSON output
The malware probability is shown as a color-coded horizontal bar:
- Below 30% → 🟩 Safe
- 30–60% → 🟨 Suspicious
- Above 60% → 🟥 High Risk
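The banding above can be written as a small function. This is a hypothetical sketch rather than the dashboard's own code: the name risk_level is invented here, and placing exactly 30% and 60% in the Suspicious band is an assumption, since the boundaries are not specified.

```python
def risk_level(probability: float) -> str:
    """Map a malware probability in [0, 1] to a dashboard risk band.

    Hypothetical helper. Assumption: 0.30 and 0.60 both count as
    "Suspicious" because only the ranges <30%, 30-60% and >60% are given.
    Comparing the raw probability (not probability * 100) avoids
    floating-point surprises such as 0.6 * 100 == 60.00000000000001.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < 0.30:
        return "Safe"
    if probability <= 0.60:
        return "Suspicious"
    return "High Risk"


print(risk_level(0.492))  # value from the example report -> Suspicious
```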
The permission summary shows:
- Total permissions
- High-risk permissions
- Medium-risk permissions
- Low-risk permissions
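A minimal sketch of how these counts could be derived. The schema of categories.json is not documented here, so a flat mapping from permission name to "high"/"medium"/"low" is assumed, the example labels are illustrative only, and permissions missing from the file are treated as low risk (also an assumption).

```python
import json
from collections import Counter

# Hypothetical categories.json content: a flat "permission -> risk level"
# mapping is ASSUMED; the real file's schema is not shown in this README.
CATEGORIES_JSON = """
{
  "android.permission.SEND_SMS": "high",
  "android.permission.READ_CONTACTS": "high",
  "android.permission.ACCESS_FINE_LOCATION": "medium",
  "android.permission.INTERNET": "low"
}
"""


def summarize_permissions(permissions, categories):
    """Count total, high-, medium- and low-risk permissions for one app.

    Permissions not present in `categories` default to "low" (assumption).
    """
    counts = Counter(categories.get(p, "low") for p in permissions)
    return {
        "total": len(permissions),
        "high": counts["high"],
        "medium": counts["medium"],
        "low": counts["low"],
    }


categories = json.loads(CATEGORIES_JSON)
app_permissions = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.VIBRATE",  # not in the mapping, so counted as low
]
print(summarize_permissions(app_permissions, categories))
# {'total': 4, 'high': 1, 'medium': 1, 'low': 2}
```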
Each permission is displayed with:
- Color-coded risk strip
- Icon
- Technical name
- Category
The feature and intent breakdown lists:
- Aho–Corasick feature hits
- Binder calls
- Crypto calls
- Process injections
- Intent receivers
The full JSON view provides the complete technical report for researchers.
RandomForest model performance:

| Metric | Score |
|---|---|
| 🔵 Accuracy | 94.08% |
| 🟣 F1-Score | 93.42% |
| 🟠 ROC-AUC | 98.46% |
| 🟢 PR-AUC | 98.11% |
The model directly predicts the malware probability (malware_prediction, 0–1), which is used to compute the UI risk visualization.
streamlit_app.py                # Main multi-page UI
├── Page: Pull & Analyze (pull_analyze_st.py)
├── Page: App Deep Analysis (app.py)
│
├── Util: new_app.py            # Static analyzer
├── Util: feature_pipeline.py   # Feature engineering
├── Util: final_pipeline.py     # Analysis + ML workflow
├── model/best_model949398.pkl  # RandomForest model
├── categories.json             # Permission risk labels
├── analysis_report/*.json      # Generated device reports
└── dataset/*.csv               # ML feature CSVs
Everything is orchestrated inside Streamlit; no CLI scripts are required.
git clone https://github.com/tejas-130704/Malware-Detection.git
cd Malware-Detection
pip install -r requirements.txt
On the Android device, enable Settings → Developer Options → USB Debugging, then launch the dashboard:
streamlit run streamlit_app.py
Using USB debugging and ADB, pull_analyze_st.py automatically pulls the APKs of all installed apps from the device.
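The ADB step can be sketched with standard commands: adb shell pm list packages -f lists each installed package with its APK path, and adb pull copies a file to the host. This is not the code of pull_analyze_st.py; the function names are hypothetical, a single connected device is assumed, and only each app's base APK is pulled (split APKs are ignored).

```python
import subprocess
from pathlib import Path


def parse_package_list(output: str) -> dict[str, str]:
    """Parse `adb shell pm list packages -f` output into {package: apk_path}.

    Lines look like "package:/data/app/.../base.apk=com.example.app".
    Split on the LAST "=" because newer Android versions put "==" inside
    the install directory name, while package names never contain "=".
    """
    packages = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("package:"):
            continue
        apk_path, _, package = line[len("package:"):].rpartition("=")
        if apk_path and package:
            packages[package] = apk_path
    return packages


def pull_all_apks(out_dir: str = "extracted_apks") -> list[Path]:
    """Pull each installed app's base APK from the single connected device."""
    listing = subprocess.run(
        ["adb", "shell", "pm", "list", "packages", "-f"],
        capture_output=True, text=True, check=True,
    ).stdout
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    pulled = []
    for package, apk_path in parse_package_list(listing).items():
        # Name local files like the report keys, e.g. com_example_app.apk
        local = target / (package.replace(".", "_") + ".apk")
        subprocess.run(["adb", "pull", apk_path, str(local)], check=True)
        pulled.append(local)
    return pulled
```

With a device connected and authorized, calling pull_all_apks() would fill extracted_apks/ with one APK per package.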
The custom-built static analyzer (new_app.py) then extracts:
- AndroidManifest.xml
- Permissions
- Intents
- Dangerous API signatures & patterns
- DEX string-based detection via Aho–Corasick
- SDK info
- App metadata (name, package, size, version)
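new_app.py's scanner is not reproduced in this README. To illustrate the technique, here is a minimal pure-Python Aho–Corasick matcher run over strings of the kind found in a DEX string table; the class and the signature list are hypothetical examples, not the project's feature set.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho–Corasick automaton for multi-pattern substring search."""

    def __init__(self, patterns):
        self.goto = [{}]        # trie edges: state -> {char: next_state}
        self.fail = [0]         # failure link of each state
        self.output = [set()]   # patterns recognized when reaching a state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.output.append(set())
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.output[state].add(pattern)

    def _build_failure_links(self):
        # BFS: depth-1 states keep fail = 0 (the root).
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the fallback state.
                self.output[nxt] |= self.output[self.fail[nxt]]

    def search(self, text):
        """Return the set of patterns occurring anywhere in `text`."""
        state, found = 0, set()
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            found |= self.output[state]
        return found


# Hypothetical signatures; a real scanner would load a much larger list.
SIGNATURES = [
    "Ljavax/crypto/Cipher;",
    "Landroid/telephony/SmsManager;",
    "getDeviceId",
    "Ldalvik/system/DexClassLoader;",
]
scanner = AhoCorasick(SIGNATURES)
dex_strings = "Landroid/app/Activity;\nLjavax/crypto/Cipher;\ngetDeviceId\n"
print(sorted(scanner.search(dex_strings)))
# ['Ljavax/crypto/Cipher;', 'getDeviceId']
```

The automaton finds every pattern in a single pass over the text, regardless of how many signatures are loaded, which is what makes Aho–Corasick a good fit for scanning large DEX files against long signature lists.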
feature_pipeline.py converts raw extracted data into a structured ML-ready vector:
- Permission binaries
- API-call existence flags
- Intent signals
- Weight-based features (Drebin-style)
- Category scores (from categories.json)
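The presence-flag part of this step can be sketched as follows. The column list is hypothetical (the real order must match the columns the Drebin-trained model was fit on), and the weight-based and category-score features are not covered.

```python
# Hypothetical feature vocabulary. The real column order must match the
# training CSV of the Drebin-trained model, which this README does not list.
FEATURE_COLUMNS = [
    "SEND_SMS",
    "READ_PHONE_STATE",
    "android.intent.action.BOOT_COMPLETED",
    "Ljavax/crypto/Cipher",
    "getDeviceId",
]


def build_feature_vector(permissions, intents, api_hits, columns=FEATURE_COLUMNS):
    """Return a 0/1 vector aligned to `columns` (Drebin-style presence flags)."""
    present = set(permissions) | set(intents) | set(api_hits)
    return [1 if column in present else 0 for column in columns]


vector = build_feature_vector(
    permissions=["SEND_SMS"],
    intents=["android.intent.action.BOOT_COMPLETED"],
    api_hits=["getDeviceId"],
)
print(vector)  # [1, 0, 1, 0, 1]
```

Keeping the column list fixed is the important design point: the model only understands vectors whose positions mean exactly what they meant at training time.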
Final classification is done by the best-performing model, the RandomForest stored in model/best_model949398.pkl. Its probability output is used as the malware risk score (0%–100%).
final_pipeline.py scans a device folder and produces a JSON report:
analysis_report/<DEVICE_ID>.json
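Writing the per-device report could look like this minimal sketch. The function name is hypothetical, the entry reuses the example field values shown in this README, and the device ID emulator-5554 is only illustrative.

```python
import json
import tempfile
from pathlib import Path


def write_device_report(device_id, apk_results, report_dir="analysis_report"):
    """Write {apk_file_name: per-APK result} to <report_dir>/<device_id>.json."""
    out_dir = Path(report_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    report_path = out_dir / f"{device_id}.json"
    with report_path.open("w", encoding="utf-8") as fh:
        json.dump(apk_results, fh, indent=2)
    return report_path


# Illustrative entry using the fields documented in this README.
example = {
    "com_example_app.apk": {
        "apk_metadata": {
            "package": "com.example.app",
            "version": "1.2.3",
            "min_sdk": "26",
            "target_sdk": "33",
            "apk_size": 21500321,
        },
        "permissions_true": [],
        "intents_true": [],
        "features_true": [],
        "malware_prediction": 0.492,
        "file_name": "com_example_app_results.csv",
    }
}

# Write into a throwaway directory so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = write_device_report("emulator-5554", example, report_dir=tmp)
    print(path.name)  # emulator-5554.json
```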
Each APK entry contains:
"com_example_app.apk": {
"apk_metadata": {
"package": "com.example.app",
"version": "1.2.3",
"min_sdk": "26",
"target_sdk": "33",
"apk_size": 21500321
},
"permissions_true": [..],
"intents_true": [..],
"features_true": [..],
"malware_prediction": 0.492,
"file_name": "com_example_app_results.csv"
}

┌──────────────────────────────────────────────────┐
│ Android Device                                   │
│ (USB Debugging Enabled)                          │
└────────────────────────┬─────────────────────────┘
                         │ adb pull
                         ▼
┌──────────────────────────────────────────────────┐
│ pull_analyze_st.py                               │
│ Pull all APKs from the device                    │
│ Save inside /extracted_apks                      │
└────────────────────────┬─────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────┐
│ new_app.py                                       │
│ Static APK Analyzer                              │
│ • Extract Manifest XML                           │
│ • Parse Permissions & Intents                    │
│ • Aho–Corasick DEX Feature Scan                  │
│ • SDK, App Size, Version Info                    │
└────────────────────────┬─────────────────────────┘
                         │ CSV output
                         ▼
┌──────────────────────────────────────────────────┐
│ feature_pipeline.py                              │
│ Build ML Feature Vector                          │
│ • Drebin-style Feature Mapping                   │
│ • Permission → Feature Encoding                  │
│ • API → Feature Encoding                         │
│ • categories.json Risk Weights                   │
└────────────────────────┬─────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────┐
│ RandomForest ML Malware Model                    │
│ best_model949398.pkl (Trained on                 │
│ Drebin Dataset)                                  │
│ Outputs: malware_prediction (0–1)                │
└────────────────────────┬─────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────┐
│ final_pipeline.py                                │
│ Combine (Analyzer + Features + ML)               │
│ Generate JSON report per device:                 │
│ analysis_report/<DEVICE_ID>.json                 │
└────────────────────────┬─────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────┐
│ Streamlit Dashboard                              │
│ streamlit_app.py                                 │
│ • Select Device                                  │
│ • Select App                                     │
│ • Show Metadata (package, version, size, SDK)    │
│ • Malware ML Probability Bar (Green→Red)         │
│ • High / Medium / Low Risk Permissions           │
│ • Permission Cards with Color Indicators         │
│ • Feature & Intent Breakdown                     │
│ • Full JSON App Report                           │
└──────────────────────────────────────────────────┘
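On the dashboard side, selecting an app amounts to reading this report. Below is a minimal sketch that ranks apps by malware_prediction. It assumes the report is a flat JSON object keyed by APK file name, as in the example entry above; the function name and the second app are made up for illustration.

```python
import json


def rank_apps_by_risk(report):
    """Return (apk_name, package, probability) tuples, riskiest first."""
    rows = [
        (apk_name, entry["apk_metadata"]["package"], entry["malware_prediction"])
        for apk_name, entry in report.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical report: the first entry mirrors the README example,
# the second app and its score are invented for this demo.
example_report = json.loads("""
{
  "com_example_app.apk": {
    "apk_metadata": {"package": "com.example.app"},
    "malware_prediction": 0.492
  },
  "com_example_game.apk": {
    "apk_metadata": {"package": "com.example.game"},
    "malware_prediction": 0.873
  }
}
""")

for apk_name, package, probability in rank_apps_by_risk(example_report):
    print(f"{package:<20} {probability:.1%}")
```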