🌐 Android Malware Analysis & ML Risk Scoring Dashboard

Static APK Analysis → Feature Engineering → ML Malware Detection → Visual Dashboard

This project is a complete end-to-end pipeline that analyzes Android applications (APKs), extracts static features, classifies each app as malware or benign, and displays a detailed security report in a Streamlit dashboard.

Built using:

Python
Streamlit Dashboard
Drebin Malware Dataset
RandomForest classifier (best-performing model)
Aho–Corasick Feature Scanner
Static Manifest + Permission Parser
Custom Feature Engineering Pipeline

🚀 What This Project Does

The whole process is fully automated, and every result is presented visually.

Everything runs inside one Streamlit application:

streamlit run streamlit_app.py

Inside the dashboard, you get:

🔹 Page 1 — Pull, Analyze & Predict

Connect your Android device
Pull installed APKs (USB debugging)
Run static analysis
Run feature engineering
Run ML malware prediction (Drebin-trained RandomForest)
Generate a complete device report JSON

🔹 Page 2 — App Deep Analysis

Select device → Select app
View metadata (package name, version, size, SDK)
View ML malware probability (colored risk bar)
See permissions categorized by risk
See dangerous APIs detected
View full JSON output
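Page 2 works from the per-device reports in analysis_report/. Below is a rough sketch of the device and app selection step. It assumes each report is a JSON object keyed by APK file name with a `malware_prediction` value per entry, as in the report format described under Device-wide Report Generation. The helper names are hypothetical and not taken from app.py.

```python
import json
from pathlib import Path


def list_devices(report_dir="analysis_report"):
    """Return the device IDs that have a <DEVICE_ID>.json report (hypothetical helper)."""
    return sorted(path.stem for path in Path(report_dir).glob("*.json"))


def apps_by_risk(device_id, report_dir="analysis_report"):
    """Return (apk_name, malware_prediction) pairs for one device, riskiest first."""
    report_path = Path(report_dir) / f"{device_id}.json"
    report = json.loads(report_path.read_text(encoding="utf-8"))
    ranked = [
        # Entries without a prediction are treated as 0.0 (assumption)
        (apk_name, float(entry.get("malware_prediction", 0.0)))
        for apk_name, entry in report.items()
    ]
    # Highest probability first, so risky apps top the app selector
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

In the dashboard, these two lists would feed the "Select device" and "Select app" widgets.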

🔹 Machine Learning Malware Probability

A color-coded horizontal bar maps the probability to a risk level:

< 30% → 🟩 Safe
30–60% → 🟨 Suspicious
> 60% → 🟥 High Risk
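The thresholds above can be expressed as a small mapping function. This is a sketch, not the dashboard's actual code. The README does not say which band an exact 30% or 60% falls into, so here both boundary values count as Suspicious (an assumption). Comparisons are made on the raw 0 to 1 probability.

```python
# Band edges from the README; exact-boundary handling is an assumption
SAFE_BELOW = 0.30
SUSPICIOUS_UP_TO = 0.60
BAND_ICONS = {"Safe": "🟩", "Suspicious": "🟨", "High Risk": "🟥"}


def risk_level(probability: float) -> str:
    """Map a malware probability in [0, 1] to the dashboard's risk band."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    if probability < SAFE_BELOW:
        return "Safe"
    if probability <= SUSPICIOUS_UP_TO:
        return "Suspicious"
    return "High Risk"


def risk_label(probability: float) -> str:
    """Format a probability for display, e.g. '49.2% 🟨 Suspicious'."""
    level = risk_level(probability)
    return f"{probability:.1%} {BAND_ICONS[level]} {level}"
```

With the sample report's `malware_prediction` of 0.492, `risk_label(0.492)` returns `'49.2% 🟨 Suspicious'`.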

🔹 Permission Overview

Total permissions
High-risk
Medium-risk
Low-risk

Each permission is displayed with:

Color-coded risk strip
Icon
Technical name
Category
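Risk labels come from categories.json. Its exact layout isn't documented here, so this sketch makes two assumptions. First, the file is a flat object that maps each full permission name to "high", "medium" or "low". Second, permissions missing from the file are treated as low risk. The function name is hypothetical.

```python
from collections import Counter

# Hypothetical categories.json content: full permission name -> risk level
EXAMPLE_CATEGORIES = {
    "android.permission.SEND_SMS": "high",
    "android.permission.READ_CONTACTS": "medium",
    "android.permission.INTERNET": "low",
}


def permission_overview(permissions, categories, default="low"):
    """Label each permission with its risk level and count them per level."""
    # Permissions absent from categories fall back to `default` (assumption)
    labels = {perm: categories.get(perm, default) for perm in permissions}
    counts = Counter(labels.values())  # missing keys count as 0
    summary = {
        "total": len(labels),
        "high": counts["high"],
        "medium": counts["medium"],
        "low": counts["low"],
    }
    return summary, labels
```

In the app, `categories` would be the result of `json.load` on categories.json. The `labels` dict provides the per-permission color strip, and `summary` provides the totals.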

🔹 Feature & Intent Analysis

Aho–Corasick feature hits
Binder calls
Crypto calls
Process injections
Intent receivers
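Aho–Corasick matches a whole dictionary of patterns against a text in a single pass. That makes it well suited to scanning DEX strings for many API signatures at once. Below is a compact pure-Python version of the textbook algorithm (a trie plus breadth-first failure links). It illustrates the technique rather than reproducing new_app.py, which may use a library such as pyahocorasick instead.

```python
from collections import deque


def build_automaton(patterns):
    """Build the goto, failure and output tables of an Aho–Corasick automaton."""
    goto = [{}]       # goto[state][char] -> next state; state 0 is the root
    fail = [0]        # failure link for each state
    output = [set()]  # patterns recognised on reaching each state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)

    # BFS: each failure link points to the longest proper suffix present in the trie
    queue = deque(goto[0].values())  # depth-1 states keep failure link 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output


def find_all(text, automaton):
    """Return sorted (start_index, pattern) pairs for every match in text."""
    goto, fail, output = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return sorted(hits)
```

For a feature scan, the patterns would be API signatures such as `Ljava/lang/Runtime;->exec`, and the text would be the DEX string pool. `{p for _, p in find_all(...)}` then gives the set of feature hits.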

🔹 Raw JSON Debug View

The complete technical JSON report for each app, intended for researchers.

🧠 Machine Learning Model

RandomForestClassifier — Tuned (Drebin-inspired features)

Metric	Score
🔵 Accuracy	94.08%
🟣 F1-Score	93.42%
🟠 ROC-AUC	98.46%
🟢 PR-AUC	98.11%
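For reference, the accuracy, F1 and ROC-AUC figures follow the standard definitions. This dependency-free sketch assumes label 1 = malware and a 0.5 cutoff for hard predictions, since the training threshold isn't stated. ROC-AUC is computed as the probability that a randomly chosen malware sample scores higher than a randomly chosen benign one, with ties counted as half. PR-AUC is omitted for brevity.

```python
def threshold(scores, cutoff=0.5):
    """Turn probabilities into hard 0/1 predictions (cutoff is an assumption)."""
    return [1 if s > cutoff else 0 for s in scores]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (malware) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """P(random malware score > random benign score), ties counted as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop in `roc_auc` is O(n·m), which is fine for illustration but slow for large test sets. A sort-based rank computation gives the same value faster.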

The model outputs a single malware probability:

`malware_probability` → 0 to 1

This value drives the color-coded risk bar in the dashboard.

🏗 Project Architecture

streamlit_app.py      → Main multi-page UI
    ├── Page: Pull & Analyze (pull_analyze_st.py)
    ├── Page: App Deep Analysis (app.py)
    │
    ├── Util: new_app.py               # Static analyzer
    ├── Util: feature_pipeline.py      # Feature engineering
    ├── Util: final_pipeline.py        # Analysis + ML workflow
    ├── model/best_model949398.pkl     # RandomForest model
    ├── categories.json                # Permission risk labels
    ├── analysis_report/*.json         # Generated device reports
    ├── dataset/*.csv                  # ML feature CSVs

Everything is orchestrated inside Streamlit; no CLI scripts are required.

⚙️ Installation

1️⃣ Clone the repo

git clone https://github.com/tejas-130704/Malware-Detection.git
cd Malware-Detection

2️⃣ Install Dependencies

pip install -r requirements.txt

3️⃣ Enable USB debugging on Android

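Settings → Developer Options → USB Debugging ON (if Developer Options is hidden, enable it by tapping Build number seven times under Settings → About phone). ADB must also be installed on the computer and available on your PATH.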

4️⃣ Run the Application

streamlit run streamlit_app.py

🚀 Features Overview

✔ 1. Pull APKs from any Android device

Using USB debugging and ADB, pull_analyze_st.py automatically pulls the APKs of all installed apps from the connected device.
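A minimal sketch of the pull step using plain adb commands. It assumes one folder per device serial under extracted_apks/ and local file names derived from the package name; both are inferred from the report format, not confirmed from pull_analyze_st.py. `pm list packages -f` prints lines of the form `package:<apk path>=<package name>`. For split APKs, only the base APK path is listed.

```python
import subprocess
from pathlib import Path


def parse_package_list(output):
    """Parse `pm list packages -f` output into {package_name: apk_path_on_device}."""
    packages = {}
    for line in output.splitlines():
        line = line.strip()  # also drops a trailing '\r' from Windows adb output
        if not line.startswith("package:"):
            continue
        # The device path itself may contain '=', so split on the last one only
        apk_path, _, package = line[len("package:"):].rpartition("=")
        if apk_path and package:
            packages[package] = apk_path
    return packages


def pull_all_apks(serial, out_dir="extracted_apks"):
    """Pull the base APK of every installed package from one device (hypothetical layout)."""
    listing = subprocess.run(
        ["adb", "-s", serial, "shell", "pm", "list", "packages", "-f"],
        capture_output=True, text=True, check=True,
    ).stdout
    target = Path(out_dir) / serial
    target.mkdir(parents=True, exist_ok=True)
    for package, apk_path in parse_package_list(listing).items():
        # com.example.app -> com_example_app.apk, mirroring the report's keys
        local_file = target / (package.replace(".", "_") + ".apk")
        subprocess.run(["adb", "-s", serial, "pull", apk_path, str(local_file)], check=True)
    return target
```

Adding `-3` to the `pm list packages` command restricts the listing to third-party apps. This skips system packages and shortens a full-device scan considerably.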

✔ 2. Static APK Analysis

Runs a custom-built analyzer (new_app.py) that extracts:

AndroidManifest.xml
Permissions
Intents
Dangerous API signatures & patterns
DEX string-based detection via Aho–Corasick
SDK info
App metadata (name, package, size, version)
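Inside an APK, AndroidManifest.xml is stored in Android's binary XML format. It has to be decoded first, for example with apktool or androguard, before it can be parsed as text. Given an already-decoded manifest, the metadata, permission and intent fields can be extracted with the standard library, as sketched below. The function name and output keys are illustrative, and attributes absent from the manifest come back as None.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in this XML namespace
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def _names(root, tag):
    """Collect android:name values of all <tag> elements, skipping unnamed ones."""
    return {el.get(ANDROID_NS + "name") for el in root.iter(tag)} - {None}


def parse_manifest(xml_text):
    """Extract metadata, permissions and intent-filter actions from decoded manifest XML."""
    root = ET.fromstring(xml_text)
    uses_sdk = root.find("uses-sdk")
    sdk = uses_sdk.attrib if uses_sdk is not None else {}
    return {
        "package": root.get("package"),
        "version": root.get(ANDROID_NS + "versionName"),
        "min_sdk": sdk.get(ANDROID_NS + "minSdkVersion"),
        "target_sdk": sdk.get(ANDROID_NS + "targetSdkVersion"),
        "permissions": sorted(_names(root, "uses-permission")),
        "intent_actions": sorted(_names(root, "action")),
    }
```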

✔ 3. Feature Engineering

feature_pipeline.py converts raw extracted data into a structured ML-ready vector:

Permission binaries
API-call existence flags
Intent signals
Weight-based features (Drebin-style)
Category scores (from categories.json)
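Drebin-style features are binary indicators over a fixed vocabulary, so encoding an app reduces to checking which vocabulary entries were observed. The column list in this sketch is hypothetical and far shorter than the real one; in practice, it must match the columns of the training CSVs in dataset/, in the same order. Normalizing permissions to their short names (android.permission.SEND_SMS → SEND_SMS) is also an assumption about the dataset's naming.

```python
# Hypothetical column order; the real one must match the model's training CSVs
FEATURE_COLUMNS = [
    "SEND_SMS",
    "READ_PHONE_STATE",
    "INTERNET",
    "android.intent.action.BOOT_COMPLETED",
    "Ljava/lang/Runtime;->exec",
]


def to_feature_vector(permissions, intents, api_hits, columns=FEATURE_COLUMNS):
    """Return a 0/1 list with one entry per column: 1 if that feature was observed."""
    # android.permission.SEND_SMS -> SEND_SMS (naming assumption)
    observed = {perm.rsplit(".", 1)[-1] for perm in permissions}
    observed |= set(intents) | set(api_hits)
    # Observed items outside the vocabulary are simply ignored
    return [1 if col in observed else 0 for col in columns]
```

Because the vector is indexed by a fixed column list, features the model has never seen cannot shift the layout. Missing features become 0 rather than causing a shape mismatch at prediction time.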

✔ 4. ML Malware Classification

Final classification is done by the best-performing model, the tuned RandomForest stored in model/best_model949398.pkl.

Its probability output is used as the malware risk score (0%–100%).
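Here is a sketch of the scoring step for a scikit-learn-style classifier. `predict_proba` returns one column per class in the order of `model.classes_`, so the code looks up the malware column explicitly instead of assuming it is column 1. It assumes malware is labeled 1. The loader uses `pickle`; if the model was saved with joblib, `joblib.load` is the matching call. Only unpickle files you trust, since unpickling can run arbitrary code.

```python
import pickle


def load_model(path="model/best_model949398.pkl"):
    """Load the pickled classifier (only unpickle files from a trusted source)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def malware_probability(model, feature_vector, malware_label=1):
    """Return P(malware) for one feature vector from a fitted classifier."""
    # predict_proba expects a 2-D input: one row per sample
    probabilities = model.predict_proba([feature_vector])[0]
    # Columns follow model.classes_, so find the malware column explicitly
    column = list(model.classes_).index(malware_label)
    return float(probabilities[column])
```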

✔ 5. Device-wide Report Generation

final_pipeline.py scans a device folder and produces a JSON report:

analysis_report/<DEVICE_ID>.json

Each APK entry contains:

"com_example_app.apk": {
  "apk_metadata": {
    "package": "com.example.app",
    "version": "1.2.3",
    "min_sdk": "26",
    "target_sdk": "33",
    "apk_size": 21500321
  },
  "permissions_true": [..],
  "intents_true": [..],
  "features_true": [..],
  "malware_prediction": 0.492,
  "file_name": "com_example_app_results.csv"
}
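A sketch of how per-APK results could be assembled into that file. Field names are copied from the sample entry above. The helper name, the `<stem>_results.csv` naming rule and the three-decimal rounding are inferred from the sample rather than taken from final_pipeline.py.

```python
import json
from pathlib import Path


def write_device_report(device_id, apk_results, report_dir="analysis_report"):
    """Write {apk_file_name: entry} for one device to <report_dir>/<device_id>.json."""
    report = {}
    for apk_name, result in sorted(apk_results.items()):
        stem = apk_name[:-4] if apk_name.endswith(".apk") else apk_name
        report[apk_name] = {
            "apk_metadata": result["apk_metadata"],
            "permissions_true": result.get("permissions_true", []),
            "intents_true": result.get("intents_true", []),
            "features_true": result.get("features_true", []),
            # Rounded to 3 decimals, matching the sample value 0.492
            "malware_prediction": round(float(result["malware_prediction"]), 3),
            "file_name": f"{stem}_results.csv",
        }
    out_path = Path(report_dir) / f"{device_id}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return out_path
```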

🧭 System Workflow Diagram

                           ┌──────────────────────────┐
                           │    Android Device        │
                           │  (USB Debugging Enabled) │
                           └───────────────┬──────────┘
                                           │  adb pull
                                           ▼
                        ┌────────────────────────────────┐
                        │      pull_analyze_st.py        │
                        │ Pull all APKs from the device  │
                        │ Save inside /extracted_apks    │
                        └──────────────────┬─────────────┘
                                           │
                                           ▼
                    ┌──────────────────────────────────────────┐
                    │               new_app.py                 │
                    │        Static APK Analyzer               │
                    │  • Extract Manifest XML                  │
                    │  • Parse Permissions & Intents           │
                    │  • Aho–Corasick DEX Feature Scan         │
                    │  • SDK, App Size, Version Info           │
                    └──────────────────┬───────────────────────┘
                                       │ CSV output
                                       ▼
                        ┌──────────────────────────────────┐
                        │       feature_pipeline.py        │
                        │   Build ML Feature Vector         │
                        │ • Drebin-style Feature Mapping    │
                        │ • Permission → Feature Encoding   │
                        │ • API → Feature Encoding          │
                        │ • categories.json Risk Weights    │
                        └──────────────────┬────────────────┘
                                           │
                                           ▼
                      ┌─────────────────────────────────────────┐
                      │    RandomForest ML Malware Model        │
                      │    best_model949398.pkl (Trained on     │
                      │           Drebin Dataset)               │
                      │   Outputs: malware_prediction (0–1)     │
                      └──────────────────┬──────────────────────┘
                                         │
                                         ▼
                   ┌────────────────────────────────────────────┐
                   │         final_pipeline.py                  │
                   │  Combine (Analyzer + Features + ML)        │
                   │  Generate JSON report per device:          │
                   │  analysis_report/<DEVICE_ID>.json          │
                   └──────────────────┬─────────────────────────┘
                                      │
                                      ▼
             ┌────────────────────────────────────────────────────────┐
             │                   Streamlit Dashboard                  │
             │                    streamlit_app.py                    │
             │  • Select Device                                       │
             │  • Select App                                          │
             │  • Show Metadata (package, version, size, SDK)         │
             │  • Malware ML Probability Bar (Green→Red)              │
             │  • High / Medium / Low Risk Permissions                │
             │  • Permission Cards with Color Indicators              │
             │  • Feature & Intent Breakdown                          │
             │  • Full JSON App Report                                │
             └────────────────────────────────────────────────────────┘
