tejas-130704/Malware-Detection

🌐 Android Malware Analysis & ML Risk Scoring Dashboard

Static APK Analysis → Feature Engineering → ML Malware Detection → Visual Dashboard

This project is a complete end-to-end pipeline that analyzes Android applications (APKs), extracts static features, classifies each app as malware or benign, and displays a detailed security report in a Streamlit dashboard.

Built using:

  • Python
  • Streamlit Dashboard
  • Drebin Malware Dataset
  • RandomForest (best model)
  • Aho–Corasick Feature Scanner
  • Static Manifest + Permission Parser
  • Custom Feature Engineering Pipeline

🚀 What This Project Does

This system performs end-to-end Android malware analysis, fully automated and fully visual.

Everything runs inside one Streamlit application:

streamlit run streamlit_app.py

Inside the dashboard, you get:

🔹 Page 1: Pull, Analyze & Predict

  • Connect your Android device
  • Pull installed APKs (USB debugging)
  • Run static analysis
  • Run feature engineering
  • Run ML malware prediction (Drebin-trained RandomForest)
  • Generate a complete device report in JSON

🔹 Page 2: App Deep Analysis

  • Select device → Select app
  • View metadata (package name, version, size, SDK)
  • View ML malware probability (colored risk bar)
  • See permissions categorized by risk
  • See dangerous APIs detected
  • View full JSON output

🔹 Machine Learning Malware Probability


A color-coded horizontal bar maps the probability to a risk level:

  • < 30% → 🟩 Safe
  • 30–60% → 🟨 Suspicious
  • > 60% → 🟥 High Risk
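The three risk bands can be expressed as a small Python function. This is a minimal sketch, not the dashboard's actual code: the name risk_level is hypothetical, and since the README does not say which band an exact 30% or 60% belongs to, the sketch assumes both boundaries count as Suspicious. It compares the raw probability against 0.30 and 0.60 directly.

```python
def risk_level(malware_probability: float) -> str:
    """Map a malware probability in [0, 1] to the dashboard's risk band.

    Bands follow the README: < 30% Safe, 30-60% Suspicious, > 60% High Risk.
    Assumption: exactly 0.30 and exactly 0.60 are treated as Suspicious.
    """
    if not 0.0 <= malware_probability <= 1.0:
        raise ValueError("malware_probability must be between 0 and 1")
    if malware_probability < 0.30:
        return "Safe"
    if malware_probability <= 0.60:
        return "Suspicious"
    return "High Risk"


for p in (0.12, 0.492, 0.87):
    print(f"{p:.1%} -> {risk_level(p)}")
```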

🔹 Permission Overview

  • Total permissions
  • High-risk
  • Medium-risk
  • Low-risk

Each permission is displayed with:

  • Color-coded risk strip
  • Icon
  • Technical name
  • Category
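A per-app permission summary like the one above could be computed along these lines. The categories.json layout used here (a flat mapping from permission name to "high", "medium" or "low") is an assumption for illustration only; the project's real file may be structured differently. Permissions missing from the mapping are counted as "unknown" rather than silently treated as low risk.

```python
import json
from collections import Counter

# ASSUMED categories.json layout (hypothetical): permission name -> risk level.
categories = json.loads("""
{
  "android.permission.SEND_SMS": "high",
  "android.permission.READ_CONTACTS": "high",
  "android.permission.CAMERA": "medium",
  "android.permission.INTERNET": "low"
}
""")


def summarize_permissions(permissions, categories):
    """Count an app's permissions per risk level; unmapped ones become 'unknown'."""
    counts = Counter(categories.get(perm, "unknown") for perm in permissions)
    return {
        "total": len(permissions),
        "high": counts["high"],
        "medium": counts["medium"],
        "low": counts["low"],
        "unknown": counts["unknown"],
    }


app_permissions = [
    "android.permission.SEND_SMS",
    "android.permission.INTERNET",
    "com.example.CUSTOM_PERMISSION",
]
print(summarize_permissions(app_permissions, categories))
```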

🔹 Feature & Intent Analysis

  • Aho–Corasick feature hits
  • Binder calls
  • Crypto calls
  • Process injections
  • Intent receivers

🔹 Raw JSON Debug View

The complete technical report as raw JSON, intended for researchers.


🧠 Machine Learning Model

RandomForestClassifier, tuned (Drebin-inspired features)

  • 🔵 Accuracy: 94.08%
  • 🟣 F1-Score: 93.42%
  • 🟠 ROC-AUC: 98.46%
  • 🟢 PR-AUC: 98.11%
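The four metrics listed above are standard scikit-learn metrics. The sketch below shows how such numbers are typically computed for a RandomForest, using a synthetic binary dataset from make_classification as a stand-in for Drebin features, so its scores will not match the ones reported here. It assumes label 1 means malware and that PR-AUC is reported as average precision, which is one common summary of the precision-recall curve.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a Drebin-style binary feature matrix (NOT real data).
X, y = make_classification(n_samples=2000, n_features=40, n_informative=10,
                           random_state=0)
X = (X > 0).astype(int)  # binarize to mimic present/absent features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard labels: accuracy and F1
y_score = model.predict_proba(X_test)[:, 1]  # P(label 1 = malware): ROC-AUC, PR-AUC

metrics = {
    "Accuracy": accuracy_score(y_test, y_pred),
    "F1-Score": f1_score(y_test, y_pred),
    "ROC-AUC": roc_auc_score(y_test, y_score),
    "PR-AUC": average_precision_score(y_test, y_score),  # average precision
}
for name, value in metrics.items():
    print(f"{name:>8}: {value:.2%}")
```

Accuracy and F1 use thresholded labels, while the two AUC metrics need the continuous probability, which is why both predict and predict_proba are called.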

The model directly predicts malware_probability, a value from 0 to 1 (stored as malware_prediction in the device report JSON), which is used to compute the UI risk visualization.


πŸ— Project Architecture

streamlit_app.py      → Main multi-page UI
    ├── Page: Pull & Analyze (pull_analyze_st.py)
    ├── Page: App Deep Analysis (app.py)
    │
    ├── Util: new_app.py               # Static analyzer
    ├── Util: feature_pipeline.py      # Feature engineering
    ├── Util: final_pipeline.py        # Analysis + ML workflow
    ├── model/best_model949398.pkl     # RandomForest model
    ├── categories.json                # Permission risk labels
    ├── analysis_report/*.json         # Generated device reports
    └── dataset/*.csv                  # ML feature CSVs

Everything is orchestrated inside Streamlit; no CLI scripts are required.


βš™οΈ Installation

1️⃣ Clone the repo

git clone https://github.com/tejas-130704/Malware-Detection.git
cd Malware-Detection

2️⃣ Install Dependencies

pip install -r requirements.txt

3️⃣ Enable USB debugging on Android

Settings → Developer Options → USB Debugging: ON
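Before launching the dashboard, it can help to confirm that the host actually sees the phone. This sketch assumes the adb binary (Android SDK platform-tools) is installed and on PATH; the function names are hypothetical. It keeps only serials in the device state, skipping entries reported as unauthorized (the USB debugging prompt on the phone was not accepted) or offline.

```python
import subprocess


def parse_adb_devices(output: str) -> list[str]:
    """Return serials whose state is 'device' (online and authorized)."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Device lines look like "<serial>\t<state>"; the header line and
        # daemon status messages have a different number of tokens.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices() -> list[str]:
    """Run `adb devices` (adb must be on PATH) and return ready serials."""
    result = subprocess.run(["adb", "devices"], capture_output=True,
                            text=True, check=True)
    return parse_adb_devices(result.stdout)


sample = "List of devices attached\nR58M12ABCDE\tdevice\nemulator-5554\tunauthorized\n"
print(parse_adb_devices(sample))
```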

4️⃣ Run the Application

streamlit run streamlit_app.py

📷 Screenshots

Screenshot 2025-11-28 184927 Screenshot 2025-11-28 184740 Screenshot 2025-11-28 184628 Screenshot 2025-11-28 184658

🚀 Features Overview

✔ 1. Pull APKs from a connected Android device

Using USB debugging and ADB, pull_analyze_st.py automatically extracts the APKs of all installed apps.
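The pull step can be sketched with two standard ADB commands: adb shell pm list packages -f, which prints one package:<apk_path>=<package_name> line per installed app, and adb pull. This is an illustration, not the code in pull_analyze_st.py: the helper names, the extracted_apks/<serial>/ layout and the com_example_app.apk file naming are assumptions. The package name is split off at the last =, because APK paths on recent Android versions can themselves contain = characters. Only each app's base APK is pulled; split APKs are ignored.

```python
import subprocess
from pathlib import Path


def parse_package_paths(pm_output: str) -> dict[str, str]:
    """Map package name -> APK path from `pm list packages -f` output."""
    packages = {}
    for line in pm_output.splitlines():
        line = line.strip()
        if not line.startswith("package:"):
            continue
        # Split at the LAST '=': newer APK paths may contain '=' themselves.
        apk_path, _, name = line[len("package:"):].rpartition("=")
        if apk_path and name:
            packages[name] = apk_path
    return packages


def pull_apks(serial: str, out_dir: str = "extracted_apks") -> list[Path]:
    """Pull each app's base APK from one device (hypothetical helper)."""
    target = Path(out_dir) / serial
    target.mkdir(parents=True, exist_ok=True)
    listing = subprocess.run(
        ["adb", "-s", serial, "shell", "pm", "list", "packages", "-f"],
        capture_output=True, text=True, check=True,
    ).stdout
    pulled = []
    for name, apk_path in parse_package_paths(listing).items():
        # com.example.app -> com_example_app.apk (assumed naming scheme)
        local = target / (name.replace(".", "_") + ".apk")
        result = subprocess.run(["adb", "-s", serial, "pull", apk_path, str(local)],
                                capture_output=True, text=True)
        if result.returncode == 0:  # some system APKs may be unreadable; skip them
            pulled.append(local)
    return pulled


sample = (
    "package:/data/app/~~Xy1==/com.example.app-Ab2==/base.apk=com.example.app\n"
    "package:/system/app/Camera/Camera.apk=com.android.camera\n"
)
print(parse_package_paths(sample))
```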

✔ 2. Static APK Analysis

A custom-built analyzer (new_app.py) extracts:

  • AndroidManifest.xml
  • Permissions
  • Intents
  • Dangerous API signatures & patterns
  • DEX string-based detection via Aho–Corasick
  • SDK info
  • App metadata (name, package, size, version)
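Aho–Corasick builds one automaton from all patterns and then scans the input once, finding every pattern occurrence in time linear in the text length plus the number of matches. This makes it a good fit for matching many API signatures against the strings of a DEX file. Below is a minimal pure-Python sketch of the algorithm. Whether new_app.py uses its own implementation or a library is not stated here, and the example patterns and DEX strings are illustrative.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho–Corasick automaton for multi-pattern substring search."""

    def __init__(self, patterns):
        self.goto = [{}]    # trie edges per state
        self.fail = [0]     # failure link per state
        self.out = [set()]  # patterns recognized when reaching each state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append(set())
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].add(pattern)

    def _build_failure_links(self):
        # Breadth-first: a state's failure target is always shallower, so its
        # output set is complete before it is merged into deeper states.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, child in self.goto[state].items():
                queue.append(child)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                self.out[child] |= self.out[self.fail[child]]

    def search(self, text):
        """Return the set of patterns that occur anywhere in text."""
        state, found = 0, set()
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            found |= self.out[state]
        return found


# Illustrative DEX string pool and API patterns (not the project's real list).
dex_strings = "\n".join([
    "Landroid/telephony/SmsManager;",
    "sendTextMessage",
    "Ljavax/crypto/Cipher;",
    "onCreate",
])
scanner = AhoCorasick(["sendTextMessage", "getDeviceId", "javax/crypto/Cipher"])
print(sorted(scanner.search(dex_strings)))
```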

✔ 3. Feature Engineering

feature_pipeline.py converts raw extracted data into a structured ML-ready vector:

  • Permission binaries
  • API-call existence flags
  • Intent signals
  • Weight-based features (Drebin-style)
  • Category scores (from categories.json)
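A minimal sketch of the encoding step, assuming a fixed, ordered feature vocabulary in which every permission, API hit or intent found in the APK sets its column to 1. The FEATURE_NAMES list, the risk weights, the permission-to-level mapping and the helper names are all hypothetical; in practice the columns and their order must match exactly what the model was trained on. The category score simply sums per-permission weights looked up through an assumed categories.json-style mapping.

```python
# HYPOTHETICAL vocabulary: the real columns and their order come from training.
FEATURE_NAMES = [
    "android.permission.SEND_SMS",
    "android.permission.READ_PHONE_STATE",
    "android.permission.INTERNET",
    "sendTextMessage",
    "getDeviceId",
    "android.intent.action.BOOT_COMPLETED",
]

# HYPOTHETICAL weights per risk level and an assumed permission -> level map.
RISK_WEIGHTS = {"high": 3, "medium": 2, "low": 1}
CATEGORIES = {
    "android.permission.SEND_SMS": "high",
    "android.permission.READ_PHONE_STATE": "high",
    "android.permission.INTERNET": "low",
}


def to_feature_vector(permissions, api_hits, intents, feature_names=FEATURE_NAMES):
    """Binary presence vector in a fixed column order (1 = seen in the APK)."""
    observed = set(permissions) | set(api_hits) | set(intents)
    return [1 if name in observed else 0 for name in feature_names]


def category_score(permissions, categories=CATEGORIES, weights=RISK_WEIGHTS):
    """Sum of risk weights over the app's distinct permissions; unmapped ones add 0."""
    return sum(weights.get(categories.get(p, ""), 0) for p in set(permissions))


perms = ["android.permission.SEND_SMS", "android.permission.INTERNET"]
apis = ["sendTextMessage"]
intents = []
print(to_feature_vector(perms, apis, intents), category_score(perms))
```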

✔ 4. ML Malware Classification

Final classification is done by the best-performing model, the tuned RandomForest (model/best_model949398.pkl). Its probability output is used as the malware risk score (0%–100%).
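Scoring a single app might look like the sketch below. It assumes the .pkl file is a plain pickle of a scikit-learn classifier (a joblib dump would need joblib.load instead) and that class label 1 means malware; the helper names are hypothetical. Because the real model file is not available here, a toy RandomForest is pickled to a temporary file and reloaded. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier


def load_model(path):
    """Load a pickled classifier (assumes plain pickle, not a joblib dump)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def malware_probability(model, feature_vector, malware_label=1):
    """Probability of the malware class for a single feature vector."""
    proba = model.predict_proba([feature_vector])[0]
    return float(proba[list(model.classes_).index(malware_label)])


# Toy stand-in for model/best_model949398.pkl, trained on made-up vectors.
toy = RandomForestClassifier(n_estimators=50, random_state=0)
toy.fit([[0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]], [0, 1, 0, 1])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_model.pkl"
    path.write_bytes(pickle.dumps(toy))
    model = load_model(path)
    p = malware_probability(model, [1, 1, 0])
    print(f"malware risk score: {p:.0%}")
```

Looking up the column through model.classes_ avoids silently reading the benign-class probability if the labels are ordered differently.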

✔ 5. Device-wide Report Generation

final_pipeline.py scans a device folder and produces a JSON report:

analysis_report/<DEVICE_ID>.json

Each APK entry contains:

"com_example_app.apk": {
  "apk_metadata": {
    "package": "com.example.app",
    "version": "1.2.3",
    "min_sdk": "26",
    "target_sdk": "33",
    "apk_size": 21500321
  },
  "permissions_true": [..],
  "intents_true": [..],
  "features_true": [..],
  "malware_prediction": 0.492,
  "file_name": "com_example_app_results.csv"
}
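Because the report is plain JSON, it is easy to post-process outside the dashboard. This sketch assumes the file's top level maps APK file names to entries shaped like the one above and ranks apps by malware_prediction; the function name and the sample data are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def riskiest_apps(report_path, top_n=5):
    """Return (apk_name, package, malware_prediction) tuples, highest risk first.

    Assumes the report's top level maps APK file names to per-app entries.
    """
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    rows = [
        (apk, entry.get("apk_metadata", {}).get("package", "?"),
         entry["malware_prediction"])
        for apk, entry in report.items()
        if "malware_prediction" in entry
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top_n]


# Made-up sample data shaped like the entry above.
sample = {
    "com_example_app.apk": {"apk_metadata": {"package": "com.example.app"},
                            "malware_prediction": 0.492},
    "com_other_app.apk": {"apk_metadata": {"package": "com.other.app"},
                          "malware_prediction": 0.913},
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "DEVICE_ID.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    for apk, package, score in riskiest_apps(path):
        print(f"{score:6.1%}  {package}  ({apk})")
```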

🧭 System Workflow Diagram

    ┌──────────────────────────────────────────────────┐
    │  Android Device                                  │
    │  (USB Debugging Enabled)                         │
    └────────────────────────┬─────────────────────────┘
                             │  adb pull
                             ▼
    ┌──────────────────────────────────────────────────┐
    │  pull_analyze_st.py                              │
    │  Pull all APKs from the device                   │
    │  Save inside /extracted_apks                     │
    └────────────────────────┬─────────────────────────┘
                             │
                             ▼
    ┌──────────────────────────────────────────────────┐
    │  new_app.py                                      │
    │  Static APK Analyzer                             │
    │  • Extract Manifest XML                          │
    │  • Parse Permissions & Intents                   │
    │  • Aho–Corasick DEX Feature Scan                 │
    │  • SDK, App Size, Version Info                   │
    └────────────────────────┬─────────────────────────┘
                             │  CSV output
                             ▼
    ┌──────────────────────────────────────────────────┐
    │  feature_pipeline.py                             │
    │  Build ML Feature Vector                         │
    │  • Drebin-style Feature Mapping                  │
    │  • Permission → Feature Encoding                 │
    │  • API → Feature Encoding                        │
    │  • categories.json Risk Weights                  │
    └────────────────────────┬─────────────────────────┘
                             │
                             ▼
    ┌──────────────────────────────────────────────────┐
    │  RandomForest ML Malware Model                   │
    │  best_model949398.pkl                            │
    │  (Trained on Drebin Dataset)                     │
    │  Outputs: malware_prediction (0–1)               │
    └────────────────────────┬─────────────────────────┘
                             │
                             ▼
    ┌──────────────────────────────────────────────────┐
    │  final_pipeline.py                               │
    │  Combine (Analyzer + Features + ML)              │
    │  Generate JSON report per device:                │
    │  analysis_report/<DEVICE_ID>.json                │
    └────────────────────────┬─────────────────────────┘
                             │
                             ▼
    ┌──────────────────────────────────────────────────┐
    │  Streamlit Dashboard                             │
    │  streamlit_app.py                                │
    │  • Select Device                                 │
    │  • Select App                                    │
    │  • Show Metadata (package, version, size, SDK)   │
    │  • Malware ML Probability Bar (Green→Red)        │
    │  • High / Medium / Low Risk Permissions          │
    │  • Permission Cards with Color Indicators        │
    │  • Feature & Intent Breakdown                    │
    │  • Full JSON App Report                          │
    └──────────────────────────────────────────────────┘

About

A complete Android malware analysis system that pulls APKs from a connected device, performs static analysis, generates ML features, predicts malware risk using a trained RandomForest model, and displays everything through an interactive Streamlit dashboard with detailed per-app security insights.
