This Python project analyzes web server access logs in Common Log Format and flags potential security anomalies such as brute-force attempts, web scanning activity, and server issues. Using only standard-library modules (re for parsing, collections for counting), it extracts statistics from a log file and generates a security report.
- Log Parsing: Uses regular expressions (re) to process the Common Log Format typically generated by Apache/Nginx web servers.
- Statistical Summary: Provides core metrics, including total requests, total bytes transferred, and HTTP status code distribution.
- Anomaly Detection:
  - Brute-Force Detection: Flags repetitive instances of 401 Unauthorized or 403 Forbidden errors from the same IP address.
  - Web Scanning: Identifies aggressive probing for sensitive files based on a high volume of 404 Not Found errors (e.g., searching for /wp-config.php, .env files).
  - Volume Anomaly: Lists IP addresses that exceed a defined request threshold, indicating potential DDoS or heavy automated scraping.
- Clean Reporting: Presents analysis results in a structured, easy-to-read format in the terminal.
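The three anomaly checks listed above all reduce to counting status codes per IP address and comparing the counts to thresholds. The following is a minimal sketch of that idea, not the project's actual code: the function name `flag_suspicious_ips`, the threshold constants and their values, and the `(ip, status)` input shape are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds, chosen only for illustration; tune them to your traffic.
AUTH_FAILURE_THRESHOLD = 5    # more than this many 401/403 responses per IP
NOT_FOUND_THRESHOLD = 10      # more than this many 404 responses per IP
REQUEST_THRESHOLD = 100       # more than this many requests per IP


def flag_suspicious_ips(events):
    """Group (ip, status) pairs by IP and apply the three threshold checks.

    `events` is an iterable of (ip, status) tuples with integer status codes.
    Returns a dict mapping each finding type to {ip: offending_count}.
    """
    per_ip = defaultdict(Counter)
    for ip, status in events:
        per_ip[ip][status] += 1

    findings = {"brute_force": {}, "scanning": {}, "high_volume": {}}
    for ip, statuses in per_ip.items():
        # Counter returns 0 for status codes that were never seen.
        auth_failures = statuses[401] + statuses[403]
        if auth_failures > AUTH_FAILURE_THRESHOLD:
            findings["brute_force"][ip] = auth_failures
        if statuses[404] > NOT_FOUND_THRESHOLD:
            findings["scanning"][ip] = statuses[404]
        total = sum(statuses.values())
        if total > REQUEST_THRESHOLD:
            findings["high_volume"][ip] = total
    return findings


if __name__ == "__main__":
    # Made-up events using documentation-reserved IP addresses.
    sample = [("203.0.113.7", 401)] * 6 + [("198.51.100.9", 404)] * 12
    print(flag_suspicious_ips(sample))
```

Keeping a separate Counter per IP means one pass over the log is enough to answer all three questions, at the cost of holding one small counter per distinct client in memory.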
Prerequisites
- Python 3.x (Uses standard library modules; no external packages required.)
Running the Tool
- Clone the Repository:

      git clone https://github.com/YourUsername/LogAnalyzer.git
      cd LogAnalyzer

- Prepare the Log File: Place your log data file, named access.log, in the project directory. (Note: The repository includes an example access.log file with simulated attack scenarios for testing.)
- Execute the Analyzer:

      python log_file_analyzer.py

The project is structured around the following key Python functions:
- analyze_log_file(file_path): Reads the log file with encoding='latin-1' to avoid decoding errors, parses each line with a regular expression, and computes core statistics (status_counter, ip_counter).
- detect_anomalies(log_data): Checks the collected statistics (e.g., 5xx status codes) against predefined thresholds to identify large-scale operational failures.
- error_based_analysis(status_counter): Filters and highlights 4xx and 5xx status codes to pinpoint security-relevant risks (brute-force vs. scanning).
- generate_report(...): Compiles all findings into the final, readable report.
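To make the parsing and counting step concrete, here is a rough sketch of how a function like analyze_log_file could work. It is an illustration under assumptions rather than the repository's implementation: the regular expression, the helper names `parse_line`, `summarize`, and `summarize_file`, and the returned dictionary keys are all hypothetical.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
# This pattern is an assumption for illustration; the project's regex may differ.
CLF_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # CLF writes "-" instead of a byte count when no body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


def summarize(lines):
    """Count status codes and client IPs, and total the bytes transferred."""
    status_counter, ip_counter = Counter(), Counter()
    total_bytes = 0
    skipped_lines = 0
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            skipped_lines += 1  # malformed line: count it and move on
            continue
        status_counter[entry["status"]] += 1
        ip_counter[entry["ip"]] += 1
        total_bytes += entry["size"]
    return {
        "total_requests": sum(status_counter.values()),
        "total_bytes": total_bytes,
        "status_counter": status_counter,
        "ip_counter": ip_counter,
        "skipped_lines": skipped_lines,
    }


def summarize_file(file_path):
    # latin-1 maps every possible byte to a character, so reading the file
    # cannot raise UnicodeDecodeError even if it contains odd bytes.
    with open(file_path, encoding="latin-1") as handle:
        return summarize(handle)
```

Calling `summarize_file("access.log")` would then give the totals and counters that the later detection and reporting steps consume. Because the pattern is not anchored at the end of the line, it also accepts lines with extra trailing fields, such as a referer and user agent.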
This project is intended for educational and research purposes only. Performing brute-force attacks on systems without explicit permission is illegal. The author is not responsible for any misuse of this code.