Skip to content

Latest commit

 

History

History
252 lines (182 loc) · 6.56 KB

File metadata and controls

252 lines (182 loc) · 6.56 KB

TODO for Main Branch (Modernization)

Branch: main (default) Purpose: Modern, production-ready implementation Status: Ready to start modernization


Session Complete: Archive Branches ✅

Archive work is DONE and merged to main:

  • archive-2020-research - Original code preserved with all flaws
  • archive-2020-fixed - Bug fixes with 2020 dependencies
  • ✅ All critical bugs fixed and tested
  • ✅ Data leakage validated (minimal 1.2% impact)
  • ✅ Comprehensive documentation created

Key finding: Original 2020 research was scientifically sound. The 99%+ accuracy is legitimate due to highly distinctive botnet traffic patterns.


Session Complete: Modernization Planning ✅

Date: 2025-10-26

Completed:

  • ✅ Comprehensive modernization roadmap (1600+ lines)
  • ✅ Architecture documentation created
  • ✅ Graphviz diagram generation scripts (current + target systems)
  • ✅ 5 new GitHub issues created (#25-#29)
  • ✅ Updated existing issues (#17, #22, #23)

Key Documents Created:

  • docs/MODERNIZATION_ROADMAP.md - Complete 7-phase plan
  • docs/MODERNIZATION_SUMMARY.md - Session summary
  • docs/architecture/ARCHITECTURE.md - System architecture
  • docs/architecture/diagrams/ - Diagram generation scripts

GitHub Issues Created:

  • #25: Architecture Visualization with Graphviz
  • #26: Directory Restructuring - Modern src/ Layout
  • #27: GitHub Actions CI/CD Pipeline
  • #28: Implement Federated Learning with Flower
  • #29: Comprehensive Documentation Overhaul

Sprint 1 Complete: Architecture Visualization ✅

Date Completed: 2025-10-26 Committed: d56d93e Pushed: ✅ GitHub Priority: HIGH Status: COMPLETED

Phase 1.1: Architecture Visualization

Issue: #25 ✅

Completed Tasks:

  • Environment created (botnet-modern: Python 3.12.12, TF 2.19.1)
  • Graphviz installed (system binary + Python wrapper)
  • Generated current system diagrams (4 diagrams)
  • Generated target system diagrams (7 diagrams)
  • Reviewed and validated diagram quality
  • Integrated diagrams into README.md
  • Updated badges to reflect modernization status
  • Cleaned up repository (removed SVG files, kept PNG + .gv source)

Deliverables:

  • 11 PNG diagrams with clear naming (current_* vs target_*)
  • Graphviz .gv sources (local only, gitignored)
  • Updated README with Architecture section and badges
  • Comprehensive planning docs (ROADMAP, STACK, etc.)
  • Environment: Python 3.12.12, TensorFlow 2.19.1, Flower 1.22

Commands for Regeneration:

# Activate modern environment
conda activate botnet-modern

# Regenerate all diagrams
cd docs/architecture/diagrams
python generate_diagrams.py

# Output:
# - current_*.png (4 files in images/current/)
# - target_*.png (7 files in images/)
# - *.gv sources (local only, gitignored)

Repository Cleanup:

  • 22 files removed (SVG, .bat/.sh, summaries)
  • Clear naming convention (current_* vs target_*)
  • .gv files local only (not committed)

Current Sprint: Sprint 2 (Weeks 3-4) - Directory Restructuring

Priority: CRITICAL Status: Ready to start

Phase 2: Directory Restructuring

Issue: #26

Tasks:

  • Review target structure in MODERNIZATION_ROADMAP.md
  • Create src/ directory structure
  • Move anomaly-detection/ → src/anomaly_detection/
  • Move classification/ → src/classification/
  • Create src/data/, src/models/, src/utils/
  • Update all import paths
  • Create tests/ directory structure
  • Add init.py files for packages
  • Update documentation with new paths
  • Verify all scripts still run

Target Structure:

src/
├── __init__.py
├── anomaly_detection/
├── classification/
├── data/
├── models/
└── utils/
tests/
├── unit/
└── integration/

Open Issues

Critical Priority:

  1. #25 - Architecture Visualization (Sprint 1)
  2. #26 - Directory Restructuring (Sprint 2)
  3. #23 - Overfitting analysis (Sprint 3)

High Priority: 4. #27 - GitHub Actions CI/CD (Sprint 4) 5. #28 - Flower Federated Learning (Sprint 5)

Medium Priority: 6. #29 - Documentation Overhaul (Sprint 6) 7. #22 - FL Research (Sprint 5-6)

Low Priority: 8. #16 - Test/train overlap (Sprint 3) 9. #17 - Modernization tracking (ongoing)


Quick Start for Next Session

# 1. Pull latest main
git checkout main
git pull origin main

# 2. Create bleeding-edge modern environment (Python 3.12)
mamba env create -f environment-modern.yaml  # Fast with mamba
# OR
conda env create -f environment-modern.yaml  # Slower with conda

conda activate botnet-modern

# 3. Verify installation
python --version  # Should be 3.12.x
python -c "import tensorflow as tf; print(f'TensorFlow: {tf.__version__}')"
python -c "import flwr; print(f'Flower: {flwr.__version__}')"

# 4. Generate architecture diagrams (First task!)
cd docs/architecture/diagrams
python generate_diagrams.py  # Generates both current and target

# 5. Download all IoT devices (later, takes 1-2 hours)
cd ../../../scripts
python download_data.py

# 6. Run overfitting analysis (Sprint 3)
cd ../analysis
python overfitting_analysis.py

New Bleeding-Edge Stack (see docs/BLEEDING_EDGE_STACK.md):

  • Python 3.12 (~25% faster than 3.8!)
  • TensorFlow 2.18 (latest stable)
  • Keras 3.6 (framework-agnostic)
  • NumPy 2.1 (major version)
  • Pandas 2.2 (modern, no .append())
  • Flower 1.13 (best FL framework)
  • SHAP 0.46 (modern explainability)

Key Files

  • analysis/overfitting_analysis.py - Tests 1-4 done, need 5-7
  • docs/project-analysis/RETROSPECTIVE.md - Full results
  • docs/project-analysis/DATA_LEAKAGE_IMPACT.md - Context

User Preferences (Important!)

  • No emojis in documentation
  • Prefer .yaml over .yml
  • No signatures in commit messages
  • Use GitHub issues/PRs for tracking
  • develop branch was deleted - main is now default

Technical Notes

Markdown Linter Warning (MD033)

  • VS Code linter complains about <div align="center"> tags in README
  • This is fine - GitHub supports HTML in markdown
  • Created .markdownlint.json to disable this overly-strict rule
  • Badges need <div> for centering

Expected TF 2.15+ Breaking Changes

  • Keras 3.0 has different API
  • Loss functions may have different defaults
  • May need to update model.compile() calls

pandas 2.x Changes

  • .append() completely removed (already fixed)
  • Stricter dtype handling
  • Some indexing behavior changes

Last Updated: 2025-10-26 Current Sprint: Sprint 1 - Foundation & Architecture Next Priority: #25 - Generate architecture diagrams