This document provides information for developers interested in contributing to or extending the PaleoNet dinosaur classification project.
-
Clone the Repository
git clone https://github.com/yourusername/PaleoNet.git
cd PaleoNet
-
Create a Virtual Environment
# Using venv
python -m venv venv
# Activate on Windows
.\venv\Scripts\activate
# Activate on macOS/Linux
source venv/bin/activate
-
Install Dependencies
pip install -r requirements.txt
PaleoNet/
├── PaleoNet.py                           # Main Streamlit application
├── utils.py                              # Utility functions
├── pages/                                # Additional app pages
│   ├── 01_Model_Info.py
│   ├── 02_Dinosaur_Encyclopedia.py
│   └── 03_Model_Performance.py
├── assets/                               # Images and static assets
│   ├── banner.png
│   └── logo.png
├── data/                                 # Dataset directory
│   └── dinosaur_dataset_split/
│       ├── train/                        # Training data (70%)
│       ├── val/                          # Validation data (15%)
│       └── test/                         # Test data (15%)
├── docs/                                 # Documentation
│   ├── development.md
│   ├── model_info.md
│   └── user_guide.md
├── model/                                # Saved model files
│   ├── best_model_checkpoint.h5
│   ├── confusion_matrix.csv
│   ├── confusion_matrix.png
│   ├── dinosaur_classifier_transfer_learning.keras
│   ├── dinosaur_class_mapping.json
│   ├── dinosaur_model_performance.json
│   ├── training_history.csv
│   ├── training_history.png
│   └── training_history_detailed.json
├── opdracht_CNN_stijnen_simon.ipynb      # Model training notebook
├── requirements.txt                      # Project dependencies
├── README.md                             # Main documentation
└── LICENSE                               # MIT License
The Jupyter notebook contains the complete workflow for:
- Loading and preprocessing the dataset
- Building the EfficientNetB0-based model
- Training with transfer learning
- Evaluating on test data
- Saving model artifacts
To retrain the model with different parameters or architectures, modify this notebook.
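The dataset directory is split 70% / 15% / 15% into train, val, and test. If you rebuild that split before retraining, the idea can be sketched as below. This is an illustration only, not the notebook's actual code: `split_dataset`, its parameters, and the file names are hypothetical, and a real image dataset would normally be split per class so that each species appears in all three parts.

```python
import random


def split_dataset(items, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle items and split them into train/val/test lists.

    The test split receives whatever remains after train and val,
    so the three parts always add up to the full input.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names, for illustration only.
files = [f"img_{i:03d}.jpg" for i in range(100)]
train, val, test = split_dataset(files)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the test set stable between runs, which matters when comparing a retrained model against the saved performance figures.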
The main application file:
- Loads the trained model
- Provides the user interface with a tabbed navigation system
- Handles image upload and processing
- Displays classification results
- Contains three main tabs: Home, Upload Image, and Sample Gallery
Additional application pages:
- 01_Model_Info.py: Displays model architecture and performance
- 02_Dinosaur_Encyclopedia.py: Information about dinosaur species
- 03_Model_Performance.py: Displays model performance metrics and visualizations
Contains helper functions for:
- Image preprocessing
- Visualization
- Model interpretation
-
Create a feature branch
git checkout -b feature/your-feature-name
-
Implement your changes
- Update application code
- Add tests for your feature (if applicable)
- Update documentation in docs/
-
Run tests locally
# If using pytest
pytest tests/
-
Create a pull request
- Provide a clear description of your changes
- Reference any related issues
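For the "add tests" and "run tests locally" steps, a test file might look like the sketch below. It assumes pytest, which discovers functions named `test_*` and treats plain `assert` statements as checks. The helper `top_k`, the file name, and the species names are hypothetical placeholders; in practice you would import the function under test from the project instead of defining it in the test file.

```python
# tests/test_predictions.py (hypothetical file name)


def top_k(probabilities, class_names, k=3):
    """Return the k most probable (class_name, probability) pairs."""
    ranked = sorted(
        zip(class_names, probabilities),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


def test_top_k_orders_by_probability():
    # Placeholder class names, not necessarily those in the dataset.
    names = ["Triceratops", "Stegosaurus", "Velociraptor"]
    probs = [0.2, 0.7, 0.1]
    assert top_k(probs, names, k=2) == [("Stegosaurus", 0.7), ("Triceratops", 0.2)]


def test_top_k_never_returns_more_than_available():
    assert len(top_k([0.6, 0.4], ["A", "B"], k=5)) == 2
```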
To improve or change the classification model:
- Open the training notebook opdracht_CNN_stijnen_simon.ipynb
- Modify the model architecture, training parameters, or data augmentation
- Retrain the model
- Evaluate performance
- Export the model artifacts:
  - dinosaur_classifier_transfer_learning.keras
  - dinosaur_class_mapping.json
  - dinosaur_model_performance.json
- Place the new model artifacts in the model/ directory
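When exporting artifacts, the class mapping has to stay in sync with the retrained model's output indices. The sketch below shows one way to write and read such a mapping. The actual layout of dinosaur_class_mapping.json is not documented here, so the index-to-name format is an assumption, and both function names are hypothetical.

```python
import json
from pathlib import Path


def save_class_mapping(class_names, path):
    """Write an index -> class name mapping as JSON (assumed layout)."""
    mapping = {str(index): name for index, name in enumerate(class_names)}
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2)


def load_class_mapping(path):
    """Read the mapping back, converting JSON string keys to integers."""
    with open(path, "r", encoding="utf-8") as f:
        raw = json.load(f)
    return {int(index): name for index, name in raw.items()}
```

JSON object keys are always strings, which is why the loader converts them back to integers before they are used to index model predictions.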
To add a new page to the Streamlit application:
- Create a new Python file in pages/ (the filename should start with a number to control ordering)
- Import needed modules, especially streamlit as st
- Set page configuration at the top
- Implement the page content
- Update documentation to reference your new page
Example:
# pages/04_Your_New_Page.py
import streamlit as st
st.set_page_config(
    page_title="PaleoNet - Your New Page",
    page_icon="🦖",
    layout="wide"
)
st.title("Your New Page Title")
st.markdown("## Your content here")
# Rest of your page implementation
The application uses robust path handling to ensure portability across different operating systems and environments. When working with file paths:
-
Always use OS-independent path construction
import os
from pathlib import Path

# Get the current file's directory
current_dir = Path(__file__).parent
# Navigate to parent directory
root_dir = current_dir.parent
# Create path to a file
file_path = os.path.join(root_dir, "model", "model_file.keras")
-
Avoid hardcoded relative paths
- Don't use: "../model/file.json"
- Instead use: os.path.join(root_dir, "model", "file.json")
-
Add error handling for file operations
import json

try:
    with open(file_path, "r") as f:
        data = json.load(f)
except FileNotFoundError:
    st.error(f"Could not find file: {file_path}")
    # Provide fallback behavior
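The three path-handling points above can be combined into one small helper. This is a sketch under assumptions: `load_json_or_default` is a hypothetical name, not a function known to exist in utils.py, and it returns a fallback value instead of calling `st.error` so that it can be used and tested outside Streamlit.

```python
import json
from pathlib import Path


def load_json_or_default(path, default=None):
    """Return parsed JSON from path, or default if the file is missing or invalid."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return default


# Hypothetical usage from a file inside pages/:
# root_dir = Path(__file__).resolve().parent.parent
# metrics = load_json_or_default(
#     root_dir / "model" / "dinosaur_model_performance.json", default={}
# )
```

Catching `json.JSONDecodeError` as well covers the case where an artifact exists but was only partially written. The caller can then decide whether to show a warning or continue with the default.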
When adding features or making changes:
- Update relevant documentation in docs/
- Add inline comments for complex code sections
- Update README.md if needed
- Include example usage where appropriate
To deploy the application:
-
Ensure all dependencies are in requirements.txt
-
For Streamlit Cloud:
- Push to GitHub
- Connect repository to Streamlit Cloud
- Configure settings as needed
-
For self-hosting:
- Install dependencies
- Run with streamlit run PaleoNet.py
- Consider using Docker for containerization
If you need assistance with development:
- Check existing documentation
- Look for similar issues in the issue tracker
- Contact the maintainers
- Create a new issue with a clear description of your problem