Thank you for your interest in contributing to the CMM Data package!
- Clone the repository
  git clone https://github.com/PNNL-CMM/cmm-data.git
  cd cmm-data
- Create a virtual environment
  python3 -m venv .venv
  source .venv/bin/activate
- Install in development mode
  pip install -e ".[full]"
  pip install pytest pytest-cov ruff pre-commit
- Install pre-commit hooks
  pre-commit install
- Configure data path
  export CMM_DATA_PATH=/path/to/Globus_Sharing
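How the package consumes `CMM_DATA_PATH` is not described here, so the following is only a sketch of a sanity check you could run after exporting the variable. The helper name `resolve_data_path` is hypothetical and is not part of cmm_data.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory named by CMM_DATA_PATH, or raise if unusable.

    Hypothetical helper for illustration only; not part of cmm_data.
    """
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    raw = env.get("CMM_DATA_PATH")
    if not raw:
        raise RuntimeError("CMM_DATA_PATH is not set")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"CMM_DATA_PATH is not a directory: {path}")
    return path
```

Calling `resolve_data_path()` from a Python session in the activated environment should return the directory exported above, or raise with a message saying what is wrong.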
We use pre-commit to run code quality checks before each commit.
Installed hooks:
- ruff: Linting and formatting
- mypy: Type checking
- bandit: Security checks
- codespell: Spell checking
- pydocstyle: Docstring style
- rstcheck: RST documentation syntax
- Various file checks (YAML, TOML, trailing whitespace, etc.)
Usage:
# Run all hooks on all files
pre-commit run --all-files
# Run specific hook
pre-commit run ruff --all-files
# Update hooks to latest versions
pre-commit autoupdate
# Skip hooks temporarily (not recommended)
git commit --no-verify -m "message"
We use ruff for linting and formatting:
# Check code
ruff check src/
# Fix auto-fixable issues
ruff check --fix src/
# Format code
ruff format src/
# Check formatting without changing
ruff format --check src/
Style guidelines:
- Line length: 100 characters
- Quote style: double quotes
- Import sorting: isort-compatible (handled by ruff)
- Docstrings: Google style
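As a rough illustration of these guidelines, the function below uses double quotes, a Google-style docstring, and lines under 100 characters. The function `normalize_commodity_name` is a made-up example and does not exist in cmm_data.

```python
def normalize_commodity_name(name: str, *, lowercase: bool = True) -> str:
    """Collapse whitespace in a commodity name and optionally lowercase it.

    Hypothetical example used only to illustrate the style guidelines.

    Args:
        name: Raw commodity name, possibly with irregular spacing.
        lowercase: Whether to convert the result to lowercase.

    Returns:
        The cleaned name with single spaces between words.

    Raises:
        ValueError: If the name is empty after stripping whitespace.
    """
    # str.split() with no arguments drops leading, trailing, and repeated whitespace.
    cleaned = " ".join(name.split())
    if not cleaned:
        raise ValueError("name must not be empty")
    return cleaned.lower() if lowercase else cleaned
```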
# Run tests
pytest tests/ -v
# With coverage
pytest tests/ -v --cov=cmm_data
- Create a new file in src/cmm_data/loaders/
- Inherit from BaseLoader
- Implement required methods:
  - load(**kwargs) -> pd.DataFrame
  - list_available() -> List[str]
- Add to __init__.py exports
- Update catalog.py with dataset info
- Add tests and documentation
Example:
from typing import List

import pandas as pd

from .base import BaseLoader


class NewDatasetLoader(BaseLoader):
    dataset_name = "new_dataset"

    def list_available(self) -> List[str]:
        # Return list of available items
        pass

    def load(self, **kwargs) -> pd.DataFrame:
        # Load and return DataFrame
        pass
- Fork the repository
- Create a feature branch:
  git checkout -b feature/my-feature
- Make your changes
- Run tests:
  pytest tests/
- Run linter:
  ruff check src/
- Commit with clear message:
  git commit -m "Add feature X"
- Push:
  git push origin feature/my-feature
- Open a Pull Request
Use clear, descriptive commit messages:
Add: new loader for XYZ dataset
Fix: handle missing values in USGS data
Update: improve documentation for visualizations
Refactor: simplify caching logic
- Use the issue templates
- Include Python version and cmm_data version
- Provide a minimal reproducible example
- Include full error traceback
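To gather the version details requested above, something like the following sketch may help. It assumes the installed distribution is named `cmm-data` (taken from the repository name, not confirmed here); `describe_environment` is a hypothetical helper, not part of the package.

```python
import platform
from importlib import metadata


def describe_environment(dist_name: str = "cmm-data") -> str:
    """Return a one-line summary of the Python and package versions.

    Assumption: the installed distribution is named "cmm-data"; pass a
    different dist_name if the project is installed under another name.
    """
    try:
        package_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Report the absence instead of failing, so the summary is always usable.
        package_version = "not installed"
    return f"Python {platform.python_version()}, {dist_name} {package_version}"


print(describe_environment())
```

Paste the printed line into the issue along with the traceback.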
Contact the CMM team at PNNL.