Contributing to CMM Data

Thank you for your interest in contributing to the CMM Data package!

Development Setup

Clone the repository

git clone https://github.com/PNNL-CMM/cmm-data.git
cd cmm-data

Create a virtual environment

python3 -m venv .venv
source .venv/bin/activate

Install in development mode

pip install -e ".[full]"
pip install pytest pytest-cov ruff pre-commit

Install pre-commit hooks
```
pre-commit install
```

Configure data path

export CMM_DATA_PATH=/path/to/Globus_Sharing
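
Before running anything that reads data, it can help to confirm that the variable is set and points somewhere usable. The sketch below is a hypothetical helper (`resolve_data_path` is not part of cmm_data) and assumes `CMM_DATA_PATH` should name an existing directory; how the package itself reads the variable may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory named by CMM_DATA_PATH, or raise if unusable.

    Hypothetical helper for illustration; not part of the cmm_data API.
    """
    # Default to the real environment; a mapping can be passed in for testing
    env = os.environ if env is None else env
    raw = env.get("CMM_DATA_PATH")
    if not raw:
        raise RuntimeError("CMM_DATA_PATH is not set")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"CMM_DATA_PATH is not a directory: {path}")
    return path
```

Calling `resolve_data_path()` with no arguments checks the current shell environment.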

Pre-commit Hooks

We use pre-commit to run code quality checks before each commit.

Installed hooks:

ruff: Linting and formatting
mypy: Type checking
bandit: Security checks
codespell: Spell checking
pydocstyle: Docstring style
rstcheck: RST documentation syntax
Various file checks (YAML, TOML, trailing whitespace, etc.)

Usage:

# Run all hooks on all files
pre-commit run --all-files

# Run a specific hook
pre-commit run ruff --all-files

# Update hooks to latest versions
pre-commit autoupdate

# Skip hooks temporarily (not recommended)
git commit --no-verify -m "message"

Code Style

We use ruff for linting and formatting:

# Check code
ruff check src/

# Fix auto-fixable issues
ruff check --fix src/

# Format code
ruff format src/

# Check formatting without changing files
ruff format --check src/

Style guidelines:

Line length: 100 characters
Quote style: double quotes
Import sorting: isort-compatible (handled by ruff)
Docstrings: Google style
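
As a rough illustration of these guidelines, here is a small function written with double quotes and a Google-style docstring. The function name and behavior are hypothetical and do not come from the cmm_data codebase.

```python
def normalize_column_name(name: str) -> str:
    """Convert a raw column header to snake_case.

    Args:
        name: Column header as it appears in the source file.

    Returns:
        The header stripped, lowercased, and with runs of whitespace
        replaced by single underscores.
    """
    return "_".join(name.strip().lower().split())
```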

Testing

# Run tests
pytest tests/ -v

# With coverage
pytest tests/ -v --cov=cmm_data
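
For contributors new to pytest, a test file under tests/ might look roughly like the sketch below. `FakeLoader` is a stand-in invented for this example, not a real cmm_data class; `tmp_path` is pytest's built-in temporary-directory fixture.

```python
from pathlib import Path
from typing import List


class FakeLoader:
    """Stand-in for a real loader; hypothetical, for illustration only."""

    def __init__(self, data_dir: Path) -> None:
        self.data_dir = data_dir

    def list_available(self) -> List[str]:
        # Report each CSV file in the directory by its stem, sorted
        return sorted(p.stem for p in self.data_dir.glob("*.csv"))


def test_list_available_reports_csv_stems(tmp_path: Path) -> None:
    # pytest creates a fresh temporary directory and passes it as tmp_path
    (tmp_path / "b.csv").write_text("x\n1\n")
    (tmp_path / "a.csv").write_text("x\n2\n")
    (tmp_path / "notes.txt").write_text("ignored")

    assert FakeLoader(tmp_path).list_available() == ["a", "b"]
```

pytest collects any function whose name starts with `test_`, so no registration step is needed.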

Adding a New Loader

Create a new file in src/cmm_data/loaders/
Inherit from BaseLoader
Implement required methods:
- load(**kwargs) -> pd.DataFrame
- list_available() -> List[str]
Add the new loader to the __init__.py exports
Update catalog.py with the dataset's info
Add tests and documentation

Example:

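from typing import List

import pandas as pd

from .base import BaseLoader


class NewDatasetLoader(BaseLoader):
    """Loader for the example new_dataset dataset."""

    dataset_name = "new_dataset"

    def list_available(self) -> List[str]:
        # Return the names of the items this loader can load
        raise NotImplementedError

    def load(self, **kwargs) -> pd.DataFrame:
        # Load the requested data and return it as a DataFrame
        raise NotImplementedError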
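
A quick self-check before opening a pull request could look like the following sketch. `missing_loader_members` is a hypothetical helper, not part of cmm_data; the required method names come from the checklist above, and treating `dataset_name` as required is an assumption based on the example. It only checks that the members exist, so a method inherited unchanged from a base class also counts as present.

```python
from typing import List

# Method names taken from the checklist above
REQUIRED_METHODS = ("load", "list_available")


def missing_loader_members(cls: type) -> List[str]:
    """Return the names of expected loader members that cls lacks.

    Hypothetical helper for illustration; not part of the cmm_data API.
    """
    missing = []
    # Assumption: every loader sets dataset_name to a string
    if not isinstance(getattr(cls, "dataset_name", None), str):
        missing.append("dataset_name")
    for name in REQUIRED_METHODS:
        if not callable(getattr(cls, name, None)):
            missing.append(name)
    return missing
```

An empty list means the class has all three members; it says nothing about whether the methods return the right types.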

Pull Request Process

Fork the repository
Create a feature branch: git checkout -b feature/my-feature
Make your changes
Run tests: pytest tests/
Run linter: ruff check src/
Commit with a clear message: git commit -m "Add: feature X"
Push: git push origin feature/my-feature
Open a Pull Request

Commit Messages

Use clear, descriptive commit messages:

Add: new loader for XYZ dataset
Fix: handle missing values in USGS data
Update: improve documentation for visualizations
Refactor: simplify caching logic
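
The examples above share a `Prefix: summary` shape. The sketch below checks a message against that shape; treating the four prefixes shown as the complete allowed set is an assumption, and `follows_convention` is a hypothetical helper rather than a check the project enforces.

```python
import re

# Prefixes taken from the examples above; treating them as the full
# allowed set is an assumption.
COMMIT_PATTERN = re.compile(r"^(Add|Fix|Update|Refactor): \S.*$")


def follows_convention(message: str) -> bool:
    """Return True if the first line of message matches 'Prefix: summary'."""
    lines = message.splitlines()
    # An empty message has no first line and therefore cannot match
    return bool(lines) and COMMIT_PATTERN.match(lines[0]) is not None
```

Only the first line is checked, so a longer body after a blank line does not affect the result.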

Reporting Issues

Use the issue templates
Include Python version and cmm_data version
Provide a minimal reproducible example
Include the full error traceback
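
One way to collect the version details for an issue is sketched below. The distribution name "cmm-data" is an assumption based on the repository name; pass a different name if the installed package is registered differently.

```python
import platform
from importlib import metadata


def environment_report(dist_name: str = "cmm-data") -> str:
    """Return Python and package versions for pasting into an issue.

    The default distribution name is an assumption, not confirmed
    by the project.
    """
    try:
        pkg_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The package is not installed under this distribution name
        pkg_version = "not installed"
    return f"Python {platform.python_version()}, {dist_name} {pkg_version}"


print(environment_report())
```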

Questions?

Contact the CMM team at PNNL.
