Thank you for your interest in contributing to OSH Datasets. This guide covers how to set up your development environment and submit changes.
- Fork and clone the repository:

  ```bash
  git clone https://github.com/your-username/OSH_Datasets.git
  cd OSH_Datasets
  ```

- Create a virtual environment and install dependencies:

  ```bash
  uv venv
  uv pip install -e ".[dev]"
  ```

- Copy the environment template and add your API keys:

  ```bash
  cp .env.example .env
  # Edit .env with your credentials
  ```

- Create a branch for your changes:

  ```bash
  git checkout -b feature/your-feature-name
  ```

- Make your changes following the code standards below.
- Run the full check suite before committing:

  ```bash
  uv run ruff check src/ tests/
  uv run ruff format src/ tests/
  uv run mypy src/
  uv run pytest tests/ -v
  ```

- Submit a pull request with a clear description of your changes.
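The four check commands can be bundled into one script. Below is a minimal sketch, assuming a hypothetical helper such as `scripts/check.py` (the guide does not say the repository ships one). It runs the checks in the listed order and stops at the first failure:

```python
import subprocess
from collections.abc import Callable, Sequence

# The four checks from the setup steps, in the order they should run.
CHECKS: list[list[str]] = [
    ["uv", "run", "ruff", "check", "src/", "tests/"],
    ["uv", "run", "ruff", "format", "src/", "tests/"],
    ["uv", "run", "mypy", "src/"],
    ["uv", "run", "pytest", "tests/", "-v"],
]


def run_checks(
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> int:
    """Run each check in order; return the first non-zero exit code, or 0."""
    for cmd in CHECKS:
        print("$", " ".join(cmd))
        code = run(cmd)
        if code != 0:
            return code  # stop early so later checks don't bury the failure
    return 0
```

Calling `run_checks()` with no argument runs the real commands through `subprocess.run`. Passing a fake `run` callable lets you test the ordering without invoking `uv`. Note that `ruff format` rewrites files in place, so review the diff before committing.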
All code contributions should follow these standards:

- Python 3.11+ with full type hints on all function signatures
- Ruff for linting and formatting (88-character line limit)
- mypy for static type checking (no unresolved errors)
- pytest for testing, with external dependencies mocked
- orjson for JSON serialization (not the stdlib `json` module)
- Polars for dataframe operations (not pandas)
- Docstrings on all public functions, classes, and methods
- No hardcoded API keys or secrets; use `.env` and `config.require_env()`
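To illustrate the last rule, here is a minimal sketch of what a `require_env()`-style helper typically does. It is a hypothetical stand-in, not the project's `config.require_env()`, whose exact signature and error type this guide does not show. It also assumes the values from `.env` have already been loaded into the process environment:

```python
import os


class MissingEnvError(RuntimeError):
    """Raised when a required environment variable is unset or empty."""


def require_env(name: str) -> str:
    """Return the value of ``name``, failing loudly if it is missing.

    Hypothetical stand-in for ``config.require_env()``; the real helper's
    signature and error type may differ.
    """
    value = os.environ.get(name)
    if not value:
        # Fail at startup with a clear message instead of sending an empty key.
        raise MissingEnvError(f"{name} is not set; add it to your .env file")
    return value
```

A scraper would then call something like `require_env("YOUR_SOURCE_API_KEY")` (a hypothetical variable name) instead of embedding the key in source code.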
To add a new scraper for a data source:

- Create `src/osh_datasets/scrapers/your_source.py`
- Subclass `BaseScraper` and set `source_name`
- Implement the `scrape()` method returning a `Path` to the output JSON
- Register the class in `src/osh_datasets/scrapers/__init__.py` (`ALL_SCRAPERS`)
- Add mocked tests in `tests/test_scrapers.py`
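"Mocked tests" means tests must never reach a real API. Below is a minimal sketch of that pattern using `unittest.mock.patch.object` to replace the network call. `ExampleScraper` and its `fetch()` method are hypothetical stand-ins, because the real `BaseScraper` internals are not shown here. `tempfile` plays the role of pytest's `tmp_path` fixture so the snippet runs on its own:

```python
import json
import tempfile
from pathlib import Path
from unittest.mock import patch


class ExampleScraper:
    """Hypothetical stand-in for a BaseScraper subclass."""

    source_name = "example"

    def __init__(self, output_dir: Path) -> None:
        self.output_dir = output_dir

    def fetch(self) -> list[dict[str, str]]:
        """Call the real API; tests must never reach this."""
        raise RuntimeError("network access is not allowed in tests")

    def scrape(self) -> Path:
        """Write fetched records to a JSON file and return its path."""
        out = self.output_dir / f"{self.source_name}_data.json"
        # Project code uses orjson; stdlib json keeps this sketch dependency-free.
        out.write_text(json.dumps(self.fetch()))
        return out


def test_scrape_writes_mocked_records() -> None:
    fake = [{"name": "Open Widget", "url": "https://example.org/widget"}]
    with tempfile.TemporaryDirectory() as tmp:
        scraper = ExampleScraper(Path(tmp))
        # Replace the network call with canned data for the duration of the test.
        with patch.object(ExampleScraper, "fetch", return_value=fake) as mock_fetch:
            out = scraper.scrape()
        mock_fetch.assert_called_once()
        assert json.loads(out.read_text()) == fake
```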
A minimal scraper skeleton:

```python
from pathlib import Path

import orjson

from osh_datasets.scrapers.base import BaseScraper


class YourScraper(BaseScraper):
    """Scraper for Your Source."""

    source_name = "your_source"

    def scrape(self) -> Path:
        """Fetch data and write it as JSON to ``self.output_dir``."""
        # Fetch records from the source here and collect them in `results`.
        results: list[dict[str, object]] = []
        out = self.output_dir / "your_source_data.json"
        out.write_bytes(orjson.dumps(results, option=orjson.OPT_INDENT_2))
        return out
```

To add a new loader:

- Create `src/osh_datasets/loaders/your_source.py`
- Subclass `BaseLoader` and set `source_name`
- Implement the `load(db_path)` method returning a record count
- Register the class in `src/osh_datasets/loaders/__init__.py` (`ALL_LOADERS`)
- Add tests in `tests/test_loaders.py`
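Below is a minimal loader sketch. It assumes `db_path` points to a SQLite database and that the loader reads the JSON file its scraper produced, with each record holding `name` and `url` keys. `ExampleLoader`, its `data_path` attribute, and the `projects` table schema are all hypothetical, because the real `BaseLoader` interface is not shown here:

```python
import json
import sqlite3
from pathlib import Path


class ExampleLoader:
    """Hypothetical stand-in for a BaseLoader subclass."""

    source_name = "example"

    def __init__(self, data_path: Path) -> None:
        # Assumed: path to the JSON file written by the matching scraper.
        self.data_path = data_path

    def load(self, db_path: Path) -> int:
        """Insert scraped records into SQLite and return how many were loaded."""
        # Project code would use orjson.loads() on the same bytes;
        # stdlib json keeps this sketch dependency-free.
        records = json.loads(self.data_path.read_bytes())
        rows = [(r["name"], r["url"], self.source_name) for r in records]
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back the inserts on error
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS projects "
                    "(name TEXT, url TEXT, source TEXT)"
                )
                conn.executemany(
                    "INSERT INTO projects (name, url, source) VALUES (?, ?, ?)",
                    rows,
                )
        finally:
            conn.close()
        return len(rows)
```

Returning `len(rows)` satisfies the requirement that `load()` report a record count. Running the inserts inside `with conn:` means a failure midway does not leave a partial set of rows behind.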
To report a bug or request a feature, open an issue on GitHub with:
- A clear description of the problem or feature request
- Steps to reproduce (for bugs)
- Expected vs. actual behavior
- Your Python version and OS
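To gather the version details for a bug report, a small snippet using only the standard library `platform` and `sys` modules works. The function name `environment_summary` is illustrative, not part of the project:

```python
import platform
import sys


def environment_summary() -> str:
    """Return the Python version and OS details to include in a bug report."""
    return (
        f"Python {platform.python_version()} "
        f"({sys.implementation.name}) on {platform.platform()}"
    )


print(environment_summary())
```

Paste the printed line into the issue as-is.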
By contributing, you agree that your contributions will be licensed under the MIT License.