3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -24,3 +24,6 @@ htmlcov/

# Working data (not part of plugin)
data/

# Sphinx build output
docs/_build/
23 changes: 23 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,23 @@
# Read the Docs configuration for htan
# https://docs.readthedocs.io/en/stable/config-file/v2.html

version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.12"

sphinx:
  configuration: docs/conf.py
  fail_on_warning: false

formats:
  - htmlzip

python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - docs
Empty file added docs/_static/.gitkeep
Empty file.
6 changes: 6 additions & 0 deletions docs/api/config.md
@@ -0,0 +1,6 @@
# `htan.config`

```{eval-rst}
.. automodule:: htan.config
   :members:
```
7 changes: 7 additions & 0 deletions docs/api/download.gen3.md
@@ -0,0 +1,7 @@
# `htan.download.gen3`

```{eval-rst}
.. automodule:: htan.download.gen3
   :members:
   :exclude-members: cli_main, gen3, download_cmd, resolve_cmd
```
7 changes: 7 additions & 0 deletions docs/api/download.synapse.md
@@ -0,0 +1,7 @@
# `htan.download.synapse`

```{eval-rst}
.. automodule:: htan.download.synapse
   :members:
   :exclude-members: cli_main, synapse
```
7 changes: 7 additions & 0 deletions docs/api/files.md
@@ -0,0 +1,7 @@
# `htan.files`

```{eval-rst}
.. automodule:: htan.files
   :members:
   :exclude-members: cli_main, files, update_cmd, lookup_cmd, stats_cmd
```
18 changes: 18 additions & 0 deletions docs/api/index.md
@@ -0,0 +1,18 @@
# API reference

Python API for each `htan` module. Everything the CLI does is also available
as a normal Python import.

```{toctree}
:maxdepth: 1

config
query.portal
query.bq
download.synapse
download.gen3
pubs
model
files
init
```
10 changes: 10 additions & 0 deletions docs/api/init.md
@@ -0,0 +1,10 @@
# `htan.init`

First-run setup wizard implementation. Most users will invoke this through
the [`htan init`](../cli/index.md) CLI rather than calling it directly.

```{eval-rst}
.. automodule:: htan.init
   :members:
   :exclude-members: cli_main, init
```
7 changes: 7 additions & 0 deletions docs/api/model.md
@@ -0,0 +1,7 @@
# `htan.model`

```{eval-rst}
.. automodule:: htan.model
   :members:
   :exclude-members: cli_main, model, fetch_cmd, components_cmd, attributes_cmd, describe_cmd, valid_values_cmd, search_cmd, required_cmd, deps_cmd
```
7 changes: 7 additions & 0 deletions docs/api/pubs.md
@@ -0,0 +1,7 @@
# `htan.pubs`

```{eval-rst}
.. automodule:: htan.pubs
   :members:
   :exclude-members: cli_main, pubs, search_cmd, fetch_cmd, fulltext_cmd
```
7 changes: 7 additions & 0 deletions docs/api/query.bq.md
@@ -0,0 +1,7 @@
# `htan.query.bq`

```{eval-rst}
.. automodule:: htan.query.bq
   :members:
   :exclude-members: cli_main, bq, query_cmd, sql_cmd, tables_cmd, describe_cmd
```
7 changes: 7 additions & 0 deletions docs/api/query.portal.md
@@ -0,0 +1,7 @@
# `htan.query.portal`

```{eval-rst}
.. automodule:: htan.query.portal
   :members:
   :exclude-members: cli_main, portal, files, demographics, diagnosis, cases, specimen, summary, sql_cmd, tables, describe, manifest
```
12 changes: 12 additions & 0 deletions docs/cli/index.md
@@ -0,0 +1,12 @@
# CLI reference

The `htan` command is built with [Click](https://click.palletsprojects.com).
Every subcommand below is generated from the Click definitions via
[`sphinx-click`](https://sphinx-click.readthedocs.io), so these pages match
what `htan ... --help` prints for the same release.

```{eval-rst}
.. click:: htan.cli:cli
   :prog: htan
   :nested: full
```
68 changes: 68 additions & 0 deletions docs/conf.py
@@ -0,0 +1,68 @@
"""Sphinx configuration for the htan documentation."""

from __future__ import annotations

import importlib.metadata

project = "htan"
author = "HTAN DCC"
copyright = "2026, HTAN DCC"

try:
    release = importlib.metadata.version("htan")
except importlib.metadata.PackageNotFoundError:
    release = "0.0.0"
version = ".".join(release.split(".")[:2])

extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.napoleon",
    "sphinx.ext.intersphinx",
    "sphinx.ext.viewcode",
    "sphinx_click",
    "myst_parser",
]

source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}

# Mock heavy / optional imports so autodoc works on Read the Docs without them.
autodoc_mock_imports = [
    "synapseclient",
    "gen3",
    "google",
    "pandas",
    "db_dtypes",
    "certifi",
]

autodoc_default_options = {
    "members": True,
    "undoc-members": False,
    "show-inheritance": True,
    "member-order": "bysource",
}
autosummary_generate = True
napoleon_google_docstring = True
napoleon_numpy_docstring = True

intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "click": ("https://click.palletsprojects.com/en/stable/", None),
    "pandas": ("https://pandas.pydata.org/docs/", None),
}

html_theme = "furo"
html_title = f"htan {release}"
# docs/_static/.gitkeep keeps this directory present on a fresh checkout.
html_static_path = ["_static"]

# Build output and OS metadata files that Sphinx should not treat as sources.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]

myst_enable_extensions = ["colon_fence", "deflist"]

# Sphinx-click introspects Click groups by import path.
sphinx_click_attrs = ["cli"]
59 changes: 59 additions & 0 deletions docs/index.md
@@ -0,0 +1,59 @@
# htan

Python tools for accessing Human Tumor Atlas Network (HTAN) data.

The `htan` package provides:

- **A unified `htan` CLI** for the HTAN portal database, BigQuery metadata,
Synapse and CRDC/Gen3 downloads, the data model, and PubMed publication search.
- **A Python library** wrapping the same functionality, suitable for use in
notebooks and pipelines.

```{tip}
New to the project? Start with [Installation](install.md), then
[Quickstart](quickstart.md), then look up specific commands in the [CLI
reference](cli/index.md).
```

## At a glance

```bash
pip install htan
htan init # First-run wizard
htan query portal files --organ Breast --limit 10
htan query bq sql "SELECT COUNT(*) FROM ..."
htan download synapse syn26535909
htan download gen3 download "drs://dg.4DFC/<guid>"
htan pubs search --keyword "spatial transcriptomics"
htan model components
htan files lookup HTA9_1_19512
```

## Data access tiers

HTAN data has multiple access levels. The portal provides a unified query
interface; downloads route through Synapse (open access) or CRDC/Gen3
(controlled access).

| Tier | Source | Auth | Module |
|------|--------|------|--------|
| Portal metadata + file discovery | ClickHouse | Synapse team membership | {mod}`htan.query.portal` |
| Open access (de-identified, processed) | Synapse | PAT | {mod}`htan.download.synapse` |
| Controlled access (raw, protected) | CRDC/Gen3 | dbGaP + Gen3 creds | {mod}`htan.download.gen3` |
| Metadata query | BigQuery (`isb-cgc-bq`) | ADC | {mod}`htan.query.bq` |

```{toctree}
:maxdepth: 2
:caption: User guide

install
quickstart
```

```{toctree}
:maxdepth: 2
:caption: Reference

cli/index
api/index
```
49 changes: 49 additions & 0 deletions docs/install.md
@@ -0,0 +1,49 @@
# Installation

The `htan` package is published on PyPI as [`htan`](https://pypi.org/project/htan/)
and requires Python 3.10 or newer.

## Quick install

```bash
pip install htan
```

This installs the CLI and library along with all default dependencies
(Synapse client, Gen3 SDK, Google BigQuery client, pandas).

## With `uv` (recommended for development)

```bash
uv pip install htan                # in an active venv
uv pip install -e ".[dev,docs]"    # from a repo checkout: editable, with test + docs deps
```

## First-run setup

After installing, run the interactive wizard:

```bash
htan init
```

This walks through credential setup for each backend (Synapse, portal
ClickHouse, BigQuery, CRDC/Gen3). You can rerun it at any point with
`htan init --force` or check the current state with `htan init --status`.

Credentials live in the conventional locations:

| Service | Location |
|---------|----------|
| Portal ClickHouse | OS keychain or `~/.config/htan-skill/portal.json` |
| Synapse | `SYNAPSE_AUTH_TOKEN` env var or `~/.synapseConfig` |
| BigQuery | `gcloud auth application-default login` (or service account JSON) |
| CRDC/Gen3 | `~/.gen3/credentials.json` (download from CRDC after dbGaP auth) |
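
To see at a glance which of these locations are populated, a small standalone check can help. The sketch below is not part of `htan` (the supported check is `htan init --status`). The function name `credential_locations` is hypothetical, the OS keychain is not inspected, and the BigQuery entry assumes the default Application Default Credentials file that `gcloud` writes on Linux and macOS.

```python
import os
from pathlib import Path


def credential_locations(home: Path, env: dict) -> dict:
    """Report which credential locations from the table above are present.

    Hypothetical helper for illustration only. It checks for existence and
    never reads or validates the credentials. Portal credentials stored in
    the OS keychain are not detected.
    """
    adc_file = home / ".config" / "gcloud" / "application_default_credentials.json"
    return {
        "portal": (home / ".config" / "htan-skill" / "portal.json").is_file(),
        "synapse": bool(env.get("SYNAPSE_AUTH_TOKEN"))
        or (home / ".synapseConfig").is_file(),
        # Assumption: default ADC file location on Linux/macOS, or a service
        # account JSON referenced by GOOGLE_APPLICATION_CREDENTIALS.
        "bigquery": bool(env.get("GOOGLE_APPLICATION_CREDENTIALS"))
        or adc_file.is_file(),
        "gen3": (home / ".gen3" / "credentials.json").is_file(),
    }


if __name__ == "__main__":
    status = credential_locations(Path.home(), dict(os.environ))
    for service, found in status.items():
        print(f"{service:10s} {'found' if found else 'missing'}")
```

A `missing` entry here only means the file or variable was not found in the listed location; it says nothing about whether existing credentials are valid.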

## Verifying the install

```bash
htan --version
htan config check
htan query portal tables # requires portal credentials
```
77 changes: 77 additions & 0 deletions docs/quickstart.md
@@ -0,0 +1,77 @@
# Quickstart

This page walks through a complete end-to-end workflow: discover files via the
portal, generate a download manifest, then fetch files from Synapse or
CRDC/Gen3.

## 1. Configure credentials

```bash
htan init
htan config check
```

## 2. Find files of interest

The HTAN portal database is the most direct entry point. Filter by organ,
assay, atlas, or any other column on the `files` table.

```bash
htan query portal tables # Show all tables
htan query portal describe files # Schema for files
htan query portal files \
  --organ Breast \
  --assay "scRNA-seq" \
  --level "Level 1" \
  --output json \
  --limit 5
```
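
When the same filtered query has to run from a script, one option is to call the CLI through `subprocess`. This is a minimal sketch that reuses only the flags shown above. The helper names `portal_files_command` and `run_portal_files` are hypothetical, and it assumes that `--output json` prints a single JSON document on stdout; the shape of that document is not specified on this page.

```python
import json
import shlex
import subprocess


def portal_files_command(organ: str, assay: str, level: str, limit: int) -> list[str]:
    """Build the argv for the `htan query portal files` call shown above."""
    return [
        "htan", "query", "portal", "files",
        "--organ", organ,
        "--assay", assay,
        "--level", level,
        "--output", "json",
        "--limit", str(limit),
    ]


def run_portal_files(organ: str, assay: str, level: str, limit: int):
    """Run the command and decode stdout as JSON.

    Assumption: stdout holds exactly one JSON document. Requires `htan` on
    PATH and working portal credentials; raises CalledProcessError otherwise.
    """
    argv = portal_files_command(organ, assay, level, limit)
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Print the shell-quoted command without executing it.
    argv = portal_files_command("Breast", "scRNA-seq", "Level 1", 5)
    print(shlex.join(argv))
```

Passing the arguments as a list avoids shell quoting problems with values such as `Level 1` that contain spaces.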

For ad-hoc analytical queries, use `sql`:

```bash
htan query portal sql \
  "SELECT atlas_name, COUNT(*) AS n FROM files GROUP BY atlas_name ORDER BY n DESC"
```

## 3. Generate a download manifest

```bash
htan query portal manifest HTA9_1_19512 HTA9_1_19553 --output-dir ./manifests
```

This writes `synapse_manifest.tsv` and/or `gen3_manifest.json` depending on
which platform each file lives on.
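
Before starting a large download it can be useful to inspect what the manifest contains. The sketch below assumes only that `synapse_manifest.tsv` is tab-separated with a header row; its column names are not documented on this page, so the code does not rely on any of them. The function name `read_manifest` is hypothetical.

```python
import csv
from pathlib import Path


def read_manifest(path: Path) -> tuple[list[str], list[dict[str, str]]]:
    """Return (column names, rows) from a tab-separated manifest.

    Assumption: the first line is a header row. No specific columns are
    expected.
    """
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return columns, rows


if __name__ == "__main__":
    manifest = Path("manifests") / "synapse_manifest.tsv"
    if manifest.is_file():
        columns, rows = read_manifest(manifest)
        print(f"{len(rows)} rows; columns: {', '.join(columns)}")
    else:
        print(f"{manifest} not found")
```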

## 4. Download

For open-access files (Synapse):

```bash
htan download synapse syn26535909 --output-dir ./data
```

For controlled-access files (CRDC/Gen3):

```bash
htan download gen3 download "drs://dg.4DFC/<guid>" \
  --credentials ~/.gen3/credentials.json \
  --output-dir ./data
```
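
A malformed DRS URI is an easy mistake to make when building this command from a manifest. As a rough pre-flight check, the sketch below splits a URI of the form shown above (`drs://<prefix>/<guid>`) into its two parts. The helper `split_drs_uri` is hypothetical and is not how `htan` resolves URIs; it handles only this one form and does not contact any service.

```python
def split_drs_uri(uri: str) -> tuple[str, str]:
    """Split `drs://<prefix>/<guid>` into (prefix, guid).

    Hypothetical helper: only the form used in the example above is
    supported. Raises ValueError for anything else.
    """
    scheme = "drs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a DRS URI: {uri!r}")
    prefix, separator, guid = uri[len(scheme):].partition("/")
    if not prefix or not separator or not guid:
        raise ValueError(f"expected drs://<prefix>/<guid>, got {uri!r}")
    return prefix, guid


if __name__ == "__main__":
    # "example-guid" is a placeholder, not a real object identifier.
    print(split_drs_uri("drs://dg.4DFC/example-guid"))
```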

## 5. Use the library directly

Everything the CLI does is also exposed as Python:

```python
from htan.query.portal import PortalClient

client = PortalClient()
files = client.find_files(organ="Breast", assay="scRNA-seq", limit=10)
for row in files:
    print(row["DataFileID"], row["Filename"])
```

## See also

- [CLI reference](cli/index.md) — the full command tree, generated from Click.
- [API reference](api/index.md) — module-by-module Python API.