205 commits
8bfacc6
Format AF data and demo weighted average
DAWells Jul 18, 2022
01d3db0
Removed formatting and validation
DAWells Jul 18, 2022
4559d49
Demo to compare allele freq of countries
DAWells Jul 19, 2022
d2dcc10
Calculate AF weighted average: combineAF()
DAWells Jul 19, 2022
e7782e2
Calculate population coverage using HWE
DAWells Jul 19, 2022
49c941e
makeURL from parts, specify query
DAWells Jul 19, 2022
9062427
Add .gitignore
DAWells Jul 20, 2022
abbbd34
Demo population coverage calculation
DAWells Jul 20, 2022
0564447
Add missing alleles as a function
DAWells Jul 20, 2022
e6a57d7
Demo averaging allele frequencies across countries
DAWells Jul 20, 2022
9fc7469
Check for unequal sample sizes after combineAF
DAWells Jul 20, 2022
8f3f613
View supertype population coverage
DAWells Aug 19, 2022
80d02cc
Add colour to country data
DAWells Aug 19, 2022
a807b3c
formatAF & unmeasured_alleles handled in combineAF
DAWells Aug 22, 2022
63d210f
Prepare allele frequencies for dimension reduction
DAWells Aug 22, 2022
1343dfd
Add loci to makeURL and error handling to Npages
DAWells Aug 30, 2022
0f68fc6
Download all country HLA-A data & plot by region
DAWells Aug 30, 2022
9f02aef
Change plot style
DAWells Sep 1, 2022
e4bbb7f
Get started with scipy dirichlet
DAWells Sep 1, 2022
307646b
Visualise dirichlet and updates for k=3
DAWells Sep 2, 2022
aa00cda
Calculate average allele_freq from Dirichlet distribution
DAWells Sep 2, 2022
c0f918b
Identify incomplete, or overfilled studies
DAWells Sep 2, 2022
f96861e
Check study allele freq sum and allele resolution
DAWells Sep 5, 2022
17906c5
Collapse alleles after reducing resolution
DAWells Sep 6, 2022
4501ecc
Add notes about why to save downloaded AF
DAWells Sep 8, 2022
d2d06cc
Explicitly sort grouped dataframe when combining
DAWells Sep 8, 2022
770314a
Add conda env dependencies
DAWells Sep 8, 2022
8be3b3e
Calculate entropy and plot as choropleth
DAWells Sep 8, 2022
dd72857
Visualise high K dirichlet as beta distributions
DAWells Sep 8, 2022
37580b0
Fail to identify outlier populations
DAWells Sep 30, 2022
8e946a7
Outlier population demo deleted as pdf always 0
DAWells Sep 30, 2022
424d249
Calculate central credibility interval for beta
DAWells Sep 30, 2022
1a44f5d
Calculate and plot credible intervals
DAWells Oct 7, 2022
6d874f2
Replace "population" with datasetID in plotAFprob
DAWells Oct 7, 2022
b4e736d
Allow plotting only a subset of alleles
DAWells Oct 7, 2022
f6f8971
Demo single country allele frequency combination
DAWells Oct 10, 2022
50eab29
demo multicountry average. add custom xlim to plot
DAWells Oct 10, 2022
ca73713
Confidence intervals are misleadingly tight
DAWells Oct 10, 2022
708f706
Writing demos
DAWells Oct 12, 2022
26fe146
scrapeAF env
DAWells Oct 12, 2022
b189886
Correct country name matches
DAWells Oct 13, 2022
875a865
Handle G group when formatting alleles
DAWells Oct 13, 2022
c587822
Only complete in global PCA, rename wav to caf
DAWells Oct 13, 2022
5d857f5
Calculating study prior is correct
DAWells Oct 13, 2022
9c5857f
Add more options to makeURL
DAWells Oct 13, 2022
f0280f2
Update docstrings for all functions
DAWells Oct 13, 2022
bce51e6
Change scrapeAF to HLAfreq
DAWells Oct 13, 2022
8220478
Restructuring to confirm to package guide
DAWells Oct 13, 2022
b9c04d6
Build package
DAWells Oct 13, 2022
819c12d
Change build process to setuptools
DAWells Oct 14, 2022
6f23db3
python -m build with setuptools
DAWells Oct 14, 2022
12e3dd7
Add data loading functions, remove src directory
DAWells Oct 14, 2022
7926abf
import HLAfreq from __init__.py
DAWells Oct 14, 2022
276b366
include data in MANIFEST
DAWells Oct 14, 2022
c9f9ef6
Add MANIFEST to build
DAWells Oct 14, 2022
d73584e
Correctly add data to package that can be used
DAWells Oct 14, 2022
d5f84a4
Create tests
DAWells Oct 17, 2022
f7bbaf8
Add readme to tests
DAWells Oct 17, 2022
1e8e302
Update readme, list section (to be written)
DAWells Oct 17, 2022
3e8e490
Sort examples
DAWells Oct 17, 2022
74c0098
Clarify descriptions in examples
DAWells Oct 17, 2022
ee4680e
Add pdoc3 docs
DAWells Oct 17, 2022
48f0b8d
Add sample_year and sample_size to makeURL
DAWells Oct 17, 2022
3b40231
Change docs to markdown
DAWells Oct 17, 2022
dc11d33
Update readme
DAWells Oct 17, 2022
56007a7
Improve readme
DAWells Oct 17, 2022
2308352
Update build 0.0.1.dev1
DAWells Oct 17, 2022
3bdb5ce
improve hla panel plot
DAWells Oct 18, 2022
20184a8
Add blurb to readme
DAWells Nov 1, 2022
c8f0ad6
edit blurb spacing
DAWells Nov 1, 2022
d0ab8eb
Ignore paper write ups
DAWells Nov 4, 2022
49f1000
Use prior for Venezuela and plot result
DAWells Nov 4, 2022
5b2a81a
Basic use case for paper
DAWells Nov 4, 2022
c493ef6
Rename paper > writeup and update gitignore
DAWells Nov 4, 2022
4b408e4
Improve examples
DAWells Dec 8, 2022
f0f7e2b
Add title to plotAFprob
DAWells Dec 8, 2022
9637261
Create LICENSE
DAWells Jan 11, 2023
d73f581
Calculate confidence intervals on simulated data
DAWells Jan 18, 2023
1149d37
Use more realistic simulation parameters
DAWells Jan 19, 2023
e547649
Use pymc to estimate HDI for empirical data
DAWells Jan 19, 2023
7b2953c
confidence interval plot based on pymc.
DAWells Jan 24, 2023
23ca871
Update examples with new plotAF()
DAWells Jan 24, 2023
c1c1caf
Add posterior mean to AFhdi() and format as dataframe
DAWells Jan 25, 2023
1e0ff53
Update read me with CI info
DAWells Jan 25, 2023
738a67d
Allow prior specification for pymc models and plots
DAWells Jan 25, 2023
12aad0f
Update docs and remove docs/ from .gitignore
DAWells Jan 25, 2023
33c2df7
Update paper code and add label to plotAF()
DAWells Jan 25, 2023
efc070f
Fix global PCA plot overlapping text
DAWells Jan 25, 2023
9140645
add CI to basic use plot
DAWells Jan 25, 2023
7806b62
0.0.1dev2 release Improve release workflow
DAWells Feb 9, 2023
1fa707d
Add more dev instructions to readme
DAWells Feb 9, 2023
2df29e1
0.0.1dev3 add pymc to required
DAWells Feb 9, 2023
e20139a
Compare default and compound model estimates
DAWells Feb 23, 2023
cbf1146
AFplot takes hdi rather than calculating CI
DAWells Feb 23, 2023
f682316
Improved examples on CI and priors
DAWells Mar 6, 2023
936db02
Update checks and plots
DAWells Mar 7, 2023
8b19902
Add hdi plot to basic use case
DAWells Mar 7, 2023
27b9875
Plot cumulative frequency of IEDB
DAWells Mar 7, 2023
fd08740
Explore high dimensions issue
DAWells Mar 7, 2023
b2ed199
Update plot and access the underlying model
DAWells Mar 7, 2023
2c6f2e7
Update single country example plot and description
DAWells Mar 7, 2023
c8b22f1
Multicountry compound model weighted by population size
DAWells Mar 8, 2023
bf2925a
Reorder cells
DAWells Mar 8, 2023
df49de6
Plot compound af vs default af
DAWells Mar 8, 2023
743a081
Filter Guinea from the global PCA
DAWells Mar 8, 2023
6b29ddb
Move data checks from hdiAF to _make_c_array
DAWells Mar 8, 2023
ba7503d
Improve function doc strings: plotAF, compare_estimates
DAWells Mar 8, 2023
80cc2a2
Update to version 0.0.1.dev4
DAWells Mar 8, 2023
1e2a163
Correct spelling errors
DAWells Mar 10, 2023
e5c2c43
Format src/ and tests/ with black
DAWells Jun 2, 2023
902ff6f
require arviz and pymc>=3, add nox file
DAWells Jun 28, 2023
80312df
Improve installation guide with conda
DAWells Jul 4, 2023
3fe73b8
Update install readme
DAWells Jul 5, 2023
b42df9b
Simplify HLAfreq import in baseic use case for paper
DAWells Jul 5, 2023
6ae2e6b
Improve install guide
DAWells Jul 12, 2023
8d2fe5b
list OS
DAWells Jul 17, 2023
4595f95
Explicity country argument
DAWells Jul 17, 2023
cd4aaec
Ask people to open issues if they can't install
DAWells Aug 29, 2023
57c736f
Add biorxiv citation
DAWells Sep 25, 2023
a569cc5
Add instructions for debugging package versions
DAWells Oct 27, 2023
84548da
HLA supertype data from Sidney et al 2008
DAWells Oct 27, 2023
2d54e9e
Countries and regions as defined by AFND
DAWells Oct 27, 2023
37c7cb4
More specific conda install instructions
DAWells Dec 20, 2023
d91f51b
Only add missing rows if there are some
DAWells Dec 20, 2023
97543ee
Update version number
DAWells Dec 20, 2023
cc29826
Don't track virtual environment
DAWells Jul 2, 2024
40f2ce2
Generate html docs with pdoc
DAWells Jul 2, 2024
627af1d
Use pdoc's github action template
DAWells Jul 2, 2024
ae85e5d
Delete docs markdown because it's generated as an html file now.
DAWells Jul 2, 2024
8efc67b
Add detail about docs hosted on a github page and how to regenerate it
DAWells Jul 2, 2024
d1af3c2
Create python-package-conda.yml
DAWells Jul 3, 2024
d8a2a16
Github action to test on multiple python versions without conda
DAWells Jul 3, 2024
a888ba9
Install HLAfreq before testing
DAWells Jul 3, 2024
db27fe0
Remove `build/` and `dist/` from version control
DAWells Jul 3, 2024
69e1b42
Reformat doc strings to fit line length and spacing
DAWells Jul 3, 2024
3c4c42d
github action test guide to readme
DAWells Jul 3, 2024
e2a93e4
.gitignore build/ and dist/
DAWells Jul 3, 2024
d8c89ac
Update version number 0.0.3.dev1
DAWells Jul 3, 2024
4dd4135
flake8 linting
DAWells Jul 3, 2024
ac95423
github doc page only runs on pushes to main,
DAWells Jul 3, 2024
0671cc4
Merge pull request #3 from Vaccitech/docs
DAWells Jul 3, 2024
f98fbf9
Detail submodules in readme
DAWells Jul 3, 2024
2c629c5
update nox and github action testing to 3.10-3.12 as recommended http…
DAWells Jul 3, 2024
210b0ea
Test AFhdi runs without complaint
DAWells Jul 3, 2024
d92b9f4
Simplify naming of scripts to reproduce paper results
DAWells Jul 3, 2024
3c266a3
Add bandit to nox file
DAWells Jul 4, 2024
48e3fbd
Make code complient with bandit
DAWells Jul 4, 2024
f113d81
Set minimum python version to 3.10
DAWells Jul 4, 2024
6044a42
Fix spacing in paper example code
DAWells Jul 4, 2024
b02385d
Report pytest coverage
DAWells Aug 7, 2024
8f37ae5
Host example jupyter notebooks on github pages
DAWells Aug 7, 2024
7f0d20c
Merge pull request #5 from Vaccitech/html_examples
DAWells Aug 7, 2024
e709fe4
Correct url links for jupyter notebook htmls
DAWells Aug 7, 2024
2b7df1e
Update vaccitech references to barinthusbio
DAWells Aug 22, 2024
c0146ee
Update detailed examples link in readme to examples module
DAWells Aug 22, 2024
a8adab4
Pytest coverage
DAWells Aug 27, 2024
8151d19
Specify minimum dependancy versions
DAWells Aug 27, 2024
7868e97
gitignore egg info
DAWells Aug 27, 2024
2c84b7c
Tests for simulated data, including failure tests
DAWells Aug 28, 2024
7ab5b63
Black formatting, mostly assertion error message wraps
DAWells Aug 28, 2024
16bd045
Black formatting of tests
DAWells Aug 28, 2024
6dabdec
GHA python linting and testing for push/pull_request on main and dev
DAWells Aug 28, 2024
ec1d0bf
Move simulation functions to HLAfreq
DAWells Aug 28, 2024
ec30aa0
Write tests for non-pymc functions of HLAfreq_pymc
DAWells Aug 28, 2024
829fc5e
Merge pull request #9 from BarinthusBio/HLAF-8-better_tests into dev
DAWells Aug 28, 2024
87423ca
Remove data loaders as they are not working and not needed
DAWells Aug 29, 2024
103d60b
Merge pull request #10 from BarinthusBio/HLAF-15-data-loaders
DAWells Aug 29, 2024
802596e
Add documentation link to setup.py
DAWells Aug 29, 2024
04ac0ec
Update version to 0.0.4
DAWells Aug 29, 2024
7753d57
Update install and troubleshooting guides, add links to specific func…
DAWells Sep 11, 2024
a074eba
Add `example/quickstart` and link in readme.
DAWells Oct 11, 2024
c3f17ac
Describe multiprocessing issue in troubleshooting of readme
DAWells Oct 11, 2024
bff71c4
Add long timeout for quickstart.py
DAWells Oct 14, 2024
6133e91
Simply getAFdata timeout message
DAWells Oct 14, 2024
a379ce3
update version to 0.0.5
DAWells Oct 14, 2024
66dc506
Merge pull request #12 from BarinthusBio/relsease-0.0.5
DAWells Oct 14, 2024
484f3cd
Clarify dosctring for `only_complete()`
DAWells Jan 7, 2025
112687b
Calculate coverage of just HLA-A
DAWells Feb 10, 2025
b57b8bd
Add contribution and community guidlines
DAWells Sep 15, 2025
53f878a
Add python 3.13 and 3.14 to tests
DAWells Jan 11, 2026
58ccec8
Update install instructions
DAWells Jan 11, 2026
c422602
Merge pull request #15 from BarinthusBio/dev
DAWells Jan 11, 2026
395cec4
git push fix contribution linnk
DAWells Mar 8, 2026
f753c9b
Remove reference to epitope_aligner
DAWells Mar 8, 2026
1dd3eb1
add asserts to `test_correct_c_array` and `test_correct_c_array_alleles`
DAWells Mar 17, 2026
3357b84
check data type using is_string_dtype to handle pandas 3 update
DAWells Mar 18, 2026
60e1c0f
use https for url links
DAWells Mar 19, 2026
d984e47
avoid depreciation warning by setting include_groups for .apply on gr…
DAWells Mar 19, 2026
9ed2227
update to version 0.0.6
DAWells Mar 19, 2026
2debf53
Update to 0.0.6
DAWells Mar 19, 2026
7d34504
Fix `test_correct_c_array` test
DAWells Apr 6, 2026
3124b9c
Create dependabot.yml
DAWells Apr 6, 2026
3e1a9b8
fix dependabot cool down param
DAWells Apr 6, 2026
360c648
Merge pull request #18 from BarinthusBio/dev
DAWells Apr 6, 2026
5dbb9f0
add github actions to dependabot
DAWells Apr 9, 2026
0b3200e
Bump actions/checkout from 4 to 6
dependabot[bot] Apr 9, 2026
807f709
Bump actions/deploy-pages from 4 to 5
dependabot[bot] Apr 9, 2026
7a790ce
Bump actions/upload-pages-artifact from 3 to 4
dependabot[bot] Apr 9, 2026
4c91a02
Merge pull request #19 from BarinthusBio/dependabot/github_actions/ac…
DAWells Apr 9, 2026
1e20d92
Merge pull request #22 from BarinthusBio/dependabot/github_actions/ac…
DAWells Apr 9, 2026
9c0513e
Bump actions/setup-python from 3 to 6
dependabot[bot] Apr 9, 2026
72d7489
Merge pull request #20 from BarinthusBio/dependabot/github_actions/ac…
DAWells Apr 9, 2026
1034fe6
Merge pull request #21 from BarinthusBio/dependabot/github_actions/ac…
DAWells Apr 9, 2026
c381462
Bump actions/upload-pages-artifact from 4 to 5
dependabot[bot] Apr 28, 2026
34f1619
Merge pull request #23 from BarinthusBio/dependabot/github_actions/ac…
DAWells May 2, 2026
19 changes: 19 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,19 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file

version: 2
updates:
  - package-ecosystem: "pip" # See documentation for possible values
    directory: "/" # Location of package manifests
    schedule:
      interval: "weekly"
    cooldown:
      default-days: 14
  - package-ecosystem: "github-actions" # See documentation for possible values
    directory: "/" # Location of package manifests
    schedule:
      interval: "weekly"
    cooldown:
      default-days: 14
52 changes: 52 additions & 0 deletions .github/workflows/docs.yaml
@@ -0,0 +1,52 @@
name: website

# build the documentation whenever there are new commits on main
on:
  push:
    branches:
      - main
    # Alternative: only build for tags.
    # tags:
    #   - '*'

# security: restrict permissions for CI jobs.
permissions:
  contents: read

jobs:
  # Build the documentation and upload the static HTML files as an artifact.
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-python@v6
        with:
          python-version: '3.12'

      # ADJUST THIS: install all dependencies (including pdoc)
      - run: pip install -e .
      - run: pip install pdoc
      - run: pip install jupyter
      # ADJUST THIS: build your documentation into docs/.
      # We use a custom build script for pdoc itself, ideally you just run `pdoc -o docs/ ...` here.
      # - run: python docs/make.py
      - run: pdoc -d google -o docs/ HLAfreq
      - run: jupyter nbconvert --to html --output-dir docs/HLAfreq/examples examples/*.ipynb
      - uses: actions/upload-pages-artifact@v5
        with:
          path: docs/

  # Deploy the artifact to GitHub pages.
  # This is a separate job so that only actions/deploy-pages has the necessary permissions.
  deploy:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      pages: write
      id-token: write
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - id: deployment
        uses: actions/deploy-pages@v5
43 changes: 43 additions & 0 deletions .github/workflows/python-package.yaml
@@ -0,0 +1,43 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: Python package

on:
  push:
    branches: [ "main", "dev" ]
  pull_request:
    branches: [ "main", "dev" ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]

    steps:
    - uses: actions/checkout@v6
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v6
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        python -m pip install flake8 pytest pytest-cov
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Install HLAfreq
      run: |
        pip install .
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pytest --cov=HLAfreq
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
data/
writeup/
*__pycache__/
*.code-workspace
envs/.venv
build/
dist/
src/*.egg-info/
.coverage
36 changes: 36 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,36 @@
# Community guidelines
Thank you for using `HLAfreq`!
It's great that you'd like to contribute in any form!
Asking questions as issues is an important contribution.
See below for particular information to include. If you see an
issue you can help with, your answer would be greatly appreciated.
When creating or answering issues please be kind!

## Seek support
If you need help using this tool, the first port of call is the
examples and documentation [here](https://barinthusbio.github.io/HLAfreq/HLAfreq.html).
But if the answer is not there, please open an [issue](https://github.com/BarinthusBio/HLAfreq/issues).

## Report problems with the software
If the software isn't working or you think you've found a bug,
please open an [issue](https://github.com/BarinthusBio/HLAfreq/issues). Please include:
- the version of `HLAfreq`
- code to reproduce the error
- any error message/the result
- what you expected/think should have happened instead

## Contribute to the software
If you want to add to `HLAfreq` that's great! Fork the
repo and create a branch named after the change you're making.
Make your contribution to the new branch on your fork and add
tests to verify that your code runs as expected. The tests should
pass when run by nox before submitting a pull request. See this
[guide](https://learn.scientific-python.org/development/guides/tasks/)
on using nox.

When you're happy with your changes and the tests are passing,
submit a pull request. In the pull request describe the motivation
for your changes, their impact, and the tests you have written for
them.

# Thank you for contributing!
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 Vaccitech PLC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
178 changes: 178 additions & 0 deletions README.md
@@ -0,0 +1,178 @@
# HLAfreq

`HLAfreq` allows you to download and combine HLA allele
frequencies from multiple datasets, e.g. combine data from
several studies within a country or combine countries.
Useful for studying regional diversity in immune genes
and, when paired with epitope prediction, estimating a population's
ability to mount an immune response to specific epitopes.

Automated download of allele frequency data from
[allelefrequencies.net](https://www.allelefrequencies.net/).

Full documentation at [HLAfreq/docs](https://BarinthusBio.github.io/HLAfreq/HLAfreq.html). Source code is available at [BarinthusBio/HLAfreq](https://github.com/BarinthusBio/HLAfreq).

## Details
Estimates are combined by modelling allele frequency as a
Dirichlet distribution which defines the probability of drawing each
allele. When combining studies their estimates are weighted as 2x sample size by
default. Sample size is doubled as each person in the study
contributes two alleles. Alternative weightings can be used,
for example population size when averaging across countries.
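
The weighting described above can be illustrated with a short sketch.
This is not the `HLAfreq` implementation: the function name
`combine_frequencies`, the flat prior of 1 per allele, and the example
numbers are all assumptions made for illustration. Each study's
frequencies are scaled by twice its sample size to give allele counts,
the counts are added to the prior to give Dirichlet concentration
parameters, and the combined estimate is the mean of that Dirichlet.

```python
def combine_frequencies(studies, prior=1.0):
    """Combine per-study allele frequencies as the mean of a Dirichlet.

    studies: list of (frequencies, sample_size) pairs, where frequencies
    maps allele name to frequency. `prior` is an assumed flat prior
    concentration applied to every allele.
    """
    alleles = sorted({a for freqs, _ in studies for a in freqs})
    # Start from the prior concentration for every allele
    alpha = {a: prior for a in alleles}
    for freqs, n in studies:
        weight = 2 * n  # each person contributes two alleles
        for a in alleles:
            # Alleles a study did not report are treated as frequency 0
            alpha[a] += freqs.get(a, 0.0) * weight
    total = sum(alpha.values())
    # The mean of a Dirichlet is each concentration divided by their sum
    return {a: alpha[a] / total for a in alleles}


# Hypothetical studies: (allele frequencies, number of people sampled)
studies = [
    ({"A*01:01": 0.6, "A*02:01": 0.4}, 50),
    ({"A*01:01": 0.2, "A*02:01": 0.8}, 150),
]
combined = combine_frequencies(studies)
```

The larger study pulls the combined estimate towards its own
frequencies. Passing a different weight in place of `2 * n`, such as
population size, would give the alternative weighting mentioned above.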

When selecting a panel of HLA alleles to represent a population,
allele frequency is not the only thing to consider. Depending on
the purpose of the panel, you should include a range of loci and
supertypes (grouped alleles sharing binding specificities).
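
For the frequency part of that decision, one common summary is the
share of a population carrying at least one allele from the panel.
The sketch below is a rough illustration only: it assumes
Hardy-Weinberg proportions at a single locus, and `panel_coverage`
and the frequencies are hypothetical rather than part of the
`HLAfreq` API.

```python
def panel_coverage(allele_freq, panel):
    """Proportion of people with at least one panel allele at one locus.

    Assumes Hardy-Weinberg proportions, i.e. each person's two alleles
    are independent draws from the population allele frequencies.
    """
    # Total frequency of alleles in the panel (unknown alleles count as 0)
    p = sum(allele_freq.get(a, 0.0) for a in panel)
    if not 0.0 <= p <= 1.0 + 1e-9:
        raise ValueError("panel allele frequencies must sum to between 0 and 1")
    # A person is missed only if both of their alleles are outside the panel
    return 1.0 - (1.0 - p) ** 2


# Hypothetical combined allele frequencies for one locus
allele_freq = {"A*01:01": 0.25, "A*02:01": 0.25, "A*03:01": 0.5}
coverage = panel_coverage(allele_freq, ["A*01:01", "A*02:01"])
```

Coverage alone says nothing about loci or supertypes, which is why it
is only one input when choosing a panel.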

## Install
`HLAfreq` is a `python` package available on Windows, macOS, and Linux. We recommend installing
with `conda`.
```
conda create -n hlafreq bioconda::hlafreq
conda activate hlafreq
```

### Troubleshooting
`HLAfreq` uses `pymc` to estimate credible intervals,
which is the source of most installation difficulty, see
[pymc installation guide](https://www.pymc.io/projects/docs/en/stable/installation.html) and [tips and tricks](https://conda-forge.org/docs/user/tipsandtricks/#using-multiple-channels).

You may see a warning about g++ and degraded performance:
```
WARNING (pytensor.configdefaults): g++ not detected! PyTensor will be unable to compile C-implementations and will default to Python. Performance may be severely degraded. To remove this warning, set PyTensor flags cxx to an empty string.
```

This means that one of the pymc backends is missing, so estimating credible
intervals will be very slow. Try one of the fixes below:

- Set the channel priority to strict, then install as above (using conda-forge then bioconda channels).
```
conda config --set channel_priority strict
```

- Install a conda compiler to handle g++ based on your OS.
```
conda create -n hlafreq -c conda-forge -c bioconda hlafreq cxx-compiler
```

When running entire scripts on Windows, you may see an error about
"Safe importing of main module", multiprocessing, and starting
new processes. To fix this, guard your code with
`if __name__ == "__main__":` after the `import`s, as demonstrated in
[`examples/quickstart.py`](https://github.com/BarinthusBio/HLAfreq/blob/main/examples/quickstart.py).
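
The general shape of a main-guarded script is sketched here. The
function names and the list of steps are placeholders, not taken from
`quickstart.py`. In a real script the body of `main()` would hold the
`HLAfreq` calls.

```python
def analysis_steps():
    # Stand-in for real work, e.g. downloading data and estimating
    # credible intervals, which may start worker processes.
    return ["download", "filter", "combine"]


def main():
    for step in analysis_steps():
        print(step)


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when the
    # module is re-imported by a child process.
    main()
```

Keeping everything except imports and definitions behind the guard
means a newly started process can import the file without rerunning
the analysis.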

If you do run into trouble please open an [issue](https://github.com/BarinthusBio/HLAfreq/issues).

### conda
If you're new to conda see the miniconda [installation guide](https://conda.io/projects/conda/en/stable/user-guide/install/index.html) and [documentation](https://docs.conda.io/projects/conda/en/stable/user-guide/index.html)
to get started with `conda`.

Enter the install command from above into your conda prompt to create and
activate a conda environment with `HLAfreq` installed.
Typing `python` into this activated environment will start
a python session where you can enter your python code such as
the HLAfreq [minimal example](#minimal-example) below.

If you prefer to write your python code as scripts using an IDE such as
PyCharm or VS Code, you'll need to look up how to configure a conda
virtual environment with those tools.

### pip
If you don't intend to use credible intervals you can install
with pip: `pip install HLAfreq`.
However, if you do import `HLAfreq_pymc` you may get warnings
about degraded performance.

See the [pip documentation](https://pip.pypa.io/en/stable/)
to get started with pip. If you do have issues with pip,
try installing with conda as described above.

## Minimal example
Download HLA data using `HLAfreq.HLAfreq.makeURL()` and `HLAfreq.HLAfreq.getAFdata()`.
All arguments that can be specified in the webpage form are available,
see the [`makeURL()` docs](https://barinthusbio.github.io/HLAfreq/HLAfreq/HLAfreq.html#makeURL) for details.
```
import HLAfreq
base_url = HLAfreq.makeURL("Uganda", locus="A")
aftab = HLAfreq.getAFdata(base_url)
```

After downloading the data, it must be filtered so that all studies
sum to allele frequency 1 (within tolerance). Then we must ensure
that all studies report alleles at the same resolution.
Finally we can combine frequency estimates, for more details see
the [`combineAF()` api documentation](https://barinthusbio.github.io/HLAfreq/HLAfreq/HLAfreq.html#combineAF).
```
aftab = HLAfreq.only_complete(aftab)
aftab = HLAfreq.decrease_resolution(aftab, 2)
caf = HLAfreq.combineAF(aftab)
```
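
To make the two preparation steps concrete, here is a plain-Python
sketch of what they amount to for a single study. It is an
illustration under assumptions, not how `HLAfreq` implements
`only_complete()` or `decrease_resolution()`: the helper names, the
tolerance of 0.01, and the example frequencies are hypothetical, and
real allele names may need extra handling.

```python
from collections import defaultdict


def is_complete(freqs, tol=0.01):
    """True if a study's allele frequencies sum to 1 within `tol`."""
    return abs(sum(freqs.values()) - 1.0) <= tol


def reduce_resolution(allele, fields=2):
    """Keep only the first `fields` colon-separated fields of an allele name."""
    return ":".join(allele.split(":")[:fields])


def collapse(freqs, fields=2):
    """Sum the frequencies of alleles that share a name after truncation."""
    collapsed = defaultdict(float)
    for allele, freq in freqs.items():
        collapsed[reduce_resolution(allele, fields)] += freq
    return dict(collapsed)


# Hypothetical study reporting some alleles at three-field resolution
study = {"A*01:01:01": 0.25, "A*01:01:02": 0.25, "A*02:01": 0.5}
complete = is_complete(study)
collapsed = collapse(study)
```

After collapsing, the two `A*01:01` subtypes are reported as a single
two-field allele, so this study can be compared with studies that only
typed to two fields.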

To add credible intervals to estimates see
[`examples/quickstart.py`](https://github.com/BarinthusBio/HLAfreq/blob/main/examples/quickstart.py).

## Detailed examples
For more detailed walkthroughs see [HLAfreq/examples](https://barinthusbio.github.io/HLAfreq/HLAfreq/examples.html).

- [Single country](https://BarinthusBio.github.io/HLAfreq/HLAfreq/examples/single_country.html) download and combine
- [Multi-country](https://BarinthusBio.github.io/HLAfreq/HLAfreq/examples/multi_country.html) download and combine, weight by population coverage
- [Using priors](https://BarinthusBio.github.io/HLAfreq/HLAfreq/examples/working_with_priors.html)
- [Credible intervals](https://BarinthusBio.github.io/HLAfreq/HLAfreq/examples/credible_intervals.html)

## Docs
Full documentation at [HLAfreq/docs](https://BarinthusBio.github.io/HLAfreq/HLAfreq.html).
API documentation for functions is under the submodules on the left.
- `HLAfreq.HLAfreq` documents most functions, specifically those that download and combine
allele data.
- `HLAfreq.HLAfreq_pymc` contains functions that use pymc to accurately estimate credible intervals on allele frequency estimates.

For help on specific functions view the docstring, `help(function_name)`.

Run `pdoc -d google -o docs/ HLAfreq` to generate the
documentation in `./docs`.
<!-- Documentation generated by pdoc should not be committed
as it is auto generated by a github action. -->

## Community guidelines
Thank you for using `HLAfreq`! Contributions in any form
are immensely helpful: asking questions, answering issues,
and pull requests are all great.

For full details see [CONTRIBUTING.md](https://github.com/BarinthusBio/HLAfreq/blob/main/CONTRIBUTING.md). In short, if the answer to
your question is not in the docs, open an [issue](https://github.com/BarinthusBio/HLAfreq/issues). If you'd like to improve `HLAfreq`, create a fork and open a pull request.

<!-- ## Developer notes
Install in dev mode
pip install -e HLAfreq
pip install -e .

Update version in setup.py

Update documentation with: `pdoc -d google -o docs/ HLAfreq`.
Note that github actions will automatically run this when pushed
to `main` branch.

Run tests `pytest`
Or allow nox to do it `nox`. Nox will also run linting.
On push github actions will run linting and pytest

Clear old build info
rm -rf build dist src/*.egg-info

Build with `python -m build`.

twine check dist/*

Upload to test pypi
twine upload --repository testpypi dist/*

Install from test pypi
python3 -m pip install --extra-index-url https://test.pypi.org/simple/ HLAfreq

Upload to pypi
twine upload dist/*
-->

## Citation
Wells, D. A., & McAuley, M. (2023). HLAfreq: Download and combine HLA allele frequency data. bioRxiv, 2023-09. https://doi.org/10.1101/2023.09.15.557761
Empty file removed code/__init__.py
Empty file.