Tesseract Core is an open-source project and, as such, we welcome contributions from developers, engineers, scientists, and end-users in general. Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Ensure your contributions adhere to the Code of Conduct.
Constructive feedback is very welcome. We are interested in hearing from you!
If things aren't working as expected, or the documentation is lacking, please file a bug report.
If you want to suggest a new feature, please file a feature request. In particular, we recommend opening an issue before contributing code in a pull request. This allows all parties to talk things over before jumping into action, and increases the likelihood of pull requests getting merged.
If you have general questions or feedback, need support from the community, or have a cool demo to share, start a thread in our Discourse Forum. We use GitHub Issues for bug reports and feature requests only.
Tesseract documentation is kept under the docs/ directory of the repository,
written in Markdown and using Sphinx to generate the final HTML. Fixes and
enhancements to the documentation should be submitted as pull requests; we
treat them the same as code contributions.
To build the documentation locally, install the documentation dependencies in
addition to the project itself, then run make html:
$ . venv/bin/activate
$ pip install -e .[dev]
$ cd docs
$ make html
The resulting HTML files are in docs/build/html/.
Contributions in the form of tutorials, examples, demos, blog posts (including those posted elsewhere already) are best highlighted and celebrated in the Discourse Forum.
Tesseract is developed under the Apache 2.0 license. By contributing to the Tesseract project you agree that your code contributions are governed by this license. We require you to sign our Contributor License Agreement to state so.
Make sure you have Docker installed
on your machine and that you can run docker commands as your regular user. After that,
clone the repository, install the dependencies, and set up the pre-commit hooks:
$ git clone git@github.com:pasteurlabs/tesseract-core.git
$ cd tesseract-core
$ python -m venv venv
$ . venv/bin/activate
$ pip install -e .[dev]
$ pre-commit install
This project uses the pytest framework for all tests. New code should be covered by new or existing tests.
To run the tests, simply run pytest in the root of the project. This will run
the entire test suite, including the end-to-end tests, which take quite a while
to finish. Alternatively, you can run the unit and end-to-end tests separately:
$ pytest --skip-endtoend
$ pytest --always-run-endtoend tests/endtoend_tests
We follow these principles when writing tests:
- Prefer end-to-end tests over unit tests: tests that exercise real Tesseract builds and invocations catch more bugs than isolated unit tests. When in doubt, write an end-to-end test.
- Avoid mocks where feasible: mocks can hide real integration issues. If a test requires complex mocking to work, consider whether an end-to-end test would be more valuable.
- Don't test implementation details: tests should verify behavior, not internal structure. If refactoring breaks your test but not the actual functionality, the test was too tightly coupled.
- Be mindful of slow tests: end-to-end tests that build Tesseracts are slow. Before adding a new one, check whether an existing test can be extended, or whether a faster unit test would suffice for your specific case.
We use pre-commit to run linters and formatters
automatically before each commit. The hooks are configured in
.pre-commit-config.yaml and include:
- Ruff — Fast Python linter and formatter (replaces flake8, isort, black)
- Various file checks — Trailing whitespace, YAML validation, etc.
To run all pre-commit hooks manually on all files:
$ pre-commit run --all-files
To run a specific hook:
$ pre-commit run ruff --all-files
Pre-commit also handles automatic dependency updates via Dependabot-style version bumps. These are configured to run periodically in CI.
This project uses Git for version control and follows a GitHub workflow. To contribute, follow these steps:
- Fork the project via the GitHub UI.
- Clone your fork to your machine.
- Add an upstream remote:
  $ git remote add upstream git@github.com:pasteurlabs/tesseract-core.git
- Create a new branch for your code contribution:
  $ git switch --create my_branch
- Implement your changes.
- Commit and push to your fork:
  $ git push --set-upstream origin my_branch
- Open a pull request with your changes.
It is good practice to rebase often on top of main to keep your code up to
date with the latest development and minimize merge conflicts:
$ git fetch upstream
$ git switch main
$ git merge upstream/main
$ git switch my_branch
$ git rebase main
$ git push --force
We follow the Conventional
Commits specification for all
commits that reach the main branch. Each commit is created by squash-merging a pull
request. The commit title and message come from the pull
request title and message, respectively. As such, they should be structured
following the specification.
The title consists of a type, an optional scope, and a short
description: type[(scope)]: description. The types we use are:
- chore: for changes that affect the build system, external dependencies, or general housekeeping.
- ci: for changes in the CI.
- doc: for documentation-only changes.
- feat: for a new feature.
- fix: for fixing a bug.
- perf: for a code change that improves performance.
- refactor: for a code change that neither adds a feature nor fixes a bug.
- security: for a change that fixes a security issue.
- test: for adding new tests or fixing existing ones.
The scopes we use are:
- cli: for changes that affect the tesseract CLI.
- engine: for changes that affect the CLI engine.
- sdk: for changes that affect the Tesseract Python API.
- example: for changes in the examples.
- runtime: for changes in the Tesseract Runtime.
- deps: for changes in the dependencies.
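To make the title format concrete, here is a minimal sketch of a PR-title checker built from the type and scope lists above. The helper `check_title` and the constants `TYPES`, `SCOPES`, and `TITLE_RE` are hypothetical and for illustration only; this is not the check the project runs in CI, if any. It also accepts the optional `!` breaking-change marker described below.

```python
import re

# Types and scopes copied from the lists in this guide.
TYPES = {"chore", "ci", "doc", "feat", "fix", "perf", "refactor", "security", "test"}
SCOPES = {"cli", "engine", "sdk", "example", "runtime", "deps"}

# Hypothetical pattern for: type[(scope)][!]: description
TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)"           # type, e.g. "feat"
    r"(?:\((?P<scope>[a-z]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"            # optional breaking-change marker
    r": (?P<description>\S.*)$"    # colon, one space, non-empty description
)


def check_title(title: str) -> dict:
    """Parse a PR title; raise ValueError if it doesn't follow the convention."""
    match = TITLE_RE.match(title)
    if match is None:
        raise ValueError(f"title does not match 'type[(scope)]: description': {title!r}")
    parts = match.groupdict()
    if parts["type"] not in TYPES:
        raise ValueError(f"unknown type {parts['type']!r}")
    if parts["scope"] is not None and parts["scope"] not in SCOPES:
        raise ValueError(f"unknown scope {parts['scope']!r}")
    # Turn the optional "!" group into a boolean flag.
    parts["breaking"] = parts["breaking"] is not None
    return parts


if __name__ == "__main__":
    # prints {'type': 'feat', 'scope': 'cli', 'breaking': False, 'description': 'add --verbose flag'}
    print(check_title("feat(cli): add --verbose flag"))
```

A regular expression keeps the sketch short; a real check would likely also enforce things like a maximum title length, which this one does not.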
If your code introduces breaking changes, indicate this either by appending
an exclamation mark (!) after the type/scope in the title
or by adding a BREAKING CHANGE: trailer to the message.
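Both ways of flagging a breaking change can be detected mechanically. The sketch below uses a hypothetical helper, `is_breaking`, which is not part of Tesseract. It also accepts `BREAKING-CHANGE:`, which the Conventional Commits specification treats as a synonym for `BREAKING CHANGE:` in a footer.

```python
import re

# "!" right before the colon in the title, e.g. "feat!: ..." or "feat(sdk)!: ..."
BANG_RE = re.compile(r"^[a-z]+(?:\([a-z]+\))?!: ")
# "BREAKING CHANGE:" (or "BREAKING-CHANGE:") at the start of any line of the message.
TRAILER_RE = re.compile(r"^BREAKING[ -]CHANGE: ", re.MULTILINE)


def is_breaking(title: str, message: str = "") -> bool:
    """Return True if the PR title or message marks a breaking change."""
    return bool(BANG_RE.match(title) or TRAILER_RE.search(message))


if __name__ == "__main__":
    print(is_breaking("feat(sdk)!: drop a deprecated argument"))  # True
    print(is_breaking("fix(cli): rename flag", "BREAKING CHANGE: --foo is now --bar"))  # True
    print(is_breaking("fix(cli): handle empty input"))  # False
```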
The Tesseract project follows semantic versioning.
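Conventional Commits pairs naturally with semantic versioning. The sketch below uses a hypothetical `next_version` helper and an assumed mapping: any breaking change bumps MAJOR, any `feat` bumps MINOR, and anything else bumps PATCH. Tools such as semantic-release use a mapping along these lines. The Tesseract release workflow may compute versions differently. The sketch only handles plain `MAJOR.MINOR.PATCH` strings.

```python
import re

# Hypothetical title pattern: type[(scope)][!]: ...
TITLE_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([a-z]+\))?(?P<bang>!)?: ")


def next_version(current: str, commits: list[tuple[str, str]]) -> str:
    """Compute the next MAJOR.MINOR.PATCH version from (title, message) pairs.

    Assumed mapping: any breaking change -> major, any "feat" -> minor,
    anything else -> patch.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    breaking = feature = False
    for title, message in commits:
        match = TITLE_RE.match(title)
        if match is None:
            raise ValueError(f"not a Conventional Commits title: {title!r}")
        # Simplification: look for the trailer token anywhere in the message.
        if match["bang"] or "BREAKING CHANGE:" in message or "BREAKING-CHANGE:" in message:
            breaking = True
        elif match["type"] == "feat":
            feature = True
    if breaking:
        return f"{major + 1}.0.0"
    if feature:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(next_version("1.2.3", [("fix(cli): handle empty input", "")]))  # 1.2.4
    print(next_version("1.2.3", [("feat(sdk): add option", "")]))  # 1.3.0
```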
(code owners only)
Releases are done via GitHub Actions, which automatically build the release artifacts and publish them to the GitHub Releases page. To create a new release, follow these steps:
- Make sure the code is in a good state, all tests pass, and the documentation is up to date.
- Trigger a new release action through the GitHub UI. This opens a new pull request with the release notes and the version number.
- Add any additional release notes to the pull request message. They will automatically be included at the top of the release notes.
- In the meantime, you can add more commits to main (and update the release branch), which will trigger regeneration of the changelog and release notes.
- Once the pull request is ready, merge it into main.
- GitHub Actions will then automatically release the new version. Verify that the release artifacts are correctly built and published.
- Make an announcement in the Discourse Forum and on social media, if applicable.