Improve documentation with comprehensive docstrings #35
…pts to enable Sphinx/ReadTheDocs API docs, clarifying baseline classifier semantics
… and add technical details
Hi @gwtaylor and @Mo-Moustafa, I’ve submitted a PR that updates the docstrings. Details are in the PR description above. I would be grateful if you could review it when convenient. Please feel free to share any comments or suggestions. Thank you!
Pull Request Overview
This PR adds comprehensive documentation to the PROTAX-GPU codebase. The changes improve code maintainability by adding module-level docstrings, detailed function/class documentation with parameter descriptions, return types, examples, and usage notes.
- Added module-level docstrings to all major Python files explaining their purpose
- Added detailed function and class docstrings following a consistent format with Args, Returns, Side Effects, Notes, and Examples sections
- Documented known issues and limitations (e.g., argmax bug in baseline.py, hardcoded paths)
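As an illustration of the docstring layout described above, the sketch below applies the Args, Returns, Side Effects, Notes, and Examples sections to a small helper. The function `count_matches` is hypothetical and invented for this example; it is not taken from the PROTAX-GPU codebase.

```python
def count_matches(query: str, reference: str) -> int:
    """Count positions where two equal-length sequences agree.

    Hypothetical helper, used only to illustrate the docstring layout.

    Args:
        query: Query sequence, e.g. "ACGT".
        reference: Reference sequence of the same length as ``query``.

    Returns:
        Number of positions at which both sequences hold the same character.

    Side Effects:
        None.

    Notes:
        Raises ValueError if the sequences differ in length.

    Examples:
        >>> count_matches("ACGT", "ACGA")
        3
    """
    if len(query) != len(reference):
        raise ValueError("sequences must have the same length")
    # Compare the sequences position by position and count agreements.
    return sum(q == r for q, r in zip(query, reference))
```

Keeping the `Examples` section in doctest form means the documented output can be checked mechanically, for instance with `python -m doctest`.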
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| scripts/train_gd.py | Added comprehensive module and function docstrings for the gradient descent training script |
| scripts/process_seqs.py | Added module-level documentation explaining CLI usage |
| scripts/convert.py | Enhanced function docstrings with parameter descriptions and return types |
| protax/taxonomy.py | Updated class docstrings with field descriptions |
| protax/protax_utils.py | Added detailed documentation for utility functions |
| protax/ops/knn_register.py | Documented custom JAX primitives and MLIR lowering rules |
| `protax/ops/__init__.py` | Added module-level documentation for custom operations |
| protax/model.py | Comprehensive documentation of core model functions |
| protax/classify.py | Added docstrings for classification functions |
| protax/baseline.py | Documented baseline classifier with bug warnings |
| `protax/__init__.py` | Added package-level documentation |
    Example:
        >>> default_layouts((3, 4), (5,))
        [range(1, -1, -1), range(0, -1, -1)]
The example output is unclear. For shape (3, 4), range(1, -1, -1) yields [1, 0]; for shape (5,), range(0, -1, -1) yields [0]. The values are correct, but the range objects hide them from the reader. Consider showing the actual list values: [[1, 0], [0]].
Suggested change:

    - [range(1, -1, -1), range(0, -1, -1)]
    + [[1, 0], [0]]
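To make the two output forms concrete, here is a minimal sketch. It assumes `default_layouts` builds one descending `range` per shape from that shape's number of dimensions; this definition is an assumption for illustration and has not been checked against the actual code in `protax/ops/knn_register.py`.

```python
def default_layouts(*shapes):
    """Return one layout per shape as a descending range (assumed behavior)."""
    return [range(len(shape) - 1, -1, -1) for shape in shapes]


layouts = default_layouts((3, 4), (5,))

# The raw return value prints as range objects, as in the original docstring.
print(layouts)  # [range(1, -1, -1), range(0, -1, -1)]

# Materializing each range shows the explicit index order.
as_lists = [list(r) for r in layouts]
print(as_lists)  # [[1, 0], [0]]
```

Under this assumption, a doctest would only pass with the `range` form, since that is what the function returns. One option is to keep the `range` output in the example and add the list values in a comment beside it.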
    This implementation currently selects the most DISTANT reference (argmax
    instead of argmin), which appears to be a bug. Use with caution.
Documentation acknowledges that argmax is used instead of argmin on line 51, causing the function to return the most distant reference rather than the nearest. This defeats the purpose of a nearest-neighbor classifier. The logic should use jnp.argmin(1 - (match_tots / ok)) or equivalently jnp.argmax(match_tots / ok) to find the nearest reference.
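The equivalence claimed above can be checked on toy data. The sketch below uses plain Python helpers in place of `jnp.argmin` and `jnp.argmax`, and the values of `match_tots` and `ok` are made up; it assumes `match_tots` counts matching positions and `ok` counts comparable positions for each reference, which is inferred from the names rather than confirmed by the code.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def argmin(values):
    """Index of the smallest value (first one on ties)."""
    return min(range(len(values)), key=values.__getitem__)


# Hypothetical per-reference counts for a single query sequence.
match_tots = [8.0, 5.0, 9.0]
ok = [10.0, 10.0, 10.0]

similarity = [m / o for m, o in zip(match_tots, ok)]
distance = [1.0 - s for s in similarity]

nearest_by_distance = argmin(distance)       # smallest distance
nearest_by_similarity = argmax(similarity)   # largest similarity
farthest = argmax(distance)                  # what argmax on distance selects

print(nearest_by_distance, nearest_by_similarity, farthest)  # 2 2 1
```

Here reference 2 matches 9 of 10 positions and is the nearest under both correct formulations, while taking `argmax` of the distance picks reference 1, the least similar one.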
    Prints total classification time to stdout.

    Note:
        Model directory is currently hardcoded to "/home/roy/Documents/PROTAX-dsets/30k_small".
The hardcoded absolute path /home/roy/Documents/PROTAX-dsets/30k_small on line 113 makes the code non-portable across systems and users. It should be parameterized as a function argument with a default value, or read from a configuration file.
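One possible shape for that change is sketched below. The helper `resolve_model_dir` and the environment variable `PROTAX_MODEL_DIR` are hypothetical names introduced here, not part of PROTAX-GPU, and the environment variable stands in for the configuration file mentioned above.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_model_dir(
    model_dir: Optional[str] = None,
    env_var: str = "PROTAX_MODEL_DIR",  # hypothetical variable name
) -> Path:
    """Pick the model directory from an argument, then from the environment."""
    # An explicit argument always wins.
    if model_dir is not None:
        return Path(model_dir)
    # Otherwise fall back to a user-level setting.
    from_env = os.environ.get(env_var)
    if from_env:
        return Path(from_env)
    # Fail loudly instead of silently using a machine-specific path.
    raise ValueError(
        f"No model directory given; pass model_dir or set {env_var}."
    )
```

A caller could then pass the directory explicitly, for example `resolve_model_dir("models/30k_small")`, and the original developer could keep the existing location by setting the variable in their own environment.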
Summary
This PR adds comprehensive docstrings across all core PROTAX-GPU modules and scripts to enable better API documentation generation with Sphinx/ReadTheDocs in the next step. All docstrings follow Google/NumPy style and include detailed parameter descriptions, return values, and usage examples.
Changes Made
Core Modules
- `protax/__init__.py`: Added module-level docstring with package overview and example
- `protax/classify.py`: Comprehensive docstrings for `classify_file()` and `classify()` functions
- `protax/model.py`: Detailed documentation for all probability computation functions
- `protax/taxonomy.py`: Enhanced docstrings for `TaxTree` and `ProtaxModel` data structures
- `protax/protax_utils.py`: Complete documentation for I/O utilities and sequence processing
- `protax/ops/knn_register.py`: Technical details for GPU/CPU KNN operations
- `protax/baseline.py`: Added docstrings for baseline classifier
- `protax/ops/__init__.py`: Module-level docstring explaining custom CPU/GPU operations architecture

Scripts
- `scripts/train_gd.py`: Complete documentation for training workflow
- `scripts/process_seqs.py`: Usage documentation for classification script
- `scripts/convert.py`: Documentation for data conversion utilities