Improve documentation with comprehensive docstrings by ShutingXie · Pull Request #35 · uoguelph-mlrg/PROTAX-GPU

ShutingXie · 2025-10-22T10:40:49Z

Summary

This PR adds comprehensive docstrings across all core PROTAX-GPU modules and scripts to enable better API documentation generation with Sphinx/ReadTheDocs in the next step. All docstrings follow Google/NumPy style and include detailed parameter descriptions, return values, and usage examples.
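For readers unfamiliar with the convention, here is a minimal sketch of what a Google-style docstring with parameter descriptions, a return value, and a usage example can look like. The function `gc_fraction` is hypothetical and is not part of PROTAX-GPU; it only illustrates the section layout.

```python
def gc_fraction(seq: str) -> float:
    """Compute the fraction of G and C bases in a DNA sequence.

    Hypothetical helper used only to illustrate the docstring layout.

    Args:
        seq: DNA sequence made of the characters A, C, G, T
            (case-insensitive).

    Returns:
        Fraction of bases that are G or C, between 0.0 and 1.0.

    Raises:
        ValueError: If ``seq`` is empty.

    Example:
        >>> gc_fraction("ACGT")
        0.5
    """
    if not seq:
        raise ValueError("seq must not be empty")
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)
```

The `Example:` section doubles as a doctest, so it can be checked with `python -m doctest` as well as rendered by Sphinx.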

Changes Made

Core Modules

protax/__init__.py: Added module-level docstring with package overview and example
protax/classify.py: Comprehensive docstrings for classify_file() and classify() functions
protax/model.py: Detailed documentation for all probability computation functions
protax/taxonomy.py: Enhanced docstrings for TaxTree and ProtaxModel data structures
protax/protax_utils.py: Complete documentation for I/O utilities and sequence processing
protax/ops/knn_register.py: Technical details for GPU/CPU KNN operations
protax/baseline.py: Added docstrings for baseline classifier
protax/ops/__init__.py: Module-level docstring explaining custom CPU/GPU operations architecture

Scripts

scripts/train_gd.py: Complete documentation for training workflow
scripts/process_seqs.py: Usage documentation for classification script
scripts/convert.py: Documentation for data conversion utilities

ShutingXie · 2025-10-28T02:53:22Z

Hi @gwtaylor and @Mo-Moustafa,

I’ve submitted a PR that includes updates to the docstrings. Details are included in the PR description above. I would be grateful if you could review it when convenient. Please feel free to share any comments or suggestions. Thank you!

Copilot

Pull Request Overview

This PR adds comprehensive documentation to the PROTAX-GPU codebase. The changes improve code maintainability by adding module-level docstrings, detailed function/class documentation with parameter descriptions, return types, examples, and usage notes.

Added module-level docstrings to all major Python files explaining their purpose
Added detailed function and class docstrings following a consistent format with Args, Returns, Side Effects, Notes, and Examples sections
Documented known issues and limitations (e.g., argmax bug in baseline.py, hardcoded paths)
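A possible Sphinx configuration for rendering docstrings in this format is sketched below. It assumes a `docs/conf.py` file, which this PR does not show. `sphinx.ext.autodoc` and `sphinx.ext.napoleon` are standard Sphinx extensions. "Side Effects" does not appear to be among the section names napoleon recognizes by default, so the sketch registers it through `napoleon_custom_sections`.

```python
# docs/conf.py (hypothetical location; this PR does not include a Sphinx config)

extensions = [
    "sphinx.ext.autodoc",   # pull docstrings from the importable package
    "sphinx.ext.napoleon",  # parse Google- and NumPy-style sections
]

# Accept both styles, since the PR describes its docstrings as Google/NumPy style.
napoleon_google_docstring = True
napoleon_numpy_docstring = True

# Register the non-standard section name used in this PR so it is
# rendered as its own section rather than as plain text.
napoleon_custom_sections = ["Side Effects"]
```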

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

File	Description
scripts/train_gd.py	Added comprehensive module and function docstrings for the gradient descent training script
scripts/process_seqs.py	Added module-level documentation explaining CLI usage
scripts/convert.py	Enhanced function docstrings with parameter descriptions and return types
protax/taxonomy.py	Updated class docstrings with field descriptions
protax/protax_utils.py	Added detailed documentation for utility functions
protax/ops/knn_register.py	Documented custom JAX primitives and MLIR lowering rules
protax/ops/__init__.py	Added module-level documentation for custom operations
protax/model.py	Comprehensive documentation of core model functions
protax/classify.py	Added docstrings for classification functions
protax/baseline.py	Documented baseline classifier with bug warnings
protax/__init__.py	Added package-level documentation

Copilot · 2025-10-30T12:40:19Z

+
+    Example:
+        >>> default_layouts((3, 4), (5,))
+        [range(1, -1, -1), range(0, -1, -1)]


The example output format is unclear. For shape (3, 4), range(1, -1, -1) produces [1, 0]; for shape (5,), range(0, -1, -1) produces [0]. The ranges themselves are correct, but the values they stand for are not visible in the example. Consider showing the actual list values: [[1, 0], [0]].

Suggested change

-        [range(1, -1, -1), range(0, -1, -1)]
+        [[1, 0], [0]]
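Whether the suggested output would pass as a doctest depends on what `default_layouts` actually returns. The sketch below assumes the helper returns one minor-to-major `range` per shape. This body is an assumption, not code taken from the PR. Under that assumption the function prints `range` objects, and the list form only appears after an explicit conversion.

```python
def default_layouts(*shapes):
    # Assumed implementation: one minor-to-major dimension order per shape,
    # expressed as a range object rather than a list.
    return [range(len(shape) - 1, -1, -1) for shape in shapes]


layouts = default_layouts((3, 4), (5,))

# What a doctest would compare against if the function returns ranges:
print(layouts)                     # [range(1, -1, -1), range(0, -1, -1)]

# The values those ranges stand for, after explicit conversion:
print([list(r) for r in layouts])  # [[1, 0], [0]]
```

If the real helper does return ranges, one option is to keep the original doctest output and add the list form as a comment, so the example stays both executable and readable.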

Copilot · 2025-10-30T12:40:20Z

+    This implementation currently selects the most DISTANT reference (argmax
+    instead of argmin), which appears to be a bug. Use with caution.


The docstring acknowledges that line 51 uses argmax instead of argmin, so the function returns the most distant reference rather than the nearest. This defeats the purpose of a nearest-neighbor classifier. The logic should use jnp.argmin(1 - (match_tots / ok)) or, equivalently, jnp.argmax(match_tots / ok) to find the nearest reference.
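The equivalence claimed in this comment can be checked on toy numbers. The sketch below uses plain Python lists as a stand-in for the `jnp.argmin` / `jnp.argmax` calls. The names `match_tots` and `ok` come from the comment, while the helper functions and the sample values are made up for illustration.

```python
def argmin(values):
    # Index of the smallest value (plain-Python stand-in for jnp.argmin).
    return min(range(len(values)), key=values.__getitem__)


def argmax(values):
    # Index of the largest value (plain-Python stand-in for jnp.argmax).
    return max(range(len(values)), key=values.__getitem__)


# Toy data: matching positions and comparable positions per reference.
match_tots = [90, 40, 75]
ok = [100, 100, 100]

similarity = [m / o for m, o in zip(match_tots, ok)]
distance = [1 - s for s in similarity]

nearest_by_distance = argmin(distance)      # smallest distance
nearest_by_similarity = argmax(similarity)  # largest similarity
most_distant = argmax(distance)             # what argmax over distance picks

print(nearest_by_distance, nearest_by_similarity, most_distant)  # 0 0 1
```

Reference 0 has the highest match fraction, so both corrected forms select it, while taking argmax over the distance selects reference 1, the worst match.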

Copilot · 2025-10-30T12:40:20Z

+        Prints total classification time to stdout.
+
+    Note:
+        Model directory is currently hardcoded to "/home/roy/Documents/PROTAX-dsets/30k_small".


The hardcoded absolute path /home/roy/Documents/PROTAX-dsets/30k_small on line 113 makes the code non-portable across systems and users. It should be exposed as a function argument with a default value or read from a configuration file.
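One way to act on this suggestion is sketched below. The helper name `resolve_model_dir` and the environment variable `PROTAX_MODEL_DIR` are hypothetical and do not exist in PROTAX-GPU. The sketch only shows the pattern of preferring an explicit argument, falling back to the environment, and failing loudly otherwise.

```python
import os
from pathlib import Path


def resolve_model_dir(model_dir=None, env_var="PROTAX_MODEL_DIR"):
    """Pick the model directory from an argument or the environment.

    Hypothetical helper: neither this function nor the environment
    variable name is part of PROTAX-GPU.
    """
    # An explicit argument wins; otherwise fall back to the environment.
    chosen = model_dir if model_dir is not None else os.environ.get(env_var)
    if not chosen:
        raise ValueError(
            f"no model directory given; pass model_dir or set {env_var}"
        )
    # Expand "~" so user-relative paths work on any machine.
    return Path(chosen).expanduser()


# Usage: the caller supplies the location instead of relying on one user's home.
print(resolve_model_dir("models/30k_small"))
```

Raising an error when nothing is configured is a deliberate choice here: a silent default would reintroduce the same portability problem in a less visible form.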

ShutingXie added 2 commits October 22, 2025 05:11

Improve and add comprehensive docstrings across core modules and scripts to enable Sphinx/ReadTheDocs API docs, clarifying baseline classifier semantics (2ee678f)

Unify docstrings so that all docstrings follow the Google/NumPy style and add technical details (18accfc)

gwtaylor requested a review from Copilot October 30, 2025 12:38

ShutingXie wants to merge 2 commits into uoguelph-mlrg:main from ShutingXie:docstring_improvement