Skip to content

Improve documentation with comprehensive docstrings#35

Open
ShutingXie wants to merge 2 commits into
uoguelph-mlrg:mainfrom
ShutingXie:docstring_improvement
Open

Improve documentation with comprehensive docstrings#35
ShutingXie wants to merge 2 commits into
uoguelph-mlrg:mainfrom
ShutingXie:docstring_improvement

Conversation

@ShutingXie

Copy link
Copy Markdown

Summary

This PR adds comprehensive docstrings across all core PROTAX-GPU modules and scripts to enable better API documentation generation with Sphinx/ReadTheDocs in the next step. All docstrings follow Google/NumPy style and include detailed parameter descriptions, return values, and usage examples.

Changes Made

Core Modules

  • protax/__init__.py: Added module-level docstring with package overview and example
  • protax/classify.py: Comprehensive docstrings for classify_file() and classify() functions
  • protax/model.py: Detailed documentation for all probability computation functions
  • protax/taxonomy.py: Enhanced docstrings for TaxTree and ProtaxModel data structures
  • protax/protax_utils.py: Complete documentation for I/O utilities and sequence processing
  • protax/ops/knn_register.py: Technical details for GPU/CPU KNN operations
  • protax/baseline.py: Added docstrings for baseline classifier
  • protax/ops/__init__.py: Module-level docstring explaining custom CPU/GPU operations architecture

Scripts

  • scripts/train_gd.py: Complete documentation for training workflow
  • scripts/process_seqs.py: Usage documentation for classification script
  • scripts/convert.py: Documentation for data conversion utilities

…pts to enable Sphinx/ReadTheDocs API docs, clarifying baseline classifier semantics
@ShutingXie

Copy link
Copy Markdown
Author

Hi @gwtaylor and @Mo-Moustafa,

I’ve submitted a PR that includes updates to the docstrings. Details are included in the PR description above. I would be grateful if you could review it when convenient. Please feel free to share any comments or suggestions. Thank you!

@gwtaylor gwtaylor requested a review from Copilot October 30, 2025 12:38

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive documentation to the PROTAX-GPU codebase. The changes improve code maintainability by adding module-level docstrings, detailed function/class documentation with parameter descriptions, return types, examples, and usage notes.

  • Added module-level docstrings to all major Python files explaining their purpose
  • Added detailed function and class docstrings following a consistent format with Args, Returns, Side Effects, Notes, and Examples sections
  • Documented known issues and limitations (e.g., argmax bug in baseline.py, hardcoded paths)

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
scripts/train_gd.py Added comprehensive module and function docstrings for the gradient descent training script
scripts/process_seqs.py Added module-level documentation explaining CLI usage
scripts/convert.py Enhanced function docstrings with parameter descriptions and return types
protax/taxonomy.py Updated class docstrings with field descriptions
protax/protax_utils.py Added detailed documentation for utility functions
protax/ops/knn_register.py Documented custom JAX primitives and MLIR lowering rules
protax/ops/init.py Added module-level documentation for custom operations
protax/model.py Comprehensive documentation of core model functions
protax/classify.py Added docstrings for classification functions
protax/baseline.py Documented baseline classifier with bug warnings
protax/init.py Added package-level documentation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


Example:
>>> default_layouts((3, 4), (5,))
[range(1, -1, -1), range(0, -1, -1)]

Copilot AI Oct 30, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example output is incorrect. For shape (3, 4), the range should be range(1, -1, -1) which produces [1, 0], not the implied [1, 0] in row-major order. For shape (5,), range(0, -1, -1) produces [0]. The comment is correct but the example output format is unclear. Consider showing the actual list values: [[1, 0], [0]].

Suggested change
[range(1, -1, -1), range(0, -1, -1)]
[[1, 0], [0]]

Copilot uses AI. Check for mistakes.
Comment thread protax/baseline.py
Comment on lines +9 to +10
This implementation currently selects the most DISTANT reference (argmax
instead of argmin), which appears to be a bug. Use with caution.

Copilot AI Oct 30, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation acknowledges that argmax is used instead of argmin on line 51, causing the function to return the most distant reference rather than the nearest. This defeats the purpose of a nearest-neighbor classifier. The logic should use jnp.argmin(1 - (match_tots / ok)) or equivalently jnp.argmax(match_tots / ok) to find the nearest reference.

Copilot uses AI. Check for mistakes.
Comment thread protax/baseline.py
Prints total classification time to stdout.

Note:
Model directory is currently hardcoded to "/home/roy/Documents/PROTAX-dsets/30k_small".

Copilot AI Oct 30, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded absolute path /home/roy/Documents/PROTAX-dsets/30k_small on line 113 should be parameterized as a function argument with a default value or read from a configuration file. This makes the code non-portable across different systems and users.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants