Complete assignment 2 #2
TITLE: UofT-DSI | Python - Assignment 2
What changes are you trying to make?
Built a complete data analysis pipeline to evaluate arthritis drug efficacy across a 12-session clinical trial. Created three main components:
All code uses vectorized NumPy array operations rather than loops, with the `axis` argument set explicitly for row-wise (per-patient) calculations.
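The row-wise `axis` handling can be illustrated with a minimal sketch on a synthetic 60 × 40 array. The random placeholder values stand in for one inflammation file and are not the trial data. The `patient_summary(data, operation)` shown here is hypothetical: it takes an array instead of a file path, and the operation names `"mean"`, `"max"` and `"min"` are assumptions.

```python
import numpy as np

# Synthetic placeholder for one inflammation file:
# 60 patients (rows) x 40 days (columns), random values.
rng = np.random.default_rng(seed=0)
data = rng.integers(low=0, high=20, size=(60, 40)).astype(float)

# axis=1 collapses the columns (days), leaving one value per row (patient).
per_patient = np.mean(data, axis=1)
# axis=0 collapses the rows (patients), leaving one value per column (day).
per_day = np.mean(data, axis=0)

print(per_patient.shape)  # (60,)
print(per_day.shape)      # (40,)


def patient_summary(data, operation):
    """Return one statistic per patient (row) of a 2D array.

    Hypothetical sketch: takes an array rather than a file path, and the
    accepted operation names are assumptions.
    """
    summaries = {"mean": np.mean, "max": np.max, "min": np.min}
    if operation not in summaries:
        raise ValueError(f"Unsupported operation: {operation!r}")
    # Every statistic is computed along axis=1, i.e. per patient.
    return summaries[operation](data, axis=1)


print(patient_summary(data, "max").shape)  # (60,)
```

Dispatching through a dictionary keeps each operation a single vectorized call, and an unsupported operation name fails loudly instead of silently returning the wrong statistic.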
What did you learn from the changes you have made?
- Knowing that `axis=1` reduces across the columns (days) and returns one value per row (patient) is critical for correct computation.
- Writing `patient_summary()` as a reusable function made the `detect_problems()` implementation cleaner and more maintainable.
Was there another approach you were thinking about making?
Could have used pandas DataFrames instead of NumPy arrays for more intuitive column/row selection, but NumPy was more efficient for these purely numeric arrays and aligned with the course focus. Also considered calculating the statistics with native Python loops, but NumPy vectorization is faster and more concise, even for 60 × 40 datasets.
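A rough sketch of the loop-versus-vectorization comparison, assuming a synthetic 60 × 40 array; the function names `row_means_loop` and `row_means_vectorized` are hypothetical, and the printed timings will vary by machine.

```python
import timeit

import numpy as np

# Synthetic 60 x 40 array (patients x days) with random placeholder values.
rng = np.random.default_rng(seed=1)
data = rng.integers(low=0, high=20, size=(60, 40)).astype(float)


def row_means_loop(data):
    """Per-patient means using plain Python loops over rows and values."""
    means = []
    for row in data:
        total = 0.0
        for value in row:
            total += value
        means.append(total / len(row))
    return np.array(means)


def row_means_vectorized(data):
    """Per-patient means in a single vectorized NumPy call."""
    return data.mean(axis=1)


# Both approaches produce the same per-patient means.
assert np.allclose(row_means_loop(data), row_means_vectorized(data))

# Time 100 repetitions of each; absolute numbers depend on the machine.
loop_time = timeit.timeit(lambda: row_means_loop(data), number=100)
vec_time = timeit.timeit(lambda: row_means_vectorized(data), number=100)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```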
Were there any challenges?
Main challenge: Understanding the `axis` parameter in NumPy functions. Initially it wasn't clear whether `axis=0` or `axis=1` operated on patients or days. Resolved by testing on the first file and verifying that the output shape matched the 60 patients.
Secondary challenge: Interpreting the `check_zeros()` helper function's logic with `np.where()` and checking whether the resulting flag was empty. Clarified by working through the function step by step.
How were these changes tested?
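A hedged sketch of the `check_zeros()` logic and of the checks in this section written as assertions. The function bodies are reconstructions under assumptions (array inputs instead of file paths, operation names `"mean"`, `"max"` and `"min"`, and a `ValueError` for unknown operations) and may differ from the assignment's code; the data is synthetic.

```python
import numpy as np


def check_zeros(x):
    """Return True if any value in the 1D array x equals 0."""
    # np.where(condition) returns a tuple of index arrays, one per dimension;
    # [0] selects the indices along the only axis where x == 0.
    flag = np.where(x == 0)[0]
    # A non-empty index array means at least one zero was found.
    return len(flag) > 0


def patient_summary(data, operation):
    """Hypothetical: one statistic per patient (row) of a 2D array."""
    summaries = {"mean": np.mean, "max": np.max, "min": np.min}
    if operation not in summaries:
        raise ValueError(f"Unsupported operation: {operation!r}")
    return summaries[operation](data, axis=1)


def detect_problems(data):
    """Hypothetical: True if any patient's mean inflammation is exactly 0."""
    return check_zeros(patient_summary(data, "mean"))


# Synthetic data: every reading is at least 1, so no patient has a zero mean.
rng = np.random.default_rng(seed=2)
data = rng.integers(low=1, high=20, size=(60, 40)).astype(float)

# Check 1: one summary value per patient.
assert len(patient_summary(data, "mean")) == 60

# Check 2: no zero means, so no problem is flagged.
assert detect_problems(data) is False

# Extra check: a patient whose readings are all zero is flagged.
data_with_zero = data.copy()
data_with_zero[5, :] = 0.0
assert detect_problems(data_with_zero) is True

# Check 3: an unsupported operation is rejected.
try:
    patient_summary(data, "median")
except ValueError:
    pass
else:
    raise AssertionError("expected a ValueError for an unsupported operation")
```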
- Ran `patient_summary()` on the first inflammation file and verified that the output length equals 60 (one value per patient).
- Ran `detect_problems()` on the first file and confirmed a `False` output (no patients with zero mean inflammation).
- Tested `patient_summary()` with an invalid operation parameter.
Checklist