@tanveerrouf
Owner

TITLE: UofT-DSI | Python - Assignment 2

What changes are you trying to make?

Built a complete data analysis pipeline to evaluate arthritis drug efficacy across a 12-session clinical trial. Created three main components:

  1. Data Reading: Loaded and displayed inflammation data from 12 CSV files (60 patients × 40 days each)
  2. patient_summary() Function: Implemented NumPy-based function to compute mean, max, and min inflammation scores across 40-day periods for all 60 patients
  3. detect_problems() Function: Built error detection system to identify data anomalies (patients with zero mean inflammation, indicating potential data entry errors or ineligible participants)

All code uses NumPy for efficient array operations rather than loops, with proper axis specification for row-wise calculations.
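As a rough illustration of the second component, here is a minimal sketch of what a function like patient_summary() could look like. The signature (a file path plus an operation name), the dictionary lookup, and the ValueError are assumptions made for this example, not necessarily the code in this PR; the in-memory CSV is made-up data standing in for a real inflammation file.

```python
import io
import numpy as np

def patient_summary(file_path, operation):
    """Return one summary value per patient (row) for the given operation.

    Hypothetical sketch: the real function in the assignment may differ.
    """
    operations = {"mean": np.mean, "max": np.max, "min": np.min}
    # Validate the operation before touching the file
    if operation not in operations:
        raise ValueError(
            f"operation must be one of {sorted(operations)}, got {operation!r}"
        )
    # Assumed layout: each row is a patient, each column is a day
    data = np.loadtxt(fname=file_path, delimiter=",")
    # axis=1 collapses the day columns, leaving one value per patient
    return operations[operation](data, axis=1)

# Tiny stand-in for a CSV file: 2 patients x 3 days (made-up values)
sample = io.StringIO("0,2,4\n1,1,1\n")
means = patient_summary(sample, "mean")
```

With a real file, the same call would take a path string in place of the StringIO object, and the result would be expected to have length 60.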

What did you learn from the changes you have made?

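  • NumPy axis mechanics: axis=1 reduces along the columns (days) and returns one value per row (patient), while axis=0 reduces along the rows (patients) and returns one value per day; choosing the right axis is critical for correct computation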
  • Data validation importance: Checking for zero-mean patients flags potential data quality issues that could skew efficacy analysis
  • Function modularity: Building patient_summary() as a reusable function made detect_problems() implementation cleaner and more maintainable
  • CSV handling: Working with file paths and numpy.loadtxt() with delimiters showed practical file I/O in data analysis workflows

Was there another approach you were thinking about making?

Could have used pandas DataFrames instead of NumPy arrays for more intuitive column/row selection, but NumPy is sufficient for purely numeric arrays and aligned with the course focus. Also considered calculating the statistics with native Python loops, but vectorized NumPy calls are more concise and typically faster, even on 60 × 40 arrays.
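To make the loop-versus-vectorization comparison concrete, the sketch below computes per-patient means both ways and checks that they agree. The data is synthetic (randomly generated with a fixed seed), not the trial data, and no timing is measured here.

```python
import numpy as np

# Synthetic 60 x 40 array standing in for one inflammation file
rng = np.random.default_rng(seed=0)
data = rng.integers(0, 20, size=(60, 40)).astype(float)

# Native Python version: one pass per patient row
loop_means = [sum(row) / len(row) for row in data.tolist()]

# Vectorized version: a single NumPy call over the whole array
vector_means = data.mean(axis=1)

# Both approaches should agree up to floating-point rounding
same = np.allclose(loop_means, vector_means)
```

The vectorized form is a single expression and states the axis explicitly, which is the main readability gain regardless of speed.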

Were there any challenges?

Main challenge: Understanding the axis parameter in NumPy functions. Initially it wasn't clear whether axis=0 or axis=1 produced one value per patient or one value per day. Resolved by testing on the first file and verifying that the output had 60 values, one per patient.
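The shape check described above can be reproduced on a placeholder array of the same dimensions. This is only a sketch: the array is filled with consecutive integers rather than real inflammation scores.

```python
import numpy as np

# Placeholder array with the same layout: 60 patients (rows) x 40 days (columns)
data = np.arange(60 * 40).reshape(60, 40)

# axis=1 averages over the 40 days, giving one value per patient
per_patient = data.mean(axis=1)

# axis=0 averages over the 60 patients, giving one value per day
per_day = data.mean(axis=0)

print(per_patient.shape)  # (60,)
print(per_day.shape)      # (40,)
```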

Secondary challenge: Interpreting the check_zeros() helper function's logic with np.where() and checking if the resulting flag was empty. Clarified by working through the function step-by-step.
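For readers unfamiliar with the pattern, here is a hedged reconstruction of how a check_zeros()-style helper can combine np.where() with an emptiness check. The provided helper is not shown in this PR, so the body below is an assumption; detect_problems() is also simplified to take an array directly rather than a file path.

```python
import numpy as np

def check_zeros(x):
    """Return True if any element of x equals zero (hypothetical reconstruction)."""
    # With only a condition, np.where returns a tuple of index arrays,
    # one per dimension; [0] selects the indices along the first axis
    flag = np.where(x == 0)[0]
    # An empty index array means no zeros were found
    return len(flag) > 0

def detect_problems(data):
    """Return True if any patient (row) has a mean inflammation of zero."""
    means = data.mean(axis=1)
    return check_zeros(means)

# Made-up examples: the second array contains a patient with all-zero readings
clean = np.array([[1.0, 2.0], [3.0, 4.0]])
suspect = np.array([[0.0, 0.0], [1.0, 2.0]])
clean_result = detect_problems(clean)
suspect_result = detect_problems(suspect)
```

Here clean_result is False and suspect_result is True, which matches the False output reported for the first file when no patient has a zero mean.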

How were these changes tested?

  • Tested patient_summary() on first inflammation file, verified output length equals 60 (one per patient)
  • Ran all three operations ('mean', 'max', 'min') to confirm correct behavior
  • Tested detect_problems() on first file, confirmed False output (no patients with zero mean inflammation)
  • Verified error handling in patient_summary() with invalid operation parameter

Checklist

  • I can confirm that my changes are working as intended
  • All code cells execute without errors
  • Functions return expected output shapes and values
  • Code is organized with clear comments explaining NumPy operations

@Dmytro-Bonislavskyi left a comment

Well done!
