# Data And Ethics

This project uses medical imaging (CT) and biomarker data for pancreatic cancer detection research. Because the work sits in a health context, the main ethical obligation goes beyond model quality to careful handling of privacy, bias, uncertainty, and the claims made about results.

## Data Handling

- raw clinical data is intentionally kept local and is not tracked in Git
- processed datasets are also kept local because they remain research assets, not public benchmark files
- thesis files, checkpoints, embeddings, and presentation materials are treated as local-only by default
- contributors should never commit patient-level source data, exported scans, or derived files that could create privacy or governance issues
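
These rules can also be enforced mechanically. The following is a minimal sketch of a pre-commit check. It assumes a hypothetical layout: the directory names (`data/raw`, `data/processed`, `checkpoints`, `embeddings`, `thesis`) and the scan, array, and weight file extensions are illustrative assumptions, not this repository's actual structure.

```python
"""Hypothetical pre-commit guard: flag staged paths that look like local-only assets.

The directory names and file extensions below are illustrative assumptions,
not the repository's actual layout.
"""
import subprocess
from pathlib import PurePosixPath

# Hypothetical local-only directories (repository-relative).
BLOCKED_DIRS = ("data/raw", "data/processed", "checkpoints", "embeddings", "thesis")
# Assumed file types for scans, arrays, and model weights.
BLOCKED_SUFFIXES = (".dcm", ".nii", ".nii.gz", ".npy", ".npz", ".pt", ".pth", ".ckpt")


def is_blocked(path: str) -> bool:
    """Return True if a repository-relative path should never be committed."""
    p = PurePosixPath(path)
    # Compare whole path components so that "data/raw_notes.md" is not
    # mistaken for something inside "data/raw".
    for blocked in BLOCKED_DIRS:
        parts = PurePosixPath(blocked).parts
        if p.parts[: len(parts)] == parts:
            return True
    name = p.name.lower()
    return any(name.endswith(suffix) for suffix in BLOCKED_SUFFIXES)


def blocked_paths(paths):
    """Filter a list of paths down to the ones that violate the policy."""
    return [path for path in paths if is_blocked(path)]


def staged_paths():
    """List files staged for commit (added, copied, modified, or renamed)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]
```

A Git `pre-commit` hook could call `blocked_paths(staged_paths())` and exit with a non-zero status when the result is non-empty, which makes Git abort the commit. A check like this complements `.gitignore` rather than replacing it, because ignore rules do not stop a forced `git add -f`.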

## Privacy Posture

- this repository is structured to avoid publishing raw patient data
- tracked artifacts are limited to lightweight code, documentation, reports, and curated figures
- any future sharing of sample data should use explicitly approved, de-identified, non-sensitive examples

## Bias And Scientific Validity

Bias is a central concern in this project, especially for the CT branch.

The interpretation aligned with the dissertation is:

- CT results are strong, but they may still contain residual dataset-of-origin signal, meaning the model could partly rely on cues that reveal which source dataset a scan came from rather than on the disease itself
- the biomarker branch is the cleanest reproducible result in the repository
- fusion is exploratory because the CT and biomarker cohorts are not patient-paired

High headline metrics should therefore not be read as proof of clinical readiness.
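
A common way to check for residual dataset-of-origin signal is a probing test. A simple classifier is trained to predict which source dataset each sample came from, using the CT model's embeddings. Accuracy well above the majority-class rate suggests that the representation still encodes origin. The sketch below uses a nearest-centroid probe with leave-one-out evaluation in plain Python. The probe type, the function names, and the toy data in any usage are illustrative assumptions, not the procedure used in the dissertation.

```python
"""Sketch of a dataset-of-origin probe on embeddings (illustrative only).

Assumption: each sample has an embedding vector and a source-dataset label.
A nearest-centroid classifier is scored with leave-one-out; accuracy far
above the majority-class rate suggests the embeddings encode origin.
"""
from collections import Counter
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def loo_origin_accuracy(embeddings, origins):
    """Leave-one-out accuracy of a nearest-centroid origin classifier."""
    correct = 0
    for i, (x, true_origin) in enumerate(zip(embeddings, origins)):
        # Group every *other* sample by origin, then build one centroid per group.
        groups = {}
        for j, (v, o) in enumerate(zip(embeddings, origins)):
            if j != i:
                groups.setdefault(o, []).append(v)
        centroids = {o: centroid(vs) for o, vs in groups.items()}
        # Predict the origin whose centroid is closest in Euclidean distance.
        predicted = min(centroids, key=lambda o: dist(x, centroids[o]))
        correct += predicted == true_origin
    return correct / len(embeddings)


def majority_baseline(origins):
    """Accuracy of always predicting the most common origin."""
    return Counter(origins).most_common(1)[0][1] / len(origins)
```

The comparison that matters is probe accuracy against `majority_baseline`. If origin is confounded with the diagnostic label (for example, if most positive scans come from one source), the probe should ideally be run within each label class. Otherwise it may detect disease rather than origin. A linear probe such as logistic regression is a common alternative to nearest-centroid.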

## Intended Use

This repository is intended for:

- research documentation
- thesis support
- method development
- portfolio demonstration of multimodal and bias-aware ML work

This repository is not intended for:

- clinical decision support
- patient triage
- diagnosis in real care settings
- unattended deployment in healthcare environments

## Ethical Reporting Expectations

When describing the project publicly, keep these points explicit:

- the CT branch required extensive bias-aware preprocessing and still has unresolved domain-generalization questions
- as a standalone positive result, the biomarker branch is more defensible than the fusion branch
- decision-level and feature-level fusion were evaluated under synthetic pairing assumptions, not on truly patient-paired multimodal data
- the project is a research artifact, not a validated clinical system
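
The CT and biomarker cohorts contain different patients, so any fusion experiment has to construct pseudo-pairs. The sketch below shows one plausible scheme: label-matched random pairing, followed by decision-level fusion as a weighted average of the two branch probabilities. The pairing rule, the weight `w_ct`, and all function names are assumptions for illustration. The dissertation's exact procedure may differ.

```python
"""Sketch: label-matched synthetic pairing plus decision-level fusion.

Assumptions (illustrative, not the dissertation's exact procedure):
- each branch outputs a probability of cancer per sample;
- each CT sample is paired with a random biomarker sample that has the
  same ground-truth label (the two samples come from different patients);
- fusion is a weighted average of the two probabilities.
"""
import random


def synthetic_pairs(ct_labels, bio_labels, seed=0):
    """Pair each CT index with a random biomarker index sharing its label."""
    rng = random.Random(seed)
    by_label = {}
    for j, label in enumerate(bio_labels):
        by_label.setdefault(label, []).append(j)
    pairs = []
    for i, label in enumerate(ct_labels):
        candidates = by_label.get(label)
        if candidates:  # CT samples with no same-label biomarker sample are skipped
            pairs.append((i, rng.choice(candidates)))
    return pairs


def fuse(ct_probs, bio_probs, pairs, w_ct=0.5):
    """Decision-level fusion: weighted mean of the paired branch probabilities."""
    if not 0.0 <= w_ct <= 1.0:
        raise ValueError("w_ct must lie in [0, 1]")
    return [w_ct * ct_probs[i] + (1.0 - w_ct) * bio_probs[j] for i, j in pairs]
```

Each fused "patient" combines two different people. As a result, the two branches' errors are independent by construction, which truly paired data would not guarantee, and fusion gains can look more favorable than they would on real patients. Feature-level fusion would instead concatenate the paired feature vectors before a joint classifier, but it relies on the same pairing step.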

## Future Ethical And Methodological Improvements

- external validation on additional CT cohorts
- domain-adversarial CT training to suppress dataset-of-origin shortcuts
- clearer uncertainty reporting and calibration tracking
- real paired multimodal cohorts instead of synthetic pairing
- stronger dataset documentation and governance notes if data-sharing constraints change
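
For calibration tracking, one widely used summary is expected calibration error (ECE). Predictions are grouped into bins by predicted probability, and ECE is the sample-weighted average gap between each bin's mean predicted probability and its observed positive rate. The sketch below computes a binary ECE on the positive-class probability with equal-width bins. The bin count and this particular binary formulation are assumptions. ECE is also only one calibration metric; reliability diagrams and the Brier score are common companions.

```python
"""Binary expected calibration error with equal-width bins (a sketch).

Assumption: `probs` are predicted probabilities of the positive class in
[0, 1] and `labels` are 0/1 ground truth. The bin count is a free choice.
"""


def expected_calibration_error(probs, labels, n_bins=10):
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Map p to a bin index; p == 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by its share of all samples.
        ece += (len(members) / n) * abs(mean_prob - pos_rate)
    return ece
```

Tracking this value across training runs, for example alongside AUROC, makes it visible when a model becomes more accurate but less trustworthy in its stated probabilities.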