TDA-Based Cancer Biomarker Discovery Pipeline
This project applies Topological Data Analysis (TDA) to TCGA-BRCA RNA-seq data to identify a cancer-related biomarker gene combination (the H2C Gene Panel) that conventional Euclidean statistics could not detect.
| Finding | Details |
|---|---|
| H1 loop structure | Tumor samples have 2.5× more H1 loops than normal samples (p < 0.001) |
| 0% gene overlap | The TDA Top 200 and the Euclidean Top 200 are completely disjoint gene sets |
| H2C Gene Panel | 37 genes that are non-significant under Euclidean statistics (p > 0.05) reach AUC = 0.993 |
| Pathway orthogonality | TDA: cell invasion / cytoskeleton vs Euclidean: metabolism / ion channels (0 overlapping pathways) |
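
The "0% gene overlap" row compares two top-200 lists. As a minimal sketch of how such an overlap could be computed, assuming each method yields a per-gene importance score where higher means more important: the function name `top_k_overlap`, the gene names, and the scores below are all hypothetical and are not project results.

```python
def top_k_overlap(scores_a, scores_b, k=200):
    """Return (shared_genes, overlap_fraction) for the top-k genes of two rankings.

    scores_a / scores_b map gene name -> importance score (higher = more important).
    k is assumed to be no larger than the number of genes in either ranking.
    """
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    shared = top_a & top_b
    return shared, len(shared) / k


# Toy data (hypothetical gene names and scores).
tda_scores = {"G1": 0.9, "G2": 0.8, "G3": 0.1, "G4": 0.05}
euclid_scores = {"G1": 0.01, "G2": 0.02, "G3": 0.7, "G4": 0.95}

shared, frac = top_k_overlap(tda_scores, euclid_scores, k=2)
print(shared, frac)
```

With real data, the two dictionaries would presumably be built from the ranking files under `phase3_gene_traceback/results/`; their column names are not documented here, so the loading step is left out.
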
FindVar/
├── README.md ← this document
├── plan.md ← overall analysis plan
├── result.md ← consolidated results (for paper writing)
│
├── phase1_tda_setup/ ← Phase 1: exploratory TDA analysis
│   ├── verify_install.py ← library installation check
│   ├── explore_ph.py ← Persistent Homology exploration
│   ├── PHASE1_REPORT.md ← analysis report
│   └── results/
│       ├── ph_comparison_summary.csv ← PH comparison summary table
│       ├── ph_diagram_*.png ← Persistence Diagram (5 settings)
│       └── distance_comparison.png ← Wasserstein/Bottleneck comparison
│
├── phase2_persistent_homology/ ← Phase 2: statistical validation
│   ├── analyze_ph.py ← Permutation test + Bootstrap
│   ├── PHASE2_REPORT.md ← analysis report
│   └── results/
│       ├── permutation_test_results.csv ← Permutation p-value table
│       ├── h1_count_test_results.csv ← H1 count test (key result)
│       ├── bootstrap_stability_results.csv ← Bootstrap stability
│       ├── permutation_null_distributions.png
│       ├── h1_count_comparison.png ← ★ H1 count: tumor vs normal
│       ├── observed_vs_null_comparison.png
│       └── bootstrap_stability.png
│
├── phase3_gene_traceback/ ← Phase 3: gene traceback
│   ├── traceback_genes.py ← decoder Jacobian-based traceback
│   ├── PHASE3_REPORT.md ← analysis report
│   └── results/
│       ├── gene_importance_full.csv ← TDA ranking of all 20,876 genes
│       ├── gene_importance_top100.csv ← Top 100 details
│       ├── tda_only_genes.csv ← 200 TDA-only genes
│       ├── both_methods_genes.csv ← genes found by both methods (0)
│       ├── latent_dimension_analysis.csv ← analysis of the 32 latent dimensions
│       ├── top30_genes.png ← Top 30 genes bar chart
│       ├── tda_vs_euclidean_rank.png ← ★ TDA vs Euclidean scatter plot
│       ├── discovery_comparison.png ← Venn diagram of discovered genes
│       ├── latent_dimension_importance.png
│       └── latent_pca.png
│
├── phase4_biological_interpretation/ ← Phase 4: pathway + classification validation
│   ├── pathway_and_validation.py ← GO/KEGG + classification performance
│   ├── PHASE4_REPORT.md ← analysis report
│   └── results/
│       ├── enrichment_tda_top200.csv ← TDA pathway enrichment
│       ├── enrichment_euclidean_top200.csv ← Euclidean pathway enrichment
│       ├── classification_results.csv ← full classification performance results
│       ├── pathway_overlap_summary.csv ← pathway overlap summary
│       ├── classification_comparison.png ← ★ classification performance comparison
│       └── pathway_comparison.png ← ★ pathway comparison
│
└── phase5_visualization_paper/ ← Phase 5: visualization for the paper
    ├── generate_figures.py ← figure generation script
    └── figures/
        ├── fig2_persistence_diagrams.pdf ← Persistence Diagram
        ├── fig3_statistical_validation.pdf ← statistical validation
        ├── fig4_gene_discovery.pdf ← gene discovery
        ├── fig5_pathway_comparison.pdf ← pathway comparison
        ├── fig6_classification.pdf ← classification performance
        ├── fig7_latent_space.pdf ← Latent Space
        ├── summary_figure.pdf ← overall summary
        └── *.png ← (PNG versions included)
TCGA-BRCA RNA-seq (1,215 samples × 20,862 genes)
│
├─ [Preprocessing] log1p → GPU ComBat → gene filtering (Data-preprocessing repo)
│
├─ [TAE] Topological Autoencoder (32d cosine latent)
│
├─ [Phase 1] Persistent Homology exploration → confirm tumor/normal difference
│
├─ [Phase 2] Size-matched permutation test → H1 loops p < 0.001
│
├─ [Phase 3] Decoder Jacobian → gene traceback → TDA vs Euclidean overlap 0%
│
├─ [Phase 4] Pathway enrichment + classification validation → H2C AUC=0.993
│
└─ [Phase 5] Figure generation for the paper (PDF vector)
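
Phase 2 relies on a size-matched permutation test. The sketch below shows one way such a test could be structured, under the assumption that "size-matched" means both groups are subsampled to the same number of points before the statistic is computed (feature counts from persistent homology tend to grow with sample size). In the project the statistic would presumably be an H1 feature count on latent points; here `count_large` is a hypothetical stand-in so the example runs with the standard library only, and all data are synthetic.

```python
import random


def size_matched_permutation_test(tumor, normal, statistic, n_perm=1000, seed=0):
    """One-sided permutation p-value for statistic(tumor) - statistic(normal).

    Both groups are subsampled to the size of the smaller one before the
    statistic is computed, so each side always contributes equally many points.
    """
    rng = random.Random(seed)
    n = min(len(tumor), len(normal))

    def diff(group_a, group_b):
        return statistic(rng.sample(group_a, n)) - statistic(rng.sample(group_b, n))

    observed = diff(list(tumor), list(normal))
    pooled = list(tumor) + list(normal)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Random relabelling: the first len(tumor) points play the role of "tumor".
        null = diff(pooled[:len(tumor)], pooled[len(tumor):])
        if null >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


def count_large(values):
    """Hypothetical stand-in statistic (the real one would be an H1 count)."""
    return sum(v > 1.0 for v in values)


# Synthetic, unbalanced groups (not project data).
data_rng = random.Random(1)
tumor = [data_rng.gauss(1.5, 1.0) for _ in range(300)]
normal = [data_rng.gauss(0.0, 1.0) for _ in range(40)]

obs, p = size_matched_permutation_test(tumor, normal, count_large, n_perm=500)
print(obs, p)
```

Note that the smallest p-value this estimator can report is 1 / (n_perm + 1), so resolving p < 0.001 needs at least 1,000 permutations.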
37 genes that were **entirely non-significant (p > 0.05)** under Euclidean statistics but were identified as key genes by TDA.
Representative genes:
| Gene | TDA rank | Euclidean p-value | Function |
|---|---|---|---|
| EFCAB3 | 8 | 0.791 | Ca2+-binding domain |
| PGC | 11 | 0.908 | Pepsinogen C |
| RPRM | 13 | 0.206 | p53 target, G2 checkpoint |
| RPRML | 14 | 0.333 | Reprimo-like |
| HSPB9 | 18 | 0.924 | Small heat shock protein |
Full list: phase3_gene_traceback/results/tda_only_genes.csv
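
The TDA rank column comes from the Phase 3 decoder Jacobian traceback. The exact scoring used in `traceback_genes.py` is not described in this README, so the following is only an illustrative sketch of the general idea: score each gene by how strongly the decoder output for that gene responds to changes in the latent coordinates. It assumes the decoder is available as a plain function from a latent vector to a gene vector, approximates the Jacobian with central finite differences (automatic differentiation would be the natural choice with a real PyTorch decoder), and takes per-dimension weights as an input, since how latent dimensions are weighted by topological relevance is not specified here. `decoder_jacobian`, `gene_importance`, `toy_decoder`, and `W` are hypothetical names.

```python
import math


def decoder_jacobian(decoder, z, eps=1e-5):
    """Central-difference Jacobian J[g][k] = d decoder(z)[g] / d z[k]."""
    n_genes = len(decoder(z))
    jac = [[0.0] * len(z) for _ in range(n_genes)]
    for k in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[k] += eps
        z_minus[k] -= eps
        out_plus = decoder(z_plus)
        out_minus = decoder(z_minus)
        for g in range(n_genes):
            jac[g][k] = (out_plus[g] - out_minus[g]) / (2 * eps)
    return jac


def gene_importance(decoder, latents, dim_weights):
    """Mean weighted L2 norm of each gene's Jacobian row over all latent points."""
    n_genes = len(decoder(latents[0]))
    totals = [0.0] * n_genes
    for z in latents:
        jac = decoder_jacobian(decoder, z)
        for g, row in enumerate(jac):
            totals[g] += math.sqrt(sum((w * d) ** 2 for w, d in zip(dim_weights, row)))
    return [t / len(latents) for t in totals]


# Toy linear decoder: 2 latent dims -> 3 "genes" (hypothetical weights).
W = [[2.0, 0.0], [0.0, 0.5], [0.1, 0.1]]


def toy_decoder(z):
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]


scores = gene_importance(toy_decoder, [[0.0, 0.0], [1.0, -1.0]], dim_weights=[1.0, 1.0])
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(ranking)
```

For the linear toy decoder the Jacobian equals `W` everywhere, so each score reduces to the norm of the corresponding row of `W`; a nonlinear decoder would give scores that depend on which latent points are sampled.
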
| Item | Value |
|---|---|
| Python | 3.12.13 (conda: tda) |
| PyTorch | 2.11.0+cu126 |
| ripser | 0.6.14 |
| persim | 0.3.8 |
| gudhi | 3.12.0 |
| scikit-learn | 1.8.0 |
| gseapy | 1.1.13 |
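
A quick way to compare an installed environment against the table above is sketched below. This is not the project's `verify_install.py`, whose contents are not shown here; it is a standalone illustration using `importlib.metadata` from the standard library. The distribution names are assumptions (notably `torch` for PyTorch), and `check_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Versions copied from the environment table; keys are assumed PyPI
# distribution names ("torch" for PyTorch).
EXPECTED = {
    "torch": "2.11.0",
    "ripser": "0.6.14",
    "persim": "0.3.8",
    "gudhi": "3.12.0",
    "scikit-learn": "1.8.0",
    "gseapy": "1.1.13",
}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # startswith tolerates local build tags such as "+cu126".
        ok = installed is not None and installed.startswith(wanted)
        report[name] = (installed, ok)
    return report


for name, (installed, ok) in check_versions(EXPECTED).items():
    print(f"{name:14s} {installed or 'missing':16s} {'OK' if ok else 'CHECK'}")
```

Packages that are not installed are reported as `missing` rather than raising, so the script can be run before the conda environment is fully set up.
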
| Repo | Contents |
|---|---|
| Data-preprocessing | Preprocessing + TAE training |
| FindVar (this repo) | TDA analysis + H2C discovery |