FindVar

A TDA-based cancer biomarker discovery pipeline

This project applies Topological Data Analysis (TDA) to TCGA-BRCA RNA-seq data and identifies a cancer-related biomarker gene combination, the H2C Gene Panel, that conventional Euclidean statistics could not detect.


Key findings

| Finding | Details |
|---|---|
| H1 loop structure | Tumors have 2.5× more H1 loops than normal tissue (p < 0.001) |
| 0% gene overlap | The TDA Top 200 and the Euclidean Top 200 are completely different gene sets |
| H2C Gene Panel | 37 genes that are non-significant under Euclidean statistics (p > 0.05) reach AUC = 0.993 |
| Pathway orthogonality | TDA: cell invasion / cytoskeleton vs. Euclidean: metabolism / ion channels (0 shared pathways) |
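
The 0% gene overlap figure compares two Top 200 lists. As a minimal sketch of how such an overlap can be computed (the function name, the score dictionaries, and the toy gene names are hypothetical and are not taken from this repository):

```python
def top_k_overlap(scores_a, scores_b, k=200):
    """Return (shared genes, overlap fraction) between the top-k of two rankings.

    scores_a and scores_b map gene -> score; a larger score means more important.
    The fraction is relative to k, so both inputs should hold at least k genes.
    """
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    shared = top_a & top_b
    return shared, len(shared) / k


# Toy scores (hypothetical genes) in which the two methods disagree completely.
tda_scores = {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.05}
euclid_scores = {"g1": 0.1, "g2": 0.2, "g3": 0.9, "g4": 0.7}

shared, fraction = top_k_overlap(tda_scores, euclid_scores, k=2)
print(f"shared genes: {sorted(shared)}, overlap: {fraction:.0%}")
```

With real data, the two dictionaries would presumably be built from the TDA importance ranking and from a conventional differential-expression score.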

Project structure

FindVar/
├── README.md                                ← This document
├── plan.md                                  ← Overall analysis plan
├── result.md                                ← Consolidated results (for paper writing)
│
├── phase1_tda_setup/                        ← Phase 1: exploratory TDA
│   ├── verify_install.py                    │  Library installation check
│   ├── explore_ph.py                        │  Persistent Homology exploration
│   ├── PHASE1_REPORT.md                     │  Analysis report
│   └── results/
│       ├── ph_comparison_summary.csv        │  PH comparison summary table
│       ├── ph_diagram_*.png                 │  Persistence Diagrams (5 settings)
│       └── distance_comparison.png          │  Wasserstein/Bottleneck comparison
│
├── phase2_persistent_homology/              ← Phase 2: statistical validation
│   ├── analyze_ph.py                        │  Permutation test + Bootstrap
│   ├── PHASE2_REPORT.md                     │  Analysis report
│   └── results/
│       ├── permutation_test_results.csv     │  Permutation p-value table
│       ├── h1_count_test_results.csv        │  H1 count test (key result)
│       ├── bootstrap_stability_results.csv  │  Bootstrap stability
│       ├── permutation_null_distributions.png
│       ├── h1_count_comparison.png          │  ★ H1 count: tumor vs. normal
│       ├── observed_vs_null_comparison.png
│       └── bootstrap_stability.png
│
├── phase3_gene_traceback/                   ← Phase 3: gene traceback
│   ├── traceback_genes.py                   │  Decoder-Jacobian-based traceback
│   ├── PHASE3_REPORT.md                     │  Analysis report
│   └── results/
│       ├── gene_importance_full.csv         │  TDA ranking of all 20,876 genes
│       ├── gene_importance_top100.csv       │  Top 100 in detail
│       ├── tda_only_genes.csv               │  200 TDA-only genes
│       ├── both_methods_genes.csv           │  Genes found by both methods (0)
│       ├── latent_dimension_analysis.csv    │  Analysis of the 32 latent dimensions
│       ├── top30_genes.png                  │  Top 30 genes bar chart
│       ├── tda_vs_euclidean_rank.png        │  ★ TDA vs. Euclidean scatter plot
│       ├── discovery_comparison.png         │  Venn diagram of discovered genes
│       ├── latent_dimension_importance.png
│       └── latent_pca.png
│
├── phase4_biological_interpretation/        ← Phase 4: pathway + classification validation
│   ├── pathway_and_validation.py            │  GO/KEGG + classification performance
│   ├── PHASE4_REPORT.md                     │  Analysis report
│   └── results/
│       ├── enrichment_tda_top200.csv        │  TDA pathway enrichment
│       ├── enrichment_euclidean_top200.csv  │  Euclidean pathway enrichment
│       ├── classification_results.csv       │  Full classification performance results
│       ├── pathway_overlap_summary.csv      │  Pathway overlap summary
│       ├── classification_comparison.png    │  ★ Classification performance comparison
│       └── pathway_comparison.png           │  ★ Pathway comparison
│
└── phase5_visualization_paper/              ← Phase 5: figures for the paper
    ├── generate_figures.py                  │  Figure generation script
    └── figures/
        ├── fig2_persistence_diagrams.pdf    │  Persistence Diagrams
        ├── fig3_statistical_validation.pdf  │  Statistical validation
        ├── fig4_gene_discovery.pdf          │  Gene discovery
        ├── fig5_pathway_comparison.pdf      │  Pathway comparison
        ├── fig6_classification.pdf          │  Classification performance
        ├── fig7_latent_space.pdf            │  Latent space
        ├── summary_figure.pdf               │  Overall summary
        └── *.png                            │  (PNG versions included)

Analysis pipeline

TCGA-BRCA RNA-seq (1,215 samples × 20,862 genes)
  │
  ├─ [Preprocessing] log1p → GPU ComBat → gene filtering (Data-preprocessing repo)
  │
  ├─ [TAE] Topological Autoencoder (32d cosine latent)
  │
  ├─ [Phase 1] Persistent Homology exploration → confirm tumor/normal difference
  │
  ├─ [Phase 2] Size-matched permutation test → H1 loops p < 0.001
  │
  ├─ [Phase 3] Decoder Jacobian → gene traceback → TDA vs. Euclidean overlap 0%
  │
  ├─ [Phase 4] Pathway enrichment + classification validation → H2C AUC=0.993
  │
  └─ [Phase 5] Figure generation for the paper (vector PDF)
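
The size-matched permutation test in Phase 2 can be illustrated with a minimal sketch. It assumes that "size-matched" means comparing both groups at the size of the smaller one, which this README does not spell out. The function name, its arguments, and the toy data are hypothetical, and `statistics.mean` stands in for the H1 loop count that the real analysis would derive from a persistence diagram.

```python
import random
from statistics import mean


def size_matched_permutation_test(large, small, statistic, n_perm=999, seed=0):
    """One-sided permutation p-value for statistic(large) > statistic(small).

    Assumption: the larger group is subsampled to len(small), and every
    permuted pair of groups also has len(small) members each.
    """
    rng = random.Random(seed)
    n = len(small)

    # Observed difference, with the larger group subsampled to size n.
    observed = statistic(rng.sample(large, n)) - statistic(small)

    # Null distribution: shuffle the pooled samples and split off two groups of size n.
    pooled = list(large) + list(small)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        null_diff = statistic(pooled[:n]) - statistic(pooled[n:2 * n])
        if null_diff >= observed:
            exceed += 1

    # The +1 correction keeps the estimated p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Toy one-dimensional "samples" (hypothetical values, not TCGA data).
tumor = [2.0 + 0.01 * i for i in range(200)]
normal = [0.01 * i for i in range(40)]

observed, p_value = size_matched_permutation_test(tumor, normal, mean)
print(f"observed difference = {observed:.3f}, p = {p_value:.4f}")
```

With `n_perm` permutations the smallest reportable p-value is 1 / (n_perm + 1), so reporting p < 0.001 requires more than 999 permutations.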

H2C Gene Panel

μœ ν΄λ¦¬λ“œ ν†΅κ³„μ—μ„œ **μ™„μ „νžˆ λΉ„μœ μ˜λ―Έ(p > 0.05)**ν–ˆμœΌλ‚˜, TDAμ—μ„œ ν•΅μ‹¬μœΌλ‘œ μ‹λ³„λœ 37개 μœ μ „μž.

λŒ€ν‘œ μœ μ „μž:

μœ μ „μž TDA μˆœμœ„ μœ ν΄λ¦¬λ“œ P-value κΈ°λŠ₯
EFCAB3 8 0.791 Ca2+ κ²°ν•© 도메인
PGC 11 0.908 Pepsinogen C
RPRM 13 0.206 p53 ν‘œμ , G2 체크포인트
RPRML 14 0.333 Reprimo-like
HSPB9 18 0.924 μ†Œν˜• μ—΄μΆ©κ²©λ‹¨λ°±μ§ˆ

전체 λͺ©λ‘: phase3_gene_traceback/results/tda_only_genes.csv
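
As an illustration of the selection idea behind the panel (high TDA rank, non-significant Euclidean p-value), here is a minimal sketch. The exact rule that yields the 37 genes is not stated here, so the thresholds `max_tda_rank` and `alpha`, the `GeneStat` type, and the extra entry `HYPOTHETICAL_1` are assumptions for demonstration only. The five real rows come from the representative genes table above.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    tda_rank: int    # rank by TDA importance (1 = most important)
    euclid_p: float  # p-value from the conventional (Euclidean) test


def select_panel(stats, max_tda_rank=200, alpha=0.05):
    """Keep genes ranked highly by TDA but non-significant in the Euclidean test."""
    ordered = sorted(stats, key=lambda s: s.tda_rank)
    return [s.gene for s in ordered
            if s.tda_rank <= max_tda_rank and s.euclid_p > alpha]


stats = [
    GeneStat("EFCAB3", 8, 0.791),
    GeneStat("PGC", 11, 0.908),
    GeneStat("RPRM", 13, 0.206),
    GeneStat("RPRML", 14, 0.333),
    GeneStat("HSPB9", 18, 0.924),
    # Made-up entry: significant under the Euclidean test, so it is filtered out.
    GeneStat("HYPOTHETICAL_1", 5, 0.001),
]

print(select_panel(stats))
```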


Runtime environment

| Item | Value |
|---|---|
| Python | 3.12.13 (conda: tda) |
| PyTorch | 2.11.0+cu126 |
| ripser | 0.6.14 |
| persim | 0.3.8 |
| gudhi | 3.12.0 |
| scikit-learn | 1.8.0 |
| gseapy | 1.1.13 |
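
To compare a local environment against these versions, a small check along the following lines may help. This is a sketch and not the repository's `verify_install.py`; the `EXPECTED` mapping and `check_environment` are hypothetical names, and the keys assume the usual PyPI distribution names (for example `torch` for PyTorch).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the environment table, keyed by assumed PyPI distribution name.
EXPECTED = {
    "torch": "2.11.0+cu126",
    "ripser": "0.6.14",
    "persim": "0.3.8",
    "gudhi": "3.12.0",
    "scikit-learn": "1.8.0",
    "gseapy": "1.1.13",
}


def check_environment(expected=EXPECTED):
    """Return {package: (status, installed_version_or_None)}.

    status is "ok", "mismatch", or "missing".
    """
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = ("missing", None)
            continue
        report[name] = ("ok" if found == wanted else "mismatch", found)
    return report


for name, (status, found) in check_environment().items():
    print(f"{name:14s} {status:9s} installed={found} expected={EXPECTED[name]}")
```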

Related repositories

| Repo | Contents |
|---|---|
| Data-preprocessing | Preprocessing + TAE training |
| FindVar (this repo) | TDA analysis + H2C discovery |