GitHub - frankmarchesan/ScreeningClusters

Requirements

Install dependencies for this to run:

pip install -r requirements.txt

Commands to run for this

Parse open text objections which are grouped together into bullet points:

python main.py parse --responses "responses.xlsx" --output parsed_points.csv

Run clustering analysus on the coded data:

python main.py cluster --coded "pathtocoded.xlsx" --taxonomy "pathtotaxonomy.xlsx" --output output --k N --pcoa-k N --pcoa-sub-k N

NOTE: --k N argument is optional

Specifying K for hierarchical

--k N is optional. If you pass it, hierarchical clustering will use that exact N clusters. If you leave it out, script picks K automatically using the silhouette score from KMeans, and hierarchical clustering reuses same K.

Specifying K for PCoA

--pcoa-k N

Same function as --k N, but for PCoA.

Specifying K for sub-clusters within each K for PCoA

--pcoa-sub-k N

Same function as --k N, but for subclusters within the PCoA clsuters.

Output

Cluster command writes these files into the output folder:

binary_matrix.csv: participants by codes table with 0 or 1.
pcoa_coordinates.csv: 3D PCoA coordinates for each participant.
kmeans_assignments.csv: Participant # and KMeans cluster, sorted by cluster.
hierarchical_assignments.csv: Participant # and hierarchical cluster (complete linkage), sorted by cluster.
top_codes_clusters.csv: top 3 taxonomy codes per KMeans cluster (Group, Code, Count, Proportion, Group Size).
top_codes_clusters.csv: top 3 codes per subcluster. Group ids are composite (parent*100 + sub,e.g. 201 = cluster 2 / subcluster 1). Inly written when --pcoa-sub-k is passed.
pcoa_2d.png, pcoa_3d.png, elbow_silhouette.png, dendrogram.png: plots.
pcoa_2d_subclusters.png: PCoA 2D scatter colored by subcluster. Only written when --pcoa-sub-k is passed

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
README.md		README.md
clustering.py		clustering.py
data_loader.py		data_loader.py
encoding.py		encoding.py
main.py		main.py
parser.py		parser.py
plotting.py		plotting.py
requirements.txt		requirements.txt
subclustering.py		subclustering.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Requirements

Commands to run for this

Specifying K for hierarchical

Specifying K for PCoA

Specifying K for sub-clusters within each K for PCoA

Output

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Requirements

Commands to run for this

Specifying K for hierarchical

Specifying K for PCoA

Specifying K for sub-clusters within each K for PCoA

Output

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages