CatPop Combinatorics

This software package uses combinatorics to create all possible scenarios of two population assignments and performs a permutation test for each comparison. It generates a p-value distribution plot and graphs of the between- and within-ecotype comparisons, and it returns the significant genes.
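
As a rough illustration of the idea, the Python sketch below enumerates every way of splitting the populations into two categories with itertools.combinations and compares the observed statistic against all of them. It is not CatPop's actual implementation: the function names, the dictionary layouts, the choice of statistic (mean between-category Fst minus mean within-category Fst), the one-sided comparison, and the toy values are all assumptions made for illustration.

```python
import itertools
from statistics import mean


def delta_fst(categories, fst):
    """Mean between-category Fst minus mean within-category Fst.

    categories: dict of population name -> category label (assumed layout)
    fst: dict of sorted (pop_a, pop_b) tuple -> Fst value (assumed layout)
    """
    between, within = [], []
    for pop_a, pop_b in itertools.combinations(sorted(categories), 2):
        value = fst[(pop_a, pop_b)]
        if categories[pop_a] == categories[pop_b]:
            within.append(value)
        else:
            between.append(value)
    return mean(between) - mean(within)


def exact_permutation_test(categories, fst):
    """Compare the observed delta against every relabelling of the populations."""
    pops = sorted(categories)
    label_1, label_2 = sorted(set(categories.values()))  # exactly two categories assumed
    size_1 = sum(1 for pop in pops if categories[pop] == label_1)
    observed = delta_fst(categories, fst)
    deltas = []
    for group_1 in itertools.combinations(pops, size_1):
        relabelled = {pop: (label_1 if pop in group_1 else label_2) for pop in pops}
        deltas.append(delta_fst(relabelled, fst))
    # One-sided p-value: share of relabellings at least as extreme as observed.
    hits = sum(1 for delta in deltas if delta >= observed - 1e-12)
    return observed, hits / len(deltas)


# Toy data: two populations per category, made-up Fst values for one gene.
categories = {"A": "x", "B": "x", "C": "y", "D": "y"}
fst = {
    ("A", "B"): 0.05, ("C", "D"): 0.04,
    ("A", "C"): 0.30, ("A", "D"): 0.28,
    ("B", "C"): 0.31, ("B", "D"): 0.29,
}
observed, p_value = exact_permutation_test(categories, fst)
print(observed, p_value)
```

With only four populations there are just six relabellings, so the smallest achievable p-value is large; real data sets with more populations give the test more resolution.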

There is also a random data generator to ensure the accuracy of results.

Authors: Baylee Christensen, Reagan Mckee, Dante Celani, Candice Johnson

Program Requirements

To get the full usage out of the repository, you will need:

Git
Python 3 (latest version recommended)
R, with the packages optparse, ggplot2, reshape2, and tidyverse. To install these packages, start R and run: install.packages(c("optparse", "tidyverse", "ggplot2", "reshape2"))

Quick Start

Assuming you have cloned the repository and meet the program requirements, all you need to do is format your files; you can then use the bash script run_all_functions.sh.
To see how to format your files correctly, please go to "Step 2: Implementing File Structure" within the Instructions.
Lastly, run this command to get all outputs, including the full CatPop results and the graphs.
```
./run_all_functions.sh -i <input_prefix>
```

Instructions

Step 1: Cloning the Repository

Ensure you have git installed. Installation instructions are available on the official Git website.
Open your terminal and navigate to the directory where you wish to put the repository. This would look something like cd ~/GitHubRepositories
Assuming you are reading this, you are on the repository's page. Scroll up, click the green clone button, and copy the repository's URL. Then, in your terminal, use the command:
```
git clone <repository URL>
```
Git then downloads the entire repository to your local device. You'll see progress information as the cloning takes place.
Once cloning is complete, you'll have a copy of the repository on your local machine in a subdirectory with the same name as the repository. Navigate into this directory to use the scripts in this repository.

Step 2: Implementing File Structure

This program takes in two files as input. Your two files must have a particular format.
For the categorical assignment file, please review rand_example_categories.csv for the proper format.
The image category_csv_structure.png shows the basic layout.

A consistent naming convention for your categories will make your results much easier to understand. This input file should be named: <input_prefix>_categories.csv
For the format of the fst file, please review rand_example_fst.csv. The image fst_csv_structure.png shows the basic layout.

Note that you may name the populations as you wish, but each column must be labelled with an underscore between the two population names. Also, the input file needs to be named <input_prefix>_fst.csv

*Please Note: comparisons with non-numeric fst values will be ignored.
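
To make the two rules above concrete, here is a minimal Python sketch of how a column label could be split into its two population names and how non-numeric fst values could be skipped. The helper names split_pair_label and parse_fst are hypothetical, the sample row is made up, and the sketch assumes population names contain no underscore of their own; CatPop's own parsing may differ.

```python
import math


def split_pair_label(label):
    """Split a column label such as 'PopA_PopB' into its two population names.

    Assumes the population names themselves contain no underscore.
    """
    parts = label.split("_")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '<pop1>_<pop2>', got {label!r}")
    return parts[0], parts[1]


def parse_fst(raw):
    """Return the value as a float, or None when it is not a usable number."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    # float() also accepts 'nan' and 'inf'; treat those as non-numeric here.
    return value if math.isfinite(value) else None


# One made-up row of fst values keyed by column label.
row = {"PopA_PopB": "0.12", "PopA_PopC": "NA", "PopB_PopC": "0.30"}
usable = {
    split_pair_label(label): parse_fst(raw)
    for label, raw in row.items()
    if parse_fst(raw) is not None
}
print(usable)
```

In this sketch the PopA_PopC comparison is dropped because "NA" is not numeric, which mirrors the note above.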

Step 3: Running CatPop

Ensure you have all the program requirements (see above)
Open your terminal and navigate to the CatPop directory
Type the following

python main.py -i <input_file_prefix>

CatPop will notify you that the process has started and, once finished, will print a message listing the names of the output files.

Step 4: Initializing R to get histogram (optional)

Ensure you have the "optparse" package for R installed. Without it, this command will not work!
Next, run the following command and copy the path it prints:

pwd

Several options are adjustable, though none of these changes is required: set the output file name with the text after '-o', the p-value cutoff with the value after '-p', and the number of bins with '-b'.
You MUST set your working directory after '-d'. This is the path you copied from the pwd command.
The directory contains an R script with an argument parser. The command that will work for you is a variation of this command:

Rscript create_plots.R -i <input_prefix>_all_output.csv -o <input_prefix>_plots.pdf -p 0.05 -b 50 -d '/your/path/where/you/saved/CatPop'
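
If you would rather not copy the path by hand, the command above can be assembled from Python. The sketch below only builds and prints the command; build_plot_command is a hypothetical helper, not part of CatPop, and the defaults simply repeat the values shown above.

```python
import os
import shlex


def build_plot_command(prefix, p_value=0.05, bins=50, workdir=None):
    """Assemble the create_plots.R call shown above as an argument list."""
    workdir = workdir or os.getcwd()  # the same path that `pwd` prints
    return [
        "Rscript", "create_plots.R",
        "-i", f"{prefix}_all_output.csv",
        "-o", f"{prefix}_plots.pdf",
        "-p", str(p_value),
        "-b", str(bins),
        "-d", workdir,
    ]


command = build_plot_command("rand_example")
print(shlex.join(command))  # shell-quoted form, ready to paste into a terminal
```

The list can be handed to subprocess.run(command, check=True) to execute it; that call is left out here so the sketch runs without R installed.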

Below is what the plots look like.
*Example of P-value Histogram
*Visualization of Compares

Outputs

The outputs of this program are as follows:

p-value plot, which needs to be generated through R
Plot of within- and between-ecotype comparisons, also generated through R
results.txt, which lists all the genes and their related p-values
log.txt, which contains all the genes with a p-value below 0.05
sig_output.csv, which shows the significant genes
all_output.csv, which reports the delta_fst and p-value for every gene
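
The file names above are given without the input prefix. Assuming each one is written as <input_prefix>_<name>, as the <input_prefix>_all_output.csv argument in Step 4 suggests, a quick check that a run produced everything could look like the following sketch. The helper missing_outputs is hypothetical and not part of CatPop.

```python
from pathlib import Path

# Names taken from the output list above; the "<prefix>_" naming is an assumption.
OUTPUT_NAMES = ("results.txt", "log.txt", "sig_output.csv", "all_output.csv")


def missing_outputs(prefix, directory="."):
    """Return the expected output files that do not exist in `directory`."""
    expected = [f"{prefix}_{name}" for name in OUTPUT_NAMES]
    return [name for name in expected if not (Path(directory) / name).is_file()]


# An empty list means every expected file was found in the current directory.
print(missing_outputs("rand_example"))
```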

Example

The following is how I ran CatPop on my terminal after using the random value generator. Please note, I already had all the program requirements installed.

I cloned the repository using the command:

git clone https://github.com/KLab-UT/CatPop.git

In the CatPop directory, I created the file 'rand_example_categories.csv', and named the populations, along with what 'category' they were in. The input prefix is therefore 'rand_example'. Assign your categories in relation to your data.
I used the random number generator to obtain the 'genetic divergence' values and to format the file. The file created was called 'rand_example_fst.csv', which follows the input prefix naming convention of 'rand_example'.
Now that my files were formatted and named appropriately, I ran CatPop on my csv files with this command:
```
python3 main.py -i rand_example
```
To get the Rscript to automatically generate my plots with this title:

   rand_example_density_plot.pdf

I used this command:

Rscript create_plots.R -i rand_example_all_output.csv -o rand_example_density_plot.pdf -p 0.05 -b 50 -d '/Users/myusername/GitHubDirectories/CatPop'

Alternatively, to complete this whole process using one command, I would use:

./run_all_functions.sh -i rand_example

Other Information

If you get an error saying "Fst_Pop1_Pop2 and Fst_Pop2_Pop1 not found", check your input files and verify that the populations are spelled exactly the same in the fst and ecotype files.
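
The wording of that message suggests the program tries a population pair in both orders before giving up. The sketch below shows that kind of lookup; it is a guess inferred from the error text, not CatPop's code, and lookup_fst, the "Fst_" key prefix, and the sample populations are assumptions.

```python
def lookup_fst(fst_values, pop1, pop2):
    """Fetch the Fst value for a pair, trying both population orders."""
    for key in (f"Fst_{pop1}_{pop2}", f"Fst_{pop2}_{pop1}"):
        if key in fst_values:
            return fst_values[key]
    raise KeyError(f"Fst_{pop1}_{pop2} and Fst_{pop2}_{pop1} not found")


fst_values = {"Fst_North_South": 0.21}
print(lookup_fst(fst_values, "South", "North"))  # either order finds the value

try:
    lookup_fst(fst_values, "south", "North")  # different capitalisation: no match
except KeyError as err:
    print(err)
```

Because the match is on exact strings, a difference in capitalisation or a stray space between the two files is enough to trigger the error.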

Flow of Data

Please refer to the image data_flow.png in the repository to understand the flow of data through the program.
