This software package, CatPop, uses combinatorics to generate every possible assignment of populations into two categories, performs a permutation test for each comparison, and generates a p-value distribution plot along with graphs of between- and within-ecotype comparisons. It also returns the significant genes.
A random data generator is also included so you can check the accuracy of the results.
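To make the statistical idea concrete, here is a minimal Python sketch of an exact permutation test over category assignments. It is not CatPop's implementation: it assumes, hypothetically, that for each gene delta_fst is the mean between-category Fst minus the mean within-category Fst, and that the null distribution comes from every way of choosing a group of the observed size from the populations. The function names and the dictionary layout are illustrative only.

```python
from itertools import combinations
from statistics import mean


def delta_fst(fst, populations, group_a):
    """Mean between-group Fst minus mean within-group Fst for one gene.

    `fst` maps frozenset({pop_x, pop_y}) to that pair's Fst value
    (a hypothetical layout chosen for this sketch).
    """
    group_a = set(group_a)
    between, within = [], []
    for x, y in combinations(populations, 2):
        value = fst[frozenset((x, y))]
        # Both populations on the same side -> within comparison; otherwise between.
        if (x in group_a) == (y in group_a):
            within.append(value)
        else:
            between.append(value)
    return mean(between) - mean(within)


def permutation_test(fst, populations, observed_group_a):
    """Exact one-sided permutation test over all same-size group assignments."""
    observed = delta_fst(fst, populations, observed_group_a)
    null = [
        delta_fst(fst, populations, group_a)
        for group_a in combinations(populations, len(observed_group_a))
    ]
    # p-value: share of assignments whose delta_fst is at least the observed one.
    p_value = sum(1 for d in null if d >= observed) / len(null)
    return observed, p_value


if __name__ == "__main__":
    pops = ["A1", "A2", "B1", "B2"]
    pair_values = {
        ("A1", "A2"): 0.05, ("B1", "B2"): 0.04,
        ("A1", "B1"): 0.30, ("A1", "B2"): 0.32,
        ("A2", "B1"): 0.31, ("A2", "B2"): 0.29,
    }
    fst = {frozenset(pair): v for pair, v in pair_values.items()}
    observed, p = permutation_test(fst, pops, ["A1", "A2"])
    print(f"delta_fst = {observed:.3f}, p = {p:.3f}")
```

With only four populations split 2+2 there are six assignments, and each one gives the same delta_fst as its mirror image, so the smallest attainable p-value in this toy example is 2/6. Larger numbers of populations give a finer-grained null distribution.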
Authors: Baylee Christensen, Reagan Mckee, Dante Celani, Candice Johnson
To get full use out of the repository, you will need:
- Git
- Latest version of Python
- R, with the packages optparse, ggplot2, reshape2, and tidyverse. To install these packages, start R and run:
install.packages(c("optparse", "tidyverse", "ggplot2", "reshape2"))
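If you want to confirm these requirements from a script before running anything, a small Python check like the one below can help. It is only a sketch: the executable names (git, python3, Rscript) are assumptions about how the programs are named on your PATH, and the R check simply asks R whether each package can be loaded with requireNamespace.

```python
import shutil
import subprocess

# Assumed executable names; adjust if your system uses e.g. "python" instead.
REQUIRED_TOOLS = ["git", "python3", "Rscript"]
R_PACKAGES = ["optparse", "tidyverse", "ggplot2", "reshape2"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_r_packages(packages=R_PACKAGES):
    """Ask R which packages cannot be loaded (needs Rscript on PATH)."""
    pkg_vector = ", ".join(f'"{p}"' for p in packages)
    expr = (
        f"pkgs <- c({pkg_vector}); "
        "ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE); "
        "cat(pkgs[!ok], sep = '\\n')"
    )
    result = subprocess.run(
        ["Rscript", "-e", expr], capture_output=True, text=True, check=True
    )
    # One missing package name per output line; empty output means all found.
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    tools = missing_tools()
    print("Missing tools:", tools or "none")
    if "Rscript" not in tools:
        print("Missing R packages:", missing_r_packages() or "none")
```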
- Assuming you have cloned the repository and meet the program requirements, all you need to do is format your input files; then you can use the bash script run_all_functions.sh.
- To see how to format your files correctly, see "Step 2: Implementing File Structure" in the instructions.
- Lastly, type this command to get all outputs, including the full CatPop analysis and the graphs:
./run_all_functions.sh -i <input_prefix>
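The contents of run_all_functions.sh are not shown in this README, but the worked example later on pairs main.py with create_plots.R. A hedged Python sketch of that two-step sequence, useful if you want to drive CatPop from another script, might look like this. The output PDF name and the default -p/-b values are assumptions taken from the example Rscript command, and the helper names are hypothetical.

```python
import os
import subprocess


def pipeline_commands(prefix, catpop_dir=None, p_cutoff=0.05, bins=50):
    """Build the two documented commands for one input prefix."""
    # create_plots.R needs the CatPop directory after -d; default to the
    # current working directory, which is what `pwd` would print.
    catpop_dir = catpop_dir or os.getcwd()
    return [
        ["python3", "main.py", "-i", prefix],
        [
            "Rscript", "create_plots.R",
            "-i", f"{prefix}_all_output.csv",
            "-o", f"{prefix}_plots.pdf",  # assumed output file name
            "-p", str(p_cutoff),
            "-b", str(bins),
            "-d", catpop_dir,
        ],
    ]


def run_pipeline(prefix, catpop_dir=None):
    """Run each step inside the CatPop directory, stopping at the first failure."""
    catpop_dir = catpop_dir or os.getcwd()
    for command in pipeline_commands(prefix, catpop_dir):
        subprocess.run(command, check=True, cwd=catpop_dir)
```

For example, calling run_pipeline("rand_example") from inside the CatPop directory would run the analysis and then the plotting step; check=True makes the sketch stop if main.py fails, so the R step never runs on a missing all_output file.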
- Ensure you have Git installed. Instructions for installing Git can be found on the official Git website.
- Open your terminal and navigate to the directory where you want to put the repository. This would look something like:
cd ~/GitHubRepositories
- Assuming you are reading this, you are on the repository's page. Scroll up, click the green clone button, and copy the repository's URL for cloning. Then, in your terminal, run:
git clone <repository URL>
- Git then downloads the entire repository to your local device. You'll see progress information while the cloning takes place.
- Once cloning is complete, you'll have a copy of the repository on your local machine in a subdirectory with the same name as the repository. Navigate into this directory to use the repository's functions.
- This program takes two files as input, and both must follow a particular format.
- For the categorical assignment file, please review rand_example_categories.csv for the proper format.
This image shows the basic layout:

A consistent naming convention for your categories will make your results much easier to understand. This input file should be named <input_prefix>_categories.csv
- For the format of the Fst file, please review rand_example_fst.csv. This image shows the basic layout:

Note that you may label the populations as you wish, but each column must be labelled with an underscore between the two population names (for example, Pop1_Pop2). Also, the input file must be named <input_prefix>_fst.csv
*Please Note: comparisons with non-numeric fst values will be ignored.
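A minimal Python sketch of how a file in this layout could be read is shown below. It assumes, hypothetically, that the first column holds gene names and every other header is exactly two population names joined by one underscore (so population names themselves contain no underscores). Non-numeric cells are skipped to mirror the note about ignored comparisons. CatPop's own parser may differ.

```python
import csv
import io
import math


def read_fst(handle):
    """Parse an Fst table into {gene: {(pop_a, pop_b): fst_value}}."""
    reader = csv.reader(handle)
    header = next(reader)
    pairs = []
    for column in header[1:]:
        parts = column.split("_")
        # Assumes population names contain no underscores of their own.
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"column {column!r} is not of the form PopA_PopB")
        pairs.append((parts[0], parts[1]))

    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        table[gene] = {}
        for pair, raw in zip(pairs, values):
            try:
                value = float(raw)
            except ValueError:
                continue  # non-numeric entries such as "NA" are ignored
            if math.isnan(value):
                continue  # "nan" parses as a float but is not a usable value
            table[gene][pair] = value
    return table


example = io.StringIO("gene,Pop1_Pop2,Pop1_Pop3\ngeneA,0.12,NA\ngeneB,0.30,0.41\n")
print(read_fst(example))
```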
- Ensure you have all the program requirements (see above)
- Open your terminal and navigate to the CatPop directory
- Type the following:
python main.py -i <input_file_prefix>
- CatPop will notify you that the process has started, and once it finishes, you will see a message listing the names of the output files.
- Ensure you have the optparse package for R installed. If you don't have this package, this command will not work!
- Next, I'd recommend running the following command and copying the path it prints:
pwd
- Optionally, you can adjust several settings when using this command. You can name your output file by changing the text after '-o', change the p-value threshold by adjusting the value after '-p', and change the number of bins with '-b'.
- You MUST replace the path after '-d' with your own CatPop directory. This is the path you copied after running the pwd command.
- The directory contains an R script with an argument parser, create_plots.R. The command that will work for you is a variation of this one:
Rscript create_plots.R -i <input_prefix>_all_output.csv -o <input_prefix>_plots.pdf -p 0.05 -b 50 -d '/your/path/where/you/saved/CatPop'
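To illustrate what '-p' and '-b' control, here is a rough Python sketch that splits p-values into equal-width histogram bins over [0, 1] and counts the values below the cutoff. It is an assumption-based illustration, not a translation of create_plots.R, whose actual binning is done by ggplot2.

```python
def bin_p_values(p_values, bins=50, cutoff=0.05):
    """Count p-values in `bins` equal-width bins over [0, 1].

    Returns (counts, n_significant), where n_significant counts p < cutoff.
    """
    p_values = list(p_values)  # allow any iterable; we traverse it twice
    counts = [0] * bins
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # p == 1.0 goes into the last bin instead of one past the end.
        index = min(int(p * bins), bins - 1)
        counts[index] += 1
    n_significant = sum(1 for p in p_values if p < cutoff)
    return counts, n_significant


counts, n_sig = bin_p_values([0.01, 0.04, 0.5, 1.0], bins=10, cutoff=0.05)
print(counts, n_sig)
```

More bins give a finer view of the distribution but fewer genes per bar; a spike in the leftmost bins relative to a flat background is what a set of truly significant genes would typically look like.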
Below are examples of the plots.
*Example of P-value Histogram
*Visualization of Comparisons
The outputs of this program are as follows:
- p-value plot, which must be generated through R
- plot of within- and between-ecotype comparisons, also generated through R
- results.txt, which lists all the genes and their associated p-values
- log.txt, which contains all the genes with a p-value below 0.05
- sig_output.csv, which lists the significant genes
- all_output.csv, which reports the delta_fst and p-value for every gene
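If you want to reproduce the significance filter yourself, for example with a different cutoff, a small Python sketch follows. The column names gene, delta_fst, and p_value are hypothetical; check the header of your own all_output.csv and adjust p_column accordingly. The strict "<" comparison matches the "p-value below .05" rule described for log.txt.

```python
import csv
import io


def significant_rows(handle, p_column="p_value", cutoff=0.05):
    """Return rows whose p-value is strictly below `cutoff`, smallest first."""
    rows = []
    for row in csv.DictReader(handle):
        try:
            p = float(row[p_column])
        except ValueError:
            continue  # skip rows without a numeric p-value
        if p < cutoff:
            rows.append(row)
    return sorted(rows, key=lambda r: float(r[p_column]))


sample = io.StringIO(
    "gene,delta_fst,p_value\n"
    "geneA,0.21,0.003\n"
    "geneB,0.02,0.410\n"
    "geneC,0.15,0.048\n"
)
for row in significant_rows(sample):
    print(row["gene"], row["p_value"])
```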
The following is how I ran CatPop in my terminal after using the random value generator. Please note that I already had all the program requirements installed.
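The repository's own random data generator is not reproduced here. As a hypothetical stand-in, the Python sketch below writes a matching pair of <prefix>_categories.csv and <prefix>_fst.csv files with uniform random Fst values. The column names (population, category, gene) and the PopA_PopB header style are assumptions based on the file-format description; compare them with rand_example_categories.csv and rand_example_fst.csv before relying on them.

```python
import csv
import random
from itertools import combinations


def write_random_example(prefix, categories, n_genes=100, seed=1):
    """Write <prefix>_categories.csv and <prefix>_fst.csv with random Fst values.

    `categories` maps population name -> category name. A fixed seed makes
    the output reproducible.
    """
    rng = random.Random(seed)

    with open(f"{prefix}_categories.csv", "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["population", "category"])  # assumed header
        writer.writerows(categories.items())

    pairs = list(combinations(categories, 2))
    with open(f"{prefix}_fst.csv", "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["gene"] + [f"{a}_{b}" for a, b in pairs])
        for i in range(1, n_genes + 1):
            # Uniform values in [0, 1]: no built-in between/within signal.
            writer.writerow([f"gene{i}"] + [round(rng.random(), 4) for _ in pairs])
```

For example, write_random_example("rand_example", {"Pop1": "A", "Pop2": "A", "Pop3": "B", "Pop4": "B"}) would create both files with the rand_example prefix. Because the values carry no real signal, very few genes should come out as significant, which makes this a simple sanity check.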
- I cloned the repository using the command:
git clone https://github.com/KLab-UT/CatPop.git
- In the CatPop directory, I created the file 'rand_example_categories.csv', listing the populations along with the 'category' each one belongs to. The input prefix is therefore 'rand_example'. Assign your own categories according to your data.
- I used the random number generator to generate the 'genetic divergence' values and to format the file. The file created was called 'rand_example_fst.csv', which follows the input prefix naming convention of 'rand_example'.
- Now that my files were formatted and named appropriately, I ran CatPop on my CSV files with this command:
python3 main.py -i rand_example
- To get the R script to automatically generate my plots with this file name:
rand_example_density_plot.pdf
I used this command:
Rscript create_plots.R -i rand_example_all_output.csv -o rand_example_density_plot.pdf -p 0.05 -b 50 -d '/Users/myusername/GitHubDirectories/CatPop'
- Alternatively, to complete this whole process using one command, I would use:
./run_all_functions.sh -i rand_example
If you get an error saying "Fst_Pop1_Pop2 and Fst_Pop2_Pop1 not found", check your input files and verify that the populations are spelled exactly the same in the Fst and ecotype (categories) files.
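A quick way to find the offending names before running CatPop is to check that every pair of populations from your categories file has a column in the Fst file in at least one order. This Python sketch is hypothetical. The optional column_prefix argument exists because the error message mentions names such as Fst_Pop1_Pop2, so CatPop may add a prefix internally; set it to match whatever your headers actually contain.

```python
from itertools import combinations


def missing_pairs(populations, fst_columns, column_prefix=""):
    """Return population pairs with no Fst column in either order."""
    columns = set(fst_columns)
    missing = []
    for a, b in combinations(populations, 2):
        forward = f"{column_prefix}{a}_{b}"
        reverse = f"{column_prefix}{b}_{a}"
        # Matching is exact and case-sensitive, so "pop1" will not match "Pop1".
        if forward not in columns and reverse not in columns:
            missing.append((a, b))
    return missing


print(missing_pairs(["Pop1", "Pop2", "Pop3"], ["Pop1_Pop2", "Pop3_Pop1"]))
```

Any pair printed here points to a population name that differs between the two files, or to a comparison column that is missing entirely.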
Please refer to the image below to understand the flow of data through the
program.

