API Overview

Welcome to the manual for the Parallel PopGen Package (3P)! This is a library of CUDA C/C++ APIs for performing population genetics simulations and analyses on the GPU. Using the 3P APIs, complex, powerful, and fast simulations and analyses can be written in otherwise standard C++ - requiring no knowledge of parallel/GPU programming from the end-user. Below is a brief description of the information found in this manual, organized in the tabs above:

Namespaces provides all functions and classes related to an API (e.g. GO_Fish).
Modules page organizes the Sim_Model namespace into groups of functions (e.g. Simulation Models: Selection Group) - each module describes the parameters and output of the function group and how to write custom simulation functions.
Files shows the file hierarchy, the functions and classes in each file, and the detailed description of each header file provides information on how to include the files for a project.
Classes shows all classes and structs organized by namespace.
Examples provides code examples and custom makefiles found in the examples/ folder.

3P currently contains 3 APIs, each in its own namespace:

GO_Fish: Namespace for single-locus, forward, Monte-Carlo Wright-Fisher simulation and output data structures.
Sim_Model: Namespace of functions for controlling GO_Fish simulations. Sim_Model is optional as the user is free to write their own population genetics and simulation control functions following the rules laid out in each function group's Module.
Spectrum: Namespace for site frequency spectrum data structure and functions. (in prototype-phase)

The relevant header files are in the folder include/

go_fish.cuh: GO Fish Simulation API (contains namespaces GO_Fish and Sim_Model)
go_fish_data_struct.h: Subset of go_fish.cuh (the GO_Fish data structures)
spectrum.h: proto-API for building site frequency spectra (contains the titular namespace Spectrum)

The implementations of these header files are in the source/ folder. All outside libraries used in the Parallel PopGen Package (beyond standard CUDA and C/C++ libs) are stored in the folder outside_libraries/ and are automatically included. The examples in the examples/ folder show how to compile a project which using a 3P API (also covered in Parallel PopGen Package v0.3 README).

Tips

For faster GO_Fish simulations:

Play around with GO_Fish::allele_trajectories::sim_constants::compact_interval.
- The variable compact_interval determines when to trow away mutations that have been lost or fixed by all populations in the simulation. Raise it to do it less often, lower it for more often. Optimal compact intervals can only be determined heuristically - larger simulations (with more mutations) need to be compacted more often than smaller simulations, while faster GPUs with more cores need to compact less often than slower ones with fewer cores. Note: Changing the compact interval will change the result of the simulation run for the same seed numbers. However, these are not independent simulation runs! Changing the compact interval produces new random, but correlated simulation results.
When simulating full allele_trajectories, turn off compact_interval.
- To record the frequency of every mutation for every generation, pass Sim_Model::bool_on to GO_Fish::run_sim's parameter Functor_timesample take_sample and set GO_Fish::allele_trajectories::sim_constants::compact_interval = 0.
When simulating for the purposes of generating SFS (and other applications), scale the simulation by a reference population size to simulate fewer generations/smaller populations.
- See Example2-DaDi and δaδi-to-GOFish Parameter Conversion for an examples and general how-to.
GO_Fish::run_sim has an important Pro tip regarding input template functions.

(Pro Tip - Advanced users) In implementation file shared.cuh, there are several variables that can be tuned:

typedef r123::Philox4x32_R<10> P;: <10> can be set as low as <8> for extra-speed.
- Philox is the random number generator for GO_Fish. The 10 refers to the number of bijections it performs to generate a new random number. The authors of Philox state that 8 is the lowest safe limit, but 10 is recommended for maximum speed. Current default setting in shared.cuh is 10.
#define RNG_MEAN_BOUNDARY_NORM 6: RNG_MEAN_BOUNDARY_NORM can be set between 1-33
- recommended setting is between 6-15, current default is 6. If the expected (mean) result of the RNG for the distribution is greater than RNG_MEAN_BOUNDARY_NORM, the normal approximation is used; less than RNG_MEAN_BOUNDARY_NORM, the poisson/ binomial distribution is used. Set higher and the distribution becomes closer to a true binomial at the cost of speed, makes very little to no difference (<< 1%) for many applications.
#define RNG_N_BOUNDARY_POIS_BINOM 100: RNG_N_BOUNDARY_POIS_BINOM can be set between 1-200
- recommended setting is between 100-200, current default is 100. When simulating a binomial distribution, if the total number of trials, n, for the distribution is greater than RNG_N_BOUNDARY_POIS_BINOM, the poisson approximation is used; less than RNG_N_BOUNDARY_POIS_BINOM, the binomial distribution is used. The binomial calculation gets numerically unstable for large n >> 200.

Troubleshooting

Use the embedded link for all bug reporting, feature requests, to do, and discussion. Below is a description of the two kinds of errors to encounter during API use:

API errors will be reported as: function name: description.
CUDA errors will be reported as: error # s file s line # generation # population #, where # = number and s = string.
- If a CUDA error is encountered, set __DEBUG__ (shared.cuh line 23) to true - the program will run slower, but the error will be guaranteed to be reported with the correct file/line number/etc ... __DEBUG__ set to true ensures that the CPU and GPU are always in-sync where normally the CPU is allowed to race ahead of the GPU and thus by the time an error from the GPU is reported, the CPU may be executing a different piece of code. If problem is rectified, make sure to turn __DEBUG__ back to false to speed up program execution. A reported generation or population of -1 means that generation or population was not relevant to the error.
- CUDA out-of-memory errors will typically be the result of a simulation/spectrum that is too big for the GPU's onboard memory. For instance, if running a GO_Fish simulation that starts in mutation-selection equilibrium (i.e. GO_Fish::allele_trajectories::sim_constants::init_mse = true) and the population size is too big, so that the program fails in function initialize_mse in go_fish_impl.cuh before the arrays in struct mutations are allocated, try rescaling the simulation as shown in δaδi-to-GOFish Parameter Conversion - either temporarily just for generation 0 with a couple of burn-in generations after all the parameters are reset (selection too!) or for the whole simulation. If there are too many mutations, simply lower the number of independent sites in the simulation using GO_Fish::allele_trajectories::sim_constants::num_sites.

δaδi-to-GOFish Parameter Conversion

GO_Fish simulations are discrete time/discrete frequency simulations. That said, the parameters of such a simulation can be translated into the continuous time/continuous frequency paradigm of δaδi. In δaδi, the main parameter is N_ref - the reference population size of the initial population at time 0. Time, theta, selection, migration, and the population sizes of later generations/other populations are all in reference to N_ref. The list below covers the population genetics functions/variable in a GO_Fish simulation, and how to match that to the corresponding δaδi parameter. Example2-DaDi gives a specific instance of an analogous δaδi and GO_Fish simulation with population scaling.

Inbreeding

Inbreeding functions return a float parameter, F, between 0 (outbred) and 1 (inbred) inclusive - see Simulation Models: Inbreeding, Mutation, and Dominance Group for more detail. In δaδi, populations are outbred, F = 0.

Dominance

Dominance functions return a float parameter, h, between -inf and +inf - see Simulation Models: Inbreeding, Mutation, and Dominance Group for more detail. h_GOFish = h_δaδi.

Selection

Selection functions return a selection coefficient, which is a float greater than or equal to -1 (lethal) - see Simulation Models: Selection Group for more detail. In δaδi the selection parameter is the effective selection coefficient, gamma = 2Ns, where the selection on alleles AA, Aa, aa are 1, 1+2hs, 1+2s and N is the number of individuals. Thus the equivalent GO_Fish selection coefficient is s_GOFish = 2*s_δaδi. Sim_Model, like δaδi, has constructors in Simulation Models: Selection Group that allow effective selection coefficients to be input into the simulation (e.g. Sim_Model::selection_constant). For an outbred population, F = 0, gamma_GOFish = 2Ns_GOFish = 2*2Ns_δaδi = 2*gamma_δaδi. Effective selection coefficients are invariant to population scaling - see Demography.

Demography

Demography functions return a number of individuals, an integer between 0 and max 2³²-1 inclusive - see Simulation Models: Demography Group for more detail. Like in δaδi, population sizes in simulations can be scaled - doing so requires re-scaling migration rates, the number of generations, and the mutation rate. If taking samples of the populations, it is best to choose a reference population size for the initial population, N_{ref_GOFish}, such that the size(s) of the sampled population(s) are >> than the sample size, so that using the binomial is an appropriate choice of sampling distribution.

Migration

Migration functions return a float, m, between 0 and 1 inclusive - see Simulation Models: Migration Group for more detail. Both δaδi and GO_Fish use a conservative model of migration. In δaδi, the migration rate is scaled by the population size, so m_GOFish = m_δaδi/(2N_{ref_GOFish}), where N_{ref_GOFish} is the number of initial individuals in the reference population of the GO_Fish simulation. This automatically scales migration rates for different reference population sizes.

Number of Generations

The number of generations in the simulation is a constant integer between 0 and max 2³²-1 inclusive - use GO_Fish::allele_trajectories::sim_constants::num_generations to set the number of generations for a simulation. In δaδi, time T is measured in units of 2N_ref, so the corresponding number of generations in GO_Fish would be T*2N_{ref_GOFish}. Thus, the number of generations in a simulation can be scaled by the population size. Similarly in δaδi Integration.py, one can set the variable dadi.Integration.set_timescale_factor = # where # = 1/(num_generations) so that a time step in the integration routine is equal to a discrete generation in the simulation.

Mutation rate

Mutation functions return a per-site, per-generation mutation rate, which a float parameter greater than 0 - see Simulation Models: Inbreeding, Mutation, and Dominance Group for more detail. If the simulation population sizes (N_{ref_GOFish}) have been scaled relative to the actual population sizes N, then, to maintain the same polymorphism level (theta), set mu_ref = mu_actual*N/N_{ref_GOFish}. In δaδi, theta_ref, defined as 4N_refmu, is often (not always) defined to be 1 for the diffusion calculations, then later fit to the data. GO_Fish does not actually use theta as a relevant parameter in the simulations, but the fitted δaδi mutation rates should be equivalent to the pre-scaled GO_Fish mu_actual.

Number of Sites

The number of sites in the simulation is a constant float between 0 and +inf - use GO_Fish::allele_trajectories::sim_constants::num_sites to set the number of independent sites for a simulation. While δaδi doesn't have this parameter, grid_size is somewhat analogous in that the more sites there are in a GO_Fish simulation, then the less noise, the more precise the output of the simulation will be - similarly, the finer the grid size in δaδi, then the more accurate, the more precise the integration. Other than being parameters of precision in their respective tools, they are otherwise quite different in intent.

Table of Contents