**Genetics** is a MATLAB library that implements the
functions used in the driver files (provided with the datasets). Genetics uses
some of the mathematical tools developed in **MCMatrix**.
*Both libraries must be unzipped and their respective paths added to
MATLAB for the driver files to work.* Kindly contact
Asif Javed if any
problems are encountered while executing the code. Please note that the functions
may be updated as more efficient techniques are developed.
Suggestions for improvement and general comments regarding the code are
much appreciated.

The high-level functions are explained below. Once the libraries have been added to the MATLAB path, help on these functions can also be accessed using

help <*function_name*>
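
As a minimal sketch of the setup described above, the snippet below adds both libraries to the MATLAB path and then asks for help on one of the high-level functions. The folder locations are hypothetical: it assumes the two archives were unzipped into folders named `Genetics` and `MCMatrix` under the current directory, so adjust them to wherever the libraries actually live.

```matlab
% Hypothetical locations of the unzipped libraries; edit as needed.
lib_dirs = {fullfile(pwd, 'Genetics'), fullfile(pwd, 'MCMatrix')};

for i = 1:numel(lib_dirs)
    if exist(lib_dirs{i}, 'dir')
        % genpath also picks up subfolders, in case a library uses them.
        addpath(genpath(lib_dirs{i}));
    else
        warning('Library folder not found: %s', lib_dirs{i});
    end
end

% Once the path is set, built-in help works for the library functions.
if exist('linear_structure', 'file') == 2
    help linear_structure
end
```

Calling `savepath` afterwards keeps the path change across MATLAB sessions, if that is desired.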

**linear_structure**

This function takes as input a target percentage between 0 and 1 and returns
statistics for the number of eigenSNPs and actual SNPs that were necessary to
recover the dataset with at most (1-target)*100 percent erroneous entries.
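
To make the error criterion concrete, here is a small illustrative sketch. It is not the library's implementation: it assumes genotypes are encoded as -1, 0, +1, reads eigenSNPs as the leading singular vectors of the genotype matrix, and uses a made-up toy matrix. It searches for the smallest number of components whose rounded low-rank reconstruction has at most a (1-target) fraction of wrong entries.

```matlab
% Toy genotype matrix (subjects x SNPs); encoding and data are assumptions.
A = [ 1  1 -1  0  1;
      1  1 -1  0  1;
     -1 -1  1  0 -1;
      0  1  1 -1  0;
      0  1  1 -1  0;
      1  0 -1  1  1];
target = 0.95;                       % allow at most 5 percent wrong entries

[U, S, V] = svd(A, 'econ');
n_components = 0;
for k = 1:size(S, 1)
    % Rank-k approximation built from the k leading singular triplets.
    A_k = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';
    % Round back to the nearest valid genotype code in {-1, 0, +1}.
    A_hat = max(min(round(A_k), 1), -1);
    % Fraction of entries that differ from the original data.
    err = mean(A_hat(:) ~= A(:));
    if err <= 1 - target
        n_components = k;
        break;
    end
end
fprintf('%d component(s) give an error rate of %.3f\n', n_components, err);
```

The loop always terminates with a valid answer, because using all components reproduces the matrix exactly and the error rate drops to zero.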

**intra_population**

This function takes as input a target percentage between 0 and 1 which
denotes the percentage of people to be considered training data; the rest will
comprise test data. For each population, the function splits the data randomly
into test and training sets, and attempts to guess the test data using only the
training data and the CUR algorithm. Statistics are reported in a return
variable and stored as a .mat file. For each population, multiple splits of
the data are evaluated for the reconstruction accuracy of multiples of
SNP_interval (an input parameter) SNPs.
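
The random split and the SNP counts described above can be sketched as follows. This is only an illustration of the idea, not the function's actual code; `n_subjects`, `n_snps`, and `train_fraction` are hypothetical names and values chosen for the example, while `SNP_interval` mirrors the input parameter mentioned above.

```matlab
n_subjects     = 10;    % hypothetical number of individuals in one population
n_snps         = 100;   % hypothetical number of SNPs in the dataset
train_fraction = 0.7;   % plays the role of the target percentage
SNP_interval   = 20;    % evaluate 20, 40, 60, ... SNPs

% Random permutation of the individuals, then cut it into two parts.
perm      = randperm(n_subjects);
n_train   = round(train_fraction * n_subjects);
train_idx = perm(1:n_train);
test_idx  = perm(n_train+1:end);

% SNP counts at which reconstruction accuracy would be evaluated.
snp_counts = SNP_interval:SNP_interval:n_snps;
```

Repeating the `randperm` step yields a fresh split each time, which is one way to obtain the multiple splits per population.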

**inter_population**

This function takes as input a percentage between 0 and 1 that determines
how many actual SNPs to choose from the source population (via the coverage that
they provide) and estimates the prediction error after assaying the selected
SNPs in the other population. The function returns a populations-by-populations
matrix whose (i,j)-th entry stores various statistics regarding the error when
the i-th population is used to predict the j-th population. It also returns
the number of SNPs retained for each population while predicting every other
population.

Some of the functions used in the above are explained here.