### Pearson Correlation Coefficient

The squared *Pearson correlation coefficient*
(*r*²) is widely used in genetics. It lies at the heart of the tagging SNP
selection methods of *Carlson* et al. and *De Bakker* et al. It is also
frequently used in disease association studies to reveal SNPs in high linkage
with various diseases. *r*² can easily be computed from haplotype data.
However, most currently available genetic datasets contain genotype data, in
which the haplotype phase is not observed; this makes computing *r*² more expensive.
All currently available software uses expectation
maximization to compute *r*² from genotype data. This is computationally expensive
and hence does not scale to chromosome-wide data. Most publicly available
implementations work around this problem by restricting the pairwise computations
to sliding windows of neighboring SNPs. For one of our studies we needed to
look at long-range linkage, so we developed a numeric method to compute *r*².
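
To make the "easy" haplotype case concrete, here is a minimal sketch of *r*² for two biallelic SNPs when phased haplotypes are available. It is not the numeric method for genotype data described above; the function name `r_squared` and the 0/1 allele coding are assumptions made for illustration only.

```python
import math


def r_squared(hap_a, hap_b):
    """Squared correlation between two biallelic SNPs from phased haplotypes.

    hap_a and hap_b are equal-length sequences of 0/1 alleles, one entry per
    haplotype (assumed coding, chosen for this example).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")

    p_a = sum(hap_a) / n                       # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                       # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b)
               if a == 1 and b == 1) / n       # frequency of the 1-1 haplotype

    d = p_ab - p_a * p_b                       # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan                        # undefined if either SNP is monomorphic
    return d * d / denom
```

With genotype data the 1-1 haplotype frequency `p_ab` cannot be counted directly, which is why it has to be estimated.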

The implementation, with a detailed explanation, can be
downloaded from this link. Please send me an email if you need the C code or
would like to suggest another input format.

The plot to the right was computed for 541 individuals. Since we are
computing *r*² for all pairs of SNPs, the curve is quadratic in the number of SNPs.
However, our implementation takes very little time per pair. In this plot, each
pair took roughly **50-100 μsec** on a commodity desktop.
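
As a rough illustration of the quadratic growth, the sketch below counts the SNP pairs and multiplies by the per-pair cost quoted above. The helper names and the SNP count of 10,000 are hypothetical, chosen only for the arithmetic; they are not taken from the study.

```python
def pair_count(n_snps):
    """Number of unordered SNP pairs: n * (n - 1) / 2."""
    return n_snps * (n_snps - 1) // 2


def estimated_seconds(n_snps, usec_per_pair):
    """Estimated total running time in seconds for all pairs."""
    return pair_count(n_snps) * usec_per_pair * 1e-6


# Hypothetical example: 10,000 SNPs at the quoted 50-100 microseconds per pair.
n_snps = 10_000
low = estimated_seconds(n_snps, 50)
high = estimated_seconds(n_snps, 100)
print(f"{pair_count(n_snps)} pairs, about {low:.0f} to {high:.0f} seconds")
```

Under these assumptions, 10,000 SNPs give 49,995,000 pairs, or roughly 42 to 83 minutes in total; doubling the number of SNPs roughly quadruples the time.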