Parallel Data Mining

Data mining is the search for patterns in a dataset. Because modern databases are massive and growing exponentially, efficient parallel algorithms are needed. We designed and implemented a distributed algorithm that identifies frequent patterns in a transactional database. Our approach extends the FP-Growth algorithm of Han et al. and scales extremely well with additional processors. The code was written in C using MPI, with MPI-2 used for file handling. The results reported in the papers below were generated on a 14-processor HP-9000/800 platform.
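To make the idea of partitioned frequent-pattern counting concrete, here is a minimal sketch in plain C of the first pass that FP-Growth-style methods rely on: counting the support of each item and keeping those that meet a minimum support. It is an illustration under stated assumptions, not the algorithm from the papers. The item universe, the Transaction type, and all function names are hypothetical, and the partitions are processed one after another in a single process. In an MPI program the merge step would typically be a collective reduction such as MPI_Allreduce with MPI_SUM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_ITEMS 5    /* hypothetical item universe: ids 0..4 */
#define MAX_TXN_LEN 4  /* hypothetical cap on items per transaction */

/* One transaction: a short list of distinct item ids (assumed format). */
typedef struct {
    int items[MAX_TXN_LEN];
    int len;
} Transaction;

/* Count, for one partition, how many transactions contain each item.
 * Relies on the assumption that items within a transaction are distinct. */
void count_local_support(const Transaction *part, size_t n,
                         int counts[NUM_ITEMS])
{
    memset(counts, 0, NUM_ITEMS * sizeof counts[0]);
    for (size_t t = 0; t < n; t++) {
        for (int i = 0; i < part[t].len; i++) {
            int item = part[t].items[i];
            assert(item >= 0 && item < NUM_ITEMS);
            counts[item]++;
        }
    }
}

/* Add one partition's counts into the global counts, element by element.
 * This stands in for a sum reduction across processes. */
void merge_counts(int global[NUM_ITEMS], const int local[NUM_ITEMS])
{
    for (int i = 0; i < NUM_ITEMS; i++)
        global[i] += local[i];
}

/* Store the ids of items whose global support is at least min_support
 * in out (ascending id order) and return how many were stored. */
int frequent_items(const int global[NUM_ITEMS], int min_support,
                   int out[NUM_ITEMS])
{
    int k = 0;
    for (int i = 0; i < NUM_ITEMS; i++)
        if (global[i] >= min_support)
            out[k++] = i;
    return k;
}
```

A caller would run count_local_support once per partition, fold each result into a zero-initialized global array with merge_counts, and then call frequent_items with the chosen threshold. Only the surviving items would go on to the tree-building phase of an FP-Growth-style miner.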


A. Javed and A. Khokhar, Frequent Pattern Mining on Message Passing Multiprocessor Systems, Distributed and Parallel Databases: An International Journal (DAPD), November 2004. [ACM portal]

A. Javed and A. Khokhar, Scalable Parallel Algorithm for Mining Frequent Patterns on Message Passing Parallel Systems, ISCA Parallel and Distributed Computing Systems (PDCS), August 2003.

(The image is Diego Rivera's Miners in Guerrero)