Parallel Data Mining

Data mining is the search for patterns in a dataset. Because modern databases are massive and growing exponentially, efficient parallel algorithms are needed. We designed and implemented a distributed algorithm that identifies frequent patterns in a transactional database. Our approach extends Han et al.'s FP-Growth algorithm and scales well as processors are added. The code was written in C using MPI, with MPI-2 used for file handling. The results reported in our papers were generated on a 14-processor HP-9000/800 platform.
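This abstract does not describe the distributed algorithm in detail, so the following is only a generic sketch, not the authors' method. It shows the data-parallel first pass that parallel FP-Growth variants commonly begin with. Each process counts item supports in its own partition of the transactions. The counts are then summed across processes, and the frequent items are sorted by descending support, which is the order FP-Growth uses when inserting transactions into an FP-tree. To keep the sketch runnable without an MPI installation, the processes are simulated by arrays, and `sum_counts` stands in for `MPI_Allreduce` with `MPI_SUM`. All names here (`Transaction`, `count_local`, `sum_counts`, `frequent_order`, `NUM_ITEMS`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_ITEMS 5 /* hypothetical item universe: item ids 0..4 */

/* A transaction is a list of distinct item ids. */
typedef struct {
    const int *items;
    int len;
} Transaction;

/* Local pass: count item occurrences in one process's partition. */
static void count_local(const Transaction *part, int n, long counts[NUM_ITEMS]) {
    for (int i = 0; i < NUM_ITEMS; i++)
        counts[i] = 0;
    for (int t = 0; t < n; t++) {
        for (int k = 0; k < part[t].len; k++) {
            int item = part[t].items[k];
            assert(item >= 0 && item < NUM_ITEMS); /* reject out-of-range ids */
            counts[item]++;
        }
    }
}

/* Simulated global reduction. In an MPI program this would be
   MPI_Allreduce(local, global, NUM_ITEMS, MPI_LONG, MPI_SUM, comm):
   an element-wise sum of every process's local counts. */
static void sum_counts(long local[][NUM_ITEMS], int nprocs, long global[NUM_ITEMS]) {
    for (int i = 0; i < NUM_ITEMS; i++) {
        global[i] = 0;
        for (int p = 0; p < nprocs; p++)
            global[i] += local[p][i];
    }
}

/* Support table read by the qsort comparator (not thread-safe; sketch only). */
static const long *g_sort_counts;

/* Order items by descending support, breaking ties by item id. */
static int by_desc_count(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    if (g_sort_counts[x] != g_sort_counts[y])
        return g_sort_counts[x] < g_sort_counts[y] ? 1 : -1;
    return x - y;
}

/* Keep items whose global support is >= min_support and sort them by
   descending support. Returns how many frequent items were written to order[]. */
static int frequent_order(const long global[NUM_ITEMS], long min_support,
                          int order[NUM_ITEMS]) {
    int n = 0;
    for (int i = 0; i < NUM_ITEMS; i++)
        if (global[i] >= min_support)
            order[n++] = i;
    g_sort_counts = global;
    qsort(order, (size_t)n, sizeof order[0], by_desc_count);
    return n;
}
```

The key design point is that supports are summed across partitions before the minimum-support threshold is applied. An item that falls below the threshold in every local partition can still be frequent globally, so thresholding locally would lose patterns.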


