International Journal of Trends in Emerging Research and Development, 2024;2(3):80-84
A comparative study of MapReduce-based Apriori Algorithm performance on small vs. large Hadoop clusters
Authors: Shweta Mittal and Dr. Prerna Sidana
Abstract
This paper compares the performance of the MapReduce-based Apriori algorithm on small and large Hadoop clusters, with an emphasis on scalability and efficiency in processing large datasets. The Apriori algorithm is widely used for mining frequent itemsets in transactional databases, but its computational complexity poses challenges in big data environments. The study evaluates the algorithm's performance on two Hadoop cluster configurations, one small and one significantly larger, to determine how cluster size affects execution time, resource utilization, and overall scalability. Through extensive experimentation, we find that larger clusters offer improved performance but also introduce new challenges, such as increased network latency and more complex resource management. The paper concludes with a discussion of best practices for deploying Apriori on Hadoop clusters of varying sizes and suggests directions for future research.
Keywords
MapReduce, Apriori, Algorithm, Hadoop clusters, Computer Science