Analysing Crime Datasets Using Hive and Pig: A Performance Perspective
Keywords:
Big data, Hadoop, Hive, Pig, analysis, crime analysis
Abstract
As the population grows, the incidence of crime also rises. Identifying crime patterns requires an
appropriate data mining technique, since better mining methods yield clearer patterns and, in turn,
support more effective management of the crime rate. However, the volume of data now generated is
extremely large, and traditional tools and techniques cannot analyze such vast and complex data. A
robust tool and method for handling large data volumes is therefore required. This paper presents big
data analytics using Pig and Hive and highlights critical challenges that governments face when making
decisions aimed at lowering crime rates. By analyzing extensive crime datasets with big data analytical
tools, we can determine the crime rate by year, district, and type of crime. Hive queries and Pig scripts
are run on the crime dataset. Based on execution time and the number of MapReduce tasks, our analysis
indicates that Hive is more efficient than Pig.
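
The grouping described above (crime counts by year, district, and type of crime) can be illustrated with a minimal Python sketch. It is not the Hive or Pig code used in this work. The record layout, the sample rows, and the helper name count_by are assumptions made purely for illustration. Each call corresponds conceptually to a GROUP BY with COUNT in Hive, or a GROUP followed by COUNT in Pig.

```python
from collections import Counter

# Hypothetical sample records (illustrative only, not from the paper's dataset).
# Assumed layout: (year, district, crime_type).
records = [
    (2015, "Central", "THEFT"),
    (2015, "Central", "ASSAULT"),
    (2015, "North", "THEFT"),
    (2016, "Central", "THEFT"),
    (2016, "North", "THEFT"),
    (2016, "North", "THEFT"),
]


def count_by(rows, field_index):
    """Count rows per distinct value of one field (a GROUP BY ... COUNT)."""
    return Counter(row[field_index] for row in rows)


by_year = count_by(records, 0)      # crimes per year
by_district = count_by(records, 1)  # crimes per district
by_type = count_by(records, 2)      # crimes per type of crime

for label, counts in (("year", by_year), ("district", by_district), ("type", by_type)):
    print(label, dict(counts))
```

On a real dataset the same three aggregations would run as distributed jobs. The sketch only shows the logical shape of the result, not the performance behaviour compared in this paper.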
