Analysing Crime Datasets Using Hive and Pig: A Performance Perspective

Authors

  • Harsh Kumar Tomar, Vikrant University, Gwalior
  • Sandeep K Tiwari, Vikrant University, Gwalior
  • Shashank Swami, Vikrant University, Gwalior
  • Shashi Pratap Tomar, Vikrant University, Gwalior

Keywords:

Big data, Hadoop, Hive, Pig, analysis, crime analysis

Abstract

As the population grows, both the number of crimes and the crime rate rise. Identifying crime
patterns requires an appropriate data mining technique, since better methods yield clearer patterns
and so help in managing the crime rate. However, the volume of data now generated is extremely
large, and traditional tools and techniques cannot analyze data of this size and complexity; a more
robust tool and method are needed. This paper presents big data analytics with Pig and Hive and
highlights critical challenges that governments face when making decisions aimed at lowering crime
rates. By analyzing extensive crime datasets with big data analytical tools, we determine the crime
rate by year, by district, and by type of crime. Hive queries and Pig scripts are run on the same
crime dataset. Comparing execution time and the number of MapReduce tasks, we find that Hive is
more efficient than Pig for this analysis.
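
The grouping described above (crime counts by year, district, and type) can be illustrated with a
minimal Python sketch. It is not the paper's Hive or Pig code: the field names (year, district,
crime_type), the sample records, and the helper count_by are all hypothetical, chosen only to show
the kind of GROUP BY and COUNT aggregation that such queries typically express.

```python
from collections import Counter

# Hypothetical sample records; the schema and values are assumptions,
# not taken from the paper's dataset.
SAMPLE_RECORDS = [
    {"year": 2019, "district": "A", "crime_type": "theft"},
    {"year": 2019, "district": "A", "crime_type": "theft"},
    {"year": 2019, "district": "B", "crime_type": "assault"},
    {"year": 2020, "district": "A", "crime_type": "theft"},
]


def count_by(records, *keys):
    """Count records per combination of the given keys.

    This mirrors a GROUP BY over `keys` followed by COUNT(*).
    """
    return Counter(tuple(record[key] for key in keys) for record in records)


# Crime counts grouped by a single field, and by a pair of fields.
by_year = count_by(SAMPLE_RECORDS, "year")
by_district = count_by(SAMPLE_RECORDS, "district")
by_year_and_type = count_by(SAMPLE_RECORDS, "year", "crime_type")

for group, total in sorted(by_year_and_type.items()):
    print(group, total)
```

In Hive the same aggregation would usually be a single declarative query, while in Pig it would be
a short sequence of load, group, and count steps; the sketch only shows the logic both express.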

Published

24-12-2025

Section

Articles