**1. Introduction**

Today, advances in computer performance and data analysis techniques make it possible to process large amounts of data with ease. Machine learning and artificial neural networks trained on pre-processed big data are used for prediction and analysis in various fields, such as stock price prediction [1,2] and financial analysis [3,4]. In the field of crime prediction, studies on online crime detection [5] and the identification of crime hotspots [6] are being actively conducted.

Many government authorities across the world are already making efforts to prevent crime by applying crime prediction systems. With PredPol, a crime prediction system used by the Santa Cruz Police Department in the United States, breaking-and-entering cases dropped by 27% from July 2010 to July 2011, while the system was in place, and fell by 25–29% in June and July 2013 compared with the same months of the previous year, indicating a consistent effect. In Korea, GeoPros and CLUE are being used as part of the Smart City initiative. In the 2013 GeoPros pilot run, robbery cases declined by 44.4%, while rape and theft decreased by 22.1% and 13.1%, respectively. In addition to crime prediction, CLUE provides similar cases and investigation clues based on police investigation records. Other crime prediction systems, such as HunchLab and COMPStat, used by the Miami and New York Police Departments, have also contributed meaningfully to crime reduction.

**Citation:** Kim, D.; Jung, S.; Jeong, Y. Theft Prediction Model Based on Spatial Clustering to Reflect Spatial Characteristics of Adjacent Lands. *Sustainability* **2021**, *13*, 7715. https://doi.org/10.3390/su13147715

Academic Editors: Pierfrancesco De Paola, Francesco Tajani, Marco Locurcio and Felicia Di Liddo

Received: 17 June 2021; Accepted: 8 July 2021; Published: 10 July 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

For crime prediction to support effective crime prevention, it is important not only to make precise predictions but also to set the prediction range accurately, so that crime prevention resources such as CCTV cameras and police personnel can be allocated properly. Recently, researchers have actively studied machine learning-based crime prediction that uses grid cells as the unit of analysis. Because grid cells have a uniform shape and size, unlike administrative districts or census output areas, statistical information can be examined objectively. Moreover, because grids can be flexibly adapted to different map scales, micro-scale analysis is also possible. Yu et al. [7] predicted residential burglaries by training an algorithm on the crime records of each cell, based on the spatiotemporal concentration of crime. To reflect the influence of crimes in adjacent areas and of the physical environment during training, Lin et al. [8] predicted vehicle thefts by training on 84 types of landmark data obtained through the Google API, together with crime information from adjacent cells. The Google API landmark data indicate the purpose and address of establishments such as schools, pubs, and restaurants, and these were used to reflect the geographic characteristics of the study site. Extending this earlier research, this study aims to develop a crime prediction model that reflects the influence of surrounding areas and geographic characteristics on crime.
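The grid-based setup described above, in which a cell's own crime records are combined with those of its adjacent cells, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Yu et al. [7] or Lin et al. [8]: it assumes crime locations are already projected to planar coordinates, uses square cells, and treats the eight surrounding cells as "adjacent". The function names `grid_counts` and `cell_features` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in projected map units, e.g. meters (assumption)

def grid_counts(points: Sequence[Point], x0: float, y0: float,
                cell_size: float, n_rows: int, n_cols: int) -> List[List[int]]:
    """Count crime points per square cell (hypothetical helper).

    (x0, y0) is the lower-left corner of the grid, so row 0 is the southernmost row.
    Points falling outside the grid are ignored.
    """
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        col = math.floor((x - x0) / cell_size)
        row = math.floor((y - y0) / cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            counts[row][col] += 1
    return counts

def cell_features(counts: List[List[int]], row: int, col: int) -> Tuple[int, int]:
    """Return (crimes in the cell, crimes in its up-to-8 adjacent cells)."""
    n_rows, n_cols = len(counts), len(counts[0])
    adjacent = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            # Skip the cell itself and any neighbor outside the grid.
            if (dr, dc) != (0, 0) and 0 <= r < n_rows and 0 <= c < n_cols:
                adjacent += counts[r][c]
    return counts[row][col], adjacent

# Toy example: a 3 x 3 grid of 10-unit cells.
crimes = [(5, 5), (15, 5), (12, 8), (15, 15), (25, 25)]
counts = grid_counts(crimes, x0=0, y0=0, cell_size=10, n_rows=3, n_cols=3)
print(counts)                       # [[1, 2, 0], [0, 1, 0], [0, 0, 1]]
print(cell_features(counts, 1, 1))  # (1, 4)
```

Pooling every adjacent cell in this way is exactly what makes a cell's features sensitive to neighbors whose spatial characteristics may differ from its own.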

Real land is continuous rather than divided into independent cells as in a grid-based geographic information system (GIS), and when a crime occurs, the risk of crime in adjacent areas increases [9–11]. Studies analyzing the relationship between spatial characteristics and crime have found that the relevant environmental factors, such as patterns, establishments, and land use, differ depending on the type of crime [12–15]. Even for the same type of crime, the related factors differ depending on the detailed modus operandi [16]. Therefore, micro-scale studies using a grid must address how to reflect the characteristics of adjacent land. When training on crime information, one way to do so is to combine all the cells adjacent to the target cell and use them in training. However, this approach requires focusing on a specific modus operandi, and if the target cell and its adjacent cells differ greatly in spatial characteristics, training may be negatively affected. Accordingly, to identify cells with similar spatial characteristics for use in training, this study proposes a crime prediction method that applies spatial clustering, as shown in Figure 1. To perform spatial clustering, weights are assigned between geographically adjacent cells according to the distance between their instances in the attribute vector space, so that adjacent cells with similar attributes are linked by high weights. Thus, cells that are adjacent to the target cell and have similar attributes are grouped into the same cluster, and only these cells, rather than all adjacent cells, are used for training.
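The clustering idea above can be illustrated with a small sketch. It is a simplified stand-in under explicit assumptions, not this study's actual algorithm: edges exist only between geographically adjacent cells (8-neighborhood), each edge is weighted by a Gaussian kernel on the Euclidean distance between the two cells' attribute vectors, edges whose weight falls below a threshold are dropped, and the remaining connected components (found with union-find) are treated as clusters. The kernel width `sigma`, the `threshold`, the toy attributes, and the function names are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) index of a grid cell

def edge_weight(a: Sequence[float], b: Sequence[float], sigma: float) -> float:
    """Gaussian kernel (assumed): near 1 for similar attribute vectors, near 0 for distant ones."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))

def spatial_clusters(attrs: Dict[Cell, Sequence[float]], sigma: float,
                     threshold: float) -> Dict[Cell, int]:
    """Merge geographically adjacent cells whose edge weight >= threshold."""
    parent = {cell: cell for cell in attrs}

    def find(cell: Cell) -> Cell:
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path halving
            cell = parent[cell]
        return cell

    for (r, c), vec in attrs.items():
        # These four forward offsets visit each 8-neighborhood edge exactly once.
        for nb in ((r, c + 1), (r + 1, c - 1), (r + 1, c), (r + 1, c + 1)):
            if nb in attrs and edge_weight(vec, attrs[nb], sigma) >= threshold:
                parent[find((r, c))] = find(nb)

    # Relabel the union-find roots as consecutive integers 0, 1, 2, ...
    roots: Dict[Cell, int] = {}
    return {cell: roots.setdefault(find(cell), len(roots)) for cell in sorted(attrs)}

def training_cells(target: Cell, labels: Dict[Cell, int]) -> List[Cell]:
    """Adjacent cells (8-neighborhood) that share the target cell's cluster."""
    r, c = target
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and labels.get((r + dr, c + dc)) == labels[target]]

# Toy 2 x 3 grid with two hypothetical attributes per cell (e.g. shares of
# residential and commercial land use): (0,0), (0,1), (1,0) are mostly
# residential; the other three cells are mostly commercial.
attrs = {
    (0, 0): [1.0, 0.0], (0, 1): [0.9, 0.1], (0, 2): [0.0, 1.0],
    (1, 0): [1.0, 0.1], (1, 1): [0.1, 0.9], (1, 2): [0.0, 0.9],
}
labels = spatial_clusters(attrs, sigma=0.5, threshold=0.5)
print(training_cells((1, 1), labels))  # [(0, 2), (1, 2)]
```

Because edges only connect neighboring cells, every cluster is spatially contiguous. Raising `threshold` drops more edges and splits the map into smaller, more homogeneous clusters, while lowering it lets dissimilar neighbors merge.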


