2.1. Data Preparation and Organization
The data for identifying HPAI quarantine zones are organized into major categories closely related to HPAI outbreak factors, each with its own sub-categories. The final structure consists of 7 major categories and 13 sub-categories. First, Table 1 covers terrain, with the sub-categories mountain ranges [12] and proportion of river size [13]. Table 2 addresses the status around the farm, with the sub-categories proximity to roads [14], population density [13], farm density [14], farmland ratio [12], and proximity to traditional markets. Lastly, Table 3 consists of breeding information, epidemic information, weather information, epidemiological information, and ecological environmental information, with the sub-categories breeding types, distance from nearby farms subject to analysis [15], temperature [16], wind direction [16], analysis farm occurrence history, and distance from migratory bird habitats [14]. Together, all major categories form one large “rule table”.
The composition of the sub-items in the rule table is divided into two main forms. The first includes items considering the correlation between the target farm and nearby farms, such as mountain ranges, distance to nearby farms, and wind direction. These three items can verify the influence of nearby farms on the target farm based on certain conditions. The other form calculates the conditions of the sub-items based on the target farm itself. Finally, scores are set for all sub-items of the rule table.
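The two forms of sub-item can be sketched as a small data structure. This is only an illustration: the class name `RuleItem`, the field names, and every threshold and point value below are hypothetical and are not taken from Tables 1–3.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RuleItem:
    name: str
    pairwise: bool                  # True: relates the target farm to a nearby farm
    score: Callable[[float], int]   # maps a measured value to points


# Hypothetical thresholds and points, for illustration only.
RULE_TABLE = [
    RuleItem("distance_to_nearby_farm_m", True,
             lambda d: 3 if d <= 500 else 2 if d <= 1500 else 1),
    RuleItem("farm_density", False,
             lambda n: 2 if n >= 10 else 1),
]

# Split the table into the two forms described in the text.
pairwise_items = [r.name for r in RULE_TABLE if r.pairwise]
self_items = [r.name for r in RULE_TABLE if not r.pairwise]
```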
Once the scores for the conditions of the sub-items in the rule table have been set, we collect raw data that correspond to the sub-items. For mountain range, proportion of river size, road proximity, population density, and distance from bird arrival areas, we utilized data from [17]. Information on farm density, farmland ratio, and proximity to traditional markets was based on data from [18]. Weather data, such as temperature and wind direction, were collected through [19]. Finally, farm-related information, such as types of livestock breeding, distance to nearby farms under analysis, and outbreak history of the analysis farm, was provided by the relevant agency [20].
Following the raw data collection phase, a preprocessing step ensures that each farm’s data can be directly applied to the rule table. In this stage, the raw data are matched to each farm according to Table 1, Table 2 and Table 3 of the rule table, so that each farm possesses the variables and values of the sub-items. However, for sub-items derived from interrelationships, such as distance to nearby farms, temperature, wind direction, and mountain ranges, and for weather information items that change daily, distance analysis and weather data processing are carried out using the latitude and longitude coordinates of each farm. Through this process, the sub-items and rules necessary for identifying HPAI quarantine zones are integrated and preprocessed for each farm, preparing them for the final analysis.
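The distance analysis from latitude and longitude can be sketched as follows. The text does not say which distance formula was used; the haversine great-circle distance here is an assumption, and the function names and farm coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearby_farms(target_id, farms, radius_m=3000.0):
    """Return ids of farms within radius_m of the target farm (target excluded)."""
    lat, lon = farms[target_id]
    return [fid for fid, (la, lo) in farms.items()
            if fid != target_id and haversine_m(lat, lon, la, lo) <= radius_m]


# Hypothetical farm coordinates (degrees), for illustration only.
farms = {"A": (34.80, 126.40), "B": (34.81, 126.41), "C": (35.20, 126.90)}
neighbors_of_a = nearby_farms("A", farms)
```

For a nationwide farm set, a spatial index would avoid the pairwise scan; the linear search above is kept only for readability.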
2.2. Rule-Based Scoring
To identify the HPAI quarantine zones, the final evaluation score for all farms is calculated based on the scoring rule table. This scoring rule assigns points to specific items that meet certain conditions according to the rules shown in Table 1, Table 2 and Table 3.
The evaluation scoring method designates each chicken and duck farm across the country as an analysis target, denoted a. The nearby farms within a 3000 m radius of the designated analysis farm a are denoted b. Figure 1 illustrates an example of the six processes for setting the range of these nearby farms when each farm becomes the benchmark for analysis. The evaluation score is derived by calculating the item scores according to specific rule items for these designated nearby farms. Equation (1) defines the single evaluation score of farm a in relation to a nearby farm b, where m is the number of rule items and each rule item x contributes its corresponding score.
If there are no other farms within a 3000 m radius, the evaluation score for the analysis farm is calculated considering only its surrounding environmental rules.
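A minimal sketch of the single evaluation score, assuming the item scores are simply added together. The function name, the item names, and the point values are hypothetical; only the split between items tied to a nearby farm and items of the target farm itself follows the text.

```python
def single_score(self_scores, pair_scores=None):
    """Sum item scores for farm a.

    self_scores: points for items computed from farm a alone.
    pair_scores: points for items relating a to one nearby farm b,
                 or None when no farm lies within the 3000 m radius.
    """
    total = sum(self_scores.values())
    if pair_scores is not None:
        total += sum(pair_scores.values())
    return total


# Hypothetical item scores, for illustration only.
self_scores = {"river_proportion": 2, "road_proximity": 3, "farm_density": 1}
pair_ab = {"mountain_range": 1, "distance_to_nearby_farm": 3, "wind_direction": 2}

with_neighbor = single_score(self_scores, pair_ab)  # environment items plus pairwise items
isolated = single_score(self_scores)                # environment items only
```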
After all chicken and duck farms nationwide have been evaluated, a single evaluation score is available for each nearby farm within a 3000 m radius, as in the example of Figure 1. Table 4 presents each example from Figure 1 in tabular form and illustrates how the final evaluation score is calculated as the average of the single evaluation scores. Equation (2) defines the final evaluation score of the analysis target farm a, where n is the number of nearby farms within a 3000 m radius centered on a, k is the number of nearby farms determined to be outliers, and the averaged quantity is the single score between farm a and a nearby farm b.
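One plausible form of Equations (1) and (2), written out under stated assumptions: the single score is taken to be the sum of the m item scores, and the final score the mean of the single scores over the n − k neighbors not flagged as outliers (with n − k > 0). The symbols S_{a,b}, r_x, F_a, and O_a are illustrative notation chosen here, not necessarily the original's.

```latex
\begin{align}
S_{a,b} &= \sum_{x=1}^{m} r_x(a, b), \\
F_a &= \frac{1}{n-k} \sum_{\substack{b=1 \\ b \notin O_a}}^{n} S_{a,b},
\end{align}
% S_{a,b}: single evaluation score of farm a with respect to nearby farm b
% r_x(a,b): score of rule item x
% F_a: final evaluation score of farm a
% O_a: the k nearby farms of a determined to be outliers (|O_a| = k)
```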
2.3. Decision Model
The support vector machine (SVM) generalizes well and is useful for building accurate and reliable classification models even with limited data [21]. It supports both binary and multi-class classification, making it well suited to classifying whether the final evaluation score of a farm is at a dangerous level.
First, during the training process, the final evaluation scores of farms nationwide that were analyzed in Section 2.2 are combined with historical occurrence data, as depicted in Figure 2. Farms with at least one past occurrence of HPAI are designated as Class 1, and farms with no such history are designated as Class 0 [22]. Training is then conducted on the completed dataset, with the final evaluation score as the feature variable and the occurrence-history class as the target variable. Figure 3 illustrates the training process and the criteria for deriving the score.
However, an important challenge in this approach is data imbalance, since historical data on HPAI occurrences in farms are scarce [23]. To address it, we adjust class-specific weights [24], optimizing them through iterative testing to balance the training process [25]. SVM training is then performed with a Gaussian kernel to improve the model’s ability to generalize from the training data to unseen instances. The model in Figure 4a was trained with equal weights for all classes, resulting in a relatively low baseline score and an accuracy of 85.19%. In contrast, the model in Figure 4b was trained with the optimal weights and achieved an accuracy of 99.74%. The difference of 14.55 percentage points between the two models confirms that weight adjustment has a significant impact on model training.
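Class-weighted training with a Gaussian (RBF) kernel can be sketched with scikit-learn's `SVC`, whose `class_weight` parameter scales the penalty per class. The training data below are synthetic and hypothetical, so the accuracies reported above are not reproduced; the choice of library and the helper names `fit_svm` and `recall_class1` are assumptions, not the authors' implementation.

```python
from sklearn.svm import SVC

# Hypothetical 1-D training set: final evaluation scores and outbreak history.
scores_0 = [20 + 0.2 * i for i in range(150)]  # Class 0: no past occurrence (majority)
scores_1 = [42 + 2.0 * i for i in range(8)]    # Class 1: past occurrence (minority)
X = [[s] for s in scores_0 + scores_1]
y = [0] * len(scores_0) + [1] * len(scores_1)


def fit_svm(w):
    """Train an RBF-kernel SVM with weight w applied to Class 1 only."""
    return SVC(kernel="rbf", gamma="scale", class_weight={0: 1.0, 1: w}).fit(X, y)


def recall_class1(model):
    """Fraction of Class 1 training farms that the model labels as Class 1."""
    preds = model.predict([[s] for s in scores_1])
    return sum(int(p) for p in preds) / len(scores_1)


recall_equal = recall_class1(fit_svm(1.0))     # equal weights
recall_weighted = recall_class1(fit_svm(8.5))  # heavier penalty on Class 1 errors
```

Raising the Class 1 weight makes misclassifying a minority farm more costly, which pushes the decision boundary toward lower scores.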
The experiments below compare the effect of these weight adjustments. We denote the weight assigned during the training phase as w. This notation distinguishes the contribution of each weight setting to the model’s overall performance and highlights the role of tuning the balance between classes in improving prediction accuracy for HPAI occurrences [26].
Upon completion of the training phase, we establish a criterion score, illustrated in Figure 3, that serves as the risk threshold for classification. Farms whose evaluation scores exceed this criterion are labeled “dangerous farms” that require immediate attention and potentially stringent measures. Farms that score below this criterion are further divided into two categories, “caution farms” and “safe farms”, based on their average evaluation scores. This allows a more graded view of risk levels and enables more targeted interventions.
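The three-way labeling can be sketched as below. How the average separates “caution” from “safe” is not spelled out in the text; this sketch assumes the split point is the mean score of the farms at or below the criterion. The function name, the scores, and the criterion value are hypothetical.

```python
from statistics import mean


def label_farms(final_scores, criterion):
    """Label each farm as dangerous, caution, or safe.

    Assumption: farms not above the criterion are split at the mean
    of their own scores; above the mean is caution, otherwise safe.
    """
    below = [s for s in final_scores.values() if s <= criterion]
    split = mean(below) if below else criterion
    labels = {}
    for farm, s in final_scores.items():
        if s > criterion:
            labels[farm] = "dangerous"
        elif s > split:
            labels[farm] = "caution"
        else:
            labels[farm] = "safe"
    return labels


# Hypothetical final evaluation scores and criterion score.
labels = label_farms({"A": 14.0, "B": 9.0, "C": 5.0, "D": 4.0}, criterion=10.0)
```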
2.4. Experiment Setup
In this research, we focus on HPAI outbreaks affecting farms in the Jeollanam-do region of Korea. The temporal scope spans from 14 March 2014 to 7 April 2023, covering 105 reported HPAI cases across 381 distinct farms. The experiment is divided into two sessions that examine different but related facets of the HPAI outbreak scenario.
The first experimental session adopts a micro view by examining the geographical cluster formed by each farm that has experienced an HPAI outbreak. When such a farm is identified, it becomes the criterion farm, and the surrounding farms within a 3000 m radius are closely examined. If any of these nearby farms reports an HPAI case within a month from the date of the outbreak at the criterion farm, that cluster is treated as a single case for the analysis. This method identified 47 unique cases of HPAI spread in the region.
The second session shifts the lens to a macro view by analyzing the data year by year. The occurrence dates of the farms from the 47 cases of the first session are grouped by year, resulting in 8 separate case sessions corresponding to the years 2014, 2015, 2016, 2017, 2020, 2021, 2022, and 2023.
The labeling methodology is the same in both experimental sessions. Farms within the 3000 m radius of a subject farm are labeled “positive” if they also experienced an HPAI outbreak within one month of the incident at the subject farm. Those that do not meet these criteria are labeled “negative”. Our ground-truth data are formed from these labels.
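The ground-truth labeling can be sketched as follows, under two assumptions that the text leaves open: “one month” is taken as 30 days, and only outbreaks on or after the subject farm's outbreak date count. The function name and the dates are hypothetical.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # assumption: "one month" taken as 30 days


def label_neighbors(subject_outbreak, neighbor_outbreaks):
    """Label farms within 3000 m of the subject farm.

    neighbor_outbreaks maps each nearby farm id to its outbreak date,
    or to None if the farm reported no outbreak.
    """
    labels = {}
    for farm, d in neighbor_outbreaks.items():
        hit = d is not None and timedelta(0) <= d - subject_outbreak <= WINDOW
        labels[farm] = "positive" if hit else "negative"
    return labels


# Hypothetical outbreak dates, for illustration only.
ground_truth = label_neighbors(
    date(2021, 1, 10),
    {"B": date(2021, 1, 25), "C": None, "D": date(2021, 3, 1)},
)
```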
To evaluate how efficiently high-risk farms can be identified, we compared four approaches using actual data. The first approach is the conventional method of culling by rule [15], which classifies all neighboring farms as “positive”. The remaining three approaches classify high-risk farms using the final evaluation scores calculated by the rule engine. They differ in how the weight w is adjusted, which determines the criterion score used in future risk assessments conducted via the rule engine.
The first of these, termed w = 1, applies no weight adjustment to offset class imbalance [27] and therefore does not consider the class ratio. This approach derives the risk criterion score from the existing data distribution, without regard for skewness between classes.
The second, termed w = 485, adjusts the ratio between Classes 0 and 1 to attain parity [28]. With this method, the risk criterion score is derived with the minority class fully compensated, even when there is a pronounced imbalance between the classes. This gives a more balanced view, potentially uncovering risks that a more lopsided approach might overlook.
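A parity weight of this kind is commonly obtained as the ratio of majority to minority class counts. The sketch below assumes that definition; the function name and the class counts are hypothetical and are not the counts behind w = 485.

```python
def parity_weight(n_class0, n_class1):
    """Weight for Class 1 that equalizes the total weight of both classes."""
    if n_class1 <= 0:
        raise ValueError("Class 1 must contain at least one sample")
    return n_class0 / n_class1


# Hypothetical class counts, for illustration only.
w_parity = parity_weight(1000, 25)
```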
The third, termed w = 8.5, is our most refined approach. It trains with the optimized weight [29], determined by fine-tuning the weight ratio. The risk criterion score in this model is the one obtained through this optimization process.
Upon establishing these methodologies, we conducted a comprehensive comparison. We juxtaposed the actual data from individual farm sessions and annual sessions against the outcomes predicted by all four approaches.
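Such a comparison against ground truth can be sketched as below. The text does not name the metrics used; precision and recall are assumed here, and the labels, scores, and criterion value are hypothetical.

```python
def precision_recall(truth, predicted):
    """Precision and recall of the "positive" label against ground truth."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == "positive" and p == "positive" for t, p in pairs)
    fp = sum(t == "negative" and p == "positive" for t, p in pairs)
    fn = sum(t == "positive" and p == "negative" for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical ground truth and final evaluation scores for five nearby farms.
truth = ["positive", "negative", "negative", "positive", "negative"]
scores = [14, 6, 9, 12, 11]
criterion = 10  # hypothetical criterion score

cull_all = ["positive"] * len(truth)  # conventional rule: every neighbor is positive
by_score = ["positive" if s > criterion else "negative" for s in scores]

pr_cull_all = precision_recall(truth, cull_all)
pr_by_score = precision_recall(truth, by_score)
```

In this toy example both approaches catch every true positive, but the score-based one flags fewer farms that did not go on to report an outbreak.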