The dataset used in this work was obtained through open-source websites. The websites included were as follows: various official countries’ armies’ websites, different countries’ official army profiles on popular video streaming platforms, open-access gore websites, and news portals. The used dataset represents various military equipment and weapon systems used in modern combat. These weapon systems have different countries of origin and were recorded from different angles and in different conditions. Some weapon systems were recorded from the air by UAV, while others were recorded from the ground by humans. Moreover, the dataset has examples of maneuver forces’ vehicles such as tanks, infantry fighting vehicles, and armored personnel carriers, examples of air force vehicles such as transport or assault helicopters and transport or assault airplanes, and examples of engineering vehicles, anti-aircraft vehicles, and artillery vehicles. This research’s focus is weapon systems only, and therefore people were not included. Furthermore, a large portion of the collected videos are real combat footage, recorded mostly in the current conflict between the Russian Federation and Ukraine. Other videos were taken at various military exercises or expos and posted on the internet. Each video obtained through open source websites was annotated using the Dark label annotation tool [
27]. The dark label tool is utility software that can label and name the object bounding boxes in videos and photos. Additionally, it may be used to mosaic image regions, sample moving images, and crop videos. Handling this software is quite simple. First, the class names are defined in the darklabel.yml file. After that, the YOLO annotation format is selected, and the process of labeling images of the desired classes begins. The classes of interest for this research are as follows:
The dataset is divided into train/test/validation sets in the ratio of 70/20/10, respectively. As a result, 16,900 images form the training set, 4849 the test set, and 2429 the validation set.
Military Data Curation Process: Unveiling the Significance and Challenges of Comprehensive Datasets in a Strategic Context
The quality of the dataset is a key prerequisite for successful research in the field of artificial intelligence, and this especially applies to the development and training of algorithms. According to generally accepted standards, as much as 70% of the total effort in the process of training an AI algorithm is devoted to the collection, processing, and preparation of data. In order for research to achieve an acceptable level of precision and reliability, it is essential to have a high-quality dataset.
Figure 3 illustrates the arduous nature of the dataset collection process, comprising a series of sub-steps that demand considerable time investment. The compilation of a dataset to a level acceptable for annotation, let alone training AI algorithms, is a painstaking endeavor. This visual representation underscores the pivotal role of a well-curated dataset in the development of artificial intelligence. It highlights that research based on vaguely defined or low-quality data may yield unrealistic results and draw incorrect conclusions. Various methods, including surveys, camera recordings across different devices, and even oral transmission of information, can be employed in data collection. Diverse collection conditions, encompassing factors such as the environment, weather conditions, and context, can exert a substantial impact on research outcomes. Hence, it becomes crucial to meticulously account for these factors when curating a dataset.
As depicted in
Figure 3, the data acquisition process comprises two pivotal phases. The initial phase entails a comprehensive exploration of available videos and diverse image repositories accessible on the internet. The primary objective of this stage was to meticulously curate a set of high-caliber computer data that would undergo subsequent processing. The quest for data traversed various platforms, including YouTube, Google Images, Wikimedia Commons, and others.
Prior to commencing the analytical phase, meticulous consideration was devoted to adapting and scrutinizing all designated video formats. This involved the critical assessment of video quality, addressing challenges related to perspective constraints, accounting for temporal variables, and discerning potential manipulations and edits within the material. Subsequent to the successful compilation of a substantial dataset, comprising 101 videos averaging 180 s in length, each video comprising 60 frames per second (FPS) for a cumulative total of 1,090,800 images, a judicious analysis and evaluation process ensued. Each image underwent scrutiny as a prospective candidate for annotation, necessitating precise labeling and identification of pertinent information to ensure a nuanced and pertinent analysis in subsequent research endeavors. Following the implementation of solutions to potential challenges, a comprehensive review process was undertaken to assess all acquired frames. The primary objective was to ascertain the presence of objects exhibiting military characteristics, thereby enhancing the overall quality of the dataset and refining the definition of the target class of military objects. Should a specific frame fail to meet the pre-established criteria, signifying a departure from the defined conditions, it was systematically rejected from further consideration. This meticulous curation process ensured that only frames aligning with the specified criteria were retained, contributing to the precision and reliability of subsequent analyses and findings within the research context. Following the successful resolution of potential challenges, a meticulous examination of all acquired frames was initiated, with the objective of discerning objects exhibiting military characteristics. The primary objective of this procedure was to enhance the quality of the dataset and provide a more accurate delineation of the presence of the targeted class of military objects. Instances where the specified conditions were not met, where a particular frame failed to meet the predefined criteria, prompted its systematic exclusion. This systematic curation process was undertaken to uphold the consistency and high quality of the data utilized in the course of the research.
Upon the successful extraction and meticulous curation of images, the subsequent step involved annotating the images to facilitate the training of the YOLOv5 detection algorithm. In this phase, dual considerations were paramount. Firstly, the military perspective was taken into account, encompassing elements deemed significant from the standpoint of a soldier, lieutenant, and the like. Simultaneously, the YOLOv5 algorithm was loaded and subjected to rigorous testing to evaluate its performance under real-world conditions. A more intricate exposition of the data collection process, along with illustrative examples, is presented in
Figure 4,
Figure 5,
Figure 6,
Figure 7,
Figure 8 and
Figure 9.
From military operations planners’ and military commanders’ standpoint, the significance of this dataset lies in the variety of military equipment classes included. An algorithm trained with this particular dataset can help military commanders and military headquarters to have better situational awareness about opposition forces’ structure and operational capabilities, especially in intense battle rhythm operations, when quick information flow is essential. From a machine learning standpoint, the significance of this dataset lies in its pivotal role in enhancing the robustness and generalization of algorithms. The incorporation of luminance contrast diversifies the learning experience, enabling the algorithm to adeptly discern varying levels of luminance and thus bolstering its resilience to changes in lighting conditions. Furthermore, the dataset richness in colors and textures contributes to the capacity of the algorithm to generalize across diverse object types and backgrounds, as exemplified in
Figure 7 and
Figure 8. In terms of preventing overfitting, the dataset inclusion of different perspectives is instrumental in averting model specialization to particular positions or viewing angles. Moreover, the incorporation of varied recording conditions, such as distinct cameras and weather scenarios, serves as a safeguard against overfitting to a specific dataset, as demonstrated in
Figure 5 and
Figure 7. The dataset emphasis on increasing variation is evident through its incorporation of geographical and environmental diversity. This inclusion exposes the algorithm to different locations and environments, fostering an ability to adapt to various conditions, such as multiple entry with similar properties as shown on
Figure 9. Notably, the dataset captures scenarios where autonomous humans (AH) are situated outdoors, or within dense vegetation such as tall trees, or the elevated roofs of buildings. The dataset further contributes to the model’s versatility in solving various problems. The introduction of different object sizes and distances enables the model to develop proficiency in accurately detecting objects within diverse contexts. This is illustrated, for instance, in
Figure 4, where AHs exhibit scaling, with one AH appearing smaller in relation to the other. Additionally, the dataset encompasses variations in object positions within images, facilitating the model’s ability to recognize objects across different parts of an image. Lastly, the inclusion of diverse time periods in the dataset, encompassing night, day, snow, dust, smoke, and more, augments the model’s performance on real-world data as in
Figure 6. This is particularly relevant in the context of autonomous vehicles, where variations in driving conditions, including night, day, rain, and snow, contribute to preparing the model for a spectrum of real road situations.
As perceived from the perspective of ML engineers and AI algorithms, the presented dataset furnishes exemplary instances under diverse conditions. This dataset serves as a comprehensive evaluation ground, testing the algorithm’s performance capabilities in varied scenarios. Additionally, it caters to the specific needs of military operators, providing valuable insights tailored to their operational requirements.
The description of the dataset is finished with this part, and the detection algorithm used to solve the given problem is presented in the next subsection.