1. Introduction
With the constant growth in global trade and commerce, maritime activity across the globe is on the rise. Shipping lanes are becoming more crowded, fishing vessels are traveling farther as fisheries deplete, and yachts are increasingly common among the wealthy [1,2,3]. This growth in traffic creates opportunities for illicit or dangerous activity, such as illegal fishing, trafficking, piracy, and infrastructure tampering [4,5,6]. Organizations such as government enforcement agencies have a strong interest in preventing these activities. Fortunately, industry has responded to this demand with commercially available, open-source geospatial intelligence. This growing availability of information allows closer monitoring of ever-increasing global maritime activity. Imagery in the visible and infrared bands is available from vendors such as Planet or Maxar Technologies, or from aggregate sources such as Google Earth. Synthetic aperture radar (SAR) imagery is available from companies such as ICEYE or Capella Space. Automatic Identification System (AIS) data, which consists of readily accessible telemetry and descriptive information that ships broadcast worldwide, is available from entities such as Spire or ORBCOMM. All of these forms of data allow illicit activity to be detected by tracking where ships are and what they are doing in relation to the vessels around them.
The data modalities provided by these vendors do not directly reveal illicit activity at sea. The data is raw and must be analyzed by algorithms or artificial intelligence (AI) to determine what is occurring within it. Some companies do provide analytics with purchased data [7,8], but these are often surface-level and are meant only to give analysts, algorithms, or AI further context for discerning what might be occurring within the data.
Bad actors at sea are well aware of the surveillance and observation conducted by government agencies and commercial entities, and many will seek to avoid visual or AIS-based detection by concealing themselves. In visual spectrum imagery, for example, a bad actor might choose to operate at night or during inclement weather. With AIS, a bad actor might spoof their reported location or not broadcast AIS data at all. In addition, many boats fall below the legal thresholds that require an AIS transmitter and therefore do not carry one. Such actors are colloquially known as dark ships; they simply seek not to be detected. Not all dark ships are bad actors, however. In some cases, the reasons for concealment are accidental or completely benign. A cargo ship might suppress its AIS transmissions when traversing a seaway known for high rates of piracy. A fishing vessel might operate at night because some species, such as squid, are most easily caught then [9]. Many kinds of activity take place at sea, and it falls to analysts and their AI algorithms to determine which behaviors are benign and which are malicious.
Since AIS was internationally adopted as a standard safety measure in 2002 [10], many publications have investigated using AIS information to detect dark ships [11,12]. The definition of a dark ship varies from paper to paper, but the term typically describes ships that fail to transmit AIS messages at a rate compliant with international maritime law, or that blatantly spoof their messages [13,14,15]. What these studies lack, however, is any cross-referenced data that can corroborate the observations they derive from AIS. AIS data tells only part of the story, so claims based purely on this single data modality can be considered weak or incomplete.
More recently, other publications have begun verifying ship positions and linking AIS data to specific ships found in satellite imagery [16,17,18,19,20]. This process, known as ship-pairing, typically involves an algorithm that pairs AIS messages received at discrete points in time with satellite images acquired at different points in time. These studies perform ship-pairing but rarely go further to infer any higher-level behavior from the extracted information. Studies such as [16] do offer some manually derived speculation about what certain dark ships found during ship-pairing might be doing, but none of it is obtained using algorithms or AI. Without AI assistance in analyzing the high volumes of data in geospatial intelligence, an analyst’s job remains slow, tedious, and cumbersome. An AI that goes one step beyond ship-pairing could help identify tangible behaviors or activity within multi-modal information.
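Because AIS reports and image acquisitions almost never share a timestamp, a common first step in ship-pairing is to estimate where each AIS-reporting ship was when the image was captured. The following is a minimal sketch, assuming each ship's AIS reports are available as a time-sorted list of `(t_seconds, lat, lon)` tuples; the function name `position_at` is hypothetical, and the linear interpolation it uses is an illustrative choice rather than the approach of the cited studies or of BATMAN.

```python
from bisect import bisect_left


def position_at(track, t_image):
    """Estimate a ship's (lat, lon) at time t_image from its AIS reports.

    track: list of (t_seconds, lat, lon) tuples sorted by time.
    Returns None when t_image falls outside the span of the reports.
    Linear interpolation in latitude/longitude is only a reasonable
    approximation for short gaps, away from the poles and the antimeridian.
    """
    if not track:
        return None
    times = [t for t, _, _ in track]
    if t_image < times[0] or t_image > times[-1]:
        return None  # extrapolation would need speed/course assumptions
    k = bisect_left(times, t_image)  # first report at or after t_image
    if times[k] == t_image:
        return track[k][1], track[k][2]
    (t0, lat0, lon0), (t1, lat1, lon1) = track[k - 1], track[k]
    frac = (t_image - t0) / (t1 - t0)
    return lat0 + frac * (lat1 - lat0), lon0 + frac * (lon1 - lon0)


# Hypothetical usage: two AIS reports 100 s apart, image taken halfway between.
track = [(0, 10.0, 20.0), (100, 10.0, 21.0)]
print(position_at(track, 50))  # (10.0, 20.5)
```

Returning `None` outside the span of the reports keeps the sketch conservative: extrapolating past the last report would require assumptions about speed and course, which a fuller implementation might make by dead reckoning.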
Image and AIS data are both excellent starting points for fusing multiple forms of data into a more complete picture of what is occurring within maritime space. However, images and entries in an AIS data table alone do not tell the full story of what might be happening in a specific area. External contextual information such as weather, tides, and air pollution provides dynamic information about local factors that might be influencing ship behavior. Likewise, more general geographic information, such as the locations of nearby pipelines and undersea cables, exclusive economic zones and protected ecosystem boundaries, proximity to ports, and the AIS legal requirements of each nation, can provide further context about the activities ships might be engaged in. Fusing this contextual information with the image and AIS data modalities would further strengthen the behavior claims an AI could make about the ships it observes.
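Static geographic context of this kind is often turned into numeric features, for example the distance from a ship to the nearest port. The sketch below computes such a proximity feature with the haversine great-circle formula. It assumes infrastructure is represented as a list of `(lat, lon)` points (a cable or pipeline route sampled at its vertices), and the helper and feature names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_feature_km(lat, lon, feature_points):
    """Distance from a ship to the closest (lat, lon) point of a feature set,
    e.g. port locations or vertices sampled along a cable route."""
    return min(haversine_km(lat, lon, f_lat, f_lon) for f_lat, f_lon in feature_points)


# Hypothetical usage: one contextual feature for a single ship observation.
ports = [(0.0, 1.0), (0.0, 2.0)]
row = {"dist_nearest_port_km": nearest_feature_km(0.0, 0.0, ports)}
print(round(row["dist_nearest_port_km"], 1))  # 111.2
```

Sampling a linear feature at its vertices only approximates the true point-to-line distance; densely sampled routes keep that error small.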
The work conducted in this paper proposes a framework that uses a combination of traditional algorithms and AI to fuse multiple data modalities, identify ships within that data, and classify their behavior. This framework, known as a “brain-like approach for tracking maritime activity and nuance,” or BATMAN, uses a pair of neural networks: the first identifies ships in imagery, and the second classifies their behavior. Between the two networks, a sequence of preprocessing algorithms, including ship-pairing, operates to identify ships before their behavior is classified by the second network (Figure 1). The entire system is deployed to an Amazon Web Services (AWS) framework that can handle the volume of data expected in a geospatial intelligence problem space.
This paper first describes the datasets used to train the neural networks and to verify the operation of the preprocessing algorithms. It then discusses how each neural network and algorithm operates at every stage of the processing pipeline. Next, it describes how the system was trained and deployed to AWS, followed by the results obtained at each stage of the pipeline. A discussion section then explores what the results at each stage mean within BATMAN and how they shape the future design direction of the project. Finally, a conclusion summarizes the results.
4. Discussion
4.1. Effects of False Negatives and False Positives
During testing, the question arose of how much weight false negatives (FN) should carry relative to false positives (FP), and whether the network should be biased toward one or the other. Consider an FN for a ship that is transmitting AIS data: the AIS data would still be passed to the classifier. However, because the ship can be detected only through its AIS transmissions, the classifier would then be heavily biased toward labeling it as possibly spoofing its location. In the case of an FP, the classifier would receive the inverse bias, toward the ship possibly being dark (i.e., not transmitting information). YOLO models have more difficulty detecting smaller ships, and because SOLAS regulations do not require smaller vessels to transmit AIS [26], an FN on a smaller vessel could prevent it from being incorrectly classified as a dark ship.
Because natural conditions such as cloud cover, weather, and darkness can shield a ship from detection in remote electro-optical imagery, FN should be expected more often in cases of non-malicious behavior. For this reason, it is better to tune classifiers to accept more FN than FP, since other data modalities could still help classify a ship that was missed in imagery. When a detection is made with high confidence but there is no corresponding paired AIS message, the classifier should be predisposed to report a ship that is spoofing its AIS.
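The bias described above can be expressed as a simple prior attached to each record before behavioral classification. In this minimal sketch, the `Record` type, the label strings, and the default `conf_threshold` of 0.8 are all hypothetical; raising the threshold trades FP for FN, which is the direction this section argues for.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    detection_conf: Optional[float]  # detector confidence, None if not seen in imagery
    has_ais: bool                    # True if an AIS message was paired


def triage(record, conf_threshold=0.8):
    """Hypothetical prior passed to the behavior classifier with each record."""
    if record.detection_conf is not None and record.has_ais:
        return "paired"
    if record.has_ais:
        # AIS only: a missed detection (FN) or a spoofed reported position.
        return "ais_only_check_spoofing"
    if record.detection_conf is not None and record.detection_conf >= conf_threshold:
        # Confident detection with no AIS: dark, or reporting a false position.
        return "image_only_suspect_ais"
    # Low-confidence, unpaired detections are dropped, accepting more FN
    # in exchange for fewer FP; raising conf_threshold strengthens this.
    return "discard_low_confidence"
```

Keeping the rule separate from the classifier makes the FN/FP trade-off a single, auditable parameter rather than something buried in learned weights.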
Any discussion of FN in ship detection must also address small ships, which are difficult to detect in both the image and the AIS domain. A small ship in satellite imagery might span only a few pixels, and many small vessels do not carry AIS transmitters because they are not legally required to. Tuning the detection algorithm to find small ships is therefore of the utmost importance, since they are the vessels least likely to appear in AIS data. Future versions of BATMAN could use more advanced image analysis, such as ship wake detection, to find small ships in imagery. This type of analysis would allow small boats in motion to be detected even when the boats themselves are barely visible in an image.
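A quick calculation shows why small vessels are hard to detect. Assuming, purely for illustration, a ground sample distance (GSD) of 3 m and treating the hull as a length-by-beam rectangle, the number of pixels a vessel covers is (length / GSD) × (beam / GSD):

```python
def footprint_pixels(length_m, beam_m, gsd_m):
    """Approximate pixel count covered by a hull modeled as a
    length x beam rectangle, at a given ground sample distance."""
    return (length_m / gsd_m) * (beam_m / gsd_m)


GSD_M = 3.0  # illustrative assumption, not a value from this work
print(round(footprint_pixels(12, 4, GSD_M), 1))    # 5.3
print(round(footprint_pixels(200, 30, GSD_M), 1))  # 666.7
```

A 12 m × 4 m boat at this GSD covers only about 5 pixels, whereas a 200 m × 30 m cargo ship covers roughly 667, so the small boat gives the detector almost no shape information to work with.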
4.2. Ship-Pairing Dynamics
For all parameter choices, pairing generally improves as Pmax increases, either through a greater number of pairs or a lower mean interval distance. Increasing Pmax even to millions of permutations increases computation time by no more than 40 s, or 10%. Moreover, for smaller values of Tm (i.e., <10 km), the permutation search yields more successful pairs than greedy pairing. For a given time threshold, the number of paired ships saturates at threshold distances of 10 km or more (469 for 10,000 s, 452 for 1000 s and 441 for 1000 s), a level reached by both greedy and more sophisticated approaches. In this regime, searching over permutations slightly improves the mean interval distance of the pairings. Below this threshold distance, the number of greedy pairs drops faster than the number of permutation pairs. However, when the permutation search increases the number of pairs, the mean distance increases by more than it decreases at high threshold distances. The dynamics of how pairing scales with varying Ts and Tm are shown in Figure 7.
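The trade-off between greedy and permutation-based pairing can be illustrated with a small sketch. It is hypothetical and simplified, not BATMAN's implementation: detections and AIS messages are assumed to be already projected to planar kilometer coordinates at the image time, each AIS message carries its time offset from the image in seconds, `t_m_km` and `t_s_sec` play the roles of Tm and Ts, and `p_max` caps how many detection orderings are evaluated, mirroring the role of Pmax. Because the number of orderings grows factorially, a cap of this kind is what keeps the search tractable.

```python
import itertools
import math


def pair_cost(det_xy, msg, t_m_km, t_s_sec):
    """Distance (km) between a detection and an AIS message, or None if the
    message is further than T_s from the image time or T_m from the detection."""
    if abs(msg["dt_s"]) > t_s_sec:
        return None
    d = math.dist(det_xy, msg["xy_km"])
    return d if d <= t_m_km else None


def greedy_pairs(dets, msgs, t_m_km, t_s_sec):
    """Nearest-first greedy matching over all admissible (detection, AIS) pairs."""
    cands = []
    for i, det in enumerate(dets):
        for j, msg in enumerate(msgs):
            c = pair_cost(det, msg, t_m_km, t_s_sec)
            if c is not None:
                cands.append((c, i, j))
    used_d, used_a, pairs = set(), set(), []
    for c, i, j in sorted(cands):
        if i not in used_d and j not in used_a:
            used_d.add(i)
            used_a.add(j)
            pairs.append((i, j, c))
    return pairs


def permutation_pairs(dets, msgs, t_m_km, t_s_sec, p_max):
    """Evaluate up to p_max detection orderings. In each ordering, every
    detection takes its nearest unused admissible AIS message. Keep the
    result with the most pairs, breaking ties by the lower mean distance."""
    best_pairs, best_key = [], None
    orders = itertools.islice(itertools.permutations(range(len(dets))), p_max)
    for order in orders:
        used_a, pairs = set(), []
        for i in order:
            options = []
            for j, msg in enumerate(msgs):
                if j in used_a:
                    continue
                c = pair_cost(dets[i], msg, t_m_km, t_s_sec)
                if c is not None:
                    options.append((c, j))
            if options:
                c, j = min(options)
                used_a.add(j)
                pairs.append((i, j, c))
        mean_d = sum(p[2] for p in pairs) / len(pairs) if pairs else 0.0
        key = (len(pairs), -mean_d)
        if best_key is None or key > best_key:
            best_pairs, best_key = pairs, key
    return best_pairs
```

For example, with detections at (0, 0) and (2, 0) km, AIS messages at (1, 0) and (-1.5, 0) km, and Tm = 2 km, greedy matching gives the shared nearest message to the first detection and strands the second. Evaluating a second ordering (p_max = 2) recovers both pairs at a higher mean distance (1.25 km instead of 1.0 km), the same pattern described above for small Tm.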
As a final note on ship-pairing, the algorithm currently optimizes the pairing process through distance calculations and averages across the ships identified in the two data modalities. It does not optimize for attributes such as ship size or class. At present, the Ship Log records each ship’s class as self-reported in its AIS data. However, the ship image detection algorithm could be trained to identify all the ship classes found in AIS data, allowing ship-pairing to account for a ship’s class and size in addition to its location.
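One way class and size information could enter the pairing step is through a composite match cost that replaces pure distance. The sketch below is hypothetical: it assumes the image detector outputs an estimated length and class for each ship, and the weights `w_dist`, `w_size`, and `class_penalty` are illustrative values that would need tuning.

```python
def pairing_cost(dist_km, det_length_m, ais_length_m, det_class, ais_class,
                 w_dist=1.0, w_size=0.5, class_penalty=2.0):
    """Hypothetical composite cost for matching an image detection to an
    AIS record; lower is better."""
    # Relative length mismatch in [0, 1]; the floor avoids division by zero.
    longest = max(det_length_m, ais_length_m, 1e-9)
    size_mismatch = abs(det_length_m - ais_length_m) / longest
    class_term = 0.0 if det_class == ais_class else class_penalty
    return w_dist * dist_km + w_size * size_mismatch + class_term


print(pairing_cost(1.0, 100, 100, "cargo", "cargo"))  # 1.0
print(pairing_cost(1.0, 100, 50, "cargo", "tanker"))  # 3.25
```

Under these weights, a candidate 1 km away with a matching class and length scores 1.0, while an equally distant candidate of half the length and a different class scores 3.25, so the pairing step would prefer the former.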
4.3. Behavioral Classification Dynamics
The two most significant factors varied during training of the neural networks were the learning rate schedule and the loss function. As noted in Section 2.3.5, the custom loss function greatly outperformed mean squared error and mean absolute error, especially for classes with fewer labels. Further improvements to this loss function would include class weighting to more heavily weight classes with fewer examples. Additionally, a simple learning rate decay showed significant improvement over a static learning rate.
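Class weighting and learning rate decay are both simple to compute. The sketch below assumes inverse-frequency weights, w_c = N / (K · n_c) for N examples, K classes, and n_c examples of class c, which would multiply each example's term in whatever per-example loss is used, and an exponential decay schedule, lr = lr0 · γ^epoch. Both are common choices offered for illustration; the custom loss itself is described in Section 2.3.5 and is not reproduced here.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """w_c = N / (K * n_c): rare classes get larger weights, and a
    perfectly balanced label set gives every class a weight of 1."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def exponential_decay(lr0, gamma, epoch):
    """Learning rate after `epoch` epochs of multiplicative decay by gamma."""
    return lr0 * gamma ** epoch


weights = inverse_frequency_weights(["class_a"] * 90 + ["class_b"] * 10)
print(weights["class_b"], round(weights["class_a"], 2))  # 5.0 0.56
```

With 90 examples of one class and 10 of another (made-up counts), the rare class receives a weight of 5.0 and the common class about 0.56, so each rare example counts nine times as much as a common one.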
In general, the traditional approaches performed as well as, if not better than, the neural network approaches tested. In classes where performance was similar, there were enough examples for all algorithms to generalize the behavior successfully. For less common behaviors, the neural network approaches performed worse because they had fewer samples to train on. For transshipment and wandering, the classes were so rare that no algorithm was able to detect the behavior. These cases suggest that techniques such as synthetic data generation or unsupervised learning could help boost the performance of standard neural networks, such as the dense or convolutional networks, in future work. The traditional approaches could also benefit from more samples of the rarer behaviors, which synthetic data generation could provide.
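As one example of synthetic data generation for rare behaviors, the sketch below implements SMOTE-style oversampling, which creates new minority-class rows by interpolating between an existing row and one of its nearest minority-class neighbors. It is a well-known general technique offered here for illustration, not a method evaluated in this work, and it assumes each row is a vector of numeric features; interpolating categorical fields or raw AIS tracks would not produce meaningful samples.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows, each interpolated between a random
    minority-class row and one of its k nearest minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority-class rows")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [r for j, r in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda r: math.dist(x, r))[:k]
        nn = rng.choice(neighbors)
        u = rng.random()  # position along the segment from x to nn, in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

Because every synthetic row lies on a segment between two real minority rows, the generated data stays inside the region the rare behavior already occupies; it adds density, not genuinely new behavior patterns.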
Looking at Figure 6 overall, it is at times difficult to tell which algorithm performs best. For each algorithm, the recall, precision, and F1 values were therefore averaged across all behaviors; the resulting values are listed in Table 6, with the top performer for each metric in bold.
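The averaging described above corresponds to an unweighted (macro) mean over behaviors, in which each behavior counts equally regardless of how often it occurs; this is why rare, poorly detected behaviors pull the averages down. A minimal sketch of that calculation follows, assuming each sample's ground truth and prediction are sets of behavior labels; whether Table 6 was computed in exactly this way is an assumption.

```python
def per_behavior_metrics(true_sets, pred_sets, behaviors):
    """Precision, recall, and F1 for each behavior, plus their macro means.

    true_sets / pred_sets: one set of behavior labels per sample.
    """
    pairs = list(zip(true_sets, pred_sets))
    per_behavior = {}
    for b in behaviors:
        tp = sum(b in t and b in p for t, p in pairs)
        fp = sum(b not in t and b in p for t, p in pairs)
        fn = sum(b in t and b not in p for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_behavior[b] = (precision, recall, f1)
    # Unweighted mean over behaviors: rare behaviors count as much as common ones.
    macro = tuple(sum(m[i] for m in per_behavior.values()) / len(behaviors) for i in range(3))
    return per_behavior, macro
```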
Overall, ERT performed best on average across all three key metrics. However, as Figure 6 shows, for some behaviors other algorithms clearly surpass ERT (e.g., GBT for offshore loitering). Future versions of BATMAN could consider an ensemble approach to classifying ship behavior, in which every algorithm is queried and its responses are weighted by that algorithm’s reliability for the specific behavior. It should also be noted that the algorithm performance shown here could be influenced by how the labeled training data was generated. If training data is generated differently in future work, the performance of each approach could shift in either a positive or a negative direction. Because the data for this work was generated in a rather structured manner, by tagging entries in the Ship Log, this structure could be contributing to the high performance of the traditional approaches (i.e., CART, ERT, and GBT). If future data is less separable, the neural network approaches could begin to surpass CART, ERT, and GBT.
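The reliability-weighted ensemble proposed above could take a form like the following sketch. It is hypothetical: each model is assumed to output a probability per behavior, each model's reliability for a behavior is assumed to be a validation score such as its F1 on that behavior, and the 0.5 decision threshold is illustrative.

```python
def weighted_vote(scores, reliability, threshold=0.5):
    """Reliability-weighted ensemble over behavior classifiers.

    scores[model][behavior]: probability output by that model.
    reliability[model][behavior]: weight for that model on that behavior,
        e.g. its validation F1 for the behavior.
    Returns the behaviors whose weighted mean score reaches `threshold`.
    """
    models = list(scores)
    behaviors = list(scores[models[0]])
    detected = []
    for b in behaviors:
        total_w = sum(reliability[m][b] for m in models)
        if total_w == 0:
            continue  # no model is trusted on this behavior
        mean_score = sum(reliability[m][b] * scores[m][b] for m in models) / total_w
        if mean_score >= threshold:
            detected.append(b)
    return detected
```

Weighting by per-behavior reliability lets a model that is weaker on average still dominate the behaviors on which it is strongest, which is the situation Figure 6 shows for GBT and offshore loitering.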
Further improvements to the neural networks would focus on architecture and better normalization processes. As noted in Section 2.3.5, Tabformer and Tabnet are two of the leading architectures for classification of tabular data; however, other sequence modeling or transformer architectures could also improve on the current approaches. To improve normalization, additional preprocessing steps could be added, such as more intelligent ways of removing obviously corrupt data, additional features flagging values outside given ranges (e.g., speed > 50 m/s), or other encoding methods in the preprocessing pipeline. The results in Figure 6 show room for improvement in both the neural network and traditional approaches, and neural networks such as Tabformer or Tabnet could help close the gap in detecting rarer behaviors within the Ship Log.
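The out-of-range features suggested above can be produced with a small preprocessing step. In this sketch, the field names and limits are hypothetical examples (50 m/s follows the speed example in the text, and 0 to 359 degrees covers valid AIS true heading, for which 511 denotes "not available"); each flagged value is also blanked so that a later imputation step, assumed rather than shown, can replace it.

```python
def add_range_flags(row, limits):
    """Return a copy of `row` with a 0/1 `<field>_out_of_range` feature for
    each limited field; flagged values are set to None for later imputation."""
    out = dict(row)
    for field, (lo, hi) in limits.items():
        value = row.get(field)
        bad = value is not None and not (lo <= value <= hi)
        out[f"{field}_out_of_range"] = int(bad)
        if bad:
            out[field] = None
    return out


# Hypothetical field names and limits.
LIMITS = {"speed_mps": (0.0, 50.0), "heading_deg": (0.0, 359.0)}
print(add_range_flags({"speed_mps": 80.0, "heading_deg": 511}, LIMITS))
```

Keeping an explicit flag, rather than silently dropping or clipping the value, lets a classifier learn whether corrupt or unavailable readings are themselves informative.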
A key conclusion drawn from the behavioral classification process was that most of the behaviors studied were fairly simple to detect, at least to some degree. The true challenge in classifying ship behavior at a broad scale lies in properly fusing data into a form a classifier can digest, and then labeling that data so that it reflects real-world scenarios. In the field, these algorithms could detect ship behavior using different data modalities, but if the system remains fixed in the type of data it takes in and how it learns from it, malicious actors at sea could adapt their behavior to avoid detection. For example, if the ship detection algorithms applied to EO imagery struggle to detect small ships, and BATMAN catches all the criminals using large ships, a survivorship bias could emerge in which criminals using small vessels are the only ones remaining and are left to flourish. Likewise, if the behavioral classifier identifies specific behaviors by the specific routes ships follow at sea, and the dynamics of those routes change, those behaviors would be misidentified or missed entirely. In the long term, it would be critical for a framework such as BATMAN to adopt a lifelong learning scheme [80] in which its weights are continually updated with new data.
Lastly, during behavioral classification, the contextual data most likely plays a minor, supplementary role in identifying specific ship behavior. Weather data, for instance, might help the algorithms isolate activities such as the various forms of loitering or being docked, which could become more frequent during inclement weather, when seas are rougher. Although this work did not specifically analyze how significant a role contextual data played in behavior classification success, future work aims to do so.
5. Conclusions
This work has demonstrated a multi-modal data fusion pipeline that is able to identify ships and classify their behavior. This system, named BATMAN, fuses satellite image data and AIS data to identify ships through a ship-pairing process. The ship-pairing process identified 78% of the ships present within the images and AIS data (the other 22% could be considered “dark”). These ships can then be classified into ten different behaviors that clarify their actions. Where sufficient examples of a behavior were present in the data, the classifier recognized it with high recall and precision. All of this analysis was conducted using an interconnected setup on Amazon Web Services. To the authors’ knowledge, this is the first publicly documented framework to classify maritime activity in such a comprehensive manner. BATMAN and its future iterations could serve as valuable aids for maritime analysts in the public, private, and academic sectors.
Although its behavioral classification capability is not comprehensive, BATMAN serves as a solid foundation for classifying ship behavior at a more holistic level. In the data modality domain, BATMAN can already cross-validate the existence of ships across the image and AIS domains. Other modalities, such as acoustic or radio communications data, could greatly enhance BATMAN in the future by allowing it to detect ships that are difficult to detect in the image and AIS domains. In the behavioral classification domain, generating data that is more dynamic and covers a wider range of behaviors is critical to ensuring that BATMAN can fully understand the maritime environment and all possible scenarios. Corporations and government organizations with vested interests in the maritime domain should begin thinking about ship activity at the comprehensive level that BATMAN does. By providing more context for ships’ activities, BATMAN could make the world’s oceans safer for all seafarers.