Article

AI-Enabled Animal Behavior Analysis with High Usability: A Case Study on Open-Field Experiments

1 Software College, Northeastern University, Shenyang 112000, China
2 Technology Strategy and Development Department, Neusoft Group, Shenyang 110002, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4583; https://doi.org/10.3390/app14114583
Submission received: 10 April 2024 / Revised: 21 May 2024 / Accepted: 23 May 2024 / Published: 27 May 2024
(This article belongs to the Special Issue Digital Image Processing: Advanced Technologies and Applications)

Featured Application

In this study, we designed a highly usable animal behavior analysis platform that can help researchers significantly improve their work efficiency. The platform also offers good flexibility, scalability, and human–computer interaction. Researchers can easily configure and use the platform for behavioral observation experiments with minimal learning costs.

Abstract

In recent years, with the rapid development of medicine, pathology, toxicology, and neuroscience technology, animal behavior research has become essential in modern life science research. However, current mainstream commercial animal behavior recognition tools provide only a single behavior recognition method, limiting the extension of algorithms and the ways researchers can interact with experimental data. To address this issue, we propose an AI-enabled, highly usable platform for analyzing experimental animal behavior that provides better flexibility, scalability, and interactivity. Researchers can flexibly select or extend different behavior recognition algorithms for automated recognition of animal behaviors, or interact with the platform more conveniently using only natural language descriptions. A case study at a medical laboratory, where the platform was used to evaluate behavioral differences between sick and healthy animals, demonstrated the platform's high usability.

1. Introduction

Animal behavior is the body language by which an animal expresses its psychological and physiological state and its overall function. Typical model animals, such as mice, rabbits, and goats, are widely used to analyze different behaviors in the open field to measure the effectiveness of experiments in biology, toxicology, neuroscience, pharmacology, animal husbandry, and genetics [1,2,3].
Due to advances in embedded and automation technology, animal behavior recognition has progressed rapidly. For example, Arablouei et al. [4] utilized embedded devices and corresponding behavior recognition methods for efficient behavior recognition of livestock, such as cows. Roughan et al. [5] proposed automation technology to predict the behavioral changes of mice undergoing surgery and observe the effects of painkillers. With the development of deep learning techniques, the performance of animal behavior recognition has improved significantly. Natarajan et al. [6] achieved high-accuracy detection of wild-animal behavior using deep learning models. To ensure real-time performance, Fuentes et al. [7] proposed a behavior recognition algorithm for cattle based on a spatial-and-temporal information framework. Despite their success in specific scenarios, these studies generally rely on commercial tools to provide convenient human–computer interaction and data retrieval.
Although commercial tools can simplify human–computer interaction and reduce the labor costs of behavior recognition, they still lack flexibility, scalability, and interactivity. Specifically, these tools can usually detect only specific behaviors, making it difficult to adapt them to different experimental needs. In addition, the underlying algorithms of commercial tools are often designed for specific scenarios and cannot generate intermediate results, such as single-frame pose data, which limits their scope of application. Hardware and architectural constraints also make it difficult to change or extend the underlying algorithms, which affects the accuracy and speed of experimental results. Finally, commercial tools offer a limited scope of human–computer interaction for data retrieval and analysis, so researchers need to resort to other specialized tools or programming languages, such as structured query language (SQL) or Python, which increases the learning cost and reduces usability. Therefore, designing and implementing a laboratory animal behavior analysis platform that can efficiently identify animal behavioral actions, effectively manage animal behavioral data, support changes to the underlying algorithms, and provide convenient human–computer interaction is of great significance in reducing the workload of researchers.
In order to compensate for the shortcomings of existing tools and to design a highly usable platform for animal behavior analysis, we established three goals that the platform should achieve:
  • The platform should be flexible enough to support researchers in selecting different behavior recognition methods and behavior detection categories.
  • The platform should be scalable to support researchers in upgrading or expanding the underlying algorithms.
  • The platform should offer flexible and convenient interactivity, so that researchers can rely on the platform's preset human–computer interaction functions for data querying and statistical analysis, rather than on separate experimental data systems or commercial tools, or use more adaptive interaction methods to meet their changing functional needs.
When designing the platform, we integrated various architecture design methods, such as microservices and plug-in design, to support researchers in flexibly configuring detection methods and behavioral categories while retaining good flexibility and scalability. To improve the interactivity of the platform and enable researchers to retrieve and analyze data more flexibly, we introduced natural language processing algorithms that analyze the intent of a user's natural language query and convert it into executable database statements. Although the fused architecture and natural language processing technology can bring higher usability to the platform, the effective integration, replacement, extension, and management of different behavior recognition algorithms, the adaptability of natural language processing algorithms to the field of animal experiments, and their cross-language problems still pose significant challenges to the implementation of the platform.
We propose a highly usable animal behavior analysis platform that combines good architectural design practices with natural language processing techniques; Figure 1 shows the overall architecture of the platform. We built the platform ecosystem as a hybrid architecture in which behavior recognition services can be flexibly configured or extended to efficiently recognize multiple behavioral categories, including fine-grained movements, and to produce intermediate results that meet specific experimental requirements. The platform integrates multiple business modules that automatically identify behavioral actions from the input data and store the identified behavioral information directly in the database without manual recording, effectively reducing labor costs. The platform also provides a natural language query interface service: in addition to predefined platform functions, researchers can use natural language descriptions for more flexible data retrieval and analysis.
In summary, our main contributions are as follows:
  • We have developed an AI-enabled, highly usable platform that centralizes the functions researchers need, streamlining their workflow, reducing costs, and enhancing efficiency.
  • We have enhanced the platform’s architecture with multiple design patterns to boost its flexibility and scalability, allowing for easy selection and extension of various algorithms and integration of posture estimation and behavior recognition for diverse experimental needs.
  • We have incorporated natural language processing to improve user interaction, eliminating the need for additional programming or complex database operations for data analysis.
  • We have validated the platform’s effectiveness through a case study on UBE3A gene deletion, highlighting its practical utility in real-world scenarios.
The paper is organized as follows: Section 2 presents the related work. Section 3 describes the overall system architecture. Section 4 outlines methods to improve the usability of behavior recognition. Section 5 outlines methods to improve the usability of human-computer interaction. Section 6 discusses the case study. Section 7 provides the conclusion.

2. Related Work

In recent years, the field of animal behavior analysis has made remarkable progress because of the application of commercial tools and advanced behavior recognition algorithms. These techniques not only improve research efficiency but also provide strong support for animal welfare and disease research.
Commercial tools. EthoWatcher is open-source software designed to record and analyze animal behavior [8]. It can process video files and offers rich features for labeling and quantifying animal movement information. EthoWatcher provides a user-friendly interface for various experimental setups, which makes it easy for researchers to perform behavioral analyses. The ToxTrac software utilizes a second-order Kalman filter to estimate a detected object's trajectory and can fuse existing trajectory segments to generate a complete trajectory [9]. ToxTrac also provides various tools and features for analyzing animal behavior, such as path length, average velocity, and dwell time. These features make ToxTrac a powerful tool in animal behavior research. ANY-maze is an animal behavior analysis system developed by Stoelting, Inc., Kiel, WI, USA [10]. By marking a point on the back of a mouse, ANY-maze can calculate the distance the mouse moves in the open field, thus determining the mouse's locomotor ability. Although various parameters can be generated automatically, statistical analysis software such as GraphPad is required to analyze the differences between diseased and normal mice [11].
Animal behavior recognition algorithms. With the development of machine learning and deep learning technology, many scholars have started to apply machine learning and deep learning methods to animal behavior recognition, and they have achieved good results. Fang et al. [12] proposed an animal behavior classification method based on six features (keypoint location, depression, skeleton, shape feature, skeleton angle, and elongation) and a naive Bayes model (NBM), which can effectively identify and classify the daily behaviors of animals. To improve detection accuracy, Nasiri et al. [13] fully exploited the advantage of long short-term memory (LSTM) in processing time series data: they assessed the lameness status of broilers by feeding successively extracted keypoints into an LSTM model and classified the lameness degree of broilers according to the six-point assessment method. To further improve detection accuracy, Lin et al. [14] first estimated bird keypoints using HRNet to generate global and local features [15], then localized the excitation region by keypoint clustering; combined with ResNet [16], their bird behavior recognition achieved significant results. To compensate for the limitations of single morphological features, Li et al. [17] fused multiple features, including red–green–blue (RGB), optical flow, and skeleton, to realize efficient lameness classification. They utilized VGG-19 to extract skeleton joint point features and analyzed spatiotemporal features with ST-GCN [18,19]. Chen et al. [20] improved a deep learning method for pig aggression behavior recognition, using video data as input and extracting temporal and spatial features based on VGG-16 and LSTM models [18]; its recognition accuracy reached 98.4% and significantly improved prediction efficiency. To meet the demand for real-time monitoring in production environments, Zhang et al. [21] designed a real-time sow behavior detection algorithm (SBDA-DL) based on MobileNet and a single-shot multi-box detector (SSD). They trained and predicted sow behaviors, including watering, urinating, and crawling, and obtained satisfactory results. Moreover, several methods focus on behavioral recognition in complex wild environments. For example, Schindler et al. [22] used an infrared camera to capture the activity of deer, wild boar, fox, and hare in the wild; their method recognizes feeding, moving, and gaze behaviors based on a ResNet variant and the SlowFast framework [23].
Table 1 compares our platform with these tools and algorithms. Our platform has a flexible architecture and algorithm service management; it can quickly adapt to different behavior recognition requirements and supports the training of proprietary models. Our platform also incorporates natural language processing to provide a natural language query interface, which improves data retrieval and analysis and reduces reliance on traditional statistical tools.

3. Overall System Architecture

The software system we designed is an animal behavior analysis platform that can be deployed in medical laboratories. It automatically acquires the behavioral information of animals in open-field videos and generates corresponding data reports that are stored in a database. The platform allows researchers to flexibly select existing behavior recognition algorithms or extend new ones into the platform according to different experimental needs. In addition, the platform provides high usability through an integrated natural language query interface service, as researchers are no longer limited to preset functions when retrieving or analyzing data. We selected a relational database for the platform because it can effectively manage and maintain relationships between data, ensure data integrity and consistency, and support complex query operations in SQL. We used a model–template–view (MTV) pattern, similar to the model–view–controller (MVC) pattern, which effectively separates data, business logic, and user interface; this improves the maintainability and scalability of the system and makes the platform easier to develop and maintain. The model layer handles the application's data logic and database interaction. The template layer is responsible for building the structure and style of the pages, typically using hypertext markup language (HTML) and template languages. The view layer acts as a traditional controller: it receives and processes user requests and passes the model data to the template for display. In addition, we selected JavaScript object notation (JSON) as the data exchange format to achieve lightweight data transmission and parsing, which improves system performance and efficiency. We developed the algorithm library in Python, using the Flask and PyTorch frameworks and open-source libraries such as OpenCV-Python, NumPy, and Scikit-video. We deployed the platform, the pose estimation service, the behavior recognition service, and the natural language query interface service on four identical devices. Each device ran Ubuntu 18.04 with a 2-vCPU Intel(R) Xeon(R) Platinum 8352V CPU and 90 GB of memory. The devices hosting the pose estimation, behavior recognition, and natural language query interface services each had an NVIDIA RTX 4090 (24 GB) GPU. Figure 2 shows the overall architecture of the platform, where the gray part indicates the main service modules, and Table 2 describes the relevant information of each module.
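As an illustration of the JSON-based exchange between modules, the following is a minimal Flask sketch; the route and field names are hypothetical assumptions for illustration, not the platform's actual API.

```python
# Minimal sketch of a JSON service endpoint, assuming Flask.
# Route and field names are illustrative, not the platform's actual API.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/recognize", methods=["POST"])
def recognize():
    payload = request.get_json()                 # e.g., {"video_id": "...", "algorithm": "st-gcn"}
    video_id = payload.get("video_id")
    algorithm = payload.get("algorithm", "st-gcn")
    # ... dispatch to the selected behavior recognition service ...
    result = {"video_id": video_id, "algorithm": algorithm, "behaviors": []}
    return jsonify(result)                       # lightweight JSON response

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```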
The platform has two working modes—training and inference—to adapt to different practical use cases. In the training mode, researchers first input the training information required by the selected algorithms, including animal category, number of key points, and number of behavioral categories, through the front-end page, and then import and label the training data. Specifically, the labeled key point data are used to train the pose estimation model, while the labeled behavioral category data are used to train the behavior recognition model. Once training starts, the training information and progress are fed back to the researcher in real time through the front-end page. In the inference mode, the video capture and sending module first sends the captured video data to the receiving and preprocessing module on the server side. After the preprocessing module processes the video data, there are two possible subsequent paths, depending on the requirements. The first inputs the preprocessed data into the pose estimation module; the resulting pose estimates are temporarily stored as intermediate results on the server and then fed into the behavior recognition model to generate specific behavioral category information, which is stored in the database. The second inputs the preprocessed data directly into the behavior recognition model for analysis and keeps the analysis results. Once inference is complete, researchers can describe their data retrieval or analysis needs in natural language on the front-end page. These requirements are sent, received, processed, and then fed into the natural language query interface module, which converts them into commands the platform can execute. Finally, the query results are displayed to the researchers through the front-end page.

4. High Usability of Animal Behavior Recognition

Automatic recognition of animal behavior is the core function of the platform. Being able to flexibly select behavior recognition methods and effectively recognize the behavior of animals in experimental videos is of great significance for medical animal behavior experiments. To improve the usability of the platform and allow behavior recognition algorithms to be selected and extended according to researchers' actual needs, we integrated the design ideas of plug-in management, a unified interface, and an algorithm library into the platform architecture.
Plugin design is a software architecture pattern that modularizes the functionality of software so that each module can be developed and used as a separate plugin. In the laboratory animal behavior analysis platform, we designed each behavior recognition algorithm as a plugin and deployed it locally. Each plugin contains all the code and resources needed to implement a particular behavior recognition algorithm. These plugins can be developed and tested independently of the platform and only need to follow certain interface specifications and data formats. At runtime, the platform can dynamically load and unload plugins to execute their algorithms. To realize this design, we defined a local plugin interface that all plugins must implement, covering plugin initialization, setting and getting parameters, executing analysis, obtaining results, and other operations. The plugin design gives the platform higher flexibility and extensibility: researchers can choose and combine plugins according to their needs and even develop their own.
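A minimal sketch of such a plugin interface is shown below; the method names are illustrative assumptions reflecting the operations listed above, not the platform's published specification.

```python
# Sketch of the local plugin interface described above; method names are
# illustrative assumptions, not the platform's published specification.
from abc import ABC, abstractmethod
from typing import Any, Dict

class BehaviorRecognitionPlugin(ABC):
    @abstractmethod
    def initialize(self, config: Dict[str, Any]) -> None:
        """Load model weights and other resources."""

    @abstractmethod
    def set_params(self, **params: Any) -> None:
        """Update runtime parameters (e.g., a confidence threshold)."""

    @abstractmethod
    def get_params(self) -> Dict[str, Any]:
        """Return the current parameter set."""

    @abstractmethod
    def analyze(self, frames: Any) -> None:
        """Run behavior recognition on a sequence of frames."""

    @abstractmethod
    def get_results(self) -> Dict[str, Any]:
        """Return recognized behavior categories and metadata."""
```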
Although local plug-in deployment offers good response speed, it is sometimes limited by hardware performance. Therefore, in addition to locally deployed plug-in algorithms, the platform also supports access to non-locally deployed behavior recognition algorithm services through a unified interface. The platform requires all behavior recognition algorithms to expose a unified remote web interface, for which we define the basic operations of a behavior recognition algorithm, such as initialization, setting parameters, executing analysis, and obtaining results. This unified network interface allows the platform to use behavior recognition services deployed on other computing resources. In addition, it frees the platform from the specific details of each service, so the platform is unaffected by algorithm upgrades or replacements. A sketch of a client for such an interface follows.
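The following client sketch illustrates how the platform might call a remote algorithm through the unified interface; the host, endpoint paths, and response fields are hypothetical assumptions for illustration.

```python
# Sketch of calling a non-locally deployed algorithm through the unified
# web interface; host, paths, and fields are assumptions for illustration.
import requests

SERVICE = "http://algo-server:8000"  # hypothetical remote service

requests.post(f"{SERVICE}/init", json={"model": "st-gcn"}).raise_for_status()
requests.post(f"{SERVICE}/params", json={"threshold": 0.5}).raise_for_status()
job = requests.post(f"{SERVICE}/analyze", json={"video_id": "exp42"}).json()
results = requests.get(f"{SERVICE}/results", params={"job_id": job["job_id"]}).json()
print(results["behaviors"])
```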
An algorithm library is a software library that stores and manages algorithms. The platform stores all available behavior recognition algorithm plug-ins and remote service interfaces in an algorithm library. The algorithm library contains service resources for all available behavior recognition algorithms. Researchers can use the search and filter functions to select appropriate algorithms for behavioral analysis experiments. To realize this design, we define the structure of an algorithm library containing storage paths of algorithms, service interface paths, metadata formats, etc. The platform provides a module to manage the algorithm library, which contains functions such as adding, deleting, searching, and loading algorithms. The application of the algorithm library enables the platform to centrally manage decentralized algorithm plug-ins and remote services, which is convenient for researchers to find and select and improves the usability of the platform.
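The sketch below shows one way such a registry could work: entries map algorithm names to a local plugin path or remote service URL plus searchable metadata. The structure and field names are our assumptions, not the platform's actual schema.

```python
# Minimal sketch of the algorithm library: a registry that stores plugin
# paths or remote service URLs under searchable metadata (all names are
# illustrative assumptions).
class AlgorithmLibrary:
    def __init__(self):
        self._entries = {}

    def add(self, name, location, **metadata):
        self._entries[name] = {"location": location, "meta": metadata}

    def delete(self, name):
        self._entries.pop(name, None)

    def search(self, **criteria):
        return [name for name, e in self._entries.items()
                if all(e["meta"].get(k) == v for k, v in criteria.items())]

    def load(self, name):
        return self._entries[name]["location"]

library = AlgorithmLibrary()
library.add("st-gcn", "plugins/st_gcn", kind="skeleton", deploy="local")
library.add("slowfast", "http://algo-server:8000", kind="optical-flow", deploy="remote")
print(library.search(kind="skeleton"))  # ['st-gcn']
```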
Unlike commercial tools, the platform uses a more flexible architecture that allows it to switch among different behavior recognition methods according to user needs, such as methods based on skeleton keypoints, optical flow information, depth images, and appearance contours. Among them, the skeleton keypoint-based method relies on a pose estimation algorithm to obtain the key points of the animal's skeleton. The temporal information of these key points can reliably describe subtle changes in the animal's posture and serves as the basic data for analyzing other motion indicators. A skeleton keypoint-based behavior recognition algorithm can then identify specific categories of behaviors from the key point sequences. Combining pose estimation with keypoint-based behavior recognition satisfies the reliability requirements of behavior recognition in animal experiments and increases the flexibility to adapt to different experimental needs. The key points must be accurately mapped onto the animal's limbs so that the pose data generated during the experiment can be recognized; the accuracy of the key point information has an extremely important impact on subsequent behavior recognition and other experimental tasks.
The platform currently provides two pose estimation algorithm plug-ins. The first is the DeepLabCut pose estimation algorithm, which combines target detection, target tracking, and semantic segmentation to accurately locate key points on the limbs of experimental animals without the need for physical markers [24]. By transforming the complex pose estimation task into key point detection and tracking, DeepLabCut significantly reduces computational cost. The network architecture of DeepLabCut is based on a convolutional neural network, as shown in Figure 3. DeepLabCut consists of the following main components: a feature extraction layer that extracts features from the input image, a fully connected layer for key point regression, a loss function that measures the difference between the predicted and labeled key point positions, and an optimizer that adjusts the network parameters to minimize the prediction error. Overall, DeepLabCut is a convolutional neural network built on a backbone such as ResNet, which implements key point localization and tracking through a feature extraction layer and a key point regression layer.
The other pose estimation plug-in provided by the platform is YOLOX-Pose [25], an algorithm for multi-person pose estimation based on the popular YOLO target detection framework [26]. The algorithm combines the advantages of top-down and bottom-up approaches by simultaneously detecting the bounding boxes and corresponding 2D poses of multiple subjects in a single forward pass. Unlike traditional heatmap-based two-stage approaches, YOLO-Pose is end-to-end trainable and is optimized against the object keypoint similarity (OKS) metric instead of using L1 loss as a training proxy. In addition, YOLO-Pose does not require the post-processing step of bottom-up methods that groups detected key points into skeletons, since each bounding box has an associated pose, which groups key points naturally. YOLO-Pose achieved new state-of-the-art results on the COCO validation and test sets (90.2% AP50 and 90.3% AP50, respectively), outperforming all existing bottom-up methods without flip tests, multi-scale tests, or other test-time augmentations. While the original YOLO-Pose implements single-shot pose estimation on the YOLOv5 target detection framework, the platform extends it to the better-performing YOLOX framework and provides network structures at different parameter scales, such as YOLOX-tiny-Pose, YOLOX-s-Pose, YOLOX-m-Pose, and YOLOX-l-Pose, to meet the performance constraints of different hardware resources.
Behavior recognition based on skeleton keypoints is usually done in one of two ways. The first matches behavioral categories using manually designed rules over keypoint coordinates; for example, rectilinear movement can be matched by a linear change in the coordinates of the animal's body center across consecutive frames (a sketch of this approach follows). The second uses a deep learning model that autonomously learns the keypoint change characteristics of different behavioral categories.
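The following is a minimal sketch of the rule-based approach for the rectilinear movement example above: the trajectory of the body center is flagged as rectilinear when it is close to a straight line. The threshold values are illustrative assumptions, not the platform's actual rules.

```python
# Sketch of rule-based matching on keypoint coordinates: flag rectilinear
# movement when the body-center trajectory across consecutive frames is
# nearly straight. Thresholds are illustrative assumptions.
import numpy as np

def is_rectilinear(centers: np.ndarray, min_dist: float = 20.0,
                   straightness: float = 0.95) -> bool:
    """centers: (T, 2) array of body-center (x, y) coordinates per frame."""
    path = np.linalg.norm(np.diff(centers, axis=0), axis=1).sum()  # path length
    chord = np.linalg.norm(centers[-1] - centers[0])               # start-to-end distance
    if chord < min_dist:                                           # barely moved
        return False
    return chord / path >= straightness   # ratio near 1.0 => nearly straight

centers = np.column_stack([np.linspace(0, 100, 30), np.full(30, 50.0)])
print(is_rectilinear(centers))  # True for a straight horizontal track
```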
The platform provides a behavior recognition algorithm plug-in based on skeleton key points, which uses the ST-GCN network as the underlying algorithm, as shown in Figure 4. This network was the first to combine graph convolution, which captures spatial information, with temporal convolution, which captures temporal information, to form spatiotemporal convolution modules. Stacked over multiple layers, these modules extract high-level features from skeleton graph sequences. The ST-GCN network consists of nine basic modules: the first three layers have 64 output channels, the middle three have 128, and the last three have 256. The temporal convolution kernel size of each layer is 9. To reduce feature loss during extraction and improve the feature extraction capability of the model, each basic unit uses a residual connection for cross-region feature fusion. Meanwhile, to avoid overfitting during training and improve robustness, a dropout layer is added to each basic unit. The feature vectors generated from the skeleton sequences are finally fed into a SoftMax classifier for behavioral action classification. A simplified sketch of one such unit follows.
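The PyTorch sketch below illustrates the structure of one spatiotemporal unit as described above (graph convolution, temporal convolution, residual connection, and dropout). It is a simplified illustration with a fixed adjacency matrix, not the reference ST-GCN implementation.

```python
# Simplified sketch of one ST-GCN spatiotemporal unit (not the reference
# implementation): 1x1 graph convolution + adjacency aggregation, temporal
# convolution with kernel size 9, residual connection, and dropout.
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    def __init__(self, c_in, c_out, A, t_kernel=9, dropout=0.5):
        super().__init__()
        self.register_buffer("A", A)                      # (V, V) joint adjacency
        self.gcn = nn.Conv2d(c_in, c_out, kernel_size=1)  # per-node feature map
        self.tcn = nn.Conv2d(c_out, c_out, kernel_size=(t_kernel, 1),
                             padding=(t_kernel // 2, 0))  # temporal convolution
        self.res = (nn.Identity() if c_in == c_out
                    else nn.Conv2d(c_in, c_out, kernel_size=1))
        self.drop = nn.Dropout(dropout)
        self.relu = nn.ReLU()

    def forward(self, x):                                  # x: (N, C, T, V)
        res = self.res(x)
        x = self.gcn(x)
        x = torch.einsum("nctv,vw->nctw", x, self.A)       # spatial aggregation
        x = self.tcn(x)
        return self.relu(self.drop(x) + res)               # residual fusion
```

Stacking nine such blocks with output channels 64, 64, 64, 128, 128, 128, 256, 256, 256 reproduces the channel configuration described above.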
Another behavior recognition algorithm plug-in provided by the platform is the SlowFast algorithm, which is based on optical flow features; it uses hand-designed optical flow features to characterize the movement of the target between frames. SlowFast is biologically inspired by the primate retina, in which P-cells capture spatial information and M-cells capture fast motion; accordingly, it adopts a two-pathway structure consisting of a Slow Pathway and a Fast Pathway. The Slow Pathway captures the spatial semantic information reflected by sparse frames and uses a very low frame rate; the Fast Pathway captures rapidly changing motion information and uses a very high frame rate. In addition, the Slow Pathway has a larger model capacity, analogous to the roughly 80% of retinal cells that are P-cells, while the Fast Pathway is lightweight, analogous to the roughly 20% that are M-cells. Between the two pathways are Fast-to-Slow lateral connections that fuse motion information into the spatial semantics. Finally, the information from the two pathways is fused for classification.

5. High Usability of Human–Computer Interaction

With the changing needs of animal behavior experiments, commercial tools and independent experimental data management systems have gradually become unable to meet researchers' data retrieval and analysis needs. The main reason is that their interactivity is limited, with problems such as restricted query syntax, pre-written queries, lack of context understanding, and strict format requirements. To improve the usability of human–computer interaction in data retrieval and analysis, the platform integrates text-to-SQL algorithm models as plug-ins into a natural language query interface service.
The natural language query interface service brings considerable convenience to human–computer interaction on the platform. First, it gives researchers freer query methods: traditional database queries require SQL statements written according to a specific syntax and structure, which limits how users can query. With the service, researchers can use familiar vocabulary and expressions without being restricted to a particular query syntax. Second, the service provides dynamic querying: researchers can adjust query conditions according to real-time needs without writing fixed SQL statements in advance. For example, a user can say, "Show the experimental mice that have been assisted to stand more than five times in the past three days"; the service understands the user's intent and then generates and executes the corresponding SQL statement, as sketched below.
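For the example query above, the generated SQL might look like the following; the table and column names are hypothetical, not the platform's actual schema.

```python
# Illustration of the dynamic query above: the natural language request is
# converted into an executable SQL statement. Table and column names are
# hypothetical assumptions, not the platform's actual schema.
question = ("Show the experimental mice that have been assisted to stand "
            "more than five times in the past three days")
generated_sql = """
SELECT mouse_id, COUNT(*) AS stand_count
FROM behavior_records
WHERE behavior = 'assisted_stand'
  AND observed_at >= DATE('now', '-3 days')
GROUP BY mouse_id
HAVING COUNT(*) > 5;
"""
```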
The natural language query interface service makes human–computer interaction more flexible and natural through free querying methods, dynamic querying, and contextual understanding. Users can query in a way they are familiar with and make flexible adjustments according to real-time needs, thus improving the flexibility and adaptability of the interaction. In addition, the natural language query interface service brings significant advantages to human–computer interaction by improving query accuracy, lowering the use threshold, and providing a better user experience.
Currently, the platform provides two algorithms as natural language query interface services: RAT-SQL and RYANSQL [27,28]. RAT-SQL encodes schema links and table structures with a Transformer augmented by a relation-aware self-attention mechanism; Figure 5 illustrates the model structure of RAT-SQL. RAT-SQL transforms a database schema into a directed graph G_q, describes known relationships by adding biases, and encodes associations between natural language questions and database schemas using name-based and value-based strategies. These encodings are then fed into a tree decoder and decoded according to the syntax rules of SQL to generate SQL statements. Because of cross-lingual issues, the platform also incorporates a multilingual, cross-domain common sense knowledge graph (ConceptNet) [29] into the schema-linking phase of RAT-SQL, which improves the execution accuracy of RAT-SQL for medical animal experiment information retrieval.
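As an illustration of relation-aware self-attention, the following is a simplified single-head sketch in which learned relation embeddings bias both the attention logits (via the keys) and the aggregated values. The dimensions and naming are our assumptions, not the RAT-SQL reference implementation.

```python
# Simplified single-head sketch of relation-aware self-attention: learned
# relation embeddings r^K_ij and r^V_ij bias keys and values respectively.
# Illustrative only, not the RAT-SQL reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAwareAttention(nn.Module):
    def __init__(self, d_model, n_relations):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.rel_k = nn.Embedding(n_relations, d_model)  # r^K_ij
        self.rel_v = nn.Embedding(n_relations, d_model)  # r^V_ij

    def forward(self, x, rel_ids):
        # x: (L, d_model); rel_ids: (L, L) integer relation types
        q, k, v = self.q(x), self.k(x), self.v(x)
        rk, rv = self.rel_k(rel_ids), self.rel_v(rel_ids)      # (L, L, d)
        logits = (q.unsqueeze(1) * (k.unsqueeze(0) + rk)).sum(-1)
        attn = F.softmax(logits / x.size(-1) ** 0.5, dim=-1)    # (L, L)
        return attn @ v + (attn.unsqueeze(-1) * rv).sum(1)      # (L, d)
```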
The RYANSQL model mainly uses a sketch-based slot-filling method and marks the complex structure of SQL statements with a statement position code (SPC). RYANSQL divides the generation of SQL statements into two phases: sketch generation and slot filling. For nested statements, RYANSQL first splits the SQL statement into non-nested SELECT statement blocks and represents the relationships between blocks with SPCs. The final SQL statement is then generated by recursively predicting the SPCs and the corresponding SELECT statement blocks.

6. Application Case

The platform was deployed in a medical laboratory in Liaoning Province, China, with a wide variety of experimental animals and sufficient experimental video resources. In this facility, we chose “A study on the pathological mechanism of motor defects due to UBE3A gene deletion” as a validation case for the platform. Figure 6 shows the detailed process of this case study and the external environmental dependencies. In this study, the researchers used AS mice to simulate clinical Angelman Syndrome patients, observed the behavioral differences between the disease model mice and normal mice, and investigated the specific locomotor differences by combining postural keypoint locomotor information with calcium signaling information. Since postural and behavioral data were needed for this case, the researchers selected a skeleton keypoint-based behavioral recognition method. In deploying and using the platform, we instructed the researchers to train the corresponding DeepLabCut posture estimation model and ST-GCN behavior recognition model according to the actual needs. We retrained the RAT-SQL model in the natural language query interface service for data retrieval and analysis needs.

6.1. Dataset

6.1.1. Mouse Behavioral Dataset

To train and validate the actual effects of the pose estimation model and the behavior recognition model, we randomly selected 3000 experimental mouse behavioral videos. Each video was about 150 frames, mainly containing behavioral actions, such as stationary, standing, curling up, rectilinear movement, and steering movement. We divided these videos according to the ratio of 8:2, which constituted the training set and test set of the behavior recognition model. At the same time, we randomly selected 500 frames of images from these 3000 videos, and the experimenter labeled five key points on each mouse according to the experimental needs and the characteristics of the mouse skeleton. Then, we divided these images according to the ratio of 9:1, constituting the training set and test set of the pose estimation model.

6.1.2. Text-to-SQL Medical Animal Experiment Chinese Dataset

To train and verify the actual effect of the natural language query interface module in converting natural language into SQL statements, we collected about 1500 SQL statement scripts used by researchers in the past and supplemented them with natural language descriptions and the corresponding table structure information. The query types include queries with keywords such as GROUP BY, ORDER BY, and HAVING, as well as multi-table join queries, nested queries, and more comprehensive calculation queries. Compared with CSpider [30], TableQA [31], and other datasets, this Chinese dataset of medical animal experiments is relatively simple, but it is more domain-specific and in line with the actual information retrieval needs of medical animal experiments. We divided the dataset into a training set and a test set at a ratio of 2:1. A hypothetical example entry is shown below.
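The sketch below shows what one dataset entry might look like: a natural language question paired with its SQL statement and the table structure it refers to. All field, table, and column names are hypothetical illustrations, not the dataset's actual schema.

```python
# Hypothetical example of one entry in the medical animal experiment
# Text-to-SQL dataset; field, table, and column names are illustrative.
example = {
    "question": "How many times did mouse A perform a steering movement "
                "in experiment A?",
    "sql": "SELECT COUNT(*) FROM behavior_records "
           "WHERE mouse_id = 'A' AND experiment = 'A' "
           "AND behavior = 'steering_movement';",
    "tables": {
        "behavior_records": ["mouse_id", "experiment", "behavior", "observed_at"],
    },
}
```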

6.2. Behavioral Recognition

DeepLabCut is fully pre-trained on the ImageNet dataset. In addition, DeepLabCut has been tested and calibrated on behavioral data generated by different species, such as mice and fruit flies, which makes the model robust. In this case, because of differences in key point locations, we trained DeepLabCut specifically on the mouse behavioral dataset described above. We evaluated the model's accuracy by comparing the deviation between the pixel coordinates of the key points predicted by the model and the coordinates labeled by the expert, and we used the change in the mean deviation of each key point to objectively assess the stability of the model. The root mean square error (RMSE) measures the root mean square difference between the predicted and true values, indicating the average degree of deviation between them. The formula for the RMSE is shown below:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(x_{t,i}-x_{p,i}\right)^{2}}{n}}$$
where n is the number of observations, x_{t,i} is the true value, and x_{p,i} is the predicted value. Since DeepLabCut supports a variety of feature extraction networks, we selected six different feature extraction networks, including ResNet-50 and ResNet-101, for training [32,33]. We evaluated the detection performance of these networks on mouse skeletal key points and selected the most suitable feature extraction network for this case. A direct implementation of the RMSE metric is sketched below.
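The following snippet implements the RMSE formula above, assuming the predicted and true keypoint coordinates are given as NumPy arrays of equal shape.

```python
# Direct implementation of the RMSE formula above, assuming predicted and
# true keypoint coordinates are NumPy arrays of equal shape.
import numpy as np

def rmse(x_true: np.ndarray, x_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((x_true - x_pred) ** 2)))

print(rmse(np.array([10.0, 12.0, 8.0]), np.array([11.0, 11.0, 9.0])))  # 1.0
```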
As shown in Figure 7, the RMSEs of the different feature extraction networks varied across the five key points of the mice. The EfficientNet-b6 network had the smallest error on the test set [34]; the difference between its predicted coordinates and the true pixel coordinates was close to 5.9 pixels at the tail. Considering the large number of pixels occupied by the mouse's nose in the high-resolution images of this experiment, such a coordinate deviation is acceptable.
In addition, we evaluated the processing speed of the different feature extraction networks. For example, MobileNet-V2-0.35 [35] reaches a processing speed of 16.5 frames/s but with a relatively high error. The slowest is EfficientNet-b6, at about 3.8 frames/s. In medical animal experiments, accuracy usually takes priority over processing speed, so we finally chose EfficientNet-b6 as the feature extraction network for DeepLabCut.
In the study of the pathological mechanism of motor deficits caused by UBE3A gene deletion, in addition to capturing the relationship between motor function and calcium signals in mice through changes in skeleton key point coordinates, the main focus was on whether the frequencies of the five movements (stationary, stand, rectilinear movement, steering movement, and curl up) differed between diseased and normal mice. To analyze the changes in the frequency of these actions, the experimenters, when constructing the mouse behavioral dataset, selected only the videos containing these actions. Figure 8 shows the different behavioral actions in single pose frames. We then input this dataset into DeepLabCut to obtain the corresponding skeleton key point data. After data preprocessing, we obtained valid inputs for the ST-GCN network and used them to retrain it. Table 3 shows the detection accuracy of the ST-GCN network for the different classes of actions. For actions with significant pose changes, ST-GCN has high detection accuracy. However, for steering movement, which resembles rectilinear movement, the model may not learn enough detail from the small number of key points, resulting in only average detection accuracy. We therefore suggest that experimenters mark more key points in the video to track detailed pose changes more accurately. Figure 9 shows the detection results of ST-GCN for the different behavioral categories in detail. Overall, ST-GCN offers high accuracy in mouse behavioral action detection, which meets the experimenters' accuracy requirements.

6.3. Natural Language Query Interface

To capture the alignment between the natural language questions raised by users and the database schemas, the RAT-SQL algorithm in the natural language query interface service must encode both semantically at the same time. Considering the excellent performance of the BERT pre-trained model in natural language processing tasks, we chose its multilingual version to solve the cross-language problem in the Chinese Text-to-SQL task.
For Chinese natural language questions, we first needed to perform word segmentation. In this case, we chose a high-accuracy Chinese word segmentation tool, Jieba, which processes a Chinese question and returns the most probable combination of words. However, since natural language questions often contain Arabic numerals, unit symbols, and punctuation marks in addition to Chinese characters, we further post-processed Jieba's output by re-merging the substrings separated in those cases so that their original meanings were preserved. A sketch of this step follows.
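The snippet below sketches this segmentation step with Jieba; the merge rule shown is a simplified assumption of the post-processing logic described above, which only re-joins adjacent numeric fragments.

```python
# Sketch of the segmentation step: Jieba splits the Chinese question, and a
# post-processing pass re-merges fragments that Jieba separated, such as
# Arabic numerals split from decimal points or percent signs. The merge
# rule is a simplified assumption of the logic described above.
import re
import jieba

NUMERIC = re.compile(r"[0-9%.]+")

def segment(question: str) -> list:
    tokens = jieba.lcut(question)
    merged = []
    for tok in tokens:
        # glue a numeric/unit fragment onto a preceding numeric fragment
        if merged and NUMERIC.fullmatch(tok) and NUMERIC.fullmatch(merged[-1]):
            merged[-1] += tok
        else:
            merged.append(tok)
    return merged
```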
For the column and table names of the database, we adopted the engineering convention of English words separated by underscores, so we only needed to split on the underscores. After segmentation, we spliced together the natural language question, the data tables, and the data columns, connecting each data column with its corresponding type.
Since the input contains both Chinese and English, we encoded it using a multilingual BERT pre-trained model (Multilingual BERT) [36]. In the original RAT-SQL model, the schema-linking operation matches strings, so the matching mechanism does not work correctly when multiple languages are involved. To address this problem, we introduced a multilingual, cross-domain common sense knowledge graph (ConceptNet [37]) and optimized the schema-linking process using its synonym edges. ConceptNet is a directed graph whose vertices are natural language words and phrases and whose edges are labeled with types and weights. Figure 10 shows its seven commonly used relation types. A sketch of the encoding step follows.
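The sketch below shows one way the mixed Chinese/English input could be encoded with multilingual BERT via the Hugging Face transformers library; the spliced input string is an illustrative example, not the platform's actual serialization.

```python
# Sketch of encoding the spliced Chinese/English input with multilingual
# BERT via Hugging Face transformers; the input string is illustrative.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")

spliced = "过去三天辅助站立超过五次的小鼠 [SEP] behavior_records [SEP] mouse_id : text"
inputs = tokenizer(spliced, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```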
The evaluation metrics for the natural language query interface task consist of two main aspects: first, the exact matching rate of the structure of the generated SQL statements to the standard SQL statements, and second, the execution accuracy of the SQL query statements in the given database. In this case, we realized that experimenters are usually only concerned with whether the actual output meets their needs. Therefore, we focused more on how to make the SQL statements generated by the text-to-SQL model obtain the correct results after execution. First, we used the improved RAT-SQL model for pre-training on Chinese datasets such as CSpider, DuSQL [38], and TableQA [31]. Then, we performed special training on the Chinese dataset of medical animal experiments. Table 4 shows the accuracy performance of RAT-SQL on the test set after introducing ConceptNet.
As shown in the above table, RAT-SQL performed excellently on the medical experimental animal dataset, although its conversion ability was weak on highly difficult samples. Such samples accounted for only a small portion of actual demand, and after introducing ConceptNet, RAT-SQL was able to meet the retrieval needs of the experimenters sufficiently. To address the low accuracy on highly difficult samples, we plan to use multi-turn question answering to improve the conversion ability in future work.

7. Conclusions

We propose an AI-enabled and highly usable animal behavior analysis platform, which has been applied in a medical experimental institution in China to evaluate the behavioral differences between disease model mice and normal mice in a specific case. In this case, the platform obtained the experimental video through the video capture device, used the pose estimation model to extract the single-frame pose features of the experimental animals, and then used the behavior recognition model to process the continuous single-frame pose features to capture and store the animals' behavior information automatically.
In the platform's design, we focused on improving the flexibility and scalability of behavior recognition and the interactivity of the platform, so as to reduce the learning cost for researchers and improve usability. The platform architecture therefore integrates plug-in management, a unified interface, and an algorithm library so that it can be flexibly configured and extended. High-precision behavior recognition is only one of several evaluation criteria in animal behavior experiments; the intermediate results produced during recognition, such as per-frame pose data, are also of great significance. Flexible algorithm replacement and extension enable researchers to choose recognition methods better suited to their actual needs and make algorithm upgrades and integration with other algorithms more convenient, making the platform more competitive in the flexibility and scalability of its algorithms.
The platform relies on three core services. The pose estimation service performs pose estimation on experimental animals in the same experimental environment to obtain each animal's key point coordinates. The behavior recognition service extracts and classifies the behavior feature vector of each animal in the video frames in different ways, according to the selected algorithm. The natural language query interface service converts researchers' natural language query requirements into executable SQL statements and obtains the corresponding results from the database, providing more flexible and efficient information retrieval and improving the platform's interactivity. Based on these three core services, we have built a highly usable animal behavior analysis platform that not only automates the identification of animal behavior but also enables researchers to flexibly select behavior recognition algorithms through the algorithm library, removing technical barriers and reducing researchers' dependence on experts. Through the natural language query interface service, the platform opens data access to all researchers and provides more usable interaction.
The platform relies on computer vision technology and deep learning. However, recognition performance may degrade when video quality is too low or when the animals' limbs occlude one another too much. In addition, in the natural language query interface service, researchers' inaccurate language descriptions may affect the results of information retrieval. To further improve the usability of the platform, we plan to reduce its dependence on video quality and improve the accuracy of behavior recognition. We will therefore consider making the platform compatible with multimodal behavior recognition algorithms, fusing information such as voice, electroencephalogram signals, or pressure sensor signals to reduce the dependence on a single video image and improve detection. For inaccurate descriptions during information retrieval, we plan to use multi-turn question answering to guide researchers in expressing their needs more accurately, improving the accuracy and usability of the platform's data retrieval and analysis.

Author Contributions

Conceptualization, J.S. and G.H.; methodology, Y.C., T.J. and J.S.; software, T.J. and Z.J.; validation, J.S., Y.C. and G.H.; formal analysis, Y.C.; resources, G.H. and Z.J.; data curation, Y.C.; writing—original draft preparation, Y.C.; writing—review and editing, J.S.; visualization, Y.C.; supervision, J.S. and G.H.; project administration, G.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 62302086) and the Natural Science Foundation of Liaoning Province (Grant No. 2023-MSBA-070).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available from the corresponding author on request, due to ethical and privacy protection restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Broomé, S.; Feighelstein, M.; Zamansky, A.; Carreira Lencioni, G.; Haubro Andersen, P.; Pessanha, F.; Mahmoud, M.; Kjellström, H.; Salah, A.A. Going Deeper than Tracking: A Survey of Computer-Vision Based Recognition of Animal Pain and Emotions. Int. J. Comput. Vis. 2023, 131, 572–590.
  2. Chen, J.; Hu, M.; Coker, D.J.; Berumen, M.L.; Costelloe, B.; Beery, S.; Rohrbach, A.; Elhoseiny, M. MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; IEEE: New York, NY, USA, 2023; pp. 13052–13061.
  3. Da Silva Santos, A.; De Medeiros, V.W.C.; Gonçalves, G.E. Monitoring and Classification of Cattle Behavior: A Survey. Smart Agric. Technol. 2023, 3, 100091.
  4. Arablouei, R.; Wang, L.; Currie, L.; Yates, J.; Alvarenga, F.A.; Bishop-Hurley, G.J. Animal Behavior Classification via Deep Learning on Embedded Systems. Comput. Electron. Agric. 2023, 207, 107707.
  5. Roughan, J.V.; Wright-Williams, S.L.; Flecknell, P.A. Automated Analysis of Postoperative Behaviour: Assessment of HomeCageScan as a Novel Method to Rapidly Identify Pain and Analgesic Effects in Mice. Lab. Anim. 2009, 43, 17–26.
  6. Natarajan, B.; Elakkiya, R.; Bhuvaneswari, R.; Saleem, K.; Chaudhary, D.; Samsudeen, S.H. Creating Alert Messages Based on Wild Animal Activity Detection Using Hybrid Deep Neural Networks. IEEE Access 2023, 11, 67308–67321.
  7. Fuentes, A.; Yoon, S.; Park, J.; Park, D.S. Deep Learning-Based Hierarchical Cattle Behavior Recognition with Spatio-Temporal Information. Comput. Electron. Agric. 2020, 177, 105627.
  8. Crispim Junior, C.F.; Pederiva, C.N.; Bose, R.C.; Garcia, V.A.; Lino-de-Oliveira, C.; Marino-Neto, J. ETHOWATCHER: Validation of a Tool for Behavioral and Video-Tracking Analysis in Laboratory Animals. Comput. Biol. Med. 2012, 42, 257–264.
  9. Rodriguez, A.; Zhang, H.; Klaminder, J.; Brodin, T.; Andersson, P.L.; Andersson, M. ToxTrac: A Fast and Robust Software for Tracking Organisms. Methods Ecol. Evol. 2018, 9, 460–464.
  10. Lim, C.J.; Platt, B.; Janhunen, S.K.; Riedel, G. Comparison of Automated Video Tracking Systems in the Open Field Test: ANY-Maze versus EthoVision XT. J. Neurosci. Methods 2023, 397, 109940.
  11. Meade, M.J. Medication-Related Osteonecrosis of the Jaw: A Cross-Sectional Survey Assessing the Quality of Information on the Internet. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. 2022, 133, e83–e90.
  12. Fang, C.; Zhang, T.; Zheng, H.; Huang, J.; Cuan, K. Pose Estimation and Behavior Classification of Broiler Chickens Based on Deep Neural Networks. Comput. Electron. Agric. 2020, 180, 105863.
  13. Nasiri, A.; Yoder, J.; Zhao, Y.; Hawkins, S.; Prado, M.; Gan, H. Pose Estimation-Based Lameness Recognition in Broiler Using CNN-LSTM Network. Comput. Electron. Agric. 2022, 197, 106931.
  14. Lin, C.W.; Hong, S.; Lin, M.; Huang, X.; Liu, J. Bird Posture Recognition Based on Target Keypoints Estimation in Dual-Task Convolutional Neural Networks. Ecol. Indic. 2021, 135, 108506.
  15. Ren, Q.; Lu, Z.; Wu, H.; Zhang, J.; Dong, Z. HR-Net: A Landmark Based High Realistic Face Reenactment Network. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 6347–6359.
  16. Asma-Ull, H.; Yun, I.D.; Yun, B.L. Regression to Classification: Ordinal Prediction of Calcified Vessels Using Customized ResNet50. IEEE Access 2023, 11, 48783–48796.
  17. Li, Z.; Zhang, Q.; Lv, S.; Han, M.; Jiang, M.; Song, H. Fusion of RGB, Optical Flow and Skeleton Features for the Detection of Lameness in Dairy Cows. Biosyst. Eng. 2022, 218, 62–77.
  18. Shah, S.R.; Qadri, S.; Bibi, H.; Shah, S.M.W.; Sharif, M.I.; Marinello, F. Comparing Inception V3, VGG 16, VGG 19, CNN, and ResNet 50: A Case Study on Early Detection of a Rice Disease. Agronomy 2023, 13, 1633.
  19. Yan, S.; Xiong, Y.; Lin, D. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. Proc. AAAI Conf. Artif. Intell. 2018, 32, 7444–7452.
  20. Chen, C.; Zhu, W.; Steibel, J.; Siegford, J.; Wurtz, K.; Han, J.; Norton, T. Recognition of Aggressive Episodes of Pigs Based on Convolutional Neural Network and Long Short-Term Memory. Comput. Electron. Agric. 2020, 169, 105166.
  21. Zhang, Y.; Cai, J.; Xiao, D.; Li, Z.; Xiong, B. Real-Time Sow Behavior Detection Based on Deep Learning. Comput. Electron. Agric. 2019, 163, 104884.
  22. Schindler, F.; Steinhage, V. Identification of Animals and Recognition of Their Actions in Wildlife Videos Using Deep Learning Techniques. Ecol. Inform. 2021, 61, 101215.
  23. Sun, G.; Liu, T.; Zhang, H.; Tan, B.; Li, Y. Basic Behavior Recognition of Yaks Based on Improved SlowFast Network. Ecol. Inform. 2023, 78, 102313.
  24. Lauer, J.; Zhou, M.; Ye, S.; Menegas, W.; Schneider, S.; Nath, T.; Rahman, M.M.; Di Santo, V.; Soberanes, D.; Feng, G.; et al. Multi-Animal Pose Estimation, Identification and Tracking with DeepLabCut. Nat. Methods 2022, 19, 496–504.
  25. Hua, Z.; Wang, Z.; Xu, X.; Kong, X.; Song, H. An Effective PoseC3D Model for Typical Action Recognition of Dairy Cows Based on Skeleton Features. Comput. Electron. Agric. 2023, 212, 108152.
  26. Sriharipriya, K.C. Enhanced Pothole Detection System Using YOLOX Algorithm. Auton. Intell. Syst. 2022, 2, 22.
  27. Wang, B.; Shin, R.; Liu, X.; Polozov, O.; Richardson, M. RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 7567–7578.
  28. Katsogiannis-Meimarakis, G.; Koutrika, G. A Survey on Deep Learning Approaches for Text-to-SQL. VLDB J. 2023, 32, 905–936.
  29. Liu, H.; Singh, P. ConceptNet — A Practical Commonsense Reasoning Tool-Kit. BT Technol. J. 2004, 22, 211–226.
  30. Min, Q.; Shi, Y.; Zhang, Y. A Pilot Study for Chinese SQL Semantic Parsing. arXiv 2019, arXiv:1909.13293.
  31. Sun, N.; Yang, X.; Liu, Y. TableQA: A Large-Scale Chinese Text-to-SQL Dataset for Table-Aware SQL Generation. arXiv 2020, arXiv:2006.06434.
  32. Hien, P.T.; Hong, I.P. Millimeter Wave SAR Imaging Denoising and Classification by Combining Image-to-Image Translation with ResNet. IEEE Access 2023, 11, 70203–70215.
  33. Nijaguna, G.; Babu, J.A.; Parameshachari, B.; De Prado, R.P.; Frnda, J. Quantum Fruit Fly Algorithm and ResNet50-VGG16 for Medical Diagnosis. Appl. Soft Comput. 2023, 136, 110055.
  34. Bharadwaj, G.V.; Sree, Y.R.; Varshita, J.L.; Chebrolu, S. Ensemble Model of U-Net EfficientNet-B3, U-Net EfficientNet B6, CoaT, SegFormer for Segmenting Functional Tissue Units in Various Human Organs. In Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023; pp. 1–8.
  35. Kumar Shukla, R.; Kumar Tiwari, A. Masked Face Recognition Using MobileNet V2 with Transfer Learning. Comput. Syst. Sci. Eng. 2023, 45, 293–309.
  36. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
  37. Pan, M.; Pei, Q.; Liu, Y.; Li, T.; Huang, E.A.; Wang, J.; Huang, J.X. SPRF: A Semantic Pseudo-relevance Feedback Enhancement for Information Retrieval via ConceptNet. Knowl.-Based Syst. 2023, 274, 110602.
  38. Wang, L.; Zhang, A.; Wu, K.; Sun, K.; Li, Z.; Wu, H.; Zhang, M.; Wang, H. DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 6923–6935.
Figure 1. Overall structure of the platform.
Figure 2. The main architecture of the platform, in which the pose estimation module, the behavior recognition module, and the natural language processing module are the main modules.
Figure 3. DeepLabCut network architecture, where EfficientNet is the replaceable feature extraction network.
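To make the replaceable backbone in Figure 3 concrete, the sketch below uses DeepLabCut's public Python API, where the feature extraction network is selected when the training set is created. This is a minimal sketch, not the configuration used in this study; the project name, experimenter, and video path are hypothetical placeholders.

```python
# Minimal sketch of swapping DeepLabCut's feature-extraction backbone.
# Project name, experimenter, and video path are hypothetical placeholders.
import deeplabcut

config = deeplabcut.create_new_project(
    "open-field-mice", "lab", ["videos/mouse_a.mp4"], copy_videos=True
)

# ... extract and label frames here (deeplabcut.extract_frames / label_frames) ...

# The backbone is chosen when the training set is built; DeepLabCut 2.x accepts
# identifiers such as "resnet_50" or "efficientnet-b0" for net_type.
deeplabcut.create_training_dataset(config, net_type="efficientnet-b0")
deeplabcut.train_network(config)
```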
Figure 4. Principle of the ST-GCN algorithm, where GCN is the spatial graph convolution and TCN is the temporal convolution.
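The GCN + TCN factorization in Figure 4 can be illustrated with a minimal PyTorch block (a sketch, not the platform's implementation): a 1 × 1 convolution transforms per-joint features, a normalized adjacency matrix aggregates them across the skeleton graph, and a (k, 1) convolution then mixes features along the frame axis.

```python
# Minimal sketch of one ST-GCN block: spatial graph conv followed by temporal conv.
# Input x: (batch, channels, frames, joints); A: normalized adjacency (joints, joints).
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_t=9):
        super().__init__()
        self.gcn = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # per-joint feature transform
        self.tcn = nn.Conv2d(out_ch, out_ch, kernel_size=(kernel_t, 1),
                             padding=(kernel_t // 2, 0))     # convolution over frames
        self.relu = nn.ReLU()

    def forward(self, x, A):
        x = self.gcn(x)                          # (N, C, T, V)
        x = torch.einsum("nctv,vw->nctw", x, A)  # aggregate over the skeleton graph
        return self.relu(self.tcn(x))

# Hypothetical shapes: 2 clips, 3 channels (x, y, confidence), 30 frames, 5 key points.
x = torch.randn(2, 3, 30, 5)
A = torch.eye(5)  # placeholder adjacency; a real model uses the skeleton's edges
print(STGCNBlock(3, 64)(x, A).shape)  # torch.Size([2, 64, 30, 5])
```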
Figure 5. The model structure of RAT-SQL. * N represents the number of Transformer layers.
Figure 6. The detailed process and the external environmental dependencies. The Chinese text in the figure reads: "Help me search for relevant information on the auxiliary standing movement of mouse A in Experiment A."
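As a hedged illustration of the text-to-SQL step in Figure 6, the query above might compile to SQL along the following lines. The table and column names are assumptions made purely for illustration; the paper does not publish the platform's schema.

```python
# Hypothetical illustration of the text-to-SQL step; the schema is assumed.
question = ("Help me search for relevant information on the auxiliary standing "
            "movement of mouse A in Experiment A.")

# A RAT-SQL-style parser links "Experiment A", "mouse A", and the behavior name
# to schema elements and emits something like:
sql = """
SELECT *
FROM behavior_events            -- hypothetical table
WHERE experiment = 'A'
  AND subject    = 'mouse A'
  AND behavior   = 'auxiliary standing';
"""
```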
Figure 7. RMSE of five key points of mice extracted by different feature extraction networks.
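Under the usual pixel-space definition for pose estimation (the paper does not spell out a variant), the RMSE in Figure 7 is the root-mean-square distance between predicted and labeled key-point coordinates:

```latex
% RMSE over n labeled key points, with predictions (\hat{x}_i, \hat{y}_i)
% and ground truth (x_i, y_i), measured in pixels.
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\Bigl[(\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2\Bigr]}
```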
Figure 8. Poses of mice. (a) Stationary. (b) Rectilinear movement. (c) Steering movement. (d) Stand. (e) Curl up.
Figure 9. Confusion matrix of the behavior recognition results.
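A confusion matrix like the one in Figure 9 can be rebuilt from per-clip labels with scikit-learn. The class names follow Figure 8; the label arrays below are placeholders, not the study's data.

```python
# Sketch: behavior confusion matrix with scikit-learn; labels are placeholders.
from sklearn.metrics import confusion_matrix

classes = ["Stationary", "Stand", "Curl up", "Rectilinear movement", "Steering movement"]
y_true = ["Stand", "Stationary", "Curl up", "Stand", "Steering movement"]
y_pred = ["Stand", "Stationary", "Curl up", "Stationary", "Steering movement"]

cm = confusion_matrix(y_true, y_pred, labels=classes)
print(cm)  # rows = true class, columns = predicted class
```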
Figure 10. Seven common relationship types in ConceptNet.
Table 1. Comparison of our platform with commercial tools and behavior recognition algorithms.

| Source | Research Object | Behavior Type | Behavior Recognition Method | Data Retrieval and Analysis | Scalability | Interactivity |
|---|---|---|---|---|---|---|
| EthoWatcher | Animals | Extracts only activity-related parameters | Digital image processing techniques | Contains certain functions; further analysis relies on other tools | Does not support algorithm replacement | Graphical interface |
| ToxTrac | Animals | Extracts only activity-related parameters | Digital image processing techniques and a second-order Kalman filter | Contains certain functions; further analysis relies on other tools | Does not support algorithm replacement | Graphical interface |
| ANY-maze | Animals | Grooming, freezing, movement, stillness, activity-related parameters | Experimental-animal trajectory prediction algorithm | Contains certain functions; further analysis relies on other tools | Does not support algorithm replacement | Graphical interface |
| Fang | Broiler chicken | Standing, walking, running, feeding, resting, preening | Naive Bayes | Relies on other tools | - | Command line |
| Nasiri | Broiler chicken | Lameness | LSTM | Relies on other tools | - | Command line |
| Lin | Bird | Swimming, wing flapping, standing, wing shaking, feeding, squatting | ResNet18 | Relies on other tools | - | Command line |
| Li | Dairy cattle | Lameness | ST-GCN | Relies on other tools | - | Command line |
| Ours | Animals | Customizable | Replaceable | Contains certain functions and a natural language query interface | Supports algorithm replacement | Graphical interface and natural language query interface |
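The "Replaceable" method and "Supports algorithm replacement" entries for our platform in Table 1 imply a plug-in contract in the algorithm library. The paper does not publish that interface, so the registry sketch below is only an assumed illustration of how such replacement could be wired; all names in it are hypothetical.

```python
# Hypothetical plug-in registry for replaceable behavior-recognition algorithms.
# BehaviorRecognizer, register, and recognize are assumed names, not the paper's API.
from abc import ABC, abstractmethod

REGISTRY = {}

def register(name):
    def wrap(cls):
        REGISTRY[name] = cls
        return cls
    return wrap

class BehaviorRecognizer(ABC):
    @abstractmethod
    def recognize(self, keypoints):
        """Return a behavior label for one skeleton sequence."""

@register("st-gcn")
class STGCNRecognizer(BehaviorRecognizer):
    def recognize(self, keypoints):
        return "stand"  # placeholder; a real plug-in would run the trained model

recognizer = REGISTRY["st-gcn"]()  # swapping algorithms = selecting another key
```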
Table 2. Relevant information of each module.

| Module Name | Functionality |
|---|---|
| Data acquisition and transmission module | Captures video data and passes it on to the server. |
| Data receiving and preprocessing module | Receives data and converts it into a standardized form. |
| Controller | Receives and processes the requests sent by the client. |
| Algorithm library management module | Manages plug-ins and interface services within the algorithm library. |
| Pose estimation module | Trains or executes different pose estimation algorithms. |
| Behavior recognition module | Trains or executes different behavior recognition algorithms. |
| Natural language query interface module | Converts natural language queries entered by researchers into computer instructions. |
Table 3. Detection results of ST-GCN for different behavioral categories.

| Behavioral Category | Sample Count | Correct Count | Accuracy (%) |
|---|---|---|---|
| Stationary | 528 | 496 | 94.12 |
| Stand | 545 | 502 | 92.11 |
| Curl up | 478 | 453 | 94.76 |
| Rectilinear movement | 1015 | 987 | 97.24 |
| Steering movement | 434 | 382 | 88.01 |
| Total | 3000 | 2821 | 94.03 |
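Each accuracy in Table 3 is the share of correctly recognized clips in that category, e.g. 987/1015 ≈ 97.24% for rectilinear movement; a quick check:

```python
# Recomputing two of the accuracies reported in Table 3.
def accuracy(correct, total):
    return 100 * correct / total

print(f"{accuracy(987, 1015):.2f}%")   # 97.24% (rectilinear movement)
print(f"{accuracy(2821, 3000):.2f}%")  # 94.03% (total)
```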
Table 4. The accuracy performance of RAT-SQL.

| Type | Total Sample Size | Correct Count | Execution Accuracy |
|---|---|---|---|
| Easy | 272 | 267 | 0.98 |
| Medium | 146 | 137 | 0.93 |
| Hard | 93 | 81 | 0.87 |
| Total | 511 | 485 | 0.95 |