This section discusses the papers that passed the entire search strategy. It is divided into five categories identified during the review, mainly according to the type of model proposed: deep learning methods, probabilistic methods, TSP-based methods, algorithmic methods, and others. The final category encompasses additional works that offer significant insights within the domain of interest.
5.1. Deep Learning Methods
This subsection presents models that can be considered deep in the context of machine learning, using different types of neural networks, primarily recurrent networks.
Konishi et al. [59] proposed a new, efficient approach to the safe control of warehouse robots. Deep reinforcement learning (DRL) algorithms compute suboptimal solutions that may be unsafe. These solutions are therefore transformed into a standard supervisory control problem with an automaton and a set of unsafe states. Using Supervisory Control Theory, also known as the Ramadge–Wonham framework, constraints on plant behaviour are created automatically so that as many preset specifications as possible are met. The supervisor can stop the plant from generating a subset of controllable events, but it cannot force the plant to generate an event. The solution was validated on simulation data. The innovative aspect of this work lies in applying a formal mathematical apparatus, the Ramadge–Wonham framework, to ensure safety in warehouse robot control, which is a novel way to address unsafe suboptimal solutions. However, the mathematical complexity of the method could be a barrier to its practical application, particularly for practitioners without a strong mathematical background. Future work could focus on developing more accessible, user-friendly interfaces or tools that leverage this framework, thereby broadening its applicability.
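The supervisory idea described above can be illustrated with a minimal sketch. It is not the construction used in [59]: it only handles a safety specification on a small finite automaton, and the state names, event names, and helper functions are hypothetical. The sketch marks as "to be avoided" every state from which an uncontrollable event can lead into an unsafe state, and the supervisor then disables only controllable events that enter that set.

```python
def unsafe_closure(transitions, unsafe, uncontrollable):
    """Return the states to avoid: the unsafe states plus every state from
    which an uncontrollable event leads into an already-avoided state."""
    avoid = set(unsafe)
    changed = True
    while changed:
        changed = False
        for (state, event), target in transitions.items():
            if state not in avoid and event in uncontrollable and target in avoid:
                avoid.add(state)
                changed = True
    return avoid


def enabled_events(state, transitions, avoid):
    """Events the supervisor leaves enabled in `state`.

    Outside the avoided set, uncontrollable events never enter it (by
    construction of the closure), so only controllable events get disabled.
    """
    return {
        event
        for (source, event), target in transitions.items()
        if source == state and target not in avoid
    }


def supervise(state, proposed_event, transitions, avoid):
    """Pass a proposed event (e.g., from a DRL policy) through the supervisor.

    Returns the event if it is enabled, otherwise None: the supervisor can
    block an event but cannot force one.
    """
    if proposed_event in enabled_events(state, transitions, avoid):
        return proposed_event
    return None


# Hypothetical toy plant: a robot in an aisle that a human may enter.
TRANSITIONS = {
    ("aisle_free", "advance"): "near_human",
    ("aisle_free", "wait"): "aisle_free",
    ("near_human", "human_steps_in"): "collision",
    ("near_human", "retreat"): "aisle_free",
}
UNCONTROLLABLE = {"human_steps_in"}
AVOID = unsafe_closure(TRANSITIONS, {"collision"}, UNCONTROLLABLE)
```

In this toy plant the state `near_human` is avoided as well, because the robot cannot prevent the human from stepping in; the supervisor therefore disables `advance` in `aisle_free` and leaves `wait` enabled.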
Andersen et al. [60] proposed a new “Dreaming Variational Autoencoder” (DVAE) model to speed up the detection of potential threats. The authors note that expert systems often already exist in fully automated warehouses but are not flexible enough to work in dynamic environments. The model works in three stages: (1) A predictive model learns the dynamics of the environment (transitions between states). (2) A model-free reinforcement learning algorithm uses the predictive model to sample the environment safely. (3) The model is deployed in the real world. This is an interesting solution that is based on a graph of transitions. The result of the model is reported as the number of erroneous positions in the storage map, represented as a 32 × 32 image in which each erroneous transition is marked as an error with a value of 1. The authors validate the solution on various Atari games. The innovative aspect of this work is the introduction of the “Dreaming Variational Autoencoder” model, designed to expedite the detection of potential threats in dynamic environments. This model, which operates in three distinct stages and is based on a graph of transitions, offers a fresh perspective on addressing challenges in fully automated warehouses. However, a limitation is evident in the deep warehouse environment used for testing, which does not mirror real-world systems perfectly. Additionally, while the DVAE-2 model is efficient, model-free algorithms, despite their potential for better performance with unlimited sampling, are challenging to train. Looking ahead, there is a pressing need to ensure that agents operate within safety boundaries, especially when confronted with unforeseen events such as fires or agent collisions. Another intriguing avenue for future exploration is the efficacy of agents with non-stationary policies in multi-agent environments.
Another paper, by Mangalam et al. [61], proposes PECNet, a novel neural network architecture that uses both the past history and the ground-truth endpoint to train a Variational Autoencoder (VAE) for multimodal endpoint inference. The network consists of three main components: a social pooling module, a predictor network, and a VAE. During training, real trajectory targets are used. The social pooling module takes the past history and the ground-truth endpoint, along with sampled endpoints, and encodes them into a latent representation. It is equipped with a block-diagonal social mask, which encodes spatio-temporal relationships between agents. The predictor network takes the latent representation from the social pooling module and generates a predicted trajectory for each agent, conditioned on the sampled endpoints. The VAE is used to infer multimodal endpoints, allowing the network to predict multiple possible future trajectories for each agent; it is trained using both the ground-truth endpoint and a sampled endpoint to learn the underlying trajectory distribution. The model was validated on the ETH/UCY and Stanford Drone (SDD) datasets. Overall, PECNet is a powerful architecture for multi-agent, multimodal trajectory forecasting that uses both past history and ground-truth endpoints to generate accurate and diverse forecasts. The innovative aspect of the article is the introduction of the PECNet architecture itself, together with the “truncation trick” for trajectory prediction, a method that trades off diversity against performance without the need for retraining. However, the model’s reliance on real trajectory targets during training could pose challenges in scenarios where such data are limited or unavailable. Given the model’s state-of-the-art performance across multiple datasets, future work could refine its components and explore its applicability in more diverse settings.
Nevertheless, applying PECNet in real applications is problematic, as indicated by Garg and Rameshan [62]. In this work, the authors improve the PECNet model, which originally overfitted the training data so strongly that it could not be used in real-world applications. They also report that adding noise to the data completely disrupted the PECNet learning process, which makes the model unusable for problems where the data contain noise; for example, UWB sensors have an error of several centimetres. The improvements that the authors made to PECNet, released as a separate SIREN model, are as follows: (1) They used a different optimization method, CMA-ES instead of Adam. (2) They implemented cyclic annealing to stabilize the variational training of the autoencoder. (3) They added another generative model to produce a multimodal output. This yields a confidence metric for better prediction and control; specifically, the standard multilayer perceptrons (MLPs) were replaced with a sinusoidal representation network (SIREN). The model was validated using six synthetic datasets: (1) pure Newtonian trajectories; (2) noisy Newtonian trajectories; (3) trajectories following loops, circles, and other geometric curves; (4) an interactive trajectory model using an HMM; (5) a set based on reinforcement learning, covering the agents’ environment and their movement; and (6) a synthetic SDD. The innovative aspect of this work lies in the enhancement of the PECNet model through the introduction of the SIREN model. This new approach incorporates three pivotal advancements: the adoption of CMA-ES for optimization, the implementation of cyclic annealing for stable variational autoencoder training, and the integration of an additional generative model to produce a multimodal output, resulting in a more reliable prediction metric. Despite these advancements, the work still points to a significant limitation: a growing disconnect between academic research and its practical applications, primarily attributed to dataset overfitting. Future work should establish a robustness and deployment fitness score, emphasizing comprehensive code reviews and real-world simulation testing. In addition, generative models such as robGAN could be used to assess a model’s readiness for deployment.
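To make the cyclic annealing step more concrete, the following minimal sketch shows one commonly used schedule for the KL weight of a VAE objective: within each cycle the weight ramps linearly from 0 to 1 and then stays at 1. The exact schedule used in [62] is not given here, so the function names, the number of cycles, and the ramp fraction are assumptions made for illustration only.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ramp_fraction=0.5):
    """KL weight in [0, 1] for a training step under a cyclical schedule.

    Each cycle spends `ramp_fraction` of its length ramping linearly from
    0 to 1 and the remainder at 1. All parameters are illustrative.
    """
    cycle_len = total_steps / n_cycles
    position = (step % cycle_len) / cycle_len  # position within the cycle, in [0, 1)
    return min(1.0, position / ramp_fraction)


def vae_loss(reconstruction_loss, kl_divergence, beta):
    """Weighted VAE objective: reconstruction term plus beta times the KL term."""
    return reconstruction_loss + beta * kl_divergence
```

Restarting the weight at zero several times during training lets the decoder repeatedly rely on the latent code before the KL penalty is fully applied, which is the usual motivation for this kind of schedule.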
The deep learning network TrajNet++ by Sethi et al. [63] effectively simulates group interactions in high-density settings, thereby facilitating social navigation for robots. The model is able to learn a variety of social norms used by humans when walking in large groups. The proposed metrics show that TrajNet++ outperforms traditional methods based on domain knowledge for modelling group interactions on the ORCA synthetic dataset [99]. The innovation of TrajNet++ lies in its ability to simulate group interactions in high-density settings, which enables robots to navigate socially. It learns diverse human social norms, especially in large-group dynamics, and it outperforms traditional methods on the ORCA synthetic dataset. However, its reliance on this synthetic dataset raises questions about its adaptability to real-world scenarios. Future work could deploy TrajNet++ in real-world applications such as intelligent transportation systems and refine its capabilities to capture a broader range of social interactions.
In the study by Postnikov et al. [64], a model is introduced that consists of two main parts: a cross-attention module and a transformer block. The model predicts human trajectories under given conditions and relies on iterative attention blocks. The cross-attention module maps a latent array and free-form input information to a latent array, while the transformer block maps a latent array to a latent array. These components are applied alternately to process the input information, so that multiple layers of cross-attention iteratively extract information from the input data. The encoder consists of a stack of N identical blocks, each with three cross-attention modules and a latent transformer. The attention module is permutation-invariant, which makes it unsuitable for exploiting temporal information; to preserve temporal information, 1D positional encodings are concatenated with the agent embedding. The latent transformer block uses the GPT-2 architecture, which is itself based on the decoder of the original transformer architecture. Cross-attention is an attention layer that decomposes attention into multiple heads. The trajectory generation problem is divided into two stages: proposing the target position of the pedestrian and constructing trajectories conditioned on the proposed target position. The target decoder decodes the final endpoint position from the encoded latent array using a multilayer perceptron, while the trajectory decoder decodes the trajectory using ground-truth targets during training and predicted targets during inference. It is worth noting that the proposed architecture takes the form of an autoencoder. The method was validated on the ETH/UCY datasets. The results are comparable to, but slightly worse than, those of other state-of-the-art methods. The innovation of this work lies in a trajectory prediction model that integrates cross-attention modules and transformer blocks. This architecture, resembling an autoencoder, is suited to predicting human trajectories in urban-like environments. However, a limitation is the permutation invariance of its attention module, which is not ideal for temporal information, although this is addressed by concatenating 1D positional encodings with the agent embedding. Looking forward, the model’s simplified architecture offers flexibility for further configurations or modifications, positioning it as a promising foundation for future trajectory prediction work.
In their comprehensive study, Feng et al. [65] propose a novel methodology for predicting human mobility, known as DeepMove. This approach holds significant potential for various applications that rely on location data. The challenges the researchers faced included the complex regularity of sequential transitions, multi-level periodicity, and the heterogeneity and sparsity of trajectory data. To address these issues, they employed a multi-modal embedding recurrent neural network alongside a historical attention model with two specific mechanisms for handling these complexities. The authors evaluated their model on three real-world mobility datasets and showed that DeepMove outperformed state-of-the-art models by more than 10%. However, this comparison is of limited value because the baselines (PMM, RNN, and the Markov Model) are fairly basic methods. Nevertheless, the authors showed that DeepMove provides clearer explanations for its predictions than other neural network models. In summary, the authors have developed a novel approach that accurately predicts human mobility and provides interpretable results to improve performance in location-based applications. The study’s key innovation is DeepMove itself, which addresses challenges such as complex sequential transitions, multi-level periodicity, and the heterogeneity of trajectory data. Its main limitation is the use of basic methods as benchmarks. Future work could extend DeepMove to predict spatiotemporal points by considering potential durations, and could incorporate semantic context such as points of interest and user tweets. This would make it possible to predict not only the location but also the underlying motivations behind user movements.
In a noteworthy contribution to the field, Violos et al. [66] present a model for next-position prediction based on LSTM neural networks. The paper holds a unique position in this review: it does not meet the EC5 condition, as it deals with trajectory prediction for ships’ geographical positions, but it presents an LSTM-based method whose ready implementation can be found in the authors’ repository. The authors present a simple method for predicting the next position of ships, which includes three pipelines: training, transfer learning, and inference. The training pipeline processes ship trajectories, converting geolocations into normalized distance and bearing features, and then feeds them into a genetic algorithm. Normalizing the trajectory data in this way could also be beneficial in the model prototype itself and could improve its quality, which is the second reason why this work was included. The transfer learning pipeline uses a knowledge base of stored, trained DL models, whereas the inference pipeline uses the output model of the transfer learning pipeline to predict the next position of a vessel from its current position in real time. Validation was performed on a dataset of vessel locations. These data most likely come from the authors’ own use case, since no further information about their origin is provided. The innovation lies in the model’s three-fold approach: training, transfer learning, and inference. The training phase transforms ship trajectories into normalized features, optimized via a genetic algorithm, and the transfer learning phase taps into a repository of pre-trained models. A limitation is the model’s reliance on a dataset that is possibly proprietary to the authors, without reference to external sources. For future work, there is potential in refining RNNs for trajectory analysis, which suggests exploring various optimization techniques, RNN types such as Gated Recurrent Units, and the integration of attention mechanisms. A modular deep learning structure that factors in diverse inputs, such as weather or nearby objects, should also be considered, as it could enhance prediction accuracy.
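The conversion of geolocations into normalized distance and bearing features can be sketched as follows. This is an illustrative reading of the preprocessing step, not the code from the authors’ repository: it assumes the haversine formula on a spherical Earth, and the function names and the normalization constant `max_step_m` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (metres) and initial bearing (degrees in [0, 360))
    from point 1 to point 2, with coordinates given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula for the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    distance = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    # Initial bearing measured clockwise from north.
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return distance, bearing


def trajectory_to_features(points, max_step_m=1000.0):
    """Turn a list of (lat, lon) points into (distance, bearing) features in [0, 1].

    `max_step_m` is an assumed normalization constant; longer steps are clipped.
    """
    features = []
    for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]):
        distance, bearing = distance_and_bearing(lat1, lon1, lat2, lon2)
        features.append((min(distance / max_step_m, 1.0), bearing / 360.0))
    return features
```

Expressing each step relative to the previous position, rather than as absolute coordinates, keeps the inputs in a bounded range regardless of where the trajectory is located.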
Another promising model was presented by Mangalam et al. [67]. The authors proposed a new approach to human trajectory forecasting, an inherently multimodal problem with various sources of uncertainty. They proposed dividing uncertainty into epistemic and aleatoric sources and modelling them separately using multimodality in long-term goals, landmarks, and paths. They also introduced a new long-term trajectory forecasting setting with a prediction horizon of up to a minute, which is an order of magnitude longer than in previous work. Finally, they presented a new trajectory forecasting network called Y-net, which uses the proposed epistemic and aleatoric structure to generate diverse trajectory forecasts while maintaining compatibility with the scene. Such forecasts are particularly relevant at long prediction horizons. The authors showed that Y-net significantly outperforms state-of-the-art approaches for both short and long prediction horizons on the Stanford Drone, ETH/UCY, and Intersection Drone datasets. Accordingly, they developed a new approach that improves the accuracy of human trajectory prediction, especially for long-term prediction horizons. The innovation introduced by the authors is Y-net itself, which divides uncertainty into epistemic and aleatoric sources and models them with multimodality in long-term goals, landmarks, and paths. This distinction allows for more precise trajectory predictions, especially over extended periods. The authors also pioneered a new long-term trajectory forecasting setting, extending the prediction horizon to up to a minute, significantly longer than in prior models. However, the research does not discuss potential limitations or challenges faced during the model’s development. For future endeavours, the significant performance improvements suggest further refinement of Y-net’s components, potentially integrating additional data sources or expanding its applicability to diverse real-world scenarios.
In a significant contribution, Salzmann et al. [68] proposed a model known as Trajectron++. This model, characterized by its modular, graph-structured recurrent design, predicts multi-agent trajectories by considering the dynamics of the agents and incorporating heterogeneous data, including semantic maps. Trajectron++ is specifically designed to integrate with robot planning and control frameworks and can produce predictions that are optionally conditioned on ego-agent motion plans. The authors demonstrated its effectiveness on several challenging real-world trajectory prediction datasets, where Trajectron++ outperformed various state-of-the-art deterministic and generative methods. The proposed model addresses the important problem of trajectory prediction by enforcing dynamic constraints and incorporating environmental information, which is crucial for safe and socially aware robot navigation. The success of Trajectron++ in a variety of real-world scenarios demonstrates its potential to enhance the performance of interactive human–robot systems, particularly self-driving cars. The model has been tested on the ETH, UCY, and nuScenes datasets. Its distinctive feature is that it integrates readily with robot planning and control frameworks and can condition predictions on ego-agent motion plans. While the authors have not explicitly highlighted any limitations, it is worth noting that many deep models face challenges when applied to real-world scenarios. In the future, the potential of Trajectron++ could be amplified by incorporating diverse data sources, enhancing its alignment with robotic systems, and capitalizing on its ability to generate comprehensive probability distributions for sophisticated robotic tasks.
In their paper, Yue et al. [69] proposed the NSP-SFM model. This solution combines model-based and model-free approaches using a new Neural Differential Equation model. NSP-SFM is a deep neural network that employs an explicit physics model with learnable parameters, which provides a strong inductive bias for modelling pedestrian behaviour. At the same time, the remaining parts of the network are well suited to fitting data, estimating system parameters, and modelling the stochasticity of the dynamics. The authors compared NSP with 15 state-of-the-art deep learning methods on six datasets and improved state-of-the-art performance by 5.56–70%. They also showed that NSP generalizes better when predicting reliable trajectories in high-density scenarios. In addition, the authors showed that the physics model in NSP can provide reliable explanations for pedestrian behaviour, unlike black-box deep learning models. Validation was performed on the ETH/UCY and SDD datasets. The NSP-SFM model introduces a novel blend of model-based and model-free techniques through a Neural Differential Equation framework. This innovation models pedestrian behaviour by merging an explicit, learnable physics model with the data-fitting capacity of a deep neural network. It outperforms 15 other deep learning methods and offers better generalizability in high-density scenarios. However, its limitation lies in oversimplifying humans as 2D particles, neglecting the complexities of real-world human dynamics. Future enhancements may explore more intricate human behaviour models, adapt to high-density crowd scenarios, and incorporate learning-based collision detection techniques.
A notable work fulfilling the review condition and offering a profound approach was provided by Cheng et al. [70]. This research introduced a framework that aims to predict human plans and trajectories, a capability deemed highly valuable for effective human–robot collaboration (HRC). The authors emphasize that human plan prediction and trajectory prediction are closely related, which is in line with the observations of other researchers (e.g., Y-net, NSP). The paper focuses mainly on desktop assembly; however, the solution can be applied in other fields where the work of humans and machines needs to be coordinated. The framework has been tested on actual industrial machines, which supports its effectiveness and practical applicability. The components of the framework are algorithms for plan detection and trajectory prediction, and a robot behaviour planner. The plan detection algorithm is based on four components: trajectory prediction using LSTM neural networks, user target classification, user action classification using the Bayes algorithm, and a posteriori action correction. Together, these elements form an effective and accurate system that predicts human plans and trajectories and ensures their coordination with the robot. The work was validated using data collected in a real warehouse. The innovations of this paper include a robust plan recognition algorithm that combines neural networks with Bayesian inference, ensuring efficient and safe HRC. The framework not only predicts human trajectories but also recognizes high-level plans, enhancing the adaptability of robots to human actions. However, potential limitations arise from its reliance on motion labels obtained via neural networks, which might pose accuracy challenges. Despite its primary focus on desktop assembly, the framework’s broader applicability remains an area of interest. Future work should encompass comprehensive comparative studies and integration with diverse human–robot collaboration frameworks to further validate and expand upon the model’s advantages.
Table 3 categorizes the methods based on their evaluation approach, whether through simulation, benchmark datasets, synthetic datasets, or real-world case studies. From the presented papers, it is evident that benchmark datasets, particularly ETH/UCY and SDD, are the preferred choice for validation. Utilizing these datasets facilitates easy comparison of the methods in focus with existing state-of-the-art techniques. While real-world case studies are less common, they are crucial in ensuring that the proposed methods are not only theoretically robust but also practically applicable. The employment of simulation and synthetic datasets underscores the researchers’ intention to rigorously test their models in controlled settings before applying them to real-world contexts. Such approaches grant a deeper insight into the model’s behaviour under specific conditions.
Table 4 showcases a comparative analysis of models evaluated on the ETH/UCY and SDD datasets. The table delineates the performance of each model using two metrics: Average Displacement Error (ADE) and Final Displacement Error (FDE).
For the ETH dataset, ref. [67] stands out with the lowest ADE and FDE values. On the combined ETH/UCY average, ref. [69] exhibits the best performance in both ADE and FDE. In the SDD dataset, under different parameter settings, ref. [69] is notable for its performance in the k = 20 setting. The parameter k refers to the number of samples used for evaluating multi-modal predictions.
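The two metrics and the role of k can be made concrete with a short sketch. It assumes 2D trajectories given as lists of (x, y) points with equal length, and it takes the minimum ADE and the minimum FDE over the k samples independently; evaluation protocols differ between papers on this point, so the helper names and this convention are assumptions rather than the procedure behind Table 4.

```python
import math


def ade(predicted, truth):
    """Average Displacement Error: mean Euclidean distance over all time steps."""
    return sum(math.dist(p, t) for p, t in zip(predicted, truth)) / len(truth)


def fde(predicted, truth):
    """Final Displacement Error: Euclidean distance at the last time step."""
    return math.dist(predicted[-1], truth[-1])


def best_of_k(samples, truth):
    """Best-of-k evaluation for multi-modal predictors.

    `samples` holds k predicted trajectories; the lowest ADE and the lowest
    FDE among them are reported (taken independently in this sketch).
    """
    return min(ade(s, truth) for s in samples), min(fde(s, truth) for s in samples)
```

Because only the best of the k samples is scored, results for different values of k are not directly comparable: a larger k gives the model more chances to land near the ground truth.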
5.2. Probabilistic Methods
This subsection presents a series of articles that propose non-deterministic models for human trajectory prediction and that are distinct from neural network approaches.
In their research dedicated to human trajectory prediction, Petkovic et al. [71] propose a Hidden Markov Model (HMM) that uses the Viterbi algorithm. The warehouse in which the algorithm would operate uses collecting robots that are supposed to stop moving when they detect a possible collision with a human. The solution was validated with the average displacement error (ADE), which is by far the largest at the last prediction step. As a baseline for ADE, the authors use path interpolation, which produces impossible paths and also achieves worse ADE results. Hence, they were able to conclude that the model performs better than interpolation. The innovation introduced by the researchers is the use of a Hidden Markov Model combined with the Viterbi algorithm, which enables the estimation of worker intentions by analysing both observed and hypothesized motions and offers a more nuanced understanding of worker behaviour. However, the research has its limitations. The path interpolation baseline used for ADE validation was flawed, as it produced impossible paths. Moreover, the ADE was notably large at the final prediction step, highlighting potential issues in long-term forecasting. For future endeavours, refining the model for better long-term predictions, testing it in real-world conditions, and expanding the framework to handle more intricate warehouse scenarios are logical next steps to enhance prediction accuracy and overall system efficiency.
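As an illustration of how the Viterbi algorithm recovers the most likely sequence of hidden intentions from observed motions, a generic sketch is given below. It is the textbook algorithm in log space, not the model of [71]; the states, observation symbols, and probabilities in the usage example are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a discrete HMM (log-space Viterbi)."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Best log-probability of any path ending in each state at time 0.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at time t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((r, best[-1][r] + log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            scores[s] = score + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical example: a worker heading either to a shelf or to the dock,
# observed only through the coarse direction of motion.
STATES = ("to_shelf", "to_dock")
START = {"to_shelf": 0.5, "to_dock": 0.5}
TRANS = {
    "to_shelf": {"to_shelf": 0.8, "to_dock": 0.2},
    "to_dock": {"to_shelf": 0.2, "to_dock": 0.8},
}
EMIT = {
    "to_shelf": {"north": 0.9, "south": 0.1},
    "to_dock": {"north": 0.1, "south": 0.9},
}
```

Calling `viterbi(["north", "north", "south", "south"], STATES, START, TRANS, EMIT)` in this toy setting decodes a switch of intention from the shelf to the dock halfway through the observation sequence.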
The article by Löcklin et al. [72] discusses the challenges of predicting human movement trajectories in manufacturing. The authors present a schedule-based approach that uses real-time schedule data obtained from Manufacturing Execution Systems (MES). Applying scheduling information to human motion trajectory prediction enhances semantic mapping approaches: the number of possible destinations is reduced by considering the subsequent process steps for the goods currently in production. With a smaller set of destinations, the performance of future trajectory prediction can be improved. For evaluation, a commercial MES is used together with an Ultra Wideband (UWB)-based real-time localization system to obtain human position data. A naive Bayes classifier then uses the MES schedule and the real-time position data to predict human movement intentions. The abstract activity modelling ensures that only a few training datasets are required for implementation, making the approach suitable for rapidly changing manufacturing environments, such as flexible manufacturing. The model was also validated in a dedicated laboratory consisting of several workstations, a door to another workstation, a cafeteria, and a door that marked the end of a job. The key innovation lies in the introduction of a schedule-based approach for human trajectory prediction in manufacturing, leveraging real-time data from MES. This method improves trajectory predictions by narrowing potential destinations based on current production steps. However, the model’s primary limitation is its oversimplified view of human behaviour, as it assumes that movements are task-driven. Additionally, its dependency on MES and real-time locating systems may restrict its broader applicability. Future work should integrate more contextual information, especially the reasons for deviations from planned schedules, and explore the potential of the human–digital twin domain and context middleware.
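The combination of schedule information with a naive Bayes classifier can be sketched as follows. This is a simplified illustration of the idea, not the implementation in [72]: the destination names, the single `heading` feature, the probability tables, and the smoothing constant are all hypothetical.

```python
def predict_destination(prior, likelihood, observed, scheduled):
    """Naive Bayes posterior over destinations, restricted to the schedule.

    prior:      dict destination -> P(destination)
    likelihood: dict destination -> feature -> value -> P(value | destination)
    observed:   dict feature -> observed value (e.g., from the localization system)
    scheduled:  destinations allowed by the next process steps in the schedule
    """
    scores = {}
    for destination in scheduled:
        score = prior[destination]
        for feature, value in observed.items():
            # Small fallback probability for values unseen in training (assumption).
            score *= likelihood[destination][feature].get(value, 1e-6)
        scores[destination] = score
    total = sum(scores.values())
    return {destination: score / total for destination, score in scores.items()}


# Hypothetical example with three destinations, one of which is ruled out
# by the current schedule.
PRIOR = {"station_a": 0.4, "station_b": 0.4, "cafeteria": 0.2}
LIKELIHOOD = {
    "station_a": {"heading": {"east": 0.8, "west": 0.2}},
    "station_b": {"heading": {"east": 0.2, "west": 0.8}},
    "cafeteria": {"heading": {"east": 0.5, "west": 0.5}},
}
POSTERIOR = predict_destination(
    PRIOR, LIKELIHOOD, {"heading": "east"}, scheduled={"station_a", "station_b"}
)
```

Restricting the candidate set before normalizing is what lets the schedule sharpen the prediction: probability mass that would otherwise go to unscheduled destinations is redistributed among the remaining ones.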
Another publication, by Wang et al. [73], proposes a sequential similarity-based prediction approach that combines spatial and semantic contexts into a unified framework. The proposed method is evaluated using a real-world dataset that the authors collected themselves in a large shopping mall. The results show that it outperforms baseline methods and is suitable for real-world scenarios. The primary innovation is the introduction of a sequential similarity-based prediction approach that integrates both spatial and semantic contexts within a single framework. This method is specifically tailored for indoor environments, which are rich in spatial-semantic information but also present unique constraints. However, a potential limitation is the model’s reliance on a specific dataset from a large shopping mall, which might not generalize to all indoor environments. For future work, it would be beneficial to test the model in diverse indoor settings and explore refinements based on different spatial-semantic configurations.
In their comprehensive study, Gilles et al. [74] present a unified framework, known as THOMAS, for the prediction of multi-agent trajectories. The framework aims to provide efficient and consistent multi-modal predictions for the trajectories of multiple agents. A unified model architecture for the simultaneous estimation of future agent heatmaps is presented, using hierarchical and sparse image generation to enable fast and memory-efficient inference. A trajectory recombination model is also proposed that takes a set of predicted trajectories for each agent as input and produces a reordered, consistent recombination. The recombination module ensures that initially independent modalities are realigned to avoid collisions and maintain consistency. The results were presented for the Interaction multi-agent prediction challenge (Interaction 1.2 dataset [75]), showing that the proposed framework reached first place on the online test leaderboard. The primary innovation lies in the introduction of THOMAS. This framework combines a unified model architecture for the simultaneous estimation of future agent heatmaps with hierarchical and sparse image generation, ensuring rapid and memory-efficient inference. Furthermore, THOMAS introduces a novel trajectory recombination model which, by taking sets of predicted trajectories for each agent, produces a reordered recombination that ensures trajectories are consistent and collision-free. However, the framework’s generalizability across different datasets and scenarios still needs to be considered. For future work, given the significant performance increase observed with the THOMAS module, it would be beneficial to explore its integration with other trajectory prediction models. Additionally, further refinement of the recombination module to handle more complex scenarios and diverse agent behaviours could be a promising avenue.
The articles reviewed provide a comprehensive overview of various approaches to trajectory prediction. However, as highlighted in
Table 5, the methods employed for evaluation differ across the studies. This diversity in evaluation techniques poses a challenge when attempting to directly compare the efficacy and applicability of the proposed models.
5.3. TSP-Based Methods
This subsection presents publications that treat the problem of trajectory prediction in a warehouse as a travelling salesman problem (TSP) in an order-picking scenario.
A work that breaks out a bit from the array of methods about trajectory prediction is presented by Theys et al. [
76]. The paper pursued several goals related to the problem of sequencing and routing order pickers in warehouse systems. First, they evaluated reformulating and solving the problem as a classical TSP, which resulted in up to 47% average savings in routing distance using the LKH TSP heuristic. Second, they examined the potential usefulness of combining problem-specific solution concepts from dedicated heuristics with high-quality local search features. Finally, they investigated whether a subset of features can be used to generate high-quality solutions for routing warehouse order pickers, or whether it is necessary to use “state-of-the-art” local search heuristics. The innovative aspect is the reformulation of the problem as a classical Travelling Salesman Problem (TSP): using the Lin–Kernighan–Helsgaun (LKH) TSP heuristic, the team achieved substantial savings in routing distance. However, relying solely on construction heuristics has limitations. While such methods can harness the unique aspects of the Steiner TSP, their singular search strategies, such as aisle-by-aisle, can limit the exploration of the full solution space. Local search operators such as 2-opt, however, could effectively expand the searched solution area, leading to superior solutions without significant increases in computation time. Future work could delve into fine-tuning these hybrid heuristic methods and exploring their application to other logistics and routing challenges.
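The effect of a local search operator on a picking route can be illustrated with a minimal 2-opt sketch. This is not the LKH heuristic used by Theys et al. and it does not model the Steiner TSP on an aisle graph; it assumes a precomputed symmetric distance matrix (here, Manhattan distances between hypothetical pick locations), and the names `tour_length` and `two_opt` are illustrative only.

```python
def tour_length(tour, dist):
    """Length of the closed tour, including the return to the first stop."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def two_opt(tour, dist):
    """Reverse tour segments for as long as a reversal shortens the tour."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):      # position 0 (the depot) stays fixed
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(candidate, dist) < tour_length(best, dist) - 1e-9:
                    best = candidate
                    improved = True
    return best


# Hypothetical pick locations (x, y); Manhattan distance is an assumption.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[abs(ax - bx) + abs(ay - by) for bx, by in points] for ax, ay in points]

start_tour = [0, 2, 1, 3]                      # deliberately poor visiting order
better_tour = two_opt(start_tour, dist)
```

Recomputing the full tour length for every candidate keeps the sketch short; a practical implementation would evaluate only the two edges that change.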
An article that also presents a solution to the order picking problem as a TSP problem is presented by Ratliff and Rosenthal [
77]. In the paper, the authors argue mathematically that the order picking problem can be solved by constructing minimal subgraphs of the transitions needed to collect the products, and that this can be done on a microcomputer in a minute. However, this solution was not tested for scalability, and the warehouse was very small and simple because a mathematical simulation was used to validate the method. The innovation introduced was an algorithm capable of solving the order picking problem using minimal subgraphs. The limitations of the study stem from its scope. The research was based on a simple and small warehouse, and the mathematical simulation used for validation did not test the solution’s scalability. Furthermore, while the method proved efficient for warehouses with crossovers only at the ends of aisles, the inclusion of additional crossovers within aisles introduced complications. These complexities increased the number of equivalence classes to consider, making the method potentially impractical for warehouses with more than two or three internal aisle crossovers. Looking ahead, future research could explore optimizing this algorithm for warehouses with multiple internal aisle crossovers and testing its practicality and scalability in more diverse and larger warehouse configurations.
The research conducted by Zunic et al. [
78] serves as a sanity check for the design of warehouse order picking optimization systems. Their work focuses on developing a system that enhances the efficiency of order picking processes for warehouse workers, with the basis rooted in a practical, real-world challenge. The study found that the layout of the warehouse had little effect on the paths of the workers. Using the algorithm increased productivity by as much as 41% compared with the period before its introduction. They tested the method on data from a real warehouse. The system’s adaptability stands out, with its capacity to operate optimally across various warehouse layouts, bolstered by the introduction of fictive locations. However, there are tangible limitations: real-world factors such as item weight and fragility were somewhat sidelined, and while the algorithm’s efficacy was proven in the study’s context, its generalizability across all warehouse layouts is yet to be established. Future endeavours by the research team are inclined towards further minimizing the distance covered by workers during their daily routines. By strategically positioning frequently picked items near exits and clustering commonly ordered items together, the team anticipates achieving greater efficiency in warehouse operations.
The TSP-based methods for trajectory prediction in warehouses, as discussed in the presented studies, address the optimization of order picking scenarios. Referring to
Table 6, it is evident that the primary evaluation method is simulation, as highlighted by the works of Theys et al. [
76] and Ratliff and Rosenthal [
77]. While simulation-based evaluations give an understanding of the models’ potential, they may not fully represent real-world complexities. Conversely, Zunic et al. [
78] employed an empirical approach, testing their method with data from an actual warehouse. This real-world evaluation provides a direct assessment of the method’s practical applicability.
5.4. Algorithm-Based Methods
This subsection presents papers in which the authors propose solutions to problems involving human trajectories in the context of security, not necessarily by predicting those trajectories. It also includes papers in which the authors present framework systems to support warehouse security.
In their comprehensive study, Löcklin et al. [
79] delve into the potential of predicting warehouse workers’ trajectories based on historical movement data. This is achieved through the employment of a momentum algorithm, offering valuable insights for applications within factory and warehouse environments. Although the authors do not provide implementation details, they demonstrate the effectiveness of their solution by adding noise of 10 cm and 50 cm to the data. They note, however, that using too long a motion history can lead to larger prediction errors. The authors validated their solution on simulation data. The authors’ innovation lies in exploring the prediction of warehouse workers’ movements using a momentum-based approach combined with ultra-wideband-based Real-Time Locating Systems (RTLS), aiming to enhance human–robot collaboration in production settings. Their methodology displayed strengths, particularly in collision prevention for Automated Guided Vehicles (AGVs). However, there are inherent limitations: over-reliance on extensive motion history can inflate prediction errors, and the study used simulation data without diving into granular implementation specifics. The momentum-based method’s predictive scope remains confined, making it chiefly useful for short-horizon predictions. Looking ahead, there is a promising avenue in integrating worker destination predictions and considering floor plans to refine the trajectory predictions further.
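Since the reviewed paper gives no implementation details, the following is only a minimal sketch of what a momentum-style predictor could look like, assuming constant-velocity extrapolation from the most recent position samples. The function name `predict_momentum` and the parameters `dt`, `horizon_steps`, and `window` are hypothetical; `window` limits how much motion history is used, which relates to the observation that an overly long history can increase prediction error.

```python
def predict_momentum(history, dt, horizon_steps, window=3):
    """Extrapolate future (x, y) positions from the mean velocity of recent samples.

    history: list of (x, y) positions sampled every `dt` seconds, oldest first.
    window:  number of most recent displacement steps used for the velocity.
    """
    if len(history) < 2:
        raise ValueError("at least two positions are needed to estimate a velocity")
    recent = history[-(window + 1):]           # window steps need window + 1 samples
    steps = len(recent) - 1
    vx = (recent[-1][0] - recent[0][0]) / (steps * dt)
    vy = (recent[-1][1] - recent[0][1]) / (steps * dt)
    x0, y0 = history[-1]
    # Assumption: the worker keeps this velocity over the whole horizon.
    return [(x0 + vx * dt * k, y0 + vy * dt * k) for k in range(1, horizon_steps + 1)]


# Hypothetical track: a worker walking along the x-axis at 1 m/s.
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
forecast = predict_momentum(track, dt=1.0, horizon_steps=2)
```

Under this assumption the forecast degrades as soon as the worker turns or stops, which is consistent with the short prediction horizon noted for the momentum-based method.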
The master’s thesis by Rybecký [
80] deals with the problem of organizing an autonomous warehouse. The author proposes a solution in which autonomously moving units are controlled in an organized manner to achieve optimal routes, using an algorithm called Context-Aware Route Planning (CARP). The CARP algorithm is based on A* and is used to solve the Multi-Agent Path Finding (MAPF) task. It finds a starting window for each agent to deploy them in the graph at a given time, and then performs an A* search with a cost function that considers the time at which vertices are visited. The algorithm optimizes for the makespan, that is, the time taken to complete the task, and uses heuristics to estimate the cost to the goal. It expands vertices by considering neighbouring vertices and their time windows to determine reachable windows for each agent. If the goal vertex is reached and is free forever, the algorithm finishes and reconstructs the found path. The work has been validated both in simulation and in a professional laboratory setting, where multiple robots have been deployed. The innovation presented in this work is the organization of an autonomous warehouse through the Context-Aware Route Planning (CARP) algorithm. It addresses the Multi-Agent Path Finding (MAPF) challenge by determining optimal starting windows and employing a unique cost function. However, certain heuristic assumptions, such as the precise estimate to the goal, did not outperform standard measures such as the Euclidean distance. Future research avenues include evaluating heuristic properties, enhancing memory utilization in MAPF solving, and possibly adapting existing algorithms such as D*Lite for the MAPF context.
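The idea of an A* search whose cost depends on the time at which vertices are visited can be sketched as follows. This is a simplified stand-in, not the CARP implementation from the thesis: it assumes discrete time steps and a reservation set of `(vertex, time)` pairs instead of free time windows, it ignores edge (swap) conflicts, and the names `plan_with_reservations`, `neighbours`, `reserved`, and `heuristic` are hypothetical.

```python
import heapq
from itertools import count


def plan_with_reservations(neighbours, start, goal, reserved, heuristic, max_time=50):
    """A* over (vertex, time) states that avoids (vertex, time) pairs in `reserved`."""
    # Earliest time from which the goal is never reserved again ("free forever").
    goal_free_from = max((t for v, t in reserved if v == goal), default=-1) + 1
    tie = count()                                  # tie-breaker for equal priorities
    open_heap = [(heuristic(start), next(tie), 0, start, [start])]
    closed = set()
    while open_heap:
        _, _, t, vertex, path = heapq.heappop(open_heap)
        if vertex == goal and t >= goal_free_from:
            return path                            # path[k] is the vertex at time k
        if (vertex, t) in closed or t >= max_time:
            continue
        closed.add((vertex, t))
        for nxt in list(neighbours[vertex]) + [vertex]:   # move to a neighbour or wait
            if (nxt, t + 1) not in reserved:
                cost = t + 1                       # elapsed time, i.e. makespan so far
                heapq.heappush(
                    open_heap,
                    (cost + heuristic(nxt), next(tie), t + 1, nxt, path + [nxt]),
                )
    return None


# Hypothetical corridor A - B - C; another agent occupies B at time 1.
corridor = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
hops_to_c = {"A": 2, "B": 1, "C": 0}
path = plan_with_reservations(corridor, "A", "C", {("B", 1)}, hops_to_c.get)
```

In this example the agent waits one step at A before entering B, so the returned path has one more entry than the unobstructed route.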
In their insightful study, Hino et al. [
81] tackle the challenge of overseeing the operation of autonomous stacker cranes within a warehouse environment. Their research presents a careful consideration of the dynamics involved in the collision avoidance of these cranes. The authors proposed a simple collision avoidance algorithm for the movement of multiple stacker cranes. It first chooses a trajectory from several candidates held in a table containing trajectory information. If no trajectory can be chosen without collision, it attempts to adjust the trajectories to avoid collision. If collision still cannot be avoided, the crane transitions to a waiting state. If a deadlock situation occurs where two cranes have the same goal position and are both in the waiting state, the algorithm generates a trajectory to escape from the deadlock. The crane then moves according to the selected trajectory. The algorithm aims to optimize travel time by delaying movement or making detours when necessary; in the experiments, the authors verified the movement of two cranes using simulations. In addition to avoiding collisions, the method checks for deadlocks and allows stacker cranes to slow down and stop. In experiments, it transpired that the basic collision avoidance method was as good as their proposed method. Validation was performed on simulation data. The innovation of the study stems from the introduction of an algorithm that, beyond merely avoiding collisions, factored in crane dynamics and checked for potential deadlock scenarios. Despite its innovative approach, the study found that the basic collision avoidance method performed on par with their intricate algorithm in simulated experiments. The findings also revealed that the proposed approach improved efficiency by 11% compared to simpler methods, with computation times remaining practical. Future endeavours might delve deeper into refining the algorithm or expanding its applications in diverse warehousing scenarios.
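The decision order described above (try the candidates, then adjust, then wait) can be sketched in a few lines. The sketch rests on strong assumptions that are not taken from the paper: cranes move along a single shared rail, a trajectory is a list of integer positions per time step, "adjusting" means delaying the start, and deadlock escape is omitted. All function names and parameters are hypothetical.

```python
def collides(traj, other, gap=1):
    """True if the two cranes ever come closer than `gap` cells at the same step."""
    for k in range(max(len(traj), len(other))):
        a = traj[min(k, len(traj) - 1)]        # a crane stays at its final position
        b = other[min(k, len(other) - 1)]
        if abs(a - b) < gap:
            return True
    return False


def delay(traj, steps):
    """Adjust a trajectory by holding the start position for `steps` time steps."""
    return [traj[0]] * steps + list(traj)


def select_trajectory(candidates, other, gap=1, max_delay=3):
    """Return ("move", trajectory) or ("wait", None) for one crane."""
    for traj in candidates:                    # 1) a candidate that is already safe
        if not collides(traj, other, gap):
            return "move", list(traj)
    for steps in range(1, max_delay + 1):      # 2) the shortest delay that is safe
        for traj in candidates:
            shifted = delay(traj, steps)
            if not collides(shifted, other, gap):
                return "move", shifted
    return "wait", None                        # 3) no safe option: wait


# The other crane dwells at cell 3 for one step and then moves away.
delayed = select_trajectory([[1, 2, 3]], other=[3, 3, 4, 5], gap=2)
# The other crane ends on the cell this crane wants to reach.
blocked = select_trajectory([[0, 1, 2, 3]], other=[5, 4, 3], gap=1)
```

Trying shorter delays first reflects the stated aim of keeping travel time low; a fuller version would also need the detour and deadlock-escape steps.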
In their innovative study, Cantini et al. [
82] explore the use of GPS-enabled employee smartphones as a means of determining the location of forklifts within a warehouse. This approach contributes to enhancing safety measures within the warehouse environment. The trajectories were analyzed using what is known as a “spaghetti chart”, a technique for drawing consecutive trajectories overlapping each other. However, this was only a simple preliminary analysis that did not yield interesting results. Nevertheless, the method is one of those tested in a real-world scenario. The authors introduced the “smart spaghetti” chart to depict consecutive overlapping trajectories, a methodology that is unique within the current literature. However, this approach is not without its limitations. It functions as a retrospective, offline qualitative analysis, devoid of real-time monitoring and quantitative evaluations of reliability. While it is adept at pinpointing safety issues, the verification of suggested solutions remains challenging; it often requires supplementary methods such as logistic simulation. Looking ahead, there is potential to extend its use to varied warehouse sectors. Additionally, refining the model to forecast near misses in real time based on worker movements is a promising avenue. Harmonizing the findings with predictive techniques or hands-on validation could further enhance its utility.
The study conducted by Niu et al. [
83] presents an innovative algorithm based on Model Predictive Control (MPC) and designed for Unmanned Aerial Vehicles (UAVs). This algorithm is intended to predict the trajectories of neighbouring UAVs in situations where inter-UAV communication is not possible. The proposed algorithm is then integrated with the distributed MPC (DMPC) framework to realize trajectory planning of multiple UAVs in an environment with static obstacles. In simulations of two scenes, the authors showed that the proposed method is feasible and effective. Moreover, this approach, referred to as communication-free MPC, can attain performance similar to DMPC with communication, with only a minor increase in computation time. This performance is significantly better than that of a DMPC algorithm that treats neighbours as fixed-speed units. In this paper, the proposed method is implemented in sequential form. This algorithm uniquely predicts the trajectories of neighbouring UAVs, even in scenarios where there is an absence of inter-UAV communication. This allows UAVs to navigate around static obstacles without the standard communication protocols, challenging the traditional norms of UAV navigation. The current model of this algorithm operates in a sequential form. While it has showcased efficacy in simulations, its real-world applicability remains to be tested. As a next step, the method should evolve into a distributed form. The method should also be accompanied by real-world UAV testing, opening avenues for more extensive applications and refinements.
Subsequently, Kanai et al. [
84] introduce a predictive control methodology aimed at generating cooperative movement among heterogeneous agents working within constrained spaces, such as warehouses. The proposed method formulates a goal function and a dynamic model. This model amalgamates the state and inputs of all agents. Simultaneously, it incorporates constraints rooted in each agent’s specifications, including aspects such as the size and limitations of the actuators. Collision avoidance is also achieved using an occupancy grid map to account for obstacles of any shape in the target area. The effectiveness of the proposed method was demonstrated in a numerical simulation of heterogeneous mobile robots operating in a warehouse model. The authors introduce an innovative approach characterized by the integration of a goal function and a dynamic model. This model seamlessly integrates the states and inputs of all involved agents, meticulously factoring in constraints associated with each agent’s specific characteristics, such as size and actuator capabilities. A notable highlight is the adoption of an occupancy grid map, ensuring robust collision avoidance not only between agents but also with obstacles of varying shapes within the designated area. However, the method’s efficacy has been validated solely through simulations; hence, testing in real-world scenarios is an imperative next step. Future work might explore refining this model for different confined environments or incorporating sophisticated sensing mechanisms to enhance agent collaboration.
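How an occupancy grid map can account for obstacles of arbitrary shape and for agents of different sizes is illustrated by the following minimal sketch. It covers only a footprint check, not the predictive controller proposed by Kanai et al., and it assumes a square footprint aligned with the grid; `footprint_is_free` and its parameters are hypothetical names.

```python
def footprint_is_free(grid, row, col, half_size):
    """True if every cell under a square footprint is inside the map and unoccupied.

    grid:      list of rows, where 1 marks an occupied cell and 0 a free cell.
    half_size: footprint radius in cells (0 covers one cell, 1 covers 3 x 3 cells).
    """
    for r in range(row - half_size, row + half_size + 1):
        for c in range(col - half_size, col + half_size + 1):
            if r < 0 or c < 0 or r >= len(grid) or c >= len(grid[0]):
                return False                   # part of the footprint leaves the map
            if grid[r][c]:
                return False                   # part of the footprint hits an obstacle
    return True


# Hypothetical 5 x 5 warehouse patch with a single occupied cell at (2, 3).
warehouse = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
small_robot_ok = footprint_is_free(warehouse, 2, 2, half_size=0)
large_robot_ok = footprint_is_free(warehouse, 2, 2, half_size=1)
```

The same cell is admissible for the small agent but not for the large one, which is the kind of agent-specific constraint a heterogeneous fleet has to respect.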
In another significant contribution to the field, Lu et al. [
85] present an algorithm designed for dynamic order picking in warehouse operations. This method, known as the Interventionist Routing Algorithm (IRA), is geared towards devising optimal routes for order pickers. The algorithm was modified to allow an operator to start an order-picking route from inside an aisle. This change required the development of new initiation procedures and the identification of new arc configurations to allow the forklift to leave the aisle. A new type of travel zone was introduced to increase the flexibility of the algorithm. The algorithm now uses two types of travel areas, One-Way (OW) and Round-Trip (RT), which are defined based on the relative locations of the picker, magazine, and all other retrieved items. The Round-Trip area was previously studied by Ratliff and Rosenthal in 1983. In addition, seven new PRS equivalence classes were added to the six PRS equivalence classes previously proposed for the Round-Trip area. Six of these were proposed for the newly introduced One-Way area, while a seventh was added to the Round-Trip area. Finally, to adapt to the new One-Way areas and PRS equivalence classes, five additional tables were developed for route construction procedures, and two tables from Ratliff and Rosenthal’s work were altered by the addition of one row each. Based on the performed simulations, the scalability of the algorithm was confirmed. The authors have introduced the Interventionist Routing Algorithm (IRA), marking a significant innovation in warehousing strategies. A standout attribute of the IRA is its ability to start order-picking directly from within an aisle, necessitating the development of novel initiation procedures and arc configurations to accommodate forklift operations. While simulations underscore its potential, real-world validation remains pending.
Future research could delve into its adaptability in diverse warehousing scenarios or compare it with emerging order-picking methodologies.
The publication by [
86] proposed an intelligent agent-based model for solving the order picking problem in an industrial warehouse with multiple storage locations. The authors combined elements of hierarchical and heterarchical frameworks in the modelling framework, resulting in a hybrid model. To overcome the rigidity and inflexibility of the hierarchical model, the authors used a real-time task assignment negotiation mechanism. Lower-level agents negotiate within the boundaries set by higher-level agents, resulting in both horizontal and vertical negotiation. Optimization algorithms, learning, databases, and knowledge bases were used by the agents to make better decisions. The authors used the proposed framework to solve the problem of allocating orders to picking zones through intelligent agents. Results using real system data showed that the proposed framework provided better throughput than the hierarchical model. This demonstrated flexibility, robustness, and fault tolerance to unforeseen events such as machine failure. Overall, this publication presents a novel approach to solving complex order picking problems in industrial warehouses using intelligent agent-based models. The authors have introduced a groundbreaking model tailored to address the order picking problem in industrial warehouses with multiple storage locations. This novel approach employs a real-time task assignment negotiation mechanism to surmount the inherent structural rigidity of traditional hierarchical models. While the proposed framework has been validated with real system data and exhibits superior throughput compared to the standard hierarchical model, the study’s results are still limited to simulations and specific datasets. It remains to be seen how this model responds across varied warehouse configurations or in environments with different unpredictabilities beyond machine breakdowns. 
Future avenues could delve into a broader validation across varied warehouse layouts, incorporate cutting-edge AI and machine learning for enriched agent decision-making, and probe the model’s synergy with burgeoning technologies such as IoT and robotics.
In their valuable research, Chen et al. [
87] put forth a real-time routing methodology for warehouses with multiple order pickers. The approach is based on an online Ant Colony Optimization (ACO) algorithm, which demonstrates its potential in managing congestion within the warehouse environment. The method aims to optimize picking routes for multiple pickers, taking into account congestion and unstable picking times. For each picker, a default route is generated by ACO and then coordinated to avoid congestion during the picking service. The proposed method improves the overall service time and throughput of the order picking process. A simulation was conducted to evaluate the effectiveness of A-MOP-NPT, which showed promising results in dealing with congestion and increasing system throughput. However, it should be noted that the current study assumes a pick-by-order policy and no capacity limit for pickers, which may not reflect real-world scenarios. The researchers innovatively employed the Ant Colony Optimization (ACO) algorithm to manage warehouse congestion. They generated a default route for each picker, minimizing congestion. However, the approach assumes a pick-by-order policy and overlooks picker capacity limits, limiting real-world applicability. Future work should address these limitations, test the model in varied warehouse setups, and consider integrating IoT and other smart technologies for enhanced decision-making.
The algorithm-based methods for trajectory prediction, as outlined in the discussed studies, provide a diverse range of solutions tailored to warehouse security and efficiency. Referring to
Table 7, it is evident that a significant portion of the research relies on simulations for validation. The diverse evaluation methods used in these studies, especially the inconsistencies arising from comparing results obtained in controlled simulations, underscore the challenges of directly comparing their outcomes and effectiveness.
5.5. Miscellaneous
The final subsection of this review encompasses relevant works whose findings and insights are applicable to the subject matter at hand.
The research conducted by Halawa et al. [
88] introduces an approach reliant on location data derived from Ultra Wideband (UWB) technology. This methodology is proposed as a means to enhance safety measures and operational efficiency within warehouse environments. The study’s authors point to the benefits of indicating to an employee the specific location of where to perform an operation, leading to reduced errors and increased efficiency. Unlike Automated Storage and Retrieval Systems (ASRS), the authors of the study focus on solving the problem of the movement of all forklifts. The paper proposes a three-step framework, which consists of: (1) Selecting location calculation technology (UWB). (2) Integrating UWB with WMS and FFMS systems. (3) Conducting analysis and making appropriate changes. The authors focused on metrics such as brake severity, congestion identification, route policy involvement, driver behaviour at intersections, speed in zones, forklift accidents, and forklift errors. In a comparison of location accuracy, the authors found that cameras are the most accurate, but also the most expensive. UWBs are the second most accurate solution, and also one of the cheaper positioning methods. This work aligns well with the broader context of improving warehouse safety and operational efficiency. Furthermore, the authors’ implementation of the proposed solution in a real-world warehouse setting enhances the relevance and applicability of their findings. The innovation that has been introduced was a novel three-phase framework. Unique insights are offered through heat maps and a data-refining algorithm, enriching warehouse decision-making processes. However, the research faces limitations, including noise interference from RTLS affecting precise location determination and the lack of real-time data analysis. Moreover, synchronization inconsistencies between different warehouse systems were identified. 
Moving forward, refining RTLS noise handling, focusing on real-time data interpretation, incorporating broader supply chain considerations, and aiming for an automated Industry 4.0-aligned decision support system are potential areas for future exploration.
In their research, Jiang et al. [
89] propose an advanced logistics monitoring system. This system, designed to enhance the efficiency and intelligence of logistics management, incorporates RFID sensor networks and leverages big data technologies. The framework system is built on a wireless sensor network platform and combines inbound and outbound logistics operations, warehouse positioning, and distribution monitoring management. The article tests the monitoring system in a real warehouse and shows that it has practical application value. Furthermore, the use of such a system can improve the level of informatization and intelligence of logistics management. The innovation introduced is the logistics monitoring system itself. Specifically, this system, rooted in a wireless sensor network platform, streamlines both inbound and outbound logistics operations, warehouse positioning, and distribution monitoring. When tested in real-world conditions, the system exhibited significant practical application potential and demonstrated its capability to raise the level of informatization and intelligence of logistics management. However, a significant limitation surfaced during the study: the system, while efficient and user-friendly, overlooks essential security considerations. For future enhancements, a comprehensive security review and implementation are crucial to ensure the integrity and safety of the logistics data being managed.
Next, Tsymbal et al. [
90] present a proposal for analysing the use of mobile robots to transport and manipulate goods within a manufacturing workspace. The level of intelligence required for these tasks is determined by their complexity, adaptability, and ability to respond to dynamic interactions. The proposed system is based on a logical decision-making model and is implemented using a robotic warehouse model. Validation of this solution was based on miniature autonomous unmanned vehicles in the laboratory. The innovation presented in their work is a novel system for analysing the use of mobile robots in manufacturing workspaces to transport and manipulate goods. Their system is unique in that it is grounded in a logical decision-making model and is implemented using a robotic warehouse model. The study’s distinctiveness is further underscored by its validation method, which employed miniature autonomous unmanned vehicles in a laboratory environment. However, the study highlighted several limitations. One of the primary concerns is the noise interference in the Real-Time Location System (RTLS), causing discrepancies in the exact positioning of forklifts—a critical issue given the precision required in narrow aisles of warehouses. Although they designed an algorithm to mitigate this, complete resolution of the RTLS noise remains a challenge and necessitates further refinement, potentially involving improved hardware technology. Another limitation was the absence of real-time data analysis, an essential component for Industry 4.0. For future work, there is a need for advancing RTLS technology to address noise and signal strength. The frameworks should be developed to synchronize warehouse data, creating real-time analysis systems equipped with cutting-edge machine learning methods, integrating RTLS for both items and forklifts tracking, and devising an automated decision support system embedded within an Industry 4.0 platform.
In a related study, Han et al. [
91] propose a novel method for order picking within a logistics setting. This method employs a combination of an HC strategy and a k-opt-based algorithm, demonstrating its potential within a multi-UAV system in an intelligent warehouse. They showed that their method outperformed other strategies and algorithms in terms of convergence time and lead time, even with an increase in order arrival rate. The proposed approach is suitable for logistics environments that use a lot of UAVs and require fast work. However, the authors acknowledged that some assumptions were made and further research is needed to account for UAV-to-UAV communication and different situations such as order removal and change. The approach was validated on a warehouse simulator. In conclusion, the authors’ work provides a novel approach to improving order picking performance in logistics environments. The novelty of this research is based on the introduction of a new method. This dynamic path planning approach allows for immediate assignment of orders, offering a notable improvement in convergence time and lead time, particularly with rising order arrival rates. Despite its notable advancements, the study does have its constraints. The research assumes a fixed starting and ending point at a depot and does not account for potential UAV collisions. Furthermore, the centralized nature of the logistics allocation means there has to be a robust connection between all UAVs and the central network, which could pose challenges in practical applications. For future work, there are several avenues for enhancement and further exploration. Future studies will delve into facilitating order batching through inter-UAV communication to decentralize control. There is a recognized need to consider real-world scenarios beyond just adding new items, such as order removals and changes. 
Lastly, to ensure the method’s real-world applicability, future validation should extend beyond simulations to real warehouse environments.
In their innovative study, Yoshitake et al. [
92] propose a new system for inventory picking and shelf sorting, known as the ShelfMigrant AGV system. This system employs a real-time holonic scheduling method, aimed at minimizing the waiting time of pickers within a warehouse setting. They evaluated the proposed system using a simulator and found that it improved picking productivity by reducing the waiting time of pickers compared to the conventional system. The proposed system is suitable for large warehouses with orders with higher variation and lower volume. In future work, the authors plan to reduce the computational cost of the system and test it in actual warehouses. Overall, the authors’ work provides a new approach to improving picking productivity in large warehouses with mixed and smaller order volumes using the ShelfMigrant AGV picking system. The approach was validated using a warehouse simulator. The primary innovation lies in its use of AGVs to transport both inventory and sorting shelves directly to pickers. Unlike conventional systems where sorting shelves are stationary or moved only after sorting tasks are complete, the ShelfMigrant AGV system can transport a sorting shelf even in the midst of its sorting, optimizing productivity. The research, while promising, is currently validated only through simulation. This means the results, while indicative, are yet to be proven in real-world conditions. There is also a recognition of computational costs associated with the proposed system, suggesting potential scalability or efficiency challenges. Future endeavours by the authors should focus on reducing the computational cost of the ShelfMigrant AGV system, making it more efficient for real-world application. Furthermore, the system needs to be tested in actual warehouse environments to validate its practical effectiveness beyond simulation.
In the publication by Binos et al. [93], an intelligent agent-based framework is introduced to enhance warehouse management systems (WMS) in dynamic demand environments. The authors provide a novel framework that offers a threefold contribution, encompassing design principles and approaches for efficient WMS implementation. First, it detects warehouse exceptions in real time, before they escalate into disruptions. Unexpected events can have a significant impact on productivity and efficiency, even when the warehouse processes that dictate a smooth flow of products are followed. Second, although research offers new algorithms and optimization processes to improve the efficiency of warehouse processes, these are not easily incorporated into existing WMS. Constructing service agents in this model facilitates the addition of new features, which can then be applied as required. Their application is based on real-time environmental constraints, and the features can be evaluated both in real time and historically through data analysis. Third, the decision support aids warehouse decision makers by addressing cognitive, memory, and time constraints. It does this by providing exception information packages and potential solution scenarios, which facilitates faster resolution of warehouse exceptions that require human involvement. The framework has been tested on a warehouse simulation. The innovation of this work is an agent-based WMS framework tailored to the complexities of dynamic e-commerce environments. It combines real-time warehouse exception detection, a modular mechanism for integrating new optimization algorithms via service agents, and a decision support infrastructure designed to speed up human intervention. However, the framework’s real-world efficacy remains untested, as its validation is primarily through simulation. Future research should consider integrating diverse artificial intelligence methods to improve operational robustness and should employ performance indices such as stock-out rate and shipping accuracy for a more granular evaluation.
The study conducted by Ding [94] investigates the implementation of a Smart Warehouse Management System (SWMS) based on the Internet of Things (IoT). The study emphasizes the advantages of incorporating sensors within the warehouse environment. The main benefits pointed out are control over the entry and exit of goods, increased warehouse efficiency, a reduced possibility of errors, and a decrease in labour and thus in costs. The innovation introduced in this work is threefold. First, it introduces a Smart Warehouse Management System, which is a significant deviation from traditional warehouse management models. Second, it integrates advanced sensor technology into the system. Third, the system not only gathers data but also intelligently processes and controls the storage entry and exit and cargo handling processes by leveraging internet and cloud computing technologies. This is a significant step toward real-time data processing and decision-making in warehouses. However, the study lacks a detailed comparison with other modern systems and empirical validation of its advantages. Future research should carry out comparative analyses, explore integration with evolving technologies, and assess adoption barriers among enterprises.
In a case study exploring the utilization of Artificial Intelligence (AI) in e-commerce fulfilment, Zhang et al. [95] present their research on resource orchestration at Alibaba’s Smart Warehouse. This paper offers valuable insights into the practical implementation and benefits of AI within a real-world warehouse setting, specifically focusing on Alibaba’s operations. The conclusions reached by the authors are: (1) Data, AI algorithms, and robots are significant resources in developing AI capabilities. (2) Orchestration of AI resources and other related resources leads to the development of strong AI capabilities. (3) AI capabilities interact and co-evolve with human capabilities to create value. (4) Interactions and co-evolution involving AI and human workers depend on the type of task. (5) AI applications create business value in terms of efficiency (e.g., space optimization, labour productivity) and effectiveness (e.g., error reduction) by automating, extending, and transforming key business processes. The innovation of the research lies in its resource orchestration perspective, which shows that for AI to be effective, resources such as data, algorithms, and robots need to be orchestrated with other organizational elements. Notably, the study underscores the mutual co-evolution of AI capabilities with human skills, emphasizing their combined potential in enhancing business processes. However, the research is not without limitations. Its case-study nature confines it to a specific context, and thus it may not be statistically generalizable across different industries or settings. The research focuses exclusively on certain business processes, omitting others such as goods receiving and outbound logistics. Additionally, its internal focus omits possible external influences on AI integration. Future research avenues follow from these limitations: a broader, multi-industry examination to attain statistical generalizability, exploration of AI in other warehousing processes, and a more expansive investigation that takes into account external macro-level influences such as government policies or societal beliefs. This would pave the way for a holistic understanding of AI’s integration across varying contexts and conditions.
The paper presented by Song et al. [96] is also a literature review. Although it mainly deals with the application of IoT in smart logistics, it includes a section on smart warehouses. There, the authors mainly discuss environment sensing in the context of security, warehouse layout optimization, and warehouse management in general. As difficulties of the field, they mention (1) data security, (2) data privacy, and (3) managing the range of different solutions and their integration with each other. The innovative aspect of the paper is that it does not merely compile existing knowledge but critically examines the deployment of IoT across various logistics functions, from transportation to warehousing and distribution. Its explicit focus on environment sensing, warehouse layout optimization, and comprehensive warehouse management gives the warehouse section depth, and its discussion of the predominant challenges in the sector offers practical insights. However, the paper’s overarching nature is a limitation. As a literature review, it may lack empirical evidence, hands-on experiments, or practical case studies to complement the theoretical discussion. For future work, several research challenges within IoT-based smart logistics remain open. Given how rapidly the field is expanding, empirical studies addressing these challenges, such as data security and integration issues, represent a promising avenue.
In their article, Gils et al. [97] highlight the significance of integrating order picking planning problems. The authors provide an in-depth review and classification of the current state of the art in this domain, emphasizing the importance of considering multiple planning problems to design efficient order-picking systems. In particular, the review demonstrates the following: (1) Problem interaction analysis that evaluates combinations of problems with different planning horizons. (2) Effective solution of integrated operational problems using metaheuristics. (3) Underexplored relevant research opportunities in order picking optimization. The innovativeness of the research lies in its comprehensive review and classification, which underscores the interplay between various planning problems, and in the resulting guidelines, which offer practical solutions for warehouse managers. Yet, the study recognizes its limitations, notably the small sample size for many problem combinations and the dearth of in-depth investigations of many of these combinations. Future research opportunities emerge in developing integrated models for specific planning combinations, refining heuristic algorithms tailored for real-world scenarios, and extending literature reviews and empirical studies to underexplored problem combinations.
The paper by Vanheusden et al. [98] sheds light on the gap between academic research and practical implementation in the context of order-picking planning. The authors argue that while academic research yields valuable insights, there is often a lack of effective translation into practical warehouse management policies. The paper provides a comprehensive classification and review of the current state of the art, aiming to bridge the divide between theoretical advancements and practical considerations in order-picking planning. According to the authors, simplistic approaches favoured by warehouse managers can lead to suboptimal results. They suggest that future research should prioritize practical factors and practitioner insights into order-picking operations. Specifically, research should focus on identifying bottleneck drivers, examining warehouse safety, analyzing the impact of similar products on picking times, and developing decision-making tools for balancing workload. In addition, research efforts should evolve toward integrated solution algorithms that take into account various practical factors and interdependencies among planning problems. Access to real-world data is key to building realistic models, and such work can support future studies of practical considerations in automated systems. The innovation introduced by the paper is bridging the divide between academic insights and practical needs in order-picking planning. By identifying often-neglected practical factors and their implications, the study reorients the focus from purely theoretical models to more actionable insights for warehouse managers. While its strength lies in a comprehensive review and classification, the paper primarily uses existing literature and may lack new empirical findings. Future work can prioritize empirical validation of these practical factors, develop algorithms informed by real-world data, and foster stronger academic–practitioner collaborations, especially as automated systems become more prevalent.