**1. Introduction**

Optimization problems have been one of the most important research topics in recent years. They arise in many domains, such as scheduling [1,2], image processing [3–6], feature selection [7–9,13] and detection [10], path planning [11,12], cyber-physical social systems [14,15], texture discrimination [16], saliency detection [17], classification [18,19], object extraction [20], shape design [21], big data and large-scale optimization [22,23], multi-objective optimization [24], knapsack problems [25–27], fault diagnosis [28–30], and test-sheet composition [31]. Metaheuristic algorithms [32] are nature-inspired theoretical tools that have been used extensively to solve highly nonlinear, complex, multi-objective optimization problems [33–35]. Several studies [36–38] have compared popular stochastic metaheuristics with deterministic Lipschitz methods by using operational zones. Most metaheuristics are inspired by natural or physical processes; examples include the bat algorithm (BA) [39], biogeography-based optimization (BBO) [40], ant colony optimization (ACO) [41], the earthworm optimization algorithm (EWA) [42], elephant herding optimization (EHO) [43,44], the moth search (MS) algorithm [45], the firefly algorithm (FA) [46], artificial bee colony (ABC) [47–49], harmony search (HS) [50,51], monarch butterfly optimization (MBO) [52,53], particle swarm optimization (PSO) [54,55], genetic programming [56], krill herd (KH) [57–63], the immune genetic algorithm (IGA) [64], and cuckoo search (CS) [65–69].

Yang and Deb [69] proposed the CS algorithm, a metaheuristic optimization method inspired by the brood parasitism of certain cuckoo species, which lay their eggs in the nests of other host birds.
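For reference, the basic CS loop can be sketched as below. This is a minimal illustration under common assumptions, not the original authors' code: new solutions are produced by Lévy flights generated with Mantegna's algorithm (exponent `beta = 1.5`), scaled by `alpha = 0.01` and by each nest's distance to the current best; then each solution component is "discovered" with probability `pa` and rebuilt by a random differential step between two other nests. Greedy replacement keeps the better of the old and new egg. The function names and default parameters are hypothetical.

```python
import math
import numpy as np

def levy_step(beta, size, rng):
    """Draw Levy-distributed step lengths with Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(f, dim, n_nests=15, pa=0.25, alpha=0.01, beta=1.5,
                  iters=500, lb=-5.0, ub=5.0, seed=0):
    """Minimal CS sketch (assumed parameters); returns (best_x, best_f)."""
    rng = np.random.default_rng(seed)
    nests = rng.uniform(lb, ub, (n_nests, dim))
    fit = np.array([f(x) for x in nests])
    for _ in range(iters):
        best = nests[np.argmin(fit)].copy()
        # Phase 1: Levy-flight moves, scaled by each nest's distance to the best
        for i in range(n_nests):
            step = alpha * levy_step(beta, dim, rng) * (nests[i] - best)
            cand = np.clip(nests[i] + step * rng.normal(size=dim), lb, ub)
            fc = f(cand)
            if fc < fit[i]:  # greedy replacement keeps the better egg
                nests[i], fit[i] = cand, fc
        # Phase 2: each component is discovered with probability pa and
        # rebuilt by a random differential step between two other nests
        mask = rng.random((n_nests, dim)) < pa
        diff = nests[rng.permutation(n_nests)] - nests[rng.permutation(n_nests)]
        cands = np.clip(nests + rng.random() * diff * mask, lb, ub)
        for i in range(n_nests):
            fc = f(cands[i])
            if fc < fit[i]:
                nests[i], fit[i] = cands[i], fc
    k = int(np.argmin(fit))
    return nests[k], fit[k]
```

For example, `cuckoo_search(lambda x: float(np.sum(x ** 2)), dim=2)` minimizes the 2-D sphere function. Because both phases use greedy replacement, the best fitness never gets worse from one generation to the next.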

CS performs local search well in most cases, but it sometimes fails to escape from local optima, which limits its ability to search globally. To enhance CS, Mlakar et al. [70] proposed a hybrid self-adaptive CS algorithm with three features: self-adaptation of the CS control parameters, linear population-size reduction, and balancing of the exploration search strategies. Li et al. [71] enhanced the exploitation ability of CS by using an orthogonal learning strategy, and Ouaarab et al. [72] presented an improved discrete version of CS.

On the other hand, most researchers agree that the performance of algorithms can be improved by using learning techniques. For example, Wang et al. [73] presented a method that increases learning speed and improves final performance by directly tuning the Q-values to affect the action-selection policy. Alex et al. [74] presented an evolutionary cooperative learning scheme that solves function approximation and classification problems with improved accuracy and generalization. Hojjat et al. [75] presented a CS variant named snap-drift cuckoo search (SDCS), in which a snap-drift learning strategy is employed to improve the search operators. This strategy provides an online trade-off between local and global search through two modes, snap and drift.

Although much effort has been made to enhance the performance of CS, many variants still fail to improve it on certain complicated problems. Furthermore, few studies have optimized the parameters of CS by using a learning strategy. In this paper, we present an improved CS algorithm, called the dynamic step size cuckoo search algorithm (DMQL-CS), which combines *Q*-Learning with genetic operators. The step size strategy of traditional CS evaluates an individual only by the fitness change produced by a single evolution step (the one-step evolution effect) and ignores the effect of the step size over several subsequent steps (the multi-step evolution effect), which hinders the evolution of the algorithm. We therefore use *Q*-Learning to optimize the step size, so that the most appropriate step size control strategies are retained for the next generation. At the same time, their weights are adaptively adjusted through a learning rate, which guides individuals toward better solutions in the next evolution. In addition, crossover and mutation operations are added to DMQL-CS to accelerate convergence and increase population diversity.
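To make the multi-step argument concrete, the sketch below shows a generic tabular *Q*-learning loop that chooses among several candidate step-size scales. It is not the DMQL-CS model, whose state, action, and reward definitions are given in Section 4; the class name `StepSizeQLearner`, the two-state encoding (last move improved vs. stagnated), the reward (fitness improvement), and the three step-size scales are all hypothetical. The discount factor `gamma` is what lets a step size be credited for its effect over several later steps rather than only the next one.

```python
import random

class StepSizeQLearner:
    """Hypothetical tabular Q-learner that picks one of several step-size scales."""

    def __init__(self, step_sizes, n_states, lr=0.1, gamma=0.9, eps=0.1, seed=0):
        self.step_sizes = list(step_sizes)  # candidate actions
        self.q = [[0.0] * len(self.step_sizes) for _ in range(n_states)]
        self.lr, self.gamma, self.eps = lr, gamma, eps
        self.rng = random.Random(seed)

    def choose(self, state):
        # Epsilon-greedy: explore a random step size with probability eps
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.step_sizes))
        row = self.q[state]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, state, action, reward, next_state):
        # Q-learning backup: the discounted max over the next state carries
        # the delayed (multi-step) effect of this step size back to Q(s, a)
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.lr * (target - self.q[state][action])


def demo(iters=200, seed=1):
    """Toy run: minimize x**2 with step sizes chosen by the learner."""
    learner = StepSizeQLearner([0.01, 0.1, 1.0], n_states=2, seed=seed)
    rng = random.Random(seed + 1)
    x, state = 5.0, 0  # state 0: last move improved; state 1: stagnated
    for _ in range(iters):
        a = learner.choose(state)
        cand = x + learner.step_sizes[a] * rng.gauss(0.0, 1.0)
        reward = max(0.0, x * x - cand * cand)  # fitness improvement as reward
        if reward > 0.0:
            x = cand  # greedy acceptance, as in CS
        next_state = 0 if reward > 0.0 else 1
        learner.update(state, a, reward, next_state)
        state = next_state
    return x, learner.q
```

In a population-based setting, each individual (or the population as a whole) would query `choose` before moving and call `update` after its fitness is re-evaluated; how DMQL-CS actually defines these quantities is described in Section 4.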

The present manuscript differs from similar work in its use of learning based on *Q*-Learning and genetic operators. *Q*-Learning considers the multi-step evolution effect of each individual, so that the most appropriate step size control strategies are retained for the next generation. The main contributions of the proposed DMQL-CS approach lie in the following two aspects:


of the CS algorithm, numerous strategies have been designed to adjust the crossover rate. In this work, a self-adaptive scheme is used to adjust the crossover rate. Genetic operators expand the search area of the population, which improves exploration and maintains the diversity of the population of learners.
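The crossover-rate self-adaptation actually used by DMQL-CS is detailed in Section 4. Purely as a stand-in illustration, the sketch below uses the well-known jDE-style rule from self-adaptive differential evolution, in which an individual's crossover rate is regenerated uniformly at random with probability `tau`, together with binomial crossover and gene-wise Gaussian mutation. These operators and all names (`self_adaptive_cr`, `crossover`, `mutate`) are assumptions, not the paper's exact scheme.

```python
import numpy as np

def self_adaptive_cr(cr, rng, tau=0.1):
    """jDE-style rule (assumed): with probability tau, redraw CR uniformly in [0, 1)."""
    return rng.random() if rng.random() < tau else cr

def crossover(parent, donor, cr, rng):
    """Binomial crossover: take each gene from the donor with probability cr."""
    mask = rng.random(parent.shape) < cr
    mask[rng.integers(parent.size)] = True  # guarantee at least one donor gene
    return np.where(mask, donor, parent)

def mutate(x, rate, sigma, lb, ub, rng):
    """Gaussian mutation applied gene-wise with probability rate, then clipped."""
    mask = rng.random(x.shape) < rate
    return np.clip(x + mask * rng.normal(0.0, sigma, x.shape), lb, ub)
```

A typical use per individual is `cr = self_adaptive_cr(cr, rng)`, then `child = crossover(x, donor, cr, rng)`, then `child = mutate(child, 0.1, 0.5, lb, ub, rng)`, keeping the child only if it improves fitness. Carrying `cr` with each individual lets rates that produce successful children survive, which is the intuition behind self-adaptation.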

Finally, DMQL-CS was tested on 15 benchmark functions, the CEC 2013 test suite, and a logistics distribution center location problem. Comparisons with other approaches demonstrated the superiority of the proposed strategy: a series of simulation experiments showed that DMQL-CS outperforms the other evolutionary methods in both solution quality and convergence rate.

The remainder of this paper is organized as follows. Section 2 reviews related work on cuckoo search. Section 3 presents the cuckoo search algorithm. Section 4 describes the proposed DMQL-CS algorithm, including the *Q*-Learning model, the step size control model with *Q*-Learning, and the genetic operators. Section 5 compares DMQL-CS with other methods on 15 benchmark functions, the CEC 2013 test suite, and the logistics distribution center location problem. Finally, Section 6 concludes the paper and points out future research directions.
