**1. Introduction**

The cell formation (CF) problem has attracted both academic researchers and practitioners since it was introduced as part of group technology (GT) [1]. The initial step of CF is to create machine cells and their associated part families. A machine cell is a collection of functionally dissimilar machines that are grouped together and dedicated to processing an associated part family, which is a collection of parts that are similar in geometric shape and size or in processing requirements. Efficient cells maximize the operations performed on machines within cells (intra-cell operations) and minimize the transfers of parts from one cell to another (inter-cell operations). This leads to numerous operational benefits, such as reductions in setup time and work-in-process inventories, improved quality, and a high degree of flexibility in responding to changes in product demand [2].

Since CF is an NP-hard problem [3,4], a number of approaches and methods have been proposed to solve it effectively. Papaioannou and Wilson [5] provided a recent review of the CF solution methodologies. CF problems are classified into two categories: the standard CF (SCF) problem, which considers only one process plan for each part, and the generalized CF (GCF) problem, which considers alternative process plans for each part. Both problems can include replicate machines, i.e., extra copies of a machine type. GCF is more complicated than SCF since SCF is a special case of GCF. When a part has alternative process plans, its operations can be performed on different types of machines or on extra copies of a machine type. By considering alternative process plans and replicate machines, more independent cells and higher machine utilization due to reduced inter-cell flows can be achieved [6].

The first step in solving CF problems is to construct the mathematical model that is best suited to achieving the objectives of a specific CF. However, this usually leads to a huge model that has many integer and continuous variables, constraints, and/or nonlinear functions. As the numbers of machines and parts, which directly determine the size of the CF problem, increase, exact solution methods fail to solve large CF instances [7]. More specifically, if a mathematical model of CF contains nonlinear and/or multi-objective functions over multiple periods, it is very difficult to solve that model optimally, even though the nonlinear functions can be linearized. Therefore, a number of soft computing approaches relying on local search methodologies, such as artificial intelligence, heuristic/meta-heuristic, or hybrid algorithms, have been proposed. Soft computing approaches attempt to find good or acceptable solutions to the proposed mathematical model of CF in a short computation time at the expense of the global optimum. Most soft computing approaches use specific mathematical models to set up the CF problem rather than to solve it optimally. In this regard, almost all soft computing approaches for CF are heuristic in nature. However, cell design experience with industrial experts shows that designers would rather spend more time to achieve an optimal or near-optimal solution than use a heuristic approach to get an inferior solution [8].

In contrast, hard computing approaches can use exact search algorithms such as branch-and-bound to solve the mathematical models of CF optimally if reasonable computation time is allowed. The application of hard computing approaches to CF has relied significantly on recent advances in computer hardware and commercially available mixed integer linear programming (MILP) solvers, such as CPLEX, LINGO, or Gurobi. Borrero et al. [9] stated that hard computing approaches can yield optimal solutions to large MILP problems within a reasonable running time if appropriate mathematical models are constructed.

Recently, two hard computing approaches for solving the mathematical models of CF have been discussed in the CF-related literature. The first is an exact method that attempts to find the best cell configuration by directly maximizing the objective function of CF. The grouping efficacy (GE) measure [10] has been widely used as an objective function of CF. Since the GE is a fractional function, the CF problem with the GE objective function results in a 0–1 nonlinear fractional programming problem. Directly maximizing the GE has attracted many researchers since the early 2010s [11–18]. To evaluate the performance of their exact methods, 35 small to intermediate-size benchmark instances [19] have been widely used for benchmark testing, and their solutions have been compared. However, some instances were not solved optimally even under the time limit of 100,000 s using the CPLEX MILP solver.
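To make the fractional nature of the GE concrete, the following minimal sketch evaluates it for a given cell configuration. It assumes the commonly cited definition GE = (e − e_out) / (e + e_void), where e is the total number of ones in the 0–1 machine-part incidence matrix, e_out is the number of ones outside the diagonal blocks (exceptional elements), and e_void is the number of zeros inside the diagonal blocks (voids). The function name, argument names, and the small example matrix are hypothetical and are not taken from the benchmark set [19].

```python
from typing import Sequence


def grouping_efficacy(
    matrix: Sequence[Sequence[int]],
    machine_cell: Sequence[int],
    part_cell: Sequence[int],
) -> float:
    """Evaluate GE = (e - e_out) / (e + e_void) for one cell configuration.

    matrix[i][j] is 1 if machine i processes part j, else 0.
    machine_cell[i] and part_cell[j] give the cell index of each
    machine and part (an assumed encoding, chosen for illustration).
    """
    e_total = 0  # all ones in the incidence matrix
    e_out = 0    # ones outside the diagonal blocks (exceptional elements)
    e_void = 0   # zeros inside the diagonal blocks (voids)
    for i, row in enumerate(matrix):
        for j, a in enumerate(row):
            inside_block = machine_cell[i] == part_cell[j]
            if a == 1:
                e_total += 1
                if not inside_block:
                    e_out += 1
            elif inside_block:
                e_void += 1
    return (e_total - e_out) / (e_total + e_void)


# Hypothetical 4-machine, 5-part instance with two cells.
example_matrix = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
]
example_machine_cell = [0, 0, 1, 1]
example_part_cell = [0, 0, 1, 1, 1]

ge = grouping_efficacy(example_matrix, example_machine_cell, example_part_cell)
print(f"GE = {ge:.4f}")
```

Because both the numerator and the denominator depend on the cell assignment, maximizing this ratio over all assignments is what gives rise to the 0–1 fractional program mentioned above.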

The other approach aims to maximize the GE or other alternative performance measures indirectly by using the classic or modified *p*-median problem (PMP). Since Hakimi [20,21] first introduced the PMP on a network of nodes and arcs, the PMP has been widely studied and extended to many practical situations, including the location of plants, warehouses, distribution centers, hubs, and public service facilities [22]. Revelle and Swain [23] used Balinski-type constraints [24] to present an integer linear programming (ILP) formulation of the PMP. Unfortunately, since the original Revelle and Swain model (ORSM) defined on an *n*-node network contains *n*<sup>2</sup> binary variables and *n*<sup>2</sup> + 1 constraints, it is computationally infeasible to solve the ORSM exactly even for moderately sized networks. Therefore, many attempts have been made to formulate equivalent PMP models that include fewer binary variables and constraints than the ORSM [25–31]. Those reduced PMP models have been solved using hard computing techniques on MILP solvers, and their performances have been compared to those of past PMP models.
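As an illustration of what the PMP optimizes, the sketch below selects *p* nodes as medians and assigns every node to its nearest median so that the total assignment cost is minimized. It deliberately uses plain enumeration rather than an ILP model or a MILP solver, so it is only suitable for tiny examples and is not the ORSM or any of the reduced formulations [25–31]. The function name and the 4-node dissimilarity matrix are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def solve_p_median_brute_force(
    dist: Sequence[Sequence[float]], p: int
) -> Tuple[float, Tuple[int, ...], List[int]]:
    """Enumerate every set of p medians and keep the cheapest one.

    dist[i][j] is the cost of assigning node i to median j.
    Returns (total cost, chosen medians, median assigned to each node).
    Illustration only: the number of candidate sets grows as C(n, p).
    """
    n = len(dist)
    best_cost = None
    best_medians: Tuple[int, ...] = ()
    for medians in combinations(range(n), p):
        # Each node is served by its cheapest selected median.
        cost = sum(min(dist[i][j] for j in medians) for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_medians = cost, medians
    assignment = [min(best_medians, key=lambda j: dist[i][j]) for i in range(n)]
    return best_cost, best_medians, assignment


# Hypothetical dissimilarities between four machines; p = 2 cells.
example_dist = [
    [0, 1, 4, 4],
    [1, 0, 4, 3],
    [4, 4, 0, 2],
    [4, 3, 2, 0],
]

cost, medians, assignment = solve_p_median_brute_force(example_dist, 2)
print(cost, medians, assignment)
```

In a CF setting, the nodes would be machines (or parts), the costs would be dissimilarities between them, and each median together with its assigned nodes would form one cell.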

Kusiak [32,33] first proposed using the PMP-type model as an alternative mathematical programming model for the CF, replacing exact methods. However, the PMP itself does not explicitly optimize the objective of CF in the same way as the GE. Nevertheless, the PMP grasps the clustering nature of CF and presents a flexible framework by allowing additional constraints reflecting realistic aspects to be introduced [34]. In this context, the PMP matches the CF problem well and shows good solution performance for small to intermediate-size SCF/GCF instances [35–49]. Recently, Goldengorin et al. [34] proposed a flexible PMP-based approach for solving large-sized 0–1 SCF problems and used the Xpress MILP solver to optimally solve most of the SCF instances available in the literature within one second.

However, few studies reporting successful applications of the hard computing approach of the PMP-type model to large-sized GCF instances have appeared in the literature. There are two main reasons for this:


Motivated by the drawbacks of extant studies attacking the GCF problem, this paper proposes an effective hard computing approach using the PMP-type model to solve large-sized GCF problems. Our hard computing approach has the following distinctive features compared to previous hard computing approaches dealing with the GCF problem:


conducted over the widest range of GCF instances that have ever appeared in the CF-related literature. Our collection of GCF instances can be used as a standard data set for subsequent benchmark tests in the future.
