*2.5. Challenges*

Many research and review articles have discussed challenges in the implementation of DTs, and the issues can be categorized as time-, safety-, and mission-critical [115–120]. In this section, issues that are more relevant to the manufacturing sector and modeling community are presented, including data communication, model development and maintenance, cyber-physical security, and real-time capability.

One of the challenges in achieving a DT framework is establishing a stable two-way connection between the physical and virtual components to support real-time integration. Heterogeneity in equipment manufacturers and their software [116] is a major hurdle that needs to be addressed through a common interface or file format that simplifies interaction between different software packages. Several prominent manufacturers are already making strides by supporting the commonly used OPC UA/DA interfaces. The creation of a database system that is not only vertically and horizontally scalable but also well structured is equally important in such a framework. Migrating to a NoSQL database would therefore be recommended, but here the manufacturing industry lags, since many software packages currently store data only in SQL databases. Additionally, the resolution of sensor data, latency within the data communication channel, the increasing volume and variety of data, and the requirement of fast storage and retrieval are all challenges within this context.
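To make the storage question concrete, the sketch below converts a flat, SQL-style sensor row into the kind of self-describing, nested document a NoSQL time-series store would accept. The equipment ID, tag names, and values are hypothetical; the point is that new sensors can be added later without a schema migration.

```python
import json

def row_to_document(equipment_id, timestamp, readings):
    """Convert a flat sensor row into a nested, self-describing document.

    `readings` maps tag names to (value, unit) pairs; new tags can be
    added over time without altering any fixed schema.
    """
    return {
        "equipment_id": equipment_id,
        "timestamp": timestamp,  # ISO 8601 string from the historian
        "readings": {
            tag: {"value": value, "unit": unit}
            for tag, (value, unit) in readings.items()
        },
    }

# Hypothetical blender reading: two tags today, more may appear tomorrow.
doc = row_to_document(
    "blender-01",
    "2020-09-01T12:00:00Z",
    {"impeller_speed": (250.0, "rpm"), "fill_level": (0.62, "fraction")},
)
payload = json.dumps(doc)  # ready for a document store or message broker
```

The same document shape serves both vertical scaling (larger readings maps) and horizontal scaling (sharding by `equipment_id`).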

The development of virtual models is often costly and challenging due to the lack of a complete understanding of the physical process [93]. This deficiency sometimes leads to inconsistencies between models and the physical system, which need to be appropriately identified and handled and can pose challenges for the modeling and operation teams. To resolve the issue, systematic model development approaches, along with appropriate model maintenance strategies, are needed. Moreover, since the models need to perform simulation and system analyses in real time, efficient and accurate algorithms that can utilize available information continuously are crucial, presenting a challenge to both the modelers and the allocation of computing resources.

In addition to the modeling aspects, cyber-physical security is another area of concern to ensure the normal operation of physical and virtual components against malicious attacks [121]. In a fully integrated DT, large data sets with important and potentially confidential information are exchanged, which require secure communication and processing among all systems [122].

#### **3. Digital Twin in Pharmaceutical Manufacturing**

In pharmaceutical manufacturing, the potential of using DTs to facilitate smart manufacturing can be seen in different phases of process development and production. In the process design stage, the use of a DT can significantly accelerate the selection of a manufacturing route and its unit operations, as it is able to represent physical parts with various models. An understanding of process variations can be obtained from DT simulations, which allows for the prediction of product quality, productivity, and process attributes, reducing the time and costs of physical experiments [123]. In the operation phase, real-time process performance can be monitored and visualized at any time, and the DT can analyze the system in a continuous manner to provide control and optimization insights into the process [123]. The DT can also be used as a training platform for operators and engineers, as real-time scenario simulation and on-the-job feedback can be realized through the DT. With regard to pre- and post-manufacturing tasks, the DT platform can assist with tasks including, but not limited to, material tracking, serialization, and quality assurance.

Some key requirements for achieving smart manufacturing with DTs include real-time system monitoring and control using Process Analytical Technology (PAT); continuous data acquisition from equipment and from intermediate and final products; and a continuous, global modeling and data analysis platform [29]. The pharmaceutical industry has taken several steps in this direction by using techniques such as Quality-by-Design (QbD) [124], Continuous Manufacturing (CM) [124], flowsheet modeling [125], and PAT implementations [126]. Some of these tools have been investigated extensively, but the overall integration and development of DTs are still in their infancy.

This section reviews the progress of current research and industry applications towards DTs in pharmaceutical manufacturing from the aspects of PAT sensing, model building, and data integration, which correspond to the physical component, virtual component, and data management parts of the general DT framework. Challenges and opportunities are discussed at the end of this section.

#### *3.1. PAT Methods*

A key component in the development of a DT is data collection. In addition to readings from equipment, critical quality attributes (CQAs) also need to be collected from physical plants in a timely manner for use in the virtual component, as the models and analyses rely on good data. Several traditional technologies exist to determine CQAs, such as sieve analysis and High-Performance Liquid Chromatography (HPLC), but these cannot provide real-time data and are performed away from the production line rather than in-line or at-line. Thus, PAT tools have been explored and developed to address these issues [127].

PAT tools in the pharmaceutical industry have a wide range of applications, including measuring the particle size of crystals [128], blend uniformity [129], tablet content uniformity [130], etc. Spectroscopy tools (Nuclear Magnetic Resonance (NMR), Ultraviolet (UV), Raman, near-infrared, mid-infrared, online mass spectrometry) constitute one of the major techniques used to measure the CQAs of pharmaceutical processes. Raman and Near-Infrared Spectroscopy (NIRS) are commonly used in the industry. Raman spectroscopy has been employed for the on-line monitoring of powder blending processes [131]. Since acquisition times for Raman can be longer, NIRS is often preferred for real-time measurements. NIRS has been used for the real-time monitoring of powder density [15] and blend uniformity [129], and has also been integrated with control platforms for process monitoring and control [132]. Baranwal et al. [133] employed NIRS to replace HPLC methods in predicting API concentration in bi-layer tablets. PAT tools have also been used by the pharmaceutical industry to determine the particle size distribution of the product [134]. Several optical tools, such as Focused Beam Reflectance Measurement (FBRM) [135] and high-resolution camera systems [136], have also been employed in the industry for particle size analysis. Some studies have utilized a network of PAT tools to help monitor and control a unit process [127,137].

The US FDA has also taken steps to promote the use of PAT tools in pharmaceutical manufacturing with the goal of ensuring final product quality [138]. The pharmaceutical industry has adopted PAT in various applications throughout the drug-substance manufacturing process [139]. Although this has certainly led to an increase in the usage of PAT tools, their applications remain focused on research and development rather than on full-scale manufacturing [126]. In the limited number of cases where they were employed in manufacturing, they have been successful in reducing manufacturing costs and improving the monitoring of product quality [140]. The development of different PAT methods, with their compelling application as an integral part of a monitoring and control strategy [141], has established a building block for gathering essential data from the physical component, enabling the further development of process models and DTs.

#### *3.2. Process Modeling*

DTs highly depend on the use of data and models, and in the pharmaceutical industry, there is a growing interest in the development and application of methods and tools that facilitate that [142]. Different types of models have been developed for batch and continuous process simulations, material property identification and prediction, system analyses, and advanced control. Papadakis et al. recently proposed a framework for selecting efficient reaction pathways for pharmaceutical manufacturing [143], which includes a series of modeling workflows for reaction pathway identification, reaction and separation analysis, process simulation, evaluation, optimization, and operation [142]. The overall framework would yield an optimized reaction process with identified design space and process analytical technology information. The models developed under this framework can all be used as the virtual component within a DT framework to provide further process understanding and control of the manufacturing plant.

As mentioned in Section 2.2, the modeling approaches can be classified as mechanistic, data-driven, and hybrid. Among the mechanistic modeling approaches in pharmaceutical manufacturing, the discrete-element method (DEM), finite-element method (FEM), and computational fluid dynamics (CFD) are often used [144]. To simulate the particle-level or bulk behavior of the material flow in different pharmaceutical unit operations, DEM is a powerful tool and has been applied widely [145–147], though its high computational cost limits its practical use when run locally. With HPC and cloud computing, it is possible to integrate DEM simulations with the overall process, resulting in a near-real-time model. To model fluid flow in pharmaceutical processes, including API drying and fluidized beds, CFD and FEM are popular choices [144]. These two methods are also heavily utilized in biopharmaceutical manufacturing (see Section 4.2).

Data-driven modeling methods involve the collection and usage of a large amount of experimental data to generate models, and the resulting models are based on the provided datasets only. Commonly implemented approaches in pharmaceutical manufacturing include artificial neural networks (ANNs) [148,149], multivariate statistical analysis, Monte Carlo methods [150], etc. These methods are less computationally intensive, but due to the lack of underlying physical understanding in the trained models, predictions outside the space of the training dataset are often unsatisfactory.
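As a minimal illustration of the multivariate statistical analysis mentioned above, the sketch below performs a principal component analysis on a small synthetic batch dataset via the singular value decomposition. The data, the single latent factor, and the noise level are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic process data: 50 batches x 4 correlated process variables,
# driven by one hypothetical latent factor plus small measurement noise.
t = rng.normal(size=(50, 1))
X = t @ np.array([[1.0, 0.8, 0.5, 0.2]]) + 0.05 * rng.normal(size=(50, 4))

Xc = X - X.mean(axis=0)                         # mean-center each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                                  # batch scores per component
explained = s**2 / np.sum(s**2)                 # variance explained per PC

# With one dominant latent factor, PC1 captures most of the variance,
# so the 4-variable dataset compresses to essentially one score per batch.
```

In a monitoring context, the batch scores would be charted against control limits; batches falling outside the limits flag abnormal multivariate behavior even when each individual variable looks normal.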

There is also a recent trend towards developing various types of hybrid modeling techniques to model complex pharmaceutical manufacturing processes while lowering the computational cost and data requirements. Population balance modeling (PBM), with a comparatively lower computational cost, has been used extensively to model blending and granulation processes [64,151], and a PBM–DEM hybrid model has also been used to improve model accuracy while maintaining reasonable computational costs [152]. Other semi-empirical hybrid models, such as those that incorporate material properties into process models [153] or investigate the effect of material properties on the residence time distribution (RTD) and process parameters [146,154–157], have also been developed for different powder processing unit operations [52,158]. These models, when incorporated into a full DT framework, will facilitate overall product and process design and development, accelerating the drug-to-market timeline.
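A common semi-empirical building block referenced above is the RTD. The sketch below evaluates the classic tanks-in-series RTD model, E(t) = t^(n-1) exp(-t/τᵢ) / ((n-1)! τᵢⁿ) with τᵢ = τ/n, and numerically checks that the curve behaves as a residence-time density; the number of tanks and mean residence time are illustrative, not taken from any cited study.

```python
from math import exp, factorial

def tanks_in_series_rtd(t, n, tau):
    """E(t) for n ideal tanks in series with total mean residence time tau."""
    tau_i = tau / n
    return t ** (n - 1) * exp(-t / tau_i) / (factorial(n - 1) * tau_i ** n)

# Illustrative parameters: a blender approximated by 5 tanks, tau = 30 s.
n, tau, dt = 5, 30.0, 0.01
ts = [i * dt for i in range(1, 60000)]          # integrate out to 600 s
E = [tanks_in_series_rtd(t, n, tau) for t in ts]

area = sum(e * dt for e in E)                   # a density: integrates to ~1
mean = sum(t * e * dt for t, e in zip(ts, E))   # first moment: ~tau
```

In practice, n and τ are fitted to tracer experiments, and the resulting E(t) is convolved with upstream composition signals to predict how disturbances propagate through the unit.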

Table 2 provides a feature-based comparison of various models used in pharmaceutical manufacturing applications. The characterization of computational complexity is based on the typical computational cost for a single unit operation. The real-time capability feature emphasizes the ability of a model to produce simulation or prediction results in real time and, optimally, in sync with the equipment; this ability depends strongly on computational complexity. Even though mathematical and semi-empirical modeling approaches have this capability, they are mostly trained and implemented offline, and real-time applications are rarely seen in the context of pharmaceutical manufacturing. For adaptive modeling capability, the modeling approaches that are able to incorporate data are advantageous, as new data can be used to update the models; however, online usage of these models in adaptive mode is rarely reported.


**Table 2.** Feature-based comparison of various models.

In addition to developing models for single pharmaceutical unit operations, a flowsheet model integrating the entire manufacturing process can be used to predict the process dynamics as affected by material properties and the operating conditions of different unit operations. More importantly, systematic process analyses, such as sensitivity analysis, design space identification, and optimization, can all be performed with the flowsheet model. This provides insight into the characteristics and bottlenecks of the process and thus facilitates the development of control strategies [125]. Over years of development, many researchers and pharmaceutical companies have developed mature approaches for conducting these analyses offline during the process design phase [52,56,125,159,160]. Flowsheet models are thus needed for the development of DTs. However, current flowsheet models are stand-alone and cannot automatically update to track the physical plant. In current research, there is limited communication between the flowsheet model and the plant, which remains a challenge in the development of a DT.
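To make the flowsheet idea concrete, the sketch below chains two highly simplified unit models, a feeder with a hypothetical step change in API concentration and a continuous blender treated as a single well-mixed tank, and propagates the disturbance downstream. All parameters and the step scenario are illustrative, not drawn from any cited study.

```python
def simulate_line(feed_conc, tau_blender, dt=1.0):
    """Propagate a feeder concentration profile through a well-mixed blender.

    The blender outlet follows dc/dt = (c_in - c) / tau (first-order
    mixing), integrated with explicit Euler. Returns the outlet profile.
    """
    c = feed_conc[0]
    outlet = []
    for c_in in feed_conc:
        c += dt * (c_in - c) / tau_blender
        outlet.append(c)
    return outlet

# Hypothetical scenario: API feed steps from 10% to 12% at t = 100 s.
feed = [0.10] * 100 + [0.12] * 500
out = simulate_line(feed, tau_blender=30.0)

# The blender attenuates the step: the outlet approaches 12% only
# gradually, with a lag governed by tau_blender.
```

A full flowsheet model chains many such unit models (with far richer physics); sensitivity analysis then amounts to re-running the chain while perturbing parameters such as `tau_blender` and observing the downstream CQAs.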

#### *3.3. Data Integration*

The implementation of IoT devices in pharmaceutical manufacturing lines leads to the acquisition of vast amounts of data. This collection of process data and CQAs needs to be transmitted to the virtual component in real time and in an efficient manner. Several pharmaceutical process models also require material properties for accurate prediction. Thus, a central database is required to give the virtual component access to all datasets [46]. All data transfer protocols discussed in Section 2.3 are applicable here as well. Furthermore, the applications and databases should comply with the 21 CFR Part 11 data integrity requirements in accordance with the US FDA's guidance [161]. The database not only serves as a warehouse for real product data but can also be used to store the results of simulations performed in the virtual component and optimized process parameters. It would also serve the purpose of relaying these optimized process parameters back to the real product.
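As a hedged sketch of what such a transfer might carry, the snippet below builds a JSON payload of the kind that could be published to a cloud database over a protocol such as MQTT or HTTPS. The batch ID, tag names, and audit fields (operator, source system, checksum) are illustrative of data-integrity metadata, not a statement of 21 CFR Part 11 compliance.

```python
import hashlib
import json

def build_payload(batch_id, source, operator, records):
    """Package process records with basic audit metadata for cloud upload."""
    body = {"batch_id": batch_id, "source": source,
            "operator": operator, "records": records}
    raw = json.dumps(body, sort_keys=True).encode()
    # A checksum lets the receiver detect corruption in transit.
    body["sha256"] = hashlib.sha256(raw).hexdigest()
    return json.dumps(body)

# Hypothetical tablet-press record from a local historian.
payload = build_payload(
    "B2020-014", "local-historian", "operator-07",
    [{"tag": "tablet_weight", "value": 201.3, "unit": "mg",
      "timestamp": "2020-09-01T12:00:05Z"}],
)
received = json.loads(payload)  # what the cloud endpoint would decode
```

In a deployment, the attribution and checksum fields would be complemented by authentication and an immutable audit trail on the receiving system.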

Several studies have attempted to achieve an integrated data framework in downstream pharmaceutical manufacturing [46,84,132,162–165]. Some of these studies focused on implementing a control system for the direct compression line [132,157,165]. Cao et al. [46] presented an ISA-88-compliant manufacturing execution system (MES) where the batch data were stored on a cloud database as well as in a local data historian. The communications between the equipment and the control platform were performed in a similar manner in all the studies: the process control system (PCS) created a database based on the input recipe, and the database was replicated directly into the local data historian. The communication between the historian and the PCS can be achieved using TCP/IP and OPC, since each software package is hosted on a different computer system on the same network. The historian database can in turn be duplicated onto the cloud using network protocols such as MQTT, HTTPS, etc. Some authors have also presented ontologies for efficient data flow for laboratory experiments performed during pharmaceutical manufacturing [166–168]. Cao et al. [46] also addressed the collection of laboratory data in an ISA-88-applicable, recipe-based electronic laboratory notebook. Many of the presented studies focused primarily on integrating one component of a completely integrated data management system. Figure 2 illustrates a sample data integration framework, where data collected from the manufacturing plant as well as laboratory experiments are uploaded to a cloud database using the mentioned protocols. The data can then be used in the virtual component for simulations, and corrective actions can be sent back to the control platform.

**Figure 2.** Framework for dataflow in a continuous direct compaction tablet line. The text over the arrow indicates options for data transfer protocols.

#### *3.4. Challenges and Opportunities*

Integrating all building blocks mentioned in Sections 3.1–3.3, the authors envision a fully integrated, model-centric DT framework for pharmaceutical manufacturing, as shown in Figure 3. The physical plant continuously sends process data to the virtual end, establishing a data inflow to achieve continuous process monitoring and data storage. Once the real-time data are received, process visualization and evaluation can be performed in real time using visualization tools and process models. Automatic control based on the evaluation results can then be executed to modify process operations if needed. The overall data and information flow becomes a continuous, real-time, integrated loop. Models can be updated based on plant measurements and changes by implementing hybrid or adaptive modeling techniques, and real-time model evaluation results, which support the identification of critical process parameter boundaries, process optimization, and material/process characterization, can guide the operational updates of the plant. This review has shown that the pharmaceutical industry is moving towards adopting a full DT. Currently, continuous monitoring of processes, storage of operation data, process visualization, and model-predictive control have all been implemented in pharmaceutical applications. Building blocks are in place for all three components, but key challenges and gaps remain.

**Figure 3.** Fully integrated DT framework for continuous pharmaceutical manufacturing.

In terms of process monitoring and the use of PAT, though the use of spectroscopy to estimate product compositions has become routine, the accuracy of measurements in low-dose drug products, the consideration and handling of outside interferences, and the maintenance of calibration models (i.e., the robustness of calibration) are all common problems. For low-dose drug measurements, though there are new tools such as NIRS and in-line UV spectroscopy, the accuracy can be improved by increasing the sampling frequency and improving the spectral analysis. The outside interference issue may be resolved by implementing various iterative optimization technologies, as recent studies have demonstrated the capability of such an approach [169,170]. With regard to calibration model maintenance, different offline, adaptive methodologies have been well presented by Kadlec et al. [171], but online, continuous updating with streaming data may be an option moving forward.
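One way to realize the continuous update with streaming data suggested above is recursive least squares (RLS). The sketch below updates a single calibration coefficient as each new reference measurement arrives; the "true" slope, noise level, and forgetting factor are synthetic and illustrative.

```python
import numpy as np

def rls_update(theta, P, x, y, lam=0.99):
    """One recursive-least-squares step with forgetting factor lam.

    theta: current coefficient vector; P: inverse-covariance estimate;
    x: new regressor (e.g., a preprocessed absorbance); y: reference value.
    """
    x = x.reshape(-1, 1)
    k = P @ x / (lam + x.T @ P @ x)             # gain vector
    theta = theta + (k * (y - x.T @ theta)).ravel()
    P = (P - k @ x.T @ P) / lam
    return theta, P

rng = np.random.default_rng(1)
theta = np.zeros(1)                              # uninformed starting model
P = np.eye(1) * 100.0                            # large initial uncertainty
true_slope = 2.5                                 # hypothetical "API % per AU"
for _ in range(200):
    x = rng.uniform(0.1, 1.0, size=1)            # simulated streaming scans
    y = true_slope * x[0] + rng.normal(scale=0.01)
    theta, P = rls_update(theta, P, x, y)
# theta[0] converges toward the true slope as reference data stream in.
```

The forgetting factor below 1 lets the estimate track slow instrument drift, at the cost of higher variance; a practical calibration would use many wavelengths (a vector `x`) rather than one.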

At the virtual end, recent research and technology development have shaped the general framework and applications. Libraries of models and system analysis tools exist to develop a fully connected virtual model. However, as mentioned in Section 3.2, the computational cost of many complex and integrated models is high, requiring the use of cloud and/or high-performance computing. The high computational requirement also hinders the use of models in real time, which is a key component of the DT framework [4]. To resolve this issue, efficient computational algorithms and reduced-order modeling approaches need to be implemented, along with the efficient distribution of computational resources. Another relevant issue is that most models developed for the pharmaceutical industry are static, meaning that they only reflect the system at the time the models were developed. The models do not update themselves as new data become available. Model maintenance is, therefore, required [172], and the goal is for this to be performed automatically by the virtual component [171,173,174]. These model maintenance problems can also be viewed as issues caused by a number of drifts (i.e., concept drift, model drift, data drift, and sensor drift). Methodologies for handling drifts have been studied extensively in the electrical and computer engineering literature [175–178], but case studies in pharmaceutical manufacturing have not yet been reported.
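A minimal example of the drift-handling idea mentioned above is a one-sided CUSUM test on a sensor signal, which accumulates small, persistent deviations from a target until they cross a decision threshold. The target, slack, threshold, and readings below are illustrative, not tuned for any particular instrument.

```python
def cusum_drift(values, target, k=0.5, h=5.0):
    """Return the index at which an upward mean shift is flagged, or None.

    k is the allowable slack per sample and h the decision threshold,
    both in the same units as the measurements.
    """
    s = 0.0
    for i, v in enumerate(values):
        s = max(0.0, s + (v - target - k))  # accumulate excess over target+k
        if s > h:
            return i
    return None

# Stable readings around 100, then a hypothetical +2-unit sensor drift.
stable = [100.0, 99.8, 100.1, 100.2, 99.9] * 10
drifted = [102.0, 102.1, 101.9, 102.2, 102.0, 102.1]
assert cusum_drift(stable, target=100.0) is None   # no false alarm
idx = cusum_drift(stable + drifted, target=100.0)  # flags inside the drift
```

In a DT, such a flag would trigger model maintenance, for example recalibration or retraining, rather than simply raising an operator alarm.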

One of the most prominent issues is the communication of information between the two components. Table 3 compares previous data integration frameworks that have been developed for pharmaceutical manufacturing. The limitations of each of these studies highlight the inability of current software tools and solutions to build a complete DT. Though integration capability has been improving, most current applications in the pharmaceutical industry only transfer data from the physical plant to the virtual component; the reverse is rarely seen. To have a fully integrated and automated DT, the information flow from the virtual component to the physical plant also needs to be established. The virtual plant should be able to change system settings and control the physical plant to help achieve an optimized process within the design space.


**Table 3.** A comparison of data integration studies presented for pharmaceutical manufacturing.

In addition, integrating data inside the physical manufacturing plant faces issues with the heterogeneity of the data formats used by manufacturers [116]. A full manufacturing cycle requires the collection of online and offline data from different departments and software. Though an increasing number of companies are adopting standard data formats and transfer protocols, the coordination among all the different data, software, and platforms is still a challenge. Currently, this coordination is more of a business and engineering decision within the companies using these systems. Poor integration and coordination often lead to the burden of using and maintaining multiple platforms and software. Because of this, many companies now prefer to purchase equipment and systems from a sole vendor, which is both a challenge and an opportunity for equipment and system providers.

With the use of cloud databases and cloud-based data management systems, data availability, stability of service, storage volume, and information security are all critical issues to be addressed [118]. As data are stored on the cloud, they should be available when needed, which demands a highly stable service and a rigorous business continuity plan. Many cloud platforms use distributed technologies and cloud backups to resolve this issue, but the validity and reliability of these solutions need to be carefully studied before implementation [179]. Moreover, with the implementation of IoT devices and various types of sensors, the volume of data collected from the manufacturing cycle can be extremely large. Even though many cloud platforms claim that they can accommodate the demanded storage capacity, this would place an increasing burden on the company if the storage cost is high. With regard to information security, the issue is not new to the field of cloud storage, but it is particularly relevant to the pharmaceutical industry since the majority of the information is highly confidential, and cases have shown that a vulnerable cyber system in pharmaceutical companies can cost millions or even billions of dollars. This challenge gives rise to opportunities in the research and deployment of cyber-physical security systems to ensure the safety and confidentiality of the information being transferred. This field has been a hot topic, especially in the electrical and computer engineering disciplines. Methodologies used in securing smart grids, statistical authentication systems, physical and virtual cyber barriers, etc., can be implemented in pharmaceutical manufacturing to develop a secure DT.

Finally, the regulatory perspective is an important consideration in developing and applying DTs in pharmaceutical manufacturing. The US FDA has developed modeling capability and has granted funding to academic institutions to explore the appropriate application of process models and DTs in the field. Various guidelines, reports, and presentations have all demonstrated that regulatory experience with and exposure to the DT concept is currently evolving [27,180]. Though DT development is not required for regulatory approval, its components can certainly offer pharmaceutical companies and regulatory bodies more insight into the process and product.

#### **4. Digital Twin in Biopharmaceutical Manufacturing**

Biopharmaceutical manufacturing focuses on the production of large molecule-based products in heterogeneous mixtures, which can be used to treat cancer, inflammatory, and microbiological diseases [181,182]. To fulfill the FDA regulations and obtain safe products, biopharmaceutical operations should be strictly controlled and operate under a sterilized process environment.

In recent years, increasing demand for biologic-based drugs has driven the need for manufacturing efficiency and effectiveness [183]. Thus, many companies are transitioning from batch to continuous operation mode and employing smart manufacturing systems [182]. A DT integrates the physical plant, data collection, data analysis, and system control [4], which can assist biopharmaceutical manufacturing in product development, process prediction, decision making, and risk analysis, as shown in Figure 4. Monoclonal antibody (mAb) production is selected as an example to represent the physical plant, which includes cell inoculation, seed cultivation, the production bioreactor, recovery, primary capture, virus inactivation, polishing, and final formulation. These operations produce and purify protein products. Quality (mainly protein structure and composition) and impurities need to be monitored and transmitted to the virtual plant for analysis and virtual plant updates. The virtual plant includes plant simulation, analysis, and optimization, which guide the diagnosis and update of the physical plant with the help of the process control system. Integrated mAb production flowsheet modeling, bioreactor analysis, and design space and biomass optimization are shown as examples in the three sections of the figure; however, the capabilities of the virtual plant are not limited to these examples. To understand the progress of DT development in biopharmaceutical manufacturing, this section reviews process monitoring, modeling, and data integration (virtual plant–physical plant communication) in the existing industry and analyzes the possibilities and gaps in achieving integrated biopharmaceutical DT manufacturing.

**Figure 4.** Biopharma process, benefits, and DT connections.

#### *4.1. PAT Methods*

Biological products are highly sensitive to the cell line and operating conditions, while the fractions and structures of the product molecules are closely related to drug efficacy [184]. Thus, a real-time process diagnostic and control system is essential to maintain consistent product quality. However, process contamination needs to be strictly controlled in biopharmaceutical manufacturing; thus, the monitoring system should neither be affected by fouling nor interfere with the media, so as to maintain monitoring accuracy, sensitivity, stability, and reproducibility [185]. In general, process parameters and quality attributes need to be captured across the different unit operations.

Biechele et al. [185] presented a review of sensing applied in bioprocess monitoring. In general, process monitoring includes physical, chemical, and biological variables. In the gas phase, the commonly used sensing systems consist of semiconducting, electrochemical, and paramagnetic sensors, which can be applied to oxygen and carbon dioxide measurements [185,186]. In the liquid phase, dissolved oxygen, carbon dioxide, and pH values have been monitored by in-line electrochemical sensors. In contrast, media composition, protein production, and qualities such as glycan fractions are mostly measured by online or at-line HPLC or GC/MS [186,187]. Specific product quality monitoring methods are reviewed by Guerra et al. [188] and Pais et al. [189].

Recently, spectroscopy methods have been developed for accurate, real-time monitoring of both upstream and downstream operations. Industrial spectroscopy applications mainly focus on cell growth monitoring and the quantification of culture fluid components [190]. UV/Vis and multiwavelength UV spectroscopy have been used for in-line, real-time protein quantification [190]. NIR has been used for off-line raw material and final product testing [190]. Raman spectroscopy has been used for viable cell density, metabolite, and antibody concentration measurements [191,192]. In addition, spectroscopy methods can also be used for monitoring process CQAs, such as host cell proteins and protein post-translational modifications [187,193]. Research shows that in-line Raman spectroscopy and mid-IR have the capability to monitor protein concentration, aggregation, host cell proteins (HCPs), and charge variants [194,195]. Spectroscopy methods are usually supported by chemometrics, which requires data pretreatments such as background correction and spectral smoothing, followed by multivariate analysis for the quantitative and qualitative analysis of the attributes. Many different applications of spectroscopic sensing are reviewed in the literature [187,188,190,193].
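As an illustration of the pretreatment step, the sketch below applies the standard normal variate (SNV) correction, a common chemometric scatter correction, to two synthetic "scans" of the same sample; each spectrum is centered and scaled by its own mean and standard deviation. The band shape, baseline, and gain values are invented for the example.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate correction, applied row-wise.

    Each spectrum is shifted to zero mean and scaled to unit standard
    deviation, suppressing additive baseline and multiplicative
    scatter effects.
    """
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

base = np.sin(np.linspace(0, 3, 100))            # a synthetic band shape
# Two scans of the same sample with different baseline offset and gain.
raw = np.vstack([1.0 * base + 0.1, 1.7 * base + 0.9])
corrected = snv(raw)
# After SNV, the two scans coincide despite the scatter differences.
```

After such pretreatment, the corrected spectra would typically be passed to a multivariate calibration (e.g., PLS) to predict the attribute of interest.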

#### *4.2. Process Modeling*

The application of a DT in biopharmaceutical manufacturing requires a complete virtual description of the physical plant within a simulation platform [4]. This means that the simulation should capture the important process dynamics of each unit operation within an integrated model. Previous reviews have focused on the process modeling methods for both upstream and downstream operations [183,196–200].

For the upstream bioreactor, extracellular fluid dynamics [201–203], system heterogeneities, and intracellular biochemical pathways [204–215] can be captured. Process modeling supports early-stage cell-line development, helps obtain optimal media formulations, and enables prediction of the overall bioreactor performance, including cell activities, metabolite concentrations, productivity, and product quality under different process parameters [216,217]. The influence of various parameters such as temperature, pH, dissolved oxygen, feeding strategies, and amino acid concentrations can be captured and further used to optimize process operations [218–222].
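A minimal mechanistic example of such bioreactor prediction is batch growth with Monod kinetics: dX/dt = μX with μ = μ_max S/(K_s + S), and dS/dt = -μX/Y for substrate consumption. The sketch below integrates these equations with explicit Euler; the parameter values are illustrative only, not taken from any cited study.

```python
def simulate_batch(X0, S0, mu_max, Ks, Y, dt=0.01, t_end=30.0):
    """Euler integration of Monod growth in a batch bioreactor.

    X: biomass (g/L), S: substrate (g/L), Y: biomass yield on substrate,
    mu_max: maximum specific growth rate (1/h), Ks: half-saturation (g/L).
    """
    X, S = X0, S0
    for _ in range(int(t_end / dt)):
        mu = mu_max * S / (Ks + S)              # specific growth rate
        dX = mu * X * dt
        X += dX
        S = max(0.0, S - dX / Y)                # substrate consumed by growth
    return X, S

# Illustrative parameters (hypothetical, not from any cited study).
Xf, Sf = simulate_batch(X0=0.1, S0=10.0, mu_max=0.3, Ks=0.5, Y=0.5)
# Growth stops when substrate is exhausted, so X_final ≈ X0 + Y * S0.
```

Real bioreactor models add many coupled states (metabolites, dissolved gases, product titer, temperature and pH effects) on top of this skeleton, but the structure, kinetic rates feeding mass balances, is the same.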

For downstream operation, modeling strategies have focused on selecting design parameters, adjusting operating conditions, and buffer usage to achieve high protein productivity and purities efficiently. The different operating conditions include (1) flowrate, buffer pH, or salt concentration effects for chromatography operation [223–226]; (2) residence time, buffer concentration, and pH used for virus inactivation; (3) feed protein concentration, flux, retentate pressure operated for filtration [227]. Thus, the product concentration and various types of impurities can be predicted for each unit operation. The detailed modeling methods have been reviewed in the literature [228].

In recent years, biopharmaceutical companies have been shifting from batch to continuous operations. Whether it is more feasible to start up a new, fully continuous process plant or to replace specific unit operations with continuous units remains an open question. Integrated process modeling provides a virtual platform to test various operating strategies such as batch, continuous, and hybrid operating modes [229]. These different operating modes can be compared based on life cycle analysis and economic analysis for different target products at various operating scales [229–233].

For flowsheet modeling, two approaches are available in the literature: mechanistic and data-driven models. Due to the high computational cost, mechanistic modeling mostly focuses on the integration of a limited number of units, such as the combination of multiple chromatography operations [234]. Data-driven/empirical models are generally used to integrate all the unit operations in a computationally efficient way. A mechanistic model for a single unit can be combined with data-driven models of the other units to optimize that specific unit within the integrated process [235]. Mass flow and residence time distribution (RTD) models [236] can be included to examine scenarios of adding or replacing unit operations and adjusting process parameters. Coupled with the control system, flowsheet modeling can achieve real-time decision making and optimize the overall process operation automatically [237].
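
The RTD-based propagation idea can be sketched as below, assuming a tanks-in-series RTD and a discrete convolution of the inlet signal with the unit's RTD density; the unit count `n` and mean residence time `tau` are illustrative choices, not values from the cited work.

```python
from math import factorial
import numpy as np

def tanks_in_series_rtd(t, n, tau):
    """RTD density E(t) for n ideal stirred tanks in series with total
    mean residence time tau (a gamma distribution with shape n)."""
    return (t ** (n - 1) / (factorial(n - 1) * (tau / n) ** n)) * np.exp(-n * t / tau)

def propagate(inlet, t, n=3, tau=10.0):
    """Predict the outlet concentration by discretely convolving the
    inlet signal with the unit RTD on a uniform time grid t."""
    dt = t[1] - t[0]
    E = tanks_in_series_rtd(t, n, tau)
    return np.convolve(inlet, E)[: len(t)] * dt
```

Chaining `propagate` calls with different `(n, tau)` pairs gives a crude but fast way to study how a disturbance or a composition change travels through a train of units.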

Data-driven models can be further integrated with Monte Carlo analysis or linear/nonlinear programming for risk assessment and process scheduling. Zahel et al. [238] applied Monte Carlo simulation within an end-to-end data-driven model, which can be used to estimate process capabilities and support risk-based decision making following a change in manufacturing operations.
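
The Monte Carlo idea can be sketched as follows: propagate assumed input variability through a stand-in empirical response model and report the fraction of simulated runs that meet a specification. The response surface, distributions, and specification limit are all invented for illustration and are not from Zahel et al.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_titer(temp, ph):
    """Stand-in response-surface model (coefficients are invented)."""
    return 5.0 - 0.8 * (temp - 36.5) ** 2 - 1.5 * (ph - 7.0) ** 2

def process_capability(n=50_000, spec=4.0):
    """Sample assumed setpoint scatter and estimate P(titer >= spec)."""
    temp = rng.normal(36.5, 0.3, n)   # assumed temperature variability
    ph = rng.normal(7.0, 0.05, n)     # assumed pH variability
    titer = empirical_titer(temp, ph)
    return float((titer >= spec).mean())
```

Because the data-driven model is cheap to evaluate, tens of thousands of samples run in milliseconds, which is what makes this kind of risk assessment practical for an integrated process.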

Table 4 shows examples of capabilities and methods for process modeling that can potentially be used in building the DT virtual plant model. However, it should be noted that although process modeling is capable of capturing all the above operating conditions and critical quality attributes, no single model yet incorporates all the process information. In recent years, hybrid models (for example, ANN + mechanistic model) have become more prevalent in both upstream and downstream model building because they improve computational speed while broadening applicability and strengthening model robustness.
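
A hybrid model along these lines can be sketched as a data-driven rate expression (here a small least-squares surrogate standing in for an ANN) embedded inside mechanistic mass balances; the structure, training data, and parameters are illustrative only.

```python
import numpy as np

def fit_rate_surrogate(S_data, mu_data):
    """Data-driven part: fit the specific growth rate mu(S) with a small
    least-squares basis (a stand-in for an ANN)."""
    A = np.column_stack([S_data / (0.5 + S_data), np.ones_like(S_data)])
    coef, *_ = np.linalg.lstsq(A, mu_data, rcond=None)
    return lambda S: coef[0] * S / (0.5 + S) + coef[1]

def hybrid_simulate(mu_fn, X0=0.3, S0=20.0, Yxs=0.6, dt=0.01, t_end=50.0):
    """Mechanistic part: mass balances close the loop around the
    learned rate expression."""
    X, S = X0, S0
    for _ in range(int(t_end / dt)):
        mu = max(mu_fn(S), 0.0)
        X += mu * X * dt
        S = max(S - (mu / Yxs) * X * dt, 0.0)
    return X, S
```

The mass balances guarantee physically consistent totals while the learned term absorbs kinetics that are hard to derive mechanistically, which is the usual motivation for the hybrid structure.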

**Table 4.** Capabilities and methods for process modeling in biopharmaceutical manufacturing. Note that many studies have used these methods, and an exhaustive list is not feasible; the papers selected in the table represent the capabilities of the specific methods.




#### *4.3. Data Integration*

Data obtained in the biopharmaceutical monitoring system are usually heterogeneous in data type and time scale. They can be collected from different sensors, from different production lines (laboratory or manufacturing), and at different time intervals. With the development of real-time PAT sensors, a large amount of data is generated during biopharmaceutical manufacturing. Thus, data preprocessing is essential to handle missing data, perform data visualization, and reduce dimensionality [253]. Casola et al. [254] presented data mining-based algorithms to stem, classify, filter, and cluster historical real-time data in batch biopharmaceutical manufacturing. Lee et al. [255] applied data fusion to combine multiple spectroscopic techniques and predict the composition of raw materials. These preprocessing algorithms remove noise from the dataset and allow the data to be used directly in a virtual component.
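
A minimal sketch of such preprocessing, assuming column-mean imputation for the missing values followed by PCA (via SVD) for dimensionality reduction; production pipelines would typically use more sophisticated imputation and per-sensor scaling.

```python
import numpy as np

def preprocess(X):
    """Column-mean imputation followed by PCA via SVD.

    X: (samples x sensors) array that may contain NaNs.
    Returns the imputed matrix and scores on the first two
    principal components.
    """
    X = X.astype(float).copy()
    col_mean = np.nanmean(X, axis=0)
    nan_idx = np.where(np.isnan(X))
    X[nan_idx] = np.take(col_mean, nan_idx[1])   # fill missing values
    Xc = X - X.mean(axis=0)                      # center before PCA
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T                       # project onto 2 PCs
    return X, scores
```

The two-dimensional scores are what a monitoring dashboard would plot to visualize batch trajectories or flag outlying runs.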

In DTs, virtual and physical components should communicate frequently. Thus, the virtual platforms need the flexibility to adjust their model structure for different products and operating conditions. Herold and King [256] presented an algorithm that used biological phenomena to identify the model structure of a fed-batch bioreactor process automatically. Luna and Martinez [257] used experimental data to train an imperfect mathematical model and correct its prediction errors. Although no such applications exist yet for the integrated process, these works show the possibility of achieving communication between the physical and virtual components.
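
The idea of correcting an imperfect model with experimental data can be sketched, in a much simpler form than the cited work, as a recursively updated additive bias on the model prediction:

```python
class BiasCorrectedModel:
    """Wrap an imperfect predictor with a recursively updated additive
    bias estimated from measurements. This is a deliberately simplified
    stand-in for the error-correction scheme in the cited study."""

    def __init__(self, model, forgetting=0.8):
        self.model = model      # callable: input -> prediction
        self.lam = forgetting   # weight on the past bias estimate
        self.bias = 0.0

    def predict(self, u):
        return self.model(u) + self.bias

    def update(self, u, y_measured):
        """Blend the latest residual into the bias estimate."""
        residual = y_measured - self.model(u)
        self.bias = self.lam * self.bias + (1 - self.lam) * residual
```

Each new measurement nudges the bias toward the current model error, so the virtual component tracks the physical one without retraining the underlying model.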

In biopharmaceutical manufacturing, an integrated database can guide process-wide automatic monitoring and control [258]. Fahey et al. applied Six Sigma and CRISP-DM methods and integrated data collection, data mining, and model predictions for upstream bioreactor operations. Although process optimization and control were not considered in that work, it still demonstrates the capability to handle large amounts of data for predictive process modeling [259]. Feidl et al. [258] used a supervisory control and data acquisition (SCADA) system to collect and store data from different unit operations at each sample time and developed a monitoring and control system in MATLAB. The work shows the integration of supervisory control with a data acquisition system in a fully end-to-end biopharmaceutical plant. However, process modeling was not considered during process operations, so the system cannot support process prediction and analysis.
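
One small ingredient of such data acquisition, aligning multi-rate sensor streams onto a common time grid, can be sketched as follows; the stream names and sampling rates are hypothetical, and linear interpolation stands in for whatever synchronization a real SCADA historian performs.

```python
import numpy as np

def align_streams(streams, t_grid):
    """Resample heterogeneous sensor streams onto a common time grid.

    streams: dict name -> (timestamps, values), each at its own rate
             (timestamps must be increasing).
    Returns dict name -> values linearly interpolated at t_grid.
    """
    return {name: np.interp(t_grid, np.asarray(t), np.asarray(v))
            for name, (t, v) in streams.items()}
```

After alignment, every sample time carries a complete vector of process variables, which is the form a monitoring or control model expects.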

#### *4.4. Challenges and Opportunities*

In terms of process monitoring in the physical plant, real-time CQA monitoring methods have not yet been adopted in industrial applications. The use of NIR or Raman spectroscopy shows potential for real-time multicomponent measurement, although most applications have not yet been transferred to industrial practice. To obtain accurate prediction/measurement results, raw material calibration and chemometric methods need to be applied, which increases the complexity of applying spectroscopy. In addition, the data obtained from biopharmaceutical manufacturing are high dimensional and heterogeneous, requiring advanced data integration and synchronization. An automated data aggregation, mining, storage, and visualization system is required to achieve DT automation. The data storage system should have sufficient capacity, easy accessibility, and high security, as described in Section 3.4, to ensure manufacturing data security, patient data privacy, and successful communication between the physical and virtual plants.

To build a simulation of the physical plant, although different modeling methods have been developed for both upstream and downstream unit operations, there is no robust model that captures the CPPs and CQAs of all the unit operations in the integrated process. As listed in Table 4, upstream CFD, stoichiometric, and kinetic models can achieve bioreactor modeling on different scales (from genome scale to manufacturing scale); however, not all these methods can be implemented within a DT framework because of their high computational cost. Similarly, for downstream processes composed of different unit operations, integrating and optimizing all the mechanistic models together is not realistic. This explains why current integrated process models focus on mass balances and activity plans based on empirical models or simulators. One possible way to deal with this problem is to apply a pre-analysis that reduces the dimensions and parameters of the system by evaluating the CPPs and CQAs needed to ensure productivity and efficacy. Based on this analysis, the system selects models and uses the limited set of parameters to analyze or optimize the process. In this case, all the different modeling methods need to be built on the same platform or have good model–model communication. An alternative is to apply hybrid models to reduce the computational burden in the integrated process. In addition to the major unit operations, auxiliary operations such as buffer preparation, Cleaning-In-Place (CIP), and Sterilization-In-Place (SIP) also need to be integrated into the process model, since they do affect decision-making, including manufacturing scheduling and cost analysis. However, no existing model captures all the auxiliary equipment. Moreover, in biopharmaceutical risk analysis, process contamination directly causes batch failure.
Lot-to-lot variations also exist in the bioreactor culture and purification process. Developing a model-based control system that can diagnose contamination and process variabilities at an early stage is essential to improve process efficiency. Pharmaceutical and biopharmaceutical industries follow more stringent regulatory pathways; thus, the adoption of new technologies usually takes longer than in other industries. It must be noted that current technologies such as AI-based DTs do not conform to the QbD regulatory guidelines. The good news is that regulatory agencies are also seeking the adoption of innovative technologies. If a DT is developed for process operations and control at the same time, it stands a better chance of regulatory acceptance [260]. However, the DT approach is closely tied to real-time optimization and operation support built on existing manufacturing platforms, in which case regulatory approval might be hard to obtain [235].

The integration of the virtual and physical plants in biopharmaceutical manufacturing is still in its infancy. It is promising that data–model–control integration can already be achieved for a single unit operation, and that a data acquisition–control system can be achieved for an integrated process. However, to accomplish the biopharmaceutical DT, real-time data acquisition, a dedicated data transfer system, effective control and execution techniques, robust simulation methods, anomaly detection and prediction tools, and easy access to a secure cloud server platform are still needed.
