1. Introduction
In this paper, we describe the work related to smart sensors carried out as part of the EU Horizon 2020 project AIDPATH (Artificial Intelligence-driven, Decentralized Production for Advanced Therapies in the Hospital) for the control and monitoring task of the cell expansion process [1]. In AIDPATH, an adjustable AI-driven platform for automated Chimeric Antigen Receptor T cell (CAR-T cell) manufacturing is being developed containing a set of modular software and hardware tools whose objective is to provide equitable and affordable access to advanced therapies [2,3].
The incorporation of smart sensors facilitates the creation of sophisticated control strategies, providing a comprehensive understanding of the bioprocess and enabling more effective control and optimization. The disruptions in signals from hard sensors during cell culture can impact not only the process itself but also the loop control strategies in the bioreactors. These disturbances may be caused by factors such as environmental fluctuations, inherent biological system variability, or equipment malfunctions. Despite their significance, the user overseeing the system may struggle to determine whether these disturbances are temporary, within the expected range of variations, or indicative of a more significant issue that might require manual intervention.
Within the scope of this study, the AIDPATH platform, especially the FACER bioreactor, will benefit from the ability to identify disruptions in hard sensors caused by bubbles or other transient fluctuations. These disruptions have the potential to trigger unintended reactions, such as in the metabolite-dependent media feeding process. Recognizing these disruptions may play a crucial role in fine-tuning internal control strategies. Furthermore, the integration of smart sensors might be highly advantageous in notifying users of unexpected or unfamiliar situations that may require their attention. Thus, smart sensor implementation serves to establish an additional layer of control enabling a more resilient and optimized bioprocess.
In the last five years (2018–2023), advances in hardware, software, and sensor technologies have given rise to a diversity of applications and techniques applied to optimizing the control and monitoring of the cell culture process. For example, in [4] a survey was conducted of automated cell expansion trends and the outlook of critical technologies. Key performance indicators (KPIs) and attributes that are mentioned include foaming, cell count, cell viability, glycosylation of proteins, biomass, cell morphology, and size. With respect to key technologies, fluorescence and Raman spectroscopy are mentioned for the identification of molecular specificities. Furthermore, chemometrics, principal components analysis (PCA), and artificial neural networks (ANN) are mentioned as techniques to extract key information and patterns. On the other hand, [5] is a specific study of bioreactor automation driven by real-time sensing, in which a calibration curve is interpolated based on glucose, lactate, viable cell density, and total cell density. Wang et al. in [6] survey the development of novel bioreactor control systems based on smart sensors and actuators. The following sensors were mentioned: RFID (Radio Frequency Identification) based hydrogen gas detector, cell growth monitor, multiparameter sensor (hardware, e.g., pH, temperature, and dissolved oxygen (DO)), and a combination of optical density, DO, and pH. It is to be noted that current bioreactors specialized in the production of cell and gene therapies include less monitoring due to the nature of the product, associated regulation, and lack of technologies that allow real-time monitoring of the product. Typical bioreactors for advanced therapies provide readings for temperature, pH, and DO, whereas a smaller number read oxygen (O2) and carbon dioxide (CO2) in the exhaust line, and even fewer bioreactors are able to monitor glucose and lactate.
In [7], Reyes et al. conducted a more extensive survey of modern sensor tools and techniques for monitoring, controlling, and improving the cell culture process. Among the techniques identified are artificial neural networks to predict glutamine, glutamate, glucose, lactate, and viable cell concentrations; spectroscopy (near-infrared (NIR), mid-infrared (MIR), Raman, fluorescence, …); optical sensors for O2, pH, and CO2; free-floating wireless sensors; and PCA and partial least squares (PLS) regression to model viable cell density and antibody titers. Sensors are classified into seven main groups, as follows:
Data-driven sensors: PLS, PCA, ANN, and Fuzzy logic, including rules to describe unknown state variables from known measurements.
Model-driven sensors: Mass/energy balances, media composition vs. culture yield, kinetic equations, and thermodynamics.
Grey-box sensors: Mechanistic and data-driven. Dynamic modeling, Kalman filter, and system linearization using Taylor series expansion.
Soft sensors: Use non-invasive online spectroscopic methods such as NIR/MIR, 2D fluorescence, and Raman spectral data given the multidimensional complexity of the signal and the need for multivariate data analysis to relate the data to relevant process parameters.
Cascade control: Involves two feedback controllers used to improve the dynamic response of the controllers by distributing the disturbance over a secondary loop where corrective measures are taken without affecting the primary loop. It is mentioned that this type of controller has been successfully applied in bioprocessing, particularly to control dissolved oxygen.
Model predictive control: The controller response is based on a process model, which can be mechanistic, hybrid, or data-driven in origin. The model is capable of forecasting process events given process conditions and measurements from various input sensors.
Fuzzy control: Transforms quantitative data into qualitative parameters by converting numerical data into a membership function, which is a value between 0 and 1 that defines the degree to which a certain variable fits a given fuzzy set. The values in the 0–1 scale are dependent on a predetermined knowledge of the range of possible values.
A survey of recent advances in soft sensors for the monitoring, control, and optimization of industrial processes can be found in the paper by Jiang et al. [8]. In the recent literature, one of the focuses of soft sensor technology is ‘deep learning’ which typically involves multi-layer neural networks [9,10,11,12,13]. In [9], Chai et al. apply deep learning to resolve missing data, whereas [11,12] deal with noise and uncertainty in data. In [13], Ha et al. categorize smart sensor systems as neural network-based vs. non-neural network-based, indicating typical non-neural techniques such as linear regression, principal components analysis, support vector machines, and random forest, among others. Smart sensors in the context of Industry 4.0 and the Internet of Things (IoT) are considered in [14,15], and in [14] Kalsoom et al. indicate that key features of industrial smart sensors include: low cost, data preprocessing, self-calibration, and self-diagnostics, among others. Finally, [16,17] consider data processing of time series in the context of outlier and anomaly detection for industrial applications.
The remainder of this paper is organized as follows:
Section 2 explains and describes the infrastructure for the application of the smart sensors: the AIDPATH platform and the Aglaris FACER bioreactor;
Section 3 provides details of the methods, experimental setup, datasets and smart sensor background, and development;
Section 4 presents the results of applying the smart sensors to the batch run data, including validation of the alerts using a consensus approach;
Section 5 provides a discussion section which reflects on relevant issues and considerations; finally,
Section 6 presents the conclusions.
3. Methods
In this section, the methods, experimental setup, datasets, and smart sensor development are described. The batch run (FACER) datasets are first detailed (Section 3.1), followed by the smart sensor background and development (Section 3.2), the Weighted Weighted Average (WWA) smart sensor application to the FACER datasets (Section 3.3), and the Fuzzy smart sensor (Section 3.4).
3.1. Batch Run (FACER) Datasets
The datasets used for smart sensor development were captured from five expansions/batches of T cells or CAR T cells at clinically relevant scales conducted in the FACER. The general procedure used for cell production is described as follows: T cells were selected from peripheral blood mononuclear cells (PBMCs) and cryopreserved. After thawing, T cells were activated with TransAct™ (Miltenyi Biotec, Bergisch Gladbach, North Rhine-Westphalia, Germany) and expanded in the FACER in TexMACS™ medium combined with Human interleukin (IL)-7 and Human IL-15 (Miltenyi Biotec). Antibiotics were added in runs 1–4 but not in run 5. Human Serum was added in runs 1–3 and 5 where donor T cells were expanded but not in run 4 where patient CAR T cells (Clínica Universitaria Navarra, Pamplona, Spain) were expanded. Transduction in run 4 was performed with CAR lentivirus, provided by Clínica Universitaria Navarra. Each batch run used different initial conditions (quantity and donors/patients), and variable process conditions to provide different scenarios for smart sensor development and validation. During the expansion, sensor readings were polled at 10 s intervals.
Eight sensor measurements were pre-selected by the bioreactor process experts as the most significant for the cell expansion process; these are shown in Table 1.
For the validation, different combinations of the datasets were used to calibrate and then test the smart sensors. For example, a first calibration was performed using batch runs 1 and 2 (more stable), followed by validation with batch runs 3 to 5 (less stable). Then, a second calibration was performed using batch runs 3 to 5, with validation on batch runs 1 and 2. In this way, thresholds were found that represent a workable trade-off across all five batch runs.
Regarding the nature of the datasets, batch runs 1 and 2 were quite stable runs that output the expected cell production, and batch run 3 is an example of a biologically successful run that experienced a mechanical issue mid-culture which stopped perfusion and affected the monitoring and sensing. Hence, batch run 3 can be used to test whether the smart sensors are capable of detecting, and maybe in the future, minimizing the effect of this kind of event. Batch runs 4 and 5 are successful runs with some instabilities that should be detected by the smart sensors. For all datasets, an initial unstable phase is expected when all parameters are slowly stabilizing around the set conditions, followed by a more stable phase.
For control and monitoring, it is important to identify anomalous or non-optimal conditions and to provide a stable signal (without non-biological disturbances) as a guide for the PID controller. From the sensor data value distributions in Figure 3 (batch run 2), it can be seen that some distributions have multiple peaks, and values are also dependent on the corresponding time period during cell expansion, as well as the setpoints defined during the process.
Figure 3 shows the plots for each of the sensors during batch run 2, with time in days represented on the x-axis and the sensor value on the y-axis. It can be seen that for batch run 2, the CO2 sensor was inactive and the GasFL sensor value was constant. On the other hand, the glucose and lactate sensor values show a characteristic trend that is related to the evolution of the cell expansion perfusion process. Also, the DO sensor shows a progressive decrease whereas the temperature sensor value oscillates through the batch run (however, only through a small absolute range as shown on the y-axis).
Figure 4 shows the plots for each of the sensors during batch run 1, which is representative of other productions in the Aglaris FACER using the same perfusion protocol. After the initial pre-stabilization stage, it can be observed that glucose tends to stabilize towards its setpoint, while lactate increases slightly over time. On the other hand, pH decreases compared to the values in the fresh media and DO displays some oscillation with respect to its mean value. O2 and CO2 concentrations in the gas phase remain stable (with small-scale oscillations) around their setpoints, with temperature and GasFL behaving in a similar manner.
Figure 5 shows the plots for each of the sensors during batch run 3, in which there was a mechanical shut-down in the middle of the batch run (between days 3 and 4). However, the system was able to restabilize and complete the run with acceptable cell count and quality. The steep drop and recovery of the glucose and lactate values can be seen in a short period when the system was in shut-down and the sensors were not active. From day 4 onwards, following the setpoint readjustments made to compensate for the shutdown, some sensors (e.g., CO2, GasFL) show more atypical values and trends with respect to normal runs 1 and 2 (refer to Figure 3 and Figure 4).
3.2. Smart Sensor Background and Development
The following gives the background and theoretical basis of the techniques used for the smart sensors developed in this work, and how they are applied to the control and monitoring of the Aglaris FACER. The smart sensor techniques are Signal Disturbance Indicator, Bollinger Bands, WWA aggregation, and Fuzzy controller, with the corresponding hard sensor inputs as depicted in Figure 6. Finally, the consensus approach is detailed for polling the different techniques to obtain an overall decision support recommendation for the platform’s control and monitoring.
3.2.1. Signal Disturbance Indicator
The Signal Disturbance Indicator (SDI) evaluates alterations in the signals and characterizes whether they are due to non-biological events. Examples of such non-biological alterations are a drop in the glucose signal (accompanied by a drop in the lactate signal) due to a sudden change in temperature (e.g., the user opening the door to perform samplings), and/or transitory bubbles in the sensor array. Any of these disturbances, not due to cell behavior, might alter the signal trends (glucose, lactate, DO, pH), and the PID response may be inaccurate at those timepoints. Evaluating these events and identifying whether they are related to cell behavior is valuable information that can be looped back into the system in order to correct and mitigate undesired effects.
Although the Aglaris FACER has temperature and bubble sensors in different parts of the circuit to measure these events, smart sensors can become a key tool to add extra valuable information to those measurements.
The developed SDI uses simple mean and standard deviation together with covariance in order to detect signal disturbances (i.e., a significant simultaneous drop in both glucose and lactate readings over a relatively short period of time). The cut-offs are established by statistical analysis of the available historical data, taking into consideration the standard deviation (σ) of the previous 30 min time window for the glucose (G) and lactate (L) trends over time. In particular, an event is defined as a disturbance if:
and
where i = 1, …, number of timepoints, and the constant in these conditions varies from 2 to 4 in order to estimate the magnitude of the disturbance’s effect (small, medium, and large, respectively).
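The drop-detection idea described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes that each cut-off is the mean of the trailing window minus k times the standard deviation of that window, it omits the covariance term, and the function name `sdi_flags` and its parameter names are hypothetical. The default window of 180 samples corresponds to 30 min at the 10 s polling interval.

```python
from statistics import mean, stdev

# 30 min of samples at the 10 s polling interval described in Section 3.1
WINDOW = 180


def sdi_flags(glucose, lactate, k=2.0, window=WINDOW):
    """Return one 0/1 flag per timepoint (1 = simultaneous drop in both signals).

    Assumed rule (hypothetical): a reading counts as a drop when it falls below
    mean - k * stdev of the preceding `window` samples of the same signal.
    """
    flags = [0] * len(glucose)
    for i in range(window, len(glucose)):
        g_hist = glucose[i - window:i]
        l_hist = lactate[i - window:i]
        g_cut = mean(g_hist) - k * stdev(g_hist)
        l_cut = mean(l_hist) - k * stdev(l_hist)
        # Flag only when glucose AND lactate drop together
        if glucose[i] < g_cut and lactate[i] < l_cut:
            flags[i] = 1
    return flags
```

Running the function with k = 2, 3, and 4 would give three flag series that could be read as small, medium, and large disturbances, following the range of the constant given in the text.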
3.2.2. Bollinger Bands [24,25]
Bollinger Bands (BB) consist of an N-period (time window of N) moving average ($MA$), an upper band at $K$ times an N-period standard deviation above the moving average ($MA + K\sigma$), and a lower band at $K$ times an N-period standard deviation below the moving average ($MA - K\sigma$). The resulting plot can thus embody stochastic/kinetic behavior and/or assumptions of the systemic setup and is not limited to the raw data value per se. The Bollinger band is typically used for tracking time series in financial data (stocks, etc.) but has also been successfully used for engineering applications. Thus, at point $i$:
$$UB_i = MA_i + K\sigma_i$$
gives the upper band, and:
$$LB_i = MA_i - K\sigma_i$$
gives the lower band.
For the bioreactor data (runs 1 to 5) the K value was calibrated to K = 8.
Next, the width of the range (from lower to upper band) was calculated:
$$W_i = UB_i - LB_i$$
In addition, the distance $d$ of data value $i$ ($x_i$) from each bound:
$$d_i^{U} = UB_i - x_i, \qquad d_i^{L} = x_i - LB_i$$
We take the smallest distance:
$$d_i = \min\left(d_i^{U}, d_i^{L}\right)$$
Obtain the distance as a percentage:
$$p_i = 100 \cdot \frac{d_i}{W_i}$$
The alert threshold $T$ was assigned as 15% in the case of glucose and 20% for lactate. This was set together with the bioreactor expert by studying the results of processing data from different batch runs. Hence, if the distance percentage $p_i$ becomes less than or equal to the threshold $T$, the alert is triggered (assigned as 1); otherwise, the alert is assigned as zero, thus:
$$alert_i = \begin{cases} 1 & \text{if } p_i \le T \\ 0 & \text{otherwise} \end{cases}$$
With reference to the application of Bollinger band sensors to the FACER control and monitoring, the following observations are made:
If the current sensor value becomes too close to the upper or lower bound, this can trigger an action/alert.
Trends can be identified—in general, the bands should reduce/converge towards the mean during the perfusion stage. The mean value for glucose should stabilize, which is considered a positive evolution given that a key control criterion is to keep the glucose level stable.
The overall concept of the Bollinger Band Smart Sensor for Glucose and Lactate monitoring is illustrated further in Figure 7.
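The band, distance, and alert steps of this subsection can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation used in the FACER: the function name `bollinger_alert` and its parameters are hypothetical, the window length N is not given in the text and is therefore left as an argument, the window is assumed to include the current value, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev


def bollinger_alert(values, n, k, threshold_pct):
    """Return 1 if the latest value is within threshold_pct of a band, else 0.

    values        : sensor readings, most recent last
    n             : window length N (hypothetical parameter; must be >= 2)
    k             : band multiplier K
    threshold_pct : alert threshold as a percentage of the band width
    """
    window = values[-n:]            # assumed to include the current value
    ma = mean(window)               # N-period moving average
    sigma = stdev(window)           # N-period (sample) standard deviation
    upper = ma + k * sigma
    lower = ma - k * sigma
    width = upper - lower
    if width == 0:
        return 0                    # flat signal: no band to approach (assumption)
    x = values[-1]
    distance = min(upper - x, x - lower)      # smallest distance to a band
    distance_pct = 100.0 * distance / width   # distance as a percentage of the width
    return 1 if distance_pct <= threshold_pct else 0
```

With the calibrated values reported in the text, the call would use k = 8 with threshold_pct = 15 for glucose and threshold_pct = 20 for lactate.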
3.2.3. WWA—Weighted Weighted Average
WWA represents an adaptation of the WOWA (Weighted Ordered Weighted Average) aggregation operator [26,27,28]. WWA is a versatile data aggregation and weighting technique, which on the one hand provides a scaling weight for variable values, and on the other hand, provides a critical range weight which represents the criticality of variations in the data values of each variable.
where:
$s_i$ is the distance from the set point or reference value of the data value of sensor $i$.
$v_i$ is the scaling weight of sensor $i$.
$w_i$ is the critical range weight, representing the criticality of distance from the set point or reference value of sensor $i$ ($s_i$) (e.g., for glucose, if $s_i$ < 0.8, then $w_i$ = 0; if $s_i$ ≥ 0.8, then $w_i$ = 1).
$n$ is the number of sensors.
As well as the data rows, the input to the WWA includes two vectors (scaling and range criticality) which are used to weight and merge the data in a flexible and customizable way.
The WWA is particularly useful for sensors and sensor data, where it is not desirable to entirely exclude variables/data values. WWA instead dampens or potentiates them using the weights depending on their relative scaling and critical range. One challenge is the correct assignment of the weights, which can be completed by a combination of statistical evaluation and domain expert knowledge and insights. The scaling weight of each sensor variable (defined by the process experts) can be interpreted as their impact on overall system behavior, control, and outcome. The critical range weight uses a threshold (also defined by the process experts) for each sensor: if the sensor value is below the threshold, the criticality weight is 0; if the sensor value is equal to or above the threshold, the criticality weight is 1. The weights and thresholds were validated with the bioreactor expert by observing the WWA outputs for different historical batch runs.
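To illustrate how the two weighting vectors interact, the following Python sketch combines the distances, scaling weights, and critical range weights for a set of sensors. It should be read with caution: the aggregation is assumed here to be a plain sum of the products of scaling weight, critical range weight, and distance, which may differ from the operator actually used, and the function name `wwa`, the dictionary layout, and the pH values in the usage note are hypothetical. Only the glucose critical distance (0.8) and scaling weight (0.5) are taken from the text.

```python
def wwa(distances, scaling, critical_distance):
    """Aggregate per-sensor distances from their set points or reference values.

    distances         : {sensor: s_i}, absolute distance from set point/reference
    scaling           : {sensor: v_i}, static scaling weight
    critical_distance : {sensor: threshold}, used to derive the 0/1 weight w_i

    Assumed form (hypothetical): sum over sensors of v_i * w_i * s_i.
    """
    total = 0.0
    for name, s in distances.items():
        # Critical range weight: 1 at or beyond the critical distance, else 0
        w = 1 if s >= critical_distance[name] else 0
        total += scaling[name] * w * s
    return total
```

For example, with a glucose distance of 1.0 (scaling weight 0.5, critical distance 0.8) and an illustrative pH distance of 0.05 below an illustrative critical distance of 0.2, only glucose contributes, so the output under this assumed form is 0.5 × 1 × 1.0 = 0.5.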
3.2.4. Fuzzy Controller [29,30,31,32]
Fuzzy control does not necessarily require any initial knowledge of system dynamics. It transforms quantitative data into qualitative parameters by converting numerical data (such as glucose sensor data range) into a membership value for different fuzzy sets, giving a value between 0 and 1 that defines the degree to which a certain variable fits a given fuzzy set. The values in the 0–1 scale are dependent on a predetermined knowledge of the range of possible values.
Fuzzy controllers are based on fuzzy set theory, which has an advantage with respect to other deterministic and classifier systems: it allows membership of a data record to more than one class, each with its ‘grade of membership.’ As an example, consider the distance of the glucose level from the glucose sensor set point (as was explained for the WWA smart sensor). This distance can belong to two different fuzzy sets: ‘normal’ and ‘alert.’ For a given glucose level, the distance from the set point could belong to ‘normal’ with a membership value of 0.75 and to ‘alert’ with a membership value of 0.25.
Fuzzy sets are those whose elements have degrees of membership. Fuzzy sets were introduced by Zadeh in 1965 as an extension of the classical notion of set. Formally, a fuzzy set is a pair $(U, m)$ where $U$ is a non-empty set and $m: U \to [0, 1]$ is a membership function. The reference set $U$ is called the universe of discourse, and for each $x \in U$, the value $m(x)$ is called the grade of membership of $x$ in $(U, m)$. The function $m = \mu_A$ is called the membership function of the fuzzy set $A = (U, m)$.
For a finite set $U = \{x_1, \dots, x_n\}$, the fuzzy set $(U, m)$ can be denoted by
$$\{m(x_1)/x_1, \dots, m(x_n)/x_n\}$$
Let $x \in U$. Then, $x$ is called
- not included in the fuzzy set $(U, m)$ if $m(x) = 0$ (not a member);
- fully included if $m(x) = 1$ (full member);
- partially included if $0 < m(x) < 1$ (fuzzy member).
The non-fuzzy (crisp) set of all fuzzy sets in a universe $U$ is denoted with $SF(U)$.
3.2.5. Consensus
Four different paradigms are applied from statistics (SDI, Bollinger) and artificial intelligence (WWA, Fuzzy) with a similar objective, which is to measure and quantify the ‘stability’ of the system. This makes it possible to establish a ‘consensus’ approach which asks each technique and smart sensor for a stability evaluation, and then compares the similarity/difference of the replies and applies a function to show the consensus. The consensus can be based on, for example, the number of smart sensors giving an alert at the same time, so if 3 or more of the smart sensors indicate an alert, according to the previously calibrated thresholds, then the human operator can be recommended to pay special attention to the situation.
Applying consensus theory is a good approach for critical systems, as it avoids overfitting on any one technique, and increases the confidence factor of an overall alert/recommendation to the human operator of a control and monitoring system.
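The vote-counting form of the consensus described above can be sketched in Python as follows. This is an illustrative sketch only: the function name `consensus`, the dictionary layout, and the default of three votes (taken from the example in the text) are assumptions, and other consensus functions are possible.

```python
def consensus(alerts, min_votes=3):
    """Count simultaneous alerts and flag timepoints that reach min_votes.

    alerts : {smart_sensor_name: [0/1 alert per timepoint]}, equal-length lists
    Returns (counts, recommend): the number of alerts per timepoint, and a 0/1
    flag indicating that the operator should pay special attention.
    """
    series = list(alerts.values())
    # zip(*series) walks through the timepoints across all smart sensors
    counts = [sum(column) for column in zip(*series)]
    recommend = [1 if c >= min_votes else 0 for c in counts]
    return counts, recommend
```

The `counts` series corresponds to what a consensus plot would display, while `recommend` is the thresholded recommendation to the human operator.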
3.3. WWA Smart Sensor Applied to the FACER Datasets
The WWA includes the intrinsic information value of the descriptive data in terms of scaling of the sensor value and range criticality of the data value. To develop the WWA smart sensor, the pre-selected sensors were segregated into two categories: sensors with setpoints (SP) and sensors without setpoints (NSP). In sensors with a setpoint, the sensor reading should approximate the sensor setpoint. For the rest, no sensor setpoint is defined, but a probable range for the sensor values during perfusion was defined for each sensor based on historical data. The average point of this range is the reference point used in the WWA smart sensor. A summary of the set points and reference values for the eight sensors is given in Table 2. It can be seen that the set point for glucose was 20 (mM) for batch runs 1 to 4 and 19 (mM) for batch run 5.
For the WWA to add extra value to the signals in the system, we make use of weighting and probabilistic values. For these datasets, the weights, which are shown in Table 3, depend on the parameters and conditions initially selected (Table 2 and Table 3).
Table 3 shows the scaling weight, critical range distance, and critical range weight assignments to the selected sensors. For a given sensor, if the distance of the sensor value from the set point (Table 2) is greater than or equal to the criticality distance (Table 3), then the criticality weight will be 1 (which indicates ‘alert’), otherwise it is 0 (indicating ‘normal’). This approach is used for the WWA aggregation function smart sensor.
The first weighting vector refers to the scaling of a sensor value with respect to the values of other sensors. The scaling of a sensor value is considered static, i.e., it does not vary during a batch run or for different batch runs (as long as the production conditions remain the same, i.e., type of media, perfusion mode, and range of cell number). In Table 3, it can be seen that the scaling weight is 1 for O2, CO2, T, GasFL, lactate, and pH. On the other hand, a scaling weight of 0.5 is applied to glucose and DO. This is because the values of glucose and DO are relatively greater than the other sensors, hence the weight value of 0.5 reduces their effective values for input to the aggregation operator.
The second weighting vector applied in the WWA aggregator function considers the criticality of data values of each sensor, and it is therefore dynamic, dependent on the sensor value at each time interval.
From Table 3, it can be seen that the criticality distance is in general different for each sensor, and as mentioned previously the criticality weight can have a value of 0 or 1 depending on whether the sensor value distance from the set point is within the critical distance or not.
3.4. Fuzzy Smart Sensor
The fuzzy controller allows multiple grades of membership to different states, which may occur in ambiguous situations or noisy processes.
Figure 8 shows the fuzzy definition for the glucose sensor input (the absolute difference between the glucose value and the glucose setpoint), which is represented by two fuzzy sets, ‘normal’ and ‘alert’. ‘Normal’ ranges from 0 to 1.2 and ‘alert’ ranges from 0.4 to infinity. This provides an alert with the grade of membership based on the distance of the glucose value from the setpoint. For example, with reference to Figure 8, if the distance from the setpoint is 0.8 (x-axis value), the grade of membership to ‘normal’ will be 0.5, and the grade of membership to ‘alert’ will be 0.5. If the distance from the setpoint is 1.0, the respective grades of membership to ‘normal’ and ‘alert’ will be 0.25 and 0.75, and if the distance from the setpoint is 1.5, the grades of membership will be 0.0 and 1.0, respectively.
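The membership grades quoted above can be reproduced with a short Python sketch. It assumes that both fuzzy sets are piecewise linear between the stated breakpoints (0.4 and 1.2) and that the two grades sum to 1; these assumptions are consistent with the three worked values in the text, but the exact shapes are defined by Figure 8. The function name `glucose_memberships` and its parameter names are hypothetical.

```python
def glucose_memberships(distance, lo=0.4, hi=1.2):
    """Grades of membership to 'normal' and 'alert' for a glucose distance.

    distance : absolute difference between glucose value and setpoint
    lo, hi   : breakpoints where 'alert' starts to rise and 'normal' reaches 0
    Assumes linear ramps between lo and hi (an assumption, see lead-in).
    """
    if distance <= lo:
        alert = 0.0
    elif distance >= hi:
        alert = 1.0
    else:
        alert = (distance - lo) / (hi - lo)
    return {"normal": 1.0 - alert, "alert": alert}
```

Under these assumptions, distances of 0.8, 1.0, and 1.5 give grades of approximately (0.5, 0.5), (0.25, 0.75), and (0.0, 1.0) for (‘normal’, ‘alert’), matching the values read from the figure up to floating-point rounding.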
Finally, in summary of the definitions of the smart sensors in Section 3, Table 4 shows the threshold values assigned for each smart sensor, calibrated from batch runs 1 to 5. If the smart sensor value is greater than the threshold, the alert value is 1, otherwise 0. The threshold values were calibrated by the bioreactor experts, evaluating the alerts produced in each batch run and then adjusting to correspond to the expected alerts.
5. Discussion
With reference to Table 3, the scaling weights for each sensor were determined as follows. Initially, the idea was to have values between 0 and 1. However, during the calibration with the bioreactor experts, it was decided to assign 0.5 to the glucose and DO sensors and 1.0 to the other six sensors, as a value of 0.5 was found to scale down the values of glucose and DO with respect to the other six sensors. The key aspect is that the WWA aggregator smart sensor should produce an alert output that makes sense with respect to the batch run and the other sensors. In future work, the scaling weights could be further adjusted.
With respect to the five batch runs performed to evaluate the system and smart sensors, they were chosen and parameterized by the bioreactor experts and project coordinators within the design of the experiment as an initial calibration with mainly healthy donors. In the next steps of the project, more healthy donor batch runs will be added, and the system and the smart sensors will be further calibrated.
With respect to the calibration of the smart sensors across different batch runs (those that were stable versus those that were less so), this was completed in collaboration with the bioreactor experts who had actually performed the batch runs and had detailed knowledge of them. The adjustment of the alert thresholds should avoid too many alerts on the one hand, and insufficient alerts on the other, following the interpretation of the bioreactor experts for each batch run. More specifically, false positives (alerts without events) and false negatives (events without alerts) were scrutinized. The aggregation smart sensors were more of a challenge given that they aggregate several hard sensors to provide one alert output, thus having a more complex behavior when compared to other smart sensors which depend on one or two hard sensors.
With respect to the unexpected shutdown and the ensuing re-stabilization in the third batch run, even though it was a mechanical issue, the glucose and lactate readings were decreasing progressively to 0, and this should be detected as a non-cellular behavior. Further calibration of the smart sensors and the consensus algorithm may be required to better detect a major event such as a mechanical shut-down. This could be achieved by considering not only the number of simultaneous alerts but also the time the consensus is above three, for example.
Looking ahead to possible enhancements or new functionalities for the smart sensors to improve their detection and management of bioprocess instabilities, within the scope of the project new batch run data of healthy donors will be obtained, which will allow for further calibration so the system can generalize to as many potential patient cases as possible that can occur in a real hospital environment.
With respect to the manner in which the smart sensors inform decision-making during the cell expansion process in the Aglaris FACER bioreactor runs, the individual alert plots and consensus plot will be shown on the PLC controller dashboard. The human operators will consult the dashboard and evaluate this information together with existing monitoring and alert displays. As well as the number of simultaneous alerts at a given moment, the time a given number of alerts remain simultaneous is also a consideration. The time duration can be made more explicit in a future version of the consensus algorithm.
Reflecting on the presented results and observations, future adjustments can be recommended for the calibration of smart sensor thresholds and weights for upcoming batch runs. In particular, the alert thresholds will be adapted to avoid too many alerts on the one hand and insufficient alerts on the other. The relevant events will be interpreted together with the bioreactor experts for each new batch run, and values will be assigned that also work for all previous batch runs. Furthermore, as well as adjusting the thresholds, a time factor (duration) will be introduced to determine when a consensus becomes an alert that needs user (human operator) interaction.
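One possible way to add such a time factor is sketched below in Python. This is purely a hypothetical illustration of the idea, not a planned design: the function name `sustained_alert`, the vote threshold, and the minimum duration `min_steps` are all assumptions (with 10 s polling, six steps would correspond to one minute).

```python
def sustained_alert(counts, min_votes=3, min_steps=6):
    """Flag timepoints where the consensus has persisted long enough.

    counts    : number of simultaneous smart sensor alerts per timepoint
    min_votes : consensus level that counts as an alert condition
    min_steps : consecutive timepoints required before the operator is alerted
    """
    flags = []
    run = 0  # length of the current run of timepoints at or above min_votes
    for c in counts:
        run = run + 1 if c >= min_votes else 0
        flags.append(1 if run >= min_steps else 0)
    return flags
```

A short-lived consensus would then be ignored, whereas a persistent one, such as might accompany a mechanical shut-down, would be escalated to the operator.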
With respect to current limitations, the current batch runs are limited to five, and to mainly healthy donors; however, they have been carefully selected by the bioreactor experts as viable for initial calibration of the platform.