1. Introduction
The quest for a quantitative diagnostic and a specific marker for ME/CFS has yet to identify a reliable candidate, whether through routine pathology markers or through research efforts in immunology, microbiology, neuroscience and elsewhere. A number of cytokines, for example transforming growth factor-beta (TGF-β) and interleukin-10 (IL-10), have previously shown promise but have not ultimately delivered a validated diagnostic test [1,2,3,4,5,6,7]. To the list of potential serum markers, we have recently added activin B, which was detected in a pilot research study involving volunteers recruited via CFS Discovery (Donvale, Victoria) [8].
Activin B, along with activin A, is a member of the activin family of proteins, which belong to the TGF-β superfamily of growth and differentiation factors. Follistatin is a high-affinity binding protein for both activins, with diverse roles in physiology that include reproduction, haematopoiesis, immune cell development, as well as inflammation and immunity. The biology of activin A, at the time of writing, is better understood than that of activin B, although there is evidence of differences in relation to hepcidin regulation, associated receptor binding and SMAD signalling [9,10,11].
Following the activin findings from preliminary studies [8], this investigation aimed to validate these previous observations on a separate and larger population recruited by the same ME/CFS clinic in Melbourne. As well as the activin focus, other aims included applying the results from pathology and clinical testing, with and without activin B, to the pattern recognition algorithm random forest (RF), to identify wider marker patterns that separate ME/CFS cases from healthy controls. In addition to the development of activin B as a serum biomarker, a longer-term aim is to develop simpler diagnostic tools from routine data to help health professionals diagnose ME/CFS.
The report herein examines the diagnostic potential of serum activin B, both individually and in combination with other blood, serum and urine markers considered for the assessment of research participants. The investigation directly compared the ME/CFS cases to healthy controls, but also examined the application of the weighted standing time (WST), as a measure of symptom severity, to stratify the ME/CFS cohort into mild to severe classes prior to analysis.
2. Materials and Methods
2.1. Participant Recruitment and Ethics Approval
The recruitment of research participants and associated procedures were described previously [8]. All the participants were recruited via CFS Discovery (Donvale, VIC, 3111), either by direct invitation to existing patients or through responses to local advertising and social networking sites. Only participants with a previous ME/CFS diagnosis were recruited.
Human Ethics approval was granted by the ANU Human Research Ethics Committee (Approval No. 2015/193, approval date 29 June 2015), with approved consent forms and participant information provided to each potential participant. Inclusion in the study was allowed after signed consent was received by the researchers. Specific participant identifiers were not supplied to the researchers, and only known to the clinicians and clinic staff. Each research participant was given an identification code by the clinic, with age (at time of the appointment) and sex also provided. Eighty-five (85) participants were initially recruited for the ME/CFS cohort, with five eventually excluded due to comorbidities and/or difficulties attending the required appointments. Seventeen (17) healthy control (HC) participants were recruited too and underwent the same assessment as the ME/CFS cohort, giving a total study cohort size of 97 participants.
2.2. ME/CFS Assessment, Sample Collection and Tests
Each participant was examined by the CFS Discovery clinicians using the International Consensus Criteria to guide ME/CFS diagnosis [12] (NB: the earlier pilot study used the Canadian Criteria, which was replaced by the International Criteria in 2011–12). To be included in the ME/CFS participant cohort, the International Consensus Criteria must have been satisfied.
All participants performed a test for orthostatic intolerance (standing test; see the section on weighted standing time for details) that included the collection of autonomic data during repose and the standing task [13]. After the standing test, non-fasting venous blood samples were collected for routine pathology testing, in addition to parathyroid hormone (PTH), thyroid function testing (TFT), vitamin D and serum activin B [8]. For participants who were able, 24-h urine samples were collected and the volume, sodium (Na+), potassium (K+) and creatinine 24-h excretion rates were calculated.
Qualitative symptom inventories and questionnaires were also conducted for each participant, including the Epworth Sleep Scale [14] and the DASS-42 [15,16].
For the range of tests conducted, please refer to previous publications describing the CFS Discovery pilot studies [8,13].
2.3. Data Cleaning, Organisation and Structure
Data were collected for each participant as standard practice for the CFS Discovery staff and stored electronically in the secure clinic database. Each participant/patient file contained all the questionnaire and survey data, the printed pathology results (Australian Clinical Laboratories, South Australia), the standing test (orthostatic intolerance) data, including blood pressure (BP), heart rate (HR) and associated autonomic measurements and calculations, the standing time and standing difficulty, as well as clinical notes recording patient details (age, sex, weight, height).
An identification code was assigned to each research participant by CFS Discovery staff, after which data were matched and added to a spreadsheet for researcher interrogation. Heart rate (HR) data collected during the standing test were assessed for evidence of POTS (postural orthostatic tachycardia syndrome), with an HR increase of ≥30 beats per minute (bpm) upon standing from a lying position accepted as positive for co-morbid POTS [13,17]. The final data collection included the standing time and difficulty scores, WST calculations, POTS (yes or no), blood, serum and urine pathology results, serum activin B, DASS and Epworth Sleep Scale results, along with notes on other conditions or comorbidities.
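The POTS criterion above reduces to a single comparison. The following is a minimal Python sketch for illustration only (the study's own analyses were run in SPSS and R). The function name, argument names and the use of the peak standing heart rate are assumptions; the text does not state whether the peak or a sustained increase was used.

```python
POTS_HR_INCREASE_BPM = 30  # threshold stated in the text (>= 30 bpm)

def has_pots(lying_hr_bpm, standing_hr_bpm_series):
    """Flag co-morbid POTS from one lying HR and the HRs recorded while standing.

    Assumption: the peak standing HR is compared with the lying HR.
    """
    peak = max(standing_hr_bpm_series)
    return (peak - lying_hr_bpm) >= POTS_HR_INCREASE_BPM

# Hypothetical readings, not study data.
print(has_pots(68, [80, 92, 99, 101]))
print(has_pots(70, [78, 85, 90]))
```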
After the clinic appointment, the participants were asked to collect a 24-h urine sample within a week of the clinic visit. A minority of participants did not collect this sample, resulting in a number of missing values for urinary Na+, K+, creatinine and their 24-h excretion rates. Given the small to medium sample sizes, the median for each WST class was calculated and used to fill the missing values for that class.
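The class-wise median imputation described above can be sketched as follows. This is an illustrative Python sketch, not the authors' code; the function name and the use of `None` to mark a missing value are assumptions, and the numbers are invented.

```python
from statistics import median

def impute_by_class_median(values, classes):
    """Replace None entries with the median of observed values in the same class.

    Raises KeyError if a class has no observed value to take a median from.
    """
    observed = {}
    for value, cls in zip(values, classes):
        if value is not None:
            observed.setdefault(cls, []).append(value)
    class_median = {cls: median(vals) for cls, vals in observed.items()}
    return [class_median[cls] if value is None else value
            for value, cls in zip(values, classes)]

# Hypothetical 24-h urinary creatinine values with two missing entries.
values = [10.0, None, 12.0, 8.0, None, 9.0]
wst_class = [0, 0, 0, 1, 1, 1]
print(impute_by_class_median(values, wst_class))
```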
2.4. Orthostatic Intolerance (OI) Assessment
Standing difficulty is a subjective ordinal scale developed by CFS Discovery clinicians, which, together with standing time (maximum of 20 min, recorded at two-minute intervals with autonomic measurements, as well as at repose before and after standing), is used to calculate the weighted standing time (WST). The standing difficulty scale ranges from 0 (no difficulty standing during 20 min upright) to 10 (extreme difficulty to maintain an upright stance). If the participant was not able to stand for at least 10 min, they were given a standing difficulty score of 14. The participants who could maintain an upright stance for longer than 10 min, but not stand for the entire 20 min, were scored at 12 for standing difficulty. The standing difficulty scale has not been validated on other patient/participant populations.
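The scoring rules above can be written as a small function. This Python sketch reflects one reading of the text and is not the clinic's procedure: it assumes the subjective 0 to 10 score applies only to participants who completed the full 20 min, and it treats a standing time of exactly 10 min as having stood for at least 10 min. The function and argument names are hypothetical.

```python
def standing_difficulty(standing_minutes, subjective_score=None):
    """Return the standing difficulty score used to weight the standing time."""
    if standing_minutes < 10:
        return 14  # could not stand for at least 10 min
    if standing_minutes < 20:
        return 12  # stood 10 min or more, but not the full 20 min (assumed boundary)
    if subjective_score is None or not 0 <= subjective_score <= 10:
        raise ValueError("completed tests need a subjective score from 0 to 10")
    return subjective_score

print(standing_difficulty(6))      # early termination
print(standing_difficulty(14))     # partial completion
print(standing_difficulty(20, 3))  # full test, clinician-recorded score
```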
2.5. Weighted Standing Time (WST)
The standing test procedure to assess orthostatic intolerance and detect POTS has been published previously [13], with a British study finding similar rates of POTS in a cohort from northern England [17]. Furthermore, the WST and its capacity to stratify ME/CFS severity, and to identify useful patterns in diagnostic markers, was recently published [18].
In brief, the WST takes the standing time (0–20 min, recorded at 2-min intervals) and weights this time with the subjective standing difficulty score, as described by the following equation:
The WST, therefore, provides a proxy for ME/CFS severity and a response variable with which to investigate the significance of the predictor (independent) variables and their interactions. The results presented herein were generated from the analyses of WST severity classes, as summarised in Table 1. With the majority of the study participants able to stand for the entire 20 min of the orthostatic intolerance (OI) test, standing time alone was not an effective response variable.
2.6. Statistics and Machine Learning
2.6.1. Statistical Analyses
All descriptive statistics, test (inferential) statistics and regression/correlation analyses were performed using SPSS (Version 22—IBM SPSS software, Chicago, IL, USA).
Prior to conducting the appropriate statistical analyses, all raw data collected for investigation were subjected to a one-sample Kolmogorov–Smirnov (K-S) test to assess whether they followed a normal distribution, with K-S results of p ≤ 0.05 indicating that the specific marker distribution was significantly different from a normal curve. Based on the K-S results (Table 2), statistical significance between two groups was estimated by a Mann–Whitney U test, and between three or more groups by Kruskal–Wallis non-parametric tests. Jonckheere–Terpstra non-parametric tests were also applied where the groups were clearly ordinal. Descriptive results were presented as the median and 25th–75th interquartile range (IQR).
Significance was set at p < 0.05 for the two group comparisons using the Mann–Whitney U test, and also for comparisons across more than two classes in the Kruskal–Wallis (KW) test.
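The descriptive summary used throughout (median with the 25th and 75th percentiles) can be sketched in a few lines. This Python sketch is illustrative only; the quartile interpolation rule (`method="inclusive"`) is an assumption and may differ slightly from the rule SPSS applies, so values near the quartiles need not match the published tables exactly.

```python
from statistics import median, quantiles

def median_iqr(values):
    """Return (median, (25th percentile, 75th percentile)) for one marker."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)

# Hypothetical marker values, not study data.
print(median_iqr([5, 1, 4, 2, 3]))
```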
2.6.2. Machine Learning
R statistical programming version 3.5.1 was used to run the recursive partitioning algorithms random forest (R library randomForest) and decision trees (R library rpart) [19,20]. Algorithm tuning was performed via the R caret package [21].
Random forest analysis (RFA) was performed using the WST classes summarised in Table 1b. Due to class imbalance and the relatively small overall sample size, the healthy controls were combined with mild ME/CFS cases (Table 1a) to create an adjusted WST class 0, and therefore provide a larger class sample size for subsequent RFA. Running the original WST classes (Table 1a) as the response of interest resulted in very poor class prediction, and as such, an ineffective model, in spite of attempts to compensate with class-balancing R code. Future studies will benefit from larger sample sizes, particularly for healthy control cases.
All the RFA results presented herein used the three-class (WST) model to detect predictors of absent or mild ME/CFS symptoms (0), compared to moderate (1) or severe (2) symptoms (Table 1b).
Severe cases were characterised by their inability to remain upright for the full twenty minutes of the standing test for orthostatic intolerance. Missing values in the raw data were filled by the median for each WST category, prior to RFA. Missing data were most pronounced for the 24-h urine markers, with 15–20% missingness due to test non-compliance after the initial CFS Discovery appointment. The total case numbers for the ME/CFS and healthy cohorts are summarised in Table 2. Individual missing values were also found for serum urea and electrolytes, and MCH.
Following algorithm tuning (caret), all RFA used the following settings: mtry = 4 (four predictor variables tried for splitting at each node); ntree = 5000 (5000 decision trees grown to determine predictor variable rankings); replace = TRUE (cases are sampled with replacement during bootstrapping); and importance = TRUE (predictors are scored by permutation importance as well as by Gini index ranking).
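To make the role of sampling with replacement concrete, the sketch below draws one bootstrap sample of the kind each tree is grown on and lists the cases left out, which are the out-of-bag (OOB) cases used for the error estimates reported later. This is an illustrative Python sketch and not the randomForest implementation; the function name and seed are hypothetical. With replacement, roughly a third of cases (about 1/e) are expected to be out of bag for any one tree.

```python
import random

def bootstrap_with_oob(n_cases, seed=0):
    """Draw one bootstrap sample of case indices and return (in_bag, out_of_bag)."""
    rng = random.Random(seed)
    # Sampling with replacement: the same case may be drawn more than once.
    in_bag = [rng.randrange(n_cases) for _ in range(n_cases)]
    out_of_bag = sorted(set(range(n_cases)) - set(in_bag))
    return in_bag, out_of_bag

in_bag, oob = bootstrap_with_oob(97, seed=1)  # 97 = total study cohort size
print(len(in_bag), len(set(in_bag)), len(oob))
```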
As well as the primary RFA to detect and rank predictors of ME/CFS severity via WST, bagging and boosting ensembles for a variety of algorithms were tested in parallel using R statistical programming via the caret and caretEnsemble packages [21,22]. Bagging and boosting are ensemble methods that increase prediction accuracy by reducing variance (bagging) or by successively correcting errors during the analysis (boosting). The analyses presented allowed for the comparison of machine learning methods, and therefore the assessment of the best analytical strategy for the dataset of interest.
A number of machine learning options are available for the training and testing of data to reveal outcome predictors. To identify the best option, ensemble analyses were conducted that compared the predictive accuracy of random forest analyses (RFA) with that of support vector machines (SVM), gradient boosting and decision trees. The relationships between the various machine learning algorithm ensembles, presented as accuracy measures and kappa statistics, are summarised in Figure 1.
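Accuracy and the kappa statistic both derive from the agreement between actual and predicted classes; kappa additionally discounts the agreement expected by chance, which matters when classes are imbalanced. The following Python sketch shows the standard Cohen's kappa calculation for illustration; the function name and the example labels are hypothetical and are not study results.

```python
from collections import Counter

def accuracy_and_kappa(actual, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_freq = Counter(actual)
    predicted_freq = Counter(predicted)
    # Chance agreement: product of the marginal class proportions, summed over classes.
    expected = sum(actual_freq[k] * predicted_freq[k] for k in actual_freq) / (n * n)
    kappa = (observed - expected) / (1 - expected)  # undefined if expected == 1
    return observed, kappa

actual = [0, 0, 1, 1, 2, 2]      # hypothetical WST classes
predicted = [0, 0, 1, 2, 2, 2]   # hypothetical model output
print(accuracy_and_kappa(actual, predicted))
```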
For the WST analyses, RFA produced the best accuracy and kappa results, suggesting this as the most suitable ML method to apply. For a comparatively small data set (for this study, 97 in total), RFA provides a method whereby hundreds to thousands of trees can be grown within one analysis, which introduces extra robustness and likely explains the superior performance for this data set. Nevertheless, the limitations of the total sample size were reflected in the large differences between the accuracy and kappa statistic results. Receiver operating characteristic (ROC) curves and associated results were calculated by RFA modelling of MCH, ALP, serum urea, blood lymphocytes, 24-h urinary creatinine and activin B.
Random Forest Analyses (RFA) were subsequently applied to binary outcomes representing the direct comparison of ME/CFS to HCs, as well as the stratification of ME/CFS severity by WST (Table 1). Early investigations did not produce a model because of class imbalance between ME/CFS and HC categories, in spite of introducing class-balancing code into the R code for RFA. Combining healthy controls with mild ME/CFS cases (Table 1b) solved this problem, allowing the building of RFA predictive models of disease categories. All the results presented hereafter are on the adjusted WST classes, as summarised in Table 1b.
To calculate the marker thresholds (e.g., ALP > or < 60 U/L), the recursive partitioning algorithm, decision trees, was used on the same dataset classified by WST, with trees also developed for the direct comparison of ME/CFS to healthy controls, and for the full WST classification from class 0–3 (Table 1a). For all the trees, the minimum split was 20 and the complexity parameter (cp) ranged from 0.01 to 0.085. The direct comparison of ME/CFS cases to healthy controls required a cp of 0.14. Because the starting sample size for each WST class was small to moderate, and each successive split further reduced the number of cases, the results must be interpreted with caution: the final decisions were often drawn from fewer than 10 cases.
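The way a classification tree arrives at a marker threshold can be illustrated with a single split. The Python sketch below searches the midpoints between sorted marker values for the cut that minimises the weighted Gini impurity of the two child nodes, and refuses to split a node with fewer than `minsplit` cases. It is a simplified illustration, not rpart itself: the complexity parameter (cp) and recursive growth are omitted, the function names are hypothetical, and the ALP values are invented.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels, minsplit=20):
    """Return (cut, weighted child impurity) for the best single split, or None."""
    n = len(values)
    if n < minsplit:
        return None  # node too small for a split to be attempted
    ordered = sorted(set(values))
    best = None
    for low, high in zip(ordered, ordered[1:]):
        cut = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x < cut]
        right = [y for x, y in zip(values, labels) if x >= cut]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or weighted < best[1]:
            best = (cut, weighted)
    return best

alp = [50, 55, 58, 62, 70, 75]   # hypothetical ALP results (U/L)
wst = [0, 0, 0, 1, 1, 1]         # hypothetical WST classes
print(best_threshold(alp, wst, minsplit=2))
print(best_threshold(alp, wst))  # default minsplit = 20, so no split is attempted
```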
2.7. Receiver Operating Characteristics
With the recognition by RFA of a predictor variable pattern associated with the WST class, the diagnostic potential of the multi-marker profile to accurately separate ME/CFS severity classes was examined by receiver operating characteristic (ROC) curves, supported by an area under the curve (AUC) calculation. A ROC curve plots assay sensitivity (the true positive rate) against the false positive rate (100 − specificity, in percent), with the AUC estimating the accuracy of separating the two classes. Because ROC analysis compares two classes, only two WST classes were compared at one time, namely classes 0 versus 1, 0 versus 2, and 1 versus 2.
ROC plots were generated and AUC was calculated by the R statistical programming package ROCR [23].
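For readers unfamiliar with the calculation, the following Python sketch builds ROC points by sweeping a threshold over model scores and computes the AUC two equivalent ways: by the trapezoidal rule over the curve, and as the proportion of positive/negative pairs in which the positive case scores higher (ties counted as one half). It is illustrative only and is not the ROCR implementation; the function names, scores and labels are hypothetical, and labels are assumed to be coded 0 and 1.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, starting at (0, 0)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_rank(scores, labels):
    """AUC as the probability that a positive case outscores a negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # hypothetical class probabilities
labels = [1, 1, 0, 1, 0, 0]               # 1 = higher WST class of the pair
print(auc_trapezoid(roc_points(scores, labels)), auc_rank(scores, labels))
```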
Examples of R code and primary results generated by machine learning and ROC are available in the Supplementary Materials.
2.8. Activin B Assay
The development and optimisation of the activin B assay in human populations have been published previously [24,25]. However, for this study, the established assay for activin B was modified after it was discovered that non-specific interference was impacting the capacity of the assay to accurately measure lower activin B concentrations in human serum. The assay, which was used to measure serum activin B concentrations in the previous pilot study [8,18], was modified by the addition of activin-free gelding serum, as a carrier to remove the interference and enhance the accuracy of activin B detection.
3. Results
3.1. Direct Comparison of ME/CFS and Healthy Cohorts
The direct comparison of a range of pathology (blood, urine, serum) markers, questionnaire results and activin B is summarised in Table 2. The subset of pathology markers included was informed by exploratory data interrogation by machine learning (Figure 1 and Figure 2), with additional serum electrolytes, platelets, neutrophils and parathormone (parathyroid hormone, PTH) also included because of clinical interest in the potential importance of these markers, as well as for the association with renal function suggested by other results. Red cell indices and TFTs showed no anaemia or thyroid deficiency associated with chronic fatigue symptoms, and in general all individual pathology results from ME/CFS and HC were within the laboratory reference interval, with exceptions outside of the reference interval excluded from the analyses if clinically indicated as a diagnostic confounder.
As summarised in Table 2, the results of Kolmogorov–Smirnov (K-S) testing showed that platelets, ALP, neutrophils and age were normally distributed, whereas the majority of markers returned p ≤ 0.025 and therefore did not follow a normal distribution. For this reason, non-parametric statistics were used for all markers and survey results to determine whether statistical significance was achieved for comparisons between ME/CFS classes. The small loss of power due to non-parametric testing was regarded as clinically unimportant.
Statistically significant differences in median pathology results from the comparison of ME/CFS to HC cases were observed for serum urea, parathyroid hormone (PTH) and 24-h urinary creatinine excretion rate, with each of these significantly decreased for ME/CFS (p ≤ 0.05). The median age was significantly higher for the ME/CFS group, with the median total DASS score significantly elevated (separate depression and anxiety scores were significantly increased for the ME/CFS group, but not the stress score). Although sleep problems are often reported during ME/CFS assessment, the Epworth Sleep score did not differ significantly between the groups.
Activin B
An objective of this study was to validate a previous (pilot) study result, which found that activin B is a serum biomarker that significantly (p < 0.05) separates ME/CFS patients from healthy controls (HC). On direct comparison of the medians (Table 2), activin B was significantly lower (p = 0.013) for the ME/CFS cohort compared to results from the HC participant cohort. This is an inversion of the previous results, which found that activin B was significantly elevated in ME/CFS participants [8]. As described in the Materials and Methods, the activin B assay had been re-optimised prior to these analyses.
3.2. Analyses of Markers Stratified by Weighted Standing Time (WST)
3.2.1. Four Severity Categories
Marker variation and survey results were investigated after the ME/CFS cohort was stratified by WST for symptom severity (classes 1–3) and compared to healthy controls (class 0) (Table 3). Median (25th–75th interquartile range, IQR) results were presented and statistical significance assessed by Kruskal–Wallis tests.
Serum urea, ALP and 24-h urinary creatinine excretion rate were statistically significant at p < 0.05. The difference between WST classes for DASS (Total) also achieved statistical significance, with increases in total DASS scores evident for the WST ME/CFS classes (1–3) when compared to HC (class 0).
Significance at p < 0.05 was not observed for activin B when comparing healthy controls (class 0) to the WST stratified ME/CFS cohort (classes 1–3) (Table 3). As seen in Table 3, apart from WST class 2 (moderate severity), the 25–75 IQRs were large, suggesting high variation in the activin B results. When the healthy controls (WST 0) were compared directly to WST 2 by the Mann–Whitney U test, a significant result at p = 0.005 was found, whereas the comparisons of WST 0 to WST 1 and WST 3 were not significantly different (p > 0.05). Based on this observation, activin B appears most useful for separating healthy individuals from patients experiencing moderate ME/CFS symptoms, as defined by WST.
3.2.2. Three Severity Categories
WST classes 0 and 1 were combined to increase sample size for subsequent machine learning (ML), resulting in adjusted WST classes representing categories defining absent or mild symptoms (0), moderate (1) or severe ME/CFS symptoms (2), as reflected by orthostatic intolerance. This adjusted WST classification (Table 1b) was used for all the following RFA and ROC investigations.
Age, Epworth Sleep Scale and total DASS score showed significant variations between WST classes (Table 4; Kruskal–Wallis test). Age was significantly higher for the ME/CFS cohort compared to healthy controls (Table 2). Comparison across WST classes indicated that the participants with moderate symptom severity were responsible for this age difference, which will require further investigation. Of the serum/blood markers, only MCH and ALP were significantly different, with WST class 1 showing a higher median ALP than the other WST classes. Age can impact serum ALP levels; therefore, caution must be exercised when interpreting this result.
A significant difference between WST classes was not observed for activin B. The combination of healthy controls with mild cases increased the WST 0 median, and therefore statistically significant separation from WST classes 1 and 2 was not achieved.
3.3. Exploratory Machine Learning Analyses of ME/CFS and Healthy Control Data
As assessed by algorithm ensembles that calculated percentage accuracy and the kappa statistic (Figure 1), Random Forest Analysis (RFA) was chosen as the machine learning method to conduct deeper analyses of the ME/CFS results. Two sampling methods were tested for each ensemble, namely (a) boosting and (b) bagging. In general, similar accuracy and kappa results were found for both sampling strategies (bagging results not shown).
Figure 2 presents the results of two RFA, one with five routine pathology markers, and the other with activin B included in the same pathology model. The pathology markers represent the constellation of blood or urine test results that most successfully predicted WST categories 0, 1 and 2, with an overall predictive accuracy of 62–65%. The addition of extra pathology variables either did not improve the accuracy of the model or reduced overall WST class predictive accuracy.
The addition of activin B to the model did not change the overall accuracy of the RFA model, but did slightly improve the prediction accuracy for WST class 2 (severe), at the expense of a poorer WST class 0 prediction (Figure 2b). Activin B ranked as the third most important predictor of ME/CFS-WST categories, behind 24-h urinary creatinine excretion rate and ALP, both on the importance ranking and mean decrease Gini index (Figure 2).
RFA emphasised 24-h urinary creatinine excretion rate as a key predictor of WST classes, with ALP ranking as the second most important predictor from among the pathology markers. The subsequent analysis of the same data by a tuned (cp = 0.01, minsplit = 20) single decision tree confirmed the leading role of urinary creatinine as an ME/CFS predictor (decision tree code and results are available in the Supplementary Materials).
3.4. Receiver Operating Characteristic (ROC) Analyses and Discrimination of WST Categories by Activin B and Pathology Markers Post Random Forest
To assess the predictive value of the RFA models applying activin B, mean corpuscular haemoglobin (MCH), serum urea, lymphocytes, alkaline phosphatase (ALP) and urinary creatinine excretion rate to the prediction of ME/CFS, ROC curves were plotted and the area under curve (AUC) was calculated.
ROC curves and AUC calculations were examined as pairwise comparisons between WST classes (0-1, 0-2, 1-2). RFA and ROC were not reliable for the direct comparison of ME/CFS to healthy controls, due to data imbalance issues described elsewhere.
Figure 3 presents the RFA and ROC results for the comparison of WST classes 0 and 1 (Table 1b). Figure 3a shows the Gini Index and Importance (Mean Decrease Accuracy) weighting of predictor variables to discriminate between WST classes 0 and 1 (mild symptoms and healthy cases combined versus moderate ME/CFS symptoms). The rate of urinary creatinine excretion was the top-ranked predictor, followed by serum activin B. For the total constellation of markers, the 0 versus 1 AUC was calculated at 0.755, with the ROC curve showing a clear separation from 0.50 (Figure 3b).
For the ROC-AUC analysis of class 0 versus class 2 (mild ME/CFS symptoms and healthy controls versus severe ME/CFS), urinary creatinine excretion rate was again the top-ranked predictor, with the impact of activin B reduced as determined by Gini Index and Importance scale, and serum urea and ALP elevated in predictive importance (AUC = 0.795). For classes 1 versus class 2, representing moderate versus severe ME/CFS symptoms, ALP, MCH, lymphocytes and serum urea ranked higher on both the Gini Index and Importance scale than urinary creatinine excretion and activin B, inverting the ranking observed for comparisons against class 0 (AUC = 0.704) (Results not shown).
3.5. Correct Prediction of ME/CFS Cases by RFA
As well as ranking predictors, the RF algorithm allowed the prediction of case category (WST class) based on the variables entered into the model. To understand the power of correctly predicted cases as a data modelling method to refine decisions on the diagnostic acuity of marker patterns, ROC was repeated for WST classes 0 versus 1, with only the cases correctly predicted by RFA as 0 or 1 included (Figure 4). The importance ranking of predictors (Figure 4a) resembled that found for the general model using all data (Figure 3), with urinary creatinine excretion rate, activin B and ALP the top three predictors of WST classes 0 or 1. The ROC curve showed an excellent separation from the 0.50 threshold, with an AUC of 0.963, which was clearly superior to the AUC of 0.755 found for the general model of the same WST classes that included all cases, regardless of correct prediction (Figure 3).
The correctly predicted cases across the entire WST scale (Table 1b) were investigated by RFA to elucidate the broad pattern of the designated markers associated with the best accuracy prediction (Figure 5).
Similar to the ranking of markers for WST classes 0 versus 1 (Figure 3 and Figure 4), the urinary creatinine excretion rate, ALP and activin B were the top-ranked predictors of all the WST classes (Figure 5), which stratify ME/CFS severity according to performance on orthostatic intolerance testing. Although only correctly predicted cases were included, WST class 0 recorded an out-of-bag (OOB) error rate of 8.7% and class 2 a 17% error rate. However, class 1 (moderate severity) was perfectly predicted (Figure 5), suggesting again that the marker set including activin B is best for predicting symptom severity ranging from healthy, through mild, to moderate ME/CFS. The extent of the error rate in the severe cases indicates wider variation in these ME/CFS cases. Future studies involving larger participant samples will assist in determining predictive parameters with greater accuracy.
3.6. New Reference Intervals for Serum and Urine Markers Based on Correct Random Forest Prediction
To provide simpler and accurate guidance to clinicians supporting ME/CFS patients, reference intervals were calculated based on cases correctly predicted by RFA. The reference intervals were calculated using the median and 25–75% IQRs.
New reference intervals based on correctly predicted cases for each analyte of interest were calculated based on the following criteria: (1) comparison of the ME/CFS cohort with the healthy control group; (2) calculation of reference intervals following the WST criteria of categories 0 (healthy controls plus mild ME/CFS), 1 (moderate symptoms), and 2 (severe symptoms). The full WST classification definitions are summarised in Table 1.
Table 5 and Table 6 show the medians and 25–75th IQR for the ME/CFS predictors correctly detected by RFA (namely, MCH, lymphocyte count, serum urea, ALP, 24-h urinary creatinine excretion rate and activin B), using criteria 1 (Table 5) and 2 (Table 6).
The small sample sizes led to most markers having a significant overlap of ranges, and therefore not producing distinctive reference intervals. There were some exceptions, namely, for ME/CFS urinary creatinine excretion rate, a 25–75 IQR of 8.03–10.88 mmol/24 h (median 10.5), and 12.12–15.3 mmol/24 h (median 12.7) for healthy controls (Table 5).
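Whether two reference intervals are distinct comes down to an interval-overlap check. The Python sketch below applies it to the two urinary creatinine excretion IQRs quoted above; the function name is hypothetical and the check is illustrative rather than part of the published analysis.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a = (low, high) and b = (low, high) overlap."""
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_high and b_low <= a_high

# 25-75 IQRs for 24-h urinary creatinine excretion (mmol/24 h), as quoted in the text.
me_cfs_iqr = (8.03, 10.88)
healthy_iqr = (12.12, 15.3)
print(intervals_overlap(me_cfs_iqr, healthy_iqr))
```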
For the WST comparison (Table 6), 25–75 IQR overlap was not found between classes for activin B, with separation of confidence intervals observed for MCH and urinary creatinine excretion rate (between class 0 and class 1, as well as classes 0 and 2). There was a marginal separation of 25–75 IQR observed for serum urea classes 0 and 2. Using WST data for between-class prediction and ROC analyses, the correctly predicted cases (by RFA) for classes 0 and 1 (Figure 4) emphasised the powerful role of 24-h urinary creatinine excretion rate as a potential diagnostic marker; it was the only marker analysed that showed a clear differentiation between the class 0 and class 1 25–75 IQR intervals (results not shown).
The calculation of reference intervals specific to varying levels of ME/CFS severity, as quantitated by WST, was achieved (Table 5 and Table 6). With access to larger sample sizes, via large multi-centre studies and/or databases, it will be possible to develop novel diagnostic guidelines using pathology results specific for ME/CFS, with and without activin B.
Two multi-category, non-parametric statistical tests were used to assess significance for each predictor variable, the Kruskal–Wallis (KW) and Jonckheere–Terpstra (J-T) tests (Table 6). The methods differ in how they estimate significance across three or more classes, with the J-T test designed for investigations of ordered (ordinal) variables. While all variables were clearly significant (p ≤ 0.012) by the KW test, only serum urea, 24-h urinary creatinine excretion rate and activin B demonstrated significance for both tests, suggesting enhanced statistical robustness for these markers in terms of variation across the three WST classes.
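The difference between the two tests is easiest to see in the J-T statistic itself, which counts, over every pair of classes taken in their hypothesised order, how often a value in the lower class is smaller than a value in the higher class. The Python sketch below computes only the statistic (ties counted as one half); the p-value calculation performed by statistical software is omitted, the function name is hypothetical and the data are invented.

```python
def jonckheere_terpstra_statistic(groups):
    """J-T statistic for groups listed in hypothesised order (e.g. WST 0, 1, 2)."""
    jt = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for x in groups[i]:
                for y in groups[j]:
                    if x < y:
                        jt += 1.0   # pair agrees with the increasing trend
                    elif x == y:
                        jt += 0.5   # tie
    return jt

increasing = [[1, 2], [3, 4], [5, 6]]  # hypothetical marker values per class
print(jonckheere_terpstra_statistic(increasing))        # every pair agrees with the trend
print(jonckheere_terpstra_statistic(increasing[::-1]))  # no pair agrees with the trend
```

Under the null hypothesis of no trend, the statistic is expected to sit near half the number of between-class pairs, so values close to either extreme indicate an ordered trend across classes.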
4. Discussion
As demonstrated previously by us [8,18] and others [26], the results of pathology testing are not remarkable for ME/CFS patients, and often there are no statistically significant differences in the pathology results for ME/CFS when compared to healthy control subjects, although a recent study has identified the pathology test for creatine kinase (CK) as a significant marker to separate ME/CFS from control samples [27]. These difficulties in detecting quantitative markers for ME/CFS diagnosis have stimulated many investigations over the past 30 years, and with research consistently suggesting immune system involvement or dysfunction [28], cytokine studies have featured prominently in these attempts at biomarker development [1,2,3,4,5,6,7]. The search for a cytokine biomarker has been fraught with frustration; for example, the promise of a TGF-β marker was stymied by the realisation that sample preparation may explain serum concentration variation [7]. To this literature on putative ME/CFS serum biomarkers, we added activin B, which is useful in isolation, but also as a ratio with activin A or follistatin [8] and maintains statistical significance across WST classes [18].
The research presented here is a validation study on the potential of activin B as a reliable serum marker for ME/CFS, which would be a major advance in this field in light of the history of biomarker development. As reported, serum activin B showed statistical significance in separating ME/CFS participants from healthy controls and, additionally, demonstrated a capacity to differentiate WST classes in combination with select pathology markers. However, for this new research population, the trend was reversed relative to our earlier findings, with healthy control participants showing a significantly increased median compared to the ME/CFS cohort. A cohort of existing CFS Discovery patients was recruited as research participants for this study, and issues associated with small to medium sample sizes may have contributed to these findings. The re-calibration of the activin B assay, due to sensitivity variation across the range of detection, improved the accuracy of the assay at lower serum concentrations, thus enhancing activin B detection capacity and broadening the reference interval range, which may explain the differences in activin B results found for this study when compared to the previous results [
8].
Exploratory random forest analysis (RFA) was performed on the same data, with subsequent analyses focussed on WST data only (see
Table 1b). The standard RFA of WST data (5000 trees per analysis, four predictor variables tested per node) resulted in an out-of-bag (OOB) error rate of 38.14%. Activin B was also investigated as a member of a six-marker profile that included 24-h urinary creatinine excretion rate, mean corpuscular haemoglobin (MCH), alkaline phosphatase (ALP), serum urea, and total lymphocyte count. RFA, with or without activin B, showed an identical overall OOB prediction error rate, but with the addition of activin B to the marker profile, a reduction in the WST 2 (severe) class prediction error rate was identified, at the expense of an error rate increase for WST 0 (the WST 1 error rate remained stable). For this RFA, urinary creatinine, ALP and activin B were the top predictors of WST class. The capacity of activin B to enhance discrimination between WST classes was also a feature identified by RFA.
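The observation that the overall error rate can remain unchanged while class-specific error rates shift may be clearer with a small worked example. The sketch below computes overall and per-class error rates from a confusion matrix; the counts are entirely hypothetical (they are not the study's results) and the function name is an assumption made for illustration.

```python
def error_rates(confusion):
    """Return (overall_error, per_class_errors) for a square confusion matrix.

    Rows are true classes and columns are predicted classes, so the
    diagonal holds the correctly predicted cases.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    overall = (total - correct) / total
    per_class = [
        (sum(row) - row[i]) / sum(row) for i, row in enumerate(confusion)
    ]
    return overall, per_class


# HYPOTHETICAL counts for WST classes 0, 1, 2 (20 cases each), not study data.
without_marker = [[16, 3, 1], [4, 13, 3], [5, 7, 8]]
with_marker = [[14, 5, 1], [4, 13, 3], [4, 6, 10]]

for name, matrix in (("without", without_marker), ("with", with_marker)):
    overall, per_class = error_rates(matrix)
    print(name, round(overall, 4), [round(e, 2) for e in per_class])
```

In this invented example both matrices contain 37 correct predictions out of 60, so the overall error is identical, yet the class 2 error falls while the class 0 error rises, which mirrors the kind of trade-off described above.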
Single decision trees confirmed the primacy of 24-h urinary creatinine clearance as an ME/CFS predictor, with a calculated decision threshold of 11.96 mmol/L separating class 0 (accuracy 83.3%) from classes 1 and 2, while ALP separated classes 1 and 2 at 62.5 U/L (accuracies of 86.4% and 75%, respectively). Caution must be exercised when interpreting these results, since the final accuracy scores were often calculated from ≤20 cases.
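For readers unfamiliar with how a single decision tree arrives at a threshold, the following minimal sketch selects the cut-point on one marker that minimises the weighted Gini impurity of the two resulting groups. This is a generic illustration under stated assumptions, not the procedure or software used here; the function names, labels and values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best single split.

    Candidate thresholds are midpoints between consecutive distinct values;
    cases with value <= threshold go left, the rest go right.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best_cut, best_score = None, float("inf")
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2.0
        left = [lab for val, lab in pairs if val <= cut]
        right = [lab for val, lab in pairs if val > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


# Hypothetical marker values and class labels (not study data).
marker = [8.0, 9.0, 10.0, 13.0, 14.0, 15.0]
group = ["A", "A", "A", "B", "B", "B"]
print(best_threshold(marker, group))
```

With only a handful of cases on either side of a split, moving one or two observations can shift the chosen threshold considerably, which is one reason for the caution noted above.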
As an extension of RFA, the panel of six predictive markers was assessed by receiver operating characteristic (ROC) curves to investigate the impact of test profile sensitivity and specificity (false negative, false positive rates). Pairwise WST classes were analysed per ROC, both for the entire data set, and for the correctly predicted cases for each WST class (0, 1, 2). Activin B remained in the top three in terms of predictor importance, with the model producing an AUC of 0.76 for all cases and an AUC of 0.963 for models comprising only correctly predicted outcomes. The correctly predicted cases from each WST class were subsequently used to calculate new reference intervals for each of the six RFA predictors (
Figure 2).
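The AUC values quoted for the pairwise ROC analyses can be read as the probability that a randomly chosen case from one class receives a higher model score than a randomly chosen case from the other. A minimal sketch of that rank-based calculation is given below; the function name and the scores are hypothetical and are not taken from the study.

```python
def pairwise_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive case scores higher, 0.5 if tied,
    and 0 otherwise (equivalent to the Mann-Whitney U statistic / n1*n2).
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores for two WST classes (not study data).
class_x = [0.9, 0.8, 0.4]
class_y = [0.5, 0.3, 0.2]
print(round(pairwise_auc(class_x, class_y), 3))
```

Restricting the calculation to correctly predicted cases removes most of the misranked pairs by construction, so a higher AUC for that subset is expected and should not be read as independent validation.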
Because the reference intervals, calculated as medians with 25th–75th percentile interquartile ranges (IQRs), were broad, distinct separation between WST classes was not common, but did feature for 24-h urinary creatinine clearance and some class comparisons for activin B. As stated earlier, larger participant cohorts are required for validation and for the calculation of accurate reference intervals via the method presented here.
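A reference interval of this kind, and the check for separation between two classes, can be sketched as follows. The quartile definition used (Python's "inclusive" method) is an assumption, since several conventions exist and the one applied in this study is not specified here; the function names and values are hypothetical.

```python
import statistics


def median_iqr(values):
    """Return (median, (q1, q3)) using the 'inclusive' quartile method."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)


def intervals_overlap(first, second):
    """True if two (low, high) intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


# Hypothetical marker values for two classes (not study data).
class_a = [1, 2, 3, 4, 5]
class_b = [6, 7, 8, 9, 10]

median_a, iqr_a = median_iqr(class_a)
median_b, iqr_b = median_iqr(class_b)
print(median_a, iqr_a, median_b, iqr_b, intervals_overlap(iqr_a, iqr_b))
```

Non-overlapping IQRs, as in this invented example, correspond to the distinct separation described above; with small classes the quartiles themselves are imprecise, so apparent separation should be interpreted cautiously.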
In tandem with research on ME/CFS immunology, pathology and cytokine biology, metabolomics is yielding valuable insights into ME/CFS aetiology [
29,
30], which in turn crosses into mitochondrial function [
31]. New and sophisticated evidence of mitochondrial dysfunction in ME/CFS patients has emerged recently from patients involved as research participants in this project, with blast lymphocytes grown from blood samples collected at CFS Discovery and analysed via
Seahorse technology [
32].
Potential exists to meld metabolomics with immunity, and mTOR (mammalian target of rapamycin and TORC subunits), which has a role in amino acid transport and protein synthesis [
33], may be central to this link, particularly in the context of muscle growth [
34,
35]. Muscle pain and weakness are often reported as leading ME/CFS symptoms [
12,
18]. TGF-β and activin proteins involve mTOR interaction, including for natural killer (NK) cells [
36], which are regularly noted as deficient in ME/CFS patients [
37], as well as for cartilage and bone biology [
38]. Separating the specific biology of activin B from the well-studied roles of activin A has yielded insights in relation to SMAD signalling [
10,
39,
40], which will further illuminate activin B utility in the context of ME/CFS.
The centrality of NK cells to ME/CFS has been challenged recently by a comprehensive study involving more than 300 total participants, which included healthy and fatigue controls, as well as participants with varying levels of ME/CFS symptom severity [
41]. NK cell numbers and function, as reflected by subtype proportions or responsiveness after in vitro stimulation, were not different between the control and ME/CFS cohorts. Instead, CD8+ T-cell proportions were altered, and mucosal-associated invariant T (MAIT) cells were increased in ME/CFS.
Future investigations will present results from the interrogation of databases that contain activin, pathology, mitochondrial and metabolomics results, and will assist in the identification of additional immune-metabolomic biomarker patterns. Such results will in turn contribute to the elucidation of disease mechanism via the unravelling of impaired metabolomic pathways, and to an understanding of the subsequent impact on immune function, muscle physiology and neurophysiology for ME/CFS patients.
In conclusion, activin B retained the capacity to separate ME/CFS cases from healthy controls (
Table 2), but with an inverse relationship compared to that reported previously [
8], with healthy controls having a higher median. Developing activin B as a general serum marker of ME/CFS therefore requires multi-centre studies with large participant cohorts. While the current project recruited 97 participants, these were spread across the spectrum from good health to severe ME/CFS symptoms, resulting in small to moderate sample sizes. RFA studies revealed the unexpected role of activin B as a useful supporting marker for the discrimination of mild to moderate ME/CFS symptoms, as reflected by WST class, while severe cases were more difficult to predict via multi-marker RFA and the other methods developed to predict ME/CFS.