#### *2.1. Separation of CKD Patients from Healthy Individuals by Plasma Proteome*

A total of 246 proteins were identified and quantified by label-free quantification (LFQ) in the plasma samples, and 184 of them were retained after quality filtering. The differences between the plasma proteomes of CKD patients and healthy individuals were then assessed by principal component analysis (PCA). After consistency checking and averaging of replicates, the dataset was reduced to 90 samples (group G, 27; group D, 28; group H, 9; group N, 26), and the LFQ values of the 184 proteins were converted to 17 principal components accounting for 70% of the cumulative variance.
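The dimensionality reduction step can be sketched as follows. This is a minimal illustration with scikit-learn on synthetic data (random values standing in for the real LFQ matrix); the sample and protein counts mirror the text, but the resulting number of components will differ from the 17 reported for the real data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the LFQ matrix: 90 averaged samples x 184 proteins
X = rng.normal(size=(90, 184))

# Standardize, then keep the smallest number of components covering >= 70% variance
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.70)  # a float asks scikit-learn for this variance fraction
scores = pca.fit_transform(X_std)
print(scores.shape, round(pca.explained_variance_ratio_.sum(), 3))
```

Passing a fraction to `n_components` is a convenient way to let PCA choose the component count from the target cumulative variance rather than fixing it in advance.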

A k-nearest neighbors (kNN) classifier with Euclidean distance was used to separate the full set of CKD patients (groups G, D, and H) from the healthy individuals (group N). The optimal number of neighbors was found to be 3, and the mean proportion of correct classifier responses was 97.8%. Thus, CKD patients were differentiated from healthy individuals with high confidence using plasma proteomics data.
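The neighbor-count selection described above can be reproduced in outline with a cross-validated grid search. The sketch below uses synthetic, well-separated Gaussian clusters in place of the real principal-component scores; the group sizes (64 CKD vs. 26 healthy) and dimensionality (17 components) follow the text, but the data and the resulting best `n_neighbors` are illustrative only.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Synthetic PCA scores: 64 CKD samples vs 26 healthy, 17 components each;
# the class means are shifted so the groups are separable, as in the real data
X = np.vstack([rng.normal(0.0, 1.0, size=(64, 17)),
               rng.normal(2.0, 1.0, size=(26, 17))])
y = np.array([1] * 64 + [0] * 26)  # 1 = CKD (G+D+H), 0 = healthy (N)

# Pick the neighbor count by cross-validation; 'euclidean' matches the distance in the text
search = GridSearchCV(KNeighborsClassifier(metric='euclidean'),
                      {'n_neighbors': [1, 3, 5, 7]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`best_score_` here plays the role of the mean proportion of correct classifier responses reported in the text.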

#### *2.2. Differentiation of the Three Groups of Renal Patients by Plasma Proteome*

The possibility of differentiating types of CKD with similar symptoms using proteomics data was tested. At this step, the control samples from healthy individuals (group N) were discarded, leaving 134 CKD patient samples; after the quality control check, 175 proteins in 131 samples remained.

PCA showed that 14 principal components were necessary to reach 70% cumulative variance, and 64 averaged results were obtained after the consistency check. Three machine learning models were tested: kNN, logistic regression, and support vector machine (SVM). The optimal hyperparameters were found to be: number of neighbors (*n*\_neighbors = 1) for kNN, regularization constant (C = 1) for logistic regression, and kernel (kernel = 'rbf') with gamma (gamma = 0.5) for SVM. The mean proportions of correct classifier responses were 87.5%, 84.3%, and 82.8%, respectively. Based on this cross-validation quality assessment, the kNN classifier was chosen for further calculations. The number of errors of the nearest-neighbor algorithm was counted for each class over the entire set: class D had 2 errors out of 28 (7.1%), class G had 2 out of 27 (7.4%), and class H had 4 out of 9 (44.4%). Thus, the algorithm separates group D from group G well but does not reliably distinguish class H.
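The per-class error counts can be obtained from a cross-validated confusion matrix. The sketch below is a hedged illustration on synthetic data: group sizes (G 27, D 28, H 9) and the 1-NN setting follow the text, but the cluster positions are invented (H is deliberately placed between G and D to mimic its poor separability), so the printed error counts will not match the paper's.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
# Synthetic 14-component scores for groups G (27), D (28), H (9)
X = np.vstack([rng.normal(0.0, 1.0, size=(27, 14)),
               rng.normal(3.0, 1.0, size=(28, 14)),
               rng.normal(1.5, 1.0, size=(9, 14))])
y = np.array(['G'] * 27 + ['D'] * 28 + ['H'] * 9)

# Cross-validated 1-NN predictions; per-class errors are the off-diagonal row sums
pred = cross_val_predict(KNeighborsClassifier(n_neighbors=1), X, y, cv=5)
cm = confusion_matrix(y, pred, labels=['G', 'D', 'H'])
errors = cm.sum(axis=1) - np.diag(cm)
print(dict(zip(['G', 'D', 'H'], errors.tolist())))
```

Each row of the confusion matrix corresponds to one true class, so the row sum minus the diagonal entry gives that class's error count, which is how the 2/28, 2/27, and 4/9 figures in the text can be tabulated.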

Finally, the class H samples were excluded from the analysis, leaving 55 samples in the set. For this set, the nearest-neighbor algorithm gave only two errors (the proportions of correct decisions were 96.3% for class G and 96.4% for class D). Thus, the algorithm shows promise for use in a medical test system separating glomerulonephritis from diabetic nephropathy based on the plasma proteome.
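For the final two-class setting, the per-class proportion of correct decisions is simply each class's recall under cross-validation. The sketch below again uses synthetic, well-separated data with the stated group sizes (27 G, 28 D); the printed values are illustrative, not the paper's 96.3%/96.4%.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
# Synthetic well-separated scores after dropping group H: 27 G vs 28 D samples
X = np.vstack([rng.normal(0.0, 1.0, size=(27, 14)),
               rng.normal(3.0, 1.0, size=(28, 14))])
y = np.array(['G'] * 27 + ['D'] * 28)

pred = cross_val_predict(KNeighborsClassifier(n_neighbors=1), X, y, cv=5)
# The per-class proportion of correct decisions is that class's recall
per_class = recall_score(y, pred, labels=['G', 'D'], average=None)
print(dict(zip(['G', 'D'], per_class.round(3).tolist())))
```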
