**Proteomics-Based Machine Learning Approach as an Alternative to Conventional Biomarkers for Differential Diagnosis of Chronic Kidney Diseases**

**Yury E. Glazyrin 1,2,\*, Dmitry V. Veprintsev 2, Irina A. Ler 3, Maria L. Rossovskaya 3, Svetlana A. Varygina 3, Sofia L. Glizer 3,4, Tatiana N. Zamay 1, Marina M. Petrova 4, Zoran Minic 5, Maxim V. Berezovski 5 and Anna S. Kichkailo 1,2**


Received: 17 June 2020; Accepted: 6 July 2020; Published: 7 July 2020

**Abstract:** Diabetic nephropathy, hypertension, and glomerulonephritis are the most common causes of chronic kidney diseases (CKD). Since CKD of various origins may not become apparent until kidney function is significantly impaired, differential diagnosis and appropriate treatment are needed at the very early stages. Conventional biomarkers may lack sufficient discriminating power, whereas a whole-proteome approach may serve this purpose. In the current study, several machine learning algorithms were examined for the differential diagnosis of CKD of three origins. The dataset was based on whole-proteome data obtained by mass spectrometric analysis of plasma and urine samples from 34 CKD patients, using a label-free quantification approach. The k-nearest-neighbors algorithm separated a healthy group from renal patients in general with high confidence (97.8%) using plasma proteomics data. This algorithm also proved to be the best of the three tested for distinguishing patients with diabetic nephropathy from those with glomerulonephritis using plasma proteomics data (96.3% correct decisions). The hypertensive nephropathy group could not be reliably separated using plasma data, and analysis of the entire urine proteomics data did not allow the three diseases to be differentiated. Nevertheless, the hypertensive nephropathy group was reliably separated from all other renal patients with 100% accuracy by a "one against all" k-nearest-neighbors classifier applied to urine proteome data. The tested algorithms show a good ability to differentiate the various groups across proteomic datasets, which may help to avoid invasive intervention for verifying glomerulonephritis subtypes, as well as to differentiate hypertensive and diabetic nephropathy at early stages based not on individual biomarkers but on the whole proteomic composition of urine and blood.

**Keywords:** chronic kidney disease; machine learning; differential diagnosis; proteomics; mass spectrometry; label-free quantification
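
To make the "one against all" k-nearest-neighbors idea from the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' pipeline. The two-feature toy vectors, the group labels `HN`, `DN`, and `GN`, the choice of `k = 3`, the Euclidean distance, and the leave-one-out evaluation are all assumptions made for illustration; none of them is stated in the abstract.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    # Pair every training point's Euclidean distance to x with its label.
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def one_vs_all_labels(labels, positive):
    """Recode multi-class labels as 1 (the chosen group) or 0 (all others)."""
    return [1 if lab == positive else 0 for lab in labels]


def leave_one_out_accuracy(X, y, k=3):
    """Hold out each sample in turn, classify it from the rest, and score."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: each tuple stands in for a (much longer) vector of
# protein intensities. Real proteomic profiles have far more features.
X = [
    (1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2),  # assumed HN group
    (5.0, 5.1), (5.2, 4.9), (4.8, 5.0),              # assumed DN group
    (5.1, 7.0), (4.9, 7.2), (5.3, 6.9),              # assumed GN group
]
labels = ["HN"] * 4 + ["DN"] * 3 + ["GN"] * 3

# "One against all": hypertensive nephropathy versus every other patient.
y = one_vs_all_labels(labels, "HN")
acc = leave_one_out_accuracy(X, y, k=3)
print(f"leave-one-out accuracy, HN vs. rest: {acc:.1%}")
```

Recoding the labels before classification is what turns a three-group problem into a binary one, which is why a group that overlaps with others in a multi-class setting can still be separable when tested against all remaining samples together.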
