**1. Introduction**

Chronic kidney disease (CKD) is a supra-nosological concept that unites all patients with signs of kidney damage and/or decreased kidney function [1]. CKD is a major health problem with high mortality, because it causes irreversible changes that culminate in renal failure. In the early stages, no obvious clinical symptoms appear until severe damage has occurred [2]. Therefore, the need for early diagnosis of CKD is obvious. Diseases leading to CKD can be divided into two groups: (1) processes localized directly in the kidneys and urinary tract (glomerulonephritis, pyelonephritis, etc.), and (2) diseases in which the kidneys are target organs (diabetes, hypertensive disease, systemic diseases, etc.). Diagnosis of the disease causing the damage is paramount in all cases of CKD [3–5]. The most common causes of CKD are diabetic nephropathy, hypertension, and glomerulonephritis [6]. Clinical manifestations, serum creatinine (Scr), and renal histopathology are commonly used to diagnose CKD and determine its stage. However, the diagnostic value of Scr is very limited [2]. Kidney biopsy for histopathology is considered the gold standard for the diagnosis of renal disease [7], although it is invasive and can be painful, and bleeding and other surgical complications may follow the procedure. Less invasive alternative techniques could reduce these risks.

The study of the proteomic composition of urine and other human biofluids is very promising for the diagnosis of different kidney pathologies and for understanding the mechanisms of their occurrence. Proteinuria may reflect abnormal protein loss as a result of: (a) an increase in glomerular permeability for macromolecular proteins (glomerular proteinuria), (b) incomplete tubular reabsorption of low molecular weight proteins (tubular proteinuria), or (c) abnormal loss of proteins originating from the kidney and urinary tract. Thus, analysis of the urine proteome can potentially indicate the localization of nephron damage, which greatly facilitates differential diagnosis [8,9]. Research is currently underway both to find proteins specific to CKD [10–14] and to identify individual proteins that could serve as markers of the specific diseases that cause CKD [15,16].

Often, information on changes in the expression level of a single protein does not provide the accuracy and sensitivity required for a clinical diagnostic system, and several indicators must be applied simultaneously. For example, a panel of 28 urinary proteins was able to differentiate immunoglobulin A nephropathy from primary membranous nephropathy with a sensitivity of 77% and a specificity of 100% [17]. Sets of differentially expressed urinary proteins were also used for the differential diagnosis of lupus nephritis, primary membranous nephropathy, diabetic nephropathy, and focal segmental glomerulosclerosis. The sensitivity of differential diagnosis remained at 70% when using a set of 5 proteins, but the accuracy fell below 50% when using a set of fewer than 20 proteins [18]. These results show that such indicators are still insufficient for the effective differentiation of CKDs.
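As a reminder of how the two figures quoted above are defined, the following minimal sketch computes sensitivity and specificity from true and predicted labels. The labels and counts are hypothetical and chosen only for illustration; they are not data from the cited studies.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity) for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    # Sensitivity: fraction of true positives that were detected.
    # Specificity: fraction of true negatives that were correctly rejected.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: 4 patients per group, one positive case missed.
y_true = ["IgAN"] * 4 + ["PMN"] * 4
y_pred = ["IgAN", "IgAN", "IgAN", "PMN", "PMN", "PMN", "PMN", "PMN"]
sens, spec = sensitivity_specificity(y_true, y_pred, positive="IgAN")
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A test can therefore reach 100% specificity while still missing a substantial share of true cases, which is why both values are reported together.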

The most versatile and universal approach to differential diagnosis should instead consider the full quantitative information about a large number of proteins contained in a patient's biofluids. Multicomponent proteomics data derived from mass spectrometric analysis of a sample from an undiagnosed patient can be compared with similar data sets obtained from people with known diagnoses in order to assign the new patient to a particular group. Machine learning models, which take into account the interactions within a large amount of data in a multidimensional space, can be used for this purpose. Such an approach may become a new concept for an effective and universal test system for both early diagnosis of CKD and post-diagnostic differentiation of renal diseases of different origin.
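The idea of assigning a new sample to a group by comparison with samples of known diagnosis can be illustrated with a deliberately simple classifier. The sketch below uses a nearest-centroid rule as a stand-in; it is not the method used in this work, and the three-protein abundance vectors and the group labels `DN`, `GN`, and `HN` are hypothetical values invented for the example.

```python
import math


def fit_centroids(samples, labels):
    """Average the feature vectors of each diagnostic group."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def classify(x, centroids):
    """Assign x to the group whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


# Hypothetical relative abundances of three proteins per patient.
train = [
    [1.0, 0.2, 0.1], [0.9, 0.3, 0.2],   # group "DN"
    [0.1, 1.1, 0.2], [0.2, 0.9, 0.1],   # group "GN"
    [0.1, 0.2, 1.0], [0.2, 0.1, 0.9],   # group "HN"
]
labels = ["DN", "DN", "GN", "GN", "HN", "HN"]

centroids = fit_centroids(train, labels)
new_patient = [0.8, 0.3, 0.2]
print(classify(new_patient, centroids))
```

Real proteomics data contain hundreds or thousands of protein features per sample, so a practical model would also need normalization, feature selection, and cross-validation, none of which is shown here.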

Recent methods for processing large data sets are often based on the "black box" principle, in which input data are transformed into decision factors without any additional knowledge of the internal workings. Mathematical instruments such as machine learning and data analysis are increasingly being used in medicine [19,20]. Machine learning is a branch of data science that trains computers to perform tasks by observing patterns in large data sets and using them to derive rules or algorithms that optimize task performance [21]. It has been used for computer-aided diagnosis of acute neurological events [22] and retinal disease [23]. These studies were mainly based on general clinical indicators, whereas large-scale quantitative proteomics, based on comparing the relative expression of a large number of proteins, may prove much more effective.

In this paper, we introduce a new approach to the differential diagnosis of CKDs of different origins, such as diabetic nephropathy, chronic glomerulonephritis, and hypertensive nephropathy. The approach applies several machine learning models to large proteomics data sets obtained by mass spectrometry of blood plasma and urine. The tested algorithms showed a good ability to differentiate the groups of renal patients according to the proteomic data.
