*Proceeding Paper* **Extracting and Processing of Russian Unstructured Clinical Texts for a Medical Decision Support System †**

**Irina Bolodurina <sup>1,2</sup>, Alexander Shukhman <sup>1</sup>, Leonid Legashev <sup>1,\*</sup>, Lyubov Grishina <sup>1</sup> and Arthur Zhigalov <sup>1</sup>**


**Abstract:** The rapid growth in the volume of medical data is driving the development and adoption of artificial intelligence (AI) tools. One application of AI in healthcare is the use of natural language processing methods to build medical decision support systems from electronic medical record (EMR) data. In this study, we developed a module for extracting and preprocessing patients' EMRs. We also implemented an approach for extracting features from the unstructured text of patient admission protocols and forming a corresponding vector representation of the data. Predictive models for diagnosing groups of diseases were developed on the basis of logistic regression and BERT. In our experiments, the logistic regression model performed best, achieving an F1-score of 0.81 and a Matthews correlation coefficient of 0.75. The results have been made publicly accessible using the Django framework; they can be used for a preliminary assessment of a patient's health status and can also be integrated into existing medical decision support systems.

**Keywords:** electronic health records; medical decision support system; natural language processing; BERT; logistic regression
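The abstract reports model quality as an F1-score and a Matthews correlation coefficient (MCC) but does not say how these were aggregated over the disease groups. The following is a minimal sketch, assuming macro-averaged F1 and the K-class generalization of MCC (the variant that scikit-learn's `matthews_corrcoef` computes for multiclass input). The function names `macro_f1` and `multiclass_mcc` and the disease-group labels are hypothetical and are not taken from the authors' code.

```python
from collections import Counter
from math import sqrt


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 because the label
        # occurs in at least one of the two lists
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def multiclass_mcc(y_true, y_pred):
    """K-class Matthews correlation coefficient computed from label counts."""
    s = len(y_true)                                        # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)   # correct predictions
    t_counts = Counter(y_true)   # how often each class truly occurs
    p_counts = Counter(y_pred)   # how often each class is predicted
    labels = set(t_counts) | set(p_counts)  # Counter returns 0 for missing keys
    cov_tp = c * s - sum(p_counts[k] * t_counts[k] for k in labels)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in labels)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in labels)
    if cov_pp == 0 or cov_tt == 0:
        return 0.0  # degenerate case: only one class predicted or present
    return cov_tp / sqrt(cov_pp * cov_tt)


if __name__ == "__main__":
    # Hypothetical disease-group labels for six patients
    y_true = ["cardio", "cardio", "resp", "resp", "gastro", "gastro"]
    y_pred = ["cardio", "resp", "resp", "resp", "gastro", "cardio"]
    print(round(macro_f1(y_true, y_pred), 3))        # 0.656
    print(round(multiclass_mcc(y_true, y_pred), 3))  # 0.522
```

MCC is included alongside F1 because it uses all cells of the confusion matrix and is less flattering than accuracy or F1 when disease groups are imbalanced, so the two figures in the abstract complement each other.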
