Article

Towards AI Dashboards in Financial Services: Design and Implementation of an AI Development Dashboard for Credit Assessment

Faculty of Business and Economics, University of Goettingen, 37073 Goettingen, Germany
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2024, 6(3), 1720-1761; https://doi.org/10.3390/make6030085
Submission received: 23 April 2024 / Revised: 23 July 2024 / Accepted: 25 July 2024 / Published: 27 July 2024
(This article belongs to the Special Issue Sustainable Applications for Machine Learning)

Abstract
Financial institutions are increasingly turning to artificial intelligence (AI) to improve their decision-making processes and gain a competitive edge. Due to the iterative nature of AI development, a structured process is mandatory, from the design to the deployment of AI-based services in the finance industry. This process must include the required validation and coordination with regulatory authorities. An appropriate dashboard can help to shape and structure the process of model development, e.g., for credit assessment. In addition, the analysis of datasets must be included as an important part of the dashboard to understand the reasons for changes in model performance. Furthermore, a dashboard can undertake documentation tasks to make the process of model development traceable, explainable, and transparent, as required by regulatory authorities in the finance industry. This can offer a comprehensive solution for financial companies to optimize their models, improve regulatory compliance, and ultimately foster sustainable growth in an increasingly competitive market. In this study, we investigate the requirements and provide a prototypical dashboard to create, manage, compare, and validate AI models to be used in the credit assessment of private customers.

1. Introduction

For the assessment of financial credit risk, the use of AI to accelerate, automate, and improve the credit assessment process is becoming increasingly attractive [1]. Moreover, increased computational power has made it possible to analyze vast amounts of data and to recognize behavioral patterns relating to loans. The example of credit risk assessment illustrates how AI can be utilized in the financial services industry, demonstrating the necessity for precision, efficiency, and the ability to handle large datasets to identify and evaluate potential risks accurately. Therefore, the integration of AI in financial services has become a transformative force in research and practice [2]. However, previous research has primarily focused on testing AI techniques to increase the predictive accuracy of credit assessment [3,4] (Appendix B). Meanwhile, the integration of AI is leading to a paradigm shift in the way that financial institutions should develop, maintain, and operate AI and concurrently cooperate with regulators [5,6]. Therefore, in the long term, it is important to derive concepts or dashboards for model development to operate, monitor, and maintain AI models with suitable and compliant processes [7].
These challenges to the use of AI in financial services are increasingly being addressed by politicians, as they attempt to define regulations for the use of AI to promote trust, manage risks appropriately, and encourage AI innovations [8,9]. According to the European Commission’s “AI Act” [10], the entire process of model creation, including the data used and all measures taken, must comply with various quality guidelines and must be documented in detail. For example, the processes of data cleaning and preprocessing must be fully documented to prove that the underlying dataset has been prepared appropriately and in compliance with the AI Act, e.g., taking account of biases in the dataset [10]. Moreover, the problem of masked internal decision-making processes is addressed by regulatory requirements so that the explainability, interpretability, traceability, transparency, and security of AI-based services are increased [9,11,12]. To address the challenges identified in the integration of AI for credit risk assessment, various commercial tools have been developed to facilitate the creation and management of AI-driven credit risk dashboards. Leading examples, such as FICO, Zest AI, and CreditVidya, provide comprehensive analytics and visualization capabilities. However, financial companies must define iterative and compliant processes in model development for regulatory adherence in financial services, ensuring accuracy and transparency. Furthermore, all these aspects must be comprehensively considered during the design of AI-based services and associated processes to fulfill regulatory requirements.
Furthermore, previous research on AI in credit assessment indicates that AI-based models can deliver more accurate and performant results than traditional statistical models [7,13,14]. However, AI models add complexity and require an iterative approach over the long term, and a trade-off arises in which higher performance often comes at the cost of lower explainability, and vice versa. The development of these models corresponds to a data mining process and involves many phases and steps, ranging from data preprocessing to the implementation of the model. Such a process is highly time-consuming and can take between 9 and 18 months [15,16,17,18]. Moreover, the underlying datasets face challenges such as an uneven distribution of “good” and “bad” loans and a time lag between data collection and model application.
Problems with the datasets can lead to a decrease in performance and intensify over time, a phenomenon known as concept shift or model shift [19]. Serious incidents like the 2007 financial crisis and the COVID-19 pandemic can trigger such concept shifts, requiring banks to continually validate and adjust their models to maintain functionality and accuracy [20,21,22]. This involves regular and ad hoc validations, potentially leading to recalibrations or new models [21]. Furthermore, standardized procedures and comprehensive documentation are essential for structured control by financial companies and regulators [21]. Therefore, AI dashboards can assist in these iterative analyses by systematically evaluating datasets, developing and monitoring models, and enabling swift actions to ensure system accuracy for customer safety.
Moreover, mandatory cooperation with supervisory authorities and the lack of structure in model development and validation need to be addressed, as scoring model evaluations are often inadequate. To effectively analyze and monitor datasets and AI models, it is essential to structure and foster iterative processes like data analysis and model evaluation and comparison. Identifying and addressing concept-shift issues requires analyzing the performance of existing or new AI models with incoming data. A dashboard supporting these iterative analyses and documentation can enhance the development, maintenance, and evaluation of AI models [13]. This study addresses the research gap in structuring and systematizing the development, training, testing, comparison, evaluation, and deployment of AI models for credit assessment. The following research questions (RQs) aim to bring more structure to the field through automated documentation and clearly defined procedures:
  • RQ1: What are the requirements for an AI dashboard to develop and maintain AI models for credit assessment?
  • RQ2: How should an AI dashboard for credit assessment be designed and the model development process structured?
In the following, we first introduce the related research on AI-based services. Subsequently, we present the underlying methodology of the conducted design science research framework [23] and the employed cross-industry standard process for data mining (CRISP-DM) [24]. We then derive the requirements before presenting the design and development process leading to the final dashboard design. After the demonstration of the artifact, we discuss our results, mention limitations, and conclude with an outlook on future research.

2. Related Research for the Use of AI in the Finance Industry

The accurate assessment of creditworthiness, prediction of bankruptcy, and detection or prevention of fraud are crucial for informed financial decision making, influencing access to loans, insurance, and other products. Moreover, while many of these processes have traditionally relied on established statistical models, the rise of AI has introduced a new paradigm with significant implications. For example, rule-based traditional models for credit assessment such as FICO scores are based on factors that are easy to understand, such as income, debt, and payment history. Thanks to this transparency, borrowers can understand how their score is calculated and can take action to improve it [25]. Therefore, the rule-based conventional models adhere to established regulations and guidelines and ensure fairness and consistency in credit assessment, with the logic behind them being relatively simple and making it easier for human analysts to understand and explain decisions. However, the governance of data-driven AI models strongly differs from the conventional models and requires a customized governance process to focus more on AI-specific criteria and risks [26,27]. Understanding trained AI models can be significantly enhanced through the extraction of patterns from datasets. The European Commission’s Artificial Intelligence Act highlights the necessity of distinguishing between so-called Weak AI, which focuses on specific tasks, and Strong AI, which exhibits more generalized cognitive capabilities [28].
Ensuring the trustworthiness of AI requires that datasets are free from biases and protect individual privacy. Adhering to these principles not only aids in the development of robust and versatile AI systems but also ensures they are ethical and secure, fostering greater public confidence and societal advantages. Furthermore, the problem of customized governance has also been considered by regulatory authorities in many European countries [20,21,29,30,31,32,33]. For example, the French Prudential Supervision and Resolution Authority (ACPR) published the first discussion paper in 2020, considering the issues with explainability and governance of AI [29]. ACPR defined four evaluation principles to design and develop AI algorithms and models: appropriate data management, performance, stability, and explainability. These four principles include the required structure and characteristics of AI-based services in financial services that must be defined, developed, and deployed in a compliant structure to ensure customers’ safety. However, due to the potential to increase automation in financial services with AI, all underlying and related processes and systems to develop and maintain these AI models must also be accelerated [34]. Therefore, the complexity of implementation and quality assurance issues become major obstacles preventing financial institutions from adopting AI-based services [34].
Thus, a balance must be struck between innovation and risk when using AI in financial services. Accordingly, it is necessary to develop not only accurate AI models but also systems or dashboards to maintain, develop, and monitor these models. Nevertheless, the existing literature aims to develop suitable and highly accurate models for datasets [35]. These approaches cannot guide the financial industry forward if there is no combination of well-defined systems and processes to maintain the models to ensure continued system security, as an AI model cannot operate properly forever [36]. In addition, the operational quality and robustness of the models must be monitored and validated regularly [10]. In case of adjustments required due to model or data drift, e.g., caused by changing economic conditions, financial companies or providers of these AI-based services must be equipped with potential solutions through increased automation, integration, and mitigation measures [26]. However, all the specific governance needs vary depending on the use case [37].
Furthermore, existing regulatory bases and recommendations from the OECD [38] focus on responsibility for trustworthy AI and require proactivity and sustainable development. Therefore, it is necessary to foster a regulatory environment that supports a flexible transition from research to the development, deployment, and operation of AI-based services [9]. Consideration of a controlled environment, e.g., an AI dashboard, is necessary to be able to test and scale up the proposed models [37,38]. Moreover, this interactivity between financial firms and regulators needs to be improved to promote entrepreneurship and productivity and ensure the benefits of AI [29,38,39].

3. Research Design

To target the research gap outlined in the first section, we employed a problem-oriented design science research (DSR) approach according to Peffers et al. [23] in combination with CRISP-DM [24] to structure and support the process of model development for credit assessment with an AI dashboard. DSR helps us to structure the research process while identifying the problem, analyzing requirements, and developing artifacts (Figure 1). Moreover, because CRISP-DM structures data mining processes across industries, it provides a fundamental and appropriate basis on which to structure the process of model development and model evaluation for credit assessment with an associated understanding of business and data.
We followed the CRISP-DM approach to implement a standardized process that significantly reduces lengthy iterations. CRISP-DM’s established structure allows for extensions and simplifies the model development process for an AI dashboard for financial companies, developers, and regulators. The first step, business understanding, determined the requirements for an AI dashboard in credit assessment. As already outlined, with CRISP-DM, it is necessary to investigate the underlying dataset for a practical and useful orientation to propose an appropriate list of requirements.
The process of requirements analysis, which is based on four steps of CRISP-DM, is described in Section 4.1 (Figure 2). First, we investigate the requirements for business and data understanding based on the credit assessment dataset. Second, we identify the requirements for data preprocessing and preparation, including data analysis. Third, we gather the requirements for modeling to support the process of model development on the AI dashboard. Lastly, we identify the requirements for the evaluation and comparison of models.
In Section 4.2, we derive the functionalities to fulfill the identified requirements based on the use-case analysis. Moreover, due to the regulations in the finance industry, it is necessary to analyze and consider regulatory requirements to select appropriate modeling techniques for an AI dashboard (Section 4.3). Based on the analysis of the requirements, use cases, and regulatory basis, we designed and developed the proposed artifact for the development and maintenance of AI models, as described in Section 4.4. As the last step of problem-centered DSR, the demonstration of the artifact “AIDash” is described in Section 4.5, using an exemplary scenario in which a new model for credit assessment is developed and evaluated.

4. AI Dashboard for Model Development in Credit Assessment

We start with an analysis of the problem and requirements to structure the development process of the meta-artifact, which supports the iterative processes of data understanding, data preprocessing, model development, and model evaluation based on CRISP-DM.

4.1. Problem Identification and Requirement Analysis

For the development of an artifact for credit assessment, it is necessary to consider the existing approaches. Therefore, we conducted a structured literature review based on the methodological guidelines of Cooper [40] and vom Brocke et al. [41]. The search was conducted in the established scientific databases JSTOR, AIS Electronic Library, Emerald Insight, and IEEE Xplore with the following search terms: “Credit Scoring”, “Credit Assessment”, “Artificial Intelligence”, and “Machine Learning”, from 2014 onwards. We initially found 1301 publications, of which 61 appeared to be relevant for AI-based credit assessment after analyzing the titles and abstracts of the accessible publications (Appendix B). Relevant articles were those that dealt with AI-based credit assessment or dashboards for model development, provided literature reviews or interview studies, or considered respective regulations.
Based on the conducted literature review, we first examined the existing studies dealing with AI-based credit assessment. As identified from the literature review, CRISP-DM can help to structure the process of model development for credit assessment. Here, however, it is not possible to cover all the phases of CRISP-DM. In this study, the phases of business understanding and deployment were not in our focus, as they set the prerequisites for the planned dashboard and require further strategic decisions depending on the companies’ service infrastructure (Table 1). In the following, we go through the four phases of CRISP-DM, conduct a requirement analysis for credit assessment, and summarize the gathered requirements and derived functionalities in a table.

4.1.1. Data Understanding of Credit Assessment

Datasets used to train and build AI models consist of credit applications with positive and negative decisions based on several features such as previous payment behavior, deposits in bank accounts, bonds, and duration of employment. For this study, we used an exemplary dataset from Fahrmeir and Hamerle [42] including 1095 credit applications from private customers with 21 features, 300 rejections, and 795 approvals (Appendix A). The CRISP-DM approach consists of multiple steps distributed over phases. The first step of data understanding begins with the initial collection of data and includes functions to gain first insights about the dataset [24]. This involves determining the quality of the data, gaining initial impressions, and/or recognizing groups in the dataset from which hypotheses can later be formed. We first started gathering requirements from the second phase of CRISP-DM, i.e., data understanding, for the analysis and management of datasets for credit assessment. The ability to manage datasets is crucial, as all subsequent steps are based on this data (RDU 1). This step also includes loading (or uploading) the dataset into the dashboard, which makes the dataset available for analysis (F1.1). Moreover, to work with and integrate the datasets in the next steps, the prototype should provide an overview of the existing datasets [24]. As the datasets may be available in different formats, it is necessary to have the option of changing the parameters for import so that datasets can be integrated homogeneously and thus processed consistently (Table 2).
The second part of data understanding is data description (RDU 2), which must include the functionalities to check “gross” or “surface” properties of the acquired dataset and to report on the results. Furthermore, the third part consists of data exploration (RDU 3) to gain more insights for data preprocessing with the help of plots and reports (F2 and F2.1). Based on the nature of credit assessment, it is necessary to consider periodical changes, so the ability to compare multiple datasets must be provided to gain deeper insights for pre-processing datasets later in model training (F2.2). The use of plots is ideal for supporting this task, as they provide a quick and intuitive understanding of the distribution of the data. Plots such as bar charts, correlation matrices, or boxplots are common types that are used to examine the datasets in data mining. The last step of data understanding is data assessment to verify the quality of the data by checking it for completeness, missing values, and errors (F1.2 and F1.3). No separate function is derived for the data assessment, as this check is most easily performed by examining the dataset and making individual queries outside the artifact.
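As a minimal sketch of this exploration step (F2 and F2.1), assuming pandas and matplotlib and using placeholder file and column names, the three plot types mentioned above could be generated as follows:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("Dperiod1.csv", sep=";")  # import parameters as set via F1.2

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Bar chart: frequencies of a categorical feature
df["existing_account"].value_counts().sort_index().plot.bar(ax=axes[0])
axes[0].set_title("Frequencies of existing_account")

# Boxplot: distribution, quantiles, medians, and outliers of a metric feature
df.boxplot(column="credit_amount", ax=axes[1])  # "credit_amount" is a placeholder
axes[1].set_title("Distribution of credit_amount")

# Correlation matrix: pairwise correlation coefficients of the features
im = axes[2].matshow(df.corr(numeric_only=True))
fig.colorbar(im, ax=axes[2])
axes[2].set_title("Correlation matrix")

plt.tight_layout()
plt.show()
```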

4.1.2. Data Preprocessing

Pre-processing consists of all the steps required to construct the final dataset. The tasks in this step are usually conducted more than once in a non-linear process. The first step consists of data selection (RDP 1), which includes the functionality for the selection of features based on criteria such as relevancy, quality, quantity, and technical characteristics like data scope or type. Therefore, it is necessary to extend the functionality of F1 to allow users to manage the features (F1.4). The selected dataset must be cleaned of any missing values and outliers based on the conducted feature selection (F1.2 and F1.4.2).
Furthermore, the next part of data preprocessing consists of data construction (RDP 3) to derive further metrics from a dataset using mathematical operations. However, the credit assessment dataset is ordinal, i.e., its values are subject to a sequence, and numbers are assigned to the different values. Therefore, the dataset does not require any conversions that are needed for other use cases dealing with more complex datasets, such as insolvency forecasting. Likewise, the integration of datasets, e.g., from multiple tables or records to create new records or values, is not required for the scope of the prototypical implementation of the AI dashboard for credit assessment. Moreover, the formatting of the dataset consists of syntactical modifications such as the delimiter in CSV files or the position of the target variable (F1.2). Therefore, it is necessary to have the option to change parameters for importing the dataset so that it can be imported properly. Furthermore, in the case of metrical data records, it is necessary to have the option of converting them into an ordinal structure (F1.4.1), which extends the derived function F1.4.
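A minimal sketch of such an ordinal conversion (F1.4.1), assuming pandas and using hypothetical column names and class boundaries:

```python
import pandas as pd

df = pd.read_csv("credit.csv", sep=";", header=0)  # import parameters (F1.2)

# Hypothetical user-defined class boundaries for a metric feature ("amount");
# values outside the ranges become NaN and can be excluded as outliers.
bins = [0, 500, 1500, 20000]  # classes (0, 500], (500, 1500], (1500, 20000]
df["amount_coded"] = pd.cut(df["amount"], bins=bins, labels=[1, 2, 3])
df = df.dropna(subset=["amount_coded"])  # remove entries outside the ranges
```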

4.1.3. Modeling

The artifact must provide the structure for AI modeling, which includes the application of various techniques and the discovery of the (hyper-)parameterization leading to optimal values. It is important to consider that the performance of a model depends strongly on the dataset. Hence, modeling techniques require certain pre-processing measures, which is why it is essential to be able to go back to an upstream phase. The first step for modeling begins with the selection (RMB 1) of modeling techniques, such as neural networks, decision trees, support vector classifiers, XGBoost, etc. To evaluate the quality and validity of the model, it is necessary to develop a mechanism that tests the performance of the model (RMB 2). For a credit assessment, it is common to determine the error rates of classification. Therefore, it is necessary to be able to split the dataset into training and test sets. Credit assessment datasets usually contain more approvals than rejections, which is why a balancing function must be offered to avoid any bias in the trained models.
The creation of a model (RMB 3) consists of the determination of many parameters (F3). Moreover, this step involves several iterations, including creating (F3.1), saving (F3.2), deleting (F3.3), uploading (F3.4), and re-training (F3.5) an AI model. As a result, models can be evaluated based on the performance metrics. As the decision structure and therefore the datasets for credit assessment can change periodically, it is necessary to compare different datasets and models. For the assessment of the model (RMB 4), many technical areas usually come together to evaluate the results. There is a large overlap between this step and the subsequent evaluation phase. First, the data mining engineer or data scientist interprets the results using their knowledge and the success criteria. In addition, business analysts and experts in their respective fields are consulted to discuss the results from the company’s perspective. In most projects, the data scientist/data mining engineer needs to create several models so that they can be compared with each other. Therefore, the functionality to compare the models and export the results is required to assess the models.
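To make these steps concrete, the following sketch, assuming scikit-learn and XGBoost with illustrative hyperparameters and a placeholder target column, creates and compares two of the supported model types:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, f1_score

df = pd.read_csv("Dperiod2.csv", sep=";")
X, y = df.drop(columns=["target"]), df["target"]  # "target" is a placeholder

# RMB 2: split into training and test sets to determine classification errors
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# RMB 1 / RMB 3: create several models so they can be compared (RMB 4)
models = {
    "DT": DecisionTreeClassifier(max_depth=5, random_state=42),
    "XGB": XGBClassifier(n_estimators=100, eval_metric="logloss"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, accuracy_score(y_test, pred), f1_score(y_test, pred))
```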

4.1.4. Evaluation

This step provides the functionality to evaluate the quality of the models and the approach, including data preparation and data quality. It is important to check whether the business-related problems of credit assessment, the proposed solution, and the AI model have been sufficiently considered. The previous assessment step (RMB 4) deals with factors that affect the performance of the models, whereas the objective of this phase is to evaluate whether a model contributes to the achievement of the company’s objectives and whether it fulfills the business-related requirements (REV 1). It is therefore necessary to view (F5) and export (F6) the results. This makes it possible to check and compare all the specifications of the models. In addition, it should be possible to test the models with both real and hypothetical credit applications (F5.1) or using another dataset (F5.2) to simulate the model behavior. The function of downloading the model (F3.2) ultimately serves to integrate it into a production environment or to make further adjustments (REV 2).
The entire process up to model creation should be documented for evaluation, both to fulfill regulatory requirements and to ensure the traceability of the procedure. This requires that all model parameters and features can be exported in a report. In the final step of the evaluation phase, it then remains to be decided how to proceed with the developed AI models. The development process is finished if the model performance is good enough and all requirements have been met, so that deployment can begin. In addition, further iterations or new model development processes can be initiated if the model behavior needs to be tested regularly with new or revised datasets. However, although the artifact can help to make final decisions about the proposed AI models, it cannot provide any functionality to determine the next steps. An overview of the identified requirements and derived functionalities is shown in Table 2.

4.2. Analysis of Use Cases of AI Dashboard for Credit Assessment

To structure the process based on the identified requirements and functions, it is necessary to analyze the use cases (Figure 3) and UML activity diagrams (Appendix C). This can promote the understanding of the requirements and functions when designing the artifact. Moreover, the individual modules do not correspond to the individual phases of the CRISP-DM process. The reasons for this relate to the iterative nature of CRISP-DM. For example, the first phase to be supported, data understanding, includes the steps of describing and examining the dataset. However, these steps cannot be completed before the dataset can be imported into the prototype correctly. Therefore, the function to change the parameters for import (F1.2), which is assigned to the second phase (pre-processing) according to CRISP-DM, must have already been carried out previously.
The artifact’s functional scope is illustrated with a UML use-case diagram divided into four CRISP-DM phases (Figure 3), using colors (blue, yellow, green, purple) corresponding to four modules: upload, plot, model, and dashboard. The upload module assists with preprocessing and data understanding by allowing dataset uploads and formatting. The plot module enables graphical evaluation and detailed statistical analysis of datasets. The modeling module supports various modeling techniques and parameter optimization. The dashboard module visualizes and compares model performance, allowing new models to be uploaded or existing ones retrained. It also allows periodic testing of model behavior with new datasets or individual credit applications.

4.3. Selection of Modeling Techniques

The regulatory principles determine the criteria for the use of AI models in financial services. The European Commission regulation for the use of AI [10] stipulates that these models must have good forecasting capability and no systematic errors or distortions. In addition, regulatory principles for processes related to data quality as well as model validation are defined [21]. However, these are not considered in detail, as they must be implemented individually by the respective institutions for each specific application. For instance, they do not specify what can be considered good forecasting capability.
Moreover, the supervisory authorities define further country-specific regulations for the use of AI-based models. For example, the Federal Financial Supervisory Authority (BaFin) of Germany defines conditions for the selection of methods and techniques, requiring that companies must be able to justify the decisions and the underlying assumptions comprehensibly [20]. The use of models with increased complexity and so-called “black box” structure is critical for supervisory authorities, as they are concerned about the problem of limited insight into the model decision structure. Moreover, approaches such as LIME (local interpretable model-agnostic explanation) provide insights into the inner structures of AI models and so reveal the underlying decision structures of AI models [43]. However, the required explainability relates not only to the model but also to the associated process for its development, so documentation is essential to bring more transparency to the underlying development process. Therefore, the supervisory authorities provide no general approval for algorithms, so companies must ensure that the models produce correct, robust, and reproducible results.
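As an illustration of how a LIME explanation could be produced, the following hedged sketch assumes the Python lime package and reuses the model and datasets from the modeling sketch in Section 4.1.3; it is not part of the current prototype:

```python
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    class_names=["bad", "good"],
    mode="classification",
)
# Explain a single credit application from the test set
explanation = explainer.explain_instance(
    X_test.values[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # per-feature contributions to the decision
```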
The modeling techniques for credit assessment must provide a high level of explainability, transparency, and performance [10]. Nevertheless, for prototypical implementation, the modeling process must allow the use of different approaches that can be evaluated and compared. Therefore, the artifact should be expandable to integrate further modeling techniques. However, users should be aware that complex models such as neural networks cannot be used in credit assessment without further elaboration by supervisory authorities due to the hidden black box structure. We derived the given categorization and techniques, i.e., statistical and intelligent learning models and ensemble methods, from the literature [13,15,44,45,46,47,48] to provide algorithms that were as representative and efficient as possible (Table 3). However, a clear categorization of the selected techniques was not possible since semi-parametric methods represent a bridge between statistical and intelligent learning models.
To appropriately compare models, it is necessary to provide functionality for model assessment based on performance metrics. Graphs and tables can support this assessment process by visualizing extracted performance metrics [49,50,51]. A suitable representation of model performance is the confusion matrix, which visualizes performance using actual and predicted values, aiding in decision making [49]. Additionally, the receiver operating characteristic (ROC) curve effectively illustrates model performance by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. A steeper curve closer to the 0.0/1.0 coordinate indicates a better model, whereas a bisector-like ROC curve suggests random classification [52]. It is crucial to use statistical and appropriate metrics to present performance neutrally without distortion [13,53,54,55]. Chen et al. [13] define relevant performance metrics for binary classification or regression, such as in credit assessment. Moreover, including the negative predictive value (NPV) is essential for assessing the predictive quality of negative classifications, particularly in credit assessment, where an AI model could inaccurately achieve high accuracy by classifying all applications as positive if the dataset has few bad credit applications.
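Continuing the modeling sketch from Section 4.1.3, the following fragment illustrates how these metrics, including the NPV (for which scikit-learn offers no direct function), can be derived from the confusion matrix and the predicted probabilities:

```python
from sklearn.metrics import confusion_matrix, roc_curve, auc, brier_score_loss

pred = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()

# Negative predictive value: quality of the negative ("bad") classifications
npv = tn / (tn + fn)
print(f"NPV: {npv:.3f}")

# ROC curve: TPR against FPR at various thresholds of the predicted scores
scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, scores)
print(f"AUC: {auc(fpr, tpr):.3f}, Brier: {brier_score_loss(y_test, scores):.3f}")
```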
Moreover, it is essential to include these metrics as part of the exported reports (Table 4) and to export training and testing datasets for further assessment, e.g., by responsible regulatory authorities or managers [13,51,56,57,58]. Documentation and export of all model-related information can promote development and evaluation processes that are as transparent and comprehensible as possible. Due to the required model assessment (RMB 4), it is crucial to be able to compare the performance metrics of several models with drill-down functionality to identify and highlight their differences [50,59,60,61]. Thereby, the grid structure of a table can provide a simple, clear comparison of the different performance metrics. The possibility of adding different models to the table allows the user the necessary interactivity [62].

4.4. Design and Development

To fulfill the specified requirements to assist the process of AI development, we designed and developed the web-based artifact AIDash. Web-based applications enhance transparency and traceability in the development process by providing easy access to artifacts for operational employees, managers, and regulators [63]. They facilitate real-time collaboration, ensure accountability through detailed logs and version control, and improve communication via centralized information and interactive dashboards. However, these benefits come with challenges such as increased security risks, dependence on internet connectivity, complex implementation, and compliance with data privacy regulations. Balancing these advantages and disadvantages is essential for effectively utilizing web-based applications while mitigating potential risks. For the development of the artifact, we opted for the Python-based Django framework, which provides sufficient AI modules and a modern development architecture for data mining processes [64,65] (Appendix H).
As described in Section 4.2, the artifact consists of four modules (upload, plot, model, and dashboard) that guide the user through the entire model creation process (Figure 3). Below, a possible sequence of use is modeled using UML activity diagrams for each of the modules (Appendix C). However, as the procedure and the required processes are strongly dependent on the respective case and the stage of the project, completely different sequences are also possible. For instance, the data format check can be skipped if it has already been carried out manually.

4.4.1. Upload Module

The first module provides the functionalities to support the data understanding and preprocessing aspects of CRISP-DM. First, a new dataset can be uploaded in CSV format or selected from the list of already uploaded datasets (Figure 4). The artifact provides an overview of the uploaded or selected dataset so that the user can change the parameters for the import if there is incorrect formatting, e.g., caused by an incorrectly selected delimiter. Then, once the file has been read correctly, the target variable must be specified. The selected delimiter, the position of the header line, and the name of the target feature in the dataset are saved in the database.
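A hypothetical sketch of this import-parameter check, using Python’s csv.Sniffer to guess the delimiter before the user confirms or corrects it (the file and target names are placeholders):

```python
import csv
import pandas as pd

# Guess the delimiter from a sample of the uploaded file; the user can still
# override it if the preview shows incorrect formatting.
with open("Dperiod2.csv", newline="") as f:
    dialect = csv.Sniffer().sniff(f.read(4096), delimiters=";,|\t")

df = pd.read_csv("Dperiod2.csv", sep=dialect.delimiter, header=0)

target = "credit_decision"   # placeholder name of the target variable
assert target in df.columns  # the target feature must exist in the dataset
```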
Furthermore, it is possible to check, rename, or delete the columns if any changes are necessary for training an AI model (Appendix A and Appendix D). However, the management of the dataset (F1) also provides functionality for coding the dataset (F1.4.1), with the conversion of the metric characteristics into ordinal values such as credit ratings (AAA, AA+, …, D). In addition, there is another option for setting the value ranges for the assignment of ordinal values, which helps to exclude outliers in the selected columns by entering minimum and maximum values. For additional insights to be able to make necessary changes in import parameters, such as value ranges to exclude outliers or for an appropriate selection of features, the user can navigate to the plot module and generate graphical impressions with interactive bar charts and boxplots or can check the correlations between the features. Therefore, multiple iterations may be necessary to finalize the parameters and the selection of features.

4.4.2. Plot Module

The plot module serves as a graphical representation of the selected dataset (Figure 5). It thus enables in-depth analysis and comparison between periodic datasets (e.g., Dt and Dt−1) to determine possible conceptual or model changes. Potential deviations in datasets can lead to changes in the model structure, meaning that it is no longer possible to continue training existing models. Initially, the user decides whether only one dataset should be viewed or a comparison is desired. Depending on the decision, a second dataset can be selected or removed from the view. The default view is a bar chart, in which the frequencies of each feature are displayed. The user can also select a pie chart to analyze the distribution of the feature values.
The percentage composition shown there (Appendix E) can be explored in more detail using the boxplot diagram, which provides a precise overview of the distribution, quantiles, and medians (Figure A9). Moreover, the boxplot diagram also shows outliers and provides an opportunity to remove them. As another important part of the data analysis, a correlation matrix can be selected to obtain the pairwise correlation coefficients of the features in the form of a table (Figure A11). AIDash allows an easy comparison with a second dataset for each diagram type. Further adjustments of datasets based on the graphical analysis can be carried out in the upload module.
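As an illustration of the comparison underlying this module, the following sketch, assuming pandas and the period datasets used in Section 4.5, contrasts the relative frequencies of one feature across two periods:

```python
import pandas as pd

d_prev = pd.read_csv("Dperiod1.csv", sep=";")
d_curr = pd.read_csv("Dperiod2.csv", sep=";")

# Relative frequencies of one feature in both periods, side by side
comparison = pd.DataFrame({
    "previous": d_prev["existing_account"].value_counts(normalize=True),
    "current": d_curr["existing_account"].value_counts(normalize=True),
}).fillna(0.0)
comparison["shift"] = comparison["current"] - comparison["previous"]
print(comparison.sort_values("shift"))  # largest downward/upward trends
```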

4.4.3. Model Module

The model module serves to manage the models so that the user can create or upload new models and delete or download existing ones (Figure 6). The buttons for deleting or downloading are located on the left-hand side of the module. A new model can also be created using the corresponding button, whereupon a dialog window opens and allows the user to specify a title for the model. Once the desired model has been selected, individual hyperparameters for the model can be determined manually by the user (Appendix F). Furthermore, the ratio of “good” and “bad” credit applications in the selected dataset is plotted on a doughnut chart (Figure A12). Significant imbalances in the dataset can have a strong negative impact on model accuracy and distort the results, since a model may simply predict the majority class and thus provide poor recognition of bad credit applications [13].
Therefore, AIDash provides two options to balance the dataset by adjusting the ratio of “good” and “bad” credit applications (Figure A12). First, the random removal function allows the selection of the desired ratio between “good” and “bad” credit applications, so that credit applications are randomly selected and removed. As a second option, SMOTE–Tomek provides a combination of over- and undersampling techniques to balance the dataset [66]. Datasets for credit assessment typically contain a very small number of bad loans [67,68,69]. The SMOTE–Tomek algorithm randomly selects entries from the minority class and generates synthetic entries using the K-nearest neighbor algorithm until the desired ratio between the minority and majority class is reached [66]. Subsequently, so-called Tomek links are removed, which are defined as entries of the majority class whose nearest neighbor is an entry from the minority class and vice versa. In the implementation of the prototype, oversampling is carried out until a balanced ratio (50:50) between the majority and minority classes is reached. Once the desired adjustments have been made, the model can be trained (Figure A12).
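A minimal sketch of this balancing step, assuming the imbalanced-learn package and the training data from the earlier modeling sketch:

```python
from imblearn.combine import SMOTETomek

# Oversample the minority class to a 50:50 ratio, then remove Tomek links
resampler = SMOTETomek(sampling_strategy=1.0, random_state=42)
X_balanced, y_balanced = resampler.fit_resample(X_train, y_train)

print(y_train.value_counts())     # imbalanced class ratio before resampling
print(y_balanced.value_counts())  # approximately balanced classes afterwards
```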
If a model created independently (outside the prototype) needs to be evaluated or compared, it is possible to upload it (Figure 6). For this, a dialog window opens in which the title, description, model name, and model type can be specified. Furthermore, the associated training and test datasets must be uploaded. These must be serialized (e.g., using the Pickle library of Python) so that they can be imported and saved as a new object in the database. Before the model is saved in the database, the performance metrics can be calculated. Once all relevant models are available, it is possible to navigate to the dashboard module for the evaluation of multiple models.
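The serialization format assumed here could look as follows; the sketch pickles a model together with its test data and restores it, as the upload dialog would, before the metrics are recalculated:

```python
import pickle

# Export (outside the prototype): bundle the model with its test data
with open("external_model.pkl", "wb") as f:
    pickle.dump({"model": model, "X_test": X_test, "y_test": y_test}, f)

# Import (in the upload dialog): restore the bundle and verify the metrics
with open("external_model.pkl", "rb") as f:
    bundle = pickle.load(f)
print(bundle["model"].score(bundle["X_test"], bundle["y_test"]))  # accuracy
```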

4.4.4. Dashboard Module

The dashboard module is the core of the prototype (Figure 7) and provides the necessary functionality to evaluate prepared or existing AI models (Figure 6). All previous steps and modules are used to build the model so that it can be evaluated in this module. Moreover, it also includes additional features, such as re-training, testing with another dataset, or creating a report (Appendix G). By default, the dashboard contains a single model including all key performance metrics for the evaluation. However, it is possible to add further models for comparison with the key performance metrics using the plus symbol in the top right corner of the table (Figure 7).
In addition, AIDash provides the option of downloading each model and the associated training/test sets via the corresponding button in the last row of the table. A full report can be created via the third icon, containing model performance based on metrics, metadata about the model, classifications made, training/test datasets, graphs, and the selected hyperparameters of the model (Figure A15). The dashboard module itself provides step-by-step documentation of the modeling process with the specific operations performed. Thus, documentation of the tests and analyses performed is provided either via the dashboard itself or by downloading reports for internal or regulatory approval.
Furthermore, each model contained in the view can be retrained via a drop-down menu by uploading new training and test datasets in serialized format (Figure A13). This function allows users to test existing models with new datasets and to conduct further adjustments directly on the dashboard module. For this, a new model is created in the background so that new credit applications can be assessed. The model performance can also be tested either by entering credit applications manually via a drop-down menu or by file upload (Figure A14), so that the predictions for each entry can be calculated. There is a binary prediction option in which the model classifies the application as “good” or “bad” (1 = good; 0 = bad). Alternatively, it is possible to generate and export the probabilities of the classification (between 0 and 1).
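As an illustration of the two prediction options for a scikit-learn-compatible model (reusing the model from the earlier sketches; the file names are placeholders):

```python
import pandas as pd

# Uploaded file of new applications; must be coded like the training data
applications = pd.read_csv("new_applications.csv", sep=";")

binary = model.predict(applications)                      # 1 = good; 0 = bad
probabilities = model.predict_proba(applications)[:, 1]   # P(good) in [0, 1]

pd.DataFrame({"decision": binary, "p_good": probabilities}).to_csv(
    "predictions.csv", index=False  # exportable prediction report
)
```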
If already trained models need to be tested with new datasets (e.g., for periodic comparisons between Dt and Dt−1) to determine whether the model performance is sufficient or adjustments are necessary, selecting “Test model with another dataset” in the drop-down menu opens the corresponding dialog window (Appendix G), in which a title can be specified for the new object and a new testing dataset can be uploaded. Subsequently, the prototype generates the predictions and provides the performance metrics of the model.

4.5. Demonstration of AIDash

Following the implementation of the artifact, the functional correctness of AIDash was validated by applying it to an application scenario. The demonstration of AIDash covered two consecutive periods and their respective datasets. Due to the necessary change of datasets, it was necessary to check whether there was data or model drift, so that a new AI model could be trained, tested, compared, and prepared for deployment.
Based on the periodic nature of credit scoring datasets, the first module (Section 4.4.1) provides the functionality to upload a new dataset to compare multiple datasets (between Dperiod1.csv and Dperiod2.csv) and model behavior. Moreover, after the final selection of features from the dataset (see Figure A5), the coding process of features can be conducted (see Figure A6). Since both datasets were already coded, it was not necessary to make further adjustments to the features, and the next step began with the comparison of the feature distributions between datasets, e.g., the current and the next period, on the plot module before training the model (Appendix A).
By using the comparison view, it was possible to identify possible data drifts that affected the model performance and the model behavior (Figure 8). Firstly, the ratio between “good” and “bad” credit applications was similar in the two datasets. However, the second feature (existing_account) showed that the third (for more than 200 monetary units or salary accounts for at least one year in the bank) and fourth types (for no current account) of customers were associated with an upward trend in credit applications compared with the previous period. In addition, we observed a new set of loan durations, with 10 loan applications between 54 and 60 months. This increased loan duration was also reflected in the amount of the loan applied for (9: 1500 < … ≤ 20,000). These significant changes indicated a data drift that may have led to incorrect assessments by the models.
Due to the identified changes in the new dataset, it was necessary to develop a new model to assess the performance and ensure system safety. To do this, the model module (Figure 6) provided the functionality to manage the modeling process. The definition of a new model with AIDash is assisted by the predefined parameters, which can be further refined if necessary. However, the new dataset contained more “good” than “bad” loans, which was balanced out during the development of the new models (Figure A12). To enable a comparison on the dashboard module to find the best among all existing modeling techniques, we created several models from the list of techniques provided by AIDash.
In the last step of the process, AIDash provides a comparison view of multiple models. To demonstrate the model evaluation process, XGB and DT models trained with Dperiod2.csv and Dperiod1.csv were added so that the performance of each model could be analyzed (Figure 9). Thus, AIDash was able to promote a periodical comparison between the existing and newly trained models. However, in this step, it was necessary to iteratively add/remove models to/from this view for comparison before the final decision was made. In this case, the XGB and DT models allowed us to assess the transition in model performance between the two periods and datasets.
First, the ROC curve indicated that the XGB models outperformed the DT models from this and the previous period. We observed that the curve for the new XGB model with Dperiod2.csv rose rapidly towards the upper left-hand corner of the graph. Second, the performance metrics of the new XGB model in the table below showed that a change or modification of the existing model made sense due to its superior performance. In addition, the performance metrics showed the better quality of the XGB model that was trained with the new dataset, with higher accuracy, a higher F1 score, and a lower Brier score. The metrics are highlighted with specific colors to make it easy to identify changes in performance. Moreover, AIDash provides the functionality to test a model with another dataset, allowing monitoring of the performance of existing models on new datasets to identify model drift (Figure A13). Lastly, it was necessary to export the model’s training and test datasets and the documentation report to make final adjustments for deployment. In case of any required regulatory assessment, the model behavior can be assessed by checking individual credit applications (Figure A13).

5. Discussion and Limitations

The implemented artifact provides an exemplary structure based on the gathered requirements from CRISP-DM to develop, maintain, evaluate, and monitor AI models for financial services. Table 5 provides an overview of the functional scope of the artifact AIDash. Comparing the scope of requirements and functions (Section 4.1) with the steps from CRISP-DM, the feasibility is visualized as follows: the green cells in the table are supported steps, while the yellow cells indicate the partial coverage, and the orange cells are the unsupported steps. The business understanding and deployment phases cannot be supported with the artifact as they require more strategic, conceptional thinking, as well as practical action and integration beyond the artifact. Assuming that the fundamental and strategic requirements are in place, the steps marked in green and yellow can be facilitated and improved within the scope of the artifact.
As addressed in the first research question, the functionalities derived from CRISP-DM outline the functional requirements in Table 2 [F1–F6]. In addition, these requirements can help to develop an artifact for other similar use cases working with quantitative datasets in financial services. Moreover, due to the regulatory requirements for the use of AI in financial services, it is necessary to extend the artifact with further explainability, traceability, and transparency approaches throughout the development process, from data preparation to model development. The regulations do not clearly define any criteria for a suitable level of explainability. The artifact provides decision tree and logistic regression models that include explainability by design (Section 4.3). However, for black box models such as NNs, it is necessary to extend the artifact with XAI techniques such as LIME or Shapley values to provide the required explainability for these models.
The existing research gap in appropriate structures and systems for the development of AI models including training, testing, comparison, evaluation, deployment, and documentation mechanisms was investigated in this study. The requirements identified from the CRISP-DM stages addressed the problem of long development times and concept shifts. The structure of the artifact was formed from the functions derived from the requirements and mapped into four modules (upload, plot, model, and dashboard). The AI models identified in Section 4.3 and legal requirements confirm that models cannot be permitted or prohibited in general. Therefore, dashboards must provide a structure that ensures that the models and the associated processes are transparent, explainable, performant, and individually assessable.
The problem of the occurrence of concept shifts and the associated decline in model performance can impair a model’s ability to classify credit applications based on factual characteristics. Corresponding models are often based on datasets from up to three years in the past, as the development of new models by data analysts is time-consuming [15]. This high level of effort stems from the high quality requirements imposed on these models and their documentation by both credit institutions and regulators, in order to increase transparency and ensure customers’ safety and trust. Therefore, the adaptability of AI-based services must be ensured in the dynamic macroeconomic environment, so that possible concept shifts can be recognized at an early stage and measures can be taken to ensure good predictive capability of the models.
Furthermore, the performance of a model depends significantly on the dataset used, so multiple iterations are relevant for the preparation of datasets and models for determining the best set of configurations [13,70]. Therefore, the possibility to analyze, compare, test, and evaluate datasets and models in detail is provided in the four modules. Furthermore, the export function makes it possible to document systemically the reports on conducted analyses. A potential conceptual shift can easily be detected by testing the already trained models with datasets from a different period. A reduction or improvement in model performance can then easily be discovered using the calculated performance metrics. If further assessments are required, the user can create or upload new models directly.
All these factors lead to lengthy model development and therefore cost-intensive processes that are usually carried out by experts in the field of data science, which is exactly the problem addressed by the developed artifact AIDash. The four outlined modules (upload, plot, model, and dashboard) enable the rapid development or maintenance of AI models. Thanks to the graphical user interface, even non-experts in the field of data science can develop or evaluate AI models. However, it should be noted that these models are not considered final or optimal. Rather, they can serve as a first iteration, as a performance comparison, or for control of the performance of existing or used models.
Moreover, the plot module provides the functionalities to identify deviations from the previous datasets, based on different graphical representations. The function for comparing data records enables assessment and observation of features to determine whether any strong variations or imbalances exist. The corresponding graphical representations from the available datasets help to rapidly identify periodical changes in the datasets that may lead to a conceptual shift. Furthermore, proposed models must be configured on the model module, which provides the functionality to address the imbalances between “good” and “bad” loans in the selected dataset. In addition, the artifact provides the basic functionality for creating models and balancing the selected dataset, which can be extended in fine granularity depending on the objective and required complexity.
The AI dashboard for financial services has significant implications for regulatory authorities, especially under frameworks like the European AI Act. While it enhances transparency and traceability by documenting every step from data preparation to model development, the explainability requirements of the models need to be further investigated based on the specific modeling technique. Therefore, requirements for a model- and case-specific degree of explainability must first be investigated. Furthermore, regulatory authorities demand high levels of transparency and traceability, and the AI dashboard aids compliance by providing detailed documentation of all actions taken during the model development process, supporting regulatory reviews and helping maintain compliance with documentation standards.
Additionally, the dashboard’s functionalities to detect and address conceptual shifts are crucial for maintaining model performance, which is essential for compliance. Systematically analyzing and testing models with new datasets helps to identify shifts early, allowing timely corrective actions. This proactive approach ensures AI models remain accurate and reliable, aligning with regulatory expectations for ongoing validation and performance monitoring. Furthermore, the dashboard promotes standardization in AI model development, ensuring all processes are consistent and well-documented, which is essential for regulatory approval. This streamlined development process reduces time and costs, making AI model development more accessible, aligning with regulatory goals of accessibility and inclusivity, and enabling even non-experts to develop and evaluate models.
Moreover, documenting data preprocessing, modeling, and sampling enhances transparency for regulatory authorities. Sharing prepared models with all documentation and datasets will aid regulatory control and review. This documentation justifies decisions to use or change AI models. However, the artifact lacks features for recalculating or combining features into new ones, which could improve prediction quality. Research indicates that splitting datasets to train multiple models can enhance predictions, suggesting that the artifact could benefit from this functionality to reduce bias and improve accuracy [19,71]. Lastly, the problem-centered DSR process should include practical evaluations and expert interviews to refine requirements, concepts, and prototypes based on practical insights [23].

6. Conclusions

This study investigates the development of an AI dashboard to structure and support the development process of AI models for private credit assessment. In answer to the first research question, we identified 13 requirements and derived 28 key functionalities from them, distributed across four blocks of CRISP-DM (Table 5): data understanding, data preprocessing, modeling, and evaluation. Due to the underlying dataset and the corresponding use case of credit assessment, four identified requirements [RDU 1, RDP 3, RDP 4, and REV 3] were not considered for deriving functionalities. To provide a structured modeling process, the proposed artifact consists of four modules equipped with the identified functionalities. To address the second research question, we structured the model development process and implemented an artifact that enables multiple iterations between these modules, allowing repeated corrections of models and datasets. Moreover, the provided workflow defines a useful and structured method for data preprocessing that helps to avoid bias in datasets. Automated documentation makes it possible to track the design process and thus brings more transparency into the field.
The AI dashboard for financial services significantly impacts financial regulation, particularly under frameworks like the European AI Act. It enhances transparency and traceability by meticulously documenting each step of the model development process, aiding regulatory reviews and compliance. This ensures that all actions during data preparation, model training, testing, and evaluation are transparent and traceable, which is essential for regulatory scrutiny. The dashboard’s ability to detect and address concept shifts maintains model performance, aligning with regulatory requirements for ongoing validation and performance monitoring. It promotes standardization in AI model development, ensuring consistent and well-documented processes crucial for regulatory approval. This streamlined development reduces time and costs and supports regulatory goals of accessibility and inclusivity, enabling even non-experts to develop and evaluate models. While the dashboard addresses many current regulatory requirements, further enhancements are needed to fully comply with the evolving standards of the European AI Act and to make it a comprehensive tool for regulatory compliance in AI-based financial services.
The artifact AIDash goes beyond the functionalities of a typical dashboard and addresses several problems of model creation and validation in credit assessment. In particular, the prototype addresses the problem of concept shifts (Section 5). Users can recognize a decrease in model performance by testing an existing model with a new dataset, e.g., one from a subsequent period. Furthermore, thanks to the graphical presentation in the plot module, changes in the dataset can be recognized and countermeasures can be taken. As a web application, the dashboard enables simplified analysis and evaluation of datasets and models, so that even non-experts in data science can perform regular analyses and monitor models that have already been developed.
Extensive support for the phases and steps of CRISP-DM enables initial models to be developed quickly and easily. These models can be used, for example, as a performance benchmark, or they can be developed further for subsequent deployment in a productive environment. The AI dashboard stores all data relevant to documenting model creation. These data can be exported and stored externally so that they can be used for the documentation that supervisory authorities require. In addition, the explainability desired by regulators must be addressed; to this end, the artifact can be extended with XAI techniques so that more complex models such as neural networks can be used in financial services, as sketched below. This study has made clear that further research is needed to define compliant approaches to design, develop, maintain, document, and deploy AI models.
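As a pointer to such an extension, the following sketch applies LIME [43] to explain a single credit decision of a trained classifier. Note that the lime package is not part of the dependency list in Appendix H, and the dataset and model names are illustrative assumptions.

```python
# Hedged XAI sketch using LIME [43] to explain one credit decision.
# The lime package is an additional dependency beyond Appendix H; the
# dataset and model are illustrative assumptions.
import pandas as pd
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("loans_period_1.csv")  # hypothetical dataset export
X, y = df.drop(columns=["Y"]), df["Y"]
model = RandomForestClassifier(random_state=42).fit(X, y)

explainer = LimeTabularExplainer(
    X.values,
    feature_names=list(X.columns),
    class_names=["bad loan", "good loan"],
    mode="classification",
)
explanation = explainer.explain_instance(
    X.values[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # top local feature contributions
```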

Author Contributions

Conceptualization, M.P. and M.S.; methodology, M.P.; formal analysis, M.P.; data curation, M.P.; writing—original draft preparation, M.P.; writing—review and editing, M.P. and M.S.; visualization, M.P.; validation, M.P.; project administration, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

We would like to thank the reviewers for their thoughtful comments and efforts toward improving our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Dataset

Table A1. Coding of the underlying dataset.

Feature | Description | Coding
Y | Target | 0: Bad loan; 1: Good loan
X1 | Type of existing account with the bank | 1: No account or debit balance; 2: 0 ≤ … < 200; 3: ≥200 or salary account for at least 1 year; 4: No current account
X2 | Duration of the loan in months | 1: ≤6; 2: 6 < … ≤ 12; 3: 12 < … ≤ 18; 4: 18 < … ≤ 24; 5: 24 < … ≤ 30; 6: 30 < … ≤ 36; 7: 36 < … ≤ 42; 8: 42 < … ≤ 48; 9: 48 < … ≤ 54; 10: 54 < … ≤ 60
X3 | Previous payment performance | 1: No loans to date/all loans duly repaid; 2: Previous loans processed properly at the bank; 3: Loans still outstanding with the bank; 4: Previously hesitant credit management; 5: Critical account/there are other loans (not with the bank)
X4 | Amount of the loan | 1: ≤500; 2: 500 < … ≤ 1000; 3: 1000 < … ≤ 1500; 4: 1500 < … ≤ 2500; 5: 2500 < … ≤ 5000; 6: 5000 < … ≤ 7500; 7: 7500 < … ≤ 10,000; 8: 10,000 < … ≤ 15,000; 9: 15,000 < … ≤ 20,000; 10: >20,000
X5 | Savings account or securities | 1: <100; 2: 100 ≤ … < 500; 3: 500 ≤ … < 1000; 4: ≥1000; 5: Not available/no savings account
X6 | Duration of employment with the current employer | 1: Unemployed; 2: <1 year; 3: 1 ≤ … < 4 years; 4: 4 ≤ … < 7 years; 5: ≥7 years
X7 | Amount of rates as % of disposable income | 1: ≥35%; 2: 25 ≤ … < 35%; 3: 20 ≤ … < 25%; 4: <20%
X8 | Duration at current residence | 1: <1 year; 2: 1 ≤ … < 4 years; 3: 4 ≤ … < 7 years; 4: ≥7 years
X9 | Type of existing assets | 1: Owning a house or land; 2: If not 1: home loan and savings contract/life insurance; 3: If not 1 or 2: car or other; 4: Not available/no assets
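To make the coding scheme concrete, the following sketch maps raw loan attributes to two of the coded features above (X2 and X4). The helper functions and raw-value layout are assumptions for demonstration, not part of AIDash.

```python
# Illustrative coding helpers for two Appendix A features; function names
# and the raw record layout are assumptions for demonstration.
def code_duration_months(months: int) -> int:
    """X2: map loan duration in months to codes 1-10 (6-month bins)."""
    if months <= 6:
        return 1
    return min(10, (months - 1) // 6 + 1)

def code_amount(amount: float) -> int:
    """X4: map the loan amount to codes 1-10 using the Table A1 bins."""
    bins = [500, 1000, 1500, 2500, 5000, 7500, 10_000, 15_000, 20_000]
    for code, upper in enumerate(bins, start=1):
        if amount <= upper:
            return code
    return 10  # >20,000

print(code_duration_months(20), code_amount(3200))  # -> 4 5
```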

Appendix B. Literature Review

Table A2. List of relevant literature from the literature review (an “x” marks coverage of: AI-based credit assessment; dashboards for model development; literature review; interview study; financial regulations).
Addi and Souissi [72] x
Adisa et al. [73]x x
Arner et al. [74]x
Baier et al. [75]x x
Baldo et al. [76]x
Berrada et al. [77]x x
Cao et al. [78]x
Chornous and Nikolskyi [79] x
Dastile and Celik [80]x
Devi and Chezian [81]x
Devos et al. [82]x
Dyczkowski et al. [83]x x
Guntay et al. [84]x x
Hassan and Jayousi [85]x x
Hentzen et al. [4]x
Hooman et al. [86]x
Hoover [87] xx
Ismawati and Faturohman [88] x
Jemai et al. [89]x
Jin [90]x
Karim et al. [91] x
Khemakhem et al. [69]x x
Khemakhem and Boujelbene [92]x
Kossow et al. [93]x x
Kothandapani [94]x
Kruse et al. [34]x
Kurshan et al. [26]x
Li [95]x x
Lombardo [9]x x
Luo [96]x x
Maurya and Gaur [97]x
Mirza and Ogrenci [98]x
Mittal et al. [99]x
Moula et al. [68]x
Nieto [100] x
Ordabayeva et al. [101] x
Pamuk et al. [19]x x
Pan et al. [102]x
Paul et al. [103]x
Pincovsky et al. [3]x
Punniyamoorthy and Sridevi [104]x x
Qiu et al. [105] x
Ranpara and Patel [106]x
Sadok et al. [107]x
Safiya Parvin and Saleena [108]x
Septama et al. [109]xxx
Shoumo et al. [110] x
Soares de Melo Junior et al. [111]x
Solow-Niederman et al. [112]x
Virág and Nyitrai [113]x
Wang et al. [114]x
Wei [115]x
Wilson Drakes [116]x x
Xiao and Wang [117]x
Xiao et al. [118]x
Yotsawat et al. [119]x
Yu et al. [120]x
Zhang et al. [121]x
Zhao et al. [67]x
Zhong and Wang [122] x
Zhu et al. [123] x

Appendix C. UML Activity Diagrams

Figure A1. UML activity diagram of upload module.
Figure A2. UML activity diagram of plot module.
Figure A3. UML activity diagram of model module.
Figure A4. UML activity diagram of dashboard module.

Appendix D. Upload Module

Figure A5. Change and selection of columns.
Figure A6. Coding features.

Appendix E. Interactive Plot Module

Figure A7. Bar plot comparing two datasets.
Figure A8. Pie plot comparing two datasets.
Figure A9. Boxplot to clean outliers in selected dataset.
Figure A10. Violin plots.
Figure A11. Confusion matrix of the dataset from Appendix A.

Appendix F. Model Module

Figure A12. Creating a new model with the selected dataset.

Appendix G. Dashboard Module

Figure A13. Evaluation functions in a comparison view of multiple models.
Figure A14. Testing model decisions for credit applications directly in AIDash.
Figure A15. An example report of a trained model.
Figure A16. Testing a model with another dataset in AIDash.

Appendix H. Technical Details of AIDash

Table A3. List of Requirements for the Artifact “AIDash”.

Name | Version
Python | 3.8.6
Django | 3.1.3
djangorestframework | 3.12.4
django-reset-migrations | 0.4.0
h5py | 3.1.0
imbalanced_learn | 0.8.0
imblearn | 0.0
matplotlib | 3.3.3
numpy | 1.20.1
openpyxl | 3.0.7
pandas | 1.1.4
plotly | 4.14.3
scikit_learn | 1.0.1
scipy | 1.5.4
seaborn | 0.11.0
setuptools | 49.2.1
tensorflow | 2.5.0
xgboost | 1.4.2

References

  1. Giudici, P.; Raffinetti, E. SAFE Artificial Intelligence in finance. Financ. Res. Lett. 2023, 56, 104088. [Google Scholar] [CrossRef]
  2. Lee, J. Access to Finance for Artificial Intelligence Regulation in the Financial Services Industry. Eur. Bus. Org. Law. Rev. 2020, 21, 731–757. [Google Scholar] [CrossRef]
  3. Pincovsky, M.; Falcao, A.; Nunes, W.N.; Paula Furtado, A.; Cunha, R.C. Machine Learning applied to credit analysis: A Systematic Literature Review. In Proceedings of the 2021 16th Iberian Conference on Information Systems and Technologies (CISTI), Chaves, Portugal, 23–26 June 2021; pp. 1–5, ISBN 978-989-54659-1-0. [Google Scholar]
  4. Hentzen, J.K.; Hoffmann, A.; Dolan, R.; Pala, E. Artificial intelligence in customer-facing financial services: A systematic literature review and agenda for future research. Int. J. Bank Mark. 2022, 40, 1299–1336. [Google Scholar] [CrossRef]
  5. Cao, L. AI in Finance: Challenges, Techniques, and Opportunities. ACM Comput. Surv. 2023, 55, 64. [Google Scholar] [CrossRef]
  6. Cao, L. AI in Finance: A Review. SSRN J. 2020, 2020, 3647625. [Google Scholar] [CrossRef]
  7. Loyola-Gonzalez, O. Black-Box vs. White-Box: Understanding Their Advantages and Weaknesses from a Practical Point of View. IEEE Access 2019, 7, 154096–154113. [Google Scholar] [CrossRef]
  8. ZVEI. ZVEI Comments on the European AI Regulation (“AI Act”). Available online: https://www.zvei.org/en/press-media/publications/zvei-comments-on-the-european-ai-regulation-ai-act (accessed on 9 February 2024).
  9. Lombardo, G. The AI industry and regulation: Time for implementation? In Ethical Evidence and Policymaking, 1st ed.; Iphofen, R., O’Mathúna, D., Eds.; Bristol University Press: Bristol, UK, 2022; pp. 185–200. ISBN 9781447363958. [Google Scholar]
  10. European Commission. Regulation of the European Parliament and of the Council: Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206 (accessed on 28 December 2023).
  11. Covington. Artificial Intelligence in Financial Services in Europe. Available online: https://www.knplaw.com/wp-content/uploads/2022/02/Artificial-Intelligence-in-Financial-Services-in-Europe-2022.pdf (accessed on 5 December 2023).
  12. ECB. Opinion of the European Central Bank of 29 December 2021 on a Proposal for a Regulation Laying down Harmonised Rules on Artificial Intelligence. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52021AB0040 (accessed on 10 May 2023).
  13. Chen, N.; Ribeiro, B.; Chen, A. Financial credit risk assessment: A recent review. Artif. Intell. Rev. 2016, 45, 1–23. [Google Scholar] [CrossRef]
  14. Bank of England; Prudential Regulation Authority; Financial Conduct Authority. DP5/22—Artificial Intelligence and Machine Learning. Available online: https://www.bankofengland.co.uk/prudential-regulation/publication/2022/october/artificial-intelligence (accessed on 10 May 2023).
  15. Sousa, M.R.; Gama, J.; Brandão, E. A new dynamic modeling framework for credit risk assessment. Expert Syst. Appl. 2016, 45, 341–351. [Google Scholar] [CrossRef]
  16. Son, H.; Hyun, C.; Phan, D.; Hwang, H.J. Data analytic approach for bankruptcy prediction. Expert Syst. Appl. 2019, 138, 112816. [Google Scholar] [CrossRef]
  17. Gabrielli, G.; Melioli, A.; Bertini, F. High-dimensional Data from Financial Statements for a Bankruptcy Prediction Model. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering Workshops (ICDEW), Anaheim, CA, USA, 3–7 April 2023; pp. 1–7, ISBN 979-8-3503-2244-6. [Google Scholar]
  18. Tsai, C.-F. Two-stage hybrid learning techniques for bankruptcy prediction. Stat. Anal. 2020, 13, 565–572. [Google Scholar] [CrossRef]
  19. Pamuk, M.; Grendel, R.O.; Schumann, M. Towards ML-based Platforms in Finance Industry-An ML Approach to Generate Corporate Bankruptcy Probabilities based on Annual Financial Statements. In Proceedings of the 2021 IEEE/ACIS 20th International Fall Conference on Computer and Information Science (ICIS Fall), Xi’an, China, 13–15 October 2021. [Google Scholar]
  20. BaFin. Big Data and Artificial Intelligence: Principles for the Use of Algorithms in Decision-Making Processes. Available online: https://www.bafin.de/dok/16185950 (accessed on 31 January 2023).
  21. Deutsche Bundesbank; BaFin. Machine Learning in Risk Models—Characteristics and Supervisory Priorities: Consultation Paper. Available online: https://www.bundesbank.de/resource/blob/793670/61532e24c3298d8b24d4d15a34f503a8/mL/2021-07-15-ml-konsultationspapier-data.pdf (accessed on 11 September 2022).
  22. AMF. Artificial Intelligence in Finance: Recommendations for Its Responsible Use. Available online: https://lautorite.qc.ca/fileadmin/lautorite/grand_public/publications/professionnels/rapport-intelligence-artificielle-finance-an.pdf (accessed on 5 December 2023).
  23. Peffers, K.; Tuunanen, T.; Rothenberger, M.A.; Chatterjee, S. A Design Science Research Methodology for Information Systems Research. J. Manag. Inf. Syst. 2007, 24, 45–77. [Google Scholar] [CrossRef]
  24. Chapman, P. CRISP-DM 1.0: Step-by-Step Data Mining Guide; SPSS: Southall, UK, 2000. [Google Scholar]
  25. Bazarbash, M. FinTech in Financial Inclusion: Machine Learning Applications in Assessing Credit Risk; International Monetary Fund: Washington, DC, USA, 2019; ISBN 9781498314428. [Google Scholar]
  26. Kurshan, E.; Shen, H.; Chen, J. Towards self-regulating AI. In Proceedings of the ICAIF ‘20: ACM International Conference on AI in Finance, New York, NY, USA, 15–16 October 2020; Balch, T., Ed.; ACM: New York, NY, USA, 2020; pp. 1–8, ISBN 9781450375849. [Google Scholar]
  27. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  28. Bertini, F. Artificial Intelligence and data privacy. Sist. Intelligenti 2023, 35, 477–484. [Google Scholar] [CrossRef]
  29. ACPR. Governance of Artificial Intelligence in Finance. Available online: https://acpr.banque-france.fr/en/governance-artificial-intelligence-finance (accessed on 14 February 2024).
  30. De Nederlandsche Bank. General Principles for the Use of Artificial Intelligence in the Financial Sector. Available online: https://www.dnb.nl/media/voffsric/general-principles-for-the-use-of-artificial-intelligence-in-the-financial-sector.pdf (accessed on 25 May 2023).
  31. Banca D’Italia. Artificial Intelligence in Credit Scoring: An Analysis of Some Experiences in the Italian Financial System. Available online: https://www.bancaditalia.it/pubblicazioni/qef/2022-0721/QEF_721_EN.pdf?language_id=1 (accessed on 5 December 2023).
  32. Banca D’Italia. Legal Framework. Available online: https://www.bancaditalia.it/compiti/vigilanza/normativa/index.html?com.dotmarketing.htmlpage.language=1 (accessed on 16 May 2023).
  33. Banco de España. Machine Learning in Credit Risk: Measuring the Dilemma between Prediction and Supervisory Cost. Documentos de Trabajo N.º 2032. Available online: https://www.bde.es/f/webbde/SES/Secciones/Publicaciones/PublicacionesSeriadas/DocumentosTrabajo/20/Files/dt2032e.pdf (accessed on 24 May 2023).
  34. Kruse, L.; Wunderlich, N.; Beck, R. Artificial Intelligence for the Financial Services Industry: What Challenges Organizations to Succeed. In Proceedings of the 52nd Hawaii International Conference on System Sciences, Maui, HI, USA, 8–11 January 2019; Bui, T., Ed.; 2019. [Google Scholar]
  35. Pamuk, M.; Schumann, M.; Nickerson, R.C. What Do the Regulators Mean? A Taxonomy of Regulatory Principles for the Use of AI in Financial Services. Make 2024, 6, 143–155. [Google Scholar] [CrossRef]
  36. Sanz, J.L.C.; Zhu, Y. Toward Scalable Artificial Intelligence in Finance. In Proceedings of the 2021 IEEE International Conference on Services Computing (SCC), Chicago, IL, USA, 5–10 September 2021; pp. 460–469, ISBN 978-1-6654-1683-2. [Google Scholar]
  37. OECD. SAFE (Sustainable, Accurate, Fair and Explainable). Available online: https://oecd.ai/en/catalogue/metrics/safe-%28sustainable-accurate-fair-and-explainable%29 (accessed on 28 December 2023).
  38. OECD. OECD Legal Instruments: Recommendation of the Council on Artificial Intelligence. Available online: https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449#backgroundInformation (accessed on 28 November 2023).
  39. OECD. Artificial Intelligence, Machine Learning and Big Data in Finance—OECD. Available online: https://www.oecd.org/finance/artificial-intelligence-machine-learning-big-data-in-finance.htm (accessed on 5 October 2023).
  40. Cooper, H.M. Organizing knowledge syntheses: A taxonomy of literature reviews. Knowl. Soc. 1988, 1, 104–126. [Google Scholar] [CrossRef]
  41. vom Brocke, J.; Simons, A.; Riemer, K.; Niehaves, B.; Plattfaut, R.; Cleven, A. Standing on the Shoulders of Giants: Challenges and Recommendations of Literature Search in Information Systems Research. Commun. Assoc. Inf. Syst. 2015, 37, 9. [Google Scholar] [CrossRef]
  42. Fahrmeir, L.; Hamerle, A. Multivariate Statistische Verfahren; Walter de Gruyter: Berlin, NY, USA, 1984; ISBN 9783110085099. [Google Scholar]
  43. Ribeiro, M.T.; Singh, S.; Guestrin, C. Why Should I Trust You? In Proceedings of the KDD ‘16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Krishnapuram, B., Shah, M., Smola, A., Aggarwal, C., Shen, D., Rastogi, R., Eds.; ACM: New York, NY, USA, 2016; pp. 1135–1144, ISBN 9781450342322. [Google Scholar]
  44. Leo, M.; Sharma, S.; Maddulety, K. Machine Learning in Banking Risk Management: A Literature Review. Risks 2019, 7, 29. [Google Scholar] [CrossRef]
  45. Yu, L.; Zhang, X.; Yin, H. An extreme learning machine based virtual sample generation method with feature engineering for credit risk assessment with data scarcity. Expert Syst. Appl. 2022, 202, 117363. [Google Scholar] [CrossRef]
  46. Liu, W.; Fan, H.; Xia, M. Step-wise multi-grained augmented gradient boosting decision trees for credit scoring. Eng. Appl. Artif. Intell. 2021, 97, 104036. [Google Scholar] [CrossRef]
  47. Wang, Y.; Zhang, Y.; Lu, Y.; Yu, X. A Comparative Assessment of Credit Risk Model Based on Machine Learning—A case study of bank loan data. Procedia Comput. Sci. 2020, 174, 141–149. [Google Scholar] [CrossRef]
  48. Fan, S.; Shen, Y.; Peng, S. Improved ML-Based Technique for Credit Card Scoring in Internet Financial Risk Control. Complexity 2020, 2020, 8706285. [Google Scholar] [CrossRef]
  49. Yigitbasioglu, O.M.; Velcu, O. A review of dashboards in performance management: Implications for design and research. Int. J. Account. Inf. Syst. 2012, 13, 41–59. [Google Scholar] [CrossRef]
  50. Few, S. Information Dashboard Design: The Effective Visual Communication of Data, 1st ed.; O’Reilly & Associates: Sebastopol, CA, USA, 2006; ISBN 978-0-596-10016-2. [Google Scholar]
  51. Noonpakdee, W.; Khunkornsiri, T.; Phothichai, A.; Danaisawat, K. A framework for analyzing and developing dashboard templates for small and medium enterprises. In Proceedings of the 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), Singapore, 26–28 April 2018; pp. 479–483, ISBN 978-1-5386-5747-8. [Google Scholar]
  52. Hoo, Z.H.; Candlish, J.; Teare, D. What is an ROC curve? Emerg. Med. J. 2017, 34, 357–359. [Google Scholar] [CrossRef] [PubMed]
  53. Pace, A.; Buttigieg, S.C. Can hospital dashboards provide visibility of information from bedside to board? A case study approach. J. Health Organ. Manag. 2017, 31, 142–161. [Google Scholar] [CrossRef] [PubMed]
  54. Ghazisaeidi, M.; Safdari, R.; Torabi, M.; Mirzaee, M.; Farzi, J.; Goodini, A. Development of Performance Dashboards in Healthcare Sector: Key Practical Issues. Acta Inform. Med. 2015, 23, 317–321. [Google Scholar] [CrossRef] [PubMed]
  55. Maheshwari, D.; Janssen, M. Dashboards for supporting organizational development. In Proceedings of the ICEGOV2014: 8th International Conference on Theory and Practice of Electronic Governance, Guimaraes, Portugal, 27–30 October 2014; Estevez, E., Janssen, M., Barbosa, L.S., Eds.; ACM: New York, NY, USA, 2014; pp. 178–185, ISBN 9781605586113. [Google Scholar]
  56. Fischer, M.J.; Kourany, W.M.; Sovern, K.; Forrester, K.; Griffin, C.; Lightner, N.; Loftus, S.; Murphy, K.; Roth, G.; Palevsky, P.M.; et al. Development, implementation and user experience of the Veterans Health Administration (VHA) dialysis dashboard. BMC Nephrol. 2020, 21, 136. [Google Scholar] [CrossRef] [PubMed]
  57. Rahman, A.A.; Adamu, Y.B.; Harun, P. Review on dashboard application from managerial perspective. In Proceedings of the 2017 5th International Conference on Research and Innovation in Information Systems (ICRIIS), Langkawi, Malaysia, 16–17 July 2017; pp. 1–5, ISBN 978-1-5090-3035-4. [Google Scholar]
  58. Maury, E.; Boldi, M.-O.; Greub, G.; Chavez, V.; Jaton, K.; Opota, O. An Automated Dashboard to Improve Laboratory COVID-19 Diagnostics Management. Front. Digit. Health 2021, 3, 773986. [Google Scholar] [CrossRef] [PubMed]
  59. Büdel, V.; Fritsch, A.; Oberweis, A. Integrating sustainability into day-to-day business. In Proceedings of the ICT4S2020: 7th International Conference on ICT for Sustainability, Bristol, UK, 21–26 June 2020; Chitchyan, R., Schien, D., Moreira, A., Combemale, B., Eds.; ACM: New York, NY, USA, 2020; pp. 56–65, ISBN 9781450375955. [Google Scholar]
  60. Pauwels, K.; Ambler, T.; Clark, B.H.; LaPointe, P.; Reibstein, D.; Skiera, B.; Wierenga, B.; Wiesel, T. Dashboards as a Service. J. Serv. Res. 2009, 12, 175–189. [Google Scholar] [CrossRef]
  61. Lin, C.-Y.; Liang, F.-W.; Li, S.-T.; Lu, T.-H. 5S Dashboard Design Principles for Self-Service Business Intelligence Tool User. J. Big Data Res. 2018, 1, 5–19. [Google Scholar] [CrossRef]
  62. Presthus, W.; Canales, C.A. Business Intelligence Dashboard Design. A Case Study of a Large Logistics Company. Norsk Konferanse for Organisasjoners Bruk av IT. 2015. Available online: https://www.semanticscholar.org/paper/BUSINESS-INTELLIGENCE-DASHBOARD-DESIGN.-A-CASE-OF-A-Presthus-Canales/fc49fcfc4f345c748246d24dc47b5972323d1791 (accessed on 1 May 2024).
  63. Cankaya, E.C.; Odekirk, D. Creating Effective and Efficient User Dashboards through Dynamic Customization and Well-Designed Webpage Visualization. Available online: https://huichawaii.org/wp-content/uploads/2019/06/Cankaya-Ebru-Celikel-2019-STEM-HUIC.pdf (accessed on 1 February 2024).
  64. Liawatimena, S.; Hendric Spits Warnars, H.L.; Trisetyarso, A.; Abdurahman, E.; Soewito, B.; Wibowo, A.; Gaol, F.L.; Abbas, B.S. Django Web Framework Software Metrics Measurement Using Radon and Pylint. In Proceedings of the 2018 Indonesian Association for Pattern Recognition International Conference (INAPR), Jakarta, Indonesia, 7–8 September 2018; pp. 218–222, ISBN 978-1-5386-9422-0. [Google Scholar]
  65. Elliott, T. The State of the Octoverse: Machine learning. Available online: https://github.blog/2019-01-24-the-state-of-the-octoverse-machine-learning/ (accessed on 9 February 2024).
  66. Batista, G.E.; Bazzan, A.L.; Monard, M.C. Balancing Training Data for Automated Annotation of Keywords: A Case Study. Wob 2003, 3, 10–18. [Google Scholar]
  67. Zhao, J.; Wu, Z.; Wu, B. An AdaBoost-DT Model for Credit Scoring. In Proceedings of the 20th Wuhan International Conference on E-Business, WHICEB 2021, Wuhan, China, 28–30 May 2021. [Google Scholar]
  68. Moula, F.E.; Guotai, C.; Abedin, M.Z. Credit default prediction modeling: An application of support vector machine. Risk Manag. 2017, 19, 158–187. [Google Scholar] [CrossRef]
  69. Khemakhem, S.; Ben Said, F.; Boujelbene, Y. Credit risk assessment for unbalanced datasets based on data mining, artificial neural network and support vector machines. J. Model. Manag. 2018, 13, 932–951. [Google Scholar] [CrossRef]
  70. Kouki, M.; Elkhaldi, A. Toward a Predicting Model of Firm Bankruptcy: Evidence from the Tunisian Context. Middle East. Financ. Econ. 2011, 14, 26–43. [Google Scholar]
  71. Pamuk, M.; Schumann, M. Opening a New Era with Machine Learning in Financial Services? Forecasting Corporate Credit Ratings Based on Annual Financial Statements. Int. J. Financial Stud. 2023, 11, 96. [Google Scholar] [CrossRef]
  72. Addi, K.B.; Souissi, N. An Ontology-Based Model for Credit Scoring Knowledge in Microfinance: Towards a Better Decision Making. In Proceedings of the 2020 IEEE 10th International Conference on Intelligent Systems (IS), Varna, Bulgaria, 28–30 August 2020; pp. 380–385, ISBN 978-1-7281-5456-5. [Google Scholar]
  73. Adisa, J.; Ojo, S.; Owolawi, P.; Pretorius, A.; Ojo, S.O. Credit Score Prediction using Genetic Algorithm-LSTM Technique. In Proceedings of the 2022 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 9–10 March 2022; pp. 1–6, ISBN 978-1-6654-4019-6. [Google Scholar]
  74. Arner, D.W.; Zetzsche, D.A.; Buckley, R.P.; Barberis, J.N. FinTech and RegTech: Enabling Innovation While Preserving Financial Stability. Georget. J. Int. Aff. 2017, 18, 47–58. [Google Scholar] [CrossRef]
  75. Baier, L.; Jöhren, F.; Seebacher, S. Challenges in the deployment and operation of machine learning in practice. Res. Pap. 2019, 1–15. [Google Scholar]
  76. Baldo, D.R.; Regio, M.S.; Manssour, I.H. Visual analytics for monitoring credit scoring models. Inf. Vis. 2023, 22, 340–357. [Google Scholar] [CrossRef]
  77. Berrada, I.R.; Barramou, F.Z.; Alami, O.B. A review of Artificial Intelligence approach for credit risk assessment. In Proceedings of the 2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP), Vijayawada, India, 12–14 February 2022; pp. 1–5, ISBN 978-1-6654-4290-9. [Google Scholar]
  78. Cao, N.T.; Tran, L.H.; Ton-That, A.H. Using machine learning to create a credit scoring model in banking and finance. In Proceedings of the 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), Brisbane, Australia, 8–10 December 2021; pp. 1–5, ISBN 978-1-6654-9552-3. [Google Scholar]
  79. Chornous, G.; Nikolskyi, I. Business-Oriented Feature Selection for Hybrid Classification Model of Credit Scoring. In Proceedings of the 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, 21–25 August 2018; pp. 397–401, ISBN 978-1-5386-2874-4. [Google Scholar]
  80. Dastile, X.; Celik, T. Making Deep Learning-Based Predictions for Credit Scoring Explainable. IEEE Access 2021, 9, 50426–50440. [Google Scholar] [CrossRef]
  81. Devi, C.R.D.; Chezian, R.M. A relative evaluation of the performance of ensemble learning in credit scoring. In Proceedings of the 2016 IEEE International Conference on Advances in Computer Applications (ICACA), Coimbatore, India, 24 October 2016; pp. 161–165, ISBN 978-1-5090-3769-8. [Google Scholar]
  82. Devos, A.; Dhondt, J.; Stripling, E.; Baesens, B.; Broucke, S.v.; Sukhatme, G. Profit Maximizing Logistic Regression Modeling for Credit Scoring. In Proceedings of the 2018 IEEE Data Science Workshop (DSW), Lausanne, Switzerland, 4–6 June 2018; pp. 125–129, ISBN 978-1-5386-4410-2. [Google Scholar]
  83. Dyczkowski, M.; Korczak, J.; Dudycz, H. Multi-criteria Evaluation of the Intelligent Dashboard for SME Managers based on Scorecard Framework. In Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, Warsaw, Poland, 7–10 September 2014; pp. 1147–1155. [Google Scholar]
  84. Guntay, L.; Bozan, E.; Tigrak, U.; Durdu, T.; Ozkahya, G.E. An Explainable Credit Scoring Framework: A Use Case of Addressing Challenges in Applied Machine Learning. In Proceedings of the 2022 IEEE Technology and Engineering Management Conference (TEMSCON EUROPE), Izmir, Turkey, 25–29 April 2022; pp. 222–227, ISBN 978-1-6654-8313-1. [Google Scholar]
  85. Hassan, A.; Jayousi, R. Financial Services Credit Scoring System Using Data Mining. In Proceedings of the 2020 IEEE 14th International Conference on Application of Information and Communication Technologies (AICT), Tashkent, Uzbekistan, 7–9 October 2020; pp. 1–7, ISBN 978-1-7281-7386-3. [Google Scholar]
  86. Hooman, A.; Marthandan, G.; Yusoff, W.F.W.; Omid, M.; Karamizadeh, S. Statistical and Data Mining Methods in Credit Scoring. J. Dev. Areas 2016, 50, 371–381. [Google Scholar] [CrossRef]
  87. Hoover, S. California Credit Dashboard. California Policy Lab [Online]. 19 November 2023. Available online: https://www.capolicylab.org/california-credit-dashboard/ (accessed on 7 March 2024).
  88. Ismawati, I.Y.; Faturohman, T. Credit Risk Scoring Model for Consumer Financing: Logistic Regression Method. In Comparative Analysis of Trade and Finance in Emerging Economies; Barnett, W.A., Sergi, B.S., Eds.; Emerald Publishing Limited: Leeds, UK, 2023; pp. 167–189. ISBN 978-1-80455-759-4. [Google Scholar]
  89. Jemai, J.; Chaieb, M.; Zarrad, A. A Big Data Mining Approach for Credit Risk Analysis. In Proceedings of the 2022 International Symposium on Networks, Computers and Communications (ISNCC), Shenzhen, China, 19–22 July 2022; pp. 1–6, ISBN 978-1-6654-8544-9. [Google Scholar]
  90. Jin, S. Research on Bank Credit Risk Assessment Model based on artificial intelligence algorithm. In Proceedings of the 2022 2nd International Symposium on Artificial Intelligence and its Application on Media (ISAIAM), Xi’an, China, 10–12 June 2022; pp. 128–134, ISBN 978-1-6654-8541-8. [Google Scholar]
  91. Karim, M.; Samad, M.F.; Muntasir, F. Improving Performance Factors of an Imbalanced Credit Risk Dataset Using SMOTE. In Proceedings of the 2022 4th International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), Rajshahi, Bangladesh, 29–31 December 2022; pp. 1–4, ISBN 979-8-3503-2054-1. [Google Scholar]
  92. Khemakhem, S.; Boujelbene, Y. Predicting credit risk on the basis of financial and non-financial variables and data mining. Rev. Account. Financ. 2018, 17, 316–340. [Google Scholar] [CrossRef]
  93. Kossow, N.; Windwehr, S.; Jenkins, M. Algorithmic Transparency and Accountability. 2024. Available online: http://www.jstor.org/stable/resrep30838 (accessed on 8 March 2024).
  94. Kothandapani, H.P. Drivers and Barriers of Adopting Interactive Dashboard Reporting in the Finance Sector: An Empirical Investigation. Rev. Contemp. Bus. Anal. 2019, 2, 45–70. [Google Scholar]
  95. Li, Y. Credit Risk Prediction Based on Machine Learning Methods. In Proceedings of the 2019 14th International Conference on Computer Science & Education (ICCSE), Toronto, ON, Canada, 19–21 August 2019; pp. 1011–1013, ISBN 978-1-7281-1846-8. [Google Scholar]
  96. Luo, C. A comprehensive decision support approach for credit scoring. Ind. Manag. Data Syst. 2020, 120, 280–290. [Google Scholar] [CrossRef]
  97. Maurya, A.; Gaur, S. A Decision Tree Classifier Based Ensemble Approach to Credit Score Classification. In Proceedings of the 2023 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India, 3–4 November 2023; pp. 620–624, ISBN 979-8-3503-0611-8. [Google Scholar]
  98. Mirza, F.K.; Ogrenci, A.S. Using Hybrid Approaches for Credit Application Scoring. In Proceedings of the 2023 IEEE 23rd International Symposium on Computational Intelligence and Informatics (CINTI), Budapest, Hungary, 20–22 November 2023; pp. 111–116, ISBN 979-8-3503-4294-9. [Google Scholar]
  99. Mittal, A.; Shrivastava, A.; Saxena, A.; Manoria, M. A Study on Credit Risk Assessment in Banking Sector using Data Mining Techniques. In Proceedings of the 2018 International Conference on Advanced Computation and Telecommunication (ICACAT), Bhopal, India, 28–29 December 2018; pp. 1–5, ISBN 978-1-5386-5367-8. [Google Scholar]
  100. Nieto, M.J. Banks, climate risk and financial stability. J. Financ. Regul. Compliance 2019, 27, 243–262. [Google Scholar] [CrossRef]
  101. Ordabayeva, Z.; Moldagulova, A.; Riza, I. Building a Credit Scoring Model Based on the Type of Target Variable. In Proceedings of the 2023 IEEE International Conference on Smart Information Systems and Technologies (SIST), Astana, Kazakhstan, 4–6 May 2023; pp. 31–36, ISBN 979-8-3503-3504-0. [Google Scholar]
  102. Pan, J.-S.; Wu, Y.-Q.; Lv, Y.; Lin, Q.-Y.; Peng, J.-R.; Ye, M.; Cai, X.-F.; Huang, W. Domain-adversarial neural network with joint-distribution adaption for credit risk classification. In Proceedings of the International Conference on Electronic Business, Chiayi, Taiwan, 19–23 October 2023. [Google Scholar]
  103. Paul, S.; Gupta, A.; Kar, A.K.; Singh, V. An Automatic Deep Reinforcement Learning Based Credit Scoring Model using Deep-Q Network for Classification of Customer Credit Requests. In Proceedings of the 2023 IEEE International Symposium on Technology and Society (ISTAS), Swansea, UK, 13–15 September 2023; pp. 1–8, ISBN 979-8-3503-2486-0. [Google Scholar]
  104. Punniyamoorthy, M.; Sridevi, P. Identification of a standard AI based technique for credit risk analysis. Benchmarking Int. J. 2016, 23, 1381–1390. [Google Scholar] [CrossRef]
  105. Qiu, Z.; Li, Y.; Ni, P.; Li, G. Credit Risk Scoring Analysis Based on Machine Learning Models. In Proceedings of the 2019 6th International Conference on Information Science and Control Engineering (ICISCE), Shanghai, China, 20–22 December 2019; pp. 220–224, ISBN 978-1-7281-5712-2. [Google Scholar]
  106. Ranpara, R.D.; Patel, P.S. An Ensemble Learning Approach to Improve Credit Scoring Accuracy for Imbalanced Data. In Proceedings of the 2023 International Conference on Integrated Intelligence and Communication Systems (ICIICS), Kalaburagi, India, 24–25 November 2023; pp. 1–5, ISBN 979-8-3503-1545-5. [Google Scholar]
  107. Sadok, H.; Mahboub, H.; Chaibi, H.; Saadane, R.; Wahbi, M. Applications of Artificial Intelligence in Finance: Prospects, Limits and Risks. In Proceedings of the 2023 International Conference on Digital Age & Technological Advances for Sustainable Development (ICDATA), Casablanca, Morocco, 3–5 May 2023; pp. 145–149, ISBN 979-8-3503-1111-2. [Google Scholar]
  108. Safiya Parvin, A.; Saleena, B. An Ensemble Classifier Model to Predict Credit Scoring—Comparative Analysis. In Proceedings of the 2020 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS), Chennai, India, 14–16 December 2020; pp. 27–30, ISBN 978-1-6654-0478-5. [Google Scholar]
  109. Septama, H.D.; Yulianti, T.; Budiyanto, D.; Mulyadi, S.M.; Cahyana, A.H. A Comparative Analysis of Machine Learning Algorithms for Credit Risk Scoring using Chi-Square Feature Selection. In Proceedings of the 2023 International Conference on Converging Technology in Electrical and Information Engineering (ICCTEIE), Bandar Lampung, Indonesia, 25–26 October 2023; pp. 32–37, ISBN 979-8-3503-7064-5. [Google Scholar]
  110. Shoumo, S.Z.H.; Dhruba, M.I.M.; Hossain, S.; Ghani, N.H.; Arif, H.; Islam, S. Application of Machine Learning in Credit Risk Assessment: A Prelude to Smart Banking. In Proceedings of the TENCON 2019—2019 IEEE Region 10 Conference (TENCON), Kochi, India, 17–20 October 2019; pp. 2023–2028, ISBN 978-1-7281-1895-6. [Google Scholar]
  111. Soares de Melo Junior, L.; Nardini, F.M.; Renso, C.; Fernandes de Macedo, J.A. An Empirical Comparison of Classification Algorithms for Imbalanced Credit Scoring Datasets. In Proceedings of the 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 747–754, ISBN 978-1-7281-4550-1. [Google Scholar]
  112. Solow-Niederman, A.; Choi, Y.J.; van den Broeck, G. The Institutional Life of Algorithmic Risk Assessment. Berkeley Technol. Law J. 2019, 34, 705–744. [Google Scholar]
  113. Virág, M.; Nyitrai, T. Is there a trade-off between the predictive power and the interpretability of bankruptcy models? The case of the first Hungarian bankruptcy prediction model. Acta Oeconomica 2014, 64, 419–440. [Google Scholar] [CrossRef]
  114. Wang, H.; Li, C.; Gu, B.; Min, W. Does AI-based Credit Scoring Improve Financial Inclusion? Evidence from Online Payday Lending. In Proceedings of the International Conference on Information Systems (ICIS) 2019 Conference, Munich, Germany, 15–18 December 2019. [Google Scholar]
  115. Wei, Y. Application of Machine Learning and Artificial Intelligence in Credit Risk Assessment. In Proceedings of the 2023 2nd International Conference on Artificial Intelligence and Autonomous Robot Systems (AIARS), Bristol, UK, 29–31 July 2023; pp. 150–156, ISBN 979-8-3503-2435-8. [Google Scholar]
  116. Wilson Drakes, C.-A. Algorithmic decision-making systems: Boon or bane to credit risk assessment? In Proceedings of the 29th European Conference on Information Systems–Human Values Crisis in a Digitizing World, ECIS 2021, Morocco, Africa, 14–16 June 2021. [Google Scholar]
  117. Xiao, J.; Wang, R. A Triplet Deep Neural Networks Model for Customer Credit Scoring. In Proceedings of the 2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 6–8 January 2023; pp. 511–514, ISBN 979-8-3503-3157-8. [Google Scholar]
  118. Xiao, J.; Xie, L.; Liu, D.; Xiao, Y.; Hu, Y. A clustering and selection based transfer ensemble model for customer credit scoring. Filomat 2016, 30, 4015–4026. [Google Scholar] [CrossRef]
  119. Yotsawat, W.; Wattuya, P.; Srivihok, A. A Novel Method for Credit Scoring Based on Cost-Sensitive Neural Network Ensemble. IEEE Access 2021, 9, 78521–78537. [Google Scholar] [CrossRef]
  120. Yu, L.; Li, X.; Tang, L.; Gao, L. An ELM-based Classification Algorithm with Optimal Cutoff Selection for Credit Risk Assessment. Filomat 2016, 30, 4027–4036. [Google Scholar] [CrossRef]
  121. Zhang, X.; Yang, Y.; Zhou, Z. A novel credit scoring model based on optimized random forest. In Proceedings of the 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 8–10 January 2018; pp. 60–65, ISBN 978-1-5386-4649-6. [Google Scholar]
  122. Zhong, Y.; Wang, H. Internet Financial Credit Scoring Models Based on Deep Forest and Resampling Methods. IEEE Access 2023, 11, 8689–8700. [Google Scholar] [CrossRef]
  123. Zhu, B.; Yang, W.; Wang, H.; Yuan, Y. A hybrid deep learning model for consumer credit scoring. In Proceedings of the 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 26–28 May 2018; pp. 205–208, ISBN 978-1-5386-6987-7. [Google Scholar]
Figure 1. Process of the problem-centered design science research approach in Sections 4.1–4.5.
Figure 2. CRISP-DM process model.
Figure 3. The use-case diagram of the artifact.
Figure 4. Data upload module in AIDash.
Figure 5. Plot module in AIDash.
Figure 6. Model module in AIDash.
Figure 7. Dashboard module in AIDash.
Figure 8. Data analysis in AIDash.
Figure 9. Evaluation of new models.
Table 1. CRISP-DM process model.

Business Understanding | Data Understanding | Data Preparation | Modeling | Evaluation | Deployment
Determine business objectives | Collect initial data | Select data | Select modeling techniques | Evaluate results | Plan deployment
Assess situation | Describe data | Clean data | Generate test design | Review process | Plan monitoring and maintenance
Determine data mining goals | Explore data | Construct data | Build model | Determine next steps | Produce final report
Produce project plan | Verify data quality | Integrate data | Assess model | Review project |
 | | Format data | | |
Table 2. List of requirements and derived functionalities from CRISP-DM for the artifact.

Block | Requirements | Function Identifier | Derived Functions for Credit Assessment
Data Understanding (RDU) | RDU 1 Data collection | F1 | Managing datasets
 | | F1.1 | Uploading datasets
 | | F1.2 | Changing parameters for import
 | RDU 2 Data description | F1.3 | Describing datasets
 | RDU 3 Explore data | F2 | Analyzing datasets
 | | F2.1 | Plotting datasets
 | | F2.2 | Comparing datasets
 | RDU 4 Data assessment | - | -
Data Preprocessing (RDP) | RDP 1 Data selection | F1.4 | Managing features
 | RDP 2 Data cleaning | F1.2 | Changing parameters for import
 | | F1.4.2 | Cleaning outliers
 | RDP 3 Data construction | - | -
 | RDP 4 Data integration | - | -
 | RDP 5 Data formatting | F1.2 | Changing parameters for import
 | | F1.4.1 | Coding features
Modeling (RM) | RM 1 Model selection | F1.4 | Managing features
 | RM 2 Model preparation and tests | F3.1.1 | Training a model
 | | F3.1.2 | Balancing (training) classes
 | RM 3 Model creation | F3 | Managing models
 | | F3.1 | Creating a model
 | | F3.2 | Saving and downloading a model
 | | F3.3 | Deleting a model
 | | F3.4 | Uploading a model
 | | F3.5 | Re-training a model
 | RM 4 Model assessment | F4 | Comparing models
 | | F6 | Exporting results
Evaluation (REV) | REV 1 Evaluation of results | F6 | Exporting results
 | | F5 | Testing models
 | | F5.1 | Checking a credit decision
 | | F5.2 | Testing model performance with other datasets
 | | F3.2 | Downloading a model
 | REV 2 Evaluation of processes | F6 | Exporting results
 | REV 3 Determine the next steps | - | -
Table 3. Selected modeling techniques for credit assessment dashboard.

Intelligent methods | Artificial neural network; Decision tree; Support vector classifier; K-nearest neighbors classifier
Statistical models | Linear regression; Logistic regression
Ensemble methods | XGB classifier; Random forest
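For orientation, a minimal sketch of how two of the listed techniques could be trained and compared with scikit-learn (version as in Appendix H) is given below; the dataset name, hyperparameters, and scoring metric are illustrative assumptions.

```python
# Hedged sketch: cross-validating two of the Table 3 techniques on data
# coded as in Appendix A. File name and hyperparameters are illustrative.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("loans_period_1.csv")  # hypothetical dataset export
X, y = df.drop(columns=["Y"]), df["Y"]

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC {scores.mean():.3f} (+/- {scores.std():.3f})")
```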
Table 4. Metrics for the assessment of the model performance of a binary classification.

Value | Name | Description | Calculation
Acc | Accuracy | - | $(TP + TN)/n$
RMSE | Root mean squared error | Equivalent to the Brier score for one-dimensional predictions | $\sqrt{\sum_x (P_x - T(x))^2 / n}$
TPR | True positive rate | Recall, sensitivity, hit rate | $TP/(TP + FN)$
TNR | True negative rate | Specificity, selectivity | $TN/(TN + FP)$
FPR | False positive rate | Fall-out | $FP/(TN + FP)$
FNR | False negative rate | Miss rate | $FN/(TP + FN)$
PPV | Positive prediction value | Precision | $TP/(TP + FP)$
F | F-score | Harmonic mean of precision and recall | $2 \cdot PPV \cdot TPR/(PPV + TPR)$
AUC | Area under the ROC curve | - | -
NPV | Negative prediction value | - | $TN/(TN + FN)$
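All Table 4 metrics except AUC can be derived from the confusion matrix; the following sketch computes them with scikit-learn. The toy arrays are placeholders for real model outputs, not results from this study.

```python
# Hedged sketch: computing the Table 4 metrics for a binary credit model.
# y_true and y_prob are toy placeholders, not study results.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.8, 0.6, 0.3])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
acc = (tp + tn) / (tp + tn + fp + fn)
tpr = tp / (tp + fn)  # recall / sensitivity
tnr = tn / (tn + fp)  # specificity
ppv = tp / (tp + fp)  # precision
npv = tn / (tn + fn)
f_score = 2 * ppv * tpr / (ppv + tpr)
rmse = np.sqrt(np.mean((y_prob - y_true) ** 2))
auc = roc_auc_score(y_true, y_prob)
print(f"Acc={acc:.2f} TPR={tpr:.2f} TNR={tnr:.2f} PPV={ppv:.2f} "
      f"NPV={npv:.2f} F={f_score:.2f} RMSE={rmse:.2f} AUC={auc:.2f}")
```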
Table 5. Overview of the functional scope of the developed artifact based on CRISP-DM.

Steps | Business Understanding | Data Understanding | Data Preparation | Modeling | Evaluation | Deployment
1. | Determine business objectives | Collect initial data | Select data | Select modeling techniques | Evaluate results | Plan deployment
2. | Assess situation | Describe data | Clean data | Generate test design | Review process | Plan monitoring and maintenance
3. | Determine data mining goals | Explore data | Construct data | Build model | Determine next steps | Produce final report
4. | Produce project plan | Verify data quality | Integrate data | Assess model | Review project |
5. | | | Format data | | |
Green: full coverage; yellow: partial coverage; orange: no coverage.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
