Machine-Learning-Based Scoring System for Antifraud CISIRTs in Banking Environment
Abstract
1. Introduction
- We developed a new flexible architecture for a scoring system with a machine learning scoring extension in a banking environment.
- We designed a machine learning model based on data from the early stage in banking transaction processes.
- We developed two autoencoder (AE) models (shallow and deep), which classify the transactions into white, gray, and black.
2. Related Work
3. New Antifraud CISIRT Scoring System: Architecture
3.1. Modules
- Electronic channels: the channels through which the bank serves its customers. Using various electronic channels, the client, or a bank service acting on the client’s behalf, performs financial and non-financial operations.
- Load balancer: an application that distributes traffic across individual WAF instances to ensure the high availability of services. The bank’s infrastructure and software should handle occasional, sudden, and drastic increases in traffic.
- Web application firewall (WAF): responsible for checking whether requests from the outside world are technically correct, whether they contain potential attacks, whether the requested endpoints can be accessed from the given channel, whether the headers are properly signed (HMAC or another algorithm), and whether the SSL is correct.
- Antifraud subsystem: A classic antifraud system should include a main processing module, a rules editor, rules testing, a validation module, and an interface for administrators and the CSIRT. Expanding the system with an integration subsystem enables the use of additional external scoring systems. General processing is the module responsible for rule-based processing of requests, which yields one of the following business decisions:
- “OK” (white): The request is forwarded to the core financial system.
- “NOT OK” (black): The system automatically rejects the request and performs the programmed actions (e.g., inform the client about an attempted instruction via the notification subsystem and forward the report to the CSIRT).
- “MANUAL” (gray): The system provides information to the CSIRT about the need to manually verify a given operation.
- ML scoring extension (new module): The main task of the additional antifraud module proposed here is to support the CSIRT through the use of classic statistical methods, as well as ML and deep learning methods. It enables the profiling of customer behavior and the detection of actions that are unusual for the customer. Potentially gray transactions are sent for manual confirmation, even if they do not exceed the minimum amount specified. This module is presented in detail in Section 4.
- CSIRT: the team responsible for protecting users against fraud from inside and outside the organization, which controls the operations performed and reacts to unusual suspicious requests by adding new rules to improve security. The team has access to sensitive data: personal data, a list of enabled authorization tools, installed mobile applications of the bank and devices, and the history of transfers and messages directly received from the core financial service.
- Notification subsystem: a separate system supporting the communication flow in the organization. It sends banking messages via text messages (most often integrated with several GSM operators using the SMSC protocol), e-mails, and push notifications for mobile devices (integration with Google Firebase and the Apple Push Notification Service). Notifications can be of any sort: 2FA, authorization, or advertising.
- Core financial service: the central banking system of the organization, responsible for handling general and analytical ledgers, sources of money, current and savings accounts, intrabank settlements, and loans, as well as for full reporting. Optionally, it can be extended with a card system, i.e., debit and credit cards.
- Call center: a system that is a work tool for operators who are responsible for telephone contact with customers. In addition to the tasks related to the authorization of operations, they also perform sales and marketing work.
- Customer file system: a system containing customer files, which include personal data, information on contracts, customer address and correspondence data, information on signed consents (PSD2 and marketing), and history of contact with the bank.
- Auth module: This bank authorization system includes password hashes, the list of client authorization tools, full journal logs/authorization, and the history of authorization tool changes.
- Support financial system: additional system supporting the core financial service. It is responsible for verifying the operation limits. Currently, due to the need to support modern payment methods, it is used as a generator of virtual card numbers (e.g., Apple Pay or Google Pay). The core financial service, without the participation of the support financial system, is not able to correctly decode a transaction from the clearing file that was performed with a virtual card.
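The WAF's signed-header check from the module list above (HMAC is one of the algorithms it names) might look like the following minimal sketch. The shared secret, the choice of SHA-256, and signing the raw request body are illustrative assumptions; the source does not specify how the signature is computed.

```python
import hashlib
import hmac

# Hypothetical shared secret between an electronic channel and the WAF.
SECRET = b"demo-shared-secret"


def sign(body: bytes, secret: bytes = SECRET) -> str:
    """Return the hex-encoded HMAC-SHA256 of the request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_signature_valid(body: bytes, signature_hex: str, secret: bytes = SECRET) -> bool:
    """Recompute the HMAC and compare it with the received one in constant time."""
    expected = sign(body, secret)
    return hmac.compare_digest(expected, signature_hex)


body = b'{"operation": "transfer", "amount": 100}'
good = sign(body)
print(is_signature_valid(body, good))         # True
print(is_signature_valid(body + b" ", good))  # False: the body was altered
```

Using `hmac.compare_digest` rather than `==` avoids leaking timing information about how many leading characters of the signature matched.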
3.2. Data Flow in Proposed Architecture
- Step 1: The customer, or a bank service acting on behalf of the ordering customer, performs financial and non-financial operations through electronic channels in the context of the customer.
- Step 2: The load balancer receives the request from electronic channels.
- Step 3: The built-in algorithm transfers the received request to a selected WAF instance.
- Step 4: The WAF verifies the correctness of the received request. If verification succeeds, the request is transferred to the antifraud system.
- Step 5: General processing checks compliance with the rules and makes one of the following business decisions:
- 5a “OK” (white): The request is sent for extended verification in external scoring services;
- 5b “MANUAL” (gray): The request is sent for extended verification at external scoring services;
- 5c “NOT OK” (black): The request is rejected. The rejection information is sent to the notification subsystem and to the CSIRT.
- Step 6: The external scoring service checks compliance with external scoring systems and issues business decisions that are returned to the antifraud subsystem:
- 6a “OK” (white): The request is sent for extended verification to the ML scoring extension (new module);
- 6b “MANUAL” (gray): The request is sent for extended verification to the ML scoring extension (new module);
- 6c “NOT OK” (black): The request is rejected. The rejection information is sent to the notification subsystem and the CSIRT.
- Step 7: The ML scoring extension performs multi-faceted statistical and machine/deep learning verification and issues business decisions that are returned to the antifraud subsystem:
- 7a “OK”: The request is valid and passed to the core financial system;
- 7b “MANUAL”: The request is directed to the call center for manual verification with the customer, and the CSIRT receives information about the incident;
- 7c “NOT OK”: The request is rejected. The rejection information is sent to the notification subsystem and the CSIRT.
- Step 8: The call center verifies the client and issues business decisions, which are returned to the antifraud subsystem:
- 8a “OK”: The request is valid and passed to the core financial system;
- 8b “NOT OK”: The request is rejected and the rejection information is sent to the CSIRT.
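The routing in Steps 5 to 8 can be summarized as a short sketch. The stage functions, the request fields, and the returned strings are hypothetical placeholders; only the branching (black rejects at any stage, gray from the ML stage goes to the call center) follows the steps above.

```python
from enum import Enum
from typing import Callable, Dict, List


class Decision(Enum):
    OK = "white"
    MANUAL = "gray"
    NOT_OK = "black"


Request = Dict[str, object]
Stage = Callable[[Request], Decision]


def run_pipeline(request: Request, stages: List[Stage], call_center: Stage) -> str:
    """Pass a request through rules, external scoring, and ML scoring in order."""
    decision = Decision.OK
    for stage in stages:
        decision = stage(request)
        if decision is Decision.NOT_OK:
            # Steps 5c, 6c, 7c: reject and inform the notification subsystem and CSIRT.
            return "rejected"
        # OK and MANUAL both continue to the next stage (5a/5b, 6a/6b).
    if decision is Decision.MANUAL:
        # Step 7b: the last (ML) stage asked for manual verification.
        if call_center(request) is not Decision.OK:
            return "rejected"  # Step 8b
    return "forwarded to core"  # Steps 7a and 8a


# Hypothetical stage implementations, for illustration only.
def rules(req: Request) -> Decision:
    return Decision.NOT_OK if req.get("blocked") else Decision.OK


def external_scoring(req: Request) -> Decision:
    return Decision.OK


def ml_scoring(req: Request) -> Decision:
    return Decision.MANUAL if req.get("unusual") else Decision.OK


def call_center(req: Request) -> Decision:
    return Decision.OK if req.get("customer_confirms") else Decision.NOT_OK


stages = [rules, external_scoring, ml_scoring]
print(run_pipeline({"unusual": True, "customer_confirms": True}, stages, call_center))
print(run_pipeline({"blocked": True}, stages, call_center))
```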
3.3. Data Collection Process: Privacy Leakage
4. ML Scoring Extension
4.1. ML Scoring Extension Modules
- External gateways: These entry/access gates to the main ML scoring extension system receive requests and forward them to the integration module in a standardized message. An important aspect is the state of today’s IT and technological structure, which obliges new systems to adapt to those already operating in the organization. New systems with additional functionality are implemented without disturbing the current structure. This is because of the value of old (often considered obsolete) systems that are stable, efficient, and free from critical bugs. Their development and modernization would be costly, risky, and time-consuming. The external gateways module is of key importance for us in future work, because we will analyze login operations and financial operations that we will directly connect with central systems. This will enable us to parallelize and accelerate the issuing of decisions for the organization’s main antifraud system.
- Integration module: In mature organizations, the supplier is responsible for the implementation of new software. It is their duty to integrate their system with the existing one. The exceptions are fintech/startup companies that most often order a SaaS service, in which case the organization is responsible for the integration. The integration module receives as-is information from the input gates and then converts these data into the internal ML scoring extension structures. As such, this part of the software polls the internal API of the ML scoring extension in a standardized manner. In real implementations, the integration module is custom and cannot be delivered ready-made as a universal “box”.
- External API: This is a system access module that can be used if the organization can or wants to integrate with external software. The benefit for the organization when using the external API instead of external gateways is that the data it uses and the services it queries are already in the well-known ML scoring extension format. As such, we avoid the loss of performance in the transformation between different formats and simplify the physical architecture of the solution. The external API is exposed through modern technologies: a REST API (NodeJS) and AMQP (RabbitMQ). The external API, in calling the internal API, creates an abstraction layer that clearly separates the core from the implemented systems. This makes it possible to control permissions (the list of ML scoring extension services available for individual internal systems of the organization may be different) and to additionally verify the entered data.
- Internal API: This is the internal system API. Services have a specific scheme defined using OpenAPI. Owing to the standardization, each internal module of the system communicates coherently with the core module. The internal API orchestrates core services, completes missing data from various services, and creates more complex core requests.
- Core: the main module of the system, written in C++. The core maintains journal operations and executes and controls asynchronous recursive operations. A load balancer for services is built in because each module can work in several instances. The core functionality also includes acting as a router for services: the request received from the internal API is redirected to the appropriate module.
- Reporting module: the module used to generate daily and periodic reports. It allows the issuing of a service that generates any report upon request. It has a dedicated, separate database so that the creation of reports does not interfere with the operation of the entire system.
- Scoring module: provides scoring services based on ML solutions. It takes an operation as input and returns one of three possible responses (OK, NOT OK, or MANUAL). The scoring module uses the internal API to query the ML execution module, which sequentially runs the operation through the deep and shallow autoencoders.
- ML tuning module: the service provider for tuning a new solution, which processes the operations file to tune the autoencoders. The module is based on Keras and TensorFlow.
- ML testing module: the module that provides services for versioning autoencoders, verifies processed new adjustments, processes full historical tests, and evaluates performance against historical tests.
- ML execution module: the module that provides services for the processing of financial and non-financial operations on autoencoders. In response to the request, the operation similarity float value (predict) is returned. We developed the module using Keras.
- Operator interface: the module that provides a web interface for a CSIRT operator. It contains data on processing queues for the entire module and its history, the history of module decisions, the currently analyzed operation, and the “MANUAL” decision list.
- Administration interface: the module that provides a web interface for the administrator. It allows the granting of privileges to CSIRT operators, viewing journal logs and operations, and viewing operation statistics. Moreover, it provides information on system performance.
- Log viewer module: the module that provides a web interface for viewing logs, based on Kibana, Elasticsearch, and APM. It offers multidimensional viewing of system logs, creating data views, and collecting metrics.
- Notification module: the module providing notifications within the ML scoring extension system, e.g., whether the ordered report is ready for viewing/downloading.
- Gateway out: the output gates necessary for communication between ML scoring extension and external systems and for integration with services within the organization (e.g., data warehousing and sending requests to services).
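As an illustration of the core's built-in load balancing across module instances, the sketch below picks the instance with the fewest requests in flight. The `Instance` record, its fields, and the addresses are hypothetical; the source only says that the core balances load between instances and routes requests to the appropriate module.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    address: str          # hypothetical address of a module instance
    active_requests: int  # hypothetical load metric: requests currently in flight


def pick_least_loaded(instances: List[Instance]) -> Instance:
    """Return the instance with the smallest load; ties go to the first one listed."""
    if not instances:
        raise ValueError("no active instances registered")
    return min(instances, key=lambda inst: inst.active_requests)


pool = [
    Instance("10.0.0.11:9000", active_requests=7),
    Instance("10.0.0.12:9000", active_requests=2),
    Instance("10.0.0.13:9000", active_requests=5),
]
print(pick_least_loaded(pool).address)  # 10.0.0.12:9000
```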
4.2. Data Flow for Decision-Making Operation in Proposed ML Scoring System
- Step 1: The organization’s core systems trigger a request for a decision in the context of an ongoing financial or non-financial operation:
- 1a: In the event that the organization’s systems are not adapted to use the API issued by the ML scoring extension system, technical integration occurs through the interfaces that these systems understand. The request is sent to the external gateway. At this point, the organization’s systems come into contact with the gates, which translate their transport into the transport understood by the ML scoring extension system.
- 1b: In the event that integration with the organization’s systems occurs through development and integration from their side, these systems can directly use the external API. Requests to our system are natively understood by the system, and no extra layer of transformation is needed.
- 1c: Then, the request is sent to the integration module, where the technical and business transformation of the request coming from the organization’s systems into an abstraction understood by our system occurs. The business data create a valid request to the internal API native to our system.
- Step 2: The internal API verifies the parameters of the operation and authorizations to invoke services. When a decision is made, this service takes the operation upon entry and issues the decision upon exit. It is a high-level service that is completely transparent for the client (they do not know what services are called or in what order).
- Step 3: The request is sent to the core from the internal API. The core has data on active instances of the ML execution module and knows what addresses they have and what the load is. The core selects the best instance according to the load-balancing algorithm (choosing the least-loaded instance, where the request is likely to be processed fastest).
- Step 4: A financial or non-financial operation is processed. The input vector for Keras/TensorFlow is created here and passed to the “predict” method. On the basis of the threshold from the previously trained autoencoder (AE), one of three types of decisions is determined: “OK” (white), “MANUAL” (gray), or “NOT OK” (black).
- Step 5: When notified that the decision is “MANUAL” (gray) or “NOT OK” (black), the ML execution module sends a message via the core to the notification module that an email notification should be sent to the CSIRT. The notification module uses ready-made email templates. If the organization’s systems are responsible for sending the message, it forwards the content of such an email to the gateway out.
- Step 6: In the gateway out, various support systems of the organization are integrated, in this case with the email-sending service. The previously prepared email is pushed to the service within the organization for further processing and sending the email.
- Step 7: The organization’s external system performs the received requests in accordance with the agreed functionalities.
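The threshold-based decision in Step 4 can be sketched as follows. Using the mean squared reconstruction error as the score, and two thresholds to separate the three classes, are assumptions made for illustration; the threshold values and the sample vectors are invented, and the source does not state how its thresholds are derived.

```python
from typing import Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input vector and the autoencoder's output."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def classify(error: float, gray_threshold: float, black_threshold: float) -> str:
    """Map a reconstruction error to OK (white), MANUAL (gray), or NOT OK (black)."""
    if gray_threshold > black_threshold:
        raise ValueError("gray_threshold must not exceed black_threshold")
    if error >= black_threshold:
        return "NOT OK"
    if error >= gray_threshold:
        return "MANUAL"
    return "OK"


# Invented example: x is the encoded operation, x_hat stands in for the AE's output.
x = [1.0, 0.0, 0.0, 1.0]
x_hat = [0.9, 0.1, 0.0, 0.8]
error = reconstruction_error(x, x_hat)
print(classify(error, gray_threshold=0.01, black_threshold=0.05))  # MANUAL
```

In a deployment along the lines described above, `x_hat` would come from the trained model's `predict` call rather than from a hard-coded list.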
5. ML Execution Module
5.1. Data Acquisition and Preparation
- Date and time of the server-side event;
- Session ID;
- Client’s IP address;
- Operating system type;
- Browser type and version.
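To show how raw fields such as those listed above might become a numeric input vector, here is a small sketch of one-hot encoding. The vocabulary of operating systems and the scaled hour-of-day feature are hypothetical and are not the paper's actual engineered features.

```python
from datetime import datetime
from typing import List

# Hypothetical vocabulary; the real category set is not given in the source.
OS_TYPES = ["windows", "macos", "linux", "android", "ios"]


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Return a vector with 1.0 at the position of `value`; all zeros if unknown."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode_event(timestamp: str, os_type: str) -> List[float]:
    """Build a small input vector: scaled hour of day followed by one-hot OS type."""
    hour = datetime.fromisoformat(timestamp).hour
    return [hour / 23.0] + one_hot(os_type.lower(), OS_TYPES)


vector = encode_event("2018-02-14T23:05:00", "Linux")
print(vector)  # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Each categorical field contributes as many vector positions as it has categories, which is how a handful of raw fields can expand into a wider input layer.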
5.2. ML Model Descriptions
- (A) Classical AE, which is shallow, consisting only of the input layer I with 36 inputs encoding the data vector features, the code (representation) layer C with 3 neurons, and the symmetrical output layer O, also consisting of 36 neurons (Figure 3);
- (B) Deep AE with additional (also symmetrical) hidden layers, composed of 10 neurons each, in the encoder and decoder sections (Figure 4).
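To compare the size of the two architectures, the sketch below counts trainable parameters from the layer widths given above. It assumes fully connected layers with one bias per neuron, and that the deep AE keeps the 3-neuron code layer with one 10-neuron hidden layer on each side (36-10-3-10-36); both are assumptions, not statements from the source.

```python
from typing import List


def dense_parameter_count(layer_sizes: List[int]) -> int:
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per output neuron
    return total


SHALLOW = [36, 3, 36]       # model (A): input, code, output
DEEP = [36, 10, 3, 10, 36]  # model (B), under the assumption stated above

print(dense_parameter_count(SHALLOW))  # 255
print(dense_parameter_count(DEEP))     # 839
```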
5.3. Training Procedure
5.4. Results
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Characteristics of the training dataset | |
---|---|
Data collection period | 26 January–31 March 2018 (65 days) |
Number of records (raw data) | over 5.7 million |
Number of discarded records (heartbeats) | approximately 3.8 million |
Number of training records (effective) | 1,918,349 |
Features extracted (raw) | |
Features engineered (effective) | |
Total number of features (including one-hot encoding) | 36 |
Training/test split of dataset (%) | 80:20 |
Model | White (#) | White (%) | Gray (#) | Gray (%) | Black (#) | Black (%) |
---|---|---|---|---|---|---|
Shallow (A) | 1,765,415 | 92.03 | 146,146 | 7.62 | 6788 | 0.35 |
Deep (B) | 1,825,370 | 95.15 | 91,374 | 4.76 | 1605 | 0.08 |
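The percentages in the table follow directly from the counts and the 1,918,349 effective records. The short check below recomputes them; the variable names are illustrative.

```python
from typing import Dict, Tuple

TOTAL_RECORDS = 1_918_349  # effective records in the dataset

# (white, gray, black) counts per model, copied from the table above.
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "Shallow (A)": (1_765_415, 146_146, 6_788),
    "Deep (B)": (1_825_370, 91_374, 1_605),
}


def shares(counts: Tuple[int, int, int], total: int) -> Tuple[float, ...]:
    """Return each class count as a percentage of the total, rounded to 2 decimals."""
    return tuple(round(100 * c / total, 2) for c in counts)


for model, counts in COUNTS.items():
    # Each row should account for every record exactly once.
    assert sum(counts) == TOTAL_RECORDS
    print(model, shares(counts, TOTAL_RECORDS))
# Shallow (A) (92.03, 7.62, 0.35)
# Deep (B) (95.15, 4.76, 0.08)
```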
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Srokosz, M.; Bobyk, A.; Ksiezopolski, B.; Wydra, M. Machine-Learning-Based Scoring System for Antifraud CISIRTs in Banking Environment. Electronics 2023, 12, 251. https://doi.org/10.3390/electronics12010251