*Article*

**360° Retail Business Analytics by Adopting Hybrid Machine Learning and a Business Intelligence Approach**

**Abdulmajeed Alqhatani 1,†, Muhammad Shoaib Ashraf 2,†, Javed Ferzund 3, Ahmad Shaf 3,\*, Hamad Ali Abosaq 4, Saifur Rahman 5, Muhammad Irfan 5 and Samar M. Alqhtani 1**


**Abstract:** Business owners and managers need strategic information to plan and execute their decisions regarding business operations. They work in a cyclic plan of execution and evaluation, and to run this cycle smoothly they need a mechanism that can assess the entire business performance. The purpose of this study is to assist them, through applied, research-framework-based analysis, in obtaining effective results. The backbone of the proposed framework is a hybrid mechanism that combines business intelligence (BI) and machine learning (ML) to support 360-degree, organization-wide analysis. BI modeling gives descriptive and diagnostic analysis via interactive reports with quick ad hoc analysis that can be performed by executives and managers. ML modeling predicts performance and highlights potential customers, products, and time intervals. The whole mechanism is resource-efficient and, once bound to the operational data pipeline, automated, presenting results in a highly efficient manner. Data analysis is far more effective when it is applied to the right data at the right time and presents insights to the right stakeholders in a friendly, usable environment. The results make it possible to view past, current, and future performance with self-explanatory graphical interpretation. In the proposed system, a clear performance view is obtained by utilizing the sales transaction data. By exploring the hidden patterns of sales facts, the impact of the business dimensions is evaluated and presented on a dynamically filtered dashboard.

**Keywords:** business intelligence; digital revolution; sustainable business model; data warehousing; artificial intelligence; B2C; B2B

#### **1. Introduction**

**Citation:** Alqhatani, A.; Ashraf, M.S.; Ferzund, J.; Shaf, A.; Abosaq, H.A.; Rahman, S.; Irfan, M.; Alqhtani, S.M. 360° Retail Business Analytics by Adopting Hybrid Machine Learning and a Business Intelligence Approach. *Sustainability* **2022**, *14*, 11942. https://doi.org/10.3390/su141911942

Academic Editors: Adam Jabłoński, Marek Jabłoński and Dariusz Zarzecki

Received: 14 August 2022; Accepted: 15 September 2022; Published: 22 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In recent years, business intelligence has drastically changed business operations by providing smart cloud computing-based solutions to enhance working capacity and expand a business on a large scale, for example by using key performance indicators (KPIs), converted into analytical reports, to enhance the performance of a company such as Nike. One study showed how smartly designed KPIs have a direct impact on overall performance management [1]. The core objective of business intelligence services is to view past performance, diagnose decision mistakes, and predict and forecast future demands, benefits, and the best plan based on business facts and figures. With the past, present, and future direction of a business in mind, warehousing business intelligence (BI) and predictive artificial intelligence (AI) solutions are the best tools for designing flexible and powerful computing applications [2]. The authors of [3] studied how enterprises across the globe now use big data analytics cloud solutions to gain a competitive edge and to support strategic planning and development for future demands.

The major challenges in big data analytics and business intelligence are unrefined BI strategies, the right business measures and performance scaling, and the adoption of self-service cross-department analytics [4]. Companies are using customer relationship management (CRM) and the decision support system (DSS) to analyze and develop intelligent and smart business solutions, moving from a product- to customer-centric approach to revise business strategies and high-valued performance metrics adoption [5].

Recently, the term customer 360° view was introduced in BI; it comprises a broader view of a business with the help of dynamic filtering and querying on responsive dashboards. The process of BI mainly consists of four phases: business policy and operation framing; data extraction, transformation, and loading; data and machine modeling by applying BI and AI techniques; and, finally, presenting the results visually by applying data mining and statistical techniques. Broadly speaking, businesses can be grouped by scale and channel into wholesale, business-to-consumer (B2C) retail, business-to-business (B2B) production and manufacturing, direct-to-consumer (D2C) deals and marketing, and facilitators in drop shipping. When acquiring intelligent systems for a specific business, cost, quality, and access are the core factors from a business point of view [6].

To address these issues, software and platforms offered as services on cloud computing give end users easy, cost-effective access. The primary concern of business professionals is to lead and expand a business using smart business intelligence applications and big data analytics, exploring why, what, which, where, and how business performance indicators should be evaluated to maintain and increase sales, products, employees, and customers. To obtain effective business solutions, management faces data management, performance evaluation, and smart business planning issues on the way to managing and leading a business and gaining a competitive advantage. With the development of big data analytics and cloud computing solutions, businesses are utilizing these services to reduce costs and obtain fast access, quick results, and reliable BI solutions. Customer relationship management, customer experience, departmental business data, quality work in operational systems, and key performance indicators are the major factors driving the best solutions [7].

Data management applications based on database technologies are used in different departments such as administration, accounts, human resources (HR), sales, marketing, and customer care services. These departments produce departmental data such as account details, employee details, product manufacturing, stock and supplier details, transactions, payments, shipment details, customer-related information and feedback via CRM apps, and marketing details. With the advancement of eCommerce, doing business is now very smooth and flexible: online customers can reach an e-business easily and select, order, and track products via social media and eCommerce websites. Business owners know the worth of e-channels and try to provide eCommerce websites, mobile apps, and social media. These channels produce big data as the result of data management operations on both sides: administrative operators enter data to market products publicly, while the system aggregates customers' information, orders, and reviews. Managing day-to-day transactions with database applications is not a big issue, but big data analytics is a complex process that requires many resources. For a proactive approach to leading a business, the BI set-up must cover different aspects of the past, current, and future performance perspectives.

The actual usage of BI is to utilize data to support decision making with compelling and interactive data visualization, such as with the QlikView BI tool [8]. A systematic process with advanced BI tools turns raw data into knowledge and turns analytical insights into strategy making. When designing the data warehouse, key performance indicators are defined first, describing what to evaluate and on which performance scales. In the second step, information source documents are prepared in which abstract detailed information is maintained. In the third step, matrix documents are prepared to map the information from the source to the desired analytical target attributes using transformation, cleaning, and preprocessing techniques. These documents record the source attributes, technical constraints, and target mappings. All these documents are maintained with log files to track versions and for distribution to the team. All source information is gathered via an extraction, transformation, loading (ETL) or ELT mechanism and placed in the staging area to avoid data loss. These data are then stored in a warehouse schema for further processing. Mainly, two approaches are used to design and construct a warehouse: top-down, which starts from a complete warehouse and derives data marts from it, and bottom-up, which combines data marts into a data warehouse for incremental development. For small businesses, the bottom-up approach is usually used. After collecting business requirements, the warehouse schema is designed, which can be of four types: (1) a star schema, (2) a snowflake schema, (3) a galaxy schema, or (4) a clustered galaxy schema.
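The star schema idea described above can be sketched with pandas: a flat transaction extract is split into dimension tables (one row per member) and a fact table that keeps only keys and measures. This is a minimal illustration on toy rows; the column names mirror the Online Retail dataset but the values are invented.

```python
import pandas as pd

# Toy stand-in for a cleaned transaction extract (assumed column names).
sales = pd.DataFrame({
    "InvoiceNo":   ["536365", "536365", "536366"],
    "StockCode":   ["85123A", "71053", "85123A"],
    "Description": ["HOLDER", "LANTERN", "HOLDER"],
    "CustomerID":  [17850, 17850, 13047],
    "Country":     ["United Kingdom", "United Kingdom", "France"],
    "InvoiceDate": pd.to_datetime(["2010-12-01 08:26",
                                   "2010-12-01 08:26",
                                   "2010-12-01 08:28"]),
    "Quantity":    [6, 6, 2],
    "UnitPrice":   [2.55, 3.39, 2.55],
})

# Dimension tables: one row per distinct member.
dim_product  = sales[["StockCode", "Description"]].drop_duplicates().reset_index(drop=True)
dim_customer = sales[["CustomerID", "Country"]].drop_duplicates().reset_index(drop=True)
dim_date = (sales[["InvoiceDate"]].drop_duplicates().reset_index(drop=True)
            .assign(Year=lambda d: d.InvoiceDate.dt.year,
                    Month=lambda d: d.InvoiceDate.dt.month))

# Fact table keeps keys and measures (revenue = quantity x unit price).
fact_sales = sales.assign(Revenue=sales.Quantity * sales.UnitPrice)[
    ["InvoiceNo", "StockCode", "CustomerID", "InvoiceDate", "Quantity", "Revenue"]]
```

Each dimension then joins back to the fact table on its key, which is exactly what makes ad hoc slicing by product, customer, or date cheap.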

Each design has its pros and cons; usually, the choice is based on storage and time constraints. BI is the systematic process that takes the data and business needs and implements a data collection mechanism, ensuring data quality and implementing data marts and data lakes with ETL and ELT to load data into the warehouse [9]. BI warehousing provides a descriptive framework to view business performance along the business dimensions in a detailed manner. It involves extraction, transformation, and loading to collect relevant business data from operational systems and external sources; preprocessing and data modeling under a relational schema to make the data usable for relational online analytical processing (ROLAP) and multidimensional online analytical processing (MOLAP), which filter the data using business dimensions such as time, geography, products, and HR; and a data visualization process in which ad hoc reporting is performed to view and explore business performance as well as the impact of business features. On the other hand, machine learning (ML) supports decision makers with trained models which capture the data trends, acting as a magic box that answers queries with probabilities. ML offers multiple kinds of analytics in eCommerce (personalized recommendations, sales and demand forecasting, and customer behavior prediction) [10].

ML captures data trends for tasks such as customer segmentation, sales forecasting, and product recommendation. For predictive analysis, to see the future based on past and current business data, strategy makers set up various questions. In [11], the authors performed predictive analysis to predict sales in weekly collected data.

In [12], the authors proposed an ensemble approach combining S-ARIMA, a vector autoregressor, and long short-term memory (LSTM) for demand forecasting of monthly product distributor orders with external features such as weather, campaigns, and holidays. In [13], the author implemented a model to manage chain stores via BI modeling and presented a dashboard that reports to managers and strategy makers for effective decision making. The proposed model considers the KPIs and divides the dashboard reports across different business layers to support the supply chain and, ultimately, its impact on sales. For practical customer and sales data usage in a retail business, the author of [14] developed a framework under a BI tool that communicates quickly through a chat bot using point-of-sale (POS) transactional data. The authors of [15] studied the impact of business intelligence and analytics on intelligent data insights in retailing. They used online databases and web source datasets from 25 global retailers and analyzed the retailer activities in different phases.

The authors of [16] presented a profit-based forecasting model for multi-seasonal shoe retail data. They evaluated ML models with 10-fold cross-validation against real-time demand values, using the mean absolute percentage error. In retail chains, the authors of [17] applied a forecasting framework (with special calendar days as features) to a centralized food bakery to predict the product facility in 100 stores on a daily basis. In [18], the authors evaluated customer engagement data for the telecom sector in terms of social demography and the services used by the customer. Segmenting customers by similar traits and then predicting their expected behavior were the main concerns of their study. In [19], the authors applied a data mining approach to an electronic sales dataset (mixed B2B and B2C) covering a complete online business process: customer analysis (customer segmentation via k-means clustering), product analysis (association mining via the Apriori method), and prediction of customer behavior outcomes (via the decision tree method).

In [20], the authors implemented ML models on time series data of B2C Amazon quarterly sales in 2019. After data transformation, three models were applied (Holt–Winters exponential smoothing, the autoregressive integrated moving average (ARIMA), and ANN autoregression), and their accuracy was compared by evaluating the mean absolute percentage error (MAPE) and the root mean squared error (RMSE), as well as other metrics. The seasonal ARIMA was best at projecting revenue.
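The two error metrics named above are simple to state precisely. A minimal pure-Python sketch, with invented actual/predicted values for illustration:

```python
import math

def mape(actual, predicted):
    # Mean absolute percentage error, in percent; assumes no zero actuals.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root mean squared error, in the units of the series itself.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual    = [100.0, 120.0, 80.0]   # hypothetical observed sales
predicted = [ 90.0, 126.0, 88.0]   # hypothetical model output
```

MAPE is scale-free, which is why it is commonly used to compare forecasting models fitted to series of different magnitudes; RMSE penalizes large individual errors more heavily.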

In [21], two powerful ML algorithms (XGBoost and LSTM) were compared for predicting retail product sales. The authors of [22] used online analytical processing (OLAP) methods to analyze sales measures on a multi-dimensional data model which contained sales transactions, customers, and item information. The outcome of their analysis was a web dashboard which showed the sales trends for the given dimensions.

In [23], the author implemented a model to manage chain stores via BI modeling and presented a dashboard reporting to managers and strategy makers for effective decision making. In [24], a data warehouse monitoring framework was presented for various applications to obtain effective results and integration with management pipelines. In [25], the authors identified the main KPIs, such as net income, rate of investment, equity, and gross margin, implemented them with machine learning, and presented the results on a dashboard.

As discussed above, an intelligent BI and AI mixture provides a smoother platform for dealing with day-to-day operational management and making better business decisions. This research primarily focuses on business-to-consumer settings, targeting the retail and wholesale business subdomains. The research is conducted on utility product industries to add valuable business insights. The following are the core research questions addressed in this article:


Retail business sales performance can be evaluated using BI techniques for efficient sales views along various business dimensions and for effective strategies. A light and flexible warehouse schema needs to be designed, keeping in mind the business type, the analytical needs, and the transactional data produced. The main concern of a retail business is to evaluate sales so that appropriate action can be taken. Therefore, the proposed framework is based on a hybrid cloud solution which mainly consists of warehousing business intelligence, which provides statistical analysis of the past and current views of a business, and predictive artificial intelligence to forecast the business plans. The dataset used in this study was taken from the UCI repository and contains 541,909 transaction entries by different customers from 2010 to 2011. Utilizing these data, BI techniques were applied to fulfill the analytical research objectives. Compared with existing techniques, this provides a lighter-weight, less error-prone, centralized, and comprehensive view of a business at a low cost, with fast access and dynamic business analysis. The leading business intelligence solutions are cloud-based data pipelines with deep data analytics as well as data visualization.

The rest of the paper is formatted as follows. Section 2 describes the system model, while the results are discussed in Section 3. In Section 4, the conclusions and future works are discussed.

#### **2. Materials and Methods**

Business intelligence techniques built on cloud computing and big data processing are the core means of deriving insights from data. In recent years, various cloud services have offered decision support systems covering everything from data collection to dynamic dashboards, including IBM Watson, QlikView, Microsoft Power BI, Tableau, Oracle NetSuite, Amazon Redshift, and Google Data Studio. The proposed methodology is based on a customer 360° view with four dimensions and an advanced business analytical dashboard to drive smart business insights. The core components are data sources, ETL, data consumers, a processing module, and an analytical dashboard. The figure below demonstrates the smart application framework, consisting of the following:


Finding business insights in data is a systematic process which contains multiple layers and processes to derive actionable results for business strategy development. Data are the core component, collected from operational management and point-of-sale systems, CRM systems, and social media marketing channels, and combined into a centralized data repository. Figure 1 depicts the layers of the smart analytical system.

**Figure 1.** The business operation and analytical system mapper for understanding the actual business needs to feed into analytical system development.

Data sources are diverse and may produce different types of data stored in various file formats; for example, department management system data are stored in SQL databases and Excel or CSV files which contain customer, HR, store, financial, sales, and marketing data from local sources. Customer data can also be collected from social media and mobile apps, which are used to engage and retain customers over the long term. These data are used in warehousing and predictive data modeling after ETL processing. The output of this process is a refined set of data repositories, which are further used in data modeling for the descriptive diagnostic business intelligence (DD-BI) phase and the PP-AI phase. The DD-BI phase uses statistical techniques to explore the data relations and interpret the results in the form of graphs. The PP-AI phase uses machine learning predictive and diagnostic techniques to predict future outcomes against time, customer, and business demographic graphs. The final phase is the dashboard analytical phase, which interprets the results in the form of dynamic visuals with the help of BI cloud solutions. The dashboard gives smart business insights for each department as well as for the business owners.

The process flow for the proposed research framework is given in Figure 2, showing the abstract flow of the analytical techniques. First, the sales data were collected from the UCI repository, and the initial data were preprocessed to clean them and remove duplicates and irrelevant entries. For a comprehensive sales analysis, we derived date and time features from the invoice dates. After that, we constructed the schema design and the data needed by the ML models. This involved two main components, machine learning and warehousing, and the work on these two components was performed in parallel. For the DD-BI phase, one subprocess was compiled: with the help of Power BI, the complete development was performed for data loading, relational design, and interactive data reporting. Similarly, for the forecasting predictive machine learning (FP-ML) phase, a second subprocess was compiled with Anaconda Jupyter Notebook to prepare the training and test datasets and to perform the ML modeling and evaluation. The results were compiled on the dashboard. In the next subsections, the complete dataset description and all working components are described.

**Figure 2.** Sales analysis process flow to find the performance insights using warehousing and machine learning algorithms.
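The cleaning step in the process flow (removing duplicates and irrelevant entries) can be sketched as follows. This is a minimal illustration on toy rows; the filtering rules (dropping rows with no customer, exact duplicates, and non-positive quantities or prices) are common choices for this dataset, not necessarily the authors' exact ones.

```python
import pandas as pd

# Toy raw extract containing the defects the cleaning step removes.
raw = pd.DataFrame({
    "InvoiceNo":  ["536365", "536365", "536366", "536367"],
    "CustomerID": [17850.0, 17850.0, None, 13047.0],
    "Quantity":   [6, 6, 4, 2],
    "UnitPrice":  [2.55, 2.55, 1.25, 0.0],
})

clean = (raw.dropna(subset=["CustomerID"])              # entries with no customer
            .drop_duplicates()                          # exact duplicate rows
            .query("Quantity > 0 and UnitPrice > 0"))   # irrelevant entries
```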

The proposed research model has two phases, based on business intelligence data warehousing and machine learning solutions. The technical flow diagram shows how, to fulfill the research objectives, these phases sort out the issues and provide a comprehensive, assistive solution for better and friendlier insights.

After applying this model, the final analytical report outcomes address the business objectives with graphical interpretation, revealing hidden business insights.

A deeper view of the research framework, in terms of development and its core techniques, is discussed here. As the proposed methodology consists of BI and AI development with deep analytics, using big data analytics and web applications enabled us to present quick results and effective business analytics for business professionals. The proposed research design provides a BI solution for viewing business performance and offers friendly, accurate assistance for making future decisions in terms of the product pipeline, customer relationship management, and the core business dimensions, supporting strategy development to lead and manage a business in a quick, productive, and efficient way, as shown in Figure 3.

**Figure 3.** Proposed model's ML-BI technical flow for retail data analysis in 360 degrees to answer business analytical questions.

#### *2.1. Designing Knowledge Performance Indicators*

The data analytics process begins with framing the business processes and converting them into the smart metrics needed for performance evaluation, decision making, and strategy building. Designing KPIs is not a straightforward process, as it needs a thorough understanding of a business's set-up, sales channels, entire environment, and policies. The dataset used in this study was taken from the UCI repository and contains 541,909 transaction entries by different customers from 2010 to 2011; utilizing these data, business intelligence techniques were applied to fulfill the analytical research objectives.

As a business stakeholder, one has to be concerned about business performance and expansion plans to compete in the market. This research focuses on business performance indicators and customer relationship management, targeting the retail business level. The following core business questions explain the problem statement of this research proposal:


Business strategy and planning development, as addressed in this proposal, are designed for B2C businesses. For business managers and owners, a proactive approach leads the business to scale. Therefore, problems in different departments of retail and wholesale businesses are examined to find operational and planning efficiency and effectiveness.

To formalize the business problems into detailed analytical objectives, key performance indicators are the best way to describe them easily. The ultimate objective of this study is to answer these KPIs by exploring the sales data and applying ML-BI techniques. These KPIs depend on the available business information, as the studied data contain only sales transactional data. Therefore, the following KPIs are the defined goals which will be met at the end of the analytical experimentation and data result reporting:


These KPIs are used to deeply analyze the relationships of relational sales attributes to find hidden insights so that improvements can be made in weak areas. For the hidden picture of the business data, this is the best problem-framing technique for mapping the research objectives. In addition, these KPIs will be useful in evaluating the proposed analytical techniques.

#### *2.2. Dataset Description*

Data are the hub of analysis, and they can come in any file format, from a local or global source, and in any modality, such as text, sound, or visuals. Textual data are mainly used for analysis, specifically in the banking sector and business industries. These data are produced everywhere with the help of web and mobile applications, on both the client and administration sides, to complete the needed tasks. In this research, online retail sales transactional data are used to conduct research experiments for B2C and B2B businesses. Most of the transactions were made through a UK-based, registered, non-store internet retailer between 1 December 2010 and 9 December 2011. In this multinational data collection process, 541,909 entries were collected from the following countries (given as percentages): the United Kingdom (88.9%), Germany (2.3%), France (2.1%), Ireland (EIRE, 1.8%), Spain (0.6%), the Netherlands (0.6%), Belgium (0.5%), Switzerland (0.5%), Portugal (0.4%), and Australia (0.3%). The following data columns were used in the data collection phase, as also described in Figure 4:


**Figure 4.** Data analysis of online store.
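The per-country shares quoted above come from a simple normalized frequency count over the `Country` column. A minimal sketch on a toy column (the values are invented; the real dataset has 541,909 rows dominated by the United Kingdom):

```python
import pandas as pd

# Toy country column standing in for the dataset's Country field.
countries = pd.Series(["United Kingdom"] * 8 + ["Germany", "France"])

# Share of transactions per country, as a percentage rounded to one decimal.
share = (countries.value_counts(normalize=True) * 100).round(1)
```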

The provided features were not enough for the detailed dimensional analysis, so we derived additional features from the date and time column to conduct overall fact measurements and paired-fact measurement analysis.
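Deriving calendar features from the invoice timestamp is mechanical with pandas datetime accessors. A minimal sketch, with invented timestamps and an assumed (but typical) choice of derived columns:

```python
import pandas as pd

# Hypothetical invoice timestamps, as in the Online Retail data.
orders = pd.DataFrame({"InvoiceDate": pd.to_datetime(
    ["2010-12-01 08:26", "2011-03-15 13:45", "2011-12-09 09:10"])})

# Derive calendar features used for dimensional sales analysis.
orders["Year"]      = orders.InvoiceDate.dt.year
orders["Quarter"]   = orders.InvoiceDate.dt.quarter
orders["Month"]     = orders.InvoiceDate.dt.month_name()
orders["DayOfWeek"] = orders.InvoiceDate.dt.day_name()
orders["Hour"]      = orders.InvoiceDate.dt.hour
```

These derived columns become members of the date-and-time dimension, enabling slicing by weekday, hour-of-day, or quarter.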

#### *2.3. BI Data Modeling*

Business intelligence-based applications demand structured warehousing data models, which are used for the development of data marts, data lakes, and complete warehouses. Raw data are collected from multiple sources in different file formats, such as SQL from operational systems and Excel and Word reports from management reporting systems. In this study, the online retail data are used to prepare a data model with the help of Python scripting, first preparing the dimensional data in separate Excel sheets and then building a warehousing schema with Power BI. As shown in Figure 5, the proposed schema data model contains four dimensions and one fact table to measure the revenue, sold quantity, and number of orders.

**Figure 5.** Data warehousing analytical schema to support ad hoc queries and data reporting.

This is a dimensional star schema model which contains the business dimensions (customer, product, and date and time) and one sales fact table connected to all dimensions. It is extremely helpful for ad hoc analysis of how these dimensions impact sales growth.

#### *2.4. ML Data Modeling*

Machine learning-based applications demand feature-based data models, split into training and testing data, to train the ML model using supervised or unsupervised algorithms according to the data attributes and business goals. Usually, Excel, CSV, TSV, and SQL data files are used, from which features are extracted and set as dependent or independent variables for prediction and forecasting. In this study, after collection and preprocessing, we prepared the data for unsupervised learning to create product- and customer-based clusters of similar purchases on unlabeled (product category and customer category) data. Therefore, we first arranged the data by products sold and customer orders. To prepare the data for forecasting, we picked just two features from the given data: the invoice date and the total sales.

Similarly, for customer classification, the customer-driven features were based on product selection and participation in sales transactions. In the first segment, the products in order transactions were grouped into five buckets, and the customers were labeled into eleven category buckets, together with the minimum, maximum, mean, and sum of the order values each customer contributed. These customer-oriented data are used in the FP-ML phase for customer classification.
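The per-customer aggregates named above (minimum, maximum, mean, and sum of order values) reduce to a groupby-aggregate. A minimal sketch with invented per-order totals:

```python
import pandas as pd

# Hypothetical per-order totals keyed by customer.
orders = pd.DataFrame({
    "CustomerID": [17850, 17850, 13047, 13047, 13047],
    "OrderTotal": [100.0, 300.0, 50.0, 70.0, 60.0],
})

# Min/max/mean/sum of each customer's order values, the features fed to FP-ML.
features = orders.groupby("CustomerID")["OrderTotal"].agg(["min", "max", "mean", "sum"])
```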

#### 2.4.1. Descriptive Diagnostic Business Intelligence

Deeper analysis provides relevant and accurate insights about the data, which drives actionable results. DD-BI is the first parallel phase; it highlights the past and current states of the data through exploration, detection, and association techniques. Experimenting with clustering and association techniques to explore the data via segmentation and dimensionality reduction highlights the customers, products, deviating sales, demand behavior, and aggregated business impact. This phase consists of descriptive and diagnostic analysis performed on the BI data models. To describe the patterns in the data, an initial visualization is performed to identify the associations, correlations, and deviations and thereby find the positive and negative factors.

Here, we describe the basic static data measures used to overview the data values. The totals of unique customers, products, orders, canceled orders, quantity sold, and revenue form the basic descriptive analysis. To diagnose sales and their dependency on product, customer, location, and date and time, multiple types of filters were implemented to view the measures in 360°. These analytical reports, with completely self-descriptive properties, were implemented with Power BI. Stakeholders can easily examine sales through on-demand business questions and design strategy based on these statistics. This flow shows how the descriptive and diagnostic analysis is performed to find the sales KPIs.
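The basic descriptive measures listed above can be computed directly from the transaction table. A minimal sketch on toy rows; the convention that canceled invoices start with a "C" follows the Online Retail dataset's documentation, and the values are invented:

```python
import pandas as pd

tx = pd.DataFrame({
    "InvoiceNo":  ["536365", "536365", "C536379", "536380"],
    "CustomerID": [17850, 17850, 14527, 13047],
    "StockCode":  ["85123A", "71053", "D", "22961"],
    "Quantity":   [6, 6, -1, 24],
    "UnitPrice":  [2.55, 3.39, 27.50, 0.29],
})

n_customers = tx.CustomerID.nunique()
n_products  = tx.StockCode.nunique()
n_orders    = tx.InvoiceNo.nunique()
# Cancellations are flagged by a leading 'C' in the invoice number.
n_cancelled = tx.loc[tx.InvoiceNo.str.startswith("C"), "InvoiceNo"].nunique()
revenue     = (tx.Quantity * tx.UnitPrice).sum()
```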

#### 2.4.2. Predictive Forecasting Machine Learning

Proactiveness is the key to attaining success in the corporate world while managing any enterprise business to gain a competitive edge in the industry. FP-ML is the second parallel phase, used to view and decide in advance on the future outcomes to be faced. It first segments the ordered products, then segments customers based on the product clusters, using K-means and principal component analysis. Clustering is used to prepare the customer-oriented data, which in turn are used to classify the customers into various categories by sales contribution; this mechanism is explained in the ML data modeling section. Forecasting analysis is performed using the ML data models to forecast the performance, behavior, engagement, and demands related to the business domain. To view future demands, a predictive ML technique using an ensemble of ARIMA forecasting models with Python scripting and the Power BI automated tool is used to determine what action will be best in the future to lead the business from the front in terms of decision support.
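The segmentation step (K-means clustering followed by a PCA projection for inspection) can be sketched with scikit-learn. This is a toy illustration on a synthetic customer-product matrix, not the study's actual feature set:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Toy customer-product matrix: rows are customers, columns product buckets.
# Two well-separated synthetic groups stand in for real purchasing profiles.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 4)), rng.normal(3, 0.3, (20, 4))])

# Segment customers with K-means, then project to 2-D with PCA
# so the cluster structure can be plotted and inspected.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
coords = PCA(n_components=2).fit_transform(X)
```

In practice the number of clusters would be chosen with an elbow or silhouette analysis rather than fixed in advance.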

This flow shows how pattern mining of products in sales orders and customer behavior prediction are performed with K-means clustering, classifying customer categories with multiple multi-class algorithms. To predict sales in the near future, ARIMA is applied to find the sales trends.

The proposed methodology contains a BI-ML hybrid framework for effective sales analytics for online retail businesses. As our research objectives mainly concern the sales facts needed by sales managers and owners who facilitate customers directly, this dual process uncovers the sales insights from the past to the future, with user-friendly interpretations. In the next section, the complete implementation is given, showing both phases and each step of the experimentation. As explained in Figure 6, the abstract view of the proposed model, which uses BI and ML to view retail sales effectively, yields the KPIs with the help of interactive charts on the retail data. Figure 6C shows the analytical BI model and the ML model data. Similarly, Figure 6A,B shows how dimensional OLAP analysis is performed for the descriptive and diagnostic models. The results of these two phases are presented as reports, which are highly beneficial for performance analysis, strategy making, and inductive decisions. The real output of this analytical model is a self-service BI solution for effective sales analysis and performance monitoring.

**Figure 6.** (**A**) Warehousing descriptive data analysis to know the past overall business performance. (**B**) Warehousing diagnostic data analysis to know the past business performance with dynamic filtration and feature importance. (**C**) Machine learning predictive data analysis to predict the future business performance with customer segments and sales forecasting.

#### **3. Results and Discussion**

The proposed model contains two phases and subprocesses to develop the analytical system, which evaluates business performance from all possible aspects and determines the importance of and dependencies among the data features. The results are first presented for the ML clustering and classification model evaluation; these are the final outputs of both the ML and BI phases after implementation. They describe the customer clustering (Figures 7 and 8) and the classification algorithm evaluations, which show each ML algorithm's accuracy on the data, given in Table 1. Gradient boosting and the voting classifier performed best in the classification task. With the help of Power BI, the implemented analytical reports described the data insights with and without smart filters to meet the business KPIs for the stated research objectives. The following results show the performance of the hidden facts.

**Figure 7.** Clusters of chosen products by similar customers to show behavior in multiple orders.

**Figure 8.** Product-based clusters' differences in the results of principal component analysis to determine the variance of the clusters.

**Table 1.** ML classification model scores.


Applying multiple variants of predictive algorithms to classify customer behavior helps to identify similar customers and their buying trends.
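A minimal sketch of this multi-algorithm comparison, using scikit-learn on a synthetic stand-in dataset; the feature counts, class count, and the particular base estimators inside the voting ensemble are assumptions for illustration, not the study's exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customer-category data (3 classes).
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Gradient boosting alone, and a soft-voting ensemble over several learners.
models = {
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "voting": VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(random_state=0)),
                    ("gb", GradientBoostingClassifier(random_state=0))],
        voting="soft"),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))
print(scores)
```

Held-out accuracy, as in Table 1, is the comparison metric; soft voting averages the class probabilities of its base learners.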

On the other hand, RFM analysis was performed to find the best potential customers, a very valuable technique when customer and sales figures are low. Recency is the number of days since the customer's last purchase in the analysis data, frequency is how often a customer purchases a product, and monetary is the customer's total individual contribution to overall sales revenue.

The customers with the lowest recency and the highest frequency and monetary values are the best potential customers for sales revenue because, in less time, they became the most frequent customers and yielded more revenue, as shown in Figure 9.
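The RFM computation can be sketched in pandas; the toy transaction log below (customer IDs, dates, and amounts) is made up for illustration and is not the study's dataset.

```python
import pandas as pd

# Toy transaction log (customer IDs, dates, and amounts are illustrative).
tx = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2", "C3", "C3", "C3"],
    "invoice_date": pd.to_datetime(["2023-01-05", "2023-03-01", "2023-02-10",
                                    "2023-01-20", "2023-02-15", "2023-03-10"]),
    "amount": [120.0, 80.0, 45.0, 200.0, 60.0, 90.0],
})

# Recency is measured from the day after the last transaction in the data.
snapshot = tx["invoice_date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("customer_id").agg(
    recency=("invoice_date", lambda d: (snapshot - d.max()).days),
    frequency=("invoice_date", "count"),
    monetary=("amount", "sum"),
)
print(rfm)  # C3 has the lowest recency, highest frequency, and highest monetary value
```

Customers would then be ranked or binned on these three columns to surface the best potential customers.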


**Figure 9.** Customer recency, frequency, and monetary values after RFM analysis and computing.

The forecasting results covered a two-month sales horizon, as shown in Figures 10–15, which display the daily sales trends, the ARIMA modeling, the fitted model statistics, and the forecast values. Due to the limited length of the data, the results were not fully up to the mark, but they still give a clear view of sales for the next two months, including the daily seasonality.


**Figure 11.** Sales forecasting behavior summaries.

**Figure 12.** Sales forecasting model with summary visuals to show the model's training fit.

**Figure 13.** Sales forecasting model validation of existing dates to show the model's training fit.

**Figure 14.** ARIMA mean sales forecasting of two months to forecast future sales.

**Figure 15.** Automatic forecast using Power BI sales forecasting (with minimum, trend, and maximum values) for two months to forecast future sales.

The overall results were compiled on an interactive dashboard to show the data insights statically as well as through dynamic reporting with multiple ad hoc filters. Here, we present only two reports for the sake of simplicity; in total, 14 dynamic data reports were made, demonstrating the 360-degree view of sales.

We viewed the data patterns with the smart filtering approach provided by Power BI data modeling and detailed data reporting to answer the business questions. Figure 16 shows the sales overview with respect to location, customers, date and time, and product values. By applying multiple filters, we could view the data insights in 360°.

Similarly, Figure 17 shows an extension of the generic sales performance to order characteristics, such as how many orders were placed within a specific price range of products, a comparison of revenue on weekends and working days, and multiple options in the date and time hierarchy in terms of years, semesters, quarters, months, and weeks.

Customers always play a vital role; however, this dataset contains no rich customer features, so we could only explore customer IDs against the measured facts for multiple dimensional values.

The next angle was products, which also had fewer feature dimensions, so we could only explore the product type (extracted from the description) for high- and low-performing products against the other dimensions and the measurements.

**Figure 16.** Sales performance overview with respect to highest revenue for products, date and time, and location.

**Figure 17.** Product analysis with respect to highest revenue for products, date and time, and location.

The most important dimension in this study was the time data, which provided richer features (derived from the date) and multiple scenarios for evaluating the facts. Using smart filters, the 360-degree analysis was possible across all of the data.

The order analysis for products and customers showed the quantity of products sold and the sales revenue against the order characteristics. With smart filters, viewing the data across multiple factors was very informative and helpful in answering short- and long-term performance-monitoring questions. The cancellation order analysis for products and customers was summarized similarly. The quality of the results depends entirely on the given data and the applied analytical framework, as discussed above for the ML and BI results. All the statistical graphs revealed the hidden insights as well as the analytical framework evaluation. Every chart section was cross-filtering enabled, showing the business insights in a detailed manner. The interactive reports were summarized on a sales dashboard for specific dimensions and are shareable with the associated salespersons and stakeholders.

#### **4. Conclusions**

Business intelligence is a growing technology that uses data and computational analytical techniques to find business insights. Business owners and management face issues with data management, customer relationship management, customer experience, departmental business data, operational system quality, and KPIs when trying to manage their businesses smartly and gain competitive advantages. The proposed framework is a hybrid solution that mainly consists of warehousing and ML empowered with BI data reporting, providing statistical, ranking, and dimensional analysis of a business's past and current state and a predictive view to forecast its performance. These analytical sales reports are user-friendly, easily shareable across departments, and very useful for performance monitoring and designing the best strategies. Users can utilize smart filters to obtain fully dynamic and versatile business insights that can truly answer analytical questions. This is purely a self-service business intelligence implementation with ML and effective data reporting. The overall process is very efficient, from data loading to processing, schema design, pattern computation, and data reporting, as well as sharing with stakeholders to view performance quickly. The smart framework comprises sales metrics across different dimensions that are very helpful for designing business strategies. Compared with existing techniques, it provides a lightweight, centralized, and comprehensive view of a business with fewer errors, lower cost, and fast access with dynamic business analysis.

#### *Future Work*

The dataset under consideration belongs to online retail and contains customer-based features, but no shipping or payment information is given in the detailed product information, and no supplier information is provided. The proposed approach will give more interesting and needed insights if the data are enriched with more detailed information. This study can be further extended with more inter-department and external source data, such as product reviews and demand, to evaluate overall growth, dependent areas, and future goals and strategies.

**Author Contributions:** Conceptualization, M.S.A.; formal analysis, H.A.A., S.R. and M.I.; investigation, A.A., J.F., A.S., S.R. and M.I.; methodology, M.S.A., J.F. and A.S.; project administration, A.A., S.M.A., S.R. and M.I.; resources, A.A., H.A.A., S.R., S.M.A. and M.I.; software, M.S.A. and A.S.; supervision, H.A.A.; validation, J.F. and S.M.A.; visualization, A.A. and H.A.A.; writing—original draft, M.S.A. and J.F.; writing—review and editing, A.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Deanship of Scientific Research at Najran University, Kingdom of Saudi Arabia, under the research group funding program, grant code NU/RG/SERC/11/3.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data and code presented in this study are available on http://www.kaswa.ai/.

**Acknowledgments:** The authors acknowledge the support of the Deanship of Scientific Research at Najran University, Kingdom of Saudi Arabia, for funding this work under the research group funding program, grant code NU/RG/SERC/11/3.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Biometrics Innovation and Payment Sector Perception**

**Barbara Mróz-Gorgoń 1,\*, Wojciech Wodo 2, Anna Andrych 1, Katarzyna Caban-Piaskowska <sup>3</sup> and Cyprian Kozyra <sup>4</sup>**


**Abstract:** This paper presents an analysis of innovations in the biometrics market, which have started to play a very important role in personal identification and identification systems. The aim of the study was to analyze current customs and opinions regarding payment methods, as well as to identify threats and opportunities for new biometric solutions in this area. First, the history of the biometrics market is presented. Acceptance patterns of new technologies are explored and modified. The authors used literature reviews, qualitative research (focus groups), and quantitative research (a questionnaire survey) as methods. The main value and importance of biometrics is the uniqueness of biometric patterns (e.g., face, fingerprint, iris, etc.), which takes the security of these systems to a new level. The results of the quantitative study, based on the qualitative survey, show positive verification of the hypothesized relationships: importantly, the age of potential users of biometric payments influences their fear about personal data; fear of losing personal data affects the perceived safety of biometric payments; and perceived security has a very strong influence on attitudes towards biometric payments, which is the strongest predictor of behavioral intention to use biometric payments.

**Keywords:** market research; biometrics; security; biometrics market; biometrics perception; structural equation modeling

#### **1. Introduction**

#### *1.1. Introduction to Biometrics*

Biometrics is the field that deals with measuring the characteristics of living organisms, including humans of course, but also animals and plants. At the basis of the application of biometrics in the field of authentication and identification lies a number of characteristics of living organisms, which allow the unambiguous distinction of an individual against the population. When choosing a particular biometric trait, we should be guided by the following criteria:


Biometric traits can be divided into physical/physiological and behavioral ones. Physiological features relate to the structure of individual parts of the body, e.g., the iris patterns of the eye, the shape of a hand or ear, a fingerprint, face geometry, or the shape of our veins. Behavioral traits are developed and established during the maturation process of the individual and are related to their behavior, e.g., the way they walk or the P300 brain wave (since brain waves have been proved to be unique enough across individuals to be used as

**Citation:** Mróz-Gorgoń, B.; Wodo, W.; Andrych, A.; Caban-Piaskowska, K.; Kozyra, C. Biometrics Innovation and Payment Sector Perception. *Sustainability* **2022**, *14*, 9424. https://doi.org/10.3390/su14159424

Academic Editors: Adam Jabłoński, Marek Jabłoński and Dariusz Zarzecki

Received: 15 April 2022 Accepted: 13 July 2022 Published: 1 August 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

biometrics), their handwritten signature, their typing rhythm (keystroking), or the characteristics of their voice (although the dual nature of this biometric is sometimes referred to). An example of a combination of both types of traits can be found in Figure 1.

**Figure 1.** Physiological (**left**) and behavioral (**right**) biometric traits (adapted from [1]).

Biometric security in the modern sense was born in the 19th century, thanks to the innovations of administrators, anthropologists, and French detectives [2]. The first known research publication on automated biometric recognition was Mitchell Trauring's 1963 paper [3] in the journal *Nature* on fingerprint matching. The development of automated biometric systems based on other features, such as voice, face, and signature, also began in the 1960s. Subsequently, biometric systems based on features such as hand geometry and the iris were developed. In this sense, almost 60 years have passed since the first paper on automated biometric recognition was published [4]. Figure 2 shows a timeline of the development of fingerprint recognition; other biometric traits were investigated in quite similar timeframes in the 19th century.

**Figure 2.** Some major milestones in the history of fingerprint recognition (see [4]).

Each biometric trait has its advantages, disadvantages, and limitations, and these should be considered when selecting it for a given usage scenario. Due to the aforementioned characteristics of biometric traits, especially uniqueness, time invariability, and unambiguity, they should be considered sensitive data and protected on multiple levels. The theft of raw biometric data can be used by an adversary to impersonate the victim and consequently gain unauthorized access or commit theft. The European Union, in its General Data Protection Regulation (GDPR), has addressed this issue, defined biometric data as particularly sensitive, and assigned to it the need for special protection [5]. Keeping the above in mind, raw biometric data should be treated with special care, which means primarily reducing its processing to a minimum. A fundamental solution to this issue is the use of so-called one-way processing [6], which generates from a sample of raw biometric data a template/profile/code that is an imprint of that data. Such a code still has individual properties and can be compared with others, so the idea of using biometrics is preserved. However, it is not possible to reconstruct the original biometric data from the code alone. This approach also protects the user in case the code database is leaked, because the codes are of negligible use in their transformed form, similar to the case of password hashes. An example of iris encoding according to Daugman's solution [7] can be found in Figure 3.

**Figure 3.** Iris code and accompanying iris pattern. Source: https://www.cl.cam.ac.uk/~jgd1000/iris_recognition.html (accessed on 14 April 2022), based on [7].

Referring to palmprints, biometric template protection methods, such as cancelable biometrics and biometric cryptosystems, are essential to avoid direct disclosure of the original palmprint features [8]. To strengthen user security, an approach based on so-called *cancelable biometrics* can also be used [9] in different types of biometrics. This solution is based on the ability to create multiple biometric identities from a single source of data and to manage these identities. In a particular case, we can delete a given biometric identity and create another one based on the same raw data. Such an application is particularly useful when the database of the system in which the identity was used has been leaked, or when we have a reasonable suspicion that someone is trying to impersonate that identity. When using the cancelable biometrics approach, we usually need an additional source of external information beyond the biometrics itself, so that we can modify the identity accordingly and combine it with the biometric data. Such information can be, for example, a string of characters entered by the user, such as a password or PIN. Another existing method of template protection (used in palmprint biometrics) is a *palmprint cryptosystem*, a merger of biometrics and cryptography that attempts to deploy biometrics as the authenticator of cryptographic applications, in which the biometric features are claimed to be protected [8].
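As a purely conceptual sketch of one-way, cancelable templates: real schemes use error-tolerant constructions (e.g., fuzzy extractors or BioHashing) rather than a plain salted hash, because two raw samples of the same trait never match bit-for-bit; the feature quantization and secret below are illustrative assumptions only.

```python
import hashlib

def cancelable_template(features: list[float], user_secret: bytes) -> str:
    """Toy one-way template: coarsely quantize feature values, then hash them
    together with a user-supplied secret (e.g., a PIN-derived salt).
    Conceptual only; not an error-tolerant biometric template scheme."""
    quantized = bytes(int(round(f * 10)) % 256 for f in features)
    return hashlib.sha256(user_secret + quantized).hexdigest()

features = [0.12, 0.87, 0.33]  # stand-in for extracted biometric features
t1 = cancelable_template(features, b"secret-1")
t2 = cancelable_template(features, b"secret-1")
t3 = cancelable_template(features, b"secret-2")  # revoked: new secret, new identity

print(t1 == t2)  # same trait + same secret  -> same comparable template
print(t1 == t3)  # same trait + new secret   -> unlinkable replacement template
```

The external secret plays the role of the user-entered password or PIN mentioned above: changing it "cancels" the old biometric identity without changing the underlying trait.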

No matter how much effort is put into securing biometric data, at some point an adversary will manage to gain possession of it and try to use it to impersonate a legitimate user. This type of action is called a presentation attack, and the defense is presentation attack detection (PAD).

To counter such forgery attempts, liveness tests [10] are used, which are the basis of biometrics applications. A general scheme of how a biometrics-based security system works is shown in Figure 4. Liveness tests are designed to detect attempts at substitution with artificial objects that simulate real biometric features (including stolen biometrics converted into fake samples), such as photos, masks, and recordings. Any security system based on biometrics should place great emphasis on liveness testing and enforce a high detection rate for substitution attacks. However, this is not an easy task, as adversaries have increasingly modern methods of fraud at their disposal and are often extremely determined to demonstrate system weaknesses.

**Figure 4.** Biometric enrollment and verification. The enrollment phase produces an association between a biometric characteristic and its identity. In the verification phase, an enrolled user claims an identity, which the system verifies on the basis of the user's biometric feature set [10].

Applications of biometric solutions can be found in many fields, the most noteworthy being:


Some of the abovementioned applications have already been addressed earlier and will be described in more detail later in this article. However, now we would like to focus on the last two applications that undeniably distinguish biometrics from other security solutions.

Biometrics, in contrast to standard authentication or authorization mechanisms (e.g., those based on the knowledge factor, such as a password or PIN, or on the possession factor, such as a hardware token), allows the introduction of continuous verification of the identity of the person using the system [11]. Thanks to the transparent collection of biometric data such as the rhythm of typing (keystroking), the way a touch screen is used, or the continuous analysis of the image from a camera, the system is able to verify, on an ongoing basis, whether an authorized person is still using its resources [12]. None of the standard mechanisms based on passwords or hardware keys provides such ease and efficiency of real-time verification while maintaining a high level of usability.

The second specific application of biometrics mentioned is the ability to create a biometric link between the person to whom the identity document has been issued and the person who is currently using it [13]. This is another layer of security that allows us to link an individual to a specific identity document through biometric characteristics. Without the biometric layer, we would only be able to verify that the data inside the chip of the identity document is consistent with the printed data or manually confirm the similarity of the photo with the person who holds the identity document. By using biometrics, we can not only verify the integrity of the data in the physical and electronic layers, but also apply an algorithm that authenticates biometric features and compares the biometrics of the person holding the document with the pattern recorded inside the electronic layer of the document. This solution will significantly reduce the effectiveness of counterfeiting identity documents and the use of documents by people to whom the documents have not been issued. Finally, let us look at iris biometry. The first smartphone with an LED, which allows for the scanning of the iris of the eye, has appeared on the market (so far only in Japan) as a form of camera protection. The Arrows NX F-04G phone by Fujitsu is sold by the Japanese telecommunication operator DOCOMO. The innovative LED is manufactured by OSRAM [14].

#### *1.2. Market of Biometrics*

Market forecasts indicate that the biometric systems market will be worth nearly USD 33 billion by 2022 [15]. The global biometrics market value will rise from USD 33 billion in 2019 to USD 65.3 billion in 2024 [16]. According to analyst firm Global Market Insights, the biometrics-based security solutions market will be worth USD 50 billion by 2024 [17]. From a total value of USD 23.4 billion in 2018, the global biometrics technology market is expected to reach USD 71.6 billion by 2024 [18]. By 2024, healthcare applications will register a compound annual growth rate (CAGR) of 26.3%, airport and seaport applications 25.8%, financial services 25.1%, and government services 23.3%. Retail, gaming, and hospitality applications will also see CAGR growth of 23% and 22.8%, respectively [18]. For information on stocks of companies using biometrics, see [19] (p. 3). Figure 5 represents the rise of the worldwide biometric technologies market [20].

The prominent key players in the biometric system industry are: SA (France), NEC Corporation (Japan), Fujitsu Ltd. (Japan), BIO-Key International, Inc. (U.S.), Precise Biometrics AB (Sweden), Secunet Security Networks AG (Germany), Thales SA (France), Aware, Inc. (U.S.), Cognitec Systems GmbH (Germany), and Cross Match Technologies (U.S.), among others [21].

Nearly six in ten people polled in the United States cited some hesitation or concern about biometric authentication. The top concerns among those polled by Statista included concerns about data and that the technology is too easy to fool. Biometric authentication includes fingerprints, face recognition, iris scanners, and voice recognition. The government end-user sector currently leads the market; it captured a significant market share of around 48% in 2018 [22]. North America led the global biometrics market in 2018 with a market share of around 30%, followed by Asia-Pacific and Europe. In North America, the requirements for biometrics are higher because of the rise in demand for developed security precautions and tourist administration after the 9/11 attacks. The North American biometrics market has witnessed strong growth over the years, especially in law enforcement, forensics, and government activities. Biometric passports became compulsory for the issuance of foreign passports as of 2016 in the U.S. One of the strongest laws regarding biometrics exists in Illinois, named the Biometric Information Privacy Act (BIPA), which forbids companies from collecting biometric information without prior consent from an individual. The Asia-Pacific is also a rapidly evolving region. The existence of developing economies such as India, China, South Korea, Japan, Malaysia, Singapore, and Australia, which are displaying increased acceptance of biometrics, is no doubt driving market growth [23]. The compound annual growth rate (CAGR) of the global biometric market is shown in Figure 6.

**Figure 5.** Worldwide biometric technologies market, source: [20].

**Figure 6.** Biometric market growth, source: [24].

#### *1.3. Biometrics in Poland*

The development of biometric technologies and popularization of their use in Poland has been very dynamic, as shown by numerous studies and reports. Poland is a receptive market for technological innovations, where large numbers of companies implement and experiment with the latest applications of biometrics.

A study by Visa [25] shows that as much as 62% of the surveyed Polish consumers declared their willingness to use biometrics instead of a password to verify payments. Consumers in Poland appreciate that, thanks to biometrics, they do not have to remember passwords or codes: 73% of the surveyed consumers thought that using biometric data was faster than passwords and 76% thought it was easier, while 92% of the Polish consumers who took part in the survey believed that fingerprint recognition was the safest form of payment authentication. According to the report *Global TMT Predictions 2018* [15] prepared by Deloitte, by the end of the year 29% of smartphone users will verify their identity by fingerprint, and 42% of all mobile devices will be equipped with a fingerprint reader. Currently, one in four (25%) participants of the Mastercard 2019 survey [26] uses biometrics (not only for authentication and payment), but if they had the option, every second one (50%) would use it. Forty-seven percent of Polish e-consumers would prefer biometric authentication for card payments, both online and in physical stores. Those who indicated a preference for this method would most willingly use fingerprint recognition, iris scanning, voice recognition, or facial feature analysis. Confirmation of a transaction with a fingerprint was considered safe by three out of four respondents (75%) in the Mastercard study. This was a higher result than in the case of one-time codes (66%). More than half of the respondents considered the technology of facial feature recognition (54%) or other kinds of biometrics (53%) safe.

The Polish payment market is developing very dynamically and is one of the most modern in Europe. In general, it is a society that relies on cashless payments [27].

A survey conducted by Ping Identity [17] shows that 92% of companies believe that the use of biometric methods is a very effective way to authenticate people and increase the security of data stored in company resources. It found that 86% of the respondents thought biometrics provided good security for access to information stored in the cloud. However, currently only 28% of companies use biometric systems in their local infrastructure, and 22% use them to secure access to applications and data stored in the cloud.

In 2019 MasterCard conducted research in the context of Polish consumers' attitudes towards online shopping, taking into account upcoming changes in e-commerce payments. The result of this work was a report called "Safe e-commerce". The authors prove that biometrics will become a standard for confirming identity in payments. Moreover, more than 75% of the respondents believed that strong online card payment authentication, which came into force in mid-September 2019, was needed, which clearly sets a new trend in banking [28].

Therefore, there is a need to popularize authorization mechanisms such as biometrics, which are a convenient and effective way to confirm identity [28]. Similar conclusions were drawn in the results of research conducted by MasterCard in 2019.

At the same time, the authors of [28] observed that technologies such as biometrics, e-identity services, or cashless payments are not something extraordinary for the respondents, but rather a desirable direction of development, providing greater convenience and usability of digital banking systems. The respondents showed great confidence in financial institutions and entrusted their data and money to them, so it is up to these institutions to ensure the highest possible protection of consumers' identity and accumulated assets.

PayEye (https://payeye.com/, accessed on 2 October 2020) is a Polish fintech that introduced the world's first secure, convenient, and complete payment system of this kind, covering both payment acceptance and user identification, using iris biometrics. By combining technology with science, PayEye has created a whole, independent, and secure ecosystem, which consists of proprietary, innovative eyePOS terminals, an electronic wallet for users, algorithms which convert the iris into a biometric pattern, and, in the future, also solutions for e-commerce.

#### *1.4. Literature Review of Technology Acceptance Models*

As mentioned previously in the paper, biometrics plays a crucial role in many innovative systems. Each innovation is subject to an implementation process and, as a result, is or is not accepted by market participants. Many researchers emphasize that diffusion is a social process that occurs among people in response to learning about an innovation, such as a new evidence-based approach for extending or improving health care. In its classical formulation, diffusion involves an innovation that is communicated through certain channels over time among the members of a social system [29,30]. Market practice aspects and the research paradigm known as the diffusion of innovation (DOI) [31] can be applied to the complex context of biometric identification processes and their use in payment systems.

Innovative biometric systems that incorporate biometric payments are rapidly becoming an important part of information technology (IT) and information systems (IS). The literature indicates that biometrics is becoming the standard of modern life, as commercial and governmental entities are rapidly adopting technology that promises increased security and better identification [32] (p. 314). Theoretical frameworks for technology acceptance are IS theories that model how users accept and use a particular technology. These theories suggest that when users are introduced to a new technology, many factors influence their decision about how and when they will use it [33].

It has been noted in the literature that the acceptance and use of information technology has been one of the priority issues in the research of information systems and practice since the late 1980s [34,35]. Building on the theory of reasoned action (TRA) formulated earlier by Fishbein and Ajzen [36], Davis [37] developed the technology acceptance model (TAM) and introduced it to the IS field. TRA has its roots in social psychology and attempts to explain why individuals engage in consciously intended behavior. In TAM, a user's motivation to adopt a new technology can be explained by three constructs: perceived ease of use (PEU), perceived usefulness (PU), and attitude towards using the system [38].

IS and IT are becoming increasingly complex and crucial for business operations, thus making the issue of acceptance an important challenge in IT implementation [33]. Many models and theories have been introduced that examine the acceptance and use of information systems from past to present. The unified theory of acceptance and use of technology (UTAUT) is a model that explains around 70% of the variance in the intention to use technology. It is also used to estimate the probability of success of a new technology and to evaluate the adoption of various technologies [39,40].

In the study of Venkatesh et al. [39], UTAUT comprises four main factors: performance expectancy (PE), social influence (SI), effort expectancy (EE), and facilitating conditions (FC). In addition, UTAUT includes four moderating individual variables, gender, age, experience, and voluntariness of use, which moderate the relationship between the primary factors and behavioral intention and use behavior. According to UTAUT, these determining factors directly affect intention or use in the models combined within the UTAUT framework. According to the literature review, the FC are empirically identified as a direct determinant of adoption behavior. These factors play a prominent role as direct determinants of user acceptance and usage behavior [40].

Part of this complexity of the acceptance issue in biometrics, especially in the context of payments, is the issue of security and privacy. Langenderfer and Linnhoff [32] in their work analyze the costs and shed light on how biometrics can negatively affect consumers. The authors point out that the rapid development of biometric authentication technology represents a double-edged sword for consumers. On the one hand, increased use of biometrics is likely to reduce identity theft, improve consumer convenience by eliminating or reducing the use of passwords, and lower prices by reducing fraud costs for retailers. On the other hand, while overall security is likely to be enhanced, security breaches will be more costly and require significantly more effort to remedy.

The level of security perception in the context of biometrics, as a matter of the individual characteristics, is strongly connected with the privacy issue. A wealth of existing theoretical work has suggested that privacy levels, along with privacy perceptions, regulation behaviors, and information disclosure, are inherently context-dependent and vary across situations [41,42]. As Masur [41] (p. 312) points out, "privacy is a subjective perception resulting from the characteristics of the environment in which an individual happens to be at a given time".

It is also important to emphasize that research in IS has investigated the differences in levels of privacy concerns and their impact on a number of dependent variables such as willingness to provide information and intention to transact online [42–44]. Smith et al. [45], in their interdisciplinary review of privacy research, summarized existing privacy research into the antecedent–privacy concern–outcome (APCO) framework of information privacy, with privacy concerns as the central element, accompanied by antecedents and outcomes. Scientists also suggest that further research on the identification of the factors that contribute to privacy concerns is essential.

Several antecedents of privacy concerns have been found by Li [46] in the process of systematically reviewing existing empirical studies on privacy. The list of factors contains: (a) individual factors (demographics, personality traits, knowledge and experience, self-efficacy), (b) social factors (e.g., social norms), (c) organizational factors (privacy policies, website informativeness, company reputation), (d) macro-environmental factors (culture, regulatory structures), and (e) information contingencies (information sensitivity, type of information) [43,46,47]. Li [46] points out that for some factors (e.g., privacy experiences having a positive impact on privacy concerns), results have been cross-validated across studies, while for others (e.g., internet use and fluency and the big five personality traits), results have been inconsistent. Therefore, the researchers indicate that it is essential to conduct further research to examine the impact of different antecedents on privacy concerns [42].

Drawing on elements of DOI, the technology acceptance model (TAM), and a unified theory of acceptance and use of technology (UTAUT) along with the trust−privacy research field, Miltgen et al. [33] proposed an integrated approach that is both theoretically and empirically grounded. Their study examines individual acceptance of biometric identification techniques in a voluntary environment, measuring the intention to accept and further recommend the technology resulting from a carefully selected set of variables (Figure 7).

Research [33] confirms, first, the influence of known technology acceptance variables, such as compatibility, perceived usefulness, and facilitating conditions, on the acceptance of biometric systems and subsequent recommendations. Second, antecedent factors such as privacy concerns, trust in technology, and innovativeness also prove to be influential. Third, apart from innovativeness, the most important factors explaining the acceptance and recommendation of biometric systems do not come from the traditional adoption models (TAM, DOI, and UTAUT) but from the trust and privacy literature (trust in technology and perceived risk).

Miltgen et al. [33] in their paper pointed out that there are many other external factors that may influence responses that should be considered and investigated in the future, such as: 'security perceptions of users of biometric systems', 'consumer characteristics', 'situational factors', 'product characteristics', and 'previous experiences'. The authors suggest that additional future research should investigate these 'other' factors and their impact on consumers' behavioral intentions to accept new technologies in general and biometrics in particular.

On the basis of literature studies (both scientific literature and a review of journals, magazines, market reports), a research gap was identified. This gap concerns the need for further exploration of consumer attitudes towards biometrics, with particular emphasis on the use of iris biometrics in payment systems.

**Figure 7.** Determinants of end-user acceptance of biometrics, integrated approach model—see [33] (p. 106).

#### **2. Materials and Methods**

#### *2.1. Methodology of the Qualitative Research*

After reviewing the literature, we decided to use a qualitative research method to explore the research topic from the perspective of a defined scientific gap.

The aim of the study was to analyze current customs and opinions regarding payment methods, as well as to identify threats and opportunities for new biometric solutions in this area. Based on the study of the literature, as well as the authors' own observations regarding the biometric market and its participants, we formulated the following research questions:


As the method of our qualitative study, we chose the FGI (focus group interview)—a well-known technique for collecting data in the social sciences, which consists of conducting collective in-depth (semi-structured) interviews in groups of 4–9 people (depending on the research area and organizational capabilities). We decided to focus on the newest form of biometric usage, iris-based biometrics (for now available only in Poland as a pilot project). We chose this particular case because of its innovative nature and the resulting freshness of opinions. A total of four focus group meetings were conducted (November–December 2019), the shortest of which lasted 2.5 h and the longest almost 4 h. The participants came from different localities (both residents of small towns and big cities) and different age groups (the youngest was 17 and the oldest 74).

Focus I—4 women and 5 men; Focus II—3 women and 3 men; Focus III—5 women; Focus IV—4 women and 3 men.

In total, the gender composition across the focus groups was 16 women and 11 men, 27 people altogether. They represented different professions and lifestyles—high school students, housewives, secretarial managers, accountants, office workers (4 participants), salespeople, lawyers, and entrepreneurs (3 participants ran businesses). All groups were surveyed without being shown any type of biometric device (no biometric payment tools).

According to the focus group participants, there were no apparent flaws in the current payment system (notably, this generation uses cards and smartphones). There was enthusiasm about using smartphones as a payment option, especially among the younger generation. All participants stressed that biometric solutions are associated with risks of data leakage, health risks, "data hacking", etc. The most frequent emotionally marked word used in relation to eye biometrics during the focus research was "fear".

After the focus studies conducted without a demonstration of a biometric (iris-based) device, two focus studies with a demonstration of an identification device were conducted. After observing the first set of focus groups, we decided to divide the subsequent groups by gender, as we noticed that, due to the stereotype of men having wider knowledge of technological issues, women kept their opinions to themselves and were not very open to sharing their thoughts. For this reason, the first meeting was women-only. These women represented different ages and professional categories:


The age range of the participants of this focus group was 21–56 years.

The first phase of the study was similar to the previous meetings: the issues of making payments and of stressful, annoying, and unpleasant payment situations were discussed (mainly queue times, other customers breathing down one's neck, lack of hygiene whether using cash or card, "system jams", "internet crashes", and similar problems).

Participants declared that they mostly paid by card (plastic or smartphone) or cash. Another element of the discussion was biometrics and participants' approaches to the use of biometric solutions in payments. In this area, there was a fairly strong element of doubt: participants associated biometric solutions with gaining access to their accounts, a lack of security surveillance, and the possibility of copying fingerprints. Importantly, more than half of the participants stated that they did not use biometric features at all, usually paid by card or cash, and used a code to identify themselves on their laptop or smartphone.

In the next part of the study, biometric equipment was presented. After a brief presentation, participants were encouraged to "encode" their eye on the device, which they were rather reluctant to do and even stressed about whether something could happen to their eyes.

In the next round, participants were able to check how the equipment recognized their iris. This step was welcomed more positively, although it must be said it was not met with great enthusiasm.

The final element was a discussion of other possible uses of the equipment. It is worth noting that participants strongly emphasized the health aspect—concerns were raised about the impact of eye scanning on health.

After the study with the female group, only male participants were intentionally invited to the second meeting. The format of the second focus group (apart from the different participants) was identical to that of the female group, including the presentation of the device. A total of eight participants, aged 21–54, representing different professions and industries, took part in the second study:


There were noticeable differences in the observations in this male group compared to the female group, the most important being:


#### *2.2. Methodology of the Quantitative Research*

A survey was conducted in order to verify and extend the main results of the qualitative research and the observed dependencies. A questionnaire was prepared during a brainstorming session based on the results of the qualitative research and the authors' own observations, and a pilot survey was used to verify it. The sample size was planned for a minimum of two hundred respondents, and social platforms were selected as the distribution channel for the questionnaire. For the pilot study, the respondents were not selected randomly; rather, the aim was to include representatives of working age in the sample. A list of the analyzed questionnaire items is presented in Table A1 in Appendix A.

The statistical analysis was based on measures of dependence between the variables: mainly on the correlation matrix, and also on cross tables, e.g., for binary variables. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were used for the groups of questionnaire items, and for the variables grouped by EFA, a reliability analysis was performed using the Cronbach's alpha coefficient. Dependencies between constructs based on a linear correlation matrix were analyzed using path analysis within a structural equation model (SEM) to verify the significance of direct and indirect causality.

Given the research area on which this article focuses (biometric innovations in the payment sector), the model of hypothetical dependencies was based on the analysis of the literature, including the UTAUT model presented above, as well as on the qualitative research conducted by the authors. This research procedure made it possible to derive and define a new, proprietary model of hypothetical dependencies, which is presented in Figure 8. Our own observations and the qualitative research on the perception of biometric payments using the iris of the eye (BP) allowed us to prepare a set of hypotheses that form the acceptance model of BP verified in this study. The observation about the strong influence of consumer age on BP acceptance reflects the fact that age is the main exogenous variable in the first stages of diffusion of this type of innovation; thus, age is not only an impact moderator but also a primary predictor. The main target variable is the behavioral intention to accept and use BP (but not yet to recommend BP, due to the novelty of the technology), as described in the literature models.

**Figure 8.** A Model of the verified hypotheses of dependencies and characteristics measured in qualitative research: model of behavioral intention to accept BP.

During the process of discussion, we formulated direct causes between variables, marked with arrows in Figure 8. The directions of the relationships between the variables of the model are presented in Table 1 together with the content of the hypotheses.

**Table 1.** Set of hypotheses in verified model.




The measurements of variables were constructed from the questions presented in Table A1 in Appendix A. Predicted behavioral intention to accept and to use BP was measured as the sum of the binary items numbered 18 and 19a–19d from Table A1. Three variables were measured with only one question:


The three variables from the UTAUT model (perceived ease of use, perceived usefulness, and facilitating conditions) were combined during the preparation of the questionnaire into a single latent variable called perceived use and facilitating conditions, measured as the sum of items 21–26 from Table A1. Importance of innovative payments was measured by the sum of items 3 and 4; this covers not only biometric payments in their initial use in society (in real shops), but also mobile payments (e.g., codes generated in banking applications, used mainly in online shops), in contrast to traditional cash payments and the very popular card payments. Fear about personal data was measured as the sum of items 10–12 from Table A1 and is understood as the concern about data privacy from Figure 7. Fear of barriers in life was measured as the sum of items 13–17, understood as social problems (the inverse of social influence from the UTAUT model, or of social compatibility from the DOI theory shown in Figure 7). Knowledge and experience was measured as the sum of the binary items 5 and 7a–7d and reflects respondents' acquired knowledge of BP. While constructing the model shown in Figure 8 and discussing what is cause and what is effect, the directions of the hypotheses shown in Table 1 were decided. Compared with the model shown in Figure 7, the only element missing from the model in Figure 8 is the perceived innovativeness of biometrics, which was self-evident for these payments. Statistical software Statistica 13.3 [48] was used to analyze the collected survey data, to assess the measurement of latent variables, and to verify the hypotheses.
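The composite measurement described above — each construct scored as the sum of its questionnaire items — can be sketched as follows. The item keys and groupings mirror the Table A1 numbering as described in the text, but the respondent data shown is hypothetical, invented purely for illustration.

```python
# Mapping from constructs to questionnaire item keys, mirroring the
# groupings described in the text (Table A1 numbering).
CONSTRUCT_ITEMS = {
    "behavioral_intention":  ["18", "19a", "19b", "19c", "19d"],  # binary items
    "perceived_use_and_fc":  [str(i) for i in range(21, 27)],     # items 21-26
    "importance_innovative": ["3", "4"],
    "fear_personal_data":    ["10", "11", "12"],
    "fear_barriers":         [str(i) for i in range(13, 18)],     # items 13-17
    "knowledge_experience":  ["5", "7a", "7b", "7c", "7d"],       # binary items
}

def composite_scores(answers):
    """Sum one respondent's item scores into construct scores.

    answers: dict mapping item key -> numeric response.
    Missing items are treated as 0 here for simplicity; a real
    analysis would handle missing data explicitly.
    """
    return {construct: sum(answers.get(item, 0) for item in items)
            for construct, items in CONSTRUCT_ITEMS.items()}

# One hypothetical respondent: binary intention items plus Likert fear items.
respondent = {"18": 1, "19a": 1, "19b": 0, "19c": 1, "19d": 0,
              "10": 4, "11": 5, "12": 3}
scores = composite_scores(respondent)
```

For this hypothetical respondent, the behavioral-intention score is 3 (three of five binary items endorsed) and the fear-about-personal-data score is 12.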

#### **3. Results**

#### *3.1. Qualitative Research*

From a consumer's perspective, biometric authentication offers many advantages. Once enrolled in a biometric system, the customer is instantly untroubled by the fraudulent use of their credit cards. Payments can be easily made without carrying any cash or other forms of identifiers and in this case the only thing required is their fingerprints. They can be certain that if their car or computer is stolen, it will be worthless to all except the most sophisticated thieves since access is biometrically controlled; in consequence this leads to a decrease in the impetus for theft. The bothersome task of remembering passwords could be considered a thing of the past.

As the general conclusions of the preliminary qualitative studies carried out, we point out the skeptical approach to biometric solutions in payment systems.

It is necessary in the future to indicate market concerns about novelties, but to seek innovation to assuage numerous consumer concerns about the introduction of biometric applications with important implications for marketing communication associations with nature, simplicity, and naturalness. There is a need to promote biometric solutions in the educational form.

#### *3.2. Quantitative Research*

Following the qualitative research, results were collected from the questionnaire. Most analyses (EFA, CFA, reliability analysis, SEM) were based on a linear correlation matrix in which most values are significantly different from zero (not reproduced here because of its size, but available upon request from the corresponding author). The number of observations was 200, which is not a very large number for verifying such complex hypothesis models. The age of the respondents was the only objective variable analyzed; the others were subjective or behavioral variables. The mean age was 28.7 years, with a standard deviation of 9.2. The respondents were rather young: the minimum age was 17 years and the maximum was 61 years, so the sample included representatives of almost the entire working-age range.

Factor analysis of variables with Likert-type response scales confirmed the three measurement scales for fear concerning personal data, fear of barriers in life, and perceived use and facilitating conditions developed from the questionnaire items in Table A1; e.g., the EFA scree plot reduced the dimensions to three and the CFA fit was rather good.

Following the strict methodology of Song et al. [49–52], before proceeding to testing the hypotheses H1–H9, we checked the reliability of scales and measurement items.

#### *3.3. Reliability and Validity*

As a measure of reliability, that is, of the internal consistency of the measurement items of the survey, we used Cronbach's alpha, as given in Table 2. All the values were above the 0.7 threshold (the minimum value was 0.830); that is, the scales may be regarded as reliable.
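The Cronbach's alpha used here is the standard ratio of summed item variances to total-score variance. A minimal sketch of the computation, using invented 5-point Likert responses (not the study's data), might look like this:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a set of item-score columns.

    items: list of k lists, each holding one item's scores
           for the same n respondents.
    """
    k = len(items)
    n = len(items[0])

    def variance(xs):
        # Population variance, as in the classical formula.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(variance(col) for col in items)             # sum of item variances
    totals = [sum(col[i] for col in items) for i in range(n)]   # total score per respondent
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 5-point Likert responses (three items, six respondents):
scores = [
    [4, 5, 3, 4, 5, 2],   # item 1
    [4, 4, 3, 5, 5, 2],   # item 2
    [5, 5, 2, 4, 4, 3],   # item 3
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))   # prints 0.879 -- above the 0.7 threshold
```

With real survey data, the same computation would be run once per construct, each over that construct's items.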


**Table 2.** The internal consistency of the measurement items.

We also investigated the correspondence between the constructs and their operationalization. This comprises four components: analysis of unidimensionality, convergent validity, discriminant validity, and nomological validity.

To investigate the unidimensionality of the scale, we performed CFA to examine whether the indicators were assigned to the constructs adequately. Using the maximum likelihood method of estimation, we obtained a satisfactory result, including the fit indices presented in Table 3. The chi-square/d.f. ratio should range between 1 and 5, and our result fell well within that range. GFI and AGFI should exceed 0.9; the latter was slightly below this threshold in our research. RMSEA should range between 0.05 and 0.08, and our value fitted well. SRMR, which should be below 0.08, equaled 0.054, below the threshold. The incremental fit indices NFI (normed fit index), IFI (incremental fit index), TLI (Tucker–Lewis index), and CFI (comparative fit index) were above the 0.9 threshold, apart from NFI, which was slightly below it. Overall, we regard the result of this investigation as satisfactory for accepting the unidimensionality of the scale.
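The threshold rules just listed can be encoded as a small check. The cutoffs are the conventions described in the text, not exact rules, and the sample values below are hypothetical, chosen only to echo the pattern reported (SRMR = 0.054 passes; AGFI and NFI fall slightly short):

```python
def check_fit(indices):
    """Return a dict mapping each fit index to True/False depending on
    whether it meets its conventional acceptance threshold."""
    checks = {
        "chi2_df": lambda v: 1.0 <= v <= 5.0,   # chi-square / d.f.
        "GFI":     lambda v: v > 0.9,
        "AGFI":    lambda v: v > 0.9,
        "RMSEA":   lambda v: v <= 0.08,          # text: should fall in 0.05-0.08
        "SRMR":    lambda v: v < 0.08,
        "NFI":     lambda v: v > 0.9,
        "IFI":     lambda v: v > 0.9,
        "TLI":     lambda v: v > 0.9,
        "CFI":     lambda v: v > 0.9,
    }
    return {name: checks[name](value)
            for name, value in indices.items() if name in checks}

# Hypothetical index values (only SRMR is taken from the text):
result = check_fit({"chi2_df": 2.1, "GFI": 0.93, "AGFI": 0.88,
                    "RMSEA": 0.06, "SRMR": 0.054, "NFI": 0.89,
                    "IFI": 0.92, "TLI": 0.91, "CFI": 0.92})
```

Such a summary makes it easy to see at a glance which indices support the model and which fall marginally short, as happened with AGFI and NFI here.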


As for convergent validity, which we present in Table 4, standardized factor loadings should be above the 0.5 threshold, and all were well above it (the minimum was 0.678), while AVE (average variance extracted) should also be above 0.5, and within our research all values were above this. Thus, we may conclude that the convergent validity is acceptable.

**Table 4.** Correlation matrix of the constructs.


Note: \*\*\* *p* < 0.001.

For discriminant validity, we calculated the correlation coefficients between constructs (presented in Table 4). Squares of those values should not exceed the minimum AVE. The only statistically significant coefficient of correlation, 0.456, was low enough and its square (0.208) was much lower than the minimum AVE (0.569). Thus, discriminant validity is satisfactory.
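This is the Fornell–Larcker style criterion: each squared inter-construct correlation must stay below the smallest AVE. A minimal sketch, using the two numbers reported above:

```python
def discriminant_ok(correlation, min_ave):
    """Fornell-Larcker style check: the squared correlation between two
    constructs should not exceed the minimum AVE of the constructs."""
    return correlation ** 2 < min_ave

# Values reported in the text: correlation 0.456, minimum AVE 0.569.
r, ave_min = 0.456, 0.569
squared = round(r ** 2, 3)          # 0.208, well below 0.569
print(discriminant_ok(r, ave_min))  # prints True
```

In a full analysis this check would be repeated for every pair of constructs in the correlation matrix of Table 4.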

Nomological validity refers to possible collinearity and mutual dependencies of constructs. As the highest correlation coefficient was not too high, we did not expect this effect; still, we calculated variance inflation factors (VIF) to check whether they were below the commonly used threshold of 10. All values were well below it (between 1 and 2); thus, we conclude that the nomological validity of our research is acceptable.
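One convenient way to obtain all VIFs at once is the identity that each predictor's VIF equals the corresponding diagonal element of the inverse of the predictor correlation matrix. The sketch below uses a hypothetical correlation matrix (not the study's) with mild correlations, giving VIFs between 1 and 2 as reported in the text:

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment with the identity matrix.
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pivot on the largest absolute value for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(corr):
    """VIF of each predictor = corresponding diagonal element of the
    inverse of the predictor correlation matrix."""
    inv = invert(corr)
    return [inv[i][i] for i in range(len(corr))]

# Hypothetical correlation matrix for three mildly correlated constructs:
R = [[1.0, 0.3, 0.2],
     [0.3, 1.0, 0.4],
     [0.2, 0.4, 1.0]]
print([round(v, 2) for v in vifs(R)])   # prints [1.11, 1.27, 1.2]
```

All values here sit far below the commonly used threshold of 10, matching the pattern described for the study's constructs.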

The reliability coefficient for behavioral intention to accept and to use BP was 0.719 and was sufficient (the Kuder–Richardson Formula 20 (KR-20) for binary variables is equivalent to Cronbach's alpha). However, the reliability of the knowledge and experience measure was insufficient, as the KR-20 was equal to 0.419 for the sum of items 5 and 7a–7d from Table A1 and 0.489 for the sum of items 5–6 (overall experience with BP). A new research question arises: is it possible to combine the two characteristics, (1) knowledge and (2) experience, into one variable? Inferences about knowledge and experience may be distorted by random measurement error. The reliability of the importance of innovative payments measure was also insufficient: the KR-20 was equal to 0.448, so conclusions about this variable may be biased by random error.
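KR-20 replaces the item variances in Cronbach's alpha with the binary-item variances p(1-p). A minimal sketch with invented 0/1 responses (not the study's data):

```python
def kr20(items):
    """Kuder-Richardson Formula 20 for binary (0/1) items;
    equivalent to Cronbach's alpha for dichotomous data.

    items: list of k lists, one per item, scores 0 or 1,
           aligned across the same n respondents.
    """
    k = len(items)
    n = len(items[0])
    pq = 0.0
    for col in items:
        p = sum(col) / n       # proportion of respondents scoring 1
        pq += p * (1 - p)      # binary-item variance
    totals = [sum(col[i] for col in items) for i in range(n)]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n  # population variance of totals
    return (k / (k - 1)) * (1 - pq / var)

# Hypothetical binary responses (three items, six respondents):
binary = [
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 1],
]
print(round(kr20(binary), 3))   # prints 0.585 -- below the 0.7 threshold
```

A value like this, below 0.7, illustrates the kind of insufficient reliability reported above for the knowledge and experience measure.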

The hypotheses in Table 1 were verified in a structural equation model, and its parameters together with the conclusions regarding the hypotheses are presented in Table 5. The fit of this model was not sufficient (SRMR = 0.176, RMSEA = 0.175, GFI = 0.843, AGFI = 0.668, NFI = 0.650, CFI = 0.670), so this complex model should be improved or simplified. However, the estimated parameters and their *p*-values when tested equal to zero are reason to make preliminary inferences about the hypotheses that make up the model shown in Figure 8.

Most of the hypotheses have been positively verified (*p*-value less than 0.05), some of them forming a cause and effect sequence: e.g., age had a positive effect on fear about personal data. Fear about personal data had a negative effect on perceived safety. Perceived safety had a very strong positive effect on attitude towards BP, which was the strongest predictor having a positive effect on behavioral intention to accept and to use BP. Hypotheses in which the *p*-value of the estimated parameter was very close to the significance level of 0.05 were classified as "almost verified". Three hypotheses were rejected due to the insignificance of the parameter estimate (not significantly different from zero). The rejection of H3a is very interesting because the sign of the estimated parameter was opposite to the hypothesized one—a reason to investigate the relationship between the measured variables. Positive verification of the hypotheses can also provide a basis for recommendations on how to manage BP to gain greater acceptance and use in society, alleviating various concerns and balancing them with utility and facilitating conditions.


**Table 5.** Estimated SEM parameters, their *p*-values, and conclusions about hypotheses.

#### **4. Discussion and Conclusions**

Respondents have begun to understand the need to use biometrics, trust it more, and appreciate the benefits it brings. It can be inferred from the responses that users feel the need to increase the level of security. The most common indications were greater use of biometrics, such as fingerprints and iris scans. Such solutions inspire confidence in respondents, regardless of their age and experience in e-banking; this was indicated by both younger and older people. Some of the interviews also sketch the image of a person for whom convenience is definitely more important than security. Such a person would gladly give up, for example, confirming actions with SMS codes. This opens the path for the popularization of biometric solutions [53]. The connection of new emotional aspects to BP is similar to the connection of psychological aspects to TAM and UTAUT, as presented in the work of Koufaris [54] on online shopping.

Recent studies in the BP sector show a similar importance of the attitude variable, shaped by other predictor variables, as in the study of behavioral intention to use BP reviewed by Moriuchi [55]. The attitude variable plays a similarly important role in the research conducted by Rosén et al. [56]. In the research model used by Zhang and Kang [57], perceived usefulness plays a similar mediating role in predicting intention to use BP, but safety is also very important, as are concern for personal information and perceived safety in the research model verified in this article. The inverse of safety (perceived risk) also plays an important role in the BP model verified by Liu and Tu [58]. In the work of Hizam et al. [59], social influence and perceived system quality are added to TAM as predictors—good functioning of BP systems is also captured by perceived use and facilitating conditions, as well as fear of barriers, in the model verified in this paper.

Our study has many limitations with respect to generalizing the results—the sample is only from Poland and is rather too small to verify such a complex model of behavioral intention to use BP. New analyses could be conducted on the basis of these data, e.g., investigating the influence of gender on emotional variables such as fear about personal data or perceived safety, or their moderating role in predicting intention to use BP. Future research is planned on a more representative random sample of working-age members of the public (the average age is likely to be much higher), with a minimum of one thousand respondents. The complex model will also be simplified, and further analysis should be applied so that the model with direct and indirect causes is better suited to predicting intention to use BP. The main results of this pilot study should be verified in new analyses. The measurement of some constructs (especially those without sufficient reliability) could be improved by using measurement scales tested in the literature. Biometric data is so important that consumers are very often unsure whether they can terminate their agreed use of BP and remove their own data from the consumer database. Compared to other technologies, biometrics combines technological innovation with biology and is in this respect similar to medical technologies. Fear about personal data, as one of the important variables, can be the basis of a cluster analysis of potential consumers of BP, carried out to position the BP market within the BP-open part of society. Conspiracy mentality [60] should also be measured as one of the reasons for avoiding BP. Well-measured general openness (or curiosity) towards new and innovative technologies could also be added to the models. The research procedure made it possible to derive and define a new, proprietary model of hypothetical dependencies, presented in this paper; this model, along with the analysis and systematization of knowledge of the biometric market and the innovative research undertaken in the field of the payments market using biometrics, should be considered the main contribution of this article.

**Author Contributions:** Conceptualization, B.M.-G., C.K. and W.W.; methodology, C.K. and B.M.-G.; software, C.K.; validation, C.K.; formal analysis, C.K. and B.M.-G.; investigation, B.M.-G., A.A. and W.W.; resources, B.M.-G., W.W. and A.A.; data curation, B.M.-G., A.A. and C.K.; writing—original draft preparation, W.W., B.M.-G. and C.K.; writing—review and editing, B.M.-G., W.W., A.A., K.C.-P. and C.K.; visualization, B.M.-G., A.A. and C.K.; supervision, B.M.-G.; project administration, K.C.-P. and B.M.-G.; funding acquisition, B.M.-G. All authors have read and agreed to the published version of the manuscript.

**Funding:** The APC was subsidized by the Ministry of Science and Higher Education.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The dataset presented in the study is available upon request from the corresponding author.

**Acknowledgments:** The authors would like to thank cooperating students at the University of Wrocław for their help in the process of quantitative research procedure and members of "proMOTION" students marketing association at the Faculty of Management, Wroclaw University of Economics and Business, for their technical support in organizing the qualitative research.

**Conflicts of Interest:** Although B.M.-G., A.A. and W.W. cooperate with PayEye, they are not employed by PayEye, nor do they have any shares in the company. In addition, care has been taken to ensure that all data and information contained in the text do not raise conflicts either on legal or ethical grounds.

#### **Appendix A**


**Table A1.** Questionnaire items used in quantitative analysis.



#### **References**

