An Information Management Model for Addressing Residents’ Complaints through Artificial Intelligence Techniques

Bazzan, Jordana; Echeveste, Márcia Elisa; Formoso, Carlos Torres; Altenbernd, Bernardo; Barbian, Márcia Helena

doi:10.3390/buildings13030737


by Jordana Bazzan ¹,*, Márcia Elisa Echeveste ¹, Carlos Torres Formoso ¹, Bernardo Altenbernd ² and Márcia Helena Barbian ²

¹ Postgraduate Program in Civil Engineering: Construction and Infrastructure (PPGCI), Universidade Federal do Rio Grande do Sul, 99 Osvaldo Aranha St., Porto Alegre 90035-190, Brazil

² Postgraduate Program in Statistics, Universidade Federal do Rio Grande do Sul, 9500 Bento Gonçalves St., Porto Alegre 91509-900, Brazil

* Author to whom correspondence should be addressed.

Buildings 2023, 13(3), 737; https://doi.org/10.3390/buildings13030737

Submission received: 31 January 2023 / Revised: 2 March 2023 / Accepted: 5 March 2023 / Published: 11 March 2023

(This article belongs to the Topic Advances in Intelligent Construction, Operation and Maintenance)


Abstract

Construction companies usually record customer complaints as unstructured texts, resulting in unsuitable information to understand defect occurrences. Moreover, complaint databases are often manually classified, which is time-consuming and error-prone. However, previous studies have not provided guidance on how to improve customer complaint data collection and analysis. This research aims to devise an information management model for customer complaints in residential projects. Using Design Science Research, a study was undertaken at a Brazilian residential building company. Multiple sources of evidence were used, including interviews, participant observations, and analysis of an existing database. Natural language processing (NLP) was used to build a word menu for customers to lodge a complaint. Moreover, a recommendation system was proposed based on machine learning (ML) and hierarchical defect classification. The system was designed to indicate which defects should be investigated during inspections. The main outcome of this investigation is an information management model that provides an effective classification system for customer complaints, supported by artificial intelligence (AI) applications that improve data collection, and introduce some degree of automation to warranty services. The main theoretical contribution of the study is the use of advanced data management approaches for managing complaints in residential building projects, resulting in the combination of inputs from technical and customer perspectives to support decision-making.

Keywords: complaint; defect; Natural Language Processing; residential building; machine learning

1. Introduction

The high incidence of defects is a recurring problem in residential building projects, causing rework, cost overruns [1,2,3], and customer complaints [4,5]. One way of reducing building defects is to produce useful information about their occurrence and causes [4]. In fact, customer complaint data collected by residential building companies during the defect liability period (DLP) represent a rich and inexpensive source of information [6]. The DLP is the period during which the companies are accountable for product defects [7]. This period varies according to the laws of each country, e.g., between 12 and 24 months in the UK [8], and between 6 months and 6 years in Australia [9]. In Brazil, the Consumer Protection Code [10] and the Building Performance Standard [11] have established a legal framework that supports the lodgment of customer complaints about quality concerns during the DLP. As a result, Brazilian customers have become more aware of their rights in demanding repairs from residential building companies. Within this framework, the DLP depends on the building component involved, but it is usually limited to five years after the handover to occupants [11]. Despite the differences in legislation, the negative effect of building defects on the residential building market is similar: such defects damage a company’s reputation if not rectified properly and timeously [3].

Most studies on the use of complaint data have been developed in connection with facilities management systems, usually for non-residential buildings, such as commercial, hospital, or educational buildings. In general, such buildings are managed by a facilities management company [12], and sometimes computerized maintenance management systems are used to capture problems regarding building malfunctions [13,14].

However, studies on customer complaints related to residential buildings are relatively scarce in the international literature. The use phase of residential building projects is often managed by the dwellers themselves or by a manager chosen by them, and not by an organization that has expertise in managing maintenance activities [6]. Customer complaints are usually reported directly to residential building companies, after which an inspection is conducted to investigate the problem [15]. Repairs are carried out if the liability period covers the defect. Afterwards, data on the repair service should be collected and feedback should be given to different stakeholders, such as designers, construction managers, and supply chain managers [6]. A major problem in this process is that customer complaint data are usually stored as unstructured text, and manual input performed by people without a technical background often results in low-quality databases. Another consequence is that problem investigation can take a long time because only poor information is available.

Some previous studies on customer complaints have explored manual or computerized categorization and analysis of complaints from large databases. However, knowledge generation from those studies was limited by the existing data [5]. For instance, Peng et al. [16] used 5000 complaint records related to an airport, but only half of them were considered valid for analysis. This data problem is also recurrent in complaints recorded by residential building companies during the DLP [6,15,17]. Brito et al. [6] analyzed 6956 customer complaint records from low-income housing projects in Brazil, but data limitations did not allow the causes of the most frequent problems to be fully investigated.

Considering that data-driven decision-making has become essential in most industries, including construction [18,19], it is necessary to improve data collection by adopting new methods and advanced approaches for data management. Natural language processing (NLP) and machine learning (ML) algorithms are examples of artificial intelligence (AI) solutions that can bring benefits for problems related to customer complaint data. NLP and ML can extract and interpret valuable information from unstructured texts at a speed not achieved by human analysis [20], reducing the effort required to process and categorize complaint data.

Some previous studies have attempted to use these technologies to support facilities management [12,14,21,22,23]. For instance, Bortolini and Forcada [14] and Gunay et al. [22] have used NLP to evaluate building conditions. McArthur et al. [21] adopted ML to classify maintenance work orders of university buildings. Nevertheless, none of those studies explored NLP and ML to improve data collection. Thus, the focus of previous studies has been on generating metrics by applying AI solutions to process and analyze data from existing building defect databases.

The aim of this study is to develop an information management model of customer complaints for residential building projects. This information model includes a hierarchical defect classification system and AI applications for devising word menus to improve data collection from customers and for defect recommendation. The purpose of the model is to introduce some degree of automation and improve the quality of customer complaint data collection and structuring. Such changes could increase customer satisfaction with warranty services and support knowledge generation for quality management systems.

Regarding AI applications, an NLP tool was devised to input complaints by customers. This tool has a set of words from which customers can choose to make a complaint so that a structured and comprehensive record can be produced. Moreover, ML algorithms were trained to classify the records and recommend to warranty service teams the type of problem that needs to be investigated. However, the main contribution of this investigation is not to the development of AI applications, as these are well known, but rather to the application of these technologies to an information model for receiving and processing customer complaint data in the residential building sector.

This paper is organized into seven sections, including this introduction. The literature review sections cover concepts about building defects, and present ML and NLP techniques. Afterwards, the research method describes the methodological approach, and the steps and sources of evidence used in the development of the model. The results section presents existing customer complaint services and then describes the application of the proposed model. Section 6 evaluates the solution, pointing out its potential benefits and limitations. The final section summarizes the contributions of the investigation and presents suggestions for further research.

2. Building Defects

As customer complaints recorded during the DLP may result from poor product quality, it is important to point out concepts that can be used to devise a consistent building defect classification system. Hence, a brief literature review of these concepts is presented in this section.

Several terms are used to describe the lack of quality in buildings, such as failure, defect, non-compliance, or building pathology. Non-compliance is a term used in the ISO 9000 standard [24] to define the non-fulfilment of a requirement related to an intended or specified use. Atkinson [25] distinguishes between failure and defect: a failure is a deviation from good practice, which may or may not be corrected before project delivery, whereas a defect is a performance deficit that manifests itself once a building is in operation. Finally, the term pathology, borrowed from the field of medicine, is also commonly used. Building pathology is an area of study concerned with the causes and mechanisms of the occurrence of defects [26]. In this study, the term “defect” will be adopted to describe building quality problems.

During the use phase of building projects, defects may have several consequences, being often perceived differently by distinct stakeholders. For instance, Forcada et al. [27] point out that customers often pay attention to aesthetics-related defects, while construction professionals are more directly concerned with a technical perspective of building defects, including the investigation of their causes. Both perspectives need to be considered in the management of customer complaints.

Previous studies on building defects indicate that there is a wide range of causes of building defects, such as design failures, errors in production, the use of defective materials, and a lack of maintenance [1,5,28,29,30]. According to Josephson and Hammarlund [31], a cause can be defined as a proven reason for an undesired result, and can occur in combination, involving different supply chain members. In fact, causes can propagate from one building element to another, as there are many interdependences between them. Therefore, poor performance of one element may lead to problems that may affect the whole building [32].

When causes are systematically identified, stakeholders are able to prevent and detect defects [33]. In this context, two terms concerning causes are particularly important in quality management. Root causes describe the most fundamental reasons for the undesired result, while direct causes are often associated with the individuals who are influenced by these conditions [31]. Thus, complex mechanisms of defect occurrence often require a broad understanding of several factors to mitigate the problem [33].

3. Natural Language Processing and Machine Learning

Text mining is the process of deriving information from free or unstructured text. The information is not previously known and not easily revealed [34]. The text mining process encompasses tasks such as text classification, clustering, entity, relation, and event extraction. Natural language processing (NLP) is an attempt to extract a fuller meaning from free text [35]. According to Liddy [20], NLP can be defined as a set of computational techniques to analyze and represent texts that occur naturally at one or more levels of linguistic analysis, achieving human-like language processing. NLP is positioned as a discipline within AI because it allows a machine to transform human languages into numbers and use this code to learn about the world [35]. The choice of ML algorithms used by NLP applications depends on the problems addressed, and each type has unique advantages and disadvantages [36].

ML can be classified into four categories according to the type of supervision necessary during training: supervised, unsupervised, semi-supervised, and reinforcement learning [37]. Supervised learning is concerned with fitting a statistical model to estimate or predict an outcome based on data input with previous categories. In contrast, the aim of unsupervised learning is to discover hidden patterns or groups in data without such a classification available [21]. A typical supervised learning task is text classification, which is addressed in this paper. In text classification, ML techniques train an algorithm to extract features from a set of pre-labelled documents and classify new ones based on their contents [21]. Naïve Bayes, methods based on decision trees, such as Random Forest (RF) and Gradient Boosting, Support Vector Machine (SVM), K-Nearest Neighbour, and Neural Networks are all types of algorithms used for supervised machine learning [37,38].

SVM is able to deal with many attributes by using decision rules based on the fit of hyperplanes. These hyperplanes separate observations by maximizing the margin between data points [38]. Naïve Bayes is a probabilistic classifier that calculates the occurrence probability of variables based on the presence or absence of others; however, it does not consider the correlation between variables [37]. Despite this issue, this classifier often outperforms other types of algorithms considered to be more sophisticated [39]. RF and Gradient Boosting are methods that aggregate a set of trees, improving their predictive performance [40,41]. These methods are adopted extensively as they are non-parametric and robust to outliers [39]. Decision trees are independent of each other in RF, while for Gradient Boosting, the construction of each tree is done sequentially using information from predecessor trees. Thus, each new tree is adjusted based on the errors of the previous tree [38].
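To make the Naïve Bayes idea concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing, written in plain Python. It is only an illustration: the toy records, the category names ("plumbing", "masonry"), and the function names are assumptions and do not come from the study, which trained its models in R.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect per-class document counts, per-class word counts and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log-posterior for a tokenized text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior of the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # words never seen in training carry no information
            # Each word contributes independently (the "naive" assumption),
            # with add-one smoothing to avoid zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, already pre-processed complaint texts and their categories.
docs = [
    ["leak", "bathroom", "ceiling"],
    ["leak", "kitchen", "sink"],
    ["crack", "wall", "bedroom"],
    ["crack", "ceiling", "living"],
]
labels = ["plumbing", "plumbing", "masonry", "masonry"]
model = train_naive_bayes(docs, labels)
```

Calling `predict(model, ["leak", "sink"])` on this toy data selects the "plumbing" class, because each word is scored separately and the correlation between words is ignored, as noted above.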

Finally, Neural Networks used to support deep learning reproduce interconnected nodes or neurons in a layered structure to learn, similarly to the human brain [37]. In NLP, these models use some word representations, such as word embedding, in which words with similar meanings have a vector with similar values [42].

Regarding applications, NLP has been adopted for several purposes, including the use of deep learning to classify user reviews of products [43] or to generate appropriate responses to users’ emotions in a chat tool [44]. Despite the benefits of deep learning, this method was not addressed in this study because the available complaints database was not large enough. Moreover, the complaint texts have a specific vocabulary that is different to the vocabularies in pre-trained models, based on reports found in the Portuguese language. Some studies in the construction industry have also used NLP to classify clauses of contracts [45], to examine construction specifications to support contractors in the management of project risks [46], and to analyse causes of accidents [47].

In the field of facilities management, text mining has been used to examine building operation documents to identify the words used most frequently about defects [14,22]. McArthur et al. [21] applied ML algorithms, such as Random Forest and Frequent Itemset Analysis models, to classify maintenance work orders of university buildings in different building service categories (e.g., electrical, plumbing, heating). However, other common defects in residential contexts, such as cracks in walls, detachment of floors, and ceiling defects, were not addressed. Despite the contribution of these studies to data management, their main limitation is that they used ML simply to extract useful information from texts or users’ inputs. They did not attempt to improve data collection with the aim of improving the structure and completeness of the information produced, as is proposed in this investigation.

Finally, it is worth mentioning some concerns that have emerged about how ML tools and humans affect each other [48]. For instance, cultural barriers aggravated by the possible low performance of some applications can create resistance to their use. An example is customer service chatbots that do not deliver the necessary information to the user in a conversation [49]. One way of overcoming this barrier is by designing systems that closely replicate human communication characteristics, such as those using NLP [39]. Another issue is the loss of people’s skills as they are not involved in routine mental tasks anymore. Such tasks have been taken over by technologies such as text processing tools that provide intelligent corrections or spreadsheets that suggest graphs [50]. This can be regarded as a negative effect, but there are also some benefits, such as mental capabilities that are freed up for more relevant tasks [50]. Overall, it is necessary to be aware of the challenges associated with ML tools and take steps to mitigate their negative impacts [48].

4. Research Method

Design Science Research is the methodological approach adopted in this investigation. This approach supports disciplines concerned with problem-solving, such as management, engineering, or information systems [51]. These disciplines develop and test solutions rather than simply describing and understanding problems [52]. Therefore, the knowledge generated in Design Science Research has a prescriptive and often multidisciplinary character; it seeks to solve classes of problems by devising a solution concept, known as an artifact, and considers the context in which this solution can be applied [52]. Design Science Research may have different types of outcomes, such as constructs, models, methods, or instantiations [53]. Models are used to describe relationships between constructs, but can also be regarded as abstract representations of tasks and processes [53]. Thus, the artifact devised in this investigation is an information management model of customer complaints for residential building projects. The target audience for this model is residential building companies that need to devise information systems for receiving and processing customer complaints.

This study was developed in collaboration with a large Brazilian building company, named Company A in this paper. The company builds residential projects and has a well-structured Warranty Service Department that deals with complaints, providing repair services during the five-year product liability period. The set of projects considered in this investigation was from the middle and higher-middle class market segment. Most building technologies involved in these projects can be considered traditional, such as cast-in-place concrete structure, external block walls, cement and lime plastering, internal dry-wall partitions, and ceramic tiling.

This research project was divided into three phases (Figure 1), following the Design Science Research steps proposed by Kasanen et al. [54]: (i) obtain a deep understanding of the problem; (ii) develop an innovative solution concept for a class of problems; (iii) implement and evaluate the solution. In the first phase, a process map of the customer complaint service was developed to obtain an in-depth understanding of the problems faced by the company in managing customer complaints. In the second phase, the proposed model was devised. A defect classification system was initially developed. Then, the word menu was created for customers to record their complaints. ML algorithms were trained to automatically classify complaint texts and recommend potential problems to be investigated by the warranty service team. Finally, in the third phase of the investigation, the proposed solution was assessed in terms of utility and applicability, and a reflection on the theoretical implications of this investigation was undertaken.

4.1. Process Mapping

The mapping of the warranty service was based on two sources of evidence: participant observations of some customer services and semi-structured interviews. The daily routine of four technicians who carry out building inspections was followed for one month. Twenty inspections in seven different building projects were followed. The interviews were conducted to obtain additional information about the processes involved. The participants were the head of the Warranty Service Department, a building maintenance engineer, and three technicians from the same department. Table 1 presents their profiles.

The interviews were divided into four sections, focusing on the following topics: (i) main role of each interviewee, (ii) existing limitations of data collection and processing, (iii) existing limitations of feedback practices, and (iv) main barriers faced by the sector to provide feedback. The interviews were carried out individually, and lasted for around one hour.

Two data collection forms used by the teams during the service were also analyzed. The first was used to collect data during the inspection and the other one during the repair service. The structure of the information system used to store complaint and repair data was also analyzed.

4.2. Development of the Defect Classification System

The defect classification system was developed according to the definitions of building systems, elements, and components adopted by the Building Performance International Standard ISO 19208 [55] and the Brazilian Performance Standard NBR 15575 [9]. An existing database devised by Berr [56] was used as a point of departure for the development of the classification system, considering a wide range of defect types that may happen in residential building projects in Brazil that adopt traditional building technologies.

Afterwards, Company A provided records from a database with 2765 customer complaints from 30 projects. These were manually categorized according to the proposed defect classification system. As the database contained only complaint texts, additional data available about the projects were sought, such as notes made by technicians about the status of the repair service. Based on that analysis, new types of defects were identified and included in the database.

4.3. Development of Word Menu

The following steps were carried out to pre-process the complaint texts and develop the word menu using R software [57]: (i) tokenization process, which splits the texts into pieces called “tokens”, removes punctuation and accentuation, and transforms words into lowercase letters [34]; (ii) correction of misspelled words; (iii) stemming process, which removes word suffixes [58], e.g., “crack”, “cracking”, and “cracked” are all reduced to “crack”; and (iv) removal of meaningless words, called “stop-words”, by using existing dictionaries and technical analysis. Moreover, all remaining words were categorized according to their meaning: element, component, defect, and location.
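The pre-processing steps above can be sketched as follows. This is a minimal Python illustration, not the R code used in the study: the stop-word set and the suffix list are toy assumptions, the suffix stripper stands in for a real stemmer, and step (ii), spelling correction, is omitted.

```python
import re
import unicodedata

# Illustrative assumptions: a real pipeline would use a full stop-word
# dictionary and a proper stemming algorithm for the target language.
STOP_WORDS = {"the", "in", "of", "is", "a", "my", "and", "there"}
SUFFIXES = ("ing", "ed", "s")


def strip_accents(text):
    """Remove accentuation by dropping combining marks after decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Step (i): lowercase, remove accents and punctuation, split into tokens."""
    return re.findall(r"[a-z]+", strip_accents(text).lower())


def stem(token):
    """Step (iii): strip one known suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, stem, then drop stop-words (step (iv))."""
    stems = [stem(tok) for tok in tokenize(text)]
    return [tok for tok in stems if tok not in STOP_WORDS]


example = preprocess("Cracking in the wall, cracked floor and cracks!")
```

With this toy configuration, "cracking", "cracked", and "cracks" all collapse to the same token, which is the behaviour the stemming step is meant to achieve.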

Pairs of words that occur together in the texts were used to create the menu. Figure 2 presents a schematic representation of the frequency of words, how often pairs of words were connected (thickness of the arrow), and citation order (arrow direction). The word “bathroom” is often mentioned before the word “suite”, which is cited together with “wall”. In contrast, “wall” is mentioned before or after “fissure”. Thus, there is no pattern of how customers mention words. For this reason, the word menu was built to establish a logical order of word selection, adopting the hierarchical level of categories as criteria: location, element, component, and defect. Those categories are represented by different colors in Figure 2.

Pairs of words with a frequency equal to 1 were not included in the menu due to the large number of pairs found, around 5000. Finally, the most frequent word among those with the same meaning was included; for example, “infiltration” was more common in complaint texts than “leaks”. Generic words, such as apartment, unit, and problem, were removed.
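A minimal sketch of how such a menu could be assembled is given below. It assumes that a pair means two distinct categorized words appearing in the same complaint record, regardless of order; the word-to-category mapping, the sample records, and the function names are hypothetical and serve only to illustrate the filtering of pairs with a frequency of 1 and the ordering by location, element, component, and defect.

```python
from collections import Counter
from itertools import combinations

# Hierarchical order in which the menu asks the customer to choose words.
CATEGORY_ORDER = ["location", "element", "component", "defect"]

# Hypothetical mapping from pre-processed words to their category.
WORD_CATEGORY = {
    "bathroom": "location",
    "kitchen": "location",
    "wall": "element",
    "floor": "element",
    "tile": "component",
    "fissure": "defect",
    "infiltration": "defect",
}


def count_pairs(records):
    """Count unordered pairs of distinct categorized words within each record."""
    pairs = Counter()
    for tokens in records:
        words = sorted({tok for tok in tokens if tok in WORD_CATEGORY})
        pairs.update(combinations(words, 2))
    return pairs


def build_menu(records, min_freq=2):
    """Keep words from pairs seen at least min_freq times, grouped by category."""
    kept = set()
    for (first, second), freq in count_pairs(records).items():
        if freq >= min_freq:  # pairs with a frequency of 1 are discarded
            kept.update((first, second))
    menu = {category: [] for category in CATEGORY_ORDER}
    for word in sorted(kept):
        menu[WORD_CATEGORY[word]].append(word)
    return menu


records = [
    ["bathroom", "wall", "fissure"],
    ["wall", "fissure", "bathroom"],
    ["kitchen", "floor", "tile"],
    ["bathroom", "infiltration"],
]
menu = build_menu(records)
```

Because customers mention words in no fixed order, the sketch ignores word order when counting and imposes the hierarchical order only when the menu is built.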

4.4. Training of Algorithmic Classifiers

ML algorithms were trained to classify complaints according to building system categories in eight stages. The first stage was the text pre-processing described previously. The models were trained with and without stop-words to test improvements in classification performance. The remaining stages, carried out using the R software [57], were as follows:

Data split into two sets: one was used to train the models and the other to test the performance of fitted models. Three proportions were used for training and testing: (i) 70% and 30%; (ii) 75% and 25%; and (iii) 80% and 20%.
Text vectorization: a Document Feature Matrix, which converts text documents into a matrix, was applied. The rows are the texts and the columns are the words in the texts. “Bag of words” (BOW) and “Term frequency and inverse document frequency” (TF-IDF) are very common Document Feature Matrix models in NLP. In BOW, word frequency is used as a variable and word order is not considered, while TF-IDF measures the importance of a term in a text by assigning a weight [34].
Selection of models: Naïve Bayes, SVM, RF, and Gradient Boosting were used, as suggested by literature to solve text classification problems [37,38]. All models were trained by using the Quanteda [59] and Caret packages [60].
Tuning of hyperparameters: a Random Search and cross-validation with five folds was implemented to avoid overfitting and underfitting [37]. Five folds were chosen to increase the probability that all categories would be included in the folds as the database had categories with low frequency.
Evaluation of performance: accuracy, precision, recall, and F1 score were the measures used to evaluate the performance of the models. Accuracy is the percentage of correct predictions made by the model [37]. Precision is the ratio of true positives to all positive predictions, while recall is the ratio of true positives to the sum of true positive and false negative predictions [38]. Lastly, the F1 score is the harmonic mean of precision and recall [38].
Feature engineering: synthetic features (words) were added to the records to improve performance. These were selected according to (i) bi-terms that appeared at least ten times in the records (e.g., “bathroom suites” and “single bedroom”); (ii) most frequent words in the low-frequency categories, and (iii) words indicated by the Keyness score. The Keyness score is a word importance indicator calculated using the chi-squared test, which evaluates whether a term occurs more often in a class than in the whole database [61].
Resampling of unbalanced classes: when there are large differences in class size, the algorithms are often biased in favor of high-frequency classes, treating low-frequency classes as noise [62]. The complaint database contains some categories with a low occurrence frequency but of great severity. Therefore, an oversampling approach was used, randomly duplicating records from each minority class until its size matched that of the majority class.
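Three of the steps listed above (text vectorization, resampling, and performance evaluation) can be illustrated with the short Python sketch below. It is not the authors' implementation, which relied on the Quanteda and Caret packages in R. The TF-IDF weighting shown (raw term count multiplied by the natural logarithm of the inverse document frequency) is one common variant chosen here as an assumption, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document: count * log(N / doc. frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)  # bag-of-words counts; word order is ignored
        rows.append({t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()})
    return rows


def oversample(records, labels, seed=0):
    """Randomly duplicate minority-class records until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)
    target = max(len(items) for items in by_class.values())
    out_records, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for record in items + extra:
            out_records.append(record)
            out_labels.append(label)
    return out_records, out_labels


def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A term that appears in every document receives a weight of zero under this weighting, which is how TF-IDF downplays uninformative words. In practice, duplication is normally applied to the training split only, so that copies of a record do not end up in the test set; the text does not state how this was handled in the study.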

4.5. Evaluation of Artifacts

The solution was assessed according to utility and applicability constructs, as suggested by March and Smith [53] and Peffers et al. [51]. These constructs were defined using sets of criteria, as shown in Table 2. The applicability construct refers to the feasibility of implementing the proposed model by companies considering the necessary skills, the different contexts of organizations as well as project characteristics. The utility construct is concerned with the degree to which the model helps solve customer complaint management problems.

A one-hour seminar with Company A representatives was conducted to discuss the research results and obtain contributions for the proposed model. Two maintenance team managers and two project management specialists participated in the seminar. First, the proposed model was presented to the participants, and then an open discussion among the participants was undertaken, considering the criteria described in Table 2.

5. Results

5.1. Assessment of the Existing Customer Complaint Service

The existing customer complaint service had four stages, as shown in Figure 3:

Receipt of complaints: the sector named Customer Relationship Unit received customer complaints by phone or through the company website. The complaint was recorded in a descriptive text. Then, the sector checked whether the defect type was still covered by the warranty period, and a technical inspection was scheduled. Staff from the Customer Relationship Unit did not have any technical background in construction, and errors in complaint analysis often occurred, such as sending problems to the warranty service team that were no longer covered by the period. This issue overloaded the technical staff from the Warranty Service Department who were involved in building inspections. Another problem was that records were incomplete and had inconsistent descriptions of defects, making it difficult for the technical staff to clearly understand the problem before the visit.
Technical inspection: the initial investigation of the defect causes was carried out during a visit by a technician with general knowledge about building technologies. A paper form was used during this inspection to describe the defect type and its origin (e.g., design, construction, use phase). As this form had descriptive fields, the information collected was often incomplete. For instance, no information was collected about the lack of preventive maintenance or improper product use, which could provide relevant feedback to the quality system. During the visits, customers often took the opportunity to report other quality problems. If the causes of the defect could not be identified, a new inspection was scheduled to carry out other tests, frequently held by a specialist (e.g., electrician or plumber). Finally, the person or crew in charge of carrying out the repair service was identified and the repair service was scheduled.
Repair service: After the repair, additional data were collected and recorded in the company’s information system, such as defect classification, team name, and number of work hours. The company classified the defects according to 58 categories of building parts, such as plumbing system and brick wall, and 477 categories of defect types. The criteria adopted in those classifications were ambiguous and not detailed enough to enable a clear understanding of the causes. Moreover, only 51.36% of the defect categories had been used to classify complaints in the database. Finally, other defects reported verbally by the customer during the inspection were usually not recorded by the company, and other data collected often required manual pre-processing and categorizing to allow systematic analysis for feedback purposes.
Feedback process: The quality problems identified by the Warranty Service Department were reported to other sectors (e.g., design, production, material supply, etc.). This process was not based on the complaint database due to its limitations, but according to the maintenance team’s perceptions of the most frequently reported quality problems.

5.2. Proposed Model

An overview of the proposed model is presented in Figure 4, which is divided into three stages:

Receipt of complaint: the customer can choose between a set of words that best describe the existing problem. The words chosen are classified automatically by a machine learning algorithm according to a defect classification system. Thus, the system recommends to the warranty service team which defect type should be investigated during the inspection, which can lead to a reduction in service lead time. If the menu does not include the problem perceived by the customer, a complementary text field is provided to specify the complaint. These texts can be further incorporated into the word menu as new options, increasing the amount of information available in the system.
Inspection and repair: the technical staff is in charge of confirming the classifications suggested by the system and collecting additional data on the defect during the inspection. More than one cause can be identified for a single defect. If the proposed category is considered wrong, the technician must choose another category or suggest a new one. A critical review of additional causes of defects must be conducted so that unnecessary categories are not stored in the system.
Feedback process: the data collected must be processed and analyzed, generating quality indicators and knowledge for the company, such as indicators on the frequency of claims for different building elements, and the causes of building defects. This information can be used to support decision-making and provide feedback for future projects.

All data are stored in a cloud database, which is regarded as an enabling technology for the model. It makes information available to different stakeholders, such as individuals involved in planning corrective or preventive actions. The database must have access control so that data integrity, security, and privacy are not compromised, because it stores both records of the company’s quality problems and customers’ personal data.

5.2.1. Defect Classification System

Figure 5 shows the proposed defect classification system with five levels: system, element, component, defect type, and cause. Six categories of systems were identified in Company A: “building service”, “horizontal partitions”, “vertical partitions”, “openings”, “structural”, and “miscellaneous”. The last category refers to elements that do not fit in other categories, such as furniture or equipment.

Figure 5 presents some examples of defects included in the database. For instance, the black sections describe defects in plumbing service elements of the building service system: a water service pipe component had a leakage caused by the lack of adhesive and anchoring. In that example, some causes cannot be identified by visual inspection but must be included in the database to establish a line of investigation of the root causes. For example, leakages in pipes may also occur due to early use of the system before the adhesive curing time has elapsed. Further investigation of the company’s production procedures should be conducted to confirm the possible cause.
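The five levels can be pictured as one record per classified defect. The sketch below is only illustrative: the class name DefectRecord, its field names, and the validation rule are assumptions rather than part of the company's information system, and the sample values come from the leakage example above.

```python
from dataclasses import dataclass

# The six building systems identified in Company A (first level of the hierarchy).
SYSTEMS = {
    "building service", "horizontal partitions", "vertical partitions",
    "openings", "structural", "miscellaneous",
}

@dataclass(frozen=True)
class DefectRecord:
    """One classified defect; class and field names are illustrative assumptions."""
    system: str
    element: str
    component: str
    defect_type: str
    causes: tuple = ()  # more than one cause may be recorded for a single defect

    def __post_init__(self):
        # Reject records whose first level is not one of the six known systems.
        if self.system not in SYSTEMS:
            raise ValueError(f"unknown system: {self.system!r}")

    def path(self):
        """Return the hierarchy levels from system down to defect type."""
        return (self.system, self.element, self.component, self.defect_type)

# The leakage example described for Figure 5.
leak = DefectRecord(
    system="building service",
    element="plumbing service",
    component="water service pipe",
    defect_type="leakage",
    causes=("lack of adhesive", "lack of anchoring"),
)
```

Keeping the causes as a separate collection makes it possible to add a candidate cause, such as early use before the adhesive has cured, once an investigation confirms it.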

As mentioned earlier, Company A originally had 477 categories of defects. The implementation of the proposed defect classification system reduced this number to 53, i.e., a reduction of approximately 89%. Hence, the proposed defect classification system provides a consistent and effective hierarchical organization of defect categories. In addition, it provides details on defects that are essential to understanding the occurrence of problems.

5.2.2. Word Menu

The database initially had 29,288 words, which were reduced to 17,724 (a reduction of around 40%) after pre-processing the texts. Figure 6 shows a cross-analysis of the frequency of words organized by category, with the aim of providing an overview of building defects. Words with the same meaning naturally show up, for example, “cracks” and “fissures” (Figure 6a). Furthermore, some words make sense only when combined with others, such as “no” with “close”, and “no” with “work”. Despite these limitations, Figure 6 provides some important insights. “Infiltration” is a defect that is frequently mentioned in several locations, resulting in an approximately continuous line on the graph. It is therefore a systemic problem in Company A. A similar situation occurs with “bathroom”, a location that is mentioned together with several defects, such as “infiltration”, “loose”, and “leaks”.

In Figure 6b, “windows” and “walls” are the elements that are most frequently mentioned by customers. A wide range of defects is also cited, such as “infiltration” for windows and “infiltration” and “fissure” for walls.
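The kind of pre-processing and cross-tabulation behind Figure 6 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes English text, a lowercasing and punctuation-stripping step, and a small hypothetical stop-word list, and the three complaint texts are invented for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list and complaint texts, for illustration only.
STOP_WORDS = {"the", "in", "is", "a", "of", "and", "on"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def cross_frequency(texts, locations, defects):
    """Count how often each (location, defect) pair occurs in the same complaint."""
    pairs = Counter()
    for text in texts:
        tokens = set(preprocess(text))
        for location in tokens & locations:
            for defect in tokens & defects:
                pairs[(location, defect)] += 1
    return pairs

complaints = [
    "Infiltration in the bathroom wall.",
    "The bathroom tile is loose",
    "Infiltration on the bedroom window",
]
pairs = cross_frequency(
    complaints,
    locations={"bathroom", "bedroom"},
    defects={"infiltration", "loose"},
)
```

Counting single words in this way loses combinations such as “no” with “close”, which is the limitation noted for Figure 6.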

Figure 7 presents the word menu developed according to the terms used by Company A’s customers, displaying possible ways to lodge a complaint. The menu has five levels: Area, Location, Element, Component, and Defect Type. This figure shows the complete list of areas, locations, and elements identified in the empirical study. However, only a sample of components and defect types is presented due to size limitations. The “Area” category was created due to different possible locations for the same defect. In the empirical study, 21 locations were mentioned by customers, such as bathroom, gym, or swimming pool. This grouping reduces the time spent in choosing words to lodge a complaint.

As shown in Figure 7, the customer starts by choosing the area, followed by the location, element, component, and type of defect observed in the dwelling. For example, a customer can record a problem in a private area, in the living room, related to the electrical installation whose circuit breaker disarms. For some complaint types, the component category is impossible to define, such as the smell of gas (defect type) from the kitchen (location). In this case, the customer can lodge a complaint without entering this information. Finally, in the communal area, a customer can record a defect in the entrance hall, such as a wall with peeling paint.

Although Figure 7 shows all options, when an option is selected in a level, the next level should present only the associated categories. For instance, the “single bedroom” location usually does not have the “plumbing services” element. Consequently, this option should not be offered to the customer.
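The cascading behaviour described above can be sketched with a nested mapping in which each selection narrows the options offered at the next level. The excerpt below is hypothetical: only the living room, kitchen, and entrance hall examples follow the text, while the element and component names chosen for the kitchen, entrance hall, and single bedroom are assumptions made for illustration. The full menu is the one in Figure 7.

```python
# Hypothetical excerpt of the word menu: Area > Location > Element > Component > Defect Type.
MENU = {
    "private area": {
        "living room": {
            "electrical installation": {"circuit breaker": ["disarms"]},
        },
        "kitchen": {
            # The component may be impossible to define, so it is left as None.
            "gas installation": {None: ["smell of gas"]},
        },
        "single bedroom": {
            "window": {"frame": ["infiltration"]},
        },
    },
    "communal area": {
        "entrance hall": {
            "wall": {"paint": ["peeling"]},
        },
    },
}

def options(menu, *selected):
    """Return the choices offered at the next level, given the choices made so far."""
    node = menu
    for choice in selected:
        node = node[choice]  # raises KeyError if the choice was never offered
    return list(node)

areas = options(MENU)
bedroom_elements = options(MENU, "private area", "single bedroom")
```

Because the options are read from the branch already selected, an element such as “plumbing services” is never offered for a location that does not contain it.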

5.2.3. Recommendation System

The customer complaint records were classified into six building system categories, as shown in Figure 8. The most frequent category was building services, with 33.05% of complaints, while the structural and miscellaneous systems had the lowest frequency of complaints, at 1.84% and 2.42%, respectively.

Several cycles of ML algorithm adjustment were conducted, testing different combinations of settings. The different proportions used to split the database resulted in models with similar performance. A split of 70% for training and 30% for testing was adopted so that the testing set contained enough records from low-frequency categories. The stop-words were maintained as they slightly improved the accuracy of the classifiers. Lastly, the BOW approach performed better than TF-IDF, and no synthetic feature was included in the records because it caused overfitting.
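The two choices discussed above, the 70/30 split and the text vectorization, can be sketched in plain Python. Several points are assumptions: the split is stratified by class and keeps at least one test record per class (the text does not say how the split was drawn), TF-IDF is the plain textbook variant without smoothing, and all function names and sample data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.30, seed=0):
    """Split record indices so that every class is represented in the test set."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

def bag_of_words(tokens, vocabulary):
    """BOW vector: raw count of each vocabulary word in one complaint."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def tf_idf(tokens, vocabulary, documents):
    """TF-IDF vector: term frequency scaled by log(N / document frequency)."""
    counts = Counter(tokens)
    vector = []
    for word in vocabulary:
        doc_freq = sum(1 for doc in documents if word in doc)
        idf = math.log(len(documents) / doc_freq) if doc_freq else 0.0
        vector.append(counts[word] / len(tokens) * idf)
    return vector

# Hypothetical data: an unbalanced label list and two tokenized complaints.
labels = ["building service"] * 10 + ["structural"] * 3
train_idx, test_idx = stratified_split(labels)

vocabulary = ["wall", "tile", "infiltration"]
documents = [["wall", "infiltration"], ["wall", "tile", "tile"]]
bow_vector = bag_of_words(documents[1], vocabulary)
tfidf_vector = tf_idf(documents[1], vocabulary, documents)
```

In this toy corpus “wall” occurs in every document, so its TF-IDF weight is zero while its BOW count is not, which is one way the two representations can lead to different classifier behaviour.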

Figure 9 shows the Keyness results, so that the impact of the words in the categories can be understood. A rank of the five most important words and their frequencies for building services, vertical partitions, and opening systems is shown. These results indicate that the most frequent terms may not have much relevance for a given category. “Tile” has almost half of the frequency of “Wall” for vertical partitions, but it is in the first position in the importance ranking. This is because the word “Wall” occurs many times in other categories, such as opening systems.
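Why a less frequent word can outrank a more frequent one is easier to see with a small calculation. The sketch below uses a signed chi-squared statistic on a 2×2 contingency table, which is one common way to compute keyness. The statistic actually used for Figure 9 is not stated in this section, so this choice is an assumption, and the counts are hypothetical.

```python
def keyness_chi2(target_count, target_total, reference_count, reference_total):
    """Signed chi-squared keyness of a word in a target category versus all others.

    target_count / reference_count: occurrences of the word in each corpus.
    target_total / reference_total: total number of tokens in each corpus.
    """
    a = target_count
    b = reference_count
    c = target_total - target_count
    d = reference_total - reference_count
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Positive when the word is relatively more frequent in the target category.
    return chi2 if a * d >= b * c else -chi2

# Hypothetical counts: "wall" is frequent everywhere, "tile" is concentrated
# in the target category (vertical partitions).
wall_keyness = keyness_chi2(200, 2000, 600, 6000)
tile_keyness = keyness_chi2(100, 2000, 10, 6000)
```

With these counts “wall” is twice as frequent as “tile” in the target category, yet its keyness is zero because it occurs at the same rate elsewhere, while “tile” scores highly.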

Table 3 shows the performance of the models, indicating the runtime and accuracy. Gradient Boosting had the highest percentage of correct classifications (82.89%). However, it also had the longest runtime (4.59 h). This high computational cost is caused by the high number of hyperparameters to be adjusted. Furthermore, Gradient Boosting learns from the errors of the previous trees, and the dependencies between them make it impossible to build the trees in parallel.
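The sequential dependency mentioned above can be seen in a stripped-down version of the algorithm. The sketch below is a simplification under stated assumptions: it performs regression with squared loss on one numeric feature and uses one-split trees (stumps), whereas the study trained classifiers on text features. The data and function names are hypothetical, and only the loop structure is the point.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree that minimises squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Build trees one after another, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        # Tree k needs the errors left by trees 1..k-1, so it cannot be
        # built before they are finished: the loop is inherently sequential.
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

# Hypothetical training data.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 9]
model = gradient_boost(xs, ys)
```

A random forest, by contrast, fits each tree on an independent bootstrap sample, so its trees can be built in parallel.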

The model with the second-best performance was RF. Although there was only a small difference in accuracy (1.21%) between the two models, there was a substantial difference in runtime (around 4 h), making RF a better option than Gradient Boosting. Lastly, Naïve Bayes and SVM had the lowest accuracy, 77.71% and 77.95%, respectively, despite their fast processing time.

Figure 10 shows the confusion matrices with the number of correct and incorrect predictions for each class. The rows represent the observed data, while the columns show the predicted data. The diagonal values are predictions that were made correctly by the models. Lighter colors mean correct model classification. For example, in the horizontal partitions row of the SVM confusion matrix, the model classified 94 records correctly. However, it misclassified 14 horizontal partition complaints as openings, 22 as building service, and 15 as vertical partitions (false negatives for this class). Therefore, the recall score was 63.67%. Finally, in the corresponding column, the SVM classified 39 records as horizontal partition problems although these claims referred to other categories (false positives). As a result, the precision score was 70.68%.
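The two scores follow directly from the row and column counts of a confusion matrix. The sketch below reproduces the precision quoted for the SVM on horizontal partitions. The recall is computed only from the three misclassification counts quoted in the text, which do not cover every column of the matrix, so that figure is indicative and is not expected to reproduce the reported 63.67%. Function and variable names are illustrative.

```python
def precision(true_positive, false_positive):
    """Share of records predicted as a class that really belong to it."""
    return true_positive / (true_positive + false_positive)

def recall(true_positive, false_negative):
    """Share of records of a class that the model actually found."""
    return true_positive / (true_positive + false_negative)

# Counts quoted in the text for the SVM, horizontal partitions class.
tp = 94   # diagonal cell
fp = 39   # rest of the column: other classes predicted as horizontal partitions
svm_precision = precision(tp, fp)

# Only the misclassifications listed in the text (openings, building service,
# vertical partitions); remaining columns of the row are not quoted.
fn_listed = 14 + 22 + 15
recall_from_listed_counts = recall(tp, fn_listed)
```

Any false negatives in the columns not quoted would lower the recall below the value computed from the listed counts.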

As shown in Figure 10, all models had similar prediction performance for all categories. The worst performance was for miscellaneous and structural systems. The recall score for structural systems ranged from 21.69% to 77.78%. This low performance was caused by the small number of structural records, around 2%. Customers do not usually complain about this type of defect because it is not easy to perceive. Additionally, there is usually rigorous technological control of the structural system during construction. Consequently, the complaint database typically has few records of structural problems, and it is challenging for the trained models to have good prediction results.

Although Naïve Bayes had lower accuracy than Gradient Boosting, its recall score (77.78%) for the structural category was higher. For structural systems, a high recall score is preferable to a high precision score, as it is much less serious to flag a structural defect that does not exist (false positive) than to classify a defect into a less severe category when the real defect is structural (false negative).

Similarly, quality problems in building services are also critical because they include electrical and fire protection defects that can severely compromise customer safety. These categories should also be analyzed by using the recall score. By contrast, as openings and miscellaneous systems were less severe, both recall and precision measures can be considered. Thus, the F1 score was a measure adopted for these systems, as shown in Figure 11.

Gradient Boosting again had the best performance for predicting the openings category (F1 score of 87.73%). However, the other algorithms also showed good performance at around 85%. For miscellaneous systems, the F1 score for SVM was the highest (71.12%), indicating that this was the best model for predicting this low-frequency category.
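The severity-based choice of metric described above can be sketched as a small lookup plus the F1 formula, the harmonic mean of precision and recall. The mapping covers only the four systems for which the text names a metric. The scores in the example are hypothetical except for the Naïve Bayes recall of 77.78% on structural systems, and all names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Metric suggested in the text for each building system.
METRIC_BY_SYSTEM = {
    "structural": "recall",
    "building service": "recall",
    "openings": "f1",
    "miscellaneous": "f1",
}

def best_model(scores, system):
    """Pick the model with the best value of the metric assigned to the system.

    scores: {model name: {"precision": p, "recall": r}} for that system.
    """
    metric = METRIC_BY_SYSTEM[system]

    def value(model):
        p, r = scores[model]["precision"], scores[model]["recall"]
        return r if metric == "recall" else f1_score(p, r)

    return max(scores, key=value)

# Hypothetical scores for the structural category (only the 0.7778 recall is from the text).
structural_scores = {
    "Naive Bayes": {"precision": 0.30, "recall": 0.7778},
    "Gradient Boosting": {"precision": 0.80, "recall": 0.40},
}
chosen = best_model(structural_scores, "structural")
```

Under the recall criterion the lower-precision model is chosen for structural defects, whereas the same scores ranked by F1 would favour the other model.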

6. Discussion

6.1. Solution Evaluation

The evaluation of the customer complaint management model was based on utility and applicability constructs, as shown in Table 2. Two criteria were considered for applicability:

Ease of use: during the complaint receipt stage, the choice of words made by the customer was easy to understand as the taxonomy used in the menu is based on the language naturally spoken by customers. Moreover, the options were organized logically, starting from macro (location) to micro level (defect type) information. Accordingly, the ease of lodging a complaint can reduce the negative effect of a repair service on customers. From the point of view of the warranty service teams, the defect classifications are also based on a logical and organized structure that allows ease of use.
Use the model in other contexts: the word menu and the recommendation system had a wide range of defect types and were developed based on data from 30 projects. This indicates that it is potentially applicable to other housing or building companies and to a range of building elements. Nonetheless, the content of the database depends on the design type and building technology adopted in each region. The algorithms had high classification performance levels, and the hardware’s processing capacity did not need to be high to classify quality problems effectively, according to the runtime of the algorithms.

With regard to the utility of the artifacts, three criteria were considered (Table 2):

Improvements in data collection: the word menu improved data collection by avoiding the loss of essential data to understand the problem, i.e., data that are often forgotten or not provided by the user. The same occurs for the proposed defect classification system, which has five levels of information detail, including the cause of building defects. The set of words covers many types of complaints due to the way the database was built. However, the word menu and recommendation system need to be updated periodically as new building technologies and other types of defects emerge. Thus, new data must be used from time to time to train the ML algorithms.
Process automation: the recommendation system automatically classifies problems and indicates the type of defect claimed, eliminating the steps of complaint analysis performed by the customer relationship service. Often, staff from that unit do not have much knowledge about building defects. Consequently, wrong data input overloads warranty service teams. Moreover, there is the possibility of eliminating some steps in warranty services. For instance, the first inspection is carried out by a building technician who often requests specialized professionals to perform tests, such as plumbers and electricians, resulting in long investigation times. This step can be eliminated as the type of defect is identified more accurately. As a result, there is a reduction of the negative impact of the repair service on customer satisfaction. These benefits can also help reduce warranty service costs.
Contributions to the feedback process: the enhanced database can be used to identify the most important building defects, e.g., by associating problems with features of projects. In addition, different levels of the classification system can be used according to the type of assessment. For instance, metrics related to defects in the building systems level are relevant for managers to identify the most critical ones. In contrast, the analysis of “defect type” and “cause” can be directed to the operational levels in which the origin of the defect can be eliminated. Therefore, the defect classification system can be used to provide feedback and support decision-making in both product design and production management. Consequently, building quality tends to improve, which extends the service life of building components and reduces the occurrence of quality problems at handover.
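The idea of reading the same database at different levels of the classification system can be sketched as a simple aggregation. The records, field names, and function name below are hypothetical and serve only to show how a managerial indicator (by system) and an operational one (by defect type or cause) come from the same data.

```python
from collections import Counter

LEVELS = ("system", "element", "component", "defect_type", "cause")

def frequency_by_level(records, level):
    """Share of complaints per category at one level of the classification system."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    counts = Counter(record[level] for record in records)
    total = sum(counts.values())
    # most_common() orders categories from the most to the least frequent.
    return {category: count / total for category, count in counts.most_common()}

# Hypothetical classified complaints.
records = [
    {"system": "building service", "element": "plumbing service",
     "component": "water service pipe", "defect_type": "leakage",
     "cause": "lack of adhesive"},
    {"system": "building service", "element": "electrical installation",
     "component": "circuit breaker", "defect_type": "disarms",
     "cause": "not identified"},
    {"system": "vertical partitions", "element": "wall",
     "component": "tile", "defect_type": "loose", "cause": "not identified"},
    {"system": "openings", "element": "window",
     "component": "frame", "defect_type": "infiltration",
     "cause": "not identified"},
]
by_system = frequency_by_level(records, "system")
by_defect_type = frequency_by_level(records, "defect_type")
```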

6.2. Managerial Insights and Recommendations

Some recommendations regarding the implementation of the proposed model must be pointed out. Although digital technologies have become more accessible, companies may still find that some customers cannot use digital tools and are therefore unable to use the word menu. Alternative methods for lodging complaints could be offered to these customers, for example: (i) training the customer relationship service staff to use the word menu and lodge the complaint in the system; and (ii) using audio transcription technology to record the complaint as a text message.

Regarding the recommendation system, this study tested some ML algorithms with different runtimes in training. It is important for companies to consider this metric in the choice of ML techniques in order to avoid large investments in hardware. In addition, the choice of algorithms must consider the need to achieve high performance for severe problems, such as structural defects.

Finally, the proposed information management model for addressing customer complaints demands a continuous effort to update and extend the database due to changes in building technologies, which may lead to different types of defects and cause–effect relationships.

6.3. Theoretical Contributions

While existing studies have focused on manual categorization and data analysis of customer complaints, this investigation improves and automates complaints management in residential building projects by using NLP and ML. Although the advantages of NLP for assessing building conditions have already been pointed out by Gunay et al. [22] and Bortolini and Forcada [14], this study used NLP methods also to improve data collection and develop a word menu with the most common terms used by customers. Hence, this study has not proposed a simple application of existing technologies but used them to develop an information management model of customer complaints that considers the specific context of residential building projects. Finally, the taxonomy of building defects was organized into a hierarchical defect classification system, which can be considered a secondary theoretical contribution of this study.

As the building product consists of many parts with interfaces and different functions, the defect formation mechanisms are complex. The Keyness analysis indicated that customers often cite walls to describe defects in both vertical partitions and opening systems. These findings highlight the strong interaction between the different parts and the complexity involved in the building product. Furthermore, different technologies and quality controls are adopted for each building system, so the data will naturally be unbalanced according to the quality of each building part. These features can limit the use of some ML algorithms. However, unlike McArthur et al. [21], this investigation encompassed all building systems and revealed that a customer complaint management model must consider the interactions between building interfaces.

Lastly, the proposed model includes data input from customers as an essential part of the management system, and not only data collected in a technical evaluation by experts, as addressed by some studies on building quality [6,15,36,63]. Both perspectives, user perception and technical assessment, play important roles in building quality assessment. In fact, the model cannot work or learn without those two perspectives.

7. Conclusions

This study proposed improvements for customer complaint data collection and warranty services. An information management model for customer complaints in residential building projects was devised. The NLP approach was used to identify the most frequent terms and build a word menu for customers to complain. ML algorithms were also trained to classify the complaint texts and suggest the defect type to be investigated by technical staff.

Several training scenarios of ML algorithms were tested, considering different text vectorizations, database splits, and feature engineering alternatives. In addition, a resampling approach was adopted, which was important because the complaint database had unbalanced classes. Gradient Boosting had the best accuracy score, and Naïve Bayes achieved better results in low-frequency categories. The latter algorithm is preferable, especially when complaints refer to problems with high severity, such as defects in structural systems.

The practical contributions of this investigation refer to the devised artifact. The model provides data that can improve the understanding of building defects and provide some degree of automation in warranty service processes. Regarding theoretical contributions, this study has proposed advanced data management approaches in the context of residential building projects, which has had little progress in data-driven decision-making. The potential use of NLP and ML to manage customer complaint data has been demonstrated in this study. Besides the taxonomies on building defects from technical and customer perspectives, this study highlights and discusses the complexity of defect occurrence in building products, which must be considered in the development of customer complaint management models.

It is important to point out some overall impacts of this investigation on the residential building sector. The proposed model can help improve both customer complaint services and product quality, which may enhance the image of the industry in society, as well as eliminating waste and reducing the environmental impact of construction activities.

Regarding the limitations of this investigation, although the recommendation system should be applied at all levels of the defect classification system, the implementation was carried out only for the building system level in this paper. Further studies can implement the proposed recommendation system for other levels, such as element and component. The size of the complaint database was also a limitation, for instance for structural defects, which affected the learning performance of the ML algorithms. Further studies can implement the proposed model based on more extensive databases. Moreover, the word menu did not incorporate all possible defects, as only the problems found in the context of traditional Brazilian building technologies were considered. Therefore, other studies should adapt the proposed models to be used in other contexts.

Lastly, other opportunities for further research have been proposed: (i) integrate the proposed model with other technologies, such as Building Information Modelling, to visualize the defects in digital models; (ii) apply audio transcription technology to classify additional complaints made verbally by the customer during inspections, and (iii) investigate the defect types that generate higher levels of negative feelings in customers than others by using sentiment analysis.

Author Contributions

Conceptualization, J.B., C.T.F. and M.E.E.; Method, J.B., C.T.F., M.E.E. and M.H.B.; Formal analysis, J.B. and B.A.; Writing—original draft preparation, J.B.; Writing—revision and editing, C.T.F., M.E.E. and M.H.B.; Supervision, C.T.F., M.E.E. and M.H.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Council for Scientific and Technological Development through the Academic Doctorate for Innovation Program (No. 142267/2019-8).

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from Company A and are available from the authors with the permission of the company.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Plebankiewicz, E.; Malara, J. Analysis of defects in residential buildings reported during the warranty period. Appl. Sci. 2020, 10, 6123.
Love, P.E.D.; Teo, P.; Morrison, J. Revisiting quality failure costs in construction. J. Constr. Eng. Manag. 2018, 144, 05017020.
Park, J.; Seo, D. Defect Repair Cost and Home Warranty Deposit, Korea. Buildings 2022, 12, 1027.
Hopkin, T.; Lu, S.L.; Sexton, M.; Rogers, P. Learning from defects in the UK housing sector using action research: A case study of a housing association. Eng. Constr. Archit. Manag. 2019, 26, 1608–1624.
Carretero-Ayuso, M.J.; Rodríguez-Jiménez, C.E.; Bienvenido-Huertas, D.; Moyano, J.J. Interrelations between the types of damages and their original causes in the envelope of buildings. J. Build. Eng. 2021, 39, 102235.
Brito, J.N.D.S.; Formoso, C.T.; Echeveste, M.E.S. Analysis of complaint data in social house-building projects: A study in the Residential Leasing Program. Amb. Constr. 2011, 11, 151–166.
Asante, L.A.; Quansah, D.P.O.; Ayitey, J.; Kuusaana, E.D. The practice of Defect Liability Period in the real estate industry in Ghana. SAGE Open 2017, 7, 2158244017727038.
Hopkin, T.; Lu, S.; Rogers, P.; Sexton, M. Key stakeholders’ perspectives towards UK new-build housing defects. Int. J. Build. Pathol. 2017, 35, 110–123.
Levi, U. Contractual Defects and Statutory Defect Liability in Queensland. 2016. Available online: http://www.findlaw.com.au/articles/5238/contractual-defects-and-statutory-defect-liability.aspx (accessed on 30 April 2020).
Brazil Consumer Protection Code. Law nº 8.078. 11 September 1990; It Provides Information on Consumer Protection and Makes Other Provisions. Available online: https://www.planalto.gov.br/ccivil_03/leis/l8078compilado.htm (accessed on 12 September 2019).
ABNT (Brazilian Association of Technical Standards). NBR 15575-1: Residential Buildings up to Five Floors–Performance Part 1: General Requirements; ABNT: Rio de Janeiro, Brazil, 2013.
Assaf, S.; Srour, I. Using a data driven neural network approach to forecast building occupant complaints. Build. Environ. 2021, 200, 107972.
Chen, W.; Chen, K.; Cheng, J.C.P.; Wang, Q.; Gan, V.J.L. BIM-based framework for automatic scheduling of facility maintenance work orders. Autom. Constr. 2018, 91, 15–30.
Bortolini, R.; Forcada, N. Analysis of building maintenance requests using a text mining approach: Building services evaluation. Build. Res. Inf. 2019, 48, 207–217.
Cupertino, D.; Brandstetter, M.C.G.O. Proposal for a post-work management tool based on the records of technical assistance requests. Amb. Constr. 2015, 15, 243–265.
Peng, Y.; Lin, J.R.; Zhang, J.P.; Hu, Z.Z. A hybrid data mining approach on BIM-based building operation and maintenance. Build. Environ. 2017, 126, 483–495.
Milion, R.N.; Alves, T.D.C.; Paliari, J.C. Impacts of residential construction defects on customer satisfaction. Int. J. Build. Pathol. 2017, 35, 218–232.
Ahmed, V.; Aziz, Z.; Tezel, A.; Riaz, Z. Challenges and drivers for data mining in the AEC sector. Eng. Constr. Archit. Manag. 2018, 25, 1436–1453.
Petrova, E.; Pauwels, P.; Svidt, K.; Jensen, R.L. Towards data-driven sustainable design: Decision support based on knowledge discovery in disparate building data. Archit. Eng. Des. Manag. 2019, 15, 334–356.
Liddy, E.D. Natural Language Processing for Information Retrieval. In Encyclopedia of Library and Information Sciences, 4th ed.; Taylor & Francis: New York, NY, USA, 2018.
McArthur, J.J.; Shahbazy, R.F.; Raghubar, C.; Bortoluzzi, B. Machine learning and BIM visualisation for maintenance issue classification and enhanced data collection. Adv. Eng. Inf. 2018, 38, 101–112.
Gunay, H.B.; Shen, W.; Yang, C. Text-mining building maintenance work orders for component fault frequency. Build. Res. Inf. 2019, 47, 518–533.
Mo, Y.; Zhao, D.; Du, J.; Syal, M.; Aziz, A.; Li, H. Automated staff assignment for building maintenance using natural language processing. Autom. Constr. 2020, 113, 103150.
ISO (International Organization for Standardization). ISO 9000: Quality Management Systems–Fundamentals and Vocabulary; ISO: Geneva, Switzerland, 2005.
Atkinson, G. A century of defects. Building 1987, 252, 54–55.
Marinho, J.L.A. Building Pathology: Occurrences in Buildings and Historic Heritage, 2nd ed.; Leud: Guarulhos, Brazil, 2022; p. 296.
Forcada, N.; Macarulla, M.; Gangolells, M.; Casals, M. Handover defects: Comparison of construction and post-handover housing defects. Build. Res. Inf. 2016, 44, 279–288.
Connor, J.T.O.; Koo, H.J. Proactive Design Quality Assessment Tool for Building Projects. J. Constr. Eng. Manag. 2021, 147, 04020174.
Gonzalez-Caceres, A.; Bobadilla, A.; Karlshøj, J. Implementing post-occupancy evaluation in social housing complemented with BIM: A case study in Chile. Build. Environ. 2019, 158, 260–280.
Schultz, C.S.; Jørgensen, K.; Bonke, S.; Rasmussen, G.M.G. Building defects in Danish construction: Project characteristics influencing the occurrence of defects at handover. Archit. Eng. Des. Manag. 2015, 11, 423–439.
Josephson, P.-E.; Hammarlund, Y. The causes and costs of defects in construction. Autom. Constr. 1999, 8, 681–687.
Grussing, M.N.; Liu, L.Y. Knowledge-based optimization of building maintenance, repair, and renovation activities to improve facility life cycle investments. J. Perform. Constr. Facil. 2014, 28, 539–548.
Aljassmi, H.; Han, S.; Davis, S. Analysis of the Complex Mechanisms of Defect Generation in Construction Projects. J. Constr. Eng. Manag. 2015, 138, 51–60.
Silge, J.; Robinson, D. Text Mining with R: A Tidy Approach; O’Reilly Media: Sebastopol, CA, USA, 2017.
Lane, H.; Howard, C.; Hapke, H.M. Natural Language Processing in Action: Understanding, Analyzing, and Generating Text with Python; Manning: New York, NY, USA, 2019.
Fan, C.L. Defect risk assessment using a hybrid machine learning method. J. Constr. Eng. Manag. 2020, 146, 04020102.
Géron, A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; O’Reilly Media: Sebastopol, CA, USA, 2019.
James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
Zhou, Z. Machine Learning; Springer Nature: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
Breiman, L. Random Forests. In Machine Learning; Springer: Berlin/Heidelberg, Germany, 2001. [Google Scholar] [CrossRef]
Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data 2002, 38, 367–378. [Google Scholar] [CrossRef]
Bengio, Y.; Ducharme, R.; Vincent, P.; Jauvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 2003, 3, 1137–1155.
Sadiq, S.; Umer, M.; Ullah, S.; Mirjalili, S.; Rupapara, V.; Nappi, M. Discrepancy detection between actual user reviews and numeric ratings of Google App store using deep learning. Expert Syst. Appl. 2021, 181, 115111.
Tian, Z.; Wang, Y.; Song, Y.; Lee, D.; Zhao, Y.; Li, D.; Zhang, N.L. Empathetic and Emotionally Positive Conversation Systems with an Emotion-specific Query-Response Memory. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 6364–6376.
Salama, D.M.; El-Gohary, N.M. Semantic Text Classification for Supporting Automated Compliance Checking in Construction. J. Comput. Civ. Eng. 2016, 30, 04014106.
Moon, S.; Lee, G.; Chi, S.; Oh, H. Automated Construction Specification Review with Named Entity Recognition Using Natural Language Processing. J. Constr. Eng. Manag. 2021, 147, 04020147.
Zhang, F.; Fleyeh, H.; Wang, X.; Lu, M. Construction site accident analysis using text mining and natural language processing techniques. Autom. Constr. 2019, 99, 238–248.
Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260.
Adam, M.; Wessel, M.; Benlian, A. AI-based chatbots in customer service and their effects on user compliance. Electron. Mark. 2021, 31, 427–445.
Schmidt, A. Interactive Human Centered Artificial Intelligence: A Definition and Research Challenges. In Proceedings of the ACM International Conference Proceeding Series, Salerno, Italy, 28 September 2020.
Peffers, K.; Tuunanen, T.; Rothenberger, M.A.; Chatterjee, S. A design science research methodology for information systems research. J. Manag. Inf. Syst. 2007, 24, 45–77.
Van Aken, J.; Chandrasekaran, A.; Halman, J. Conducting and publishing design science research: Inaugural essay of the design science department of the Journal of Operations Management. J. Oper. Manag. 2016, 47, 1–8.
March, S.T.; Smith, G.F. Design and natural science research on information technology. Decis. Support Syst. 1995, 15, 251–266.
Kasanen, E.; Lukka, K.; Siitonen, A. The constructive approach in management accounting research. J. Manag. Account. Res. 1993, 5, 243–264.
ISO (International Organization for Standardization). ISO 19208: Framework for Specifying Performance in Buildings; ISO: Geneva, Switzerland, 2016.
Berr, L.R. Evaluation Method of Construction Quality in Social Housing Units in the Use Step—Technical Analysis and Perception by Users. Ph.D. Thesis, Federal University of Rio Grande do Sul, Porto Alegre, Brazil, 2016.
R Core Team. R: A Language and Environment for Statistical Computing [R Software]; R Foundation for Statistical Computing; R Core Team: Vienna, Austria, 2021.
Bouchet-Valat, M. SnowballC: Snowball Stemmers Based on the C ‘libstemmer’ UTF-8 Library. R Package Version 0.7.0. 2020. Available online: https://CRAN.R-project.org/package=SnowballC (accessed on 2 March 2021).
Benoit, K.; Watanabe, K.; Wang, H.; Nulty, P.; Obeng, A.; Müller, S.; Matsuo, A. Quanteda: An R package for the quantitative analysis of textual data. J. Open Source Softw. 2018, 3, 774.
Kuhn, M. caret: Classification and Regression Training. R Package Version 6.0-86. 2020. Available online: https://CRAN.R-project.org/package=caret (accessed on 18 January 2021).
Gabrielatos, C. Keyness analysis: Nature, metrics and techniques. In Corpus Approaches to Discourse: A Critical Review; Routledge: Oxford, UK, 2018.
Kaur, P.; Gosain, A. Comparing the Behaviour of Oversampling and Undersampling Approach of Class Imbalance Learning by Combining Class Imbalance Problem with Noise; Springer: Berlin/Heidelberg, Germany, 2018.
Bortolini, R.; Forcada, N. Building Inspection System for Evaluating the Technical Performance of Existing Buildings. J. Perform. Constr. Facil. 2019, 32, 1–14.

Figure 1. Research design.

Figure 2. Relationships between words.

Figure 3. Warranty service mapping.

Figure 4. Proposed model.

Figure 5. Defect Classification System.

Figure 6. Frequency analysis of (a) defect × location words and (b) defect × element words.

Figure 7. Word menu.

Figure 8. Complaint frequency of the building systems.

Figure 9. Keyness importance results.

Figure 10. Confusion matrices of (a) Naïve Bayes, (b) SVM, (c) RF, and (d) Gradient Boosting.

Figure 11. F1 score results.

Table 1. Warranty service team profile.

Company Position	Educational Background	Experience in the Construction Industry (Years)	Experience in the Warranty Service Department (Years)
Technician	Building Technology	6	4
Head of Department	Civil Engineering	4	2
Technician	Building Technology	8	1
Technician	Building Technology	6	6
Maintenance Manager	Civil Engineering	17	2

Table 2. Criteria for evaluating the solution.

Constructs	Criteria	Guiding Questions
Applicability	Ease of use	Is the model clear and easy to understand by users? Which skill level do users need to operate the model?
Applicability	Possibility of using the model in other contexts	How can different companies use the model?
Utility	Improvements in data collection	To what extent does the model increase the reliability and completeness of the data?
Utility	Process automation	How much effort is necessary to collect and process complaint data?
Utility	Contributions to quality management	Which type of information can be generated to provide feedback?

Table 3. Model performance.

Model	Training Runtime	Accuracy
Naïve Bayes	0.04 s	77.71%
SVM	5.24 min	77.95%
RF	47.32 min	81.68%
Gradient Boosting	4.59 h	82.89%
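
For readers who want to relate the accuracy values in Table 3 to the confusion matrices in Figure 10 and the F1 scores in Figure 11, the following is a minimal sketch of how both metrics can be derived from a multi-class confusion matrix. It is written in plain Python rather than the R tooling cited by the authors. The function names, the 3 × 3 matrix `toy_cm`, and the choice of macro-averaging are illustrative assumptions; they are not taken from the study, and the counts are not the study's data.

```python
from typing import List

Matrix = List[List[int]]  # rows = true class, columns = predicted class (assumed layout)


def accuracy(cm: Matrix) -> float:
    """Share of all cases that fall on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_f1(cm: Matrix) -> List[float]:
    """F1 for each class, computed as 2*TP / (2*TP + FP + FN)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true class k, predicted as another class
        fp = sum(cm[r][k] for r in range(n)) - tp   # another class, predicted as k
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm: Matrix) -> float:
    """Unweighted mean of the per-class F1 scores (macro-averaging is an assumption)."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


# Hypothetical counts for three defect classes, for illustration only.
toy_cm: Matrix = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(toy_cm):.4f}")
    print(f"macro F1 = {macro_f1(toy_cm):.4f}")
```

Accuracy weights every complaint equally, whereas a macro-averaged F1 weights every class equally, so the two can diverge when some defect classes are much rarer than others.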

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

