1. Introduction
Noise, defined as unwanted sound, has become a substantial environmental problem worldwide that affects human health [1]. Noise may cause not only auditory problems but also non-auditory health effects [1]. Specifically, excessive noise exposure has been associated with sleep problems, cognitive impairment, cardiovascular diseases, and some metabolic diseases [2,3,4]. However, individuals may experience different effects from noise because they differ in their sensitivity to it. People who are more sensitive to noise tend to be more annoyed by it or more vulnerable to its non-auditory health effects [5]. Hence, noise sensitivity may moderate the impact of noise on health. Indeed, it has been suggested that noise sensitivity, rather than noise exposure level, is what influences individual reactions to noise [6]. Although it was assumed that people who are sensitive to noise are also sensitive to other environmental issues such as odor [7], studies have shown that noise sensitivity is distinct from other sensitivities [8]. For instance, neuroticism and smoking were associated with noise sensitivity, whereas chemical sensitivity was correlated with allergies and alcohol use [8]. Therefore, noise sensitivity needs to be assessed independently in epidemiological or interventional studies on the impact of noise on health.
The Weinstein Noise Sensitivity Scale (WNSS) is one of the most frequently used instruments for measuring noise sensitivity. Similar to other noise perception-related protocols, such as the International Organization for Standardization Technical Specifications report protocol, which has undergone rigorous translation into 15 different languages [9], the WNSS has been rigorously translated and tested in Swedish [10], German [11], Persian [12], Japanese [13], Italian [14], simplified Chinese [15], and traditional Chinese [16]. Although the original WNSS is a 21-item unidimensional scale with each item rated on a 6-point Likert scale, multi-dimensional structures were identified in some translated versions. For instance, the Italian version showed two bipolar factors comprising the positively worded and the negatively worded items, respectively [14]. Moreover, a four-factor model was identified for the Persian version [12]. In contrast, the traditional Chinese version showed a unidimensional structure after three items that did not fit well with the other items were removed, resulting in an 18-item scale [16]. However, both the original 21-item version and the 18-item traditional Chinese version might be too long to be incorporated into epidemiological studies. Therefore, a short form of only five items (short form of the Weinstein Noise Sensitivity Scale; NSS-SF) was developed [17], which has been translated into Bulgarian and simplified Chinese [18,19]. However, the NSS-SF was derived from exploratory factor analysis (EFA), and its adequacy relative to the full 21-item version has not been thoroughly assessed beyond the total score correlation.
Classical test theory (CTT) and item response theory (IRT) are currently the two most popular methods for shortening scales. Under CTT, the observed score is modeled as a true score plus a single error term, and measurement precision is assumed to be the same at every trait level, which is usually unrealistic [20]. Moreover, CTT focuses on assessment at the scale level and establishes scale properties that depend on the sample. In contrast, IRT emphasizes the item level and establishes measurement properties that are independent of the sample [21]. Therefore, IRT has recently gained popularity. However, item selection under IRT has remained subjective. Recently, the optimal test assembly (OTA) procedure was applied to patient-reported outcome measures (PROMs) to select the set of items that best reproduces a collection of measurement properties of the full version [22]. Under specific constraints, e.g., the number of items, it systematically searches for the set of items that optimizes a specific objective, e.g., maximizing test information. Thus, OTA can optimize the attributes of a short test relative to the original test [23]. The OTA procedure has been shown to produce reliable, replicable, and reproducible short versions of minimal length based on pre-specified and objective procedures [24].
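To make the idea of "searching for the item set that maximizes test information under a length constraint" concrete, the following is a minimal sketch rather than the procedure used in this study. It assumes a dichotomous two-parameter logistic (2PL) item model, uses made-up item parameters, and replaces the mixed-integer programming typically used in OTA with a brute-force search over all subsets, which is only feasible for small item pools. The names `item_information`, `assemble_short_form`, `ITEMS`, and `THETAS` are hypothetical.

```python
import math
from itertools import combinations


def item_information(a, b, theta):
    """Fisher information of a 2PL item (discrimination a, difficulty b) at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def assemble_short_form(items, k, thetas):
    """Return the k-item subset with the largest test information summed over thetas.

    Brute-force stand-in for an OTA solver: every subset of size k is scored.
    """
    best_subset, best_info = None, -1.0
    for subset in combinations(range(len(items)), k):
        info = sum(item_information(*items[i], t) for i in subset for t in thetas)
        if info > best_info:
            best_subset, best_info = subset, info
    return best_subset, best_info


# Hypothetical (discrimination, difficulty) pairs; NOT estimates from this study.
ITEMS = [(1.8, 0.0), (0.2, 0.5), (1.2, -0.5), (0.6, 1.0), (1.5, 0.3)]
# Grid of latent trait levels at which test information is evaluated.
THETAS = [-2.0, -1.0, 0.0, 1.0, 2.0]

selected, total_info = assemble_short_form(ITEMS, 3, THETAS)
print("selected item indices:", selected)
print("summed test information:", round(total_info, 3))
```

With these illustrative parameters the search keeps the three most discriminating items and drops the two weakest. A full OTA formulation would add further constraints, such as a minimum reliability or a minimum correlation with the full-scale score, and would repeat the search for each candidate length.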
To our knowledge, there is no short form of the traditional Chinese WNSS, and the current short forms of the WNSS have not been assessed by IRT or OTA. Therefore, this study aimed to obtain a short form of the traditional Chinese WNSS through an OTA procedure based on IRT, and to compare the performance of the obtained short form with that of the NSS-SF in terms of reliability, validity, and test information.
4. Discussion
This is the first study that used OTA methodology to obtain a short form of the WNSS for assessing noise sensitivity. The new WNSS-8 showed the best performance when considering internal consistency, correlation of summed scores, correlation of factor scores, convergent validity, construct validity and test information.
The EFA revealed a negative factor loading for Item 12. Because this is counter to the hypothesized direction of the item, we decided to remove Item 12 from the OTA procedure. This should not greatly affect the results, as the item information for Item 12 was the smallest, which means that Item 12 contributes the least to measuring the latent trait level [40]. This item was also removed from the 18-item traditional Chinese version because of its small factor loading and communality [16]. Item 12 reads “It wouldn’t bother me to hear the sounds of everyday living from neighbors (footsteps, running water, etc.).” It was reported that only 6% of the residents in Hong Kong rated neighborhood noise as annoying, compared with 55% for traffic noise [41]. A previous study proposed that apartment units in Hong Kong are usually separated by concrete walls and floors, so most people would not hear neighborhood noise such as footsteps and running water [16]. Hence, people may react less to neighborhood noise and consider it not bothersome. Therefore, this item might not be applicable in a Hong Kong community setting. Moreover, the discrimination parameter for Item 12 was very low, with a value of 0.123. This indicates that Item 12 might be unable to discriminate between people with different levels of the latent trait [42]. In addition, the low information of Item 12 indicates low precision and greater measurement error for this item [25]. Hence, individuals with low trait levels might score similarly to, or higher than, those with high trait levels, which would explain the problematic performance of Item 12.
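The practical meaning of a discrimination of 0.123 can be illustrated with a short sketch. It assumes a dichotomized 2PL response curve with difficulty fixed at 0 (a simplification; the 6-point items in this scale would be fitted with a polytomous model) and compares the reported value for Item 12 with the highest discrimination reported for the WNSS-8 items, 1.775. The function names `endorse_probability` and `probability_gap` are hypothetical.

```python
import math


def endorse_probability(a, theta, b=0.0):
    """2PL probability of endorsing an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def probability_gap(a, low=-2.0, high=2.0):
    """Difference in endorsement probability between a high and a low trait level."""
    return endorse_probability(a, high) - endorse_probability(a, low)


# 0.123 is the reported discrimination of Item 12; 1.775 is the largest
# discrimination reported for the WNSS-8 items. b = 0 is an assumption.
gap_item12 = probability_gap(0.123)
gap_strong = probability_gap(1.775)

print(f"a=0.123: gap between theta=+2 and theta=-2 is {gap_item12:.3f}")
print(f"a=1.775: gap between theta=+2 and theta=-2 is {gap_strong:.3f}")
```

Under these assumptions, respondents two standard deviations above and below the mean differ by only about 0.12 in endorsement probability for an item like Item 12, compared with about 0.94 for the most discriminating item, which is why such an item says little about where a respondent sits on the trait.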
The iterative Wald test approach employed in this study has been demonstrated to reduce Type I and Type II errors [29]. This approach identified gender-related DIF on Item 5, which reads “I am easily awakened by noise”. A previous study indicated that women had more awakenings and more awake time after sleep onset [43]. Therefore, women and men may not share the same norm in responding to this item even if they have similar sensitivity to noise. Of note, research on this aspect is quite limited, which calls for more studies investigating the role of gender in noise sensitivity.
The convergent validity and construct validity of the WNSS-8 and the NSS-SF were similar. For reliability, a Cronbach’s alpha greater than 0.75 has been suggested [44]. One of our rules for item selection was to retain 95% of the Cronbach’s alpha of the non-DIF version, which was 0.81. Using the training sample, the values of Cronbach’s alpha for the WNSS-8 and the NSS-SF were 0.83 and 0.67, respectively. Moreover, the Cronbach’s alpha for the NSS-SF in the test sample was 0.72. Hence, the Cronbach’s alpha of the NSS-SF was not adequate. Furthermore, concurrent validity can be demonstrated by the correlation between scale scores, which ranges from −1 to 1 [45]. A greater coefficient in absolute value indicates higher concurrent validity. The correlations of the summed scores of the WNSS-8 and the NSS-SF with the non-DIF version were 0.901 and 0.867, respectively. We used 0.90 as the criterion because a value greater than 0.90 indicates a very high correlation [46]. Therefore, the concurrent validity of the NSS-SF was less adequate than that of the WNSS-8.
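The reliability rule can be written out as a small sketch. It computes Cronbach’s alpha from an item-response matrix and then checks the reported alphas against 95% of the non-DIF value of 0.81. The response matrix and the names `cronbach_alpha` and `meets_reliability_rule` are hypothetical and serve only to show the calculation; the 0.81, 0.83, and 0.67 figures are those reported above.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def meets_reliability_rule(alpha, reference_alpha=0.81, retained=0.95):
    """True if alpha keeps at least `retained` of the reference alpha."""
    return alpha >= retained * reference_alpha


# Hypothetical 4-respondent, 3-item matrix; NOT data from this study.
toy_responses = [[1, 2, 1], [2, 3, 3], [4, 4, 5], [5, 6, 6]]
toy_alpha = cronbach_alpha(toy_responses)
print("alpha of toy data:", round(toy_alpha, 3))

# Reported training-sample alphas checked against 0.95 * 0.81 = 0.7695.
wnss8_ok = meets_reliability_rule(0.83)
nsssf_ok = meets_reliability_rule(0.67)
print("WNSS-8 meets rule:", wnss8_ok)
print("NSS-SF meets rule:", nsssf_ok)
```

The threshold works out to 0.7695, so a short form with an alpha of 0.83 satisfies the rule, whereas one with 0.67 does not.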
Using the training sample, the WNSS-8 and the NSS-SF retained 73.1% and 43.2%, respectively, of the test information of the original 21-item version over the entire ability range. The removal of another three items was thus associated with a further reduction of about 30 percentage points in test information. The comparison of the two short scales revealed similar results in the test sample. Given that higher test information represents higher accuracy in estimating the latent trait level, we proposed the WNSS-8 as the better short version [32]. Although there are no standard criteria for the discrimination parameter, items with low discrimination, such as <0.4, were reported to be less able to differentiate between latent trait levels, to carry smaller amounts of information, and to be less able to reduce the estimation error [47]. The discrimination parameters for the items of the WNSS-8 ranged from 0.587 to 1.775, corresponding to moderate-to-high discrimination for the assessment of noise sensitivity. Furthermore, the test information curves of the three scales demonstrated that the curve of the WNSS-8 more closely resembled the shape of that of the original full scale, which indicates that the WNSS-8 has a similar ability to measure noise sensitivity around the same latent trait levels [48].
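As a rough guide to how discrimination translates into information, consider a dichotomous two-parameter logistic item. This is an assumption made here for simplicity; the information function of a polytomous model for 6-point items is more involved, so the figures below are indicative only. With discrimination $a$ and difficulty $b$, item information peaks at $a^{2}/4$, where the endorsement probability equals one half:

```latex
\[
I(\theta) = a^{2}\,P(\theta)\,\bigl(1 - P(\theta)\bigr),
\qquad
P(\theta) = \frac{1}{1 + e^{-a(\theta - b)}},
\qquad
\max_{\theta} I(\theta) = \frac{a^{2}}{4}.
\]
\[
a = 0.4:\ \tfrac{0.16}{4} = 0.040,
\qquad
a = 0.587:\ \tfrac{0.3446}{4} \approx 0.086,
\qquad
a = 1.775:\ \tfrac{3.1506}{4} \approx 0.788.
\]
```

Because information grows with the square of the discrimination, the most discriminating WNSS-8 item would carry roughly nine times the peak information of the least discriminating one under this simplified model.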
The result obtained from OTA, which uses pre-specified criteria for item selection, is replicable and reproducible [24]. We believe that OTA will show its value in shortening PROMs for efficient epidemiological research, given the respondent burden imposed by multiple lengthy PROMs in surveys. However, there are also some limitations worth noting. First, the pre-specified criteria could be subjective to some extent. This study required 95% of the reliability and a correlation of 0.90, in view of the suggested Cronbach’s alpha and the very high correlation indicated by a value greater than 0.9, so that the short form would closely resemble the original scale; other settings could be employed, for example if the original Cronbach’s alpha were very high. Second, DIF by other characteristics, such as age, and responsiveness could be studied. Third, the convergent validity was low in this study. Testing convergent validity through agreement with other noise sensitivity scales would be desirable in future studies.