Author Contributions
Conceptualization, T.-H.T., Y.-T.L. and Y.-L.C.; methodology, Y.-L.C. and Y.-T.L.; software, Y.-T.L.; validation, M.A., T.-H.T. and Y.-L.C.; formal analysis, Y.-L.C. and M.A.; investigation, Y.-T.L. and M.A.; data curation, Y.-T.L.; writing—review and editing, T.-H.T., M.A. and Y.-L.C.; supervision, T.-H.T. and M.A. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Flow chart of the proposed system.
Figure 2.
Simulated space configuration diagram.
Figure 3.
Multichannel Impulse Response Database measurement space configuration diagram [38].
Figure 4.
An example of the actual output of IPD where the color map represents the angle (°).
Figure 5.
The architecture of CNN-R.
Figure 6.
Performance evaluation criteria, where the yellow dots are the predicted values: (1) high accuracy and high precision, which is the best scenario; (2) low accuracy and high precision; (3) high accuracy and low precision; (4) low accuracy and low precision, which is the worst scenario.
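The four scenarios in Figure 6 can be illustrated numerically. The sketch below is not taken from the paper: it assumes the usual measurement sense of the terms, with accuracy read as a small offset between the mean prediction and the target, and precision as a small spread of the predictions. The function names, the example values, and the tolerance of 2.0 are all hypothetical.

```python
from statistics import mean, pstdev


def bias_and_spread(predictions, target):
    """Return (absolute offset of the mean prediction, population std. dev.)."""
    bias = abs(mean(predictions) - target)
    spread = pstdev(predictions)
    return bias, spread


def classify(predictions, target, bias_tol=2.0, spread_tol=2.0):
    """Label a set of predictions as (accuracy, precision), each 'high' or 'low'.

    The tolerances are illustrative assumptions, not values from the paper.
    """
    bias, spread = bias_and_spread(predictions, target)
    accuracy = "high" if bias <= bias_tol else "low"
    precision = "high" if spread <= spread_tol else "low"
    return accuracy, precision


# Hypothetical predictions for a target angle of 90 degrees.
target_angle = 90.0
scenario_1 = classify([89.5, 90.0, 90.5], target_angle)    # tight and centred
scenario_2 = classify([99.5, 100.0, 100.5], target_angle)  # tight but offset
scenario_3 = classify([80.0, 90.0, 100.0], target_angle)   # centred but scattered
scenario_4 = classify([85.0, 100.0, 115.0], target_angle)  # offset and scattered
```

Under these assumptions the four calls correspond to scenarios (1) to (4) of the figure, in that order.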
Figure 7.
The average accuracy and MAE of (a) angle and (b) distance estimation by CNN-R in a single acoustic environment, where SNR = 10 dB, 20 dB, and 30 dB, and RT60 = 0.16 s.
Figure 8.
The average ACC and MAE of angle estimation by CNN-R in a multiple acoustic environment where SNR = 10 dB, 20 dB, and 30 dB, and (a) RT60 = 0.16 s, (b) RT60 = 0.36 s, and (c) RT60 = 0.61 s.
Figure 9.
The average ACC and MAE of distance estimation by CNN-R in a multiple acoustic environment where SNR = 10 dB, 20 dB, and 30 dB, and (a) RT60 = 0.16 s, (b) RT60 = 0.36 s, and (c) RT60 = 0.61 s.
Figure 10.
The training–validation loss curves of CNN-R in (a) a single acoustic environment, (b) a multiple acoustic environment, and (c) a real acoustic environment.
Table 1.
Training option settings used in the single acoustic environment.
Hyperparameters | Configurations |
---|
Optimizer | Adam |
Loss Function | MAE |
Learning Rate | 0.001 |
Decay | |
Execution Environment | GPU |
Batch Size | 64 |
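The settings in Table 1 can be written down as a small configuration object. This is a framework-neutral sketch only: the class name, field names, and helper functions are hypothetical and do not come from the paper, which does not state here which toolkit was used. The Decay entry has no value in Table 1, so it is left unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingOptions:
    """Hypothetical container mirroring Table 1; field names are illustrative."""
    optimizer: str = "adam"
    loss: str = "mae"
    learning_rate: float = 0.001
    batch_size: int = 64
    device: str = "gpu"
    # Table 1 lists Decay without a value, so no value is assumed here.
    decay: Optional[float] = None


def mean_absolute_error(predictions, targets):
    """MAE loss: mean of |prediction - target| over all samples."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def batch_bounds(n_samples, batch_size):
    """Yield (start, stop) index pairs that split n_samples into mini-batches."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


options = TrainingOptions()
example_loss = mean_absolute_error([10.0, 20.0], [12.0, 17.0])  # (2 + 3) / 2
example_batches = list(batch_bounds(130, options.batch_size))
```

With a batch size of 64, a hypothetical set of 130 samples is split into two full batches and one remainder batch of two samples.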
Table 2.
A single acoustic environment configuration.
| Training Set | Test Set |
---|
Room size (m²) | 5 × 5, 6 × 5, 6 × 7, 7 × 7 | 6 × 6 |
SNR (dB) | 0, 5 | 10, 20, 30 |
RT60 (s) | 0.16 | 0.16 |
Table 3.
Performance of angle estimation by CNN-R in a single acoustic environment.
| SNR = 10 dB | SNR = 20 dB | SNR = 30 dB |
---|
Angle (°) | Acc. (%) | MAE (°) | Acc. (%) | MAE (°) | Acc. (%) | MAE (°) |
---|
0 | 71.00 | 15.67 | 90.50 | 4.28 | 99.00 | 1.68 |
15 | 87.50 | 3.13 | 98.00 | 1.43 | 100.00 | 0.91 |
30 | 83.50 | 5.01 | 94.00 | 2.08 | 97.00 | 1.66 |
45 | 88.50 | 4.32 | 97.00 | 1.60 | 100.00 | 0.67 |
60 | 99.00 | 1.02 | 100.00 | 0.54 | 100.00 | 0.57 |
75 | 100.00 | 0.53 | 100.00 | 0.32 | 100.00 | 0.34 |
90 | 100.00 | 0.47 | 100.00 | 0.22 | 100.00 | 0.18 |
105 | 99.50 | 0.69 | 100.00 | 0.43 | 100.00 | 0.53 |
120 | 100.00 | 0.72 | 100.00 | 0.40 | 100.00 | 0.40 |
135 | 85.50 | 5.27 | 97.50 | 1.49 | 99.50 | 0.52 |
150 | 82.00 | 4.59 | 91.50 | 2.46 | 97.50 | 1.62 |
165 | 94.50 | 2.95 | 99.50 | 0.86 | 100.00 | 0.56 |
180 | 80.00 | 7.85 | 97.00 | 1.53 | 99.50 | 0.81 |
Average | 90.08 | 4.02 | 97.31 | 1.36 | 99.42 | 0.80 |
Table 4.
Performance of distance estimation by CNN-R in a single acoustic environment.
| SNR = 10 dB | SNR = 20 dB | SNR = 30 dB |
---|
Distance (m) | Acc. (%) | MAE (m) | Acc. (%) | MAE (m) | Acc. (%) | MAE (m) |
---|
1 | 87.08 | 0.25 | 95.69 | 0.18 | 96.92 | 0.17 |
2 | 94.15 | 0.21 | 96.08 | 0.18 | 95.31 | 0.19 |
Average | 90.62 | 0.23 | 95.88 | 0.18 | 96.12 | 0.18 |
Table 5.
A multiple acoustic environment configuration.
| Training Set | Test Set |
---|
Room size (m) | 5 × 5, 6 × 5, 6 × 7, 7 × 7 | 6 × 6 |
SNR (dB) | 0, 5, 10 | 10, 20, 30 |
RT60 (s) | 0.16, 0.36, 0.61 | 0.16, 0.36, 0.61 |
Table 6.
Performance of angle estimation by CNN-R in a multiple acoustic environment at SNRs of 10, 20, and 30 dB.
| | RT60 = 0.16 s | RT60 = 0.36 s | RT60 = 0.61 s |
---|
Angle (°) | SNR (dB) | Acc. (%) | MAE (°) | Acc. (%) | MAE (°) | Acc. (%) | MAE (°) |
---|
0 | 10 | 80.50 | 05.32 | 85.00 | 04.32 | 59.50 | 10.69 |
| 20 | 93.00 | 02.66 | 97.50 | 02.69 | 81.50 | 05.57 |
| 30 | 99.00 | 02.39 | 99.50 | 02.02 | 87.00 | 04.29 |
15 | 10 | 96.00 | 02.82 | 80.00 | 05.17 | 31.50 | 17.70 |
| 20 | 97.50 | 02.67 | 82.50 | 04.38 | 35.50 | 18.60 |
| 30 | 99.50 | 02.15 | 93.50 | 03.88 | 44.00 | 15.74 |
30 | 10 | 91.00 | 03.24 | 58.00 | 08.55 | 44.50 | 21.34 |
| 20 | 96.50 | 02.47 | 61.00 | 08.31 | 48.50 | 24.91 |
| 30 | 96.00 | 02.19 | 63.50 | 06.54 | 48.50 | 25.24 |
45 | 10 | 89.50 | 03.45 | 79.50 | 06.06 | 45.00 | 19.54 |
| 20 | 96.50 | 02.16 | 93.50 | 03.16 | 58.50 | 14.38 |
| 30 | 99.50 | 01.56 | 95.50 | 02.56 | 65.00 | 12.35 |
60 | 10 | 86.50 | 03.90 | 80.00 | 04.92 | 60.50 | 08.24 |
| 20 | 98.00 | 02.17 | 94.50 | 03.63 | 63.50 | 07.24 |
| 30 | 100.0 | 01.56 | 97.00 | 03.43 | 60.50 | 06.92 |
75 | 10 | 95.50 | 03.10 | 80.59 | 05.05 | 71.00 | 05.89 |
| 20 | 100.0 | 01.57 | 98.00 | 02.24 | 81.00 | 04.80 |
| 30 | 100.0 | 01.16 | 96.50 | 02.00 | 81.00 | 04.74 |
90 | 10 | 100.0 | 00.85 | 99.50 | 00.95 | 100.0 | 00.78 |
| 20 | 100.0 | 00.52 | 100.0 | 00.50 | 100.0 | 00.52 |
| 30 | 100.0 | 00.51 | 100.0 | 00.44 | 100.0 | 00.41 |
105 | 10 | 98.50 | 01.66 | 84.50 | 04.14 | 58.00 | 09.04 |
| 20 | 99.50 | 01.02 | 97.50 | 01.79 | 61.50 | 08.28 |
| 30 | 100.0 | 01.06 | 99.50 | 01.59 | 71.50 | 06.19 |
120 | 10 | 87.00 | 03.73 | 76.50 | 07.40 | 43.00 | 14.05 |
| 20 | 98.00 | 01.81 | 84.00 | 04.59 | 49.50 | 10.04 |
| 30 | 99.50 | 01.37 | 95.50 | 02.56 | 51.50 | 08.14 |
135 | 10 | 83.50 | 03.80 | 64.00 | 09.93 | 29.00 | 32.05 |
| 20 | 95.00 | 02.19 | 79.00 | 07.12 | 44.50 | 24.60 |
| 30 | 99.50 | 01.38 | 81.00 | 04.95 | 49.50 | 19.69 |
150 | 10 | 93.50 | 02.97 | 88.50 | 03.45 | 58.00 | 11.81 |
| 20 | 95.50 | 02.46 | 94.00 | 02.50 | 61.50 | 11.39 |
| 30 | 99.50 | 01.99 | 97.00 | 02.07 | 65.00 | 10.48 |
165 | 10 | 98.00 | 02.62 | 77.00 | 05.04 | 60.50 | 13.61 |
| 20 | 100.0 | 01.84 | 75.50 | 05.01 | 45.50 | 10.05 |
| 30 | 100.0 | 01.79 | 81.00 | 04.76 | 51.50 | 07.10 |
180 | 10 | 72.00 | 06.42 | 69.00 | 08.39 | 33.50 | 22.87 |
| 20 | 98.50 | 02.31 | 85.00 | 04.27 | 52.00 | 17.15 |
| 30 | 94.00 | 02.85 | 87.50 | 03.80 | 56.50 | 13.37 |
Average | 10 | 90.12 | 03.37 | 78.62 | 05.64 | 53.38 | 14.43 |
| 20 | 97.54 | 01.99 | 87.85 | 03.86 | 60.23 | 12.12 |
| 30 | 98.96 | 01.69 | 91.31 | 03.12 | 63.96 | 10.36 |
Table 7.
Performance of distance estimation by CNN-R in a multiple acoustic environment at SNRs of 10, 20, and 30 dB.
| | RT60 = 0.16 s | RT60 = 0.36 s | RT60 = 0.61 s |
---|
Distance (m) | SNR (dB) | Acc. (%) | MAE (m) | Acc. (%) | MAE (m) | Acc. (%) | MAE (m) |
---|
1 | 10 | 86.00 | 00.26 | 84.31 | 00.29 | 76.46 | 00.34 |
| 20 | 96.77 | 00.17 | 94.77 | 00.20 | 85.08 | 00.27 |
| 30 | 98.46 | 00.16 | 98.38 | 00.16 | 89.08 | 00.24 |
2 | 10 | 92.00 | 00.24 | 92.62 | 00.23 | 85.92 | 00.26 |
| 20 | 95.92 | 00.19 | 96.38 | 00.18 | 85.69 | 00.26 |
| 30 | 98.15 | 00.18 | 97.69 | 00.18 | 82.31 | 00.30 |
Average | 10 | 89.00 | 00.25 | 88.46 | 00.26 | 81.19 | 00.30 |
| 20 | 96.35 | 00.18 | 95.58 | 00.19 | 85.38 | 00.27 |
| 30 | 98.31 | 00.17 | 98.04 | 00.17 | 85.69 | 00.27 |
Table 8.
The real acoustic environment configuration.
| Training Set | Test Set |
---|
Room size (m) | 6 × 6 | 6 × 6 |
SNR (dB) | 0, 5, 10 | 10, 20, 30 |
RT60 (s) | 0.16, 0.36, 0.61 | 0.16, 0.36, 0.61 |
Table 9.
Performance of distance estimation by CNN-R in a real acoustic environment at SNRs of 10, 20, and 30 dB.
| | RT60 = 0.16 s | RT60 = 0.36 s | RT60 = 0.61 s |
---|
Distance (m) | SNR (dB) | Acc. (%) | MAE (m) | Acc. (%) | MAE (m) | Acc. (%) | MAE (m) |
---|
1 | 10 | 88.08 | 00.25 | 95.31 | 00.20 | 94.62 | 00.21 |
| 20 | 98.54 | 00.15 | 99.31 | 00.14 | 99.23 | 00.14 |
| 30 | 99.92 | 00.14 | 99.92 | 00.13 | 99.85 | 00.13 |
2 | 10 | 90.23 | 00.25 | 89.46 | 00.24 | 92.23 | 00.23 |
| 20 | 97.62 | 00.17 | 97.85 | 00.16 | 97.31 | 00.16 |
| 30 | 98.85 | 00.15 | 99.69 | 00.14 | 99.15 | 00.15 |
Average | 10 | 89.15 | 00.25 | 92.38 | 00.22 | 93.42 | 00.22 |
| 20 | 98.08 | 00.16 | 98.58 | 00.15 | 98.27 | 00.15 |
| 30 | 99.38 | 00.14 | 99.81 | 00.13 | 99.50 | 00.14 |
Table 10.
Performance of angle estimation by CNN-R in a real acoustic environment at SNRs of 10, 20, and 30 dB, and RT60 = 0.16 s, 0.36 s, and 0.61 s.
| | RT60 = 0.16 s | RT60 = 0.36 s | RT60 = 0.61 s |
---|
Angle (°) | SNR (dB) | Acc. (%) | MAE (°) | Acc. (%) | MAE (°) | Acc. (%) | MAE (°) |
---|
0 | 10 | 88.00 | 04.05 | 91.50 | 04.03 | 79.00 | 10.15 |
| 20 | 98.50 | 01.66 | 99.50 | 01.28 | 97.00 | 02.62 |
| 30 | 100.0 | 00.79 | 99.50 | 00.77 | 99.50 | 01.53 |
15 | 10 | 80.00 | 04.62 | 89.00 | 05.08 | 92.50 | 03.40 |
| 20 | 95.00 | 02.36 | 95.00 | 02.48 | 99.00 | 01.36 |
| 30 | 99.00 | 01.29 | 99.50 | 01.08 | 99.50 | 00.97 |
30 | 10 | 95.00 | 02.35 | 81.00 | 10.36 | 77.50 | 12.64 |
| 20 | 99.50 | 00.56 | 97.50 | 01.81 | 95.00 | 02.08 |
| 30 | 100.0 | 00.37 | 100.0 | 00.44 | 99.00 | 00.55 |
45 | 10 | 87.00 | 04.49 | 91.50 | 03.08 | 80.50 | 07.85 |
| 20 | 99.50 | 00.64 | 99.50 | 00.46 | 97.50 | 00.78 |
| 30 | 100.0 | 00.32 | 100.0 | 00.27 | 100.0 | 00.30 |
60 | 10 | 86.00 | 05.87 | 86.00 | 06.12 | 83.00 | 07.33 |
| 20 | 99.50 | 00.54 | 99.50 | 00.46 | 98.50 | 00.88 |
| 30 | 100.0 | 00.28 | 100.0 | 00.29 | 100.0 | 00.27 |
75 | 10 | 100.0 | 01.46 | 78.50 | 10.97 | 82.50 | 07.85 |
| 20 | 100.0 | 00.38 | 98.00 | 01.52 | 99.00 | 00.90 |
| 30 | 100.0 | 00.32 | 96.50 | 00.40 | 100.0 | 00.35 |
90 | 10 | 88.00 | 04.29 | 85.00 | 05.03 | 72.50 | 12.51 |
| 20 | 99.50 | 00.81 | 99.00 | 00.77 | 93.50 | 02.50 |
| 30 | 100.0 | 00.38 | 100.0 | 00.40 | 99.00 | 00.54 |
105 | 10 | 86.00 | 05.98 | 78.50 | 11.36 | 61.50 | 16.93 |
| 20 | 98.50 | 00.82 | 97.50 | 01.49 | 95.50 | 01.78 |
| 30 | 100.0 | 00.28 | 100.0 | 00.26 | 100.0 | 00.32 |
120 | 10 | 93.00 | 02.74 | 86.00 | 04.12 | 78.00 | 10.82 |
| 20 | 98.50 | 00.58 | 98.00 | 01.45 | 96.50 | 01.65 |
| 30 | 100.0 | 00.21 | 100.0 | 00.23 | 100.0 | 00.29 |
135 | 10 | 85.50 | 05.80 | 87.50 | 05.43 | 82.50 | 06.41 |
| 20 | 98.50 | 00.81 | 98.00 | 01.04 | 98.50 | 01.17 |
| 30 | 100.0 | 00.26 | 100.0 | 00.40 | 100.0 | 00.48 |
150 | 10 | 82.50 | 08.74 | 91.00 | 02.90 | 78.50 | 09.86 |
| 20 | 94.00 | 02.92 | 100.0 | 00.55 | 92.50 | 02.60 |
| 30 | 99.00 | 00.55 | 100.0 | 00.29 | 99.50 | 00.38 |
165 | 10 | 92.00 | 02.87 | 91.00 | 03.02 | 77.50 | 09.77 |
| 20 | 99.50 | 00.91 | 99.50 | 01.01 | 97.00 | 01.41 |
| 30 | 100.0 | 00.53 | 99.50 | 00.68 | 100.0 | 00.59 |
180 | 10 | 76.50 | 05.84 | 81.50 | 05.99 | 82.00 | 09.24 |
| 20 | 92.50 | 02.37 | 99.00 | 01.02 | 97.50 | 01.61 |
| 30 | 100.0 | 00.72 | 100.0 | 00.45 | 100.0 | 01.12 |
Average | 10 | 87.38 | 04.55 | 86.00 | 06.02 | 79.04 | 09.60 |
| 20 | 97.92 | 01.18 | 98.46 | 01.18 | 96.69 | 01.64 |
| 30 | 99.85 | 00.48 | 99.85 | 00.46 | 99.73 | 00.59 |
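The Average rows of Table 10 appear to be unweighted means over the 13 test angles. As a worked check under that assumption, the sketch below recomputes the average angle accuracy at RT60 = 0.16 s and SNR = 30 dB from the per-angle values listed above. The variable and function names are illustrative only.

```python
# Acc. (%) at RT60 = 0.16 s and SNR = 30 dB, copied from the 13 angle rows
# of Table 10 in order (0, 15, 30, ..., 180 degrees).
acc_rt016_snr30 = [
    100.0, 99.0, 100.0, 100.0, 100.0, 100.0, 100.0,
    100.0, 100.0, 100.0, 99.0, 100.0, 100.0,
]


def column_average(values, ndigits=2):
    """Unweighted mean of a table column, rounded as in the tables."""
    return round(sum(values) / len(values), ndigits)


average_acc = column_average(acc_rt016_snr30)
print(average_acc)  # 99.85
```

The result, 1298 / 13 ≈ 99.85%, agrees with the CNN-R angle accuracy listed for this condition in Tables 11 and 12.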
Table 11.
The average Acc. and MAE of angle and distance estimation by CNN-R in a real acoustic environment, where SNR = 10 dB, 20 dB, and 30 dB, and RT60 = 0.16 s, 0.36 s, and 0.61 s.
| Angle | Distance |
---|
RT60 (s) | SNR (dB) | Acc. (%) | MAE (°) | SNR (dB) | Acc. (%) | MAE (m) |
---|
0.16 | 10 | 87.38 | 04.55 | 10 | 89.15 | 00.25 |
| 20 | 97.92 | 01.18 | 20 | 98.08 | 00.16 |
| 30 | 99.85 | 00.48 | 30 | 99.38 | 00.14 |
0.36 | 10 | 86.00 | 06.02 | 10 | 92.38 | 00.22 |
| 20 | 98.46 | 01.18 | 20 | 98.58 | 00.15 |
| 30 | 99.85 | 00.46 | 30 | 99.81 | 00.13 |
0.61 | 10 | 79.04 | 09.60 | 10 | 93.42 | 00.22 |
| 20 | 96.69 | 01.64 | 20 | 98.27 | 00.15 |
| 30 | 99.73 | 00.59 | 30 | 99.50 | 00.14 |
Table 12.
Comparative results of angle and distance estimation based on the multi-channel impulse response database in a real acoustic environment at SNR = 30 dB and RT60 = 0.16 s.
Method | Average Angle (0–180°) Acc. | Average Distance (1–2 m) Acc. |
---|
CNN-SL [32] | 90.25% | 88.85% |
CRNN [34] | 87.37% | 85.64% |
CNN [35] | 98.51% | 98.09% |
TF-CNN [36] | 95.18% | 94.66% |
CNN-R | 99.85% | 99.38% |