Agent Productivity Modeling in a Call Center Domain Using Attentive Convolutional Neural Networks
Abstract
1. Introduction
2. Related Work
3. The Proposed Framework
3.1. CNNs and BiLSTMs
3.2. Attention Layer
- The modeling layers are shown in solid gray.
- The attention layer is the dotted box, including the circles that represent the calculation of the attention weights. A softmax function computes the attention weights and generates the context vector C.
- The context vector is fed into a dense layer with an activation function.
- The output layer is a logistic regression (sigmoid) function for the segment classification.
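The attention-and-output pipeline described above can be sketched as a minimal NumPy forward pass. The shapes, the dot-product scoring vector `w`, and the tanh activation in the dense layer are illustrative assumptions (the paper does not specify them here); only the softmax weighting, context-vector summation, and sigmoid output follow the description directly.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical shapes: T frames, each with a d-dimensional hidden state
T, d = 10, 8
H = rng.normal(size=(T, d))   # frame-level states from the modeling layers (gray box)
w = rng.normal(size=d)        # learnable scoring vector (dot-product scoring assumed)

scores = H @ w                # one relevance score per frame
alpha = softmax(scores)       # attention weights, non-negative and summing to 1
C = alpha @ H                 # context vector: attention-weighted sum of frame states

# Dense layer followed by the sigmoid output for the binary
# (productive / nonproductive) segment classification
W_d = rng.normal(size=(d, d)); b_d = np.zeros(d)
h = np.tanh(C @ W_d + b_d)    # dense layer with an activation (tanh assumed)
W_o = rng.normal(size=d); b_o = 0.0
p = 1.0 / (1.0 + np.exp(-(h @ W_o + b_o)))  # probability of the "productive" class
```

The weights here are random placeholders; in the actual model they are learned end to end with the CNN/BiLSTM layers.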
4. The Data
5. The Experiment
6. The Results
6.1. CNNs-BiLSTMs vs. CNNs Accuracy
6.2. The Attention Layer Effect
6.3. The Attention Layer and Most Informative Frames
- Stuttering ("umm", "ahh"): this is the most common attention trigger for nonproductive calls; it recurs throughout the call with an average duration of one to two seconds.
- Tone level: a high tone triggers high attention for productive calls. A proper tone level is an important factor in call centers, as it indicates the wakefulness and enthusiasm of the agent; the primary reason for a frustrated customer is an insincere tone of voice from the person handling the query [37].
6.4. The Speech vs. Text Classification
7. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Breuer, K.; Nieken, P.; Sliwka, D. Social ties and subjective performance evaluations: An empirical investigation. Rev. Manag. Sci. 2013, 7, 141–157. [Google Scholar] [CrossRef]
- Dhanpat, N.; Modau, F.D.; Lugisani, P.; Mabojane, R.; Phiri, M. Exploring employee retention and intention to leave within a call center. SA J. Hum. Resour. Manag. 2018, 16, 1–13. [Google Scholar] [CrossRef]
- Frederiksen, A.; Lange, F.; Kriechel, B. Subjective performance evaluations and employee careers. J. Econ. Behav. Organ. 2017, 134, 408–429. [Google Scholar] [CrossRef] [Green Version]
- Gonzalez-Benito, O.; Gonzalez-Benito, J. Cultural vs. operational market orientation and objective vs. subjective performance: Perspective of production and operations. Ind. Mark. Manag. 2005, 34, 797–829. [Google Scholar] [CrossRef]
- Echchakoui, S.; Baakil, D. Emotional Exhaustion in Offshore Call Centers: A Comparative Study. J. Glob. Mark. 2019, 32, 17–36. [Google Scholar] [CrossRef]
- Cleveland, B. Call Center Management on Fast Forward: Succeeding in the New Era of Customer Relationships; ICMI Press: Colorado Springs, CO, USA, 2012. [Google Scholar]
- Sonnentag, S.; Frese, M. Performance concepts and performance theory. Psychol. Manag. Individ. Perform. 2002, 23, 3–25. [Google Scholar]
- Ahmed, A.; Hifny, Y.; Toral, S.; Shaalan, K. A Call Center Agent Productivity Modeling Using Discriminative Approaches. In Intelligent Natural Language Processing: Trends and Applications; Book Section 1; Springer: Berlin/Heidelberg, Germany, 2018; pp. 501–520. [Google Scholar]
- Ahmed, A.; Toral, S.; Shaalan, K. Agent productivity measurement in call center using machine learning. In Proceedings of the International Conference on Advanced Intelligent Systems and Informatics, Cairo, Egypt, 24–26 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 160–169. [Google Scholar]
- Ahmed, A.; Hifny, Y.; Shaalan, K.; Toral, S. End-to-End Lexicon Free Arabic Speech Recognition Using Recurrent Neural Networks. Comput. Linguist. Speech Image Process. Arab. Lang. 2018, 4, 231. [Google Scholar]
- Dave, N. Feature extraction methods LPC, PLP and MFCC in speech recognition. Int. J. Adv. Res. Eng. Technol. 2013, 1, 1–4. [Google Scholar]
- Sainath, T.N.; Vinyals, O.; Senior, A.; Sak, H. Convolutional, long short-term memory, fully connected deep neural networks. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, 19–24 April 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 4580–4584. [Google Scholar]
- Cai, M.; Liu, J. Maxout neurons for deep convolutional and LSTM neural networks in speech recognition. Speech Commun. 2016, 77, 53–64. [Google Scholar] [CrossRef]
- Mishne, G.; Carmel, D.; Hoory, R.; Roytman, A.; Soffer, A. Automatic analysis of call-center conversations. In Proceedings of the 14th ACM International Conference on Information and knowledge Management, Bremen, Germany, 31 October 2005; ACM: New York, NY, USA, 2005; pp. 453–459. [Google Scholar]
- Valle, M.A.; Varas, S.; Ruz, G.A. Job performance prediction in a call center using a naive Bayes classifier. Expert Syst. Appl. 2012, 39, 9939–9945. [Google Scholar] [CrossRef]
- Ahmed, A.; Hifny, Y.; Shaalan, K.; Toral, S. Lexicon free Arabic speech recognition recipe. In Proceedings of the International Conference on Advanced Intelligent Systems and Informatics, Cairo, Egypt, 24–26 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 147–159. [Google Scholar]
- Bae, S.M.; Ha, S.H.; Park, S.C. A web-based system for analyzing the voices of call center customers in the service industry. Expert Syst. Appl. 2005, 28, 29–41. [Google Scholar] [CrossRef]
- Karakus, B.; Aydin, G. Call center performance evaluation using big data analytics. In Proceedings of the 2016 International Symposium on Networks, Computers and Communications (ISNCC), Yasmine Hammamet, Tunisia, 11–13 May 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6. [Google Scholar]
- Perera, K.N.N.; Priyadarshana, Y.; Gunathunga, K.; Ranathunga, L.; Karunarathne, P.; Thanthriwatta, T. Automatic Evaluation Software for Contact Centre Agents’ voice Handling Performance. Int. J. Sci. Res. Publ. 2019, 5, 1–8. [Google Scholar]
- Sudarsan, V.; Kumar, G. Voice call analytics using natural language processing. Int. J. Stat. Appl. Math. 2019, 4, 133–136. [Google Scholar]
- Neumann, M.; Vu, N.T. Attentive convolutional neural network based speech emotion recognition: A study on the impact of input features, signal length, and acted speech. arXiv 2017, arXiv:1706.00612. [Google Scholar]
- Hifny, Y.; Ali, A. Efficient Arabic Emotion Recognition Using Deep Neural Networks. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 6710–6714. [Google Scholar]
- Cho, J.; Pappagari, R.; Kulkarni, P.; Villalba, J.; Carmiel, Y.; Dehak, N. Deep neural networks for emotion recognition combining audio and transcripts. arXiv 2019, arXiv:1911.00432. [Google Scholar]
- Trigeorgis, G.; Ringeval, F.; Brueckner, R.; Marchi, E.; Nicolaou, M.A.; Schuller, B.; Zafeiriou, S. Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network. In Proceedings of the 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), Shanghai, China, 20–25 March 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 5200–5204. [Google Scholar]
- Zhang, M.; Yang, Y.; Ji, Y.; Xie, N.; Shen, F. Recurrent attention network using spatial-temporal relations for action recognition. Signal Process. 2018, 145, 137–145. [Google Scholar] [CrossRef]
- Broux, P.A.; Desnous, F.; Larcher, A.; Petitrenaud, S.; Carrive, J.; Meignier, S. S4D: Speaker Diarization Toolkit in Python. In Proceedings of the Interspeech 2018, Hyderabad, India, 2–6 September 2018. [Google Scholar]
- Ai, O.C.; Hariharan, M.; Yaacob, S.; Chee, L.S. Classification of speech dysfluencies with MFCC and LPCC features. Expert Syst. Appl. 2012, 39, 2157–2165. [Google Scholar]
- Jothilakshmi, S.; Ramalingam, V.; Palanivel, S. Unsupervised speaker segmentation with residual phase and MFCC features. Expert Syst. Appl. 2009, 36, 9799–9804. [Google Scholar] [CrossRef]
- Palaz, D.; Magimai, M.; Collobert, R. Analysis of CNN-Based Speech Recognition System Using Raw Speech as Input. In Proceedings of the 16th Annual Conference of International Speech Communication Association (Interspeech), Dresden, Germany, 6–10 September 2015; pp. 11–15. [Google Scholar]
- Hochreiter, S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 1998, 6, 107–116. [Google Scholar] [CrossRef] [Green Version]
- Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610. [Google Scholar] [CrossRef]
- Gehring, J.; Auli, M.; Grangier, D.; Yarats, D.; Dauphin, Y.N. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, Sydney, Australia, 6–11 August 2017; pp. 1243–1252. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; NIPS: Long Beach, CA, USA, 2017; pp. 5998–6008. [Google Scholar]
- Norouzian, A.; Mazoure, B.; Connolly, D.; Willett, D. Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 7310–7314. [Google Scholar]
- Hayes, A.F.; Krippendorff, K. Answering the call for a standard reliability measure for coding data. Commun. Methods Meas. 2007, 1, 77–89. [Google Scholar] [CrossRef]
- Bogdanov, D.; Wack, N.; Gómez Gutiérrez, E.; Gulati, S.; Boyer, H.; Mayor, O.; Roma Trepat, G.; Salamon, J.; Zapata González, J.R.; Serra, X.; et al. Essentia: An audio analysis library for music information retrieval. In Proceedings of the 14th Conference of the International Society for Music Information Retrieval (ISMIR), Curitiba, Brazil, 4–8 November 2013; Britto, A., Gouyon, F., Dixon, S., Eds.; International Society for Music Information Retrieval (ISMIR): Ottawa, ON, Canada, 2013; pp. 493–498. [Google Scholar]
- Deery, S.; Iverson, R.; Walsh, J. Work relationships in telephone call centers: Understanding emotional exhaustion and employee withdrawal. J. Manag. Stud. 2002, 39, 471–496. [Google Scholar] [CrossRef]
Model Type (Filters-Units)

Layer | CNNs | CNNs-Att | CNNs-BiLSTMs | CNNs-BiLSTMs-Att
---|---|---|---|---
Input | 13 | 13 | 13 | 13 |
CNNs-1 | 500 | 500 | 256 | 256 |
Max Pooling-1 | - | - | 256 | 256 |
CNNs-2 | 500 | 500 | 64 | 64 |
Max Pooling-2 | - | - | 64 | 64 |
CNNs-3 | 500 | 500 | - | - |
CNNs-4 | 500 | 500 | - | - |
BiLSTMs-1 | - | - | 128 | 128 |
BiLSTMs-2 | - | - | 128 | 128 |
Attention | - | 500 | - | 128 |
Global Max Pooling | 500 | - | 128 | - |
Dense | 500 | 500 | 64 | 64 |
Output (Classifier) | 2 | 2 | 2 | 2 |
Accuracy % per Model Type

Fold | CNNs-BiLSTMs | CNNs | CNNs-Att | CNNs-BiLSTMs-Att
---|---|---|---|---
1 | 81.97% | 78.4% | 82.5% | 80.2% |
2 | 81.39% | 83.7% | 83.72% | 80.2% |
3 | 84.3% | 83.7% | 83.72% | 84.2% |
4 | 84.8% | 81.9% | 83.72% | 87.2% |
5 | 85.3% | 85.96% | 87.71% | 85.9% |
Average | 83.55% | 82.7% | 84.27% | 83.54%
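The averages in the last row follow directly from the five folds; a short check reproduces the table's figures up to its rounding (the published CNNs average of 82.7% corresponds to 82.73% at two decimals).

```python
# Per-fold accuracy, transcribed from the table
folds = {
    "CNNs-BiLSTMs":     [81.97, 81.39, 84.3, 84.8, 85.3],
    "CNNs":             [78.4, 83.7, 83.7, 81.9, 85.96],
    "CNNs-Att":         [82.5, 83.72, 83.72, 83.72, 87.71],
    "CNNs-BiLSTMs-Att": [80.2, 80.2, 84.2, 87.2, 85.9],
}

# Mean accuracy per model, rounded to two decimals
averages = {model: round(sum(acc) / len(acc), 2) for model, acc in folds.items()}
```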
Accuracy % per Model Type

Classification Method | Type | Accuracy
---|---|---
Naive Bayes | Text | 67.3% |
Logistic Regression | Text | 80.76% |
Linear Support Vector Machine (LSVM) | Text | 82.69% |
CNNs | Speech | 82.7% |
CNNs-BiLSTMs-Attention | Speech | 83.54% |
CNNs-BiLSTMs | Speech | 83.55% |
CNNs-Attention | Speech | 84.27% |
Performance % per Model Type

Classification Method | Processing Time (Hours) | Relative Speed % (fastest = 100%)
---|---|---
LSVM | 31 | 2.4% |
Naive Bayes | 30 | 2.5% |
CNNs-BiLSTMs-Attention | 25.5 | 2.94% |
CNNs-BiLSTMs | 22 | 3.4% |
CNNs-Attention | 2.75 | 27.2% |
CNNs | 0.75 | 100% |
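The performance column is the processing time of the fastest model (CNNs, 0.75 h) expressed as a percentage of each model's own time; the computation below reproduces the table's values up to its rounding (e.g. CNNs-Attention comes out at 27.3% at one decimal versus the published 27.2%).

```python
# Processing time in hours, transcribed from the table
times = {
    "LSVM": 31,
    "Naive Bayes": 30,
    "CNNs-BiLSTMs-Attention": 25.5,
    "CNNs-BiLSTMs": 22,
    "CNNs-Attention": 2.75,
    "CNNs": 0.75,
}

fastest = min(times.values())  # 0.75 h for the plain CNNs model
# Relative speed: fastest time as a share of each model's time
relative = {model: round(100 * fastest / t, 1) for model, t in times.items()}
```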
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ahmed, A.; Toral, S.; Shaalan, K.; Hifny, Y. Agent Productivity Modeling in a Call Center Domain Using Attentive Convolutional Neural Networks. Sensors 2020, 20, 5489. https://doi.org/10.3390/s20195489