Inter-Sentence Segmentation of YouTube Subtitles Using Long-Short Term Memory (LSTM)
Abstract
1. Introduction
2. Materials
2.1. Data
2.2. Preprocessing
3. Methods
3.1. Word Embedding
3.2. RNN (Recurrent Neural Network)
4. Experimental Results
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
| Example Sentence | I told him eat sam john called imp followed the command |
|---|---|
| Case 1 | I told him eat. Sam John called imp followed the command. |
| Case 2 | I told him eat Sam. John called imp followed the command. |
| Raw Data | but if you wanted to parse the web no one use them. and so the cool thing was that… |
|---|---|
| Ex. 1 | [‘.’, ‘and’, ‘so’, ‘the’, ‘cool’, ‘thing’] |
| X_DATA[0] | [‘and’, ‘so’, ‘the’, ‘cool’, ‘thing’] |
| Y_DATA[0] | [1, 0, 0, 0, 0] |

| Raw Data | and so the cool thing was that by doing this as neural network dependency parser we were able to get much better accuracy. we were able… |
|---|---|
| Ex. 2 | [‘get’, ‘much’, ‘better’, ‘accuracy’, ‘.’, ‘we’] |
| X_DATA[1] | [‘get’, ‘much’, ‘better’, ‘accuracy’, ‘we’] |
| Y_DATA[1] | [0, 0, 0, 0, 1] |
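To make the construction of X_DATA/Y_DATA concrete, the sketch below (Python, with a hypothetical helper name not taken from the paper) cuts a window of five words around each period, drops the period from the input, and one-hot encodes the position of the word that follows the boundary. Sliding the window so the period can fall at any offset yields examples like both Ex. 1 and Ex. 2 above.

```python
# Minimal sketch of the windowing step; `make_examples` is a hypothetical
# name, not from the paper.
def make_examples(tokens, window=5):
    """Return (X, Y) pairs from a token list in which '.' is its own token.

    X is `window` words with the period removed; Y is a one-hot vector
    marking the word that directly follows the sentence boundary.
    """
    examples = []
    for p, tok in enumerate(tokens):
        if tok != '.':
            continue
        for b in range(window):                  # offset of '.' in the window
            start, end = p - b, p - b + window + 1
            if start < 0 or end > len(tokens):
                continue
            win = tokens[start:end]              # window+1 tokens, one is '.'
            if win.count('.') != 1:
                continue                         # keep exactly one boundary
            x = [t for t in win if t != '.']
            y = [0] * window
            y[b] = 1                             # boundary precedes x[b]
            examples.append((x, y))
    return examples

tokens = ("but if you wanted to parse the web no one use them . "
          "and so the cool thing was that").split()
x, y = make_examples(tokens)[0]
print(x, y)   # ['and', 'so', 'the', 'cool', 'thing'] [1, 0, 0, 0, 0]
```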
| Data and Hyperparameters | Value |
|---|---|
| All data | 27,826 |
| Training data | 19,478 |
| Test data | 8,348 |
| Embedding dimension | 100 |
| RNN cell layers | 3 |
| Epochs | 2000 |
| Learning rate | 0.115 |
| Cost | 0.181 |
| Accuracy | 70.84% |
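The paper does not publish an implementation, so the following is a minimal Keras sketch consistent with the table above: a 100-dimensional embedding, three stacked LSTM layers, and a 5-way softmax over boundary positions, trained with the listed learning rate. The vocabulary size and the choice of plain SGD are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000     # assumption: vocabulary size is not reported
WINDOW = 5             # words per example, as in X_DATA above
EMBED_DIM = 100        # embedding dimension (table)
LEARNING_RATE = 0.115  # learning rate (table)

model = models.Sequential([
    layers.Input(shape=(WINDOW,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.LSTM(EMBED_DIM, return_sequences=True),   # three stacked LSTM
    layers.LSTM(EMBED_DIM, return_sequences=True),   # layers, matching
    layers.LSTM(EMBED_DIM),                          # "RNN cell layers = 3"
    layers.Dense(WINDOW, activation='softmax'),      # boundary position A-E
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=LEARNING_RATE),
    loss='categorical_crossentropy',                 # one-hot Y_DATA labels
    metrics=['accuracy'],
)
model.summary()
```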
| Actual \ Predicted | A | B | C | D | E |
|---|---|---|---|---|---|
| A | 1066 | 40 | 49 | 88 | 99 |
| B | 29 | 1278 | 43 | 49 | 63 |
| C | 75 | 28 | 1276 | 52 | 45 |
| D | 101 | 79 | 46 | 1206 | 48 |
| E | 129 | 112 | 85 | 89 | 1088 |
| Metric | A | B | C | D | E | Average |
|---|---|---|---|---|---|---|
| Precision | 0.761429 | 0.83149 | 0.851234 | 0.818182 | 0.810127 | 0.814492 |
| Recall | 0.83249 | 0.880165 | 0.864499 | 0.814865 | 0.723886 | 0.81555 |
| F-measure | 0.777535 | 0.855135 | 0.857815 | 0.81652 | 0.764582 | 0.814317 |
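As a sanity check, per-class precision, recall, and F-measure can be recomputed from the confusion matrix (rows = actual, columns = predicted) with the NumPy snippet below. Most of the published figures are reproduced exactly; the few that differ slightly (e.g., the recall for classes A and B) likely reflect rounding or transcription in the extracted matrix.

```python
import numpy as np

# Confusion matrix from the table above: rows = actual, columns = predicted.
cm = np.array([
    [1066,   40,   49,   88,   99],   # A
    [  29, 1278,   43,   49,   63],   # B
    [  75,   28, 1276,   52,   45],   # C
    [ 101,   79,   46, 1206,   48],   # D
    [ 129,  112,   85,   89, 1088],   # E
])

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)       # column sums = predicted counts
recall = tp / cm.sum(axis=1)          # row sums = actual counts
f1 = 2 * precision * recall / (precision + recall)

for name, p, r, f in zip("ABCDE", precision, recall, f1):
    print(f"{name}: P={p:.4f}  R={r:.4f}  F={f:.4f}")
```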
| Example A (correct prediction) | |
|---|---|
| Data | [‘a’ ‘psychiatric’ ‘disease’ ‘the’ ‘notion’] |
| Answer | [0 0 0 1 0] |
| Prediction | [0 0 0 1 0] |
| Predicted period | A psychiatric disease. The notion |

| Example B (wrong prediction) | |
|---|---|
| Data | [‘and’ ‘all’ ‘of’ ‘that’ ‘something’] |
| Answer | [0 0 0 0 1] |
| Correct period | and all of that. Something |
| Prediction | [1 0 0 0 0] |
| Wrongly predicted period | And all of that something |
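Converting a predicted one-hot vector back into text is a mechanical step; the hypothetical helper below (not from the paper) inserts a period before the flagged word and capitalizes it, reproducing both the correct output for Example A and the wrong output for Example B.

```python
def apply_boundary(words, label):
    """Insert a period before the word whose label is 1 and capitalize it."""
    out = []
    for word, is_boundary in zip(words, label):
        if is_boundary:
            if out:
                out[-1] += '.'            # close the previous sentence
            word = word.capitalize()      # start the new one
        out.append(word)
    return ' '.join(out)

print(apply_boundary(['a', 'psychiatric', 'disease', 'the', 'notion'],
                     [0, 0, 0, 1, 0]))   # a psychiatric disease. The notion
print(apply_boundary(['and', 'all', 'of', 'that', 'something'],
                     [1, 0, 0, 0, 0]))   # And all of that something
```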
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).