Automated Essay Scoring: A Siamese Bidirectional LSTM Neural Network Architecture
Abstract
1. Introduction
- For the first time, we introduce samples that represent the rating criteria to enrich the rating information, and we construct a pair consisting of an essay and a sample as the new input. The model can be understood as measuring how similar, or how close, the essay and the sample are. To a certain extent, this resembles semantic similarity [27] and question–answer matching [14]; we introduce the idea to AES.
- We provide a self-feature mechanism at the LSTM output layer. It computes two kinds of similarity: the similarity between sentences within the essay and the similarity between the essay and the sample. Experiments show that this benefits essays that are long and complicated. The idea is inspired by the SKIPFLOW approach [14], which we extend.
- We propose the Siamese Bidirectional Long Short-Term Memory Architecture (SBLSTMA), a Siamese neural network that receives the essay on one side and the sample on the other; a minimal sketch follows this list. Using the ASAP dataset for evaluation, our model empirically outperforms previous neural network AES approaches.
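As a rough illustration of this Siamese design (a minimal sketch under stated assumptions, not the authors' implementation), the PyTorch fragment below runs the essay and the sample through shared embedding, convolution, and bidirectional LSTM layers, then feeds both encodings plus their cosine similarity into the classifier. The embedding size (50), convolution window (5), filter count (20), hidden units (64), and dropout (0.75) follow the hyperparameter table in Section 4.1; the vocabulary size, mean pooling, class count, and the exact way similarity features enter the classifier are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseBLSTM(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=50, hidden=64, n_scores=11):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)       # GloVe-initialised in the paper
        self.conv = nn.Conv1d(emb_dim, 20, kernel_size=5, padding=2)  # window 5, 20 filters
        self.blstm = nn.LSTM(20, hidden, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.75)
        self.fc = nn.Linear(4 * hidden + 1, n_scores)      # both encodings + similarity feature

    def encode(self, tokens):                              # shared ("Siamese") branch
        x = self.emb(tokens).transpose(1, 2)               # (B, emb_dim, T)
        x = torch.relu(self.conv(x)).transpose(1, 2)       # (B, T, 20)
        out, _ = self.blstm(x)                             # (B, T, 2 * hidden)
        return self.dropout(out.mean(dim=1))               # mean pooling is an assumption

    def forward(self, essay, sample):
        e, s = self.encode(essay), self.encode(sample)
        sim = F.cosine_similarity(e, s, dim=1).unsqueeze(1)  # essay-sample similarity
        return self.fc(torch.cat([e, s, sim], dim=1))      # logits; softmax lives in the loss

model = SiameseBLSTM()                                     # n_scores=11: e.g., prompt 1 (2-12)
essay = torch.randint(0, 10000, (2, 300))                  # two toy essays, 300 token ids each
sample = torch.randint(0, 10000, (2, 300))
logits = model(essay, sample)                              # shape (2, n_scores)
```

The point of the shared encoder is that the essay and the rating-criteria sample are compared in a common representation space, which is what makes the similarity feature meaningful.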
2. Related Works
3. Automated Essay Scoring
3.1. Description of Input
3.2. Evaluation Metric of Output
3.3. Model Architecture
3.3.1. Embedding Layer
3.3.2. Convolution Layer
3.3.3. LSTM Layer
3.3.4. Self-Feature Layer
3.3.5. Fully-Connected Layer
3.3.6. Softmax Layer
3.4. Training
4. Experiments
4.1. Setup
4.2. Baseline
4.3. Results and Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Page, E.B. Grading essays by computer: Progress report. In Proceedings of the Invitational Conference on Testing Problems, New York, NY, USA, 29 October 1966; pp. 87–100. [Google Scholar]
- Foltz, P.W.; Laham, D.; Landauer, T.K. Automated essay scoring: Applications to educational technology. Proc. EdMedia 1999, 99, 40–64. [Google Scholar]
- Attali, Y.; Burstein, J. Automated essay scoring with e-rater® v.2.0. ETS Res. Rep. Ser. 2004, 2, 1–21. [Google Scholar]
- Larkey, L.S. Automatic essay grading using text categorization techniques. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, 24–28 August 1998; pp. 90–95. [Google Scholar] [CrossRef]
- Rudner, L.M.; Liang, T. Automated essay scoring using Bayes’ theorem. J. Technol. Learn. Assess. 2002, 1, 3–21. [Google Scholar]
- Phandi, P.; Chai, K.M.A.; Ng, H.T. Flexible domain adaptation for automated essay scoring using correlated linear regression. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 431–439. [Google Scholar]
- Yannakoudakis, H.; Briscoe, T.; Medlock, B. A new dataset and method for automatically grading ESOL texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies—Volume 1, Portland, OR, USA, 19–24 June 2011; pp. 180–189. [Google Scholar]
- Chen, H.; He, B. Automated essay scoring by maximizing human-machine agreement. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1741–1752. [Google Scholar]
- Hinton, G.E. Learning Distributed Representations of Concepts. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society, Amherst, MA, USA, 15–17 August 1986; pp. 1–12. [Google Scholar]
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv, 2013; arXiv:1301.3781. [Google Scholar]
- Alikaniotis, D.; Yannakoudakis, H.; Rei, M. Automatic Text Scoring Using Neural Networks. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; pp. 715–725. [Google Scholar]
- Taghipour, K.; Ng, H.T. A Neural Approach to Automated Essay Scoring. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 1882–1891. [Google Scholar]
- Dong, F.; Zhang, Y.; Yang, J. Attention-based Recurrent Convolutional Neural Network for Automatic Essay Scoring. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, BC, Canada, 3–4 August 2017; pp. 153–162. [Google Scholar]
- Tay, Y.; Phan, M.C.; Tuan, L.A.; Hui, S.C. SKIPFLOW: Incorporating Neural Coherence Features for End-to-End Automatic Text Scoring. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv, 2014; arXiv:1409.0473. [Google Scholar]
- Lee, K.; Han, S.; Han, S.; Myaeng, S. A discourse-aware neural network-based text model for document-level text classification. J. Inf. Sci. 2017, 44, 715–735. [Google Scholar] [CrossRef]
- Santos, C.N.D.; Gatti, M. Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts. In Proceedings of the 25th International Conference on Computational Linguistics, Dublin, Ireland, 23–29 August 2014; pp. 69–78. [Google Scholar]
- Yin, W.; Ebert, S.; Schütze, H. Attention-Based Convolutional Neural Network for Machine Comprehension. In Proceedings of the 2016 NAACL Human-Computer Question Answering Workshop, San Diego, CA, USA, 12–17 June 2016; pp. 15–21. [Google Scholar]
- Zhang, Y.; Wallace, B. A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. arXiv, 2015; arXiv:1510.03820. [Google Scholar]
- Lipton, Z.C.; Berkowitz, J.; Elkan, C. A critical review of recurrent neural networks for sequence learning. arXiv, 2015; arXiv:1506.00019. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y.D.; Muhammad, K.; Tang, C. Twelve-layer deep convolutional neural network with stochastic pooling for tea category classification on GPU platform. Multimed. Tools Appl. 2018, 77, 22821. [Google Scholar] [CrossRef]
- Wang, S.H.; Lv, Y.D.; Sui, Y.; Liu, S.; Wang, S.J.; Zhang, Y.D. Alcoholism Detection by Data Augmentation and Convolutional Neural Network with Stochastic Pooling. J. Med. Syst. 2018, 42, 2. [Google Scholar] [CrossRef] [PubMed]
- Dong, F.; Zhang, Y. Automatic Features for Essay Scoring—An Empirical Study. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 1072–1077. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv, 2017; arXiv:1706.03762. [Google Scholar]
- Dehghani, M.; Gouws, S.; Vinyals, O.; Uszkoreit, J.; Kaiser, Ł. Universal Transformers. arXiv, 2018; arXiv:1807.03819. [Google Scholar]
- Mueller, J.; Thyagarajan, A. Siamese Recurrent Architectures for Learning Sentence Similarity. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
- Landauer, T.K.; Foltz, P.W.; Laham, D. Introduction to Latent Semantic Analysis. Discourse Process. 1998, 25, 259–284. [Google Scholar] [CrossRef]
- Tandalla, L. Scoring Short Answer Essays. ASAP Short Answer Scoring Competition—Luis Tandalla’s Approach. 2012. Available online: https://kaggle2.blob.core.windows.net/competitions/kaggle/2959/media/TechnicalMethodsPaper.pdf (accessed on 14 November 2018).
- Mehmood, A.; On, B.-W.; Lee, I.; Choi, G.S. Prognosis essay scoring and article relevancy using multi text features and machine learning. Symmetry 2017, 9, 11. [Google Scholar] [CrossRef]
- Drolia, S.; Rupani, S.; Agarwal, P.; Singh, A. Automated Essay Rater using Natural Language Processing. Int. J. Comput. Appl. 2017, 163, 44–46. [Google Scholar] [CrossRef]
- McNamara, D.S.; Crossley, S.A.; Roscoe, R.D.; Allen, L.K.; Dai, J. A hierarchical classification approach to automated essay scoring. Assess. Writ. 2015, 23, 35–59. [Google Scholar] [CrossRef]
- Fauzi, M.A.; Utomo, D.C.; Setiawan, B.D. Automatic Essay Scoring System Using N-Gram and Cosine Similarity for Gamification Based E-Learning. In Proceedings of the International Conference on Advances in Image Processing, Bangkok, Thailand, 25–27 August 2017; pp. 151–155. [Google Scholar]
- Zupanc, K.; Bosnić, Z. Automated essay evaluation with semantic analysis. Knowl.-Based Syst. 2017, 120, 118–132. [Google Scholar] [CrossRef]
- Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical Attention Networks for Document Classification. In Proceedings of the NAACL-HLT 2016, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489. [Google Scholar]
- Kumar, S.; Chakrabarti, S.; Roy, S. Earth Mover’s Distance Pooling over Siamese LSTMs for Automatic Short Answer Grading. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, Australia, 19–25 August 2017. [Google Scholar]
- Graves, A.; Fernández, S.; Schmidhuber, J. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition. ICANN 2005, 3697, 799–804. [Google Scholar]
- Schuster, M.; Paliwal, K.K. Bidirectional Recurrent Neural Networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef]
- Duchi, J.; Hazan, E.; Singer, Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159. [Google Scholar]
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
Layer | Parameter Name | Parameter Value
---|---|---
Embedding Layer | Pretrained embedding | GloVe 50-dimensional [40]
Convolution Layer | Window size | 5
Convolution Layer | Filters | 20
LSTM Layer | Layers | 1
LSTM Layer | Hidden units | 64
LSTM Layer | Dropout | 0.75
Self-feature Layer | Attention length | 50
Training | Epochs | 100–300
Training | Batch size | 100–200
Training | Learning rate | 0.01
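The table's learning rate (0.01), together with the citation of Duchi et al.'s AdaGrad above, suggests a training step roughly like the sketch below. It reuses the SiameseBLSTM from the earlier sketch; treating each possible score as a softmax class under a cross-entropy loss is our assumption, based on the softmax output layer.

```python
# Hedged training-step sketch reusing `model`, `essay`, `sample` defined earlier.
# AdaGrad with lr=0.01 follows the hyperparameter table and the Duchi et al.
# citation; the paper's exact optimiser settings may differ.
import torch

optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()        # matches the softmax output layer

def train_step(essay_batch, sample_batch, labels):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(essay_batch, sample_batch), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# e.g., one step on the toy batch above with random gold score classes:
labels = torch.randint(0, 11, (2,))
print(train_step(essay, sample, labels))
```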
Prompt | Number of Essays | Average Length (words) | Score Range
---|---|---|---
1 | 1788 | 350 | 2–12
2 | 1800 | 350 | 1–6
3 | 1726 | 150 | 0–3
4 | 1772 | 150 | 0–3
5 | 1805 | 150 | 0–4
6 | 1800 | 150 | 0–4
7 | 1569 | 250 | 0–30
8 | 723 | 650 | 0–60
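Because each prompt uses a different score range while the model ends in a softmax layer, raw scores presumably have to be mapped to per-prompt class indices. A hedged helper, with the ranges taken verbatim from the table and the mapping itself being our assumption:

```python
# Score ranges per ASAP prompt, copied from the table above.
SCORE_RANGES = {1: (2, 12), 2: (1, 6), 3: (0, 3), 4: (0, 3),
                5: (0, 4), 6: (0, 4), 7: (0, 30), 8: (0, 60)}

def score_to_class(prompt_id: int, score: int) -> int:
    """Map a raw score to a zero-based class index for the softmax layer."""
    lo, hi = SCORE_RANGES[prompt_id]
    if not lo <= score <= hi:
        raise ValueError(f"score {score} outside range for prompt {prompt_id}")
    return score - lo                      # class index in [0, hi - lo]

def class_to_score(prompt_id: int, cls: int) -> int:
    """Invert the mapping when reading a predicted class back out."""
    lo, _ = SCORE_RANGES[prompt_id]
    return cls + lo

# e.g., prompt 1 scores 2-12 become classes 0-10 (11 softmax outputs):
assert score_to_class(1, 2) == 0 and score_to_class(1, 12) == 10
```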
Model | Prompt 1 | Prompt 2 | Prompt 3 | Prompt 4 | Prompt 5 | Prompt 6 | Prompt 7 | Prompt 8 | Average
---|---|---|---|---|---|---|---|---|---
LSTM-CNN-att | 0.822 | 0.682 | 0.672 | 0.814 | 0.803 | 0.811 | 0.801 | 0.705 | 0.764
SKIPFLOW | 0.832 | 0.684 | 0.695 | 0.788 | 0.815 | 0.810 | 0.800 | 0.697 | 0.764
SBLSTMA | 0.861 | 0.731 | 0.780 | 0.818 | 0.842 | 0.820 | 0.810 | 0.746 | 0.801
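The values in this and the following tables are agreement scores between model predictions and human raters; on ASAP the standard metric, and presumably the one reported here, is quadratic weighted kappa (QWK). A minimal numpy implementation for integer scores:

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """QWK between two integer ratings in [min_score, max_score]."""
    n = max_score - min_score + 1
    a = np.asarray(rater_a) - min_score
    b = np.asarray(rater_b) - min_score
    O = np.zeros((n, n))                       # observed rating matrix
    for i, j in zip(a, b):
        O[i, j] += 1
    E = np.outer(np.bincount(a, minlength=n),  # expected matrix from marginals
                 np.bincount(b, minlength=n)).astype(float)
    E *= O.sum() / E.sum()
    # quadratic disagreement weights: 0 on the diagonal, growing with distance
    W = np.fromfunction(lambda i, j: ((i - j) ** 2) / (n - 1) ** 2, (n, n))
    return 1.0 - (W * O).sum() / (W * E).sum()

# Perfect agreement yields 1.0:
print(quadratic_weighted_kappa([2, 5, 8], [2, 5, 8], 0, 12))  # -> 1.0
```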
Module Combination | Prompt 1 | Prompt 2 | Prompt 3 | Prompt 4 | Prompt 5 | Prompt 6 | Prompt 7 | Prompt 8 | Average
---|---|---|---|---|---|---|---|---|---
Ma + Mc | 0.521 | 0.486 | 0.546 | 0.685 | 0.800 | 0.704 | 0.469 | 0.425 | 0.560
Mb + Mc | 0.722 | 0.670 | 0.724 | 0.797 | 0.817 | 0.816 | 0.795 | 0.658 | 0.757
Ma + Mb + Mc (full SBLSTMA) | 0.861 | 0.731 | 0.780 | 0.818 | 0.842 | 0.820 | 0.810 | 0.746 | 0.801
Table: sample set for each of the eight prompts (the sample texts themselves are not reproduced here).
Statistic | Module Combination | Prompt 1 | Prompt 2 | Prompt 3 | Prompt 4 | Prompt 5 | Prompt 6 | Prompt 7 | Prompt 8 | Average
---|---|---|---|---|---|---|---|---|---|---
Mean | Ma + Mc | 0.366 | 0.367 | 0.477 | 0.606 | 0.759 | 0.613 | 0.240 | 0.260 | 0.461
Mean | Mb + Mc | 0.614 | 0.493 | 0.542 | 0.711 | 0.694 | 0.691 | 0.260 | 0.313 | 0.540
Mean | Ma + Mb + Mc | 0.751 | 0.621 | 0.681 | 0.754 | 0.739 | 0.727 | 0.576 | 0.373 | 0.653
Std. Deviation | Ma + Mc | 0.052 | 0.069 | 0.058 | 0.083 | 0.048 | 0.103 | 0.134 | 0.066 | 0.077
Std. Deviation | Mb + Mc | 0.139 | 0.148 | 0.111 | 0.119 | 0.192 | 0.209 | 0.218 | 0.090 | 0.153
Std. Deviation | Ma + Mb + Mc | 0.037 | 0.094 | 0.103 | 0.033 | 0.096 | 0.055 | 0.137 | 0.172 | 0.091