Can Neural Networks Do Arithmetic? A Survey on the Elementary Numerical Skills of State-of-the-Art Deep Learning Models
Abstract
1. Introduction
2. Elementary Numerical Tasks and Datasets
2.1. Math Word Problems
2.2. Operational Problems Encoded Using Simple Mathematical Notation
2.3. Higher-Level Numeracy and Math Problems
| Example | Field | Content |
|---|---|---|
| A | Problem: | Bruce purchased 7 kg of grapes at the rate of 70 per kg and 9 kg of mangoes at the rate of 55 per kg. How much did he pay to the shopkeeper? |
| | Operation stack: | a = multiply(7, 70); b = multiply(9, 55); add(a, b) |
| | Answer: | 985 |
| B | Problem: | Sam gave 50 dollars to the shopkeeper to buy a book and the shopkeeper returned some money. What could be the price of the book? Choose all plausible alternatives. |
| | Answers: | (a) 55 (b) 45 (c) 37 (d) 59 (e) None |
| C | Problem: | Write the Arabic number corresponding to the following number words: “three hundred and thirty-four”, “one thousand two hundred and seven”, “eighty-nine thousand and one” |
| | Answer: | 334; 1207; 89001 |
| D | Problem: | Divide 1136975704 by −142121963. |
| | Answer: | −8 |
| | Problem: | Let k(u) = u * 2 + u − 4. Find k(0). |
| | Answer: | −4 |
| E | Context: | Before the UNPROFOR fully deployed, the HV clashed with an armed force of the RSK in the village of Nos Kalik, located in a pink zone near Sibenik, and captured the village at 4:45 p.m. on 2 March 1992. The JNA formed a battlegroup to counterattack the next day. |
| | Question: | What date did the JNA form a battlegroup to counterattack after the village of Nos Kalik was captured? |
| | Answer: | 3 March 1992 |
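Example A pairs a math word problem with an executable operation-stack annotation in the MathQA style. As a minimal sketch of how such an annotation can be evaluated automatically (the parsing rules, the supported operations, and the `eval_operation_stack` helper below are illustrative assumptions, not the dataset's official tooling):

```python
# Minimal interpreter for an operation-stack annotation such as
# "a = multiply(7, 70); b = multiply(9, 55); add(a, b)".
# Illustrative sketch only: the operation set and syntax are simplified assumptions.
OPS = {
    "add": lambda x, y: x + y,
    "subtract": lambda x, y: x - y,
    "multiply": lambda x, y: x * y,
    "divide": lambda x, y: x / y,
}

def eval_operation_stack(program: str) -> float:
    """Execute the statements left to right, returning the value of the last one."""
    env, result = {}, None
    for stmt in program.split(";"):
        stmt = stmt.strip()
        if "=" in stmt:
            target, call = (s.strip() for s in stmt.split("=", 1))
        else:
            target, call = None, stmt
        name, arg_str = call.split("(", 1)
        args = [a.strip() for a in arg_str.rstrip(")").split(",")]
        values = [env[a] if a in env else float(a) for a in args]
        result = OPS[name.strip()](*values)
        if target is not None:
            env[target] = result
    return result

print(eval_operation_stack("a = multiply(7, 70); b = multiply(9, 55); add(a, b)"))  # 985.0
```

Running the sketch on the annotation of example A reproduces the expected answer, 985.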
3. Neural Network Models for Arithmetic Reasoning
3.1. Ad Hoc Architectures
3.2. Generic Deep Learning Architectures
3.3. Generic Large Language Models
3.3.1. Promoting Step-by-Step Numerical Reasoning
3.3.2. Injecting Numerical Semantics into Word Embeddings
4. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
References
| Example | Field | Content |
|---|---|---|
| A | Problem: | 29 + 57 |
| | Scratchpad: | 2 9 + 5 7 , C: 0 <br> 2 + 5 , 6 C: 1 # added 9 + 7 = 6 carry 1 <br> 8 6 C: 0 # added 2 + 5 + 1 = 8 carry 0 <br> 0 8 6 |
| | Answer: | 86 |
| B | Problem: | If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot? |
| | Chain-of-thought: | There are originally 3 cars. 2 more cars arrive. 3 + 2 = 5. |
| | Answer: | The answer is 5. |
| | Problem: | There were nine computers in the server room. Five more computers were installed each day, from Monday to Thursday. How many computers are now in the server room? |
| | Chain-of-thought: | There were originally 9 computers. For each of 4 days, 5 more computers were added. So 5 * 4 = 20 computers were added. 9 + 20 is 29. |
| | Answer: | The answer is 29. |
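The scratchpad in example A makes the column-wise carry procedure of multi-digit addition explicit. The sketch below shows one way such an intermediate trace could be generated programmatically; the exact line format used in published scratchpad datasets may differ, and `addition_scratchpad` is a hypothetical helper introduced only for illustration:

```python
def addition_scratchpad(a: int, b: int) -> str:
    """Produce a column-wise addition trace with explicit carries (illustrative format)."""
    da = [int(d) for d in str(a)]
    db = [int(d) for d in str(b)]
    lines = [f"{' '.join(str(a))} + {' '.join(str(b))} , C: 0"]
    carry, result_digits = 0, []
    while da or db:
        x = da.pop() if da else 0          # least-significant remaining digit of a
        y = db.pop() if db else 0          # least-significant remaining digit of b
        total = x + y + carry
        digit, new_carry = total % 10, total // 10
        result_digits.insert(0, digit)
        lines.append(
            f"{' '.join(map(str, result_digits))} C: {new_carry}"
            f"  # added {x} + {y} + {carry} = {digit} carry {new_carry}"
        )
        carry = new_carry
    # Final line: answer digits preceded by the last carry, as in the example above.
    lines.append(" ".join(map(str, [carry] + result_digits)))
    return "\n".join(lines)

print(addition_scratchpad(29, 57))
```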
USER: What is the result of the following expression? 3 * 7 + 9 * 8 + 9 =

ASSISTANT: 106

USER: What is the result of the following expression? Please report also the intermediate calculation steps. 3 * 7 + 9 * 8 + 9 =

ASSISTANT: First, perform the multiplication operations:
3 * 7 = 21
9 * 8 = 72
After that, add the results along with the 9:
21 + 72 + 9 = 102
So, the result of the expression 3 * 7 + 9 * 8 + 9 = is 102.
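In the exchange above, the answer produced in a single step (106) is incorrect, while prompting for intermediate steps yields the correct value, since 3 * 7 + 9 * 8 + 9 = 21 + 72 + 9 = 102. When the ground-truth expression is available, model answers can be checked against an exact evaluator rather than taken at face value; a minimal sketch, assuming a hypothetical `safe_eval` helper based on Python's `ast` module, is shown below:

```python
import ast
import operator

# Exact evaluator for simple arithmetic expressions (+, -, *, /), avoiding eval().
# Illustrative sketch for checking model answers; not part of any model's tooling.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(expression: str) -> float:
    """Parse the expression with ast and evaluate only whitelisted arithmetic nodes."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"Unsupported expression element: {ast.dump(node)}")
    return _eval(ast.parse(expression, mode="eval").body)

print(safe_eval("3 * 7 + 9 * 8 + 9"))  # 102, matching the step-by-step answer above
```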
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).