
Open Access Review

Peer-Review Record

Can Neural Networks Do Arithmetic? A Survey on the Elementary Numerical Skills of State-of-the-Art Deep Learning Models

Appl. Sci. 2024, 14(2), 744; https://doi.org/10.3390/app14020744

by Alberto Testolin

Reviewer 1: Anonymous

Reviewer 2: Zhenguo Zhang

Reviewer 3: Paulo Vasconcelos

Submission received: 5 December 2023 / Revised: 12 January 2024 / Accepted: 12 January 2024 / Published: 15 January 2024

(This article belongs to the Special Issue Revolutionary Innovation in Artificial Intelligence: Modern Application and Its Impact)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

General comments:

This review presents a comprehensive survey of neural network models for numerical reasoning, meticulously examining various architectures and approaches. The analysis of state-of-the-art models provides valuable insights into their performance on mathematical tasks, shedding light on both accomplishments and limitations. The document is well-structured, systematically navigating through different model categories, from ad hoc architectures to generic deep learning structures and large language models. It emphasizes the challenges inherent in numerical reasoning, underscoring the essential requirement for models to possess a foundational understanding of basic arithmetic.

While the document is largely informative, there are areas that could benefit from refinement. The writing style, although technically sound, occasionally tends towards density and could be enhanced with more concise summaries, particularly in highly technical sections. It is crucial to revise the consistent use of the first person singular ("I") to adhere to the conventional practice of employing the first person plural ("we") in scientific writing. Additionally, the document would be enriched by explicit discussions on the ethical implications associated with the discussed models. Given the growing importance of ethical considerations in AI research, incorporating a dedicated section or brief discussions within relevant sections would be beneficial. Lastly, ensuring consistency in terminology is essential. If specific models or techniques are referred to in a particular way, maintain uniform terminology throughout the document.

Detailed Comments:

1. Consistency in Terminology:

· In Section 3.3, both "GPT-3" and "ChatGPT" are mentioned. For consistency, please decide on using either "GPT-3" or "ChatGPT" and maintain the same term throughout the document.

· The term "numerical reasoning" is used in the heading of Section 3 and the term "arithmetic" is used in the title. Ensure the use of a consistent term to describe the topic.

· In Section 3, the terms "ad hoc architectures" and "generic deep learning architectures" are used. Maintain consistent terminology for clarity.

· The document refers to "Large Language Models (LLMs)" and "large-scale models." Please ensure uniformity in the terminology.

2. Ethical Implications:

The document lacks explicit discussions on ethical implications associated with the discussed models. Please include brief discussions within relevant sections. For instance:

· Consider addressing potential biases in the training data used for these models and how they might manifest in the model's outputs. This is particularly relevant when dealing with real-world numerical reasoning problems that may involve diverse and culturally sensitive scenarios.

· Discuss any privacy considerations related to the integration of neuro-symbolic systems, especially when handling numerical data. How do these models ensure data confidentiality, and are there potential risks associated with sensitive numerical information?

· Models that leverage external tools or verifiers, as discussed in some sections, could raise ethical concerns. Discuss how reliance on external tools may impact the model's autonomy, and consider the ethical implications of using calculators or specific software in conjunction with AI models.

3. Use of First Person:

The document consistently uses the first person singular ("I") instead of the more appropriate first person plural ("we"). In scholarly writing, it is customary to employ the collective "we" to denote the author or authors, even if the work is conducted by a single individual. The use of "we" aligns with the collaborative nature of scientific research and contributes to a sense of shared responsibility for the presented work. Please revise instances of "I" to "we" throughout the document to adhere to the conventions of scientific writing.

Author Response

I am very grateful to the Reviewer for the positive feedback and for a set of very helpful comments. I did my best to address all the concerns raised by the Reviewer. In particular:

I improved the consistency in terminology by changing the title of Section 3 to “Neural network models for arithmetic reasoning”, by checking that “ad hoc architectures” is used to identify domain-specific models while “generic architectures” is used to identify domain-general models, and by avoiding the term “large-scale” when referring to models that do not belong to the family of large language models. I did not merge the terms “GPT-3” and “ChatGPT” because in the present discussion they indicate different models: GPT-3 is the original LLM published in 2020, while ChatGPT is the GPT-3.5 version fine-tuned with RLHF, published in 2023.
I added a brief discussion of the possible ethical implications associated with the models discussed in the final section of the survey. In particular, I discussed the potential issues related to cultural and linguistic biases in the training corpora and the complications introduced by the use of external tools, which might indeed impact the model’s autonomy. I preferred not to comment on the privacy issues related to the processing of numerical information, since they generally stem from the fact that numbers might encode sensitive data and are not directly related to the capability of manipulating numbers through arithmetic reasoning.
I switched to the first-person plural form across the entire document in order to make the manuscript aligned with the style of scholarly writing.

Reviewer 2 Report

Comments and Suggestions for Authors

This paper initially reviews the main tasks, data sets and benchmarks that have been proposed to train and test the arithmetic capabilities of deep learning models. The paper then presents the main neural network architectures that have been proposed to solve this kind of problem. Overall, this work is interesting and well-organized, although some improvements are still needed. I believe the current manuscript needs minor revision before publication. Below are some comments:

1. The placement of Figure 1 needs to be reconsidered. It is suggested that it be reworked. In addition, the formatting of the figure notes is problematic.

2. The images in the manuscript are vague and do not convey clear information. It is suggested that they should be replaced with a table format or redrawn.

3. There are few pictures or tables in the manuscript. It is suggested that they be redrawn.

4. There is a problem with the formatting of the references. It should be revised according to the requirements of the journal. In addition, the citation order is problematic. It is suggested that the full text be reviewed and revised.

Author Response

I am very grateful to the Reviewer for the positive feedback and for the useful comments. I did my best to address all the concerns raised by the Reviewer. In particular, I converted all images into tabular format to improve readability, and I moved Figure 1 to a more suitable position. I also improved the formatting of the figure notes and of the references, while making sure that all bibliographic items adhere to the journal requirements and are cited in order of appearance.

Reviewer 3 Report

Comments and Suggestions for Authors

This article intends to be a summary and critical analysis of the ability of AI, in its current state, to solve mathematical problems.

The author’s review comprehensively addresses the primary tasks, datasets, and benchmarks proposed in the literature for training and evaluating the arithmetic abilities of deep learning models. These encompass numerical problems encoded in natural language or basic mathematical formalism.

The review synthesizes the key neural network architectures proposed to tackle these problems. These architectures include specialized modules designed specifically for processing mathematical notation, versatile deep learning systems, and expansive language models.

Furthermore, the review explores the principal strategies employed to introduce numerical semantics into word embeddings. It also showcases some illustrative examples of arithmetic computations performed by cutting-edge AI models, such as GPT-4.

While the article does not introduce new concepts, its value lies in its concise yet thorough review of the current state of AI’s capabilities in solving mathematical problems. The article’s strength is its well-articulated content and comprehensive literature review. As such, it is an exciting and valuable resource in the field.

Author Response

I am very grateful to the Reviewer for the positive feedback and appreciation.
