
Line Chart Understanding with Convolutional Neural Network

Chanyoung Sohn, Heejong Choi, Kangil Kim, Jinwook Park and Junhyug Noh

1 Electrical Engineering and Computer Science Department & Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju 61005, Korea
2 Lawrence Livermore National Laboratory, Livermore, CA 94550, USA
* Author to whom correspondence should be addressed.
Electronics 2021, 10(6), 749; https://doi.org/10.3390/electronics10060749
Submission received: 23 February 2021 / Revised: 12 March 2021 / Accepted: 18 March 2021 / Published: 22 March 2021
(This article belongs to the Special Issue Evolutionary Machine Learning for Nature-Inspired Problem Solving)

Abstract: Visual understanding of the implied knowledge in line charts is an important task affecting many downstream tasks in information retrieval. Despite their common use, clearly defining this knowledge is difficult because of ambiguity, so most methods in the literature learn it implicitly. When building a deep neural network, the integrated approach hides the properties of the individual subtasks, which can hinder finding optimal configurations for the understanding task in academia. In this paper, we propose a problem definition for explicitly understanding knowledge in a line chart and provide an algorithm for generating supervised data that are easy to share and scale up. To introduce the properties of the definition and data, we set up well-known and modified convolutional neural networks and evaluate their performance on real and synthetic datasets for qualitative and quantitative analyses. In the results, the knowledge is explicitly extracted, and the generated synthetic data show patterns similar to human-labeled data. This work is expected to provide a separate and scalable environment to enhance research into technical document understanding.

1. Introduction

Understanding the propositions in chart images is a basic task in understanding technical documentation. For this task, a variety of problem settings and machine learning solutions have been proposed [1,2,3,4,5]. Because of the ambiguity in defining a standard for the knowledge to extract from a chart, in most studies the task is solved indirectly as part of a larger integrated task such as image caption generation.
This end-to-end style of problem solving can hinder research in academia on finding optimally configured deep neural networks for chart understanding. Many deep networks succeed by solving sequential tasks at once, as in neural machine translation [6], compared with the conventional approach of dividing and conquering the integrated tasks [7,8]. This is not unique to that area: deep neural networks achieved high-accuracy image classification by mitigating the drawbacks of decomposing feature extraction and abstraction [9]. Because of the impact of the end-to-end style of problem solving, many deep network researchers configure a whole architecture first and analyze its macroscopic behavior. However, if we do not sufficiently understand the properties of the separate tasks, finding the optimal generalization, model capacity, connections, and required input features for each layer is delayed, because all the settings must be searched from scratch. The optimal settings for each task can also be hidden by the effects of merging all the integrated tasks in the search.
To address this problem, in this paper we propose a problem definition for the explicit analysis of a chart image, provide an algorithm to generate supervised data, and share them (https://github.com/cy-sohn/LCUdataset_generator (accessed on 9 March 2021)). To the best of our knowledge, a problem definition and shared data for understanding the statements implied in a line chart have rarely been proposed to help with microscopic architecture design. We focus on understanding the knowledge in line chart images from a visual perspective rather than from text-mixed information, which we call line chart understanding (LCU) in this paper. Under the proposed definition, we test well-known and simply tuned convolutional neural networks for image analysis [10]. They are configured for multitask learning [11,12] with various classification and regression subtasks to determine propositions and their numerical arguments. The contributions of this work are summarized as follows:
  • proposing a definition of knowledge implied in a line chart;
  • providing an algorithm to automatically generate input chart images with their labels;
  • analyzing the properties of the task and data by applying well-known neural networks to synthetic and real datasets.
We note that the main contribution is defining LCU and providing synthetic data with a generation algorithm validated against human-labeled real data. The neural network configurations are only examples used to give readers an easy-to-obtain performance baseline and intuition about this task.
In Section 2, we review state-of-the-art work related to chart understanding, and in Section 3, we introduce the problem definition specifying the target chart images and the knowledge template. Section 4 describes the algorithm used to generate synthetic data. Section 5 and Section 6 present the experimental setups and their results on the synthetic and human-labeled real data. In Section 7 and Section 8, we conclude and discuss future challenges.

2. Related Works

Deep-learning-based chart understanding has been proposed [13,14,15,16], but these works focused on estimating the positions of chart objects rather than understanding the knowledge implied in a chart. References [1,2,3,4,17] introduced methods to extract data from a chart or to convert the data to other forms. They correlated the recognized results with the text and graphic information shown in technical chart images rather than extracting implied statements as LCU does. Reference [18] introduced an object detection network for the evaluation of scientific plots. That work aimed to build a model that understands a horizontal bar graph by estimating its numerical attributes, whereas LCU targets line charts and extracts implied propositions instead of estimating numerical values. ChartSense [4] uses deep-learning-based classifiers to determine the chart type of a given chart image and extracts simple information from it, which is an integrated task implicitly using part of the knowledge in a chart, even though there is no explicit knowledge-understanding module. FigureSeer [5] recognizes texts using character recognition modules and parses them together with the plots for redesigning various charts and applying them to question answering; its main focus was estimating a regression form rather than capturing knowledge in a logic form. Reference [3] proposed a similar method for understanding and redesigning a chart, but its targets are bar and pie charts, and, unlike LCU, it omits a function to predict the intents implied in a chart. Chart image generation [19,20,21,22] may also involve the chart understanding problem. Reference [19] proposed a method to generate line, bar, and pie chart images, but it is only partially automatic, so the scalability of the data is limited for training data-driven models. PlotQA [20], FigureQA [21], and DVQA [22] provide data for question answering. PlotQA [20] provides data using three types of plot images: horizontal bar graphs, line plots, and dot-line graphs. Text appearing in the chart images consists of words from the document texts. Labels, grids, font sizes, tick labels, line styles, line colors, and legend locations are used as chart attributes. In our work, we set wider ranges for those attributes and used more data samples to express detailed local implications of a line; slopes, positions, and the ranges of lines are also more expressive in LCU. LCU uses both human-labeled and synthetic data for evaluation to confirm the suitability of the synthetic data as a test bed for real-world understanding. FigureQA [21] provides visual inference data consisting of more than a million question-answer pairs. It covers five plot types (line, dot-line, vertical and horizontal bar, and pie charts) and learns logic such as maximum, minimum, and smoothness. Similar to PlotQA, these data have limited forms and attributes for line plots: attributes such as title, label, tick, and axis label are fixed, and the shape and legend of the line are expressed differently for each plot. FigureQA fixes six questions about the line plot that can be answered with yes or no, whereas LCU has a wider variety of logic templates. DVQA [22] provides data for understanding bar charts; these data are not only applied to QA but also used for extracting numerical and semantic information. The chart type targeted by that work differs from that of LCU.

3. Problem Definition for Line Chart Understanding

The goal of the LCU problem is to determine the propositions implied in a line chart image. Thus, an input image is given and we need to predict the most accurate labels representing the propositions and estimate their numerical arguments. In this section, we describe the targeted image conditions and propositions that compose a knowledge template.

3.1. Input: A Line Chart Image

A line chart has many diverse attributes [23]. To cover a wide range of graphic perceptions that humans understand [23,24,25], we set a variety of attributes as shown in Table 1. To obtain unbiased and diverse lines, we set the range of attributes as large as possible in a uniform distribution when generating a value for each attribute (the library used for generating lines: https://matplotlib.org (accessed on 1 January 2021)).
In this problem, we focus on a single chart composed of at most two lines, because this is the first step to solve before we consider more complex charts. The target chart image follows these rules:
  • An image has a line chart.
  • A chart has at most two lines.
  • All lines are continuous and have different colors.
This input setting is used to evaluate the basic functionality of understanding knowledge. It can be easily integrated with other practical downstream tasks in a multitask learning or fine-tuning manner. In addition to the rules, the target image uses a standardized chart frame as follows:
  • The origin point is located at the left bottom.
  • The range of each axis is [0,1] (a standardized range).
These conditions ensure that any statement assigned to such an image is based on visual properties alone. For example, if a model predicts an optimum in this graph, it generates an X-coordinate in [0,1]. The selected point is then linearly transformed to the range determined by the attached numerical tick labels without any additional process. This setting has the advantage of isolating the knowledge determined purely by visual properties from the effect of predictions that combine tick labels with the image.
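As a small illustration of this standardization, the following is a hedged sketch of the linear transformation from a normalized prediction to the range read from the tick labels; the helper name and example values are ours, not part of the released data:

```python
def to_data_coordinates(x_norm, tick_min, tick_max):
    """Map a normalized prediction in [0, 1] to the numeric range implied by
    the chart's tick labels (hypothetical helper, not part of the dataset)."""
    return tick_min + x_norm * (tick_max - tick_min)

# A predicted optimum at x_norm = 0.25 on a chart whose X ticks span 10.0-50.0
# corresponds to x = 20.0 in the chart's data coordinates.
print(to_data_coordinates(0.25, 10.0, 50.0))  # 20.0
```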

3.2. Output: A Knowledge Template

The knowledge template proposed in this paper is the set of propositions determined by classification and regression subtasks. It can also be interpreted as a set of discrete labels and their related numerical arguments. The structure, labels, and label ranges of all subtasks are shown in Figure 1. Depending on the objects contained in an image, the logics representing knowledge are categorized into chart, line, and partition groups. In the chart group, the superiority subtask determines which line is superior to the other overall; if the lines have a cross point, superiority takes the None label. The line group has three kinds of subtasks: the number of partitions is used to recognize the number of segments in a line, where we allow one to three contiguous partitions so that they can imply different logics, and the line segment in each partition can have an independent growth type label. Monotonicity is used to distinguish whether the slope of the line is positive or negative from the starting to the ending point of a line; if no clear monotonicity is observed, the None label is assigned. The minimum and maximum subtasks detect the real-valued XY-coordinates of the minimum and maximum of a line, respectively. In the partition group, the range subtask estimates the X-coordinates used as the partition boundaries, and growth type determines the growth type of the line segment in each partition. Examples of input images for extracting the knowledge template are shown in Figure 2.
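To make the template concrete, the following is a minimal sketch of one possible encoding in Python; the class and field names are illustrative assumptions and are not the label format shipped with the dataset:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Partition:
    x_range: Tuple[float, float]   # regression: partition boundaries on the X-axis
    growth_type: str               # classification: linear / logarithmic / exponential

@dataclass
class LineKnowledge:
    num_partitions: int            # classification: 1, 2, or 3
    monotonicity: str              # classification: increasing / decreasing / none
    minimum: Tuple[float, float]   # regression: XY-coordinates of the minimum
    maximum: Tuple[float, float]   # regression: XY-coordinates of the maximum
    partitions: List[Partition] = field(default_factory=list)

@dataclass
class ChartKnowledge:
    superiority: Optional[str]     # classification: line1 / line2 / none
    lines: List[LineKnowledge] = field(default_factory=list)
```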

4. Data Generation

4.1. Algorithm to Generate Labeled Data

After generating the attributes for a chart image, lines are automatically generated for the selected labels of the subtasks. The whole process of generating lines and labels is shown in Figure 3 and Algorithm 1. In the overall steps, we select logics and their numerical arguments first, and randomly select data points to satisfy the selected labels.
In the first step, the algorithm randomly generates two points used as the starting and ending points of a line; the points lie in the range of 0 to 1. To determine the number of logics for the line between the two points, the algorithm selects the number of partitions from {1,2,3} and then builds partitions by randomly generating intermediate boundary points. Then, the growth type for each partition is randomly selected from the label set {linear, logarithmic, exponential}. After selecting a growth type for each partition, the form of the line for the selected label is determined as
linear label: y = m·x + b
logarithmic label: y = k·log(x − a) + b
exponential label: y = k·e^(a·x) + b
where x and y are the coordinates of a point, and k, a, and b are the parameters tuned so that the drawn line passes through all generated samples. The value of k is a randomly selected number in [1, 5]. For linear lines, m and b are approximated from the generated data points using the Python library. Data points are sampled at regular intervals on the X-axis. In the algorithm, the range of θ is [0.3, 2.9]. For the logarithmic and exponential functions, b and k are approximated to pass through the initial points. The parameter a is initially fixed in [2, 20] for the exponential function and in [0.85 × X_start, 0.99 × X_start] for the logarithmic function, where X_start is the X-coordinate of the leftmost initial point. The boundary conditions keep the lines in the first quadrant. The number of data points positioned in a partition is in the range of 10 to 50. A minimal code sketch following these steps is given after Algorithm 1.
Algorithm 1 Generation of synthetic supervised data.
  • Randomly select the slope of line θ
  • Randomly select starting and ending points of a line with θ
  • Randomly select a label for the number of partitions
  • Randomly select the boundary X-coordinate of partitions
  • for all partitions p do
  •     Randomly select a label for growth type
  •     Randomly select a line shape in the type
  •     Generate data points
  •     Draw a line in the range of p
  • end for
  • Determine labels for line-level subtasks
  • Determine labels for chart-level subtasks
  • Return (a chart, a set of labels) pairs
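The following is a minimal sketch of the generation loop described above, assuming NumPy and matplotlib; the sampling ranges are taken from the text, but the function names and details are illustrative and this is not the released generator at the GitHub link:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()

def generate_partition(x_lo, x_hi, y_lo, y_hi, growth):
    """Sample 10-50 points on [x_lo, x_hi] following the chosen growth type,
    then rescale Y so that the segment connects (x_lo, y_lo) to (x_hi, y_hi)."""
    n = rng.integers(10, 51)
    x = np.linspace(x_lo, x_hi, n)
    if growth == "linear":
        y = x.copy()
    elif growth == "logarithmic":
        a = x_lo * rng.uniform(0.85, 0.99)   # keep x - a > 0 over the partition
        y = np.log(x - a)
    else:                                    # exponential
        a = rng.uniform(2, 20)
        y = np.exp(a * x)
    # force the segment through its randomly chosen endpoints
    y = y_lo + (y - y[0]) * (y_hi - y_lo) / (y[-1] - y[0])
    return x, y

# One line with a random number of partitions (1-3), following Algorithm 1.
start, end = np.sort(rng.uniform(0.05, 0.95, size=2))
inner = np.sort(rng.uniform(start, end, size=rng.integers(0, 3)))
xs = np.concatenate(([start], inner, [end]))
ys = rng.uniform(0.05, 0.95, size=len(xs))
fig, ax = plt.subplots(figsize=(6.4, 4.8), dpi=100)
for i in range(len(xs) - 1):
    growth = rng.choice(["linear", "logarithmic", "exponential"])
    px, py = generate_partition(xs[i], xs[i + 1], ys[i], ys[i + 1], growth)
    ax.plot(px, py)
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
fig.savefig("synthetic_line_chart.png")
```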

4.2. Detailed Settings for Label Generation

Categories and the output ranges for each subtask are shown in Figure 1; they use the following specific configurations. For the number of partitions, we assign a number of partitions to each line independently; therefore, the partition boundaries of the lines are also independent. Growth type is independently assigned to each partition of each line. Superiority determines whether the first line is greater than the second line over the whole area; if a chart has only one line, this task is ignored in training. Label 1 means that the first line is greater over the whole area, 2 means the opposite case, and 0 means that the relation is too ambiguous. If the first line is greater than the second line, the minimum value of the first line is greater than or equal to the maximum value of the second line. Monotonicity determines a consistently increasing or decreasing state of a line across all its partitions; we set label 1 for monotonic increasing, 2 for decreasing, and 0 for the inconsistent case, and we assign the labels by checking the sign of the slope of the generated lines. Minimum and maximum are regression subtasks that predict the two points whose Y values are the minimum and maximum over all X-coordinates of a line, respectively. Range is the subtask used to predict the meaningful partition boundaries composed of X-coordinates; in this subtask, the starting point S and ending point E on the X-axis are predicted. The total number of output variables to predict and their types are shown in Table 2. Superiority, monotonicity, growth type, and number of partitions are classification tasks, and the others are regression tasks.
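As an illustration of these labeling rules, a minimal sketch of how the superiority and monotonicity labels could be derived from generated data points is given below; the helper names are ours and this is not the released labeling code:

```python
import numpy as np

def superiority_label(y_line1, y_line2):
    """1 if line 1 lies above line 2 over the whole area, 2 for the opposite, 0 if ambiguous."""
    if y_line1.min() >= y_line2.max():
        return 1
    if y_line2.min() >= y_line1.max():
        return 2
    return 0

def monotonicity_label(y):
    """1 for monotonically increasing, 2 for decreasing, 0 for the inconsistent case."""
    diffs = np.diff(y)
    if np.all(diffs >= 0):
        return 1
    if np.all(diffs <= 0):
        return 2
    return 0
```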
Table 3 shows the distribution of labels in the generated 75,000 samples.
Figure 4a,c shows the distributions of minimum and maximum points and mean X-coordinates of partitions. To visualize the distribution, 1000 images were sampled for each number of partitions, and the mean X-coordinates for the starting and ending points were plotted.

4.3. Detailed Settings for Input Image Generation

The default resolution of a chart image is 100 dpi at a figure size of 640 × 480. The background color of the chart area is randomly selected except for black. The grid lines and the chart frame containing the axes are turned on or off. The direction of the lines is vertical, horizontal, or both. Text elements appearing on a chart can contain up to 10 uppercase or lowercase characters. This condition for text generation is equally applied to the chart title, X-axis label, Y-axis label, and line labels. The number of ticks in the chart is between 3 and 12 and represented with two decimal places.
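The following is a hedged matplotlib sketch of how such a randomized chart frame could be produced; the color lists and helper names are illustrative assumptions based on Table 1, not the released generator:

```python
import random
import string
import numpy as np
import matplotlib.pyplot as plt

def random_text(max_len=10):
    """Random title/label text of up to max_len upper- or lowercase characters."""
    return "".join(random.choices(string.ascii_letters, k=random.randint(1, max_len)))

fig, ax = plt.subplots(figsize=(6.4, 4.8), dpi=100)            # 640 x 480 pixels at 100 dpi
ax.set_facecolor(random.choice(["white", "lightgray", "ivory", "lavender", "mintcream"]))
ax.set_title(random_text(), fontsize=random.randint(5, 10))
ax.set_xlabel(random_text(), fontsize=random.randint(5, 8))
ax.set_ylabel(random_text(), fontsize=random.randint(5, 8))
ticks = np.linspace(0.0, 1.0, random.randint(3, 12))           # 3-12 ticks, two decimal places
ax.set_xticks(ticks)
ax.set_xticklabels([f"{t:.2f}" for t in ticks], fontsize=random.randint(4, 7))
ax.set_yticks(ticks)
ax.set_yticklabels([f"{t:.2f}" for t in ticks], fontsize=random.randint(4, 7))
if random.random() < 0.5:                                      # grid lines on or off
    ax.grid(axis=random.choice(["x", "y", "both"]))
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
fig.savefig("chart_frame.png")
```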

5. Experiments

The goal of the following experiments was to show the easy-to-obtain performance of well-known neural networks and their difference between human-labeled and synthetic test data. We note that proposing a novel and extensively optimized architecture was beyond the scope of this study.

5.1. Model Configuration

To evaluate the easily obtainable performance on this problem, we tested ResNet-50, Wide-ResNet-50-2, and a Chart-Understanding Spatial Transformer Network (CU-STN), as illustrated in Figure 5. ResNet-50 [26] and Wide-ResNet-50-2 [27] were modified to retain spatial information: the average pooling layer was replaced by a convolution layer (channels = 128, kernel = 3, and stride = 2), and the fully connected layer was resized to fit the output size. CU-STN is a configuration that applies a spatial transformer network to a ResNet backbone resized for LCU; it is intended to be more robust to the flexible positions of the lines on a chart. The numbers of parameters for ResNet-50, Wide-ResNet-50-2, and CU-STN are 26, 69, and 9 million, respectively.
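A minimal PyTorch sketch of the ResNet-50 modification described above is shown below, assuming a recent torchvision; the head layout and output size are illustrative and not the authors' exact implementation:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class LCUResNet(nn.Module):
    """ResNet-50 backbone whose global average pooling is replaced by a strided
    convolution (channels=128, kernel=3, stride=2) to keep spatial information,
    followed by a fully connected head resized to the subtask outputs."""

    def __init__(self, out_dim):
        super().__init__()
        backbone = resnet50(weights=None)                 # trained from scratch
        backbone.avgpool = nn.Conv2d(2048, 128, kernel_size=3, stride=2)
        backbone.fc = nn.Identity()                       # flattening happens inside the backbone
        self.backbone = backbone
        self.head = nn.LazyLinear(out_dim)                # one flat vector later sliced per subtask

    def forward(self, x):
        return self.head(self.backbone(x))

# Example: a batch of two 640x480 chart images; out_dim is a placeholder, since the
# real model needs class logits plus regression outputs for every subtask in Table 2.
model = LCUResNet(out_dim=128)
outputs = model(torch.randn(2, 3, 480, 640))
```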

5.2. Training Setting

The training loss is the sum of loss functions for 17 classification and 32 regression subtasks. We used cross-entropy for classification and average mean squared error for regression. The problem types for each subtask are shown in Table 2. The total loss function L t o t a l is defined as follows:
L_total = Σ_{i ∈ S} L(i) · P(i) · L_i,
L(i) = 1 if a line for subtask i exists, and 0 otherwise,
P(i) = 1 if a partition for subtask i exists, and 0 otherwise,
where S is the set of all subtasks and L_i is the loss of the ith subtask. Because the set of active subtasks depends on the number of lines and the selected number of partitions, we used the indicator functions L and P to determine which subtasks to include in the total loss. For monotonicity and superiority, ambiguity is very high and the label proportions are not uniform, as shown in Table 3. To remove this bias in training, we multiplied their cross-entropy loss functions by the balancing parameters shown in Table 4; each balancing parameter was set proportional to the inverse of the corresponding label proportion. To investigate the behavior with respect to the generated data size, we prepared four training data sets composed of 1000, 5000, 10,000, and 50,000 sample images. The detailed hyperparameter settings for training are listed in Table 4.
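A minimal PyTorch-style sketch of this masked multitask loss is given below; the dictionary-based interface is our illustrative assumption, not the authors' training code:

```python
import torch
import torch.nn.functional as F

def total_loss(outputs, targets, line_mask, partition_mask, class_weights=None):
    """Sum of per-subtask losses; the indicator masks L(i) and P(i) switch off
    subtasks whose line or partition does not exist in the chart.

    outputs / targets: dicts keyed by subtask name; *_mask: dicts of 0/1 floats;
    class_weights: optional per-subtask label-balancing weights (Table 4)."""
    loss = 0.0
    for name, pred in outputs.items():
        mask = line_mask.get(name, 1.0) * partition_mask.get(name, 1.0)
        if targets[name].dtype == torch.long:        # classification subtask
            weight = class_weights.get(name) if class_weights else None
            task_loss = F.cross_entropy(pred, targets[name], weight=weight)
        else:                                        # regression subtask
            task_loss = F.mse_loss(pred, targets[name])
        loss = loss + mask * task_loss
    return loss
```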

5.3. Evaluation Setting

To evaluate performance, we prepared three test data sets composed of 500 synthetic images, 5000 synthetic images, and 500 human-labeled real images. The best validation model observed in training was used for test evaluation.

6. Result and Discussion

6.1. Quantitative Analysis

The accuracy and error results on 5000 synthetic test images are shown in Table 5. The growth type results are split into the three cases of the number of partitions, and the best results are displayed in bold text. Growth type per partition is more complex than the other tasks; this result may have been caused by the high ambiguity of the growth type of short line segments. The decrease in accuracy with more partitions is an expected pattern, because the accuracy in each case is the percentage of images for which the correct labels were obtained for all the involved partitions. Superiority is the simplest task. The estimation of partition boundaries showed significant errors, and minimum and maximum estimation is more complex than the boundary estimation.
According to Table 6 and Table 7, the results varied, but the overall accuracy patterns of the subtasks were not significantly different between the human-labeled and synthetic data. For the superiority and monotonicity tasks, the label proportions are unbalanced compared with the other subtasks, which maintain a uniform distribution, so we additionally evaluated F1 scores on the small synthetic dataset, as shown in Figure 6. In the case of monotonicity, F1 scores were similar to the accuracy results, which implies that the average recall was close to one rather than zero. Superiority showed a significantly lower F1 score than its accuracy, so the average recalls were also low. This difference was observed even at a high accuracy near 90%, which implies that the dominating labels had sufficiently large precision and recall while the others did not. Because of the high ambiguity of labeling, this task has high problem complexity.
Figure 7 shows the task-wise comparison between the human-labeled real data and the small synthetic data. Fluctuation patterns were similar for growth type estimation in the one-partition case. The two- and three-partition cases showed large differences, which were caused by the ambiguity shown in the quantitative analysis. The number of partitions, monotonicity, boundary estimation, and minimum and maximum regressions showed relatively similar patterns.
The validation results were also collected, as shown in Table 8. In this setting, the ratio of validation to training samples was 1:1. The highest accuracies were recorded for growth type, partition confidence, monotonicity, and superiority, and the lowest mean square error (MSE) values were recorded for range and minimum and maximum. As in the test, growth type and range were reported separately according to the number of partitions. The scores are high because each is the best score recorded for that task during validation, regardless of the total loss.
The overall results showed that simple CNN settings yield good performance on most subtasks, but a few tasks had low performance. The main cause of this limitation is the ambiguity of the labels in the data, because the rules for data generation and labeling were mainly based on human intuition; for example, deciding whether a segment is linear or logarithmic is challenging in many images. Beyond ambiguous labeling, limits from a machine learning perspective remain. First, we used a multitask learning framework, but learning all subtasks together may not be beneficial depending on their similarity.

6.2. Qualitative Analysis

For Figure 8, we selected two sample images for each number-of-partitions case from the test results on the synthetic data. Figure 8a,b shows correct prediction results for growth type, while the regression tasks still need improvement. In Figure 8c,d, some partitions are relatively well predicted, but the maximum and minimum values can be distant from the correct points; the growth type labels are partially incorrect, but they are ambiguous even under human evaluation. In Figure 8e,f, the partition and growth type values show large errors. In the accurately predicted cases, we obtain knowledge that is largely understandable under human evaluation, but errors that need improvement remain in all tasks. Similarly, Figure 9 shows the prediction results on the real test data. Compared with the synthetic data, we can see natural language texts for labels, various ranges of real tick labels, and other practical attributes. The red bars and blue crosses are the prediction results. The results on this data set are similar to those on the synthetic test dataset. Because the prediction is based entirely on visual properties, it can be applied to practical images without loss of generality.

7. Conclusions

In technical document understanding, learning the knowledge implied in a line chart is important, but it is usually learned together with downstream tasks. This integration slows research on optimizing the configuration of the neural networks used to understand the knowledge. The explicit knowledge template proposed in this paper and the algorithm to automatically generate supervised data can be used as an incubating environment for models that solve this task. As an example of using the environment, we showed three configurations of convolutional neural networks and analyzed their performance and actual prediction cases. The synthetic data showed patterns similar to the human-labeled real data, showing that this environment can work for incubating models without a data-size limitation. This shared task is expected to boost research on the understanding of technical documents.

8. Future Works

In future work, the domain of applicable charts could be extended. We plan to more rigorously analyze the human evaluation results.

Author Contributions

Conceptualization, K.K. and C.S.; methodology, C.S. and H.C.; software, C.S., H.C., and J.P.; validation, C.S., H.C., and K.K.; formal analysis, C.S., H.C., and K.K.; investigation, C.S.; resources C.S. and J.P.; data curation, C.S.; writing–review and editing, K.K., J.P. and J.N.; writing–original draft preparation, C.S.; visualization, C.S. and H.C.; Supervision, K.K.; project administration, K.K.; funding acquisition, K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Global University Project (GUP) grant funded by the GIST in 2020, the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2019-0-01842, Artificial Intelligence Graduate School Program (GIST)), and the National Research Foundation of Korea (NRF) grant funded by Korean government (MSIT) (2019R1A2C109107712).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Balaji, A.; Ramanathan, T.; Sonathi, V. Chart-text: A fully automated chart image descriptor. arXiv 2018, arXiv:1812.10636. [Google Scholar]
  2. Mishchenko, A.; Vassilieva, N. Chart image understanding and numerical data extraction. In Proceedings of the 2011 Sixth International Conference on Digital Information Management, Melbourne, Australia, 26–28 September 2011; pp. 115–120. [Google Scholar]
  3. Savva, M.; Kong, N.; Chhajta, A.; Fei-Fei, L.; Agrawala, M.; Heer, J. Revision: Automated classification, analysis and redesign of chart images. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, Santa Barbara, CA, USA, 16–19 October 2011; pp. 393–402. [Google Scholar]
  4. Jung, D.; Kim, W.; Song, H.; Hwang, J.i.; Lee, B.; Kim, B.; Seo, J. ChartSense: Interactive data extraction from chart images. In Proceedings of the 2017 Chi Conference on Human Factors in Computing Systems, Denver, CO, USA, 6–11 May 2017; pp. 6706–6717. [Google Scholar]
  5. Siegel, N.; Horvitz, Z.; Levin, R.; Divvala, S.; Farhadi, A. FigureSeer: Parsing result-figures in research papers. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; pp. 664–680. [Google Scholar]
  6. Sundermeyer, M.; Alkhouli, T.; Wuebker, J.; Ney, H. Translation modeling with bidirectional recurrent neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 14–25. [Google Scholar]
  7. Hutchins, W.J.; Somers, H.L. An Introduction to Machine Translation; Academic Press: London, UK, 1992; Volume 362. [Google Scholar]
  8. Koehn, P. Statistical Machine Translation; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
  9. Lee, H.; Grosse, R.; Ranganath, R.; Ng, A.Y. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 609–616. [Google Scholar]
  10. Valueva, M.V.; Nagornov, N.; Lyakhov, P.A.; Valuev, G.V.; Chervyakov, N.I. Application of the residue number system to reduce hardware costs of the convolutional neural network implementation. Math. Comput. Simul. 2020, 177, 232–243. [Google Scholar] [CrossRef]
  11. Baxter, J. A model of inductive bias learning. J. Artif. Intell. Res. 2000, 12, 149–198. [Google Scholar] [CrossRef]
  12. Thrun, S. Is learning the n-th thing any easier than learning the first? In Advances in Neural Information Processing Systems; Morgan Kaufmann Publishers: Los Altos, CA, USA, 1996; pp. 640–646. [Google Scholar]
  13. Kavasidis, I.; Pino, C.; Palazzo, S.; Rundo, F.; Giordano, D.; Messina, P.; Spampinato, C. A saliency-based convolutional neural network for table and chart detection in digitized documents. In Proceedings of the Image Analysis and Processing—ICIAP 2019, Trento, Italy, 9–13 September 2019; pp. 292–302. [Google Scholar]
  14. Amara, J.; Kaur, P.; Owonibi, M.; Bouaziz, B. Convolutional Neural Network Based Chart Image Classification. In Proceedings of the 25th International conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2017), Primavera Congress Center, Plzen, Czech Republic, 29 May–2 June 2017; pp. 83–88. [Google Scholar]
  15. Siddiqui, S.A.; Malik, M.I.; Agne, S.; Dengel, A.; Ahmed, S. Decnt: Deep deformable cnn for table detection. IEEE Access 2018, 6, 74151–74161. [Google Scholar] [CrossRef]
  16. Saha, R.; Mondal, A.; Jawahar, C. Graphical Object Detection in Document Images. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019; pp. 51–58. [Google Scholar]
  17. Huang, W.; Liu, R.; Tan, C.L. Extraction of vectorized graphical information from scientific chart images. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; Volume 1, pp. 521–525. [Google Scholar]
  18. Ganguly, P.; Methani, N.; Khapra, M.M.; Kumar, P. A Systematic Evaluation of Object Detection Networks for Scientific Plots. arXiv 2020, arXiv:2007.02240. [Google Scholar]
  19. Huang, W.; Tan, C.L.; Zhao, J. Generating ground truthed dataset of chart images: Automatic or semi-automatic? In Proceedings of the Graphics Recognition. Recent Advances and New Opportunities, Curitiba, Brazil, 20–21 September 2007; pp. 266–277. [Google Scholar]
  20. Methani, N.; Ganguly, P.; Khapra, M.M.; Kumar, P. PlotQA: Reasoning over Scientific Plots. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 1527–1536. [Google Scholar]
  21. Kahou, S.E.; Michalski, V.; Atkinson, A.; Kádár, Á.; Trischler, A.; Bengio, Y. Figureqa: An annotated figure dataset for visual reasoning. arXiv 2017, arXiv:1710.07300. [Google Scholar]
  22. Kafle, K.; Price, B.; Cohen, S.; Kanan, C. DVQA: Understanding data visualizations via question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5648–5656. [Google Scholar]
  23. Huang, W. Scientific Chart Image Recognition and Interpretation. Ph.D. Thesis, National University of Singapore, Singapore, 2008. [Google Scholar]
  24. Cleveland, W.S.; McGill, R. Graphical perception: The visual decoding of quantitative information on graphical displays of data. J. R. Stat. Soc. Ser. A 1987, 150, 192–210. [Google Scholar] [CrossRef]
  25. Cleveland, W.S.; McGill, R. Graphical perception: Theory, experimentation, and application to the development of graphical methods. J. Am. Stat. Assoc. 1984, 79, 531–554. [Google Scholar] [CrossRef]
  26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  27. Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv 2016, arXiv:1605.07146. [Google Scholar]
Figure 1. The structure of logic categories used as labels and their associated numerical ranges in the proposed knowledge template (white boxes in the third column are classification and grey boxes are regression subtasks).
Figure 2. Examples of the generated input images and labels for classification subtasks (In., increasing; De., decreasing).
Figure 3. Flow chart of algorithm for data generation.
Figure 4. Distribution of randomly generated attributes. All these distributions show the large coverage of lines produced by the algorithm. (a) Minimum (blue dots) and maximum (red dots) points; lines are drawn to pass through the minimum and the maximum. (b) The initially selected two points of a line; they were all randomly selected, and the leftmost point is the starting point and the rightmost point is the ending point. (c) The distribution of the X-coordinates of the boundaries; in each box, the X-coordinates are randomly sampled.
Figure 5. Architecture of CU-STN (θ: transformation parameter of STN for the grid generator).
Figure 6. Comparison of F1 score and accuracy of the (a) superiority and (b) monotonicity subtasks for synthetic data.
Figure 7. Comparison of synthetic data (red) and human evaluation data (blue). Bars show the accuracy and MSE for classification and regression, respectively. Bars of the same color are the results of models trained with 1000, 5000, 10,000, and 50,000 samples, from left to right. Overall, the subtask results show similar tendencies. Detailed numerical results of the tests are shown in Table 6 and Table 7.
Figure 8. Examples of detailed results on the synthetic test dataset. Blue, correct prediction; red, wrong prediction; In, increase; De, decrease; blue cross, minimum and maximum points; red line, partition boundary. These test data consist of one-line charts, so superiority evaluation was excluded.
Figure 9. Example of detailed results with the human-labeled dataset. Blue, correct prediction; red, wrong prediction; In, increase; De, decrease; blue cross, minimum and maximum point; red line, partition boundary. These test data consist of one line chart, so superiority evaluation was excluded.
Table 1. Attributes of the frame of the targeted line chart image.

Category | Attribute | Range
Title | name | up to 10 characters
Title | size | [5, 10] (font size)
Title | position | {top, down} × {left, center, right}
Axis | label | up to 10 characters
Axis | label position | {center, none}
Axis | label size | [5, 8] (font size)
Axis | label color | {black, white}
Axis | range | [0.0, 1.0]
Axis | tick label | up to 10 characters
Axis | tick label size | [4, 7] (font size)
Axis | tick digit | two decimal places
Axis | number of ticks | [3, 12]
Legend | label | up to 10 characters
Legend | position | {upper, lower} × {right, left}
Legend | border | {border, none}
Legend | border color | {black, gray}
Legend | background | 4 colors
Line | type | 4 types, including solid and dotted
Line | color | 7 colors
Line | thickness | [1, 4] (line width)
Background | color | 5 colors
Background | grid | {horizontal, vertical, both, none}
Background | plot area frame | {lower left, lower left & upper right}
Table 2. Numbers and types of subtasks (the number of output variables to predict is doubled for two lines in all subtasks except superiority).

Category | Subtask | Number of Subtasks | Type
chart | Superiority | 1 | classification
line | Number of partitions | 2 | classification
line | Monotonicity | 2 | classification
line | XY-coordinates for minimum | 4 | regression
line | XY-coordinates for maximum | 4 | regression
partition | X of start & end for 1-partition case | 4 | regression
partition | X of start & end for 2-partition case | 8 | regression
partition | X of start & end for 3-partition case | 12 | regression
partition | Growth type labels for 1-partition case | 2 | classification
partition | Growth type labels for 2-partition case | 4 | classification
partition | Growth type labels for 3-partition case | 6 | classification
Table 3. Proportion of labels in the training data.

Subtask | Class | Proportion
Number of lines | 1 | 49.92%
Number of lines | 2 | 50.08%
Number of partitions | 1 | 33.43%
Number of partitions | 2 | 33.22%
Number of partitions | 3 | 33.35%
Superiority | None | 86.85%
Superiority | Line 1 | 6.77%
Superiority | Line 2 | 6.38%
Monotonicity | None | 51.27%
Monotonicity | Increasing | 24.42%
Monotonicity | Decreasing | 24.31%
Growth type | Linear increasing | 16.74%
Growth type | Linear decreasing | 16.7%
Growth type | Logarithmic increasing | 16.57%
Growth type | Logarithmic decreasing | 16.45%
Growth type | Exponential increasing | 16.73%
Growth type | Exponential decreasing | 16.82%
Table 4. Hyperparameter settings (In., increasing; De., decreasing).

Group | Hyperparameter | Value
Common | batch size | 64
Common | pretrained model | false
Training | validation ratio | 0.5
Training | maximum update step | 500,000
Training | optimizer algorithm | Adam
Training | optimizer hyperparameters (β1, β2) | (0.9, 0.999)
Training | learning rate | 0.001
Training | weight decay | 0
Training | learning rate scheduler algorithm | ReduceLROnPlateau
Training | scheduler hyperparameter (patience) | min(total epochs / 10, 10)
Training | scheduler hyperparameter (factor) | 0.1
Test | balancing parameters for monotonicity (None, In., De.) | (0.58, 1.21, 1.21)
Test | label weights for superiority (None, Line 1, Line 2) | (0.11, 1.40, 1.49)
Table 5. Performance for all subtasks on synthetic test data (5000 samples; part., number of partitions; mono., monotonicity; super., superiority; |D_tr|, size of training and validation data; MSE, mean square error; W-ResNet-50-2, Wide-ResNet-50-2). Growth type (p1-p3), part., mono., and super. are classification accuracy (%); range (p1-p3) and min & max are average MSE (×10^-1). Best results are in bold in the original.

Model | |D_tr| | Growth p1 | Growth p2 | Growth p3 | Part. | Mono. | Super. | Range p1 | Range p2 | Range p3 | Min & Max
ResNet-50 | 1K | 21.47 | 4.26 | 0.78 | 37.24 | 49.41 | 81.19 | 0.65 | 0.53 | 0.45 | 0.70
ResNet-50 | 5K | 26.27 | 5.60 | 1.48 | 40.24 | 56.72 | 70.61 | 0.55 | 0.50 | 0.44 | 0.69
ResNet-50 | 10K | 36.48 | 9.36 | 1.86 | 42.57 | 61.68 | 62.02 | 0.55 | 0.53 | 0.38 | 0.64
ResNet-50 | 50K | 76.23 | 49.92 | 25.93 | 60.85 | 76.83 | 76.17 | 0.24 | 0.19 | 0.15 | 0.35
W-ResNet-50-2 | 1K | 16.95 | 3.09 | 0.58 | 34.76 | 36.87 | 65.37 | 0.53 | 0.39 | 0.30 | 0.70
W-ResNet-50-2 | 5K | 31.92 | 6.65 | 1.63 | 40.87 | 49.49 | 58.71 | 0.46 | 0.40 | 0.32 | 0.50
W-ResNet-50-2 | 10K | 35.43 | 9.66 | 2.29 | 46.66 | 63.13 | 76.00 | 0.35 | 0.24 | 0.19 | 0.40
W-ResNet-50-2 | 50K | 78.85 | 52.80 | 30.75 | 64.24 | 77.25 | 80.87 | 0.25 | 0.20 | 0.16 | 0.36
CU-STN | 1K | 20.02 | 4.18 | 0.39 | 34.22 | 47.60 | 84.14 | 0.55 | 0.35 | 0.27 | 0.64
CU-STN | 5K | 17.15 | 3.22 | 0.43 | 32.58 | 42.59 | 87.16 | 0.48 | 0.32 | 0.22 | 0.63
CU-STN | 10K | 38.01 | 11.96 | 2.68 | 50.99 | 69.88 | 73.51 | 0.38 | 0.24 | 0.19 | 0.39
CU-STN | 50K | 71.71 | 46.28 | 21.04 | 62.15 | 75.13 | 77.47 | 0.27 | 0.18 | 0.15 | 0.36
CU-STN + scheduler | 1K | 16.26 | 2.93 | 0.74 | 34.52 | 33.51 | 84.87 | 0.49 | 0.33 | 0.23 | 0.63
CU-STN + scheduler | 5K | 17.15 | 2.97 | 0.23 | 33.28 | 42.59 | 87.16 | 0.47 | 0.32 | 0.22 | 0.62
CU-STN + scheduler | 10K | 37.09 | 10.79 | 1.86 | 45.31 | 65.51 | 77.51 | 0.35 | 0.25 | 0.20 | 0.39
CU-STN + scheduler | 50K | 69.45 | 37.42 | 13.63 | 59.25 | 76.91 | 81.19 | 0.25 | 0.19 | 0.16 | 0.34
Table 6. Performance for all subtasks on human-labeled real test data (500 samples; part., number of partitions; mono., monotonicity). Growth type (p1-p3), part., and mono. are classification accuracy (%); range (p1-p3) and min & max are average MSE (×10^-1).

Model | |D_tr| | Growth p1 | Growth p2 | Growth p3 | Part. | Mono. | Range p1 | Range p2 | Range p3 | Min & Max
ResNet-50 | 1K | 46.88 | 5.56 | 0.00 | 35.04 | 44.53 | 1.05 | 0.64 | 0.52 | 1.15
ResNet-50 | 5K | 36.60 | 22.22 | 0.00 | 31.39 | 53.28 | 0.56 | 0.59 | 0.42 | 1.41
ResNet-50 | 10K | 50.45 | 16.67 | 7.14 | 26.64 | 69.71 | 0.84 | 0.61 | 0.42 | 0.82
ResNet-50 | 50K | 85.71 | 69.44 | 42.86 | 81.02 | 89.05 | 0.49 | 0.15 | 0.13 | 0.69
W-ResNet-50-2 | 1K | 21.43 | 2.78 | 0.00 | 20.07 | 36.86 | 0.81 | 0.31 | 0.25 | 1.30
W-ResNet-50-2 | 5K | 38.34 | 5.56 | 0.00 | 43.43 | 67.15 | 0.64 | 0.40 | 0.23 | 0.96
W-ResNet-50-2 | 10K | 65.18 | 19.44 | 7.14 | 28.10 | 82.48 | 0.75 | 0.33 | 0.16 | 0.73
W-ResNet-50-2 | 50K | 86.61 | 66.67 | 50.00 | 79.93 | 90.15 | 0.49 | 0.20 | 0.14 | 0.58
CU-STN | 1K | 16.52 | 2.78 | 7.14 | 11.68 | 13.14 | 1.21 | 0.45 | 0.24 | 1.24
CU-STN | 5K | 42.86 | 0.00 | 0.00 | 13.14 | 8.76 | 0.86 | 0.31 | 0.17 | 1.33
CU-STN | 10K | 59.38 | 25.00 | 0.00 | 50.00 | 84.31 | 0.60 | 0.29 | 0.16 | 0.62
CU-STN | 50K | 82.14 | 58.33 | 42.86 | 63.14 | 93.07 | 0.51 | 0.20 | 0.13 | 0.56
CU-STN + scheduler | 1K | 20.98 | 2.78 | 0.00 | 5.84 | 68.61 | 0.76 | 0.28 | 0.17 | 1.30
CU-STN + scheduler | 5K | 42.86 | 0.00 | 0.00 | 81.75 | 8.76 | 0.78 | 0.27 | 0.16 | 1.29
CU-STN + scheduler | 10K | 50.45 | 19.44 | 0.00 | 22.99 | 70.07 | 0.31 | 0.25 | 0.09 | 0.63
CU-STN + scheduler | 50K | 72.77 | 36.11 | 35.71 | 45.98 | 90.15 | 0.61 | 0.21 | 0.18 | 0.62
Table 7. Performance of all subtasks with the synthetic test data (500 samples; part., number of partitions; mono., monotonicity; super., superiority; |D_tr|, size of training and validation data; W-ResNet-50-2, Wide-ResNet-50-2). Growth type (p1-p3), part., mono., and super. are classification accuracy (%); range (p1-p3) and min & max are average MSE (×10^-1).

Model | |D_tr| | Growth p1 | Growth p2 | Growth p3 | Part. | Mono. | Super. | Range p1 | Range p2 | Range p3 | Min & Max
ResNet-50 | 1K | 23.79 | 3.19 | 0.41 | 38.65 | 51.35 | 83.75 | 0.68 | 0.57 | 0.44 | 0.71
ResNet-50 | 5K | 28.23 | 5.98 | 0.41 | 37.43 | 58.78 | 68.75 | 0.61 | 0.55 | 0.46 | 0.71
ResNet-50 | 10K | 38.71 | 9.96 | 2.49 | 40.68 | 63.51 | 61.25 | 0.53 | 0.53 | 0.37 | 0.64
ResNet-50 | 50K | 74.19 | 55.78 | 19.09 | 62.43 | 76.76 | 78.75 | 0.26 | 0.21 | 0.15 | 0.35
W-ResNet-50-2 | 1K | 16.53 | 4.38 | 0.00 | 34.86 | 33.92 | 68.75 | 0.56 | 0.43 | 0.30 | 0.71
W-ResNet-50-2 | 5K | 32.26 | 7.17 | 1.24 | 40.68 | 50.95 | 56.67 | 0.49 | 0.43 | 0.32 | 0.48
W-ResNet-50-2 | 10K | 36.29 | 11.16 | 0.41 | 45.00 | 63.92 | 73.75 | 0.37 | 0.24 | 0.19 | 0.40
W-ResNet-50-2 | 50K | 75.00 | 49.80 | 26.56 | 64.59 | 77.16 | 80.83 | 0.26 | 0.20 | 0.16 | 0.35
CU-STN | 1K | 21.37 | 2.39 | 0.41 | 34.73 | 46.49 | 85.00 | 0.58 | 0.36 | 0.27 | 0.64
CU-STN | 5K | 17.74 | 3.98 | 0.00 | 32.84 | 40.81 | 88.33 | 0.52 | 0.34 | 0.24 | 0.62
CU-STN | 10K | 41.53 | 11.95 | 3.32 | 49.59 | 71.49 | 71.25 | 0.36 | 0.23 | 0.20 | 0.38
CU-STN | 50K | 68.55 | 52.59 | 20.75 | 63.11 | 76.76 | 78.33 | 0.27 | 0.18 | 0.14 | 0.34
CU-STN + scheduler | 1K | 18.95 | 3.19 | 0.83 | 33.38 | 35.00 | 85.42 | 0.54 | 0.35 | 0.24 | 0.63
CU-STN + scheduler | 5K | 17.74 | 1.59 | 0.00 | 33.51 | 40.81 | 88.33 | 0.51 | 0.34 | 0.24 | 0.62
CU-STN + scheduler | 10K | 39.11 | 9.16 | 1.66 | 45.68 | 66.49 | 76.25 | 0.36 | 0.25 | 0.20 | 0.37
CU-STN + scheduler | 50K | 70.16 | 37.05 | 16.18 | 58.92 | 77.84 | 80.42 | 0.26 | 0.19 | 0.16 | 0.33
Table 8. Performance for all subtasks on synthetic validation data (tr., training; va., validation; part., number of partitions; mono., monotonicity; super., superiority; |D_tr|, size of training and validation data; W-ResNet-50-2, Wide-ResNet-50-2). Growth type (p1-p3), part., mono., and super. are classification accuracy (%); range (p1-p3) and min & max are average MSE (×10^-1).

Model | |D_tr| | Growth p1 | Growth p2 | Growth p3 | Part. | Mono. | Super. | Range p1 | Range p2 | Range p3 | Min & Max
ResNet-50 | 1K | 27.03 | 6.72 | 2.23 | 41.53 | 50.84 | 85.35 | 0.49 | 0.44 | 0.40 | 0.65
ResNet-50 | 5K | 34.24 | 9.12 | 2.35 | 42.16 | 61.84 | 85.90 | 0.54 | 0.37 | 0.36 | 0.61
ResNet-50 | 10K | 43.67 | 12.60 | 2.82 | 45.41 | 69.36 | 86.70 | 0.38 | 0.31 | 0.24 | 0.45
ResNet-50 | 50K | 75.93 | 47.70 | 27.79 | 61.01 | 76.18 | 86.45 | 0.24 | 0.20 | 0.15 | 0.35
W-ResNet-50-2 | 1K | 22.39 | 5.53 | 2.28 | 42.56 | 51.23 | 84.98 | 0.37 | 0.27 | 0.25 | 0.52
W-ResNet-50-2 | 5K | 35.69 | 9.59 | 2.18 | 44.07 | 64.21 | 85.90 | 0.45 | 0.34 | 0.26 | 0.48
W-ResNet-50-2 | 10K | 42.53 | 13.00 | 2.74 | 46.89 | 70.17 | 86.66 | 0.35 | 0.24 | 0.19 | 0.39
W-ResNet-50-2 | 50K | 77.02 | 52.49 | 32.30 | 63.07 | 76.59 | 87.28 | 0.24 | 0.20 | 0.17 | 0.37
CU-STN | 1K | 24.71 | 4.74 | 2.30 | 39.71 | 52.13 | 84.98 | 0.41 | 0.30 | 0.25 | 0.58
CU-STN | 5K | 18.44 | 4.17 | 0.97 | 34.14 | 42.90 | 85.90 | 0.48 | 0.32 | 0.22 | 0.62
CU-STN | 10K | 47.71 | 2.24 | 2.70 | 49.25 | 72.41 | 86.85 | 0.38 | 0.25 | 0.20 | 0.38
CU-STN | 50K | 73.37 | 45.74 | 24.13 | 60.75 | 75.64 | 87.63 | 0.26 | 0.19 | 0.14 | 0.35
CU-STN + scheduler | 1K | 28.5 | 4.35 | 0.77 | 37.00 | 48.25 | 84.98 | 0.46 | 0.31 | 0.25 | 0.63
CU-STN + scheduler | 5K | 17.24 | 3.14 | 1.21 | 36.51 | 42.90 | 85.90 | 0.48 | 0.32 | 0.22 | 0.62
CU-STN + scheduler | 10K | 38.98 | 11.11 | 2.62 | 44.94 | 66.36 | 86.66 | 0.36 | 0.26 | 0.20 | 0.39
CU-STN + scheduler | 50K | 72.78 | 42.01 | 18.20 | 60.41 | 75.78 | 88.17 | 0.25 | 0.19 | 0.15 | 0.34