Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant
Abstract
:Abstract
Dataset
Dataset License
1. Summary
2. Digital Teaching Assistant
2.1. Digital Teaching Assistant Core
- Translation of some formal notation into source code;
- Conversion between two different data formats.
2.2. Digital Teaching Assistant Web Application
3. Data Description
Algorithm 1: Filtering and preprocessing of source code datasets. | |
Input: | S—dataset with source codes of programs that successfully solved the tasks. |
1. | Define set of string literals K = {print(, eval(, exec(, .com, .ru}. |
2. | Define P = Ø. |
3. | For each s ∈ S do: |
4. | Construct an AST a for s using the ast.parse function [38]. |
5. | Remove docstrings from a. |
6. | Restore the program text s* from a using the ast.unparse function [38]. |
7. | If then . |
8. | End loop. |
9. | Return the filtered set of preprocessed programs. |
4. Intelligent Source Code Analysis Algorithms Used in Digital Teaching Assistant
Algorithm 2: Markov chain construction for the given source code. | |
Input: | s—source code of a program solving a unique programming exercise. |
1. | Construct an AST A = (V,E) for program s using the ast.parse function [38]. |
2. | Delete from A vertices belonging to set {Load, Store, alias, arguments, args}. |
3. | Define the mapping g:V→ that maps a vertex v∈V to its type. |
4. | M = Ø. |
5. | . |
6. | For each vertex type t ∈ T do: |
7. | —descendants of vertices of type t. |
8. | —types of descendants of vertices of type t. |
9. | For each descendant vertex type td∈Td do: |
10. | —normed descendant count for td. |
11. | . |
12. | End loop. |
13. | End loop. |
14. | Return the weighted state transition graph (T,M) of the Markov chain. |
Algorithm 3: Conversion of source codes to vectors based on Markov chains. | |
Input: | S—a set of source codes to be converted into vector representations. |
1. | H = Ø—a set for AST node types that occur in Markov chains. |
2. | G = Ø. |
3. | For each source code s ∈ S do: |
4. | Construct a Markov chain (T,M) for s according to Algorithm 2. |
5. | —add observed vertices to the H set. |
6. | —add a set of weighted edges of a Markov chain to G. |
7. | End loop. |
8. | V = Ø—a set for vector representations of program source codes. |
9. | —count of different AST node types that occur in Markov chains. |
10. | For each set of edges M ∈ G do: |
11. | Construct a weighted adjacency matrix for graph (H,M). |
12. | Convert the matrix to vector , where h = m2. |
13. | . |
14. | End loop. |
15. | Return the V set containing vector representations of source codes. |
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
References
- Emanuelsson, P.; Nilsson, U. A Comparative Study of Industrial Static Analysis Tools. Electron. Notes Theor. Comput. Sci. 2008, 217, 5–21. [Google Scholar] [CrossRef] [Green Version]
- Ayewah, N.; Pugh, W.; Morgenthaler, J.D.; Penix, J. Using Static Analysis to Find Bugs. IEEE Softw. 2008, 25, 22–29. [Google Scholar] [CrossRef] [Green Version]
- Jiang, H.; Yang, H.; Qin, S.; Su, Z.; Zhang, J.; Yan, J. Detecting Energy Bugs in Android Apps Using Static Analysis. In Proceedings of the Formal Methods and Software Engineering: 19th International Conference on Formal Engineering Methods, ICFEM 2017, Xi’an, China, 13–17 November 2017; Springer International Publishing: Cham, Switzerland, 2017; pp. 192–208. [Google Scholar]
- McPeak, S.; Gros, C.H.; Ramanathan, M.K. Scalable and Incremental Software Bug Detection. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, Saint Petersburg, Russia, 18–26 August 2013; pp. 554–564. [Google Scholar]
- Ebert, C.; Cain, J.; Antoniol, G.; Counsell, S.; Laplante, P. Cyclomatic complexity. IEEE Softw. 2016, 33, 27–29. [Google Scholar] [CrossRef]
- Campbell, G.A. Cognitive complexity: An overview and evaluation. In Proceedings of the 2018 International Conference on Technical Debt, Gothenburg, Sweden, 27–28 May 2018; pp. 57–58. [Google Scholar]
- Bruch, M.; Monperrus, M.; Mezini, M. Learning from Examples to Improve Code Completion Systems. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Amsterdam, The Netherlands, 24–28 August 2009; pp. 213–222. [Google Scholar]
- Svyatkovskiy, A.; Zhao, Y.; Fu, S.; Sundaresan, N. Pythia: Ai-assisted Code Completion System. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 3–7 August 2019; pp. 2727–2735. [Google Scholar]
- Terada, K.; Watanobe, Y. Code Completion for Programming Education Based on Recurrent Neural Network. In Proceedings of the 2019 IEEE 11th International Workshop on Computational Intelligence and Applications (IWCIA), Hiroshima, Japan, 9–10 November 2019; pp. 109–114. [Google Scholar]
- Alon, U.; Zilberstein, M.; Levy, O.; Yahav, E. code2vec: Learning Distributed Representations of Code. In Proceedings of the ACM on Programming Languages, Providence, RI, USA, 22–26 June 2019; pp. 40:1–40:29. [Google Scholar]
- Li, Y.; Wang, S.; Nguyen, T. A Context-based Automated Approach for Method Name Consistency Checking and Suggestion. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, 22–30 May 2021; pp. 574–586. [Google Scholar]
- Lacomis, J.; Yin, P.; Schwarts, E.; Allamanis, M.; Goues, C.; Neubig, G.; Vasilescu, B. Dire: A Neural Approach to Decompiled Identifier Naming. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), San Diego, CA, USA, 11–15 November 2019; pp. 628–639. [Google Scholar]
- Marcus, A.; Maletic, J.I. Identification of High-level Concept Clones in Source Code. In Proceedings of the 16th Annual International Conference on Automated Software Engineering (ASE 2001), San Diego, CA, USA, 26–29 November 2001; pp. 107–114. [Google Scholar]
- Moussiades, L.; Vakali, A. PDetect: A Clustering Approach for Detecting Plagiarism in Source Code Datasets. Comput. J. 2005, 48, 651–661. [Google Scholar] [CrossRef]
- Sovietov, P.N.; Gorchakov, A.V. Digital Teaching Assistant for the Python Programming Course. In Proceedings of the 2022 2nd International Conference on Technology Enhanced Learning in Higher Education (TELE), Lipetsk, Russia, 26–27 May 2022; pp. 272–276. [Google Scholar]
- Andrianova, E.G.; Demidova, L.A.; Sovetov, P.N. Pedagogical Design of a Digital Teaching Assistant in Massive Professional Training for the Digital Economy. Russ. Technol. J. 2022, 10, 7–23. [Google Scholar] [CrossRef]
- Mekterović, I.; Brkić, L.; Milašinović, B.; Baranović, M. Building a Comprehensive Automated Programming Assessment System. IEEE Access 2020, 8, 81154–81172. [Google Scholar] [CrossRef]
- Queirós, R.A.P.; Leal, J.P. PETCHA: A Programming Exercises Teaching Assistant. In Proceedings of the 17th ACM Annual Conference on Innovation and Technology in Computer Science Education, Haifa, Israel, 3–5 July 2012; pp. 192–197. [Google Scholar]
- Combéfis, S. Automated Code Assessment for Education: Review, Classification and Perspectives on Techniques and Tools. Software 2022, 1, 3–30. [Google Scholar] [CrossRef]
- Jiang, L.; Misherghi, G.; Su, Z.; Glondu, S. Deckard: Scalable and Accurate Tree-Based Detection of Code Clones. In Proceedings of the 29-th International Conference on Software Engineering (ICSE’07), Minneapolis, MN, USA, 20–26 May 2007; IEEE: Pistacaway, NJ, USA, 2007; pp. 96–105. [Google Scholar]
- Kustanto, C.; Liem, I. Automatic Source Code Plagiarism Detection. In Proceedings of the 2009 10th ACIS International Conference on Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing, Daegu, Republic of Korea, 27–29 May 2009; IEEE: Pistacaway, NJ, USA, 2009; pp. 481–486. [Google Scholar]
- Yasaswi, J.; Kailash, S.; Chilupuri, A.; Purini, S.; Jawahar, C.V. Unsupervised Learning-Based Approach for Plagiarism Detection in Programming Assignments. In Proceedings of the 10th Innovations in Software Engineering Conference, Jaipur, India, 5–7 February 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 117–121. [Google Scholar]
- Sovietov, P. Automatic Generation of Programming Exercises. In Proceedings of the 2021 1st International Conference on Technology Enhanced Learning in Higher Education (TELE), Lipetsk, Russia, 7–9 July 2021; IEEE: Pistacaway, NJ, USA, 2021; pp. 111–114. [Google Scholar]
- Demidova, L.A.; Sovietov, P.N.; Gorchakov, A.V. Clustering of Program Source Text Representations Based on Markov Chains. Vestn. Ryazan State Radio Eng. Univ. 2022, 81, 51–64. [Google Scholar]
- Demidova, L.A.; Gorchakov, A.V. Classification of Program Texts Represented as Markov Chains with Biology-Inspired Algorithms-Enhanced Extreme Learning Machines. Algorithms 2022, 15, 329. [Google Scholar] [CrossRef]
- Allamanis, M.; Sutton, C. Mining Idioms from Source Code. In Proceedings of the 22nd ACM Sigsoft International Symposium on Foundations of Software Engineering, Hong Kong, China, 16–21 November 2014; pp. 472–483. [Google Scholar]
- Pham, H.S.; Nijssen, S.; Mens, K.; Nucci, D.D.; Molderez, T.; Roover, C.D.; Fabry, J.; Zaytsev, V. Mining Patterns in Source Code using Tree Mining Algorithms. In Proceedings of the Discovery Science: 22nd International Conference, DS 2019, Split, Croatia, 28–30 October 2019; Springer International Publishing: New York, NY, USA, 2019; pp. 471–480. [Google Scholar]
- Lin, J. Divergence Measures Based on the Shannon Entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151. [Google Scholar] [CrossRef] [Green Version]
- Nielsen, F. On the Jensen–Shannon Symmetrization of Distances Relying on Abstract Means. Entropy 2019, 21, 485. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Sokal, R.R.; Michener, C.D. A Statistical Method for Evaluating Systematic Relationships. Evolution 1957, 11, 130–162. [Google Scholar]
- Peveler, M.; Maicus, E.; Cutler, B. Comparing Jailed Sandboxes vs Containers Within an Autograding System. In Proceedings of the 50th ACM Technical Symposium on Computer Science Education, Minneapolis, MN, USA, 27 February–2 March 2019; pp. 139–145. [Google Scholar]
- Wang, X.; Du, J.; Liu, H. Performance and Isolation Analysis of RunC, gVisor and Kata Containers Runtimes. Clust. Comput. 2022, 25, 1497–1513. [Google Scholar] [CrossRef]
- Brailsford, S.C.; Potts, C.N.; Smith, B.M. Constraint Satisfaction Problems: Algorithms and Applications. Eur. J. Oper. Res. 1999, 119, 557–581. [Google Scholar] [CrossRef]
- Mailund, T. Introducing Markdown and Pandoc: Using Markup Language and Document Converter; Apress: Berkeley, CA, USA, 2019; 139p. [Google Scholar]
- Gansner, E.R.; North, S.C. An Open Graph Visualization System and its Applications to Software Engineering. Softw. Pract. Exp. 2000, 30, 1203–1233. [Google Scholar] [CrossRef]
- Fowler, M.; Rice, D.; Foemmel, M.; Hieatt, E.; Mee, R.; Stafford, R. Patterns of Enterprise Application Architecture; Addison-Wesley Professional: Boston, MA, USA, 2002; Chapter 14. [Google Scholar]
- Bayer, M.; Brown, A.; Wilson, G. SQLAlchemy. Archit. Open-Source Appl. 2012, 2, 20. [Google Scholar]
- Python Software Foundation. AST—Abstract Syntax Trees. 2023. Available online: https://docs.python.org/3/library/ast.html (accessed on 28 March 2023).
- Wang, Y.; Huang, H.; Rudin, C.; Shaposhnik, Y. Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMAP, and PaCMAP for Data Visualization. J. Mach. Learn. Res. 2021, 22, 9129–9201. [Google Scholar]
- Demidova, L.A.; Gorchakov, A.V. Fuzzy Information Discrimination Measures and Their Application to Low Dimensional Embedding Construction in the UMAP Algorithm. J. Imaging 2022, 8, 113. [Google Scholar] [CrossRef] [PubMed]
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
- Shahapure, K.R.; Nicholas, C. Cluster Quality Analysis Using Silhouette Score. In Proceedings of the 2020 IEEE 7th international conference on data science and advanced analytics (DSAA), Sydney, Australia, 6–9 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 747–748. [Google Scholar]
- Zhang, Z.; Xing, Z.; Xia, X.; Xu, X.; Zhu, L. Making Python code idiomatic by automatic refactoring non-idiomatic Python code with pythonic idioms. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, University Town, Singapore, 14–16 November 2022; pp. 696–708. [Google Scholar]
- Russell, R.L.; Kim, L.; Hamilton, L.H.; Lazovich, T.; Harer, J.A.; Ozdemir, O.; Ellingwood, P.M.; McConley, M.W. Automated vulnerability detection in source code using deep representation learning. In Proceedings of the 17th IEEE international conference on machine learning and applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 757–762. [Google Scholar]
- Bogomolov, E.; Kovalenko, V.; Rebryk, Y.; Baccheli, A.; Bryksin, T. Authorship attribution of source code: A language-agnostic approach and applicability in software engineering. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, 23–28 August 2021; pp. 932–944. [Google Scholar]
Task | Programming Exercise Type | Category of the Exercise Type |
---|---|---|
1. | Implement a function | Notation into code translation |
2. | Implement a piecewise function | Notation into code translation |
3. | Implement an iterative function | Notation into code translation |
4. | Implement a recurrent function | Notation into code translation |
5. | Implement a function that processes vectors | Notation into code translation |
6. | Implement a function computing a decision tree | Notation into code translation |
7. | Implement bit field conversion | Conversion between data formats |
8. | Implement a text format parser | Conversion between data formats |
9. | Implement a finite state machine as a class | Notation into code translation |
10. | Implement tabular data transformations | Conversion between data formats |
11. | Implement a binary format parser | Conversion between data formats |
Column | Column Description | Column Data Type | Possible Values |
---|---|---|---|
1. | Unique message identifier | Integer | |
2. | Task number | Integer | |
3. | Variant number | Integer | |
4. | Group number | Integer | |
5. | Message submission time | Timestamp | |
6. | Task status | Integer | |
7. | Message check time | Timestamp |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Demidova, L.A.; Andrianova, E.G.; Sovietov, P.N.; Gorchakov, A.V. Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant. Data 2023, 8, 109. https://doi.org/10.3390/data8060109
Demidova LA, Andrianova EG, Sovietov PN, Gorchakov AV. Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant. Data. 2023; 8(6):109. https://doi.org/10.3390/data8060109
Chicago/Turabian StyleDemidova, Liliya A., Elena G. Andrianova, Peter N. Sovietov, and Artyom V. Gorchakov. 2023. "Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant" Data 8, no. 6: 109. https://doi.org/10.3390/data8060109
APA StyleDemidova, L. A., Andrianova, E. G., Sovietov, P. N., & Gorchakov, A. V. (2023). Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant. Data, 8(6), 109. https://doi.org/10.3390/data8060109