Novel Scratch Programming Blocks for Web Scraping
Abstract
1. Introduction
- We describe the limitations of existing block-based programming languages in web scraping. After we briefly introduce block-based programming languages, we describe their web-scraping-related blocks (Section 2).
- We present novel Scratch blocks for web scraping. Using these blocks, students can not only scrape the contents of HTML elements in a web page by using CSS selectors but also automate their keyboard and mouse in a number of ways, such as by using XPaths, the coordinates of the mouse, input strings, keys, or hot keys. We also present file access blocks that can easily store and retrieve the scraped data in the form of key–value pairs (Section 3).
- We conducted two lectures for a total of 15 primary school teachers, allowing them to create ten web scraping example applications. We discuss the effectiveness of these blocks by analyzing the teachers’ survey responses (Section 4).
2. Related Work
3. Novel Scratch Programming Blocks for Web Scraping
3.1. Block Interfaces for Communicating with WebSocket Servers
- The first block connects a Scratch client to a WebSocket server. Once established, the connection persists until the program ends.
- The second and third blocks send messages to the WebSocket server. In the example shown in the figure, the second block sends the string “the weather”, and the third block sends the two strings “my score” and “10”.
- Whenever any of the first three blocks is executed, the Scratch client receives a response message from the WebSocket server, and its value is stored in the fourth block of the figure, “answer from Tooee”.
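The request/response pattern described above can be sketched as follows. This is a minimal, transport-free model, not the authors' implementation: the JSON message layout, the `TooeeClient` and `FakeServer` names, and the reply strings are all assumptions made for illustration. A real client would send comparable payloads over an actual WebSocket connection.

```python
import json


class FakeServer:
    """Hypothetical stand-in for the WebSocket server (not the real Tooee server)."""

    def handle(self, payload: str) -> str:
        message = json.loads(payload)
        if message["op"] == "connect":
            return "connected"
        args = message["args"]
        if len(args) == 1:
            return f"received query: {args[0]}"
        return f"received {args[0]} = {args[1]}"


class TooeeClient:
    """Models the four blocks: connect, two send blocks, and the answer reporter."""

    def __init__(self, server: FakeServer) -> None:
        self._server = server
        self.answer = ""  # plays the role of the "answer from Tooee" block

    def _request(self, op: str, *args: str) -> str:
        # Every block execution yields exactly one response, kept in `answer`.
        payload = json.dumps({"op": op, "args": list(args)})
        self.answer = self._server.handle(payload)
        return self.answer

    def connect(self) -> str:
        # First block: open the connection.
        return self._request("connect")

    def ask(self, text: str) -> str:
        # Second block: send a single string.
        return self._request("send", text)

    def tell(self, key: str, value: str) -> str:
        # Third block: send two strings.
        return self._request("send", key, value)
```

In this sketch, `client.ask("the weather")` and `client.tell("my score", "10")` mirror the two send blocks from the figure, and reading `client.answer` afterwards corresponds to reading the fourth block.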
3.2. Web Scraping Blocks
- Tooee, (#open) is (https://dblp.org/)
- Tooee, (#scrape) is (#blog-532 > a)
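To illustrate what a selector such as `#blog-532 > a` picks out, the sketch below implements only that one selector form (`#id > tag`, a direct-child match) with Python's standard `html.parser`. It is an assumption-laden illustration, not the mechanism behind the `#scrape` block: the `scrape` function and `ChildSelectorScraper` class are hypothetical names, the input is an HTML string rather than a page opened with `#open`, and reasonably well-formed markup is assumed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ChildSelectorScraper(HTMLParser):
    """Collect the text of <tag> elements that are direct children of #parent_id."""

    def __init__(self, parent_id: str, tag: str) -> None:
        super().__init__()
        self.parent_id = parent_id
        self.tag = tag
        self.stack = []            # (tag, id) of the currently open elements
        self.capture_depth = None  # stack depth of the matched element, if inside one
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        parent = self.stack[-1] if self.stack else None
        self.stack.append((tag, dict(attrs).get("id")))
        is_match = (tag == self.tag and parent is not None
                    and parent[1] == self.parent_id)
        if self.capture_depth is None and is_match:
            self.capture_depth = len(self.stack)
            self.results.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack or self.stack[-1][0] != tag:
            return
        if self.capture_depth == len(self.stack):
            self.capture_depth = None  # leaving the matched element
        self.stack.pop()

    def handle_data(self, data):
        if self.capture_depth is not None:
            self.results[-1] += data


def scrape(html: str, selector: str) -> list:
    """Return the text of every element matched by a '#id > tag' selector."""
    parts = [part.strip() for part in selector.split(">")]
    if len(parts) != 2 or not parts[0].startswith("#"):
        raise ValueError("this sketch supports only selectors of the form '#id > tag'")
    parser = ChildSelectorScraper(parts[0][1:], parts[1])
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]
```

The child combinator `>` matters here: a link nested one level deeper inside `#blog-532` (for example inside a `<p>`) is not matched, whereas a descendant selector (`#blog-532 a`) would match it.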
3.3. Web Automation Blocks
- Tooee, (#click) is (XPath)
- Tooee, (#scroll) is (XPath)
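The blocks above identify their target element by an XPath. As a rough illustration of how an XPath resolves to a single element, the sketch below uses the limited XPath subset of Python's standard `xml.etree.ElementTree` on a small, well-formed fragment. The page content and the `resolve_xpath` helper are hypothetical, and no browser is driven here; an actual `#click` or `#scroll` would act on the live page.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical page fragment; it must be well-formed XML for ElementTree to parse it.
PAGE = """<html><body>
  <ul id="menu">
    <li><a href="/today">Today</a></li>
    <li><a href="/weekly">Weekly</a></li>
  </ul>
</body></html>"""


def resolve_xpath(document: str, xpath: str) -> Optional[ET.Element]:
    """Return the first element matching `xpath`, or None if nothing matches.

    ElementTree supports only relative paths (starting with '.'), so an
    absolute path such as '/html/body/ul' has to be rewritten first.
    """
    root = ET.fromstring(document)
    return root.find(xpath)


# The element a click on the second menu entry would target.
target = resolve_xpath(PAGE, ".//ul[@id='menu']/li[2]/a")
```

Note that positional predicates such as `li[2]` are 1-based in XPath, so `li[2]` selects the second list item.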
3.4. Mouse Automation Blocks
- Tooee, what is (#pos)
- Tooee, (#click) is (x y)
- Tooee, (#right) is (x y)
- Tooee, (#double) is (x y)
- Tooee, (#drag) is (x1 y1 x2 y2)
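The coordinate-based blocks above take either two numbers (`x y`) or four (`x1 y1 x2 y2`). A minimal sketch of how such arguments could be validated before being handed to a mouse controller is shown below. The `parse_mouse_block` function and `MouseAction` type are hypothetical, integer screen coordinates separated by whitespace are assumed, and the `#pos` query and the XPath form of `#click` from Section 3.3 are out of scope.

```python
from dataclasses import dataclass

# Number of coordinates each reserved word expects, taken from the block list above.
ARITY = {"#click": 2, "#right": 2, "#double": 2, "#drag": 4}


@dataclass(frozen=True)
class MouseAction:
    kind: str      # "click", "right", "double", or "drag"
    coords: tuple  # (x, y) or (x1, y1, x2, y2)


def parse_mouse_block(word: str, argument: str) -> MouseAction:
    """Turn a reserved word and its argument string into a validated action."""
    if word not in ARITY:
        raise ValueError(f"unknown reserved word: {word}")
    try:
        coords = tuple(int(token) for token in argument.split())
    except ValueError:
        raise ValueError(f"coordinates must be integers: {argument!r}") from None
    if len(coords) != ARITY[word]:
        raise ValueError(f"{word} expects {ARITY[word]} numbers, got {len(coords)}")
    return MouseAction(word.lstrip("#"), coords)
```

For example, `parse_mouse_block("#drag", "10 20 300 400")` yields an action whose `coords` hold the start point followed by the end point of the drag.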
3.5. Keyboard Automation Blocks
- Tooee, (#key) is (a key or a hot-key)
- Tooee, (#input) is (a string)
3.6. File Access Blocks
- Tooee, (key) is (value)
- Tooee, what is (key)
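The two blocks above store and retrieve key–value pairs. One simple way to back such a pair of operations with a file is sketched below. The JSON file format, the `KeyValueFile` class, and the empty-string default for missing keys are assumptions for illustration; the source does not specify how the data are persisted.

```python
import json
from pathlib import Path


class KeyValueFile:
    """A tiny persistent key-value store backed by one JSON file."""

    def __init__(self, path) -> None:
        self.path = Path(path)

    def _load(self) -> dict:
        # A missing file simply means nothing has been stored yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def set(self, key: str, value: str) -> None:
        """Counterpart of "Tooee, (key) is (value)"."""
        data = self._load()
        data[key] = value
        self.path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")

    def get(self, key: str, default: str = "") -> str:
        """Counterpart of "Tooee, what is (key)"."""
        return self._load().get(key, default)
```

Because every `set` rewrites the file, a value stored in one run is still available after the program restarts, which is the behavior a saved game score relies on.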
4. Experiments
4.1. Experimental Setup
- A “popdog” program using the file access blocks. When a user clicks on a puppy that appears on the screen, the score is incremented and saved to a file. When the program is restarted, the score saved in the file is loaded.
- A coronavirus dashboard program using “#open” and “#scrape”. When the program starts, it opens the page of a specific portal site that shows the number of confirmed coronavirus cases. The program scrapes this number using the appropriate CSS selector and displays it on the screen.
- A program that aggregates the weather forecasts from various weather forecast websites using “#open”, “#click”, and “#scrape”.
- A program that scrapes movie information from a movie website and ranks the movies by rating or cumulative audience using “#open”, “#click”, and “#scrape”.
- A program that shows an overall ranking by scraping the rankings of several music sites using “#open”, “#click”, and “#scrape”. Because the music sites are usually updated about once an hour, the data need not be scraped every time the program runs. To speed up execution, the program saves the scraped data to a file with our file access blocks and loads them when needed.
- A program that visualizes the statistics of an online video platform using “#open”, “#click”, and “#scrape”.
- A program that visualizes the number of coffee shops of a specific brand by region using “#open”, “#click”, “#scrape”, and “#scroll”. On the coffee shop website we scraped, both the “#click” and “#scroll” reserved words were needed because we had to click through several menus in sequence and scroll down to reach the information.
- A program that automatically searches for and downloads cat images through a search engine using “#open”, “#click”, “#right”, “#input”, and “#key”. Here, our mouse automation blocks are needed to bring up the save menu, and our keyboard automation blocks are needed to enter a file name and press the Enter key.
- A program that visualizes the frequencies of hashtags on a specific community website using “#open”, “#click”, and “#scrape”.
- A program that scrapes product information for wireless vacuum cleaners from a specific shopping mall website using “#open” and “#scrape” and recommends products that meet the user’s requirements.
- How long have you been studying block-based programming languages (such as Scratch or App Inventor)?
- How long have you been studying text-based programming languages (such as Python or JavaScript)?
- How long have you been teaching programming to students?
- If you have studied web scraping (or similar topics), briefly describe what you studied.
- Question 1. Do you think K-12 students could implement these web scraping programs easily?
- Question 2. Do you think K-12 students will be interested in these web scraping programs?
- Question 3. Do you think these web scraping programs are practical programs?
- Question 4. Do you think it will be helpful for K-12 students to improve their programming skills by implementing these web scraping programs?
- Question 5. Do you think it will be helpful for K-12 students to improve their data literacy skills by implementing these web scraping programs?
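The music-ranking example in the list of applications above avoids re-scraping by saving results to a file. The sketch below shows one way to express that idea in Python. It is not the teachers' Scratch program: the `load_or_scrape` helper, the JSON cache layout, and the one-hour expiry rule are assumptions (the source says only that the sites usually update hourly and that saved data are loaded when needed).

```python
import json
import time
from pathlib import Path

MAX_AGE_SECONDS = 3600  # assumed expiry, based on the roughly hourly site updates


def load_or_scrape(cache_path, scrape, now=time.time):
    """Return cached data if it is fresh enough; otherwise scrape and re-save.

    `scrape` is any zero-argument function returning JSON-serializable data.
    `now` is injectable so the expiry logic can be exercised without waiting.
    """
    path = Path(cache_path)
    if path.exists():
        cached = json.loads(path.read_text(encoding="utf-8"))
        if now() - cached["saved_at"] < MAX_AGE_SECONDS:
            return cached["data"]  # fresh enough: skip the slow scrape
    data = scrape()
    record = {"saved_at": now(), "data": data}
    path.write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
    return data
```

The trade-off is the usual one for caches: a longer expiry makes the program start faster but may show rankings that are slightly out of date.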
4.2. Experimental Results
4.3. Analysis of Results
- Question 1. Do you think K-12 students can implement these web scraping programs easily?
“After taking this class, I personally conducted a web-scraping-based programming class for primary school students in grades 1–3. At first, some students found web scraping quite difficult. I think it will be easy for primary school students in grades 4–6 to learn web scraping.”
“The web scraping blocks seem easier because they are simpler and more intuitive than the Entry spreadsheet blocks.”
“Because data are scraped in a block programming language rather than a difficult language such as Python, I think students will be able to program with interest.”
“The advantage of the web scraping blocks is that students are free to scrape data by using them. However, the difficulty of programming will vary greatly depending on which websites the students scrape. If the blocks are used in a classroom setting, a teacher’s guidance is required.”
“It seems that students should have basic programming skills, and rather than just getting them to start programming right away, it seems necessary to explain to them what the web scraping process is and how they can use it.”
- Question 2. Do you think K-12 students will be interested in these web scraping programs?
“I think that this educational material has the advantage of easily collecting data and visualizing it in an interesting way.”
“The web scraping blocks have the huge advantage of allowing students to scrape their own fields of interest. I think it’s great that students can get the latest data they want.”
“Web scraping blocks can generate explosive interest from students and have the advantage of automating data collection.”
- Question 3. Do you think these web scraping programs are practical programs?
“I think it’s good that educational materials using web scraping blocks are closely related to our real life.”
“The web-scraping-based programs are more practical than the Entry-spreadsheet-based programs because they are more relevant to our real life.”
“The Entry spreadsheet blocks are suitable for learning the basic principles of data, and web scraping blocks have the advantage of being able to utilize real-life data and real-time data.”
- Question 4. Do you think it will be helpful for K-12 students to improve their programming skills by implementing these web scraping programs?
- Question 5. Do you think it will be helpful for K-12 students to improve their data literacy skills by implementing these web scraping programs?
“It is essential to collect data for training artificial intelligence models. Therefore, I believe that web scraping is an area of data literacy that is essential for the future society in which students will live.”
4.4. Limitations of the Study
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Evaluation Measure | Entry Spreadsheet Blocks | Our Web Scraping Blocks | p-Value |
---|---|---|---|
Ease of Use | 3.47 (0.99) | 3.73 (0.96) | 0.24821 |
Degree of Interest | 4.07 (0.96) | 4.27 (0.88) | 0.17971 |
Practicality | 4.33 (1.05) | 4.87 (0.35) | 0.03389 ** |
Programming Skills | 4.87 (0.35) | 4.93 (0.27) | 1.00000 |
Data Literacy | 4.73 (0.59) | 4.86 (0.53) | 0.31731 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Park, Y.; Shin, Y. Novel Scratch Programming Blocks for Web Scraping. Electronics 2022, 11, 2584. https://doi.org/10.3390/electronics11162584