Selected Papers from the Ninth Swedish Workshop on Data Science (SweDS21)

A special issue of Data (ISSN 2306-5729).

Deadline for manuscript submissions: closed (15 July 2022) | Viewed by 6257

Special Issue Editors


Dr. Kostiantyn Kucher
Guest Editor
Department of Computer Science and Media Technology, Linnaeus University, Växjö, Sweden
Interests: text visualization and visual text analytics; sentiment and stance visualization of text data

Dr. Rafael M. Martins
Guest Editor
Department of Computer Science and Media Technology, Linnaeus University, Växjö, Sweden
Interests: unsupervised learning, especially dimensionality reduction techniques, in the interactive visual analysis of complex and high-dimensional data; interpretable machine learning; text and topic analysis; learning analytics; increasing the level of interactivity in the analysis process, improving the interpretability of complex learning algorithms, and effectively incorporating these analysis methods into high-impact domain-specific workflows

Special Issue Information

Dear Colleagues,

This Special Issue aims to present a collection of extended versions of high-quality papers from the Ninth Swedish Workshop on Data Science (SweDS21, https://lnu.se/en/SweDS21), which was hosted by Linnaeus University in Växjö, Sweden, on 2–3 December 2021.

SweDS is a national event focused on maintaining and developing Swedish data science research and its applications by fostering the exchange of ideas and promoting collaboration within and across disciplines. This annual workshop brings together researchers and practitioners of data science working in a variety of academic, commercial, industrial, and other sectors.

Past workshops have included presentations from a variety of domains, e.g., computer science, linguistics, economics, archaeology, environmental science, education, journalism, medicine, healthcare, biology, sociology, psychology, history, physics, chemistry, geography, forestry, design, and music.

We invite academic and industrial researchers and practitioners to share their work by submitting papers, giving talks, and/or presenting posters.

Dr. Kostiantyn Kucher
Dr. Rafael M. Martins
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Data is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (3 papers)


Research

19 pages, 543 KiB  
Article
Stance Classification of Social Media Texts for Under-Resourced Scenarios in Social Sciences
by Victoria Yantseva and Kostiantyn Kucher
Data 2022, 7(11), 159; https://doi.org/10.3390/data7110159 - 13 Nov 2022
Cited by 1 | Viewed by 2270
Abstract
In this work, we explore the performance of supervised stance classification methods for social media texts in under-resourced languages and with limited amounts of labeled data. In particular, we focus on the possibilities and limitations of classic machine learning versus deep learning in the social sciences. To achieve this goal, we use a training dataset of 5.7K messages posted on Flashback Forum, a Swedish discussion platform, supplemented with the previously published ABSAbank-Imm annotated dataset, and evaluate the performance of various model parameters and configurations to achieve the best training results given the character of the data. Our experiments indicate that classic machine learning models achieve results that are on par with, or even outperform, those of neural networks and could thus be given priority when considering machine learning approaches for similar knowledge domains, tasks, and data. At the same time, modern pre-trained language models provide useful and convenient pipelines for obtaining vectorized data representations that can be combined with classic machine learning algorithms. We discuss the implications of their use in such scenarios and outline directions for further research.
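The "classic machine learning" baseline favored by the authors can be illustrated with a minimal sketch: a bag-of-words multinomial Naive Bayes stance classifier implemented from scratch. The data, labels, and class name below are invented for illustration and are not from the paper's Flashback Forum or ABSAbank-Imm datasets.

```python
from collections import Counter
import math

def tokenize(text):
    return text.lower().split()

class NaiveBayesStance:
    """Multinomial Naive Bayes with Laplace smoothing over word counts."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {lbl: Counter() for lbl in self.label_counts}
        for text, lbl in zip(texts, labels):
            self.word_counts[lbl].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total = sum(self.label_counts.values())
        best, best_lp = None, -math.inf
        for lbl, n in self.label_counts.items():
            lp = math.log(n / total)  # class prior
            denom = sum(self.word_counts[lbl].values()) + len(self.vocab)
            for tok in tokens:
                # add-one (Laplace) smoothed word likelihood
                lp += math.log((self.word_counts[lbl][tok] + 1) / denom)
            if lp > best_lp:
                best, best_lp = lbl, lp
        return best

# toy stance data (hypothetical, English for readability)
texts = ["immigration benefits the economy", "close the borders now",
         "we should welcome refugees", "stop all immigration"]
labels = ["favor", "against", "favor", "against"]
clf = NaiveBayesStance().fit(texts, labels)
print(clf.predict("welcome the refugees"))  # "favor" on this toy data
```

In practice, as the abstract notes, the bag-of-words features here could be swapped for sentence embeddings from a pre-trained language model while keeping the classic classifier on top.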

38 pages, 2500 KiB  
Article
Are Source Code Metrics “Good Enough” in Predicting Security Vulnerabilities?
by Sundarakrishnan Ganesh, Francis Palma and Tobias Olsson
Data 2022, 7(9), 127; https://doi.org/10.3390/data7090127 - 7 Sep 2022
Cited by 3 | Viewed by 1730
Abstract
Modern systems produce and handle large volumes of sensitive enterprise data. Security vulnerabilities in software systems must therefore be identified and resolved early to prevent security breaches and failures. Predicting security vulnerabilities is an alternative to identifying them as developers write code. In this study, we evaluated the ability of several machine learning algorithms to predict security vulnerabilities. We created two datasets containing security vulnerability information from two open-source systems: (1) Apache Tomcat (4.x versions) and (2) Apache Struts (five 2.5.x minor versions). We also computed source code metrics for these versions of both systems. We examined four classifiers (Naive Bayes, Decision Tree, XGBoost, and Logistic Regression) to show their ability to predict security vulnerabilities. Moreover, an ensemble learner was introduced using a stacking classifier to see whether prediction performance could be improved. We performed cross-version and cross-project predictions to assess the effectiveness of the best-performing model. Our results showed that the XGBoost classifier performed best, with an average accuracy of 97% on both datasets. The stacking classifier achieved an average accuracy of 92% in Struts and 71% in Tomcat. Our best-performing model, XGBoost, could predict with an average accuracy of 87% in Tomcat and 99% in Struts in a cross-version setup.
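The stacking idea used in the paper (base learners whose predictions feed a meta-learner) can be sketched in a few lines of plain Python. The metric values, thresholds, and learners below are invented toy stand-ins, not the paper's actual features or models: two decision stumps over "source code metrics" act as base classifiers, and a perceptron serves as the meta-learner.

```python
def stump_fit(xs, ys, feat):
    """Pick the threshold on one metric that minimizes training error."""
    candidates = sorted({x[feat] for x in xs})
    best_thr, best_err = candidates[0], len(ys) + 1
    for thr in candidates:
        err = sum((1 if x[feat] >= thr else 0) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_thr, best_err = thr, err
    return lambda x, t=best_thr: 1 if x[feat] >= t else 0

def perceptron_fit(preds, ys, epochs=20, lr=0.5):
    """Train a meta-learner on the base learners' predictions."""
    w = [0.0] * (len(preds[0]) + 1)  # weights + trailing bias term
    for _ in range(epochs):
        for p, t in zip(preds, ys):
            out = 1 if sum(wi * pi for wi, pi in zip(w, p)) + w[-1] > 0 else 0
            if out != t:
                for i, pi in enumerate(p):
                    w[i] += lr * (t - out) * pi
                w[-1] += lr * (t - out)
    return lambda p: 1 if sum(wi * pi for wi, pi in zip(w, p)) + w[-1] > 0 else 0

# hypothetical per-file metrics [lines of code, churn]; 1 = vulnerable
X = [[900, 40], [1200, 55], [800, 60], [100, 5], [200, 10], [150, 3]]
y = [1, 1, 1, 0, 0, 0]

base_a = stump_fit(X, y, 0)            # stump over lines of code
base_b = stump_fit(X, y, 1)            # stump over churn
meta_inputs = [(base_a(x), base_b(x)) for x in X]
meta = perceptron_fit(meta_inputs, y)  # meta-learner on stacked predictions

def predict(x):
    return meta((base_a(x), base_b(x)))

print(predict([1000, 50]))  # 1 (flagged vulnerable) on this toy data
```

A production setup, as in the paper, would instead stack stronger learners such as XGBoost and Logistic Regression, typically with cross-validated base predictions to avoid leakage.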

18 pages, 8780 KiB  
Article
SBGTool v2.0: An Empirical Study on a Similarity-Based Grouping Tool for Students’ Learning Outcomes
by Zeynab (Artemis) Mohseni, Rafael M. Martins and Italo Masiello
Data 2022, 7(7), 98; https://doi.org/10.3390/data7070098 - 18 Jul 2022
Cited by 4 | Viewed by 1603
Abstract
Visual learning analytics (VLA) tools and technologies enable the meaningful exchange of information between educational data and teachers. This allows teachers to create meaningful groups of students based on possible collaboration and productive discussions. VLA tools also allow a better understanding of students' educational demands. Finding similar samples in huge educational datasets, however, requires effective similarity measures that represent the teacher's purpose. In this study, we conducted a user study and improved our web-based similarity-based grouping VLA tool (SBGTool) to help teachers categorize students into groups based on their similar learning outcomes and activities. SBGTool v2.0 differs from SBGTool in its design changes made in response to teacher suggestions, the addition of sorting options to the dashboard table, the addition of a dropdown component to group the students into classrooms, and improvements in some visualizations. To accommodate color blindness, we also considered a number of color palettes. By applying SBGTool v2.0, teachers may compare the outcomes of individual students within a classroom; determine which subjects are the most and least difficult over the period of a week or an academic year; identify the numbers of correct and incorrect responses for the most difficult and easiest subjects; categorize students into various groups based on their learning outcomes; discover the week with the most interactions for examining students' engagement; and find the relationship between students' activity and study success. We used 10,000 random samples from the EdNet dataset, a large-scale hierarchical educational dataset consisting of student–system interactions from multiple platforms at the university level, collected over a two-year period, to illustrate the tool's efficacy. Finally, we provide the outcomes of the user study that evaluated the tool's effectiveness. The results revealed that, even with limited training, the participants were able to complete the required analysis tasks. Additionally, the participants' feedback showed that SBGTool v2.0 gained a good level of support for the given tasks and has the potential to assist teachers in enhancing collaborative learning in their classrooms.
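The general idea of similarity-based grouping can be sketched with a tiny k-means clustering over per-subject outcome vectors. This is a generic illustration, not SBGTool's actual similarity measure or grouping algorithm; the student names, scores, and initial centroids below are invented.

```python
import math

def dist(a, b):
    # Euclidean distance between two outcome vectors
    return math.dist(a, b)

def kmeans(points, centroids, iters=10):
    """Group points around the given initial centroids."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            # assign each student to the nearest centroid
            i = min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            groups[i].append(p)
        # move each centroid to the mean of its group (keep it if empty)
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return groups

# hypothetical per-student scores: [math correct, reading correct]
scores = {"ana": [9, 2], "ben": [8, 1], "cem": [2, 9], "dea": [1, 8]}
groups = kmeans(list(scores.values()), centroids=[[9, 2], [1, 8]])
# groups[0] collects the math-strong students, groups[1] the reading-strong ones
```

Depending on the teacher's purpose, the distance function could be swapped (e.g., cosine similarity over activity counts) without changing the grouping loop.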
