Peer-Review Record

Related Stocks Selection with Data Collaboration Using Text Mining†

Information 2019, 10(3), 102; https://doi.org/10.3390/info10030102

by Masanori Hirano¹,*, Hiroki Sakaji², Shoko Kimura³, Kiyoshi Izumi², Hiroyasu Matsushima², Shintaro Nagao³ and Atsuo Kato⁴

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 23 January 2019 / Revised: 17 February 2019 / Accepted: 4 March 2019 / Published: 7 March 2019

(This article belongs to the Special Issue MoDAT: Designing the Market of Data)

Round 1

Reviewer 1 Report

The article describes a method for suggesting stocks for specific themes. The application is, in a sense, novel and interesting to the research community. Cohen's kappa is used to show the difficulty of the task, and the F1 and accuracy scores show good improvements when benchmarked against human performance. The method is detailed and sound. I recommend accepting this article after minor revisions.

1) line 40 answer data -> ground truth data; line 40-43 -> use number instead of a, b, c...

2) line 56-64, line 133-152, line 244-294, table 7,8,9 can be moved to appendices to make the article more compact/concise

3) line 294: typo topnn -> top n

4) Please add a brief explanation of why MeCab is used for morphological analysis rather than KyTea or JUMAN++.

5) Update the DOIs for references 3-9 and 21-24.

Author Response

Thank you for your review and valuable comments. In the following, we respond to your comments point by point.

Point 1: line 40 answer data -> ground truth data; line 40-43 -> use number instead of a, b, c... 

Response 1: Thank you for pointing it out. We fixed this.

Point 2: line 56-64, line 133-152, line 244-294, table 7,8,9 can be moved to appendices to make the article more compact/concise

Response 2: We appreciate your suggestion. After careful consideration, we would prefer not to move these materials to appendices, for the following reasons. First, we agree that lines 56-64 could be moved. However, we think the others cannot. The data explanation (lines 133-152) is one of the most important parts for discussing data collaboration. Although this explanation may seem too detailed, we think it is important that readers get a correct impression of these data. The hyperparameter tuning part (lines 244-294 and Tables 7, 8, and 9) is also very important because it is the biggest contribution of this article beyond our previous work. So, in our opinion, only lines 56-64 can be moved, but that passage is too short to justify creating appendices. In fact, creating appendices just for this part might make our article more confusing. We fully understand your suggestion; however, we would like to keep the original flow without appendices.

Point 3: line 294: typo topnn -> top n

Response 3: We fixed it. Thanks again.

Point 4: Please add a brief explanation of why MeCab is used for morphological analysis rather than KyTea or JUMAN++.

Response 4: We added a brief explanation in line 166-167.

--- Beginning of the revised part ---

For our preprocessing, we use MeCab (version 0.996) as a Japanese morphological analyzer [6]⁵ because MeCab is faster than the other morphological analyzers and the speed is crucial for processing a large amount of data.

--- End of the revised part ---

Point 5: Update the DOIs for references 3-9 and 21-24.

Response 5: We added DOIs for references 3 and 22-24, but we could not find DOIs for the others despite a careful search.

Author Response File: Author Response.pdf

Reviewer 2 Report

Summary

The paper describes a method for creating a portfolio of stocks from the stock market, given a theme of interest to investors.

The method applies Natural Language Processing techniques to companies’ textual information, crawled from their official websites and from Investor Relations materials.

To construct a portfolio of stocks, the authors propose a query mechanism: the scheme is queried about a theme of interest and returns a list of companies together with contexts describing how the theme relates to them, helping fund managers select appropriate stocks.

The study also analyzes (i) which source of textual information is best for such a method, Investor Relations materials (IRs) or text from companies’ official websites; (ii) how difficult it is for fund managers to select related stocks, using Cohen’s κ coefficient of agreement; and (iii) multiple combinations of the hyperparameters used in constructing word2vec models and computing word similarities, in order to achieve the best selection results.
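As a point of reference for the agreement measure mentioned in this summary, the following is a minimal sketch of Cohen's kappa for two raters, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the toy labels (1 = "related to the theme", 0 = "not related") are illustrative assumptions and are not taken from the article.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is conventionally 1.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments by two fund managers on eight stocks (toy data).
manager_1 = [1, 1, 0, 0, 1, 0, 1, 0]
manager_2 = [1, 0, 0, 0, 1, 1, 1, 0]
kappa = cohens_kappa(manager_1, manager_2)
print(kappa)
```

In this toy case the raters agree on six of eight items while chance agreement is 0.5, so a raw agreement of 75% corresponds to a considerably lower kappa, which is why kappa is a stricter indicator of task difficulty than raw agreement.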

Comments and Suggestions

It is not specified what the previous scheme failed to account for, and hence what motivates the current expanded scheme.

A very brief explanation of why an ensemble of nine word2vec models is used would help readers understand the work, since the reference paper is written in Japanese.

The intuitions that drove the design are not well defined; the focus is mainly on describing how a result is achieved, which does not help future development.

Spell check: Text label in Figure 2 refers to s2,word1 twice.

Author Response

Thank you for your review and valuable comments. In the following, we respond to your comments point by point.

Point 1: It is not specified what the previous scheme failed to account for, and hence what motivates the current expanded scheme.

Response 1: As you pointed out, our article lacked an explanation of the motivation for, and the extensions in, the current expanded scheme. We therefore added explanations of how it differs from the previous scheme, and why, in lines 83-84 and 125-126. Although the differences may seem small, they are significant for the article as a whole, because introducing them leads to the hyperparameter tuning in our experiments and to the improvement in our results.

--- Beginning of the revised part1 (line 83-84) ---

In our previous scheme [1], we used top-100 (topn mode) without supporting evidence. However, since we thought other parameters or modes were possible, we extended this part of the scheme.

--- End of the revised part1 ---

--- Beginning of the revised part2 (line 125-126) ---

In our previous scheme [1], we used top-10 (topn mode) without supporting evidence. However, since we thought other parameters or modes were possible, we also extended this part of the scheme.

--- End of the revised part2 ---

Point 2: A very brief explanation of why an ensemble of nine word2vec models is used would help readers understand the work, since the reference paper is written in Japanese.

Response 2: We apologize for the inconvenience. We added an explanation of the previous work in lines 79-81.

--- Beginning of revised part ---

This ensembling is based on a previous study [3], which used multiple word2vec models with different settings to decide whether two words are similar by a majority vote over the models' results.

--- End of revised part ---
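To illustrate the majority-vote idea described in the revised passage, here is a minimal sketch. It does not reproduce the authors' scheme or the method of reference [3]: each "model" is a hypothetical dictionary of toy word vectors standing in for a trained word2vec model, and the per-model decision rule (cosine similarity at or above an assumed threshold of 0.5) is an assumption made only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def is_similar_by_majority(models, word_a, word_b, threshold=0.5):
    """Return True if more than half of the models judge the words similar.

    Assumption: a model votes "similar" when the cosine similarity of its
    two word vectors is at least `threshold`. A model that lacks either
    word casts no "similar" vote.
    """
    votes = 0
    for vectors in models:
        if word_a in vectors and word_b in vectors:
            if cosine(vectors[word_a], vectors[word_b]) >= threshold:
                votes += 1
    return votes > len(models) / 2


# Three toy "models" (hypothetical vectors, not trained embeddings).
toy_models = [
    {"robot": [1.0, 0.0], "automation": [0.9, 0.1], "bread": [0.0, 1.0]},
    {"robot": [0.8, 0.2], "automation": [0.7, 0.3], "bread": [0.1, 0.9]},
    {"robot": [1.0, 0.0], "automation": [0.0, 1.0], "bread": [0.0, 1.0]},
]

print(is_similar_by_majority(toy_models, "robot", "automation"))
print(is_similar_by_majority(toy_models, "robot", "bread"))
```

The design point is that a single model with an unlucky setting (the third toy model here) cannot flip the decision on its own, which is the usual motivation for voting over models trained with different settings.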

Point 3: The intuitions that drove the design are not well defined; the focus is mainly on describing how a result is achieved, which does not help future development.

Response 3: We assume this was because our motivation was not clear enough, which also made it difficult to understand why we designed this research as we did. We therefore added further explanation of our motivation to the introduction.

--- Beginning of the revised part ---

Selecting stocks related to the fund’s theme is quite difficult for fund managers because there is a huge number of stocks. Even on the Tokyo Stock Exchange alone, there are over 3,600 stocks. For themed mutual funds focusing only on Japanese stocks, fund managers need to search only Japanese stocks and their information (company information) to build funds. However, focusing on stocks from around the world is practically impossible. Even when focusing only on Japanese stocks, selecting all related stocks is difficult for fund managers who are not familiar with a fund’s theme. In addition, there is a good chance of missing related stocks because of human error or fund managers’ lack of knowledge about companies. So, to reduce the burden on fund managers and avoid missing promising stocks, a method for selecting related stocks automatically is needed.

As such a method, we propose a scheme extended from our previous scheme [1]. We developed the previous scheme to handle this task mainly using natural language processing (NLP). Details of the previous scheme and the extended scheme are given in Section 2. The main contributions of this article are as follows: (a) extending our previous scheme, (b) creating ground truth data through collaboration with experienced fund managers and evaluating our scheme, (c) assessing the task difficulty from preliminary experiments, (d) hyperparameter tuning for our scheme, and (e) deeper analysis of data collaboration.

--- End of the revised part ---

Point 4: Spell check: Text label in Figure 2 refers to s2,word1 twice.

Response 4: We fixed it. The latter is s2,word2. Thank you for pointing it out.

Author Response File: Author Response.pdf
