1. Introduction
With the development of the Internet of Things (IoT) [1], a large amount of information is generated online and in IoT networks in the form of short text, such as short descriptions, social media posts, news descriptions, product reviews, and instant messages. Unlike long documents, a piece of short text contains only a few sentences or even just a few words. For example, Twitter limits the length of a tweet to 280 characters. Sparsity and shortness are the two intrinsic characteristics of such short text. Lacking sufficient word co-occurrences and shared context, it is difficult to extract representative and informative features from short text. Therefore, document representation and word embedding methods, which rely heavily on word frequency or shared context, may not capture sufficient information from short text in IoT networks and may not perform well in downstream tasks such as short text classification.
Semantic enhancement of short text representations is a common way to address the aforementioned problems. To implement semantic enhancement, external knowledge bases such as DBpedia and the Microsoft Concept Graph are usually adopted as a complement to the short text. There are several reasons why external knowledge bases are chosen. First, mining entity relationships from a knowledge base can enrich the short text semantic representation. As demonstrated in Figure 1, in the knowledge graph, Cristiano Ronaldo and Lionel Messi have a lot in common: both of them won the Ballon d’Or, the UEFA Champions League and La Liga, and they share the same career as football players. These common entities in the knowledge graph are highly correlated with the same category Sport. With the extra entity relationships from the knowledge graph, the short text representation can be enhanced. Second, entity-level representation helps to disambiguate terms that have the same spelling. For example, the sentences “WHO has named the disease COVID-19, short for Corona Virus Disease 19.” and “Corona is the best beer I have ever drunk.” share the same term Corona. According to common sense, the first refers to the Coronavirus, and the named entity is Corona_Virus; the second stands for the famous beer brand Corona, and its entity is Corona_beer. Hence, at the entity level, we can obtain a more precise representation instead of the same word embedding at the word level. Third, concept-level representation is more abstract than both the word-level and entity-level representations, and can therefore further enhance the short text semantics. A concept can be regarded as a set or class of entities or “things” within a domain [2]; it is a higher-level description of a “thing”, and such higher-level descriptions strengthen the semantic representation. For instance, given a piece of news “Dunga will attend the award ceremony”, it would be difficult to identify which category this news belongs to from the keywords Dunga and ceremony, as the meaning of the keyword Dunga is unclear here. If the news title changes to “Brazilian football star will attend the award ceremony”, it is easy to point out that this is sport news. Dunga was the captain of the Brazilian football team that won the 1994 FIFA World Cup, and “Brazilian football star” is the concept of the term Dunga. This example shows that it becomes easier to determine the category of a short text by involving word-related concepts. Accordingly, we believe that the concept-level representation is a significant supplement to keyword-based and entity-based short text representations.
Owing to the convenience of integrating extra knowledge into neural networks, deep learning-based short text representation has become the common approach to short text classification. Among the many neural network types, Kim [3] first introduced the CNN to text classification; the CNN is good at extracting local features through its convolution layers. To capture the informative parts of a text, Vaswani et al. [4] proposed an attention network for NLP. The improvement obtained by combining knowledge bases with downstream deep short text classification has been verified in recent research [5,6,7]. Although such methods yield more accurate short text representations, they are limited in the way they combine external knowledge bases, that is, they do not make full use of them: they consider only one aspect (either the entity or the concept information) from knowledge bases to enrich the short text representation.
In this paper, we incorporate multiple cognitive aspects [8,9,10] of short text, including concepts, knowledge and categories, into short text representation, and propose a multi-level Entity-based Concept Knowledge-Aware (ECKA) representation model for enhancing short text semantic representations. We first extract the named entities from the short text, and then retrieve the corresponding concepts and knowledge graph entities through the Microsoft Concept Graph and DBpedia, respectively. The short text representation learned by ECKA is highly informative since it combines four levels of representations: the word, entity, concept and knowledge levels. Specifically, the word-level representation refers to the pretrained word embeddings; the entity-level representation represents the embeddings of the identified named entities; the knowledge-level representation, which is learned and transformed from a knowledge graph, captures external knowledge correlations; and the concept-level representation refers to a higher-level descriptive embedding. Second, we apply CNNs to extract the local features on each level. Lastly, since different items (i.e., words, entities, concepts and knowledge) in a short text contribute differently to the downstream classification, the category of a short text may be determined by category-related words. For example, in the aforementioned sentence “Brazilian football star will attend the award ceremony”, football is the category-related word for ‘Sport’. Similarly, the category of a short text may be determined by category-related features. Therefore, we further apply an attention network to learn the category-sensitive weights of the items in each of the four levels of representation.
The main contributions of this paper are summarized as follows:
We propose a novel multi-level model to learn the short text representation from different aspects. To capture more semantic information, we use a named entity-based approach to obtain external knowledge information (entities, concepts, and the knowledge graph), which is used to enrich the short text semantic representation.
To capture the category-related informative representation of the multi-level features, we build a joint model that uses a CNN-based attention network to learn an attentive representation for each level; the embeddings learned from the different levels are then concatenated to form the short text representation.
We conduct extensive experiments on three datasets for short text classification. The results show that our model outperforms the state-of-the-art methods.
The rest of this paper is organized as follows: Section 2 provides a brief review of related work; Section 3 presents the details of the proposed method; Section 4 presents the experiments and analysis; lastly, Section 5 concludes the paper and outlines future work.
2. Related Work
Short text classification is an important NLP task. Many traditional methods, such as bag-of-words (BoW), SVM and KNN, have been explored for this task. In recent years, deep neural networks have been increasingly employed in short text analysis. For example, Kim [3] first introduced the Convolutional Neural Network (CNN) to text classification; the CNN is used to extract local and position-invariant features. The Recurrent Neural Network (RNN) is another approach to text processing; unlike the CNN, the RNN is good at modeling long-range semantic dependencies rather than local key phrases. Yang et al. [11] proposed an attention model to account for the fact that different words in a document carry different amounts of information.
The deep models aforementioned are flexible to some extent in short text classification. However, due to the shortness and sparsity of short text, it is difficult for them to capture enough semantic information from the limited words in the text. From this perspective, how to enrich the short text semantics with extra knowledge or common sense borrowed from other sources has become a hot topic in this area. Concepts are one aspect that is extensively used for text semantic enhancement. The Microsoft Concept Graph is a large graph of concepts that researchers have utilized for semantic enhancement. Wang et al. [2] proposed a ‘Bag-of-Concepts’ (instead of words) approach for short text representation, which constructs a concept model for each category and then conceptualizes the short text into a set of relevant concepts. Wang et al. [7] proposed a deep convolutional neural network model that utilizes concepts, words and characters for short text classification. To measure the importance of each concept in the concept set, Chen et al. [6] proposed knowledge-powered attention networks for text classification, which apply two attention mechanisms to measure the importance of each concept from two aspects: concept-towards-short-text attention and concept-towards-concept-set attention.
In addition, the knowledge graph is another effective way to enhance text semantic representations. A typical knowledge graph describes structured and unstructured information with the Resource Description Framework (RDF), and the information is stored in the form of entity-relation-entity triples. There are many knowledge graphs, such as DBpedia [12], Wikidata [13], Freebase [14] and YAGO [15], which are widely employed in recent research on semantic enhancement for short text. Wang et al. [16] devised a multi-channel CNN that fuses word-level and knowledge-graph-level representations for news text representation. Gao et al. [17] proposed a word- and knowledge-level self-attention mechanism for text semantic enhancement.
For further semantic enhancement, entities are usually utilized together with the knowledge base. Flisar et al. [5] proposed an entity-based text classification method that utilizes entities and their related attributes for short text enhancement. Türker [18] proposed a knowledge-based short text categorization approach, which utilizes an external knowledge base (Wikipedia) together with entities.
3. The ECKA Method
The framework of the proposed ECKA representation model is illustrated in Figure 2, and Figure 3 further shows its semantic information retrieval module. We introduce the architecture of ECKA from the bottom up. The model consists of three modules: the semantic information retrieval module, the feature extraction module, and the attention module. The semantic information retrieval module, illustrated in Figure 3, retrieves the entities, concepts, and knowledge graph information from external knowledge bases. The feature extraction module and the attention module are illustrated in Figure 2. The feature extraction module, implemented with CNNs, extracts local and position-invariant features from the multiple sources, while the attention module captures category-related informative representations from the multi-level features. Taking a short text as input, the model first extracts all the entities mentioned in the short text by using DBpedia Spotlight, and then retrieves the relevant concepts and knowledge graph entities through the Microsoft Concept Graph and DBpedia, respectively. TransE is employed to obtain the knowledge graph embeddings. We then utilize CNNs with attention networks to capture the category-related informative representation of each level of features. Finally, these multi-level semantic representations are concatenated and fed into a fully-connected layer to obtain the category probability distribution. We describe the details as follows.
3.1. Semantic Information Retrieval Module
The goal of this module is to retrieve the relevant entities, concepts and knowledge graph information for a short text. Firstly, we extract the entities from the short text. Entity annotation and linking are the foundation of our model. Recently proposed annotation and linking tools, such as DBpedia Spotlight, TagMe and Wikify!, can satisfy this need. In this work, we choose DBpedia as our knowledge base and DBpedia Spotlight as our annotation tool. With DBpedia Spotlight, we can link the named entities extracted from the input short text to DBpedia resources [19]. Secondly, we obtain the relevant concepts for the extracted entities. ConceptNet [20] and the Microsoft Concept Graph [21,22,23,24,25,26,27] are two widely used toolkits for obtaining the concepts of an object. We choose the Microsoft Concept Graph, which has 5.3 million concepts learned from billions of web pages and search logs, for the conceptualization. Finally, the knowledge graph for the relevant entities is obtained through DBpedia. A typical knowledge graph is a collection of relationship triples (h, r, t), in which h represents the head entity, r the relation and t the tail entity. The structural knowledge graph information needs to be transformed into embeddings. There are many translation-based methods that can learn low-dimensional vector representations from a knowledge graph; a comparison of some widely used methods, such as TransE, TransD, TransH and TransR, can be found in Reference [28]. In our model, we choose TransE as the knowledge graph embedding method.
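To make the entity linking step concrete, the following is a minimal sketch of annotating a short text with DBpedia Spotlight, assuming the public REST endpoint and its JSON response fields; the helper name and the confidence threshold are illustrative rather than taken from the paper.

```python
import requests

SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"  # public demo endpoint

def annotate_entities(text: str, confidence: float = 0.5):
    """Link named entities in a short text to DBpedia resources via DBpedia Spotlight."""
    response = requests.get(
        SPOTLIGHT_URL,
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=10,
    )
    response.raise_for_status()
    # "Resources" is absent when no entity is found above the confidence threshold.
    resources = response.json().get("Resources", [])
    return [(r["@surfaceForm"], r["@URI"]) for r in resources]

# Example: should link "Corona" to the beer entity rather than the virus.
print(annotate_entities("Corona is the best beer I have ever drunk."))
```

The returned DBpedia URIs can then be used to look up concepts in the Microsoft Concept Graph and to collect the (h, r, t) triples on which TransE is trained.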
3.2. Feature Extraction Module
This module utilizes the words, entities, concepts and knowledge graph to generate multi-level semantic short text representations. There are three components in this module: the input layer, the embedding layer, and the representation layer. The input layer describes how the different sources are obtained from the external knowledge bases. The embedding layer shows how the inputs are embedded and how the different embeddings are translated into the same vector space. The representation layer shows how higher-level features are extracted from the embedding layer. The details of each layer are as follows.
3.2.1. The Input Layer
The input for each short text in our model consists of four sets obtained from different sources, defined as follows:
The word set: $W = \{w_1, w_2, \ldots, w_m\}$ contains all the words in the short text.
The entity set: $E = \{e_1, e_2, \ldots, e_n\}$ contains the entities extracted from the short text through DBpedia Spotlight.
The concept set: $C = \{c_1, c_2, \ldots, c_n\}$ contains the concept of each entity, retrieved from the Microsoft Concept Graph.
The knowledge set: $KE = \{ke_1, ke_2, \ldots, ke_n\}$ contains the same entities as the entity set, but their representations are learned from the knowledge graph rather than from text.
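As an illustration only (the container and field names are ours, not the paper's), the four input sets of one short text can be kept together in a simple structure such as the following.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ShortTextInput:
    """Four-level input for one short text (names are illustrative, not from the paper)."""
    words: List[str]               # W: all words in the short text
    entities: List[str]            # E: DBpedia entities found by DBpedia Spotlight
    concepts: List[str]            # C: one concept per entity, from the Microsoft Concept Graph
    knowledge_entities: List[str]  # KE: same entities as E, later embedded via TransE

sample = ShortTextInput(
    words=["Dunga", "will", "attend", "the", "award", "ceremony"],
    entities=["Dunga"],
    concepts=["brazilian football star"],
    knowledge_entities=["Dunga"],
)
```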
3.2.2. The Embedding Layer
Each short text thus consists of a word-level, an entity-level, a concept-level and a knowledge-level set; the semantic information retrieval process is demonstrated in Figure 3. We use the pretrained Google word2vec embeddings for the first three sets, which can be represented as $V^w = [v^w_1, v^w_2, \ldots, v^w_m]$, $V^e = [v^e_1, v^e_2, \ldots, v^e_n]$ and $V^c = [v^c_1, v^c_2, \ldots, v^c_n]$, where $n$ is the number of entities in the short text. The knowledge entity embeddings are learned in the following steps. First, the related triples of each knowledge entity are retrieved from DBpedia; then the knowledge graph embedding method TransE is applied to learn the knowledge graph embeddings. Finally, since the word, entity and concept embeddings learned by word2vec have 300 dimensions while the knowledge graph embeddings learned by TransE have 50 dimensions, the knowledge graph embeddings need to be transformed into the same vector space. In our model, we use a nonlinear function to transform each knowledge entity embedding $v^{kg}_i$:
$$v^{ke}_i = f(M v^{kg}_i + b),$$
where $M$ is a trainable transformation matrix, $b$ is a trainable bias, and $f$ is a nonlinear activation function. By using this function, the knowledge entity embeddings are mapped into the word2vec embedding vector space.
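A minimal PyTorch sketch of this transformation is given below; it assumes tanh as the nonlinear function $f$ (the specific choice is not fixed above) and uses the 50-dimensional TransE and 300-dimensional word2vec sizes described in the text.

```python
import torch
import torch.nn as nn

class KnowledgeSpaceTransform(nn.Module):
    """Map TransE knowledge-graph embeddings (50-d) into the word2vec space (300-d)
    with a trainable matrix M and bias b, followed by a nonlinearity (tanh assumed)."""
    def __init__(self, kg_dim: int = 50, word_dim: int = 300):
        super().__init__()
        self.linear = nn.Linear(kg_dim, word_dim)  # holds M and b

    def forward(self, kg_embeddings: torch.Tensor) -> torch.Tensor:
        # kg_embeddings: (num_entities, kg_dim) -> (num_entities, word_dim)
        return torch.tanh(self.linear(kg_embeddings))
```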
3.2.3. The Representation Layer
The CNN is a typical model for extracting local features from an embedding matrix, and we apply it to generate the feature maps. For the entity embedding matrix $V^e$, a convolution operation with a filter $W_f \in \mathbb{R}^{h \times d}$, where $d$ is the dimension of the embedding and $h$ is the filter window size, is applied on the embedding matrix to generate a new feature $c_i$:
$$c_i = f(W_f \cdot V^e_{i:i+h-1} + b_f),$$
where $V^e_{i:i+h-1}$ denotes the concatenation of the embeddings from the $i$-th entity to the $(i+h-1)$-th entity, $f$ is a nonlinear activation function, and $b_f$ is the bias. The filter is applied to all possible windows to generate a feature map:
$$c^e = [c_1, c_2, \ldots, c_{n-h+1}],$$
where $n$ is the number of entities and $h$ is the window size. Similarly, the feature maps for the word, concept and knowledge entity sets can be represented as $c^w$, $c^c$ and $c^{ke}$, respectively.
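The following PyTorch sketch illustrates the convolutional feature extraction for one level; the filter window sizes, number of filters and ReLU activation are illustrative choices rather than the paper's settings.

```python
import torch
import torch.nn as nn

class LevelCNN(nn.Module):
    """1-D convolutions over one level's embedding matrix (words, entities, concepts or knowledge)."""
    def __init__(self, emb_dim: int = 300, num_filters: int = 100, window_sizes=(2, 3, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, kernel_size=h) for h in window_sizes]
        )

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, emb_dim); Conv1d expects channels first
        x = x.transpose(1, 2)
        # One feature map per window size h, each of shape (batch, num_filters, seq_len - h + 1)
        return [torch.relu(conv(x)) for conv in self.convs]
```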
3.3. The Attention Module
Not all items (words, entities, concepts and knowledge) contribute equally to the representation of a short text: the category of a short text may be determined by a few category-related words, and likewise the classification result may be determined by a few category-related features. Hence, we apply an attention network to the feature maps generated in the representation layer to obtain an attentive short text representation for each level. Each feature $c_i$ generated by the convolution layer is fed into a one-layer MLP to obtain $u_i$, which can be treated as a hidden representation of $c_i$:
$$u_i = \tanh(W_a c_i + b_a),$$
where $W_a$ is a weight matrix and $b_a$ is the bias. The attention weight $\alpha_i$ is then calculated through a softmax function:
$$\alpha_i = \frac{\exp(u_i^{\top} u_s)}{\sum_j \exp(u_j^{\top} u_s)},$$
where $u_s$ is a weight vector. The attentive entity representation is then calculated as the weighted sum of the features:
$$s^e = \sum_i \alpha_i c_i.$$
As there are multiple filter window sizes, there are multiple feature maps. A max-pooling function is applied over each feature map to obtain the final pooled vector:
$$\hat{c} = \max(c_1, c_2, \ldots, c_{n-h+1}),$$
where $n$ is the number of items and $h$ is the convolution window size. In this way, the representations of the words, entities, concepts and knowledge are obtained as $s^w$, $s^e$, $s^c$ and $s^{ke}$, respectively. We concatenate these different-level representations to obtain the final short text representation $R$:
$$R = s^w \oplus s^e \oplus s^c \oplus s^{ke},$$
where $\oplus$ denotes vector concatenation. Finally, the short text representation $R$ is fed into a fully-connected softmax layer to obtain the category probability distribution.
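The sketch below illustrates the attention and fusion steps in PyTorch: additive attention with a context vector over one feature map, followed by concatenation of the per-level vectors and a softmax classifier. Layer sizes, the tanh/softmax form and the class count are assumptions, and the max-pooling over multiple window sizes is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LevelAttention(nn.Module):
    """Additive attention over one feature map of shape (batch, positions, filters)."""
    def __init__(self, num_filters: int = 100, att_dim: int = 100):
        super().__init__()
        self.proj = nn.Linear(num_filters, att_dim)         # W_a and b_a
        self.context = nn.Parameter(torch.randn(att_dim))   # weight vector u_s

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        u = torch.tanh(self.proj(feature_map))                 # hidden representations u_i
        alpha = F.softmax(u.matmul(self.context), dim=-1)      # attention weights alpha_i
        return (alpha.unsqueeze(-1) * feature_map).sum(dim=1)  # weighted sum over positions

class ECKAHead(nn.Module):
    """Concatenate the word/entity/concept/knowledge vectors and classify."""
    def __init__(self, num_filters: int = 100, num_levels: int = 4, num_classes: int = 8):
        super().__init__()
        self.fc = nn.Linear(num_filters * num_levels, num_classes)

    def forward(self, level_vectors):
        r = torch.cat(level_vectors, dim=-1)    # final short text representation R
        return F.softmax(self.fc(r), dim=-1)    # category probability distribution
```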
5. Conclusions
IoT networks involve an increasing amount of short text, which cannot be handled well by conventional document representation methods and classic NLP tools. This work incorporates multiple cognitive aspects of text, from entities to concepts and knowledge, and proposes a novel multi-level entity-based concept knowledge-aware model, ECKA, to enhance short text semantic representation. ECKA learns the semantic information of short text at four different levels: the word, entity, concept and knowledge levels. CNNs are used to extract semantic features at each level, and attention networks are employed on each level to capture the category-related attentive representations from these multi-level features. Experiments on short text classification demonstrate the effectiveness and merits of ECKA compared with traditional and state-of-the-art baseline methods.
The improvement made by ECKA is attributed to its entity identification and knowledge extraction. To further improve ECKA, we will focus on increasing the accuracy of entity extraction and on employing knowledge-enabled language representation models (e.g., K-BERT) for short text representation. We will also extend ECKA to the data and tasks of IoT-specific systems.