1. Introduction
Steganography conceals a secret message in cover media. A classic scenario illustrating linguistic steganography is the “Prisoner’s problem”. Alice and Bob are two inmates in a prison, planning to escape. To facilitate their plan, they decide to exchange secret messages via short notes. However, every exchanged note is inspected by Eve. If Eve detects any concealed message, she retains the authority to terminate all further communication. In this scenario, Alice employs an embedding rule to conceal the secret message within the cover note, which is then sent to Bob under Eve’s surveillance. Upon receiving the note, Bob employs an extraction rule to retrieve the concealed message.
In the rapidly evolving landscape of the Internet of Things (IoT), the surge of connected devices brings new security challenges. Ensuring secure and confidential data transmission is crucial in this context. Traditional encryption methods may be unsuitable due to the limited computational and storage capacities of IoT devices. Thus, steganography emerges as a potential alternative for protecting information. With the IoT now deeply embedded in everyday life, securing user privacy, data integrity, and infrastructure is essential [1,2]. Despite advances in the IoT, security and privacy concerns remain major hurdles. Innovative solutions like steganography are vital for building robust security frameworks and supporting the IoT’s sustainable growth.
Over time, various cover media beyond text have been employed in steganography, including images [3,4], audio [5], 3D mesh models [6], and videos [7]. However, with the development of deep neural networks, linguistic steganography has attracted much attention again [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. Linguistic steganography can be primarily categorized into two types based on whether the steganographic text maintains the semantics of the cover text, namely generation-based linguistic steganography (GLS) [8,9,10,11,12,13,14,15,16,17,18,19] and modification-based linguistic steganography (MLS) [20,21,22,23,24,25,26,27].
GLS primarily embeds secret bits during the generation of high-quality steganographic text, leveraging neural network-based language models such as RNNs [8], LSTMs [9,10,11], auto-encoders [12], and GANs [13]. This flexibility in word selection for each position based on the secret data is especially beneficial in the IoT context, where devices generate large volumes of text that can serve as cover messages. However, this mechanism results in significant feature differences between the cover text and the steganographic text, which is generated through semantic predictions from the language model. To address these discrepancies, recent advancements have focused on enhancing both the security and the quality of steganographic text. Zhou et al. [14] utilized a GAN to overcome issues such as exposure bias and embedding deviation, which can compromise the security of steganography. Their scheme dynamically adjusts probability distributions to maintain diversity and embeds information during the model’s training process to enhance security. Yan et al. [15] proposed a secure token selection principle to further improve security and resolve ambiguity, ensuring that the sum of the selected token probabilities correlates with statistical imperceptibility. Similarly, Zhang et al. [16] proposed an adaptive dynamic grouping scheme that embeds secret information by recursively grouping tokens based on their probabilities from a language model, addressing the statistical differences between the probability distributions of steganographic text and natural text. To maintain the semantic expression of the generated steganographic text, some schemes [17,18,19] incorporate semantic constraints. Yang et al. [17] utilized context as the constraint, aiming to preserve a strong semantic correlation between the steganographic text and the cover text. Wang et al. [18] encoded secret data and refined the generated steganographic text through multiple rounds, improving the text’s quality and reducing the negative effects of steganographic encoding. Wang et al. [19] enhanced the controllability of steganography generation by analyzing the discourse features of the cover, which serve as inputs to the steganography generator. However, since these GLS schemes operate at the word level, they can still easily cause significant distortion of the local semantics. Furthermore, state-of-the-art steganalysis models use deep neural networks to extract multidimensional statistical features, enhancing their ability to detect steganographic text. These models integrate temporal features derived from spatial features [28] or continuous text sequences [29], improving their effectiveness in detecting steganographic text. This significantly reduces the practical embedding capacity of GLS.
MLS primarily embeds secret bits by modifying part of the cover text at the word [20,21,22], phrase [23,24], and sentence levels [25,26,27]. Word- or phrase-level MLS schemes generally utilize synonym substitution to embed the secret. For instance, Chang and Clark [20] employed the Google n-gram corpus to verify the contextual applicability of synonyms, ensuring more accurate and contextually appropriate substitutions. Xiang et al. [21] introduced a method that combines arithmetic coding with synonym substitution, analyzing synonyms based on their relative frequencies and quantizing them into a binary sequence. This sequence is then compressed using adaptive binary arithmetic coding to create space for additional data. The compressed data, along with the secret data, are embedded into the text using synonym substitutions. Dai and Cai [22] proposed a steganographic technique using a patient Huffman algorithm, which generates text by combining ciphertext-driven token selection with language model-based sampling.
At the phrase level, Wilson and Ker [23] introduced distortion measures specific to linguistic steganography, which help determine the optimal embedding strategy, balancing text quality with embedding capacity. Qiang et al. [24] emphasized the importance of preserving meaning by using paraphrase modeling to generate suitable substitute candidates.
At the sentence level, MLS schemes typically convert sentences into alternative forms that maintain the original meaning, using techniques such as syntactic analysis [25] and sentence translation [26,27]. For instance, Xiang et al. [25] developed a syntax-controlled paraphrase generation model to automatically modify the expression of the cover text, using a syntactic-bins coding strategy to embed secret information within the generated syntactic space. Yang et al. [26] pivoted the text between two languages and embedded secret data using a semantic-aware encoding strategy, which modifies the expression of the text while maintaining its original meaning, thereby allowing for a larger payload. Ding et al. [27] introduced a scheme that integrates semantic fusion and language model reference units into a neural machine translation model. This approach generates translations that embed secret messages while preserving the text’s semantics and context. Additionally, they implemented a new encoding scheme that combines arithmetic coding with a waiting mechanism, enhancing embedding capacity without compromising semantic consistency.
In the context of the IoT, where device-generated texts often follow specific structures or patterns, MLS can be used to subtly modify these texts to embed secret data. However, these MLS schemes suffer from a low embedding capacity.
In summary, GLS schemes offer impressive embedding capacity but may introduce semantic ambiguity and arouse suspicion under steganalysis. While MLS schemes successfully maintain the overall semantics, their limited embedding capacity poses a significant challenge. Moreover, previous linguistic steganography schemes typically involve encrypting the secret data, recording additional information for recovery, and managing the secret key. Specifically, in GLS, to ensure that the receiver generates the same text as the sender, the sender must transmit the latent space vectors used to initialize the model, ensuring consistent text output. Furthermore, both GLS and MLS require the encryption of secret data, necessitating the transmission of a secret key for decryption, which can be impractical in certain applications.
In order to solve the above problems, we propose a novel linguistic secret sharing scheme for IoT security. In our scheme, only the most ambiguous word in each sentence is substituted to embed secret data, thereby preserving the semantic integrity of each sentence. Moreover, the receiver can also easily identify this specific word during the extraction process. Additionally, we employ a secret sharing mechanism to encrypt secret data. Instead of relying on a secret key, secret sharing distributes the secret into multiple shares, ensuring that a single share alone cannot restore the original secret. Our contributions are summarized as follows:
We propose a token selection algorithm that enables both the sender and the receiver to identify the same most ambiguous word in each sentence.
Data embedding and extraction can be performed without the need to share any secret key.
The proposed scheme maintains the semantic coherence of the steganographic text.
Secret sharing over a Galois field is first introduced to linguistic steganography.
3. Proposed Linguistic Secret Sharing
Consider a scenario in which a company produces advanced equipment that is restricted for use in certain areas or by specific companies. This equipment has an associated secret code to activate it. The equipment is delivered by a logistics company, while the secret code is distributed among multiple participants with a secret sharing scheme to ensure authorized usage of the equipment.
As illustrated in Figure 2, during the share generation stage, the secret code is transformed into n distinct shares using a polynomial secret sharing technique. These shares are then concealed within the regular messages intended for the participants using an open-source pre-trained model. The steganographic messages are then transmitted to the IoT devices of the participants.
In the secret recovery stage, the system enables any k authorized participants (where k ≤ n) to collaboratively extract the activation code of the equipment. By applying the same token selection principle and data embedding rule, participants can extract the secret shares from the messages on their own IoT devices and combine them back into the activation code. This mechanism ensures authorized usage of the protected equipment by preventing unauthorized personnel, or an insufficient number of participants, from activating it.
3.1. Text Share Generation
Theoretically, the proposed scheme can employ any type of text as its carrier, and the carrier texts can be completely different in content. However, in our case, we choose to utilize texts generated by deep learning-based models. The uniqueness of generated texts prevents attackers from comparing them with existing content on social media or the Internet in order to decipher the secret. Additionally, generated texts offer greater control, allowing the contents to be tailored to the user’s requirements: users can generate texts suitable for linguistic secret sharing by providing appropriate prompts. For the text generation stage, GPT-4 [34] is applied as the text generator. It is worth mentioning that we do not specifically address grammar or syntax, so the generated text is used directly without modification.
3.2. Token Selection Algorithm and Data Embedding Rule
In the proposed scheme, an ambiguous token is first selected and masked in each sentence. The masked sentence is fed into a token predictor, which outputs prediction results for the masked token. Finally, the masked token is replaced with one of the predicted candidates according to the data embedding rule.
To select the target token, assume a sentence S consists of m tokens, denoted as S = {w_1, w_2, …, w_m}. Each token w_i can be masked individually and predicted using masked language modeling. For a token belonging to the vocabulary pool V, the initial candidate pool for prediction is denoted as C_i = {c_1, c_2, …, c_T} with corresponding probabilities P_i = {p_1, p_2, …, p_T}, where p_1 ≥ p_2 ≥ … ≥ p_T. To quantify the ambiguity of a token’s prediction, we define the probability difference indicator as
d_i = Σ_{j=1}^{T−1} (p_j − p_{j+1}) = p_1 − p_T,
where p_j represents the j-th greatest prediction probability. A lower d_i value indicates that the top T prediction probabilities for this token are close together, making it harder for the model to confidently predict the token. Altering this token does not significantly affect the overall semantics of the sentence. Algorithm 1 for ambiguous token selection is given as follows:
Algorithm 1: Ambiguous Token Selection
Input: A sentence S = {w_1, w_2, …, w_m}.
Load: Token predictor.
Output: Ambiguous token w*.
1: for i = 1 to m do
2:   mask w_i and compute d_i.
3: end for
4: return w* = w_{i*}, where i* = argmin_i d_i.
This algorithm ensures that the token with the highest prediction ambiguity (i.e., the smallest d_i) is selected for data embedding. Since this token is the most difficult for the model to predict, replacing it with one of the top candidates introduces minimal semantic distortion, thus allowing for the embedding of log2 T bits of secret data.
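The selection step above can be sketched in a few lines of Python. The probability tables below are hypothetical stand-ins for the top-T outputs a masked language model such as RoBERTa would return for each masked position; only the indicator computation and the argmin selection are shown.

```python
def probability_difference(probs):
    """Indicator d_i for one masked position: the gap between the
    largest and the T-th largest prediction probability. A smaller
    value means the top-T candidates are close together, i.e. the
    token is ambiguous and safe to replace."""
    ranked = sorted(probs, reverse=True)
    return ranked[0] - ranked[-1]

def select_ambiguous_token(tokens, topk_probs):
    """Algorithm 1: return the index of the most ambiguous token.
    topk_probs[i] holds the top-T probabilities obtained by masking
    tokens[i] and querying the predictor (assumed precomputed here)."""
    indicators = [probability_difference(p) for p in topk_probs]
    return min(range(len(tokens)), key=lambda i: indicators[i])

# Hypothetical top-4 probabilities for the sentence "I am very happy ."
tokens = ["I", "am", "very", "happy", "."]
probs = [
    [0.90, 0.05, 0.03, 0.02],  # "I"     -> confidently predicted
    [0.85, 0.08, 0.04, 0.03],  # "am"
    [0.30, 0.28, 0.22, 0.20],  # "very"  -> nearly uniform: ambiguous
    [0.60, 0.20, 0.12, 0.08],  # "happy"
    [0.95, 0.03, 0.01, 0.01],  # "."
]
print(tokens[select_ambiguous_token(tokens, probs)])  # -> very
```

With these illustrative numbers, "very" obtains the smallest d_i (0.30 − 0.20 = 0.10) and is selected, mirroring the example of Figure 3.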
After selecting the ambiguous token w*, its top T prediction candidates can be mapped to T different binary codes. Therefore, log2 T bits of data can be embedded into the sentence by replacing w* with one of its top T prediction candidates. To illustrate the token selection algorithm and data embedding rule, an example is given in Figure 3. Suppose the cover text is “I am very happy.” and T is set to four; the predictor provides the top four prediction results with the highest probabilities for each masked token. By masking and predicting each token individually, we obtain the prediction results along with their respective probabilities. After calculating the probability difference for each token’s prediction results, we select the token with the lowest value. In this case, “very” is selected as the ambiguous token because it has the lowest probability difference indicator among all tokens. The top four prediction candidates for this token are then mapped, in descending order of probability, to the secret data (i.e., “very”: 00; “so”: 01; “really”: 10; and “extremely”: 11). Therefore, log2 4 = 2 bits of data can be embedded by replacing “very” with one of the top four candidates.
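The mapping between candidates and bit codes, and the resulting embed/extract operations, can be sketched as follows. The candidate list is the Figure 3 example; in a real deployment it would come from the token predictor, sorted by descending probability.

```python
import math

def candidate_codes(candidates):
    """Map the top-T candidates (sorted by descending probability)
    to fixed-length binary codes of log2(T) bits each."""
    bits = int(math.log2(len(candidates)))
    return {format(i, f"0{bits}b"): w for i, w in enumerate(candidates)}

def embed(candidates, secret_bits):
    """Sender side: replace the ambiguous token with the candidate
    whose code equals the next log2(T) secret bits."""
    return candidate_codes(candidates)[secret_bits]

def extract(candidates, observed_token):
    """Receiver side: the observed candidate's rank in the pool
    reveals the hidden bits."""
    bits = int(math.log2(len(candidates)))
    return format(candidates.index(observed_token), f"0{bits}b")

# T = 4 candidates for the masked token "very" (Figure 3 example)
cands = ["very", "so", "really", "extremely"]
stego = embed(cands, "10")
print(stego)                  # -> really
print(extract(cands, stego))  # -> 10
```

Because both sides run the same token selection algorithm and obtain the same ranked candidate pool, no side information beyond the text itself is needed to invert the mapping.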
Note that when applying the token selection and data embedding rule to a text, as shown in Figure 4, each modified sentence serves as the preceding context for the current sentence. This ensures that the embedding process maintains coherence and consistency throughout the entire text.
3.3. Secret Share Generation
Referring to Figure 2 again, the dealer adopts polynomial secret sharing over a Galois field GF(2^t) to distribute the secret data into n secret shares and embeds the shares into n distinct generated texts correspondingly using the token selection algorithm and the data embedding rule. The procedures are given as follows:
Step 1: Convert the secret data into a sequence of binary segments, each of which is t bits in length.
Step 2: Generate n secret shares using Equation (1), replacing the constant term and the coefficients of the polynomial with the binary segments.
Step 3: Generate n texts using a text generator with proper prompts.
Step 4: Embed the n secret shares into the n texts correspondingly, using the token selection algorithm and the data embedding rule, to generate the text shares.
3.4. Secret Data Recovery
By collecting any k out of n text shares, a combiner can restore the secret data. The procedures are summarized as follows:
Step 1: Collect any k text shares.
Step 2: Split each text share into sentences and identify the marked token in each sentence using the token selection algorithm.
Step 3: Retrieve and collect the embedded secret bits from marked tokens according to the data embedding rule.
Step 4: Combine k secret shares to recover the original secret data using Equations (2) and (3).
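The share generation and recovery steps above can be sketched together as a (k, n)-threshold Shamir scheme over a Galois field. The sketch below fixes t = 8 (byte-sized segments) and uses the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) purely as an illustrative choice; the paper specifies only that the sharing operates over GF(2^t), and here the random coefficients stand in for whatever segments Equation (1) assigns.

```python
import secrets

def gf_mul(a, b):
    """Multiplication in GF(2^8), reduction polynomial 0x11B (assumed)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    return gf_pow(a, 254)  # a^(2^8 - 2) = a^(-1) for nonzero a

def make_shares(secret_byte, k, n):
    """Share generation: evaluate f(x) = s + a_1 x + ... + a_{k-1} x^{k-1}
    at x = 1..n. Addition in GF(2^t) is XOR."""
    coeffs = [secret_byte] + [secrets.randbelow(256) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for j, c in enumerate(coeffs):
            y ^= gf_mul(c, gf_pow(x, j))
        shares.append((x, y))
    return shares

def recover(shares):
    """Recovery: Lagrange interpolation at x = 0 over GF(2^8).
    Note (0 - x_j) = x_j and (x_i - x_j) = x_i XOR x_j in characteristic 2."""
    s = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = gf_mul(num, xj)
                den = gf_mul(den, xi ^ xj)
        s ^= gf_mul(yi, gf_mul(num, gf_inv(den)))
    return s

# (2, 3)-threshold sharing of the secret byte "11010011" from Section 4.2
shares = make_shares(0b11010011, k=2, n=3)
print(recover(shares[:2]) == 0b11010011)  # any 2 of 3 shares suffice -> True
```

Each recovered byte corresponds to one t-bit segment of Step 1; concatenating the recovered segments restores the original secret bitstream, and any k − 1 shares reveal nothing about it.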
4. Experimental Results
In this section, we introduce our experimental settings, give a demonstrative example, and evaluate the performance of our scheme in terms of sentiment and semantic analyses.
4.1. Experimental Setting
Model: Our experiments make use of GPT-4, a large language model developed by OpenAI, to generate the cover texts. GPT-4 is a state-of-the-art model with a remarkable capacity for comprehending and producing both natural language and code. Furthermore, we employ RoBERTa as our token predictor in this study.
Implementation: Our experiments are implemented in Python 3.8 with PyTorch 1.7.1 as the foundational framework, accelerated by an Nvidia RTX 3090 GPU with CUDA 11.2. The secret messages are binary pseudo-random bitstreams.
4.2. Applicability Demonstration
An example of (2, 3)-linguistic secret sharing is provided. Suppose the original secret data are an 8-bit binary random bitstream “11010011”; after secret sharing, three shared bitstreams, “00100100”, “11100101”, and “01001010”, are generated. The three cover texts and their corresponding text shares are shown in Figure 5a and Figure 5b, respectively. It is noteworthy that each sentence within a text share can embed 4 bits of shared data when the candidate pool size T of each selected token is set to 16. During the secret recovery process, the ambiguous token within each sentence can be easily identified through the proposed token selection algorithm. Subsequently, the shared data can be extracted according to the rank of the selected tokens in the candidate pool. Finally, any two out of the three shares can be combined to recover the original secret bitstream.
4.3. Performance Analysis
We evaluate the performance of the steganographic text using sentiment and semantic analyses. The sentiment of a text is determined by its emotional nature, where “positive” refers to an optimistic or favorable emotion and “negative” refers to a pessimistic or unfavorable emotion. Sentiment can be assessed using a pre-trained BERT-based sentiment classifier [33]. In our experiments, text share 1 (474 words) and text share 2 (248 words) are generated from texts classified with positive (P: 99.84%) and negative (N: 96.54%) emotions, respectively. Two strategies are used to segment the text into sentences: strategy 1 segments the text with periods, while strategy 2 segments the text with punctuation marks, including commas and periods. Thus, strategy 1 yields fewer sentences than strategy 2. Data embedding is executed by replacing only one word within each sentence. As shown in Table 1, the steganographic texts generated with different T values successfully preserve the sentiment classification results (CRs) of the cover text.
To validate the preservation of semantic fidelity between the cover text and the modified text, we employ BERTScore [35] to analyze the texts. BERTScore captures the deep semantic representations of both the cover text and the modified text. We then use cosine similarity to compare these representations, ensuring that the intrinsic meanings remain consistent between the two texts. As shown in Table 2, the modified text effectively maintains the semantic essence of the cover text.
In summary, the modified text generated by the proposed scheme successfully retains the information of the cover text, as only one ambiguous word is replaced in each sentence. Although the text produced using segmentation strategy 2 exhibits slightly diminished quality compared to that generated through strategy 1, this trade-off yields a higher embedding capacity. However, this slight degradation in text fidelity also increases the risk of the text being detected by steganalysis. Therefore, for subsequent experiments, we select strategy 1 with T = 8. While the embedding capacity is relatively smaller, the proposed scheme ensures the highest text fidelity, preserving the integrity and meaning of the original cover text.
4.4. Comparison
We further compared the proposed scheme with two state-of-the-art schemes [25,26] using a testing set of 1000 sentences. We set T to eight and employed strategy 1 to execute (2, 3)-threshold linguistic secret sharing. As shown in Table 3, the average embedding capacities (ECs) in bits per word (bpw) of Xiang et al.’s [25] scheme and Yang et al.’s [26] scheme are higher than that of our scheme, which achieves 0.333 bpw. However, the proposed scheme exhibits better resistance to steganalysis, with detection accuracies of 0.514 and 0.504 against the LSCNN [28] and TSRNN [29] models, respectively, which are lower than those of the compared schemes. Additionally, our scheme achieves a BERTScore of 0.948, indicating better preservation of the semantic information of the cover text.
Although the EC could be enhanced by selecting a larger T or by applying strategy 2, this would inevitably introduce a trade-off between security and text fidelity, potentially distorting the naturalness of the generated text and making it more susceptible to detection by steganalysis. Moreover, our scheme tolerates data loss or user failure and requires no additional information or secret key management (metadata), making it more practical than the compared schemes.
4.5. Theoretical Analysis for IoT Implementation
Due to limited access to large-scale real-world IoT environments, our study currently focuses on theoretical analysis and simulations. The proposed scheme leverages pre-trained models such as GPT-4 and RoBERTa, which can be deployed without additional training, substantially lowering the implementation threshold. However, the computational complexity of these models presents significant challenges in IoT contexts. The self-attention complexity of RoBERTa is O(L^2 · d), where L is the input sequence length and d is the model dimension. Assuming an input length of 128 tokens and approximately 300 million parameters, a forward pass requires on the order of 10^11 FLOPs (floating-point operations). In comparison, GPT-4, with around 175 billion parameters, requires on the order of 10^13 FLOPs to generate a sentence of length 128. This scale of computation is suitable for high-performance cloud computing environments but may impact the real-time responsiveness of IoT devices. In practical IoT networks, participants could pre-agree on specific model versions for deployment in local or cloud-based environments. For IoT devices with constrained computational capabilities, we recommend a cloud deployment strategy to mitigate these computational demands and ensure efficient operation.
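The order-of-magnitude figures above follow from the common rule of thumb of roughly 2 FLOPs per parameter per processed token for a dense forward pass; the sketch below is a back-of-envelope estimate under that assumption, not a measured cost.

```python
def forward_flops(params, seq_len):
    """Rough dense-transformer estimate: ~2 FLOPs per parameter
    per token processed in one forward pass (rule-of-thumb assumption)."""
    return 2 * params * seq_len

# RoBERTa-scale predictor: ~300M parameters, 128-token input
roberta_flops = forward_flops(300e6, 128)   # 7.68e10, order 10^11
# GPT-4 as characterized in the text: ~175B parameters, 128 tokens
gpt4_flops = forward_flops(175e9, 128)      # 4.48e13, order 10^13
print(f"RoBERTa ~{roberta_flops:.2e} FLOPs, GPT-4 ~{gpt4_flops:.2e} FLOPs")
```

Autoregressive generation without caching would cost more than this single-pass figure, so these numbers should be read as lower bounds that already exceed what a constrained IoT endpoint can sustain in real time, supporting the cloud-deployment recommendation.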
4.6. Limitations
Although the proposed scheme improves the security of secret data through a secret sharing mechanism, which is unconditionally secure [36], several limitations still need to be addressed. First, while the secret data can be recovered without the need for additional data, the scheme lacks an authentication mechanism to detect cheating by individual participants. Additionally, secret sharing requires more storage space because the secret data must be distributed across multiple shares. Moreover, the proposed ambiguous token selection algorithm constrains the embedding capacity of each sentence, potentially limiting the overall data throughput. Finally, the computational overhead introduced by neural network operations could become a challenge for resource-constrained IoT devices. Balancing the trade-off between enhanced security and computational efficiency remains a critical area for future research and optimization. Our future work will focus on addressing these issues.