*Article* **Velody 2—Resilient High-Capacity MIDI Steganography for Organ and Harpsichord Music**

**Eric Järpe <sup>1,</sup>\* and Mattias Weckstén <sup>2</sup>**


\* Correspondence: eric.jarpe@hh.se; Tel.: +46-729-773-626

**Abstract:** A new method for musical steganography in the MIDI format is presented. The MIDI standard is a user-friendly music technology protocol that is frequently deployed by composers of different levels of ambition. To the authors' knowledge, there is no fully implemented and rigorously specified, publicly available method for MIDI steganography. The goal of this study is therefore to investigate how a novel MIDI steganography algorithm can be implemented by manipulation of the velocity attribute, subject to restrictions of capacity and security. Many of today's MIDI steganography methods, which are less rigorously described in the literature, fail to be resilient to steganalysis. Many current methods leave traces as side effects that could catch the eye of a scrutinizing steganalyst, such as artefacts in the MIDI code which would not occur by the mere generation of MIDI music: MIDI file size inflation, radical changes in mean absolute error or peak signal-to-noise ratio of certain kinds of MIDI events, or even audible effects in the stego MIDI file. Steganalysis resilience is an imperative property of a steganography method. By restricting the carrier MIDI files to classical organ and harpsichord pieces, the problem of velocities following the mood of the music can be avoided. The proposed method, called Velody 2, is found to be on par with or better than the cutting-edge alternative methods regarding capacity and inflation, while still possessing better resilience against steganalysis. An audibility test was conducted to check that there are no signs of audible traces in the stego MIDI files.

**Keywords:** MIDI; velocity values; carrier file; stego file; capacity; steganalysis resilience; audibility; file-size change-rate; mean absolute error; peak signal-to-noise ratio

## **1. Introduction**

Steganography provides means for hiding information, not just making it unintelligible by encrypting it. Concealing the very existence of a message can be the difference between life and death in cases when the sending of a message (encrypted or not) is itself considered a crime and a threat to the authorities. The techniques of steganography have sometimes been criticized for serving criminals seeking to operate outside the law, but their use by whistleblowers and freedom fighters (e.g., [1]) in countries with authoritarian regimes is well documented.

The technique of steganography does not by itself change a message, but merely hides its existence in other information. This is what distinguishes steganography from cryptography. Nevertheless, steganography is very often used in combination with cryptography, by first encrypting a message and then hiding it. This combination gives very strong protection against revealing a secret message, since someone looking for hidden messages may find it impossible to perform cryptanalysis on all the possibly hidden data found in any of a great number of files. Thus, in effect, adding an encryption step can greatly improve the security of the message exchange: not only is the content of the communication secret, but the very fact that any communication took place is unknown. This is an additional property that can be crucial in some circumstances where the occurrence of encrypted messages can draw attention from the authorities. Since the percentage of users of steganography is unknown to a much larger extent than is the case for, e.g., cryptography, it is harder to motivate its relevance [2]. This is the reason why there are few reports on how widely such methods are used. However, this does not mean that methods of steganography are not used.

**Citation:** Järpe, E.; Weckstén, M. Velody 2—Resilient High-Capacity MIDI Steganography for Organ and Harpsichord Music. *Appl. Sci.* **2021**, *11*, 39. https://dx.doi.org/10.3390/app11010039

Received: 26 November 2020 Accepted: 19 December 2020 Published: 23 December 2020

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Steganography may be deployed in many respects, but in modern times it usually means involving computer files. This study focuses on musical steganography through the MIDI format. The MIDI format is a standard music protocol used worldwide to create music and to facilitate its accessibility.

#### *1.1. Related Literature*

Ever since 2000, the MIDI format has been subject to methods of steganography. Worth noting are, e.g., the pioneering works of Inoue and Matsumoto [3] and Adli and Nakao [4]. In the former, which was preceded by several conference papers by the same authors, three requirements for steganography of MIDI files are established: (1) that MIDI music sounds the same after steganography as it did before, (2) that the stego MIDI file satisfies the requirements of the MIDI format, and (3) that extraction of the hidden message from the stego MIDI file is very difficult without the proper stego key. The authors go on to outline a method for encoding data in MIDI files using permutations of note events. In the latter, three methods of steganography are briefly specified. In 2009, Yamamoto and Iwakiri [5] published a short but dense paper presenting a cunning method that embeds the hidden message by LSB modifications of the durations of notes. This gives a relatively high capacity for hiding messages compared to the size of the carrier file. It is claimed that little performance quality is lost, which is demonstrated with a $\chi^2$-test. An ambitious study was made by Wu and Chen [6], and recently pursued by Liu and Wu [7], of a method which modifies the way delta-times (i.e., the time elapsed between MIDI events) may be represented in the MIDI format. This makes them high-capacity methods, and the method of the latter paper is also claimed to be performance preserving, which means that no distortion of any kind is added to the MIDI file upon steganography with that method. Nevertheless, it inflates the MIDI file substantially and is thus not property preserving. Another recent contribution to the field is Wu, Hsiang and Chen [8], where an extremely cautious variant of velocity modification is defined by preserving the common increasing or decreasing trends in velocity among sequences of notes with increasing and decreasing pitch.

There are also many methods that deal with steganography of files in the MP3 format, e.g., [9–12]. These methods may be interesting as a comparison from a property perspective. For instance, capacity and audibility may be considered for any music format regardless of which technique is used. Still, many comparisons are difficult because the formats differ.

Aspects of capacity, robustness and transparency are discussed in Lang et al. [13]. This is an extensive inventory of the techniques for steganography in general, and it even touches upon steganalysis. A more recent survey is given by Sumathi, Santanam and Umamaaheswari [14], which also considers various steganography methods, not just audio.

#### *1.2. Aspects of Steganalysis Resilience*

Picture a steganalyst trying to make out whether or not a particular MIDI file is a case of steganography. Some properties of the file would then be revealing, while others are entirely plausible even in a stego MIDI file. A few examples of this are:


music is mainly entered in one of two ways: either by automatically scanning notes which are translated to MIDI code by some software which would make only one value of velocity (i.e., all velocity values would be identical), or the music would be played on some MIDI keyboard and entered by some MIDI sequencer program thus resulting in many different velocity values.

C. A few methods leave audible traces (such as clicks or chirps) in the stego file. This is less common in steganography of MIDI files but does occur in music steganography of other formats, such as in Szczypiorski and Zydecki [12] and Adli and Nakao [4].

The method proposed by Liu and Wu [7] is based on the technique of altering the coding of delta-times and using this encoding for permutation-based data encoding. While the method has no performance effect, it does severely inflate the file size and introduce redundant data that does not contribute to the performance, which would most likely alert a steganalyst.

Wu, Hsiang and Chen [8] proposed a method based on the technique of adjusting the velocities of some of the note-on events. While the goal of this strategy is to make the adjustments in such a way that the performance effect is minimized, it still changes the velocity values and is therefore classified as having a performance effect, even if the effect cannot be registered by merely listening to the music.

In Vaske, Weckstén and Järpe [15], a method based on a simple technique of adjusting the velocities of note-on events up or down one bit was introduced. While such a minuscule change is impossible for a listener to register, it is considered to have a performance effect and will also leave a telltale pattern of suspect artefacts.

The method proposed by Wu and Chen [6] manipulates the coding of the delta-time events and encodes data directly into these events. According to the authors, this strategy inflates the file size and introduces a minuscule performance effect within a set tolerance.

In Yamamoto and Iwakiri [5] a method which manipulates the duration between events to encode data was proposed. The authors show experimental evidence of naturally occurring fluctuations which would allow the embedding to take place without being noted as suspicious. However, the suggested strategy does introduce a minuscule performance effect.

The LSB method suggested by Adli and Nakao [4] simply encodes the clear text message in the LSB of the velocity of the note-on event. This strategy introduces a minuscule performance effect. In addition, since there is no intermediate step of processing the data, there will be non-random patterns in the LSB of the velocity values which could be detected.

The repeated command method proposed by Adli and Nakao [4] encodes data using repeating commands configured in such a way that only the last command of a series will affect the output from the interpreted MIDI file. This will inflate the file size and show up as suspect artefacts.

In Adli and Nakao [4] a SysEx method that encodes data in non-standard commands that would normally not contribute to the interpreted MIDI file output was proposed. This strategy inflates the file and shows up as suspect artefacts. The authors also claim that although output is normally not affected, in some cases there is a notable performance effect. Since this strategy does not adhere to the MIDI file standard it would also violate the second rule of SMF steganography, requiring that "The stego SMF flawlessly satisfies the specification of the standard MIDI files" as described in [3].

The method proposed by Inoue, Suzuki and Matsumoto [3] encodes data by permuting the order of notes in simulnotes. This strategy neither inflates the file size nor has any performance effect, and the two suggested permutation strategies try to mimic two common standards of event arrangement in the simulnotes to avoid steganalysis.

The purpose of this paper is to develop a balanced picture of which aspects are more important in MIDI steganography and to put the suggested method for MIDI steganography, Velody 2, into its scientific context among alternative methods.

## **2. The Proposed Method**

The suggested method, Velody 2, consists of encoding encrypted data at high capacity into the velocities of the note-on events, while mimicking the humanization available in tools with MIDI support such as Ableton Live [16]. The source code of the proposed method is publicly available at http://github.com/wecksten/Velody-2.0 as referred to in Appendix A. The method is blind, has high capacity, and provides steganalysis resilience of the data embedding.

The effect of using this method is that it sets the velocities to values within a narrow interval to be specified, thus removing potential mood swings in the velocity parameter of the music. This effect is minimized by restricting the use of the method to organ and harpsichord music (which is naturally performed with close to constant velocities). Therefore, suspicion from steganalysts regarding the audible change of velocities is avoided. Nevertheless, the method can be used with little artificial effect on a wider range of music, as indicated by including the piano piece Für Elise by Ludwig van Beethoven in the set of MIDI songs in the experiment to test for audibility effects of the method. Of course, the method is not restricted to single-instrument music: restricting it to organ and harpsichord merely means requiring that the steganography is performed only on the velocities of these instruments, though they could be part of a larger ensemble. An example of music for an ensemble with multiple instruments is the cantata Gott der Herr ist Sonn und Schild by J.S. Bach, which was part of this study.

Regarding the property of performance preservation, using Velody 2 for steganography of organ and harpsichord music should not change performance to any extent that leads to suspicions from steganalysis. As for the property of reversibility, if the point with this is to be able to show a MIDI file which does not contain any hidden message once having extracted it, this can be achieved in other ways. Therefore, this property is regarded as less important compared to the properties of steganalysis resilience and capacity for instance.

To embed a plaintext message in a carrier MIDI file, the process can be split up into three steps: (1) data preparation, (2) data encryption, and (3) data encoding. The extraction of a plaintext message from a stego MIDI file is performed in a very similar manner by reversing the order of the steps: (4) data decoding, (5) data decryption, and (6) data unboxing.

#### *2.1. Preparation*

To be able to extract just the embedded message and nothing more, the message length needs to be known. This can be done in many ways, but assuming that most messages will be less than 256 bytes long, one approach is to add an eight-bit message header to indicate the message length. This approach still allows for longer messages, if required, by stacking several blocks one after another. Assuming most messages are less than two blocks long, this approach will have the same or less overhead than an approach where 16 bits are used to indicate the message length. To prepare the data for encryption, the clear text message $M$ of length $w = |M|$ bytes is divided into $n$ blocks $B_1, B_2, \ldots, B_n$, where $n = \lceil w/256 \rceil$. An eight-bit header $H_i$ is introduced for each block $B_i$, where $H_i = |B_i|$ is the block size in bytes. As can be seen in Figure 1, the prepared message $P$ is equal to the assembly $P = (S, H_1, B_1, H_2, B_2, \ldots, H_n, B_n)$.
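The preparation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `box_message` is hypothetical, the leading element $S$ is left out because this section does not define it, and blocks are capped at 255 bytes because that is the largest length a single header byte can state.

```python
BLOCK_SIZE = 255  # assumption: largest block length an eight-bit header can state


def box_message(message: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Split the message into blocks and prefix each with a one-byte length."""
    if not 1 <= block_size <= 255:
        raise ValueError("block_size must fit in an eight-bit header")
    prepared = bytearray()
    for start in range(0, len(message), block_size):
        block = message[start:start + block_size]
        prepared.append(len(block))  # header H_i = |B_i|
        prepared.extend(block)       # block B_i
    return bytes(prepared)


if __name__ == "__main__":
    print(box_message(b"hello"))  # one header byte followed by the five message bytes
```

A 300-byte message would, under these assumptions, become two blocks (255 and 45 bytes) and therefore 302 prepared bytes.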

#### *2.2. Encryption*

The prepared message *P* is encrypted using a standard synchronous stream cypher and a shared key generating the encrypted message *E* = *F*(*P*) which is very similar to random data in distribution.
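The section does not name the stream cypher that is used. The sketch below therefore uses a toy stand-in (a keystream derived with SHAKE-256 from the Python standard library) only to show the shape of a synchronous stream cypher: the same function $F$ both encrypts and decrypts. The names `keystream` and `stream_xor` are hypothetical, and a real deployment would use a vetted cypher with a nonce.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudorandom bytes from the shared key (toy stand-in)."""
    return hashlib.shake_256(key).digest(length)


def stream_xor(data: bytes, key: bytes) -> bytes:
    """XOR the data with the keystream; the same call encrypts and decrypts."""
    stream = keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, stream))


if __name__ == "__main__":
    prepared = b"\x05hello"
    encrypted = stream_xor(prepared, b"shared key")
    # Applying the same function again recovers the prepared message.
    assert stream_xor(encrypted, b"shared key") == prepared
```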

#### *2.3. Encoding*

The carrier MIDI file is unpacked into a stream of MIDI messages $S_1, S_2, \ldots, S_m$, where each message of the type "note-on" is evaluated for data embedding. If the velocity $v_i = \mathit{velocity}(S_i)$ of the "note-on" event $S_i$ is less than $2^{N_e}$, the velocity is replaced with a random number between a lower bound $l$ and an upper bound $u = 2^{\lfloor \log_2 v_i \rfloor + 1} - 1$. If the velocity of the "note-on" event $S_i$ is greater than or equal to $2^{N_e}$, the velocity is LSB encoded with the next $N_e$ bits from the encrypted message stream. LSB encoding is performed by clearing the $N_e$ least significant bits of the velocity value and then adding the $N_e$-bit value from the encrypted message stream.
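The encoding rule can be sketched on a plain list of note-on velocities, leaving MIDI file parsing aside. The sketch rests on several assumptions that are not fixed by the text: the upper bound is taken to be $2^{\lfloor \log_2 v_i \rfloor + 1} - 1$ (which keeps replaced velocities below $2^{N_e}$), bits are consumed most significant first, the last chunk is zero-padded, velocity 0 is left untouched because a note-on with velocity 0 acts as a note-off in MIDI, and velocities after the end of the message are left unchanged. The names `embed` and `bits_from_bytes` are hypothetical.

```python
import random


def bits_from_bytes(data: bytes):
    """Yield the bits of data, most significant bit of each byte first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def embed(velocities, encrypted: bytes, n_e: int = 3, lower: int = 1, rng=None):
    """Return new velocities carrying `encrypted` in their n_e low bits."""
    rng = rng or random.Random()
    threshold = 1 << n_e  # 2**N_e
    bits = list(bits_from_bytes(encrypted))
    pos = 0
    out = []
    for v in velocities:
        if v == 0:
            out.append(v)  # assumption: velocity 0 (note-off) is not modified
        elif v < threshold:
            upper = (1 << v.bit_length()) - 1  # 2**(floor(log2 v) + 1) - 1
            out.append(rng.randint(min(lower, upper), upper))
        elif pos < len(bits):
            chunk = bits[pos:pos + n_e]
            chunk += [0] * (n_e - len(chunk))  # zero-pad the final chunk
            pos += n_e
            value = int("".join(map(str, chunk)), 2)
            out.append((v & ~(threshold - 1)) | value)  # clear LSBs, add bits
        else:
            out.append(v)  # assumption: no change once the message is embedded
    if pos < len(bits):
        raise ValueError("carrier has too few usable note-on events")
    return out
```

With $N_e = 3$, for example, a velocity of 100 that receives the bits 101 becomes 101, since the three low bits of 100 are cleared (giving 96) and the value 5 is added.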

#### *2.4. Decoding*

The stego MIDI file is unpacked into a stream of MIDI messages $S_1, S_2, \ldots, S_m$, where each message of the type "note-on" is evaluated for data decoding. If the velocity of the "note-on" event $S_i$ is greater than or equal to $2^{N_e}$, the velocity is LSB decoded using the $N_e$ least significant bits. The LSB decoding is performed by pushing the $N_e$ least significant bits of the velocity to the decoded data stream.
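Decoding can be sketched in the same style, again on a plain list of note-on velocities. The bit order (most significant first) and the dropping of an incomplete trailing byte are assumptions, and the name `extract` is hypothetical.

```python
def extract(velocities, n_e: int = 3) -> bytes:
    """Collect the n_e low bits of every velocity >= 2**n_e into bytes."""
    threshold = 1 << n_e  # 2**N_e
    bits = []
    for v in velocities:
        if v >= threshold:  # lower velocities carry no data and are skipped
            for shift in range(n_e - 1, -1, -1):
                bits.append((v >> shift) & 1)
    usable = len(bits) - len(bits) % 8  # assumption: drop an incomplete byte
    decoded = bytearray()
    for start in range(0, usable, 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        decoded.append(byte)
    return bytes(decoded)


if __name__ == "__main__":
    # Velocities 101, 64 and 88 carry the bit groups 101, 000 and 000.
    print(extract([101, 5, 64, 0, 88], n_e=3).hex())
```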

#### *2.5. Decryption*

The decoded message *D* is decrypted using the same standard synchronous stream cypher and shared key, generating the prepared message *P* = *F*(*D*).

#### *2.6. Unboxing*

The prepared message $P = (P_1, P_2, \ldots, P_n)$ is unboxed by reconstructing the header value $H_i = P_j$ and then copying $H_i$ bytes of data from the prepared message $P$ to the clear text message $M$, by adding the extracted data to the end of the clear text message: $M = M + (P_{j+1}, P_{j+2}, \ldots, P_{j+H_i})$. This process is repeated until the prepared message $P$ is out of data or the header value read from the prepared message is equal to 0. The full clear text message is now available in $M$.
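The unboxing loop translates almost directly into code. The following is a minimal sketch, assuming the prepared message starts directly with the first header byte; the function name `unbox_message` is hypothetical.

```python
def unbox_message(prepared: bytes) -> bytes:
    """Rebuild the clear text from (header, block) pairs in `prepared`."""
    message = bytearray()
    j = 0
    while j < len(prepared):
        header = prepared[j]  # H_i = P_j
        if header == 0:
            break  # a zero header ends the message
        message.extend(prepared[j + 1:j + 1 + header])  # append H_i data bytes
        j += 1 + header  # move on to the next header
    return bytes(message)


if __name__ == "__main__":
    print(unbox_message(b"\x05hello"))
```

Bytes that follow a zero header are ignored, so trailing data decoded from the carrier beyond the end of the message does not end up in the clear text.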
