Steganography - The Art of Concealing Information

Daniel Butean
Software developer @ Siemens

Alex Ghiran
Software developer @ Siemens

Călin Manea
Java developer @ Siemens

PROGRAMMING

Steganography is a technique whereby information is hidden inside otherwise innocuous-seeming information to preserve its secrecy. Etymologically, the word combines the Greek steganos (concealed) and graphia (writing).

Its purpose is similar to that of cryptography, but it differs by trying to hide the very existence of the secret information being passed on. This difference has enabled the use of steganography where the mere use of cryptography would be incriminating, such as under a totalitarian regime that strictly monitors all communications.

Physical steganography has a long history that dates back to antiquity. The first recorded case was documented by Herodotus: a slave's head was shaved, a secret message was tattooed on his scalp, and once the hair had grown back he was sent as a messenger with the instruction to ask, on reaching his destination, that his head be shaved. Other steganography techniques that have been used include secret inks, Morse code messages, and messages written on envelopes under stamps.

Digital steganography is the practice of concealing information within another type of information, usually a multimedia file. Pictures are useful for this purpose: they are large and contain information about so many pixels that an alteration is very hard to notice unless you already know the details of the algorithm being used. Common algorithms include modifying the least significant bits of every pixel, which are later recombined to obtain the secret message, and modifying the color bytes of every n-th pixel of an image.

Digital Colors

In a digital system, colors are most frequently represented as additive combinations of red (R), green (G), and blue (B). Each of these primary colors is assigned a value from 0 to a maximum. This maximum is dictated by the size of the numbers used to represent the amount of the primary color. Thus, if 8 bits are used to represent the amount of a single primary color, we can represent 256 intensity levels of that primary. Since we have three primaries, 24 bits let us represent over 16 million colors (2^24 = 16,777,216).

Here we will represent RGB colors in a [R, G, B] format (e.g., "[120, 100, 10]"). In the 8-bit RGB color space, [0, 0, 0] is black, [255, 255, 255] is white, and [124, 216, 255] is a light blue. Grayscale can be represented similarly. Where RGB has a value for each of the three primary colors, a grayscale value only needs to represent an intensity, which determines how black or white the gray is. Grayscale colors will be referred to by a single number. In one-byte grayscale, black has a value of 0, white is 255, and a light gray is 200.
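The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names pack_rgb and unpack_rgb are hypothetical, and packing red into the highest byte is an assumed layout, not something a particular image format is claimed to use.

```python
BITS_PER_CHANNEL = 8

# Number of intensity levels per primary and total number of RGB colors.
levels = 2 ** BITS_PER_CHANNEL   # 256
colors = levels ** 3             # 16,777,216 (i.e. 2 ** 24)


def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack three 8-bit channels into one 24-bit integer (assumed layout: R, G, B)."""
    return (r << 16) | (g << 8) | b


def unpack_rgb(color: int) -> list[int]:
    """Split a 24-bit integer back into the [R, G, B] format used in the text."""
    return [(color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF]


light_blue = pack_rgb(124, 216, 255)
print(levels, colors)            # 256 16777216
print(hex(light_blue))           # 0x7cd8ff
print(unpack_rgb(light_blue))    # [124, 216, 255]
```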

A pixel is the fundamental unit of digital imagery. It is the smallest point whose color can be controlled. On a computer monitor, a pixel emits a color of light. When we record images, we size them in pixels. Each pixel is laid out in a grid and is given a specified color.

Each color channel R, G, and B in the RGB color space is represented by a number, and this number is represented by several bits. A bit-plane refers to all the bits at a single bit position across an image. Consider the number ten, whose 8-bit binary representation is "00001010". Starting from the right, we have a "0" in the zeroth bit-plane, a "1" in the first, a "0" in the second, and so on for all eight bits. In an image, a bit-plane is the set of 0 or 1 values at a given bit position for all pixels, laid out in the same grid as the image. In LSB steganography, the least significant bit-planes are manipulated.
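The bit-plane idea can be made concrete with a short Python sketch. The function names bit_at and bit_plane and the sample grayscale row are hypothetical and chosen only for illustration.

```python
def bit_at(value: int, plane: int) -> int:
    """Return the bit (0 or 1) of `value` at bit-plane `plane` (0 = least significant)."""
    return (value >> plane) & 1


def bit_plane(pixels: list[int], plane: int) -> list[int]:
    """Collect the bit at position `plane` from every 8-bit pixel value."""
    return [bit_at(p, plane) for p in pixels]


# The number ten from the text: 00001010, read from bit-plane 0 upward.
ten_bits = [bit_at(10, plane) for plane in range(8)]
print(ten_bits)           # [0, 1, 0, 1, 0, 0, 0, 0]

# A tiny made-up grayscale row and its least significant bit-plane.
row = [10, 255, 200, 0]
print(bit_plane(row, 0))  # [0, 1, 0, 0]
```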

LSB Algorithm

Least Significant Bit (LSB) embedding is a simple strategy to implement steganography. Like all steganographic methods, it embeds the data into the cover so that it cannot be detected by a casual observer. The technique works by replacing some of the information in a given pixel of the cover image with information from the data to hide.

While it is possible to embed data into an image on any bit-plane, LSB embedding is performed on the least significant bit(s). This minimizes the variation in colors that the embedding creates. For example, embedding into the least significant bit changes the color value by at most one, while embedding into the second bit-plane can change it by 2. If embedding is performed on the two least significant bits, each color value in the cover can end up as any of four values after embedding, a change of at most 3. Steganography aims to introduce as little variation as possible, to minimize the likelihood of detection. An LSB embedding always loses some information from the cover image, because it writes directly into the pixel: some of the cover's bits must be discarded and replaced with bits of the data to hide.
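To see how little a color value moves when its low bits are overwritten, consider the following Python sketch. The helpers replace_low_bits and max_change are hypothetical names introduced here; the sample value 182 is arbitrary.

```python
def replace_low_bits(value: int, payload: int, k: int) -> int:
    """Overwrite the k least significant bits of an 8-bit value with `payload`."""
    mask = (1 << k) - 1
    return (value & ~mask) | (payload & mask)


def max_change(k: int) -> int:
    """Largest possible difference caused by overwriting the k low bits of any byte."""
    return max(
        abs(replace_low_bits(v, p, k) - v)
        for v in range(256)
        for p in range(1 << k)
    )


original = 0b10110110  # 182, an arbitrary channel value
print([replace_low_bits(original, p, 2) for p in range(4)])  # [180, 181, 182, 183]
print([max_change(k) for k in (1, 2, 4)])                    # [1, 3, 15]
```

The worst case for k overwritten bits is a change of 2^k - 1, which is why four bits (up to 15 levels out of 255) are far more likely to leave visible traces than one or two.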

The LSB algorithm can be applied using either 1, 2, or 4 least significant bits. When a larger number of bits is used, the chance of visual artifacts becoming apparent in the image increases, but so does the total capacity to encode information. Pixel-level LSB embedding works only on images stored in a lossless format (e.g., BMP, PNG), because lossy compression would alter the embedded bits; other steganography algorithms can be used with lossy formats such as JPEG.
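The trade-off between capacity and visibility is easy to quantify. The sketch below assumes that every color channel of every pixel carries payload bits and that nothing is reserved for a header; the function name capacity_bytes and the 1024 x 768 image size are made up for the example.

```python
def capacity_bytes(width: int, height: int, channels: int = 3,
                   bits_per_channel: int = 2) -> int:
    """Payload capacity in whole bytes when every channel value carries
    `bits_per_channel` hidden bits (assumption: no header, no skipped pixels)."""
    total_bits = width * height * channels * bits_per_channel
    return total_bits // 8


# Hypothetical 1024 x 768 RGB image.
for bits in (1, 2, 4):
    print(bits, capacity_bytes(1024, 768, bits_per_channel=bits))
# 1 294912
# 2 589824
# 4 1179648
```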

Discrete cosine transform (DCT) based steganography is a variant of LSB steganography that is often applied to JPEG-format carriers (i.e., when JPEG images are used to carry the payload). In this method, the communicated data is secretly encoded into the DCT coefficients rather than into the pixel values. All other factors being equal, this method provides a somewhat lower data-carrying capacity. One of the reasons is that coefficients with the values 0 and 1 cannot be altered, so no data can be encoded wherever the coefficients take on these values.

LSB Explained

Let's have a look at some pixels of the cover image (left) and a secret text (right).

The first step is to decide how many bit-planes of the cover image will be used. For this example, we will use two bits.

The secret text is serialized and split into groups of two bits, and the two least significant bits of each byte of the cover image are replaced with two bits of the text.

Each byte of the cover image loses its last two bits and receives two bits of the secret text instead:
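The steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the images in this article: it works on a raw sequence of cover bytes instead of a real image file, the names embed and extract are hypothetical, the bit groups are written most significant first (an arbitrary choice), and the receiver is assumed to know the length of the secret in advance.

```python
def embed(cover: bytes, secret: bytes, k: int = 2) -> bytearray:
    """Hide `secret` in the k least significant bits of successive cover bytes."""
    if k not in (1, 2, 4):
        raise ValueError("k must be 1, 2 or 4")
    groups_per_byte = 8 // k
    if len(secret) * groups_per_byte > len(cover):
        raise ValueError("cover is too small for this secret")
    mask = (1 << k) - 1
    stego = bytearray(cover)
    i = 0
    for byte in secret:
        # Split the secret byte into k-bit groups, most significant group first.
        for shift in range(8 - k, -1, -k):
            group = (byte >> shift) & mask
            stego[i] = (stego[i] & ~mask) | group  # drop low bits, insert group
            i += 1
    return stego


def extract(stego: bytes, length: int, k: int = 2) -> bytes:
    """Recover `length` secret bytes (length is assumed to be known)."""
    mask = (1 << k) - 1
    groups_per_byte = 8 // k
    out = bytearray()
    for n in range(length):
        value = 0
        for j in range(groups_per_byte):
            value = (value << k) | (stego[n * groups_per_byte + j] & mask)
        out.append(value)
    return bytes(out)


cover = bytes([0b10110110] * 8)   # eight made-up cover bytes
stego = embed(cover, b"Hi", k=2)
print(extract(stego, 2, k=2))     # b'Hi'
print(max(abs(a - b) for a, b in zip(cover, stego)))  # never more than 3
```

A practical tool would also have to store the message length (or an end marker) inside the image and read the bytes through an image library; both are left out here to keep the bit manipulation visible.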

Example

Using a simple implementation of an LSB algorithm, we embedded the text of Hemingway's well-known novel The Old Man and The Sea in an ordinary image. The original image and the processed one can be seen below; the human eye cannot see any difference between the two. This is remarkable because the processed image contains over 1000 lines of meaningful information.

Original image

Image containing the entire The Old Man and The Sea novel - 2 bits altered

Image containing the entire The Old Man and The Sea novel - 4 bits altered - the alteration of the image can be observed with the naked eye (see the upper part of the image).

Implications for Cybersecurity

Payload carriers are very difficult for anti-malware applications to detect because they look like normal images or files. This is one of the reasons why steganography techniques are increasingly being used by malware and cyber-espionage tools. Steganography helps conceal not just the data itself, but also the fact that data is being exchanged. This makes it difficult for security tools, such as deep packet inspection (DPI) systems or anti-APT (advanced persistent threat) products, to check all the communication leaving a corporate network.

In 2017, Proofpoint found a malicious loader called Zero.T that hides code in images and then processes them in a particular way to obtain malicious modules:

Original files	Processed files
fsguidll.bmp	fsguidll.exe
fslapi.bmp	fslapi.dll
fslapi.dll.bmp	fslapi.dll.bmp

In 2020, Kaspersky discovered a similar piece of malware, called MontysThree, which uses steganography and several encryption schemes to build and run the malicious code on the target system. This malware searches for specific Microsoft Office and Adobe Acrobat documents stored in current documents directories and uploads them to legitimate public cloud services such as those of Google, Microsoft, and Dropbox.

Unless someone knows beforehand the algorithm that was used to encode the information, it is very difficult to detect the presence of information hidden by steganography algorithms. The most effective way is to compare a potentially modified image with the original, but this is impossible in most cases, as it is very unlikely that you already have the original of the image you are analyzing. Other methods perform a statistical analysis of the image data to see how ordered it is: if there is much more order than would be expected, it can be supposed that some information is hidden. These methods, of course, are unlikely to detect an image that encodes far less information than its potential maximum.
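When the original image is available, the comparison described above is straightforward. The following Python sketch works on raw byte sequences; the function names changed_positions and looks_like_lsb_embedding, the threshold k and the sample bytes are all hypothetical. It only flags differences confined to the low bits, so it is a heuristic and not proof that a message is present.

```python
def changed_positions(original: bytes, suspect: bytes) -> list[int]:
    """Indexes at which the suspect data differs from the known original."""
    if len(original) != len(suspect):
        raise ValueError("images must have the same size")
    return [i for i, (a, b) in enumerate(zip(original, suspect)) if a != b]


def looks_like_lsb_embedding(original: bytes, suspect: bytes, k: int = 2) -> bool:
    """True if the data differs and every difference is confined to the k low bits."""
    mask = (1 << k) - 1
    diffs = [a ^ b for a, b in zip(original, suspect) if a != b]
    return bool(diffs) and all(d <= mask for d in diffs)


original = bytes([182, 182, 182, 182])   # made-up channel values
suspect = bytes([181, 182, 180, 182])    # only low bits were touched
print(changed_positions(original, suspect))         # [0, 2]
print(looks_like_lsb_embedding(original, suspect))  # True
```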

Other Use Cases

It is conceivable that some applications could use steganography algorithms not to hide data but to increase the amount of useful information transmitted without increasing the bandwidth. Such an application must meet several requirements: its transmission bandwidth must be limited, it must nevertheless need to transmit audio or images, and it must not be constrained by computing capability. The most obvious candidate is space exploration: exploratory spacecraft have powerful computing ability, need to transmit images of their findings back to Earth, and have very low transmission rates over the vastness of space.

Conclusion

Steganography proves to be an incredibly effective way of hiding the act of communication. The ease and effectiveness of LSB embedding make it an attractive method to transmit messages without detection. With the rise in popularity of image-sharing services on the Internet, it is increasingly likely that an image shared online for a short period of time will not be analyzed. It is important to note that, while steganography does not guarantee that a message cannot be decoded, pairing steganography with encryption provides a means of communication that is difficult to detect and can be nearly impossible for a third party to decode.

While steganography can be detected by statistical attacks, relying on safety in numbers and obscure embedding patterns can limit the decoding of any particular hidden message.

Steganography's effectiveness, ease of implementation, and extensibility all suggest that it will be a considerable security concern for the foreseeable future.