Monday, January 23, 2012

Photo Forensics from JPEG Ghosts

The JPEG format employs a lossy compression scheme that sacrifices image detail in order to reduce the stored file size. Once an image is compressed, this image detail is irreversibly lost. While this loss of information is often problematic for a forensic analyst, it can also be used to determine if an image has been altered from the time of its recording.

JPEG quality is typically specified by a numeric value, where a higher value corresponds to better image quality but a larger file size. For example, in the “Save for Web & Devices…” interface, Photoshop uses a quality scale of 0 (worst) to 100 (best). Consider the case where an image is recorded by a camera at quality 55, edited in Photoshop, and then re-saved at a higher quality of 80. Although the image quality may appear to be 80, the effective quality is governed by the original lower quality of 55.

This means that if the image is then re-saved at quality 55, it will change surprisingly little, far less than one would expect of a genuine quality-80 image. The image contains a JPEG ghost: a memory of itself at quality 55. The reason is that the detail discarded at quality 55 is already gone, so compressing at that quality a second time removes little more. By contrast, an original image of quality 80 re-saved at quality 55 will differ more substantially, since in this case the compression does discard information.
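The reasoning above can be illustrated with a toy model. The sketch below ignores the DCT and everything else JPEG does, and keeps only the quantization step: each value is rounded to a multiple of a step size, where a larger step stands in for a lower quality. The step sizes 12 and 5 are arbitrary stand-ins for qualities 55 and 80, the Gaussian test data is made up, and the function names are hypothetical.

```python
import random

def quantize(values, step):
    """Round each value to the nearest multiple of step (toy stand-in for JPEG quantization)."""
    return [step * round(v / step) for v in values]

def mean_abs_diff(a, b):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def ghost_curve(values, steps):
    """Re-quantize values at each step and record how much they change."""
    return {s: mean_abs_diff(values, quantize(values, s)) for s in steps}

random.seed(0)
# Made-up "coefficients"; real DCT coefficients are distributed differently.
coefficients = [random.gauss(0, 50) for _ in range(10000)]

# Saved once with a fine step (assumed analogue of an original at quality 80).
single = quantize(coefficients, 5)
# Saved with a coarse step, then a fine one (analogue of quality 55, then 80).
double = quantize(quantize(coefficients, 12), 5)

steps = range(1, 21)
single_curve = ghost_curve(single, steps)
double_curve = ghost_curve(double, steps)

for s in steps:
    print(f"step {s:2d}  single {single_curve[s]:.2f}  double {double_curve[s]:.2f}")
```

In this toy setting both curves are zero at step 5, the current step, but only the twice-quantized data shows an additional local dip at step 12, the earlier and coarser step. That extra dip is the ghost.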

A JPEG ghost can be easily uncovered by comparing the image in question to re-saved versions of the image. The simplest comparison is to compute the average absolute pixel value difference. Shown below is the result (red) of this calculation for an image initially saved at quality 55, edited, saved at quality 80, and then re-saved at qualities 10 through 100. Also shown is the result (blue) for an image initially saved at quality 80, and then re-saved at qualities 10 through 100.

Note that in the former case there are two dips in the graph — an expected dip at 80 (the current quality) and a second dip at 55 (the original compression quality). This second dip is a telltale sign that the image is not an original. 
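For readers who want to try this, here is a minimal sketch of the comparison using the Pillow library for Python. It is not the code used to produce the graph above. The function names are hypothetical, the file name in the usage note is a placeholder, and Pillow's quality scale is not guaranteed to match Photoshop's, so the position and depth of any dips may differ from encoder to encoder.

```python
import io

from PIL import Image, ImageChops, ImageStat

def resave_jpeg(image, quality):
    """Round-trip an image through in-memory JPEG compression at the given quality."""
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer)
    resaved.load()  # force decoding while the buffer is still available
    return resaved.convert("RGB")

def ghost_curve(image, qualities=range(10, 101, 5)):
    """Average absolute pixel difference between image and its re-saved versions.

    Returns a dict mapping each quality to the mean difference over all
    pixels and color channels.
    """
    image = image.convert("RGB")
    curve = {}
    for quality in qualities:
        difference = ImageChops.difference(image, resave_jpeg(image, quality))
        channel_means = ImageStat.Stat(difference).mean  # one mean per channel
        curve[quality] = sum(channel_means) / len(channel_means)
    return curve
```

Calling `ghost_curve(Image.open("suspect.jpg"))` and plotting the returned values against quality gives a curve of the kind described above. Local minima away from the image's current quality are the candidates for a ghost.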

There are a few limitations to this analysis. First, it only works when the edited image is saved at a higher quality than the original image. Second, it assumes that the software used to re-compress the image for the comparison uses quality settings similar to those of whatever photo editing software produced the image. Third, it only works if the image was not cropped after editing, that is, if the JPEG block boundaries are preserved. Each of these limitations can be alleviated by using a slightly more involved analysis as described here and here.

Though it is a bit tedious, some of this analysis can actually be done using Photoshop. Copies of the image would need to be saved at each compression level, and then each of those copies would need to be stacked in the Layers palette with the original image in Difference blending mode. You can then view the Mean value in the Histogram panel, and plot these values for each of the compression levels you saved.

The loss of information in a JPEG image can often be frustrating to a forensic analyst. At the same time, because the JPEG format eliminates specific image details in a specific way, this compression format can also be used to our advantage in a forensic setting.


Reader Comments (1)

I just tried repeating this experiment. I took a large image, saved it at 32, then resaved it at quality 88 using GIMP.

Then, using Python's PIL, I created a plot like the one above. I got a very, very small dip at 32 and none at all at 88. I guess different softwares can use different ways of coding (although I had imagined PIL and GIMP both using libjpeg internally).

[The most likely explanation for this is that, as you suggest, the JPEG quantization tables are sufficiently different between GIMP and PIL. To verify this, you might want to repeat the experiment using the PIL library to generate both the original and subsequent JPEG images. This is why, in practice, we analyze the individual DCT coefficients, which are more likely to show overlap between the original and subsequent JPEG quantization tables. -Hany]

January 23, 2012 | Petter
