A wide range of image processing software makes image editing easy and also produces a great number of doctored images. Forensic technology has emerged to detect unintentional or malicious image operations. Most forensic methods focus on the detection of single operations. However, a series of operations may be applied sequentially to an image, which complicates operation detection. Forensic investigators want to know as much as possible about a suspicious image's entire processing history. The detection of the operation chain, consisting of a series of operations, is a significant and challenging problem in forensics research. In this paper, based on the histogram distribution uniformity of a manipulated image, we propose an operation chain detection scheme to identify histogram equalization (HE) followed by a dither-like operation (DLO). Two histogram features and a local spatial feature are utilized to further determine which DLO may have been applied. Both theoretical analysis and experimental results verify the effectiveness of our proposed scheme for both global and local scenarios.
1. Introduction
With the development of multimedia and network techniques in the last two decades, digital media and social networks have become prevalent. The popularization of portable devices and social software accelerates these tendencies in daily life. From these digital media, we can acquire news, entertainment, advertising information, etc. Currently, digital media play a key role in important decisions in news media, law enforcement and government organizations. However, numerous image editing software packages can be used to modify digital images easily, so the authenticity of digital media is important and has to be proved. Therefore, digital forensics techniques are emerging to meet this urgent demand.
The primary goals of digital image forensics are to determine whether an image is a forgery [1, 2] and, further, to reveal its entire processing history [3]. A number of researchers have exploited many distinct fingerprints to detect specific manipulations [4], e.g., resampling [5, 6, 7], median filtering [8, 9], contrast enhancement [10, 11], JPEG compression [3], etc. A large number of operation detection studies concentrate on designing a specific detector to reveal a specific single operation, while an image may have undergone a series of operations (an operation chain) in real forensic scenarios.
Operation chain detection is a complex and challenging problem. The features of different sequential operations coexist and affect each other, and the types and order of the operations are hardly detectable in the results. A straightforward detection scheme consists of applying state-of-the-art single operation detection methods separately in the operation chain scenario. The success of this scheme relies on the robustness of the single operation detectors against subsequent operations. M.C. Stamm et al. [12] proposed a conditional fingerprint to determine the order of operations consisting of contrast enhancement followed by resizing. For an image that has undergone contrast enhancement followed by resizing, the conditional fingerprint is extracted from the pixels that are left untouched by resizing. In other words, the conditional fingerprint is equivalent to the well-known fingerprint of contrast enhancement [10]. Because the original fingerprint of contrast enhancement lacks robustness, this method is sensitive to the scaling factor, which influences the number of pixels that retain the contrast enhancement feature. A reverse engineering method has been developed to detect another operation chain: double JPEG interposed by resizing [16]. This method is based on the observation that a decompressed JPEG image tends to follow a near-lattice distribution (NLD), which is maintained after resizing and subsequent recompression. For each estimated resizing factor, the tested image was reversely resized, and a measure of the NLD was computed. Images with an NLD value above a specific threshold are judged to have undergone this operation chain. For discriminating different image sources in steganalysis, [13] proposed a framework consisting of an existing JPEG decompression identifier followed by two corresponding steganalyzers.
An alternative scheme for operation chain detection consists of modeling how the distinct features introduced by one operation change under other operations, or even exploiting new distinct features of the operation chain itself. Conotter, V. et al. [14] observed that the DCT coefficients of an image that has undergone JPEG compression followed by linear filtering present a specific probabilistic distribution. As a result, the operation chain (JPEG followed by linear filtering) can be detected by measuring the distance between the DCT distribution of derived models and the actual distribution of the tested image. The authors assume that the compression quality factors are known a priori. To overcome this shortcoming, a set of features extracted from the DCT distribution was used to train a classifier to detect the same operation chain [15]. P. Ferrara et al. [17] applied the peak-to-valley artifact and the specific distribution of the first digit of DCT coefficients to identify the operation chain of double JPEG interposed by contrast enhancement. In this chain, a linear contrast enhancement is applied to each DCT coefficient of the JPEG image, with the result that the DCT distribution presents a periodicity related to the parameter of the contrast enhancement.
In this paper, we focus on detecting the operation chain consisting of histogram equalization followed by a dither-like operation, i.e., HEDLO. HE is a general image editing operation and a common step in creating forged photos; it is often used to enhance the details of the lighter or darker regions of an image. To create a forged image, some post-processing, such as filtering and resampling, may need to be performed after applying HE to make the image more consistent, and the HE image is usually stored in JPEG format. These operations, including filtering, resampling and JPEG compression, share the characteristic that they result in dithered gray values. Thus, some forged images created in this manner can be identified by detecting the HEDLO operation chain.
More specifically, we analyze the difference between the histogram of an image that has undergone HE and that of an image manipulated by HEDLO. We observe that applying a DLO to a histogram-equalized image results in a more uniform histogram, which can be used to develop the fingerprint of the HEDLO operation chain. Two features are proposed to describe this fingerprint, i.e., the uniformity and the centroid of the histogram, to determine whether such an operation chain exists. Then, the combination of our two features and the local binary pattern (LBP) feature can further identify the specific DLO that has been utilized. Our scheme can be applied to cut-and-paste forgery detection to estimate the location of the tampered region.
The rest of the paper is organized as follows. In Section 2, we present the histogram character of the HEDLO operation chain. We analyze the histogram characteristics of two cases, HE and the HEDLO operation chain, in Section 3 and then propose our features to detect the operation chain in Section 4. In Section 5, we present the performance of the new approaches for detecting both globally and locally applied HEDLO. Finally, we conclude this paper in Section 6. To simplify the notation, we denote the histogram-equalized image and the image that has undergone HEDLO as the HE image and the HEDLO image, respectively.
2. Character of HEDLO Chain
The aim of this work is to study the trace left behind by the HEDLO operation chain. We first review the HE operation. Contrast enhancement can be used effectively not only to improve the lightness and darkness of overexposed and underexposed images but also to preprocess an image or a contiguous set of pixels to create a forged image. The widely used HE is a specific contrast enhancement operation; it has the advantage of making gray-level values span the entire gray range automatically without requiring a parameter. HE is essentially a gray value mapping from the pixel values in the original image to those in the HE image using the cumulative distribution function (CDF). It can be calculated according to the following equation:
$$y = \mathrm{round}\left(\frac{255}{N}\sum_{n=0}^{x} h_X(n)\right) \tag{1}$$
where $\mathrm{round}(\cdot)$ denotes the rounding function; $h_X(n)$ is the $n$th bin of the gray-level histogram of the original image, i.e., the number of pixels equal to $n$; $N$ is the total number of pixels; and $x$, $y$ are the gray values in the original and HE image, respectively. Given an image, pixel level $x$ is mapped to pixel level $y$, which is determined by the number of pixels with gray value less than or equal to $x$ in the original image.
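The CDF mapping described above can be illustrated with a minimal sketch. It assumes 8-bit gray levels, a scale factor of 255 (the maximum gray level) and round-half-up rounding; the function name `equalize` and the eight-pixel toy image are hypothetical and only serve the illustration.

```python
def equalize(pixels, levels=256):
    """Map each gray value x to round((levels - 1) * CDF(x))."""
    n_total = len(pixels)
    hist = [0] * levels
    for x in pixels:
        hist[x] += 1                      # h_X(n): number of pixels equal to n
    mapping, cumulative = [], 0
    for count in hist:
        cumulative += count               # pixels with gray value <= x
        # int(v + 0.5) rounds half up for non-negative v (assumed rounding rule)
        mapping.append(int((levels - 1) * cumulative / n_total + 0.5))
    return [mapping[x] for x in pixels], mapping

# Hypothetical toy "image" with three adjacent gray levels
pixels = [100] * 3 + [101] * 3 + [102] * 2
equalized, mapping = equalize(pixels)
print(sorted(set(equalized)))
```

The three adjacent input levels are stretched across the gray range, and every level between the mapped values is left empty, which is the origin of the gap bins discussed later.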
Fig. 1(a) and (b) show the gray-level histograms of the Lena image and its HE version.
Gray-level histogram of (a) the Lena image and (b) its HE version. The histogram of (c) the original image after the addition of uniform noise and that of (d) the HE image after the addition of uniform noise, where noise~u[-1, 1]. The histogram of the HE image manipulated by (e) a 3×3 median filter and by (f) gamma transformation (r=1.7).
After being enhanced by HE, the image is likely to be manipulated by a DLO, either unconsciously or maliciously [18]. Specifically, a DLO may alter each sample to a minor degree or recalculate it by taking adjacent samples into account. In the latter case, because of the high correlation of adjacent pixels, the recalculated pixel should be similar to the original one. Both cases are equivalent to introducing a dither to the pixel. Therefore, the corresponding gray levels are dithered in the histogram. Both low-pass (LP) and high-pass (HP) filters, e.g., average, Gaussian and Laplacian filters, have this effect. Even resampling and JPEG compression belong to this type of operation. The inevitable processing steps of these two operations, i.e., interpolation and quantization, respectively, play a common role in dithering the pixel value.
We model the process of applying a DLO as adding uniformly distributed noise over the interval [-1, 1]. Fig. 1(a)-(d) illustrates the histograms of the Lena image, its HE version, the version with added uniformly distributed noise, and the HE image with added uniformly distributed noise. From Fig. 1(b), we observe that the histogram has been stretched to span all possible gray values but still keeps an envelope similar to that of the original. As indicated in Fig. 1(c), adding weak uniformly distributed noise to the original image does not change the envelope of the histogram. In Fig. 1(d), although the histogram retains the shapes of the highest peaks, an approximately uniform histogram emerges. The same phenomenon can be found if another DLO is applied to the HE image. It seems that HE images followed by a post-operation have a more uniform histogram. Nevertheless, not all post-operations following HE lead to a uniform histogram, e.g., median filtering and pixel value mapping operations. Fig. 1(e) and (f) display the histograms of the HE image manipulated by median filtering and gamma correction, respectively. Their envelopes remain similar to that of the HE image. Therefore, it is the operation chain consisting of HE followed by a DLO that leads to the generation of the uniform histogram. In the next section, we analyze and quantify this distinct fingerprint.
3. Histogram Distribution Analysis of HE and HEDLO images
The gray-level histogram of an unaltered image is determined by the light intensity reflected from the real world. It can be proven that its histogram is interpolatably connected [10], but it cannot be proven to exhibit a uniform distribution. Moreover, except for gray value mapping, general operations change the histogram distribution only minimally and cannot increase the uniformity of the histogram. To the best of our knowledge, HE can alter the histogram distribution and bring the histogram close to the uniform distribution. We compare the capability to equalize the histogram between the HE operation and the HEDLO operation chain. In this section, we study the histogram uniformity induced in these two cases. The similarities between the histogram distributions and the ideal uniform distribution are analyzed individually. Specifically, the general histogram bin is estimated and compared with the bin of the ideal uniform histogram.
 3.1 The Histogram Uniformity of the HE Image
The uniform histogram is a basic assumption in the HE detection method [10]; in fact, the disappearance and merging of gray levels often make the histogram far from uniform [11]. The authors of [19] also indicate that it cannot be proved, in general, that histogram equalization will produce a uniform histogram. Thus, the histogram uniformity of the HE image remains an interesting and significant problem.
Without loss of generality, we take an example to demonstrate the gray-level mapping from the original image to the HE image. We denote $h_X$ and $h_Y$ as the histograms of the original image and the HE image. It can be seen from Fig. 2(a) that one or several gray bins in the histogram $h_X$ are mapped or merged to one bin in $h_Y$. Take three adjacent nonzero gray levels in $h_Y$, i.e., $y_k$, $y_{k+1}$ and $y_{k+q}$, as examples. The former two gray levels can be calculated by Eq. (1) as follows:
$$y_k = \mathrm{round}\left(\frac{255}{N}\sum_{n=0}^{k} h_X(n)\right) \tag{2}$$
$$y_{k+1} = \mathrm{round}\left(\frac{255}{N}\sum_{n=0}^{k+m} h_X(n)\right) \tag{3}$$
where $k$ is the maximum possible gray level mapped to $y_k$, $l$ is the difference between $y_k$ and $y_{k+1}$, $l, m \in Z^{+}$, and $k+m$ is the maximum possible gray level mapped to $y_{k+1}$. As in Fig. 2(a), $y_k$ is mapped by at least the gray level $k$, and $y_{k+1}$ is mapped by the gray levels $k+1, \cdots, k+m$.
Example graph indicating (a) graylevel mapping of the original image to the HE image, (b) the introduction of y^{g} into the histogram of the HE image and (c) the variation of y^{g} in the histogram of the HEUN image.
In other words, the bins $h_X(k+1), \cdots, h_X(k+m)$ all contribute to forming the bin $y_{k+1}$. Meanwhile, some zero-height bins, i.e., gap bins, are introduced next to $y_k$. Specifically, if $h_X(k+1)$ is sufficiently large, the interval between $y_k$ and $y_{k+1}$ may be larger than one, i.e., $l \ge 2$, and one or more gaps between $h_Y(y_k)$ and $h_Y(y_{k+1})$ are introduced, as indicated in Fig. 2(b).
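To investigate the histogram of the HE image, the following two easily proven lemmas are first considered:
Lemma 1: Given $a \in R$ and $A \in Z$, if $\mathrm{round}(a) = A$, then
$$A - \frac{1}{2} \le a < A + \frac{1}{2} \tag{4}$$
Lemma 2: Given $a, b \in R$ and $A, C \in Z$ with $\mathrm{round}(a+b) = C$ and $\mathrm{round}(a) = A$, we have
$$C - A - 1 < b < C - A + 1 \tag{5}$$
Given $y_{k+1} - y_k = l$, based on Eq. (2), Eq. (3) and inequality (5), we obtain
$$l - 1 < \frac{255}{N}\sum_{n=k+1}^{k+m} h_X(n) < l + 1 \tag{6}$$
Because $h_Y(y_{k+1}) = \sum_{n=k+1}^{k+m} h_X(n)$, the inequality (6) can be rewritten as
$$\frac{(l-1)N}{255} < h_Y(y_{k+1}) < \frac{(l+1)N}{255} \tag{7}$$
We normalize the histogram as $H_Y(y_{k+1}) = h_Y(y_{k+1}) / N$, so that
$$\frac{l-1}{255} < H_Y(y_{k+1}) < \frac{l+1}{255} \tag{8}$$
Thus, the probability range of $y_{k+1}$ is related to $l$ and shifts upward as $l$ increases.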
In probability theory and statistics, the discrete uniform distribution takes on a finite number of values with equal probabilities. Each bin of the ideal uniform histogram with 256 gray levels has probability 1/256. The difference between each bin of the normalized histogram of the HE image and that of the ideal uniform histogram can be analyzed as follows. According to the relationship between $H_Y(y)$ and 1/256, we divide the bins in $h_Y$ into three types, i.e., the gap bin, the normal bin and the high bin, denoted as $y^g$, $y^n$ and $y^h$, respectively. If $H_Y(y) = 0$, we call the bin $y^g$. If $H_Y(y) \in (0, 2/255)$, we call the bin $y^n$. Because the high bin is mapped by at least one bin in the original image histogram, overhigh bins are rare. The average number of bins larger than 4/255 in the normalized histogram was calculated over the 1338 images of the UCID database. The result indicates that only 1% of the 256 bins are larger than 4/255. Thus, we define $y^h$ as a bin with $H_Y(y) \in [2/255, 4/255)$.
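The three bin types can be counted directly from a histogram. The following is a minimal sketch of that classification, using the thresholds 2/255 and 4/255 from the text; the extra "overhigh" category for bins at or above 4/255, the function name `classify_bins` and the toy histogram are assumptions made for illustration.

```python
def classify_bins(hist, levels=256):
    """Count gap, normal, high and overhigh bins of a gray-level histogram."""
    n_total = sum(hist)
    counts = {"gap": 0, "normal": 0, "high": 0, "overhigh": 0}
    for h in hist:
        p = h / n_total                       # normalized bin height H_Y(y)
        if p == 0:
            counts["gap"] += 1                # y^g: empty bin
        elif p < 2 / (levels - 1):
            counts["normal"] += 1             # y^n: H_Y(y) in (0, 2/255)
        elif p < 4 / (levels - 1):
            counts["high"] += 1               # y^h: H_Y(y) in [2/255, 4/255)
        else:
            counts["overhigh"] += 1           # above 4/255, rare per the text
    return counts

# Hypothetical 2550-pixel histogram: 200 bins of 10, 10 bins of 25,
# 6 bins of 50 and 40 empty bins
toy_hist = [10] * 200 + [25] * 10 + [50] * 6 + [0] * 40
bin_counts = classify_bins(toy_hist)
print(bin_counts)
```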
Because the height of the $y^n$ bins is distributed around 1/256, the normal bins bring the histogram close to a uniform distribution. Although HE enlarges the gray-level range, it cannot increase the number of gray levels; thus, a number of zero bins are introduced in the histogram of the HE image. Additionally, the height of $y^g$ is less than 1/256, and that of $y^h$ is far greater than 1/256. Consequently, the gap and high bins make the histogram of the HE image far from a uniform distribution. The numbers of these three types of bins for the Lena image manipulated by HE are calculated and displayed in the first row of Table 1. The sum of the gap and high bins is approximately half of all 256 bins. As indicated in Fig. 1(b), the histogram of the HE image maintains an envelope similar to that of the original image.
Numbers of three types of the histogram bins of the HE and HERS Lena image
 3.2 The Histogram Uniformity of the HEDLO Image
For the sake of simplicity, we model the DLO application process as the addition of noise to investigate the variation of the histogram uniformity of the HEDLO image. However, adding strong noise can destroy both the perceptual image quality and the histogram, making the histogram uniform by itself. Hence, we select weak noise, which cannot ruin the histogram distribution on its own. Therefore, we consider the addition of uniformly distributed noise over the interval [-1, 1], i.e., noise~u[-1, 1]. The noise spreads each gray value $z$ uniformly over the range $[z-1, z+1]$. The probabilities of gray values falling into the $(z-1)$th bin, the $z$th bin and the $(z+1)$th bin are 1/4, 1/2 and 1/4, respectively. We abbreviate uniformly distributed noise as UN and the operation chain in which the image is manipulated by HE and then by UN as HEUN; $h_{Y'}$ represents the histogram of the HEUN image.
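Under this noise model, the expected histogram after adding UN is the original histogram convolved with the kernel [1/4, 1/2, 1/4]. The sketch below illustrates this on a toy eight-bin histogram; the function name `spread_uniform_noise`, the toy values and the clipping of out-of-range mass to the boundary bins are assumptions made for illustration.

```python
def spread_uniform_noise(norm_hist):
    """Expected normalized histogram after adding u[-1, 1] noise and rounding."""
    n = len(norm_hist)
    out = [0.0] * n
    for z, p in enumerate(norm_hist):
        out[z] += 0.5 * p                     # value stays in bin z
        # mass leaving the valid range is clipped to the boundary bin (assumption)
        out[max(z - 1, 0)] += 0.25 * p        # value moves to bin z - 1
        out[min(z + 1, n - 1)] += 0.25 * p    # value moves to bin z + 1
    return out

# Hypothetical comb-like histogram: occupied bins separated by single gaps
toy = [0.25, 0.0, 0.5, 0.0, 0.25, 0.0, 0.0, 0.0]
spread = spread_uniform_noise(toy)
print(spread)
```

The gap at index 1 receives a quarter of each of its two neighbors, while the peak at index 2 is halved, so both move toward the common level.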
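We use $l = 2$ as a concrete example to analyze the bins in $h_{Y'}$. The gap bin $h_Y(y^g)$ between the $y_k$ bin and the $y_{k+1}$ bin is transformed to $h_{Y'}(y^g)$ due to the addition of UN, as indicated in Fig. 2(c). $H_{Y'}(y^g)$ can be calculated as follows:
$$H_{Y'}(y^g) = \frac{1}{4}H_Y(y_k) + \frac{1}{2}H_Y(y^g) + \frac{1}{4}H_Y(y_{k+1}) = \frac{1}{4}\left[H_Y(y_k) + H_Y(y_{k+1})\right] \tag{9}$$
In this case, the inequality (8) can be rewritten as
$$\frac{1}{255} < H_Y(y_{k+1}) < \frac{3}{255} \tag{10}$$
According to whether the left adjacent bin $y_k$ is normal or high, $H_Y(y_k)$ belongs to one of the following two ranges. If $y_k$ is a normal bin, then
$$0 < H_Y(y_k) < \frac{2}{255} \tag{11}$$
Substituting inequalities (10) and (11) into formula (9) yields
$$\frac{1}{4 \times 255} < H_{Y'}(y^g) < \frac{5}{4 \times 255} \tag{12}$$
If $y_k$ is a high bin, then
$$\frac{2}{255} \le H_Y(y_k) < \frac{4}{255} \tag{13}$$
Substituting inequalities (10) and (13) into Eq. (9), we obtain
$$\frac{3}{4 \times 255} < H_{Y'}(y^g) < \frac{7}{4 \times 255} \tag{14}$$
Eq. (12) and Eq. (14) indicate that the probability of the gap bin in $h_{Y'}$ is closer to 1/256 than that in $h_Y$.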
From the definition of $y^h$, it can be easily concluded that $y^h$ must follow $y^g$. We assume that $y_{k+1}$ is $y^h$. $H_{Y'}(y^h)$ can be calculated in terms of whether the right adjacent bin $y_{k+q}$ is a gap.
If $y_{k+q}$ is a gap bin,
$$H_{Y'}(y^h) = H_Y(y^h) / 2 \tag{15}$$
If $y_{k+q}$ is not a gap, $y_{k+q}$ is a normal bin, so
$$H_{Y'}(y^h) = \frac{1}{2}H_Y(y^h) + \frac{1}{4}H_Y(y_{k+q}) \tag{16}$$
The value of $H_{Y'}(y^n)$ can be estimated in the same manner. The possible probability range of normal bins varies slightly according to the two adjacent bins. Combining (12), (14), (15) and (16), we can determine that the gray-level bins in the normalized histogram of the HEUN image are clustered closer to 1/256 than those of the HE image, as indicated in Fig. 3. The probabilities of the three types of bin in the HE image and the HEUN image are indicated as red bars and blue bars, respectively. Although the probability range of the normal bin is enlarged to a certain degree, the probability ranges of the other two bin types are closer to the ideal uniform distribution after applying HEUN. The numbers of these three types of bin are calculated for the Lena image that has undergone HEUN and displayed in the second row of Table 1. The gap and high bins are all transformed to normal bins. The interference of the gap and high bins is drastically reduced by HEUN.
The probabilities of the normal, gap and high bin in the HE image and HEUN image are indicated as red bars and blue bars, respectively. The purple line is the uniform distribution baseline.
Consequently, the histogram of the HEDLO image is closer to the uniform distribution than that of the HE image. The theory studied in this section can be generalized to continuous gap bins in the histogram of the HE image. As long as the strength of the noise is sufficiently high, the gap bins can be filled, and the high bins can be partially removed. Additionally, given Gaussian noise with δ=0.5, the probabilities of gray values falling into each bin and its two adjacent bins in the gray-level histogram are 68.2%, 15.7% and 15.7% (according to the three-sigma rule), respectively. The tendency toward a uniform histogram can be obtained in the same manner.
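The quoted Gaussian percentages can be checked numerically. The sketch below assumes zero-mean Gaussian noise with standard deviation δ and rounding to the nearest integer, so a value stays in its bin when the noise lies in [-0.5, 0.5) and moves to an adjacent bin when it lies within [0.5, 1.5) of either sign; the function names are hypothetical.

```python
import math

def normal_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def bin_probabilities(sigma):
    """Probability of staying in the bin and of moving to one adjacent bin."""
    center = normal_cdf(0.5, sigma) - normal_cdf(-0.5, sigma)
    adjacent = normal_cdf(1.5, sigma) - normal_cdf(0.5, sigma)
    return center, adjacent

center, adjacent = bin_probabilities(0.5)
print(f"same bin: {center:.1%}, each adjacent bin: {adjacent:.1%}")
```

With δ=0.5 the bin boundaries fall at one and three standard deviations, which is why the three-sigma rule gives the figures in the text.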
After manipulation by the HEDLO operation chain, the gap and high bins are suppressed, and a more nearly uniform histogram emerges due to the combination of the two operations. The uniform histogram can be used as an identifying feature to detect the HEDLO operation chain.
4. Proposed forensic approach
According to the above results and discussions, we propose two histogram features to characterize the uniformly distributed histogram; then, a local spatial feature is proposed to capture the statistical trace left by each operation chain consisting of HE and a different DLO. These features are described as follows.
 4.1 Uniform Histogram Metric
Because digital images take discrete values and general operations cannot perform a one-for-many gray-level mapping, even HE cannot lead to a uniform histogram, as shown above. We believe that a nearly uniform histogram is a specific characteristic of HEDLO. The mean absolute difference between the normalized histogram and the constant value 1/256 is calculated to capture this feature. The scalar metric of uniform histogram (MUH) is defined as the mean of $|h(s)/N - 1/256|$ over the bins that remain after thresholding, where $h(s)$ is the $s$th bin of the gray-level histogram of the image and $N$ is the total number of pixels in the image. $\tau + 1$ and $\varphi$ are the thresholds chosen to exclude runs of continuous gap bins and overhigh bins, respectively, to reduce the disturbance of bins far from the uniform distribution. Because the DLO dithering range is limited, not all continuous gaps can be filled, especially gaps without nonzero adjacent bins. The overhigh bins directly corresponding to the continuous gaps are difficult to transform to normal bins by weak dithering. Additionally, these two cases can also be introduced by the saturation effect, which causes an impulsive peak in the histogram of the original image. Thus, the histogram of a saturated image that has undergone HEDLO contains a number of continuous gaps and overhigh bins, resulting in a higher MUH than that of the ideal uniform histogram, and our scheme must eliminate these sources of interference. After calculating the MUH for a suspect image, the detection result can be obtained by thresholding. Images with MUH values less than the decision threshold are judged to be HEDLO images.
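One possible reading of the MUH computation is sketched below. The exact exclusion rule is an assumption based on the prose: runs of more than τ + 1 consecutive gap bins and bins higher than φ/256 are skipped, and the mean absolute deviation from 1/256 is taken over the remaining bins. The function name `muh` and the toy histograms are hypothetical.

```python
def muh(hist, tau=1, phi=4, levels=256):
    """Mean absolute deviation from 1/levels over the retained bins (a sketch)."""
    n_total = sum(hist)
    norm = [h / n_total for h in hist]
    keep = [True] * len(norm)
    s = 0
    while s < len(norm):
        if norm[s] == 0:
            e = s
            while e < len(norm) and norm[e] == 0:
                e += 1                        # e is one past the end of the gap run
            if e - s > tau + 1:               # assumed rule: drop long gap runs
                for j in range(s, e):
                    keep[j] = False
            s = e
        else:
            if norm[s] > phi / levels:        # assumed rule: drop overhigh bins
                keep[s] = False
            s += 1
    kept = [p for p, k in zip(norm, keep) if k]
    if not kept:
        raise ValueError("all bins were excluded by the thresholds")
    return sum(abs(p - 1 / levels) for p in kept) / len(kept)

uniform_hist = [1] * 256                      # ideal uniform histogram
comb_hist = [2, 0] * 128                      # HE-like comb with single gaps
print(muh(uniform_hist), muh(comb_hist))
```

In this sketch the uniform histogram scores zero and the comb scores 1/256, so a lower MUH indicates a histogram closer to uniform, matching the decision rule in the text.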
 4.2 Histogram Centroid
Fig. 4 illustrates two histograms of the Lena image that has undergone HE and the addition of Gaussian noise with standard deviations of δ=0.5 and 0.8. In Fig. 4(a), the gaps have been partially filled, and the high bins have been partially removed, producing what we refer to as a comb-shaped histogram. Simultaneously, the envelope of the histogram tends toward the uniform distribution. As the strength of the noise increases, the envelope of the histogram approaches a nearly uniform distribution, as shown in Fig. 4(b). Based on this observation, we believe that the comb-shaped histogram is an additional distinct fingerprint of HEDLO. The comb shape is a transitional shape between the histogram of the HE image and that of the HEDLO image. In this case, the capability to fill the gaps and remove the peaks is constrained, so the tendency toward a uniform histogram is obstructed, which makes the MUH feature less distinctive.
Gray-level histogram of the Lena image that has undergone HE and Gaussian noise with (a) δ=0.5 and (b) δ=0.8.
To characterize this specific feature, we consider the histogram to be a thin plate with uniform density. The histogram centroid (HC) is introduced to measure the height of the histogram envelope [20]. For the histogram of the unaltered image, its height can be preserved by one-for-one gray-level mapping, and it can even be increased by many-for-one mapping after manipulation by HE. The gap bins do not contribute to the alteration of this height. Therefore, the HC of the HE image histogram is higher than that of the unaltered image. The entire histogram decreases to an approximately uniform distribution due to HEDLO, such that a lower HC indicates the characteristic fingerprint of HEDLO. The rank of the low HC will be proven later. In addition, as with the histogram of any image, the distance between the centroid and the y axis is approximately the center of [0, 255]; hence, the x coordinate of HC cannot be a unique fingerprint of HEDLO. The y coordinate of HC is selected as a distinct fingerprint of HEDLO, which is defined as
$$HC = \frac{\sum_{i=0}^{n} \frac{H(i)}{2} H(i)}{\sum_{i=0}^{n} H(i)}$$
where $n$ is the maximum possible gray level with $n = 255$, and $H(i)$ is the $i$th bin of the normalized histogram. Because $\sum_{i=0}^{n} H(i) = 1$, we can rearrange the HC expression as follows:
$$HC = \frac{1}{2}\sum_{i=0}^{n} H(i)^2$$
Minimizing HC can reveal the relationship between each bin's probability and the ideal uniform distribution. This optimization problem has the following structure:
$$\min \ \frac{1}{2}\sum_{i=0}^{n} H(i)^2 \quad \text{s.t.} \quad \sum_{i=0}^{n} H(i) = 1$$
Using Lagrange multipliers, this problem can be converted into an unconstrained optimization problem:
$$\min \ \frac{1}{2}\sum_{i=0}^{n} H(i)^2 + \lambda\left(\sum_{i=0}^{n} H(i) - 1\right)$$
We compute the partial derivative of the unconstrained problem with respect to each variable and set each to zero. The solution of these equations is $\{H(0), H(1), \cdots, H(n)\} = 1/(n+1)$. Thus, the centroid of the uniform histogram lies at the lowest possible height above the x axis. Therefore, a lower HC indicates a histogram closer to the uniform distribution.
To distinguish between the histograms of the HE and HEDLO images, we preprocess the histogram by filling the gap bins with $\mu/256$ and then removing bins that are far greater than their adjacent bins using a min filter with window size $1 \times \Phi$. Finally, the decision threshold can be selected according to the user's accepted false alarm rate. An image with an HC less than the decision threshold is judged to be an HEDLO image.
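The HC computation with its preprocessing can be sketched as follows. The sketch treats the histogram as unit-width bars of uniform density, so the y coordinate of the centroid is the sum of squared heights divided by twice the total area. Dividing by the area after preprocessing, the border handling of the min filter, the function name `histogram_centroid` and the toy histograms are all assumptions; the original procedure may differ in these details.

```python
def histogram_centroid(hist, mu=1, window=3, levels=256):
    """y coordinate of the centroid of a preprocessed normalized histogram."""
    n_total = sum(hist)
    norm = [h / n_total for h in hist]
    # fill gap bins with mu / levels (mu = 0 disables the filling)
    filled = [p if p > 0 else mu / levels for p in norm]
    # 1 x window min filter, truncated at the borders (window = 1 disables it)
    half = window // 2
    smoothed = [min(filled[max(i - half, 0): i + half + 1])
                for i in range(len(filled))]
    area = sum(smoothed)
    # each bar of height p has its own centroid at height p / 2
    return 0.5 * sum(p * p for p in smoothed) / area

uniform_hist = [1] * 256                      # ideal uniform histogram
peaked_hist = [0] * 255 + [1000]              # all pixels in a single bin
hc_uniform = histogram_centroid(uniform_hist)
hc_peaked = histogram_centroid(peaked_hist, mu=0, window=1)
print(hc_uniform, hc_peaked)
```

The uniform histogram reaches the minimum height 1/(2 × 256), consistent with the Lagrange argument, while concentrating the mass in fewer bins raises the centroid.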
 4.3 LBP Feature
Using the MUH, HC or LR classifier, the HEDLO operation chain can be identified. However, the specific DLO that has been applied cannot be determined using only the histogram features. In this subsection, a local spatial feature is considered to capture the unique traces corresponding to different DLOs. Many image operations leave footprints in bit planes as well as across bit planes [22]. Because a DLO dithers the gray value only to a minor degree, we infer that the DLO leaves distinct traces mostly in the less significant bit planes. Similar to spatial filtering, an image manipulated by a DLO can be modeled as a convolution with a filter kernel in the local region. We regard the footprint left by a DLO as a local spatial pattern, which can be characterized using LBP [23] features. LBP features are histograms of the occurrence of specific local patterns. Consider a local neighborhood involving the center pixel $g_c$ and its corresponding $P$ equally spaced pixels $g_p$ ($p = 0, \cdots, P-1$) on a circle of radius $R$. The local neighborhood centered at every pixel is transformed into a binary pattern using the gray value of the center pixel as the threshold. Then, the LBP number of the center pixel is computed by weighting the $P$ directional neighbors as
$$LBP_{P,R} = \sum_{p=0}^{P-1} \ell(g_p - g_c \ge 0)\, 2^{p}$$
where $\ell(\cdot)$ is the indicator function. For each image, a $2^{P}$-bin histogram can be obtained based on these LBP numbers. When $P = 8$ and $R = 1$, 256 possible binary patterns can be calculated, and a 256-bin histogram is generated per image. For example, Fig. 5 displays the 256-bin LBP histograms of the original image and those of its manipulated versions. Notice that the LBP histograms of different HEDLO chains have significant discriminating characteristics. Considering that applying a DLO is equivalent to a convolution operation, the majority of the transformation kernels have a circularly symmetric structure. We therefore select the rotation-invariant LBP features to capture the unique footprints left by different DLO operations.
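The LBP computation for P = 8 and R = 1 can be sketched in a few lines. The sketch takes the eight nearest grid neighbors instead of interpolating the diagonal samples on the circle, and it uses the minimum over circular bit rotations as the rotation-invariant code; both choices, as well as the function names and the neighbor ordering, are assumptions and may differ from the variant used in the original experiments.

```python
# Neighbor offsets (dy, dx) in circular order; nearest-pixel sampling is assumed
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def lbp_code(img, y, x):
    """8-bit LBP number of the pixel at (y, x), thresholded at the center value."""
    center = img[y][x]
    code = 0
    for p, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= center:     # indicator of g_p - g_c >= 0
            code |= 1 << p                    # weight 2^p
    return code

def rotation_invariant(code, bits=8):
    """Smallest value among all circular bit rotations of the code."""
    mask = (1 << bits) - 1
    return min(((code >> i) | (code << (bits - i))) & mask for i in range(bits))

def lbp_histogram(img, invariant=False):
    """256-bin histogram of LBP numbers over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            code = lbp_code(img, y, x)
            hist[rotation_invariant(code) if invariant else code] += 1
    return hist

peak = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]      # center brighter than all neighbors
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]      # all neighbors equal to the center
print(lbp_code(peak, 1, 1), lbp_code(flat, 1, 1))
```

With rotation invariance, the 256 raw patterns collapse into 36 distinct codes, which makes the histogram insensitive to the orientation of a circularly symmetric kernel.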
LBP histograms for (a) the Lena image and its HEDLO-manipulated versions: (b) HE and average filtering with window size 3×3, (c) HE and resizing by 1.3, (d) HE and Laplacian filtering with alpha=0.4, and (e) HE and JPEG with quality factor 30.
5. Experimental Results
This section provides experimental procedures to verify the effectiveness of our proposed features, i.e., the MUH, HC, and LR classifier. Then, the combination of MUH, HC and LBP is used to train an SVM classifier using the LIBSVM tools [24]. The Uncompressed Color Image Database (UCID) [21] is used for training and testing. Without loss of generality, the green color layer of each of these images was used to create the training set and the testing set. To the best of our knowledge, previous works have not addressed the detection of HEDLO. Therefore, we cannot compare the performance of our approaches with other methods.
We process the 1338 uncompressed original images (ORI) in UCID with single operations or the HEDLO operation chain, using different settings for the window size, variance, gamma value and alpha parameter, as reported in Table 2. We create sets of images that have undergone single operations, i.e., HE, GC, RS, AVG, GN, LP_G, HP_L and JPEG, generated by manipulating images with HE, gamma correction, resizing, average filtering, Gaussian noise, low-pass Gaussian filtering, high-pass Laplacian filtering and JPEG compression, respectively. The combination of ORI and these eight image sets constitutes the ONE database. Additionally, the CHAIN database is introduced, containing the six sets of images that have undergone HEDLO, i.e., HERS, HEAVG, HEGN, HELP_G, HEHP_L and HEJPEG, which are generated by applying resizing, average filtering, Gaussian noise, low-pass Gaussian filtering, high-pass Laplacian filtering and JPEG compression to the HE images, respectively. Based on the capability to fill the gap bins and remove the high bins, CHAIN is split into two parts, i.e., CHAIN_A and CHAIN_B, to evaluate the performance of MUH and HC separately. CHAIN_A consists of applying Gaussian noise with δ=0.3, 0.6, 0.9 and the low-pass Gaussian filter [3×3], [5×5], δ=0.35, 0.4 to the HE images, and CHAIN_B is its complement. In other words, the histogram of an image in CHAIN_A may present a comb shape.
Operations used to create image database
 5.1 MUH and HC Detection Result
To evaluate the performance of MUH and HC for HEDLO detection, experiments were performed on CHAIN_A versus ONE and CHAIN_B versus ONE, respectively. Because the two features are scalar, the MUH or HC value extracted from each testing image was compared with the decision threshold to judge whether the image was manipulated by HEDLO. The area under the ROC curve (AUC) was employed to indicate the performance of our features and is summarized in Table 3 and Table 4.
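For a scalar feature with a sliding decision threshold, the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below computes it by pairwise comparison; because HEDLO images are expected to have lower MUH or HC values, the features are negated before scoring. The function name `auc` and all feature values are hypothetical and do not come from the experiments.

```python
def auc(pos_scores, neg_scores):
    """Probability that a positive outranks a negative; ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical MUH values: HEDLO images (positives) tend to score lower
muh_chain = [0.0010, 0.0020, 0.0015]
muh_one = [0.0030, 0.0018, 0.0040]
result = auc([-v for v in muh_chain], [-v for v in muh_one])
print(result)
```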
Table 3 indicates that MUH achieved its best performance using $\tau = 1$, $\varphi = 4$. That is, MUH is computed while disregarding runs of more than two continuous gap bins and overhigh bins with height above 4/256. Under these conditions, the AUC of CHAIN_B versus ONE reached 99%, but the AUC of CHAIN_A versus ONE was only 85%. Additionally, the detection performance improved as the value of $\varphi$ increased and remained stable for different values of $\tau$. This result indicates that excluding the overhigh bins is more important than excluding the continuous zero bins; the former are minimally transformed, and the latter can be filled easily.
Table 4 indicates that HC achieved its best performance with μ = 1, φ = 3, i.e., a fill value of 1/256 and a window size of 1×3. As expected, the AUC of CHAIN_A versus ONE reached 98.3%, so HC can complement MUH. The results indicate that filling the gap bins with too large a value, or min filtering with too large a window, introduces a disturbance that decreases the detection accuracy. Specifically, filling the gap bins lifts the centroid of the histogram and min filtering suppresses it, which leads to false negatives and false positives, respectively. As a result, the AUC stays at approximately 93% for CHAIN_B versus ONE.
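The two histogram preprocessing steps mentioned here, filling gap bins with a small value and min filtering with a 1×3 window, can be sketched as follows. This is only an illustration of the preprocessing, not the HC feature itself; the function name, the normalization of the histogram to unit sum and the edge padding at the histogram borders are assumptions.

```python
import numpy as np

def fill_and_min_filter(hist, fill_value=1 / 256, window=3):
    """Fill empty bins of a normalized histogram, then apply a 1-D min filter."""
    h = np.asarray(hist, dtype=float)
    h = h / h.sum()                      # normalize to unit sum (assumption)
    h = np.where(h == 0, fill_value, h)  # fill the gap bins
    pad = window // 2
    padded = np.pad(h, pad, mode="edge")  # border handling is an assumption
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return windows.min(axis=1)           # minimum over each 1 x window span

# A tiny 4-bin histogram without gaps, for demonstration only.
print(fill_and_min_filter([1, 2, 3, 2]))
```

A larger `fill_value` or `window` changes the histogram more strongly, which is consistent with the disturbance described above.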
AUC for detection of CHAIN_A vs. ONE and CHAIN_B vs. ONE using MUH
AUC for detection of CHAIN_A vs. ONE and CHAIN_B vs. ONE using HC
To assess the suitability of combining our two features, MUH and HC, a logistic regression (LR) classifier for detecting HEDLO was trained and tested on CHAIN versus ONE. The hypothesis function used for LR is defined as

$$h_\theta(x) = \frac{1}{1 + e^{-z}},$$

where the fitting function $z$ is

$$z = \theta_0 + \theta_1\,\mathrm{MUH} + \theta_2\,\mathrm{HC} + \theta_3\,\mathrm{MUH}^2 + \theta_4\,\mathrm{HC}^2 + \theta_5\,\mathrm{MUH}\times\mathrm{HC}.$$

The parameter vector $\theta = (\theta_0, \theta_1, \theta_2, \theta_3, \theta_4, \theta_5)$ can be estimated by minimizing the cost function $J(\theta)$, i.e.,

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y_i \log h_\theta(x_i) + (1 - y_i)\log\bigl(1 - h_\theta(x_i)\bigr)\right].$$

Here the subscript i denotes the i-th sample, m is the number of samples, y ∈ {0,1} is the classification label, and x is a 2-dimensional feature vector consisting of MUH and HC. The CHAIN and ONE databases were each separated into a training set and a testing set containing 40% and 60% of the corresponding database, respectively. The LR model was fitted iteratively on the MUH and HC values extracted from the training images to find the best parameter vector. Given this parameter vector and the feature vector extracted from a testing image, the classifier determines whether the image has undergone HEDLO.
Fig. 6 indicates that the LR classifier presents perfect performance for HEDLO detection.
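A minimal sketch of such a classifier is given below. It uses a logistic hypothesis with the six quadratic terms in MUH and HC and a 40%/60% train/test split. The synthetic feature values, the standard cross-entropy cost, the plain gradient-descent optimizer and its step size are all assumptions chosen for illustration; the optimizer used in the experiments is not specified in the text.

```python
import numpy as np

def quadratic_features(muh, hc):
    """Map (MUH, HC) to the terms [1, MUH, HC, MUH^2, HC^2, MUH*HC]."""
    muh = np.asarray(muh, dtype=float)
    hc = np.asarray(hc, dtype=float)
    return np.column_stack([np.ones_like(muh), muh, hc, muh**2, hc**2, muh * hc])

def hypothesis(theta, X):
    """Logistic hypothesis 1 / (1 + exp(-z)) with z = X @ theta."""
    return 1.0 / (1.0 + np.exp(-(X @ theta)))

def cost(theta, X, y):
    """Standard cross-entropy cost (assumed form)."""
    h = np.clip(hypothesis(theta, X), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))

def fit(X, y, learning_rate=0.5, iterations=5000):
    """Batch gradient descent; optimizer and settings are assumptions."""
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        theta -= learning_rate * (X.T @ (hypothesis(theta, X) - y)) / len(y)
    return theta

# Synthetic stand-ins for feature values (label 1 = chain, 0 = non-chain).
rng = np.random.default_rng(0)
n = 500
muh = np.concatenate([rng.normal(0.8, 0.05, n), rng.normal(0.3, 0.10, n)])
hc = np.concatenate([rng.normal(0.5, 0.05, n), rng.normal(0.3, 0.10, n)])
y = np.concatenate([np.ones(n), np.zeros(n)])
X = quadratic_features(muh, hc)

# 40% of the samples for training, the remaining 60% for testing.
order = rng.permutation(len(y))
n_train = int(0.4 * len(y))
train, test = order[:n_train], order[n_train:]

theta = fit(X[train], y[train])
predicted = hypothesis(theta, X[test]) >= 0.5
test_accuracy = float(np.mean(predicted == (y[test] == 1)))
print("test accuracy:", test_accuracy)
```

Varying the 0.5 threshold on the hypothesis output traces out an ROC curve for the classifier.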
ROC curve for the classification of CHAIN versus ONE using LR classifier
 5.2 HEDLO classification
We demonstrate the performance of LBP in two scenarios. In the first scenario, different LBP variants are applied to differentiate the six categories of CHAIN. We consider various combinations of P and R with three LBP features, i.e., uniform LBP, rotation-invariant LBP and uniform rotation-invariant LBP.
In the second scenario, the combination of MUH, HC and LBP is applied to differentiate seven classes, corresponding to ONE and the six categories of CHAIN. The ONE and CHAIN databases are split into halves to generate the SVM training and testing sets. An SVM classifier with a radial basis function (RBF) kernel was trained, and the best kernel parameters were found by a grid search using five-fold cross-validation.
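The training procedure of the second scenario can be sketched with scikit-learn, whose `SVC` class is built on LIBSVM. The synthetic seven-class data, the 38-dimensional feature size, the feature standardization step and the parameter grid below are assumptions for illustration only; they do not reproduce the feature values or the grid used in the experiments.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the feature vectors of seven classes.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 7, 60, 38
centers = rng.normal(0.0, 3.0, (n_classes, n_features))
y = np.repeat(np.arange(n_classes), n_per_class)
X = centers[y] + rng.normal(0.0, 1.0, (len(y), n_features))

# Split the data into halves for training and testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Grid search over the RBF-kernel parameters with five-fold cross-validation.
param_grid = {
    "svc__C": [2.0**k for k in range(-1, 6, 2)],
    "svc__gamma": [2.0**k for k in range(-7, 0, 2)],
}
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")), param_grid, cv=5)
search.fit(X_train, y_train)

accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("test accuracy:", accuracy)
```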
The classification accuracies of the first scenario are summarized in Table 5. LBP achieved its best performance using rotation-invariant pattern features with P = 8, R = 1, for which the overall accuracy reached 86.2%. It can be observed that the “uniform” patterns alone could not completely characterize the structure of DLO; the “nonuniform” patterns carry additional information about DLO. The results confirm our conjecture that DLO is characterized by rotation invariance, as the filters involved in DLO have a circularly symmetric structure. An additional observation is that the classification accuracy is not improved by extending the radius of LBP. Therefore, DLO appears to leave most of its traces in the immediate neighborhood of each pixel rather than in regions farther from the center pixel. The classification accuracies of the second scenario are displayed in Fig. 7. Although ONE contains far more image sets than CHAIN, the classification accuracy for ONE reaches 95.4%, indicating that the feature combination of MUH, HC and LBP differentiates operation chain images from the original and single-operation images very well. In addition, the combination of the three features achieves excellent classification accuracy in determining which HEDLO an image has undergone.
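To make the rotation-invariant LBP with P = 8, R = 1 concrete, the sketch below thresholds the eight neighbors of each pixel against the center, maps each 8-bit code to the minimum over its circular bit rotations, and builds a normalized histogram of the resulting 36 pattern labels. Using the 3×3 square neighborhood without interpolating the diagonal neighbors is a simplifying assumption, and the function names are hypothetical.

```python
import numpy as np

def rotation_invariant_code(code, bits=8):
    """Smallest value among all circular bit rotations of an LBP code."""
    mask = (1 << bits) - 1
    return min(((code >> k) | (code << (bits - k))) & mask for k in range(bits))

# Lookup table from raw 8-bit codes to rotation-invariant labels.
RI_LUT = np.array([rotation_invariant_code(c) for c in range(256)], dtype=np.uint8)
RI_LABELS = np.unique(RI_LUT)  # the distinct rotation-invariant patterns

def lbp_ri_histogram(img):
    """Normalized histogram of rotation-invariant LBP labels (P = 8, R = 1)."""
    img = np.asarray(img, dtype=np.int32)
    height, width = img.shape
    center = img[1:-1, 1:-1]
    # Neighbors listed in circular order, starting from the east neighbor.
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
    codes = np.zeros(center.shape, dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:height - 1 + dy, 1 + dx:width - 1 + dx]
        codes |= (neighbor >= center).astype(np.int32) << bit
    labels = RI_LUT[codes]
    hist = np.array([(labels == v).sum() for v in RI_LABELS], dtype=float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
image = rng.integers(0, 256, (64, 64))
feature = lbp_ri_histogram(image)
print(len(feature))
```

With this square neighborhood, rotating the image by 90 degrees shifts every code circularly by two bits, so the histogram is unchanged.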
Classification accuracies (%) for six HEDLO
Classification accuracies (%) for seven classes using the combination of MUH, HC and LBP
 5.3 Application to Image Forgery Detection
When performing the forensic detection of HEDLO, our method can identify whether the full-frame image has gone through an operation chain. In real forensic scenarios, it is more important to determine whether an operation chain has been applied locally, e.g., in cut-and-paste forgery forensics. Two classical scenarios arise from different forgery production procedures. In the first, a region from an HEDLO image is pasted onto a host image. In the second, a region from an image processed by HE is pasted onto a host image, and the composite image is then manipulated by DLO. In both scenarios, the tampered region has gone through HEDLO, whereas the remaining region either remains original or has undergone only DLO. In this subsection, the LR classifier is used to detect and locate the tampered region of a forged image. The applied HEDLO can then be determined by the combination of MUH, HC and LBP.
To determine which block sizes are sufficient for block-wise localization of the forged region, we performed the following experiment. Blocks of size 300×300, 200×200 and 100×100 were cropped from the center of each image in the ONE and CHAIN databases described at the beginning of this section. Each block was then classified by the LR classifier as having been manipulated by HEDLO or not, using a variety of thresholds. The ROC curves shown in Fig. 8 indicate that HEDLO can be detected relatively reliably using a block size of 300×300. Because the features describe the character of the global histogram, the probability of detection decreases with decreasing block size. When the block size is 100×100, detection performance drops to the level of random guessing.
ROC curves obtained using different testing block sizes for images manipulated by HEDLO
An example of a cut-and-paste forgery in which the pasted region has gone through HERS is displayed in Fig. 9, along with the tampered regions localized by our proposed forensic technique. Fig. 9(a) displays an HE image from which an object (a boy on the right) was cut. Fig. 9(b) displays the unaltered image into which the cut object was pasted. Fig. 9(c) displays the composite image, into which the cut object was pasted after being resized with a scaling factor of 0.7. Adobe Photoshop was used to composite the forged image. The image was segmented into overlapping 300×300 pixel blocks at a 16-pixel interval, each of which was tested for evidence of locally applied HEDLO. Fig. 9(d) displays the results of the classification using the LR classifier. The blocks detected as HEDLO are outlined in red. In this example, some parts of the tower raise false alarms because these blocks contain widely varying gray levels, e.g., the black shadow of the tower, the tree beside the tower and the white tower body, which make the histogram similar to a uniform distribution. Then, the feature set of MUH, HC and LBP extracted from the blocks highlighted by the LR classifier is fed into the SVM classifier trained in the second scenario of Section 5.2. The blocks classified as HERS and HELP_G are displayed in Fig. 9(e) and (f), respectively. It can be observed that the main part of the tampered region has been located. In Fig. 9(f), the classification result is disturbed by the texture of the boy’s sweater, which presents similar blur artifacts; nevertheless, the possible operation chains have been reduced to two types of HEDLO. Although our scheme does not fully resolve this example, the results are encouraging.
Image forgery detection example: an object cut from (a) the histogram-equalized image is pasted into (b) another unaltered image, yielding (c) the composite image, which is then manipulated by AVG. (d) Blocks detected by the LR classifier as having been manipulated by HEDLO, outlined in red, using 300×300 pixel blocks. The detected blocks are further classified, using the combined features, as manipulated by (e) HERS and (f) HELP_G.
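The block-wise scan used in this example, with overlapping 300×300 blocks at a 16-pixel interval, can be sketched as follows. The function names and the `is_hedlo` callback, which stands in for a trained block classifier, are hypothetical; the image is represented as a plain list of rows to keep the sketch dependency-free.

```python
def block_origins(height, width, block=300, step=16):
    """Top-left corners of all overlapping blocks that fit inside the image."""
    return [(top, left)
            for top in range(0, height - block + 1, step)
            for left in range(0, width - block + 1, step)]

def flag_blocks(image, is_hedlo, block=300, step=16):
    """Return the origins of blocks for which the classifier callback fires.

    `image` is a list of rows; `is_hedlo` is a hypothetical callable that
    takes one block and returns True if the block is judged to be HEDLO.
    """
    height, width = len(image), len(image[0])
    flagged = []
    for top, left in block_origins(height, width, block, step):
        patch = [row[left:left + block] for row in image[top:top + block]]
        if is_hedlo(patch):
            flagged.append((top, left))
    return flagged

# Toy usage: a 4x4 image scanned with 2x2 blocks and a dummy classifier.
toy = [[0] * 4 for _ in range(4)]
toy[2][2] = 1
print(flag_blocks(toy, lambda patch: patch[0][0] == 1, block=2, step=2))
```

An image smaller than the block size yields no blocks, which matches the observation that small blocks do not carry enough histogram information for reliable detection.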
6. Conclusion
In this paper, a new forensic algorithm for detecting the HEDLO operation chain applied to a digital image has been proposed. The method is based on distinctive histogram distributions and local spatial patterns. Two histogram features were introduced and used in a scheme for identifying whether an image has undergone the operation chain. A logistic regression classifier is trained with these two features to detect HEDLO. The combination of the two histogram features and LBP further distinguishes HEDLO images from the original images and their single-operation versions. Experiments indicated that MUH and HC provide excellent performance when histograms present a uniform distribution and a transitional shape, respectively. The LR classifier, which incorporates the advantages of both MUH and HC, exhibited perfect detection performance in the two scenarios. The results indicate that the combination of MUH, HC and LBP is able to correctly identify traces of different HEDLO. We extended this technique to detect locally applied HEDLO and demonstrated its usefulness for detecting cut-and-paste forgeries.
BIO
Zhipeng Chen received the B.S. and M.E. degrees in computer science and technology from Hebei University of Economics and Business, China, in 2003, and North China University of Technology, China, in 2008, respectively. He is currently a Ph.D. candidate at the Institute of Information Science, Beijing Jiaotong University, Beijing, China, and works at Tangshan Normal University, China. His research interests include multimedia signal processing, digital forensics and data hiding.
Yao Zhao received the B.S. degree in radio engineering from Fuzhou University, Fuzhou, China, in 1989, the M.E. degree in radio engineering from Southeast University, Nanjing, China, in 1992, and the Ph.D. degree from the Institute of Information Science, Beijing Jiaotong University (BJTU), Beijing, China, in 1996. He was an Associate Professor at BJTU in 1998, where he became a Professor in 2001. He was a Senior Research Fellow with the Information and Communication Theory Group, Faculty of Information Technology and Systems, Delft University of Technology, Delft, The Netherlands, from 2001 to 2002. He is currently the Director with the Institute of Information Science, BJTU. His current research interests include image/video coding, digital watermarking and forensics, and video analysis and understanding. He is currently leading several national research projects from the 973 Program, 863 Program, and the National Science Foundation of China. He serves on the editorial boards of several international journals, including as an Associate Editor of the IEEE TRANSACTIONS ON CYBERNETICS, Associate Editor of the IEEE SIGNAL PROCESSING LETTERS, and Area Editor of Signal Processing: Image Communication. Dr. Zhao was a recipient of the Distinguished Young Scholar by the National Science Foundation of China in 2010 and Chang Jiang Scholar of the Ministry of Education of China in 2013.
Rongrong Ni received the B.S. and Ph.D. degrees from Beijing Jiaotong University (BJTU), Beijing, China, in 1998 and 2005, respectively. Since 2005, she has been on the faculty of the School of Computer and Information Technology and the Institute of Information Science, BJTU, where she has been a Professor since 2013. Her current research interests include image processing, data hiding and digital forensics, pattern recognition, and computer vision. She was selected for the Beijing Science and Technology Stars Projects in 2008, and was awarded the Jeme Tien Yow Special Prize in Science and Technology in 2009. She is the Principal Investigator of three projects funded by the Natural Science Foundation of China. She has participated in the 973 Program, the 863 Program, and international projects. She has published more than 80 papers in academic journals and conferences, and holds six national patents.