cs0501014/PVEA.tex
1: \documentclass[final]{IEEEtran}
2: 
3: \usepackage{amsmath,amssymb,bm}
4: \usepackage{graphicx}
5: \usepackage{cite,url}
6: 
7: \newlength\figwidth
8: \setlength\figwidth{0.32\columnwidth}
9: 
10: \begin{document}
11: 
12: \title{On the Design of Perceptual MPEG-Video Encryption Algorithms%
13: \thanks{Copyright (c) 2006 IEEE. Personal use of this material is
14: permitted. However, permission to use this material for any other
15: purposes must be obtained from the IEEE by sending an email to
16: \texttt{pubs-permissions@ieee.org}.}
17: \thanks{This research was partially supported by the City University
18: of Hong Kong SRG grant 7001702, by The Hong Kong Polytechnic
19: University's Postdoctoral Fellowships Scheme under grant no. G-YX63,
20: by the Research Grant Council of Hong Kong under grant no. PolyU
21: 5232/06E, and by the US NSF grants ANI-0219110 and RIS-0292890.}}
22: \author{Shujun Li\thanks{Shujun Li and Kwok-Tung Lo are with the
23: Department of Electronic and Information Engineering, The Hong Kong
24: Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR, China.},
25: Guanrong Chen,~\IEEEmembership{Fellow, IEEE}\thanks{Guanrong Chen is
26: with the Department of Electronic Engineering, City University of
27: Hong Kong, 83 Tat Chee Avenue, Kowloon Tong, Hong Kong SAR, China.},
28: Albert Cheung,~\IEEEmembership{Member, IEEE}\thanks{Albert Cheung is
29: with the Department of Building and Construction and Shenzhen
30: Applied R\&D Centres, City University of Hong Kong, Kowloon Tong,
31: Hong Kong SAR, China.}, Bharat Bhargava,~\IEEEmembership{Fellow,
32: IEEE}\thanks{Bharat Bhargava is with the Department of Computer
33: Sciences, Purdue University, 250 N. University Street, West
34: Lafayette, IN 47907-2066, USA.} and Kwok-Tung
35: Lo,~\IEEEmembership{Member, IEEE}}
36: 
37: \markboth{IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO
38: TECHNOLOGY, VOL. 17, NO. 2, PAGES 214-223, FEBRUARY 2007}{Shujun Li
39: \MakeLowercase{\textit{et al.}}: Perceptual MPEG Encryption}
40: 
41: \maketitle
42: 
43: \begin{abstract}
44: In this paper, some existing perceptual encryption algorithms for
45: MPEG videos are reviewed and some problems, especially security
46: defects of two recently proposed MPEG-video perceptual encryption
47: schemes, are pointed out. Then, a simpler and more effective design
48: is suggested, which selectively encrypts fixed-length codewords
49: (FLC) in MPEG-video bitstreams under the control of three
50: perceptibility factors. The proposed design is actually an
51: encryption configuration that can work with any stream cipher or
52: block cipher. Compared with the previously-proposed schemes, the new
53: design provides more useful features, such as strict
54: size-preservation, on-the-fly encryption and multiple
55: perceptibility, which make it possible to support more applications
56: with different requirements. In addition, four different measures
57: are suggested to provide better security against
58: known/chosen-plaintext attacks.
59: \end{abstract}
60: 
61: \begin{keywords}
62: perceptual encryption, MPEG, fixed-length codeword (FLC),
63: cryptanalysis, known/chosen-plaintext attack
64: \end{keywords}
65: 
66: \section{Introduction}
67: 
68: The wide use of digital images and videos in various applications
69: brings serious attention to the security and privacy issues today.
70: Many different encryption algorithms have been proposed in recent
71: years as possible solutions to the protection of digital images and
72: videos, among which MPEG videos attract the most attention due to their
73: prominent prevalence in consumer electronic markets
74: \cite{Zeng:MultimediaSecurity:Book2006,
75: Ahl:ImageVideoEncryption:Book2005,
76: Furht:MultimediaSecurity:Book2005,
77: Furht:ImageVideoEncryption:Handbook2004,
78: Li:ChaosImageVideoEncryption:Handbook2004}.
79: 
80: In many applications, such as pay-per-view videos, pay-TV and video
81: on demand (VoD), the following feature called ``perceptual
82: encryption" is useful. This feature requires that the quality of
83: aural/visual data is only \textit{partially} degraded by encryption,
84: i.e., the encrypted multimedia data are still partially perceptible
85: after encryption. Such perceptibility makes it possible for
86: potential users to listen/view low-quality versions of the
87: multimedia products before buying them. It is desirable that the
88: aural/visual quality degradation can be continuously controlled by a
89: factor $p$, which generally denotes a percentage corresponding to
90: the encryption strength. Figure~\ref{figure:PE} shows a diagrammatic
91: view of perceptual encryption. The encryption key is kept secret
92: (not needed when public-key ciphers are used) but the control factor
93: $p$ can be published.
94: 
95: \begin{figure}
96: \centering
97: \includegraphics[width=\columnwidth]{Fig1}
98: \caption{A diagrammatic view of the perceptual
99: encryption.}\label{figure:PE}
100: \end{figure}
101: 
102: Regarding the visual quality degradation of the encrypted videos,
103: the following points should be noted: 1) since there does not
104: exist a well-accepted objective measure of visual quality of digital
105: images and videos, the control factor is generally chosen to
106: represent a rough measure of the degradation; 2) the visual quality
107: degradations of different frames may be different, so the control
108: factor works only in an average sense for all videos; 3) the control
109: factor is generally selected to facilitate the implementation of the
110: encryption scheme, which may not have a linear relationship with the
111: visual quality degradation (but a larger value always means a
112: stronger degradation); 4) when the control factor $p=1$, the
113: strongest visual quality degradation of the specific algorithm
114: (i.e., of the target application) is reached, but it may not be the
115: strongest degradation that all algorithms can produce (i.e., all
116: visual information of the video is completely concealed).
117: 
118: In recent years, some perceptual encryption schemes have been
119: proposed for G.729 speech
120: \cite{Servetti:PerceptualSpeech:ICASSP2002,
121: Servetti:PerceptualSpeech:IEEETSAP2002}, MP3 music
122: \cite{Torrubia:PerceptualMP3:IEEETCE2002}, JPEG images
123: \cite{Belgian:SelectiveImageEncryption:ACIVS2002,
124: Torrubia:PerceptualJPEG:ICCE2003}, wavelet-compressed (such as
125: JPEG2000) images and videos
126: \cite{Lian:PerceptualCryptography:ICME2004,
127: Lian:PerceptualCryptography:CIT2004,
128: Lian:PerceptualCryptography:ISIMP04} and MPEG videos
129: \cite{Dittmann:EnablingMPEG:LNCS97, Yann:WaterScrambling:ACMMM2002,
130: Turk:MPEG2Scrambling:IEEETCE2002,
131: Chinese:MPEG2Scrambling:IEEETCE2003}, respectively. The selective
132: encryption algorithms proposed in
133: \cite{Pommer&Uhl:WPSelectiveEncryption:SPS2002,
134: Pommer&Uhl:SelectiveWaveletEncryption:MS2003,
135: Pommer:SelectiveWaveletEncryption:Thesis2003} can be considered as
136: special cases of the perceptual encryption for images compressed
137: with wavelet packet decomposition. In some research papers, a
138: different term, ``transparent encryption", is used instead of
139: ``perceptual encryption" \cite{Turk:MPEG2Scrambling:IEEETCE2002,
140: Chinese:MPEG2Scrambling:IEEETCE2003}, emphasizing the fact that the
141: encrypted multimedia data are \textit{transparent} to all
142: standard-compliant decoders. However, transparency is actually an
143: equivalent of another feature called ``format-compliance" (or
144: ``syntax-awareness") \cite{Zeng:VideoScrambling:MMSP2001,
145: Zeng:VideoScrambling:IEEETCASVT2002}, which does not mean that some
146: partial perceptible information in plaintexts still remains in
147: ciphertexts. In other words, a perceptual cipher must be a
148: transparent cipher, but a transparent cipher may not be a perceptual
149: cipher \cite{Li:ChaosImageVideoEncryption:Handbook2004}. Generally,
150: perceptual encryption is realized by selective encryption algorithms
151: with the format-compliant feature. This paper chooses to use the
152: name of ``perceptual encryption" for such a useful feature of
153: multimedia encryption algorithms. More precisely, this paper focuses
154: on the perceptual encryption of MPEG videos. After identifying some
155: problems of the existing perceptual encryption schemes, a more
156: effective design of perceptual MPEG-video encryption will be
157: proposed.
158: 
159: The rest of this paper is organized as follows. The next section
160: will provide a brief survey of related work and point out some
161: problems, especially problems existing in two recently-proposed
162: perceptual encryption algorithms
163: \cite{Turk:MPEG2Scrambling:IEEETCE2002,
164: Chinese:MPEG2Scrambling:IEEETCE2003}. In Section
165: \ref{section:NewDesign}, the video encryption algorithm (VEA)
166: proposed in \cite{Shi:MPEGEncryption:MMTA2004} is generalized to
167: realize a new perceptual encryption design for MPEG videos, called
168: the perceptual VEA (PVEA). Experimental study is presented in Sec.
169: \ref{section:Experiments}, to show the encryption performance of
170: PVEA. The last section presents the conclusion.
171: 
172: \section{Related Work and Existing Problems}
173: 
174: \subsection{Scalability-based perceptual encryption}
175: 
176: Owing to the scalability provided in MPEG-2/4 standards
177: \cite{MPEG2-ISOStandard, MPEG4-ISOStandard}, it is natural to
178: realize perceptual encryption by encrypting the enhancement
179: layer(s) of an MPEG video (but leaving the base layer unencrypted)
180: \cite{Dittmann:EnablingMPEG:LNCS97}. However, since not all MPEG
181: videos are encoded with multiple layers, this scheme is quite
182: limited in practice. More general designs should be developed to
183: support videos that are compliant with the MPEG standards.
184: 
185: \subsection{Perceptual encryption for JPEG images}
186: 
187: Due to the similarity between the encoding of JPEG images
188: \cite{JPEG-ISOStandard} and the frame-encoding of MPEG videos
189: \cite{MPEG1-ISOStandard, MPEG2-ISOStandard, MPEG4-ISOStandard},
190: the ideas of perceptual encryption for JPEG images can be easily
191: extended to MPEG videos.
192: 
193: In \cite{Belgian:SelectiveImageEncryption:ACIVS2002}, two
194: techniques of perceptual encryption were studied: encrypting
195: selective bit-planes of uncompressed gray-scale images, and
196: encrypting selective high-frequency AC coefficients of JPEG
197: images, with a block cipher such as DES, triple-DES or IDEA
198: \cite{Schneier:AppliedCryptography96}. The continuous control of
199: the visual quality degradation was not discussed, however.
200: 
201: In \cite{Torrubia:PerceptualJPEG:ICCE2003}, the perceptual
202: encryption of JPEG images is realized by encrypting VLCs
203: (variable-length codewords) of partial AC coefficients in a ZoE
204: (zone of encryption) to be other VLCs in the Huffman table. The
205: visual quality degradation is controlled via an encryption
206: probability, $p/100\in[0,1]$, where $p\in\{0,\cdots,100\}$. This
207: encryption idea is similar to the video encryption algorithm
208: proposed in \cite{Zeng:VideoScrambling:IEEETCASVT2002}. The main
209: problem with encrypting VLCs is that the size of the encrypted
210: image/video will be increased since the Huffman entropy
211: compression is actually discarded in this algorithm.
212: 
213: \subsection{Perceptual encryption for wavelet-compressed images and
214: videos}
215: 
216: In \cite{Lian:PerceptualCryptography:ICME2004,
217: Lian:PerceptualCryptography:ISIMP04,
218: Lian:PerceptualCryptography:CIT2004}, several perceptual
219: encryption schemes for wavelet-compressed images and videos were
220: proposed. Under the control of a percentage ratio $q$, sign bit
221: scrambling and secret permutations of wavelet
222: coefficients/blocks/bit-planes are combined to realize perceptual
223: encryption. The problem with these perceptual encryption schemes
224: is that the secret permutations are not sufficiently secure
225: against known/chosen-plaintext attacks
226: \cite{Jan-Tseng:SCAN:IPL1996, Yu-Chang:SCAN:PRL2002,
227: Zhao:PositionPermute:ZJUS2004, Li:AttackingPOMC2004}: by comparing
228: the absolute values of a number of plaintexts and ciphertexts, one
229: can reconstruct the secret permutations. Once the secret
230: permutations are removed, the encryption performance will be
231: significantly compromised.
232: 
233: \subsection{Perceptual encryption of motion vectors in MPEG-videos}
234: 
235: In \cite{Yann:WaterScrambling:ACMMM2002}, motion vectors are
236: scrambled to realize perceptual encryption of MPEG-2 videos. Since
237: I-frames do not depend on motion vectors, such a perceptual
238: encryption algorithm can only blur the motions of MPEG videos. It
239: cannot provide enough degradation of the visual quality of the MPEG
240: videos for encryption (see Fig.~\ref{figure:Carphone2} of this
241: paper). Generally speaking, this algorithm can be used as an option
242: for further enhancing the performance of a perceptual encryption
243: scheme based on other techniques.
244: 
245: \subsection{Pazarci-Dip\c{c}in scheme}
246: 
247: In \cite{Turk:MPEG2Scrambling:IEEETCE2002}, Pazarci and Dip\c{c}in
248: proposed an MPEG-2 perceptual encryption scheme, which encrypts
249: the video in the RGB color space via four secret linear transforms
250: before the video is compressed by the MPEG-2 encoder. To encrypt
251: the RGB-format uncompressed video, each frame is divided into
252: $M\times M$ scrambling blocks (SB), each of which is composed of multiple
253: macroblocks of size $16\times 16$. Assuming the input and the
254: output pixel values are $x_i$ and $x_o$, respectively, the four
255: linear transforms are described as follows:
256: \begin{equation}
257: x_o=\begin{cases}
258: \alpha x_i, & (D,N)=(0,0),\\
259: FS-\alpha x_i, & (D,N)=(0,1),\\
260: FS(1-\alpha)+\alpha x_i, & (D,N)=(1,0),\\
261: FS-[FS(1-\alpha)+\alpha x_i], & (D,N)=(1,1),
262: \end{cases}\label{equation:TurkeyCipher}
263: \end{equation}
264: where $\alpha=\alpha^*/100\;(\alpha^*\in\{50,\cdots,90\})$ is a
265: factor controlling the visual quality degradation, $D,N$ are two
266: binary parameters that determine an affine transform for encryption,
267: and $FS$ means the maximal pixel value (for example, $FS=255$ for
268: 8-bit RGB-videos). The value of $\alpha$ in each SB is calculated
269: from the preceding I-frame, with a function called $\alpha$-rule
270: (see Sec.~2.2 of \cite{Turk:MPEG2Scrambling:IEEETCE2002} for more
271: details). The $\alpha$-rule and its parameters are designated to be
272: the secret key of this scheme.
273: 
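As a minimal sketch of the decryption side, which is not spelled out
above, and assuming exact arithmetic with no rounding or compression
loss, each branch of Eq.~(\ref{equation:TurkeyCipher}) can be
inverted for $\alpha\neq 0$:
\begin{equation*}
x_i=\begin{cases}
x_o/\alpha, & (D,N)=(0,0),\\
(FS-x_o)/\alpha, & (D,N)=(0,1),\\
\left[x_o-FS(1-\alpha)\right]/\alpha, & (D,N)=(1,0),\\
FS-x_o/\alpha, & (D,N)=(1,1).
\end{cases}
\end{equation*}
The last line follows because
$FS-[FS(1-\alpha)+\alpha x_i]=\alpha(FS-x_i)$. For $x_i\in[0,FS]$,
the cases $(0,0)$ and $(1,1)$ map into $[0,\alpha FS]$, while
$(0,1)$ and $(1,0)$ map into $[FS(1-\alpha),FS]$; in every case the
output spans a range of width $\alpha FS$, which is narrower than
the input range whenever $\alpha<1$.
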
274: The main merit of the Pazarci-Dip\c{c}in scheme is that the
275: encryption/decryption and the MPEG encoding/decoding processes are
276: separated, which means that the encryption part can simply be added
277: to an MPEG system without any modification. However, the following
278: defects make this scheme problematic in real applications.
279: 
280: \begin{enumerate}
281: \item Unrecoverable quality loss caused by the encryption always
282: exists, unless $\alpha=1$ (which corresponds to no encryption). Even
283: authorized users who know the secret key cannot recover the video
284: with the original quality. Although it is claimed in
285: \cite{Turk:MPEG2Scrambling:IEEETCE2002} that human eyes are not
286: sensitive to such a quality loss if $\alpha$ is set above 0.5, it
287: may still be undesirable for high-quality video services, such as
288: DVD and HDTV. In addition, limiting the value of $\alpha$ lowers the
289: security and flexibility of the encryption scheme.
290: 
291: \item The compression ratio may be significantly influenced by
292: encryption if there are fast motions in the plain videos. This is
293: because the motion compensation algorithm may fail to work for
294: encrypted videos. The main reason is that the corresponding SBs
295: may be encrypted with different parameters. To reduce this kind of
296: influence, the encryption parameters of all SBs have to be
297: sufficiently close to each other. This, however, compromises the
298: encryption performance and the security.
299: 
300: \item The scheme is not suitable for encrypting MPEG-compressed
301: videos. In many applications, such as VoD services, the
302: plain-videos have already been compressed in MPEG format and
303: stored in digital storage media (DSM). In this case, the
304: Pazarci-Dip\c{c}in scheme becomes too expensive and slow, since
305: the videos have to be first decoded, then encrypted, and finally
306: encoded again. Note that the re-encoding may reduce the video
307: quality, since the encoder is generally different from the
308: original one that produced the videos in the factory. Apparently,
309: this defect is a natural side effect of the merit of the
310: Pazarci-Dip\c{c}in scheme.
311: 
312: \item The scheme is not secure enough against brute-force attacks.
313: For a given color component $C$ of any $2\times 2$ SB structure, one
314: can exhaustively guess the $\alpha$-values of the four SBs to
315: recover the $2\times 2$ SB structure, by minimizing the block
316: artifacts occurring between adjacent SBs. For each color component
317: of a SB, the value of $\alpha^*=100\alpha\in\{50,\cdots,90\}$,
318: $D\in\{0,1\}$ and $N$ is determined by $D$, so one can calculate
319: that the searching complexity is only $(41\times 2)^4\approx
320: 2^{25.4}$, which is sufficiently small for PCs\footnote{Even when
321: $\alpha^*\in\{0,\cdots,100\}$, the searching complexity is only
322: $(101\times 2)^4\approx 2^{30.6}$, which is still practically
323: small.}. Once the value of $\alpha$ of an SB is obtained, one can
324: further break the secret key of the corresponding $\alpha$-rule. For
325: the exemplified $\alpha$-rule given in Eq. (3) of
326: \cite{Turk:MPEG2Scrambling:IEEETCE2002}, the secret key consists of
327: the addresses of two selected subblocks (of size $P\times P$) in a
328: $2\times 2$ SB structure, and a binary shift value
329: $\Delta\in\{5,50\}$. Because $\Delta$ can be uniquely determined
330: from $D$, one only needs to search the other part of the key, which
331: corresponds to a complexity of
332: $\left(3\times\left(2M/P\right)^2\right)^2$. When $P=\frac{M}{2}$,
333: the complexity is $48^2=2304\approx 2^{11.2}$, and when
334: $P=\frac{M}{4}$ it is $192^2=36864\approx 2^{15.2}$. Apparently, the
335: key space is not sufficiently large to resist brute-force attacks,
336: either. In addition, since the values of quality factors and the
337: secret parameters corresponding to the three color components can be
338: separately guessed, the whole attack complexity is only three times
339: the above values, which is still too small from a cryptographic
340: point of view \cite{Schneier:AppliedCryptography96}. Although using
341: multiple secret keys for different SBs can increase the attack
342: complexity exponentially, the key size will be too long and the
343: key-management will become more complicated. Here, note that the
344: $\alpha$-rule itself should not be considered as part of the key,
345: following the well-known Kerckhoffs' principle in modern
346: cryptography \cite{Schneier:AppliedCryptography96}.
347: 
348: \item The scheme is not sufficiently sensitive to the mismatch of
349: the secret key, since the encryption transforms and the
350: $\alpha$-rule given in \cite{Turk:MPEG2Scrambling:IEEETCE2002} are
351: both linear functions. This means that the security against
352: brute-force attacks will be further compromised, as an approximate
353: value of $\alpha$ may be enough to recover most visual information
354: in the plain-video.
355: 
356: \item The scheme is not secure enough against
357: known/chosen-plaintext attacks. This is because the value of
358: $\alpha$ can be derived approximately from the linear relation
359: between the plain pixel-values and the cipher pixel-values in the
360: same SB. Similarly, the value $N$ can be derived from the sign of
361: the slope of the linear map between $x_i$ and $x_o$, and the value
362: of $D$ can be derived from the value range of the map.
363: Furthermore, assuming that there are $k$ secret parameters in the
364: $\alpha$-rule, if more than $k$ different values of $\alpha$ are
365: determined as above, it is possible to uniquely solve the
366: approximate values of the $k$ secret parameters. To resist
367: known/chosen-plaintext attacks, the secret key has to be changed
368: more frequently than that suggested in
369: \cite{Turk:MPEG2Scrambling:IEEETCE2002} (one key per program),
370: which will increase the computational burden of the servers
371: (especially the key-management system).
372: \end{enumerate}
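The known-plaintext argument in the last item can be sketched as
follows, under the idealized assumption that two plain/cipher pixel
pairs $(x_i^{(1)},x_o^{(1)})$ and $(x_i^{(2)},x_o^{(2)})$ with
$x_i^{(1)}\neq x_i^{(2)}$ are available from the same SB and color
component, and that the loss from MPEG compression is ignored (the
superscripts and the symbols $s$ and $c$ are notation introduced
here for illustration only). Writing each branch of
Eq.~(\ref{equation:TurkeyCipher}) as $x_o=s\,x_i+c$, one has
\begin{align*}
s&=\frac{x_o^{(1)}-x_o^{(2)}}{x_i^{(1)}-x_i^{(2)}}, &
\alpha&=|s|, &
c&=x_o^{(1)}-s\,x_i^{(1)},
\end{align*}
where $s=+\alpha$ for $N=0$ and $s=-\alpha$ for $N=1$. The intercept
then separates the remaining cases: $c=0$ for $(D,N)=(0,0)$,
$c=FS(1-\alpha)$ for $(1,0)$, $c=FS$ for $(0,1)$, and $c=\alpha FS$
for $(1,1)$. In practice the lossy compression makes these relations
hold only approximately, which is why the recovered values are
approximate.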
373: 
374: \subsection{Wang-Yu-Zheng scheme}
375: 
376: A different scheme working in the DCT domain (between DCT transform
377: and Huffman entropy coding) was proposed by Wang, Yu and Zheng in
378: \cite{Chinese:MPEG2Scrambling:IEEETCE2003}, which can be used as an
379: alternative solution to overcome the first two shortcomings of the
380: Pazarci-Dip\c{c}in scheme. By dividing all 64 DCT coefficients of
381: each $8\times 8$ block into 16 sub-bands following the distance
382: between each DCT coefficient and the DC coefficient, this new scheme
383: encrypts the $j$-th AC coefficient in the $i$-th sub-band as
384: follows:
385: \begin{equation}
386: b_{ij}'=\begin{cases}
387: b_{ij}-\lfloor\beta a_i\rfloor, & b_{ij}\geq 0,\\
388: b_{ij}+\lfloor\beta a_i\rfloor, & b_{ij}<0,
389: \end{cases}
390: \end{equation}
391: where $b_{ij}$ and $b_{ij}'$ denote the plain coefficient value and the
392: cipher coefficient value, respectively, $\beta\in[0,1]$ is the control
393: factor, $a_i$ is the rounding average value of all AC coefficients
394: in the $i$-th sub-band, and $\lfloor\cdot\rfloor$ means the rounding
395: function towards zero. The DC coefficients are encrypted in a
396: different way, as $b_0'=b_0\pm\lfloor C\beta a_0\rfloor$, where
397: $a_0=b_0$ and $C\in[0,1]$ is the second control factor\footnote{Note
398: that the rounding function is missing from Eqs. (3) and (4) of
399: \cite{Chinese:MPEG2Scrambling:IEEETCE2003}. In addition, Eq. (4) of
400: \cite{Chinese:MPEG2Scrambling:IEEETCE2003} should read
401: $b_0'=b_0\pm\lfloor C\beta a_0\rfloor$, not $b_0'=a_0\pm C\beta$.}.
402: The value of $a_i$ can also be calculated in a more complicated way
403: to enhance the encryption performance, following Eqs. (5) and (6) in
404: \cite{Chinese:MPEG2Scrambling:IEEETCE2003}, where three new
405: parameters, $k_1,k_2,k_3$ are introduced to determine the values of
406: $a_i$ for the three color components, Y, Cr and Cb. The 16 average
407: values, $a_0\sim a_{15}$, the two control factors, $\beta$ and $C$,
408: and the three extra parameters (if used), $k_1,k_2,k_3$, altogether
409: serve as the secret scrambling parameters (i.e., the secret key) of
410: each SB. Three different ways are suggested for the transmission of
411: the secret parameters: a) encrypting them and transmitting them in
412: the payload of TS (transport stream); b) embedding them in the
413: high-frequency DCT coefficients; c) calculating them from the
414: previous I-frame in a way similar to the $\alpha$-rule in
415: \cite{Turk:MPEG2Scrambling:IEEETCE2002}.
416: 
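As a small worked example of the AC-coefficient rule (the numbers
are chosen here purely for illustration and are not taken from
\cite{Chinese:MPEG2Scrambling:IEEETCE2003}), let $\beta=0.5$ and
$a_i=7$, so that $\lfloor\beta a_i\rfloor=\lfloor 3.5\rfloor=3$.
Then
\begin{align*}
b_{ij}=10 &\;\Rightarrow\; b_{ij}'=10-3=7,\\
b_{ij}=-10 &\;\Rightarrow\; b_{ij}'=-10+3=-7,\\
b_{ij}=0 &\;\Rightarrow\; b_{ij}'=0-3=-3.
\end{align*}
The last line shows that a zero coefficient is mapped to a non-zero
one whenever $\lfloor\beta a_i\rfloor\neq 0$.
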
417: In fact, the Wang-Yu-Zheng scheme is just an enhanced version of the
418: Pazarci-Dip\c{c}in scheme, without amending all shortcomings of the
419: latter scheme. Precisely, the following problems still remain.
420: \begin{enumerate}
421: \item Though the loss of compression ratio caused by failed motion
422: compensation is avoided, the encryption will change the natural
423: distribution of the DCT coefficients and thus reduce the
424: compression efficiency of the Huffman entropy encoder. For
425: example, when each sub-band has only one non-zero coefficient, it
426: is possible that all 64 coefficients become non-zero after the
427: encryption. This significantly increases the video size. In
428: addition, if the secret parameters are embedded into the
429: high-frequency DCT coefficients for transmission, the compression
430: performance will be further compromised.
431: 
432: \item The scheme is still not sufficiently sensitive to the
433: mismatch of the secret parameters, since the encryption function and
434: the calculation function of $a_i$ are kept linear. It is still not
435: sufficiently secure against brute-force attacks to the secret
436: parameters, because of the limited values of
437: $a_i,\beta,C,k_1,k_2,k_3$. Furthermore, due to the non-uniform
438: distribution of the DCT coefficients in each sub-band, an attacker
439: need not randomly search all possible values of $a_i$.
440: 
441: \item This scheme is still insecure against known/chosen-plaintext
442: attacks if the third way is used for calculating the secret
443: parameters. In this case, $a_i$ of each SB can be easily calculated
444: from the previous I-frame of the plain-video. Additionally, since
445: the value of $\lfloor\beta a_i\rfloor$ can be obtained from
446: $b_{ij}-b_{ij}'$, the secret parameter $\beta$ can be derived
447: approximately. In a similar way, the secret parameter $C$ can also
448: be derived approximately. If $a_i$ is calculated with $k_1,k_2,k_3$,
449: the values of $\beta,k_1,k_2,k_3$ can be solved approximately with a
450: number of known/chosen AC coefficients in four or more different
451: sub-bands, so that $C$ can be further derived from one known/chosen
452: DC coefficient.
453: 
454: \item The method of transmitting the secret parameters in the
455: payload of the transport stream cannot be used under the following
456: conditions: a) the key-management system is not available; b) the
457: video is not transmitted with the TS format. A typical example is
458: the perceptual encryption of MPEG-video files in personal
459: computers.
460: 
461: \iffalse \item When the scheme is used to encrypt MPEG-compressed
462: videos, the video size will be changed and bit-rate control are
463: needed, which causes two problems: a) the computation load of the
464: encryption is still relatively large; b) one has to simultaneously
465: store the original video and the encrypted video if the former is
466: still useful in future, so some disk space for storage is
467: wasted.\fi
468: \end{enumerate}
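The approximate derivation of $\beta$ in the third item can be
sketched as follows, assuming for illustration that $a_i>0$ is
already known and that one known coefficient $b_{ij}\geq 0$ in the
$i$-th sub-band is available together with its cipher value (the
symbol $d_i$ is introduced here only for this sketch). Since
$d_i=b_{ij}-b_{ij}'=\lfloor\beta a_i\rfloor$ and rounding towards
zero coincides with the floor function for non-negative arguments,
\begin{equation*}
d_i\leq\beta a_i<d_i+1
\quad\Longrightarrow\quad
\frac{d_i}{a_i}\leq\beta<\frac{d_i+1}{a_i}.
\end{equation*}
Each sub-band thus confines $\beta$ to an interval of width $1/a_i$,
and intersecting the intervals obtained from several sub-bands
narrows the estimate further.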
469: 
470: \section{More Efficient Design of Perceptual MPEG-Video Encryption Schemes}
471: \label{section:NewDesign}
472: 
473: Based on the analysis given above, we propose a simpler design of
474: perceptual encryption for MPEG videos, and attempt to overcome the
475: problems in existing schemes. The following useful features are
476: supported in our new design.
477: \begin{itemize}
478: \item \textit{Format-compliance}: the encrypted video can still be
479: decoded by any standard-compliant MPEG decoder. This is a basic
480: feature of all perceptual encryption schemes.
481: 
482: \item \textit{Lossless visual quality}: the encrypted video has
483: the same visual quality as the original one, i.e., the original
484: full-quality video can be exactly recovered when the secret key is
485: presented correctly.
486: 
487: \item \textit{Strict size-preservation}: the size of each data
488: element in the lowest syntax level, such as VLCs, FLCs and
489: continuous stuffing bits, remains unchanged after encryption. When
490: the video stream is packetized in a system stream (i.e., PS or
491: TS), the size of each video packet remains unchanged after
492: encryption. This enables the following useful features in
493: applications:
494: \begin{itemize}
495: \item \textit{independence of bit-rate types (VBR and FBR)};
496: 
497: \item \textit{avoiding some time-consuming operations when
498: encrypting MPEG-compressed videos}: bit-rate control,
499: re-packetization of the system stream and the re-multiplexing of
500: multiple audio/video streams;
501: 
502: \item \textit{on-the-fly encryption}: a) direct encryption of
503: MPEG-compressed video files without creating temporary files, i.e.,
504: one can open an MPEG video file, read the bitstream and
505: simultaneously update (encrypt) it; b) instantaneous switching
506: encryption on/off for online video transmission;
507: 
508: \item \textit{ROI (region of interest) encryption}: selectively
509: encrypting partial frames, slices, macroblocks, blocks, motion
510: vectors and/or DCT coefficients within specific regions of the
511: video.
512: \end{itemize}
513: 
514: \item \textit{Independence of optional data elements in MPEG
515: videos}: the perceptibility is not obtained by encrypting optional
516: data elements, such as quantiser\_matrix and
517: coded\_block\_pattern\footnote{Strictly speaking, motion vectors
518: are also optional elements in MPEG videos, so the scheme should
519: not encrypt only motion vectors, as was done in
520: \cite{Yann:WaterScrambling:ACMMM2002}.}. This means that the
521: scheme can encrypt any MPEG-compliant videos with a uniform
522: performance.
523: 
524: \item \textit{Fast encryption speed}: a) the extra computational
525: load added by the encryption is much smaller than the
526: computational load of a typical MPEG encoder; b) MPEG-compressed
527: videos can be quickly encrypted without being fully decoded and
528: re-encoded (at least the time-consuming IDCT/DCT operations are
529: avoided).
530: 
531: \item \textit{Easy implementation}: the encryption/decryption
532: parts can be easily incorporated into the whole MPEG system,
533: without major modification of the structure of the codec.
534: 
535: \item \textit{Multi-dimensional perceptibility}: the degradation
536: of visual quality is controlled by multi-dimensional factors.
537: 
538: \item \textit{Security against known/chosen-plaintext attacks} is
539: ensured by four different measures.
540: \end{itemize}
541: To the best of our knowledge, some of the above features (such as
542: on-the-fly encryption) have never been discussed in the literature
543: on video encryption, in spite of their usefulness in real
544: applications.
545: 
546: With the above features, the perceptual encryption scheme becomes
547: more flexible to fulfill different requirements of various
548: applications. To realize the strict size-preservation feature, the
549: encryption algorithm has to be incorporated into the MPEG encoder,
550: i.e., the (even partial) separation of the cipher and the encoder is
551: impossible. This is a minor disadvantage in some applications.
552: However, if the re-design is sufficiently simple, it is worth doing
553: so to get a better tradeoff between the overall performance and the
554: ease of implementation. In the case that the re-design of the MPEG
555: codec is impossible, for example, if the codec is secured by the
556: vendor, a simplified MPEG codec can be developed for the embedding
557: of the perceptual video cipher. Since the most time-consuming
558: operations in a normal MPEG codec, including DCT/IDCT and picture
559: reconstruction, are excluded from the simplified MPEG codec, fast
560: encryption speed and low implementation complexity of the whole
561: system can still be achieved.
562: 
563: In the following text of this section, we describe the design
564: principle along with different methods of providing security
565: against known/chosen-plaintext attacks, and discuss several
566: implementation issues.
567: 
568: \subsection{The new design}
569: 
570: This design is a generalized version of VEA
571: \cite{Shi:MPEGEncryption:MMTA2004} for perceptual encryption, by
572: selectively encrypting FLC data elements in the video stream.
573: Apparently, encrypting FLC data elements is the most natural and
574: perhaps the simplest way to maintain all needed features,
575: especially the need for the strict size-preservation feature. The
576: proposed scheme is named PVEA -- perceptual video encryption
577: algorithm. Note that PVEA can also be considered as an enhanced
578: combination of the encryption techniques for JPEG images proposed
579: in \cite{Belgian:SelectiveImageEncryption:ACIVS2002,
580: Torrubia:PerceptualJPEG:ICCE2003} and the perceptual encryption of
581: motion vectors \cite{Yann:WaterScrambling:ACMMM2002}.
582: 
583: There are three main reasons for selecting only FLC data elements
584: for encryption.
585: \begin{enumerate}
586: \item
As analyzed below, none of the existing VLC encryption algorithms can be
directly used to provide a controllable degradation of the quality.
New ideas have to be developed to adapt VLC encryption to perceptual
encryption schemes.
591: \begin{itemize}
592: \item
593: \textit{VLC encryption with different Huffman tables}
594: \cite{Wu&Kuo:AudiovisualEncryption:SPIE2001,
595: Wu&Kuo:EntropyCodecEncryption:SPIE2001,
596: Xie&Kuo:MHTEncryption:SPIE2003, Wu&Kuo:MHTEncryption:IEEETMM2004,
597: Kankanhalli:VideoScrambler:IEEETCE2002,
598: Shi:SecretHuffmanCoding:MM98, Shi:MPEGEncryption:MMTA2004}: Since
each VLC-codeword represents a (run, level) pair, if a VLC-codeword is
decoded to get an incorrect ``run'' value, then the positions of all
the following DCT coefficients will be wrong. As a result, the
visual quality of the decoded block will be degraded in an
uncontrollable way. Thus, it is difficult to find a factor to
604: control such visual quality degradation. Moreover, if the Huffman
605: tables do not keep the size of each VLC-entry as designated in
606: \cite{Wu&Kuo:AudiovisualEncryption:SPIE2001,
607: Wu&Kuo:EntropyCodecEncryption:SPIE2001,
608: Xie&Kuo:MHTEncryption:SPIE2003, Wu&Kuo:MHTEncryption:IEEETMM2004,
609: Kankanhalli:VideoScrambler:IEEETCE2002}, syntax errors may occur
when an unauthorized user decodes an encrypted video. This means
that the encryption cannot ensure format compliance with
standard MPEG codecs.
613: 
614: \item
615: \textit{VLC-index encryption} \cite{Zeng:VideoScrambling:MMSP2001,
616: Zeng:VideoScrambling:IEEETCASVT2002}: This encryption scheme can
ensure format compliance, but still suffers from the
uncontrollability of the visual quality degradation, for the same
reason as above. Another weakness of VLC-index encryption is that it
may reduce the compression efficiency and thus increase the video
size.
622: 
623: \item
624: \textit{Shuffling VLC-codewords or RLE events before the entropy
625: encoding stage} \cite{Zeng:VideoScrambling:IEEETCASVT2002,
626: Liu:MPEGEncryption:IEICE-A2006}: This algorithm can ensure both the
format compliance and the strict size-preservation. However, even
exchanging only two VLC-codewords may cause a dramatic change in the
DCT coefficient distribution of each block. So, this encryption
algorithm cannot realize a slight degradation of the visual quality
and is therefore not an ideal candidate for perceptual encryption.
632: \end{itemize}
633: 
634: \item
FLC encryption is the simplest way to achieve all
the desired properties mentioned at the beginning of this section,
especially to achieve format compliance, strict size-preservation
and fast encryption \textbf{simultaneously}. By comparison, naive
encryption\footnote{In the image/video encryption literature, the
term ``naive encryption'' means to consider the video as a 1-D
bitstream and encrypt it via a conventional cipher.} can realize strict
size-preservation and fast encryption, but cannot ensure format
compliance.
644: 
645: \item
646: As will be seen below, using FLC encryption is sufficient to fulfill
647: the needs of most real applications for perceptual encryption.
648: \end{enumerate}
649: 
650: According to MPEG standards \cite{MPEG1-ISOStandard,
651: MPEG2-ISOStandard, MPEG4-ISOStandard}, the following FLC data
652: elements exist in an MPEG-video bitstream:
653: \begin{itemize}
654: \item 4-byte start codes: 000001xx (hexadecimal);
655: 
656: \item almost all information elements in various headers;
657: 
658: \item sign bits of non-zero DCT coefficients;
659: 
660: \item (differential) DC coefficients in intra blocks;
661: 
662: \item ESCAPE DCT coefficients;
663: 
664: \item sign bits and residuals of motion vectors.
665: \end{itemize}
666: 
To maintain format compliance with the MPEG standards after
encryption, the first two kinds of data elements should not be
encrypted. So, in PVEA, only the last four kinds of FLC data elements are
considered, which are divided into three categories according to
their contributions to the visual quality:
672: \begin{itemize}
\item \textit{intra DC coefficients}: corresponding to the rough
view (at the level of $8\times 8$ blocks) of the video;

\item \textit{sign bits of non-intra DC coefficients and AC
coefficients, and ESCAPE DCT coefficients}: corresponding to
details in $8\times 8$ blocks of the video;

\item \textit{sign bits and residuals of motion vectors}:
corresponding to the visual quality of the video related to the
motions (the residuals further correspond to the details of the
motions).
684: \end{itemize}
685: Based on the above division, three control factors, $p_{sr}$,
$p_{sd}$, and $p_{mv}$ in the range $[0,1]$, are used to control the
687: visual quality in three different dimensions: the low-resolution
688: rough (spatial) view, the high-resolution (spatial) details, and
689: the (temporal) motions. With the three control factors, the
690: encryption procedure of PVEA can be described as follows:
691: \begin{enumerate}
692: \item encrypting intra DC coefficients with probability $p_{sr}$;
693: 
694: \item encrypting sign bits of non-zero DCT coefficients (except
695: for intra DC coefficients) and ESCAPE DCT coefficients with
696: probability $p_{sd}$;
697: 
698: \item encrypting sign bits and residuals of motion vectors with
699: probability $p_{mv}$.
700: \end{enumerate}
701: The encryption of selected FLC data elements can be carried out with
702: either a stream cipher or a block cipher. When a block cipher is
703: adopted, the consecutive FLC data elements should be first
704: concatenated together to form a longer bit stream, then each block
705: of the bit stream is encrypted, and finally each encrypted FLC data
706: element is placed back into its original position in the video
707: stream. Under the assumption that the stream cipher or block cipher
708: embedded in PVEA is secure, some special considerations should be
709: taken into account in order to ensure the security against various
710: attacks, as discussed below.
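
The three-step procedure above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used for the experiments: the element categories, the helper names \texttt{keystream\_bits} and \texttt{pvea\_process}, and the SHA-256-based toy keystream are all hypothetical, and Python's \texttt{random.Random} merely stands in for a keyed, cryptographically secure PRNG. The test $r<p$ with $r\in[0,1)$ is used so that $p=0$ encrypts nothing and $p=1$ encrypts everything.

```python
import hashlib
import random

# Hypothetical element categories mapped to the three control factors.
CATEGORY_FACTOR = {"intra_dc": "p_sr", "dct_detail": "p_sd", "motion": "p_mv"}


def keystream_bits(key: bytes, index: int, nbits: int) -> int:
    """Toy keystream word of nbits (<= 256) from SHA-256(key || index).

    Illustrative stand-in only; a real system would use a vetted cipher.
    """
    digest = hashlib.sha256(key + index.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - nbits)


def pvea_process(elements, factors, key: bytes):
    """XOR-encrypt each FLC element with the probability of its category.

    elements: list of (category, value, nbits) with 0 <= value < 2**nbits.
    factors:  dict with keys "p_sr", "p_sd", "p_mv", each in [0, 1].
    Calling the function again with the same key and factors undoes it.
    """
    # Selection decisions depend on the key so the receiver can repeat them.
    selector = random.Random(hashlib.sha256(b"select" + key).digest())
    out = []
    for i, (category, value, nbits) in enumerate(elements):
        p = factors[CATEGORY_FACTOR[category]]
        if selector.random() < p:  # one draw per element, encrypted or not
            value ^= keystream_bits(key, i, nbits)
        out.append((category, value, nbits))
    return out
```

Because every element keeps its original bit width, the size of the bitstream is unchanged, and since one selection draw is consumed per element the receiver reproduces exactly the same decisions.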
711: 
712: In the above-described PVEA, the three factors control the visual
713: quality, as follows:
714: \begin{itemize}
715: \item $p_{sr}=1\to 0$: the spatial perceptibility changes from
716: ``almost imperceptible" to ``perfectly perceptible" when
717: $p_{sd}=0$ or to ``roughly perceptible" when $p_{sd}>0$;
718: 
719: \item $p_{sr}=0$, $p_{sd}=1\to 0$: the spatial perceptibility
720: changes from ``roughly perceptible" to ``perfectly perceptible";
721: 
722: \item $p_{mv}=1\to 0$: the temporal (motion) perceptibility (for
723: P/B-pictures only) changes from ``almost imperceptible" to
724: ``perfectly perceptible".
725: \end{itemize}
The encryption may move the decoded motion vectors outside the
spatial range of the picture, so the motion compensation
operations (or even the affected picture itself) may simply be
discarded by the MPEG decoder. In this case, the temporal (motion)
perceptibility will be ``perfectly imperceptible'', not just
``almost imperceptible''.
732: 
In the Appendix of \cite{Shi:MPEGEncryption:MMTA2004}, it was
claimed that the DC coefficient of each block can be uniquely
derived from the other 63 AC coefficients. If true, this would mean that the
perceptual encryption of DC coefficients must not be used alone,
i.e., some AC coefficients must also be encrypted to make the
encryption of the DC coefficients secure. It was later shown
that this claim is not correct
\cite{Li:MPEGEncryption:MMTA2004note}. In fact, the DC coefficient
of a block represents the average brightness of the block, and is
independent of the other 63 AC coefficients. Thus, the
DC-encryption and AC-encryption of PVEA are independent of each
other, i.e., the two control factors, $p_{sr}$ and $p_{sd}$,
can be freely combined in
practice.
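
The independence can be checked directly from the forward transform. Assuming the usual orthonormal $8\times 8$ DCT of a pixel block $f(x,y)$ (the symbols $F$, $C$ and $\bar{f}$ are introduced here for illustration only),
\begin{equation*}
F(u,v)=\frac{C(u)C(v)}{4}\sum_{x=0}^{7}\sum_{y=0}^{7}f(x,y)
\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16},
\end{equation*}
with $C(0)=1/\sqrt{2}$ and $C(k)=1$ for $k\neq 0$, so that
\begin{equation*}
F(0,0)=\frac{1}{8}\sum_{x=0}^{7}\sum_{y=0}^{7}f(x,y)=8\bar{f},
\end{equation*}
where $\bar{f}$ denotes the mean pixel value of the block. Every basis function with $(u,v)\neq(0,0)$ sums to zero over the block, so adding a constant to all 64 pixels changes $F(0,0)$ only and leaves the 63 AC coefficients untouched. Conversely, the AC coefficients alone cannot determine $F(0,0)$.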
747: 
\subsection{Security against ciphertext-only attacks and a constraint on the control
factor}
750: 
The format compliance of perceptual encryption makes it possible for
the attacker to guess the values of all encrypted FLC data elements
separately in ciphertext-only attacks. The simplest attack is to try
to recover more visual information by setting all the encrypted FLC
data elements to zero. This is called the error-concealment-based
attack (ECA) \cite{Zeng:VideoScrambling:IEEETCASVT2002}. Our
experimental results have shown that PVEA is secure against such
an attack. More details are given in the next section.
759: 
To guess the value of each FLC data element, one can also exploit the
local correlation between adjacent blocks in each frame.
That is, one can search for the values of all encrypted FLC data elements
in each frame that yield the least blocking artifact. Does such a
deblocking attack work? Let us derive a lower bound on this
attack's complexity, by assuming that the number of all FLC data
elements in each frame is $N$ and that each of them is encrypted with
probability $p$, which means that the number of
encrypted FLC data elements is about $pN$. Then, the complexity of the
deblocking attack will not be less than
$O\left(\binom{N}{pN}2^{pN}\right)$, since the attacker has to locate
the encrypted elements and each FLC data element
has at least two candidate values. So, if $\binom{N}{pN}2^{pN}$ is
cryptographically large, the deblocking attack will not compromise
the security of PVEA. As a lower bound of $p$ corresponding to a
typical security level, one can get $p\geq 100/N$ by requiring
$2^{pN}\geq 2^{100}$. For most consumer videos that need to be
protected via perceptual encryption, $N$ is generally much larger
than 100, so this constraint on $p$ generally does not have much
influence on the overall performance of PVEA. Because
$2^{pN}$ considerably underestimates the actual complexity\footnote{There are two reasons for
the underestimation: 1) the omission of $\binom{N}{pN}$, which is
very large when $N\gg pN$ and $pN$ is not very small; 2) some FLC
elements (such as intra DC coefficients) have more than 2 candidate
values.}, the constraint can be further relaxed in practice. For
example, when $N=200$, the above condition suggests that $p\geq
100/N=1/2$. However, calculations show that $p\geq 9/100$ is
enough to ensure a complexity larger than $O(2^{100})$.
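
The relaxed bound for $N=200$ can be reproduced with exact integer arithmetic. The following Python sketch (the function name \texttt{min\_encrypted\_elements} is ours and purely illustrative) searches for the smallest number $k=pN$ of encrypted elements such that $\binom{N}{k}2^{k}\geq 2^{100}$:

```python
from math import comb


def min_encrypted_elements(n_total: int, security_bits: int = 100) -> int:
    """Smallest k with C(n_total, k) * 2**k >= 2**security_bits.

    Uses exact integers, so no floating-point rounding is involved.
    """
    target = 1 << security_bits
    for k in range(n_total + 1):
        if comb(n_total, k) * (1 << k) >= target:
            return k
    raise ValueError("no k reaches the requested security level")


n = 200
k = min_encrypted_elements(n)
print(k, k / n)  # k = 18, i.e. p = 0.09, for N = 200
```

For $N=200$ the search stops at $k=18$, i.e., $p=18/200=9/100$, in agreement with the value quoted above, whereas the simplified condition $2^{pN}\geq 2^{100}$ alone would require $k=100$.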
786: 
Since it is generally impractical to carry out the deblocking attack
on the whole frame, the attacker may instead adopt a two-layer
deblocking attack: 1) performing the deblocking attack on
small areas of the frame; 2) for all candidates of these small
areas, performing the deblocking attack again at the area level.
Though this two-layer attack generally has a much smaller complexity
than the simple attack, its efficiency is still limited for the
following reasons.
795: \begin{itemize}
796: \item
797: For each small area, the number of encrypted FLC elements is
798: generally not equal to $pN^*$, where $N^*$ denotes the total number
799: of all FLC elements in the area. Thus, even this number has to be
800: exhaustively guessed and then validated by considering the numbers
801: of other areas (i.e., the whole frame). The existence of three
802: independent quality factors makes the attack even more complicated.
803: 
804: \item
For each small area, the probability that the result with the least
blocking artifact does not correspond to the real scene may not be very small.
Accordingly, the attacker has to mount a looser deblocking
attack, which leads to a higher attack complexity.
809: 
810: \item
Even for the smallest area of size $16\times 16$, there are
generally more than one hundred FLC elements (i.e., $N^*\geq 100$),
especially when the area contains rich visual information.
815: 
816: \item
817: If the number of FLC elements in an area is relatively small, this
818: area generally contains less significant visual information (such as
819: a smooth area).
820: 
821: \item
The smaller each area is, the larger the number of false candidates will
be, and hence the higher the complexity of the second stage will be.
824: \end{itemize}
Of course, with the two-layer deblocking attack, the attacker
has a chance to recover a number of small areas, though he/she
generally cannot get the whole frame. Such a minor security problem
is an unavoidable result of the inherent format-compliance property
of perceptual encryption algorithms, and is related to the essential
disadvantage of perceptual encryption for some special
MPEG-videos (see the discussion on Fig.~\ref{figure:Animation} in
the next section).
833: 
834: \subsection{Security against known/chosen-plaintext attacks}
835: 
836: Generally speaking, there are four different ways to provide
837: security against known/chosen-plaintext attacks. Users can select
838: one solution for a specific application.
839: 
840: \subsubsection{Using a block cipher}
841: 
842: With a block cipher, it is easy to provide security against
843: known/chosen-plaintext attacks. Since the lengths of different FLC
844: data elements are different, the block cipher may have to run in CFB
845: (cipher feedback) mode with variable-length feedback bits to realize
846: the encryption. Note that $n$-bit error propagation exists in block
847: ciphers running in the CFB mode
848: \cite{Schneier:AppliedCryptography96}, where $n$ is the block size
849: of the cipher. It is also possible to cascade multiple FLC data
850: elements to compose an $n$-bit block for encryption, as in RVEA
851: \cite[Sec. 7]{Shi:MPEGEncryption:MMTA2004}. Compared to the CFB
852: mode, the latter encryption mode can achieve a faster encryption
853: speed (with a little more implementation complexity for bit
854: cascading), since in the CFB mode only one element can be encrypted
855: in each run of the block cipher.
856: 
857: \subsubsection{Using a stream cipher with plaintext/ciphertext
858: feedback}
859: 
860: After encrypting each plain data element, the plaintext or the
861: ciphertext is sent to perturb the stream cipher for the encryption
862: of the next plain data element. In such a way, the keystream
863: generated by the stream cipher becomes dependent on the whole
864: plain-video, which makes the known/chosen-plaintext attacks
impractical. Note that an initialization vector is needed for the
encryption of the first plain data element.
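
A minimal Python sketch of the ciphertext-feedback variant is given below. It is an illustration under stated assumptions rather than a recommended construction: SHA-256 is used as a stand-in for the state update and output function of a real stream cipher, and the helper names (\texttt{\_step}, \texttt{\_update}, \texttt{encrypt}, \texttt{decrypt}) are hypothetical. Ciphertext feedback is chosen here because the receiver can then update its state before the plaintext is known.

```python
import hashlib


def _step(state: bytes, nbits: int) -> int:
    """Derive an nbits-wide keystream word from the current state (nbits <= 256)."""
    digest = hashlib.sha256(b"ks" + state).digest()
    return int.from_bytes(digest, "big") >> (256 - nbits)


def _update(state: bytes, cipher_value: int, nbits: int) -> bytes:
    """Perturb the state with the ciphertext element just produced or received."""
    data = nbits.to_bytes(2, "big") + cipher_value.to_bytes((nbits + 7) // 8, "big")
    return hashlib.sha256(state + data).digest()


def encrypt(elements, key: bytes, iv: bytes):
    """elements: list of (value, nbits); returns ciphertext elements of equal widths."""
    state = hashlib.sha256(key + iv).digest()  # the IV seeds the first element
    out = []
    for value, nbits in elements:
        c = value ^ _step(state, nbits)
        out.append((c, nbits))
        state = _update(state, c, nbits)
    return out


def decrypt(elements, key: bytes, iv: bytes):
    """Inverse of encrypt: the state is advanced with the received ciphertext."""
    state = hashlib.sha256(key + iv).digest()
    out = []
    for c, nbits in elements:
        out.append((c ^ _step(state, nbits), nbits))
        state = _update(state, c, nbits)
    return out
```

Since the keystream word for each element depends on all preceding ciphertext elements, two videos that differ in an early element are encrypted with different keystreams from that point on.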
867: 
868: \subsubsection{Using a key-management system and a stream cipher}
869: 
870: When a key-management system is available in an application, the
871: encryption procedure of PVEA can be realized with a stream cipher.
872: To effectively resist known/chosen-plaintext attacks, the secret key
873: of the stream cipher should be frequently changed by the
key-management system. In most cases, it is enough to change the key
once per picture, or once per GOP. Note that this measure incurs a higher
computational load and implementation cost, and is suitable
mainly for encrypting online videos.
878: 
879: \subsubsection{Using a stream cipher with UID}
880: 
881: When key-management systems are not available in some applications,
882: a unique ID (UID) can be used to provide the security against
883: known/chosen-plaintext attacks by ensuring that the UIDs are
884: different for different videos. The UID of an MPEG-video can be
885: stored in the user\_data area. The simplest form of the UID is the
886: vendor ID plus the time stamp of the video. It is also possible to
887: determine the UID of a video with a hash function or a secure
888: pseudo-random number generator (PRNG). In this case, the UIDs of two
889: different videos may be identical, but the probability is
890: cryptographically small if the UID is sufficiently long. The UID is
891: used to initialize the stream cipher together with the secret key,
892: which ensures that different videos are encrypted with different
keystreams. Thus, even when an attacker successfully gets the keystreams
used for $n$ known/chosen videos, he/she cannot use the broken
keystreams to break other videos. Of course, the employed
896: stream cipher should be secure against plaintext attacks in the
897: sense that the secret key cannot be derived from a known/chosen
898: segment of the long keystream that encrypts the whole video stream
899: \cite{Schneier:AppliedCryptography96}.
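
The role of the UID can be illustrated with the following Python sketch. It assumes, purely for illustration, that SHAKE-256 over the secret key and the UID serves as the keystream generator and that the UID is formed as ``vendor ID plus time stamp''; the function names \texttt{make\_uid} and \texttt{keystream} and the separator character are hypothetical.

```python
import hashlib


def make_uid(vendor_id: str, timestamp: str) -> bytes:
    """Simplest UID form mentioned above: vendor ID plus the video's time stamp."""
    return f"{vendor_id}|{timestamp}".encode("utf-8")


def keystream(secret_key: bytes, uid: bytes, nbytes: int) -> bytes:
    """Toy keystream initialised with the secret key together with the UID.

    The key is length-prefixed so that (key, UID) pairs cannot collide by
    shifting bytes between the two fields. Illustrative stand-in only.
    """
    seed = len(secret_key).to_bytes(2, "big") + secret_key + uid
    return hashlib.shake_256(seed).digest(nbytes)
```

With the same secret key, two videos carrying different UIDs are encrypted with unrelated keystreams, so a keystream recovered from one known video does not help with another.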
900: 
901: \subsection{Implementation issues}
902: 
903: Since PVEA is a generalization of VEA, it is obvious that fast
904: encryption speed can be easily achieved, as shown in
905: \cite{Shi:MPEGEncryption:MMTA2004}. In addition, by carefully
906: optimizing the implementation, the encryption speed can be further
907: increased. We give two examples to show how to optimize the
908: implementation of PVEA so as to increase the encryption speed.
909: 
A typical way to realize the probabilistic quality control with a
real-valued factor $p$ is as follows: generate a pseudo-random
number, $r\in[0,1]$, for each data element with a
uniformly-distributed PRNG, and then encrypt the current element
only when $r\leq p$. The above implementation can be modified as
follows to further increase the encryption speed:
916: \begin{enumerate}
917: \item pseudo-randomly select $N_p=\mathrm{round}(N\cdot p)$
918: integers from the set $\{0,\cdots,N-1\}$;
919: 
\item create a binary array $SE[0]\sim SE[N-1]$: $SE[i]=1$ if the
integer $i$ is selected; otherwise, $SE[i]=0$;
922: 
923: \item encrypt the $i$-th FLC data element only when $SE[i\bmod
924: N]=1$.
925: \end{enumerate}
In this modified implementation, only a modular addition and a
look-up-table operation are needed to determine whether the current
data element should be encrypted. By comparison, in the typical
implementation, one run of the PRNG is needed for each data element,
which is generally much slower. Although $N$ bits of extra memory are
needed to store the array in the modified implementation, this is
a trivial cost since video codecs generally require much
more memory. To ensure the security against deblocking attacks, in
the modified implementation the value of $N$ should not be too
small\footnote{In most cases, it is enough to set $N\geq 300$.}.
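
The look-up-table variant can be sketched as follows. The function names are hypothetical, the table is indexed from $0$ to $N-1$, and Python's \texttt{random.Random} only stands in for a keyed, secure PRNG. Note also that Python's built-in \texttt{round} rounds exact ties to the nearest even integer, which may differ from other rounding conventions when $N\cdot p$ ends in $.5$.

```python
import random


def build_selection_table(n: int, p: float, seed) -> list:
    """Return a 0/1 table of length n with exactly round(n * p) ones."""
    n_p = round(n * p)
    rng = random.Random(seed)  # stand-in for a keyed, secure PRNG
    table = [0] * n
    for i in rng.sample(range(n), n_p):  # n_p distinct indices from 0..n-1
        table[i] = 1
    return table


def should_encrypt(table: list, i: int) -> bool:
    """Decide for the i-th FLC data element with one modulo and one look-up."""
    return table[i % len(table)] == 1
```

The table is built once per key (or per video), after which each decision costs a single array access. The selection pattern repeats with period $N$, which is one reason why $N$ should not be chosen too small.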
936: 
To further reduce the computational load of PVEA, another way is to
encrypt only a subset of the FLC data elements. Two possible options
are as follows: 1) encrypt only intra blocks; 2) encrypt only sign
bits (or a few most significant bits) of intra DC
coefficients, ESCAPE DCT coefficients, and residuals of motion
vectors. The two options can also be combined. This
will have very little effect on the encryption performance, since an
attacker can only recover video frames with a poor visual quality
from the remaining unencrypted data elements
946: \cite{Agi&Gong:StudySecureMPEG:NDSS96,
947: Zeng:VideoScrambling:IEEETCASVT2002}.
948: 
949: \section{Encryption Performance of PVEA}
950: \label{section:Experiments}
951: 
952: Some experiments have been conducted to test the real encryption
953: performance of PVEA for a widely-used MPEG-1 test video,
954: ``Carphone". The encryption results of the 1st frame (I-type) are
955: shown in Fig.~\ref{figure:Carphone}, with different values of the
956: two control factors $p_{sr}$ and $p_{sd}$. It can be seen that the
957: degradation of the visual quality is effectively controlled by the
958: two factors. The encryption results of the third control factor
959: $p_{mv}$ are given in Fig.~\ref{figure:Carphone2}, where the 313th
960: frame (B-type) is selected for demonstration. It can be seen that
961: encrypting only the motion vectors will not cause much degradation
962: in the visual quality.
963: 
964: \begin{figure}
\centering
966: \begin{minipage}{\figwidth}
967: \centering
968: \includegraphics[width=\textwidth]{Fig2a}
969: a)
970: \end{minipage}
971: \begin{minipage}{\figwidth}
972: \centering
973: \includegraphics[width=\textwidth]{Fig2b}
974:  b)
975: \end{minipage}
976: \begin{minipage}{\figwidth}
977: \centering
978: \includegraphics[width=\textwidth]{Fig2c}
979: c)
980: \end{minipage}
981: \begin{minipage}{\figwidth}
982: \centering
983: \includegraphics[width=\textwidth]{Fig2d}
984: d)
985: \end{minipage}
986: \begin{minipage}{\figwidth}
987: \centering
988: \includegraphics[width=\textwidth]{Fig2e}
989: e)
990: \end{minipage}
991: \begin{minipage}{\figwidth}
992: \centering
993: \includegraphics[width=\textwidth]{Fig2f}
994: f)
995: \end{minipage}
996: \begin{minipage}{\figwidth}
997: \centering
998: \includegraphics[width=\textwidth]{Fig2g}
999: g)
1000: \end{minipage}
1001: \begin{minipage}{\figwidth}
1002: \centering
1003: \includegraphics[width=\textwidth]{Fig2h}
1004: h)
1005: \end{minipage}
1006: \begin{minipage}{\figwidth}
1007: \centering
1008: \includegraphics[width=\textwidth]{Fig2i}
1009: i)
1010: \end{minipage}
1011: \caption{The encryption results of the 1st frame in ``Carphone":
1012: a) $(p_{sr},p_{sd})=(0,0)$ -- the plain frame; b)
1013: $(p_{sr},p_{sd})=(0,0.2)$; c) $(p_{sr},p_{sd})=(0,1)$; d)
1014: $(p_{sr},p_{sd})=(0.2,0)$; e) $(p_{sr},p_{sd})=(0.2,0.2)$; f)
1015: $(p_{sr},p_{sd})=(0.5,0.5)$; g) $(p_{sr},p_{sd})=(1,0)$; h)
1016: $(p_{sr},p_{sd})=(1,0.2)$; i)
1017: $(p_{sr},p_{sd})=(1,1)$.}\label{figure:Carphone}
1018: \end{figure}
1019: 
1020: \begin{figure}
1021: \centering
1022: \begin{minipage}{\figwidth}
1023: \centering
1024: \includegraphics[width=\textwidth]{Fig3a}
1025: a)
1026: \end{minipage}
1027: \begin{minipage}{\figwidth}
1028: \centering
1029: \includegraphics[width=\textwidth]{Fig3b}
1030: b)
1031: \end{minipage}
1032: \begin{minipage}{\figwidth}
1033: \centering
1034: \includegraphics[width=\textwidth]{Fig3c}
1035: c)
1036: \end{minipage}
1037: \caption{The encryption results of the 313th frame in ``Carphone":
1038: a) $(p_{sr},p_{sd},p_{mv})=(0,0,0)$ -- the plain frame; b)
1039: $(p_{sr},p_{sd},p_{mv})=(0,0,0.5)$; c)
1040: $(p_{sr},p_{sd},p_{mv})=(0,0,1)$.}\label{figure:Carphone2}
1041: \end{figure}
1042: 
Our experiments have also shown that PVEA is secure against
error-concealment-based attacks. For two encrypted frames shown in
Fig.~\ref{figure:Carphone}, the recovered images after applying ECA
are shown in Fig.~\ref{figure:Carphone3}. In
Fig.~\ref{figure:Carphone3}a, the sign bits of all AC coefficients
are set to zero, and in Fig.~\ref{figure:Carphone3}b all DC
coefficients are also set to zero. It can be seen that the
visual quality of the images recovered via such an attack is even
worse than the quality of the cipher-images, which means that ECA
cannot help an attacker get more visual information. Actually, the
security of PVEA against ECA depends on the fact that an attacker
cannot tell encrypted data elements from unencrypted ones without
breaking the key. As a result, he/she has to set all candidate data
elements to fixed values, which is equivalent to perceptual
encryption with the control factor 1, i.e., the strongest level of
perceptual encryption.
1059: 
1060: \begin{figure}
1061: \centering
1062: \begin{minipage}{\figwidth}
1063: \centering
1064: \includegraphics[width=\textwidth]{Fig4a}
1065: a)
1066: \end{minipage}
1067: \begin{minipage}{\figwidth}
1068: \centering
1069: \includegraphics[width=\textwidth]{Fig4b}
1070: b)
1071: \end{minipage}
1072: \caption{The recovered results after applying ECA for the 1st
1073: frame in ``Carphone": a) breaking Fig.~\ref{figure:Carphone}c; b)
1074: breaking Fig.~\ref{figure:Carphone}i.}\label{figure:Carphone3}
1075: \end{figure}
1076: 
1077: Finally, it is worth mentioning that PVEA has a minor disadvantage
1078: that the degradation in the visual quality is dependent on the
1079: amplitudes of the intra DC coefficients. As an extreme example,
1080: consider an intra picture whose DC coefficients are all zeros, which
1081: means that the FLC-encoded differential value of each intra DC
1082: coefficient does not occur in the bitstream, i.e., only the
1083: VLC-encoded dct\_dc\_size=0 occurs. In this case, the control of the
1084: rough visual quality by $p_{sr}$ completely disappears. Similarly,
1085: when dct\_dc\_size=1, the encryption can only change the
1086: differential value from $\pm 1$ to $\mp 1$, so the degradation will
not be very significant. As a result, this problem causes the
perceptibility of some encrypted videos to become ``partially
perceptible'' when $p_{sr}=1$ (it should be ``almost imperceptible'' for
1090: most videos). For an MPEG-1 video\footnote{Source of this test
1091: video:
1092: \url{http://www5.in.tum.de/forschung/visualisierung/duenne_gitter/DG_4.mpg}.}
1093: with a dark background (i.e., with many intra DC coefficients of
1094: small amplitudes), the encryption results are shown in
1095: Fig.~\ref{figure:Animation}. Fortunately, this problem is not so
1096: serious in practice, for the following reasons:
1097: \begin{itemize}
1098: \item most consumer videos contain
1099: sufficiently many intra DC coefficients of large amplitudes;
1100: 
1101: \item even when there are many zero intra DC coefficients, the
1102: content of the video has to be represented by other intra DC
1103: coefficients of sufficiently large amplitudes;
1104: 
1105: \item the differential encoding can increase the number of
1106: non-zero intra DC coefficients;
1107: 
1108: \item the partial degradation caused by $p_{sr}$ and the
1109: degradation caused by $p_{sd}$ and $p_{mv}$ are enough for most
1110: applications of perceptual encryption (see Figs.
1111: \ref{figure:Carphone} and \ref{figure:Animation}).
1112: \end{itemize}
1113: 
From this minor disadvantage of PVEA, a natural conclusion follows
immediately: for the protection of MPEG videos that are
highly confidential, VLC data elements should also be encrypted. In
1117: fact, our additional experiments on various video encryption
1118: algorithms have shown that it might be impossible to effectively
1119: degrade the visual quality of the MPEG videos with dark background
1120: via format-compliant encryption, unless the compression ratio and
1121: the strict size-preservation feature are compromised. The relations
1122: among the encryption performance, the compression ratio, the
1123: size-preservation feature, and other features of video encryption
1124: algorithms, are actually much more complicated. These problems will
1125: be investigated in our future research.
1126: 
1127: \setlength\figwidth{0.32\columnwidth}
1128: \begin{figure}
1129: \centering
1130: \begin{minipage}{\figwidth}
1131: \centering
1132: \includegraphics[width=\textwidth]{Fig5a}
1133: a)
1134: \end{minipage}
1135: \begin{minipage}{\figwidth}
1136: \centering
1137: \includegraphics[width=\textwidth]{Fig5b}
1138: b)
1139: \end{minipage}\\
1140: \begin{minipage}{\figwidth}
1141: \centering
1142: \includegraphics[width=\textwidth]{Fig5c}
1143: c)
1144: \end{minipage}
1145: \begin{minipage}{\figwidth}
1146: \centering
1147: \includegraphics[width=\textwidth]{Fig5d}
1148: d)
1149: \end{minipage}
1150: \caption{The encryption results of the 169th frame in an MPEG-1
1151: video: a) $(p_{sr},p_{sd},p_{mv})=(0,0,0)$ -- the plain frame; b)
1152: $(p_{sr},p_{sd},p_{mv})=(1,0,0)$; c)
1153: $(p_{sr},p_{sd},p_{mv})=(1,1,0)$; d)
1154: $(p_{sr},p_{sd},p_{mv})=(1,1,1)$.}\label{figure:Animation}
1155: \end{figure}
1156: 
1157: \section{Conclusion}
1158: 
1159: This paper focuses on the problem of how to realize perceptual
1160: encryption of MPEG videos. Based on a comprehensive survey on
1161: related work and performance analysis of some existing perceptual
1162: video encryption schemes, we have proposed a new design with more
1163: useful features, such as on-the-fly encryption and multi-dimensional
1164: perceptibility. We have also discussed its security against
deblocking attacks and pointed out some measures against
known/chosen-plaintext attacks. The proposed perceptual encryption
1167: scheme can also be extended to realize non-perceptual encryption by
1168: simply adding a VLC-encryption part.
1169: 
1170: \bibliographystyle{IEEEtran}
1171: \bibliography{PVEA}
1172: 
1173: \end{document}
1174: