
1: \documentclass[final]{IEEEtran}

2:

3: \usepackage{amsmath,amssymb,bm}

4: \usepackage{graphicx}

5: \usepackage{cite,url}

6:

7: \newlength\figwidth

8: \setlength\figwidth{0.32\columnwidth}

9:

10: \begin{document}

11:

12: \title{On the Design of Perceptual MPEG-Video Encryption Algorithms%

13: \thanks{Copyright (c) 2006 IEEE. Personal use of this material is

14: permitted. However, permission to use this material for any other

15: purposes must be obtained from the IEEE by sending an email to

16: \texttt{pubs-permissions@ieee.org}.}

17: \thanks{This research was partially supported by the City University

18: of Hong Kong SRG grant 7001702, by The Hong Kong Polytechnic

19: University's Postdoctoral Fellowships Scheme under grant no. G-YX63,

20: by the Research Grant Council of Hong Kong under grant no. PolyU

21: 5232/06E, and by the US NSF grants ANI-0219110 and RIS-0292890.}}

22: \author{Shujun Li\thanks{Shujun Li and Kwok-Tung Lo are with the

23: Department of Electronic and Information Engineering, The Hong Kong

24: Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR, China.},

25: Guanrong Chen,~\IEEEmembership{Fellow, IEEE}\thanks{Guanrong Chen is

26: with the Department of Electronic Engineering, City University of

27: Hong Kong, 83 Tat Chee Avenue, Kowloon Tong, Hong Kong SAR, China.},

28: Albert Cheung,~\IEEEmembership{Member, IEEE}\thanks{Albert Cheung is

29: with the Department of Building and Construction and Shenzhen

30: Applied R\&D Centres, City University of Hong Kong, Kowloon Tong,

31: Hong Kong SAR, China.}, Bharat Bhargava,~\IEEEmembership{Fellow,

32: IEEE}\thanks{Bharat Bhargava is with the Department of Computer

33: Sciences, Purdue University, 250 N. University Street, West

34: Lafayette, IN 47907-2066, USA.} and Kwok-Tung

35: Lo,~\IEEEmembership{Member, IEEE}}

36:

37: \markboth{IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO

38: TECHNOLOGY, VOL. 17, NO. 2, PAGES 214-223, FEBRUARY 2007}{Shujun Li

39: \MakeLowercase{\textit{et al.}}: Perceptual MPEG Encryption}

40:

41: \maketitle

42:

43: \begin{abstract}

44: In this paper, some existing perceptual encryption algorithms of

45: MPEG videos are reviewed and some problems, especially security

46: defects of two recently proposed MPEG-video perceptual encryption

47: schemes, are pointed out. Then, a simpler and more effective design

48: is suggested, which selectively encrypts fixed-length codewords

49: (FLC) in MPEG-video bitstreams under the control of three

50: perceptibility factors. The proposed design is actually an

51: encryption configuration that can work with any stream cipher or

52: block cipher. Compared with the previously-proposed schemes, the new

53: design provides more useful features, such as strict

54: size-preservation, on-the-fly encryption and multiple

55: perceptibility, which make it possible to support more applications

56: with different requirements. In addition, four different measures

57: are suggested to provide better security against

58: known/chosen-plaintext attacks.

59: \end{abstract}

60:

61: \begin{keywords}

62: perceptual encryption, MPEG, fixed-length codeword (FLC),

63: cryptanalysis, known/chosen-plaintext attack

64: \end{keywords}

65:

66: \section{Introduction}

67:

68: The wide use of digital images and videos in various applications

69: brings serious attention to the security and privacy issues today.

70: Many different encryption algorithms have been proposed in recent

71: years as possible solutions to the protection of digital images and

72: videos, among which MPEG videos attract most attention due to their

73: prominent prevalence in consumer electronic markets

74: \cite{Zeng:MultimediaSecurity:Book2006,

75: Ahl:ImageVideoEncryption:Book2005,

76: Furht:MultimediaSecurity:Book2005,

77: Furht:ImageVideoEncryption:Handbook2004,

78: Li:ChaosImageVideoEncryption:Handbook2004}.

79:

80: In many applications, such as pay-per-view videos, pay-TV and video

81: on demand (VoD), the following feature called ``perceptual

82: encryption'' is useful. This feature requires that the quality of

83: aural/visual data is only \textit{partially} degraded by encryption,

84: i.e., the encrypted multimedia data are still partially perceptible

85: after encryption. Such perceptibility makes it possible for

86: potential users to listen to or view low-quality versions of the

87: multimedia products before buying them. It is desirable that the

88: aural/visual quality degradation can be continuously controlled by a

89: factor $p$, which generally denotes a percentage corresponding to

90: the encryption strength. Figure~\ref{figure:PE} shows a diagrammatic

91: view of perceptual encryption. The encryption key is kept secret

92: (not needed when public-key ciphers are used) but the control factor

93: $p$ can be published.

94:

95: \begin{figure}

96: \centering

97: \includegraphics[width=\columnwidth]{Fig1}

98: \caption{A diagrammatic view of the perceptual

99: encryption.}\label{figure:PE}

100: \end{figure}

101:

102: Regarding the visual quality degradation of the encrypted videos,

103: the following points should be noted: 1) since there does not

104: exist a well-accepted objective measure of visual quality of digital

105: images and videos, the control factor is generally chosen to

106: represent a rough measure of the degradation; 2) the visual quality

107: degradations of different frames may be different, so the control

108: factor works only in an average sense for all videos; 3) the control

109: factor is generally selected to facilitate the implementation of the

110: encryption scheme, which may not have a linear relationship with the

111: visual quality degradation (but a larger value always means a

112: stronger degradation); 4) when the control factor $p=1$, the

113: strongest visual quality degradation of the specific algorithm

114: (i.e., of the target application) is reached, but it may not be the

115: strongest degradation that all algorithms can produce (i.e., all

116: visual information of the video is completely concealed).

117:

118: In recent years, some perceptual encryption schemes have been

119: proposed for G.729 speech

120: \cite{Servetti:PerceptualSpeech:ICASSP2002,

121: Servetti:PerceptualSpeech:IEEETSAP2002}, MP3 music

122: \cite{Torrubia:PerceptualMP3:IEEETCE2002}, JPEG images

123: \cite{Belgian:SelectiveImageEncryption:ACIVS2002,

124: Torrubia:PerceptualJPEG:ICCE2003}, wavelet-compressed (such as

125: JPEG2000) images and videos

126: \cite{Lian:PerceptualCryptography:ICME2004,

127: Lian:PerceptualCryptography:CIT2004,

128: Lian:PerceptualCryptography:ISIMP04} and MPEG videos

129: \cite{Dittmann:EnablingMPEG:LNCS97, Yann:WaterScrambling:ACMMM2002,

130: Turk:MPEG2Scrambling:IEEETCE2002,

131: Chinese:MPEG2Scrambling:IEEETCE2003}, respectively. The selective

132: encryption algorithms proposed in

133: \cite{Pommer&Uhl:WPSelectiveEncryption:SPS2002,

134: Pommer&Uhl:SelectiveWaveletEncryption:MS2003,

135: Pommer:SelectiveWaveletEncryption:Thesis2003} can be considered as

136: special cases of the perceptual encryption for images compressed

137: with wavelet packet decomposition. In some research papers, a

138: different term, ``transparent encryption'', is used instead of

139: ``perceptual encryption'' \cite{Turk:MPEG2Scrambling:IEEETCE2002,

140: Chinese:MPEG2Scrambling:IEEETCE2003}, emphasizing the fact that the

141: encrypted multimedia data are \textit{transparent} to all

142: standard-compliant decoders. However, transparency is actually an

143: equivalent of another feature called ``format-compliance'' (or

144: ``syntax-awareness'') \cite{Zeng:VideoScrambling:MMSP2001,

145: Zeng:VideoScrambling:IEEETCASVT2002}, which does not mean that some

146: partial perceptible information in plaintexts still remains in

147: ciphertexts. In other words, a perceptual cipher must be a

148: transparent cipher, but a transparent cipher may not be a perceptual

149: cipher \cite{Li:ChaosImageVideoEncryption:Handbook2004}. Generally,

150: perceptual encryption is realized by selective encryption algorithms

151: with the format-compliant feature. This paper chooses to use the

152: name of ``perceptual encryption'' for such a useful feature of

153: multimedia encryption algorithms. More precisely, this paper focuses

154: on the perceptual encryption of MPEG videos. After identifying some

155: problems of the existing perceptual encryption schemes, a more

156: effective design of perceptual MPEG-video encryption will be

157: proposed.

158:

159: The rest of this paper is organized as follows. The next section

160: will provide a brief survey of related work and point out some

161: problems, especially problems existing in two recently-proposed

162: perceptual encryption algorithms

163: \cite{Turk:MPEG2Scrambling:IEEETCE2002,

164: Chinese:MPEG2Scrambling:IEEETCE2003}. In Section

165: \ref{section:NewDesign}, the video encryption algorithm (VEA)

166: proposed in \cite{Shi:MPEGEncryption:MMTA2004} is generalized to

167: realize a new perceptual encryption design for MPEG videos, called

168: the perceptual VEA (PVEA). Experimental study is presented in Section

169: \ref{section:Experiments}, to show the encryption performance of

170: PVEA. The last section presents the conclusion.

171:

172: \section{Related Work and Existing Problems}

173:

174: \subsection{Scalability-based perceptual encryption}

175:

176: Owing to the scalability provided in MPEG-2/4 standards

177: \cite{MPEG2-ISOStandard, MPEG4-ISOStandard}, it is natural to

178: realize perceptual encryption by encrypting the enhancement

179: layer(s) of an MPEG video (but leaving the base layer unencrypted)

180: \cite{Dittmann:EnablingMPEG:LNCS97}. However, since not all MPEG

181: videos are encoded with multiple layers, this scheme is quite

182: limited in practice. More general designs should be developed to

183: support videos that are compliant with the MPEG standards.

184:

185: \subsection{Perceptual encryption for JPEG images}

186:

187: Due to the similarity between the encoding of JPEG images

188: \cite{JPEG-ISOStandard} and the frame-encoding of MPEG videos

189: \cite{MPEG1-ISOStandard, MPEG2-ISOStandard, MPEG4-ISOStandard},

190: the ideas of perceptual encryption for JPEG images can be easily

191: extended to MPEG videos.

192:

193: In \cite{Belgian:SelectiveImageEncryption:ACIVS2002}, two

194: techniques of perceptual encryption were studied: encrypting

195: selective bit-planes of uncompressed gray-scale images, and

196: encrypting selective high-frequency AC coefficients of JPEG

197: images, with a block cipher such as DES, triple-DES or IDEA

198: \cite{Schneier:AppliedCryptography96}. The continuous control of

199: the visual quality degradation was not discussed, however.

200:

201: In \cite{Torrubia:PerceptualJPEG:ICCE2003}, the perceptual

202: encryption of JPEG images is realized by encrypting VLCs

203: (variable-length codewords) of partial AC coefficients in a ZoE

204: (zone of encryption) to be other VLCs in the Huffman table. The

205: visual quality degradation is controlled via an encryption

206: probability, $p/100\in[0,1]$, where $p\in\{0,\cdots,100\}$. This

207: encryption idea is similar to the video encryption algorithm

208: proposed in \cite{Zeng:VideoScrambling:IEEETCASVT2002}. The main

209: problem with encrypting VLCs is that the size of the encrypted

210: image/video will be increased since the Huffman entropy

211: compression is actually discarded in this algorithm.

212:

213: \subsection{Perceptual encryption for wavelet-compressed images and

214: videos}

215:

216: In \cite{Lian:PerceptualCryptography:ICME2004,

217: Lian:PerceptualCryptography:ISIMP04,

218: Lian:PerceptualCryptography:CIT2004}, several perceptual

219: encryption schemes for wavelet-compressed images and videos were

220: proposed. Under the control of a percentage ratio $q$, sign bit

221: scrambling and secret permutations of wavelet

222: coefficients/blocks/bit-planes are combined to realize perceptual

223: encryption. The problem with these perceptual encryption schemes

224: is that the secret permutations are not sufficiently secure

225: against known/chosen-plaintext attacks

226: \cite{Jan-Tseng:SCAN:IPL1996, Yu-Chang:SCAN:PRL2002,

227: Zhao:PositionPermute:ZJUS2004, Li:AttackingPOMC2004}: by comparing

228: the absolute values of a number of plaintexts and ciphertexts, one

229: can reconstruct the secret permutations. Once the secret

230: permutations are removed, the encryption performance will be

231: significantly compromised.

232:

233: \subsection{Perceptual encryption of motion vectors in MPEG-videos}

234:

235: In \cite{Yann:WaterScrambling:ACMMM2002}, motion vectors are

236: scrambled to realize perceptual encryption of MPEG-2 videos. Since

237: I-frames do not depend on motion vectors, such a perceptual

238: encryption algorithm can only blur the motions of MPEG videos. It

239: cannot provide enough degradation of the visual quality of the MPEG

240: videos for encryption (see Fig.~\ref{figure:Carphone2} of this

241: paper). Generally speaking, this algorithm can be used as an option

242: for further enhancing the performance of a perceptual encryption

243: scheme based on other techniques.

244:

245: \subsection{Pazarci-Dip\c{c}in scheme}

246:

247: In \cite{Turk:MPEG2Scrambling:IEEETCE2002}, Pazarci and Dip\c{c}in

248: proposed an MPEG-2 perceptual encryption scheme, which encrypts

249: the video in the RGB color space via four secret linear transforms

250: before the video is compressed by the MPEG-2 encoder. To encrypt

251: the RGB-format uncompressed video, each frame is divided into

252: $M\times M$ scrambling blocks (SBs), each of which is composed of multiple

253: macroblocks of size $16\times 16$. Assuming the input and the

254: output pixel values are $x_i$ and $x_o$, respectively, the four

255: linear transforms are described as follows:

256: \begin{equation}

257: x_o=\begin{cases}

258: \alpha x_i, & (D,N)=(0,0),\\

259: FS-\alpha x_i, & (D,N)=(0,1),\\

260: FS(1-\alpha)+\alpha x_i, & (D,N)=(1,0),\\

261: FS-[FS(1-\alpha)+\alpha x_i], & (D,N)=(1,1),

262: \end{cases}\label{equation:TurkeyCipher}

263: \end{equation}

264: where $\alpha=\alpha^*/100\;(\alpha^*\in\{50,\cdots,90\})$ is a

265: factor controlling the visual quality degradation, $D,N$ are two

266: binary parameters that determine an affine transform for encryption,

267: and $FS$ means the maximal pixel value (for example, $FS=255$ for

268: 8-bit RGB-videos). The value of $\alpha$ in each SB is calculated

269: from the preceding I-frame, with a function called $\alpha$-rule

270: (see Sec.~2.2 of \cite{Turk:MPEG2Scrambling:IEEETCE2002} for more

271: details). The $\alpha$-rule and its parameters are designated to be

272: the secret key of this scheme.
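As a small worked illustration of Eq.~(\ref{equation:TurkeyCipher}), consider the following numerical sketch. The values $FS=255$, $\alpha=0.8$ and $x_i=100$ are our own assumptions, chosen only for this example and not taken from \cite{Turk:MPEG2Scrambling:IEEETCE2002}. The four transforms then give
\begin{align*}
(D,N)=(0,0):\quad & x_o=0.8\times 100=80,\\
(D,N)=(0,1):\quad & x_o=255-80=175,\\
(D,N)=(1,0):\quad & x_o=255\times 0.2+80=131,\\
(D,N)=(1,1):\quad & x_o=255-131=124.
\end{align*}
In each case the input range $[0,FS]$ is mapped onto an interval of width $\alpha FS$ (here $204$). Assuming the outputs are rounded to integers, the $256$ input levels therefore share at most $205$ output levels, which is one plausible source of the quality loss for $\alpha<1$ discussed in the first defect below.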

273:

274: The main merit of the Pazarci-Dip\c{c}in scheme is that the

275: encryption/decryption and the MPEG encoding/decoding processes are

276: separated, which means that the encryption part can simply be added

277: to an MPEG system without any modification. However, the following

278: defects make this scheme problematic in real applications.

279:

280: \begin{enumerate}

281: \item Unrecoverable quality loss caused by the encryption always

282: exists, unless $\alpha=1$ (which corresponds to no encryption). Even

283: authorized users who know the secret key cannot recover the video

284: with the original quality. Although it is claimed in

285: \cite{Turk:MPEG2Scrambling:IEEETCE2002} that human eyes are not

286: sensitive to such a quality loss if $\alpha$ is set above 0.5, it

287: may still be undesirable for high-quality video services, such as

288: DVD and HDTV. In addition, limiting the value of $\alpha$ lowers the

289: security and flexibility of the encryption scheme.

290:

291: \item The compression ratio may be significantly influenced by

292: encryption if there are fast motions in the plain videos. This is

293: because the motion compensation algorithm may fail to work for

294: encrypted videos. The main reason is that the corresponding SBs

295: may be encrypted with different parameters. To reduce this kind of

296: influence, the encryption parameters of all SBs have to be

297: sufficiently close to each other. This, however, compromises the

298: encryption performance and the security.

299:

300: \item The scheme is not suitable for encrypting MPEG-compressed

301: videos. In many applications, such as VoD services, the

302: plain-videos have already been compressed in MPEG format and

303: stored in digital storage media (DSM). In this case, the

304: Pazarci-Dip\c{c}in scheme becomes too expensive and slow, since

305: the videos have to be first decoded, then encrypted, and finally

306: encoded again. Note that the re-encoding may reduce the video

307: quality, since the encoder is generally different from the

308: original one that produced the videos in the factory. Apparently,

309: this defect is a natural side effect of the merit of the

310: Pazarci-Dip\c{c}in scheme.

311:

312: \item The scheme is not secure enough against brute-force attacks.

313: For a given color component $C$ of any $2\times 2$ SB structure, one

314: can exhaustively guess the $\alpha$-values of the four SBs to

315: recover the $2\times 2$ SB structure, by minimizing the block

316: artifacts occurring between adjacent SBs. For each color component

317: of an SB, $\alpha^*=100\alpha\in\{50,\cdots,90\}$ takes one of 41 values,

318: $D\in\{0,1\}$, and $N$ is determined by $D$, so one can calculate

319: that the searching complexity is only $(41\times 2)^4\approx

320: 2^{25.4}$, which is sufficiently small for PCs\footnote{Even when

321: $\alpha^*\in\{0,\cdots,100\}$, the searching complexity is only

322: $(101\times 2)^4\approx 2^{30.6}$, which is still practically

323: small.}. Once the value of $\alpha$ of an SB is obtained, one can

324: further break the secret key of the corresponding $\alpha$-rule. For

325: the exemplified $\alpha$-rule given in Eq. (3) of

326: \cite{Turk:MPEG2Scrambling:IEEETCE2002}, the secret key consists of

327: the addresses of two selected subblocks (of size $P\times P$) in a

328: $2\times 2$ SB structure, and a binary shift value

329: $\Delta\in\{5,50\}$. Because $\Delta$ can be uniquely determined

330: from $D$, one only needs to search the other part of the key, which

331: corresponds to a complexity of

332: $\left(3\times\left(2M/P\right)^2\right)^2$. When $P=\frac{M}{2}$,

333: the complexity is $48^2=2304\approx 2^{11.2}$, and when

334: $P=\frac{M}{4}$ it is $192^2=36864\approx 2^{15.2}$. Apparently, the

335: key space is not sufficiently large to resist brute-force attacks,

336: either. In addition, since the values of quality factors and the

337: secret parameters corresponding to the three color components can be

338: separately guessed, the whole attack complexity is only three times

339: the above values, which is still too small from a cryptographic

340: point of view \cite{Schneier:AppliedCryptography96}. Although using

341: multiple secret keys for different SBs can increase the attack

342: complexity exponentially, the key size will be too long and the

343: key-management will become more complicated. Here, note that the

344: $\alpha$-rule itself should not be considered as part of the key,

345: following the well-known Kerckhoffs' principle in modern

346: cryptography \cite{Schneier:AppliedCryptography96}.

347:

348: \item The scheme is not sufficiently sensitive to the mismatch of

349: the secret key, since the encryption transforms and the

350: $\alpha$-rule given in \cite{Turk:MPEG2Scrambling:IEEETCE2002} are

351: both linear functions. This means that the security against

352: brute-force attacks will be further compromised, as an approximate

353: value of $\alpha$ may be enough to recover most visual information

354: in the plain-video.

355:

356: \item The scheme is not secure enough against

357: known/chosen-plaintext attacks. This is because the value of

358: $\alpha$ can be derived approximately from the linear relation

359: between the plain pixel-values and the cipher pixel-values in the

360: same SB. Similarly, the value $N$ can be derived from the sign of

361: the slope of the linear map between $x_i$ and $x_o$, and the value

362: of $D$ can be derived from the value range of the map.

363: Furthermore, assuming that there are $k$ secret parameters in the

364: $\alpha$-rule, if more than $k$ different values of $\alpha$ are

365: determined as above, it is possible to uniquely solve the

366: approximate values of the $k$ secret parameters. To resist

367: known/chosen-plaintext attacks, the secret key has to be changed

368: more frequently than that suggested in

369: \cite{Turk:MPEG2Scrambling:IEEETCE2002} (one key per program),

370: which will increase the computational burden of the servers

371: (especially the key-management system).

372: \end{enumerate}
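The brute-force figures and the known-plaintext argument in the list above can be checked with a short worked derivation. This is a sketch under simplifying assumptions: the cipher pixel-values are taken to be exact (no rounding and no lossy compression), and the symbols $s$, $c$, $(x_1,y_1)$, $(x_2,y_2)$ and all numerical values are ours, introduced only for illustration. For the search complexity,
\begin{align*}
\log_2\left(41\times 2\right)^4 &= 4\log_2 82\approx 4\times 6.358\approx 25.4,\\
\log_2\left(101\times 2\right)^4 &= 4\log_2 202\approx 4\times 7.658\approx 30.6.
\end{align*}
For the known-plaintext attack, each transform in Eq.~(\ref{equation:TurkeyCipher}) is affine, $x_o=s\,x_i+c$, with
\begin{equation*}
(s,c)=\begin{cases}
(\alpha,\;0), & (D,N)=(0,0),\\
(-\alpha,\;FS), & (D,N)=(0,1),\\
(\alpha,\;FS(1-\alpha)), & (D,N)=(1,0),\\
(-\alpha,\;\alpha FS), & (D,N)=(1,1).
\end{cases}
\end{equation*}
Given two plain/cipher pairs $(x_1,y_1)$ and $(x_2,y_2)$ from the same SB and color component, with $x_1\neq x_2$, one obtains
\begin{equation*}
s=\frac{y_2-y_1}{x_2-x_1},\qquad c=y_1-s\,x_1,\qquad \alpha=|s|,
\end{equation*}
where $N=1$ if and only if $s<0$, and $D$ follows from comparing $c$ with the two candidate offsets for that sign of $s$ (which differ whenever $\alpha<1$). For example, with $FS=255$ and the pairs $(100,124)$ and $(200,44)$, one gets $s=-0.8$ and $c=124+0.8\times 100=204=0.8\times 255$, hence $\alpha=0.8$ and $(D,N)=(1,1)$. With rounding and compression noise, the same computation yields only approximate values, and more pairs would be needed to average out the error.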

373:

374: \subsection{Wang-Yu-Zheng scheme}

375:

376: A different scheme working in the DCT domain (between DCT transform

377: and Huffman entropy coding) was proposed by Wang, Yu and Zheng in

378: \cite{Chinese:MPEG2Scrambling:IEEETCE2003}, which can be used as an

379: alternative solution to overcome the first two shortcomings of the

380: Pazarci-Dip\c{c}in scheme. By dividing all 64 DCT coefficients of

381: each $8\times 8$ block into 16 sub-bands following the distance

382: between each DCT coefficient and the DC coefficient, this new scheme

383: encrypts the $j$-th AC coefficient in the $i$-th sub-band as

384: follows:

385: \begin{equation}

386: b_{ij}'=\begin{cases}

387: b_{ij}-\lfloor\beta a_i\rfloor, & b_{ij}\geq 0,\\

388: b_{ij}+\lfloor\beta a_i\rfloor, & b_{ij}<0,

389: \end{cases}

390: \end{equation}

391: where $b_{ij}$ and $b_{ij}'$ denote the plain coefficient value and the

392: cipher coefficient value, respectively, $\beta\in[0,1]$ is the control

393: factor, $a_i$ is the rounding average value of all AC coefficients

394: in the $i$-th sub-band, and $\lfloor\cdot\rfloor$ means the rounding

395: function towards zero. The DC coefficients are encrypted in a

396: different way, as $b_0'=b_0\pm\lfloor C\beta a_0\rfloor$, where

397: $a_0=b_0$ and $C\in[0,1]$ is the second control factor\footnote{Note

398: that the rounding function is missing in Eqs. (3) and (4) of

399: \cite{Chinese:MPEG2Scrambling:IEEETCE2003}. In addition, Eq. (4) of

400: \cite{Chinese:MPEG2Scrambling:IEEETCE2003} should read

401: $b_0'=b_0\pm\lfloor C\beta a_0\rfloor$, not $b_0'=a_0\pm C\beta$.}.

402: The value of $a_i$ can also be calculated in a more complicated way

403: to enhance the encryption performance, following Eqs. (5) and (6) in

404: \cite{Chinese:MPEG2Scrambling:IEEETCE2003}, where three new

405: parameters, $k_1,k_2,k_3$ are introduced to determine the values of

406: $a_i$ for the three color components, Y, Cr and Cb. The 16 average

407: values, $a_0\sim a_{15}$, the two control factors, $\beta$ and $C$,

408: and the three extra parameters (if used), $k_1,k_2,k_3$, altogether

409: serve as the secret scrambling parameters (i.e., the secret key) of

410: each SB. Three different ways are suggested for the transmission of

411: the secret parameters: a) encrypting them and transmitting them in

412: the payload of TS (transport stream); b) embedding them in the

413: high-frequency DCT coefficients; c) calculating them from the

414: previous I-frame in a way similar to the $\alpha$-rule in

415: \cite{Turk:MPEG2Scrambling:IEEETCE2002}.
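The AC encryption rule above can be illustrated with a small numerical sketch. The values $\beta=0.5$ and $a_i=7$, and the three sample coefficients, are our own assumptions chosen only for this example. Since $\lfloor\beta a_i\rfloor=\lfloor 3.5\rfloor=3$, one obtains
\begin{align*}
b_{ij}=10:\quad & b_{ij}'=10-3=7,\\
b_{ij}=-12:\quad & b_{ij}'=-12+3=-9,\\
b_{ij}=0:\quad & b_{ij}'=0-3=-3.
\end{align*}
The last line shows a zero coefficient becoming non-zero after encryption, which is the effect behind the compression-efficiency problem discussed in the first item of the list below.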

416:

417: In fact, the Wang-Yu-Zheng scheme is just an enhanced version of the

418: Pazarci-Dip\c{c}in scheme, without amending all shortcomings of the

419: latter scheme. Precisely, the following problems still remain.

420: \begin{enumerate}

421: \item Though the compression-ratio reduction related to motion

422: compensation is avoided, the encryption will change the natural

423: distribution of the DCT coefficients and thus reduce the

424: compression efficiency of the Huffman entropy encoder. For

425: example, when each sub-band has only one non-zero coefficient, it

426: is possible that all 64 coefficients become non-zero after the

427: encryption. This significantly increases the video size. In

428: addition, if the secret parameters are embedded into the

429: high-frequency DCT coefficients for transmission, the compression

430: performance will be further compromised.

431:

432: \item The scheme is still not sufficiently sensitive to the

433: mismatch of the secret parameters, since the encryption function and

434: the calculation function of $a_i$ are kept linear. It is still not

435: sufficiently secure against brute-force attacks to the secret

436: parameters, because of the limited values of

437: $a_i,\beta,C,k_1,k_2,k_3$. Furthermore, due to the non-uniform

438: distribution of the DCT coefficients in each sub-band, an attacker

439: need not randomly search all possible values of $a_i$.

440:

441: \item This scheme is still insecure against known/chosen-plaintext

442: attacks if the third way is used for calculating the secret

443: parameters. In this case, $a_i$ of each SB can be easily calculated

444: from the previous I-frame of the plain-video. Additionally, since

445: the value of $\lfloor\beta a_i\rfloor$ can be obtained from

446: $b_{ij}-b_{ij}'$, the secret parameter $\beta$ can be derived

447: approximately. In a similar way, the secret parameter $C$ can also

448: be derived approximately. If $a_i$ is calculated with $k_1,k_2,k_3$,

449: the values of $\beta,k_1,k_2,k_3$ can be solved approximately with a

450: number of known/chosen AC coefficients in four or more different

451: sub-bands, so that $C$ can be further derived from one known/chosen

452: DC coefficient.

453:

454: \item The method of transmitting the secret parameters in the

455: payload of the transport stream cannot be used under the following

456: conditions: a) the key-management system is not available; b) the

457: video is not transmitted with the TS format. A typical example is

458: the perceptual encryption of MPEG-video files in personal

459: computers.

460:

461: \iffalse \item When the scheme is used to encrypt MPEG-compressed

462: videos, the video size will be changed and bit-rate control are

463: needed, which causes two problems: a) the computation load of the

464: encryption is still relatively large; b) one has to simultaneously

465: store the original video and the encrypted video if the former is

466: still useful in future, so some disk space for storage is

467: wasted.\fi

468: \end{enumerate}
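The approximate derivation of $\beta$ mentioned in the known/chosen-plaintext item above can be sketched as follows. The symbol $d$ and the numerical values are ours, and $a_i>0$ is assumed for simplicity. For a known coefficient with $b_{ij}\geq 0$, let $d=b_{ij}-b_{ij}'=\lfloor\beta a_i\rfloor$ (for $b_{ij}<0$, take $d=b_{ij}'-b_{ij}$ instead). Then
\begin{equation*}
d\leq\beta a_i<d+1
\quad\Longrightarrow\quad
\frac{d}{a_i}\leq\beta<\frac{d+1}{a_i}.
\end{equation*}
For example, a sub-band with $a_i=7$ and $d=3$ gives $0.4286\leq\beta<0.5714$ (bounds rounded to four decimals), and a second sub-band with $a_i=20$ and $d=10$ gives $0.5\leq\beta<0.55$. Intersecting the two intervals leaves $\beta\in[0.5,0.55)$. Each interval has width $1/a_i$, so under these assumptions sub-bands with larger $a_i$ narrow the estimate faster.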

469:

470: \section{More Efficient Design of Perceptual MPEG-Video Encryption Schemes}

471: \label{section:NewDesign}

472:

473: Based on the analysis given above, we propose a simpler design of

474: perceptual encryption for MPEG videos, and attempt to overcome the

475: problems in existing schemes. The following useful features are

476: supported in our new design.

477: \begin{itemize}

478: \item \textit{Format-compliance}: the encrypted video can still be

479: decoded by any standard-compliant MPEG decoder. This is a basic

480: feature of all perceptual encryption schemes.

481:

482: \item \textit{Lossless visual quality}: the encrypted video has

483: the same visual quality as the original one, i.e., the original

484: full-quality video can be exactly recovered when the secret key is

485: presented correctly.

486:

487: \item \textit{Strict size-preservation}: the size of each data

488: element in the lowest syntax level, such as VLCs, FLCs and

489: continuous stuffing bits, remains unchanged after encryption. When

490: the video stream is packetized in a system stream (i.e., PS or

491: TS), the size of each video packet remains unchanged after

492: encryption. This enables the following useful features in

493: applications:

494: \begin{itemize}

495: \item \textit{independence of bit-rate types (VBR and FBR)};

496:

497: \item \textit{avoiding some time-consuming operations when

498: encrypting MPEG-compressed videos}: bit-rate control,

499: re-packetization of the system stream and the re-multiplexing of

500: multiple audio/video streams;

501:

502: \item \textit{on-the-fly encryption}: a) direct encryption of

503: MPEG-compressed video files without creating temporary files, i.e.,

504: one can open an MPEG video file, read the bitstream and

505: simultaneously update (encrypt) it; b) instantaneous switching

506: encryption on/off for online video transmission;

507:

508: \item \textit{ROI (region of interest) encryption}: selectively

509: encrypting partial frames, slices, macroblocks, blocks, motion

510: vectors and/or DCT coefficients within specific regions of the

511: video.

512: \end{itemize}

513:

514: \item \textit{Independence of optional data elements in MPEG

515: videos}: the perceptibility is not obtained by encrypting optional

516: data elements, such as quantiser\_matrix and

517: coded\_block\_pattern\footnote{Strictly speaking, motion vectors

518: are also optional elements in MPEG videos, so the scheme should

519: not encrypt only motion vectors as was done in

520: \cite{Yann:WaterScrambling:ACMMM2002}.}. This means that the

521: scheme can encrypt any MPEG-compliant videos with a uniform

522: performance.

523:

524: \item \textit{Fast encryption speed}: a) the extra computational

525: load added by the encryption is much smaller than the

526: computational load of a typical MPEG encoder; b) MPEG-compressed

527: videos can be quickly encrypted without being fully decoded and

528: re-encoded (at least the time-consuming IDCT/DCT operations are

529: avoided).

530:

531: \item \textit{Easy implementation}: the encryption/decryption

532: parts can be easily incorporated into the whole MPEG system,

533: without major modification of the structure of the codec.

534:

535: \item \textit{Multi-dimensional perceptibility}: the degradation

536: of visual quality is controlled by multi-dimensional factors.

537:

538: \item \textit{Security against known/chosen-plaintext attacks} is

539: ensured by four different measures.

540: \end{itemize}

541: To the best of our knowledge, some of the above features (such as

542: on-the-fly encryption) have never been discussed in the literature

543: on video encryption, in spite of their usefulness in real

544: applications.

545:

546: With the above features, the perceptual encryption scheme becomes

547: more flexible to fulfill different requirements of various

548: applications. To realize the strict size-preservation feature, the

549: encryption algorithm has to be incorporated into the MPEG encoder,

550: i.e., the (even partial) separation of the cipher and the encoder is

551: impossible. This is a minor disadvantage in some applications.

552: However, if the re-design is sufficiently simple, it is worth doing

553: so to get a better tradeoff between the overall performance and the

554: ease of implementation. In the case that the re-design of the MPEG

555: codec is impossible, for example, if the codec is secured by the

556: vendor, a simplified MPEG codec can be developed for the embedding

557: of the perceptual video cipher. Since the most time-consuming

558: operations in a normal MPEG codec, including DCT/IDCT and picture

559: reconstruction, are excluded from the simplified MPEG codec, fast

560: encryption speed and low implementation complexity of the whole

561: system can still be achieved.

562:

563: In the following text of this section, we describe the design

564: principle along with different methods of providing security

565: against known/chosen-plaintext attacks, and discuss several

566: implementation issues.

567:

568: \subsection{The new design}

569:

570: This design is a generalized version of VEA

571: \cite{Shi:MPEGEncryption:MMTA2004} for perceptual encryption, by

572: selectively encrypting FLC data elements in the video stream.

573: Apparently, encrypting FLC data elements is the most natural and

574: perhaps the simplest way to maintain all needed features,

575: especially the need for the strict size-preservation feature. The

576: proposed scheme is named PVEA -- perceptual video encryption

577: algorithm. Note that PVEA can also be considered as an enhanced

578: combination of the encryption techniques for JPEG images proposed

579: in \cite{Belgian:SelectiveImageEncryption:ACIVS2002,

580: Torrubia:PerceptualJPEG:ICCE2003} and the perceptual encryption of

581: motion vectors \cite{Yann:WaterScrambling:ACMMM2002}.

582:

583: There are three main reasons for selecting only FLC data elements

584: for encryption.

585: \begin{enumerate}

586: \item

587: As analyzed below, none of the existing VLC encryption algorithms can be

588: directly used to provide a controllable degradation of the quality.

589: New ideas have to be developed to adopt VLC encryption in perceptual

590: encryption schemes.

591: \begin{itemize}

592: \item

593: \textit{VLC encryption with different Huffman tables}

594: \cite{Wu&Kuo:AudiovisualEncryption:SPIE2001,

595: Wu&Kuo:EntropyCodecEncryption:SPIE2001,

596: Xie&Kuo:MHTEncryption:SPIE2003, Wu&Kuo:MHTEncryption:IEEETMM2004,

597: Kankanhalli:VideoScrambler:IEEETCE2002,

598: Shi:SecretHuffmanCoding:MM98, Shi:MPEGEncryption:MMTA2004}: Since

599: each VLC-codeword is a pair of (run, level), if a VLC-codeword is

600: decoded to get an incorrect ``run" value, then the position of all

601: the following DCT coefficients will be wrong. As a result, the

602: visual quality of the decoded block will be degraded in an

603: uncontrollable way. Thus, it is difficult to find a factor to

604: control such visual quality degradation. Moreover, if the Huffman

605: tables do not keep the size of each VLC-entry as designated in

606: \cite{Wu&Kuo:AudiovisualEncryption:SPIE2001,

607: Wu&Kuo:EntropyCodecEncryption:SPIE2001,

608: Xie&Kuo:MHTEncryption:SPIE2003, Wu&Kuo:MHTEncryption:IEEETMM2004,

609: Kankanhalli:VideoScrambler:IEEETCE2002}, syntax errors may occur

610: when an unauthorized user decodes an encrypted video. This means

611: that the encryption cannot ensure format compliance with any

612: standard MPEG codec.

613:

614: \item

615: \textit{VLC-index encryption} \cite{Zeng:VideoScrambling:MMSP2001,

616: Zeng:VideoScrambling:IEEETCASVT2002}: This encryption scheme can

617: ensure format compliance, but still suffers from the

618: uncontrollability of the visual quality degradation due to the same

619: reason as above. Another weakness of VLC-index encryption is that it

620: may influence the compression efficiency and bring overhead on video

621: size.

622:

623: \item

624: \textit{Shuffling VLC-codewords or RLE events before the entropy

625: encoding stage} \cite{Zeng:VideoScrambling:IEEETCASVT2002,

626: Liu:MPEGEncryption:IEICE-A2006}: This algorithm can ensure both the

627: format compliance and the strict size-preservation. However, even

628: exchanging only two VLC-codewords may cause a dramatic change of the

629: DCT coefficients distribution of each block. So, this encryption

630: algorithm cannot realize a slight degradation of the visual quality

631: and fails to serve as an ideal candidate for perceptual encryption.

632: \end{itemize}

633:

634: \item

635: It is obvious that FLC encryption is the simplest way to achieve all

636: the desired properties mentioned at the beginning of this section,

637: especially to achieve format compliance, strict size-preservation

638: and fast encryption \textbf{simultaneously}. For example, naive

639: encryption\footnote{In the image/video encryption literature, the

640: term ``naive encryption" means to consider the video as a 1-D

641: bitstream and encrypt it via a common cipher.} can realize strict

642: size-preservation and fast encryption, but cannot ensure format

643: compliance.

644:

645: \item

646: As will be seen below, using FLC encryption is sufficient to fulfill

647: the needs of most real applications for perceptual encryption.

648: \end{enumerate}

649:

650: According to MPEG standards \cite{MPEG1-ISOStandard,

651: MPEG2-ISOStandard, MPEG4-ISOStandard}, the following FLC data

652: elements exist in an MPEG-video bitstream:

653: \begin{itemize}

654: \item 4-byte start codes: 000001xx (hexadecimal);

655:

656: \item almost all information elements in various headers;

657:

658: \item sign bits of non-zero DCT coefficients;

659:

660: \item (differential) DC coefficients in intra blocks;

661:

662: \item ESCAPE DCT coefficients;

663:

664: \item sign bits and residuals of motion vectors.

665: \end{itemize}

666:

667: To maintain the format-compliance to the MPEG standards after the

668: encryption, the first two kinds of data elements should not be

669: encrypted. So, in PVEA, only the last four FLC data elements are

670: considered, which are divided into three categories according to

671: their contributions to the visual quality:

672: \begin{itemize}

673: \item \textit{intra DC coefficients}: corresponding to the rough

674: view (in the level of $8\times 8$ block) of the video;

675:

676: \item \textit{sign bits of non-intra DC coefficients and AC

677: coefficients, and ESCAPE DCT coefficients}: corresponding to

678: details in $8\times 8$ blocks of the video;

679:

680: \item \textit{sign bits and residuals of motion vectors}:

681: corresponding to the visual quality of the video related to the

682: motions (residuals further correspond to the details of the

683: motions).

684: \end{itemize}

685: Based on the above division, three control factors, $p_{sr}$,

686: $p_{sd}$, and $p_{mv}$ in the range $[0,1]$, are used to control the

687: visual quality in three different dimensions: the low-resolution

688: rough (spatial) view, the high-resolution (spatial) details, and

689: the (temporal) motions. With the three control factors, the

690: encryption procedure of PVEA can be described as follows:

691: \begin{enumerate}

692: \item encrypting intra DC coefficients with probability $p_{sr}$;

693:

694: \item encrypting sign bits of non-zero DCT coefficients (except

695: for intra DC coefficients) and ESCAPE DCT coefficients with

696: probability $p_{sd}$;

697:

698: \item encrypting sign bits and residuals of motion vectors with

699: probability $p_{mv}$.

700: \end{enumerate}

701: The encryption of selected FLC data elements can be carried out with

702: either a stream cipher or a block cipher. When a block cipher is

703: adopted, the consecutive FLC data elements should be first

704: concatenated together to form a longer bit stream, then each block

705: of the bit stream is encrypted, and finally each encrypted FLC data

706: element is placed back into its original position in the video

707: stream. Under the assumption that the stream cipher or block cipher

708: embedded in PVEA is secure, some special considerations should be

709: taken into account in order to ensure the security against various

710: attacks, as discussed below.
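As a rough illustration of the procedure above, the following Python sketch applies the three control factors to a list of already-parsed FLC data elements. It is only a sketch under several assumptions that are not part of PVEA itself: the \texttt{FlcElement} record, the category names and the function names are hypothetical; the keystream is a toy construction built from SHA-256 and merely stands in for a secure stream cipher; \texttt{random.Random} stands in for a keyed PRNG shared by the encoder and the decoder; and any syntax restrictions on the values of individual fields are ignored.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass
class FlcElement:
    kind: str   # "intra_dc", "detail" (sign bits / ESCAPE coefficients) or "mv"
    value: int  # the FLC field, read as an unsigned integer
    nbits: int  # field width in bits (1..256 here); never changed by encryption


def keystream_bits(key: bytes, counter: int, nbits: int) -> int:
    """Toy keystream (assumption): top nbits of SHA-256(key || counter)."""
    digest = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - nbits)


def pvea_process(elements, key, p_sr, p_sd, p_mv, seed):
    """Encrypt each element with the probability assigned to its category.

    XOR is its own inverse, so calling this again with the same key, seed
    and control factors restores the original elements.
    """
    prob = {"intra_dc": p_sr, "detail": p_sd, "mv": p_mv}
    selector = random.Random(seed)  # stand-in for a keyed, shared PRNG
    out = []
    for i, e in enumerate(elements):
        # random() lies in [0, 1): p = 0 never encrypts, p = 1 always does.
        if selector.random() < prob[e.kind]:
            masked = e.value ^ keystream_bits(key, i, e.nbits)
            out.append(FlcElement(e.kind, masked, e.nbits))
        else:
            out.append(e)
    return out
```

Because every element keeps its width and its position, the size of the bitstream is unchanged, which is the property the FLC-only design relies on.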

711:

712: In the above-described PVEA, the three factors control the visual

713: quality, as follows:

714: \begin{itemize}

715: \item $p_{sr}=1\to 0$: the spatial perceptibility changes from

716: ``almost imperceptible" to ``perfectly perceptible" when

717: $p_{sd}=0$ or to ``roughly perceptible" when $p_{sd}>0$;

718:

719: \item $p_{sr}=0$, $p_{sd}=1\to 0$: the spatial perceptibility

720: changes from ``roughly perceptible" to ``perfectly perceptible";

721:

722: \item $p_{mv}=1\to 0$: the temporal (motion) perceptibility (for

723: P/B-pictures only) changes from ``almost imperceptible" to

724: ``perfectly perceptible".

725: \end{itemize}

726: The encryption may bring the recovered motion vectors out of the

727: spatial range of the picture, so the motion compensation

728: operations (or even the involved picture itself) may be simply

729: discarded by the MPEG decoder. In this case, the temporal (motion)

730: perceptibility will be ``perfectly imperceptible", not just

731: ``almost imperceptible".

732:

733: In the Appendix of \cite{Shi:MPEGEncryption:MMTA2004}, it was

734: claimed that the DC coefficients of each block can be uniquely

735: derived from the other 63 AC coefficients. This means that the

736: perceptual encryption of DC coefficients must not be used alone,

737: i.e., some AC coefficients must also be encrypted to make the

738: encryption of the DC coefficients secure. It was later observed

739: that this claim is not correct

740: \cite{Li:MPEGEncryption:MMTA2004note}. In fact, the DC coefficient

741: of a block means the average brightness of the block, and is

742: independent of the other 63 AC coefficients. Thus, the

743: DC-encryption and AC-encryption of PVEA are independent of each

744: other, i.e., the two control factors, $p_{sr}$ and $p_{sd}$, are

745: independent of each other, and they can be freely combined in

746: practice.

747:

748: \subsection{Security against ciphertext-only attacks and a constraint of the control

749: factor}

750:

751: The format compliance of perceptual encryption makes it possible for

752: the attacker to guess the values of all encrypted FLC data elements

753: separately in ciphertext-only attacks. The simplest attack is to try

754: to recover more visual information by setting all the encrypted FLC

755: data elements to zeros. This is called error-concealment-based

756: attack (ECA) \cite{Zeng:VideoScrambling:IEEETCASVT2002}. Our

757: experimental results have shown that PVEA is secure against such an

758: attack. More details are given in the next section.

759:

760: To guess the value of each FLC data element, one can also employ the

761: local correlation existing between adjacent blocks in each frame.

762: That is, one can search for a set of all encrypted FLC data elements

763: in each frame to achieve the least blocking artifact. Does such a

764: deblocking attack work? Now let us try to get a lower bound of this

765: attack's complexity, by assuming that the number of all FLC data

766: elements in each frame is $N$ and that a fraction $p$ of them is encrypted, which means that the number of

767: encrypted FLC data elements is $pN$. Then, the complexity of the

768: deblocking attack will not be less than

769: $O\left(\binom{N}{pN}2^{pN}\right)$, since each FLC data element

770: has at least two candidate values. So, if $\binom{N}{pN}2^{pN}$ is

771: cryptographically large, the deblocking attack will not compromise

772: the security of PVEA. As a lower bound of $p$ corresponding to a

773: typical security level, one can get $p\geq 100/N$ by assuming

774: $2^{pN}\geq 2^{100}$. For most consumer videos that need to be

775: protected via perceptual encryption, $N$ is generally much larger

776: than 100, so this constraint on $p$ generally does not have too much

777: influence on the overall performance of PVEA. Because the bound

778: $2^{pN}$ greatly under-estimates the real complexity\footnote{There are two reasons for

779: the under-estimation: 1) the omission of $\binom{N}{pN}$, which is

780: very large when $N\gg pN$ and $pN$ is not very small; 2) some FLC

781: elements (such as intra DC coefficients) have more than 2 candidate

782: values.}, the constraint can be further relaxed in practice. For

783: example, when $N=200$, the above condition suggests that $p\geq

784: 100/N=1/2$. However, calculations showed that $p\geq 9/100$ is

785: enough to ensure a complexity larger than $O(2^{100})$.
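The relaxed constraint for $N=200$ can be checked numerically. The short Python sketch below searches for the smallest number $k=pN$ of encrypted elements such that $\binom{N}{k}2^{k}\geq 2^{100}$; the function name and the default security level of 100 bits are choices made for this illustration only.

```python
from math import comb


def min_encrypted_count(n: int, security_bits: int = 100) -> int:
    """Smallest k with C(n, k) * 2**k >= 2**security_bits."""
    for k in range(n + 1):
        # Exact integer arithmetic, so no rounding issues arise.
        if comb(n, k) * 2**k >= 2**security_bits:
            return k
    raise ValueError("n is too small to reach the requested security level")


k = min_encrypted_count(200)
print(k, k / 200)  # 18 encrypted elements, i.e. p >= 9/100
```

For $N=200$ the search stops at $k=18$, which agrees with $p\geq 9/100$, whereas the simpler condition $2^{pN}\geq 2^{100}$ would require $k=100$.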

786:

787: Since it is generally impractical to carry out the deblocking attack

788: on the whole frame, another two-layer deblocking attack may be

789: adopted by the attacker: 1) performing the deblocking attack on

790: small areas of the frame; 2) for all candidates of these small

791: areas, performing the deblocking attack on the area-level again.

792: Though this two-layer attack generally has a much smaller complexity

793: than the simple attack, its efficiency is still limited due to the

794: following reasons.

795: \begin{itemize}

796: \item

797: For each small area, the number of encrypted FLC elements is

798: generally not equal to $pN^*$, where $N^*$ denotes the total number

799: of all FLC elements in the area. Thus, even this number has to be

800: exhaustively guessed and then validated by considering the numbers

801: of other areas (i.e., the whole frame). The existence of three

802: independent quality factors makes the attack even more complicated.

803:

804: \item

805: For each small area, the probability that the result with the least

806: blocking artifact does not correspond to the real scene may not be very small.

807: Accordingly, the attacker has to mount a looser deblocking

808: attack, thus leading to a higher attack complexity.

809:

810: \item

811: Even for the smallest area of size $16\times 16$, there are

812: generally more than one hundred FLC elements (i.e., $N^*\geq 100$),

813: especially when rich visual information is included in the

814: area.

815:

816: \item

817: If the number of FLC elements in an area is relatively small, this

818: area generally contains less significant visual information (such as

819: a smooth area).

820:

821: \item

822: The smaller each area is, the larger the number of fake results will

823: be, and hence the higher the complexity of the second stage will be.

824: \end{itemize}

825: Of course, with the two-layer deblocking attack, the attacker can

826: have a chance to recover a number of small areas, though he/she

827: generally cannot get the whole frame. Such a minor security problem

828: is an unavoidable result of the inherent format-compliance property

829: of the perceptual encryption algorithms and related to the essential

830: disadvantage of perceptual encryption exerted on some special

831: MPEG-videos (see the discussion on Fig.~\ref{figure:Animation} in

832: the next section).

833:

834: \subsection{Security against known/chosen-plaintext attacks}

835:

836: Generally speaking, there are four different ways to provide

837: security against known/chosen-plaintext attacks. Users can select

838: one solution for a specific application.

839:

840: \subsubsection{Using a block cipher}

841:

842: With a block cipher, it is easy to provide security against

843: known/chosen-plaintext attacks. Since the lengths of different FLC

844: data elements are different, the block cipher may have to run in CFB

845: (cipher feedback) mode with variable-length feedback bits to realize

846: the encryption. Note that $n$-bit error propagation exists in block

847: ciphers running in the CFB mode

848: \cite{Schneier:AppliedCryptography96}, where $n$ is the block size

849: of the cipher. It is also possible to cascade multiple FLC data

850: elements to compose an $n$-bit block for encryption, as in RVEA

851: \cite[Sec. 7]{Shi:MPEGEncryption:MMTA2004}. Compared to the CFB

852: mode, the latter encryption mode can achieve a faster encryption

853: speed (with a little more implementation complexity for bit

854: cascading), since in the CFB mode only one element can be encrypted

855: in each run of the block cipher.
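The cascading alternative can be sketched in Python as follows: the selected FLC fields are concatenated into one bit string, each full $n$-bit block is passed through the cipher, and the result is cut back into fields of the original widths. The names are hypothetical, \texttt{block\_fn} is a placeholder for one run of a real block cipher, and leaving a trailing partial block unencrypted is a simplification of this sketch rather than something prescribed by RVEA or PVEA.

```python
def cascade_encrypt(fields, block_bits, block_fn):
    """fields: list of (value, nbits) pairs; returns a list with the same widths."""
    total = sum(nbits for _, nbits in fields)

    # 1) Concatenate all fields into one big integer, first field on top.
    stream = 0
    for value, nbits in fields:
        stream = (stream << nbits) | value

    # 2) Split into full blocks plus an (untouched) trailing partial block.
    full_blocks = total // block_bits
    tail_bits = total - full_blocks * block_bits
    tail = stream & ((1 << tail_bits) - 1)
    body = stream >> tail_bits

    # 3) Run the block function on every full block, most significant first.
    mask = (1 << block_bits) - 1
    out = 0
    for b in range(full_blocks):
        shift = (full_blocks - 1 - b) * block_bits
        out = (out << block_bits) | block_fn((body >> shift) & mask)
    out = (out << tail_bits) | tail

    # 4) Put each processed field back at its original position and width.
    result = []
    remaining = total
    for _, nbits in fields:
        remaining -= nbits
        result.append(((out >> remaining) & ((1 << nbits) - 1), nbits))
    return result
```

Decryption uses the same routine with the inverse block function, so one cipher call covers several short fields instead of one field per call as in the CFB mode.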

856:

857: \subsubsection{Using a stream cipher with plaintext/ciphertext

858: feedback}

859:

860: After encrypting each plain data element, the plaintext or the

861: ciphertext is sent to perturb the stream cipher for the encryption

862: of the next plain data element. In such a way, the keystream

863: generated by the stream cipher becomes dependent on the whole

864: plain-video, which makes the known/chosen-plaintext attacks

865: impractical. Note that an initial vector is needed for the

866: encryption of the first plain data element.
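A minimal Python sketch of the ciphertext-feedback variant is given below. It is not a vetted cipher: the state update and the keystream are built from SHA-256 purely for illustration, and the function names and the byte encoding of the feedback are assumptions of this sketch. The plaintext-feedback variant would feed the plain value into the state instead.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def feedback_crypt(fields, key: bytes, iv: bytes, decrypt: bool = False):
    """fields: list of (value, nbits) pairs with 1 <= nbits <= 256."""
    state = _h(key + iv)  # the initial vector seeds the first element
    out = []
    for value, nbits in fields:
        # Keystream for this element depends on the current state only.
        ks = int.from_bytes(_h(b"ks" + state), "big") >> (256 - nbits)
        result = value ^ ks
        # The ciphertext is the input when decrypting, the output otherwise.
        cipher = value if decrypt else result
        out.append((result, nbits))
        # Feed the ciphertext back so later keystream depends on earlier data.
        state = _h(state + cipher.to_bytes((nbits + 7) // 8, "big"))
    return out
```

Both sides update the state from the same ciphertext values, so the decoder regenerates the same keystream as long as it starts from the same key and initial vector.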

867:

868: \subsubsection{Using a key-management system and a stream cipher}

869:

870: When a key-management system is available in an application, the

871: encryption procedure of PVEA can be realized with a stream cipher.

872: To effectively resist known/chosen-plaintext attacks, the secret key

873: of the stream cipher should be frequently changed by the

874: key-management system. In most cases, it is enough to change one key

875: per picture, or per GOP. Note that this measure needs more

876: computational load with higher implementation cost, and is suitable

877: mainly for encrypting online videos.

878:

879: \subsubsection{Using a stream cipher with UID}

880:

881: When key-management systems are not available in some applications,

882: a unique ID (UID) can be used to provide the security against

883: known/chosen-plaintext attacks by ensuring that the UIDs are

884: different for different videos. The UID of an MPEG-video can be

885: stored in the user\_data area. The simplest form of the UID is the

886: vendor ID plus the time stamp of the video. It is also possible to

887: determine the UID of a video with a hash function or a secure

888: pseudo-random number generator (PRNG). In this case, the UIDs of two

889: different videos may be identical, but the probability is

890: cryptographically small if the UID is sufficiently long. The UID is

891: used to initialize the stream cipher together with the secret key,

892: which ensures that different videos are encrypted with different

893: keystreams. Thus, when an attacker successfully gets the keystream

894: used for $n$ known/chosen videos, he cannot use the broken

895: keystreams to break other different videos. Of course, the employed

896: stream cipher should be secure against plaintext attacks in the

897: sense that the secret key cannot be derived from a known/chosen

898: segment of the long keystream that encrypts the whole video stream

899: \cite{Schneier:AppliedCryptography96}.
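One possible way to combine the UID with the secret key is sketched below in Python. The UID layout (vendor ID, separator, time stamp) and the use of HMAC-SHA256 to derive a per-video seed for the stream cipher are assumptions made for this illustration; the text above only requires that the UID and the key together initialize the cipher.

```python
import hashlib
import hmac


def make_uid(vendor_id: str, timestamp: str) -> bytes:
    """Simplest UID form: vendor ID plus the time stamp of the video."""
    return f"{vendor_id}|{timestamp}".encode("utf-8")


def per_video_seed(secret_key: bytes, uid: bytes) -> bytes:
    """Derive a 32-byte seed; distinct UIDs give distinct keystreams in practice."""
    return hmac.new(secret_key, uid, hashlib.sha256).digest()


seed_a = per_video_seed(b"secret", make_uid("vendor-01", "2006-01-01T00:00:00"))
seed_b = per_video_seed(b"secret", make_uid("vendor-01", "2006-01-02T00:00:00"))
```

The UID itself is not secret and can travel in the user\_data area; only the key must be protected.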

900:

901: \subsection{Implementation issues}

902:

903: Since PVEA is a generalization of VEA, it is obvious that fast

904: encryption speed can be easily achieved, as shown in

905: \cite{Shi:MPEGEncryption:MMTA2004}. In addition, by carefully

906: optimizing the implementation, the encryption speed can be further

907: increased. We give two examples to show how to optimize the

908: implementation of PVEA so as to increase the encryption speed.

909:

910: A typical way to realize the probabilistic quality control with a

911: decimal factor $p$ is as follows: generate a pseudo-random

912: decimal, $r\in[0,1]$, for each data element with a

913: uniformly-distributed PRNG, and then encrypt the current element

914: only when $r\leq p$. The above implementation can be modified as

915: follows to further increase the encryption speed:

916: \begin{enumerate}

917: \item pseudo-randomly select $N_p=\mathrm{round}(N\cdot p)$

918: integers from the set $\{0,\cdots,N-1\}$;

919:

920: \item create a binary array $SE[0]\sim SE[N-1]$: $SE[i]=1$ if the

921: integer $i$ is selected; otherwise, $SE[i]=0$;

922:

923: \item encrypt the $i$-th FLC data element only when $SE[i\bmod

924: N]=1$.

925: \end{enumerate}

926: In this modified implementation, only a modular addition and a

927: look-up-table operation are needed to determine whether the current

928: data element should be encrypted. As a comparison, in the typical

929: implementation, one run of the PRNG is needed for each data element,

930: which is generally much slower. Although $N$ bits of extra memory are

931: needed to store the array in the modified implementation, this is

932: merely a trivial problem since a video codec generally requires much

933: more memory. To ensure the security against deblocking attacks, in

934: the modified implementation the value of $N$ should not be too

935: small\footnote{In most cases, it is enough to set $N\geq 300$.}.
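The table-based selection can be sketched in Python as follows. The function names are hypothetical, and \texttt{random.Random} is only a stand-in for the keyed PRNG that a real implementation would need so that the decoder can rebuild the same table.

```python
import random


def build_selection_table(n: int, p: float, rng: random.Random):
    """Binary array SE[0..n-1] with exactly round(n * p) entries set to 1."""
    n_p = round(n * p)
    table = [0] * n
    for i in rng.sample(range(n), n_p):  # n_p distinct indices from 0..n-1
        table[i] = 1
    return table


def should_encrypt(table, i: int) -> bool:
    """Decide for the i-th FLC data element with one reduction and one look-up."""
    return table[i % len(table)] == 1


table = build_selection_table(300, 0.25, random.Random(2024))
```

The PRNG runs only while the table is built; after that, each decision costs a single index computation, at the price of repeating the same selection pattern every $N$ elements.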

936:

937: To further reduce the computational load of PVEA, another way is to

938: selectively encrypt partial FLC data elements. Two possible options

939: are as follows: 1) encrypt only intra blocks; 2) encrypt only sign

940: bits (or a small number of the most significant bits) of intra DC

941: coefficients, ESCAPE DCT coefficients, and residuals of motion

942: vectors. The above two options can also be combined together. This

943: will have very little effect on the encryption performance, since an

944: attacker can only recover video frames with a poor visual quality

945: from other unencrypted data elements

946: \cite{Agi&Gong:StudySecureMPEG:NDSS96,

947: Zeng:VideoScrambling:IEEETCASVT2002}.

948:

949: \section{Encryption Performance of PVEA}

950: \label{section:Experiments}

951:

952: Some experiments have been conducted to test the real encryption

953: performance of PVEA for a widely-used MPEG-1 test video,

954: ``Carphone". The encryption results of the 1st frame (I-type) are

955: shown in Fig.~\ref{figure:Carphone}, with different values of the

956: two control factors $p_{sr}$ and $p_{sd}$. It can be seen that the

957: degradation of the visual quality is effectively controlled by the

958: two factors. The encryption results of the third control factor

959: $p_{mv}$ are given in Fig.~\ref{figure:Carphone2}, where the 313th

960: frame (B-type) is selected for demonstration. It can be seen that

961: encrypting only the motion vectors will not cause much degradation

962: in the visual quality.

963:

964: \begin{figure}

965: \centering

966: \begin{minipage}{\figwidth}

967: \centering

968: \includegraphics[width=\textwidth]{Fig2a}

969: a)

970: \end{minipage}

971: \begin{minipage}{\figwidth}

972: \centering

973: \includegraphics[width=\textwidth]{Fig2b}

974:  b)

975: \end{minipage}

976: \begin{minipage}{\figwidth}

977: \centering

978: \includegraphics[width=\textwidth]{Fig2c}

979: c)

980: \end{minipage}

981: \begin{minipage}{\figwidth}

982: \centering

983: \includegraphics[width=\textwidth]{Fig2d}

984: d)

985: \end{minipage}

986: \begin{minipage}{\figwidth}

987: \centering

988: \includegraphics[width=\textwidth]{Fig2e}

989: e)

990: \end{minipage}

991: \begin{minipage}{\figwidth}

992: \centering

993: \includegraphics[width=\textwidth]{Fig2f}

994: f)

995: \end{minipage}

996: \begin{minipage}{\figwidth}

997: \centering

998: \includegraphics[width=\textwidth]{Fig2g}

999: g)

1000: \end{minipage}

1001: \begin{minipage}{\figwidth}

1002: \centering

1003: \includegraphics[width=\textwidth]{Fig2h}

1004: h)

1005: \end{minipage}

1006: \begin{minipage}{\figwidth}

1007: \centering

1008: \includegraphics[width=\textwidth]{Fig2i}

1009: i)

1010: \end{minipage}

1011: \caption{The encryption results of the 1st frame in ``Carphone":

1012: a) $(p_{sr},p_{sd})=(0,0)$ -- the plain frame; b)

1013: $(p_{sr},p_{sd})=(0,0.2)$; c) $(p_{sr},p_{sd})=(0,1)$; d)

1014: $(p_{sr},p_{sd})=(0.2,0)$; e) $(p_{sr},p_{sd})=(0.2,0.2)$; f)

1015: $(p_{sr},p_{sd})=(0.5,0.5)$; g) $(p_{sr},p_{sd})=(1,0)$; h)

1016: $(p_{sr},p_{sd})=(1,0.2)$; i)

1017: $(p_{sr},p_{sd})=(1,1)$.}\label{figure:Carphone}

1018: \end{figure}

1019:

1020: \begin{figure}

1021: \centering

1022: \begin{minipage}{\figwidth}

1023: \centering

1024: \includegraphics[width=\textwidth]{Fig3a}

1025: a)

1026: \end{minipage}

1027: \begin{minipage}{\figwidth}

1028: \centering

1029: \includegraphics[width=\textwidth]{Fig3b}

1030: b)

1031: \end{minipage}

1032: \begin{minipage}{\figwidth}

1033: \centering

1034: \includegraphics[width=\textwidth]{Fig3c}

1035: c)

1036: \end{minipage}

1037: \caption{The encryption results of the 313th frame in ``Carphone":

1038: a) $(p_{sr},p_{sd},p_{mv})=(0,0,0)$ -- the plain frame; b)

1039: $(p_{sr},p_{sd},p_{mv})=(0,0,0.5)$; c)

1040: $(p_{sr},p_{sd},p_{mv})=(0,0,1)$.}\label{figure:Carphone2}

1041: \end{figure}

1042:

1043: Our experiments have also shown that PVEA is secure against

1044: error-concealment-based attacks (ECA). For two encrypted frames shown in

1045: Fig.~\ref{figure:Carphone}, the recovered images after applying ECA

1046: are shown in Fig.~\ref{figure:Carphone3}. In

1047: Fig.~\ref{figure:Carphone3}a, the sign bits of all AC coefficients

1048: are set to be zeros, and in Fig.~\ref{figure:Carphone3}b all DC

1049: coefficients are also set to be zeros. It can be seen that the

1050: visual quality of the recovered images via such an attack is even

1051: worse than the quality of the cipher-images, which means that ECA

1052: cannot help an attacker get more visual information. Actually, the

1053: security of PVEA against ECA depends on the fact that an attacker

1054: cannot tell encrypted data elements from un-encrypted ones without

1055: breaking the key. As a result, he has to set all possible data

1056: elements to be fixed values, which is equivalent to perceptual

1057: encryption with the control factor 1, i.e., the strongest level of

1058: perceptual encryption.

1059:

1060: \begin{figure}

1061: \centering

1062: \begin{minipage}{\figwidth}

1063: \centering

1064: \includegraphics[width=\textwidth]{Fig4a}

1065: a)

1066: \end{minipage}

1067: \begin{minipage}{\figwidth}

1068: \centering

1069: \includegraphics[width=\textwidth]{Fig4b}

1070: b)

1071: \end{minipage}

1072: \caption{The recovered results after applying ECA for the 1st

1073: frame in ``Carphone": a) breaking Fig.~\ref{figure:Carphone}c; b)

1074: breaking Fig.~\ref{figure:Carphone}i.}\label{figure:Carphone3}

1075: \end{figure}

1076:

1077: Finally, it is worth mentioning that PVEA has a minor disadvantage

1078: that the degradation in the visual quality is dependent on the

1079: amplitudes of the intra DC coefficients. As an extreme example,

1080: consider an intra picture whose DC coefficients are all zeros, which

1081: means that the FLC-encoded differential value of each intra DC

1082: coefficient does not occur in the bitstream, i.e., only the

1083: VLC-encoded dct\_dc\_size=0 occurs. In this case, the control of the

1084: rough visual quality by $p_{sr}$ completely disappears. Similarly,

1085: when dct\_dc\_size=1, the encryption can only change the

1086: differential value from $\pm 1$ to $\mp 1$, so the degradation will

1087: not be very significant. As a result, this problem will cause the

1088: perceptibility of some encrypted videos to become ``partially

1089: perceptible" when $p_{sr}=1$ (it should be ``almost imperceptible" for

1090: most videos). For an MPEG-1 video\footnote{Source of this test

1091: video:

1092: \url{http://www5.in.tum.de/forschung/visualisierung/duenne_gitter/DG_4.mpg}.}

1093: with a dark background (i.e., with many intra DC coefficients of

1094: small amplitudes), the encryption results are shown in

1095: Fig.~\ref{figure:Animation}. Fortunately, this problem is not so

1096: serious in practice, for the following reasons:

1097: \begin{itemize}

1098: \item most consumer videos contain

1099: sufficiently many intra DC coefficients of large amplitudes;

1100:

1101: \item even when there are many zero intra DC coefficients, the

1102: content of the video has to be represented by other intra DC

1103: coefficients of sufficiently large amplitudes;

1104:

1105: \item the differential encoding can increase the number of

1106: non-zero intra DC coefficients;

1107:

1108: \item the partial degradation caused by $p_{sr}$ and the

1109: degradation caused by $p_{sd}$ and $p_{mv}$ are enough for most

1110: applications of perceptual encryption (see Figs.

1111: \ref{figure:Carphone} and \ref{figure:Animation}).

1112: \end{itemize}

1113:

1114: From this minor disadvantage of PVEA, a natural result can be

1115: immediately derived: for the protection of MPEG videos that are

1116: highly confidential, VLC data elements should also be encrypted. In

1117: fact, our additional experiments on various video encryption

1118: algorithms have shown that it might be impossible to effectively

1119: degrade the visual quality of the MPEG videos with dark background

1120: via format-compliant encryption, unless the compression ratio and

1121: the strict size-preservation feature are compromised. The relations

1122: among the encryption performance, the compression ratio, the

1123: size-preservation feature, and other features of video encryption

1124: algorithms, are actually much more complicated. These problems will

1125: be investigated in our future research.

1126:

1127: \setlength\figwidth{0.32\columnwidth}

1128: \begin{figure}

1129: \centering

1130: \begin{minipage}{\figwidth}

1131: \centering

1132: \includegraphics[width=\textwidth]{Fig5a}

1133: a)

1134: \end{minipage}

1135: \begin{minipage}{\figwidth}

1136: \centering

1137: \includegraphics[width=\textwidth]{Fig5b}

1138: b)

1139: \end{minipage}\\

1140: \begin{minipage}{\figwidth}

1141: \centering

1142: \includegraphics[width=\textwidth]{Fig5c}

1143: c)

1144: \end{minipage}

1145: \begin{minipage}{\figwidth}

1146: \centering

1147: \includegraphics[width=\textwidth]{Fig5d}

1148: d)

1149: \end{minipage}

1150: \caption{The encryption results of the 169th frame in an MPEG-1

1151: video: a) $(p_{sr},p_{sd},p_{mv})=(0,0,0)$ -- the plain frame; b)

1152: $(p_{sr},p_{sd},p_{mv})=(1,0,0)$; c)

1153: $(p_{sr},p_{sd},p_{mv})=(1,1,0)$; d)

1154: $(p_{sr},p_{sd},p_{mv})=(1,1,1)$.}\label{figure:Animation}

1155: \end{figure}

1156:

1157: \section{Conclusion}

1158:

1159: This paper focuses on the problem of how to realize perceptual

1160: encryption of MPEG videos. Based on a comprehensive survey on

1161: related work and performance analysis of some existing perceptual

1162: video encryption schemes, we have proposed a new design with more

1163: useful features, such as on-the-fly encryption and multi-dimensional

1164: perceptibility. We have also discussed its security against

1165: deblocking attacks and pointed out some measures against

1166: known/chosen-plaintext attacks. The proposed perceptual encryption

1167: scheme can also be extended to realize non-perceptual encryption by

1168: simply adding a VLC-encryption part.

1169:

1170: \bibliographystyle{IEEEtran}

1171: \bibliography{PVEA}

1172:

1173: \end{document}

1174: