0601:cs0601080/papi.tex

1:

2:

3: %----------------------------------------------------------------

4: %%%%%%%%%%%%%%%%%%%%5Check-

5:

6: % check whether to use pseudo-additivity or nonextensive additivity

7:

8:

9:

10: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

11: %    INSTITUTE OF PHYSICS PUBLISHING                                   %

12: %                                                                      %

13: %   `Preparing an article for publication in an Institute of Physics   %

14: %    Publishing journal using LaTeX'                                   %

15: %                                                                      %

16: %    LaTeX source code `ioplau2e.tex' used to generate `author         %

17: %    guidelines', the documentation explaining and demonstrating use   %

18: %    of the Institute of Physics Publishing LaTeX preprint files       %

19: %    `iopart.cls, iopart12.clo and iopart10.clo'.                      %

20: %                                                                      %

21: %    `ioplau2e.tex' itself uses LaTeX with `iopart.cls'                %

22: %                                                                      %

23: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

24: %

25: %

26: % First we have a character check

27: %

28: % ! exclamation mark    " double quote

29: % # hash                ` opening quote (grave)

30: % & ampersand           ' closing quote (acute)

31: % $ dollar              % percent

32: % ( open parenthesis    ) close paren.

33: % - hyphen              = equals sign

34: % | vertical bar        ~ tilde

35: % @ at sign             _ underscore

36: % { open curly brace    } close curly

37: % [ open square         ] close square bracket

38: % + plus sign           ; semi-colon

39: % * asterisk            : colon

40: % < open angle bracket  > close angle

41: % , comma               . full stop

42: % ? question mark       / forward slash

43: % \ backslash           ^ circumflex

44: %

45: % ABCDEFGHIJKLMNOPQRSTUVWXYZ

46: % abcdefghijklmnopqrstuvwxyz

47: % 1234567890

48: %

49: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

50: %

51: \documentclass[12pt]{iopart}

52: \newcommand{\gguide}{{\it Preparing graphics for IOP journals}}

53:

54: %==============================

55: %Mine

56:

57: \usepackage{amssymb}

58: \usepackage{amsthm}

59:

60: %------------------theorm env---------------

61:  \newtheorem{theorem}{Theorem}[section]

62:         \newtheorem{lemma}[theorem]{Lemma}

63:         \newtheorem{proposition}[theorem]{Proposition}

64:         \newtheorem{corollary}[theorem]{Corollary}

65:      \newtheorem{definition}[theorem]{Definition}

66:         \newtheorem{remark}[theorem]{Remark}

67: % \def\QED{\mbox{\rule[0pt]{1.5ex}{1.5ex}}}

68: % \def\proof{\noindent\hspace{2em}{\it Proof: }}

69: % \def\endproof{\hspace*{\fill}~\QED\par\endtrivlist\unskip}

70: %--------------------------------------------

71: \newcommand{\ud}{\mathrm{d}}

72: %=====================================

73:

74: %Uncomment next line if AMS fonts required

75: %\usepackage{iopams}

76: \begin{document}

77:

78: \title[]{On Measure Theoretic definitions of Generalized Information

79:   Measures and Maximum Entropy Prescriptions}

80:

81: \author{Ambedkar Dukkipati, M Narasimha Murty\footnote{Corresponding author} and

82: Shalabh Bhatnagar}

83:

84: \address{Department of Computer Science and Automation,

85: Indian Institute of Science, Bangalore-560012, India.}

86: \ead{\mailto{ambedkar@csa.iisc.ernet.in},

87: \mailto{mnm@csa.iisc.ernet.in}, \mailto{shalabh@csa.iisc.ernet.in}}

88:

89:

90: %----------------------------------------

91: \begin{abstract}

92:  	Though Shannon entropy of a probability measure $P$, defined

93:         as $- \int_{X} \frac{\ud P}{\ud \mu} \ln \frac{\ud P}{\ud

94:         \mu} \, \ud \mu$ on a measure space $(X, \mathfrak{M},\mu)$, does not

95:         qualify itself as an information measure (it is not a natural

96:         extension of the discrete case), maximum entropy (ME)

97:         prescriptions in the measure-theoretic case are consistent with those of the

98:         discrete case.

99:         In this paper, we study the

100:         measure-theoretic definitions of generalized information

101:         measures and discuss the ME prescriptions. We present two

102:         results in this regard: (i) we prove that, as in

103:         the case of classical relative-entropy, the measure-theoretic

104:         definitions of generalized relative-entropies, R\'{e}nyi and

105:         Tsallis, are natural extensions of their respective discrete

106:         cases; (ii) we show that the ME prescriptions of

107:         measure-theoretic Tsallis entropy are consistent with the

108:         discrete case.

109: \end{abstract}

110:

111: %Uncomment for PACS numbers title message

112: \pacs{}

113: % Keywords required only for MST, PB, PMB, PM, JOA, JOB?

114: %\vspace{2pc}

115: %\noindent{\it Keywords}: Article preparation, IOP journals

116: % Uncomment for Submitted to journal title message

117: %\submitto{\JPA}

118: % Comment out if separate title page not required

119: \maketitle

120:

121: %=========================Introduction===========================

122: \section{Introduction}

123: \label{Section:Introduction}

124:         Shannon measure of information was developed

125:         essentially for the case when the random variable takes a

126:         finite number of values. However, in the literature, one often

127:         encounters an extension of Shannon entropy in the discrete

128:         case to the case

129:         of a one-dimensional random variable with density function $p$

130:         in the form (e.g.,~\cite{ShannonWeawer:1949:TheMathematicalTheoryOfCommunication,Ash:1965:InformationTheory})

131:         \begin{displaymath}

132:           S(p) = - \int_{- \infty}^{+ \infty} p(x) \ln p(x)\, \ud x \enspace.

133:         \end{displaymath}

134:         This entropy in the continuous case

135:         as a pure-mathematical formula (assuming convergence of

136:         the integral and absolute continuity of the density $p$ with

137:         respect to Lebesgue measure) resembles Shannon entropy in the

138:         discrete case, but cannot be used as a measure of

139:         information. First, it is not a natural extension of Shannon

140:         entropy in the discrete case, since it is not the limit of the sequence of

141:         finite discrete entropies corresponding to pmfs that

142:         approximate the pdf $p$. Second, it can take negative values.

143:

144:         In spite of these shortcomings, one can still use the

145:         continuous entropy functional in conjunction with the principle of maximum

146:         entropy where one wants to find a probability density function

147:         that has greater uncertainty than any other distribution

148:         satisfying a set of given constraints. Thus, in this use of

149:         continuous measure one is interested in it as a measure of

150:         relative uncertainty, and not of absolute uncertainty. This

151:         is where one can relate maximization of Shannon entropy to the

152:         minimization of Kullback-Leibler relative-entropy

153:         (see~\cite[pp. 55]{KapurKesavan:1997:EntropyOptimizationPrinciples}).

154: %        It

155: %        is well known that the continuous version of

156: %        KL-entropy defined for two probability density functions $p$

157: %        and $r$ as,

158: %        \begin{displaymath}

159: %         I(p\|r) = \int_{- \infty}^{+ \infty} p(x) \ln

160: %        \frac{p(x)}{r(x)} \, \ud x \enspace,

161: %        \end{displaymath}

162: %        is indeed a natural generalization of same in the discrete

163: %        case.

164:

165:         Indeed, during the early stages of development of

166:         information theory, the important paper

167:         by Gelfand, Kolmogorov and Yaglom~\cite{GelfandKolmogorovYaglom:1956:OnTheGeneralDefinitionOfTheAmountOfInformation}

168:         called attention to the case of defining entropy functional on

169:         an arbitrary measure space $(X, \mathfrak{M},\mu)$.

170: 	In this respect, Shannon entropy of a probability density function $p:X

171:         \rightarrow {\mathbb{R}}^{+}$ can be written as,

172:         \begin{displaymath}

173:           S(p) = - \int_{X} p(x) \ln p(x) \, \ud \mu \enspace.

174:         \end{displaymath}

175:         One can see from the above definition that the concept of

176:         ``the entropy of a pdf'' is a misnomer: there

177:         is always another measure $\mu$ in  the background. In the

178:         discrete case considered by Shannon, $\mu$ is the cardinality

179:         measure\footnote{Counting or cardinality measure $\mu$ on a

180:           measurable space $(X,\mathfrak{M})$, when $X$ is a

181:           finite set and $\mathfrak{M} = 2^{X}$, is defined as $\mu(E)

182:           = \# E$, $\forall E \in \mathfrak{M}$.}~\cite[pp. 19]{ShannonWeawer:1949:TheMathematicalTheoryOfCommunication};

183:         in the continuous case considered by both Shannon and Wiener,

184:         $\mu$ is the Lebesgue

185:         measure (cf.~\cite[pp. 54]{ShannonWeawer:1949:TheMathematicalTheoryOfCommunication}

186:         and

187:         \cite[pp. 61, 62]{Wiener:1948:Cybernetics}).

188:          All entropies are

189:         defined with respect to some measure

190:         $\mu$,

191:         as Shannon and Wiener both emphasized in~\cite[pp.57,

192:         58]{ShannonWeawer:1949:TheMathematicalTheoryOfCommunication}

193:         and~\cite[pp.61, 62]{Wiener:1948:Cybernetics} respectively.

194:

195:         The measure-theoretic case was studied independently

196:         by Kallianpur~\cite{Kallianpur:1960:OnTheAmountOfInformationContainedInASingmaField}

197:         and Pinsker~\cite{Pinsker:1960:InformationAndInformationStability},

198:         and perhaps by others, guided by the earlier work

199:         of Kullback~\cite{KullbackLeibler:1951:OnInformationAndSufficiency},

200:         where entropy is defined in terms of the Kullback-Leibler

201:         relative entropy. Unlike Shannon entropy, the measure-theoretic

202:         definition of KL-entropy is a natural extension of its definition

203:         in the discrete case.

204:

205: 	%In this respect,

206:         %the Gelfand-Yaglom-Perez  theorem

207:         %(GYP-theorem)~\cite{GelfandYaglom:1959:CalculationOfTheAmountOfInformation_Etc,Perez:1959:InformationTheoryWithAbstractAlphabets,Dobrushin:1959:GeneralFormulationsOfShannonsbasicTheorems}

208:         %plays an important role, which equips measure-theoretic

209:         %KL-entropy with a fundamental definition. The main

210:         %contribution of this chapter is to prove GYP-theorem for

211: 	%R\'{e}nyi relative-entropy of order $\alpha >1$, which can be

212:         %extended to Tsallis relative-entropy.

213:

214: 	%Before proving GYP-theorem for R\'{e}nyi relative-entropy,

215: 	In this paper we present the measure-theoretic definitions of

216: 	generalized information measures and show that as in

217:         the case of KL-entropy, the measure-theoretic

218:         definitions of generalized relative-entropies, R\'{e}nyi and

219:         Tsallis, are natural extensions of their respective discrete

220:         cases. We discuss the ME prescriptions for generalized

221: 	entropies and show that ME prescriptions of

222:         measure-theoretic Tsallis entropy are consistent with the

223:         discrete case, as is also true of measure-theoretic

224:         Shannon entropy.

225:

226: 	Rigorous studies of the Shannon and KL entropy functionals in

227: 	measure spaces can be found in the papers by

228:         Ochs~\cite{Ochs:1976:BasicPropertiesOfTheGeneralizedBoltzmann-Gibbs-ShannonEntropy}

229:         and by

230:         Masani~\cite{Masani:1992:TheMeasureTheoreticAspectsOfEntropy_Part_1,Masani:1992:TheMeasureTheoreticAspectsOfEntropy_Part_2}.

231:         Basic measure-theoretic aspects of classical information measures can be

232:         found

233:         in~\cite{Pinsker:1960:InformationAndInformationStability,Guiasu:1977:InformationTheoryWithApplications,Gray:1990:EntropyAndInformationTheory}.

234: %        in~\cite[Chapter~2]{Guiasu:1977:InformationTheoryWithApplications}

235: %        and~\cite[Chapter~5]{Gray:1990:EntropyAndInformationTheory}.

236:

237:         We review the measure-theoretic formalisms for classical

238:         information measures in

239:         \S~\ref{Section:ME:MeasureTheoreticDefinitionsOfInformationMeasures}

240:         and extend these definitions to generalized

241:         information measures in

242:         \S~\ref{Section:ME:MeasureTheoreticDefinitionsOfGeneralizedInformationMeasures}. In

243:         \S~\ref{Section:ME:MaximumEntropyAndCanonicalDistributions} we

244:         present the ME prescription for Shannon entropy followed by

245:         prescriptions for

246:         Tsallis entropy in

247:         \S~\ref{Section:ME:ME-prescriptionForTsallisEntropy}. We

248:         revisit measure-theoretic definitions of generalized entropic

249:         functionals in

250:         \S~\ref{Section:ME:MeasureTheoreticDefinitions_Revisited} and

251:         present some results.

252:

253: %================================Section:==========================================

254: \section{Measure-Theoretic definitions of Classical Information Measures}

255: \label{Section:ME:MeasureTheoreticDefinitionsOfInformationMeasures}

256: %	Information measures like entropy, mutual information,

257: %	conditional entropy, and conditional mutual information

258: %	etc., can be expressed in terms of KL-entropy and hence

259: %	the measure-theoretic analogs of these measures will follow

260: %	from the measure-theoretic definition of KL-entropy.

261: %	In this section, we study the measure-theoretic

262: %	definitions of KL-entropy and its relation to entropy in this

263: %	case.

264:   %-----------------------------SubSection-------------------------------------

265:   \subsection{Discrete to Continuous}

266:    \label{SubSection:ME:DiscreteToContinuous}

267:         \noindent

268:         Let $p:[a,b] \rightarrow {\mathbb{R}}^{+}$ be a probability

269:         density function,  where $[a,b] \subset \mathbb{R}$. That is,

270:         $p$ satisfies

271:         \begin{displaymath}

272:         p(x) \geq 0, \:\:\: \forall x \in [a,b] \:\:\: \mathrm{and}\:\:\:

273:         \int_{a}^{b} p(x) \, \ud x =1 \enspace.

274:         \end{displaymath}

275:         In trying to define entropy in the continuous case, the

276:         expression of Shannon entropy was automatically extended by

277:         replacing the sum in the

278:         discrete Shannon entropy by the

279:         corresponding integral. We obtain, in this way, Boltzmann's

280:         H-function (also known

281:   	as differential entropy in information theory),

282: %	~\cite{Grad:1965:OnBoltzmannsH-Theorem}: reference for

283: %        Boltzmann-H function

284:         \begin{equation}

285:         \label{Equation:ME:ContinuousEntropy}

286:         S(p) = - \int_{a}^{b} p(x) \ln p(x) \, \ud x \enspace.

287:         \end{equation}

288:         But the ``continuous entropy'' given

289:         by~(\ref{Equation:ME:ContinuousEntropy}) is not a natural

290:         extension of the definition in the discrete case, in the sense that it

291:         is not the limit of

292:         the finite discrete entropies corresponding to a sequence of

293:         finer partitions of the interval $[a,b]$ whose norms tend to

294:         zero. We can show this by a counterexample.

295:         Consider a uniform probability distribution

296:         on the interval $[a,b]$, having the probability density

297:         function

298:         \begin{displaymath}

299:         p(x) = \frac{1}{b-a}\enspace, \:\:\:\:\: x \in [a,b] \enspace.

300:         \end{displaymath}

301:         The continuous

302:         entropy~(\ref{Equation:ME:ContinuousEntropy}), in this case will be

303:         \begin{displaymath}

304:         S(p) = \ln (b - a) \enspace.

305:         \end{displaymath}

306:         On the other hand, let us consider a finite partition of the interval

307:         $[a,b]$ which is composed of $n$ equal subintervals, and let

308:         us attach to this partition the finite discrete uniform

309:         probability distribution whose corresponding entropy will be,

310:         of course,

311:         \begin{displaymath}

312:         S_{n}(p) = \ln n \enspace.

313:         \end{displaymath}

314:         Obviously, if $n$ tends to infinity, the discrete entropy

315:         $S_{n}(p)$ will tend to infinity too, and not to $\ln (b-a)$;

316:         therefore $S(p)$ is not the limit of $S_{n}(p)$, when $n$ tends

317:         to infinity. Further, one can observe that $\ln (b-a)$ is negative

318:         when~$b-a <1$.
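
        The same divergence can be made explicit for a general density.
        The following is only a sketch, under the additional assumptions
        (not made above) that $p$ is continuous on $[a,b]$ and that
        $p \ln p$ is Riemann integrable. Write $\Delta = (b-a)/n$ and,
        by the mean value theorem, choose a point $x_{i}$ in the $i$th
        subinterval such that the probability of that subinterval is
        $P_{i} = p(x_{i}) \Delta$. Denoting by $S_{n}(p)$ the entropy of
        the resulting pmf, we get
        \begin{displaymath}
        S_{n}(p) = - \sum_{i=1}^{n} P_{i} \ln P_{i}
                 = - \sum_{i=1}^{n} p(x_{i}) \ln p(x_{i}) \, \Delta
                   - \ln \Delta \enspace,
        \end{displaymath}
        where we have used $\sum_{i=1}^{n} p(x_{i}) \Delta = 1$. As $n
        \rightarrow \infty$, the first term tends to $S(p)$, whereas $-
        \ln \Delta = \ln n - \ln (b-a)$ diverges. Hence
        \begin{displaymath}
        \lim_{n \rightarrow \infty} \left( S_{n}(p) - \ln n \right)
        = S(p) - \ln (b-a) \enspace,
        \end{displaymath}
        which agrees with the uniform example, where both sides are zero.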

319:

320:         Thus, strictly speaking,

321:         continuous entropy~(\ref{Equation:ME:ContinuousEntropy}) cannot

322:         represent a measure of uncertainty, since uncertainty should

323:         in general be positive.

324:         We are able to prove the ``nice'' properties only for the

325:         discrete entropy; therefore, it

326:         qualifies as a ``good'' measure of information (or

327:         uncertainty) supplied by a random experiment. Since the ``continuous

328:         entropy'' is not the limit of the discrete

329:         entropies, we cannot extend the so-called nice properties to

330:         it.

331:

332:         Also, in physical applications, the coordinate $x$ in

333:         (\ref{Equation:ME:ContinuousEntropy}) represents an abscissa,

334:         a distance from a fixed reference point. This distance $x$ has

335:         the dimensions of length. Now, with the density function

336:         $p(x)$, one specifies the probability of an event $[c,d)

337:         \subset [a,b]$ as $\int_{c}^{d} p(x) \, \ud x$; hence one has to

338:         assign to $p(x)$ the dimensions ${(\mbox{length})}^{-1}$, since

339:         probabilities are dimensionless. Now, because for $0 \leq z < 1$ one

340:         has the series expansion

341:         \begin{equation}

342:           - \ln (1-z) = z + \frac{1}{2}z^{2} + \frac{1}{3}z^{3}+

343:           \ldots \enspace,

344:         \end{equation}

345:         it is necessary that the argument of the logarithm function

346:         in~(\ref{Equation:ME:ContinuousEntropy}) be

347:         dimensionless.

348: 	Hence the formula (\ref{Equation:ME:ContinuousEntropy}) is

349:         seen to be dimensionally incorrect, since the argument of

350:         the logarithm on its right hand side has the dimensions of a

351:         probability

352:         density~\cite{Smith:2001:SomeObservationsOnTheConceptsOfInformationTheoreticEntropy}.

353:         Although

354:         Shannon~\cite{Shannon:1948:MathematicalTheoryOfCommunication_BellLabs}

355:         used the formula (\ref{Equation:ME:ContinuousEntropy}), he

356:         does note its lack of invariance with respect to changes in

357:         the coordinate system.
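
        This lack of invariance can be illustrated by a simple change
        of units. The example is hypothetical and the symbols $\lambda$
        and $q$ are introduced only for this illustration. Let $y =
        \lambda x$ with a constant $\lambda > 0$. The density of $y$ on
        $[\lambda a, \lambda b]$ is $q(y) = \lambda^{-1} p(y/\lambda)$,
        and substituting $y = \lambda x$
        in~(\ref{Equation:ME:ContinuousEntropy}) gives
        \begin{displaymath}
        S(q) = - \int_{\lambda a}^{\lambda b} q(y) \ln q(y) \, \ud y
             = - \int_{a}^{b} p(x) \ln \frac{p(x)}{\lambda} \, \ud x
             = S(p) + \ln \lambda \enspace.
        \end{displaymath}
        Thus the value of the continuous entropy depends on the unit in
        which $x$ is measured.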

358:

359:         In the context of the maximum entropy principle,

360:         Jaynes~\cite{Jaynes:1968:PriorProbabilities}

361:         addressed this problem and suggested the formula,

362:         \begin{equation}

363:         \label{Equation:ME:JaynesSuggestion}

364:           S'(p) = - \int_{a}^{b} p(x) \ln \frac{p(x)}{m(x)}\, \ud x \enspace,

365:         \end{equation}

366:         in the place of (\ref{Equation:ME:ContinuousEntropy}),

367:         where $m(x)$ is a prior function. Note that when $m(x)$ is a probability density

368:         function, (\ref{Equation:ME:JaynesSuggestion}) is nothing but

369:         the negative of the relative-entropy. However, if we choose $m(x) = c$, a constant

370:         (e.g.,~\cite{ZellnerHighfield:1988:CalculationOfMaximumEntropyDistributions}),

371:         we get

372:         \begin{displaymath}

373:           S'(p) = S(p) + \ln c \enspace,

374:         \end{displaymath}

375:         where $S(p)$ refers to the continuous

376:         entropy (\ref{Equation:ME:ContinuousEntropy}).

377:         Thus, maximization of $S'(p)$ is equivalent to maximization of

378:         $S(p)$.

379: 	Further discussion on estimation of probability

380:         density functions by ME-principle in the continuous case can be found in

381:         \cite{LazoRathie:1978:OnTheEntropyOfContinuousProbabilityDistributions,ZellnerHighfield:1988:CalculationOfMaximumEntropyDistributions,Ryu:1993:MaximumEntropyEstimationOfDensityAndRegressionFunction}.
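
        Returning to the
        formula~(\ref{Equation:ME:JaynesSuggestion}), one way to see
        what the prior function $m$ achieves is the following sketch.
        It assumes that $m$ transforms like a density under a change of
        variable; the symbols $\lambda$, $q$ and $\tilde{m}$ are
        introduced only for this illustration. Under $y = \lambda x$
        with $\lambda > 0$, we have $q(y) = \lambda^{-1} p(y/\lambda)$
        and $\tilde{m}(y) = \lambda^{-1} m(y/\lambda)$, so that
        $q(y)/\tilde{m}(y) = p(x)/m(x)$ and
        \begin{displaymath}
        S'(q) = - \int_{\lambda a}^{\lambda b} q(y) \ln
                \frac{q(y)}{\tilde{m}(y)} \, \ud y
              = - \int_{a}^{b} p(x) \ln \frac{p(x)}{m(x)} \, \ud x
              = S'(p) \enspace.
        \end{displaymath}
        In this case the argument of the logarithm is dimensionless and
        the value of $S'$ does not depend on the unit of $x$.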

382:

383:         Prior to that, Kullback~\cite{KullbackLeibler:1951:OnInformationAndSufficiency} too

384:         suggested that in the measure-theoretic definition of entropy,

385:         instead of examining the entropy

386:         corresponding to only one given measure, we have to compare the

387:         entropy inside a whole class of measures.

388:

389:   %-----------------------SubSection------------------------------------

390:   \subsection{Classical information measures}

391:   \label{SubSection:ME:ClassicalInformationMeasures}

392:

393:         \noindent

394:         Let $(X,\mathfrak{M},\mu)$ be a measure space. $\mu$

395:         need not be a probability measure unless otherwise specified.

396:         Symbols $P$, $R$ will denote probability measures on

397:         measurable space $(X,\mathfrak{M})$ and $p$, $r$

398:         denote $\mathfrak{M}$-measurable functions on $X$.

399:         An $\mathfrak{M}$-measurable function $p:X \rightarrow

400:         {\mathbb{R}}^{+}$ is said to be a probability

401:         density function (pdf) if $\int_{X} p \, \ud \mu = 1$.

402:

403:         In this general setting, Shannon entropy $S(p)$ of pdf $p$ is

404:         defined as follows~\cite{Athreya:1994:EntropyMaximization}.

405:         %DEFINITION: Shannon entropy for pdf

406:         \begin{definition}

407:         \label{Definition:ME:ShannonEntropy_Measuretheroetic_pdf}

408:         Let $(X,\mathfrak{M},\mu)$ be a measure space and

409:         let the $\mathfrak{M}$-measurable function $p:X \rightarrow

410:         {\mathbb{R}}^{+}$ be a pdf. The Shannon entropy of $p$

411:         is defined as

412:         \begin{equation}

413:          \label{Equation:ME:ShannonEntropyOf-pdf}

414:         S(p) = - \int_{X} p \ln p \, \ud \mu \enspace,

415:         \end{equation}

416:         provided the integral on the right exists.

417:         \end{definition}%EndDefinition

418:         Entropy functional $S(p)$ defined in (\ref{Equation:ME:ShannonEntropyOf-pdf}) can be

419:         referred to as entropy of the probability measure

420:         $P$, in the sense that the measure $P$ is induced by $p$,

421:         i.e.,

422:         \begin{equation}

423:         \label{Equation:ME:ProbabilityMeasureInducedByaPdf}

424:           P(E) = \int_{E} p(x) \, \ud \mu(x) \enspace, \:\:\:\:\:

425:           \forall E \in \mathfrak{M} \enspace.

426:         \end{equation}

427: 	This reference is consistent\footnote{Say

428:         $p$ and

429:         $r$ are two pdfs and $P$ and $R$ are corresponding

430:         induced measures on measurable space $(X,\mathfrak{M})$ such

431:         that $P$ and $R$ are identical, i.e., $\int_{E} p \,

432:         \ud \mu = \int_{E} r \, \ud \mu$, $\forall E \in \mathfrak{M}$. Then

433:         we have $p \stackrel{\mathrm{a.e}}{=} r$ and hence

434:         $ -\int_{X} p \ln p \, \ud \mu = -\int_{X} r \ln r \, \ud

435:         \mu$.} because the probability measure

436:         $P$ can be identified {\it a.e} by the pdf $p$.

437:

438:         Further, the definition of the probability measure $P$

439:         in (\ref{Equation:ME:ProbabilityMeasureInducedByaPdf}), allows us

440:         to write entropy functional

441:         (\ref{Equation:ME:ShannonEntropyOf-pdf})

442:         as,

443:         \begin{equation}

444:         \label{Equation:ME:ShannonEntropyOf-PM-inducedBy-pdf}

445:         S(p) = - \int_{X} \frac{\ud P}{\ud \mu} \ln \frac{\ud P}{\ud

446:         \mu} \, \ud \mu \enspace,

447:         \end{equation}

448:         since (\ref{Equation:ME:ProbabilityMeasureInducedByaPdf})

449:         implies\footnote{If a

450:         nonnegative measurable function $f$ induces a measure $\nu$ on

451:         measurable space $(X,\mathfrak{M})$ with respect to a measure

452:         $\mu$, defined as $\nu(E) = \int_{E} f \, \ud \mu, \:\:\: \forall E \in

453:         \mathfrak{M}$ then $\nu \ll \mu$. The converse is given by the

454:         Radon-Nikodym theorem~\cite[pp.36, Theorem

455:           1.40(b)]{Kantorovitz:2003:IntroductionToModernAnalysis}.} $P

456:         \ll \mu$, and pdf $p$ is the

457:         Radon-Nikodym derivative of $P$ w.r.t $\mu$.
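
        As a check on this form, consider the finite case; the notation
        $P_{k}$ is introduced only for this illustration. Let $X =
        \{x_{1}, \ldots, x_{n}\}$, $\mathfrak{M} = 2^{X}$, and let $\mu$
        be the counting measure. Every probability measure $P$ on
        $(X,\mathfrak{M})$ satisfies $P \ll \mu$, with $\frac{\ud P}{\ud
        \mu}(x_{k}) = P(\{x_{k}\}) = P_{k}$, and
        (\ref{Equation:ME:ShannonEntropyOf-PM-inducedBy-pdf}) reduces to
        \begin{displaymath}
        S(p) = - \sum_{k=1}^{n} P_{k} \ln P_{k} \enspace,
        \end{displaymath}
        which is the Shannon entropy of the discrete case (with the
        usual convention $0 \ln 0 = 0$).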

458:

459:         Now we proceed to the definition of Kullback-Leibler

460:         relative-entropy or KL-entropy for probability measures.

461:         %Definition:Kullback-Leibler Relative-Entropy1

462:         \begin{definition}

463:         \label{Definition:ME:RelativeEntropy_1}

464:         Let $(X,\mathfrak{M})$ be a measurable space. Let $P$ and $R$

465:         be two probability measures on $(X,\mathfrak{M})$. The Kullback-Leibler

466:         relative-entropy (KL-entropy) of $P$ relative to $R$ is

467:         defined as

468:         \begin{equation}

469:         \label{Equation:ME:RelativeEntropyOfProbabilityMeasures}

470:         I(P\|R) = \left\{ \begin{array}{ll}

471:         \displaystyle{\int_{X} \ln \frac{\ud P}{\ud R} \, \ud P }     &

472:         \:\:\:\:\:\textrm{if}\:\:\:\:\:  P \ll R \enspace, \\ \\

473:           +\infty   & \:\:\:\:\:\textrm{otherwise.}

474:            \end{array} \right.

475:         \end{equation}

476:         \end{definition}%EndDefinition:Kullback-Leiber Relative-Entropy1

477: 	The divergence inequality

478:         $I(P\|R) \geq 0$ and $I(P\|R) =0$ if and only if $P=R$ can be

479:         shown in this case too.

480:         KL-entropy~(\ref{Equation:ME:RelativeEntropyOfProbabilityMeasures})

481:         also can be written as

482:         \begin{equation}

483:         \label{Equation:ME:AnotherFormForRelativeEntropyOfProbabilityMeasures}

484:         I(P\|R) = \int_{X} \frac{\ud P}{\ud R} \ln \frac{\ud P}{\ud R}

485:         \, \ud R \enspace.

486:         \end{equation}
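
        The passage from
        (\ref{Equation:ME:RelativeEntropyOfProbabilityMeasures}) to
        (\ref{Equation:ME:AnotherFormForRelativeEntropyOfProbabilityMeasures})
        can be spelled out as follows, assuming $P \ll R$. For any
        $\mathfrak{M}$-measurable function $f$ for which the integrals
        exist, the Radon-Nikodym derivative satisfies $\int_{X} f \, \ud
        P = \int_{X} f \frac{\ud P}{\ud R} \, \ud R$. Taking $f = \ln
        \frac{\ud P}{\ud R}$ gives
        \begin{displaymath}
        I(P\|R) = \int_{X} \ln \frac{\ud P}{\ud R} \, \ud P
                = \int_{X} \frac{\ud P}{\ud R} \ln \frac{\ud P}{\ud R}
                  \, \ud R \enspace.
        \end{displaymath}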

487:

488:         Let $\mu$ be a $\sigma$-finite measure on $(X,\mathfrak{M})$

489:         such that $P \ll R \ll \mu$. Since $\mu$ is $\sigma$-finite, by the

490:         Radon-Nikodym theorem, there exist non-negative

491:         $\mathfrak{M}$-measurable functions $p: X \rightarrow

492:         \mathbb{R}^{+}$ and $r: X \rightarrow \mathbb{R}^{+}$, unique

493:         $\mu$-{\em a.e}, such that

494:         \begin{equation}

495: 	\label{Equation:ME:DefinitionOfPdf_p}

496:         P(E) = \int_{E} p \, \ud \mu \enspace, \:\:\: \forall E \in \mathfrak{M} \enspace,

497:         \end{equation}

498:         and

499:         \begin{equation}

500: 	\label{Equation:ME:DefinitionOfPdf_r}

501:         R(E) = \int_{E} r \, \ud \mu \enspace, \:\:\: \forall E \in

502:         \mathfrak{M} \enspace.

503:         \end{equation}

504:         The pdfs $p$ and $r$ in (\ref{Equation:ME:DefinitionOfPdf_p})

505:         and (\ref{Equation:ME:DefinitionOfPdf_r}) (they are indeed

506:         pdfs) are Radon-Nikodym

507:         derivatives of probability measures $P$ and $R$ with respect

508:         to $\mu$, respectively, i.e., $p =\frac{\ud P}{\ud \mu}$ and

509:         $r=\frac{\ud R}{\ud \mu}$.

510:         Now one can define relative-entropy of pdf $p$ w.r.t $r$ as

511:         follows\footnote{This follows from the chain rule for

512:         Radon-Nikodym derivative:

513:          \begin{displaymath}

514:            \frac{\ud P}{\ud R} \stackrel{\mathrm{a.e}}{=} \frac{\ud

515:              P}{\ud \mu} {\left( \frac{\ud R}{\ud \mu} \right)}^{-1}\enspace.

516:          \end{displaymath}

517:         }.

518:

519:        %Definition:KullbackLeibler Relative-Entropy2

520:         \begin{definition}

521:         \label{Definition:ME:RelativeEntropy_of_pdf}

522:         Let $(X,\mathfrak{M},\mu)$ be a measure space. Let

523:        $\mathfrak{M}$-measurable functions $p,r:X \rightarrow

524:         {\mathbb{R}}^{+}$ be two pdfs. The KL-entropy of $p$

525:        relative to $r$

526:         is defined as

527:         \begin{equation}

528:          \label{Equation:ME:RelativeEntropy_of_pdf}

529:         I(p\|r) = \int_{X} p(x) \ln \frac{p(x)}{r(x)} \, \ud \mu(x) \enspace,

530:         \end{equation}

531:         provided the integral on the right exists.

532:         \end{definition}%EndDefinition:KullbackLeibler Relative-Entropy2
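
        That Definition~\ref{Definition:ME:RelativeEntropy_of_pdf}
        agrees with Definition~\ref{Definition:ME:RelativeEntropy_1}
        can be seen from the following short derivation, which assumes
        $P \ll R \ll \mu$ with $\mu$ $\sigma$-finite, $p = \frac{\ud
        P}{\ud \mu}$ and $r = \frac{\ud R}{\ud \mu}$. Using the chain
        rule $\frac{\ud P}{\ud R} \stackrel{\mathrm{a.e}}{=} p/r$ and
        $\ud P = p \, \ud \mu$, we get
        \begin{displaymath}
        I(P\|R) = \int_{X} \ln \frac{\ud P}{\ud R} \, \ud P
                = \int_{X} p(x) \ln \frac{p(x)}{r(x)} \, \ud \mu(x)
                = I(p\|r) \enspace.
        \end{displaymath}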

533:

534:         As we have mentioned earlier, KL-entropy

535:         (\ref{Equation:ME:RelativeEntropy_of_pdf}) exists if the two

536:         densities are absolutely continuous with respect to one

537:         another. On the real line the same definition can be written

538:         as

539:         \begin{displaymath}

540:         I(p\|r) = \int_{\mathbb{R}} p(x) \ln \frac{p(x)}{r(x)} \, \ud x \enspace,

541:         \end{displaymath}

542:         which exists if the densities $p(x)$ and $r(x)$ share the same support.

543:         Here and in the sequel, we use the convention

544:         \begin{equation}

545:         \ln 0 = - \infty, \:\:\:\:\:\:\:\:\:\:\: \ln \frac{a}{0} = + \infty\:\:

546:         \textrm{for any}\:\: a \in \mathbb{R}, \:\:\:\:\:\:\:\:\:\:\:

547:         0 \cdot (\pm \infty) = 0 \enspace.

548:         \end{equation}

549:

550:         Now we turn to the definition of entropy functional on a

551:         measure space.

552:         The entropy functional in

553:         (\ref{Equation:ME:ShannonEntropyOf-PM-inducedBy-pdf}) is defined

554:         for a probability measure

555:         that is induced by a pdf. By the Radon-Nikodym theorem, one can

556:         define Shannon entropy for an arbitrary $\mu$-continuous probability measure as follows.

557:         %Definition: Shannon entropy of Probability measure

558:         \begin{definition}

559:          \label{Definition:ME:ShannonEntropy_of_ProbabiliyMeasure}

560:          Let $(X,\mathfrak{M},\mu)$ be a $\sigma$-finite measure

561:         space. Entropy of any $\mu$-continuous probability measure $P$

562:         ($P \ll \mu$) is defined as

563:         \begin{equation}

564:         \label{Equation:ME:ShannonEntropy_of_ProbabilityMeasure}

565:         S(P) = - \int_{X} \ln \frac{\ud P}{\ud \mu} \, \ud P  \enspace.

566:         \end{equation}

567:         \end{definition}

568:         Properties of the entropy of a probability measure in

569:         Definition~\ref{Definition:ME:ShannonEntropy_of_ProbabiliyMeasure} are

570:         studied in detail by

571:         Ochs~\cite{Ochs:1976:BasicPropertiesOfTheGeneralizedBoltzmann-Gibbs-ShannonEntropy}

572:         under the name generalized Boltzmann-Gibbs-Shannon

573:         Entropy. In the literature, one can find notation of the form

574:         $S(P|\mu)$ to represent the entropy functional in

575:         (\ref{Equation:ME:ShannonEntropy_of_ProbabilityMeasure}) viz., the

576:         entropy of a

577:         probability measure, to stress the role of the measure

578:         $\mu$ (e.g.,~\cite{Ochs:1976:BasicPropertiesOfTheGeneralizedBoltzmann-Gibbs-ShannonEntropy,Athreya:1994:EntropyMaximization}). Since

579:         all the information measures we define are with

580:         respect to the measure $\mu$ on $(X, \mathfrak{M})$, we omit

581:         $\mu$ in the entropy

582:         functional notation.

583:

        If $\mu$ in
        Definition~\ref{Definition:ME:ShannonEntropy_of_ProbabiliyMeasure}
        is itself a probability measure,
        Shannon entropy is related to Kullback-Leibler relative-entropy
        by

588:         \begin{equation}

589:         \label{Equation:ME:RelationBetweenMeasureTheoreticEntropyAndKullback}

590:         S(P) = - I(P\|\mu) \enspace.

591:         \end{equation}

592: 	Note that when $\mu$ is not a probability measure, the

593:         divergence inequality $I(P\|\mu) \geq 0$ need not be

594:         satisfied.
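
        As a simple illustration (the particular choice of $X$ and $\mu$
        below is an assumption made only for this example), let $X = [0,2]$,
        let $\mu$ be the Lebesgue measure on $X$, so that $\mu(X) = 2$, and
        let $P$ be the uniform probability measure on $X$. Then
        $\frac{\ud P}{\ud \mu} = \frac{1}{2}$ and
        \begin{displaymath}
        I(P\|\mu) = \int_{X} \ln \frac{\ud P}{\ud \mu} \, \ud P
        = \ln \frac{1}{2} < 0 \enspace,
        \end{displaymath}
        so the divergence inequality fails, while $S(P) = \ln 2 > 0$.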

595:

	We add a note on the
        $\sigma$-finiteness of the measure $\mu$. In the definition of the
        entropy functional we assumed that $\mu$ is a $\sigma$-finite
        measure. This condition was used by

600:         Ochs~\cite{Ochs:1976:BasicPropertiesOfTheGeneralizedBoltzmann-Gibbs-ShannonEntropy},

601:         Csisz\'{a}r~\cite{Csiszar:1969:OnGeneralizedEntropy}

602:         and

603:         Rosenblatt-Roth~\cite{Rosenblatt-Roth:1964:TheConceptOfEntropyInProbabilityTheory}

        to tailor the measure-theoretic definitions. For most
        applications this assumption is
        satisfied. (See
        \cite{Ochs:1976:BasicPropertiesOfTheGeneralizedBoltzmann-Gibbs-ShannonEntropy}
        for a discussion of the physical interpretation of a measurable space
        $(X,\mathfrak{M})$ with a $\sigma$-finite measure $\mu$ for
        entropy measures of the
        form~(\ref{Equation:ME:ShannonEntropy_of_ProbabilityMeasure}),
        and of the relaxation of the $\sigma$-finiteness
        condition.) More general
        definitions of entropy functionals, obtained by relaxing this condition, are studied
        by Masani~\cite{Masani:1992:TheMeasureTheoreticAspectsOfEntropy_Part_1,Masani:1992:TheMeasureTheoreticAspectsOfEntropy_Part_2}.

616:

617: %        In this thesis we will not go into those details.

618:

619:   %---------------------------------------------------

620:   \subsection{Interpretation of Discrete and Continuous Entropies in

621:   terms of KL-entropy}

622:   \label{SubSection:ME:MeasureTheoreticCasesinDiscrete}

623: 	\noindent

        First, let us consider the discrete case of $(X, \mathfrak{M},

625: 	\mu)$, where $X= \{x_{1}, \ldots, x_{n} \} $, $\mathfrak{M} =

626: 	2^{X}$ and $\mu$ is a cardinality probability measure. Let $P$

627: 	be any probability measure on $(X, \mathfrak{M})$. Then $\mu$

628:   	and $P$ can be specified as follows.

629:         \begin{displaymath}

630:         \mu \mbox{:} \:\:\: {\mu}_{k} = \mu(\{x_{k}\})  \geq 0, \:\:k = 1,

631:         \ldots, n, \:\:\:\sum_{k=1}^{n}

632:         \mu_{k} =1 \enspace, \:\:\: %\mbox{and}

633:         \end{displaymath}

634: 	and

635:         \begin{displaymath}

636:         P \mbox{:}\:\:\:  P_{k} = P(\{x_{k}\}) \geq 0 , \:\:k =1,

637:         \ldots, n, \:\:\: \sum_{k=1}^{n} P_{k} =1 \enspace.

638:         \end{displaymath}

639:         The probability measure $P$ is absolutely

640:         continuous with respect to the probability measure $\mu$ if

        $\mu_{k} =0$ implies $P_{k} =0$ for any $k=1,\ldots, n$. The

642:         corresponding Radon-Nikodym

643:         derivative of $P$ with respect to $\mu$ is given by

644:         \begin{displaymath}

645:                 \frac{\ud P}{\ud \mu}(x_{k}) = \frac{P_{k}}{\mu_{k}}, \,

646:                 k = 1, \ldots n \enspace.

647:         \end{displaymath}

648:         The measure-theoretic entropy $S(P)$

649:         (\ref{Equation:ME:ShannonEntropy_of_ProbabilityMeasure}),

650:         in this case, can be written as

651:         \begin{displaymath}

652:         S(P) = - \sum_{k=1}^{n} P_{k}\ln \frac{P_{k}}{\mu_{k}} =

653:         \sum_{k=1}^{n} P_{k} \ln \mu_{k} - \sum_{k=1}^{n} P_{k} \ln

654:         P_{k} \enspace.

655:         \end{displaymath}

        If we take the reference
        probability measure $\mu$ to be the uniform probability
        distribution on the set $X$, i.e., $\mu_{k} = \frac{1}{n}$, we obtain

659:         \begin{equation}

660:         \label{Equation:ME:RelativionBetweenMeasureTheoreticAndDiscreteEntropies}

661:         S(P) = S_{n}(P) - \ln n \enspace,

662:         \end{equation}

663: 	where $S_{n}(P)$ denotes the Shannon entropy of pmf $P =

664:         (P_{1}, \ldots, P_{n})$ and $S(P)$ denotes

665: 	the

666:         measure-theoretic entropy in the discrete case.
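
        As a numerical illustration (the values are chosen only for this
        example), take $n=2$ and $P = (\frac{1}{4}, \frac{3}{4})$. Then
        \begin{displaymath}
        S_{2}(P) = \frac{1}{4} \ln 4 + \frac{3}{4} \ln \frac{4}{3}
        = 2 \ln 2 - \frac{3}{4} \ln 3 \approx 0.562 \enspace,
        \end{displaymath}
        and, with the uniform reference measure $\mu = (\frac{1}{2},
        \frac{1}{2})$,
        \begin{displaymath}
        S(P) = S_{2}(P) - \ln 2 = \ln 2 - \frac{3}{4} \ln 3 \approx -0.131
        \enspace.
        \end{displaymath}
        The same value is obtained directly from $- \sum_{k} P_{k} \ln
        (P_{k}/\mu_{k})$. It is nonpositive, as it must be
        by~(\ref{Equation:ME:RelationBetweenMeasureTheoreticEntropyAndKullback})
        whenever $\mu$ is a probability measure.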

667:

	Now let us consider the continuous case of
	$(X,\mathfrak{M},\mu)$, where $X = [a,b] \subset \mathbb{R}$,
	$\mathfrak{M}$ is the $\sigma$-algebra of Lebesgue measurable subsets of $[a,b]$,
	and $\mu$ is the Lebesgue probability measure. In this case
	$\mu$ and $P$ can be specified as follows.

673:         \begin{displaymath}

674:         \mu \mbox{:}\:\:\: \mu(x) \geq 0 , x \in

675: 	[a,b], \ni \mu(E) = \int_{E} \mu(x) \, \ud x, \forall E \in

676: 	\mathfrak{M}, \: \int_{a}^{b} \mu(x)\, \ud x  =1 \enspace,

677:         \end{displaymath}

678: 	and

679:         \begin{displaymath}

680:         P \mbox{:}\:\:\:  P(x) \geq 0 , x \in

681: 	[a,b], \ni  P(E) = \int_{E} P(x) \, \ud x, \forall E \in \mathfrak{M}, \:\int_{a}^{b} P(x)\, \ud x =1 \enspace.

682:         \end{displaymath}

683: 	Note the abuse of notation in the above specification of

684: 	probability measures $\mu$ and $P$, where we have used the same

685: 	symbols for both measures and pdfs.

686:

687:

        The probability measure $P$ is absolutely continuous with
        respect to the probability measure $\mu$ if $\mu(x)=0$ on a
        set of positive Lebesgue measure implies
        that $P(x)=0$ almost everywhere on the same
        set. The Radon-Nikodym derivative of the probability measure

693:         $P$ with respect to the probability measure $\mu$ will be

694:         \begin{displaymath}

695:                 \frac{\ud P}{\ud \mu}(x) = \frac{P(x)}{\mu(x)} \enspace.

696:         \end{displaymath}

697:         Then the measure-theoretic entropy $S(P)$ in this case

698: 	can be written as

699:         \begin{displaymath}

700:         S(P) = - \int_{a}^{b} P(x) \ln \frac{P(x)}{\mu(x)} \, \ud x

701:         \enspace.

702:         \end{displaymath}

        If we take the reference probability measure $\mu$ to be the uniform
	distribution, i.e., $\mu(x) = \frac{1}{b-a}$, $x \in [a,b]$,
        then we obtain
        \begin{equation}
        \label{Equation:ME:RelativionBetweenMeasureTheoreticAndContinuousEntropies}
        S(P) = S_{[a,b]}(P) - \ln (b-a) \enspace,
        \end{equation}

710: 	where $S_{[a,b]}(P)$ denotes the Shannon entropy of pdf

711:         $P(x)$, $x \in [a,b]$ (\ref{Equation:ME:ContinuousEntropy})

712: 	and $S(P)$ denotes the measure-theoretic entropy in the

713: 	continuous case.
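
        As an illustration (the subinterval below is a hypothetical choice
        made only for this example), let $a < c \leq b$ and let $P$ have the
        pdf $P(x) = \frac{1}{c-a}$ for $x \in [a,c]$ and $P(x) = 0$
        otherwise. With the convention $0 \ln 0 = 0$ we get
        \begin{displaymath}
        S_{[a,b]}(P) = \ln (c-a) \enspace, \:\:\:
        S(P) = \ln (c-a) - \ln (b-a) = \ln \frac{c-a}{b-a} \leq 0 \enspace,
        \end{displaymath}
        with equality precisely when $c = b$, i.e., when $P$ coincides with
        the uniform reference measure $\mu$.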

714:

        Hence, one can conclude that the
        measure-theoretic entropy $S(P)$ defined for a probability measure $P$ on
        the measure space $(X,\mathfrak{M},\mu)$ is equal to the Shannon
        entropy, in both the discrete and the continuous case, up to an
        additive constant, when the reference measure $\mu$ is chosen as the uniform
        probability distribution.
	On the other hand, the measure-theoretic KL-entropy
        in the discrete and continuous cases coincides with its discrete and
        continuous definitions, respectively.

724:

725:         Further, from

726:         (\ref{Equation:ME:RelationBetweenMeasureTheoreticEntropyAndKullback}) and

727:         (\ref{Equation:ME:RelativionBetweenMeasureTheoreticAndDiscreteEntropies}),

        we can write Shannon entropy in terms of Kullback-Leibler
        relative-entropy as

730:         \begin{equation}

731:         S_{n}(P) = \ln n - I(P \| \mu) \enspace.

732:         \end{equation}

        Thus, Shannon entropy appears as being (up to an additive
        constant) the variation of information when we pass from the
        initial uniform probability distribution to the new probability

736:         distribution given by $P_{k} \geq 0$, $\sum_{k=1}^{n} P_{k}

737:         =1$, as any such probability distribution is obviously

738:         absolutely continuous with respect to the uniform discrete

739:         probability distribution.

        Similarly, by
        (\ref{Equation:ME:RelationBetweenMeasureTheoreticEntropyAndKullback})
        and
        (\ref{Equation:ME:RelativionBetweenMeasureTheoreticAndContinuousEntropies}),
        we can write the Boltzmann H-function in terms of relative-entropy
        as
        \begin{equation}
        S_{[a,b]}(P) = \ln (b-a) - I(P \| \mu) \enspace.
        \end{equation}
        Therefore, the continuous entropy or Boltzmann H-function
        $S_{[a,b]}(P)$ may be interpreted as being (up to an additive
        constant) the variation of information when we pass from the
        initial uniform probability distribution on the interval
        $[a,b]$ to the new probability measure defined by the
        probability density function $P(x)$ (any such
        probability measure is absolutely continuous with respect to
        the uniform probability distribution on the interval
        $[a,b]$).

760:

	Thus, KL-entropy provides a unified interpretation of both
	discrete entropy and continuous entropy.
        Shannon entropy in the continuous case
        and Shannon entropy in the discrete
        case can both be interpreted as the variation of information
        when we pass from the initial uniform distribution to the
        corresponding probability measure.

768:

        Also,
        since measure-theoretic entropy is equal to the discrete and
        continuous entropy up to an additive constant, ME prescriptions
        of measure-theoretic Shannon entropy are consistent with the
        discrete case and the continuous case.

774:

775: %=======================Section:================================

776: \section{Measure-Theoretic Definitions of Generalized Information

777:   Measures}

778: \label{Section:ME:MeasureTheoreticDefinitionsOfGeneralizedInformationMeasures}

779:         \noindent

780: %        In this section we extend the measure-theoretic definitions to

781: %        generalized information measures discussed in

782: %        Chapter~\ref{Chapter:KN}.

783: 	We begin with a brief note on the notation and assumptions

784:         used.

        We define all the information measures
        on the measurable space $(X,\mathfrak{M})$, and the default reference
        measure is $\mu$ unless otherwise stated.
        To avoid clumsy formulations, we will not
        distinguish between functions that differ only on a $\mu$-null set;
        accordingly, equations between
        $\mathfrak{M}$-measurable functions on $X$ are to be
        understood as holding $\mu$-almost everywhere ($\mu$-a.e.\ or
        a.e.).
        Further, we assume that all the quantities of interest
        exist and assume, implicitly, the $\sigma$-finiteness of $\mu$ and
        $\mu$-continuity of probability measures whenever
        required. Since these assumptions occur repeatedly in various
        definitions and formulations, they will not be mentioned in
        the sequel.
        With these assumptions we do not distinguish between
        an information measure of a pdf $p$ and that of the corresponding probability
        measure $P$. Hence, when we give definitions of
        information measures for pdfs, we also use the corresponding
        definitions for probability measures whenever it is
        convenient or required, with the understanding that $P(E) = \int_{E} p\,
        \ud \mu $, the converse being due to the Radon-Nikodym theorem, where $p =
        \frac{\ud P}{\ud \mu}$. In both cases we have $P \ll \mu$.

808:

809:         First we consider the R\'{e}nyi generalizations.

        The measure-theoretic definition of R\'{e}nyi entropy can be given

811:         as follows.

812:         %DEFINITION: Measure-theoretic definition of Renyi entropy

813:         \begin{definition}

814:         \label{Definition:ME:Measure-TheoreticRenyiEntropy}

815:         R\'{e}nyi entropy

816:         of a pdf $p:X \rightarrow {\mathbb{R}}^{+}$ on a measure space

817:         $(X,\mathfrak{M},\mu)$ is defined as

818:         \begin{equation}

819:         \label{Equation:ME:RenyiEntropyOf-pdf}

820:         S_{\alpha}(p) = \frac{1}{1-\alpha} \ln

821:         \int_{X}p(x)^{\alpha}\, \ud \mu(x) \enspace,

822:         \end{equation}

        provided the integral on the right exists and $\alpha \in
        \mathbb{R}$, $\alpha > 0$, $\alpha \neq 1$.

825:         \end{definition}%EndDEFINITION: Measure-theoretic definition of RenyiEntropy

826:         The same can be defined for any $\mu$-continuous probability

827:         measure $P$ as

828:         \begin{equation}

829:         \label{Equation:ME:RenyiEntropyOf-PM}

830:           S_{\alpha}(P) = \frac{1}{1-\alpha} \ln  \int_{X}

831:           {\left( \frac{\ud P}{\ud \mu} \right)}^{\alpha -1} \, \ud P \enspace.

832:         \end{equation}
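
        As a consistency check, one can verify formally that
        (\ref{Equation:ME:RenyiEntropyOf-pdf}) reduces to Shannon entropy as
        $\alpha \to 1$. The sketch below assumes that differentiation under
        the integral sign is justified, which is an additional regularity
        assumption not stated above. Since $\int_{X} p \, \ud \mu = 1$, both
        the numerator and the denominator vanish at $\alpha = 1$, and
        l'H\^{o}pital's rule gives
        \begin{eqnarray*}
          \lim_{\alpha \to 1} S_{\alpha}(p)
          &=& \lim_{\alpha \to 1} \frac{\ln \int_{X} p(x)^{\alpha}\, \ud
            \mu(x)}{1-\alpha} \\
          &=& \lim_{\alpha \to 1} \frac{\int_{X} p(x)^{\alpha} \ln p(x)\, \ud
            \mu(x)}{- \int_{X} p(x)^{\alpha}\, \ud \mu(x)}
          = - \int_{X} p(x) \ln p(x)\, \ud \mu(x) \enspace.
        \end{eqnarray*}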

833:         On the other hand, R\'{e}nyi relative-entropy can be defined as

834:         follows.

        %DEFINITION: Measure-theoretic definition of Renyi relative entropy

836:         \begin{definition}

837:         Let $p,r:X \rightarrow

838:         {\mathbb{R}}^{+}$ be two pdfs on measure space $(X,\mathfrak{M},\mu)$. The

839:         R\'{e}nyi relative-entropy of $p$ relative to $r$

840:         is defined as

841:         \begin{equation}

842:         \label{Equation:ME:RenyiRelativeEntropyOf-pdf}

843:         I_{\alpha}(p\|r) = \frac{1}{\alpha -1} \ln \int_{X}

844:         \frac{p(x)^{\alpha}}{r(x)^{\alpha -1}} \, \ud \mu(x) \enspace,

845:         \end{equation}

        provided the integral on the right exists and $\alpha \in
        \mathbb{R}$, $\alpha > 0$, $\alpha \neq 1$.

        \end{definition}%EndDEFINITION: Measure-theoretic definition of Renyi relative entropy

850:         The same can be written in terms of probability measures as,

851:         \begin{eqnarray}

852: 	\label{Equation:ME:RenyiRelativeEntropyOf-PMs}

853:           I_{\alpha}(P\|R) &=& \frac{1}{\alpha -1} \ln   \int_{X}

854:           {\left( \frac{\ud P}{\ud R} \right)}^{\alpha -1} \, \ud P

855:           \nonumber \\

856:           &=& \frac{1}{\alpha -1} \ln   \int_{X}

857:           {\left( \frac{\ud P}{\ud R} \right)}^{\alpha} \, \ud R

858:           \enspace,

859:         \end{eqnarray}

860: 	whenever $P \ll R$; $I_{\alpha}(P \|R) = + \infty$, otherwise.

	 Further, if we assume that $\mu$ in
        (\ref{Equation:ME:RenyiEntropyOf-PM}) is a probability measure,
        then
	\begin{equation}
	\label{Equation:ME:Renyi_EntropyandRelativeEntropy}
	S_{\alpha}(P) = - I_{\alpha}(P\|\mu) \enspace.
	\end{equation}

868:

869:         Tsallis entropy in the measure theoretic setting can be defined as

870:         follows.

871:         %DEFINITION: Measure-theoretic definition of Tsallis entropy

872:         \begin{definition}

873:         \label{Definition:ME:Measure-TheoreticTsallisEntropy}

874:         Tsallis entropy of a pdf $p$ on $(X,\mathfrak{M},\mu)$ is

875:         defined as

876:         \begin{equation}

877:         \label{Equation:ME:TsallisEntropyOf-pdf}

878:         S_{q}(p) = \int_{X} p(x) \ln_{q} \frac{1}{p(x)}\, \ud \mu(x) =

879:         \frac{1 - \int_{X} p(x)^{q}\, \ud \mu(x) }{q-1}

880:         \enspace,

881:         \end{equation}

        provided the integral on the right exists and $q \in
        \mathbb{R}$, $q > 0$, $q \neq 1$.

884:         \end{definition}%EndDEFINITION: Measure-theoretic definition

885:         %of TsallisEntropy

886:

	Here $\ln_{q}$ in
            (\ref{Equation:ME:TsallisEntropyOf-pdf}) is referred to as the
            $q$-logarithm and is defined as $\ln_{q} x = \frac{\displaystyle
            x^{1-q} -1}{\displaystyle 1-q}
        \:\:\: (x >0, q \in {\mathbb{R}}, q \neq 1)$.

892:         The same can be defined for $\mu$-continuous probability

893:         measure $P$, and can be written as

894:         \begin{equation}

895: 	\label{Equation:ME:TsallisEntropyOf-PM}

896:            S_{q}(P) = \int_{X} \ln_{q}  {\left(\frac{\ud P}{\ud \mu}\right)}^{-1}

897:           \, \ud P \enspace.

898:         \end{equation}
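
        The second equality in (\ref{Equation:ME:TsallisEntropyOf-pdf}) can
        be checked directly from the definition of the $q$-logarithm. For
        $q \neq 1$ and $p(x) > 0$,
        \begin{displaymath}
        p(x) \ln_{q} \frac{1}{p(x)} = p(x) \, \frac{p(x)^{q-1} - 1}{1-q}
        = \frac{p(x)^{q} - p(x)}{1-q} \enspace,
        \end{displaymath}
        and integrating with respect to $\mu$, using $\int_{X} p \, \ud \mu =
        1$, gives
        \begin{displaymath}
        S_{q}(p) = \frac{\int_{X} p(x)^{q} \, \ud \mu(x) - 1}{1-q}
        = \frac{1 - \int_{X} p(x)^{q} \, \ud \mu(x)}{q-1} \enspace.
        \end{displaymath}
        Moreover, since $\ln_{q} x \to \ln x$ as $q \to 1$ for every $x >
        0$, the functional formally reduces to Shannon entropy in this limit
        (the interchange of limit and integral is assumed here without
        proof).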

899:

900:         The definition of Tsallis relative-entropy is given below.

901:         %DEFINITION: Measure-theoretic definition of Tsallis relative entropy

902:         \begin{definition}

903:         Let $(X,\mathfrak{M},\mu)$ be a measure space. Let $p,r:X \rightarrow

904:         {\mathbb{R}}^{+}$ be two probability density functions. The

905:         Tsallis relative-entropy of $p$ relative to $r$

906:         is defined as

907:         \begin{equation}

908:         \label{Equation:ME:TsallisRelativeEntropyOf-pdf}

        I_{q}(p\|r) = - \int_{X} p(x) \ln_{q} \frac{r(x)}{p(x)}\, \ud
        \mu(x)    = \frac{\int_{X} \frac{p(x)^{q}}{r(x)^{q-1}}\,
          \ud \mu(x) -1 }{q-1} \enspace,
        \end{equation}
        provided the integral on the right exists and $q \in
        \mathbb{R}$, $q > 0$, $q \neq 1$.

915:         \end{definition}%EndDEFINITION: Measure-theoretic definition of Tsallis

916:          %relative entropy

917:         The same can be written for two probability measures $P$ and

918:         $R$, as

919:         \begin{equation}

920:           I_{q}(P\|R)= - \int_{X} \ln_{q} {\left(\frac{\ud P}{\ud R}\right)}^{-1}\,

921:           \ud P \enspace,

922:         \end{equation}

923: 	whenever $P \ll R$; $I_{q}(P \|R) = + \infty$, otherwise.

924: 	If $\mu$ in

925:         (\ref{Equation:ME:TsallisEntropyOf-PM}) is a probability measure

926:         then

927: 	\begin{equation}

928: 	\label{Equation:ME:Tsallis_EntropyandRelativeEntropy}

	S_{q}(P) = - I_{q}(P\|\mu) \enspace.

930: 	\end{equation}

931:

932: %        We discuss the relations between generalized entropic

933: %        functionals in measure-theoretic case to discrete or continuous

934: %        case in

935: %        \S~\ref{Section:ME:MeasureTheoreticDefinitions_Revisited}. The

936: %        reason for this is the various relations discussed for

937: %        classical information measures cannot be extended to the

938: %        generalized case. As we are going to see contrary to the

939: %        classical case, where consistency of ME-prescriptions of measure-theoretic

940: %        definitions with discrete or continuous case can be argued

941: %        without invoking ME-prescriptions, consistent arguments for measure-theoretic

942: %        generalized entropy functionals involve explicitly

943: %        ME-prescriptions. Hence it is important for us to discuss the

944: %        ME-prescriptions in generalized case. First we briefly review

945: %        the ME-prescriptions in the classical case.

946:

947: %=========================Section:=====================================

948: \section{Maximum Entropy and Canonical Distributions}

949: \label{Section:ME:MaximumEntropyAndCanonicalDistributions}

950:         \noindent

        For all the ME prescriptions of classical information measures
        we consider a set of constraints of the form

953:         \begin{equation}

954:         \label{Equation:ME:ExpectationConstraints}

955:         \int_{X} u_{m} \, \ud P = \int_{X} u_{m}(x) p(x) \, \ud \mu(x) =

956:         \langle u_{m} \rangle \enspace, \:\:\:m =

957:         1, \ldots , M \enspace,

958:         \end{equation}

959:         with respect to $\mathfrak{M}$-measurable functions $u_{m}: X

960:         \rightarrow \mathbb{R}, \:\: m = 1, \ldots M$, whose expectation

961:         values $\langle u_{m} \rangle, \, m=1,\ldots M$ are (assumed

962:         to be) {\it a priori} known, along with the normalizing

963:         constraint $\int_{X} \, \ud P =1$.

964:         (From now on we assume that any set of constraints on

965:         probability distributions implicitly includes this

966:         constraint, which will not be mentioned in the sequel.)

967:

968: %-----Note on the notation for next chapter...

969: %        A note on the notation: To avoid proliferation of symbols we

970: %        use the same notation for the minimum or maximum entropy

971: %        distributions and Lagrange multipliers in the various case;

972: %        the correspondence should be clear from the context. In the

973: %        maximum entropy case use $Z$ for the partition function and in

974: %        minimum entropy case we $\widehat{Z}$.

975:

        To maximize the
        entropy~(\ref{Equation:ME:ShannonEntropyOf-pdf})
        subject
        to the constraints~(\ref{Equation:ME:ExpectationConstraints}), we
        form the Lagrangian

981:         {\setlength\arraycolsep{0pt}

982:         \begin{eqnarray}

983:         \label{Equation:ME:LagranginForMaximumEntropy}

984:         \mathcal{L}(x, \lambda, \beta) = - \int_{X} \ln \frac{\ud

985:         P}{\ud \mu}(x)&& \, \ud P(x) - \lambda \left(\int_{X}\, \ud P(x) - 1

986:         \right) \nonumber \\

987:         && - \sum_{m=1}^{M} \beta_{m} \left(\int_{X} u_{m}(x)\, \ud P(x) -

988:         \langle u_{m} \rangle \right) \enspace,

989:         \end{eqnarray}}

        where $\lambda$ and $\beta_{m}$, $m=1,\ldots,M$, are Lagrange
        parameters (we use the notation $\beta = (\beta_{1}, \ldots, \beta_{M})$).
        \noindent
        Setting the variation of the Lagrangian to zero, and absorbing the
        resulting additive constant into $\lambda$, gives the condition

994:         \begin{displaymath}

995:         \ln \frac{\ud P}{\ud \mu}(x) + \lambda + \sum_{m=1}^{M}

996:         \beta_{m} u_{m}(x) = 0 \enspace.

997:         \end{displaymath}

        With $\lambda$ fixed by the normalizing constraint, the solution can be written as

999:         \begin{equation}

1000:         \ud P(x, \beta) = \exp \left( -\ln Z(\beta) - \sum_{m=1}^{M}

1001:         \beta_{m} u_{m}(x)\right) \ud \mu(x)

1002:         \end{equation}

1003:         or

1004:         \begin{equation}

1005:         p(x) = \frac{\ud P}{\ud \mu} (x) = \frac{e^{ -

1006:             \sum_{m=1}^{M} \beta_{m}

1007:         u_{m}(x)}}{Z(\beta)}  \enspace,

1008:         \end{equation}

1009:         where the partition function $Z(\beta)$ is written as

1010:         \begin{equation}

1011:         \label{Equation:PartitionFunctionForMaximumEntropy}

1012:         Z(\beta) = \int_{X} \exp \left( - \sum_{m=1}^{M} \beta_{m}

1013:         u_{m}(x)\right) \ud \mu(x) \enspace.

1014:         \end{equation}

1015:         The Lagrange parameters $\beta_{m},\: m = 1, \ldots M$ are

1016:         specified by the set of

1017:         constraints (\ref{Equation:ME:ExpectationConstraints}).

1018:

1019:         The maximum entropy, denoted by $S$, can be calculated as

1020:         \begin{equation}

1021:         \label{Equation:ME:MaximumEntropy}

1022:         S = \ln Z + \sum_{m=1}^{M} \beta_{m} \langle u_{m} \rangle \enspace.

1023:         \end{equation}
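        This can be checked by substituting $\ln \frac{\ud P}{\ud \mu}(x) = -
        \ln Z(\beta) - \sum_{m=1}^{M} \beta_{m} u_{m}(x)$ into
        (\ref{Equation:ME:ShannonEntropy_of_ProbabilityMeasure}) and using
        the constraints (\ref{Equation:ME:ExpectationConstraints}) together
        with $\int_{X} \ud P = 1$:
        \begin{displaymath}
        S = \int_{X} \left( \ln Z(\beta) + \sum_{m=1}^{M} \beta_{m} u_{m}(x)
        \right) \ud P(x) = \ln Z(\beta) + \sum_{m=1}^{M} \beta_{m} \langle
        u_{m} \rangle \enspace.
        \end{displaymath}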

1024:

        The Lagrange parameters $\beta_{m},\: m = 1, \ldots, M$, are
        obtained as the unique solution (if it exists) of the
        following system of nonlinear equations:

1028:         \begin{equation}

1029:         \label{Equation:ME:MaximumEntropy_ThermodynamicEquation_1}

1030:           \frac{\partial}{\partial \beta_{m}} \ln Z(\beta) = - \langle

1031:         u_{m} \rangle \enspace, \:\:\:m = 1, \ldots M \enspace.

1032:         \end{equation}

1033:         We also have

1034:         \begin{equation}

1035:         \label{Equation:ME:MaximumEntropy_ThermodynamicEquation_2}

        \frac{\partial S}{\partial \langle u_{m} \rangle} =
        \beta_{m} \enspace, \:\:\: m = 1, \ldots, M \enspace.

1038:         \end{equation}

        Equations
        (\ref{Equation:ME:MaximumEntropy_ThermodynamicEquation_1}) and
        (\ref{Equation:ME:MaximumEntropy_ThermodynamicEquation_2}) are

1042:         referred to as the thermodynamic equations.
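
        As a worked example (the choice of $X$, $\mu$ and $u_{1}$ below is
        an assumption made only for illustration), let $X = [0,\infty)$,
        let $\mu$ be the Lebesgue measure, and take a single constraint,
        $M=1$, with $u_{1}(x) = x$ and $\langle u_{1} \rangle > 0$. For
        $\beta_{1} > 0$ the partition function
        (\ref{Equation:PartitionFunctionForMaximumEntropy}) and the maximum
        entropy pdf are
        \begin{displaymath}
        Z(\beta_{1}) = \int_{0}^{\infty} e^{-\beta_{1} x} \, \ud x =
        \frac{1}{\beta_{1}} \enspace, \:\:\:
        p(x) = \beta_{1} e^{-\beta_{1} x} \enspace.
        \end{displaymath}
        Equation (\ref{Equation:ME:MaximumEntropy_ThermodynamicEquation_1})
        reads $- \frac{1}{\beta_{1}} = - \langle u_{1} \rangle$, so that
        $\beta_{1} = 1 / \langle u_{1} \rangle$, and
        (\ref{Equation:ME:MaximumEntropy}) gives
        \begin{displaymath}
        S = \ln \frac{1}{\beta_{1}} + \beta_{1} \langle u_{1} \rangle = 1 +
        \ln \langle u_{1} \rangle \enspace,
        \end{displaymath}
        which is the familiar entropy of the exponential distribution with
        mean $\langle u_{1} \rangle$.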

1043:

1044: %================================Section:===================================

1045: \section{ME prescription for Tsallis Entropy}

1046: \label{Section:ME:ME-prescriptionForTsallisEntropy}

1047:         \noindent

         The great success of Tsallis entropy is
         attributed to the power-law distributions one can derive as
         maximum entropy distributions by maximizing Tsallis entropy
         with respect to the moment constraints. But there are
         subtleties involved in the choice of constraints
         for ME prescriptions of these
         entropy functionals. These subtleties are still a subject of
         discussion in the nonextensive formalism~\cite{FerriMartinezPlastino:2005:TheRoleOfConstraintsInTsallisNonextensiveTreatmentRevisited,AbeBagci:2005:NecessityOfqExpectation,WadaScarfone:2005:ConnectionsBetweenTsallisFormalismEtc}.

1056:

        In the nonextensive formalism, maximum entropy distributions
        are derived with respect to constraints that
        differ from the constraints~(\ref{Equation:ME:ExpectationConstraints})
        used for classical information measures. Constraints of the
        form~(\ref{Equation:ME:ExpectationConstraints}) are
        inadequate for handling the serious mathematical difficulties that arise
        (see~\cite{TsallisMendesPlastino:1998:TheRoleOfConstraints}). To
        handle these difficulties, constraints of the form

1066:         \begin{equation}

1067:         \label{Equation:ME:Normalized-q-ExpectationConstraints}

1068:         \frac{\int_{X} u_{m}(x) p(x)^{q} \, \ud \mu(x)}{\int_{X}

1069:           p(x)^{q}\, \ud \mu(x)} = {\langle\langle u_{m} \rangle\rangle}_{q} \enspace, m =

1070:         1, \ldots , M

1071:         \end{equation}

1072: 	are proposed.

        The left-hand side of
        (\ref{Equation:ME:Normalized-q-ExpectationConstraints}) can
          be considered as the expectation of $u_{m}$ with respect to the
          modified probability measure $P_{(q)}$ (it is indeed a
          probability measure) defined as

1077:           \begin{equation}

1078:             P_{(q)}(E) = {\left( \int_{X} p(x)^{q} \, \ud \mu

1079:               \right)}^{-1} \int_{E} p(x)^{q} \, \ud \mu \enspace.

1080:           \end{equation}

          The measure $P_{(q)}$ is known as the escort probability

1082:           measure.
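
          For a small illustration in the discrete case (the values are
          chosen only for this example), let $X = \{x_{1}, x_{2}\}$ with
          $\mu$ the counting measure, $p = (\frac{1}{4}, \frac{3}{4})$ and
          $q = 2$. Then
          \begin{displaymath}
          \int_{X} p(x)^{q} \, \ud \mu = \frac{1}{16} + \frac{9}{16} =
          \frac{5}{8} \enspace, \:\:\:
          P_{(q)} = \left( \frac{1}{10}, \frac{9}{10} \right) \enspace,
          \end{displaymath}
          and the constraint
          (\ref{Equation:ME:Normalized-q-ExpectationConstraints}) reads
          $\frac{1}{10} u_{m}(x_{1}) + \frac{9}{10} u_{m}(x_{2}) =
          {\langle\langle u_{m} \rangle\rangle}_{q}$. In this example, with
          $q > 1$, the escort measure puts more weight on the more probable
          point than $p$ does.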

1083:

1084:           The variational principle for Tsallis entropy maximization

1085:           with respect to

1086:           constraints~(\ref{Equation:ME:Normalized-q-ExpectationConstraints})

1087:           can be written as

1088:           \begin{eqnarray}

1089:           \label{Equation:ME:Lagrangin_TsallisMaximumEntropy_wrt_Norm-q-Expt}

1090:           \mathcal{L}(x, \lambda, \beta) =  &&\int_{X} \ln_{q}

1091:           \frac{1}{p(x)} \, \ud P(x) - \lambda \left(\int_{X}\, \ud P(x) - 1

1092:           \right) \nonumber \\

1093:           && - \sum_{m=1}^{M} \beta^{(q)}_{m} \left(\int_{X} {p(x)}^{q-1}

1094:           \left(u_{m}(x) - {\langle\langle u_{m}  \rangle\rangle}_{q}

1095:           \right) \, \ud P(x) \right) \enspace,

1096:           \end{eqnarray}

1097:           where the parameters $\beta_{m}^{(q)}$ can be defined in

1098:           terms of true Lagrange parameters $\beta_{m}$ as

1099:          \begin{equation}

1100:            \beta_{m}^{(q)} = {\left(\int_{X} p(x)^{q}\, \ud \mu

1101:              \right)}^{-1} \beta_{m}\enspace, \, m = 1, \ldots, M.

1102:           \end{equation}

1103:           The maximum entropy distribution in this case can be written

1104:           as

1105:           \begin{equation}

1106:           \label{Equation:ME:TsallisMaximumEntropyDistribution_wrt_q-Expt}

          p(x) = \frac{\displaystyle {\left[ 1 - (1-q)  {\left( \int_{X}
            p(x)^{q}\, \ud \mu \right)}^{-1}  \sum_{m=1}^{M} \beta_{m} \left( u_{m}(x) -

1109:           {\langle\langle {u}_{m} \rangle\rangle}_{q} \right) \right]}^{\frac{1}{1-q}}}

1110:           {\displaystyle {\overline{Z_{q}}}  }

1111:           \end{equation}

          which can be written in terms of the $q$-exponential
          $e_{q}^{x} = {[1 + (1-q)x]}^{\frac{1}{1-q}}$ as

1114:          \begin{equation}

1115:          \label{Equation:ME:TsallisMaximumEntropyDistribution_wrt_q-Expt_q-exponentialForm}

1116:          p(x) = \frac{\displaystyle e_{q}^{-   {\left(\int_{X} p(x)^{q}\, \ud \mu

1117:              \right)}^{-1}   \sum_{m=1}^{M} \beta_{m} (u_{m}(x) -

1118:              {\langle\langle u_{m}\rangle\rangle}_{q}  )

1119:          }}{\displaystyle \overline{Z_{q}}} \enspace,

1120:          \end{equation}

1121:          where

1122:          \begin{equation}

1123:            \overline{Z_{q}} = \int_{X} {e_{q}^{- {\left(\int_{X} p(x)^{q}\, \ud \mu

1124:              \right)}^{-1}   \sum_{m=1}^{M} \beta_{m} (u_{m}(x) -

1125:              {\langle\langle u_{m}\rangle\rangle}_{q}  ) }} \, \ud \mu(x) \enspace.

1126:          \end{equation}

1127:

1128:         Maximum Tsallis entropy in this case satisfies

1129:         \begin{equation}

1130:         S_{q} = \ln_{q}\overline{{Z}_{q}} \enspace,

1131:         \end{equation}

1132:         while corresponding thermodynamic equations can be written

1133:         as

1134:         \begin{equation}

1135:         \frac{\partial}{\partial \beta_{m}} \ln_{q} Z_{q}  =  -

1136:         {\langle\langle{{u}_{m}}\rangle\rangle}_{q} \enspace, \:\:\: m = 1, \ldots M

1137:         \enspace,

1138:         \end{equation}

1139:         \begin{equation}

        \frac{\partial S_{q}}{\partial
        {\langle\langle{{u}_{m}}\rangle\rangle}_{q}  }  =
        \beta_{m} \enspace, \:\:\: m =1, \ldots, M \enspace,

1143:         \end{equation}

1144:         where

1145:         \begin{equation}

1146:         \ln_{q} Z_{q} = \ln_{q} \overline{{Z}_{q}}

1147:         - \sum_{m=1}^{M} \beta_{m}

1148:         {\langle\langle{{u}_{m}}\rangle\rangle}_{q} \enspace.

1149:         \end{equation}
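
        The identity $S_{q} = \ln_{q} \overline{Z_{q}}$ stated above can be
        checked as follows; the sketch assumes that the bracket in
        (\ref{Equation:ME:TsallisMaximumEntropyDistribution_wrt_q-Expt}) is
        nonnegative, so that the power is well defined. Raising
        (\ref{Equation:ME:TsallisMaximumEntropyDistribution_wrt_q-Expt}) to
        the power $1-q$ gives
        \begin{displaymath}
        {\left( \overline{Z_{q}} \, p(x) \right)}^{1-q} = 1 - (1-q) {\left(
        \int_{X} p(x)^{q} \, \ud \mu \right)}^{-1} \sum_{m=1}^{M} \beta_{m}
        \left( u_{m}(x) - {\langle\langle u_{m} \rangle\rangle}_{q} \right)
        \enspace.
        \end{displaymath}
        Multiplying by $p(x)^{q}$ and integrating with respect to $\mu$, the
        sum on the right vanishes by the constraints
        (\ref{Equation:ME:Normalized-q-ExpectationConstraints}), and
        $\int_{X} p \, \ud \mu = 1$ yields
        \begin{displaymath}
        {\overline{Z_{q}}}^{\,1-q} = \int_{X} p(x)^{q} \, \ud \mu(x) \enspace.
        \end{displaymath}
        Substituting this into (\ref{Equation:ME:TsallisEntropyOf-pdf}) gives
        \begin{displaymath}
        S_{q} = \frac{1 - {\overline{Z_{q}}}^{\,1-q}}{q-1} =
        \frac{{\overline{Z_{q}}}^{\,1-q} - 1}{1-q} = \ln_{q} \overline{Z_{q}}
        \enspace.
        \end{displaymath}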

1150:

1151: %=============================================================================

1152: \section{Measure-Theoretic Definitions: Revisited}

1153: \label{Section:ME:MeasureTheoreticDefinitions_Revisited}

1154:        \noindent

	It is well known that, unlike Shannon entropy, Kullback-Leibler
       relative-entropy in the discrete
       case can be extended naturally to the measure-theoretic
       case.
       In this section, we show
       that this is true for generalized relative-entropies
       too. R\'{e}nyi relative-entropy on the continuous-valued space
       $\mathbb{R}$ and its
       equivalence with the discrete case is studied
       by R\'{e}nyi~\cite{Renyi:1960:SomeFundamentalQuestionsOfInformationTheory}. Here,
       we present the result in the measure-theoretic case and
       conclude that the measure-theoretic definitions of both Tsallis and
       R\'{e}nyi relative-entropies are equivalent to their discrete
       counterparts.

1169:

1170:        We also present a result pertaining to ME of

1171:        measure-theoretic Tsallis entropy. We show that ME of Tsallis

1172:        entropy in the measure-theoretic case is consistent with the

1173:        discrete case.

1174:

1175:    %-----------------------Sub Section------------------

1176:   \subsection{On Measure-Theoretic Definitions of Generalized Relative-Entropies}

1177:        \noindent

1178:         Here we show that generalized relative-entropies in the

1179:         discrete case can be naturally extended to the measure-theoretic

1180:         case, in the sense that the measure-theoretic definitions can

1181:         be obtained as a limit of a sequence of finite discrete

1182:         entropies of pmfs which approximate the pdfs involved.

1183:         We call this

1184:         sequence of pmfs the ``approximating sequence of pmfs of a

1185:         pdf''. To formalize these aspects we need the following

1186:         lemma.

1187:         %--------------Lemmma-------------

1188:         \begin{lemma}

1189:         \label{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf}

1190:         Let $p$ be a pdf defined on measure space

1191:         $(X,\mathfrak{M},\mu)$. Then there exists a sequence of simple

1192:         functions $\{f_{n}\}$ (we refer to them as approximating sequence of

1193:         simple functions of $p$) such that $\lim_{n \to \infty} f_{n} = p$

1194:         and each $f_{n}$ can be written as

1195:         \begin{equation}

1196:         \label{Equation:ME:ActualDefinitionOfSeqenceOfSimpleFunctions}

1197:          f_{n}(x) = \frac{1}{\mu(E_{n,k})} \int_{E_{n,k}} p \, \ud

1198:         \mu \enspace, \:\:\:\:\:\:\: \forall x \in E_{n,k},

1199: 	 \:\:\: k = 1, \ldots m(n) \enspace,

1200:         \end{equation}

1201:         where $(E_{n,1}, \ldots, E_{n,m(n)})$ is the measurable

1202:         partition corresponding to $f_{n}$ (the notation $m(n)$

1203:         indicates that $m$ varies with $n$).  Further each $f_{n}$

1204:   	satisfies

1205:         \begin{equation}

1206:          \int_{X} f_{n} \, \ud \mu = 1 \enspace.

1207:         \end{equation}

1208:         \end{lemma}

1209:         %Proof----

1210:         \proof

1211: %	\footnote{$ \cup_{k=1}^{m(n)} E_{n,k} = X$ and $E_{n,i}

1212: %        \cap E_{n,j} = \emptyset$, $\forall i \neq j$}

1213:          Define a sequence of simple functions $\{f_{n}\}$ as

1214:         \begin{equation}

1215:          f_{n}(x) = \left\{ \begin{array}{ll}

1216:           \frac{1}{ \mu p^{-1} \left(

1217:             \left[ \frac{k}{2^{n}}, \frac{k+1}{2^{n}} \right) \right)}

1218:             \displaystyle \int_{  p^{-1} \left(

1219:             \left[ \frac{k}{2^{n}}, \frac{k+1}{2^{n}} \right) \right)

1220:             } p \, \ud \mu \enspace,& \: \:

1221:          \:\:\textrm{if}\:\:  \frac{k}{2^{n}} \leq p(x) <

1222:          \frac{k+1}{2^{n}} , \\

1223:          & \:\:\:k = 0, 1, \ldots n 2^{n}-1

1224:          \\ \\

1225:          \frac{1}{ \mu p^{-1} \left(

1226:             \left[ n, \infty \right) \right)}

1227:             \displaystyle \int_{  p^{-1} \left(

1228:             \left[ n , \infty \right) \right)

1229:             } p \, \ud \mu \enspace,& \: \:

1230:          \:\:\textrm{if}\:\: n \leq p(x),

1231:            \end{array} \right.

1232:          \end{equation}

1233:          Each $f_{n}$ is indeed a simple function and can be written as

1234:          \begin{equation}

1235:           f_{n} = \sum_{k=0}^{n2^{n}-1} \left( \frac{1}{\mu E_{n,k}}

1236:           \int_{E_{n,k}} p\, \ud \mu \right) \chi_{E_{n,k}} + \left( \frac{1}{\mu

1237:             F_{n}} \int_{F_{n}} p \, \ud \mu \right) \chi_{F_{n}} \enspace,

1238:          \end{equation}

1239:          where $E_{n,k} =

1240:          p^{-1}\left(\left[\frac{k}{2^{n}},\frac{k+1}{2^{n}}\right)

1241:           \right)$, $k= 0, \ldots, n2^{n}-1$ and $F_{n} = p^{-1} \left(

1242:             \left[ n, \infty \right) \right)$.

1243:          Since $\int_{E} p \, \ud \mu < \infty$ for any $E \in

1244:          \mathfrak{M}$, we have $\int_{E_{n,k}} p\, \ud \mu = 0$

1245:          whenever $\mu E_{n,k} =0$, for $k = 0, \ldots n2^{n} -1$. Similarly

1246:          $\int_{F_{n}} p\, \ud \mu = 0$ whenever $\mu F_{n} =0$.

1247:          Now we show that $\lim_{n \to \infty} f_{n} = p$, point-wise.

1248:

1249:          First assume that $p(x) < \infty$. Then $\exists \: n \in

1250:          {\mathbb{Z}}^{+} \ni p(x) \leq n$. Also $\exists \, k \in

1251:          {\mathbb{Z}}^{+} $, $0 \leq k

1252:          \leq n2^{n}-1

1253:          \ni \frac{k}{2^{n}} \leq p(x) <

1254:          \frac{k+1}{2^{n}}$ and $\frac{k}{2^{n}} \leq f_{n}(x) <

1255:          \frac{k+1}{2^{n}}$. This implies $0 \leq |p - f_{n} | <

1256:          \frac{1}{2^{n}}$ as required.

1257:

1258:          If $p(x) = \infty$, for some $x \in X$, then $x \in F_{n}$ for

1259:          all $n$, and therefore $f_{n}(x) \geq n$ for all $n$; hence

1260:          $\lim_{n \to \infty} f_{n}(x) = \infty = p(x) $.

1261:

1262:          Finally we have

1263:          \begin{eqnarray}

1264:            \int_{X} f_{n} \, \ud \mu &=& \sum_{k=1}^{m(n)} \left[

1265:            \frac{1}{\mu(E_{n,k})} \int_{E_{n,k}} p \,\ud \mu \right]

1266:             \mu(E_{n,k}) \nonumber \\

1267:             &=& \sum_{k=1}^{m(n)} \int_{E_{n,k}} p \,\ud \mu \nonumber \\

1268:             &=& \int_{X} p \, \ud \mu =1 \nonumber

1269:          \end{eqnarray}

1270:          \endproof

1271:          %-------------End: lemmma-----------------

1272:          The above construction of a sequence of simple functions which

1273:          approximate a measurable function is similar to the

1274:          approximation theorem~\cite[p.~6, Theorem

1275:            1.8(b)]{Kantorovitz:2003:IntroductionToModernAnalysis} in

1276:          the theory of integration. However, the approximation in

1277:          Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf}

1278:          can be seen as a mean-value approximation, whereas in the latter

1279:          case it is the lower approximation. Further, unlike in the case

1280:          of lower approximation, the sequence of simple functions

1281:          which approximate $p$ in

1282:          Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf}

1283:          are neither monotone nor satisfy $f_{n} \leq p$.
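         As an illustration (an example of ours, not part of the
         original argument), take $X = [0,1]$ with Lebesgue measure
         $\mu$ and $p(x) = 2x$. For $n = 1$ the construction in
         Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf}
         gives the cells $p^{-1}([0,\frac{1}{2})) = [0,\frac{1}{4})$,
         $p^{-1}([\frac{1}{2},1)) = [\frac{1}{4},\frac{1}{2})$ and
         $p^{-1}([1,\infty)) = [\frac{1}{2},1]$, and averaging $p$
         over each cell yields
         \begin{displaymath}
          f_{1} = \frac{1}{4}\, \chi_{[0,\frac{1}{4})}
                + \frac{3}{4}\, \chi_{[\frac{1}{4},\frac{1}{2})}
                + \frac{3}{2}\, \chi_{[\frac{1}{2},1]} \enspace,
          \:\:\:\:\:
          \int_{X} f_{1} \, \ud \mu = \frac{1}{16} + \frac{3}{16} +
          \frac{3}{4} = 1 \enspace.
         \end{displaymath}
         In this example $f_{1}(x) = \frac{1}{4} > 2x = p(x)$ for
         $x < \frac{1}{8}$, so the inequality $f_{n} \leq p$ of the
         lower approximation indeed fails.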

1284:

1285:         Now one can define a sequence of pmfs $\{\tilde{p}_{n}\}$ corresponding

1286:         to the sequence

1287:         of simple functions constructed in

1288:         Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf},

1289:         denoted by $\tilde{p}_{n} = (\tilde{p}_{n,1}, \ldots,\tilde{p}_{n,m(n)})$, as

1290:         \begin{equation}

1291:         \label{Equation:ME:ActualDefinitionOfSeqenceOfPmfs}

1292:          \tilde{p}_{n,k} = \mu(E_{n,k})f_{n}\chi_{E_{n,k}} = \int_{E_{n,k}} p \, \ud

1293:          \mu \enspace, k = 1, \ldots m(n),

1294:         \end{equation}

1295:         for any $n$.

1296:         We have

1297:         \begin{equation}

1298:          \sum_{k=1}^{m(n)} \tilde{p}_{n,k} = \sum_{k=1}^{m(n)} \int_{E_{n,k}} p

1299:          \, \ud \mu

1300:          = \int_{X} p \, \ud \mu =1 \enspace,

1301:         \end{equation}

1302:         and hence $\tilde{p}_{n}$ is indeed a pmf.

1303:         We call $\{\tilde{p}_{n}\}$ the approximating sequence of pmfs of pdf

1304:         $p$.

1305:

1306: %        We say an measure-theoretic definition of an information

1307: %        measure $\overline{S}$ is exact if

1308: %        \begin{equation}

1309: %         \lim_{n \to \infty} \overline{S}(P_{n}) = \overline{S}(p) \enspace.

1310: %        \end{equation}

1311:

1312:           Now we present our main theorem, where we assume that $p$ and

1313: 	  $r$ are bounded. The

1314:           assumption of boundedness of $p$ and $r$ simplifies the

1315: 	  proof. However, the result can be

1316:           extended to an unbounded

1317:           case; see~\cite{Renyi:1959:OnTheDimensionAndEntropyOfProbabilityDistributions}

1318:           for an analysis of Shannon entropy and relative entropy on $\mathbb{R}$.

1319:          %THEOREM:Measure-theoretic definition of generalized relative entropies.

1320:          \begin{theorem}

1321:          \label{Theorem:ME:MeasureTheoreticDefinitionsOfGeneralizedRelative-Entropies}

1322:            Let $p$ and $r$ be bounded pdfs defined on a

1323:            measure space $(X,\mathfrak{M}, \mu)$. Let $\{\tilde{p}_{n}\}$

1324:            and $\{\tilde{r}_{n}\}$ be the approximating sequences of pmfs of $p$ and $r$

1325:            respectively. Let $I_{\alpha}$ denote the R\'{e}nyi relative-entropy as

1326:            in~(\ref{Equation:ME:RenyiRelativeEntropyOf-pdf}) and

1327: 	  $I_{q}$ denote the Tsallis

1328:            relative-entropy as

1329:            in~(\ref{Equation:ME:TsallisRelativeEntropyOf-pdf}).

1330:            Then

1331:             \begin{equation}

1332:             \label{Equation:ME:InRenyisTheoremStatement_2}

1333:             \lim_{n \to \infty} I_{\alpha}(\tilde{p}_{n} \| \tilde{r}_{n}) = I_{\alpha}(p\|r)

1334:             \end{equation}

1335: 	     and

1336:             \begin{equation}

1337:             \label{Equation:ME:InRenyisTheoremStatement_1}

1338:             \lim_{n \to \infty} I_{q}(\tilde{p}_{n} \| \tilde{r}_{n}) = I_{q}(p\|r)

1339:             \end{equation}

1340:          \end{theorem}

1341:          \proof

1342:          It is enough to prove the result for either Tsallis or

1343:          R\'{e}nyi, since each is a monotone and continuous function of

1344:          the other. Hence we write down the proof for the case of R\'{e}nyi

1345:          and we use the entropic index $\alpha$ in the proof.

1346:

1347:          Corresponding to pdf $p$, let $\{f_{n}\}$ be the approximating

1348:          sequence of simple functions such that $\lim_{n \to \infty}

1349:          f_{n} = p$ as in

1350:          Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf}.

1351:          Let $\{g_{n}\}$ be the approximating sequence of simple

1352:          functions for $r$ such that $\lim_{n \to \infty} g_{n} = r$.

1353:          Corresponding

1354:          to simple functions $f_{n}$ and $g_{n}$ there exists a common

1355:          measurable partition\footnote{Let $\varphi$ and $\phi$ be two

1356:          simple functions defined on $(X,\mathfrak{M})$. Let $\{E_{1},

1357:          \ldots, E_{n}\}$ and $\{F_{1},\ldots, F_{m}\}$ be the measurable

1358:          partitions corresponding to $\varphi$ and $\phi$

1359:          respectively. Then the partition defined as $\{E_{i} \cap F_{j} |

1360:          i = 1, \ldots, n,\:\: j =1, \ldots, m\}$ is a common measurable

1361:          partition for both $\varphi$ and $\phi$.}

1362:          $\{ E_{n,1}, \ldots E_{n,m(n)}\}$ such

1363:          that $f_{n}$ and $g_{n}$ can be written as

1364:          \begin{equation}

1365:          \label{Equation:ME:InRenyisTheorem_1_a}

1366:            f_{n}(x) = \sum_{k=1}^{m(n)} (a_{n,k})

1367:            \chi_{E_{n,k}}(x) \enspace, \:\:\: a_{n,k} \in

1368:                {\mathbb{R}}^{+}, \, \forall k = 1, \ldots m(n) \enspace,

1369:          \end{equation}

1370:          \begin{equation}

1371:          \label{Equation:ME:InRenyisTheorem_1_b}

1372:            g_{n}(x) = \sum_{k=1}^{m(n)} (b_{n,k})

1373:            \chi_{E_{n,k}}(x) \enspace, \:\:\: b_{n,k} \in

1374:                {\mathbb{R}}^{+}, \, \forall k = 1, \ldots m(n) \enspace,

1375:          \end{equation}

1376:          where  $\chi_{E_{n,k}}$ is the characteristic function of

1377:          $E_{n,k}$, for $k=1,\ldots m(n)$. By

1378:          (\ref{Equation:ME:InRenyisTheorem_1_a}) and

1379:          (\ref{Equation:ME:InRenyisTheorem_1_b}) the approximating

1380:          sequences of pmfs $\{\tilde{p}_{n} = (\tilde{p}_{n,1},

1381:          \ldots, \tilde{p}_{n,m(n)})\}$

1382:           and $\{\tilde{r}_{n} = (\tilde{r}_{n,1}, \ldots,

1383:          \tilde{r}_{n,m(n)})\}$ can be written as

1384: %	corresponding

1385: %          to pdfs $p$ and $r$ respectively can be written as $\tilde{p}_{n,k}

1386: %          =  (a_{n,k}) \mu(E_{n,k}),\, k = 1, \ldots , m(n) $ and

1387: %          $ \tilde{r}_{n,k}= (b_{n,k}) \mu(E_{n,k}), \, k = 1, \ldots ,

1388: %          m(n)$

1389:           (see (\ref{Equation:ME:ActualDefinitionOfSeqenceOfPmfs}))

1390:          \begin{equation}

1391: 	 \label{Equation:ME:InRenyisTheorem_2_a}

1392:            \tilde{p}_{n,k} =  a_{n,k} \mu(E_{n,k})\:\:\: k = 1, \ldots , m(n) \enspace,

1393:          \end{equation}

1394:          \begin{equation}

1395: 	 \label{Equation:ME:InRenyisTheorem_2_b}

1396:            \tilde{r}_{n,k} =  b_{n,k} \mu(E_{n,k})\:\:\: k = 1, \ldots , m(n) \enspace.

1397:          \end{equation}

1398:       	 Now R\'{e}nyi

1399:          relative entropy for $\tilde{p}_{n}$ and

1400:          $\tilde{r}_{n}$ can be written as

1401:          \begin{equation}

1402:          \label{Equation:ME:InRenyisTheorem_2}

1403:            I_{\alpha}(\tilde{p}_{n} \| \tilde{r}_{n}) =

1404:          \frac{1}{\alpha-1} \ln \sum_{k=1}^{m(n)}

1405:            \frac{a_{n,k}^{\alpha}}{b_{n,k}^{\alpha -1}}

1406:            \mu(E_{n,k}) \enspace.

1407:          \end{equation}
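         The step from the pmfs to the coefficients can be spelled
         out as follows. This is a sketch which assumes that the
         discrete R\'{e}nyi relative-entropy has the usual form
         $\frac{1}{\alpha-1} \ln \sum_{k} \tilde{p}_{n,k}^{\alpha}
         \, \tilde{r}_{n,k}^{1-\alpha}$ and that cells with
         $\mu(E_{n,k}) = 0$ are omitted from the sum. Substituting
         (\ref{Equation:ME:InRenyisTheorem_2_a}) and
         (\ref{Equation:ME:InRenyisTheorem_2_b}) gives
         \begin{displaymath}
           \sum_{k=1}^{m(n)}
           \frac{\tilde{p}_{n,k}^{\alpha}}{\tilde{r}_{n,k}^{\alpha-1}}
           = \sum_{k=1}^{m(n)}
           \frac{a_{n,k}^{\alpha} \, \mu(E_{n,k})^{\alpha}}
                {b_{n,k}^{\alpha-1} \, \mu(E_{n,k})^{\alpha-1}}
           = \sum_{k=1}^{m(n)}
           \frac{a_{n,k}^{\alpha}}{b_{n,k}^{\alpha-1}} \,
           \mu(E_{n,k}) \enspace.
         \end{displaymath}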

1408:

1409:          To prove $\lim_{n \rightarrow \infty} I_{\alpha}(\tilde{p}_{n} \|

1410:          \tilde{r}_{n}) = I_{\alpha}(p \| r) $ it is enough to prove that

1411:          \begin{equation}

1412:          \label{Equation:ME:InRenyisTheorem_2}

1413:            \lim_{n \rightarrow \infty} \frac{1}{\alpha-1} \ln

1414:            \int_{X} \frac{ {f_{n}(x)}^{\alpha} }{

1415:              {g_{n}(x)}^{\alpha-1}} \, \ud \mu(x)

1416:             =   \frac{1}{\alpha-1} \ln

1417:            \int_{X} \frac{ {p(x)}^{\alpha} }{

1418:              {r(x)}^{\alpha-1}} \, \ud \mu(x) \enspace,

1419:           \end{equation}

1420:            since we have\footnote{ Since simple functions

1421:            ${\left(f_{n}\right)}^{\alpha}$ and ${\left(g_{n}\right)}^{\alpha-1}$ can be

1422:            written as

1423:            \begin{displaymath}

1424:              {\left(f_{n}\right)}^{\alpha}(x) = \sum_{k=1}^{m(n)}

1425:              \left( a_{n,k}^{\alpha} \right) \chi_{E_{n,k}}(x)

1426:             \enspace, \:\:\:\:\:\mbox{and}

1427:            \end{displaymath}

1428:            \begin{displaymath}

1429:              {\left(g_{n}\right)}^{\alpha-1}(x) = \sum_{k=1}^{m(n)}

1430:              \left( b_{n,k}^{\alpha-1} \right) \chi_{E_{n,k}}(x) \enspace.

1431:            \end{displaymath}

1432:            Further,

1433:            \begin{displaymath}

1434:            \frac{f_{n}^{\alpha}}{g_{n}^{\alpha-1}}(x)

1435:             =    \sum_{k=1}^{m(n)} \left( \frac{

1436:              a_{n,k}^{\alpha} }{b_{n,k}^{\alpha-1}} \right)   \chi_{E_{n,k}}(x) \enspace.

1437:            \end{displaymath}

1438:          }%Endfootnote

1439:          \begin{equation}

1440:          \label{Equation:ME:InRenyisTheorem_3}

1441:            \int_{X} \frac{{f_{n}(x)}^{\alpha}}{{g_{n}(x)}^{\alpha -1} } \,

1442:            \ud \mu(x) =

1443:            \sum_{k=1}^{m(n)}

1444:            \frac{a_{n,k}^{\alpha}}{b_{n,k}^{\alpha-1}} \mu(E_{n,k}) \enspace.

1445:          \end{equation}

1446:           Further it is enough  to prove that

1447:          \begin{equation}

1448:          \label{Equation:ME:InRenyisTheorem_3}

1449:            \lim_{n \rightarrow \infty}

1450:            \int_{X}  {h_{n}(x)}^{\alpha} g_{n}(x)   \, \ud \mu(x)

1451:             =

1452:            \int_{X} \frac{{p(x)}^{\alpha} }{

1453:              {r(x)}^{\alpha-1}} \, \ud \mu(x) \enspace,

1454:           \end{equation}

1455:          where $h_{n}$ is defined as $h_{n}(x) =

1456:          \frac{f_{n}(x)}{g_{n}(x)} $.\\

1457:         %Case 1---------

1458:         \noindent

1459:         {\em \underline {Case 1: $0 < \alpha < 1$}}

1460:

1461:         In this case

1462:         the {\em Lebesgue dominated convergence

1463:           theorem}~\cite[pp.26]{Rudin:1966:RealAndComplexAnalysis}

1464:         gives that,

1465:          \begin{equation}

1466:          \label{Equation:ME:InRenyisTheorem_4}

1467:            \lim_{n \to \infty} \int_{X}

1468:            \frac{f_{n}^{\alpha}}{g_{n}^{\alpha -1}} \, \ud \mu =

1469:            \int_{X} \frac{p^{\alpha}}{r^{\alpha -1}} \, \ud \mu \enspace,

1470:          \end{equation}

1471:          and hence (\ref{Equation:ME:InRenyisTheoremStatement_2}).

1472:

1473:          %Case 2-----------

1474:          \noindent

1475:          {\em \underline {Case 2: $\alpha  > 1$}}

1476:

1477:          We have $h_{n}^{\alpha} g_{n}

1478:          \rightarrow \frac{p^{\alpha}}{r^{\alpha-1}}$ {\em

1479:            a.e.} By {\em Fatou's

1480:            Lemma}~\cite[pp.23]{Rudin:1966:RealAndComplexAnalysis} we

1481:          obtain that,

1482:          \begin{equation}

1483:          \label{Equation:ME:InRenyisTheorem_LimInfInequality}

1484:            \liminf_{n \to \infty} \int_{X}

1485:            h_{n}(x)^{\alpha} g_{n}(x) \, \ud \mu(x) \geq

1486:            \int_{X} \frac{{p(x)}^{\alpha} }{

1487:              {r(x)}^{\alpha-1}} \, \ud \mu(x) \enspace.

1488:          \end{equation}

1489:          From the construction of $f_{n}$ and $g_{n}$

1490:          (Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf})

1491:          we have

1492:          \begin{equation}

1493:          \label{Equation:ME:InRenyisTheorem_5}

1494:          h_{n}(x) g_{n}(x) = \frac{1}{\mu(E_{n,i})} \int_{E_{n,i}}

1495:          \frac{p(x)}{r(x)} r(x) \, \ud \mu \enspace, \:\:\: \forall x

1496:          \in E_{n,i} \enspace.

1497:          \end{equation}

1498:          By Jensen's inequality we get

1499:          \begin{equation}

1500:          \label{Equation:ME:InRenyisTheorem_6}

1501:          h_{n}(x)^{\alpha} g_{n}(x) \leq \frac{1}{\mu(E_{n,i})}

1502:            \int_{E_{n,i}}  \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}   \,

1503:            \ud \mu \enspace, \:\:\: \forall x \in E_{n,i} \enspace.

1504:          \end{equation}

1505:          By (\ref{Equation:ME:InRenyisTheorem_1_a}) and

1506:          (\ref{Equation:ME:InRenyisTheorem_1_b}) we can write

1507:          (\ref{Equation:ME:InRenyisTheorem_6}) as

1508:          \begin{equation}

1509:          \label{Equation:ME:InRenyisTheorem_7}

1510:            \frac{a_{n,i}^{\alpha}}{b_{n,i}^{\alpha-1}}  \mu(E_{n,i})   \leq

1511:            \int_{E_{n,i}}   \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}

1512:            \, \ud \mu \enspace, \:\:\: \forall i = 1, \ldots m(n) \enspace.

1513:          \end{equation}
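         The Jensen step behind
         (\ref{Equation:ME:InRenyisTheorem_7}) can be sketched as
         follows, assuming that $a_{n,i}$ and $b_{n,i}$ are the
         averages of $p$ and $r$ over $E_{n,i}$ and that
         $\int_{E_{n,i}} r \, \ud \mu > 0$. Let $\nu$ be the
         probability measure on $E_{n,i}$ given by $\ud \nu = r \,
         \ud \mu / \int_{E_{n,i}} r \, \ud \mu$ (the symbol $\nu$ is
         introduced only for this remark). Since $t \mapsto
         t^{\alpha}$ is convex for $\alpha > 1$,
         \begin{displaymath}
           {\left( \frac{a_{n,i}}{b_{n,i}} \right)}^{\alpha}
           = {\left( \int_{E_{n,i}} \frac{p}{r} \, \ud \nu
             \right)}^{\alpha}
           \leq \int_{E_{n,i}} {\left( \frac{p}{r} \right)}^{\alpha}
             \ud \nu
           = \frac{1}{b_{n,i} \, \mu(E_{n,i})} \int_{E_{n,i}}
             \frac{p^{\alpha}}{r^{\alpha-1}} \, \ud \mu \enspace,
         \end{displaymath}
         and multiplying both sides by $b_{n,i} \, \mu(E_{n,i})$
         gives (\ref{Equation:ME:InRenyisTheorem_7}).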

1514:          Summing both sides of

1515:          (\ref{Equation:ME:InRenyisTheorem_7}) over $i$ we get

1516:          \begin{equation}

1517:          \label{Equation:ME:InRenyisTheorem_8}

1518:           \sum_{i=1}^{m(n)}  \frac{a_{n,i}^{\alpha}}{b_{n,i}^{\alpha-1}}  \mu(E_{n,i})   \leq

1519:           \sum_{i=1}^{m(n)} \int_{E_{n,i}}

1520:           \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}  \, \ud \mu \enspace.

1522:          \end{equation}

1523:          The above inequality (\ref{Equation:ME:InRenyisTheorem_8}) is nothing but

1524:          \begin{displaymath}

1525:           \int_{X} h_{n}^{\alpha}(x) g_{n}(x) \, \ud \mu(x)   \leq

1526:           \int_{X}  \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}

1527:             \, \ud \mu \enspace, \:\:\: \forall n \enspace,

1528:          \end{displaymath}

1529:          and hence

1530:          \begin{displaymath}

1531:          \sup_{i > n } \int_{X} h_{i}^{\alpha}(x) g_{i}(x) \, \ud \mu(x)

1532:          \leq \int_{X}  \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}

1533:            \, \ud \mu \enspace, \:\:\: \forall n \enspace.

1534:          \end{displaymath}

1535:          Finally we have

1536:          \begin{equation}

1537:          \label{Equation:ME:InRenyisTheorem_LimSupInequality}

1538:            \limsup_{n \to \infty} \int_{X}

1539:            h_{n}^{\alpha}(x) g_{n}(x) \, \ud \mu(x)   \leq \int_{X}

1540:             \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}} \, \ud \mu \enspace.

1541:          \end{equation}

1542:          From (\ref{Equation:ME:InRenyisTheorem_LimInfInequality}) and

1543:          (\ref{Equation:ME:InRenyisTheorem_LimSupInequality}) we have

1544:          \begin{equation}

1545:          \label{Equation:ME:InRenyisTheorem_LimEquality}

1546:            \lim_{n \to \infty} \int_{X}

1547:              \frac{f_{n}(x)^{\alpha}}{g_{n}(x)^{\alpha-1}}  \, \ud \mu(x) = \int_{X}

1548:             \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}} \, \ud \mu \enspace,

1549:          \end{equation}

1550:          and hence (\ref{Equation:ME:InRenyisTheoremStatement_2}).

1551:          \endproof
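         As a sanity check of
         Theorem~\ref{Theorem:ME:MeasureTheoreticDefinitionsOfGeneralizedRelative-Entropies}
         (an illustrative example of ours, assuming the discrete
         R\'{e}nyi relative-entropy has the usual form
         $\frac{1}{\alpha-1} \ln \sum_{k} \tilde{p}_{n,k}^{\alpha} \,
         \tilde{r}_{n,k}^{1-\alpha}$), let $X = [0,1]$ with Lebesgue
         measure $\mu$, $p(x) = 2x$, $r(x) = 1$ and $\alpha = 2$. For
         $n \geq 2$ the common partition consists, up to a
         $\mu$-null set, of the $N = 2^{n+1}$ intervals
         $[\frac{k}{N}, \frac{k+1}{N})$, $k = 0, \ldots, N-1$, so that
         $\tilde{p}_{n,k} = \frac{2k+1}{N^{2}}$ and $\tilde{r}_{n,k} =
         \frac{1}{N}$. Then
         \begin{displaymath}
           \sum_{k=0}^{N-1} \frac{\tilde{p}_{n,k}^{2}}{\tilde{r}_{n,k}}
           = \frac{1}{N^{3}} \sum_{k=0}^{N-1} {(2k+1)}^{2}
           = \frac{4}{3} - \frac{1}{3N^{2}}
           \: \longrightarrow \: \frac{4}{3}
           = \int_{0}^{1} \frac{{(2x)}^{2}}{1} \, \ud x \enspace,
         \end{displaymath}
         and hence $I_{2}(\tilde{p}_{n} \| \tilde{r}_{n}) = \ln \left(
         \frac{4}{3} - \frac{1}{3N^{2}} \right)$ increases to $\ln
         \frac{4}{3} = I_{2}(p \| r)$. The discrete values approach
         the limit from below, in agreement with the Jensen bound
         used in Case 2.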

1552:           %EndProof--------------

1553:

1554:   %--------------------Sub Section-----------------------------

1555:   \subsection{On ME of Measure-Theoretic Definition of Tsallis Entropy}

1556:          \noindent

1557:          Despite the shortcoming of Shannon entropy that it cannot be

1558:          naturally extended to the non-discrete case, we have observed

1559:          that Shannon entropy in its general form on a measure space can

1560:          be used consistently for the ME-prescriptions. One can easily

1561:          see that the generalized information measures of R\'{e}nyi and Tsallis

1562:          too cannot be extended naturally to the measure-theoretic case,

1563:          i.e., the measure-theoretic definitions are not equivalent to the

1564:          discrete case in the sense that they cannot be defined as a

1565:          limit of a sequence of finite discrete entropies corresponding to

1566:          pmfs defined on measurable partitions which approximate the

1567:          pdf. One can use the same counterexample we discussed in

1568:          \S~\ref{SubSection:ME:DiscreteToContinuous}. We have already

1569:          given the ME-prescriptions of Tsallis entropy in the

1570:          measure-theoretic case. In this section, we show that the

1571:          ME-prescriptions in the measure-theoretic case are consistent

1572:          with the discrete case.

1573:

1574:  	 Proceeding as in the case of measure-theoretic entropy in

1575:          \S~\ref{SubSection:ME:MeasureTheoreticCasesinDiscrete},

1576:          measure-theoretic Tsallis

1577:          entropy $S_{q}(P)$~(\ref{Equation:ME:TsallisEntropyOf-PM}) in

1578:          the discrete case can be written as

1579:          \begin{equation}

1580: 	 \label{Equation:ME:MeasureTheoreticTsallisEntropyInDiscreteForm}

1581:          S_{q}(P) = \sum_{k=1}^{n} P_{k} \ln_{q} \frac{\mu_{k}}{P_{k}} \enspace.

1582:          \end{equation}

1583:          By (\ref{Equation:KN:PropertyOflnq(x/y)}) we get

1584:          \begin{equation}

1585: 	 \label{Equation:ME:MeasureTheoreticTsallisEntropyInDiscreteForm_1}

1586:          S_{q}(P) = \sum_{k=1}^{n} P_{k}^{q} \left[ \ln_{q} \mu_{k} -

1587:          \ln_{q} P_{k} \right] = S_{q}^{n}(P) + \sum_{k=1}^{n} P_{k}^{q}

1588:          \ln_{q} \mu_{k} \enspace,

1589:          \end{equation}

1590:          where $S_{q}^{n}(P)$ is the Tsallis entropy in discrete case.

1591:          When $\mu$ is a uniform distribution, i.e., $\mu_{k} =

1592:          \frac{1}{n}\:\: \forall k = 1, \ldots, n$, we get

1593:          \begin{equation}

1594: 	 \label{Equation:ME:MeasureTheoreticTsallisEntropyInDiscreteForm_1}

1595:          S_{q}(P) = S_{q}^{n}(P) - n^{q-1} \ln_{q} n \sum_{k=1}^{n}

1596:          P_{k}^{q} \enspace.

1597:          \end{equation}
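         The coefficient $- n^{q-1} \ln_{q} n$ in the uniform case
         can be checked directly. Assuming the standard definition
         $\ln_{q} x = \frac{x^{1-q} - 1}{1-q}$ for $q \neq 1$, we have
         \begin{displaymath}
          \ln_{q} \frac{1}{n} = \frac{n^{q-1} - 1}{1-q}
          = - n^{q-1} \, \frac{n^{1-q} - 1}{1-q}
          = - n^{q-1} \ln_{q} n \enspace.
         \end{displaymath}
         For instance, with $q = 2$ and $n = 2$ one has $\ln_{2} x =
         1 - \frac{1}{x}$, so that $\ln_{2} \frac{1}{2} = -1$ and $-
         2 \ln_{2} 2 = - 2 \cdot \frac{1}{2} = -1$.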

1598:          Now we show that the quantity $\sum_{k=1}^{n} P_{k}^{q}$ is

1599:          constant in maximization of $S_{q}(P)$ with respect to the

1600:          set of constraints

1601:          (\ref{Equation:ME:Normalized-q-ExpectationConstraints}).

1602:

1603:          The claim is that

1604:          \begin{equation}

1605:          \label{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation}

1606:          \int p(x)^{q}\, \ud \mu(x) = {(\overline{Z_{q}})}^{1-q} \enspace,

1607:          \end{equation}

1608: 	 which holds for Tsallis maximum entropy distribution

1609:          (\ref{Equation:ME:TsallisMaximumEntropyDistribution_wrt_q-Expt})

1610:          in general. This can be shown as follows. From

1611:          the maximum entropy

1612:          distribution~(\ref{Equation:ME:TsallisMaximumEntropyDistribution_wrt_q-Expt}),

1613:          we have

1614:          \begin{displaymath}

1615:          p(x)^{1-q} = \frac{\displaystyle 1 - (1-q)  {\left( \int_{X}

1616:             {p(x)}^{q}\, \ud \mu(x) \right)}^{-1}  \sum_{m=1}^{M}

1617:          \beta_{m} \left( u_{m}(x) -

1618:          {\langle\langle {u}_{m} \rangle\rangle}_{q} \right)}

1619:          {\displaystyle ({\overline{Z_{q}}})^{1-q}  } \enspace,

1620:          \end{displaymath}

1621:          which can be rearranged as

1622:          \begin{displaymath}

1623:          ({\overline{Z_{q}}})^{1-q} p(x) = \left[ 1 - (1-q)

1624:          \frac{\sum_{m=1}^{M} \beta_{m} \left( u_{m}(x) -

1625:          {\langle\langle {u}_{m} \rangle\rangle}_{q} \right)}{\int_{X}

1626:          {p(x)}^{q} \, \ud \mu(x)} \right] p(x)^{q} \enspace.

1627:          \end{displaymath}

1628:          By integrating both sides in the above equation, and by

1629:          using~(\ref{Equation:ME:Normalized-q-ExpectationConstraints})

1630:          we get (\ref{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation}).
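         In more detail, and assuming that the constraints
         (\ref{Equation:ME:Normalized-q-ExpectationConstraints}) have
         the normalized $q$-expectation form ${\langle\langle u_{m}
         \rangle\rangle}_{q} = \int_{X} u_{m} p^{q} \, \ud \mu \, /
         \int_{X} p^{q} \, \ud \mu$, integration of the rearranged
         identity gives
         \begin{displaymath}
          ({\overline{Z_{q}}})^{1-q} \int_{X} p \, \ud \mu
          = \int_{X} p^{q} \, \ud \mu - (1-q) \sum_{m=1}^{M}
          \beta_{m} \, \frac{\displaystyle \int_{X} u_{m} p^{q} \,
          \ud \mu - {\langle\langle u_{m} \rangle\rangle}_{q}
          \int_{X} p^{q} \, \ud \mu}{\displaystyle \int_{X} p^{q} \,
          \ud \mu} \enspace.
         \end{displaymath}
         Each numerator in the sum vanishes by the constraints, and
         $\int_{X} p \, \ud \mu = 1$, which leaves
         $({\overline{Z_{q}}})^{1-q} = \int_{X} p^{q} \, \ud \mu$.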

1631:

1632:          Now, (\ref{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation}) can

1633:          be written in its discrete form as

1634:          \begin{equation}

1635:          \label{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation_Discrete_1}

1636:           \sum_{k=1}^{n} \frac{P_{k}^{q}}{\mu_{k}^{q-1}} =

1637:          {(\overline{Z_{q}})}^{1-q} \enspace.

1638:          \end{equation}

1639:          When $\mu$ is a uniform distribution we get

1640:          \begin{equation}

1641:          \label{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation_Discrete_2}

1642:           \sum_{k=1}^{n} P_{k}^{q} = n^{1-q}  {(\overline{Z_{q}})}^{1-q}

1643:          \end{equation}

1644:          which is a constant.

1645:

1646:          Hence by

1647:          (\ref{Equation:ME:MeasureTheoreticTsallisEntropyInDiscreteForm_1})

1648:          and

1649:          (\ref{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation_Discrete_2}),

1650:          one can conclude that, with respect to a particular instance of

1651:          ME, the measure-theoretic Tsallis entropy $S_{q}(P)$ defined for a

1652:          probability measure $P$ on

1653:         the measure space $(X,\mathfrak{M},\mu)$ is equal to the

1654:         discrete Tsallis entropy up to an

1655:         additive constant, when the reference measure $\mu$ is chosen as a uniform

1656:         probability distribution. Thereby, one can further conclude

1657:          that, with respect to a particular instance of ME,

1658:          measure-theoretic Tsallis entropy is consistent with its

1659:          discrete definition.

1660:
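         The additive constant can be made explicit. As a sketch,
         treating $\overline{Z_{q}}$ as fixed for the given instance
         of ME (as the argument above does), substituting
         $\sum_{k=1}^{n} P_{k}^{q} = n^{1-q}
         {(\overline{Z_{q}})}^{1-q}$ into $S_{q}(P) = S_{q}^{n}(P) -
         n^{q-1} \ln_{q} n \sum_{k=1}^{n} P_{k}^{q}$ gives
         \begin{displaymath}
          S_{q}(P) = S_{q}^{n}(P) - n^{q-1} \, n^{1-q} \,
          {(\overline{Z_{q}})}^{1-q} \ln_{q} n
          = S_{q}^{n}(P) - {(\overline{Z_{q}})}^{1-q} \ln_{q} n
          \enspace.
         \end{displaymath}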

1661: %=======================Section: Conclusition===================

1662: \section{Conclusions}

1663: \label{Section:Conclusions}

1664: 	\noindent

1665: 	In this paper we presented measure-theoretic definitions of

1666: 	generalized information measures. We proved that the measure-theoretic

1667:         definitions of generalized relative-entropies, R\'{e}nyi and

1668:         Tsallis, are natural extensions of their respective discrete

1669:         cases. We also showed that ME prescriptions of

1670:         measure-theoretic Tsallis entropy are consistent with the

1671:         discrete case.

1672:

1673:

1674: %========================Bibliography===================================

1675: \section*{References}

1676:

1677: \bibliographystyle{unsrt}

1678: \bibliography{papi}

1679:

1680:

1681: \end{document}

1682:

1683:

1684:

1685:

1686:

1687: