1:
2:
3: %----------------------------------------------------------------
4: %%%%%%%%%%%%%%%%%%%%5Check-
5:
6: % check whether to use pseudo-additivity or nonextensive additivity
7:
8:
9:
10: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
11: % INSTITUTE OF PHYSICS PUBLISHING %
12: % %
13: % `Preparing an article for publication in an Institute of Physics %
14: % Publishing journal using LaTeX' %
15: % %
16: % LaTeX source code `ioplau2e.tex' used to generate `author %
17: % guidelines', the documentation explaining and demonstrating use %
18: % of the Institute of Physics Publishing LaTeX preprint files %
19: % `iopart.cls, iopart12.clo and iopart10.clo'. %
20: % %
21: % `ioplau2e.tex' itself uses LaTeX with `iopart.cls' %
22: % %
23: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
24: %
25: %
26: % First we have a character check
27: %
28: % ! exclamation mark " double quote
29: % # hash ` opening quote (grave)
30: % & ampersand ' closing quote (acute)
31: % $ dollar % percent
32: % ( open parenthesis ) close paren.
33: % - hyphen = equals sign
34: % | vertical bar ~ tilde
35: % @ at sign _ underscore
36: % { open curly brace } close curly
37: % [ open square ] close square bracket
38: % + plus sign ; semi-colon
39: % * asterisk : colon
40: % < open angle bracket > close angle
41: % , comma . full stop
42: % ? question mark / forward slash
43: % \ backslash ^ circumflex
44: %
45: % ABCDEFGHIJKLMNOPQRSTUVWXYZ
46: % abcdefghijklmnopqrstuvwxyz
47: % 1234567890
48: %
49: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
50: %
51: \documentclass[12pt]{iopart}
52: \newcommand{\gguide}{{\it Preparing graphics for IOP journals}}
53:
54: %==============================
55: %Mine
56:
57: \usepackage{amssymb}
58: \usepackage{amsthm}
59:
60: %------------------theorm env---------------
61: \newtheorem{theorem}{Theorem}[section]
62: \newtheorem{lemma}[theorem]{Lemma}
63: \newtheorem{proposition}[theorem]{Proposition}
64: \newtheorem{corollary}[theorem]{Corollary}
65: \newtheorem{definition}[theorem]{Definition}
66: \newtheorem{remark}[theorem]{Remark}
67: % \def\QED{\mbox{\rule[0pt]{1.5ex}{1.5ex}}}
68: % \def\proof{\noindent\hspace{2em}{\it Proof: }}
69: % \def\endproof{\hspace*{\fill}~\QED\par\endtrivlist\unskip}
70: %--------------------------------------------
71: \newcommand{\ud}{\mathrm{d}}
72: %=====================================
73:
74: %Uncomment next line if AMS fonts required
75: %\usepackage{iopams}
76: \begin{document}
77:
78: \title[]{On Measure Theoretic definitions of Generalized Information
79: Measures and Maximum Entropy Prescriptions}
80:
81: \author{Ambedkar Dukkipati, M Narasimha Murty\footnote{Corresponding author} and
82: Shalabh Bhatnagar}
83:
84: \address{Department of Computer Science and Automation,
85: Indian Institute of Science, Bangalore-560012, India.}
86: \ead{\mailto{ambedkar@csa.iisc.ernet.in},
87: \mailto{mnm@csa.iisc.ernet.in}, \mailto{shalabh@csa.iisc.ernet.in}}
88:
89:
90: %----------------------------------------
91: \begin{abstract}
92: Though Shannon entropy of a probability measure $P$, defined
93: as $- \int_{X} \frac{\ud P}{\ud \mu} \ln \frac{\ud P}{\ud
94: \mu} \, \ud \mu$ on a measure space $(X, \mathfrak{M},\mu)$, does not
95: qualify itself as an information measure (it is not a natural
96: extension of the discrete case), maximum entropy (ME)
97: prescriptions in the measure-theoretic case are consistent with those of
98: the discrete case.
99: In this paper, we study the
100: measure-theoretic definitions of generalized information
101: measures and discuss the ME prescriptions. We present two
102: results in this regard: (i) we prove that, as in
103: the case of classical relative-entropy, the measure-theoretic
104: definitions of generalized relative-entropies, R\'{e}nyi and
105: Tsallis, are natural extensions of their respective discrete
106: cases, and (ii) we show that ME prescriptions of
107: measure-theoretic Tsallis entropy are consistent with the
108: discrete case.
109: \end{abstract}
110:
111: %Uncomment for PACS numbers title message
112: \pacs{}
113: % Keywords required only for MST, PB, PMB, PM, JOA, JOB?
114: %\vspace{2pc}
115: %\noindent{\it Keywords}: Article preparation, IOP journals
116: % Uncomment for Submitted to journal title message
117: %\submitto{\JPA}
118: % Comment out if separate title page not required
119: \maketitle
120:
121: %=========================Introduction===========================
122: \section{Introduction}
123: \label{Section:Introduction}
124: Shannon's measure of information was developed
125: essentially for the case when the random variable takes a
126: finite number of values. However, in the literature, one often
127: encounters an extension of Shannon entropy in the discrete
128: case to the case
129: of a one-dimensional random variable with density function $p$
130: in the form~(e.g.,~\cite{ShannonWeawer:1949:TheMathematicalTheoryOfCommunication,Ash:1965:InformationTheory})
131: \begin{displaymath}
132: S(p) = - \int_{- \infty}^{+ \infty} p(x) \ln p(x)\, \ud x \enspace.
133: \end{displaymath}
134: This entropy in the continuous case,
135: as a pure-mathematical formula (assuming convergence of
136: the integral and absolute continuity of the density $p$ with
137: respect to Lebesgue measure), resembles Shannon entropy in the
138: discrete case, but cannot be used as a measure of
139: information. First, it is not a natural extension of Shannon
140: entropy in the discrete case, since it is not the limit of the sequence of
141: finite discrete entropies corresponding to the pmfs that
142: approximate the pdf $p$. Second, it is not necessarily non-negative.
143:
144: In spite of these shortcomings, one can still use the
145: continuous entropy functional in conjunction with the principle of maximum
146: entropy where one wants to find a probability density function
147: that has greater uncertainty than any other distribution
148: satisfying a set of given constraints. Thus, in this use of
149: continuous measure one is interested in it as a measure of
150: relative uncertainty, and not of absolute uncertainty. This
151: is where one can relate maximization of Shannon entropy to the
152: minimization of Kullback-Leibler relative-entropy
153: (see~\cite[p.~55]{KapurKesavan:1997:EntropyOptimizationPrinciples}).
154: % It
155: % is well known that the continuous version of
156: % KL-entropy defined for two probability density functions $p$
157: % and $r$ as,
158: % \begin{displaymath}
159: % I(p\|r) = \int_{- \infty}^{+ \infty} p(x) \ln
160: % \frac{p(x)}{r(x)} \, \ud x \enspace,
161: % \end{displaymath}
162: % is indeed a natural generalization of same in the discrete
163: % case.
164:
165: Indeed, during the early stages of development of
166: information theory, the important paper
167: by Gelfand, Kolmogorov and Yaglom~\cite{GelfandKolmogorovYaglom:1956:OnTheGeneralDefinitionOfTheAmountOfInformation}
168: called attention to the case of defining the entropy functional on
169: an arbitrary measure space $(X, \mathfrak{M},\mu)$.
170: In this respect, Shannon entropy of a probability density function $p:X
171: \rightarrow {\mathbb{R}}^{+}$ can be written as,
172: \begin{displaymath}
173: S(p) = - \int_{X} p(x) \ln p(x) \, \ud \mu \enspace.
174: \end{displaymath}
175: One can see from the above definition that the concept of
176: ``the entropy of a pdf'' is a misnomer: there
177: is always another measure $\mu$ in the background. In the
178: discrete case considered by Shannon, $\mu$ is the cardinality
179: measure\footnote{Counting or cardinality measure $\mu$ on a
180: measurable space $(X,\mathfrak{M})$, when $X$ is a
181: finite set and $\mathfrak{M} = 2^{X}$, is defined as $\mu(E)
182: = \# E$, $\forall E \in \mathfrak{M}$.}~\cite[p.~19]{ShannonWeawer:1949:TheMathematicalTheoryOfCommunication};
183: in the continuous case considered by both Shannon and Wiener,
184: $\mu$ is the Lebesgue
185: measure; cf.~\cite[p.~54]{ShannonWeawer:1949:TheMathematicalTheoryOfCommunication}
186: and
187: \cite[pp.~61, 62]{Wiener:1948:Cybernetics}.
188: All entropies are
189: defined with respect to some measure
190: $\mu$,
191: as Shannon and Wiener both emphasized in~\cite[pp.~57,
192: 58]{ShannonWeawer:1949:TheMathematicalTheoryOfCommunication}
193: and~\cite[pp.~61, 62]{Wiener:1948:Cybernetics}, respectively.
194:
195: This case was studied independently
196: by Kallianpur~\cite{Kallianpur:1960:OnTheAmountOfInformationContainedInASingmaField}
197: and Pinsker~\cite{Pinsker:1960:InformationAndInformationStability},
198: and perhaps by others who were guided by the earlier work
199: of Kullback~\cite{KullbackLeibler:1951:OnInformationAndSufficiency},
200: where one would define entropy in terms of Kullback-Leibler
201: relative entropy. Unlike Shannon entropy, the measure-theoretic
202: definition of KL-entropy is a natural extension of its definition
203: in the discrete case.
204:
205: %In this respect,
206: %the Gelfand-Yaglom-Perez theorem
207: %(GYP-theorem)~\cite{GelfandYaglom:1959:CalculationOfTheAmountOfInformation_Etc,Perez:1959:InformationTheoryWithAbstractAlphabets,Dobrushin:1959:GeneralFormulationsOfShannonsbasicTheorems}
208: %plays an important role, which equips measure-theoretic
209: %KL-entropy with a fundamental definition. The main
210: %contribution of this chapter is to prove GYP-theorem for
211: %R\'{e}nyi relative-entropy of order $\alpha >1$, which can be
212: %extended to Tsallis relative-entropy.
213:
214: %Before proving GYP-theorem for R\'{e}nyi relative-entropy,
215: In this paper we present the measure-theoretic definitions of
216: generalized information measures and show that, as in
217: the case of KL-entropy, the measure-theoretic
218: definitions of generalized relative-entropies, R\'{e}nyi and
219: Tsallis, are natural extensions of their respective discrete
220: cases. We discuss the ME prescriptions for generalized
221: entropies and show that ME prescriptions of
222: measure-theoretic Tsallis entropy are consistent with the
223: discrete case, as is also true for measure-theoretic
224: Shannon entropy.
225:
226: Rigorous studies of the Shannon and KL entropy functionals in
227: measure spaces can be found in the papers by
228: Ochs~\cite{Ochs:1976:BasicPropertiesOfTheGeneralizedBoltzmann-Gibbs-ShannonEntropy}
229: and by
230: Masani~\cite{Masani:1992:TheMeasureTheoreticAspectsOfEntropy_Part_1,Masani:1992:TheMeasureTheoreticAspectsOfEntropy_Part_2}.
231: Basic measure-theoretic aspects of classical information measures can be
232: found
233: in~\cite{Pinsker:1960:InformationAndInformationStability,Guiasu:1977:InformationTheoryWithApplications,Gray:1990:EntropyAndInformationTheory}.
234: % in~\cite[Chapter~2]{Guiasu:1977:InformationTheoryWithApplications}
235: % and~\cite[Chapter~5]{Gray:1990:EntropyAndInformationTheory}.
236:
237: We review the measure-theoretic formalisms for classical
238: information measures in
239: \S~\ref{Section:ME:MeasureTheoreticDefinitionsOfInformationMeasures}
240: and extend these definitions to generalized
241: information measures in
242: \S~\ref{Section:ME:MeasureTheoreticDefinitionsOfGeneralizedInformationMeasures}. In
243: \S~\ref{Section:ME:MaximumEntropyAndCanonicalDistributions} we
244: present the ME prescription for Shannon entropy followed by
245: prescriptions for
246: Tsallis entropy in
247: \S~\ref{Section:ME:ME-prescriptionForTsallisEntropy}. We
248: revisit measure-theoretic definitions of generalized entropic
249: functionals in
250: \S~\ref{Section:ME:MeasureTheoreticDefinitions_Revisited} and
251: present some results.
252:
253: %================================Section:==========================================
254: \section{Measure-Theoretic definitions of Classical Information Measures}
255: \label{Section:ME:MeasureTheoreticDefinitionsOfInformationMeasures}
256: % Information measures like entropy, mutual information,
257: % conditional entropy, and conditional mutual information
258: % etc., can be expressed in terms of KL-entropy and hence
259: % the measure-theoretic analogs of these measures will follow
260: % from the measure-theoretic definition of KL-entropy.
261: % In this section, we study the measure-theoretic
262: % definitions of KL-entropy and its relation to entropy in this
263: % case.
264: %-----------------------------SubSection-------------------------------------
265: \subsection{Discrete to Continuous}
266: \label{SubSection:ME:DiscreteToContinuous}
267: \noindent
268: Let $p:[a,b] \rightarrow {\mathbb{R}}^{+}$ be a probability
269: density function, where $[a,b] \subset \mathbb{R}$. That is,
270: $p$ satisfies
271: \begin{displaymath}
272: p(x) \geq 0, \:\:\: \forall x \in [a,b] \:\:\: \mathrm{and}\:\:\:
273: \int_{a}^{b} p(x) \, \ud x =1 \enspace.
274: \end{displaymath}
275: In trying to define entropy in the continuous case, the
276: expression of Shannon entropy was automatically extended by
277: replacing the sum in the
278: discrete Shannon entropy by the
279: corresponding integral. We obtain, in this way, Boltzmann's
280: H-function (also known
281: as differential entropy in information theory),
282: % ~\cite{Grad:1965:OnBoltzmannsH-Theorem}: reference for
283: % Boltzmann-H function
284: \begin{equation}
285: \label{Equation:ME:ContinuousEntropy}
286: S(p) = - \int_{a}^{b} p(x) \ln p(x) \, \ud x \enspace.
287: \end{equation}
288: But the ``continuous entropy'' given
289: by~(\ref{Equation:ME:ContinuousEntropy}) is not a natural
290: extension of the definition in the discrete case, in the sense that it
291: is not the limit of
292: the finite discrete entropies corresponding to a sequence of
293: finer partitions of the interval $[a,b]$ whose norms tend to
294: zero. We can show this by a counterexample.
295: Consider a uniform probability distribution
296: on the interval $[a,b]$, having the probability density
297: function
298: \begin{displaymath}
299: p(x) = \frac{1}{b-a}\enspace, \:\:\:\:\: x \in [a,b] \enspace.
300: \end{displaymath}
301: The continuous
302: entropy~(\ref{Equation:ME:ContinuousEntropy}) in this case is
303: \begin{displaymath}
304: S(p) = \ln (b - a) \enspace.
305: \end{displaymath}
306: On the other hand, let us consider a finite partition of the interval
307: $[a,b]$ which is composed of $n$ equal subintervals, and let
308: us attach to this partition the finite discrete uniform
309: probability distribution whose corresponding entropy will be,
310: of course,
311: \begin{displaymath}
312: S_{n}(p) = \ln n \enspace.
313: \end{displaymath}
314: Obviously, if $n$ tends to infinity, the discrete entropy
315: $S_{n}(p)$ will tend to infinity too, and not to $\ln (b-a)$;
316: therefore $S(p)$ is not the limit of $S_{n}(p)$, when $n$ tends
317: to infinity. Further, one can observe that $\ln (b-a)$ is negative
318: when~$b-a <1$.
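
As a sketch of why the limit fails beyond the uniform case, assume for
illustration only that $p$ is continuous on $[a,b]$, write
$\Delta = (b-a)/n$ for the width of each subinterval, and let $S_{n}(p)$
denote the entropy of the pmf $P_{i}$ obtained by integrating $p$ over the
$i$th subinterval (the symbols $\Delta$, $\xi_{i}$ and $P_{i}$ are introduced
here and are not part of the original argument). By the mean value theorem
there are points $\xi_{i}$ in the $i$th subinterval with
$P_{i} = p(\xi_{i}) \Delta$, so that
\begin{displaymath}
  S_{n}(p) = - \sum_{i=1}^{n} p(\xi_{i}) \Delta \ln \left( p(\xi_{i}) \Delta \right)
           = - \sum_{i=1}^{n} p(\xi_{i}) \ln p(\xi_{i}) \, \Delta - \ln \Delta \enspace.
\end{displaymath}
The first term is a Riemann sum that tends to $S(p)$, whereas
$- \ln \Delta \rightarrow + \infty$ as $n \rightarrow \infty$. For the uniform
density above this identity reads $\ln n = \ln (b-a) - \ln \Delta$.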
319:
320: Thus, strictly speaking, the
321: continuous entropy~(\ref{Equation:ME:ContinuousEntropy}) cannot
322: represent a measure of uncertainty, since uncertainty should
323: in general be non-negative.
324: We are able to prove the ``nice'' properties only for the
325: discrete entropy; therefore, it
326: qualifies as a ``good'' measure of information (or
327: uncertainty) supplied by a random experiment. Since the ``continuous
328: entropy'' is not the limit of the discrete
329: entropies, we cannot extend the so-called nice properties to
330: it.
331:
332: Also, in physical applications, the coordinate $x$ in
333: (\ref{Equation:ME:ContinuousEntropy}) represents an abscissa,
334: a distance from a fixed reference point. This distance $x$ has
335: the dimensions of length. With the density function
336: $p(x)$, one specifies the probability of an event $[c,d)
337: \subset [a,b]$ as $\int_{c}^{d} p(x) \, \ud x$; since
338: probabilities are dimensionless, one has to assign to $p(x)$
339: the dimensions ${(\mbox{length})}^{-1}$. Now, because for $0 \leq z < 1$ one
340: has the series expansion
341: \begin{equation}
342: - \ln (1-z) = z + \frac{1}{2}z^{2} + \frac{1}{3}z^{3}+
343: \ldots \enspace,
344: \end{equation}
345: it is necessary that the argument of the logarithm function
346: in~(\ref{Equation:ME:ContinuousEntropy}) be
347: dimensionless.
348: Hence the formula (\ref{Equation:ME:ContinuousEntropy}) is
349: seen to be dimensionally incorrect, since the argument of
350: the logarithm on its right-hand side has the dimensions of a
351: probability
352: density~\cite{Smith:2001:SomeObservationsOnTheConceptsOfInformationTheoreticEntropy}.
353: Although
354: Shannon~\cite{Shannon:1948:MathematicalTheoryOfCommunication_BellLabs}
355: used the formula (\ref{Equation:ME:ContinuousEntropy}), he
356: does note its lack of invariance with respect to changes in
357: the coordinate system.
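
To illustrate this lack of invariance with a minimal example (the rescaling
$y = cx$ with a constant $c > 0$ and the symbol $q$ for the transformed
density are assumptions introduced here, not taken from the cited works):
the density of $y$ is $q(y) = p(y/c)/c$ on $[ca, cb]$, and
\begin{displaymath}
  S(q) = - \int_{ca}^{cb} q(y) \ln q(y) \, \ud y
       = - \int_{a}^{b} p(x) \left( \ln p(x) - \ln c \right) \ud x
       = S(p) + \ln c \enspace.
\end{displaymath}
Thus merely changing the unit of length shifts the value
of~(\ref{Equation:ME:ContinuousEntropy}) by $\ln c$.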
358:
359: In the context of the maximum entropy principle,
360: Jaynes~\cite{Jaynes:1968:PriorProbabilities}
361: addressed this problem and suggested the formula,
362: \begin{equation}
363: \label{Equation:ME:JaynesSuggestion}
364: S'(p) = - \int_{a}^{b} p(x) \ln \frac{p(x)}{m(x)}\, \ud x \enspace,
365: \end{equation}
366: in the place of (\ref{Equation:ME:ContinuousEntropy}),
367: where $m(x)$ is a prior function. Note that when $m(x)$ is a probability density
368: function, (\ref{Equation:ME:JaynesSuggestion}) is nothing but
369: the negative of the relative-entropy. However, if we choose $m(x) = c$, a constant
370: (e.g.,~\cite{ZellnerHighfield:1988:CalculationOfMaximumEntropyDistributions}),
371: we get
372: \begin{displaymath}
373: S'(p) = S(p) + \ln c \enspace,
374: \end{displaymath}
375: where $S(p)$ refers to the continuous
376: entropy (\ref{Equation:ME:ContinuousEntropy}).
377: Thus, maximization of $S'(p)$ is equivalent to maximization of
378: $S(p)$.
379: Further discussion on estimation of probability
380: density functions by ME-principle in the continuous case can be found in
381: \cite{LazoRathie:1978:OnTheEntropyOfContinuousProbabilityDistributions,ZellnerHighfield:1988:CalculationOfMaximumEntropyDistributions,Ryu:1993:MaximumEntropyEstimationOfDensityAndRegressionFunction}.
382:
383: Prior to that, Kullback~\cite{KullbackLeibler:1951:OnInformationAndSufficiency} too
384: suggested that in the measure-theoretic definition of entropy,
385: instead of examining the entropy
386: corresponding to only one given measure, we have to compare the
387: entropy inside a whole class of measures.
388:
389: %-----------------------SubSection------------------------------------
390: \subsection{Classical information measures}
391: \label{SubSection:ME:ClassicalInformationMeasures}
392:
393: \noindent
394: Let $(X,\mathfrak{M},\mu)$ be a measure space. $\mu$
395: need not be a probability measure unless otherwise specified.
396: Symbols $P$, $R$ will denote probability measures on the
397: measurable space $(X,\mathfrak{M})$ and $p$, $r$
398: denote $\mathfrak{M}$-measurable functions on $X$.
399: An $\mathfrak{M}$-measurable function $p:X \rightarrow
400: {\mathbb{R}}^{+}$ is said to be a probability
401: density function (pdf) if $\int_{X} p \, \ud \mu = 1$.
402:
403: In this general setting, Shannon entropy $S(p)$ of pdf $p$ is
404: defined as follows~\cite{Athreya:1994:EntropyMaximization}.
405: %DEFINITION: Shannon entropy for pdf
406: \begin{definition}
407: \label{Definition:ME:ShannonEntropy_Measuretheroetic_pdf}
408: Let $(X,\mathfrak{M},\mu)$ be a measure space and let the
409: $\mathfrak{M}$-measurable function $p:X \rightarrow
410: {\mathbb{R}}^{+}$ be a pdf. The Shannon entropy of $p$
411: is defined as
412: \begin{equation}
413: \label{Equation:ME:ShannonEntropyOf-pdf}
414: S(p) = - \int_{X} p \ln p \, \ud \mu \enspace,
415: \end{equation}
416: provided the integral on the right exists.
417: \end{definition}%EndDefinition
418: Entropy functional $S(p)$ defined in (\ref{Equation:ME:ShannonEntropyOf-pdf}) can be
419: referred to as entropy of the probability measure
420: $P$, in the sense that the measure $P$ is induced by $p$,
421: i.e.,
422: \begin{equation}
423: \label{Equation:ME:ProbabilityMeasureInducedByaPdf}
424: P(E) = \int_{E} p(x) \, \ud \mu(x) \enspace, \:\:\:\:\:
425: \forall E \in \mathfrak{M} \enspace.
426: \end{equation}
427: This reference is consistent\footnote{Suppose
428: $p$ and
429: $r$ are two pdfs and $P$ and $R$ are the corresponding
430: induced measures on the measurable space $(X,\mathfrak{M})$ such
431: that $P$ and $R$ are identical, i.e., $\int_{E} p \,
432: \ud \mu = \int_{E} r \, \ud \mu$, $\forall E \in \mathfrak{M}$. Then
433: we have $p \stackrel{\mathrm{a.e}}{=} r$ and hence
434: $ -\int_{X} p \ln p \, \ud \mu = -\int_{X} r \ln r \, \ud
435: \mu$.} because the probability measure
436: $P$ can be identified {\it a.e.} by the pdf $p$.
437:
438: Further, the definition of the probability measure $P$
439: in (\ref{Equation:ME:ProbabilityMeasureInducedByaPdf}), allows us
440: to write entropy functional
441: (\ref{Equation:ME:ShannonEntropyOf-pdf})
442: as,
443: \begin{equation}
444: \label{Equation:ME:ShannonEntropyOf-PM-inducedBy-pdf}
445: S(p) = - \int_{X} \frac{\ud P}{\ud \mu} \ln \frac{\ud P}{\ud
446: \mu} \, \ud \mu \enspace,
447: \end{equation}
448: since (\ref{Equation:ME:ProbabilityMeasureInducedByaPdf})
449: implies\footnote{If a
450: nonnegative measurable function $f$ induces a measure $\nu$ on the
451: measurable space $(X,\mathfrak{M})$ with respect to a measure
452: $\mu$, defined as $\nu(E) = \int_{E} f \, \ud \mu, \:\:\: \forall E \in
453: \mathfrak{M}$, then $\nu \ll \mu$. The converse is given by the
454: Radon-Nikodym theorem~\cite[p.~36, Theorem
455: 1.40(b)]{Kantorovitz:2003:IntroductionToModernAnalysis}.} $P
456: \ll \mu$, and the pdf $p$ is the
457: Radon-Nikodym derivative of $P$ w.r.t.\ $\mu$.
458:
459: Now we proceed to the definition of Kullback-Leibler
460: relative-entropy or KL-entropy for probability measures.
461: %Definition:Kullback-Leibler Relative-Entropy1
462: \begin{definition}
463: \label{Definition:ME:RelativeEntropy_1}
464: Let $(X,\mathfrak{M})$ be a measurable space. Let $P$ and $R$
465: be two probability measures on $(X,\mathfrak{M})$. The Kullback-Leibler
466: relative-entropy (KL-entropy) of $P$ relative to $R$ is
467: defined as
468: \begin{equation}
469: \label{Equation:ME:RelativeEntropyOfProbabilityMeasures}
470: I(P\|R) = \left\{ \begin{array}{ll}
471: \displaystyle{\int_{X} \ln \frac{\ud P}{\ud R} \, \ud P } &
472: \:\:\:\:\:\textrm{if}\:\:\:\:\: P \ll R \enspace, \\ \\
473: +\infty & \:\:\:\:\:\textrm{otherwise.}
474: \end{array} \right.
475: \end{equation}
476: \end{definition}%EndDefinition:Kullback-Leiber Relative-Entropy1
477: The divergence inequality
478: $I(P\|R) \geq 0$, with $I(P\|R) =0$ if and only if $P=R$, can be
479: shown in this case too.
480: KL-entropy~(\ref{Equation:ME:RelativeEntropyOfProbabilityMeasures})
481: can also be written as
482: \begin{equation}
483: \label{Equation:ME:AnotherFormForRelativeEntropyOfProbabilityMeasures}
484: I(P\|R) = \int_{X} \frac{\ud P}{\ud R} \ln \frac{\ud P}{\ud R}
485: \, \ud R \enspace.
486: \end{equation}
487:
488: Let $\mu$ be a $\sigma$-finite measure on $(X,\mathfrak{M})$
489: such that $P \ll R \ll \mu$. Since $\mu$ is $\sigma$-finite, from the
490: Radon-Nikodym theorem, there exist non-negative
491: $\mathfrak{M}$-measurable functions $p: X \rightarrow
492: \mathbb{R}^{+}$ and $r: X \rightarrow \mathbb{R}^{+}$, unique
493: $\mu$-{\em a.e.}, such that
494: \begin{equation}
495: \label{Equation:ME:DefinitionOfPdf_p}
496: P(E) = \int_{E} p \, \ud \mu \enspace, \:\:\: \forall E \in \mathfrak{M} \enspace,
497: \end{equation}
498: and
499: \begin{equation}
500: \label{Equation:ME:DefinitionOfPdf_r}
501: R(E) = \int_{E} r \, \ud \mu \enspace, \:\:\: \forall E \in
502: \mathfrak{M} \enspace.
503: \end{equation}
504: The pdfs $p$ and $r$ in (\ref{Equation:ME:DefinitionOfPdf_p})
505: and (\ref{Equation:ME:DefinitionOfPdf_r}) (they are indeed
506: pdfs) are Radon-Nikodym
507: derivatives of probability measures $P$ and $R$ with respect
508: to $\mu$, respectively, i.e., $p =\frac{\ud P}{\ud \mu}$ and
509: $r=\frac{\ud R}{\ud \mu}$.
510: Now one can define the relative-entropy of pdf $p$ w.r.t.\ $r$ as
511: follows\footnote{This follows from the chain rule for
512: Radon-Nikodym derivative:
513: \begin{displaymath}
514: \frac{\ud P}{\ud R} \stackrel{\mathrm{a.e}}{=} \frac{\ud
515: P}{\ud \mu} {\left( \frac{\ud R}{\ud \mu} \right)}^{-1}\enspace.
516: \end{displaymath}
517: }.
518:
519: %Definition:KullbackLeibler Relative-Entropy2
520: \begin{definition}
521: \label{Definition:ME:RelativeEntropy_of_pdf}
522: Let $(X,\mathfrak{M},\mu)$ be a measure space. Let
523: $\mathfrak{M}$-measurable functions $p,r:X \rightarrow
524: {\mathbb{R}}^{+}$ be two pdfs. The KL-entropy of $p$
525: relative to $r$
526: is defined as
527: \begin{equation}
528: \label{Equation:ME:RelativeEntropy_of_pdf}
529: I(p\|r) = \int_{X} p(x) \ln \frac{p(x)}{r(x)} \, \ud \mu(x) \enspace,
530: \end{equation}
531: provided the integral on the right exists.
532: \end{definition}%EndDefinition:KullbackLeibler Relative-Entropy2
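
The link between Definition~\ref{Definition:ME:RelativeEntropy_1} and
Definition~\ref{Definition:ME:RelativeEntropy_of_pdf} can be sketched as
follows, assuming $P \ll R \ll \mu$ with $\mu$ $\sigma$-finite as above.
Using $\ud P = p \, \ud \mu$ together with the chain rule for Radon-Nikodym
derivatives, one obtains
\begin{displaymath}
  I(P\|R) = \int_{X} \ln \frac{\ud P}{\ud R} \, \ud P
          = \int_{X} p \ln \frac{p}{r} \, \ud \mu
          = I(p\|r) \enspace.
\end{displaymath}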
533:
534: As we have mentioned earlier, KL-entropy
535: (\ref{Equation:ME:RelativeEntropy_of_pdf}) exists if the two
536: densities are absolutely continuous with respect to one
537: another. On the real line the same definition can be written
538: as
539: \begin{displaymath}
540: I(p\|r) = \int_{\mathbb{R}} p(x) \ln \frac{p(x)}{r(x)} \, \ud x \enspace,
541: \end{displaymath}
542: which exists if the densities $p(x)$ and $r(x)$ share the same support.
543: Here and in the sequel we use the convention
544: \begin{equation}
545: \ln 0 = - \infty, \:\:\:\:\:\:\:\:\:\:\: \ln \frac{a}{0} = + \infty\:\:
546: \mbox{for any}\:\: a \in \mathbb{R}, \:\:\:\:\:\:\:\:\:\:\:
547: 0 \cdot (\pm \infty) = 0 \enspace.
548: \end{equation}
549:
550: Now we turn to the definition of entropy functional on a
551: measure space.
552: The entropy functional
553: in~(\ref{Equation:ME:ShannonEntropyOf-PM-inducedBy-pdf}) is defined
554: for a probability measure
555: that is induced by a pdf. By the Radon-Nikodym theorem, one can
556: define Shannon entropy for an arbitrary $\mu$-continuous probability measure as follows.
557: %Definition: Shannon entropy of Probability measure
558: \begin{definition}
559: \label{Definition:ME:ShannonEntropy_of_ProbabiliyMeasure}
560: Let $(X,\mathfrak{M},\mu)$ be a $\sigma$-finite measure
561: space. Entropy of any $\mu$-continuous probability measure $P$
562: ($P \ll \mu$) is defined as
563: \begin{equation}
564: \label{Equation:ME:ShannonEntropy_of_ProbabilityMeasure}
565: S(P) = - \int_{X} \ln \frac{\ud P}{\ud \mu} \, \ud P \enspace.
566: \end{equation}
567: \end{definition}
568: Properties of the entropy of a probability measure in
569: Definition~\ref{Definition:ME:ShannonEntropy_of_ProbabiliyMeasure} are
570: studied in detail by
571: Ochs~\cite{Ochs:1976:BasicPropertiesOfTheGeneralizedBoltzmann-Gibbs-ShannonEntropy}
572: under the name generalized Boltzmann-Gibbs-Shannon
573: entropy. In the literature, one can find notation of the form
574: $S(P|\mu)$ to represent the entropy functional in
575: (\ref{Equation:ME:ShannonEntropy_of_ProbabilityMeasure}), viz., the
576: entropy of a
577: probability measure, to stress the role of the measure
578: $\mu$ (e.g.,~\cite{Ochs:1976:BasicPropertiesOfTheGeneralizedBoltzmann-Gibbs-ShannonEntropy,Athreya:1994:EntropyMaximization}). Since
579: all the information measures we define are with
580: respect to the measure $\mu$ on $(X, \mathfrak{M})$, we omit
581: $\mu$ in the entropy
582: functional notation.
583:
584: Assuming that $\mu$ is a probability measure in
585: Definition~\ref{Definition:ME:ShannonEntropy_of_ProbabiliyMeasure},
586: one can relate Shannon entropy to Kullback-Leibler entropy
587: as
588: \begin{equation}
589: \label{Equation:ME:RelationBetweenMeasureTheoreticEntropyAndKullback}
590: S(P) = - I(P\|\mu) \enspace.
591: \end{equation}
592: Note that when $\mu$ is not a probability measure, the
593: divergence inequality $I(P\|\mu) \geq 0$ need not be
594: satisfied.
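
For instance (a simple illustration, not drawn from the references above),
let $X = \{x_{1}, \ldots, x_{n}\}$ and let $\mu$ be the counting measure,
which is not a probability measure for $n > 1$. Writing
$P_{k} = P(\{x_{k}\})$, we have $\frac{\ud P}{\ud \mu}(x_{k}) = P_{k}$, and
applying the
formula~(\ref{Equation:ME:RelativeEntropyOfProbabilityMeasures}) formally
with $R$ replaced by $\mu$ gives
\begin{displaymath}
  I(P\|\mu) = \sum_{k=1}^{n} P_{k} \ln P_{k} \leq 0 \enspace,
\end{displaymath}
which is strictly negative unless $P$ is concentrated on a single point.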
595:
596: A note on the
597: $\sigma$-finiteness of the measure $\mu$ is in order. In the definition of the
598: entropy functional we assumed that $\mu$ is a $\sigma$-finite
599: measure. This condition was used by
600: Ochs~\cite{Ochs:1976:BasicPropertiesOfTheGeneralizedBoltzmann-Gibbs-ShannonEntropy},
601: Csisz\'{a}r~\cite{Csiszar:1969:OnGeneralizedEntropy}
602: and
603: Rosenblatt-Roth~\cite{Rosenblatt-Roth:1964:TheConceptOfEntropyInProbabilityTheory}
604: to tailor the measure-theoretic definitions. For all practical
605: purposes and for most applications, this assumption is
606: satisfied. (See
607: \cite{Ochs:1976:BasicPropertiesOfTheGeneralizedBoltzmann-Gibbs-ShannonEntropy}
608: for a discussion on the physical interpretation of measurable space
609: $(X,\mathfrak{M})$ with $\sigma$-finite measure $\mu$ for
610: entropy measure of the
611: form~(\ref{Equation:ME:ShannonEntropy_of_ProbabilityMeasure}),
612: and of the relaxation of the $\sigma$-finiteness
613: condition.) By relaxing this condition, more universal
614: definitions of entropy functionals are studied
615: by Masani~\cite{Masani:1992:TheMeasureTheoreticAspectsOfEntropy_Part_1,Masani:1992:TheMeasureTheoreticAspectsOfEntropy_Part_2}.
616:
617: % In this thesis we will not go into those details.
618:
619: %---------------------------------------------------
620: \subsection{Interpretation of Discrete and Continuous Entropies in
621: terms of KL-entropy}
622: \label{SubSection:ME:MeasureTheoreticCasesinDiscrete}
623: \noindent
624: First, let us consider the discrete case of $(X, \mathfrak{M},
625: \mu)$, where $X= \{x_{1}, \ldots, x_{n} \} $, $\mathfrak{M} =
626: 2^{X}$ and $\mu$ is a cardinality probability measure. Let $P$
627: be any probability measure on $(X, \mathfrak{M})$. Then $\mu$
628: and $P$ can be specified as follows.
629: \begin{displaymath}
630: \mu \mbox{:} \:\:\: {\mu}_{k} = \mu(\{x_{k}\}) \geq 0, \:\:k = 1,
631: \ldots, n, \:\:\:\sum_{k=1}^{n}
632: \mu_{k} =1 \enspace, \:\:\: %\mbox{and}
633: \end{displaymath}
634: and
635: \begin{displaymath}
636: P \mbox{:}\:\:\: P_{k} = P(\{x_{k}\}) \geq 0 , \:\:k =1,
637: \ldots, n, \:\:\: \sum_{k=1}^{n} P_{k} =1 \enspace.
638: \end{displaymath}
639: The probability measure $P$ is absolutely
640: continuous with respect to the probability measure $\mu$ if
641: $\mu_{k} =0$ implies $P_{k} =0$ for any $k=1,\ldots, n$. The
642: corresponding Radon-Nikodym
643: derivative of $P$ with respect to $\mu$ is given by
644: \begin{displaymath}
645: \frac{\ud P}{\ud \mu}(x_{k}) = \frac{P_{k}}{\mu_{k}}, \,
646: k = 1, \ldots, n \enspace.
647: \end{displaymath}
648: The measure-theoretic entropy $S(P)$
649: (\ref{Equation:ME:ShannonEntropy_of_ProbabilityMeasure}),
650: in this case, can be written as
651: \begin{displaymath}
652: S(P) = - \sum_{k=1}^{n} P_{k}\ln \frac{P_{k}}{\mu_{k}} =
653: \sum_{k=1}^{n} P_{k} \ln \mu_{k} - \sum_{k=1}^{n} P_{k} \ln
654: P_{k} \enspace.
655: \end{displaymath}
656: If we take the reference
657: probability measure $\mu$ to be the uniform probability
658: distribution on the set $X$, i.e., $\mu_{k} = \frac{1}{n}$, we obtain
659: \begin{equation}
660: \label{Equation:ME:RelativionBetweenMeasureTheoreticAndDiscreteEntropies}
661: S(P) = S_{n}(P) - \ln n \enspace,
662: \end{equation}
663: where $S_{n}(P)$ denotes the Shannon entropy of pmf $P =
664: (P_{1}, \ldots, P_{n})$ and $S(P)$ denotes
665: the
666: measure-theoretic entropy in the discrete case.
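
As a quick numerical check
of~(\ref{Equation:ME:RelativionBetweenMeasureTheoreticAndDiscreteEntropies})
(the values are chosen here purely for illustration), take $n = 2$,
$P = (\frac{1}{4}, \frac{3}{4})$ and $\mu = (\frac{1}{2}, \frac{1}{2})$. Then
\begin{displaymath}
  S_{2}(P) = \frac{1}{4} \ln 4 + \frac{3}{4} \ln \frac{4}{3} \approx 0.5623 \enspace,
  \:\:\:\:\:
  S(P) = S_{2}(P) - \ln 2 \approx -0.1308 \enspace,
\end{displaymath}
and, computed directly,
$- I(P\|\mu) = - \frac{1}{4} \ln \frac{1}{2} - \frac{3}{4} \ln \frac{3}{2} \approx -0.1308$,
in agreement
with~(\ref{Equation:ME:RelationBetweenMeasureTheoreticEntropyAndKullback}).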
667:
668: Now, let us consider the continuous case of
669: $(X,\mathfrak{M},\mu)$, where $X = [a,b] \subset \mathbb{R}$,
670: $\mathfrak{M}$ is the set of Lebesgue measurable subsets of $[a,b]$,
671: and $\mu$ is the Lebesgue probability measure. In this case
672: $\mu$ and $P$ can be specified as follows.
673: \begin{displaymath}
674: \mu \mbox{:}\:\:\: \mu(x) \geq 0 , x \in
675: [a,b], \ni \mu(E) = \int_{E} \mu(x) \, \ud x, \forall E \in
676: \mathfrak{M}, \: \int_{a}^{b} \mu(x)\, \ud x =1 \enspace,
677: \end{displaymath}
678: and
679: \begin{displaymath}
680: P \mbox{:}\:\:\: P(x) \geq 0 , x \in
681: [a,b], \ni P(E) = \int_{E} P(x) \, \ud x, \forall E \in \mathfrak{M}, \:\int_{a}^{b} P(x)\, \ud x =1 \enspace.
682: \end{displaymath}
683: Note the abuse of notation in the above specification of
684: probability measures $\mu$ and $P$, where we have used the same
685: symbols for both measures and pdfs.
686:
687:
The probability measure $P$ is absolutely continuous with
respect to the probability measure $\mu$ if $\mu(x)=0$ on a
set of positive Lebesgue measure implies
that $P(x)=0$ on the same
692: set. The Radon-Nikodym derivative of the probability measure
693: $P$ with respect to the probability measure $\mu$ will be
694: \begin{displaymath}
695: \frac{\ud P}{\ud \mu}(x) = \frac{P(x)}{\mu(x)} \enspace.
696: \end{displaymath}
697: Then the measure-theoretic entropy $S(P)$ in this case
698: can be written as
699: \begin{displaymath}
700: S(P) = - \int_{a}^{b} P(x) \ln \frac{P(x)}{\mu(x)} \, \ud x
701: \enspace.
702: \end{displaymath}
If we take the reference probability measure $\mu$ to be the uniform
distribution, i.e. $\mu(x) = \frac{1}{b-a}$, $x \in [a,b]$,
then we obtain
\begin{equation}
\label{Equation:ME:RelativionBetweenMeasureTheoreticAndContinuousEntropies}
S(P) = S_{[a,b]}(P) - \ln (b-a) \enspace,
\end{equation}
710: where $S_{[a,b]}(P)$ denotes the Shannon entropy of pdf
711: $P(x)$, $x \in [a,b]$ (\ref{Equation:ME:ContinuousEntropy})
712: and $S(P)$ denotes the measure-theoretic entropy in the
713: continuous case.
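The intermediate step can be sketched as follows, assuming the usual
form $S_{[a,b]}(P) = - \int_{a}^{b} P(x) \ln P(x) \, \ud x$ for the
continuous entropy. With $\mu(x) = \frac{1}{b-a}$ and $\int_{a}^{b}
P(x) \, \ud x = 1$,
\begin{displaymath}
S(P) = - \int_{a}^{b} P(x) \ln \left[ (b-a) P(x) \right] \, \ud x
= - \int_{a}^{b} P(x) \ln P(x) \, \ud x - \ln (b-a) \enspace.
\end{displaymath}
For instance, for the uniform pdf $P(x) = \frac{1}{b-a}$ (taken here
only as an example) one finds $S_{[a,b]}(P) = \ln (b-a)$ and therefore
$S(P) = 0$.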
714:
Hence, one can conclude that the
measure-theoretic entropy $S(P)$, defined for a probability measure $P$ on
the measure space $(X,\mathfrak{M},\mu)$, is equal to the Shannon
entropy in both the discrete and the continuous case up to an
additive constant, when the reference measure $\mu$ is chosen as a uniform
probability distribution.
On the other hand, one can see that the measure-theoretic KL-entropy
reduces, in the discrete and continuous cases, exactly to its discrete and
continuous definitions.
724:
725: Further, from
726: (\ref{Equation:ME:RelationBetweenMeasureTheoreticEntropyAndKullback}) and
727: (\ref{Equation:ME:RelativionBetweenMeasureTheoreticAndDiscreteEntropies}),
we can write Shannon entropy in terms of Kullback-Leibler
relative entropy as
730: \begin{equation}
731: S_{n}(P) = \ln n - I(P \| \mu) \enspace.
732: \end{equation}
Thus, Shannon entropy appears as being (up to an additive
constant) the variation of information when we pass from the
initial uniform probability distribution to the new probability
736: distribution given by $P_{k} \geq 0$, $\sum_{k=1}^{n} P_{k}
737: =1$, as any such probability distribution is obviously
738: absolutely continuous with respect to the uniform discrete
739: probability distribution.
Similarly, by
(\ref{Equation:ME:RelationBetweenMeasureTheoreticEntropyAndKullback})
and
(\ref{Equation:ME:RelativionBetweenMeasureTheoreticAndContinuousEntropies}),
in analogy with the relation between Shannon entropy and relative
entropy in the discrete case,
we can write the Boltzmann H-function in terms of relative entropy
as
748: \begin{equation}
S_{[a,b]}(P) = \ln (b-a) - I(P \| \mu) \enspace.
\end{equation}
Therefore, the continuous entropy or Boltzmann H-function
$S_{[a,b]}(P)$ may be interpreted as being (up to an additive
constant) the variation of information when we pass from the
initial uniform probability distribution on the interval
$[a,b]$ to the new probability measure defined by the
probability density function $P(x)$ (any such
757: probability measure is absolutely continuous with respect to
758: the uniform probability distribution on the interval
759: $[a,b]$).
760:
Thus, KL-entropy equips one with a unified interpretation of both
discrete entropy and continuous entropy.
763: One can utilize Shannon entropy in the continuous case,
764: as well as Shannon entropy in the discrete
765: case, both being interpreted as the variation of information
766: when we pass from the initial uniform distribution to the
767: corresponding probability measure.
768:
Also,
since the measure-theoretic entropy is equal to the discrete and
continuous entropies up to an additive constant, the ME prescriptions
of measure-theoretic Shannon entropy are consistent with both the
discrete case and the continuous case.
774:
775: %=======================Section:================================
776: \section{Measure-Theoretic Definitions of Generalized Information
777: Measures}
778: \label{Section:ME:MeasureTheoreticDefinitionsOfGeneralizedInformationMeasures}
779: \noindent
780: % In this section we extend the measure-theoretic definitions to
781: % generalized information measures discussed in
782: % Chapter~\ref{Chapter:KN}.
We begin with a brief note on the notation and assumptions
used.
We define all the information measures
on the measurable space $(X,\mathfrak{M})$, and the default reference
measure is $\mu$ unless otherwise stated.
To avoid clumsy formulations, we will not
distinguish between functions that differ only on a $\mu$-null set;
accordingly, equations between
$\mathfrak{M}$-measurable functions on $X$ are understood to
hold $\mu$-almost everywhere ($\mu$-a.e.\ or
a.e.).
Further, we assume that all the quantities of interest
exist and assume, implicitly, the $\sigma$-finiteness of $\mu$ and
the $\mu$-continuity of probability measures whenever
required. Since these assumptions occur repeatedly in various
definitions and formulations, they will not be mentioned in
the sequel.
With these assumptions we do not distinguish between
an information measure of a pdf $p$ and that of the corresponding
probability measure $P$. Hence, when we give definitions of
information measures for pdfs, we also use the corresponding
definitions for probability measures, whenever it is
convenient or required, with the understanding that $P(E) = \int_{E} p\,
\ud \mu $, the converse being due to the Radon-Nikodym theorem, where $p =
\frac{\ud P}{\ud \mu}$. In both cases we have $P \ll \mu$.
808:
First we consider the R\'{e}nyi generalizations.
The measure-theoretic definition of R\'{e}nyi entropy can be given
as follows.
812: %DEFINITION: Measure-theoretic definition of Renyi entropy
813: \begin{definition}
814: \label{Definition:ME:Measure-TheoreticRenyiEntropy}
815: R\'{e}nyi entropy
816: of a pdf $p:X \rightarrow {\mathbb{R}}^{+}$ on a measure space
817: $(X,\mathfrak{M},\mu)$ is defined as
818: \begin{equation}
819: \label{Equation:ME:RenyiEntropyOf-pdf}
820: S_{\alpha}(p) = \frac{1}{1-\alpha} \ln
821: \int_{X}p(x)^{\alpha}\, \ud \mu(x) \enspace,
822: \end{equation}
823: provided the integral on the right exists and $\alpha \in
824: \mathbb{R}$, $\alpha > 0$.
825: \end{definition}%EndDEFINITION: Measure-theoretic definition of RenyiEntropy
826: The same can be defined for any $\mu$-continuous probability
827: measure $P$ as
828: \begin{equation}
829: \label{Equation:ME:RenyiEntropyOf-PM}
830: S_{\alpha}(P) = \frac{1}{1-\alpha} \ln \int_{X}
831: {\left( \frac{\ud P}{\ud \mu} \right)}^{\alpha -1} \, \ud P \enspace.
832: \end{equation}
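That the two forms agree can be seen in one line: since $\ud P = p \,
\ud \mu$ with $p = \frac{\ud P}{\ud \mu}$,
\begin{displaymath}
\int_{X} {\left( \frac{\ud P}{\ud \mu} \right)}^{\alpha -1} \, \ud P
= \int_{X} p(x)^{\alpha -1}\, p(x) \, \ud \mu(x)
= \int_{X} p(x)^{\alpha} \, \ud \mu(x) \enspace.
\end{displaymath}
As a hypothetical example (not taken from the development above), let
$X = [0,1]$ with Lebesgue measure $\mu$ and $p(x) = 2x$. Then
$\int_{X} p(x)^{\alpha} \, \ud \mu(x) = \frac{2^{\alpha}}{\alpha +1}$, so that
for $\alpha = 2$ we obtain $S_{2}(p) = - \ln \frac{4}{3}$.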
833: On the other hand, R\'{e}nyi relative-entropy can be defined as
834: follows.
835: %DEFINITION: Measure-theoretic definition of Tsallis relative entropy
836: \begin{definition}
837: Let $p,r:X \rightarrow
838: {\mathbb{R}}^{+}$ be two pdfs on measure space $(X,\mathfrak{M},\mu)$. The
839: R\'{e}nyi relative-entropy of $p$ relative to $r$
840: is defined as
841: \begin{equation}
842: \label{Equation:ME:RenyiRelativeEntropyOf-pdf}
843: I_{\alpha}(p\|r) = \frac{1}{\alpha -1} \ln \int_{X}
844: \frac{p(x)^{\alpha}}{r(x)^{\alpha -1}} \, \ud \mu(x) \enspace,
845: \end{equation}
846: provided the integral on the right exists and $\alpha \in
847: \mathbb{R}$, $\alpha > 0$.
848: \end{definition}%EndDEFINITION: Measure-theoretic definition of Tsallis
849: %relative entropy
850: The same can be written in terms of probability measures as,
851: \begin{eqnarray}
852: \label{Equation:ME:RenyiRelativeEntropyOf-PMs}
853: I_{\alpha}(P\|R) &=& \frac{1}{\alpha -1} \ln \int_{X}
854: {\left( \frac{\ud P}{\ud R} \right)}^{\alpha -1} \, \ud P
855: \nonumber \\
856: &=& \frac{1}{\alpha -1} \ln \int_{X}
857: {\left( \frac{\ud P}{\ud R} \right)}^{\alpha} \, \ud R
858: \enspace,
859: \end{eqnarray}
860: whenever $P \ll R$; $I_{\alpha}(P \|R) = + \infty$, otherwise.
Further, if we assume that $\mu$ in
(\ref{Equation:ME:RenyiEntropyOf-PM}) is a probability measure,
then
\begin{equation}
\label{Equation:ME:Renyi_EntropyandRelativeEntropy}
S_{\alpha}(P) = - I_{\alpha}(P\|\mu) \enspace.
867: \end{equation}
868:
869: Tsallis entropy in the measure theoretic setting can be defined as
870: follows.
871: %DEFINITION: Measure-theoretic definition of Tsallis entropy
872: \begin{definition}
873: \label{Definition:ME:Measure-TheoreticTsallisEntropy}
874: Tsallis entropy of a pdf $p$ on $(X,\mathfrak{M},\mu)$ is
875: defined as
876: \begin{equation}
877: \label{Equation:ME:TsallisEntropyOf-pdf}
878: S_{q}(p) = \int_{X} p(x) \ln_{q} \frac{1}{p(x)}\, \ud \mu(x) =
879: \frac{1 - \int_{X} p(x)^{q}\, \ud \mu(x) }{q-1}
880: \enspace,
881: \end{equation}
882: provided the integral on the right exists and $q \in
883: \mathbb{R}$ and $q > 0$.
884: \end{definition}%EndDEFINITION: Measure-theoretic definition
885: %of TsallisEntropy
886:
Here $\ln_{q}$ in
(\ref{Equation:ME:TsallisEntropyOf-pdf}) is referred to as the
$q$-logarithm and is defined as $\ln_{q} x = \frac{\displaystyle
x^{1-q} -1}{\displaystyle 1-q}
\:\:\: (x >0, q \in {\mathbb{R}}, q \neq 1)$.
The same can be defined for a $\mu$-continuous probability
measure $P$, and can be written as
894: \begin{equation}
895: \label{Equation:ME:TsallisEntropyOf-PM}
896: S_{q}(P) = \int_{X} \ln_{q} {\left(\frac{\ud P}{\ud \mu}\right)}^{-1}
897: \, \ud P \enspace.
898: \end{equation}
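For completeness, the equality of the two expressions in
(\ref{Equation:ME:TsallisEntropyOf-pdf}) can be checked directly from
the definition of $\ln_{q}$ (a short sketch, for $q \neq 1$ and
wherever $p(x) > 0$):
\begin{displaymath}
p(x) \ln_{q} \frac{1}{p(x)} = p(x)\, \frac{p(x)^{q-1} - 1}{1-q} =
\frac{p(x)^{q} - p(x)}{1-q} \enspace,
\end{displaymath}
and integrating with $\int_{X} p \, \ud \mu = 1$ gives
\begin{displaymath}
\int_{X} p(x) \ln_{q} \frac{1}{p(x)}\, \ud \mu(x) =
\frac{\int_{X} p(x)^{q}\, \ud \mu(x) - 1}{1-q} =
\frac{1 - \int_{X} p(x)^{q}\, \ud \mu(x)}{q-1} \enspace.
\end{displaymath}
The form (\ref{Equation:ME:TsallisEntropyOf-PM}) follows in the same
way from $\ud P = p \, \ud \mu$.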
899:
900: The definition of Tsallis relative-entropy is given below.
901: %DEFINITION: Measure-theoretic definition of Tsallis relative entropy
902: \begin{definition}
903: Let $(X,\mathfrak{M},\mu)$ be a measure space. Let $p,r:X \rightarrow
904: {\mathbb{R}}^{+}$ be two probability density functions. The
905: Tsallis relative-entropy of $p$ relative to $r$
906: is defined as
907: \begin{equation}
908: \label{Equation:ME:TsallisRelativeEntropyOf-pdf}
909: I_{q}(p\|r) = - \int_{X} p(x) \ln_{q} \frac{r(x)}{p(x)}\, \ud
910: \mu(x) = \frac{\int_{X} \frac{p(x)^{q}}{r(x)^{q-1}}\,
911: \ud \mu(x) -1 }{q-1}
912: \end{equation}
provided the integral on the right exists and $q \in
914: \mathbb{R}$ and $q > 0$.
915: \end{definition}%EndDEFINITION: Measure-theoretic definition of Tsallis
916: %relative entropy
917: The same can be written for two probability measures $P$ and
918: $R$, as
919: \begin{equation}
920: I_{q}(P\|R)= - \int_{X} \ln_{q} {\left(\frac{\ud P}{\ud R}\right)}^{-1}\,
921: \ud P \enspace,
922: \end{equation}
923: whenever $P \ll R$; $I_{q}(P \|R) = + \infty$, otherwise.
If $\mu$ in
(\ref{Equation:ME:TsallisEntropyOf-PM}) is a probability measure,
then
\begin{equation}
\label{Equation:ME:Tsallis_EntropyandRelativeEntropy}
S_{q}(P) = - I_{q}(P\|\mu) \enspace.
930: \end{equation}
931:
932: % We discuss the relations between generalized entropic
933: % functionals in measure-theoretic case to discrete or continuous
934: % case in
935: % \S~\ref{Section:ME:MeasureTheoreticDefinitions_Revisited}. The
936: % reason for this is the various relations discussed for
937: % classical information measures cannot be extended to the
938: % generalized case. As we are going to see contrary to the
939: % classical case, where consistency of ME-prescriptions of measure-theoretic
940: % definitions with discrete or continuous case can be argued
941: % without invoking ME-prescriptions, consistent arguments for measure-theoretic
942: % generalized entropy functionals involve explicitly
943: % ME-prescriptions. Hence it is important for us to discuss the
944: % ME-prescriptions in generalized case. First we briefly review
945: % the ME-prescriptions in the classical case.
946:
947: %=========================Section:=====================================
948: \section{Maximum Entropy and Canonical Distributions}
949: \label{Section:ME:MaximumEntropyAndCanonicalDistributions}
950: \noindent
For all the ME prescriptions of classical information measures
we consider a set of constraints of the form
953: \begin{equation}
954: \label{Equation:ME:ExpectationConstraints}
955: \int_{X} u_{m} \, \ud P = \int_{X} u_{m}(x) p(x) \, \ud \mu(x) =
956: \langle u_{m} \rangle \enspace, \:\:\:m =
957: 1, \ldots , M \enspace,
958: \end{equation}
959: with respect to $\mathfrak{M}$-measurable functions $u_{m}: X
960: \rightarrow \mathbb{R}, \:\: m = 1, \ldots M$, whose expectation
961: values $\langle u_{m} \rangle, \, m=1,\ldots M$ are (assumed
962: to be) {\it a priori} known, along with the normalizing
963: constraint $\int_{X} \, \ud P =1$.
964: (From now on we assume that any set of constraints on
965: probability distributions implicitly includes this
966: constraint, which will not be mentioned in the sequel.)
967:
968: %-----Note on the notation for next chapter...
969: % A note on the notation: To avoid proliferation of symbols we
970: % use the same notation for the minimum or maximum entropy
971: % distributions and Lagrange multipliers in the various case;
972: % the correspondence should be clear from the context. In the
973: % maximum entropy case use $Z$ for the partition function and in
974: % minimum entropy case we $\widehat{Z}$.
975:
976: To maximize the
977: entropy~(\ref{Equation:ME:ShannonEntropyOf-pdf})
978: with respect
979: to the constraints~(\ref{Equation:ME:ExpectationConstraints}), the
980: solution is calculated via the Lagrangian:
981: {\setlength\arraycolsep{0pt}
982: \begin{eqnarray}
983: \label{Equation:ME:LagranginForMaximumEntropy}
984: \mathcal{L}(x, \lambda, \beta) = - \int_{X} \ln \frac{\ud
985: P}{\ud \mu}(x)&& \, \ud P(x) - \lambda \left(\int_{X}\, \ud P(x) - 1
986: \right) \nonumber \\
987: && - \sum_{m=1}^{M} \beta_{m} \left(\int_{X} u_{m}(x)\, \ud P(x) -
988: \langle u_{m} \rangle \right) \enspace,
989: \end{eqnarray}}
where $\lambda$ and $\beta_{m},\, m=1,\ldots,M$, are Lagrange
parameters (we use the notation $\beta = (\beta_{1}, \ldots, \beta_{M})$).
\noindent
Setting the variation of $\mathcal{L}$ with respect to $P$ to zero
gives (absorbing an additive constant into $\lambda$)
994: \begin{displaymath}
995: \ln \frac{\ud P}{\ud \mu}(x) + \lambda + \sum_{m=1}^{M}
996: \beta_{m} u_{m}(x) = 0 \enspace.
997: \end{displaymath}
998: The solution can be calculated as
999: \begin{equation}
1000: \ud P(x, \beta) = \exp \left( -\ln Z(\beta) - \sum_{m=1}^{M}
1001: \beta_{m} u_{m}(x)\right) \ud \mu(x)
1002: \end{equation}
1003: or
1004: \begin{equation}
1005: p(x) = \frac{\ud P}{\ud \mu} (x) = \frac{e^{ -
1006: \sum_{m=1}^{M} \beta_{m}
1007: u_{m}(x)}}{Z(\beta)} \enspace,
1008: \end{equation}
1009: where the partition function $Z(\beta)$ is written as
1010: \begin{equation}
1011: \label{Equation:PartitionFunctionForMaximumEntropy}
1012: Z(\beta) = \int_{X} \exp \left( - \sum_{m=1}^{M} \beta_{m}
1013: u_{m}(x)\right) \ud \mu(x) \enspace.
1014: \end{equation}
1015: The Lagrange parameters $\beta_{m},\: m = 1, \ldots M$ are
1016: specified by the set of
1017: constraints (\ref{Equation:ME:ExpectationConstraints}).
1018:
1019: The maximum entropy, denoted by $S$, can be calculated as
1020: \begin{equation}
1021: \label{Equation:ME:MaximumEntropy}
1022: S = \ln Z + \sum_{m=1}^{M} \beta_{m} \langle u_{m} \rangle \enspace.
1023: \end{equation}
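This follows by substituting the maximum entropy distribution into the
entropy functional (a brief sketch): since $\ln \frac{\ud P}{\ud
\mu}(x) = - \ln Z(\beta) - \sum_{m=1}^{M} \beta_{m} u_{m}(x)$,
\begin{displaymath}
S = - \int_{X} \ln \frac{\ud P}{\ud \mu} \, \ud P
= \ln Z(\beta) \int_{X} \ud P + \sum_{m=1}^{M} \beta_{m} \int_{X}
u_{m} \, \ud P
= \ln Z(\beta) + \sum_{m=1}^{M} \beta_{m} \langle u_{m} \rangle \enspace.
\end{displaymath}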
1024:
1025: The Lagrange parameters $\beta_{m},\: m = 1, \ldots M$, are
1026: calculated by searching the unique solution (if it exists) of the
1027: following system of nonlinear equations:
1028: \begin{equation}
1029: \label{Equation:ME:MaximumEntropy_ThermodynamicEquation_1}
1030: \frac{\partial}{\partial \beta_{m}} \ln Z(\beta) = - \langle
1031: u_{m} \rangle \enspace, \:\:\:m = 1, \ldots M \enspace.
1032: \end{equation}
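These equations can be verified from
(\ref{Equation:PartitionFunctionForMaximumEntropy}), assuming that
differentiation under the integral sign is permitted:
\begin{displaymath}
\frac{\partial}{\partial \beta_{m}} \ln Z(\beta) =
\frac{1}{Z(\beta)} \int_{X} \left( - u_{m}(x) \right) \exp \left( -
\sum_{j=1}^{M} \beta_{j} u_{j}(x) \right) \ud \mu(x) = - \int_{X}
u_{m}(x) p(x) \, \ud \mu(x) = - \langle u_{m} \rangle \enspace.
\end{displaymath}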
1033: We also have
1034: \begin{equation}
1035: \label{Equation:ME:MaximumEntropy_ThermodynamicEquation_2}
\frac{\partial S}{\partial \langle u_{m} \rangle} =
\beta_{m} \enspace, \:\:\: m = 1, \ldots M \enspace.
1038: \end{equation}
Equations
(\ref{Equation:ME:MaximumEntropy_ThermodynamicEquation_1}) and
(\ref{Equation:ME:MaximumEntropy_ThermodynamicEquation_2}) are
referred to as the thermodynamic equations.
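
As a worked illustration of the above prescription (a hypothetical
example, chosen here only for concreteness), take $X = [0,\infty)$ with
Lebesgue measure $\mu$, a single constraint function $u_{1}(x) = x$ and
a prescribed mean $\langle u_{1} \rangle = c > 0$. Then, for
$\beta_{1} > 0$,
\begin{displaymath}
Z(\beta_{1}) = \int_{0}^{\infty} e^{-\beta_{1} x} \, \ud x =
\frac{1}{\beta_{1}} \enspace, \:\:\: p(x) = \beta_{1} e^{-\beta_{1}
x} \enspace,
\end{displaymath}
and (\ref{Equation:ME:MaximumEntropy_ThermodynamicEquation_1}) reads
$- \frac{1}{\beta_{1}} = - c$, i.e. $\beta_{1} = \frac{1}{c}$. The
maximum entropy (\ref{Equation:ME:MaximumEntropy}) is then
\begin{displaymath}
S = \ln Z + \beta_{1} \langle u_{1} \rangle = \ln c + 1 \enspace,
\end{displaymath}
which is the familiar entropy of the exponential distribution with
mean $c$.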
1043:
1044: %================================Section:===================================
1045: \section{ME prescription for Tsallis Entropy}
1046: \label{Section:ME:ME-prescriptionForTsallisEntropy}
1047: \noindent
1048: The great success of Tsallis entropy is
1049: attributed to the power-law distributions one can derive as
1050: maximum entropy distributions by maximizing Tsallis entropy
with respect to the moment constraints. But there are
subtleties involved in the choice of constraints
for the ME prescriptions of these
entropy functionals. These subtleties are still part of the
1055: major discussion in the nonextensive formalism~\cite{FerriMartinezPlastino:2005:TheRoleOfConstraintsInTsallisNonextensiveTreatmentRevisited,AbeBagci:2005:NecessityOfqExpectation,WadaScarfone:2005:ConnectionsBetweenTsallisFormalismEtc}.
1056:
In the nonextensive formalism, maximum entropy distributions
are derived with respect to constraints that
differ from (\ref{Equation:ME:ExpectationConstraints}),
which are used for classical information measures. Constraints
of the
form~(\ref{Equation:ME:ExpectationConstraints}) are
inadequate for handling the serious mathematical difficulties
that arise in this setting
(see~\cite{TsallisMendesPlastino:1998:TheRoleOfConstraints}). To
handle these difficulties, constraints of the form
1066: \begin{equation}
1067: \label{Equation:ME:Normalized-q-ExpectationConstraints}
1068: \frac{\int_{X} u_{m}(x) p(x)^{q} \, \ud \mu(x)}{\int_{X}
1069: p(x)^{q}\, \ud \mu(x)} = {\langle\langle u_{m} \rangle\rangle}_{q} \enspace, m =
1070: 1, \ldots , M
1071: \end{equation}
1072: are proposed.
The left-hand side of
(\ref{Equation:ME:Normalized-q-ExpectationConstraints}) can
be considered as the expectation of $u_{m}$ with respect to the
modified probability measure $P_{(q)}$ (it is indeed a
probability measure) defined as
1077: \begin{equation}
1078: P_{(q)}(E) = {\left( \int_{X} p(x)^{q} \, \ud \mu
1079: \right)}^{-1} \int_{E} p(x)^{q} \, \ud \mu \enspace.
1080: \end{equation}
The measure $P_{(q)}$ is known as the escort probability
measure.
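Both claims can be checked directly (a short sketch, assuming $0 <
\int_{X} p(x)^{q} \, \ud \mu < \infty$). Taking $E = X$ in the
definition gives $P_{(q)}(X) = 1$, and since $\frac{\ud P_{(q)}}{\ud
\mu}(x) = p(x)^{q} {\left( \int_{X} p(x)^{q} \, \ud \mu
\right)}^{-1}$, we have
\begin{displaymath}
\int_{X} u_{m} \, \ud P_{(q)} = \frac{\int_{X} u_{m}(x) p(x)^{q} \,
\ud \mu(x)}{\int_{X} p(x)^{q}\, \ud \mu(x)} = {\langle\langle u_{m}
\rangle\rangle}_{q} \enspace, \:\:\: m = 1, \ldots , M \enspace.
\end{displaymath}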
1083:
1084: The variational principle for Tsallis entropy maximization
1085: with respect to
1086: constraints~(\ref{Equation:ME:Normalized-q-ExpectationConstraints})
1087: can be written as
1088: \begin{eqnarray}
1089: \label{Equation:ME:Lagrangin_TsallisMaximumEntropy_wrt_Norm-q-Expt}
1090: \mathcal{L}(x, \lambda, \beta) = &&\int_{X} \ln_{q}
1091: \frac{1}{p(x)} \, \ud P(x) - \lambda \left(\int_{X}\, \ud P(x) - 1
1092: \right) \nonumber \\
1093: && - \sum_{m=1}^{M} \beta^{(q)}_{m} \left(\int_{X} {p(x)}^{q-1}
1094: \left(u_{m}(x) - {\langle\langle u_{m} \rangle\rangle}_{q}
1095: \right) \, \ud P(x) \right) \enspace,
1096: \end{eqnarray}
1097: where the parameters $\beta_{m}^{(q)}$ can be defined in
1098: terms of true Lagrange parameters $\beta_{m}$ as
1099: \begin{equation}
1100: \beta_{m}^{(q)} = {\left(\int_{X} p(x)^{q}\, \ud \mu
1101: \right)}^{-1} \beta_{m}\enspace, \, m = 1, \ldots, M.
1102: \end{equation}
1103: The maximum entropy distribution in this case can be written
1104: as
1105: \begin{equation}
1106: \label{Equation:ME:TsallisMaximumEntropyDistribution_wrt_q-Expt}
p(x) = \frac{\displaystyle {\left[ 1 - (1-q) {\left( \int_{X}
p(x)^{q} \, \ud \mu \right)}^{-1} \sum_{m=1}^{M} \beta_{m} \left( u_{m}(x) -
{\langle\langle {u}_{m} \rangle\rangle}_{q} \right) \right]}^{\frac{1}{1-q}}}
{\displaystyle {\overline{Z_{q}}} } \enspace,
\end{equation}
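which can be written in terms of the $q$-exponential $e_{q}^{x} =
{\left[ 1 + (1-q) x \right]}^{\frac{1}{1-q}}$ (the inverse of
$\ln_{q}$) as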
1114: \begin{equation}
1115: \label{Equation:ME:TsallisMaximumEntropyDistribution_wrt_q-Expt_q-exponentialForm}
1116: p(x) = \frac{\displaystyle e_{q}^{- {\left(\int_{X} p(x)^{q}\, \ud \mu
1117: \right)}^{-1} \sum_{m=1}^{M} \beta_{m} (u_{m}(x) -
1118: {\langle\langle u_{m}\rangle\rangle}_{q} )
1119: }}{\displaystyle \overline{Z_{q}}} \enspace,
1120: \end{equation}
1121: where
1122: \begin{equation}
1123: \overline{Z_{q}} = \int_{X} {e_{q}^{- {\left(\int_{X} p(x)^{q}\, \ud \mu
1124: \right)}^{-1} \sum_{m=1}^{M} \beta_{m} (u_{m}(x) -
1125: {\langle\langle u_{m}\rangle\rangle}_{q} ) }} \, \ud \mu(x) \enspace.
1126: \end{equation}
1127:
1128: Maximum Tsallis entropy in this case satisfies
1129: \begin{equation}
1130: S_{q} = \ln_{q}\overline{{Z}_{q}} \enspace,
1131: \end{equation}
1132: while corresponding thermodynamic equations can be written
1133: as
1134: \begin{equation}
1135: \frac{\partial}{\partial \beta_{m}} \ln_{q} Z_{q} = -
1136: {\langle\langle{{u}_{m}}\rangle\rangle}_{q} \enspace, \:\:\: m = 1, \ldots M
1137: \enspace,
1138: \end{equation}
1139: \begin{equation}
\frac{\partial S_{q}}{\partial
{\langle\langle{{u}_{m}}\rangle\rangle}_{q} } =
\beta_{m} \enspace, \:\:\: m =1, \ldots M \enspace,
1143: \end{equation}
1144: where
1145: \begin{equation}
1146: \ln_{q} Z_{q} = \ln_{q} \overline{{Z}_{q}}
1147: - \sum_{m=1}^{M} \beta_{m}
1148: {\langle\langle{{u}_{m}}\rangle\rangle}_{q} \enspace.
1149: \end{equation}
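As a formal consistency check (a sketch only, assuming that the limit
$q \to 1$ may be interchanged with the integrals involved), note that
as $q \to 1$ we have $\ln_{q} x \to \ln x$, $e_{q}^{x} \to e^{x}$,
$\int_{X} p(x)^{q} \, \ud \mu \to 1$ and ${\langle\langle u_{m}
\rangle\rangle}_{q} \to \langle u_{m} \rangle$. In this limit
\begin{displaymath}
\overline{Z_{1}} = \int_{X} \exp \left( - \sum_{m=1}^{M} \beta_{m}
\left( u_{m}(x) - \langle u_{m} \rangle \right) \right) \ud \mu(x) =
Z(\beta) \exp \left( \sum_{m=1}^{M} \beta_{m} \langle u_{m} \rangle
\right) \enspace,
\end{displaymath}
so that $\ln \overline{Z_{1}} = \ln Z(\beta) + \sum_{m=1}^{M} \beta_{m}
\langle u_{m} \rangle$, which is the classical maximum entropy
(\ref{Equation:ME:MaximumEntropy}), while $\ln_{q} Z_{q}$ reduces to
$\ln Z(\beta)$.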
1150:
1151: %=============================================================================
1152: \section{Measure-Theoretic Definitions: Revisited}
1153: \label{Section:ME:MeasureTheoreticDefinitions_Revisited}
1154: \noindent
1155: It is well known that unlike Shannon entropy, Kullback-Leibler
1156: relative-entropy in the discrete
1157: case can be extended naturally to the measure-theoretic
1158: case.
1159: In this section, we show
1160: that this fact is true for generalized relative-entropies
too. R\'{e}nyi relative-entropy on the continuous valued space
$\mathbb{R}$ and its
equivalence with the discrete case was studied
by R\'{e}nyi~\cite{Renyi:1960:SomeFundamentalQuestionsOfInformationTheory}. Here,
we present the result in the measure-theoretic case and
conclude that the measure-theoretic definitions of both Tsallis and
R\'{e}nyi relative-entropies are equivalent to their discrete
counterparts.
1169:
1170: We also present a result pertaining to ME of
1171: measure-theoretic Tsallis entropy. We show that ME of Tsallis
1172: entropy in the measure-theoretic case is consistent with the
1173: discrete case.
1174:
1175: %-----------------------Sub Section------------------
1176: \subsection{On Measure-Theoretic Definitions of Generalized Relative-Entropies}
1177: \noindent
Here we show that generalized relative-entropies in the
discrete case can be naturally extended to the measure-theoretic
case, in the sense that the measure-theoretic definitions can
be obtained as limits of sequences of finite discrete
entropies of pmfs which approximate the pdfs involved.
We call this
sequence of pmfs the ``approximating sequence of pmfs of a
pdf''. To formalize these aspects we need the following
lemma.
1187: %--------------Lemmma-------------
1188: \begin{lemma}
1189: \label{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf}
Let $p$ be a pdf defined on the measure space
$(X,\mathfrak{M},\mu)$. Then there exists a sequence of simple
functions $\{f_{n}\}$ (we refer to it as the approximating sequence of
simple functions of $p$) such that $\lim_{n \to \infty} f_{n} = p$
and each $f_{n}$ can be written as
1195: \begin{equation}
1196: \label{Equation:ME:ActualDefinitionOfSeqenceOfSimpleFunctions}
1197: f_{n}(x) = \frac{1}{\mu(E_{n,k})} \int_{E_{n,k}} p \, \ud
1198: \mu \enspace, \:\:\:\:\:\:\: \forall x \in E_{n,k},
1199: \:\:\: k = 1, \ldots m(n) \enspace,
1200: \end{equation}
1201: where $(E_{n,1}, \ldots, E_{n,m(n)})$ is the measurable
1202: partition corresponding to $f_{n}$ (the notation $m(n)$
1203: indicates that $m$ varies with $n$). Further each $f_{n}$
1204: satisfies
1205: \begin{equation}
1206: \int_{X} f_{n} \, \ud \mu = 1 \enspace.
1207: \end{equation}
1208: \end{lemma}
1209: %Proof----
1210: \proof
1211: % \footnote{$ \cup_{k=1}^{m(n)} E_{n,k} = X$ and $E_{n,i}
1212: % \cap E_{n,j} = \emptyset$, $\forall i \neq j$}
1213: Define a sequence of simple functions $\{f_{n}\}$ as
1214: \begin{equation}
1215: f_{n}(x) = \left\{ \begin{array}{ll}
1216: \frac{1}{ \mu p^{-1} \left(
1217: \left[ \frac{k}{2^{n}}, \frac{k+1}{2^{n}} \right) \right)}
1218: \displaystyle \int_{ p^{-1} \left(
1219: \left[ \frac{k}{2^{n}}, \frac{k+1}{2^{n}} \right) \right)
1220: } p \, \ud \mu \enspace,& \: \:
1221: \:\:\textrm{if}\:\: \frac{k}{2^{n}} \leq p(x) <
1222: \frac{k+1}{2^{n}} , \\
1223: & \:\:\:k = 0, 1, \ldots n 2^{n}-1
1224: \\ \\
1225: \frac{1}{ \mu p^{-1} \left(
1226: \left[ n, \infty \right) \right)}
1227: \displaystyle \int_{ p^{-1} \left(
1228: \left[ n , \infty \right) \right)
1229: } p \, \ud \mu \enspace,& \: \:
1230: \:\:\textrm{if}\:\: n \leq p(x),
1231: \end{array} \right.
1232: \end{equation}
1233: Each $f_{n}$ is indeed a simple function and can be written as
1234: \begin{equation}
1235: f_{n} = \sum_{k=0}^{n2^{n}-1} \left( \frac{1}{\mu E_{n,k}}
1236: \int_{E_{n,k}} p\, \ud \mu \right) \chi_{E_{n,k}} + \left( \frac{1}{\mu
1237: F_{n}} \int_{F_{n}} p \, \ud \mu \right) \chi_{F_{n}} \enspace,
1238: \end{equation}
1239: where $E_{n,k} =
1240: p^{-1}\left(\left[\frac{k}{2^{n}},\frac{k+1}{2^{n}}\right)
1241: \right)$, $k= 0, \ldots, n2^{n}-1$ and $F_{n} = p^{-1} \left(
1242: \left[ n, \infty \right) \right)$.
1243: Since $\int_{E} p \, \ud \mu < \infty$ for any $E \in
1244: \mathfrak{M}$, we have $\int_{E_{n,k}} p\, \ud \mu = 0$
1245: whenever $\mu E_{n,k} =0$, for $k = 0, \ldots n2^{n} -1$. Similarly
1246: $\int_{F_{n}} p\, \ud \mu = 0$ whenever $\mu F_{n} =0$.
1247: Now we show that $\lim_{n \to \infty} f_{n} = p$, point-wise.
1248:
First assume that $p(x) < \infty$. Then $\exists \: n_{0} \in
{\mathbb{Z}}^{+} \ni p(x) < n_{0}$, and for every $n \geq n_{0}$,
$\exists \, k \in \mathbb{Z}$, $0 \leq k
\leq n2^{n}-1
\ni \frac{k}{2^{n}} \leq p(x) <
\frac{k+1}{2^{n}}$ and $\frac{k}{2^{n}} \leq f_{n}(x) <
\frac{k+1}{2^{n}}$. This implies $0 \leq |p(x) - f_{n}(x) | <
\frac{1}{2^{n}}$ for all $n \geq n_{0}$, as required.
1257:
1258: If $p(x) = \infty$, for some $x \in X$, then $x \in F_{n}$ for
1259: all $n$, and therefore $f_{n}(x) \geq n$ for all $n$; hence
1260: $\lim_{n \to \infty} f_{n}(x) = \infty = p(x) $.
1261:
1262: Finally we have
1263: \begin{eqnarray}
\int_{X} f_{n} \, \ud \mu &=& \sum_{k=1}^{m(n)} \left[
\frac{1}{\mu(E_{n,k})} \int_{E_{n,k}} p \,\ud \mu \right]
\mu(E_{n,k}) \nonumber \\
&=& \sum_{k=1}^{m(n)} \int_{E_{n,k}} p \,\ud \mu \nonumber \\
1268: &=& \int_{X} p \, \ud \mu =1 \nonumber
1269: \end{eqnarray}
1270: \endproof
1271: %-------------End: lemmma-----------------
The above construction of a sequence of simple functions which
approximate a measurable function is similar to the
approximation theorem~\cite[p.~6, Theorem
1.8(b)]{Kantorovitz:2003:IntroductionToModernAnalysis} in
the theory of integration. But the approximation in
Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf}
can be seen as a mean-value approximation, whereas in the latter
case it is the lower approximation. Further, unlike in the case
of the lower approximation, the sequence of simple functions
which approximates $p$ in
Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf}
is in general neither monotone nor satisfies $f_{n} \leq p$.
1284:
1285: Now one can define a sequence of pmfs $\{\tilde{p}_{n}\}$ corresponding
1286: to the sequence
1287: of simple functions constructed in
1288: Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf},
1289: denoted by $\tilde{p}_{n} = (\tilde{p}_{n,1}, \ldots,\tilde{p}_{n,m(n)})$, as
1290: \begin{equation}
1291: \label{Equation:ME:ActualDefinitionOfSeqenceOfPmfs}
1292: \tilde{p}_{n,k} = \mu(E_{n,k})f_{n}\chi_{E_{n,k}} = \int_{E_{n,k}} p \, \ud
1293: \mu \enspace, k = 1, \ldots m(n),
1294: \end{equation}
1295: for any $n$.
1296: We have
1297: \begin{equation}
1298: \sum_{k=1}^{m(n)} \tilde{p}_{n,k} = \sum_{k=1}^{m(n)} \int_{E_{n,k}} p
1299: \, \ud \mu
1300: = \int_{X} p \, \ud \mu =1 \enspace,
1301: \end{equation}
1302: and hence $\tilde{p}_{n}$ is indeed a pmf.
We call $\{\tilde{p}_{n}\}$ the approximating sequence of pmfs of the pdf
$p$.
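
As an illustration of this construction (a hypothetical example, not
part of the original development), take $X = [0,1)$ with Lebesgue
measure $\mu$ and $p(x) = 2x$. Using the sets from the proof of
Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf},
indexed here from $k=0$, for $n \geq 2$ the set $F_{n}$ is empty and
the nonempty sets of the partition are
\begin{displaymath}
E_{n,k} = p^{-1}\left(\left[\frac{k}{2^{n}},\frac{k+1}{2^{n}}\right)
\right) = \left[\frac{k}{2^{n+1}},\frac{k+1}{2^{n+1}}\right) \enspace,
\:\:\: k = 0, \ldots, 2^{n+1}-1 \enspace,
\end{displaymath}
with $\mu(E_{n,k}) = 2^{-(n+1)}$. Hence
\begin{displaymath}
\tilde{p}_{n,k} = \int_{E_{n,k}} 2x \, \ud x =
\frac{2k+1}{2^{2n+2}} \enspace, \:\:\:
f_{n}(x) = \frac{\tilde{p}_{n,k}}{\mu(E_{n,k})} =
\frac{2k+1}{2^{n+1}} \enspace, \:\: x \in E_{n,k} \enspace.
\end{displaymath}
Thus $f_{n}(x)$ is the midpoint of
$\left[\frac{k}{2^{n}},\frac{k+1}{2^{n}}\right)$, so that $|p(x) -
f_{n}(x)| \leq 2^{-(n+1)}$, and $\sum_{k=0}^{2^{n+1}-1}
\tilde{p}_{n,k} = 2^{-(2n+2)} \sum_{k=0}^{2^{n+1}-1} (2k+1) = 1$.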
1305:
1306: % We say an measure-theoretic definition of an information
1307: % measure $\overline{S}$ is exact if
1308: % \begin{equation}
1309: % \lim_{n \to \infty} \overline{S}(P_{n}) = \overline{S}(p) \enspace.
1310: % \end{equation}
1311:
Now we present our main theorem, where we assume that $p$ and
$r$ are bounded. The
assumption of boundedness of $p$ and $r$ simplifies the
proof. However, the result can be
extended to the unbounded
case; see~\cite{Renyi:1959:OnTheDimensionAndEntropyOfProbabilityDistributions}
for an analysis of Shannon entropy and relative entropy on $\mathbb{R}$.
1319: %THEOREM:Measure-theoretic definition of generalized relative entropies.
1320: \begin{theorem}
1321: \label{Theorem:ME:MeasureTheoreticDefinitionsOfGeneralizedRelative-Entropies}
Let $p$ and $r$ be bounded pdfs defined on a
measure space $(X,\mathfrak{M}, \mu)$. Let $\{\tilde{p}_{n}\}$
and $\{\tilde{r}_{n}\}$ be the approximating sequences of pmfs of $p$ and $r$
respectively. Let $I_{\alpha}$ denote the R\'{e}nyi relative-entropy as
in~(\ref{Equation:ME:RenyiRelativeEntropyOf-pdf}) and
$I_{q}$ denote the Tsallis
relative-entropy as
in~(\ref{Equation:ME:TsallisRelativeEntropyOf-pdf}).
Then
1331: \begin{equation}
1332: \label{Equation:ME:InRenyisTheoremStatement_2}
1333: \lim_{n \to \infty} I_{\alpha}(\tilde{p}_{n} \| \tilde{r}_{n}) = I_{\alpha}(p\|r)
1334: \end{equation}
1335: and
1336: \begin{equation}
1337: \label{Equation:ME:InRenyisTheoremStatement_1}
1338: \lim_{n \to \infty} I_{q}(\tilde{p}_{n} \| \tilde{r}_{n}) = I_{q}(p\|r)
1339: \end{equation}
1340: \end{theorem}
1341: \proof
It is enough to prove the result for either Tsallis or
R\'{e}nyi, since each is a monotone and continuous function of
the other. Hence we write down the proof for the case of R\'{e}nyi
and we use the entropic index $\alpha$ in the proof.
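Explicitly (a short sketch from the two definitions with $q =
\alpha$, $\alpha \neq 1$), since $\int_{X}
\frac{p(x)^{\alpha}}{r(x)^{\alpha -1}} \, \ud \mu(x) = 1 + (\alpha -1)
I_{q}(p\|r)$, we have
\begin{displaymath}
I_{\alpha}(p\|r) = \frac{1}{\alpha -1} \ln \left[ 1 + (\alpha -1)
I_{q}(p\|r) \right] \enspace, \:\:\: q = \alpha \enspace,
\end{displaymath}
which is continuous and increasing in $I_{q}(p\|r)$, its derivative
being ${\left[ 1 + (\alpha -1) I_{q}(p\|r) \right]}^{-1} > 0$. The same
relation holds for the discrete versions evaluated at $\tilde{p}_{n}$
and $\tilde{r}_{n}$.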
1346:
1347: Corresponding to pdf $p$, let $\{f_{n}\}$ be the approximating
1348: sequence of simple functions such that $\lim_{n \to \infty}
1349: f_{n} = p$ as in
1350: Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf}.
Let $\{g_{n}\}$ be the approximating sequence of simple
functions for $r$ such that $\lim_{n \to \infty} g_{n} = r$.
Corresponding
to simple functions $f_{n}$ and $g_{n}$ there exists a common
measurable partition\footnote{Let $\varphi$ and $\phi$ be two
simple functions defined on $(X,\mathfrak{M})$. Let $\{E_{1},
\ldots, E_{n}\}$ and $\{F_{1},\ldots, F_{m}\}$ be the measurable
partitions corresponding to $\varphi$ and $\phi$
respectively. Then the partition defined as $\{E_{i} \cap F_{j} |
i = 1, \ldots, n,\:\: j =1, \ldots, m\}$ is a common measurable
partition for both $\varphi$ and $\phi$.}
1362: $\{ E_{n,1}, \ldots E_{n,m(n)}\}$ such
1363: that $f_{n}$ and $g_{n}$ can be written as
1364: \begin{equation}
1365: \label{Equation:ME:InRenyisTheorem_1_a}
1366: f_{n}(x) = \sum_{k=1}^{m(n)} (a_{n,k})
1367: \chi_{E_{n,k}}(x) \enspace, \:\:\: a_{n,k} \in
1368: {\mathbb{R}}^{+}, \, \forall k = 1, \ldots m(n) \enspace,
1369: \end{equation}
1370: \begin{equation}
1371: \label{Equation:ME:InRenyisTheorem_1_b}
1372: g_{n}(x) = \sum_{k=1}^{m(n)} (b_{n,k})
1373: \chi_{E_{n,k}}(x) \enspace, \:\:\: b_{n,k} \in
1374: {\mathbb{R}}^{+}, \, \forall k = 1, \ldots m(n) \enspace,
1375: \end{equation}
1376: where $\chi_{E_{n,k}}$ is the characteristic function of
1377: $E_{n,k}$, for $k=1,\ldots m(n)$. By
1378: (\ref{Equation:ME:InRenyisTheorem_1_a}) and
1379: (\ref{Equation:ME:InRenyisTheorem_1_b}) the approximating
1380: sequences of pmfs $\{\tilde{p}_{n} = (\tilde{p}_{n,1},
1381: \ldots, \tilde{p}_{n,m(n)})\}$
1382: and $\{\tilde{r}_{n} = (\tilde{r}_{n,1}, \ldots,
1383: \tilde{r}_{n,m(n)})\}$ can be written as
1384: % corresponding
1385: % to pdfs $p$ and $r$ respectively can be written as $\tilde{p}_{n,k}
1386: % = (a_{n,k}) \mu(E_{n,k}),\, k = 1, \ldots , m(n) $ and
1387: % $ \tilde{r}_{n,k}= (b_{n,k}) \mu(E_{n,k}), \, k = 1, \ldots ,
1388: % m(n)$
1389: (see (\ref{Equation:ME:ActualDefinitionOfSeqenceOfPmfs}))
1390: \begin{equation}
1391: \label{Equation:ME:InRenyisTheorem_2_a}
1392: \tilde{p}_{n,k} = a_{n,k} \mu(E_{n,k})\:\:\: k = 1, \ldots , m(n) \enspace,
1393: \end{equation}
1394: \begin{equation}
1395: \label{Equation:ME:InRenyisTheorem_2_b}
1396: \tilde{r}_{n,k} = b_{n,k} \mu(E_{n,k})\:\:\: k = 1, \ldots , m(n) \enspace.
1397: \end{equation}
Now the R\'{e}nyi
relative entropy of $\tilde{p}_{n}$ and
$\tilde{r}_{n}$ can be written as
\begin{equation}
\label{Equation:ME:InRenyisTheorem_2}
I_{\alpha}(\tilde{p}_{n} \| \tilde{r}_{n}) =
\frac{1}{\alpha-1} \ln \sum_{k=1}^{m(n)}
\frac{a_{n,k}^{\alpha}}{b_{n,k}^{\alpha -1}}
\mu(E_{n,k}) \enspace.
\end{equation}
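For completeness, the reduction to this form can be sketched as
follows, assuming that the discrete R\'{e}nyi relative entropy has the
usual argument $\sum_{k} \tilde{p}_{n,k}^{\alpha} /
\tilde{r}_{n,k}^{\alpha-1}$ inside the logarithm. Substituting
(\ref{Equation:ME:InRenyisTheorem_2_a}) and
(\ref{Equation:ME:InRenyisTheorem_2_b}) gives
\begin{displaymath}
\sum_{k=1}^{m(n)}
\frac{\tilde{p}_{n,k}^{\alpha}}{\tilde{r}_{n,k}^{\alpha-1}}
= \sum_{k=1}^{m(n)} \frac{a_{n,k}^{\alpha} \,
{\mu(E_{n,k})}^{\alpha}}{b_{n,k}^{\alpha-1} \,
{\mu(E_{n,k})}^{\alpha-1}}
= \sum_{k=1}^{m(n)} \frac{a_{n,k}^{\alpha}}{b_{n,k}^{\alpha-1}}
\mu(E_{n,k}) \enspace,
\end{displaymath}
because the powers of $\mu(E_{n,k})$ combine to
${\mu(E_{n,k})}^{\alpha - (\alpha-1)} = \mu(E_{n,k})$.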
1408:
To prove $\lim_{n \rightarrow \infty} I_{\alpha}(\tilde{p}_{n} \|
\tilde{r}_{n}) = I_{\alpha}(p \| r)$, it is enough to prove that
1411: \begin{equation}
1412: \label{Equation:ME:InRenyisTheorem_2}
1413: \lim_{n \rightarrow \infty} \frac{1}{\alpha-1} \ln
1414: \int_{X} \frac{ {f_{n}(x)}^{\alpha} }{
1415: {g_{n}(x)}^{\alpha-1}} \, \ud \mu(x)
1416: = \frac{1}{\alpha-1} \ln
1417: \int_{X} \frac{ {p(x)}^{\alpha} }{
1418: {r(x)}^{\alpha-1}} \, \ud \mu(x) \enspace,
1419: \end{equation}
since we have\footnote{The simple functions
${\left(f_{n}\right)}^{\alpha}$ and ${\left(g_{n}\right)}^{\alpha-1}$ can be
written as
1423: \begin{displaymath}
1424: {\left(f_{n}\right)}^{\alpha}(x) = \sum_{k=1}^{m(n)}
1425: \left( a_{n,k}^{\alpha} \right) \chi_{E_{n,k}}(x)
1426: \enspace, \:\:\:\:\:\mbox{and}
1427: \end{displaymath}
1428: \begin{displaymath}
1429: {\left(g_{n}\right)}^{\alpha-1}(x) = \sum_{k=1}^{m(n)}
1430: \left( b_{n,k}^{\alpha-1} \right) \chi_{E_{n,k}}(x) \enspace.
1431: \end{displaymath}
1432: Further,
1433: \begin{displaymath}
1434: \frac{f_{n}^{\alpha}}{g_{n}^{\alpha-1}}(x)
1435: = \sum_{k=1}^{m(n)} \left( \frac{
1436: a_{n,k}^{\alpha} }{b_{n,k}^{\alpha-1}} \right) \chi_{E_{n,k}}(x) \enspace.
1437: \end{displaymath}
1438: }%Endfootnote
1439: \begin{equation}
1440: \label{Equation:ME:InRenyisTheorem_3}
1441: \int_{X} \frac{{f_{n}(x)}^{\alpha}}{{g_{n}(x)}^{\alpha -1} } \,
1442: \ud \mu(x) =
1443: \sum_{k=1}^{m(n)}
1444: \frac{a_{n,k}^{\alpha}}{b_{n,k}^{\alpha-1}} \mu(E_{n,k}) \enspace.
1445: \end{equation}
1446: Further it is enough to prove that
1447: \begin{equation}
1448: \label{Equation:ME:InRenyisTheorem_3}
1449: \lim_{n \rightarrow \infty}
1450: \int_{X} {h_{n}(x)}^{\alpha} g_{n}(x) \, \ud \mu(x)
1451: =
1452: \int_{X} \frac{{p(x)}^{\alpha} }{
1453: {r(x)}^{\alpha-1}} \, \ud \mu(x) \enspace,
1454: \end{equation}
1455: where $h_{n}$ is defined as $h_{n}(x) =
1456: \frac{f_{n}(x)}{g_{n}(x)} $.\\
1457: %Case 1---------
1458: \noindent
1459: {\em \underline {Case 1: $0 < \alpha < 1$}}
1460:
In this case
the {\em Lebesgue dominated convergence
theorem}~\cite[p.~26]{Rudin:1966:RealAndComplexAnalysis}
gives
\begin{equation}
\label{Equation:ME:InRenyisTheorem_4}
\lim_{n \to \infty} \int_{X}
\frac{f_{n}^{\alpha}}{g_{n}^{\alpha -1}} \, \ud \mu =
\int_{X} \frac{p^{\alpha}}{r^{\alpha -1}} \, \ud \mu \enspace,
\end{equation}
and hence (\ref{Equation:ME:InRenyisTheoremStatement_2}).
1472:
1473: %Case 2-----------
1474: \noindent
1475: {\em \underline {Case 2: $\alpha > 1$}}
1476:
We have $h_{n}^{\alpha} g_{n}
\rightarrow \frac{p^{\alpha}}{r^{\alpha-1}}$ {\em
a.e.} By {\em Fatou's
Lemma}~\cite[p.~23]{Rudin:1966:RealAndComplexAnalysis} we
obtain
\begin{equation}
\label{Equation:ME:InRenyisTheorem_LimInfInequality}
\liminf_{n \to \infty} \int_{X}
h_{n}(x)^{\alpha} g_{n}(x) \, \ud \mu(x) \geq
\int_{X} \frac{{p(x)}^{\alpha} }{
{r(x)}^{\alpha-1}} \, \ud \mu(x) \enspace.
\end{equation}
From the construction of $f_{n}$ and $g_{n}$
(Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf})
we have
\begin{equation}
\label{Equation:ME:InRenyisTheorem_5}
h_{n}(x) g_{n}(x) = \frac{1}{\mu(E_{n,i})} \int_{E_{n,i}}
\frac{p(x)}{r(x)} r(x) \, \ud \mu \enspace, \:\:\: \forall x
\in E_{n,i} \enspace.
\end{equation}
By Jensen's inequality we get
\begin{equation}
\label{Equation:ME:InRenyisTheorem_6}
h_{n}(x)^{\alpha} g_{n}(x) \leq \frac{1}{\mu(E_{n,i})}
\int_{E_{n,i}} \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}} \,
\ud \mu \enspace, \:\:\: \forall x \in E_{n,i} \enspace.
\end{equation}
1505: By (\ref{Equation:ME:InRenyisTheorem_1_a}) and
1506: (\ref{Equation:ME:InRenyisTheorem_1_b}) we can write
1507: (\ref{Equation:ME:InRenyisTheorem_6}) as
1508: \begin{equation}
1509: \label{Equation:ME:InRenyisTheorem_7}
1510: \frac{a_{n,i}^{\alpha}}{b_{n,i}^{\alpha-1}} \mu(E_{n,i}) \leq
1511: \int_{E_{n,i}} \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}
1512: \, \ud \mu \enspace, \:\:\: \forall i = 1, \ldots m(n) \enspace.
1513: \end{equation}
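The Jensen step behind (\ref{Equation:ME:InRenyisTheorem_7}) can be
spelled out as follows. This is only a sketch: it assumes, as the
construction in
Lemma~\ref{Lemma:ME:ExistenceOfApproximatingSequenceOfSimpleFunctionsForPdf}
suggests, that $a_{n,i}$ and $b_{n,i}$ are the averages of $p$ and $r$
over $E_{n,i}$, and the probability measure $\nu_{n,i}$ is notation
introduced here purely for illustration. Define $\ud \nu_{n,i} = r \,
\ud \mu / \int_{E_{n,i}} r \, \ud \mu$ on $E_{n,i}$. Then
\begin{displaymath}
\frac{a_{n,i}}{b_{n,i}} = \frac{\int_{E_{n,i}} p \, \ud
\mu}{\int_{E_{n,i}} r \, \ud \mu} = \int_{E_{n,i}} \frac{p}{r} \,
\ud \nu_{n,i} \enspace,
\end{displaymath}
and since $t \mapsto t^{\alpha}$ is convex for $\alpha > 1$,
\begin{displaymath}
{\left( \frac{a_{n,i}}{b_{n,i}} \right)}^{\alpha} \leq
\int_{E_{n,i}} {\left( \frac{p}{r} \right)}^{\alpha} \, \ud
\nu_{n,i} = \frac{1}{b_{n,i} \, \mu(E_{n,i})} \int_{E_{n,i}}
\frac{p^{\alpha}}{r^{\alpha-1}} \, \ud \mu \enspace.
\end{displaymath}
Multiplying both sides by $b_{n,i} \, \mu(E_{n,i})$ recovers
(\ref{Equation:ME:InRenyisTheorem_7}).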
Summing both sides of
(\ref{Equation:ME:InRenyisTheorem_7}) over $i$ we get
\begin{equation}
\label{Equation:ME:InRenyisTheorem_8}
\sum_{i=1}^{m(n)} \frac{a_{n,i}^{\alpha}}{b_{n,i}^{\alpha-1}} \mu(E_{n,i}) \leq
\sum_{i=1}^{m(n)} \int_{E_{n,i}}
\frac{p(x)^{\alpha}}{r(x)^{\alpha-1}} \, \ud \mu \enspace.
\end{equation}
Inequality (\ref{Equation:ME:InRenyisTheorem_8}) is nothing but
\begin{displaymath}
\int_{X} h_{n}^{\alpha}(x) g_{n}(x) \, \ud \mu(x) \leq
\int_{X} \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}
\, \ud \mu \enspace, \:\:\: \forall n \enspace,
\end{displaymath}
and hence
\begin{displaymath}
\sup_{i > n } \int_{X} h_{i}^{\alpha}(x) g_{i}(x) \, \ud \mu(x)
\leq \int_{X} \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}
\, \ud \mu \enspace, \:\:\: \forall n \enspace.
\end{displaymath}
Finally we have
\begin{equation}
\label{Equation:ME:InRenyisTheorem_LimSupInequality}
\limsup_{n \to \infty} \int_{X}
h_{n}^{\alpha}(x) g_{n}(x) \, \ud \mu(x) \leq \int_{X}
\frac{p(x)^{\alpha}}{r(x)^{\alpha-1}} \, \ud \mu \enspace.
\end{equation}
From (\ref{Equation:ME:InRenyisTheorem_LimInfInequality}) and
(\ref{Equation:ME:InRenyisTheorem_LimSupInequality}) we have
\begin{equation}
\label{Equation:ME:InRenyisTheorem_LimEquality}
\lim_{n \to \infty} \int_{X}
\frac{f_{n}(x)^{\alpha}}{g_{n}(x)^{\alpha-1}} \, \ud \mu(x) = \int_{X}
\frac{p(x)^{\alpha}}{r(x)^{\alpha-1}} \, \ud \mu \enspace,
\end{equation}
and hence (\ref{Equation:ME:InRenyisTheoremStatement_2}).
1551: \endproof
1552: %EndProof--------------
1553:
1554: %--------------------Sub Section-----------------------------
1555: \subsection{On ME of Measure-Theoretic definition of Tsallis entropy}
1556: \noindent
Although Shannon entropy cannot be
naturally extended to the non-discrete case, we have observed
that Shannon entropy in its general form on a measure space can
be used consistently for the ME-prescriptions. One can easily
see that the generalized information measures of R\'{e}nyi and Tsallis
cannot be extended naturally to the measure-theoretic case either,
i.e., the measure-theoretic definitions are not equivalent to the
discrete case in the sense that they cannot be defined as the
limit of a sequence of finite discrete entropies corresponding to
pmfs defined on measurable partitions that approximate the
pdf. One can use the same counterexample we discussed in
\S~\ref{SubSection:ME:DiscreteToContinuous}. We have already
given the ME-prescriptions of Tsallis entropy in the
measure-theoretic case. In this section, we show that the
ME-prescriptions in the measure-theoretic case are consistent
with the discrete case.
1573:
1574: Proceeding as in the case of measure-theoretic entropy in
1575: \S~\ref{SubSection:ME:MeasureTheoreticCasesinDiscrete},
1576: measure-theoretic Tsallis
1577: entropy $S_{q}(P)$~(\ref{Equation:ME:TsallisEntropyOf-PM}) in
1578: the discrete case can be written as
1579: \begin{equation}
1580: \label{Equation:ME:MeasureTheoreticTsallisEntropyInDiscreteForm}
1581: S_{q}(P) = \sum_{k=1}^{n} P_{k} \ln_{q} \frac{\mu_{k}}{P_{k}} \enspace.
1582: \end{equation}
1583: By (\ref{Equation:KN:PropertyOflnq(x/y)}) we get
1584: \begin{equation}
1585: \label{Equation:ME:MeasureTheoreticTsallisEntropyInDiscreteForm_1}
1586: S_{q}(P) = \sum_{k=1}^{n} P_{k}^{q} \left[ \ln_{q} \mu_{k} -
1587: \ln_{q} P_{k} \right] = S_{q}^{n}(P) + \sum_{k=1}^{n} P_{k}^{q}
1588: \ln_{q} \mu_{k} \enspace,
1589: \end{equation}
1590: where $S_{q}^{n}(P)$ is the Tsallis entropy in discrete case.
When $\mu$ is a uniform distribution, i.e., $\mu_{k} =
\frac{1}{n}\:\: \forall k = 1, \ldots, n$, we get
1593: \begin{equation}
1594: \label{Equation:ME:MeasureTheoreticTsallisEntropyInDiscreteForm_1}
1595: S_{q}(P) = S_{q}^{n}(P) - n^{q-1} \ln_{q} n \sum_{k=1}^{n}
1596: P_{k}^{q} \enspace.
1597: \end{equation}
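The two steps above can be checked directly. The following is a
sketch assuming the usual definition $\ln_{q} x = \frac{x^{1-q} -
1}{1-q}$, on which the property cited as
(\ref{Equation:KN:PropertyOflnq(x/y)}) is taken to rest. For $x, y >
0$,
\begin{displaymath}
\ln_{q} \frac{x}{y} = \frac{x^{1-q} y^{q-1} - 1}{1-q}
= y^{q-1} \, \frac{x^{1-q} - y^{1-q}}{1-q}
= y^{q-1} \left[ \ln_{q} x - \ln_{q} y \right] \enspace,
\end{displaymath}
so that $P_{k} \ln_{q} (\mu_{k}/P_{k}) = P_{k}^{q} \left[ \ln_{q}
\mu_{k} - \ln_{q} P_{k} \right]$. Setting $x = 1$ and $y = n$, and
using $\ln_{q} 1 = 0$, gives
\begin{displaymath}
\ln_{q} \frac{1}{n} = - n^{q-1} \ln_{q} n \enspace,
\end{displaymath}
which is the coefficient appearing in the uniform case.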
Now we show that the quantity $\sum_{k=1}^{n} P_{k}^{q}$ is
constant in the maximization of $S_{q}(P)$ subject to the
set of constraints
(\ref{Equation:ME:Normalized-q-ExpectationConstraints}).
1602:
1603: The claim is that
1604: \begin{equation}
1605: \label{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation}
1606: \int p(x)^{q}\, \ud \mu(x) = {(\overline{Z_{q}})}^{1-q} \enspace,
1607: \end{equation}
which holds for the Tsallis maximum entropy distribution
1609: (\ref{Equation:ME:TsallisMaximumEntropyDistribution_wrt_q-Expt})
1610: in general. This can be shown as follows. From
1611: the maximum entropy
1612: distribution~(\ref{Equation:ME:TsallisMaximumEntropyDistribution_wrt_q-Expt}),
1613: we have
1614: \begin{displaymath}
1615: p(x)^{1-q} = \frac{\displaystyle 1 - (1-q) {\left( \int_{X}
1616: {p(x)}^{q}\, \ud \mu(x) \right)}^{-1} \sum_{m=1}^{M}
1617: \beta_{m} \left( u_{m}(x) -
1618: {\langle\langle {u}_{m} \rangle\rangle}_{q} \right)}
1619: {\displaystyle ({\overline{Z_{q}}})^{1-q} } \enspace,
1620: \end{displaymath}
1621: which can be rearranged as
\begin{displaymath}
({\overline{Z_{q}}})^{1-q} p(x) = \left[ 1 - (1-q)
\frac{\sum_{m=1}^{M} \beta_{m} \left( u_{m}(x) -
{\langle\langle {u}_{m} \rangle\rangle}_{q} \right)}{\int_{X}
{p(x)}^{q} \, \ud \mu(x)} \right] p(x)^{q} \enspace.
\end{displaymath}
Integrating both sides of the above equation, and using the
normalization $\int_{X} p(x) \, \ud \mu(x) = 1$ together
with~(\ref{Equation:ME:Normalized-q-ExpectationConstraints}),
we get (\ref{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation}).
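In more detail, here is a sketch of this integration step. It writes
$N_{q} = \int_{X} p^{q} \, \ud \mu$ (a shorthand introduced here) and
assumes that the
constraints~(\ref{Equation:ME:Normalized-q-ExpectationConstraints})
state $\int_{X} u_{m} \, p^{q} \, \ud \mu / N_{q} = {\langle\langle
u_{m} \rangle\rangle}_{q}$ for $m = 1, \ldots, M$. Integrating the
rearranged identity over $X$ gives
\begin{displaymath}
({\overline{Z_{q}}})^{1-q} \int_{X} p \, \ud \mu = N_{q} - (1-q)
\sum_{m=1}^{M} \beta_{m} \left[ \frac{\int_{X} u_{m} \, p^{q} \,
\ud \mu}{N_{q}} - {\langle\langle u_{m} \rangle\rangle}_{q}
\right] \enspace.
\end{displaymath}
Each bracket vanishes by the constraints and $\int_{X} p \, \ud \mu
= 1$, which leaves $({\overline{Z_{q}}})^{1-q} = N_{q}$.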
1631:
1632: Now, (\ref{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation}) can
1633: be written in its discrete form as
1634: \begin{equation}
1635: \label{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation_Discrete_1}
1636: \sum_{k=1}^{n} \frac{P_{k}^{q}}{\mu_{k}^{q-1}} =
1637: {(\overline{Z_{q}})}^{1-q} \enspace.
1638: \end{equation}
When $\mu$ is the uniform distribution we get
\begin{equation}
\label{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation_Discrete_2}
\sum_{k=1}^{n} P_{k}^{q} = n^{1-q} {(\overline{Z_{q}})}^{1-q} \enspace,
\end{equation}
which is a constant.
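The discrete form can be obtained as follows. This is a sketch
assuming that on the $k$th cell of the partition the density takes the
constant value $P_{k}/\mu_{k}$:
\begin{displaymath}
\int_{X} p^{q} \, \ud \mu = \sum_{k=1}^{n} {\left(
\frac{P_{k}}{\mu_{k}} \right)}^{q} \mu_{k} = \sum_{k=1}^{n}
\frac{P_{k}^{q}}{\mu_{k}^{q-1}} \enspace.
\end{displaymath}
With $\mu_{k} = 1/n$ each denominator equals $n^{1-q}$, so the
left-hand side of
(\ref{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation_Discrete_1})
becomes $n^{q-1} \sum_{k=1}^{n} P_{k}^{q}$, and multiplying through by
$n^{1-q}$ gives
(\ref{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation_Discrete_2}).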
1645:
Hence by
(\ref{Equation:ME:MeasureTheoreticTsallisEntropyInDiscreteForm_1})
and
(\ref{Equation:ME:SumOfpPowerqs_ForNormalizedExpectation_Discrete_2}),
one can conclude that, with respect to a particular instance of
ME, the measure-theoretic Tsallis entropy $S_{q}(P)$ defined for a
probability measure $P$ on
the measure space $(X,\mathfrak{M},\mu)$ is equal to the
discrete Tsallis entropy up to an
additive constant, when the reference measure $\mu$ is chosen as a uniform
probability distribution. Thereby, one can further conclude
that, with respect to a particular instance of ME,
measure-theoretic Tsallis entropy is consistent with its
discrete definition.
1660:
1661: %=======================Section: Conclusition===================
1662: \section{Conclusions}
1663: \label{Section:Conclusions}
1664: \noindent
1665: In this paper we presented measure-theoretic definitions of
1666: generalized information measures. We proved that the measure-theoretic
1667: definitions of generalized relative-entropies, R\'{e}nyi and
1668: Tsallis, are natural extensions of their respective discrete
cases. We also showed that the ME-prescriptions of
measure-theoretic Tsallis entropy are consistent with the
discrete case.
1672:
1673:
1674: %========================Bibliography===================================
1675: \section*{References}
1676:
1677: \bibliographystyle{unsrt}
1678: \bibliography{papi}
1679:
1680:
1681: \end{document}
1682:
1683:
1684:
1685:
1686:
1687: