
Computational Statistics & Data Analysis 51 (2006) 86-108. www.elsevier.com/locate/csda

Practical representations of incomplete probabilistic knowledge

C. Baudrit (a,*), D. Dubois (b)

(a) Laboratoire Mathématiques et Applications, Physique Mathématique d'Orléans, Université d'Orléans, rue de Chartres BP 6759, 45067 Orléans cedex 2, France

(b) Institut de Recherche en Informatique de Toulouse, Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse Cedex 4, France

Available online 2 March 2006

Abstract

The compact representation of incomplete probabilistic knowledge which can be encountered in risk evaluation problems, for instance in environmental studies, is considered. Various kinds of knowledge are considered, such as expert opinions about characteristics of distributions or poor statistical information. The approach is based on probability families encoded by possibility distributions and belief functions. In each case, a technique for faithfully representing the available imprecise probabilistic information is proposed, using different uncertainty frameworks such as possibility theory, probability theory, and belief functions. Moreover, the use of probability-possibility transformations enables confidence intervals to be encompassed by cuts of possibility distributions, thus making the representation stronger. The respective appropriateness of pairs of cumulative distributions, continuous possibility distributions or discrete random sets for representing information about the mean value, the mode, the median and other fractiles of ill-known probability distributions is discussed in detail.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Imprecise probabilities; Possibility theory; Belief functions; Probability-boxes

1. Introduction

In risk analysis, uncertainties are often captured within a purely probabilistic framework. It suggests that all uncertainties, whether of a random or an epistemic nature, should be represented in the same way. Under this assumption, the uncertainty associated with each parameter of a mathematical model of some phenomenon can be described by a single probability distribution. According to the frequentist view, the occurrence of an event is a matter of chance. However, not all uncertainties are random, nor can all be objectively quantified, even if the choice of values for parameters is based as much as possible on on-site investigations. Due to time and financial constraints, information regarding model parameters is often incomplete. For example, it is quite common for a hydrogeologist to estimate the numerical values of aquifer parameters in the form of confidence intervals according to his/her experience and intuition (i.e. expert judgment). We are then faced with a problem of processing incomplete knowledge.

Overall, uncertainty regarding model parameters may have essentially two origins. It may arise from randomness due to natural variability of observations, resulting from heterogeneity (for instance, spatial heterogeneity) or the fluctuations of a quantity in time. Or it may be caused by imprecision due to a lack of information resulting, for example, from systematic measurement errors or expert opinions. As suggested by Ferson and Ginzburg (1996) and more recently

* Corresponding author. Tel.: +33 6 80149785; fax: +33 2 38417205.
E-mail addresses: baudrit@irit.fr (C. Baudrit), dubois@irit.fr (D. Dubois).

0167-9473/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.csda.2006.02.009


developed by Helton et al. (2004), distinct representation methods are needed to adequately tell random variability (often referred to as aleatory uncertainty) from imprecision (often referred to as epistemic uncertainty).

A long philosophical tradition in probability theory dating back to Laplace demands that uniform distributions should be used by default in the absence of specific information about frequencies of possible values. For instance, when an expert gives his/her opinion on a parameter by claiming "I only know that the value of x lies in an interval A", the uniform probability with support A is used. This view is also justified on the basis of the maximum entropy approach (Gzyl, 1995). However, this point of view can be challenged. Adopting a uniform probability distribution to express ignorance is questionable. This choice introduces information that in fact is not available and may seriously bias the outcome of risk analysis in a non-conservative manner (Ferson and Ginzburg, 1996). A more faithful representation of this knowledge on parameter x is to use the characteristic function π of the set A, such that π(x) = 1 if x ∈ A and 0 otherwise. This is because π is interpreted as a possibility distribution that encodes the family of all probability distribution functions with support in A (Dubois and Prade, 1992). Indeed, there exists an infinity of probability distributions with support in A, and the uniform distribution is just one among them.

In the context of risk evaluation, the knowledge really available on parameters is often vague or incomplete. This knowledge is not enough to isolate a single probability distribution in the domain of each parameter. When faced with this situation, representations of knowledge accepting such incompleteness look more in agreement with the available information. Of course, the Bayesian subjectivist approach maintains that only a standard probabilistic representation of uncertainty is rational, but this claim relies on a betting interpretation that enforces the use of a single probability distribution in the scope of decision-making, not with a view to faithfully report the epistemic state of an agent (see Dubois et al., 1996 for more discussion on this topic). In practice, while information regarding variability is best conveyed using probability distributions, information regarding imprecision is more faithfully conveyed using families of probability distributions (Walley, 1991). At the practical level, such families are most easily encoded either by probability boxes (pairs of upper and lower cumulative probability functions; Ferson et al., 2003), by possibility distributions (also called fuzzy intervals) (Dubois and Prade, 1988; Dubois et al., 2000), or yet by belief functions of Shafer (1976).

This article proposes practical representation methods for incomplete probabilistic information, based on formal links existing between possibility theory, imprecise probability and belief functions. These results can be applied for modelling inputs to uncertainty propagation algorithms. A preliminary draft of this paper is (Baudrit et al., 2004a).

In Section 2, we recall basics of probability boxes (upper and lower cumulative distribution functions), possibility distributions and belief functions. We also recall the links between these representations. All of them can encode families of probability functions. In Section 3, the expressive power of probability boxes and possibility distributions is compared. In Section 4, some results on the relation between prediction intervals and possibility theory are recalled. This allows a stronger form of encoding of a probability family by a possibility distribution, whereby the cuts of the latter enclose the prediction intervals of the probability functions. In Sections 5 and 6, we consider a non-exhaustive list of knowledge types that one may meet after an information collection step in problems like environmental risk evaluation. We especially focus on incomplete non-parametric models, for which only some characteristic values are known, such as the mode, the mean or the median and other fractiles of the distribution. For each case we propose an adapted representation in terms of p-boxes, belief functions (Section 5), and especially possibility distributions (Section 6).

2. Formal frameworks for representing imprecise probability

Consider a measurable space (Ω, A) where A is an algebra of measurable subsets of Ω. Let P be a set of probability measures on the referential (Ω, A). For all measurable A, we define:

its upper probability P̄(A) = sup_{P∈P} P(A)

and its lower probability P̲(A) = inf_{P∈P} P(A).


Such a family may be natural to consider if a probabilistic parametric model is used but the parameters, such as the mean value or the variance, are ill-known (for instance, they lie in an interval). It can also be obtained if the probabilistic model relies on imprecise (e.g. set-valued) statistics (Jaffray, 1992), or yet incomplete statistical information (only a set of conditional probabilities is available). In a subjectivist tradition, the lower probability P̲(A) can be interpreted as the maximal price one would be willing to pay for the gamble A, which pays 1 unit if event A occurs (and nothing otherwise) (Walley, 1991). Thus, P̲(A) is the maximal betting rate at which one would be disposed to bet on A. That means P̲(A) is a measure of evidence in favour of event A. The upper probability P̄(A) can be interpreted as the minimal selling price for the gamble A, or as one minus the maximal rate at which an agent would bet against A (Walley, 1991). That means P̄(A) measures the lack of evidence against A since we have:

P̄(A) = 1 − P̲(A^c).

It is clear that representing and reasoning with a family of probabilities may be very complex. In the following we consider three frameworks for representing special sets of probability functions, which are more convenient for practical handling. We review three modes of representation of uncertainty that can be cast in the imprecise probability model.
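As a small illustration of these two set functions (not from the paper; the three-element referential and the two candidate distributions are invented), the following Python sketch computes P̄(A) and P̲(A) over a finite family, and checks the duality P̄(A) = 1 − P̲(A^c):

```python
def upper_lower(family, event):
    """Upper prob = sup over the family, lower prob = inf over the family."""
    probs = [sum(p[x] for x in event) for p in family]
    return max(probs), min(probs)

family = [
    {1: 0.5, 2: 0.3, 3: 0.2},   # two candidate distributions on {1, 2, 3}
    {1: 0.2, 2: 0.5, 3: 0.3},
]
event = {1, 2}
hi, lo = upper_lower(family, event)
print(round(hi, 2), round(lo, 2))  # 0.8 0.7
```

Here the complement of {1, 2} is {3}, and one can verify that P̄({1, 2}) = 1 − P̲({3}), as in the duality above.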

2.1. Probability boxes

Let X be a random variable on (Ω, A). Recall that a cumulative distribution function is a non-decreasing function F : R → [0, 1] assigning to each x ∈ R the value P(X ∈ (−∞, x]). This function encodes all the information pertaining to a probability measure, and is often very useful in practice.

A natural model of an ill-known probability measure is thus obtained by considering a pair (F̲, F̄) of non-intersecting cumulative distribution functions, generalising an interval. The interval [F̲, F̄] is called a probability box (p-box) (Ferson et al., 2003). A p-box encodes the class of probability measures whose cumulative distribution functions F are restricted by the bounding pair of cumulative distribution functions F̲ and F̄ such that

F̲(x) ≤ F(x) ≤ F̄(x), ∀x ∈ R.

A p-box can be induced from a probability family P by

∀x ∈ R, F̄(x) = P̄((−∞, x])

and

∀x ∈ R, F̲(x) = P̲((−∞, x]).

Let P(F̲, F̄) = {P, ∀x ∈ R, F̲(x) ≤ F(x) ≤ F̄(x)} denote the probability family induced by the p-box.
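To make the p-box condition concrete, here is a minimal Python sketch (all functions and grid points are invented for illustration) that checks whether a candidate CDF stays inside a p-box on a finite grid:

```python
def in_pbox(F, Flow, Fup, grid):
    """Check Flow(x) <= F(x) <= Fup(x) on a finite grid of points."""
    return all(Flow(x) <= F(x) <= Fup(x) for x in grid)

# An assumed p-box of width 0.2 around the uniform CDF on [0, 1].
Flow = lambda x: max(0.0, min(1.0, x - 0.1))
Fup = lambda x: max(0.0, min(1.0, x + 0.1))
F = lambda x: max(0.0, min(1.0, x))  # the uniform CDF itself

grid = [i / 100 for i in range(101)]
print(in_pbox(F, Flow, Fup, grid))  # True
```

A CDF shifted by more than the box width (e.g. x + 0.2, clipped to [0, 1]) would fail the test at the left end of the grid.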


2.2. Possibility theory

A possibility distribution provides two evaluations of the likelihood that the value of a variable lies within a certain interval: the possibility Π and the necessity N. The normalized measure of possibility Π (respectively necessity N) is defined from the possibility distribution π : R → [0, 1], such that sup_{x∈R} π(x) = 1, as follows:

Π(A) = sup_{x∈A} π(x) (1)

and

N(A) = 1 − Π(A^c) = inf_{x∉A} (1 − π(x)). (2)

The possibility measure Π verifies:

∀A, B ⊆ R, Π(A ∪ B) = max(Π(A), Π(B)). (3)

The necessity measure N verifies:

∀A, B ⊆ R, N(A ∩ B) = min(N(A), N(B)). (4)

A possibility distribution π1 is more specific than another one π2 in the wide sense as soon as π1 ≤ π2, i.e. π1 is more informative than π2.

A unimodal numerical possibility distribution may also be viewed as a nested set of confidence intervals, which are the α-cuts [x_α, x̄_α] = {x, π(x) ≥ α} of π. The degree of certainty that [x_α, x̄_α] contains X is N([x_α, x̄_α]) (= 1 − α if π is continuous). Conversely, a nested set of intervals A_i with degrees of certainty λ_i that A_i contains X is equivalent to the possibility distribution

π(x) = min_{i=1...n} {1 − λ_i, x ∉ A_i},

provided that λ_i is interpreted as a lower bound on N(A_i), and π is chosen as the least specific possibility distribution satisfying these inequalities (Dubois and Prade, 1992).
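The construction of this least specific possibility distribution from nested intervals can be sketched in a few lines of Python (the intervals and certainty degrees are made-up numbers):

```python
def possibility(x, intervals):
    """pi(x) = min over the intervals A_i NOT containing x of 1 - lam_i."""
    out = [1 - lam for (lo, hi), lam in intervals if not (lo <= x <= hi)]
    return min(out, default=1.0)

# Nested intervals with increasing certainty degrees lam_i <= N(A_i).
intervals = [((4, 6), 0.2), ((3, 7), 0.5), ((1, 9), 0.9)]

print(possibility(5, intervals))             # 1.0 (inside every interval)
print(possibility(6.5, intervals))           # 0.8 (outside [4, 6] only)
print(round(possibility(10, intervals), 2))  # 0.1 (outside all three)
```

A point inside all intervals gets possibility 1, and the possibility drops as the point falls outside more and more certain intervals.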

We can interpret any pair of dual necessity/possibility functions [N, Π] as upper and lower probabilities induced from specific probability families.

Let π be a possibility distribution inducing a pair of functions [N, Π]. We define the probability family P(π) = {P, ∀A measurable, N(A) ≤ P(A)} = {P, ∀A measurable, P(A) ≤ Π(A)}. In this case, sup_{P∈P(π)} P(A) = Π(A) and inf_{P∈P(π)} P(A) = N(A) hold (see De Cooman and Aeyels, 1999; Dubois and Prade, 1992). In other words, the family P(π) is entirely determined by the probability intervals it generates.

Suppose pairs (interval A_i, necessity weight λ_i) supplied by an expert are interpreted as stating that the probability P(A_i) is at least equal to λ_i, where A_i is a measurable set. We define the probability family as follows: P(π) = {P, ∀A_i, λ_i ≤ P(A_i)}. We thus know that P̄ = Π and P̲ = N (see Dubois and Prade, 1992, and in the infinite case De Cooman and Aeyels, 1999).

2.3. Imprecise probability induced by random intervals

The theory of imprecise probabilities introduced by Dempster (1967) (and elaborated further by Shafer, 1976 and Smets and Kennes, 1994 in a different context) allows imprecision and variability to be treated separately within a single framework. Indeed, it provides mathematical tools to process information which is at the same time of random and imprecise nature. Contrary to probability theory, which in the finite case assigns probability weights to atoms (elements of the referential), in this approach we may assign such weights to any subset, called a focal set, with the understanding that portions of this weight may move freely from one element to another within a focal set. We typically find this kind of knowledge when some measurement device is tainted with limited perception capabilities and a random error due to the variability of a phenomenon. We may obtain a sample of random intervals of the form ([m_i − ε, m_i + ε])_{i=1...K} supposedly containing the true value, where ε is a perception threshold, m_i is a measured value and K is the number of interval observations. Each interval is attached a probability ν_i of observing the measured value m_i. That is, we obtain a mass distribution (ν_i)_{i=1...K} on intervals, thus defining a random interval. The probability mass ν_i can be freely re-allocated to points within the interval [m_i − ε, m_i + ε]. However, there is not enough information to do so.

Like possibility theory, this theory provides two indicators, called plausibility Pl and belief Bel by Shafer (1976).

They qualify the validity of a proposition stating that the value of variable X should lie within a set A (a certain interval for example). Plausibility Pl and belief Bel measures are defined from the mass distribution ν assigning positive weights to a finite set F of measurable subsets of Ω:

ν : F → [0, 1] such that Σ_{E∈F} ν(E) = 1, (5)

as follows:

Bel(A) = Σ_{E, E⊆A} ν(E) (6)

and

Pl(A) = Σ_{E, E∩A≠∅} ν(E) = 1 − Bel(A^c), (7)

where E ∈ F is called a focal element. Bel(A) gathers the imprecise evidence that asserts A; Pl(A) gathers the imprecise evidence that does not contradict A.

A mass distribution ν may encode the probability family P(ν) = {P, ∀A measurable, Bel(A) ≤ P(A)} = {P, ∀A measurable, P(A) ≤ Pl(A)} (Dempster, 1967). In this case we have P̄ = Pl and P̲ = Bel, so that

∀P ∈ P(ν), Bel ≤ P ≤ Pl. (8)

This view of belief functions is at odds with the theory of evidence of Shafer and the transferable belief model of Smets, who never refer to an imprecisely located probability distribution. Originally, Dempster (1967) considered imprecise probabilities induced from a probability space via a set-valued mapping. In this scope, Bel(A) is the minimal amount of probability that must be assigned to A by sharing the probability weights defined by the mass function among single values in the focal sets. Pl(A) is the maximal amount of probability that can be likewise assigned to A. We may define an upper F̄ and a lower F̲ cumulative distribution function (a particular p-box) such that ∀x ∈ R, F̲(x) ≤ F(x) ≤ F̄(x), with

F̄(x) = Pl(X ∈ (−∞, x]) (9)

and

F̲(x) = Bel(X ∈ (−∞, x]). (10)

But this p-box contains many more probability functions than P(ν). The setting of belief and plausibility functions encompasses possibility and probability theories, at least in the finite case:

- When focal elements are nested, a belief measure Bel is a necessity measure, that is Bel = N, and a plausibility measure Pl is a possibility measure, that is Pl = Π.

- When focal elements are disjoint intervals, plausibility Pl and belief Bel measures are both probability measures, that is, we have Bel = P = Pl, for unions of such intervals.

Thus, all discrete probability distributions and possibility distributions may be interpreted by mass functions. However,continuous belief functions have not received much attention so far (except in the scope of random sets). On this topic,see the recent paper by Smets (2005).

The above notions offer a common framework to treat information of imprecise and random nature. However, an obvious question is how to compare the expressivity of p-boxes, possibility distributions and belief functions. As we

C. Baudrit, D. Dubois / Computational Statistics & Data Analysis 51 (2006) 86108 91

shall see, a p-box generally contains less information than the belief function or the possibility measure from which this p-box is derived. Possibility measures also offer the capability of approximating confidence intervals. A representation using belief functions is potentially more complex than the two other representation modes because a mass function must be specified for all subsets. However, using only a few focal subsets may be enough in practice. In the next section, we focus on the respective expressive power of p-boxes and possibility measures.

3. Comparative expressivity of probability boxes and possibility distributions

Consider a unimodal continuous possibility distribution π with core {a} (i.e. Π({a}) = π(a) = 1, and π(x) < 1 for x ≠ a). We assume π unimodal for simplicity. Results in this section readily adapt to the case when the core of π is of the form [a, b]. The set of probability measures induced by π, that is, P(π), can be more conveniently described by a condition on the cumulative distribution functions of these probabilities (as first pointed out by Dubois and Prade, 1987):

Theorem 1. Let π be a unimodal continuous possibility distribution with core {a}. Then

P(π) = {P, ∀x, ∀y, x ≤ a ≤ y, F(x) + 1 − F(y) ≤ max(π(x), π(y))}.

Proof. See Appendix A.

Note that we can choose x and y such that π(x) = π(y) in the expression of P(π), i.e. suppose that [x, y] is a cut of π. If I_α is the α-cut of π, it holds that P(π) = {P, P(I_α) ≥ N(I_α), ∀α ∈ (0, 1]}. Thus, by putting, for x ≤ a, f(x) = sup{y, π(y) ≥ π(x)}, we can prove that (Dubois et al., 2004)

P(π) = {P, ∀x ≤ a, F(x) + 1 − F(f(x)) ≤ π(x)}.

Define a particular probability box [F̲, F̄] such that

F̄(x) = Π(X ∈ (−∞, x]) (11)

and

F̲(x) = N(X ∈ (−∞, x]). (12)

It is clear that F̄(x) = π(x) for all x such that F̄(x) < 1, and F̲(x) = 1 − π(x) for all x such that F̲(x) > 0. Define

π+(x) = π(x) for x ≤ a, and 1 for x ≥ a,
π−(x) = π(x) for x ≥ a, and 1 for x ≤ a;

the functions π+ and 1 − π− can be equated to the cumulative distribution functions F̄ and F̲. The probability box (F̄, F̲) = (π+, 1 − π−) has an important specific feature: there exists a real value a such that F̄(a) = 1 and F̲(a) = 0. It means that the p-box contains the deterministic value a, so that the two cumulative distributions are acting in disjoint areas of the real line separated by this value. We can retrieve a possibility distribution from two such cumulative distribution functions as π = min(F̄, 1 − F̲), and thus retrieve the possibility distribution that generated the p-box. However, it is clear that this process applied to an arbitrary p-box does not yield a normalized possibility distribution when the cumulative distributions are too close. A probability box can be a precise tool for approximating a probability distribution in the latter case, but it then forbids the case where the modelled unknown quantity may be deterministic.
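The round trip from a possibility distribution to the p-box (π+, 1 − π−) and back via π = min(F̄, 1 − F̲) can be checked numerically. Below is a Python sketch for a triangular possibility distribution; the support [1, 17] and core {10} echo the example of Fig. 2, but the code itself is an editorial illustration, not from the paper:

```python
lo, a, hi = 1.0, 10.0, 17.0  # support [1, 17], core {10}

def pi(x):
    """Triangular possibility distribution."""
    if x <= lo or x >= hi:
        return 0.0
    return (x - lo) / (a - lo) if x <= a else (hi - x) / (hi - a)

def Fup(x):   # pi_plus: upper cumulative bound
    return pi(x) if x <= a else 1.0

def Flow(x):  # 1 - pi_minus: lower cumulative bound
    return 0.0 if x <= a else 1.0 - pi(x)

recovered = lambda x: min(Fup(x), 1.0 - Flow(x))  # pi = min(Fup, 1 - Flow)
print(all(abs(pi(x) - recovered(x)) < 1e-12
          for x in [2.0, 5.0, 10.0, 13.0, 16.0]))  # True
```

Note that Fup(a) = 1 and Flow(a) = 0, which is exactly the "deterministic value" feature discussed above.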

Moreover, the two sets of probability functions P(π) and P(F̲, F̄) do not coincide: the p-box generally contains strictly more probability functions.


Fig. 1. Probabilities in P(F̲, F̄).


Fig. 2. Example of probability families included in P(π) induced by the triangular possibility distribution of core {10} and support [1, 17].

Conversely, if F is known on [a, +∞), an upper bound F* can be found on (−∞, a] such that P ∈ P(π) if and only if F(x) ≤ F*(x) = min(F(a), π(x) − 1 + F(f(x))).

It is clear that the sets

{P | F(x) ≤ F̄(x) ∀x ≤ a and F(x) = 1 ∀x ≥ a} and {P | F(x) ≥ F̲(x) ∀x ≥ a and F(x) = 0 ∀x ≤ a}

are included in P(π). They correspond to probability densities with support [min, a] or limited by [a, max], whose mode is not the mode of π.

Conversely, suppose F


Note that the cumulative distributions describing any p-box can be generated by a belief function, contrary to thecase of possibility distributions (see Ferson et al., 2003). Recent results (Kriegler and Held, 2005) suggest that the setof probabilities covered by a p-box can always be modelled by a belief function.

4. Approximating probability families by possibility distributions

Let p be a unimodal probability density function. Denote by M the mode of p. Let P be the probability measure associated with p. So far we considered possibility distributions π which verify the following condition (dubbed dominance condition):

P(A) ≤ Π(A) for all measurable events A.

We say that the possibility measure Π dominates the probability measure P, and it means P ∈ P(π). More generally, π dominates a probability family P if and only if P ⊆ P(π). It holds that P(max(π1, π2)) is the convex closure of P(π1) ∪ P(π2), since the possibility distribution max(π1, π2) generates the possibility measure max(Π1, Π2). On the contrary, P(min(π1, π2)) ≠ P(π1) ∩ P(π2) in general, because min(Π1, Π2) is not a possibility measure. So, if a probability family P is dominated by two possibility distributions π1, π2, one cannot deduce that P is dominated by min(π1, π2), even if min(π1, π2) is normalized.

An approximate (covering) possibilistic representation of a given family P is any π such that P ⊆ P(π). Clearly it means that π dominates all probability functions in P. Ideally, π should be such that P(π) is as small as possible. However, such optimal covering approximations of probability families are not unique (see Dubois and Prade, 1990). Nevertheless, in the remainder of the paper we shall lay bare various informative approximate covering possibilistic representations of probability families induced by incomplete probabilistic data.

However, as previously seen, a possibility measure also encodes a set of nested confidence intervals provided by an expert. A possibility measure such that P ∈ P(π) can be constructed as follows (Dubois et al., 1993, 2004): let J_λ = [x(λ), y(λ)], for λ ∈ [0, 1], be a continuous nested interval family such that J_λ ⊆ J_λ' if λ ≤ λ', J_0 = {x_0} ⊆ supp(p) and J_1 = supp(p), where supp(p) is the support of a unimodal probability density p. Then, the possibility distribution π given by

π(x(λ)) = π(y(λ)) = 1 − P(J_λ) (13)

dominates p; that is, p ∈ P(π) (or P ≤ Π). If we choose the intervals as the density cuts

J = {x, p(x) ≥ θ}, θ ∈ [0, sup(p)], (14)

each J is of the form [x, f(x)], where f(x) = max{y, p(y) ≥ p(x)}. Then J is the narrowest prediction interval of probability P(J), x_0 is the mode M of p, and J is also the most probable interval of its length (Dubois et al., 2000).

Hence, if we choose the intervals as in (14) and π as in (13), we obtain a possibility distribution π_p such that P ≤ Π (dominance condition) and the α-cut of π_p is the narrowest prediction interval of p of confidence level 1 − α (prediction interval condition). Such a π_p is called the optimal transform of p:

π_p(x) = 1 − P({y, p(y) ≥ p(x)}) = F(x) + 1 − F(f(x)),

and π_p(M) = 1. This transformation is optimal in the sense that it provides the most specific possibility distribution among those that dominate p and preserve the order induced by p on the support interval.

It is clear that the function π_p is a kind of cumulative distribution. More precisely, given any total ordering ⪰ of values on the real line and any value x, let A_⪰(x) = {y, x ⪰ y}, and assume A_⪰(x) is measurable for all x. The function F_⪰(x) = P(A_⪰(x)) is the cumulative function according to the order relation ⪰. If ⪰ is ≥, the usual ordering on the real line, then F_⪰ = F. Now, choosing the ordering induced by the density p, that is, x ⪰_p y if and only if p(x) > p(y), then F_⪰p = π_p.
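As an editorial illustration (not from the paper), the optimal transform π_p(x) = F(x) + 1 − F(f(x)) can be computed in closed form for the symmetric triangular density on [0, 2] with mode 1, where the mirror map f(x) = 2 − x yields π_p(x) = x² on [0, 1]:

```python
def F(x):
    """CDF of the symmetric triangular density p(t) = 1 - |1 - t| on [0, 2]."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return x * x / 2
    if x <= 2:
        return 1 - (2 - x) ** 2 / 2
    return 1.0

def pi_p(x):
    """Optimal transform pi_p(x) = F(x) + 1 - F(f(x)); here f(x) = 2 - x."""
    xl, xr = min(x, 2 - x), max(x, 2 - x)
    return F(xl) + 1 - F(xr)

print(pi_p(0.5))  # 0.25 (closed form: pi_p(x) = x**2 on [0, 1])
print(pi_p(1.0))  # 1.0 (the mode gets possibility 1)
```

The resulting transform is convex on each side of the mode, in agreement with the symmetric-density result of Dubois et al. (2004) cited below.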

Computing π_p is not so obvious in general, but the case of symmetric densities has been considered in Dubois et al. (2004), where it is shown that π_p is convex on each side of the mode of p. This result no longer holds in the general case, but we can approximate any unimodal density p by means of a piecewise linear function. Then we can easily show the following result:

Theorem 4. Let p be a unimodal continuous probability density function of mode M and of bounded support supp(p). If p is (piecewise) linear, then its optimal transform π_p is piecewise convex.

Proof. See Appendix B.

Using the idea of narrowest prediction intervals described above, it is also interesting to characterize approximate covering possibilistic representations of probability families P that account for such prediction intervals. A possibility distribution π is said to strongly dominate a probability measure P with density p if every cut J = {x; p(x) ≥ β} satisfies J ⊆ {x, π(x) ≥ α} for α = 1 − P(J) (dubbed prediction interval condition). Given a probability family P, one may try to find the most specific possibility distribution π_P that strongly dominates all P ∈ P. Such a possibility distribution is π_P = sup_{p∈P} π_p. π_P has the peculiarity that any of its α-cuts contains the (1 − α)-prediction interval of any p ∈ P. Note that this approach is enabled by the property P(π1) ∪ P(π2) ⊆ P(max(π1, π2)).

5. Simple models of incomplete probabilistic knowledge using p-boxes and belief functions

The preceding results can be applied to the definition of faithful representations of poor probabilistic knowledge by means of probability boxes, belief functions and especially possibility distributions. The extreme case is when an expert provides an interval containing the unknown value. Generally, there is a little more information than a simple interval: an expert may have an idea of typical values in the interval: the median, the mean, the mode. Additional information on a distribution may be the knowledge of appropriate fractiles and confidence intervals. These pieces of information define constraints restricting a probability family. The problem is whether such a family can be simply described or approximated by means of the simple tools described in the previous sections. These representation techniques suggest that simple non-parametric representations of available uncertain knowledge, where incompleteness and variability receive specific treatments, are feasible in the scope of further uncertainty propagation steps in risk analysis problems. This section recalls representation methods proposed by Ferson using p-boxes, when the mean value of a density is known, and for the modelling of a small set of precise observations by means of imprecise probabilities. Moreover, belief functions can be directly used for exploiting the knowledge of fractiles.

5.1. Representations by probability boxes

As discussed earlier, probability boxes generalise the idea of interval from a pair of points to a pair of cumulative distribution functions. They are a very natural way of extending the notion of interval. They are especially informative when the two cumulative distributions are close to each other. They come up as a natural choice for parametric models with imprecise parameters. For instance, a Gaussian model where the mean value and/or the variance is known to lie in a prescribed interval may naturally yield a narrow p-box (even if the latter contains non-Gaussian distributions). However, we do not deal with parametric models here. The p-box model has been especially investigated by Ferson et al. (2003). We recall their proposals for representing distributions with fixed mean value as well as for using the Kolmogorov-Smirnov confidence limits in order to derive a p-box from small data samples.

5.1.1. Probability distributions with known mean and support

Suppose an expert supplies the mean value μ and the support I = [b, c]. Let P_I^mean denote the set of probabilities with support I and prescribed mean equal to μ. Ferson et al. (2003) propose to represent this knowledge by a probability box [F̲, F̄]. To obtain it, they separately solve two problems for each value x as follows: F̄(x) = sup_{F: E(X)=μ} F(x) and F̲(x) = inf_{F: E(X)=μ} F(x) (the unknown is F). Using the characteristic property of the mean,

∫_inf(I)^μ F(y) dy = ∫_μ^sup(I) (1 − F(y)) dy,


Fig. 3. Probability box built from X ∈ [2, 7] and E(X) = 4.

one obtains the following results:

F̲(x) = (x − μ)/(x − b) for x ∈ [μ, c], and 0 for x ∈ [b, μ];

F̄(x) = 1 for x ∈ [μ, c], and (c − μ)/(c − x) for x ∈ [b, μ].

The probability box [F̲, F̄] (see Fig. 3 for an example) defines a probability family P(F̲, F̄) which contains P_I^mean. It could be tempting to use the probability family induced by the possibility distribution π such that π(x) = (c − μ)/(c − x) for x ∈ [b, μ] and π(x) = 1 − (x − μ)/(x − b) for x ∈ [μ, c]. But, as expected from the previous sections, the inclusion P_I^mean ⊆ P(π) does not hold. With [b, c] = [2, 7] and μ = 4, the probability P defined by P(X = 2) = 3/5 and P(X = 7) = 2/5 is enough to show that we do not have the inclusion. Indeed, we do have E(X) = 4, but P(X = 2 or X = 7) = 1 while Π(X = 2 or X = 7) = 0.6, which contradicts P ≤ Π. As pointed out earlier, the probability family P(π) induced by the possibility distribution defined by π+(x) = min(1, 2F̄(x)) and π−(y) = min(1, 2(1 − F̲(y))) (see Fig. 4) contains P_I^mean and P(F̲, F̄). However, it is clear that this p-box is poorly informative, and that the covering possibility distribution is even more so. In fact, the mean value does not seem to bring much information on the distribution, and the problem of finding a better, tighter representation of this kind of information remains open. Moreover, while the average value is very easy and often natural to compute from statistical data, it is not clear that this value is cognitively plausible; that is, one may doubt that a single representative value of an ill-known quantity provided by an expert refers to the mean value. For instance, while some quantities like average income can easily be figured out, the average human size sounds like a very artificial notion and would not be directly perceived by individuals.
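The mean-support p-box and the two-point counterexample can be checked numerically. The Python sketch below (an editorial illustration using the example values [b, c] = [2, 7] and μ = 4) verifies that the discrete counterexample lies inside the p-box:

```python
b, c, mu = 2.0, 7.0, 4.0

def Flow(x):
    """Lower CDF: 0 below the mean, (x - mu)/(x - b) above."""
    return 0.0 if x <= mu else (x - mu) / (x - b)

def Fup(x):
    """Upper CDF: (c - mu)/(c - x) below the mean, 1 above."""
    return 1.0 if x >= mu else (c - mu) / (c - x)

def F(x):
    """CDF of the two-point counterexample P(X=2) = 0.6, P(X=7) = 0.4."""
    return 0.0 if x < 2 else (0.6 if x < 7 else 1.0)

grid = [2.0, 3.0, 4.0, 5.0, 6.0, 6.99]
print(all(Flow(x) <= F(x) <= Fup(x) for x in grid))  # True
```

The counterexample has mean 0.6 × 2 + 0.4 × 7 = 4, so it belongs to the p-box family even though, as shown above, it escapes the tempting possibilistic encoding.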

5.1.2. Representing small data samples by a p-box

When the available knowledge is just a small data sample (x_1, ..., x_n) coming from an unknown cumulative distribution function F, Ferson et al. (2003) define a probability box [F̲, F̄] by using Kolmogorov-Smirnov confidence limits (noted K.S.) (Feller, 1948; Miller, 1956). These confidence limits are distribution-free bounds about the sample


Fig. 4. Possibility distribution containing the probability box [F̲, F̄].

empirical cumulative distribution function F_n, where n is the size of the sample. Denoting by x_(1) ≤ x_(2) ≤ ... ≤ x_(n) the ordered sample, we can define F_n as follows:

F_n(x) = 0 for x < x_(1), F_n(x) = i/n for x_(i) ≤ x < x_(i+1), and F_n(x) = 1 for x ≥ x_(n).
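A distribution-free band around F_n can be sketched in Python as follows. Note the hedge: instead of the tabulated Kolmogorov-Smirnov limits used by Ferson et al., this illustration uses the simpler Dvoretzky-Kiefer-Wolfowitz bound eps = sqrt(ln(2/alpha)/(2n)), and the sample values are invented:

```python
import math

sample = sorted([3.1, 4.7, 5.0, 6.2, 8.9, 9.4, 10.1, 11.3, 12.8, 14.0])
n = len(sample)
alpha = 0.05
# DKW inequality: P(sup |Fn - F| > eps) <= 2 exp(-2 n eps^2).
eps = math.sqrt(math.log(2 / alpha) / (2 * n))

def Fn(x):
    """Empirical CDF of the sample."""
    return sum(1 for v in sample if v <= x) / n

Flow = lambda x: max(0.0, Fn(x) - eps)
Fup = lambda x: min(1.0, Fn(x) + eps)

print(round(eps, 3))  # about 0.429 for n = 10, alpha = 0.05
print(Fn(9.0))        # 0.5
```

With only ten observations the band is very wide, which is consistent with the caption of Fig. 5: small samples yield poorly informative p-boxes.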


Fig. 5. Kolmogorov-Smirnov confidence limits (gray) about an empirical cumulative distribution function (black), assuming a sample size of 10.

The obtained p-box cannot be derived from a possibility distribution, as it generally does not include the step-functioncorresponding to a deterministic value.

5.2. Discrete belief functions representations

Very naturally, the representation of a family of probabilities by means of a belief function (a discrete random set) is appropriate if the probability of prescribed events is known. This is typically the case when only the median m of a distribution is known. The meaning of the median is: P(X ≤ m) = 0.5. Let P_I^med be the set of probability functions with support I = [b, c] and with median m. This knowledge can be exactly represented by a mass function ν_m such that ν_m([b, m]) = ν_m((m, c]) = 0.5. The belief function Bel_m, deduced from ν_m, encodes all probabilities with median m, i.e., P_I^med = {P, ∀C, Bel_m(C) ≤ P(C)}.

This representation naturally extends to the case when some fractiles and the support I of the unknown probability distribution function are known. Suppose an expert supplies fractiles, say x1, x2 and x3, at 5%, 50% and 95%. Denote P_I^{x1,x2,x3} the set of probability distribution functions of support I = [b, c] and of fractiles x1, x2, x3. We can represent this knowledge in an exact way using a belief function with the following obvious mass function ν_frac: ν_frac([b, x1]) = 0.05, ν_frac(]x1, x2]) = 0.45, ν_frac(]x2, x3]) = 0.45 and ν_frac(]x3, c]) = 0.05. The belief function Bel_frac, deduced from ν_frac, is dominated by all probabilities with fractiles x1, x2 and x3, i.e., P_I^{x1,x2,x3} = {P, ∀C, Bel_frac(C) ≤ P(C)}.
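This mass function is easy to manipulate directly. The following sketch (borrowing the fractile values 1, 5 and 9 on [0, 10] from Fig. 8 for illustration, and treating focal elements as closed intervals for simplicity) computes belief and plausibility degrees as sums of masses of included, respectively intersecting, focal intervals:

```python
# Mass function on a partition of the support encoding fractiles at 5%, 50%, 95%.
b, x1, x2, x3, c = 0.0, 1.0, 5.0, 9.0, 10.0   # values of Fig. 8
focal = {(b, x1): 0.05, (x1, x2): 0.45, (x2, x3): 0.45, (x3, c): 0.05}

def bel(lo, hi):
    """Belief of [lo, hi]: total mass of focal intervals included in it."""
    return sum(m for (a, e), m in focal.items() if lo <= a and e <= hi)

def pl(lo, hi):
    """Plausibility of [lo, hi]: total mass of focal intervals meeting it."""
    return sum(m for (a, e), m in focal.items() if e > lo and a < hi)

assert abs(sum(focal.values()) - 1.0) < 1e-9
assert abs(bel(b, x2) - 0.5) < 1e-9   # P(X <= median) is pinned to 0.5
assert abs(bel(x1, x3) - 0.9) < 1e-9  # mass between the 5% and 95% fractiles
```

Every probability with these fractiles satisfies Bel(C) ≤ P(C) ≤ Pl(C) for each event C.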

Note that the mass function induced by fractiles bears on a partition of the support. On the contrary, if an expert provides a confidence interval, x ∈ A ⊆ R, with a certainty degree λ, the most cautious interpretation corresponds to the inequality P(A) ≥ λ. The corresponding mass function assigns λ to A and 1 − λ to the real line itself. This is called a simple support function by Shafer. Note that the two focal elements are nested. The knowledge of a confidence interval with confidence λ is less precise than a fractile: if A = [x1, x3] with confidence at least λ, we cannot deduce the probability degrees associated to the intervals (−∞, x1] and [x3, +∞), except if we assume the symmetry of the underlying density.

A confidence interval can be represented by the possibility distribution (Dubois and Prade, 1992)

∀x ∈ R, π(x) = 1 if x ∈ A, and π(x) = 1 − λ if x ∉ A,

where π encodes the probability family P(π) = {P, P(A) ≥ λ}. When A is large enough, but, in practice, bounded, the level of confidence λ is 1. This representation extends to the case when several nested confidence intervals A1 ⊆ A2 ⊆ · · · ⊆ Ak are obtained for several confidence levels λ1 < λ2 < · · · < λk, as suggested previously. The corresponding mass assignment is ν(Ai) = λi − λi−1, assuming λ0 = 0. It yields a discrete possibility distribution. The next section considers the case of continuous possibility distributions.


Fig. 6. Optimal possibility distribution π knowing μ = 2 and σ = 1, using the Bienaymé–Chebychev and Camp–Meidel inequalities.

6. Representations by continuous possibility distributions

The use of continuous possibility distributions for representing probability families heavily relies on probabilistic inequalities. Such inequalities provide probability bounds for intervals forming a continuous nested family around a typical value. This nestedness property leads to interpreting the corresponding family as being induced by a possibility measure. While these bounds are often used for proving convergence properties, we propose here to use them for representing knowledge. This is the case of the Chebyshev inequality, for instance. The classical Chebyshev inequality (Kendall and Stuart, 1977) defines a bracketing approximation on the confidence intervals around the known mean μ of a random variable X, knowing its standard deviation σ. The Chebyshev inequality can be written as follows:

P(|X − μ| ≤ kσ) ≥ 1 − 1/k² for k ≥ 1.

Referring to Section 2.2, the Chebyshev inequality allows a possibility distribution π to be defined by considering the intervals [μ − kσ, μ + kσ] as cuts of π and letting π(μ − kσ) = π(μ + kσ) = 1/k² (see Fig. 6). This possibility distribution (see Dubois et al., 2004) defines a probability family P(π) containing all distributions with known mean μ and standard deviation σ, whether the unknown probability distribution function is symmetric or not, unimodal or not. If it is moreover assumed that the unknown probability distribution is unimodal and symmetric, we can improve the possibility distribution by means of the Camp–Meidel inequality (Kendall and Stuart, 1977) (see Fig. 6):

P(|X − μ| ≤ kσ) ≥ 1 − 4/(9k²) for k ≥ 2/3.

Very often, and as seen above, the nested intervals share the same midpoints, thus yielding symmetric possibility distributions. In the following we do not make this restriction. On the contrary, we shall also rely on the narrowest intervals of fixed confidence levels, as introduced earlier in this paper. This leads to exploiting information on the mode of distributions rather than the mean. Moreover, we make the additional assumption that the distributions have a bounded support. Some assumptions can be made on the shape of the density (without going to the point of choosing a particular mathematical model like a Gaussian): symmetry, convexity or concavity can be assumed, for instance.

6.1. Distributions with known mode and support: simple dominance

Suppose the mode M and the support I of the unknown probability distribution function p are supplied by an expert. In this section unimodality of distributions is assumed. One might argue that the mode best corresponds to the notion of usual value, as being the most frequently observed value. Even if the mode is known to be difficult to extract from a sample of statistical data, one may consider that the most frequent value (or a most frequent small range of values) is the natural feature extracted from repeated observations by humans. So the problem of representing this kind of knowledge looks natural. We can take advantage of the fact that the cumulative distribution function F, associated to a


unimodal (asymmetric) probability distribution function p with mode M and bounded support I, satisfies the following properties:

F is convex on [inf(I), M], since p increases on [inf(I), M].
F is concave on [M, sup(I)], since p decreases on [M, sup(I)].

Thus, the concavity of F changes at M. Let P_I^M be the set of probabilities with support I = [b, c] and with mode M. Ferson (in Ferson et al., to appear) proposes to represent this knowledge by the probability box [F̲_L, F̄_L] such that

F̄_L(x) = (x − b)/(M − b) for x ∈ [b, M], and 1 otherwise;
F̲_L(x) = (x − M)/(c − M) for x ∈ [M, c], and 0 otherwise.

Indeed it is obvious that any probability distribution with mode M and support I is such that F̄_L ≥ F ≥ F̲_L.

Theorem 5. The triangular possibility distribution π_L = min(F̄_L, 1 − F̲_L), with support [b, c] and core {M}, dominates all probabilities lying in P_I^M.

Proof. Consider the nested family of intervals [x, y] such that

(x − b)/(M − b) = (c − y)/(c − M).

They are cuts of the triangular possibility distribution π_L. Define the cumulative distribution F^L as follows: F^L(x) = F(M)(x − b)/(M − b) for x ≤ M, and F^L(x) = F(M) + (x − M)(1 − F(M))/(c − M) for x ≥ M. Due to the convexity of any F before the mode, and its concavity after the mode, it is clear that F^L(x) ≥ F(x) for x ≤ M, and F^L(x) ≤ F(x) for x ≥ M. Using (13) in Section 4, it is clear that

∀(P, x) ∈ P_I^M × [b, M],
P([x, y]^c) = F(x) + 1 − F(y) ≤ F^L(x) + 1 − F^L(y)
= F(M)(x − b)/(M − b) + 1 − (F(M) + (1 − F(M))(y − M)/(c − M))
= (x − b)/(M − b) = π_L(x).

So it holds that Π_L(A) ≥ P(A), ∀A, ∀P ∈ P_I^M. □

Clearly this result corresponds to a Chebyshev-like probabilistic inequality built from the cuts of π_L. The triangular possibility distribution π_L of mode M is thus a more precise representation than the p-box [F̲_L, F̄_L]. Namely, the probability family P(π_L) is a better approximation of P_I^M than the probability box [F̲_L, F̄_L] proposed by Ferson. Note that the assumption of bounded support is crucial in getting this piecewise linear representation. Moreover it is noticeable that this distribution does not depend on the value F(M).

Suppose now this value is known. Let P_I^{M,F(M)} be the set of probabilities with support I = [b, c], with mode M and value F(M) at M. The latter information can be modelled by a belief function (see Section 2.3), but we may wish to preserve a possibilistic representation and alter its shape so as to account for this fractile, still ensuring the dominance condition. Assume F(M) ≤ 0.5. We choose nested intervals J_x = [x, F^{−1}(1 − F(x))] around the median, and let M′ = F^{−1}(1 − F(M)). We have seen that F is concave on [M, c]; so F ≥ F^L on [M, c]. Hence

M ≤ M′ ≤ M′_L = (1/(1 − F(M))) {c(1 − 2F(M)) + M F(M)} = (F^L)^{−1}(1 − F(M)).


Fig. 7. M = 4, F(M) = 0.4, min = 0 and max = 10.

Now we can use π_p(x) = 1 − P(J_x) as a possibility distribution dominating p. Let J_x = [x, y]. The following possibility distribution π_{L,F(M)} can then be used in place of π_L:

For x ∈ [b, M], π_p(x) = 2F(x), and π_p is convex. So π_p(x) ≤ 2F^L(x). So we let π_{L,F(M)}(x) = 2F^L(x).
For y ∈ [M, M′_L], π_p(y) = F(x) + 1 − F(y) ≤ F(M) + 1 − F(y), and the latter is convex. We let π_{L,F(M)}(y) = F(M) + 1 − F^L(y).
For y ∈ [M′_L, c], π_p(y) = 2(1 − F(y)) is convex, and π_p(y) ≤ 2(1 − F^L(y)). So we let π_{L,F(M)}(y) = 2(1 − F^L(y)).

Thus, we have P_I^{M,F(M)} ⊆ P(π_{L,F(M)}). The obtained shape is more realistic than the triangular fuzzy interval, especially when M is the center of I, because the lack of balance of the probability mass is reflected on the possibility distribution (see Fig. 7). In the case where F(M) = 0.5, we obtain M′_L = M and we thus retrieve the triangular π_L with support [b, c] and core {M}. Note that it cannot be refined by exploiting the fact that both π_L and the above derived π_{L,F(M)} dominate P_I^{M,F(M)}, considering min(π_{L,F(M)}, π_L) as a tighter approximant. Indeed, as pointed out earlier, min(π_{L,F(M)}, π_L) will not dominate P_I^{M,F(M)} in general, as P(min(π_{L,F(M)}, π_L)) differs from P(π_{L,F(M)}) ∩ P(π_L).

6.2. Accounting for fractiles in the continuous possibilistic representation

Suppose the expert provides the mode M and the median m of the probability distribution. Let P_I^{M,m} be the set of such unimodal probability functions bounded by I = [b, c], and assume m


Fig. 8. Expert gives fractiles at 5%, 50% and 95% equal to 1, 5 and 9 on [0, 10].

By definition x2 is the median, and suppose that it coincides with the mode. Let P_I^{x1,x2,x3} be the probability family having these fractiles, defined in Section 5.2. With the same reasoning as above, we can represent this knowledge by the following (symmetric) possibility distribution: π(x1) = π(x3) = F(x1) + 1 − F(x3) = 0.1, π(x2) = 1, and linear interpolations on [b, x1], [x1, x2], [x2, x3] and [x3, c] for the other values of π(x) (see for instance Fig. 8). Clearly P_I^{x1,x2,x3} ∩ P_I^{M,m} ⊆ P(π) (respecting the dominance condition defined in Section 4).
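Using the values of Fig. 8 for illustration, this piecewise linear possibility distribution can be coded as follows (π is taken to be 0 at the support bounds, as for the triangular shape):

```python
# Possibility distribution from the fractiles of Fig. 8: the 5%, 50% and 95%
# fractiles equal 1, 5 and 9 on the support [0, 10]; pi(x1) = pi(x3) = 0.1,
# pi(x2) = 1, with linear interpolation between the knots.
knots = [(0.0, 0.0), (1.0, 0.1), (5.0, 1.0), (9.0, 0.1), (10.0, 0.0)]

def pi(x):
    if x <= knots[0][0] or x >= knots[-1][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

assert pi(5.0) == 1.0                 # the median/mode gets possibility 1
assert abs(pi(1.0) - 0.1) < 1e-9      # F(x1) + 1 - F(x3) = 0.1 at x1 and x3
assert abs(pi(3.0) - 0.55) < 1e-9     # halfway up the left slope
```

The 0.1-cut of this distribution is exactly [x1, x3], recovering the 90% confidence interval between the extreme fractiles.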

6.3. Distributions with known mode and support: bracketing prediction intervals

Suppose that I = [b, c] contains the support of the unknown probability distribution function p and the symmetry of p is assumed. Let P_I^S be the set of such probabilities. Their mode is (b + c)/2 due to symmetry (but the family includes the uniform probability on I). If p is symmetric, the optimal transform p* around the mode is convex on each side of the mode (Dubois et al., 2004). The symmetric triangular possibility distribution π_S with support I and core {(b + c)/2} is thus such that π_S ≥ p* for all p, and is really equal to sup_{p ∈ P_I^S} p* (Dubois et al., 2004). So not only does P(π_S) contain P_I^S, but also the cuts of π_S bracket the narrowest prediction intervals of these probabilities. Nevertheless, P(π_S) also contains probability densities that are not symmetric and whose mode differs from (b + c)/2 (but π_S does not necessarily bracket their prediction intervals). One may argue that the p-box [F̲, F̄] defined by F̄(x) = (x − b)/(c − b) if x ≤ (b + c)/2 and 1 otherwise, and F̲(x) = (x − b)/(c − b) if x ≥ (b + c)/2 and 0 otherwise, is a more informative representation of symmetric densities with support in I. But of course, it cannot bracket their prediction intervals. The specific merit of π_S is precisely to bracket the prediction intervals in P_I^S. Interestingly, note that π_S = 2 min(F̄, 1 − F̲) for x ≠ (b + c)/2. If we know some fractiles, we can refine the representation as explained in the previous section. Such refinements would respect the prediction interval condition (see Fig. 8) due to the symmetry assumption.

When p is asymmetric, the optimal transform p* associated to p may fail to be convex on each side of the mode M. So the cuts of the triangular possibility distribution π_L with core {M} do not always contain the optimal (1 − α)-prediction intervals of the probability measures of mode M, as is clear from Theorem 4 on the optimal transforms of piecewise linear densities. For instance, consider the example in Fig. 9 suggested in Dubois et al. (2004), where

p(x) = 0.6x + 1.2 on [−2, −1.5],
p(x) = (0.2/3)x + 0.4 on [−1.5, 0]

and

p(x) = −0.2x + 0.4 on [0, 2].

The interval [−1.4, 1.4], corresponding to the cut of level 0.3 of the triangular possibility distribution, does not contain the optimal 0.7-prediction interval of the probability measure of mode 0, which is [−1.5, 0.5]: the optimal transform of p (in Section 4) is indeed not convex everywhere.


Fig. 9. Optimal transformation of p around the mode.

Fig. 10. The upper bound of p* and its improvement when the concavity-convexity of p is known, for M = 0 and F(M) = 0.4.

We can nevertheless find an upper bound of p* for a unimodal asymmetric continuous density p. Then, using the concavity of F and considering nested intervals J_x = [x, f(x)], where f(x) = max{y, p(y) ≥ p(x)}, we have:

For x ≤ M, p*(x) = F(x) + 1 − F(f(x)) ≤ F^L(x) + 1 − F(M) = F(M)(x − b)/(M − b) + 1 − F(M).
For x ≥ M, p*(x) = F(f^{−1}(x)) + 1 − F(x) ≤ F(M) + 1 − F^L(x) = 1 − ((1 − F(M))/(c − M))(x − M).

Knowing the value F(M) is necessary to be able to define this approximation (see Fig. 10 for instance). In general, it will be difficult to come up with a more informative possibility distribution which accounts for the prediction intervals of all probability measures on an interval I with fixed mode, due to the wide range of such distributions. Some additional assumptions must be made, for instance on the convexity-concavity of the unknown probability density function p.

Theorem 6. If the density function p is convex increasing on ]b, M[ and concave strictly decreasing on ]M, c[, then p* is also convex on ]b, M[.

Proof. See Appendix C.


The assumption F(M) < 0.5 is consistent with the convexity of p on ]b, M[ and its concavity on ]M, c[. In this case, a possibility distribution linearly increasing from 0 to 1 on [b, M] covers all optimal transforms of such densities on this side. On the other side of the mode, using a linear shape is possible with π(c) = 1 − F(M) (see Fig. 10). In summary, assuming F(M) is known and the assumption of Theorem 6 on the convexity and monotonicity of p holds, a more informative possibility distribution, whose cuts contain the confidence intervals of distributions of mode M having such characteristics, can be computed.

7. Conclusion

The notion of imprecise probability offers a natural formal framework for representing imprecise knowledge on numerical quantities. Several types of information can be approximated by means of possibility distributions, others are directly and exactly representable by belief functions, yet others more naturally fit the probability box framework. In several cases, possibility distributions provide a concise approximate representation of a set of probability measures, sometimes interpretable in terms of confidence intervals of probabilities in such families. In fact, each mode of representation seems to be adapted to the knowledge of specific characteristics of distributions. Only p-boxes seem to capture information about mean values in a reasonable way. Belief functions directly model fractile information, while possibility measures are particularly well suited for representing families of distributions whose mode is known, and can integrate additional information on symmetry and concavity of densities, as well as known fractiles. The recent works of Neumaier (2004) focus on probability families P of the form P = P(π) ∩ P(1 − δ), where π is a possibility distribution and δ is a function I → [0, 1] acting as a lower bound of π, i.e. δ ≤ π. The probability family P(π) = P is recovered when δ = 0. The probability family P is more precise than P(π), and assessing its potential demands further investigation.

Our representation tools using possibility theory are currently applied to risk management problems (Baudrit et al., 2004b; Baudrit and Dubois, 2005). In such problems, straightforward Monte-Carlo methods involve too rich assumptions of complete probabilistic knowledge and stochastic independence between parameters. Moreover, uncertainty due to variability and uncertainty due to incomplete knowledge are mixed up in the resulting distribution. In contrast, Bardossy et al. (1995), Bárdossy and Fodor (2004) and Dou et al. (1995, 1997) present applications of possibility theory to propagate imprecise information in environmental models. However, a proper handling of real cases requires the propagation of heterogeneous uncertain information where imprecision and variability of parameters are separately accounted for and propagated through numerical models. Guyonnet et al. (2003) (see also Bárdossy and Fodor, 2004) propose a method for the joint propagation of fuzzy intervals and probabilistic numbers. This method is further elaborated in Baudrit et al. (2004b, 2006). Various joint possibility–probability propagation techniques are compared in Baudrit and Dubois (2005), some involving independence assumptions, other ones, more conservative, avoiding such assumptions. Comparison with p-box propagation is also made. For a stimulating discussion of various uncertainty propagation techniques, using random intervals, imprecise probability and possibility theory, see Helton et al. (2004).

The unified representation framework proposed here makes it easy to represent poor data of various types in a faithful and yet simple way. It facilitates the definition of a uniform mode of propagation in risk management, in spite of the heterogeneous character of the data collected, and it allows for the computation of conservative estimates, something that is not allowed by traditional probabilistic methods.

Acknowledgements

This work is supported by the French Institutes B.R.G.M, I.R.S.N and I.N.E.R.I.S.

Appendix A

Proof of Theorem 1. ⇒: Let P ∈ P(π) and consider an interval A = [x, y] containing a. By definition, N(A) ≤ P(A) is equivalent to F(y) − F(x) ≥ 1 − sup_{z ∉ [x,y]} π(z), i.e., F(x) + 1 − F(y) ≤ max(π(x), π(y)). We thus have P(π) ⊆ {P, ∀x, y, x ≤ a ≤ y, F(x) + 1 − F(y) ≤ max(π(x), π(y))}.


⇐: Let P ∈ {P, ∀x ≤ a ≤ y, F(x) + 1 − F(y) ≤ max(π(x), π(y))}. Consider any measurable A.

(a) For A = (−∞, x] with x ≤ a: F(x) + 1 − F(+∞) ≤ max(π(x), π(+∞)) implies F(x) ≤ π(x), hence P(A) ≤ Π(A).
(b) For A = [y, +∞) with y ≥ a: F(−∞) + 1 − F(y) ≤ max(π(y), π(−∞)) implies 1 − F(y) ≤ π(y), hence P(A) ≤ Π(A).
(c) For A = [x, y] with y ≤ a: knowing that F is increasing, and according to case (a), we have F(y) − F(x) ≤ F(y) ≤ π(y). Hence P(A) ≤ Π(A).
(d) For A = [x, y] with x ≥ a: knowing that F is bounded by 1, and according to case (b), we have F(y) − F(x) ≤ 1 − F(x) ≤ π(x). Hence P(A) ≤ Π(A).
(e) For A, a union of intervals such that Π(A) < 1: suppose Π(A) is attained at some y which lies on the right side of a. We may consider the set A′ = (−∞, x] ∪ [y, +∞) such that π(x) = π(y). Necessarily, A′ contains A, and we have Π(A′) = Π(A) = π(x) and P(A) ≤ P(A′). We have x ≤ a ≤ y, thus P(A) ≤ P(A′) = F(x) + 1 − F(y) ≤ max(π(x), π(y)) = Π(A′) = Π(A). We then have P(A) ≤ Π(A).
(f) For A, a union of intervals such that Π(A) = 1: choose y on the boundary of A such that π(y) is maximal. Supposing that y is on the right of a, we can consider a set A′ = [x, y] ⊆ A such that π(x) = π(y). We have Π(A′) = Π(A) = 1 and N(A′) = N(A); moreover x ≤ a ≤ y, thus F(x) + 1 − F(y) ≤ max(π(x), π(y)) implies F(y) − F(x) ≥ 1 − π(y). We then have N(A′) = N(A) ≤ P(A′) ≤ P(A), thus P(A) ≥ N(A). □

Appendix B

Proof of Theorem 4. First, we show that the optimal transform of a triangular density function p is convex on each side of the mode M (see Fig. 11). Let [b, c] be the support of p. We have:

p_−(x) = (p(M)/(M − b))(x − b) and p_−^{−1}(α) = α(M − b)/p(M) + b

and

p_+(x) = (p(M)/(c − M))(c − x) and p_+^{−1}(α) = c − α(c − M)/p(M).

For α ∈ [0, p(M)], we obtain p*(p_−^{−1}(α)) = p*(p_+^{−1}(α)) = (α²/(2p(M)))(c − b). Then:

For x ≤ M, putting α = p_−(x), we have

p*(x) = (p(M)(c − b)/(2(M − b)²))(x − b)²,

whose second derivative is positive; hence p* is convex on [b, M].

Fig. 11. Triangular probability density p on the left and the shape of its optimal transformation p* on the right.


Fig. 12. Linear unimodal continuous probability density.

Fig. 13. Step of the optimal transformation of a linear unimodal continuous probability density when α ∈ [p(a2), p(a4)].

Fig. 14. Shape of the optimal transformation p* of a linear unimodal continuous probability density.

Similarly, for xM , by putting = p+(x), we have

p(x) =p(M)(c b)2(c M)2 (x c)

2.

Hence p is convex on [M, c].Now let p be piecewise linear and 12 n be the ordinates of the points where the slope changes. In

particular p(M) = n and p(b) = p(c) = 1 = 2 = 0.For illustration, we picture the case where the density p is linear on 4 intervals

[b = min(supp(p)), a2

], [a2,M],

[M,a4],[a4, c = max(supp(p))

] (see Fig. 12).Consider index i such that i < i+1,

[i , i+1

]. Denote

[bi, bi

]and

[bi+1, bi+1

]the intervals whose end-points

have ordinates i and i+1, and [x, y] such that p(x) = p(y) = (see Fig. 13 where i = p (a2) ,

C. Baudrit, D. Dubois / Computational Statistics & Data Analysis 51 (2006) 86108 107

i+1 = p (a4) ,[bi, bi

] = [a2, p14 (p (a2))] , [bi+1, bi+1] = [p12 (p (a4)) , a4]) and [x, y] = [p12 (), p14 ()].The integral computing p(x) = p(y) contains a constant part corresponding to the areas under p outside the in-terval

[bi, bi

] (T1 and T2 in Fig. 13), plus a part linear in corresponding to rectangles (R1 and R2 in Fig. 13)of areas i

(x bi

)and i

(bi y

)inside the intervals

[bi, x

]and

[y, bi

], plus a quadratic part in corre-

sponding to the area of the remaining triangles (S1 and S2 in Fig. 13) located inside the intervals[bi, x

]and[

y, bi]

and bounded by p and the horizontal line of ordinate i and the vertical lines of abcissae x and y, respec-tively. The second derivative of p is this equal to zero except for the quadratic part of p for which it is con-stant. Hence, we nd the same expression as in the optimal transformation triangular density p (see Fig. 11). Then,p is piecewise convex and Fig. 14 shows the shape of p in the case where the density p is linear between 5points.

Appendix C

Proof of Theorem 6. We must show that the second derivative of p* is positive on ]b, M[. Consider p_1 (the left part of p) and p_2 (the right part of p) defined as follows:

∀x ∈ [b, M], p_1(x) = p(x), and 0 otherwise.
∀x ∈ [M, c], p_2(x) = p(x), and 0 otherwise.

For x ∈ [b, M], p*(x) = F(x) + 1 − F(f(x)), where f(x) = max{y, p(y) ≥ p(x)}. If we differentiate p* on ]b, M[, we obtain

p*′(x) = F′(x) − f′(x)F′(f(x)) = p_1(x) − f′(x)p_2(f(x)).

However p_1(x) = p_2(f(x)), thus

p*′(x) = p_1(x)(1 − f′(x)).

Hence, differentiating again,

p*″(x) = p_1′(x)(1 − f′(x)) − p_1(x)f″(x).

We know that p_1(x) = p_2(f(x)); if we differentiate this equality, we obtain

f′(x) = p_1′(x)/p_2′(f(x)).

The function p_1 increases on ]b, M[, so p_1′ ≥ 0. The function p_2 strictly decreases on ]M, c[, so p_2′ < 0. We thus deduce that f′ ≤ 0. We conclude that

p_1′(x)(1 − f′(x)) ≥ 0, ∀x ∈ ]b, M[.

By differentiating f′ again, we obtain

f″(x) = (p_1″(x) − (f′(x))² p_2″(f(x)))/p_2′(f(x)).

We know that p is convex on ]b, M[ (resp. concave on ]M, c[), so p_1″(x) ≥ 0 for all x ∈ ]b, M[ (resp. p_2″(x) ≤ 0 for all x ∈ ]M, c[). Hence, p_1″(x) − (f′(x))² p_2″(f(x)) ≥ 0 for all x ∈ ]b, M[, and thus f″(x) ≤ 0 for all x ∈ ]b, M[. We thus conclude that −p_1(x)f″(x) ≥ 0, ∀x ∈ ]b, M[.

To summarize, we have p_1′(x)(1 − f′(x)) ≥ 0 and −p_1(x)f″(x) ≥ 0, ∀x ∈ ]b, M[. We have thus proved that p*″ is positive on ]b, M[, and hence the convexity of p* on ]b, M[. □


References

Bardossy, A., Bronstert, A., Merz, B., 1995. 1-, 2- and 3-dimensional modeling of groundwater movement in the unsaturated soil matrix using a fuzzy approach. Adv. Water Resources 18 (4), 237–251.
Bárdossy, G., Fodor, J., 2004. Evaluation of Uncertainties and Risks in Geology: New Mathematical Approaches for their Handling. Springer, Berlin. ISBN 3-540-20622-1.
Baudrit, C., Dubois, D., 2005. Comparing methods for joint objective and subjective uncertainty propagation with an example in a risk assessment. Fourth International Symposium on Imprecise Probabilities and Their Applications (ISIPTA'05), Pittsburgh, PA, USA, pp. 31–40.
Baudrit, C., Dubois, D., Fargier, H., 2004a. Practical representation of incomplete probabilistic information. Advances in Soft Computing: Soft Methods of Probability and Statistics Conference, Oviedo, pp. 149–156.
Baudrit, C., Dubois, D., Guyonnet, D., Fargier, H., 2004b. Joint treatment of imprecision and randomness in uncertainty propagation. Proceedings of the Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Perugia, pp. 873–880.
Baudrit, C., Dubois, D., Guyonnet, D., 2006. Joint propagation and exploitation of probabilistic and possibilistic information in risk assessment models. IEEE Transactions on Fuzzy Systems, to appear.
De Cooman, G., Aeyels, D., 1999. Supremum-preserving upper probabilities. Inform. Sci. 118, 173–212.
Dempster, A.P., 1967. Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Statist. 38, 325–339.
Dou, C., Woldt, W., Bogardi, I., Dahab, M., 1995. Steady-state groundwater flow simulation with imprecise parameters. Water Resources Res. 31 (11), 2709–2719.
Dou, C., Woldt, W., Bogardi, I., Dahab, M., 1997. Numerical solute transport simulation using fuzzy sets approach. J. Contaminant Hydrol. 27, 107–126.
Dubois, D., Prade, H., 1987. The mean value of a fuzzy number. Fuzzy Sets and Systems 24, 279–300.
Dubois, D., Prade, H., 1988. Possibility Theory: An Approach to Computerized Processing of Uncertainty. Plenum Press, New York.
Dubois, D., Prade, H., 1990. Consonant approximations of belief functions. Int. J. Approx. Reason. 4 (5/6), 419–449 (Special Issue: Belief Functions and Belief Maintenance in Artificial Intelligence).
Dubois, D., Prade, H., 1992. When upper probabilities are possibility measures. Fuzzy Sets and Systems 49, 65–74.
Dubois, D., Prade, H., Sandri, S.A., 1993. On possibility/probability transformations. In: Lowen, R., Roubens, M. (Eds.), Fuzzy Logic: State of the Art. Kluwer, Dordrecht, pp. 103–112.
Dubois, D., Prade, H., Smets, P., 1996. Representing partial ignorance. IEEE Trans. Systems, Man Cybern. 26 (3), 361–377.
Dubois, D., Nguyen, H.T., Prade, H., 2000. Possibility theory, probability and fuzzy sets: misunderstandings, bridges and gaps. In: Dubois, D., Prade, H. (Eds.), Fundamentals of Fuzzy Sets. Kluwer, Boston, MA, pp. 343–438.
Dubois, D., Mauris, G., Foulloy, L., Prade, H., 2004. Probability–possibility transformations, triangular fuzzy sets and probabilistic inequalities. Reliab. Comput. 10, 273–297.
Feller, W., 1948. On the Kolmogorov–Smirnov limit theorems for empirical distributions. Ann. Math. Statist. 19, 177–189.
Ferson, S., Ginzburg, L.R., 1996. Different methods are needed to propagate ignorance and variability. Reliab. Eng. Systems Safety 54, 133–144.
Ferson, S., Ginzburg, L., Kreinovich, V., Myers, D.M., Sentz, K., 2003. Construction of probability boxes and Dempster–Shafer structures. Sandia National Laboratories, Technical report SAND2002-4015. Available at http://www.sandia.gov/epistemic/Reports/SAND2002-4015.pdf.
Ferson, S., Ginzburg, L., Akcakaya, R., to appear. Whereof one cannot speak: when input distributions are unknown. Risk Analysis.
Guyonnet, D., Bourgine, B., Dubois, D., Fargier, H., Côme, B., Chilès, J.-P., 2003. Hybrid approach for addressing uncertainty in risk assessments. J. Environ. Eng. 129 (1), 68–78.
Gzyl, H., 1995. The Method of Maximum Entropy. Series on Advances in Mathematics for Applied Sciences, vol. 29.
Helton, J.C., Johnson, J.D., Oberkampf, W.L., 2004. An exploration of alternative approaches to the representation of uncertainty in model predictions. Reliab. Eng. System Safety 85 (1–3), 39–71.
Jaffray, J.Y., 1992. Bayesian updating and belief functions. IEEE Trans. Systems, Man Cybern. 22, 1144–1152.
Kendall, M., Stuart, A., 1977. The Advanced Theory of Statistics. Griffin and Co.
Kriegler, E., Held, H., 2005. Utilizing belief functions for the estimation of future climate change. Int. J. Approx. Reason. 39 (2–3), 185–209.
Miller, L.H., 1956. Table of percentage points of Kolmogorov statistics. J. Amer. Statist. Assoc. 51, 111–121.
Neumaier, A., 2004. Clouds, fuzzy sets and probability intervals. Reliab. Comput. 10, 249–272.
Shafer, G., 1976. A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ.
Smets, P., 2005. Belief functions on real numbers. Int. J. Approx. Reason. 40 (3), 181–223.
Smets, P., Kennes, R., 1994. The transferable belief model. Artif. Intell. 66, 191–234.
Walley, P., 1991. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London.
