# Summaries of Talks

**Alessandro Antonucci (IDSIA, Lugano, Switzerland)**

**Bayesian networks with imprecise probabilities: theory and applications to knowledge-based systems and classification**

Bayesian networks are important tools for uncertain reasoning in AI; their quantification requires a precise assessment of the conditional probabilities. Credal networks generalize Bayesian networks so that probabilities can vary in a set (e.g., an interval). This provides a more realistic model of expert knowledge and returns more robust inferences. The first part of the talk describes the specification procedure for credal networks, the existing inference algorithms and approaches to decision making; two prototypical examples of knowledge-based expert systems based on credal networks, related to military decision making and environmental risk analysis, are also presented. In the second part, we describe the major examples of credal classifiers developed so far, i.e., classification algorithms based on credal networks. Credal classifiers generalize the traditional Bayesian classifiers, which are based on a single prior density and on a single likelihood. Credal classifiers are instead based on (i) a set of priors, thus removing the need for subjectively choosing a prior, and (ii) possibly also a set of likelihoods, to allow robust classification even with missing data. Credal classifiers can return more than one class if the assignment to a single class is too uncertain; in this way, they preserve reliability. Algorithms for credal classification are presented and compared with traditional classifiers on a large number of data sets. The problem of evaluating the performance of a classifier that may return multiple outputs, as well as alternative quantification techniques, are also discussed.
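
The set-valued output of a credal classifier can be illustrated with a minimal sketch. It assumes that lower and upper posterior probabilities for each class have already been computed (the function name `interval_dominance` and any numbers fed to it are hypothetical) and it uses interval dominance, which is only one of several possible decision criteria; the classifiers surveyed in the talk may rely on other criteria such as maximality.

```python
def interval_dominance(bounds):
    """Return the classes that are not dominated under interval dominance.

    bounds maps each class label to a (lower, upper) pair of posterior
    probabilities. A class is discarded when some other class has a lower
    probability strictly greater than this class's upper probability.
    """
    kept = []
    for c, (_, upper_c) in bounds.items():
        dominated = any(
            lower_d > upper_c
            for d, (lower_d, _) in bounds.items()
            if d != c
        )
        if not dominated:
            kept.append(c)
    return kept
```

When the intervals of two classes overlap, both are returned, which is the "more classes when too uncertain" behaviour described above; when one interval lies entirely above the others, a single class is returned as in a traditional classifier.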

References

Corani, G., Antonucci, A., Zaffalon, M. (2012). Bayesian networks with imprecise probabilities: theory and application to classification. In Holmes, D.E., Jain, L.C. (Eds.), Data Mining: Foundations and Intelligent Paradigms, Intelligent Systems Reference Library 23, Springer, Berlin / Heidelberg, pp. 49–93. http://ipg.idsia.ch/preprints/corani2012c.pdf

Piatti, A., Antonucci, A., Zaffalon, M. (2010). Building knowledge-based expert systems by credal networks: a tutorial. In Baswell, A.R. (Ed.), Advances in Mathematics Research 11, Nova Science Publishers, New York. http://ipg.idsia.ch/preprints/antonucci2010d.pdf

**Thomas Augustin (University of Munich, Germany)**

**Imprecise Probability in Statistical Modelling: A Critical Review**

The talk discusses the high potential of imprecise probabilities in statistical modeling. Through its substantially broader understanding of uncertainty, imprecise probability offers new avenues for powerful statistical modeling. We exemplify this by considering three prototypical areas: the use of neighborhood models as a superstructure upon robust statistics, the expressive modeling of prior-data conflict in generalized Bayesian inference, and the proper handling of data imprecision. We then elaborate some desiderata for the further development of imprecise probability methodology, further strengthening its impact on statistical modeling.

References

Augustin, T., Walter, G., and Coolen, F.P.A. (2014): Statistical Inference. In: T. Augustin, F.P.A. Coolen, G. de Cooman, M.C. Troffaes (eds.): Introduction to Imprecise Probabilities, Wiley, Chichester, pp. 135–189.

Schollmeyer, G., Augustin, T. (2015): Statistical modeling under partial identification: Distinguishing three types of identification regions in regression analysis with interval data. International Journal of Approximate Reasoning 56: 224–248.

**E. Chojnacki (IRSN, Cadarache)**

**Why bother with non-probabilistic models in risk analysis?**

Probabilistic models are well acknowledged in risk analysis; however, in industrial applications, modelling uncertainty by joint probability laws can be a tough task. Indeed, uncertainty is a polysemous word, which leads to some equivocal transcriptions into probability distributions. The formalism associated with some subsets of imprecise probabilities appears to be an efficient way to overcome the difficulty (or even the impossibility) for an analyst to represent his or her knowledge in an appropriate probabilistic model. Our talk will be illustrated by examples encountered in nuclear safety studies.

Reference

Baccou J, Chojnacki E (2014) A practical methodology for information fusion in presence of uncertainty: application to the analysis of a nuclear benchmark. Environ Syst Decis 34(2)

**Gert De Cooman (Ghent University, Belgium)**

**A martingale-theoretic approach to discrete-time stochastic processes with imprecise probabilities**

We show how the definition of a submartingale can be suitably extended to an imprecise probability context, and how this notion can be used to derive a generalisation of Ville’s Theorem, leading to expressions for lower and upper previsions associated with a convex cone of submartingales. This result can then be used to lay the foundations for a theory of stochastic processes in discrete time using imprecise probability models. We discuss imprecise Markov chains and their ergodicity properties as a special case.
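
For the special case of imprecise Markov chains, lower and upper expectations of a function of the state at time n are commonly computed by a backward recursion with a lower transition operator. The following minimal sketch illustrates that recursion only; it does not reproduce the martingale-theoretic construction of the talk. It assumes a finite state space and that each state's credal set of transition rows is given by a finite list of extreme points; the function names are hypothetical.

```python
def lower_transition(f, credal_rows):
    """Apply the lower transition operator to a function f on the states.

    credal_rows[x] is a list of candidate transition rows out of state x,
    assumed to be the extreme points of that state's credal set.
    """
    return [
        min(sum(p * fy for p, fy in zip(row, f)) for row in rows)
        for rows in credal_rows
    ]


def lower_expectation(f, credal_rows, n_steps):
    """Lower expectation of f(X_n) for each initial state, by backward recursion."""
    g = list(f)
    for _ in range(n_steps):
        g = lower_transition(g, credal_rows)
    return g


def upper_expectation(f, credal_rows, n_steps):
    """Upper expectation obtained by conjugacy: upper(f) = -lower(-f)."""
    negated = [-fy for fy in f]
    return [-v for v in lower_expectation(negated, credal_rows, n_steps)]
```

Taking f as the indicator of a state gives lower and upper probabilities of being in that state after n steps; how these bounds behave as n grows is the kind of question addressed by the ergodicity properties mentioned above.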

**Thierry Denœux (Heudiasyc, UTC, Compiègne)**

**Statistical estimation and prediction using belief functions**

Classically, the uncertainty of statistical forecasts is described either by frequentist prediction intervals, or by posterior predictive Bayesian distributions. In this talk, we advocate an alternative approach, which consists in modeling estimation uncertainty using a consonant belief function constructed from the likelihood, and combining it with random uncertainty arising from the data-generating process. The resulting predictive belief function is argued to be better founded than frequentist prediction intervals. It is also more widely applicable than Bayesian posterior distributions, which always require prior knowledge of parameters. However, the proposed approach boils down to Bayesian prediction when probabilistic prior information is available. The predictive belief function can be approximated to any desired accuracy using Monte Carlo simulation. We illustrate the method using simple examples, including linear regression with and without serial correlation.
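
The estimation step can be sketched as follows, under the assumption that the consonant belief function is the one whose contour function is the relative likelihood, pl(θ) = L(θ) / L(θ̂). The binomial model, the function names and the restriction to intervals are illustrative choices, and the prediction step (combination with random uncertainty, Monte Carlo approximation) is not shown.

```python
import math


def log_likelihood(theta, k, n):
    """Binomial log-likelihood (up to a constant) for k successes in n trials."""
    if (theta == 0.0 and k > 0) or (theta == 1.0 and k < n):
        return -math.inf
    a = k * math.log(theta) if k > 0 else 0.0
    b = (n - k) * math.log(1.0 - theta) if k < n else 0.0
    return a + b


def contour(theta, k, n):
    """Relative likelihood pl(theta) = L(theta) / L(theta_hat)."""
    theta_hat = k / n
    return math.exp(log_likelihood(theta, k, n) - log_likelihood(theta_hat, k, n))


def plausibility_of_interval(lo, hi, k, n):
    """Plausibility of [lo, hi] under consonance: the supremum of the contour.

    The binomial likelihood is unimodal, so the supremum is 1 when the
    interval contains the maximum likelihood estimate, and is otherwise
    reached at the endpoint nearest to it.
    """
    theta_hat = k / n
    if lo <= theta_hat <= hi:
        return 1.0
    nearest = lo if theta_hat < lo else hi
    return contour(nearest, k, n)
```

No prior on θ is needed to evaluate these quantities, which is the sense in which the approach is described above as more widely applicable than Bayesian posterior distributions.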

**Sébastien Destercke (Heudiasyc, UTC Compiègne)**

**Cost-sensitive classification: recent advances**

Many modern learning problems are cost-sensitive, in the sense that different prediction mistakes may induce different costs. This is particularly the case for problems where the output space is structured, e.g., binary vectors, rankings, or ordered labels. Making efficient inferences with imprecise probabilities in such problems is not easy. In this talk, we will review some recent results related to this issue, for the specific problems of ordinal regression and multilabel classification.

**D. Dubois, H. Fargier, R. Guillaume (IRIT, Toulouse)**

**Deciding under ignorance: for or against resolute choice?**

The major paradigm for sequential decision under uncertainty is expected utility. This approach has many good features that qualify it for posing and solving decision problems, especially dynamic consistency and computational efficiency via dynamic programming. However, when uncertainty is due to sheer lack of information, and expected utility is no longer a realistic criterion, the approach collapses because dynamic consistency becomes counterintuitive and the global non-expected utility criteria are no longer amenable to dynamic programming. In this paper we argue against Resolute Choice strategies, following the path opened by Jaffray, and suggest that the dynamic programming methodology may lead to more intuitive solutions respecting the Consequentialism axiom, while a global evaluation of strategies relying on lottery reduction is questionable.

**H. Fargier (IRIT, Toulouse), O. Spanjaard (LIP6, UPMC, Paris)**

**Resolute Choice in Sequential Decision Problems with Multiple Priors**

This talk is devoted to sequential decision making under uncertainty, in the multi-prior framework of Gilboa and Schmeidler [1989]. In this setting, a set of probability measures (priors) is defined instead of a single one, and the decision maker selects a strategy that maximizes the minimum possible value of expected utility over this set of priors. We are interested here in the resolute choice approach, where one initially commits to a complete strategy and never deviates from it later. Given a decision tree representation with multiple priors, we study the problem of determining an optimal strategy from the root according to min expected utility. We prove the intractability of evaluating a strategy in the general case. We then identify different properties of a decision tree that make it possible to design dedicated resolution procedures. Finally, experimental results are presented that evaluate these procedures.
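
The min expected utility criterion itself can be sketched in a few lines. The sketch assumes a finite list of priors over a finite set of states of nature and reduces each strategy to its utility in each state; this deliberately ignores the decision-tree structure that makes strategy evaluation intractable in general. All names are hypothetical.

```python
def min_expected_utility(utilities, priors):
    """Smallest expected utility of one strategy over a finite set of priors.

    utilities[s] is the utility obtained in state s; each prior is a list
    of probabilities over the same states.
    """
    return min(
        sum(p * u for p, u in zip(prior, utilities))
        for prior in priors
    )


def best_strategy(strategies, priors):
    """Name of the strategy that maximizes the min expected utility."""
    return max(
        strategies,
        key=lambda name: min_expected_utility(strategies[name], priors),
    )
```

With a single prior this reduces to ordinary expected utility maximization; with several priors, a strategy whose payoff depends strongly on the state is penalized by its worst-case prior.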

**Aurélien Garivier (IMT, Université Paul Sabatier)**

**Optimism in Reinforcement Learning and Kullback-Leibler Divergence**

We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has been shown to guarantee near-optimal regret bounds. In this talk, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm, termed KL-UCRL, for solving KL-optimistic extended value iteration. Using recent deviation bounds on the KL divergence, we prove that KL-UCRL provides the same guarantees as UCRL2 in terms of regret. However, numerical experiments on classical benchmarks show a significantly improved behavior, particularly when the MDP has reduced connectivity. To support this observation, we provide elements of comparison between the two algorithms based on geometric considerations.

**R. Guillaume, D. Dubois (IRIT, Toulouse)**

**Robust parameter estimation of density functions under interval observations**

This study deals with the derivation of a probabilistic parametric model from interval data using the maximum likelihood principle. In contrast with classical techniques such as the EM algorithm, which define a precise likelihood function by computing the probability of observations viewed as a collection of non-elementary events, our approach presupposes that each imprecise observation underlies a precise one, and that the uncertainty that pervades its observation is epistemic, rather than representing noise. We define an interval-valued likelihood function and apply robust optimisation methods to find a safe plausible estimate of the statistical parameters. The approach is extended to fuzzy data by optimizing the average of lower likelihoods over a collection of data sets obtained from cuts of the fuzzy intervals, as a trade-off between optimistic and pessimistic interpretations of fuzzy data.
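
A minimal sketch of an interval-valued likelihood and of a maximin estimate is given below. It assumes a normal model with known standard deviation and replaces the robust optimisation step by a plain grid search; both choices, and all function names, are illustrative assumptions rather than the method of the talk.

```python
import math


def log_density(x, mu, sigma):
    """Log-density of the normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_likelihood_bounds(intervals, mu, sigma):
    """Lower and upper log-likelihood over all precise data sets compatible
    with the interval observations."""
    lower = upper = 0.0
    for a, b in intervals:
        # The normal density is unimodal: over [a, b] it is smallest at the
        # endpoint farthest from mu and largest at the point closest to mu.
        farthest = a if abs(a - mu) >= abs(b - mu) else b
        closest = min(max(mu, a), b)
        lower += log_density(farthest, mu, sigma)
        upper += log_density(closest, mu, sigma)
    return lower, upper


def maximin_mean(intervals, sigma, grid):
    """Grid value of mu that maximizes the lower log-likelihood."""
    return max(grid, key=lambda mu: log_likelihood_bounds(intervals, mu, sigma)[0])
```

Maximizing the lower bound corresponds to the pessimistic reading of the data: the estimate is the parameter value that remains most plausible even for the least favourable precise data set hidden in the intervals.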

**Jean-Michel Loubes (IMT, Université Paul Sabatier)**

**Comparing probabilities with Wasserstein distance: deformations and barycenters**

Comparing probabilities and giving a sense to the mean of several distributions is a difficult task. Here we consider the Wasserstein distance and study the properties of the barycenter (or Fréchet mean) estimator of several empirical samples. Moreover, we provide a new statistical analysis to compare the deformations of these distributions with regard to this "mean".

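In one dimension the Wasserstein barycenter has a closed form, which makes the notion of a "mean" distribution concrete: its quantile function is the average of the quantile functions of the inputs. The sketch below assumes real-valued samples of equal size with equal weights, the 2-Wasserstein distance, and hypothetical function names; the general multivariate setting of the talk is not covered.

```python
def barycenter_1d(samples):
    """Equal-weight 2-Wasserstein barycenter of 1-D samples of equal size.

    The barycenter is the empirical distribution supported on the averages
    of the order statistics of the input samples.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size")
    sorted_samples = [sorted(s) for s in samples]
    return [sum(column) / len(samples) for column in zip(*sorted_samples)]


def wasserstein2_1d(x, y):
    """2-Wasserstein distance between two 1-D samples of equal size."""
    xs, ys = sorted(x), sorted(y)
    return (sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)) ** 0.5
```

Comparing each sample with the barycenter through its sorted values gives a simple one-dimensional picture of the deformation of a distribution with respect to the mean.
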
**Gilles Mauris (Université de Savoie)**

**Revisiting some conventional statistical notions in the framework of possibility theory**

The talk reviews the deep connections between the probability and possibility representations of measurement uncertainty: the definition of a possibility distribution equivalent to a probability distribution, obtained from its whole set of dispersion intervals about one point; the bridges with the conventional dispersion parameters; the representation of partial probabilistic knowledge by means of a maximum specificity principle, argued to be better suited than the maximum entropy principle; the relationships with probability inequalities; and a possibility-theory formulation of confidence-interval parameter estimation.

References

Mauris G., A review of relationships between possibility and probability representations of uncertainty in measurement, IEEE Trans. on Instrumentation and Measurement, Vol. 62, No 3, 2013, pp. 622-632.

Mauris G., Possibility distributions: A unified representation of usual direct probability-based parameter estimation methods, International Journal of Approximate Reasoning, Vol. 52, No. 9, 2011, pp. 1232-1242.

**Serafin Moral (University of Granada, Spain)**

**Likelihood Based Methods and the Learning of Credal Networks**

Credal networks [4] are a generalization of Bayesian networks in which probabilities are imprecise. They have been used for supervised classification [3] and as general knowledge-based systems [1]. The result of these models is not a single optimal decision but a set of admissible decisions. One of the reasons for the success of probabilistic graphical models is the possibility of learning them from observational data. Learning implies inducing a directed acyclic graph and estimating the parameters of the conditional probability distributions associated with it. Learning requires hypotheses, and when these hypotheses are not satisfied, problems can arise. We will show some practical examples in which learning with the BDEu score exhibits undesirable behavior. Although it is not possible to learn without assumptions, weaker assumptions yield more robust results. That is what can be achieved with the use of imprecise probability models. Usually, the problem of learning credal networks has been addressed by considering a fixed graphical structure in which imprecise parameters are estimated. We recently proposed a procedure to learn generalized credal networks in which the graphical structure is also imprecise [5]. This talk will review the existing methods for learning imprecise probabilistic models. In particular, we will show the methods for learning alternative structures based on the imprecise sample size Dirichlet model [5]. We will also insist on enlarging the initial set of alternatives by including other models for the relation of a variable with its set of parents, such as the noisy-or gate model. In order to make the procedures effective, we will demonstrate the necessity of using likelihood information as in [6, 8]. A simplification useful for practical purposes is to consider models based on profile likelihood [7]. Likelihood-based methods for supervised classification introduced in [2] will be extended to general credal networks.

References

[1] A. Antonucci, B. Brühlmann, A. Piatti, and M. Zaffalon. Credal networks for military identification problems. International Journal of Approximate Reasoning, 50:666–679, 2009.

[2] A. Antonucci, M. Cattaneo, and G. Corani. Likelihood-based robust classification with Bayesian networks. In Advances in Computational Intelligence, pages 491–500. Springer, 2012.

[3] G. Corani, J. Abellán, A. Masegosa, S. Moral, and M. Zaffalon. Classification. In Th. Augustin, F. Coolen, G. de Cooman, and M. Troffaes, editors, Introduction to Imprecise Probabilities, pages 230–257. Wiley, Chichester, U.K., 2014.

[4] F.G. Cozman. Credal networks. Artificial Intelligence, 120:199–233, 2000.

[5] A. Masegosa and S. Moral. Imprecise probability models for learning multinomial distributions from data. Applications to learning credal networks. International Journal of Approximate Reasoning, 55:1548–1569, 2014.

[6] S. Moral. Calculating uncertainty intervals from conditional convex sets of probabilities. In D. Dubois et al., editors, Proceedings of the 8th Conference on Uncertainty in Artificial Intelligence, pages 199–206, San Mateo, 1992. Morgan Kaufmann.

[7] Y. Pawitan. In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press, Oxford, 2001.

[8] P. Walley and S. Moral. Upper probabilities based only on the likelihood function. Journal of the Royal Statistical Society, B, 61:831–847, 1999.

**Mathieu Serrurier (IRIT, Université Paul Sabatier)**

**Imprecise probabilities and machine learning: a tradeoff between accuracy and epistemic uncertainty**

The goal of machine learning approaches is to find the best trade-off between the complexity of the model (generally related to its accuracy) and the amount of data available. In this talk, we show that, in some cases, this compromise is a matter of epistemic uncertainty that can be handled by imprecise probabilities. We therefore present some loss functions and their associated entropy measures that reflect both the accuracy of a probability distribution and the uncertainty around the evaluation of its parameters. Then, we explain how these measures can be used to learn classification or regression models that satisfy the trade-off between accuracy and epistemic uncertainty, and we propose an online algorithm based on this compromise.

**Olivier Strauss (LIRMM, Montpellier)**

**Possibilistic signal processing: how to handle scant sensor knowledge**

In digital image processing, kernel functions play a central role. A kernel function can model the point spread function of an imager, a linear filtering process, a linear aggregation process, a continuous-to-discrete interplay, etc. Most digital image processing methods try to mimic analog image processing by means of an algorithm. For example, rotating a digital image consists of estimating the digital image that would have been obtained by rotating the camera before the image acquisition.

Generally, the choice of the kernel has no real impact - e.g. when you are photoshopping pictures of your children. However, in specific contexts like medical imaging, the kernel choice can have a drastic influence on the resulting image - e.g. artifacts in computerized tomography - or lead to misinterpretation of the resulting image - e.g. in multimodal image registration.

Choosing a particular kernel function for processing digital images can be a difficult task leading to hazardous results. Identifying the point spread function of an imager is known to be an ill-conditioned problem. No interpolation function can ensure reversibility of an image transformation: rotating an image forward then backward rarely leads back to the original image. Finding the appropriate smoothing filter to improve the signal-to-noise ratio of an image can also be intricate.

Saying that choosing a particular kernel is hazardous does not mean that the kernel is completely unknown, but rather that it is imprecisely known. For example, the point spread functions of imagers are usually appropriately modeled by centered, symmetric, positive functions with a support whose spread is lower than twice the sampling step. But no particular shape of this function can be proposed. The same holds for interpolation kernels.

Modeling scant knowledge on kernel functions is not possible in the traditional signal processing framework. Deviation on the processed image due to the inappropriate kernel choice is usually considered as a random variation when it is not.

In this talk, we will see how concave Choquet capacities can be used for modeling imprecise knowledge of a kernel function. The first model was proposed by Loquin et al. in [1] under the name of maxitive kernels. The maxitive kernel framework is based on possibility measures and has a straightforward interpretation. It has been used to define imprecise filtering [2] and guaranteed rigid image transformations, among others, and it has been proved to be a bridge between linear filtering and mathematical morphology [3]. More sophisticated models have been proposed that are based on clouds [4], p-boxes, or on an extension of Perfilieva's fuzzy transform [5]. This approach has been used to achieve deconvolution in a semi-blind context (i.e. when the point spread function of the imager is imprecisely known) [6] and error-quantified tomographic reconstruction.

One particularity of this approach is that the output is interval-valued. This interval is simply the convex set of all the values that would have been obtained by considering a kernel belonging to the considered set of kernels. Among other advantages, this modeling allows robustness with respect to the choice of kernel without a drastic additional computational cost. Moreover, in the case of random noise, the spread of the interval-valued output is a good marker of the estimation error [6].
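
The interval-valued output can be illustrated at a single position of a discrete signal. The sketch assumes that the maxitive kernel is given as a normalized possibility distribution `pi` (maximum equal to 1) over the neighbouring samples, and computes the bounds as Choquet integrals with respect to the induced possibility measure and its dual necessity measure. The function names and this discrete setting are illustrative assumptions, not the implementation of the cited papers.

```python
def upper_choquet(values, pi):
    """Choquet integral of `values` with respect to the possibility measure
    whose distribution is `pi` (assumed normalized: max(pi) == 1)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    total, previous = 0.0, 0.0
    for i in order:
        # Possibility of the set of indices visited so far: the running max.
        current = max(previous, pi[i])
        total += values[i] * (current - previous)
        previous = current
    return total


def imprecise_filter(values, pi):
    """Interval [lower, upper] of filtered values for the maxitive kernel pi.

    The lower bound uses the dual necessity measure: lower(f) = -upper(-f).
    """
    lower = -upper_choquet([-v for v in values], pi)
    return lower, upper_choquet(values, pi)
```

Any additive (summative) kernel whose probability measure is dominated by the possibility measure yields a filtered value inside the returned interval; for instance, with `pi = [0.5, 1, 0.5]` the weights `[0.25, 0.5, 0.25]` are one such kernel.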

References

[1] K. Loquin, O. Strauss, On the granularity of summative kernels, Fuzzy Sets and Systems, Volume 159, Issue 15, 1 August 2008, 1952-1972.

[2] A. Rico, O. Strauss, Imprecise expectations for imprecise linear filtering, International Journal of Approximate Reasoning, Volume 51, Issue 8, October 2010, 933-947.

[3] O. Strauss, K. Loquin, Linear filtering and mathematical morphology on an image: a bridge, IEEE International Conference on Image Processing, ICIP 2009, Cairo, Egypt, pp. 3965–3968.

[4] S. Destercke, O. Strauss, Using Cloudy Kernels for Imprecise Linear Filtering, Computational Intelligence for Knowledge-Based Systems Design, IPMU 2010, Dortmund, Germany, June 20-July 02 2010, pp. 198-207.

[5] O. Strauss, Non-additive interval-valued F-transform, Fuzzy Sets and Systems, available online.

[6] K. Loquin, O. Strauss, J-F. Crouzet, Possibilistic signal processing: how to handle noise?, International Journal of Approximate Reasoning, Volume 51, Issue 9, November 2010, 1129-1144.

**Matthias Troffaes (University of Durham, UK)**

**Solving practical decision problems under severe uncertainty: some applications of imprecise probability in the environmental and engineering sciences**

Since Abraham Wald's seminal 1939 paper, decisions have been at the center of classical statistical inference. In this talk, I will argue that they also play a central role in the interpretation of imprecise probability, and in the practice of inference under uncertainty in cases where it is difficult to specify a full probability distribution, due to lack of information.

We will then briefly discuss their application in two real-life problems. The first application stems from the environmental sciences, and concerns crop rotation modelling. We investigate how imprecise probability can help identify robust policies for decreasing manure-intensive crops, thereby promoting ecological diversity and sustainability. The second application concerns an engineering problem. We investigate how we can make sensible decisions on energy storage, given uncertainty about the future climate and the increasing share of renewables such as wind, which are very sensitive to climate change. These applications will demonstrate how imprecise probability can help us to handle assumptions that are hard to validate from data or expert opinion, through careful sensitivity analysis.

The work presented is supported by the Food and Environment Research Agency (York, UK), National Grid (UK), EPSRC (grant no EP/K002252/1), and BP.

References

Thomas Augustin, Frank P. A. Coolen, Gert De Cooman, and Matthias C. M. Troffaes, editors. Introduction to Imprecise Probabilities. Wiley Series in Probability and Statistics. Wiley, 2014.

Lewis Paton, Matthias C. M. Troffaes, Nigel Boatman, Mohamud Hussein, and Andy Hart. Multinomial logistic regression on Markov chains for crop rotation modelling. In Anne Laurent, Olivier Strauss, Bernadette Bouchon-Meunier, and Ronald R. Yager, editors, Proceedings of the 15th International Conference IPMU 2014 (Information Processing and Management of Uncertainty in Knowledge-Based Systems, 15-19 July 2014, Montpellier, France), volume 444 of Communications in Computer and Information Science, pages 476-485. Springer, 2014.

Matthias C. M. Troffaes and Gert de Cooman. Lower Previsions. Wiley Series in Probability and Statistics. Wiley, 2014.

Matthias C. M. Troffaes, Edward Williams, and Chris J. Dent. Data analysis and robust modelling of the impact of renewable generation on long term security of supply and demand. Accepted for the IEEE PES General Meeting 2015.

Abraham Wald. Contributions to the theory of statistical estimation and testing hypotheses. The Annals of Mathematical Statistics, 10(4):299-326, December 1939.