How to select a model if we know probabilities with interval uncertainty?

Vladik Kreinovich (Department of Computer Science, University of Texas at El Paso, El Paso, Texas, USA)

Asian Journal of Economics and Banking

ISSN: 2615-9821

Article publication date: 8 November 2023


Abstract

Purpose

When the probability of each model is known, a natural idea is to select the most probable model. However, in many practical situations, the exact values of these probabilities are not known; only the intervals that contain these values are known. In such situations, a natural idea is to select some probabilities from these intervals and then to select the model with the largest selected probability. The purpose of this study is to decide how to most adequately select these probabilities.

Design/methodology/approach

It is desirable to have a probability-selection method that preserves independence. If, according to the probability intervals, the two events were independent, then the selection of probabilities within the intervals should preserve this independence.

Findings

The paper describes all techniques for decision making under interval uncertainty about probabilities that are consistent with independence. It is proved that these techniques form a 1-parametric family, a family that has already been successfully used in such decision problems.

Originality/value

This study provides a theoretical explanation of an empirically successful technique for decision-making under interval uncertainty about probabilities. This explanation is based on the natural idea that the method for selecting probabilities from the corresponding intervals should preserve independence.

Citation

Kreinovich, V. (2023), "How to select a model if we know probabilities with interval uncertainty?", Asian Journal of Economics and Banking, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/AJEB-08-2023-0078

Publisher: Emerald Publishing Limited

Copyright © 2023, Vladik Kreinovich

License

Published in Asian Journal of Economics and Banking. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode


1. Formulation of the problem

1.1 Need for indirect measurements and data processing

In many practical situations, we are interested in a quantity y that is difficult – or even impossible – to measure directly. For example, we may be interested in tomorrow's temperature or in next year's gross domestic product (GDP). Since we cannot measure the quantity y directly, we need to measure it indirectly, i.e.:

  1. find easier-to-measure quantities x1, …, xn which are related to y by a known dependence y = f(x1, …, xn),

  2. measure the values of these quantities, resulting in measurement results x̃1, …, x̃n, and

  3. compute the desired estimate ỹ = f(x̃1, …, x̃n) by applying the algorithm f to the results x̃i of measuring xi.

Computing ỹ is an important particular case of data processing.
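These three steps can be sketched as follows; the dependence f and the measured values below are purely illustrative assumptions, not from the paper:

```python
# A minimal sketch of indirect measurement: the hard-to-measure quantity y
# is estimated by applying a known dependence f to the measured values of
# easier-to-measure quantities x1 and x2 (hypothetical example).

def f(x1, x2):
    # step 1: known dependence y = f(x1, x2); here, for illustration, a product
    return x1 * x2

# step 2: measurement results x̃1, x̃2 (with small measurement errors)
x1_tilde, x2_tilde = 20.1, 9.8

# step 3: data processing -- the estimate ỹ for the desired quantity y
y_tilde = f(x1_tilde, x2_tilde)
print(y_tilde)
```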

1.2 Need to find a model

In many practical situations, we know the function f(x1, …, xn). For example, in celestial mechanics, we know how the future location y of a celestial body depends on the current locations and velocities of this and other bodies. However, in many other practical situations, we do not know this dependence. In such cases, we need to determine this dependence from experiments and/or observations. Specifically:

  1. In several (K) cases, we know both the values xi,k of the inputs xi and the value yk of the desired quantity y, and

  2. We need to find the dependence f(x1, …, xn) that is consistent with all these observations, i.e. for which, for all k from 1 to K, we have

(1) yk = f(x1,k, …, xn,k).

1.2.1 Terminological comment

  1. The resulting function y = f(x1, …, xn) serves as a model of the corresponding situation.

  2. In statistics, the problem of finding a model is known as regression.

  3. In computer science, the same problem – when solved by an algorithm – is known as machine learning.

  4. In this paper, we use the word “model” in the general scientific sense – as a description of a real-life process, i.e. in this case, as a function f(x1, …, xn) that estimates the desired quantity y. To avoid possible confusion, it should be mentioned that in statistics, sometimes, a “model” means a family of such functions – e.g. all linear functions or all linear functions that depend only on the first k variables x1, …, xk.

1.3 Need to select a model

  1. To describe a general function f(x1, …, xn), we need to describe infinitely many parameters – e.g. the values of the function at all the tuples (x1, …, xn) for which all the values xi are rational.

  2. Our only requirement on possible functions is to satisfy K equations (1).

Here, the number of parameters is much larger than the number of equations. Thus, there are, in general, many different functions that fit all the observations.

We therefore need to select one of these functions, i.e. we need to select a model.
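For instance, the following toy example (hypothetical data and models, for illustration only) shows two different functions that satisfy the same observations:

```python
# A minimal illustration that observations alone do not determine the model:
# with K = 2 observations, many different functions satisfy all equations
# y_k = f(x_k). The data and the two candidate models are hypothetical.

observations = [(1.0, 2.0), (2.0, 4.0)]  # pairs (x_k, y_k), k = 1, ..., K

def f1(x):
    return 2.0 * x                          # a linear model

def f2(x):
    return 2.0 * x + (x - 1.0) * (x - 2.0)  # a different model with the same fit

# both models are consistent with all K observations ...
assert all(f1(x) == y for x, y in observations)
assert all(f2(x) == y for x, y in observations)

# ... yet they disagree elsewhere, so we still need to select one of them
print(f1(3.0), f2(3.0))  # 6.0 vs 8.0
```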

1.4 How a model is selected now: case when we know probabilities

In some cases, we know the probabilities pi of different models. In this case, a reasonable idea is to select the most probable model, i.e. the model i whose probability is the largest: pi = maxj pj.

Such a selection is, e.g. one of the main ideas behind the maximum likelihood approach to model selection; see, e.g. (Sheskin, 2011). In this method, usually, we maximize the probability p by solving the equivalent problem of minimizing the quantity

L =def −ln(p).

1.4.1 Comment

It should be mentioned that, strictly speaking, likelihood is not the probability of a model; it is the probability of the data according to this model. To come up with the probability of the model, we need to use the Bayesian approach. In this approach, if we assume that a priori all models are equally probable – i.e. that the prior distribution is uniform – then the likelihood becomes proportional to the probability of the model, so that maximizing the likelihood is equivalent to maximizing the model's probability.
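In code, this selection rule can be sketched as follows; the model names and probability values are illustrative assumptions:

```python
import math

# A minimal sketch of model selection when the probability p_i of each
# candidate model is known (hypothetical probabilities).
probabilities = {"model A": 0.2, "model B": 0.5, "model C": 0.3}

# maximizing p is equivalent to minimizing L = -ln(p)
losses = {m: -math.log(p) for m, p in probabilities.items()}

best = min(losses, key=losses.get)
print(best)  # "model B" -- the model with the largest probability
```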

1.5 What if we only have partial information about probabilities: description of the situation

Often, we only have partial information about the probabilities. For example, instead of the exact value pi of each probability, we only know the lower bound p̲i and the upper bound p̄i: p̲i ≤ pi ≤ p̄i.

In this case, the only information that we have about the probability pi is that this probability is contained in the interval [p̲i, p̄i]. This situation is thus known as interval uncertainty.

1.6 How to make a decision under such interval uncertainty: a natural idea

In situations with interval uncertainty, it is desirable to apply the traditional probability-based decision-making techniques. To do this, we need to select, within each of the intervals [p̲i, p̄i], one of the values pi, and then select the model with the largest selected probability pi.

1.7 Resulting challenge

How do we select a value pi in each interval? There are many different ways to make this selection; which one should we choose?

1.8 What we do in this paper

In this paper, we show that a natural condition on the selection of the probability values from the corresponding intervals uniquely determines a 1-parametric family of such selections – the only selections that satisfy this natural condition.

2. Main result

2.1 Natural condition: informal description

We want to find a mapping that assigns, to each interval of probability values, a number from this interval. It is desirable to select this mapping so that it preserves important properties of the situation.

In probabilistic techniques, one of the most important notions is the notion of independence. It is therefore reasonable to require that the desired intervals-to-numbers mapping satisfy the following condition: If the two events were independent, then this mapping should preserve this independence.

2.2 Let us formalize this natural condition

If two events with probabilities p1 and p2 are independent, then the probability that they occur at the same time is equal to the product p1·p2 of the corresponding probabilities. If, for each of these events, we only know the interval [p̲i, p̄i] of possible values of its probability, then the set of possible values of the probability that both events occur is

{p1·p2 : p1 ∈ [p̲1, p̄1] and p2 ∈ [p̲2, p̄2]}.

One can easily check that this set is equal to the interval [p̲1·p̲2, p̄1·p̄2]; see, e.g. (Jaulin et al., 2012; Kubica, 2019; Mayer, 2017; Moore et al., 2009). Indeed, for non-negative values pi, the product function (p1, p2) ↦ p1·p2 is (nonstrictly) increasing with respect to each of its variables. Thus:
  1. The smallest possible value of this function when pi ∈ [p̲i, p̄i] is attained when both inputs are the smallest possible, i.e. when pi = p̲i for both i, and

  2. The largest possible value of this function when pi ∈ [p̲i, p̄i] is attained when both inputs are the largest possible, i.e. when pi = p̄i for both i.
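This monotonicity argument can also be checked by brute force; the intervals in the sketch below are illustrative assumptions:

```python
# Brute-force check of the interval-product formula: for p1 in [0.2, 0.5]
# and p2 in [0.4, 0.9] (illustrative intervals), the range of p1*p2 is
# [0.2*0.4, 0.5*0.9] = [0.08, 0.45].

def product_interval(lo1, hi1, lo2, hi2):
    # for non-negative bounds, the product is increasing in each variable,
    # so the range is attained at the corner points
    return lo1 * lo2, hi1 * hi2

lo, hi = product_interval(0.2, 0.5, 0.4, 0.9)

# sample the two intervals on a grid and compare with the predicted bounds
n = 100
samples = [
    (0.2 + 0.3 * i / n) * (0.4 + 0.5 * j / n)
    for i in range(n + 1)
    for j in range(n + 1)
]
assert abs(min(samples) - lo) < 1e-9
assert abs(max(samples) - hi) < 1e-9
```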

Thus, we arrive at the following definition.

Definition.

We say that a mapping f that maps each subinterval [p̲, p̄] of the interval [0, 1] into a number f(p̲, p̄) from this interval is natural if it satisfies the following condition: for all values p̲1 ≤ p̄1 and p̲2 ≤ p̄2, we have

f(p̲1·p̲2, p̄1·p̄2) = f(p̲1, p̄1) · f(p̲2, p̄2).

Proposition.

A mapping is natural if and only if, for some α ∈ [0, 1], it has the form

f(p̲, p̄) = p̲^α · p̄^(1−α).
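A quick numerical sanity check that mappings of this form are indeed natural; the intervals and the value α = 0.7 below are illustrative assumptions:

```python
# Check that f(lo, hi) = lo**alpha * hi**(1 - alpha) (i) returns a value
# inside [lo, hi] and (ii) satisfies the naturalness condition
#   f(lo1*lo2, hi1*hi2) = f(lo1, hi1) * f(lo2, hi2)
# on a few illustrative probability intervals.

def f(lo, hi, alpha):
    return lo ** alpha * hi ** (1.0 - alpha)

alpha = 0.7
for (lo1, hi1), (lo2, hi2) in [((0.1, 0.4), (0.3, 0.8)), ((0.2, 0.2), (0.5, 1.0))]:
    lhs = f(lo1 * lo2, hi1 * hi2, alpha)           # f applied to the product interval
    rhs = f(lo1, hi1, alpha) * f(lo2, hi2, alpha)  # product of the selected values
    assert abs(lhs - rhs) < 1e-12                  # independence is preserved
    assert lo1 <= f(lo1, hi1, alpha) <= hi1        # f selects a point of the interval
```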

Discussion.

The function L = −ln(p) is decreasing with respect to p. Thus, when p ∈ [p̲, p̄]:

  1. The smallest value L̲ of L = −ln(p) is attained when p is the largest, i.e. when p = p̄:

L̲ = −ln(p̄);

  2. The largest value L̄ of L is attained when p is the smallest, i.e. when p = p̲:

L̄ = −ln(p̲).

For the values L = −ln(f(p̲, p̄)), L̲ = −ln(p̄), and L̄ = −ln(p̲), the above formula takes the form L = α·L̄ + (1 − α)·L̲. Interestingly, this is exactly Hurwicz's optimism-pessimism criterion that is used for decision making under interval uncertainty; see, e.g. (Hurwicz, 1951; Kreinovich, 2014; Luce and Raiffa, 1989).

This model selection has been successfully used; see, e.g. (Denœux, 2023).
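The resulting selection rule can be sketched as follows; the function name select_model and the intervals below are illustrative assumptions, not from the paper:

```python
# Sketch of model selection under interval uncertainty: choose the model
# with the largest value p_low**alpha * p_high**(1 - alpha), equivalently
# with the smallest Hurwicz combination alpha*L_high + (1 - alpha)*L_low
# of the negative log-probabilities. Intervals are hypothetical.

def select_model(intervals, alpha):
    # intervals: model name -> (p_low, p_high), with 0 <= p_low <= p_high <= 1
    scores = {m: lo ** alpha * hi ** (1.0 - alpha)
              for m, (lo, hi) in intervals.items()}
    return max(scores, key=scores.get)

intervals = {"model A": (0.1, 0.6), "model B": (0.3, 0.4)}
print(select_model(intervals, alpha=0.0))  # only the upper bounds matter: "model A"
print(select_model(intervals, alpha=1.0))  # only the lower bounds matter: "model B"
```

Note how the parameter α interpolates between trusting the most optimistic probability estimate (α = 0) and the most pessimistic one (α = 1).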

Proof of the Proposition.

  • 1. It is easy to prove that the above formula leads to a natural mapping. So, to complete the proof, it is sufficient to prove that every natural mapping has this form.

Let f(p̲, p̄) be a natural mapping. Let us prove that it has the desired form.

  • 2. For each p, by definition of a natural mapping, the value f(p, p) belongs to the interval [p, p] and is, thus, equal to p. In particular, for p = 0, we get

    f(0, 0) = 0.

  • 3. Let us first take p̲1 = p̲2 = 0 and p̄1 = p̄2 = 1. In this case, the naturalness condition implies that f(0, 1) · f(0, 1) = f(0, 1). Thus, either f(0, 1) = 1 or f(0, 1) = 0. Let us consider these two possible cases one by one.

  • 4. Let us first consider the case when f(0, 1) = 1.

  • 4.1. In this case, for every a ∈ (0, 1), for p̲1 = 0, p̄1 = 1, p̲2 = a, and p̄2 = 1, we get f(0, 1) · f(a, 1) = f(0, 1). Since f(0, 1) = 1, this means that f(a, 1) = 1 for all a.

  • 4.2. Now, for all possible p̲ ≤ p̄ for which p̄ > 0, naturalness leads to

    f(p̲, p̄) = f(p̄, p̄) · f(p̲/p̄, 1).

As we have proven in Part 4.1 of this proof, the second factor f(p̲/p̄, 1) is equal to 1. The first factor f(p̄, p̄) is, by Part 2 of this proof, equal to p̄.

So, for all cases when p̄ > 0, we have f(p̲, p̄) = p̄.

  • 4.3. For p̄ = 0, the formula f(p̲, p̄) = p̄ is also true – by Part 2 of this proof. Thus, this formula holds for all p̲ ≤ p̄. This corresponds to α = 0.

  • 5. Let us now consider the case when f(0, 1) = 0.

In this case, naturalness implies that, for all p̄, we have

f(0, p̄) = f(0, 1) · f(0, p̄),

and hence f(0, p̄) = 0. Let us now consider intervals for which p̲ > 0.
  • 5.1. Let us first consider the values f(a, 1) corresponding to a > 0. When a < b, then we have f(a, 1) = f(a/b, 1) · f(b, 1). Since f(a/b, 1) is a probability, it is smaller than or equal to 1; thus, f(a, 1) ≤ f(b, 1), i.e. f(a, 1) is a nonstrictly increasing function of a.

  • 5.2. Each value a > 0 can be represented as exp(−A) for A = −ln(a). By definition of the natural mapping, each such value f(a, 1) for a > 0 is greater than or equal to a > 0, and thus f(a, 1) > 0. So, we can take logarithms of these values as well. Let us denote F(A) =def −ln(f(exp(−A), 1)). Probabilities f(exp(−A), 1) are smaller than or equal to 1, so

    ln(f(exp(−A), 1)) ≤ ln(1) = 0,

and thus, we always have F(A) ≥ 0. In particular, F(1) ≥ 0.
  • 5.3. Let us prove that F(A) is a (nonstrictly) increasing function.

Indeed, if A < B, then −A > −B. Since exp(x) is an increasing function, we get exp(−A) > exp(−B). Since f(a, 1) is a nonstrictly increasing function of a, we conclude that f(exp(−A), 1) ≥ f(exp(−B), 1). Since ln(x) is an increasing function, we conclude that

ln(f(exp(−A), 1)) ≥ ln(f(exp(−B), 1)).

Multiplying both sides by −1, we get

−ln(f(exp(−A), 1)) ≤ −ln(f(exp(−B), 1)),

i.e. F(A) ≤ F(B). The statement is proven.
  • 5.4. For values f(a, 1), naturalness implies that f(a·b, 1) = f(a, 1) · f(b, 1). For a = exp(−A) and b = exp(−B), we have a·b = exp(−(A + B)); thus,

f(exp(−(A + B)), 1) = f(exp(−A), 1) · f(exp(−B), 1).

By taking negative logarithms of both sides, we get

(2) F(A + B) = F(A) + F(B).

  • 5.5. From (2), we conclude that, for every positive integer m,

(3) F(m·A) = F(A + … + A) (m times) = F(A) + … + F(A) (m times) = m·F(A).

In particular, for m = n and A = 1/n, we get F(1) = n·F(1/n); hence

(4) F(1/n) = (1/n)·F(1).

For a general m and A = 1/n, we get F(m/n) = m·F(1/n). Due to (4), we get F(m/n) = (m/n)·F(1), i.e. F(r) = r·F(1) for all rational numbers r.
  • 5.6. For every real number x and for every positive integer n, we can take, as mn, the integer part of n·x, so that mn ≤ n·x < mn + 1. By dividing all parts of this inequality by n, we get mn/n ≤ x < (mn + 1)/n. In the limit n → ∞, we get mn/n → x and (mn + 1)/n → x.

By Part 5.3 of this proof, the function F(A) is nonstrictly increasing; thus F(mn/n) ≤ F(x) ≤ F((mn + 1)/n). Due to Part 5.5, this means that

(mn/n)·F(1) ≤ F(x) ≤ ((mn + 1)/n)·F(1).

In the limit n → ∞, we have

(mn/n)·F(1) → x·F(1) and ((mn + 1)/n)·F(1) → x·F(1).

Thus, in the limit, we get x·F(1) ≤ F(x) ≤ x·F(1), i.e. F(x) = x·F(1).

  • 5.7. From F(A) = A·F(1), we conclude that

f(exp(−A), 1) = exp(−F(A)) = exp(−A·F(1)).

Substituting A = −ln(a) into this expression, we get

f(a, 1) = exp(ln(a)·F(1)) = a^F(1).

The condition that f(a, 1) ≥ a implies that F(1) ≤ 1; thus, F(1) ∈ [0, 1].

  • 5.8. For every pair 0 < p̲ ≤ p̄, naturalness implies that

f(p̲, p̄) = f(p̄, p̄) · f(p̲/p̄, 1).

By Part 2 of this proof, the first factor in this product is equal to p̄. Due to Part 5.7, the second factor is equal to (p̲/p̄)^F(1); thus, we get

f(p̲, p̄) = p̄·(p̲/p̄)^F(1) = p̲^F(1)·p̄^(1−F(1)).

This is exactly the desired formula, for α = F(1) – so far limited to the case when p̲ > 0. To cover the remaining cases:

  1. If α = 0, we get the case considered in Part 4 of this proof.

  2. For α > 0 and p̲ = 0, we have 0^α · p̄^(1−α) = 0 and f(0, p̄) = 0 by Part 5, so the desired equality holds for all p̲ ≤ p̄.

The proposition is proven.

References

Denœux, T. (2023), “Quantifying prediction uncertainty in regression using random fuzzy sets: the ENNreg model”, IEEE Transactions on Fuzzy Systems, Vol. 31 No. 10, pp. 3690-3699, doi: 10.1109/tfuzz.2023.3268200.

Hurwicz, L. (1951), “Optimality criteria for decision making under ignorance”, Cowles Commission Discussion Paper, Statistics No. 370.

Jaulin, L., Kieffer, M., Didrit, O. and Walter, E. (2012), Applied Interval Analysis, with Examples in Parameter and State Estimation, Robust Control, and Robotics, Springer, London.

Kreinovich, V. (2014), “Decision making under interval uncertainty (and beyond)”, in Guo, P. and Pedrycz, W. (Eds), Human-Centric Decision-Making Models for Social Sciences, Springer Verlag, pp. 163-193.

Kubica, B.J. (2019), Interval Methods for Solving Nonlinear Constraint Satisfaction, Optimization, and Similar Problems: From Inequalities Systems to Game Solutions, Springer, Cham.

Luce, R.D. and Raiffa, H. (1989), Games and Decisions: Introduction and Critical Survey, Dover, New York.

Mayer, G. (2017), Interval Analysis and Automatic Result Verification, de Gruyter, Berlin.

Moore, R.E., Kearfott, R.B. and Cloud, M.J. (2009), Introduction to Interval Analysis, SIAM, Philadelphia.

Sheskin, D.J. (2011), Handbook of Parametric and Nonparametric Statistical Procedures, Chapman and Hall/CRC, Boca Raton, FL.

Acknowledgements

The author is thankful to Thierry Denœux, Hung T. Nguyen, and Nguyen Ngoc Thach, for their encouragement and advice.

This work was funded by the National Science Foundation (No. 1623190) (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science), HRD-1834620 and HRD-2034030 (CAHSI Includes), EAR-2225395 (Center for Collective Impact in Earthquake Science C-CIES) and the AT&T Fellowship in Information Technology. It was also supported by the program for the development of Scientific-Educational Mathematical Center of Volga Federal District (No. 075-02-2020-1478) and the Hungarian National Research, Development and Innovation Office (NRDI).

Corresponding author

Vladik Kreinovich can be contacted at: vladik@utep.edu
