Does additional data always reduce posterior variance?

A discussion over lunch today brought up the fact that additional data does not always decrease the size of a confidence interval. This post will look at this from a Bayesian perspective.

In general, new information reduces your uncertainty regarding whatever you’re estimating. The posterior distribution becomes more concentrated as more data are collected.

That’s what happens “in general” but does it necessarily happen every time you get new data? Conceivably if you get surprising data, data that is very unlikely given your current prior, posterior uncertainty might increase.

Beta-binomial model

To see that this can happen, suppose a binary trial succeeds with probability θ and that θ has a beta prior. You could imagine this prior to be the posterior after having made some number of previous observations. Can a new observation increase the posterior variance of θ? If so, under what conditions?

The variance of a beta(a, b) random variable is

ab / [(a + b)²(a + b + 1)].

After observing a successful trial, the posterior distribution on θ is beta(a + 1, b). We can calculate the ratio of the posterior variance to the prior variance and ask under what circumstances, if any, the ratio is greater than 1. The common factor b cancels, and the ratio simplifies to

(a + 1)(a + b)² / [a(a + b + 1)(a + b + 2)].

If 2a ≥ b the posterior variance will be strictly less than the prior variance. This says that if the prior mean odds against a success are no more than 2 : 1, observing a success will reduce the variance. (By symmetry, observing a failure will reduce the variance if 2b ≥ a.) But for any value of b, you can find a value of a small enough that observing a success will increase the variance.
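
As a quick numerical check of both claims, here is a minimal Python sketch. The function names and the two example priors are made up for illustration; exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction


def beta_variance(a, b):
    """Variance of a beta(a, b) distribution: ab / ((a + b)^2 (a + b + 1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def variance_ratio_after_success(a, b):
    """Posterior variance divided by prior variance after one observed success."""
    return beta_variance(a + 1, b) / beta_variance(a, b)


# Hypothetical prior with 2a >= b: a success shrinks the variance.
balanced = variance_ratio_after_success(Fraction(2), Fraction(3))

# Hypothetical prior with a tiny relative to b: a success is a surprise
# and the variance grows.
surprising = variance_ratio_after_success(Fraction(1, 10), Fraction(10))

print(balanced)    # 25/28
print(surprising)  # 10201/1221
```

With the beta(1/10, 10) prior, a single success multiplies the variance by a factor of more than 8.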

Normal-normal model

Whether an observation can increase the posterior variance depends on the data model. If your data have a normal likelihood function with known variance and a normal prior on the mean θ, the posterior variance is always less than the prior variance, and it reduces by the same amount, independent of the observation x. If x is very unlikely a priori then it will pull the posterior mean toward itself more than an observation that is more concordant with the prior would have, but the change in the posterior variance is the same.
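
The standard conjugate update makes this easy to see: the posterior precision is the prior precision plus the data precision, and x enters only the posterior mean. Here is a minimal Python sketch; the function name and the numbers are hypothetical, chosen only for illustration.

```python
def normal_posterior(prior_mean, prior_var, x, data_var):
    """Posterior mean and variance of a normal mean after one observation x,
    given a normal prior and a known data variance."""
    post_precision = 1 / prior_var + 1 / data_var  # does not involve x
    post_var = 1 / post_precision
    post_mean = post_var * (prior_mean / prior_var + x / data_var)
    return post_mean, post_var


# Hypothetical setup: prior N(0, 4) on the mean, data variance 1.
mean_typical, var_typical = normal_posterior(0.0, 4.0, 0.5, 1.0)
mean_surprising, var_surprising = normal_posterior(0.0, 4.0, 25.0, 1.0)

# The two posterior means differ, but the posterior variances are identical
# and smaller than the prior variance of 4.
print(mean_typical, var_typical)
print(mean_surprising, var_surprising)
```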

Proof of beta theorem

Here is a proof in Lean 4 of the statement above that if 2a ≥ b the posterior variance will be strictly less than the prior variance.

import Mathlib

set_option linter.style.header false

noncomputable def f (a b : ℝ) : ℝ := a * b / ((a + b) ^ 2 * (a + b + 1))

theorem f_ratio_lt_one' (a b : ℝ) (ha : 0 < a) (hb : 0 < b) (hab : b ≤ 2 * a) :
    f (a + 1) b / f a b < 1 := by
  have hs : 0 < a + b := by linarith
  have h2ab : 0 ≤ 2 * a - b := by linarith
  have hprod : 0 ≤ (a + b) * (2 * a - b) := mul_nonneg hs.le h2ab
  -- key polynomial inequality (∗)
  have key : (a + 1) * (a + b) ^ 2 < a * ((a + b + 1) * (a + b + 2)) := by
    nlinarith [hprod, ha]
  -- nonzero facts needed to clear denominators
  have ha' : a ≠ 0 := ne_of_gt ha
  have hb' : b ≠ 0 := ne_of_gt hb
  have hs' : a + b ≠ 0 := ne_of_gt hs
  have hs1' : a + b + 1 ≠ 0 := by positivity
  have hs2' : a + b + 2 ≠ 0 := by positivity
  have ha1' : a + 1 ≠ 0 := by positivity
  -- express the ratio as a single closed-form fraction
  have hratio : f (a + 1) b / f a b
      = ((a + 1) * (a + b) ^ 2) / (a * ((a + b + 1) * (a + b + 2))) := by
    unfold f
    have e : a + 1 + b = a + b + 1 := by ring
    rw [e]
    field_simp
    ring
  rw [hratio, div_lt_one (by positivity)]
  exact key

2 thoughts on “Does additional data always reduce posterior variance?”

Ian D-B

4 July 2026 at 05:31

For small steps — i.e. continuous learning — uncertainty rising with a signal doesn’t require anything extreme — it will occur any time the signal takes you in the same direction as the third moment. With a positive third moment, a positive surprise in the signal raises uncertainty and a negative surprise reduces it, and with a negative third moment those signs reverse.

The intuition is kind of obvious (positive third moment means longer right tail than left, so a positive surprise makes the longer tail more likely). The fact that the third moment is actually the right number is somewhat surprising.

Two references:
Continuous time: https://dew-becker.org/documents/beliefs.pdf (proposition 1 and theorem 1)
Single update: https://ieeexplore.ieee.org/document/9925240/ (proposition 7).

The proof of the discrete version of the result is just a few lines of algebra, and the gain in the continuous case is also easy to derive (the drifts are more tedious), but somehow these facts aren’t so well known.

Jonathan Rougier

5 July 2026 at 02:17

Hi John, I find this easiest to explain with a mixture model. Suppose you are almost certain (e.g. p = 0.99) that X comes from a distribution with a large expectation and a small variance, but you do not rule out (p = 0.01) that it comes from a distribution with a small expectation and a large variance. So the prior variance is small. But if the observed value of X is small then the posterior probability concentrates on the second component, and the posterior variance is larger than the prior variance.

As you note, this outcome involves a surprise.
