Comments on: p-values are inconsistent

By: Nick Adams

Nick Adams — Wed, 15 Jul 2020 19:33:20 +0000

The conclusion of this paper is invalid.
First, he argues that a point null hypothesis (say mu=0) and a dividing hypothesis (say mu <0) are simply instances of interval hypotheses where the interval width tends to zero and infinity respectively. This seems reasonable. Then he clearly demonstrates that 2 interval hypotheses are incoherent when one is entirely nested inside the other. Finally, he concludes that this incoherence must then apply to point and dividing hypotheses as well. However, it is not possible to entirely nest one dividing hypothesis within another nor to nest point hypotheses at all. The incoherence only applies to interval hypotheses.
Incidentally, if two dividing hypotheses (two 1-sided tests) are used instead of an interval hypothesis the incoherence disappears.

By: Fran

Fran — Thu, 21 Mar 2013 21:30:56 +0000

So the author makes use of p-values in a completely bogus way, gets a completely bogus result and, somehow, it is the p-values’ fault… nice.

By: wei

wei — Thu, 22 Apr 2010 13:18:01 +0000

This is weird.

I have been taught that a p-value is a measure of the incompatibility between data and hypothesis. It is only used to disprove a (null) hypothesis. We do not accept all the things that Neyman and Pearson proposed.

I also learned that a p-value is a relative measure, with only a vague quantitative interpretation. Its numerical value is relative to the experimental conduct and to the hypothesis. Numerical comparisons of two p-values are meaningless if the sample sizes of the two experiments are different, or if the widths of the two interval null hypotheses are different.

By: prairiedock

prairiedock — Fri, 05 Mar 2010 15:22:38 +0000

@wjc: Just google for it, using Google Scholar and including the search term ‘pdf’.

By: John

John — Fri, 05 Mar 2010 03:55:50 +0000

In reply to WJC.

WJC: Thanks.

The American Statistician does not make its articles publicly available, so I can’t provide a link. You can access the journal’s archives if you are an ASA member. Also, the article is available via JSTOR; your library may have access to JSTOR.

By: WJC

WJC — Fri, 05 Mar 2010 03:38:55 +0000

Hi John,

First off, let me say I’ve enjoyed your blog.
Can you provide a link to the article?

By: Will

Will — Thu, 04 Mar 2010 21:20:42 +0000

I think you need to change ‘inconsistent’ to ‘incoherent’. One is a clear, well-defined term and the other is entirely subjective and loaded. Can you guess which one you used and which one the author himself used?

By: John

John — Thu, 04 Mar 2010 19:30:18 +0000

In reply to Nqkoi. Nqkoi: Yes. Please see the formula for the p-values given here. I've verified the values in the example using this formula.
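For readers who want to check the numbers themselves, here is a minimal sketch of the calculation. It rests on two assumptions that are not spelled out in this thread: that the p-value for an interval hypothesis a <= theta <= b, given a single observation x ~ N(theta, 1), is the sum of the two normal tail areas measured from the interval endpoints, and that the two intervals are [-0.5, 0.5] and [-0.82, 0.52] (inferred from the endpoints and means quoted in other comments). The function names are illustrative only.

```python
from math import erfc, sqrt


def norm_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * erfc(-z / sqrt(2.0))


def interval_p_value(x, a, b):
    """p-value for H: a <= theta <= b given one draw x ~ N(theta, 1).

    Assumed form (not quoted in this thread): the sum of the two tail
    areas measured from the endpoints, on the side where x falls.
    """
    if x < (a + b) / 2.0:
        return norm_cdf(x - a) + norm_cdf(x - b)
    return norm_cdf(a - x) + norm_cdf(b - x)


x = 2.18
p_tighter = interval_p_value(x, -0.5, 0.5)    # assumed "bear" interval
p_wider = interval_p_value(x, -0.82, 0.52)    # assumed "mammal" interval
print(f"tighter: {p_tighter:.4f}, wider: {p_wider:.4f}")
# prints "tighter: 0.0502, wider: 0.0498"
```

Under these assumptions the wider interval gets the smaller p-value, which is the incoherence under discussion.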

By: Nqkoi

Nqkoi — Thu, 04 Mar 2010 18:54:14 +0000

Can you elaborate more on the p-values? I get 0.047 for the bigger interval and 0.043 for the smaller one.

By: kav

kav — Thu, 04 Mar 2010 17:41:01 +0000

sorry, should read “in about 20% of the cases, the p-values associated with the tighter range are *larger*”

By: kav

kav — Thu, 04 Mar 2010 17:40:04 +0000

David Stivers: Sorry, my bad.

“So, first, the known measurement variation (SD=1) is at least 3 times that of the observed population variation…”

I too thought that Schervish’s values may not be representative, so I ran his experiment, this time using many randomly generated ranges and mean points (i.e. the 2.18). I find that in about 20% of the cases, the p-values associated with the tighter range are smaller than those associated with the larger range. So, Schervish’s point holds even for milder values of the mean point.
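A rough sketch of an experiment of this kind, for anyone who wants to try it. Everything about the sampling scheme is a hypothetical choice (uniform draws for the outer interval, the nested interval, and the observation), and the p-value formula is assumed rather than quoted from this thread, so the fraction it prints need not match the 20% reported above. It counts the incoherent cases, where the tighter (nested) interval gets the larger p-value.

```python
import random
from math import erfc, sqrt


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * erfc(-z / sqrt(2.0))


def interval_p_value(x, a, b):
    # Assumed p-value form for H: a <= theta <= b with x ~ N(theta, 1).
    if x < (a + b) / 2.0:
        return norm_cdf(x - a) + norm_cdf(x - b)
    return norm_cdf(a - x) + norm_cdf(b - x)


def incoherent_fraction(trials=100_000, seed=0):
    """Fraction of random nested pairs where the tighter interval
    has the larger p-value."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        # Outer interval: two sorted uniform draws (hypothetical choice).
        lo, hi = sorted(rng.uniform(-1.0, 1.0) for _ in range(2))
        # Inner interval: two sorted draws inside the outer interval.
        a, b = sorted(rng.uniform(lo, hi) for _ in range(2))
        # Observation: uniform over a hypothetical range.
        x = rng.uniform(-3.0, 3.0)
        if interval_p_value(x, a, b) > interval_p_value(x, lo, hi):
            count += 1
    return count / trials


print(incoherent_fraction())
```

Changing the ranges or the distributions will change the fraction, so the output is best read as "this happens for a non-trivial share of cases under one arbitrary setup", not as a reproduction of the figure above.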

By: David Stivers

David Stivers — Thu, 04 Mar 2010 16:06:55 +0000

@kav: I’m sorry, I don’t follow.

By: John

John — Thu, 04 Mar 2010 16:03:09 +0000

I've updated the post to link to the expression Schervish uses for his p-value calculation.

By: kav

kav — Thu, 04 Mar 2010 14:43:36 +0000

David Stivers: This is not true!

If you repeat the experiment with random values of mu_1, mu_2, mu_1′, mu_2′, you will see that the p-values of the first test are larger than those of the second test about 1 out of 5 times.

Best,

By: EnlightenedDuck

EnlightenedDuck — Thu, 04 Mar 2010 14:38:28 +0000

OK – I’m missing something here….probably because I haven’t had my morning coffee yet and I’m a frequentist. What I want to do is take the difference between the observation (2.18), and the edge of the interval (.5 or .52), and normalize it (divide by 1, in this case). This gives us 1.68 or 1.66. I’m inclined towards 2-tailed tests (since it could be lower, too), giving p-values around 0.1. And yielding more evidence for not being in the tighter interval (not-a-bear), rather than not being in the wider interval (not-a-mammal). So I’m not seeing an inconsistency.

Of course, this completely ignores the lengths of the intervals, so I’m guessing that if I were to treat these as characterizing a (uniform?) prior, I’d get results closer to those of the post….
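The arithmetic in this comment can be sketched as follows. This reproduces only the approach described here (distance from the observation to the upper edge of each interval, divided by SD = 1, then a two-tailed normal p-value); it is not the interval p-value used in the post, and the function name is illustrative.

```python
from math import erfc, sqrt


def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return erfc(abs(z) / sqrt(2.0))


x, sd = 2.18, 1.0
for upper in (0.5, 0.52):
    z = (x - upper) / sd
    print(f"edge {upper}: z = {z:.2f}, two-sided p = {two_sided_p(z):.3f}")
# prints:
# edge 0.5: z = 1.68, two-sided p = 0.093
# edge 0.52: z = 1.66, two-sided p = 0.097
```

Computed this way the tighter interval has the smaller p-value, so no incoherence shows up, but the lower endpoints of the intervals play no role in the calculation.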

By: efrique

efrique — Thu, 04 Mar 2010 02:29:41 +0000

Nice counterexample.

By: David Stivers

David Stivers — Wed, 03 Mar 2010 22:16:33 +0000

While I agree with the basic premise that p-values can be misleading or inconsistent, because such a stretch is required to set this up, I don't think that it is a great example of why I should be worried about the issue in practice. Where did these two (hypothetical) intervals come from? Presumably, they represent 95% CI for a population sample of some quantity in the subtype (bears), which was found to have mean 0 and SD 0.255; and in the type (mammals), having mean -0.15 and SD 0.342. So, first, the known measurement variation (SD=1) is at least 3 times that of the observed population variation for either the type or the subtype; not an unimaginable situation if the population estimates were derived from repeated measures, but in that case, we haven't been given the relevant intervals. Second, given either a N(0, 0.255) or N(-0.15, 0.342), the probability of observing 2.18 or greater is close to 0; i.e., you're extremely unlikely to observe 2.18 if measuring an actual mammal.
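A quick check of the numbers in this comment, as a sketch. It assumes the intervals are mean ± 1.96 SD and treats the second argument of N(·, ·) as a standard deviation, which is how the comment appears to use it; the helper name is illustrative.

```python
from math import erfc, sqrt


def upper_tail(x, mean, sd):
    """P(X >= x) for X ~ N(mean, sd^2)."""
    return 0.5 * erfc((x - mean) / (sd * sqrt(2.0)))


for label, mean, sd in [("bears", 0.0, 0.255), ("mammals", -0.15, 0.342)]:
    lo, hi = mean - 1.96 * sd, mean + 1.96 * sd
    tail = upper_tail(2.18, mean, sd)
    print(f"{label}: 95% interval ({lo:.2f}, {hi:.2f}), "
          f"P(X >= 2.18) = {tail:.1e}")
```

The intervals come out as roughly (-0.50, 0.50) and (-0.82, 0.52), and both tail probabilities are vanishingly small, in line with "close to 0". Note that this tail calculation uses the population SD only and leaves out the measurement SD of 1.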

By: Joseph Delaney

Joseph Delaney — Wed, 03 Mar 2010 22:13:48 +0000

Or, of course, I could have misread the two intervals and look like a fool. My apologies.

By: Joseph Delaney

Joseph Delaney — Wed, 03 Mar 2010 22:12:01 +0000

I am confused by the confidence intervals. It looks like 0.52 > 0.50 (the top of the 95% confidence interval). Is it not the case that all Bears are Mammals? If so, should the smaller confidence interval not be nested inside the larger one?

By: John Myles White

John Myles White — Wed, 03 Mar 2010 20:05:58 +0000

I’m confused: does this anomaly come up because the larger hypothesis interval is skewed further away from the observation than the smaller interval?