Subject: Biost 517 Q&A: HW #6

QUESTION:

Using question one as an example of a scientific question, I might report

 	-the mean spd12 for dose 0 with associated CI
 	-the mean spd12 for dose 0.4 with associated CI
 	-the p-value

Based on a p-value of <0.05 and non-overlapping CI, I could say the difference 
is significant, for example.

Is it also important to report the difference in means and the associated CI 
for that difference, and then to remark that the difference in means is 
significant because its CI does not cross zero (if that is the case)?

ANSWER:

It is very important to report the difference in means and the associated CI 
for the difference, as that is the measure of treatment effect. As we saw on 
the midterm, it is not unusual for there to be a trend in the placebo group 
perhaps due to
 	1) Aging
 	2) Seasonal trends (but less important with a 1 year timeframe)
 	3) Secular trends in diet or other behavior
 	4) Trends in laboratory measurement error
 	5) "Hawthorne effect" in which subjects being studied modify their
 	   habits
 	6) Any of a gazillion other possible reasons

Furthermore, as I have said (and will continue to say) repeatedly, 
non-overlapping CIs are a very imprecise way to judge statistical significance 
of a comparison. While that criterion does serve as "elevator statistics" 
sometimes, there is no excuse for not doing the proper analysis to obtain 
precise inference on the relevant measure. I have shown you where we had 
substantially overlapping CI on problem #4 of the midterm, but a HIGHLY 
significant P value for the difference. In fact, the CI for one group actually 
included the point estimate for the other group, something that can happen 
with significant differences when, as in that problem, the estimates for each 
group are correlated with each other.
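
To illustrate how correlated samples can behave this way, here is a minimal 
sketch in Python. The data are made up for illustration (they are not the 
midterm data), and the CIs use a normal approximation, which is an assumption 
made to keep the example short.

```python
import math
import statistics
from statistics import NormalDist


def mean_ci(values, level=0.95):
    """Return (mean, (lower, upper)) using a normal approximation (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, (m - z * se, m + z * se)


# Made-up paired data: 40 subjects, each measured twice.
# Subjects differ a lot from one another, but each changes by about 1 unit.
first = [10.0 + 0.7 * i for i in range(40)]
second = [x + 1.0 + 0.1 * ((i % 5) - 2) for i, x in enumerate(first)]

mean_first, ci_first = mean_ci(first)
mean_second, ci_second = mean_ci(second)

# The proper analysis for correlated samples works on within-subject differences.
diffs = [b - a for a, b in zip(first, second)]
mean_diff, ci_diff = mean_ci(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
p_value = 2 * (1 - NormalDist().cdf(abs(mean_diff / se_diff)))

print("first measurement: ", mean_first, ci_first)
print("second measurement:", mean_second, ci_second)
print("paired difference: ", mean_diff, ci_diff, "P =", p_value)
```

In this sketch each group's CI contains the other group's point estimate, yet 
the CI for the paired difference excludes zero by a wide margin, because the 
large between-subject variability cancels out of the within-subject 
differences.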

The point is that we find a single number that measures the scientific 
quantity. When looking for the effect of a risk factor, it will usually 
represent the difference or ratio of some summary measure of the distribution. 
We then make inference on that single number. (See below for comments about 
reasons for reporting results for each group as well.) Full inference will 
include a point estimate of the difference or ratio, a CI for the difference or 
ratio, and a P value for the test of the null hypothesis that the true 
difference is zero or that the true ratio is 1 (unless you had reason to test 
some other null hypothesis).
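
As a rough sketch of what "full inference" on a difference in means looks 
like, the Python below computes the point estimate, CI, and two-sided P value 
for two independent groups. The function name and the data values are 
hypothetical (they are not the homework data), and the normal approximation 
with unequal variances is an assumption; a t-based analysis from your 
statistical package would be the usual choice for samples this small.

```python
import math
import statistics
from statistics import NormalDist


def difference_inference(group_a, group_b, level=0.95):
    """Point estimate, CI, and two-sided P value for mean(group_b) - mean(group_a).

    Assumes independent samples and a normal approximation with unequal
    variances (both are assumptions of this sketch).
    """
    estimate = statistics.mean(group_b) - statistics.mean(group_a)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    # Null hypothesis: the true difference in means is zero.
    p_value = 2 * (1 - NormalDist().cdf(abs(estimate / se)))
    return estimate, ci, p_value


# Made-up responses for two dose groups (illustration only).
dose_0 = [2.1, 3.4, 1.8, 2.9, 3.1, 2.4, 2.7, 3.0, 2.2, 2.6]
dose_04 = [3.9, 4.4, 3.2, 4.8, 4.1, 3.6, 4.5, 3.8, 4.2, 4.0]

estimate, ci, p_value = difference_inference(dose_0, dose_04)
print("difference in means:", estimate)
print("95% CI:", ci)
print("two-sided P value:", p_value)
```

All three numbers describe the same single quantity, the difference in means, 
which is the measure of treatment effect.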

For emphasis: I STRONGLY urge each and every one of you to treat 
non-overlapping CIs as a criterion of last resort. First and foremost, you 
should ask for and obtain inference on the true measure of treatment effect. 
This includes all tasks you perform in this class, and ought to include all 
presentations you see in other classes, seminars, research papers, newspaper 
articles, ...

Elevator statistics are worthwhile only when you are standing in an elevator or 
confronted with a situation where your requests for proper inference will 
necessarily go unanswered (for example, the scientific literature when you 
aren't the referee). Then you might want to be able to interpret CI from 
separate groups 
in order to make a comparison. But if you want to use this criterion (around 
me, at any rate), you had better be able to remember ALL of the rules (I bet 
you will find it hard to remember all the disclaimers, but on future exams you 
will be required to establish all the conditions before I will accept this 
criterion in response to a question):

1) If you have CI computed from INDEPENDENT samples and they do not overlap, 
the difference between the respective summary measures will be statistically 
significant at the corresponding level of confidence ON THE SCALE ON WHICH THEY 
WERE INITIALLY ANALYZED.
-- So if you computed CI for means directly, then the difference of means is 
OK. If you computed CI for log geometric means, then the difference of log 
geometric means is OK. If you computed CI for the log means (and we do this in 
Poisson regression models), then the difference of log means is OK.
-- This approach does not make any particular statement about what the CI for 
the difference will have ruled out, nor what strength of evidence there would 
be in the P value for the difference.

2) If you have CI computed from INDEPENDENT samples and the CI for one group 
includes the point estimate from the other group, the difference (see all the 
disclaimers above) will not be statistically different from zero at the 
corresponding level of confidence.
-- Again, this approach does not make any particular statement about what the 
CI for the difference will have ruled out, nor what strength of evidence there 
would be in the P value for the difference. You would be unable to interpret a 
"negative" study (see my lecture slides on the importance of CI relative to P 
values: using this approach we can say, for instance, P > 0.05, but nothing 
else).

3) In any other setting, you can say "I haven't a clue".
-- These other settings include
 	. some overlapping CI from independent samples
 	. all CI from correlated samples
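
The three rules above can be sketched as a small Python function. The function 
name and return strings are hypothetical, and the sketch assumes both CIs were 
computed at the same confidence level and on the same scale; it says nothing 
about the CI or P value for the difference itself.

```python
def elevator_verdict(est1, ci1, est2, ci2, independent):
    """Apply the three 'elevator statistics' rules to two group summaries.

    est1, est2: point estimates; ci1, ci2: (lower, upper) tuples.
    independent: True only if the two samples are independent.
    """
    if not independent:
        # Rule 3: CIs from correlated samples tell us nothing by themselves.
        return "no clue"
    lo1, hi1 = ci1
    lo2, hi2 = ci2
    if hi1 < lo2 or hi2 < lo1:
        # Rule 1: non-overlapping CIs from independent samples.
        return "significant"
    if lo1 <= est2 <= hi1 or lo2 <= est1 <= hi2:
        # Rule 2: one CI contains the other group's point estimate.
        return "not significant"
    # Rule 3: the CIs overlap, but neither contains the other estimate.
    return "no clue"


print(elevator_verdict(10, (8, 12), 15, (13, 17), independent=True))
print(elevator_verdict(10, (8, 12), 11, (9, 13), independent=True))
print(elevator_verdict(10, (8, 12), 13, (11, 15), independent=True))
print(elevator_verdict(10, (8, 12), 15, (13, 17), independent=False))
```

The last two calls land in the "no clue" category, which is exactly where the 
proper analysis of the difference is needed.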

There is a HUGE difference between "I don't know whether or not the difference 
is statistically significant" (which is what we have to say under the third 
rule) and "We know it is not statistically significant".
I think that anyone who does a study and merely gives the "I don't know 
whether..." response needs to be fired, and, if they are using NIH funds, 
refund me (at least) my tax money.

I do note that if you can figure out how the CI were computed, and if that 
method involved a normally distributed statistic, then we can often figure out 
what the SE was and compute the SE for the difference, provided the 
CI were from independent samples. This is what I usually do when confronted 
with problems in the scientific literature, but since I no longer carry a 
calculator (and contrary to the stereotype, even in my days in physics or in an 
engineering school, I never carried a slide rule except to exams), I cannot 
usually do this on an elevator.
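
For those who do carry a computer, a minimal sketch of that back-calculation 
in Python follows. It assumes each published CI is symmetric, of the form 
estimate +/- z * SE with a normal critical value, and that the two samples are 
independent; the function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def compare_from_cis(ci1, ci2, level=0.95):
    """Recover SEs from two symmetric CIs and make inference on the difference.

    Assumes each CI is estimate +/- z * SE (normal critical value) and that
    the samples are independent. Returns (difference, CI, two-sided P value).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    est1 = (ci1[0] + ci1[1]) / 2           # midpoint of a symmetric CI
    est2 = (ci2[0] + ci2[1]) / 2
    se1 = (ci1[1] - ci1[0]) / (2 * z_crit)  # half-width divided by z
    se2 = (ci2[1] - ci2[0]) / (2 * z_crit)
    diff = est2 - est1
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # valid for independent samples only
    ci_diff = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_diff)))
    return diff, ci_diff, p_value


# Two overlapping 95% CIs where neither contains the other point estimate.
diff, ci_diff, p_value = compare_from_cis((8, 12), (11, 15))
print("difference:", diff)
print("95% CI for difference:", ci_diff)
print("two-sided P value:", p_value)
```

In this example the two CIs overlap, yet the CI for the difference excludes 
zero, so the "I haven't a clue" case is resolved once the SE for the 
difference is computed.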

All of that having been said, as a general rule, I do think it relevant to 
present for each group:
-- mean, SD (not SE), min, max, percent with response above some scientifically 
meaningful threshold (all of these allow us to assess possible individual 
toxicity)
-- CI for the summary measure within each group
-- Perhaps: A P value for the within group response IF the summary of response 
within each risk group is some sort of change over time or place. I will not 
make too much of this due to all the possible reasons for trends in the placebo 
group. I note that a nonsignificant change in the placebo group and a 
significant change in the treatment group can still lead to a nonsignificant 
difference between the two groups. I have seen way too many erroneous (and 
potentially harmful) reports in the scientific literature that merely look for 
nonsignificance in the Placebo group and then overinterpret a significant 
result in the new treatment group, when the proper analysis says that there was 
no significant difference between the groups. To avoid anyone making this 
mistake, I sometimes suppress the inference that is irrelevant to the primary 
question.
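
A small numeric sketch of that mistake, in Python, with made-up mean changes 
and SEs for two independent groups (the numbers and the helper name are 
hypothetical, and a normal approximation is assumed):

```python
import math
from statistics import NormalDist


def two_sided_p(estimate, se):
    """Two-sided P value for the null hypothesis that the true value is zero."""
    return 2 * (1 - NormalDist().cdf(abs(estimate / se)))


# Made-up mean changes over time and their SEs (illustration only).
placebo_change, placebo_se = 1.0, 0.6
treatment_change, treatment_se = 1.5, 0.6

p_placebo = two_sided_p(placebo_change, placebo_se)
p_treatment = two_sided_p(treatment_change, treatment_se)

# The relevant comparison: difference in changes between independent groups.
diff = treatment_change - placebo_change
diff_se = math.sqrt(placebo_se ** 2 + treatment_se ** 2)
p_diff = two_sided_p(diff, diff_se)

print("placebo within-group P:  ", p_placebo)
print("treatment within-group P:", p_treatment)
print("between-group P:         ", p_diff)
```

Here the placebo change is not significant at the 0.05 level, the treatment 
change is, and the difference between the groups is nowhere near significant.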

I note that editors will often not let you present all that many aspects of the 
inference, in which case the descriptive statistics for each group (mean, SD, 
range, etc.) should be presented in a table somewhere, but the CI and P values 
for the groups can safely be omitted.

And as I have stressed in class, presenting the point estimate, the CI, and the 
full P value is important. Merely stating "the difference is statistically 
significant" does not allow a reader with more stringent or less stringent 
standards of evidence to evaluate your results. I do note that it is rarely 
useful to present a P value to more than 4 decimal places, and often two 
suffice.

Scott