Subject: Biost 517 Q&A: HW #6

QUESTION: Using question one as an example of a scientific question, I might report
-the mean spd12 for dose 0 with associated CI
-the mean spd12 for dose 0.4 with associated CI
-the p-value
Based on a p-value < 0.05 and non-overlapping CIs, I could say the difference is significant, for example. Is it also important to report the difference in means and the CI for the difference in means, and then remark that, because that CI does not cross zero, the difference in means is significant (if that is the case)?

ANSWER: It is very important to report the difference in means and the associated CI for the difference, as that is the measure of treatment effect. As we saw on the midterm, it is not unusual for there to be a trend in the placebo group, perhaps due to
1) Aging
2) Seasonal trends (but less important with a 1 year timeframe)
3) Secular trends in diet or other behavior
4) Trends in laboratory measurement error
5) "Hawthorne effect" in which subjects being studied modify their habits
6) Any of a gazillion other possible reasons

Furthermore, as I have said (and will continue to say) repeatedly, non-overlapping CIs are a very imprecise way to judge the statistical significance of a comparison. While that criterion does sometimes serve as "elevator statistics", there is no excuse for not doing the proper analysis to obtain precise inference on the relevant measure. I have shown you where we had substantially overlapping CIs on problem #4 of the midterm, but a HIGHLY significant P value for the difference. In fact, the CI for one group actually included the point estimate for the other group, something that could happen with a significant difference because the estimates for the two groups were correlated with each other. The point is that we should find a single number that measures the scientific quantity of interest.
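The midterm situation can be mimicked with a minimal Python sketch using hypothetical numbers (not the actual midterm data). It assumes each group estimate is approximately normal with a known SE and that the two estimates have an assumed correlation `rho`, as can happen when, say, the same subjects contribute to both estimates. The variance of the difference then includes a covariance term, so the single number we care about, the difference, can be estimated far more precisely than the two separate CIs suggest. The function names are illustrative only.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96, the multiplier for 95% intervals


def group_ci(est, se):
    """Normal-based 95% CI for one group's summary measure."""
    return (est - Z * se, est + Z * se)


def diff_inference(est1, se1, est2, se2, rho=0.0):
    """Estimate, 95% CI and two-sided P value for the difference est2 - est1.

    rho is the assumed correlation between the two group estimates
    (0 for independent samples).
    """
    diff = est2 - est1
    # Var(X2 - X1) = Var(X1) + Var(X2) - 2 Cov(X1, X2)
    se_diff = math.sqrt(se1**2 + se2**2 - 2 * rho * se1 * se2)
    p = 2 * NormalDist().cdf(-abs(diff) / se_diff)
    return diff, (diff - Z * se_diff, diff + Z * se_diff), p


# Hypothetical numbers, not the midterm data: both group means have SE 1.0.
est_a, est_b, se = 10.0, 11.5, 1.0
print(group_ci(est_a, se))  # roughly (8.04, 11.96), which contains est_b = 11.5
print(diff_inference(est_a, se, est_b, se, rho=0.95))  # CI excludes 0, P < 0.001
print(diff_inference(est_a, se, est_b, se))  # wrongly assuming independence: P about 0.29
```

With the correlation accounted for, the CI for group A contains group B's point estimate, yet the difference is highly significant; ignoring the correlation throws that precision away.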
When looking for the effect of a risk factor, the single number measuring the scientific quantity will usually be the difference or ratio of some summary measure of the distribution between groups. We then make inference on that single number. (See below for comments about reasons for reporting results for each group as well.) Full inference will include a point estimate of the difference or ratio, a CI for the difference or ratio, and a P value for testing the null hypothesis that the true difference is zero or that the true ratio is 1 (unless you had reason to test some other null hypothesis).

For emphasis: I STRONGLY urge each and every one of you to use the criterion of non-overlapping CIs only as a criterion of last resort. First and foremost, you should ask for and obtain inference on the true measure of treatment effect. This includes all tasks you perform in this class, and ought to include all presentations you see in other classes, seminars, research papers, newspaper articles, ...

Elevator statistics are worthwhile only when you are standing in an elevator or are confronted with a situation where your requests for proper inference will necessarily go unanswered (the scientific literature, if you aren't the referee). Then you might want to be able to interpret CIs from separate groups in order to make a comparison. But if you want to use this criterion (around me, at any rate), you had better be able to remember the ENTIRE set of rules. (I bet you will find it hard to remember all the disclaimers, but on future exams you will be required to establish all the conditions before I will accept this criterion in response to a question.)

1) If you have CIs computed from INDEPENDENT samples and they do not overlap, the difference between the respective summary measures will be statistically significant at the corresponding level of confidence ON THE SCALE ON WHICH THEY WERE INITIALLY ANALYZED.
-- So if you computed CIs for means directly, then the difference of means is OK.
   If you computed CIs for log geometric means, then the difference of log geometric means is OK. If you computed CIs for the log means (and we do this in Poisson regression models), then the difference of log means is OK.
-- This approach does not make any particular statement about what the CI for the difference will have ruled out, nor about the strength of evidence in the P value for the difference.
2) If you have CIs computed from INDEPENDENT samples and the CI for one group includes the point estimate from the other group, the difference (see all the disclaimers above) will not be statistically significantly different from zero at the corresponding level of confidence.
-- Again, this approach does not make any particular statement about what the CI for the difference will have ruled out, nor about the strength of evidence in the P value for the difference. You would be unable to interpret a "negative" study (see my lecture slides on the importance of CIs relative to P values: using this approach we can say, for instance, P > 0.05, but nothing else).
3) In any other setting, you can say "I haven't a clue".
-- These other settings include
 . some overlapping CIs from independent samples (those where neither CI includes the other group's point estimate)
 . all CIs from correlated samples

There is a HUGE difference between "I don't know whether or not the difference is statistically significant" (which is what we have to say under the third rule) and "We know it is not statistically significant". I think that anyone who does a study and merely gives the "I don't know whether..." response needs to be fired, and, if they are using NIH funds, should refund me (at least) my tax money.

I do note that if you can figure out how the CIs were computed, and if that method involved a normally distributed statistic, then we can often figure out what the SE was and compute the SE for the difference, provided the CIs were from independent samples.
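The SE-recovery idea, together with the three rules, can be sketched in Python. This is a minimal sketch under stated assumptions: each reported 95% CI is symmetric and was computed as estimate ± 1.96 SE from a normally distributed statistic, the two samples are independent, and the CIs are on the scale of analysis (e.g., log geometric means rather than back-transformed geometric means). The function names and the example intervals are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # multiplier assumed for the reported 95% CIs


def se_from_ci(lower, upper):
    """Recover the SE from a symmetric normal-based 95% CI: width = 2 * Z * SE."""
    return (upper - lower) / (2 * Z)


def compare_independent_cis(ci1, ci2):
    """Estimate, 95% CI and two-sided P for (group 2 - group 1), independent samples."""
    est1, est2 = sum(ci1) / 2, sum(ci2) / 2  # point estimate = midpoint of a symmetric CI
    se1, se2 = se_from_ci(*ci1), se_from_ci(*ci2)
    diff = est2 - est1
    se_diff = math.sqrt(se1**2 + se2**2)  # no covariance term: assumes independence
    p = 2 * NormalDist().cdf(-abs(diff) / se_diff)
    return diff, (diff - Z * se_diff, diff + Z * se_diff), p


def elevator_verdict(ci1, ci2):
    """Rules 1-3 for CIs from INDEPENDENT samples."""
    est1, est2 = sum(ci1) / 2, sum(ci2) / 2
    if ci1[1] < ci2[0] or ci2[1] < ci1[0]:
        return "significant"  # rule 1: the CIs do not overlap
    if ci1[0] <= est2 <= ci1[1] or ci2[0] <= est1 <= ci2[1]:
        return "not significant"  # rule 2: one CI contains the other point estimate
    return "I haven't a clue"  # rule 3: any other setting


ci_placebo, ci_treated = (8.0, 12.0), (11.0, 15.0)  # hypothetical reported CIs
print(elevator_verdict(ci_placebo, ci_treated))  # I haven't a clue
print(compare_independent_cis(ci_placebo, ci_treated))  # difference 3.0, P about 0.04
```

For these hypothetical intervals the CIs overlap but neither contains the other group's point estimate, so the rules give no answer, while the recovered SEs give an estimated difference of 3.0 with a 95% CI excluding zero. This calculation is not available for correlated samples, because the covariance between the estimates cannot be recovered from the two separate CIs.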
Recovering the SEs in this way is what I usually do when confronted with problems in the scientific literature, but since I no longer carry a calculator (and contrary to the stereotype, even in my days in physics or in an engineering school, I never carried a slide rule except to exams), I cannot usually do this on an elevator.

All of that having been said, as a general rule, I do think it relevant to present for each group:
-- mean, SD (not SE), min, max, percent with response above some scientifically meaningful threshold (all of these allow us to assess possible individual toxicity)
-- CI for the summary measure within each group
-- Perhaps: a P value for the within-group response IF the summary of response within each risk group is some sort of change over time or place. I will not make too much of this due to all the possible reasons for trends in the placebo group.

I note that a nonsignificant change in the placebo group and a significant change in the treatment group can still lead to a nonsignificant difference between the two groups. I have seen way too many erroneous (and potentially harmful) reports in the scientific literature that merely look for nonsignificance in the placebo group and then overinterpret a significant result in the new treatment group, when the proper analysis says that there was no significant difference between the groups. To keep anyone from making this mistake, I sometimes suppress the inference that is irrelevant to the primary question.

I note that editors will often not let you present all that many aspects of the inference, in which case the descriptive statistics for each group (mean, SD, range, etc.) should be presented in a table somewhere, but the within-group CIs and P values can safely be omitted. And as I have stressed in class, presenting the point estimate, the CI, and the full P value is important.
Merely stating "the difference is statistically significant" does not allow a reader with more stringent or less stringent standards of evidence to evaluate your results. I do note that it is rarely useful to present a P value to more than 4 decimal places, and often two suffice.

Scott