
WHI FDA Testimony about Estrogen-Progestin

    Summary and Excerpts
    FDA Advisory Committee Meeting October 7, 2003
    Women's Health Initiative

    (WHI 2002 and Subsequent Papers)

    Timothy Bilash MD MS OBGYN
    May 2005
    (WHI VII)


    The following is what WHI and Industry researchers said about the WHI Prempro Study (July 2002) in testimony to the Food & Drug Administration (FDA).

    It is my opinion that the difficulties inherent in interpreting the WHI data have not been adequately communicated to the public. Please examine the minutes of the meeting that these excerpts are taken from, the slides, and also the research papers themselves. Text styling has been added.

    TDB May 2005

    <Jump to Excerpts from the Testimony>
    <Jump to Breast Cancer Hypothesis>


    Endocrinologic and Metabolic Drugs Advisory Committee Documents for October 7, 2003
    Transcript of the Meeting
    Slides from the Meeting
    WHI References

  3. Excerpts from testimony to the committee

    1. "We all are concerned with the fact that media publicity has resulted in symptomatic women who could benefit from hormone replacement therapy [not getting it]." (Archer)

      1. "We know that hormone therapy improves symptoms and the quality of life for these women. I think we're all concerned that the media has characterized hormone therapy as harmful to women, particularly in cardiovascular disease and breast cancer. I believe the scientific community and physicians realize that the relative risk numbers are often high, but the attributable risks in the community is a different issue." [ie, is a low risk]

      2. That relative risk especially when presented as percentages can easily be misunderstood by people who do not work with them on a regular basis. A change from 2:10 to 1:10, and a change from 2:1000 to 1:1000, have the same relative change but a vastly different meaning. That's a simple statement but it's one that I recurrently see a problem with in looking at editorials and the popular and the lay press information on this topic. I wanted to take this opportunity to stress it. (Stadel)

      3. [ANOTE] Comparisons of rates such as Relative Risks are extremely tenuous. Rates cannot be added or arithmetically compared, only numbers. Rates (Relative and Absolute Risks) can be compared only when the denominators are comparable ("Common Denominators").
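
      The point made by Stadel can be checked with simple arithmetic. The sketch below uses only the two ratios quoted in the testimony (2:10 to 1:10, and 2:1000 to 1:1000); the function and variable names are illustrative assumptions, not anything taken from the WHI papers.

```python
from fractions import Fraction


def risk_change(events_before, events_after, denominator):
    """Return (relative risk, absolute risk difference) for a common denominator."""
    risk_before = Fraction(events_before, denominator)
    risk_after = Fraction(events_after, denominator)
    relative_risk = risk_after / risk_before        # ratio of the two rates
    absolute_difference = risk_before - risk_after  # difference of the two rates
    return relative_risk, absolute_difference


# The two cases quoted in the testimony: 2:10 -> 1:10 and 2:1000 -> 1:1000.
rr_common, diff_common = risk_change(2, 1, 10)
rr_rare, diff_rare = risk_change(2, 1, 1000)

for label, rr, diff in [("2:10 -> 1:10", rr_common, diff_common),
                        ("2:1000 -> 1:1000", rr_rare, diff_rare)]:
    print(f"{label}: relative risk = {float(rr):.2f}, "
          f"absolute difference = {float(diff) * 1000:.0f} per 1000")
```

      Both cases give the same relative risk, while the absolute difference differs by a factor of one hundred, which is why a relative figure quoted alone can mislead.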

    2. WHI was designed to study "whether hormone therapy is a suitable treatment in older women to prevent chronic diseases." (Rossouw)

      1. [ANOTE]:

        WHI was designed to study two diseases, Coronary Heart Disease and the expected possible adverse Outcome of Breast Cancer. In effect it studied whether hormone therapy is suitable in older women (age 65), who would already have a higher rate of chronic disease than younger perimenopausal women (age 55), but who in fact showed a lower rate than would be expected for their ages at study entry, with essentially a null result in those patients. The baseline rate of disease for patients in the study was much lower than in the general population, consistent with undetected disease at entry.

        Using the term "treatment" confuses the point. The distinctions between primary and secondary, and between prevention and treatment, are ambiguously defined. A primary prevention study should determine effects on patients who do not already have disease. Secondary prevention versus treatment in those who already have a disease is an unclear distinction. It has a different meaning when applied to Coronary Heart Disease and Breast Cancer, in both clinical and statistical aspects.

    3. "Any number of complex clinical and scientific issues remain unanswered by the WHI study or indeed are raised in its aftermath." (Orloff)

    4. "The medical team at Wyeth reviewed and discussed the WHI data at great length, internally with the investigators, with the NIH team, with the FDA. We acted on the data last year by amending the label and supporting dissemination of the WHI data. You'll see how we did this too. But the medical team doesn't agree fully with some of the broad interpretations of the data, particularly some of the statements about the application of the data to all clinical practice especially some of the subgroup analyses." (Camardo)

      1. (3992S1_08_Wyeth-Camardo.ppt)

      2. (3992S1_08_Wyeth-Camardo.ppt)

    5. "Risks of using hormones seem bigger than we originally thought" (Foegh)

      1. WHI results are not consistent with observational data available.
      2. The WHI measures only a sample of possible clinical parameters and outcomes. Fracture improvement is essentially ignored. There is clear evidence that incredibly low doses of estrogen are effective on bone, but those doses are lower than needed for other effects. It is difficult to provide an overall summary index to decide up or down risk-benefit, especially because the Outcomes are so highly age-dependent.

    6. "We think it's only fair to bring that note of caution into the interpretation of these data." (Stefanick)

      1. "We did have relatively high discontinuation rate for pill taking." (Stefanick)

      2. "You see that over the course of time an increasing number of women were coming off the pills in both the placebo group [and the active group]... But also there were an increasing percent of women going on estrogen and progestin [in the placebo group over time]. So that what you see below is the women who are coming off the pills here as a substantial portion of them were going on exactly the same medication but open label with their own physician and twice as many women in the placebo group were falling into that category." (Stefanick)

      3. When we actually look at the highly compliant women, the risk attributed to these hormones is even greater. (Stefanick) Breast Cancer showed an earlier departure of the two curves in the compliance data. (Chlebowski)

      Results stating that compliant women (censored more than 6 months after non-compliance) had higher CHD rates ignore the possibility that stopping hormones increases immediate CHD risk. There is evidence from other CHD studies that stopping NSAIDs increases CHD risk, for example. Study findings, except for the Stroke increase (a rare event) and the Bone Fracture benefit (a common event), were essentially consistent with chance for CHD and the subgroups for WHI.

    7. Intention to Treat

      1. "All the analyses we present are based on intention to treat. That means that every woman randomized is analyzed and included in the analysis in the arm in which she was randomized regardless of whether she stayed with that arm. Even our sensitivity analysis looking at adherents do not cross the women over. They just censor her data at the time she becomes non-adherent. This is the best in terms of preserving the ideal quality of a randomized trial." (Anderson)


        This becomes a complication if the reason for censoring is somehow correlated with risk or outcomes over time, rather than being a random event, especially if the percent censored is high. The switch from Estrogen to Estrogen plus Progestin in the early years, and the large jump in censored patients in Outcome Year 4, are non-random events that appear to be important. (For example, a woman in the Hormone group with no apparent breast disease and no cardiac disease stops taking hormones when she hears the announcement of increased cardiac risks, or a woman in the Placebo group starts taking hormones because she does not know she has heart disease and feels she isn't at risk, or a woman unknowingly on Placebo sees no improvement and restarts hormones.)

        The study claims to be blinded, but patients taking HRT are generally aware of its effects. Essentially, there are problems with compliance that compromise the exposure/ non-exposure to hormone that the randomization label implies.
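
        The difference between the intention-to-treat analysis and the adherence-censored sensitivity analysis described by Anderson can be sketched with event rates per person-year. Everything in the block below is hypothetical: the eight participants, their follow-up times, and the helper names are invented for illustration and are not WHI data. The sketch only shows the mechanics of censoring at the time of non-adherence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    arm: str                           # arm assigned at randomization
    followup_years: float              # total planned follow-up
    event_year: Optional[float]        # time of the outcome event, None if no event
    nonadherent_year: Optional[float]  # time she stopped assigned pills, None if adherent


def rate_per_1000(participants, arm, censor_at_nonadherence):
    """Events per 1000 person-years in one randomized arm."""
    events = 0
    person_years = 0.0
    for p in participants:
        if p.arm != arm:
            continue
        end = p.followup_years
        if p.event_year is not None:
            end = min(end, p.event_year)        # follow-up stops at the event
        if censor_at_nonadherence and p.nonadherent_year is not None:
            end = min(end, p.nonadherent_year)  # censor when she stops the pills
        person_years += end
        if p.event_year is not None and p.event_year <= end:
            events += 1                         # count only events before censoring
    return 1000 * events / person_years


# Hypothetical cohort, invented for illustration only.
cohort = [
    Participant("hormone", 5.0, 1.0, None),
    Participant("hormone", 5.0, 4.0, 2.0),   # event occurs after she stopped
    Participant("hormone", 5.0, None, 3.0),
    Participant("hormone", 5.0, None, None),
    Participant("placebo", 5.0, 3.0, None),
    Participant("placebo", 5.0, None, 2.0),
    Participant("placebo", 5.0, None, None),
    Participant("placebo", 5.0, None, None),
]

for arm in ("hormone", "placebo"):
    itt = rate_per_1000(cohort, arm, censor_at_nonadherence=False)
    censored = rate_per_1000(cohort, arm, censor_at_nonadherence=True)
    print(f"{arm}: intention-to-treat {itt:.1f}, adherence-censored {censored:.1f} per 1000 person-years")
```

        In this toy cohort the event that follows discontinuation is counted under intention to treat but dropped once data are censored at non-adherence. If the reason for stopping is related to risk, the two analyses can diverge for reasons that have nothing to do with the drug.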

    8. Global Index

      1. "We cannot agree with this global index approach because we believe it [to be biased]. In offering the Global Index hazard ratio, the WHI investigators attempted to estimate overall benefit versus risk. Although this concept is potentially useful from a public policy perspective, it falls short as guidance for care of individual patients." (American College of Obstetricians and Gynecologists Testimony Statement)


          The WHI study was designed to evaluate the use of a Global Index for harm or benefit based on correlation with the individual outcomes (a weighted average). The Global Index was not intended to override the outcome results when they differed (see Subgroup analysis above).

    9. Data Adjustment and Corrections

      1. Stopping Criteria

        1. Monitoring Benefit: (Anderson)
          1. (3992S1_05_Anderson.ppt)

          2. CHD benefit used a standard procedure that looks like the upper tail of .05 level test with .025-level one-sided test.
            1. a 0.05 level two-sided test
            2. adjustment for multiple looks over time, the traditional O'Brien-Fleming procedure boundary
            3. not corrected for multiple endpoints, the Bonferroni correction
            4. This is exactly the same in many trials used for a single endpoint.
          3. Global Index benefit looked at the .05-level one-sided test
            1. a 0.10 level two-sided test
            2. no adjustment for multiple looks over time
            3. not corrected for multiple endpoints
            4. Global index "was to be clearly weighing on the side of overall benefit to stop this trial."

        2. Monitoring Harm: (Anderson)
          1. (3992S1_05_Anderson.ppt)

          2. Breast Cancer harm used p=.050 level one-sided test
            1. Breast Cancer was our primary safety endpoint. There were prior data suggesting that this might be a problem so we defined a monitoring boundary for it alone not adjusted [corrected] for multiple endpoints.
              1. "Because we were interested in proving harm to the same degree of precision as you might want for benefit, the stopping level was a .05-level (one-sided test) equivalent to the 0.10 percent type one error [two-sided] again adjusted with O'Brien-Fleming procedure for multiple looks over time."
              2. not corrected for multiple endpoints, the Bonferroni correction (other harms were corrected)
          3. Other harms
            1. "We also defined similar stopping boundaries for all the other designated monitored endpoints of CHD, Stroke, PE, Hip Fracture, Colorectal Cancer, Endometrial Cancer and Death from other causes... These use the same .05-level tests [one-sided] but it was corrected with a conservative Bonferroni correction because we were looking at all those multiple endpoints and didn't want to inflate our type one error by looking at too many endpoints at once." (Stefanick)
          4. Global Index harm used a p=.050 level one-sided test (no correction or adjustment), and was required for stopping for harm together with one of the others.

        3. Those are our monitoring boundaries. If that boundary were crossed and also the global index which was supportive of harm was a Z-statistic less than minus one. So one standard deviation below the null-hypothesis, would stop for harm. It was the breast cancer boundary and the global index boundary for harm that were crossed last spring [2002]. (Anderson)

      2. The data were adjusted to take into account the fact that we did look at the data every six months for monitoring purposes [O'Brien-Fleming procedure] and we did look at multiple endpoints [Bonferroni correction]. (Stefanick)

      3. [ANOTE]

        The stopping criteria for harms were half as stringent as for benefits (0.10 vs 0.05 two-sided).

        Benefits required a .025 significance, while harms required only a .050 significance (both one-sided uncorrected). This means that the statistics of the study were biased in favor of stopping the study more easily for harms than benefits. The stopping boundaries for harm also used a less stringent level of one standard deviation (Z = -1). By comparison, the nominal Z-value for a p = 0.05 level two-sided test is Z = 1.96, or about two standard deviations from the mean.

        The exception was that both Global benefit and harm used p=.050, Global benefit was uncorrected and Global harm corrected, which still biases the Global Index stopping point towards benefit instead.

        The E+P study was stopped for an indicated monitoring Breast Cancer harm which was not supported by Outcome analysis of the data (HR confidence interval included 1.0 even when uncorrected, not significant).

        Subsequent papers marginally indicate significance when Confidence Intervals were revised: Outcome results were re-evaluated and changed, with questionably supported decisions about weighted vs unweighted results (monitoring vs subgroups, Breast Cancer).
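
        The significance levels and Z-values discussed in this note can be related with the standard normal distribution. The sketch below uses Python's `statistics.NormalDist`; the helper names are illustrative assumptions, and the values are nominal, that is, before any O'Brien-Fleming adjustment or Bonferroni correction.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_z(alpha, sides):
    """Nominal critical Z for a test at level alpha with 1 or 2 sides."""
    tail_area = alpha / sides
    return STD_NORMAL.inv_cdf(1 - tail_area)


def one_sided_p(z):
    """One-sided tail probability beyond |z|."""
    return 1 - STD_NORMAL.cdf(abs(z))


print(f"benefit, .025 one-sided: Z = {critical_z(0.025, 1):.3f}")
print(f"same as .05 two-sided:   Z = {critical_z(0.05, 2):.3f}")
print(f"harm, .05 one-sided:     Z = {critical_z(0.05, 1):.3f}")
print(f"same as .10 two-sided:   Z = {critical_z(0.10, 2):.3f}")
print(f"Z = -1 corresponds to a one-sided p of {one_sided_p(-1):.3f}")
```

        Under these assumptions the benefit boundary sits near Z = 1.96, the harm boundary near Z = 1.645, and a Z of -1 corresponds to a one-sided p of roughly 0.16.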

  4. Timeline of Intervening Events

    1. December 1995 (3rd Recruiting Year):
      1. Initially there was a small group of women with a uterus (331/8435 or 4% of EP Group) who were assigned to Estrogen in a three-way randomization of Estrogen(E), Estrogen-Progestin(EP), or Placebo(PL). When the PEPI results came out indicating the increase in endometrial pathology with unopposed estrogen, the estrogen only (E-only) arm was discontinued and these women were converted to combination therapy (E -> E+P), with Outcomes included in the E+P Outcome Group. (Dr. Stefanick)

    2. April 2000 (average of 3 Outcome Years, 2 years after Recruiting stopped):
      1. The Data Safety Monitoring Board (DSMB) "requested that the investigators inform the women after most of them had completed two years of the trial that... there was actually an increase in the number of heart attacks, strokes and blood clots in the lungs and the legs in the women receiving active hormones compared to women taking placebo. So all the participants in the hormone trial were alerted." (Dr. Stefanick)

    3. June 2001 (average of 4 Outcome Years, 3 years after recruiting stopped):
      1. "The DSMB required that we inform the women that... excess cardiovascular events persisted in the active hormone group compared to the placebo." (Dr. Stefanick)
      2. (3992s1_03_Stefanick.ppt)

    4. May 2002 (average of 5 Outcome Years):
      1. "The NHLBI accepted the DSMB recommendation to [prematurely] stop the E+P trial after an average of 5.2 years because the risks exceeded the benefits based on the monitoring rules." (Dr. Stefanick)
      2. (3992s1_03_Stefanick.ppt)

      3. [ISLT] Study closed 3 years early, so Outcome Years 5 and above are missing the later recruited patients which effectively censored them.

  5. Outcome Adjustments

    1. [ANOTE] Suggestions for Interpreting the Statistical Outcome Results in the WHI

      To help interpret the clinical significance of the WHI findings, consider the statistical characteristics of each statistical test and the justification for using them in that situation (terms here created for clarity):

      1. significance-level used (p-level .050 vs .025)
      2. two- vs one-sided statistical test performed. Note the problem created by using a one-sided test for significance. One-sided testing assumes that the effect is only in one direction (greater rather than less, or less rather than greater) and makes the test for significance less stringent. When a harm is seen instead of an expected benefit, the one-sided test is more likely to see an effect that isn't there (type 1, false-positive error) and less likely to see an effect that is there (type 2, false-negative error). The WHI study used one-sided tests at p=0.050/0.025, equivalent to two-sided tests at p=0.100/0.050.
      3. unweighted vs weighted (with type of weighting scheme and how it affects the outcome and is used to determine summary statistics like hazard ratios and interpret confidence intervals)
      4. covariant-nominal (uncorrected for testing of multiple outcomes/ endpoints) vs covariant-corrected (corrected for testing of multiple outcomes/ endpoints/ Bonferroni correction)
      5. monitoring-nominal (unadjusted for multiple looks over time/ monitoring) vs monitoring-adjusted (adjusted for multiple looks over time/ monitoring/ O'Brien-Fleming procedure)
      6. In general, one would expect the weighted/unweighted, corrected/uncorrected, and adjusted/unadjusted Confidence Intervals to give similar significance. This is not the case for the WHI results.
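
      The effect of correcting for multiple endpoints (item 4 in the list above) can be sketched numerically. The block below assumes the Bonferroni correction divides the .05 one-sided level across the seven "other" monitored endpoints named in Stefanick's testimony, and it assumes the tests are independent when computing the chance of at least one false positive. Both are simplifying assumptions for illustration; the exact divisor used by WHI is not stated in the testimony quoted here.

```python
from statistics import NormalDist

# Endpoints named in the testimony as sharing the Bonferroni-corrected boundary.
OTHER_ENDPOINTS = [
    "CHD", "Stroke", "PE", "Hip Fracture",
    "Colorectal Cancer", "Endometrial Cancer", "Death from other causes",
]
ALPHA_ONE_SIDED = 0.05


def bonferroni_alpha(alpha, n_endpoints):
    """Per-endpoint significance level after a Bonferroni correction."""
    return alpha / n_endpoints


def familywise_error(alpha_per_test, n_tests):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha_per_test) ** n_tests


n_endpoints = len(OTHER_ENDPOINTS)
per_test_alpha = bonferroni_alpha(ALPHA_ONE_SIDED, n_endpoints)
corrected_z = NormalDist().inv_cdf(1 - per_test_alpha)
uncorrected_fwer = familywise_error(ALPHA_ONE_SIDED, n_endpoints)
corrected_fwer = familywise_error(per_test_alpha, n_endpoints)

print(f"endpoints: {n_endpoints}")
print(f"per-endpoint level after correction: {per_test_alpha:.4f} (Z = {corrected_z:.2f})")
print(f"chance of a false positive without correction: {uncorrected_fwer:.3f}")
print(f"chance of a false positive with correction:    {corrected_fwer:.3f}")
```

      Under these assumptions an uncorrected endpoint such as Breast Cancer is held to a noticeably lower bar than each of the corrected endpoints, which is one reason the nominal and corrected Confidence Intervals need not agree.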

    2. In the E+P trial results (July 2002) we published both the nominal (unadjusted) confidence intervals for the hazard ratios for each of the primary events and 1) very conservative adjustments based on the sequential monitoring (O'Brien-Fleming procedure) and 2) the corrections for multiple outcomes (Bonferroni correction). (Stefanick)
      1. (3992S1_05_Anderson.ppt)

    3. Weighting:

      1. "Primary analyses of all of our clinical endpoint data [monitoring of stopping points] is based on a weighted log rank statistic ... That plays into both the analysis and it also play[s] into the monitoring plan. Dr. Rossouw gave us a nice summary trying to understand where we were when we designed this trial. It was a prevention trial. In developing our monitoring plan which the development has been published back in 1996, we were thinking of the issue of benefits and risks with CHD being a benefit that was at that time considered so obvious that the question was 'Could we really ethically continue this trial when the benefits might accrue by year three in the study when we knew the breast cancer results might take a fair amount of time to see'." (Stefanick)


        Weighted Outcomes were used for all monitoring results. For Subgroups, CHD and Breast Cancer were considered Primary Outcomes by the researchers, with weighted results used in some places (Breast Cancer monitoring and Hazard Ratios), but unweighted in others (Breast Cancer P-values).

      2. (3992S1_05_Anderson.ppt)

      3. For Breast Cancer, "any differences you see in the early period are more likely to be due to a random occurrence than to be a true treatment effect. We actually had very good observational data to say that the effect of hormones on breast cancer may take a considerable amount of time to be fully manifested... The differences observed in the first year or so would have very little weight but increasing [10 years linearly] over time [purple line in weighting graph]. Differences at year 10 and beyond would have full weight. That was the weighting scheme for cancer and also for mortality or global index calculations." (Anderson)

      4. For CVD and Fractures "the data were not so clear. In fact, the observational data tended to suggest that it was current use of hormones that was protective for CHD. Nevertheless a lot of the hypothesis came through the intermediate effects of lipids which though that might be rather immediate but its translation into a clinical impact could take some time. After quite a bit discussion, we used a three-year weighting period [yellow line in weighting graph]. By the time we got to the three years any events occurring after that would receive full weight." (Anderson)

      5. For subsequent subgroup analysis [not the monitoring] we used "unweighted hazard ratios which is a bit of awkwardness given that the trials were based on the weighted design, the weights over time. I would say that this was a compromise that we made based on the fact that we were completely wrong about our CHD findings. The assumptions underlying that design were wrong. We didn't reach the full preventive effect by year 3... Then what do you do with the weights? Mostly when you don't have an idea of a time to effect you would do an unweighted type of statistic. We do provide unweighted hazard ratios and then associated with those, nominal and adjusted confidence intervals." (Anderson)

      6. "Particularly for Breast Cancer, we also in some places showed the weighted P-values, P-values from the weighted analyses, because there is a discrepancy in the interpretation at points when you take the weights into account and when you don't. To be fair, the design and the analysis for these endpoints did always indicate that we would use weighted analyses." (Anderson)

      7. [ANOTE]

        Unweighted results were used to determine Hazard Ratios, but Weighted results were used to determine Statistical Significance. I am unaware of the validity in Survival Analysis of quoting Significance (P-values) based on weighted data, for Ratios based on unweighted data.

        The concern that breast cancer results in early periods are probably a random occurrence is perplexing. WHI insisted that they stopped the study because the data for later years was less accurate, yet here they state that the first years should ideally be down-weighted linearly from 10% in the first year, counting only the later years at full weight.

        The weighted hazard ratios de-emphasize early years, so using weighted results for CHD stopping would ignore the early years (where the assumptions underlying the weighting were completely wrong). Using unweighted results includes the early years equally for Breast Cancer stopping (where the early finding is more likely to be due to random chance).

        Dr. Anderson states that when you don't know the time-to-effect you would use the unweighted, but Breast Cancer results used weighted significance for the Hazard Ratios.

        In the case of unknown time to effect where the effect is getting bigger over time and the weights increase over time, one would expect the weighted results to show a better significance, not worse. Weighted Breast Cancer shows a worse significance over time, probably due to the large number of dropouts.

        CHD Subgroup Outcomes used unweighted results entirely.
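
        The two weighting schemes described by Anderson (a 10-year ramp for cancer, mortality and the global index, and a 3-year ramp for CVD and fractures) can be sketched as a simple function of time since randomization. The exact functional form is an assumption here: the testimony says only that weights increase linearly to full weight at the end of the ramp, so this block assumes a straight line starting from zero. The function name is illustrative.

```python
def linear_weight(years_since_randomization, ramp_years):
    """Assumed weight for an event: rises linearly from 0 to 1 over ramp_years."""
    if years_since_randomization <= 0:
        return 0.0
    return min(years_since_randomization / ramp_years, 1.0)


CANCER_RAMP_YEARS = 10  # cancer, mortality, global index (per the testimony)
CVD_RAMP_YEARS = 3      # CVD and fractures (per the testimony)

print("year  cancer weight  CVD weight")
for year in range(1, 11):
    cancer_w = linear_weight(year, CANCER_RAMP_YEARS)
    cvd_w = linear_weight(year, CVD_RAMP_YEARS)
    print(f"{year:>4}  {cancer_w:>13.2f}  {cvd_w:>10.2f}")

# Weight at the average follow-up when the E+P trial was stopped (5.2 years).
weight_at_stop = linear_weight(5.2, CANCER_RAMP_YEARS)
print(f"cancer weight at 5.2 years: {weight_at_stop:.2f}")
```

        Under this assumed form, a breast cancer event in year one carries a tenth of the weight of an event at year ten, and events near the 5.2-year average stopping point carry about half weight, while CHD events are at full weight from year three onward.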

  6. Subgroup Analysis

    1. July 2002 Preliminary Results and Reference (3992s1_03_Stefanick.ppt)

    2. Subsequent Paper References (3992s1_03_Stefanick.ppt)

    3. Subgroup analyses "are much more difficult to interpret statistically... We've developed our own WHI sort of policy for how we'll interpret them. It is that our inference will be based primarily on the test of interaction. The trial was not designed to test this specific hypothesis within each subgroup so we acknowledge that those specific subgroup tests within themselves are low powered. That means we have a high type two error. We also have a high type one error. We've looked at many subgroup analyses [number = 36]. It's possible to find some that are significant by chance alone." (Anderson)

      1. (3992S1_05_Anderson.ppt)

      2. To minimize this as best we can, our inference is primarily based on those tests of interaction. Then we report unadjusted P-values and we say that these should be considered as hypothesis generating, not testing. Then we have asked each author of each paper to report the number of interactions they tested and to report the number that would be expected to be significant by chance alone. We feel that it is a reasonable approach to this area which is really very exploratory. (Anderson)

      3. "For the interaction test, I mostly showed you both weighted and unweighted, but I have to say that in developing the protocol and all that, we never talked about how we would do interaction tests. It's not clear to me whether the weighting that we defined for the primary endpoint comparisons is the right weight to use for interactions." (Anderson)

      4. No adjustment for multiple testing is used for any subgroup results [O'Brien-Fleming procedure]. The Subgroup results are to be considered "hypothesis generating". (Anderson)

      5. "I want to sound a real note of caution for those year-by-year analyses. The first year comparison is a randomized comparison because everyone who is randomized goes through that first year and has an event and is counted. The second year becomes a woman who didn't have an event in the first year. That becomes the denominator. So there are survivor issues. The farther out you go on that timeline the worse it is." (Anderson)

      6. As soon as we start to make subgroup inference [about prior hormone use for example], "we've left the framework of a randomized trial. We're now starting to talk about an observational study." (Anderson)

      7. "In addition, we have lack of adherence that starts to feed into that in a big way and later on. So looking at 'randomized comparisons' in those later years in a year-by-year fashion is dangerous territory and I wouldn't want to make much inference about that year six data." (Anderson)

      8. Discussion about statistical power of the interaction analysis:

        1. DR. BONE: One of the points that was made is that rather than look at the individual groups in some cases, there was a test for whether there was an interaction. We saw P-values of about 0.1 in many cases that were displayed. When we talk about a hazard ratio of 1.2 versus a hazard ratio of 1.0, was there actually testing of the power of this test of the interaction term to detect a true difference?

          DR. ANDERSON: No.

          DR. WOOLF: Can I follow up on Dr. Bone's question? Does a failure to do a power analysis say anything about the validity of the interaction's statistics?

          DR. ANDERSON: A power analysis asks "What's the probability of finding an effect if there is a true one of a certain size?" So in an interaction test, it's rather challenging to ask what the power is for something like that. We have to acknowledge that there are few women when you cut up the data so finely. To address that, we tended to do those interactions with a continuous variable instead of dicing it up into little cells. We just did it continuously and still didn't find anything. Yes, we don't have great power in some of these. I would not want to hazard a guess of what the power would be, but this is the best data that we're going to have on that. These data pretty much stand for themselves.

          DR. FOLLMAN: Just a comment on the power analysis issue, Dr. Anderson's exactly right. We don't have good power for these tests of interaction. That's just the way clinical trials are designed in a way. You design it to ask the main question and by definition, you essentially don't have good power for the interaction. So they give you some comfort if there is not interaction, but it's understood that there's not a lot of power for it.

      9. [ANOTE]

        In summary, the tests of interaction for subgroups have low power and high error rates for detecting interaction, which clouds any results coming out of the WHI for subgroups.

        One important aspect of the data which is not so clear is which years are the most statistically accurate. The statistical significance increases with outcome numbers over time, but also decreases from dropouts over time. The medical effect itself may also increase over time.

        Although more CHD events were found in year one, fewer breast cancer events were found in year one. This is significant, as the ratio of Breast Cancer to CHD is age dependent (See discussion below).
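
        Anderson's remark that with many subgroup analyses "it's possible to find some that are significant by chance alone" can be put in numbers. The sketch below takes the 36 subgroup analyses mentioned in the testimony and assumes a nominal 0.05 threshold and independent tests; both the threshold and the independence are assumptions made for illustration.

```python
N_SUBGROUP_TESTS = 36  # number of subgroup analyses cited in the testimony
NOMINAL_ALPHA = 0.05   # assumed per-test significance threshold


def expected_false_positives(n_tests, alpha):
    """Expected count of tests significant by chance when no effect exists."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance that one or more tests is significant by chance (independent tests)."""
    return 1 - (1 - alpha) ** n_tests


expected = expected_false_positives(N_SUBGROUP_TESTS, NOMINAL_ALPHA)
at_least_one = prob_at_least_one_false_positive(N_SUBGROUP_TESTS, NOMINAL_ALPHA)

print(f"expected chance findings among {N_SUBGROUP_TESTS} tests: {expected:.1f}")
print(f"probability of at least one chance finding: {at_least_one:.2f}")
```

        Under these assumptions roughly two nominally significant subgroup results would be expected even if hormones had no subgroup-specific effect at all, which is the reason the investigators ask that such results be treated as hypothesis generating.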

    4. Age and Health Status

      1. "Young women between the ages of 45 to 55 who are peri or post menopausal and are symptomatic are a different class of women than those reported in the WHI... there is very low risk for these women who are younger and in good health of developing significant adverse events particularly those related to the cardiovascular system. We feel that this underscores the fact that consumers really apply the results of what's published in the media to themselves inappropriately." (Archer)


        1. Studies about these Outcomes must control for 1) age (50-55 vs 65 or older), 2) prior menopausal hormone use, and 3) previous oral contraceptive use in these studies, especially for breast cancer findings.

        2. CHD and Breast Cancer Incidence (and reasons for dropping out) change with age. Women are more likely to be older in the WHI population. There is a lower percentage of CHD and Total Incidence in a younger group, with a higher percentage of CHD and Total Incidence in an older group, compared to Cancer and Other events. The Hazard Ratios would be highly age dependent, because they are an average of these age-specific rates.

          If hormones prevented Breast Cancer in younger women more than it prevented CHD in older ones, it would be difficult to detect this from an average of the age-specific event rates.

      2. About 70% of WHI study patients were overweight or obese, with a lower previous CVD incidence at baseline than would be expected in the general population. (Rossouw)

        1. "The mean BMI was 28.5... just over 30 percent were not overweight or obese. 6.2% had prior CVD... these numbers are all quite a bit lower than what you'll find in NHANES surveys. This population on average was indeed healthier than the average post menopausal population. That's borne out by the fact that our CHD rates were about half of what we had predicted when we started the study." (Rossouw)

          1. (3992S1_02_Rossouw.ppt)

        2. [ANOTE]

          Having a lower than expected initial overall CVD diagnosis rate at entry is consistent with the alternative possibility that some patients had undiagnosed disease. The position taken, that study patients were healthier than the expected population, does not consider the possibility that study patients had undiagnosed cardiovascular disease at study entry and only appeared healthier, and that hormone therapy (particularly a progestin effect on coronary vasospasm without fatality) initially allowed those patients to be identified. Those in the Placebo group would also be identified, but later in the study.

          Since obese individuals have different risks, it would be important to characterize the censored patients as to whether their individual characteristics biased any results.

          BMI had a confounding effect on the Breast Cancer Hazard Ratios seen with prior hormone use. With no prior hormone use, the HRs were 1.18/0.96 for BMI <30/>30; with prior hormone use, the HRs were 2.30/1.62 for BMI <30/>30.

      3. "There was enough noise about breast cancer risks that those women who were higher [risk] or had more family histories just didn't enter the study." (Chlebowski 2003) "Many women refused to be enrolled in the WHI Study because they wanted to keep taking their hormones." (WebMD 3/17/03)

      4. "There are many papers showing that the major source of estrogen after menopause [is] androstenedione mostly secreted from the adrenals and aromatized to estrone which then equilibrates with estradiol. It's in adipose tissue. I think most people believe it's in the stromal cells where aromatized enzymes are located.

        It is widely believed in many papers that this accounts for the positive association between post menopausal breast cancer and obesity. Why is this important? It's because the amount of estrogen that women make after menopause depends on their amount of adipose tissue and the functionality of their aromatizing enzymes. So if you give a specific dose of estrogen to someone who has estrogen, you could expect clinically that you might get a different response than if you give that same dose of an estrogen to someone who doesn't [endogenous] have estrogen." (Stadel)

    5. Coronary Heart Disease (CHD) Subgroup

      1. Meta-analysis for Coronary Heart Disease E+P (3992S1_02_Rossouw.ppt)

      2. The current final report from the WHI in July of this year [2003] really did not find overall an increase in coronary heart disease in women receiving E+P hormone therapy. (Dr. Archer)

        1. "The hazard ratio from the updated centrally adjudicated data is 1.24 [1.00-1.54] so [overall] 24 percent increase in CHD [non-significant]. But the most important point that I'd like to make is this was particularly elevated in the first year. The hazard ratio of 1.81 appeared in the first year." (Dr. Stefanick)

        2. "Higher base-line levels of low-density lipoprotein cholesterol were associated with an excess risk of CHD among women who received hormone therapy." (Manson 2003)

        3. [ANOTE]

          The Hazard Ratio of 1.81 in the first year for CHD looks high, but it has an associated Confidence Interval of [1.09-3.51], which just barely excludes 1.0 for significance. The Cox Proportional Hazard Analysis with Time-dependent treatment effects actually shows a decrease in CHD Hazard Ratio over exposure time from the initial year, with Z=-2.36 (p=.02), a significant difference.

          Those with lower baseline LDL levels who received hormone therapy actually showed a decreased CHD (note this is a subgroup observational analysis from the Observation Arm of the WHI Study, not the randomized part of the WHI study).

          The first year increased CHD Hazard Ratio includes those patients who were initially treated with estrogen only and remained in the estrogen-progestin Outcome Group.

          The Kaplan-Meier curves for CHD cross in the final re-adjudicated data, which indicates that Kaplan-Meier analysis is invalid (also seen for Invasive/Total Breast Cancer, Global Index, and Total Mortality). These contrast with the K-M curves for reduced Fractures, which have the expected shape for a Cox Proportional Hazard Model giving reliable Hazard Ratios from Kaplan-Meier.

          1. Coronary Heart Disease Hazard Rates (Manson et al, NEJM 2003;349:528)

          2. Invasive Breast Cancer Hazard Rates (3992S1_04_Chlebowski.ppt)

          3. Global Index (3992s1_03_Stefanick*.ppt)

          4. Fracture Incidence (3992S1_07_Anderson.ppt)
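
          The significance readings in this note can be checked with a little arithmetic. The Python sketch below is only an illustration, not the WHI analysis: it tests whether each quoted confidence interval excludes 1.0, and converts the quoted Z statistic to a two-sided p-value assuming a standard normal distribution. The function names are hypothetical, and the intervals are entered exactly as quoted above.

```python
import math

def excludes_one(lower, upper):
    """True when a ratio's confidence interval does not contain 1.0."""
    return lower > 1.0 or upper < 1.0

def two_sided_p(z):
    """Two-sided p-value for a Z statistic (standard normal assumed)."""
    return math.erfc(abs(z) / math.sqrt(2))

# Values as quoted in the testimony and the note above.
chd_overall = (1.00, 1.54)    # interval quoted for HR 1.24, all years
chd_year_one = (1.09, 3.51)   # interval quoted for HR 1.81, first year
trend_z = -2.36               # time trend in the CHD hazard ratio

print("overall CI excludes 1.0:", excludes_one(*chd_overall))
print("year-one CI excludes 1.0:", excludes_one(*chd_year_one))
print("two-sided p for Z = -2.36:", round(two_sided_p(trend_z), 3))
```

          Under these assumptions the overall interval touches 1.0, the first-year interval does not, and the Z value corresponds to a p-value that rounds to the quoted .02.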

      3. We did see a decrease in total cholesterol and LDL cholesterol of 12.7 percent, very similar to data published previously. There was also an increase in HDL cholesterol of 7.0 percent which was actually a little bit better than the PEPI study. We also saw decreases in glucose, not significant, but also insulin. So the lipid benefits... were also seen in WHI, but I think we all recognize that this is a risk factor for a disease. The disease was not benefitted. So in this case, we have to recognize that looking at lipid changes is not the appropriate approach with respect to CVD and hormones. (Dr. Stefanick)

        1. [ANOTE] This interpretation is in contrast to many studies correlating improved HDL, LDL and Cholesterol values with Cardiac disease benefit. Although WHI does not see CVD benefit as reported, that does not negate the other data supporting Cholesterol and LDL as risk factors for CHD.

    6. Breast Cancer Subgroup

      1. "E+P showed a total of 245 breast cancers versus 185 for Placebo. Invasive breast cancer was 199 versus 150 with a hazard ratio of 1.24 and just a trend of in situ cancers... during a course of follow-up that ended after 5.6 years." (Archer) "You can see actually that the curves cross at about four years." (Chlebowski)

      2. Breast Cancer co-factors

        1. Age itself is the most important risk factor for breast cancer from numerous publications. (Dr. Archer)
          1. No interaction of Breast Cancer HR with Age was seen, with HR=1.2 in the 50 to 59 year olds, HR=1.22 in 60 and above. The difference in rates between the Hormone and Placebo groups was the same for older patients as for younger. (Stefanick)
        2. BMI (Body Mass Index) may have had some effect on younger patient breast cancers. (Stefanick)
        3. Histology- There was a "suggestion from especially more recent observational studies involving E+P that lobular cancers would be largely responsible for most of the increase. Actually we saw nothing like that. We saw really that all types of cancers were the same in both groups. Again the suggestion on the predominance of the observational studies that E+P would be associated with well differentiated cancers wasn't seen. We saw the same distribution, similar histology and grade on E+P compared to that on placebo." (Dr. Stefanick)
        4. Receptor status- "What we see here is that both receptor positive and negative breast cancers were greater on E+P... You can see more receptor positive cancers, more receptor negative cancers, more progestin receptor positive cancers, more progestin receptor negative cancers. The P-value suggests that there was a significant imbalance with respect to the number of individuals having receptor status determined [for Breast Cancer]. This wasn't based on size difference. We don't have an explanation for that imbalance." (Dr. Stefanick)
        5. Stage- Tumors on E+P compared to placebo were larger [1.7 vs 1.5 cm].

      3. This finding of similar grade, histology, and receptor status but more advanced stage... and the suggestion that there were apparently fewer [breast] cancers seen in the first couple of years on hormone prompted us to look at the mammograms. (Chlebowski)

        1. "Suspicious [mammographic] abnormalities usually leading to biopsy were [higher in the E+P group]... E+P may stimulate breast cancer growth and hinder breast cancer diagnosis." (Dr. Stefanick)
          1. (3992S1_05_Anderson.ppt)

        2. Abnormal mammograms (not biopsies) were 9.4 percent versus 5.4 percent on placebo... Most of those abnormals were in the short interval follow-up category, Category 3 [unscheduled]. (Chlebowski)
          1. (3992S1_05_Anderson.ppt)

        3. [ANOTE] E+P patients had more breast biopsies because of more abnormal mammograms.

        4. The cumulative abnormal mammograms after six plus years of follow-up were 30% versus 21%, E+P versus Placebo... The people that would drop off that wouldn't be required to have mammograms before dispensation... abnormal mammograms were associated with even one year of E+P use, a four percent absolute increase in abnormal mammograms after one year on E+P, a ten percent absolute increase in abnormal mammograms after about five years of E+P. (Chlebowski)

        5. Mammograms were 74 percent more likely to be abnormal after one year, but in that first year there was about 30 to 40 percent less cancer seen. So [as a percentage of all the cancers in the first year] we ended up having almost twice as many abnormal mammograms, significantly fewer cancers seen, and more advanced cancers subsequently being delivered. Those things taken together just looking at those numbers suggest that cancers are growing during those initial years but we're not able to see them with mammograms which are much less effective in finding the cancers. If we're looking at those first two or three years, we really don't know what we're seeing because it appears that the E+P is making the mammographic diagnosis of those cancers much more difficult. That's why they're being seen later. So it's the same kind of question of how can we look at fairly those first two year events when we know that there's two other things that are occurring in the background. (Chlebowski)


          1. Fewer Placebo patients with prior hormone exposure had abnormal mammograms in year one, consistent with previous detection from prior mammograms. More Placebo patients (without prior hormone) had abnormal mammograms in year 6+, consistent with increasing detection of baseline tumors over time.
          2. More Placebo patients with prior hormone exposure dropped out in first year, which means that there would be fewer mammograms done in the placebo group in year one, as is seen. Patients that dropped out would not get their mammograms for comparison according to the protocol.
          3. A comparison of time-to-mammogram with time-to-abnormal mammogram (from randomization) over time might indicate if shifts in one of these biased the Breast Cancer findings. (Earlier abnormal mammograms for the same number would increase the Hazard Rate).
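
        The abnormal mammogram figures above mix absolute and relative comparisons, so it can help to compute both from the same pair of percentages. The following Python sketch is illustrative only; the helper name `compare` is hypothetical, and it assumes the 9.4 versus 5.4 percent figures are the first-year rates, which the testimony does not state explicitly.

```python
def compare(treated_pct, control_pct):
    """Return (absolute difference in percentage points, relative ratio)."""
    return treated_pct - control_pct, treated_pct / control_pct

# Percentages as quoted in the testimony (E+P versus Placebo).
figures = {
    "abnormal mammograms, 9.4 vs 5.4": (9.4, 5.4),
    "cumulative after six plus years, 30 vs 21": (30.0, 21.0),
}

for label, (treated, control) in figures.items():
    diff, ratio = compare(treated, control)
    print(f"{label}: +{diff:.1f} points, ratio {ratio:.2f}")
```

        On these inputs the first pair gives a difference of about four percentage points and a ratio of about 1.74, which matches the "four percent absolute increase" and "74 percent more likely" statements. The cumulative pair gives nine points, close to the "ten percent" quoted for about five years of use.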

      4. Breast cancer and prior hormone use

        1. "The period in which the E+P group has a lower Breast Cancer incidence rate [total] is at least for four and a half years, but the curves do cross. The E+P group has a slightly higher rate in the later years. Therefore the pattern is overall the same but you see a longer duration of lower rates. Whereas in the prior exposed, the separation of the curves does begin much earlier by about Year 2." (Anderson)

          1. About 26 percent of our population had used hormones previously. A little bit more had been combined use.
          2. Women who had used prior E alone were more likely to have a shorter term exposure to estrogen than women who had used combined hormones.
          3. Prior E alone users (58 percent) had their exposure to E more than 10 years ago.
          4. Prior E+P combined hormone users were more likely to be E+P current users.
          5. [ANOTE]

            The E-only prior exposure group would be less likely to have had recent mammograms (got hormone more than 10 years ago) and would have less recent exposure than E+P patients.

        2. Prior Exposure "Here I've categorized slightly differently than it was in the JAMA paper. Here prior E only exposure is only exposed to Estrogen alone... These women never took progestin before. Any prior E+P, some of these women did have some episodes of E alone exposure. I wanted to keep the E alone group pure." (Anderson)

          1. (Prior E alone) or (Prior E+P/some E) (3992S1_05_Anderson.ppt) (3992S1_04_Chlebowski.ppt)

          2. Prior E+P (some E)

            1. Women who had prior E+P exposure (some with prior E) have a breast cancer Hazard Ratio of 2.19. Unweighted P-value for the interaction is 0.08. The weighted is 0.17. So again there's some kind of suggestive trends but not very strong. The suggested prior exposure particularly prior E+P seems to be associated with higher risk. (Anderson)
              1. [ANOTE] P >.05 is not a statistically significant level.
            2. "Women with prior exposure to E+P who were randomized to placebo have a quite low rate, 0.19 percent per year [total breast cancer rate] versus the other two groups [prior E & no prior exposure] with about 0.36 percent per year. So women with prior exposure to E+P are clearly different." (Anderson)

          3. No Prior E+P or E

            1. "Women who had never used hormone therapy... were not found to have a significant increase in the occurrence of breast cancer during the five years of the clinical trial [HR=1.09]... the woman 50 to 55 who is symptomatic and requests treatment is really not at a particularly increased incidence of breast cancer from the use of hormone therapy using the relative hazard published in the WHI." (Archer)
            2. The Ratio of Hazard Rates was less for E+P women without prior menopausal hormone therapy in the first years at 0.48, a 50 percent apparent reduction in the first two years for E+P compared to Placebo (0.65). "You don't see that in the women with prior menopausal hormone therapy." (Chlebowski)

          4. Women with prior Estrogen exposure (only) have a breast cancer Hazard Ratio of 1.47
          5. Women with prior Progestin exposure (only) have a breast cancer Hazard Ratio of 1.0

      5. Invasive breast cancer and prior hormone use (Anderson)

        1. (3992S1_05_Anderson.ppt)

        2. Looking by prior hormone use, for invasive breast cancer, the hazard ratio is 1.09... In all invasive [breast] cancer, the hazard ratio is 1.86. The unweighted P-value is .04. The weighted P-value is .10 suggesting some modest evidence of an interaction with prior hormone use where women who have been exposed in the past if you looked at that by itself these Z-values of -2.7 or -3.0 are clearly statistically significant.

        3. "What is rather curious about this finding and I can't explain it exactly is that the rate of invasive breast cancer in women who have been exposed previously [to E or E+P ] but who then were randomized to Placebo is quite low. It's 0.25 here. That's the annualized incidence rate. Placebo who are not previously exposed is higher. It's 0.36. That's a little bit curious and suggests to me some sort of selection bias probably in the sense that these women are different, the prior hormone users versus the no-prior exposed group."

      6. Nodal status and prior hormone use (Anderson)

        1. "Percent positive nodes in advanced stage show the same pattern in both groups but again it's this weird thing where the placebo group in the women who had been exposed previously have a lower percent of positive nodes and lower percent of advanced stage than the placebo group with no prior exposure. So this is another very curious finding."

      7. Recency of prior hormone use (Anderson)

        1. Women who were using hormones at the time we first encountered them actually had to go through a three-month washout period before they could be randomized. These are women who were using hormones before the washout period and then within the last five years but not at the baseline visit five to ten years ago or ten plus years ago. You can see all of these are generally in the same region. The P-values suggest that there's no interaction [for recency of prior hormone use].

          There are the curves. Hormone used at enrollment within the last five years, five to ten years ago, and more than ten years ago. Maybe the separation is coming a little bit later for older use.

      8. Revisions: Breast Cancer Reports from Study, Post Study Update, Both Combined

        1. Breast Cancer Findings Intervention Period (Preliminary 2002)

          1. Total Breast Cancer (not reported)
          2. Invasive Breast Cancer
            1. 166 Invasive Breast Cancers in E+P
            2. 124 Invasive Breast Cancers in Placebo
            3. HR = 1.26 for Preliminary Intervention Period
          3. In-Situ Breast Cancer (not reported)

        2. Breast Cancer Findings Intervention Period (Final Revised 2003)
          There was a total of 245 versus 185 cases (follow-up 5.6 years).

          1. (3992S1_04_Chlebowski.ppt)

        3. Breast Cancer Findings Post Intervention Period (Post Period 2003)
          "Women received letters from us on July 8 of last year [2002] asking them to stop taking their pills but we've continued to follow them up. This is the increment of data since that time. They have not been taking our pills. Some of them have probably been taking their own pills. But you can see that we've had 21 new [invasive] breast cancers in the E+P trial and 18 new ones in placebo for a hazard ratio of 1.13." (Anderson)

          1. (3992S1_05_Anderson.ppt)

        4. Breast Cancer Findings (Intervention + Post Periods Combined)
          "Our cumulative, combining the intervention period with the post intervention period, is 227 invasive cancers versus 170. The hazard ratio is 1.26, again especially by our weighted statistic, very highly statistically significant." (Anderson)

          1. (3992S1_05_Anderson.ppt)

      9. Summary of Preliminary/Revised Invasive Breast Cancer Rates/Ratios


        1. WHI reported the Combined [Intervention + Post-Intervention] HR = 1.26, which is slightly higher than the Revised [Intervention Only] HR = 1.24. It is unclear how adding data with a lower HR (1.13) gives a higher HR (1.26).
        2. Subtracting the [Post Intervention] cases from the [Intervention Revised + Post Intervention Combined] cases should recover the Final Revised intervention counts. For E+P, 227-21 = 206, which is 7 greater than the reported [Intervention Revised] = 199; for Placebo, 170-18 = 162, which is 12 greater than the reported [Intervention Revised] = 150. No discussion of this discrepancy is provided, an apparent elevation of the HR in the [Intervention Revised + Post Intervention Combined]. See note 1.
        3. Adding in post-intervention years dramatically lowered the ratio of hazard rates for Year 5, with smaller offsetting increases in Year 3 & 6+. This gives a hint that the HRs for patients recruited later in the study were lower, so that the HR for Outcome Year 5 had been artificially elevated by truncation when the study was terminated prematurely. (Outcome Year 5 for the 97-98 Recruit Cohort is missing initially but included in the revised. All other cohorts before 97-98 include Year 5 Outcomes initially and for the revised).
        4. The Hazard Ratio for Invasive Breast Cancer in the period after the study closed shows a continued decline (1.13) from the maximum Rate of 1.99 in Outcome Year 5, not an increase as implied by the decision to stop the study for Breast Cancer Harm. The Hazard Ratio for Breast Cancer appears not to have a linear trend as considered in the reports, but a non-linear trend with a maximum at 4-5 Years.
        5. Prempro group had lower Breast Cancer Rates for 4.5 of the average 5.6 outcome years of the study, with no statistical increase in Breast Cancer rates over the entire study (Stefanick). [Only Year 5 excluded HR = 1.0, all others did not contradict a HR of 1.0.]
        6. Unweighted sequential monitoring gives [0.97 - 1.59] for Invasive Breast Cancer, which is not significant (from Chlebowski 2003).
        7. Survival curves that cross for breast cancer indicate that the Kaplan-Meier analysis by Cox Proportional Model is invalid for significance. [see McDonough, P, Fertil Steril 78(5):Nov2002(951-956)]
        8. Non-agreement between the weighted and unweighted results is concerning.
        9. The Trial [Intervention] Period shows an increased Invasive Breast Cancer Hazard Ratio with no increased In-Situ Breast Cancers, but post-Trial data shows an increased In-Situ Breast Cancer Hazard Ratio with no increased Invasive, the reverse (with both not statistically significant).

      10. [ISLT]

        Hypothesis Explaining Prempro WHI Breast Cancer Findings (Dr.Tim)

        Breast cancer findings from the WHI are consistent with the hypothesis that:

        1) Study patients with previous estrogen exposure have a number of larger undetected tumors, which are detected sooner after given Estrogen+Progestin compared to Placebo (Detectable tumors would have been already detected). This would especially be true for prior E+P exposure.

        2) Study patients with no previous estrogen exposure would have a (possibly same or greater) number of smaller previously undetected tumors at baseline, which would take longer to detect after given Estrogen+Progestin compared to previous estrogen exposure (more undetected tumors).

        3) Study patients given placebo would have minimal change in their tumor size during the study, with a lower detection rate during the study (the average age of patients was 63 and thus undetected breast cancers would be slower growing ones).

        4) Study Patients given estrogen only also would have more but still minimal change in the size of their tumors and also with a lower detection rate, since estrogen stimulates tumor growth less than estrogen-progestin.

        5) Giving estrogen-progestin (or estrogen) decreases new tumor formation and suppresses grade and nodal spread, but stimulates growth in size for existing tumors, increasing initial detection (would initially see fewer in-situ tumors compared to invasive).

        6) A large number of Placebo patients who become non-compliant by starting E+P off-study, would increase the censored-after-6-months hazard ratio for breast cancer when they are excluded.
        This would lower the adherent Placebo rate and increase the adherent hazard ratio under intention-to-treat analysis, as reported in the study (if starting E+P increases initial detection, excluding those patients would fail to count cancers that would eventually be found anyway in the placebo group).

        7) Earlier breast cancers detected would be more invasive (bigger tumors), later breast cancers more in-situ (finally big enough to detect but more benign).

        8) Randomized studies of breast cancer survivors given hormones show lower mortality rates, consistent with this hypothesis. 05.14.2005

    7. Bone Subgroup (WHI and Lower Estrogen Dose Studies On Bone Density)

      1. Effects from loss of estrogen on bone (3992OPH1_01_Foegh.ppt)

      2. "Current data from the WHI and other publications indicate that standard and lower doses of estrogen with progestin or estrogen alone prevent bone loss in post menopausal women... estrogen plus progestin reduces the incidence of fracture of the hip, spine or vertebral body and wrist in all the subgroups of post menopausal women." (Dr. Archer) "There was net benefit of hormone therapy even in women considered to be at high risk of fracture." (Dr Silva)
        1. (3992S1_08_Wyeth-Camardo.ppt)

        2. (3992S1_08_Wyeth-Camardo.ppt)

        3. (3992S1_08_Wyeth-Camardo.ppt)

        4. Prempro Reduction of Fractures in WHI (E+P 3992S1_06_Cauley.ppt)

        5. Prempro Reduction of Fractures by Age in WHI. There was no evidence that the effect of E + P on fracture differed across age groups. (Cauley) (E+P 3992S1_06_Cauley.ppt)

        6. We found consistently higher BMD measurements in women randomized to the E+P so that by the end after three years of treatment, the lumbar spine increased over 6.5 percent in the E+P group compared to about 1.2 percent in the placebo group which is overall a 4.5 percent difference in BMD at year three at the lumbar spine with somewhat smaller differences at the total hip which is consistent with other osteoporosis therapy showing larger effects on lumbar spine than on the total hip.
          (Cauley) (E+P 3992S1_06_Cauley.ppt)

        7. Risedronate, Alendronate and Estrogen/Progestin are the only treatments shown to be effective for osteoporosis.
          (E+P 3992S1_06_Cauley.ppt)

        8. "Berlex has sponsored a [lower-dose] study on osteoporosis in women between the age of 60 to 80. UCSF was the coordinating center and you'll see some names that are familiar to the osteoporosis field and estrogen like Dr. Grady, Dr. Cummings and also on the investigator list, there are names familiar in this field. (Orloff)
          1. This was a double-blind, randomized trial with 417 women that were as I said between the ages of 60 to 80 years and they all had an intact uterus. They were more than five years post menopausal and the entrance criteria was a z-score of more equal to 2.0. The estrogen dose was a weekly transdermal patch which delivers 0.014 mg of estradiol. That was tested against a placebo patch.
          2. The goal was to increase estradiol just to 10-15 picogram per mL. This is a low level of estradiol because you may all know that women post menopausal have levels below 20 picogram per mL and nearly all men have actually levels about 20 picogram per mL which may come to a surprise to many that men have higher estradiol levels than post menopausal women.
          3. All the women took calcium and vitamin D of reasonable doses and the study lasted for two years with follow-ups every four months. The primary endpoint was bone mineral density ("BMD") at lumbar spine.
          4. As you can see, there's a 2.5 percent difference between placebo and the active arm at 24 months, a highly significant result of a P-value less than 0.001. This is very comparable to other estrogen and other compounds that SERMs use for treatment of prevention of osteoporosis.
          5. We also had a secondary endpoint of fractures. Of course we were aware that the study wasn't big enough to show any difference in fractures, but as you can see numerically at least there is a difference. There is four in the active arm and 10 fractures in the placebo arm."
          6. [ANOTE] Peri- and early- menopausal women have higher serum levels of estradiol than 20pg. This is a major difference between newly and later post menopausal women. Assays also overestimate serum estradiol values in patients taking oral preparations and do not have adequate resolution in the low ranges [see McDonough, P, Fertil Steril 78(5):Nov2002(951-956)]

        9. "In August, most of you are aware that a study was published on what I would also call an ultra-low dose of estradiol. That was Dr. Prestwood and her collaborators." (Orloff)

          1. In a study of a cohort of women 65 years of age or older, we compared the serum hormone concentrations at base line in 133 women who subsequently had hip fractures and 138 women who subsequently had vertebral fractures with those in randomly selected control women from the same cohort. Women who were taking estrogen were excluded. The results were adjusted for age and weight.

            Women with undetectable serum estradiol concentrations (<5 pg per milliliter [18 pmol per liter]) had a relative risk of 2.5 for subsequent hip fracture (95 percent confidence interval, 1.4 to 4.6) and subsequent vertebral fracture (95 percent confidence interval, 1.4 to 4.2), as compared with the women with detectable serum estradiol concentrations. Serum concentrations of sex hormone–binding globulin that were 1.0 µg per deciliter (34.7 nmol per liter) or higher were associated with a relative risk of 2.0 for hip fracture (95 percent confidence interval, 1.1 to 3.9) and 2.3 for vertebral fracture (95 percent confidence interval, 1.2 to 4.4). Women with both undetectable serum estradiol concentrations and serum sex hormone–binding globulin concentrations of 1 µg per deciliter or more had a relative risk of 6.9 for hip fracture (95 percent confidence interval, 1.5 to 32.0) and 7.9 for vertebral fracture (95 percent confidence interval, 2.2 to 28.0). For those with low serum 1,25-dihydroxyvitamin D concentrations (23 pg per milliliter [55 pmol per liter]), the risk of hip fracture increased by a factor of 2.1 (95 percent confidence interval, 1.2 to 3.5).

            Postmenopausal women with undetectable serum estradiol concentrations and high serum concentrations of sex hormone–binding globulin have an increased risk of hip and vertebral fracture. [NEJM Volume 339:733-738 September 10, 1998 Number 11]

          2. Dr. Cummings, one of the investigators, pooled the data of these two ultra-low studies and the combined factors were that there were six fractures for the ultra-low and sixteen for placebo. This is a statistically significant difference, with a p-value of 0.4. This is really exciting because these are mainly osteopenic women and these are fractures that we are talking about. So it's very encouraging.

          3. What were the adverse events? Here's adverse events we worry about namely, breast cancer, cardiovascular events. These are what they look like in this study which lasted for two years. We looked at all but what I've summarized here for you are the breast cancer and the cardiovascular. It was interesting when you glance over it. There is really no difference between the placebo and the active arm.

          4. One interesting point is actually that we did not have any venous thromboembolic events. If you go down to the bottomline, I thought it might be interesting also to see there were no deaths in this age group and the hospitalization was not statistically significantly different. It was 22 in one group and ten in the other.

          5. This is the conclusion... What we found is the prevention of bone loss in all the post menopausal women with this dose that is 75 percent lower than the normally used dose. It is safe for the endometrium. The study lasted for two years so for two years you do not need to use progestin. There was decrease in the bone markers and there was no difference in some of the normal estrogen related side effects like breast tenderness, headache. If you look at the bottom, there was also no difference in lipids, sex hormone binding globulin ("SHBG") or C-reactive protein ("CR-P") between the two groups.

        10. "So we really think that this effect of this ultra-low dose is kind of a paradigm shift in the risk-benefit of the hormone use. We showed that it seems that you would be able to get a fracture reduction in osteopenic patients. You can give [unopposed] estrogen at this dose for up to two years. We do not know what happens after two years. The adverse event profile is similar to placebo. We have no increase in the vasomotor symptoms. We don't share of course bisphosphonates effects because we are transdermal products." (Orloff)

      3. [ANOTE]

        These lower-dose estrogen studies are interesting, but really are of value for Bone Mineral Density and prevention of Bone Fractures and not other clinical outcomes.

        1. These two studies were of short duration (2 years)
        2. Study women were well past menopause (60-80 years, >65 years)
        3. Patient numbers were small, with low power to detect subgroup adverse outcome differences.
        4. Randomized study women were already severely osteopenic (z-score<=2.0) and presumably had low serum estrogen values, given the estrogen patch (not controlled for estrogen exposure), which were pooled with a cohort study of women who did or didn't have hip fractures with high and low estrogen levels. Both studies do indicate better BMD when women have higher serum estrogen or take estrogen.
        5. Study women all had a uterus with unstated ovary status and were given estrogen only, confirming the WHI Estrogen-only arm study results. It has little impact on the E+P studies discussed by the committee here. The randomized study would be comparable to the women with a uterus who were given estrogen at the beginning of the WHI E+P Study 2002 (first two years) and later switched to E only because of endometrial concerns.
        6. It has been shown that BMD is sensitive to exquisitely low doses of estrogen, so these findings are what might be expected for bone. It is telling that the WHI committees and reviewers highlight these lower-dose studies to support the effect of estrogen in preventing bone fracture, yet when they issued their negative recommendations about Prempro HRT essentially ignored all the telling WHI data which indicates the same prevention of bone fractures from estrogen-progestin use. Bone fractures are one of the most under-appreciated events of post menopause.




    OCTOBER 7, 2003

    + + + + +

    The Advisory Committee met at 8:00 a.m. in the Versailles Ballroom of the Holiday Inn, 8120 Wisconsin Avenue, Bethesda, Maryland, Dr. Michael McClung, Acting Chairman, presiding.


    .................. Acting Chairman
    ............................. Consultant (Voting)
    THOMAS O. CARPENTER, M.D..........Member
    DEAN FOLLMAN, Ph.D.........................Member
    BARBARA LUKERT, M.D..................... Consultant (Voting)
    CLIFFORD ROSEN, M.D..................... Consultant (Voting)
    DAVID SCHADE, M.D.......................... Member
    MORRIS SCHAMBELAN, M.D.............Member
    MARTHA N. SOLONCHE.................... Consumer Representative (Voting)
    PAUL WOOLF, M.D.............................. Member
    ROBERT ZERBE, M.D......................... Acting Industry Representative

    DORNETTE SPELL-LeSANE, M.H.A..NP-C Executive Secretary








    MARIE FOEGH, M.D., D.Sc.

    1. Committee Members

      DR. ZERBE:.....................................QUATRx Pharmaceuticals, Industry representative.
      DR. SCHADE:................................. Endocrinology University of New Mexico, School of Medicine.
      DR. SCHAMBELAN:...................... Endocrinology, University of California in San Francisco ("UCSF").
      DR. FOLLMAN:............................... Statistician at the National Institutes of Allergy and
      .......................................................... Infectious Diseases.
      DR. BONE:...................................... Endocrinologist and the Director of the Michigan Bone and
      .......................................................... Mineral Clinic.
      DR. LUKERT:.................................. Endocrinology, University of Kansas.
      DR. CARPENTER:......................... Pediatric Endocrinology, Yale University in New Haven.
      DR. WOOLF:................................. Adult Endocrinologist, Crozer Chester Medical Center.
      MS. SOLONCHE:.......................... New York City, Patient Representative.
      DR. STADEL:.................................. Medical Officer, Metabolic and Endocrine Division (FDA).
      DR. COLMAN:................................ Medical Officer from Metabolic and Endocrine (FDA).
      DR. ORLOFF:................................ David Orloff, Director, Division of Metabolic and Endocrine
      ......................................................... Drug Products (FDA).
      CHAIRMAN McCLUNG:............... Endocrinologist at the University of Oregon Health
      ..........................................................Sciences Center in the Oregon Osteoporosis Center.
      SECRETARY SPELL-LeSANE... Executive Secretary for the Committee

    2. Participants

      Dr. Marie Foegh............................ Berlex Laboratories
      David Archer.................................. American Society for Reproductive Medicine ("ASRM")
      Dr. Omega Silva............................ Endocrinologist, Past President of the American Medical
      ..........................................................Women's Association ("AMWA")
      Dr. Jim Simon................................. President of the North American Menopause Society
      Amy Allina....................................... Program Director for the National Women's Health Network
      Dr. Jacques Rossouw.................. Women's Health Initiative Investigators Group
      Dr. Rowan Chlebowski................. Women's Health Initiative Investigators Group



    1) Probably the clearest and best overview of the WHI to date is by Kathryn Morris and Sue Ungar, "A Different Perspective on the WHI", 6/09/2005: p1-3, 8-9. They address the importance of administration route. However, their comparison of Estradiol with Premarin, and of Progesterone with Provera (Medroxyprogesterone Acetate, MPA), is a bit simplistic. Transdermal Estradiol is highly converted to Estrone in the liver (first pass effect), and Oral Premarin has less of this because it is mostly already Estrone. Transdermal Progesterone is quite sedating, and Oral Progesterone stresses the liver more than Provera because it has to be given in such high doses. It is not known what dose of Progesterone is needed to protect the endometrium. Progestins given in combination with Estrogen for HRT exhibit completely different characteristics than when given alone. The effects of particular doses, ratios, routes and types of estrogen and progestin are not entirely known or tested, and often depend on the individual. As well, newer transdermal HRT patches are available that they do not describe.
    The important issue to stress is that the WHI data actually supports the safety of HRT, especially relative to Cardiac Events and Breast Cancer (which was the big public scare generated by the study).

    2) The confusion generated in the wake of the WHI publications highlights the danger in making interim statistical judgements based on incomplete data prior to study completion, especially if power is low and error rates are high. The decisions made and conclusions reached in the WHI published studies do not appear to be rationally defensible.

    In this reviewer's opinion, when a monitoring boundary is crossed but the result is statistically non-significant, it is more important to continue the study than to stop it for fear of possible harm. In this case the indicated harm can just as well be due to chance fluctuations. Stopping a study prematurely when you get an answer you expect or answers you don't like alters the reliability of the findings (as clearly evidenced by the WHI results for Coronary Heart Disease). Studies are designed to obtain definitive results. Predicting how the future data will affect that result is circular: it presumes you already know the behavior of future data. Additionally, it introduces time correlations that wreak havoc with statistical tests for significance of time dependence.

    The premature termination of studies based on the statistician's prediction of future outcomes is becoming a frequent occurrence. Trials must be allowed to complete when monitoring statistics do not have the power to accurately detect harm or help until the trial is done.
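
    The point about chance fluctuations at interim looks can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not model the WHI's actual monitoring boundaries or data. It simulates hypothetical trials with no true difference between arms, looks at the accumulating data several times using an unadjusted nominal threshold (z > 1.96), and compares how often a threshold is crossed at any look with how often it is crossed at the final look alone. The function names, sample sizes, and number of looks are all invented for illustration.

```python
import math
import random


def simulate_trial(rng, n_per_look, n_looks, z_crit=1.96):
    """Simulate one hypothetical trial with NO true treatment effect.

    Each observation is a standardized outcome difference drawn from N(0, 1).
    Returns (crossed_at_any_look, crossed_at_final_look).
    """
    total = 0.0          # running sum of observations
    n = 0                # observations accumulated so far
    crossed_any = False
    z = 0.0
    for _ in range(n_looks):
        for _ in range(n_per_look):
            total += rng.gauss(0.0, 1.0)
            n += 1
        # Under the null hypothesis, total / sqrt(n) is standard normal.
        z = total / math.sqrt(n)
        if abs(z) > z_crit:
            crossed_any = True
    crossed_final = abs(z) > z_crit
    return crossed_any, crossed_final


def false_alarm_rates(n_trials=2000, n_per_look=50, n_looks=5, seed=2005):
    """Return (rate of crossing at any look, rate of crossing at final look)."""
    rng = random.Random(seed)
    any_count = 0
    final_count = 0
    for _ in range(n_trials):
        crossed_any, crossed_final = simulate_trial(rng, n_per_look, n_looks)
        any_count += crossed_any
        final_count += crossed_final
    return any_count / n_trials, final_count / n_trials


if __name__ == "__main__":
    any_rate, final_rate = false_alarm_rates()
    print("crossed at any interim or final look:", any_rate)
    print("crossed at the final look only:      ", final_rate)
```

    Because the final look is one of the looks counted, the any-look rate can never be lower than the final-look rate, and with unadjusted repeated looks it is typically noticeably higher. Formal monitoring plans widen their boundaries to compensate for this, so the sketch shows only why a crossing seen at an interim look deserves caution, not how any particular trial was monitored.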

    DrTim 05.14.2005

This review is dedicated with admiration to Donald Thursh, MD and
Mel Schoenberg, MD, who had it right from the very beginning.
