WHI Women's Health Initiative HRT Gives Better Survival, Lower Mortality? WHI Mortality New Analysis by Timothy D. Bilash MD MS OBGYN. WHI Continuous Estrogen-Progestin Menopause Study, U.S. Government National Institutes of Health. Lower Mortality, Prempro Estrogen Plus Progestin Group: 2002 WHI Mortality Least Squares Analysis of Cox Proportional Hazard Ratios, 2002 Women's Health Initiative.

This analysis uses the Least Squares method, which finds the line fit that minimizes the sum of the squared vertical distances between the actual data (y) and the predicted fit (Y) values over time. This is done separately for each study group. (See the discussion of statistics below and Dawson and Trapp [13 p195].)

The Annualized Cox Mortality Least Squares Rate Fits obtained by Year for the
Prempro Estrogen-Progestin (EP) and Placebo (PL) Groups are:

If two groups have the same mortality rates, then the OMRD Difference plot would be expected on average to have a zero Slope and a zero Intercept (light blue shading above). A decreasing slope is found, however. Again note the large Residual for Outcome Year4.

This analysis utilizes the DIFFERENCE, while the WHI reported the RATIO of the Hormone and Placebo Annualized Hazard Rates. Because event rates are low, ratio and difference plots versus time have the same shape. The Ratios of the WHI Cox Annualized Rates by Year (EP/PL in red) are plotted below with the published overall constant Hazard Ratio of HR=.98 (grey line).[1]

To compare our Least Squares Difference Fit to the published WHI Hazard Ratio (presuming a constant Hazard Ratio or zero difference slope), Least Squares Linefits to the Annual Cox Rates (LS) are calculated assuming equal rate-slopes. The Cox Mortality Rates Data Difference (orange line) and the Least Squares Constant Difference Linefit obtained (brown line = constant = -27; the slope is zero because equal slopes are assumed) are shown at left. Note that the Residuals for the restricted Constant Difference (purple text, shaded in tan) are larger than those for the unrestricted LS Difference Linefit shown again at right below. The Residuals also do not appear randomly distributed about zero when comparing the Plots of the Residuals (shaded in tan).
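As a rough illustration of the restricted versus unrestricted comparison described above: when the slope of the difference is forced to zero, the least squares constant is simply the mean of the yearly differences, and the unrestricted line can never have a larger sum of squared residuals. The sketch below uses made-up difference values (not the WHI data), and the helper names are hypothetical. It also checks that the poster's unrestricted fit, 110 - 39*[Outcome Year], evaluated at the mean Outcome Year 3.5 (assuming six Outcome Years numbered 1 to 6), gives about the reported constant of -27.

```python
def constant_fit(values):
    """Least squares fit of a constant (zero slope): the mean of the values."""
    return sum(values) / len(values)


def line_fit(xs, ys):
    """Unrestricted least squares line; returns (intercept, slope)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - slope * xbar, slope


def sum_sq_residuals(ys, fitted):
    """Sum of squared vertical distances between data and fit."""
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))


# Hypothetical EP-PL differences for illustration only (NOT the WHI rates).
years = [1, 2, 3, 4, 5, 6]
diffs = [60.0, 40.0, -10.0, -90.0, -70.0, -120.0]

c = constant_fit(diffs)
a, b = line_fit(years, diffs)
sse_constant = sum_sq_residuals(diffs, [c] * len(diffs))
sse_line = sum_sq_residuals(diffs, [a + b * x for x in years])
print("constant fit:", c, "SSE:", sse_constant)
print("line fit: intercept", a, "slope", b, "SSE:", sse_line)

# A least squares line passes through (xbar, ybar), so the poster's
# unrestricted fit evaluated at the mean year estimates the constant fit.
poster_constant_check = 110 - 39 * 3.5
print("110 - 39*3.5 =", poster_constant_check)
```

Larger residuals under the restricted fit are therefore expected in general; the informative part is how much larger they are and whether they show a pattern over time.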

To further explore the time dependence of the difference data, later years are excluded (truncated) to obtain fits from the first 3, 4, 5, and all data years, shown below. As more years are included, the slope difference becomes more negative and the intercept difference more positive, so both differences increase in magnitude. [Note that a Trimmed Mean is used for the first-4-year graph only (2nd from the left below); see 4 Year Effect of Trim below.]

The Slope of the Simple LS Linefit truncating the last 2 years (using the first four years left below) has a different sign (positive, not negative) than truncating the last 0, 1 or 3 years (using first 6, 5 or 3 years). Using the Trimmed Mean [17 p141], the Difference Slope and Intercept Residuals are improved (SE for the EP slope is dramatically decreased). This Trimmed fit is what is used for the last-2-year truncation in the multi-plot comparison above for better illustration, since this is highly dependent on the data point for Year 4 (right graph below and 2nd left above).
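For readers unfamiliar with trimming, a generic trimmed mean can be sketched as follows. This is only an illustration: the 20% trimming proportion and the values are assumptions, and the poster does not state the proportion used or exactly how the trim entered the line fit.

```python
def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the given proportion of values from each end.

    The 0.2 default is an assumed, commonly used choice, not a value
    taken from the poster.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    g = int(proportion * len(ordered))  # number trimmed from each tail
    kept = ordered[g:len(ordered) - g]
    return sum(kept) / len(kept)


# Hypothetical values with one outlying point (NOT WHI data).
sample = [1, 2, 3, 4, 100]
print("ordinary mean:", sum(sample) / len(sample))
print("20% trimmed mean:", trimmed_mean(sample, 0.2))
```

The single outlying value dominates the ordinary mean but is discarded by the trim, which is the sense in which a trimmed estimate is less sensitive to one anomalous year.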

A table summarizing these modified fits is shown below. Although the Standard Error (SE) estimate for the Least Trimmed Squares fit is not rigorously calculated, the Mean estimates are more reliable. Combined with the Truncated Analysis, this suggests that Outcome Year4 (and perhaps Outcome Year3) contributes errors to the SE beyond the expected random behavior, while affecting the Mean Slope and Intercept values themselves less.

DISCUSSION OF LEAST SQUARE RESULTS FOR WHI

This leads to a search for possible explanations for the Outcome Year4 residual (affecting the EP and thus Difference and Ratio results).

Non-uniform Jump in Censored Patients at Outcome Year4

As noted in the 2002 paper, the number of censored patients exceeded the study design expectations. No discussion of how this might affect the results has been offered to date. In fact, adjusting just for a large number of censored patients is a problem. If in addition the differences between the groups or the number of censored patients is not uniform over time, it is not entirely clear how the analysis would be affected. [13 p240]

Below is an illustration of the Censored patients in the Kaplan-Meier Mortality Analysis from the WHI E+P Study. Note that there is a discrete jump in the censored patients for both groups at Outcome Year4, exactly the Year that displays a large Residual for EP. Initial Outcome Years 1,2,3 have a change in censor rate of about 0% per year [(+/-)1% for the EP group and essentially 0% for placebo group]. Later Outcome Years 4 and beyond have a change in censor rate of about (+)15% per year.^a Because Outcome Year4 is a transition between the two censor rates, a larger Variance(Residual) for the Censored Outcome might be obtained relative to the other years, with less effect on the value itself. (The %changes data has been added for this Web version. Further results pertaining to the Censored patients will be added as soon as possible.)

Recruiting/Exposure not Coincident with Outcomes for Survival Analysis

Analysis of Survival is different from a true experiment in many ways. One is that Exposure is started at different times. Also Exposures and Outcomes are separated in time, with unequal time intervals.

Below is an illustration of the Recruiting and Outcome Groups that make up the Cox Analysis in the WHI. There are in effect 5 cohorts of patients, one for each year of recruitment, whose Outcome Years are shifted in chronological time. That is, Outcome Year4 is 96-97 for the patients recruited in 93-94, while Outcome Year4 is 99-00 for the patients recruited in 96-97.^b
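The shift between recruiting year and Outcome Year can be sketched as a simple offset. The sketch assumes that Outcome Year1 coincides with the recruiting year and that the five cohorts start in 1993 through 1997; only the 93-94 and 96-97 examples come from the text, and the function names are hypothetical. It ignores the early stop of the study, so later Outcome Years for later cohorts may not actually have been observed.

```python
# Assumed cohort start years (five recruiting years, 93-94 through 97-98).
RECRUIT_START_YEARS = [1993, 1994, 1995, 1996, 1997]


def calendar_interval(recruit_start, outcome_year):
    """Calendar interval (start, end) covered by an Outcome Year for a cohort."""
    start = recruit_start + outcome_year - 1
    return (start, start + 1)


def calendar_intervals_for_outcome_year(outcome_year,
                                        recruit_starts=RECRUIT_START_YEARS):
    """All calendar intervals that are pooled into one Outcome Year."""
    return [calendar_interval(r, outcome_year) for r in recruit_starts]


for outcome_year in range(1, 7):
    print("Outcome Year", outcome_year,
          calendar_intervals_for_outcome_year(outcome_year))
```

Under these assumptions each Outcome Year pools up to five different calendar intervals, so an event tied to a calendar date falls in a different Outcome Year for each cohort.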

WHI COX Outcome Years for each Recruiting Year Cohort

The Outcomes Groups in the Cox Analysis are made by combining the different chronological years that correspond to the same Outcome Year (Kaplan-Meier is different in this regard). Below is an illustration of this using the color scheme from the last graph.

Calendar Years comprising each WHI Cox Outcome Year

Events that have a non-uniform effect on recruiting, exposure, outcome diagnosis, or censoring over the time course of the study can have anomalous effects on the outcome rates, and on comparisons of rates, if they differentially affect one group only.

Two identified events in the WHI are 1) the switching of patients from the Estrogen Only Study into the Combined Estrogen-Progestin Study in 1996 (patient hormone treatments were changed from E to E+P when this was done and patients included in the E+P Group for Outcome results), and 2) the announcement of increased risks in the Prempro Group in 2000. These moments are indicated on the recruiting graph above and in the general Timeline of the Study below. Further discussion of this will be added as soon as possible.

DISCUSSION OF THE WHI PUBLISHED MORTALITY RESULTS

Mortality is an extremely important clinical Outcome with a very clear clinical endpoint compared to other Outcomes. This paper examines the Annualized Cox Mortality Outcome data from the WHI (JAMA, July 17, 2002 [1]) over time. The individual patient data has not been released by the study yet (10/2004).

Cancer events are the major contributor to Mortality in the WHI study (47% =195/416). Coronary events are the other major contributor (29% =120/416). Note that the WHI reported no statistically significant increase in Coronary Heart Disease Mortality or Total Mortality with the use of continuous Prempro.[2,3,4,5,6,7,8]

The WHI study was publicized as a "carefully designed hormone study" concluding that continuous oral Estrogen/Progestin Hormone Treatment (Prempro CEE.625/MPA2.5) "should not be initiated or continued for primary prevention" of Coronary Heart Disease.

However, numerous problems in the study, such as high treatment crossover and dropout rates, have clouded the Outcome results, which conflict with other studies.[7,9] It can be argued that although the selection of exposure group was randomized, the actual exposure to hormone itself was not well controlled.

Many publications, societies and government agencies have echoed the conclusions of the study without examining the study shortcomings.[10,11,12] This paper is an attempt to look at the conclusions about Mortality.

STATISTICAL COMMENTS

Analysis of Survival Data requires a compound outcome to deal with the problems of analyzing the data before all events have actually happened. Ideally, one would wait until all patients die to do the analysis, which is not practical. So time and events get lumped together as one Outcome Statistic (events per time to event for the Hazard Function, or some variable like person-years). An early or late timing of diagnosis for the same number of events can have the same effect as a change in the number of events at the same timing.

Considerable bias can occur with Survival Analysis [13 p217]:
•if the time intervals are large (not a problem with Kaplan-Meier)
•if many withdrawals occur
•if withdrawals do not happen on average midway in the interval (Cox Hazard Model or Kaplan-Meier)

Unfortunately, "little information is available to guide investigators in deciding which statistical analysis is appropriate for any given application of Survival Analysis, and research on biostatistical methods for analyzing survival data is still underway." [13 p224]

STUDY FACTORS THAT AFFECT THE ACCURACY OF THE WHI STUDY

Cross-overs
Dropouts
Contamination of Exposure
Censoring
Unblinding
Early termination, decreasing the statistical accuracy of any time-dependent analysis done on survival data.[21]

USE OF THE STUDENT T-TEST FOR RATE DATA

A proportion is a special case of a mean of 1's and 0's: the mean is simply the proportion itself. Student's T-Test on proportions is thus equivalent to an analysis on the means of binary rate data, and can be used as a test of significance for binary rates. [13 p108; 15] The Student T-Test is used here as a measure of significance for the Least Squares Fits.
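A small sketch of the point above, that a proportion is the mean of 1's and 0's. The outcome values are hypothetical and the function name is illustrative only.

```python
def binary_mean_and_variance(outcomes):
    """Mean and sample variance of 0/1 outcome data.

    The mean equals the proportion of 1's; the sample variance equals
    n*p*(1-p)/(n-1).
    """
    n = len(outcomes)
    p = sum(outcomes) / n
    variance = sum((o - p) ** 2 for o in outcomes) / (n - 1)
    return p, variance


# Hypothetical outcomes: 1 = event, 0 = no event (NOT WHI data).
outcomes = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
p, variance = binary_mean_and_variance(outcomes)
n = len(outcomes)
print("proportion (mean):", p)
print("sample variance:", variance, "vs n*p*(1-p)/(n-1):", n * p * (1 - p) / (n - 1))
```

Because the mean and variance of binary data are both determined by the proportion, the usual mean-based test statistics can be computed directly from the event rates.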

A Two-Sample Independent Groups T-Test can be used to ask whether the means of two groups are equal when observations are numerical (continuous or means, ratios or proportions). [13 p133] Provided that the exposure risk is randomized, the Two-Sample T-Test is a valid approximation to the exact randomization experiment, free of the random sampling assumption or the assumption of exact normality. [14 p95]

Limitations of the T-test for significance of means relate primarily to missing an actual difference (beta error, low power =1-beta) rather than finding a difference when there is none (alpha error), particularly if the distributions have the same shape. [16;17 p107,120;18]

STATISTICAL FACTORS THAT AFFECT THE ACCURACY OF THE STUDENT T-TEST

Non-identical Population Distributions (shape)
Non-constant Population Variances
Non-random samples
Correlation of Outcome with other factors (time of event)
Unequal Sample Sizes
Non-normal population distributions (see below)

STATISTICAL CALCULATIONS (see Dawson and Trapp [13])

Definitions and Least Square Equations: (p195)
•x,y are the Sample Means
x is the Predictor (independent) Sample Mean value
y is the Outcome (dependent) Sample Mean value
•YE=Y,XE=X are the Expected Sample Means
XE is the expected x Sample Mean for the Sample (x,y)
YE = Y= a+bx is the expected y Sample Mean for Sample (x,y)
•Y= a+bx = YE is the regression line fit
y-YE = e = error term = residual
y=[a+bx]+e=Y+e
•n is the number of samples
•xbar,ybar are the Grand Sample Means
xbar = SUM(x)/n
ybar = SUM(y)/n
b=SUM[(x-xbar)(y-ybar)]/ SUM[(x-xbar)²] = Slope of the Linefit
a=ybar-b*(xbar) = Intercept of the Linefit
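The slope and intercept equations above can be sketched directly in code. The data are hypothetical illustration values, not the WHI annualized rates, and the function names are illustrative.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y = a + b*x that minimizes the sum of
    squared vertical distances, using the slope and intercept formulas above."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx          # Slope of the Linefit
    a = ybar - b * xbar    # Intercept of the Linefit
    return a, b


def residuals(xs, ys, a, b):
    """Residuals e = y - Y for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical values for illustration only (NOT the WHI rates).
years = [1, 2, 3, 4, 5, 6]
rates = [10.0, 14.0, 15.0, 22.0, 24.0, 29.0]
a, b = least_squares_line(years, rates)
print("intercept:", a, "slope:", b)
print("residuals:", residuals(years, rates, a, b))
```

The residuals of a least squares line with an intercept always sum to zero (up to rounding), which is a quick check on any hand calculation.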

Assumptions for the least squares fit (p197, 201)
•y is normally distributed about its expected value Y
•the expected values Y form a straight line
•the expected values Y are independent
•Variance of y is constant for every x
•ybar (the mean of sample y's) = µ (mean of the population distribution)
•regression is a robust procedure and may be used in many situations in which the assumptions are not met, as long as the measurements are fairly reliable and the correct regression model is used
•if regression equations are calculated blindly without examining plots of the data, investigators can miss very strong but nonlinear relationships

One-Sample T-Test for Slope and Intercept (p196-201, 238)
want to perform a separate statistical test on slope and intercept obtained from Least Square Linefit
The T-Test can be used to determine whether Least Square regression coefficients are non-zero and to form confidence intervals, using the Standard Estimate of the Error (SE)
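SE = Sqrt(SUM[(y-Y)²]/(n-2)) = Standard Error of the Estimate (residuals are taken about the fitted line Y=a+bx, with n-2 degrees of freedom)
b₀ = expected Intercept (= 0 for zero Intercept)
•T(Intercept) = {a-b₀} / {SE*Sqrt[(1/n)+xbar²/SUM[(x-xbar)²]]}
denominator of this = SE(Int) = Standard Error of the Intercept
b₁ = expected Slope (= 0 for zero Slope)
•T(Slope) = {b-b₁} / {SE/Sqrt[SUM[(x-xbar)²]]}
denominator of this = SE(Slope) = Standard Error of the Slope, so SE(Slope)² = SE²/SUM[(x-xbar)²]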
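A minimal sketch of the one-sample T statistics for the intercept and slope, written in the standard textbook form (T = estimate minus expected value, divided by its standard error, with residuals taken about the fitted line). The data are hypothetical and the function name is illustrative; b0 and b1 are the expected intercept and slope, both assumed to be zero by default.

```python
import math


def slope_intercept_t(xs, ys, b0=0.0, b1=0.0):
    """T statistics for the intercept and slope of a least squares line.

    b0 and b1 are the expected intercept and slope under the null hypothesis.
    The statistics have n-2 degrees of freedom.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    # Standard error of the estimate: residuals about the fitted line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))
    if se == 0:
        raise ValueError("perfect fit: T statistics are undefined")
    se_intercept = se * math.sqrt(1 / n + xbar ** 2 / sxx)
    se_slope = se / math.sqrt(sxx)
    return {
        "intercept": a,
        "slope": b,
        "se": se,
        "t_intercept": (a - b0) / se_intercept,
        "t_slope": (b - b1) / se_slope,
        "df": n - 2,
    }


# Hypothetical values for illustration only (NOT the WHI rates).
result = slope_intercept_t([1, 2, 3, 4], [2.0, 4.0, 5.0, 8.0])
print(result)
```

With only a handful of yearly points the degrees of freedom are small, so the T statistics need to be large before they indicate a non-zero slope or intercept.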

Two-Sample T-Test for Differences (p135,139)
•significance for difference of means between two sample groups
assume random sampling (Student/Gossett)
assume equal Population Variances
•T = (y₁bar-y₂bar) / Sqrt[SD_p²(1/n₁ + 1/n₂)]
y₁bar,y₂bar are the sample means of groups 1 and 2
n₁,n₂ are the sample sizes
SD₁²,SD₂² are the sample variances (SD₁,SD₂ are the sample standard deviations)
the assumed estimate of the common population variance is
SD_p² = [(n₁-1)SD₁² +(n₂-1)SD₂²] / [n₁+n₂-2]
SE₁², SE₂² = SD₁²/n₁, SD₂²/n₂
SD_p² = [(n₁-1)n₁SE₁² +(n₂-1)n₂SE₂²] / [n₁+n₂-2]
SE²(Difference)=SE²=SD_p²(1/n₁+1/n₂)
SE²=SE₁²+SE₂² (when the sample sizes are equal, n₁=n₂)
Note: here SE is the pooled SE for the difference (SEE), not to be confused with the SE for one sample. In this poster, for convenience, SE is used for both the one-sample and the difference calculations; which is meant should be clear from context.
Assumptions for 2-Sample T-Test (p137-138)
each group follows a normal distribution
each group is independent
each group has the same population variance
however, with equal sample sizes, this requirement can be ignored (the 2-sample T-Test is robust with equal sample sizes)
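The pooled two-sample T statistic described in this section can be sketched as follows. The group values are hypothetical and the function name is illustrative; equal population variances are assumed, as stated above.

```python
import math


def pooled_two_sample_t(group1, group2):
    """Two-sample T statistic with a pooled variance estimate.

    Returns (t, degrees_of_freedom). Assumes independent groups with
    equal population variances.
    """
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    var1 = sum((v - mean1) ** 2 for v in group1) / (n1 - 1)  # SD1 squared
    var2 = sum((v - mean2) ** 2 for v in group2) / (n2 - 1)  # SD2 squared
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se_difference = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_difference, n1 + n2 - 2


# Hypothetical values for illustration only (NOT WHI data).
t_stat, df = pooled_two_sample_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print("T =", t_stat, "with", df, "degrees of freedom")
```

In this example the sample sizes are equal, so the squared standard error of the difference (5/3) equals the sum of the two one-sample squared standard errors (1/3 + 4/3).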

REFERENCES

[1] "Risks and Benefits of Estrogen Plus Progestin in Healthy Postmenopausal Women", Writing Group for the Women's Health Initiative Investigators, JAMA, July 17, 2002, 288(3):p321-337
[2] "Estrogen and Progestin and the Risk of Coronary Heart Disease", Manson JE et al, NEJM, 349:p523-534
[3] "Influence of Estrogen Plus Progestin on Breast Cancer and Mammography in Healthy Postmenopausal Women", Chlebowski RC et al, JAMA, 2003;289:3243-3253
[4] "WHI: Now that the dust has settled", Creasman WT, Hoel D and DiSaia PJ, Am J Obstet Gynecol, Sept 2003, p621-626
[5] "Cardiovascular Disease and Postmenopausal Hormone Therapy", Speroff L, Current Controversies in Obstetrics and Gynecology, Nov22-24 (2002) Newport Beach, CA
[6] "Postmenopausal Hormone Therapy and Breast Cancer", Speroff L, Current Controversies in Obstetrics and Gynecology, Nov22-24 (2002) Newport Beach, CA
[7] "The randomized world is not without its imperfections: reflections on the Women's Health Initiative Study", McDonough PG, Fertility and Sterility, 78(5), November 2002, p951-956
[8] "Results from the Women's Health Initiative", Bilash T, August/October, 2002 [www.DrTimDelivers.com]
[9] "Hormonal Therapy Following Breast Cancer", Wren BG, in Proceedings of the Second International Symposium of the Portuguese Menopausal Society 1999, p55-56
[10] "Preliminary Statement to ACOG Membership on the Women's Health Initiative Study", July 10, 2002
[11] "FDA Approves New Labels for Estrogen and Estrogen with Progestin Therapies for Postmenopausal Women Following Review of Women's Health Initiative Data", January 8, 2003
[12] "New Federal Report on Carcinogens Lists Estrogen Therapy, Ultraviolet, Wood Dust", NIEHS PR#02-11, NIH, December 11, 2002
[13] Dawson B, and Trapp R, Basic Clinical Biostatistics (2001)
[14] Box , Hunter and Hunter, Statistics for Experimenters (1978), p507
[15] Colton T, Statistics in Medicine (1974), p35
[16] Bowers D, Medical Statistics from Scratch (2002), p42,131
[17] Wilcox R, Fundamentals of Modern Statistical Methods (2001)
[18] Colton T, Statistics in Medicine (1974), p82,108
[19] Koosis D, Statistics (1985), p139
[20] Bilash, T [unpublished]
[21] van Belle, G, Statistical Rules of Thumb (2002), p72

Special Thanks
to Rand R. Wilcox for his generous expert opinions about the Least Trimmed Squares and Bootstrap statistical techniques.

Timothy D. Bilash MD OBGYN 10.2004

^(a) censored patients paragraph and graph revised 02.16.05
^(b) outcome groups paragraph refined and graphs revised 03.01.05
^(c) original abstract used the word difference in mortality, decrease is more accurate statement 01.01.08

__________________________________________________________

This poster is Dedicated in Memory of
Peg Harris, Ruth Howard and Helen Kolesnik Bilash

__________________________________________________________

WHI index page (10.13.2004/01.01.2008)
			WHI MORTALITY RATES New Analysis by Timothy D. Bilash MD MS OBGYN www.DrTimDelivers.com October 27, 2004

		*Lower Mortality? with Continuous Estrogen-Progestin* in the Women's Health Initiative Menopause Study (2002) sponsored by the U.S. Government National Institutes of Health


                            rate at year 0     rate increase each year
                            INTERCEPT          SLOPE
Estrogen-Progestin Group    136                120
Placebo Group               26                 159
(Deaths per 100,000 women at risk per year)

		An alternative statistical approach applied to the Women's Health Initiative Estrogen+Progestin Study shows a decreasing Mortality Rate with Continuous Prempro (EP) over time compared to Placebo (PL), which is contrary to the published report.

			The simple linear model, although limited, provides insight into the possibility that data anomaly (in Outcome Year4 specifically) decreases the power of the published study findings to find an effect of Estrogen+Progestin on mortality.
			This poster was presented at the North American Menopause Society Annual Meeting (October 6-9, 2004) [Menopause 2004;11(6):664(P-43)]


	The Straight Line Fits (EP pink line & gold text , PL pink line & pink text ) to the Cox Rates (dkblue line and text) are shown here, with the Residuals [Data-Expected] (brown text , shaded in cream).
	Because the event rates are low, the data are well approximated by a straight line. Note, however, a large deviation from a straight line fit for Year4 in the EP group (pink diamond).


	OMRD = EP-PL = 110 - 39 * [Outcome Year]
			Deaths per 100,000 women at risk per year
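As a worked check of the OMRD line, the sketch below evaluates the two fitted lines and their difference by Outcome Year. It assumes that 136 and 120 are the EP intercept and slope and that 26 and 159 are the PL intercept and slope; this assignment is inferred because it reproduces OMRD = 110 - 39*[Outcome Year]. It also assumes Outcome Years numbered 1 to 6. Function names are illustrative.

```python
# Fitted line coefficients in deaths per 100,000 women at risk per year.
# The assignment of values to groups is an assumption (see lead-in).
EP_INTERCEPT, EP_SLOPE = 136, 120
PL_INTERCEPT, PL_SLOPE = 26, 159


def fitted_rate(intercept, slope, outcome_year):
    """Fitted annualized mortality rate for a given Outcome Year."""
    return intercept + slope * outcome_year


def omrd(outcome_year):
    """Fitted mortality rate difference EP - PL for a given Outcome Year."""
    return (fitted_rate(EP_INTERCEPT, EP_SLOPE, outcome_year)
            - fitted_rate(PL_INTERCEPT, PL_SLOPE, outcome_year))


for year in range(1, 7):
    ep = fitted_rate(EP_INTERCEPT, EP_SLOPE, year)
    pl = fitted_rate(PL_INTERCEPT, PL_SLOPE, year)
    print("Year", year, "EP", ep, "PL", pl, "difference", omrd(year),
          "ratio", round(ep / pl, 3))

# Outcome Year at which the fitted difference changes sign.
crossing_year = (EP_INTERCEPT - PL_INTERCEPT) / (PL_SLOPE - EP_SLOPE)
print("fitted difference is zero near Outcome Year", round(crossing_year, 2))
```

Under these assumptions the fitted EP rate starts above the fitted PL rate and falls below it between Outcome Years 2 and 3, with the fitted difference and the fitted ratio both decreasing over time.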

Note: The Absolute Value of the Residuals are listed top right of the 2-Sample Difference graph. EP Abs(Residuals) are shaded in pink, PL Abs(Residuals) are shaded in pale blue, and the Absolute Value of the Difference of the Residuals \|EP-PL\| are shaded in purple.


[Figures: WHI Mortality Fits; WHI Mortality Difference Fit. Panels: Difference EP-PL with Constant Slope Linefit; Difference EP-PL with Unrestricted Slope Linefit]