Intro to Clinical Statistics
by Timothy Bilash MD
July 2005
based on:
Review of Basic & Clinical Biostatistics
Beth Dawson, Robert Trapp (2001)
Chapters 6 and 10
I. TWO SEPARATE OR INDEPENDENT GROUPS (from Chapter 6)
- Levene Test (p141)
- Levene Test is a test for the equality of variances (used to check an assumption of the two-sample T-Test for means)
- utilizes absolute deviations from the mean
- tests the hypothesis that the average of the absolute deviations of the observations from the mean is the same in each group
- a Two-Sample T-Test is done using the absolute values of the deviations (the absolute distance of each observation from the mean of its group) rather than the observations themselves
- if the T-Test on these deviations is significant (P < .05), the hypothesis of equal variances is rejected and the T-Test on the difference of means is not appropriate
- if the Levene Test is significant, the deviations from the mean in one group exceed those in the other on average
- valid for normal and non-normal population distributions
- a modified Levene Test replaces the mean with the median (see the sketch below)
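A minimal sketch of the procedure described above, using made-up example data (not from Dawson & Trapp): a two-sample t-test on the absolute deviations from each group's mean, compared against the library version in scipy.

```python
import numpy as np
from scipy import stats

# Hypothetical example data (made up for illustration)
group1 = np.array([4.1, 5.3, 6.2, 5.8, 4.9, 5.5, 6.0, 4.7])
group2 = np.array([3.2, 7.1, 8.4, 2.9, 6.6, 9.0, 3.8, 7.5])

# Levene test by hand: two-sample t-test on the absolute
# deviations of each observation from its own group mean
dev1 = np.abs(group1 - group1.mean())
dev2 = np.abs(group2 - group2.mean())
t_stat, p_value = stats.ttest_ind(dev1, dev2)
print(f"manual Levene: t = {t_stat:.3f}, P = {p_value:.4f}")

# Library version; center='median' would give the modified
# form that replaces the mean with the median
w_stat, p_lib = stats.levene(group1, group2, center='mean')
print(f"scipy Levene:  W = {w_stat:.3f}, P = {p_lib:.4f}")

# If P < .05, reject equal variances; the pooled-variance
# T-Test on the means is then not appropriate
```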
- [ANOTE] Statistical Significance [[ ]] ?expected value meaning
- statistical significance is determined by the deviations of the observed values from some expected value, not by the observed values themselves
- whether a test is done for equality of the expected value, or for equality of some function of the expected value, the deviations from the expected value should be minimized
- deviation = y - f(y)
- mean: f(y) = ybar
- least squares: f(y) = a + bx
- identical population distributions may be more important than equal population variances for validity of the T-Test.
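A small numeric illustration of the deviation definitions above, with made-up numbers: deviations from the mean, and deviations (residuals) from a least-squares line f(y) = a + bx.

```python
import numpy as np

# Hypothetical data (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Deviations from the mean: f(y) = ybar
dev_mean = y - y.mean()

# Deviations from a least-squares line: f(y) = a + b*x
b, a = np.polyfit(x, y, 1)          # slope b, intercept a
dev_fit = y - (a + b * x)

# Each f(y) minimizes the sum of squared deviations
# within its own family of candidate functions
print("sum sq deviations from mean:", np.sum(dev_mean**2))
print("sum sq deviations from fit: ", np.sum(dev_fit**2))
```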
- Welch Test (p141)
- another test for comparing means in two independent groups (does not assume equal variances)
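A minimal sketch of the Welch comparison using scipy, with the same made-up data as above; equal_var=False selects the Welch form of the two-sample t-test.

```python
import numpy as np
from scipy import stats

# Hypothetical example data (made up for illustration)
group1 = np.array([4.1, 5.3, 6.2, 5.8, 4.9, 5.5, 6.0, 4.7])
group2 = np.array([3.2, 7.1, 8.4, 2.9, 6.6, 9.0, 3.8, 7.5])

# Welch's t-test: compares means without assuming equal
# variances (equal_var=False adjusts the degrees of freedom)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(f"Welch t = {t_stat:.3f}, P = {p_value:.4f}")
```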
II. STATISTICAL METHODS FOR MULTIPLE VARIABLES (from Chapter 10) [[ ]]
- Predicting Group Outcomes (Nominal <-> Categorical <-> Grouped)
- Three methods are used for Outcomes measured on a nominal scale (bi-valued: present or not-present, yes or no, true or false, + or -) [[ ]]
- Logistic Regression (Curve Fitting)
- can be transformed into an Odds Ratio
- controls (adjusts) for confounding variables using analysis of covariance
- Discriminant Analysis
- used less now
- Log-linear Analysis
- rarely used
- Logistic Regression (Curve Fitting) [[ ]]
- Fits an exponential (log-linearized) curve to the data, obtaining a regression coefficient (bi) for each factor
- Probability of the outcome is expressed in terms of the N factors X1 ... XN:
P(outcome) = 1 / (1 + exp[-(b0 + b1*X1 + b2*X2 + ... + bN*XN)])
- Independent Variables are then selected or derived independently from the fit [[ ]]???
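A minimal sketch of the fitted equation above, with made-up coefficients and factor values, showing how b0 ... bN map to a predicted probability.

```python
import numpy as np

# Hypothetical fitted coefficients and factor values
# (made up for illustration)
b0 = -2.0                       # intercept
b = np.array([0.8, 1.5, -0.4])  # coefficients b1..bN
x = np.array([1.0, 0.0, 2.5])   # factor values X1..XN

# Logistic model: P = 1 / (1 + exp[-(b0 + b1*X1 + ... + bN*XN)])
z = b0 + np.dot(b, x)
p = 1.0 / (1.0 + np.exp(-z))
print(f"predicted probability of outcome = {p:.3f}")

# For a bi-valued (0/1) risk factor, exp(bi) is its odds ratio
print("odds ratios exp(bi):", np.exp(b))
```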
- Multiple outcomes vs multiple risks
- Logistic Regression - finds the best fit for multiple Risk Factors
- Multiple Risk Factors are Independent Variables that are Numerical or Nominal (Categories), ie, the X's
- Single Outcome Variable is bi-valued (Logical), ie, Y; can also be used for Multiple Outcomes if they are Categories
- sometimes also called multiple regression
- Multiple Regression - finds the best fit for multiple Outcome Data, holding the values of all other variables constant
- sometimes used to mean the combination of Multiple Risk Factors with Multiple Outcome Data
- (note the confusion between these different uses)
- If the independent variables (predictors or risk factors) are bi-valued, the regression coefficients can be interpreted as Odds Ratios
- odds ratio is a summary statistic
- odds ratio is the ratio of the odds of the outcome with vs without a bi-valued risk factor
- contrast a 95% chance in one patient with a 100% chance in 95% of patients
- these are equivalent if the odds ratio is constant
- an underlying problem for survival curves is to distinguish between these (especially if time affects outcomes or risks)
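A small sketch of the odds-ratio arithmetic for a bi-valued risk factor, using a made-up 2x2 table (counts are for illustration only).

```python
# Hypothetical 2x2 table (counts made up for illustration)
#                outcome+   outcome-
# risk present      30         70
# risk absent       10         90

a, b = 30, 70   # risk present: outcome yes / no
c, d = 10, 90   # risk absent:  outcome yes / no

odds_exposed   = a / b   # odds of outcome with the risk
odds_unexposed = c / d   # odds of outcome without the risk
odds_ratio = odds_exposed / odds_unexposed   # = (a*d) / (b*c)
print(f"odds ratio = {odds_ratio:.2f}")

# In a logistic regression with this single 0/1 risk factor,
# exp(b1) for that factor reproduces this same odds ratio
```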
- Chi-square Test is used to determine significance of each variable's regression coefficient when the Outcome is Multiple Categories/Nominal (can't use a T or F test)
- T or F test can be used with a single input and a single output variable to determine whether each regression coefficient is different from zero (if one binary risk and one binary outcome - see elsewhere)
- T Distribution can be used to form confidence intervals for each regression coefficient
- If the 95% confidence interval for the odds ratio does not include the value one, then one can be 95% confident that the factor associated with that odds ratio is significant
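A sketch of the confidence-interval step, assuming a hypothetical coefficient and standard error from a logistic fit: the interval is formed on the coefficient scale and then exponentiated to the odds-ratio scale.

```python
import numpy as np

# Hypothetical logistic regression coefficient and its
# standard error (made up for illustration)
b1, se = 0.85, 0.30

# 95% CI on the coefficient scale (1.96 is the normal
# approximation; a t critical value can be used instead)
lo, hi = b1 - 1.96 * se, b1 + 1.96 * se

# Exponentiate to the odds-ratio scale
print(f"OR = {np.exp(b1):.2f}, "
      f"95% CI ({np.exp(lo):.2f}, {np.exp(hi):.2f})")
# If the CI excludes 1, the factor is significant at the .05 level
```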
- Regression tends to underpredict the probability that a risk factor is present for a given outcome
- some advocate the Kappa Statistic for a more correct percentage (but see R Wilcox for disagreement with this)
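A minimal sketch of the Kappa Statistic mentioned above, assuming the kappa in question is Cohen's kappa for agreement between predicted and observed outcomes (data made up for illustration).

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical observed outcomes vs model predictions
# (made up for illustration)
observed  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Cohen's kappa: agreement between the two ratings,
# corrected for the agreement expected by chance
kappa = cohen_kappa_score(observed, predicted)
print(f"kappa = {kappa:.2f}")
```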