(09.08.03/03.22.06)

CH3 Summarizing Data
(Nominal, Ordinal, Numerical)

Intro to Statistics by Timothy Bilash MD
September 2003

based on:
Review of Basic & Clinical Biostatistics by
Beth Dawson, Robert Trapp (2001) CH3


    1. Nominal scale (placed in (named) categories, grouped, qualitative)
      1. organized by an attribute
      2. is a member of a group or not (discrete membership)
      3. count the number of observations with or without an attribute that fit the category
      4. percentages or proportions used
      5. often represented by frequencies by category rather than the raw numbers by category
      6. often displayed in contingency tables, dividing up into the numbers that meet each selected criterion

    2. Ordinal scale (nominal scale plus groups are also put in some order, semi-quantitative)
      1. organized by an attribute for which one category is more or greater than another
      2. difference between categories is not the same throughout the scale
      3. percentages or proportions used
      4. sometimes summarized by median values
      5. example is rank-order from highest to lowest

    3. Numerical scale (characterized by a number, quantitative)
      1. interval or continuous
        1. characterize by a number that varies in some continuous way
      2. discrete
        1. characterize by integer number values
        2. may be used to describe a continuous scale


    1. Proportion
      1. part divided by the whole (i.e., number with the attribute divided by the total number)
      2. Percentage is a proportion multiplied by 100%
      3. useful for ordinal, numerical and nominal data
      4. dimensionless

    2. Ratio
      1. part divided by another part
      2. (number with a given characteristic) / (number without it) is an example
      3. dimensionless, no units
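
A minimal sketch of the proportion, percentage, and ratio definitions above. The counts are hypothetical, chosen only for illustration, and the variable names are not from the text.

```python
# Hypothetical counts, not taken from the text.
with_trait = 30       # observations that have the characteristic
without_trait = 70    # observations that do not
total = with_trait + without_trait

proportion = with_trait / total       # part divided by the whole
percentage = proportion * 100         # proportion multiplied by 100%
ratio = with_trait / without_trait    # part divided by another part

print(f"proportion = {proportion:.2f}")
print(f"percentage = {percentage:.0f}%")
print(f"ratio      = {ratio:.3f}")
```

All three are dimensionless; only the choice of denominator differs.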

    3. Rate
      1. a ratio times a multiplier (read as rate per units of the multiplier factor)
      2. for a given unit time interval (ratio per unit time)

      3. Crude rate
        1. rate for the whole population (everyone)
        2. affected by the distribution of characteristics in the population that affect the rate
      4. Sex-specific rate
        1. restricted to a given sex
      5. Cause-specific rate
        1. restricted to a given cause
      6. Age adjusted rate
        1. adjusting the rate, weighted by the ages of the people at risk
        2. allows comparing changes in a given population over time (since the ages and the numbers in each age group change over time)

      7. Mortality Rate
        1. (number who died) / (number who are at risk of dying) in the given time interval
        2. number at risk halfway through the period often used for the denominator as an estimate
        3. Infant Mortality Rate
          1. number of infants who die before age 1 per 1000 live births
      8. Morbidity Rate
        1. (number who develop a disease) / (number at risk of a disease) in the given time interval
      9. Incidence Rate (per time interval)
        1. (number of new cases) / (number at the beginning of the time interval) in the given time interval

      10. Standardized Rates
        1. rates have to be adjusted for the distribution in order to be compared with other populations
          1. assumes weighted averages maintain the validity of the comparison
        2. often adjusted by each category to one of the populations as the reference population (direct method of rate standardization)
        3. standardized mortality ratio (indirect method of rate standardization)
          1. (number of deaths) / (number of expected deaths)
          2. expected deaths calculated in each population using the specific rates from a standard population as a reference
          3. no dimensions
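
The crude rate, the direct method, and the standardized mortality ratio can be sketched as follows. This is an illustration under stated assumptions, not the textbook's worked example: the two populations, their age groups, and all counts are hypothetical, and the function names are invented here.

```python
# Hypothetical data: age group -> (deaths, number at risk).
pop_a = {"young": (10, 10_000), "old": (90, 3_000)}
pop_b = {"young": (6, 4_000), "old": (200, 8_000)}   # an older population

def crude_rate(pop, per=1000):
    """Rate for the whole population, ignoring its age structure."""
    deaths = sum(d for d, _ in pop.values())
    people = sum(n for _, n in pop.values())
    return per * deaths / people

def direct_adjusted_rate(pop, reference, per=1000):
    """Direct method: apply pop's age-specific rates to the reference age structure."""
    ref_total = sum(n for _, n in reference.values())
    expected = sum((d / n) * reference[g][1] for g, (d, n) in pop.items())
    return per * expected / ref_total

def smr(pop, standard):
    """Indirect method: observed deaths / deaths expected at the standard's specific rates."""
    observed = sum(d for d, _ in pop.values())
    expected = sum(n * standard[g][0] / standard[g][1] for g, (_, n) in pop.items())
    return observed / expected

print(f"crude A          = {crude_rate(pop_a):.2f} per 1000")
print(f"crude B          = {crude_rate(pop_b):.2f} per 1000")
print(f"B adjusted to A  = {direct_adjusted_rate(pop_b, pop_a):.2f} per 1000")
print(f"SMR of B (std A) = {smr(pop_b, pop_a):.3f}")
```

With these made-up counts the crude rate of B is more than double that of A, yet once B's age-specific rates are applied to A's age structure the adjusted rate falls below A's. The crude difference comes from the age distribution, not from the age-specific rates.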

    4. Prevalence (is a proportion and not a rate since no time interval, a snapshot)
      1. (number with a disease at a moment in time) / (number at risk at that same moment in time)
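
To contrast prevalence (a snapshot proportion) with an incidence rate (new cases over an interval), here is a small sketch. All counts are hypothetical and the one-year interval is an assumption made for the example.

```python
# Hypothetical counts, not taken from the text.
at_risk_at_start = 2_000      # disease-free people at the beginning of the year
new_cases_in_year = 50        # people who develop the disease during the year

at_risk_on_survey_day = 2_500 # people at risk at the moment of the survey
cases_on_survey_day = 125     # people who have the disease at that moment

# Incidence rate: new cases / number at the beginning, per 1000 per year.
incidence_per_1000 = 1000 * new_cases_in_year / at_risk_at_start

# Prevalence: a proportion at one moment, with no time interval.
prevalence = cases_on_survey_day / at_risk_on_survey_day

print(f"incidence  = {incidence_per_1000:.0f} per 1000 per year")
print(f"prevalence = {prevalence:.1%}")
```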


    1. Distributions (the spread of Numerical Data)
      1. Scale (how closely spaced)
      2. Shape (how the frequency changes along the scale)
        1. symmetric (evenly distributed about some middle)
        2. skewed (not evenly distributed about some middle)

    2. Measures of the Middle (Statistical Tests on Numerical Data)

      1. Arithmetic Mean or Mean (average of the data values)
        1. weighted average (used to calculate the mean if frequencies of a measure are reported and not the raw data)
        2. used for numerical, symmetric data
      2. Median (middle or midpoint observation, for which half of the observations are smaller and half are larger)
        1. less sensitive to extreme values or shape of distribution of data than the mean
        2. used for ordinal, or numerical data if the distribution is skewed
      3. Mode (most frequent value)
        1. often a range if data grouped in intervals
        2. used primarily when distribution is bimodal
      4. Geometric Mean (nth root of the product of the n data values)
        1. used when data is measured on a logarithmic scale
        2. logarithm of the geometric mean is equal to the mean of the logarithms of the data values
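
A short sketch of the four measures of the middle using Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The data values and the frequency table are hypothetical, chosen to be right-skewed so the mean and median disagree.

```python
import math
import statistics

values = [2, 3, 3, 4, 5, 6, 40]   # hypothetical, right-skewed by one extreme value

mean = statistics.mean(values)               # pulled upward by the value 40
median = statistics.median(values)           # middle observation, less sensitive
mode = statistics.mode(values)               # most frequent value
geo_mean = statistics.geometric_mean(values) # nth root of the product

# Weighted average: the mean computed from frequencies instead of raw data.
frequencies = {1: 4, 2: 3, 3: 2, 4: 1}       # hypothetical value -> count
weighted_mean = (sum(v * n for v, n in frequencies.items())
                 / sum(frequencies.values()))

# Log of the geometric mean equals the mean of the logs of the values.
mean_of_logs = statistics.fmean([math.log(v) for v in values])

print(mean, median, mode, round(geo_mean, 3), weighted_mean)
print(math.isclose(math.log(geo_mean), mean_of_logs))
```

Here the mean is 9 while the median is 4, which is the reason the median is preferred for skewed data.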

    3. Measures of Spread or Dispersion (Statistical Tests on Numerical Data)

      1. Range (spread of lowest to highest data value)
        1. emphasizes the extreme values
      2. Percentiles
        1. divides up the distribution into percentile intervals
        2. allows comparison of data to a norm
      3. Normal or Standard Distribution
        1. if the distribution is a Bell Shape
      4. Standard Deviation (SD, how data clusters about the mean)
        1. square root of the average squared deviation from the mean
        2. (the similar average of absolute deviations from the mean is not used)
        3. used with a mean
        4. Variance is the SD squared
      5. Coefficient of Variation
        1. Standard Deviation adjusted or normalized by dividing by the mean value
        2. makes it possible to compare distributions of different ranges and means
      6. Degrees of Freedom
        1. number of observations minus one
        2. number of independent variables
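
The measures of spread above can be sketched with the standard `statistics` module (`quantiles` needs Python 3.8 or later). The data are hypothetical. Note that `statistics.variance` and `statistics.stdev` divide by the number of observations minus one, which is where the degrees of freedom enter.

```python
import statistics

values = [4, 8, 6, 5, 3, 7, 9]    # hypothetical observations

data_range = max(values) - min(values)         # lowest to highest value
mean = statistics.mean(values)
variance = statistics.variance(values)         # sum of squared deviations / (n - 1)
sd = statistics.stdev(values)                  # square root of the variance
cv = sd / mean                                 # SD normalized by the mean
quartiles = statistics.quantiles(values, n=4)  # 25th, 50th, 75th percentiles

print(f"range     = {data_range}")
print(f"variance  = {variance:.3f}")
print(f"SD        = {sd:.3f}")
print(f"CV        = {cv:.3f}")
print(f"quartiles = {quartiles}")
```

Percentile conventions differ between packages; the cut points here follow the default ("exclusive") method of `statistics.quantiles`.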


    1. Nominal Data (grouped items)

      1. Basic Inquiry Statistics (refer to Fig Definitions of Symbols for Nominal Relationships)

        1. Experimental Event Rate in Exposed (ERR)
          1. ERR = A/(A+B)
          2. proportion of those with the risk factor who have or develop the outcome
        2. Control Event Rate in Unexposed (CRR)
          1. CRR = C/(C+D)
          2. proportion of those without the risk factor who have or develop the outcome
        3. Absolute Risk Reduction/Increase (ARR/ARI)
          1. ARR = ERR - CRR
          2. way to appraise the reduction in risk compared to the baseline risk
          3. events avoided per 10,000 people
        4. Number Needed to Treat/Harm (NNT/NNH)
          1. NNT = 1/ARR = 1/(ERR - CRR)
          2. number needed to treat to prevent one event
        5. Relative Risk Reduction (RRR)
          1. RRR = ARR/CRR
          2. can obtain ARR by multiplying RRR by the Control Event Rate (RRR * CRR)
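
A worked sketch of these basic statistics. It assumes the usual 2x2 layout, in which A and B are the exposed subjects with and without the outcome and C and D are the unexposed subjects with and without the outcome, so that each event rate is a proportion of its own group. The counts are hypothetical. Exact fractions are used so that NNT comes out as a whole number without floating-point rounding.

```python
from fractions import Fraction

# Hypothetical 2x2 counts, not taken from the text.
a, b = 15, 85    # exposed/treated: with outcome, without outcome
c, d = 25, 75    # unexposed/control: with outcome, without outcome

err = Fraction(a, a + b)   # event rate in the exposed group
crr = Fraction(c, c + d)   # event rate in the control group
arr = abs(err - crr)       # absolute risk reduction (or increase)
nnt = 1 / arr              # number needed to treat to prevent one event
rrr = arr / crr            # relative risk reduction

print(f"ERR = {float(err):.2f}, CRR = {float(crr):.2f}")
print(f"ARR = {float(arr):.2f}  ({float(arr) * 10_000:.0f} events avoided per 10,000)")
print(f"NNT = {nnt}")
print(f"RRR = {float(rrr):.2f}")
```

The absolute value is an illustrative choice so that the same code reports a reduction or an increase; the sign of ERR - CRR tells which one it is.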

      2. Measures of Significance for Nominal Data (more Inquiry Statistics)

        1. Relative Risk ratio (RR) **
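          1. ratio of the outcome rate with the risk factor (exposed) to the outcome rate without the risk factor (not exposed)
          2. RR = ERR/CRR
          3. RR = [A/(A+B)] / [C/(C+D)] (see also fig Definitions of Symbols for Nominal Relationships)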
          4. investigator decides the number of subjects with and without risk factors
            1. from risk factor to outcome (inquiry statistics are forward in time)
            2. contrast to Odds Ratio
          5. so can be calculated only from a cohort study or clinical trial
            1. cohort studies and clinical trials are also forward in time (from causes to outcomes), so RR is the appropriate measure
            2. persons with and without risk factor followed over time to determine which persons develop the outcome of interest

        2. Odds Ratio (OR) **
          1. (odds that a person with an adverse outcome was exposed or at risk prior to the outcome) divided by the (odds that a person without an adverse outcome was exposed or at risk)
          2. OR = (A/C)/(B/D) = AD/BC (see also fig Definitions of Symbols for Nominal Relationships)
          3. also called cross-product ratio
          4. investigator decides the number of subjects with and without disease outcome
            1. from outcome to risk factor (inquiry statistics are backward in time)
            2. contrast to Relative Risk
          5. Odds ratio usually used for case-control studies
            1. case-control study is also backward in time (from outcomes to causes), so OR is the appropriate measure
            2. logistic regression can also be interpreted in terms of odds ratio in addition to relative risk (see chapter 8)
          6. OR is non-linear, so exaggerates extremely high and low odds
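
Both measures can be computed from the same 2x2 table. The sketch below assumes the layout implied by OR = AD/BC, with A and B the exposed subjects with and without the outcome and C and D the unexposed subjects with and without the outcome. The counts and function names are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Outcome rate in the exposed divided by outcome rate in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio AD/BC, identical to (A/C)/(B/D)."""
    return (a * d) / (b * c)

# Hypothetical counts, not taken from the text.
a, b, c, d = 15, 85, 25, 75

rr = relative_risk(a, b, c, d)
odds = odds_ratio(a, b, c, d)

print(f"RR = {rr:.3f}")
print(f"OR = {odds:.3f}")
```

In this example the two measures differ (about 0.6 versus about 0.53), so they are not interchangeable; which one is appropriate depends on how the subjects were selected.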

        3. matching measure statistic (RR, OR) to appropriate study type (Cohort, Trials, Case-control)

          1. each observational study type differs in both its Tense and Direction of Inquiry Statistics
            1. Tense:
              1. in the future=prospective
              2. in the past=retrospective
            2. Inquiry Statistics Direction:
              1. forward (risk to outcome)
              2. backward (outcome to risk)

          2. each measure statistic also differs in its Direction of Inquiry Statistics
            1. RR Relative Risk (forward in time)
            2. OR Odds Ratio (backward in time)

          3. matching appropriate statistic (see fig observational studies)
            1. case-control is retrospective-backward (use OR)
            2. cohort is prospective-forward (use RR)
            3. historical cohort is retrospective-forward (use RR)

    2. Ordinal Data (ranked items)

      1. Spearman rank correlation (rs = -1 to +1)
        1. for two ordinal variables, one ordinal and one numerical variable, or two numerical variables if the data are skewed
        2. rank order the data (a derived statistic) from lowest to highest by some characteristic
        3. ranks are then used to calculate the statistic instead of the data
        4. +1 indicates perfect agreement and -1 perfect reverse agreement between the ranks (order) of the values, but not the values themselves
        5. tedious
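
The rank-then-correlate procedure is tedious by hand but short in code. The sketch below is one way to do it, assuming tied values receive the average of the ranks they span (a common convention, not stated in the text). The function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank from lowest (1) to highest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Ordinary correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rs: the correlation computed on the ranks instead of the data."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]     # hypothetical: same order as x, very uneven spacing

print(f"rs = {spearman(x, y):.3f}")
print(f"r  = {pearson(x, y):.3f}")
```

Because only the order matters, rs reaches its maximum here even though the raw values are far from a straight line and the ordinary r is lower.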

    3. Numerical Data (raw numbers)

      1. Correlation Coefficient (r = -1 to +1)
        1. between two numerical variables
        2. independent of units
        3. greatly influenced by outlying (extreme) data values, so not good for skewed data
          1. use a transformation that changes the scale before the correlation is computed
          2. these transformations provide a weighted correlation
          3. rank or logarithmic transformations
        4. crude rule of thumb for interpreting correlation r (same for negative r's)
          0.00 to 0.25 (little or no relationship)
          0.25 to 0.50 (fair degree of relationship)
          0.50 to 0.75 (moderate to good relationship)
          0.75 or greater (very good to excellent relationship)
        5. correlations of r=0.95 to 1.00 are suspect and may indicate artifact or error in the biological sciences
        6. only measures a straight-line (linear) relationship, not a curvilinear relationship where one variable changes more as the other changes
        7. "correlation does not imply causation"
          1. must justify by experimental observations or logical argument
        8. coefficient of determination (r^2)
          1. indicates the percent of variation shared between the characteristics (accountability or predictability), not causation
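
A small sketch of r, r^2, and the rule of thumb. The data sets are hypothetical, and since the text lists 0.25, 0.50, and 0.75 in two bands each, assigning each boundary value to the higher band is an assumption made here.

```python
import math

def pearson(x, y):
    """Correlation coefficient r of two equal-length numerical sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

def rule_of_thumb(r):
    """Crude verbal label for r; the sign is ignored."""
    size = abs(r)
    if size < 0.25:
        return "little or no relationship"
    if size < 0.50:
        return "fair degree of relationship"
    if size < 0.75:
        return "moderate to good relationship"
    return "very good to excellent relationship"

# Straight-line data: y = 2x + 1.
x_line = [1, 2, 3, 4, 5]
y_line = [3, 5, 7, 9, 11]

# Curvilinear data: y = x squared, symmetric about zero.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [4, 1, 0, 1, 4]

for name, x, y in [("line", x_line, y_line), ("curve", x_curve, y_curve)]:
    r = pearson(x, y)
    print(f"{name}: r = {r:.2f}, r^2 = {r * r:.2f}, {rule_of_thumb(r)}")
```

The second data set is completely determined by x yet gives r = 0, which illustrates that r measures only straight-line association.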


      1. how the data are presented and scaled affects the interpretation
      2. when two or more measures are of interest, the purpose of the study generally determines which measure is viewed within the context of the other (which is dependent and which is independent).
      3. can imply a causality that is misleading if the data are normalized to the outcomes rather than to the risks.
        1. alters the percentage statistic relative to a different denominator, as part of a different whole
        2. difficult to keep clear: each is a percent of which part, and in which causal direction?
        3. this example (Table 3-23 p 57) uses data on compliance and insurance status collected from a survey at one point in time. It indicates that 35% of patients with a low level of compliance have no insurance (A. Table), and 55% of patients with no insurance have low compliance (B. Table).
        4. shows whether the data are consistent with the interpretation, but doesn't prove causation. Here the results are calculated from a survey at one point in time, so causality is a confusing question. Especially when groups are subdivided, the ability to distinguish causality from coincidence is diminished. The table for a coincidence looks exactly the same as the table for a causality.
      4. can also imply a causality that is false if the table is inverted to display, in the reverse direction, something that was causal only in the forward direction (C. Table).
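
The effect of changing the denominator can be sketched with a small contingency table. The counts below are hypothetical and are deliberately not the values of Table 3-23; the labels and function names are likewise chosen only for illustration.

```python
# Hypothetical survey counts: insurance status (rows) by compliance level (columns).
counts = {
    "no insurance": {"low compliance": 30, "high compliance": 20},
    "insurance":    {"low compliance": 70, "high compliance": 180},
}

def percent_of_row(table, row, col):
    """Percent of the row group that falls in the given column."""
    return 100 * table[row][col] / sum(table[row].values())

def percent_of_column(table, row, col):
    """Percent of the column group that falls in the given row."""
    column_total = sum(cells[col] for cells in table.values())
    return 100 * table[row][col] / column_total

by_insurance = percent_of_row(counts, "no insurance", "low compliance")
by_compliance = percent_of_column(counts, "no insurance", "low compliance")

print(f"{by_insurance:.0f}% of patients with no insurance have low compliance")
print(f"{by_compliance:.0f}% of patients with low compliance have no insurance")
```

The same cell of 30 patients yields 60% or 30% depending on which total it is divided by, and neither percentage says anything about which characteristic, if either, causes the other.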
