Chapter 7: Logistic Regression Models


Summary

This chapter introduces the modeling framework for categorical data in the simple situation where we have a categorical response variable, often binary, and one or more explanatory variables. A fitted model provides both statistical inference and prediction, accompanied by measures of uncertainty. Data visualization methods for discrete response data must often rely on smoothing techniques, including both direct, non-parametric smoothing and the implicit smoothing that results from a fitted parametric model. Diagnostic plots help us to detect influential observations that may distort our results.

Contents

7.1. Introduction
7.2. The logistic regression model
7.3. Multiple logistic regression models
7.4. Case studies
7.5. Influence and diagnostic plots
7.6. Chapter summary
7.7. Lab exercises

Selected figures

  • Figure 7.1

    Overview of fitting and graphing for model-based methods in R.
  • Figure 7.3

    Plot of the Arthritis treatment data, showing the conditional distributions of the 0/1 observations of the Better response by histograms and boxplots.
  • Figure 7.7

    Conditional plot of Arthritis data, stratified by Treatment and Sex. The unusual patterns in the panel for Males signal a problem with this data.
  • Figure 7.8

    Full-model plot of Arthritis data, showing fitted logits by Treatment and Sex.
  • Figure 7.10

    Plot of all effects in the main effects model for the Arthritis data. Partial residuals and their loess smooth are also shown for the continuous predictor, Age.
  • Figure 7.12

    Full-model plot of the effects of all predictors in the main effects model for the Arthritis data, plotted on the probability scale.
  • Figure 7.16

    Conditional plots of the Donner data, showing the relationship of survival to age and sex. Left: The smoothed curves and confidence bands show the result of fitting separate quadratic logistic regressions on age for males and females. Right: Separate loess smooths are fit to the data for males and females.
  • Figure 7.19

    Effect plots for the interactions of color with age (left) and year (right) in the Arrests data.
  • Figure 7.21

    Odds ratios for the terms in the model for the ICU data. Each line shows the odds ratio for a term, together with lines for 90, 95, and 99% confidence intervals in progressively darker shades.
  • Figure 7.23

    Fitted log odds of death in the ICU data for the model icu.glm2. Each line shows the relationship with age for patients having various combinations of risk factors, with 1 standard error confidence bands.
  • Figure 7.25

    Index plots of influence measures for the Donner data model. The four most extreme observations on each measure are labeled.
  • Figure 7.30

    Component-plus-residual plot for the simple additive linear model, donner.mod1. The dashed red line shows the slope of age in the full model; the smoothed green curve shows a loess fit with span = 0.5.
  • Figure 7.32

    Jointly influential points in regression models. In each panel, the thick black line shows the regression of y on x using all the data points. The solid purple line shows the regression deleting both the red and blue points, and the broken and dotted lines each show the regression retaining only the point of the matching color in addition to the constant gray points. (a) Two points whose influences enhance each other; (b) two points where the influence of one is masked by that of the other; (c) two points whose combined influence greatly exceeds the effect of either one individually.
  • Figure 7.33

    Added-variable plots for age (left) and sex (right) in the Donner Party main effects model. Those who survived are shown in blue; those who died in red. Men are plotted with filled circles; women with filled triangles.