Online Learning with Survival Data

Decision-makers frequently use adaptive experiments to optimize time-to-event outcomes, such as accelerating healthcare screenings among patients who are not up to date or delaying customer churn. A common way to run these adaptive experiments is a multi-armed bandit with a dichotomized outcome: the experimenter sets a threshold (e.g., one month) and then uses the algorithm to identify the intervention that performs best on the dichotomized outcome (e.g., the intervention that maximizes the proportion of participants who get up to date on screening within a month of outreach). We introduce "survival bandits," a principled class of algorithms that integrate the Cox proportional hazards model to learn more effectively from time-to-event outcomes. Both theoretically and numerically (with a case study on cervical cancer screening), we show that these new algorithms have the potential to greatly improve adaptive experimentation for decision-makers across industries who seek to accelerate or delay an event of interest.
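
To make the contrast concrete, here is a minimal sketch (not the authors' algorithm) of the two ways an experimenter might evaluate arms in such an experiment: dichotomizing each time-to-event outcome at a threshold versus fitting a Cox proportional hazards model on the full, possibly censored durations. The simulated arm effects, the 30-day threshold, the 90-day follow-up horizon, and the use of the lifelines package are all illustrative assumptions.

```python
# Sketch (illustrative, not the paper's method): compare a dichotomized
# success-rate estimate with a Cox PH estimate on the same simulated data.
# Assumes the `lifelines` package; arm effects and censoring are made up.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)

def simulate_outcome(arm, horizon=90.0):
    """Exponential time-to-event; arm 1 has a higher hazard (faster events).
    Observations are right-censored at the follow-up horizon."""
    mean_time = 60.0 if arm == 0 else 40.0
    t = rng.exponential(mean_time)
    return min(t, horizon), float(t <= horizon)  # (duration, event indicator)

# Collect a batch of outcomes per arm. A bandit would interleave allocation
# with estimation; a single batch keeps the sketch short.
rows = [(arm, *simulate_outcome(arm)) for arm in (0, 1) for _ in range(200)]
df = pd.DataFrame(rows, columns=["arm", "T", "E"])

# Dichotomized view: proportion of units with an event within 30 days.
# Everything past the threshold is discarded.
threshold = 30.0
success = (df["T"] <= threshold) & (df["E"] == 1)
print(success.groupby(df["arm"]).mean())  # per-arm success rates

# Survival view: the Cox model uses every observed or censored duration,
# so no information beyond the threshold is thrown away.
cph = CoxPHFitter()
cph.fit(df, duration_col="T", event_col="E")
print(cph.params_)  # log hazard ratio of arm 1 vs. arm 0
```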

Arielle Anderer (Assistant Prof, Cornell), Hamsa Bastani (Associate Prof, UPenn Wharton), John Silberholz (Assistant Prof, UMD Smith)
