Wednesday, March 29, 2006

Biosurveillance meetings

Because of the very diverse nature of the research, implementation, and usage of syndromic surveillance systems, the venues for discussing advances are dispersed. I've attended and given talks at a range of conferences, workshops, and workgroups, from purely academic to non-academic. Even the academic conferences vary in type and span multiple disciplines. Let me describe a few past and upcoming events, and hopefully others will add to the list.

The largest venue is, of course, the annual Syndromic Surveillance Conference, which will be hosted this year in Baltimore, MD. It is attended by many of the researchers who develop and design the systems, by organizations that deploy them, and by users.

The DIMACS group on Biosurveillance, Data Monitoring and Information Exchange, spearheaded by Henry Rolka and Colleen Martin from CDC and David Madigan from Rutgers, has organized a few workgroup meetings in recent years. The latest brought together many health monitors who use syndromic surveillance systems.

The upcoming 2006 INFORMS conference will feature an invited session on "Quality and Statistical Decision-Making in Healthcare Applications", which might include a biosurveillance aspect.

Statistical flavor:
SAMSI Anomaly Detection Workgroup: a group of statisticians who have been meeting weekly (some of them remotely) since September 2005 to discuss statistical methods for biosurveillance. There was a kickoff meeting and a mid-year workshop.

The 2005 Joint Statistical Meeting (JSM) in Minneapolis featured several sessions on biosurveillance, including an invited panel on "national systems for biosurveillance" and a session on "Innovations in Prospective Anomaly Detection for Biosurveillance".

The upcoming 2006 ENAR spring meeting will have a few biosurveillance-related talks.

The upcoming 2006 International Workshop on Applied Probability will have several sessions on biosurveillance (update from Daniel Neill).

Venues with more of a data-mining/machine-learning flavor are the KDD 2005 workshop "Data Mining Methods for Anomaly Detection" and its successor, "ML Algorithms for Surveillance and Event Detection", to be held in conjunction with ICML-2006 in Pittsburgh.

A venue with a more medical-informatics orientation is the annual American Medical Informatics Association (AMIA) symposium, which in 2006 will be held in Washington, DC (update from Daniel Neill).

Wednesday, March 01, 2006

Syndromic surveillance systems in practice

Last week's DIMACS working group on biosurveillance brought together users of the BioSense, ESSENCE, and RODS surveillance systems alongside a few researchers. This was an eye-opener for me as a researcher. I learned a very important point: most health monitors, who are the users of such systems, have learned to ignore the alarms triggered by their systems. This is due to the excessive false alarm rate that is typical of most systems: there is nearly an alarm every day!

The increased alarm rate can be the result of several factors. First, both BioSense and ESSENCE use classic control charts such as the CuSum and EWMA (although with some adaptations). A very common pattern in daily syndromic data is the day-of-week effect. This violates one of the main assumptions underlying these control charts, namely that the "target value" is constant: we do not expect the same number of emergency department (ED) visits on weekends as on weekdays. In the presence of such an effect, a CuSum or EWMA chart is likely to produce false alarms on "high" days (e.g., weekday visits to the ED) and to miss real outbreaks on "low" days (e.g., weekend visits to the ED).
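To make the day-of-week problem concrete, here is a minimal sketch (Python, standard library only) of a one-sided tabular CuSum run on noise-free daily counts that contain nothing but a weekly pattern. The counts (120 on weekdays, 60 on weekends), the allowance k = 5, the threshold h = 40, and the restart-after-signal rule are illustrative assumptions, not settings taken from BioSense or ESSENCE.

```python
# Sketch: upper CuSum on raw daily counts with a day-of-week pattern and NO outbreak.
# Assumed (hypothetical) values: weekday count 120, weekend count 60,
# constant target = overall mean, allowance k = 5, threshold h = 40.

def upper_cusum(xs, target, k, h):
    """Return the day indices on which the one-sided upper CuSum signals."""
    s = 0.0
    alarms = []
    for t, x in enumerate(xs):
        # Accumulate only deviations above the target that exceed the allowance k.
        s = max(0.0, s + (x - target) - k)
        if s > h:
            alarms.append(t)
            s = 0.0  # restart the statistic after a signal
    return alarms

# Four weeks of counts starting on a Monday: Mon-Fri high, Sat-Sun low.
weekly = [120, 120, 120, 120, 120, 60, 60]
counts = weekly * 4
target = sum(counts) / len(counts)  # constant target, about 102.9

alarms = upper_cusum(counts, target, k=5, h=40)
print(alarms)
```

With a constant target equal to the overall mean, this chart signals on every Thursday (indices 3, 10, 17 and 24) even though no outbreak was added: the weekday excess accumulates until it crosses h, and the weekend dip resets the statistic to zero.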

The day-of-week effect is only one pattern that, if ignored, can lead to excessive false alarms. Another is the autocorrelation present in daily syndromic data, even after the day-of-week effect is removed. Control charts such as the CuSum and EWMA assume independent observations, so runs of positively correlated days push the chart statistic away from its target more often than the control limits allow for. Howard Burkom gives the example of the common cold: if colds are not what you want to detect, their effect on daily counts of respiratory syndrome data can still trigger alarms.
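A small simulation shows the mechanism. EWMA limits are usually set using the asymptotic standard deviation σ√(λ/(2−λ)), which assumes independent days. The sketch below applies such a chart to an independent series and to an AR(1) series with the same unit variance but lag-1 autocorrelation φ = 0.7. The values λ = 0.2, a 3-sigma limit, φ = 0.7, the seed, and the series length are illustrative assumptions, and the AR(1) series is a stand-in for autocorrelated syndromic data, not a model fitted to real counts.

```python
import math
import random

def ewma_alarm_rate(xs, lam=0.2, n_sigma=3.0, sigma=1.0):
    """Fraction of days on which an EWMA chart with target 0 is outside its limits.

    The limits use the asymptotic EWMA standard deviation
    sigma * sqrt(lam / (2 - lam)), which assumes independent observations.
    """
    limit = n_sigma * sigma * math.sqrt(lam / (2.0 - lam))
    z = 0.0
    alarms = 0
    for x in xs:
        z = lam * x + (1.0 - lam) * z
        if abs(z) > limit:
            alarms += 1
    return alarms / len(xs)

def ar1_series(n, phi, rng):
    """AR(1) series with unit marginal variance; phi = 0 gives iid N(0, 1)."""
    innov_sd = math.sqrt(1.0 - phi ** 2)
    x = rng.gauss(0.0, 1.0)  # start from the stationary distribution
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + rng.gauss(0.0, innov_sd)
    return out

rng = random.Random(2006)
n = 20000
iid_rate = ewma_alarm_rate(ar1_series(n, 0.0, rng))
ar_rate = ewma_alarm_rate(ar1_series(n, 0.7, rng))
print(f"independent: {iid_rate:.4f}  AR(1), phi=0.7: {ar_rate:.4f}")
```

Both series have mean 0 and variance 1, so the difference in how often they cross the limits comes from the autocorrelation alone; in this setup the autocorrelated series is outside the limits far more often.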

The bottom line is that the excessive false alarms are not surprising when these control charts are applied directly to raw daily syndromic data. This is closely related to my earlier posting on monitoring complex data. The good news is that this can be improved: there are ways to precondition the data that directly address the structure of these new daily syndromic streams. Some examples are regression models (e.g., Brillman et al., 2005), ARIMA models (e.g., Reis and Mandl, 2003), exponential smoothing (e.g., Burkom, Murphy, and Shmueli, work in progress), and wavelet methods (e.g., Goldenberg et al., 2002; Shmueli, 2005). The difficulty is to have one automated method (or suite of methods) that can be used with diverse syndromic data streams, so this diversity needs to be characterized.
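As an illustration of the simplest kind of preconditioning, the sketch below regresses counts on day-of-week indicators (with only these dummies, least squares reduces to the per-weekday mean over a baseline period) and runs the CuSum on the residuals instead of the raw counts. This is a deliberately stripped-down stand-in, not the model of Brillman et al. (2005) or any of the other cited methods; the data, the weekend outbreak of 30 extra visits, k = 5 and h = 40 are hypothetical.

```python
# Sketch: precondition with a day-of-week regression, then monitor the residuals.
# All numbers are hypothetical: weekday 120, weekend 60, outbreak +30 on a weekend.

def upper_cusum(xs, target, k, h):
    """Return the indices at which the one-sided upper CuSum signals."""
    s = 0.0
    alarms = []
    for t, x in enumerate(xs):
        s = max(0.0, s + (x - target) - k)
        if s > h:
            alarms.append(t)
            s = 0.0  # restart after a signal
    return alarms

def day_of_week_means(counts):
    """Least-squares fit on day-of-week indicators = mean count per weekday (0 = Monday)."""
    sums = [0.0] * 7
    ns = [0] * 7
    for t, x in enumerate(counts):
        sums[t % 7] += x
        ns[t % 7] += 1
    return [s / n for s, n in zip(sums, ns)]

weekly = [120, 120, 120, 120, 120, 60, 60]
baseline = weekly * 4   # history used to fit the model (starts on a Monday)
monitor = weekly * 2    # new data to monitor (also starts on a Monday)
monitor[12] += 30       # outbreak on the second Saturday ...
monitor[13] += 30       # ... and the second Sunday

# Raw approach: constant target equal to the baseline mean.
raw_target = sum(baseline) / len(baseline)
raw_alarms = upper_cusum(monitor, raw_target, k=5, h=40)

# Preconditioned approach: CuSum on residuals, whose target is 0.
means = day_of_week_means(baseline)
residuals = [x - means[t % 7] for t, x in enumerate(monitor)]
resid_alarms = upper_cusum(residuals, 0.0, k=5, h=40)

print("raw:", raw_alarms, "residuals:", resid_alarms)
```

On this toy data the raw CuSum signals only on the two Thursdays (indices 3 and 10) and misses the weekend outbreak, while the CuSum on residuals signals once, on the second day of the outbreak (index 13). Real streams would also need handling of noise, trend, holidays, and autocorrelation, which is where the richer approaches above come in.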

Another reason for the high rate of alarms has to do with multiple testing: a health monitor examines data from multiple locations and can also slice the data by categories (e.g., by age group). This relates to my previous posting on multiple testing. How should this multiplicity be addressed? Should there be a ranking system that displays the "most abnormal" series?
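One possible answer, sketched below, is to convert each series' standardized residual for the day into a one-sided p-value, rank the series from most to least abnormal, and alert only on those that pass the Benjamini-Hochberg (BH) false discovery rate procedure. The series names and z-scores are hypothetical, the normal p-values assume approximately standard normal residuals, and the BH guarantee assumes independent or positively dependent tests; this is an illustration, not a recommendation from the working group.

```python
import math

def upper_tail_p(z):
    """One-sided p-value P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def benjamini_hochberg(pvals, q=0.05):
    """Return the set of indices flagged by the BH step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            cutoff = rank  # keep the largest rank satisfying the BH condition
    return set(order[:cutoff])

# Hypothetical standardized residuals for one day, one per (region, age group) series.
z_scores = {
    "region A, 0-4": 3.9,
    "region A, 65+": 1.8,
    "region B, 0-4": 2.1,
    "region B, 65+": 3.4,
    "region C, 0-4": 0.3,
    "region C, 65+": -0.8,
}

names = list(z_scores)
pvals = [upper_tail_p(z_scores[n]) for n in names]
flagged = benjamini_hochberg(pvals, q=0.05)

# Display all series ranked from most to least abnormal; mark the BH survivors.
ranking = sorted(range(len(names)), key=lambda i: pvals[i])
for i in ranking:
    mark = "ALERT" if i in flagged else ""
    print(f"{names[i]:15s} p={pvals[i]:.5f} {mark}")
```

Here four of the six series have p < 0.05 individually, but only the top three in the ranking survive the BH step at q = 0.05, so the display still shows every series in order while limiting the alerts.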

Finally, there is a host of data quality issues that affect the alarms. These are typically harder to tackle through models.

As a statistician, I feel that we must regain the trust of health monitors in the alerting abilities of syndromic surveillance systems. Designing automated algorithms for data preconditioning should be a priority. We should also be able to show the benefit of employing such preconditioning before using popular control charts like the CuSum, and its effectiveness in reducing false alarms and capturing real abnormalities.


Brillman JC, Burr T, Forslund D, Joyce E, Picard R, Umland E (2005). Modeling emergency department visit patterns for infectious disease complaints: results and application to disease surveillance. BMC Medical Informatics and Decision Making, 5:4, 1-14.

Goldenberg A, Shmueli G, Caruana RA, Fienberg SE (2002). Early statistical detection of anthrax outbreaks by tracking over-the-counter medication sales. PNAS, 99(8), 5237-5240.

Reis BY, Mandl KD (2003). Time series modeling for syndromic surveillance. BMC Medical Informatics and Decision Making, 3(1).

Shmueli G (2005). Wavelet-based monitoring in modern biosurveillance. Working Paper, Smith School of Business, University of Maryland.