Researchers use an algorithm to diagnose infectious disease a continent away.
Scenario: A man is brought to a clinic in a remote village in Nepal. The patient is confused, has a severe headache, and is breathing laboriously. He’s had seizures, weakness, and vomiting. These symptoms suggest encephalitis, a type of brain inflammation that could be caused by a number of infectious diseases.
The warning signs of a major outbreak are clear. The urgent question becomes: What infection is causing the symptoms? It could be typhoid fever, which is relatively easy to cure, or Nipah virus (NiV), for which there is no cure. But the clinic has no equipment to analyze blood samples or find the cause of the inflammation. A clinic worker records the patient’s information: when the symptoms began, where he’s from, and whether he owns cattle, birds, or pigs. She uploads her findings to the ProMED-mail Web site.
Across the globe, an algorithm goes to work, weighing the information in the report against details in similar reports from other ProMED-mail users in the area and around the world. Because several similar reports have come in from nearby clinics, the algorithm concludes that the encephalitis is probably the result of an emerging NiV cluster. Local health officials can now put the right measures in place and, they hope, prevent an epidemic of a deadly disease.
Timely diagnosis of infectious diseases is crucial for thwarting massive outbreaks. It’s also difficult and costly in some of the places where hotspots are most likely to flare up, such as South Asia.
A team of researchers has developed a diagnostic shortcut for resource-strapped communities. In a paper published in the Journal of the Royal Society Interface, they describe an algorithm that can identify the specific pathogens causing certain illnesses by using information loaded into databases, rather than expensive lab analysis of blood samples.
First, they built a dataset of illnesses and symptoms, such as encephalitis, and cross-referenced it against 10 known pathogens, including NiV and typhoid fever. They also weighed other features, such as the season and the fatality rate among people who suffered from the illness. Next, they combed through the 97 reports of encephalitis in the ProMED-mail database to find instances where brain inflammation had been observed together with an infectious disease such as dengue fever, meningitis, or NiV.
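The article doesn’t spell out the model’s math, and the study’s title points to a network-theory approach. Purely as an illustration of the general idea of weighing a report’s symptoms and context against pathogen profiles, here is a minimal sketch using a different, simpler technique: a naive-Bayes-style score. Every profile, probability, and name below (`PROFILES`, `score_report`, the feature labels) is a hypothetical toy value invented for this example, not data or code from the study.

```python
import math

# HYPOTHETICAL toy profiles: an illustrative prior for each pathogen and the
# probability that a report of that pathogen mentions each feature.
# These numbers are invented for illustration; they are NOT from the study.
PROFILES = {
    "Nipah virus": {
        "prior": 0.2,
        "features": {"fever": 0.9, "headache": 0.7, "neurological": 0.9,
                     "season:spring": 0.95, "high_fatality": 0.9, "pigs": 0.6},
    },
    "typhoid fever": {
        "prior": 0.5,
        "features": {"fever": 0.95, "headache": 0.6, "neurological": 0.2,
                     "season:spring": 0.3, "high_fatality": 0.1, "pigs": 0.1},
    },
    "dengue fever": {
        "prior": 0.3,
        "features": {"fever": 0.95, "headache": 0.8, "neurological": 0.1,
                     "season:spring": 0.2, "high_fatality": 0.05, "pigs": 0.1},
    },
}


def score_report(report_features, profiles=PROFILES):
    """Return a dict mapping pathogen -> posterior probability for one report.

    Assumes features are conditionally independent given the pathogen
    (the naive-Bayes assumption) and scores both present and absent features.
    """
    log_scores = {}
    for pathogen, profile in profiles.items():
        log_p = math.log(profile["prior"])
        for feature, p in profile["features"].items():
            log_p += math.log(p if feature in report_features else 1.0 - p)
        log_scores[pathogen] = log_p
    # Normalize in log space so small probabilities don't underflow.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# A report resembling the Nepal scenario: neurological symptoms, spring, deaths, pigs.
report = {"fever", "headache", "neurological", "season:spring", "high_fatality", "pigs"}
posterior = score_report(report)
for pathogen, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{pathogen}: {prob:.3f}")
```

The output is a probability for each candidate pathogen rather than a single verdict, which mirrors the kind of “probabilities attached to each potential disease” that the researchers describe, though the scoring rule here is only a stand-in.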
To test the model, they removed the official diagnosis from each report, leaving the algorithm with just a few symptom keywords, such as “fever,” “neurological,” and “headache,” plus other clues. The model retroactively identified NiV, which occurs only in the spring and kills two-thirds of its victims, with 80% accuracy. Other pathogens, such as chikungunya fever, were identified with 75% accuracy.
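The test described above (hide each report’s official diagnosis, then check whether the model recovers it) can be sketched as a hold-out evaluation. This is a minimal sketch under stated assumptions: it uses leave-one-out cross-validation and a smoothed naive-Bayes classifier, neither of which the article attributes to the study, and the eight reports are invented toy records, not ProMED-mail data. The helper names `fit_profiles`, `predict`, and `leave_one_out_accuracy` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# INVENTED toy records: (features in the report, official diagnosis).
# These are NOT real ProMED-mail reports.
REPORTS = [
    ({"fever", "neurological", "spring", "high_fatality"}, "Nipah virus"),
    ({"fever", "neurological", "spring", "pigs"}, "Nipah virus"),
    ({"fever", "neurological", "high_fatality", "pigs"}, "Nipah virus"),
    ({"fever", "headache", "joint_pain"}, "chikungunya fever"),
    ({"fever", "joint_pain", "rash"}, "chikungunya fever"),
    ({"fever", "headache", "joint_pain", "rash"}, "chikungunya fever"),
    ({"fever", "headache", "rash"}, "dengue fever"),
    ({"fever", "headache", "bleeding"}, "dengue fever"),
]


def fit_profiles(reports, alpha=1.0):
    """Estimate each label's prior and per-feature rates (Laplace smoothing alpha)."""
    label_counts = Counter(label for _, label in reports)
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in reports:
        feature_counts[label].update(features)
        vocab |= features
    n = len(reports)
    profiles = {}
    for label, count in label_counts.items():
        rates = {f: (feature_counts[label][f] + alpha) / (count + 2 * alpha)
                 for f in vocab}
        profiles[label] = (count / n, rates)
    return profiles


def predict(features, profiles):
    """Return the highest-scoring label; features unseen in training are ignored."""
    def log_score(label):
        prior, rates = profiles[label]
        s = math.log(prior)
        for f, p in rates.items():
            s += math.log(p if f in features else 1.0 - p)
        return s
    return max(profiles, key=log_score)


def leave_one_out_accuracy(reports):
    """Hide each report's diagnosis in turn, refit on the rest, and tally hits per label."""
    hits, totals = Counter(), Counter()
    for i, (features, truth) in enumerate(reports):
        training = reports[:i] + reports[i + 1:]
        guess = predict(features, fit_profiles(training))
        totals[truth] += 1
        hits[truth] += int(guess == truth)
    return {label: hits[label] / totals[label] for label in totals}


accuracy = leave_one_out_accuracy(REPORTS)
print(accuracy)
```

Reporting accuracy per pathogen, rather than one overall figure, matches how the results above are stated (one rate for NiV, another for chikungunya fever). Any numbers this toy run prints say nothing about the study’s actual performance.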
Clustering and Networking Disease
Databases such as ProMED-mail and GIDEON have made some disease clusters much easier to detect remotely, in part because they spread information not only between local health-care workers and global health organizations but also among clinic workers in the same area, who may be on the front lines of a potential outbreak without knowing it.
“In a way, the method is not all too different from syndromic [symptom-based] methods of disease identification practiced by doctors around the world, but it formalizes this process and yields the potential of linking outbreaks of uncommon or new diseases that are not on the radar of local clinicians,” says Princeton University zoologist Tiffany L. Bogich, the study’s corresponding author. “It also provides an objective output with probabilities attached to each potential disease that could be causing an outbreak, so it takes much of the potential subjectivity out of the diagnostic process.”
Have we reached an age where data and statistics outperform doctors and formal lab tests? Not yet. The same phenomenon that makes certain pathogens more conspicuous can make other illnesses more difficult to diagnose: Diseases, illnesses, and symptoms all tend to show up at the same time. Out of the original set of 97 cases of encephalitis, 54 had multiple diagnoses, such as dengue fever and meningitis, occurring simultaneously.
Also, when a health-care worker recorded information incorrectly, the error threw the model off.
“Real world information is often vague, minimal, and at times contradictory, so the challenge is to find ways to make good inferences (disease identifications) from such limited data,” says epidemiologist Stephen Morse of Columbia University, one of the paper’s co-authors and creator of the ProMED-mail site.
But the potential to detect outbreaks much faster through the use of statistical models applied to field reports is clear. These tools will find their greatest value in places where deadly pathogens are numerous, diagnostic equipment is hard to find, and time is short.
“What one could do with our method in real time is to give a quick and indicative evaluation,” says Bogich. “When lab diagnostics are not possible—either because it’s early on in an outbreak or capacity in country does not exist—our method offers a ‘quick and dirty’ alternative.”—Patrick Tucker
About the Author
Patrick Tucker is the deputy editor of THE FUTURIST magazine, director of communications for the World Future Society, and author of the forthcoming book The Naked Future: What Happens in a World That Anticipates Your Every Move (Current, Penguin 2014). He will be speaking on this and other topics related to big data and the future at WorldFuture 2013.
Sources: “Using network theory to identify the causes of disease outbreaks of unknown origin” by Tiffany L. Bogich, Sebastian Funk, et al., Journal of the Royal Society Interface, vol. 10, article 20120904 (2013; published online 6 February 2013).
Columbia University, Mailman School of Public Health, www.mailman.columbia.edu.
Originally published in THE FUTURIST, May-June 2013