Frequently asked questions (FAQ)

What exactly is a re-emerging disease?

The term "re-emerging infectious disease" is not defined consistently in the literature. However, the various verbal definitions share a common pattern: a disease with a known historical occurrence, followed by a decrease in occurrence and then a more recent increase. In practice, events are labeled as "re-emergence" on the basis of subjective, non-reproducible assessments, usually by public health experts who report or publish details of infectious disease events in the peer-reviewed literature. Here, we seek to provide insight into potential re-emerging disease scenarios through a set of informative visuals and analytics that let users base their assessments on a more standardized set of criteria.

What is the scope of RED Alert?

RED Alert is designed as a decision-support tool to assist analysts and public health professionals by providing situational awareness of a local/regional disease event and, thereby, potential early warning indications of a global event. This tool is not intended for investigating unfolding outbreaks. Rather, it is intended to inform users of historical disease trends in a way that not only helps them better understand the geospatial and temporal context of a situation of interest, but also enables hypothesis generation about the disease's impact on a global scale.

A data source link is broken. How can I access the original data source?

Unfortunately, links may die, making it difficult to access the original data source using our link. The website may be down for maintenance, or the link may have changed or been deleted. The Internet Archive's Wayback Machine attempts to archive snapshots of every website on the internet and can be useful for accessing content from broken links. To use it, visit the Wayback Machine and paste in the URL of the broken link. Additionally, we invite you to use our feedback form to tell us about any broken links you encounter so that we can fix them in our database.
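The same lookup can be done by building a Wayback Machine address directly. The snippet below is a minimal sketch: it assumes the common `https://web.archive.org/web/<original URL>` address form, which redirects to the most recent snapshot if one exists. The function name and the example link are hypothetical.

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(broken_url: str) -> str:
    """Return a Wayback Machine address for the given (possibly dead) link."""
    # The original URL is appended as-is after the archive prefix.
    return WAYBACK_PREFIX + broken_url.strip()


# Hypothetical dead link, used only for illustration.
print(wayback_url("http://example.com/data.csv"))
```

Whether a snapshot exists depends on whether the page was ever archived, so an empty result does not mean the link was wrong.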

Can I access the data used here?

We have developed an extensive API containing the majority of the data used by RED Alert. The API can be accessed here.

Graphics are displaying no data. What does this mean?

A message indicating that there are no data to display means that data are missing from the data source you selected. For example, if you have selected “Yellow Fever,” “Russia,” and “2016” as the disease, location, and year of interest, respectively, and the data source (World Health Organization, for example) reports no data for this combination, no information will populate the map or boxplot in the “Historic/Global Analysis” tab. This is not a bug in the tool, but rather a reflection of a lack of data reported by the selected data source.

Why are there multiple population sources with different values?

Population counts for the selected location of interest differ between the two data sources you are able to select from. Users can use their discretion to select between the two population sources from which disease incidence will be calculated.

World Bank population data are provided at the country level from 1960-2016. These data are available for free at https://data.worldbank.org/indicator/SP.POP.TOTL.

LandScan combines spatial data and imagery analysis technologies with a multivariate dasymetric modeling approach to disaggregate census counts within administrative boundaries across the world. LandScan data are provided free to government organizations by Oak Ridge National Laboratory at http://web.ornl.gov/sci/landscan/. We have all historical LandScan data from 1998-2016, excluding 1999 (a year for which a dataset was not generated). LandScan provides a 1 km x 1 km grid of the entire world, where the value of each 1 km² cell is the population in that cell. Leveraging the country and state boundaries already ingested in our database courtesy of Natural Earth, we overlaid each country/state boundary on this grid, extracted the grid cells within each boundary, and summed their values to create total population counts for a given location.

In order to do this, each LandScan dataset was converted to the standard Esri ASCII grid format. Data processing was done using Python. We use rasterio to read the grid files and their metadata into memory. We use rasterstats to compute zonal statistics for each country/state boundary in our database. Specifically, we ask it to sum the values, which gives us our total population.
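The zonal sum described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline itself: the toy grid and mask are made-up values, the function names and file paths are hypothetical, and the exact options passed to rasterio and rasterstats in practice are not documented here.

```python
import numpy as np


def zonal_sum(grid, inside):
    """Sum the population of the grid cells flagged as inside a boundary."""
    return float(grid[inside].sum())


# Toy 3x3 population grid (persons per cell) and a boundary mask.
grid = np.array([[10, 20, 0],
                 [5, 15, 30],
                 [0, 25, 40]])
inside = np.array([[True, True, False],
                   [True, True, False],
                   [False, False, False]])
toy_total = zonal_sum(grid, inside)  # 10 + 20 + 5 + 15


def total_population(boundaries_path, grid_path):
    """Per-boundary population totals; both paths are hypothetical inputs."""
    # Imported here so the toy example above runs without the geospatial stack.
    import rasterio
    from rasterstats import zonal_stats

    with rasterio.open(grid_path) as src:
        values = src.read(1)       # first band of the ASCII grid
        transform = src.transform  # maps cell indices to map coordinates
        nodata = src.nodata
    stats = zonal_stats(boundaries_path, values, affine=transform,
                        nodata=nodata, stats=["sum"])
    return [zone["sum"] for zone in stats]
```

The toy function shows what a zonal sum computes for one boundary; `total_population` shows how the same statistic could be requested for every boundary in a vector file.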

LandScan's high-resolution gridded population counts allow us to provide users with both country- and state-level population data, whereas World Bank data are available only at the country level.
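Because incidence is calculated from the selected population source, the two sources can yield slightly different incidence values for the same case count. A minimal sketch, assuming incidence is expressed as cases per 100,000 persons; the case count and both population figures below are made-up numbers for illustration only.

```python
def incidence_per_100k(cases, population):
    """Cases per 100,000 persons for one location and year."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


cases = 1_500                    # hypothetical reported case count
world_bank_pop = 2_000_000       # hypothetical World Bank population
landscan_pop = 2_100_000         # hypothetical LandScan population

wb_incidence = incidence_per_100k(cases, world_bank_pop)
ls_incidence = incidence_per_100k(cases, landscan_pop)
print(f"World Bank: {wb_incidence:.1f}, LandScan: {ls_incidence:.1f}")
```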

Why are there multiple case count sources?

To give users access to all available comprehensive case count data sources, we have incorporated several case count data source options for use in the incidence calculations we provide. Case counts for dengue from the Pan American Health Organization (PAHO), for example, differ slightly from those of the World Health Organization (WHO). Furthermore, PAHO is a regional entity and only has data available for the Western Hemisphere. So while PAHO may be an ideal data source when investigating dengue in Brazil, for example, it would not be suitable for investigating dengue in Southeast Asia. As such, the WHO can be considered a more appropriate default data source, as it has the most complete global case counts.

How is the spline calculated?

The spline is calculated using SciPy's univariate spline method. We use the default parameters except for s, the smoothing factor. We set s to a fraction of the length of the data times the variance of the data, where y is the vector of values.
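A minimal sketch of such a fit, assuming the method referred to is `scipy.interpolate.UnivariateSpline`. The value of the fraction is not stated in this FAQ, so `fraction` is left as a parameter and the 0.1 used below is purely illustrative. The use of the population variance (`np.var`) and the sample incidence values are likewise assumptions.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def fit_smoothing_spline(x, values, fraction):
    """Fit a spline with s = fraction * len(y) * var(y)."""
    y = np.asarray(values, dtype=float)
    s = fraction * len(y) * np.var(y)  # assumption: population variance
    return UnivariateSpline(x, y, s=s), s


# Made-up yearly incidence values, for illustration only.
years = np.arange(2000, 2010)
incidence = np.array([1.2, 1.0, 1.5, 3.1, 2.8, 2.0, 1.1, 0.9, 1.4, 1.3])

spline, s = fit_smoothing_spline(years, incidence, fraction=0.1)
smoothed = spline(years)  # smoothed incidence at each year
```

Larger values of the fraction allow the curve to deviate further from the data and therefore produce a smoother spline; a fraction of 0 forces the spline through every point.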

How is re-emergence calculated?

One of the goals of RED Alert is to detect potential re-emergence and this is done using a machine learning classifier. Classifiers are algorithms that learn a decision function that maps a new observation to a class (from a set of classes, e.g., spam vs. non-spam email) based on the given labeled data (known observation-class pairs, e.g., examples of spam and non-spam emails).

To create the labeled dataset for each disease, the subject matter experts (SMEs) on our team were given data for 100 countries selected at random and labeled each location-year pair as a re-emergence or not. For measles and cholera, the disease trends before and after the year 2000 seem quite different; hence, labeling and classification are performed only on data starting from 2000. On the other hand, a large amount of dengue data is missing after 2000, so labeling and classification for dengue use a longer time frame (starting in 1980). For each disease, SMEs developed a schema that takes into account factors (e.g., raw incidence, case counts, change in incidence) that help detect potential re-emergence and guide the labeling process.

These labeled datasets were used to train classifiers for various diseases. We tried two classifiers: decision tree and random forest. For all diseases, random forest performed better than decision tree and hence RED Alert uses random forest to detect if there is a re-emergence for the given disease in a given location and year.
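A comparison of this kind can be sketched as below. This is an illustration only: scikit-learn is an assumption (the library used is not named here), the data are synthetic stand-ins for the SME-labeled location-year pairs, and the hyperparameters and accuracy metric are defaults chosen for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labeled location-year pairs (1 = re-emergence).
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate classifier.
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in mean_accuracy.items():
    print(f"{name}: {score:.3f}")
```

On synthetic data the ranking of the two models carries no meaning; the sketch only shows the mechanics of comparing them on the same labeled set.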

What is the performance of the re-emergence classifiers?

We performed nested cross-validation (where inner cross-validation is used to select optimal parameters and outer cross-validation is used to test for overfitting) 10 times. The results (i.e., mean and standard deviation across 10 nested cross-validations) are as follows:

[Figure: re-emergence classifier results]

Information about various performance measures can be found here.
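Repeated nested cross-validation of the kind described in this answer can be sketched as follows. The sketch assumes scikit-learn, synthetic data, a small illustrative parameter grid, and accuracy as the metric; none of these are confirmed by this FAQ. It also runs 3 repeats rather than 10 to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the labeled dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}  # illustrative

repeat_means = []
for seed in range(3):  # 3 repeats here; the answer above describes 10
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    # Inner loop: pick parameters. Outer loop: score the tuned model on held-out folds.
    search = GridSearchCV(RandomForestClassifier(random_state=seed),
                          param_grid, cv=inner)
    scores = cross_val_score(search, X, y, cv=outer)
    repeat_means.append(scores.mean())

print(f"accuracy: {np.mean(repeat_means):.3f} +/- {np.std(repeat_means):.3f}")
```

Because parameter selection happens only inside each outer training fold, the outer scores are not biased by the tuning step.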

What information is in the re-emergence factors?

Multiple component causes are necessary to produce a disease outbreak or a re-emergence event, and these events can arise through a variety of pathways. This visualization supplies a list of components meant to help users develop hypotheses about sufficient causes in a re-emergence scenario. Not all factors will be causal in the user's situation; however, each node has been identified through a literature review as contributing to a disease-specific historical outbreak or re-emergence event. The broad categories of host, pathogen, and environment in the center of the wheel follow the epidemiological triad. With increasing distance from the center, the contributing causes become more specific. The outermost layers, or primary indicators, are designed to provide the user with the most actionable factors that can potentially prevent re-emergence on a specific pathway.

What information does the “Indicator Analysis” graphic convey?

This chart shows the association between the variable selected in the dropdown with respect to time and incidence. Here, the location of the point on the y axis shows the variable value, the location of the point on the x axis shows the year, and the size of the bubble corresponds to the incidence (per 100,000 persons). Please refer to the hover text for point-by-point information.

What information does the “Global Comparison of Indicators” graphic convey?

This chart shows the association between the variable selected in the dropdown with respect to time and incidence. Here, the location of the point on the y axis shows the variable value, the location of the point on the x axis shows the year, and the size of the bubble corresponds to the incidence (per 100,000 persons). There is a series for every country with similar incidence (from the 'Historic/Global Analysis' tab). That is, the countries shown here had a disease incidence between 50% and 150% of the incidence for the user's selected location in the year of interest.
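The 50% to 150% rule can be sketched as a simple filter. The function name, the country labels, and the incidence values are hypothetical, and treating both bounds as inclusive is an assumption.

```python
def similar_incidence_countries(user_incidence, incidence_by_country,
                                low=0.5, high=1.5):
    """Countries whose incidence lies within [low, high] times the user's incidence."""
    lower, upper = low * user_incidence, high * user_incidence
    return sorted(
        country
        for country, incidence in incidence_by_country.items()
        if lower <= incidence <= upper  # assumption: bounds are inclusive
    )


# Made-up incidence values (per 100,000 persons) for the year of interest.
others = {"A": 4.9, "B": 5.0, "C": 12.0, "D": 15.0, "E": 15.1}
similar = similar_incidence_countries(10.0, others)
print(similar)
```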

Does the exact location of the re-emergence dot on the map indicate a specific area within a country where the event occurred?

No, these points merely indicate the centroid of the country where the re-emergence event occurred, as re-emergence events are determined at a national level.