Flutrack is a system that detects influenza symptoms by processing and displaying influenza-related Twitter messages. The Flutrack platform gathers and visualizes tweets every 20 minutes in real time. The open platform and its API allow users and developers to extend the project and take influenza detection further. The platform could work not only with Twitter but with any data provider.
For monitoring purposes, Flutrack gathers flu-related tweets from around the world using the Twitter API. Once this collection step is complete, the system visualizes the tweets and refreshes them every 20 minutes. The Flutrack open platform differs from similar tracking services in that it extracts and processes flu-related data in real time, without interruption.
Feasibility of monitoring influenza
Influenza, or flu, is a viral infection that affects mainly the throat, nose, bronchi and occasionally lungs. It is considered one of the most common human infectious diseases. Seasonal influenza epidemics are a major public health concern, causing tens of millions of respiratory illnesses and 250,000 to 500,000 deaths worldwide each year. Early detection of disease activity, when followed by a rapid response, can reduce the impact of both seasonal and pandemic influenza.
Influenza differs from the common cold as it is caused by a different group of viruses, and its symptoms tend to be more severe and to last longer. Infection usually lasts for about a week, and is characterized by sudden onset of high fever, aching muscles, headache and severe malaise, non-productive cough, sore throat and rhinitis. Symptoms usually peak after two or three days.
Self-diagnosis of influenza is common among the general public. The best predictors of influenza are cough and fever: this combination of symptoms has been shown to have a positive predictive value of around 80% in differentiating influenza within a population suffering from flu-like symptoms.
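The positive predictive value mentioned above is the share of people flagged by a rule (here, cough plus fever) who really have influenza. The following is a minimal sketch of that arithmetic; the function name and the counts in the usage note are illustrative assumptions, not figures from the cited studies.

```python
def positive_predictive_value(true_positives, false_positives):
    """Fraction of positive calls that are correct: TP / (TP + FP)."""
    total_positive_calls = true_positives + false_positives
    if total_positive_calls == 0:
        raise ValueError("no positive calls, so the value is undefined")
    return true_positives / total_positive_calls
```

For instance, if 100 hypothetical patients with cough and fever included 80 confirmed influenza cases, `positive_predictive_value(80, 20)` would give 0.8.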
Twitter and influenza relation
Twitter, a micro-blogging service, has an estimated community of 500 million active users, generating 340 million messages daily. Twitter users can send and read one another’s 140-character messages, called "tweets". Despite the high level of noise, the Twitter stream contains information that can be useful for tracking or even forecasting trends, moods or behaviour, provided it can be extracted efficiently. Furthermore, Twitter has been used as a real-time source for various public health applications.
The words used as tags are influenza synonyms and common flu symptoms.
List of the monitoring tags:
- sore throat
- runny nose
- cough, dry cough
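Matching tweets against such tags could look roughly like the sketch below. It is not Flutrack's actual matcher: the tag list contains only the entries shown above, and whole-word, case-insensitive matching is an assumed design choice.

```python
import re

# Only the tags listed above; the list Flutrack really uses may be longer.
MONITORING_TAGS = ["sore throat", "runny nose", "cough", "dry cough"]


def matched_tags(text, tags=MONITORING_TAGS):
    """Return the tags found in text as whole words or phrases, ignoring case."""
    lowered = text.lower()
    found = []
    for tag in tags:
        # \b keeps "cough" from matching inside longer words such as "coughing".
        pattern = r"\b" + re.escape(tag) + r"\b"
        if re.search(pattern, lowered):
            found.append(tag)
    return found
```

Whole-word matching trades recall for precision: "coughing" is not counted as "cough" here, which a real system might handle with stemming instead.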
Before saving flu-related tweets to Flutrack's open database, the system filters them automatically by removing tags and hashtags (including the @ and # symbols). Moreover, only tweets that are geolocated, or whose location can be extracted from the user's profile location (the self-declared home location), are saved to the database. Tweets without sufficient location information are automatically discarded. Tweets with fewer than 5 characters and tweets containing non-ASCII characters are also excluded. Profile locations are additionally screened for "suspicious" words (home, heaven, etc.) to avoid false or nonexistent location coordinates.
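The filtering rules above can be sketched as a small pipeline. This is an illustration under stated assumptions, not Flutrack's code: all function names are hypothetical, mentions and hashtags are removed as whole tokens, the 5-character limit is applied after that cleaning, the suspicious-word list holds only the two examples named above, and a profile location containing such a word is treated as unusable. Geocoding a profile location into coordinates is left out.

```python
import re

# Assumed list: the text names only "home" and "heaven" as examples.
SUSPICIOUS_LOCATION_WORDS = {"home", "heaven"}


def clean_text(text):
    """Drop @mentions and #hashtags (symbol and word), then collapse whitespace."""
    without_tags = re.sub(r"[@#]\w+", "", text)
    return " ".join(without_tags.split())


def usable_location(coordinates, profile_location):
    """Prefer tweet coordinates; fall back to a non-suspicious profile location."""
    if coordinates is not None:
        return coordinates
    if not profile_location:
        return None
    words = set(re.findall(r"[a-z]+", profile_location.lower()))
    if words & SUSPICIOUS_LOCATION_WORDS:
        return None
    return profile_location.strip()


def filter_tweet(text, coordinates=None, profile_location=None):
    """Return (cleaned_text, location) if the tweet passes every rule, else None."""
    if not text.isascii():  # exclude tweets with non-ASCII characters
        return None
    cleaned = clean_text(text)
    if len(cleaned) < 5:  # exclude very short tweets (checked after cleaning)
        return None
    location = usable_location(coordinates, profile_location)
    if location is None:  # not enough location information
        return None
    return cleaned, location
```

A call such as `filter_tweet("got a bad cough #sick", profile_location="Athens, Greece")` keeps the tweet with its hashtag removed, while the same text with the profile location "my home" is rejected.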
Results and statistics
To evaluate the accuracy of the extracted data and results, the correlation between Flutrack’s and Google Flu Trends’ datasets was examined. All data, from both providers, were geolocated in the U.S. and cover the period from December 2, 2012 to April 7, 2013. The U.S. was chosen because the query counts of Google Flu Trends are compared with reliable sources for that country, such as the Centers for Disease Control and Prevention and other traditional flu surveillance systems. The normalized counts from Google Flu Trends and from the Flutrack platform were plotted against each other in a scatter plot. To quantify the degree of the observed linear relation between the two normalized series, a linear regression analysis was performed. The corresponding coefficient of determination was estimated to be 0.79, showing a high degree of correlation between Google Flu Trends and the Flutrack platform.
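The comparison described above, normalizing two series and measuring how well a straight line relates them, can be sketched as follows. The text does not say which normalization was used, so min-max scaling is an assumption here, and both function names are hypothetical.

```python
def normalize(values):
    """Scale a series to the range [0, 1] (min-max scaling, an assumed choice)."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot normalize a constant series")
    return [(v - low) / (high - low) for v in values]


def linear_fit_r2(x, y):
    """Least-squares fit y = slope * x + intercept; return (slope, intercept, r2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_res / syy
    return slope, intercept, r2
```

Given two weekly count series of equal length, one would call `linear_fit_r2(normalize(series_a), normalize(series_b))` and read the third returned value as the coefficient of determination.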
References
- Beveridge, W. (1991). "The Chronicle of influenza epidemics," History and Philosophy of the Life Sciences, 13(2), pp. 223–234.
- Ghendon, Y. (1994). "Introduction to pandemic influenza through history," European Journal of Epidemiology, pp. 451–453.
- Thursky, K. (2003). "Working towards a simple case definition for influenza surveillance," Journal of Clinical Virology, Vol. 27, Issue 2, pp. 170–179.
- Eccles, R. (2005). "Understanding the symptoms of the common cold and influenza," The Lancet Infectious Diseases, Vol. 5, Issue 11, pp. 718–725.
- Taubenberger, J.K., Morens, D.M. (2010). "Influenza: the once and future pandemic," Public Health Reports (Washington, D.C.), pp. 3:16–26.
- Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S., Brilliant, L. (2009). "Detecting influenza epidemics using search engine query data," Nature 457, pp. 1012–1014.
- Pak, A., Paroubek, P. (2010). "Twitter as a corpus for Sentiment Analysis and Opinion Mining," Proceedings of LREC.
- Aramaki, E., Maskawa, S., Morita, M. (2011). "Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter," Conference on Empirical Methods in Natural Language Processing, EMNLP.
- Twitter Developers Documentation (2013), https://dev.twitter.com.
- Sadilek, A., Kautz, H., Silenzio, V. (2012). "Predicting Disease Transmission from Geo-Tagged Micro-Blog Data," AAAI 2012.
- Sadilek, A., Kautz, H. (2013). "Modelling the impact of lifestyle on health at scale," Proceedings of the sixth ACM international conference on Web search and data mining.