Media Cloud

Media Cloud analysis of top 25 U.S. news sources' coverage of Occupy Wall Street for the week of September 26, 2011, compared with week of October 3, 2011

Media Cloud is an open-source content analysis tool that aims to map news media coverage of current events. It "performs five basic functions -- media definition, crawling, text extraction, word vectoring, and analysis."[1] Media Cloud "tracks hundreds of newspapers and thousands of Web sites and blogs, and archives the information in a searchable form. The database ... enable[s] researchers to search for key people, places and events — from Michael Jackson to the Iranian elections — and find out precisely when, where and how frequently they are covered."[2] Media Cloud was developed by the Berkman Center for Internet & Society at Harvard University and launched in March 2009.[3][4] It is distributed under the GNU GPL 3+.[5]

As of October 2011, Media Cloud tracks news from mostly U.S. sources. It "collects news stories" in sets from:[6]

What Media Cloud does

On May 6, 2011, the Berkman Center relaunched Media Cloud, “a platform designed to let scholars, journalists and anyone interested in the world of media ask and answer quantitative questions about media attention. For more than a year, we’ve been collecting roughly 50,000 English-language stories a day from 17,000 media sources, including major mainstream media outlets, left and right-leaning American political blogs, as well as from 1000 popular general interest blogs.”[7] The data was used to “analyze the differences in coverage of international crises in professional and citizen media and to study the rapid shifts in media attention that have accompanied the flood of breaking news that’s characterized early 2011.”[7] International work has led to the publication of “new research that uses Media Cloud to help us understand the structure of professional and citizen media in Russia and in Egypt.”[7] The relaunched Media Cloud lets interested users apply its tools to analyze “what bloggers and journalists are paying attention to, ignoring, celebrating or condemning.”[7]

How it works

First, Media Cloud chooses a set of media sources and locates the feeds for each.[1] Each feed is then crawled to determine whether any stories have been added to it.[1] The content of each relevant story is then extracted, leaving behind advertisements and navigation elements.[1] The text of each story is broken down into word counts, which show the different word choices that each media source makes in discussing a given topic.[1] The word counts are then analyzed and published to show data trends.[1]
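The word-counting step described above can be illustrated with a minimal Python sketch. It is not Media Cloud's actual code: the function names, the stop-word list, and the sample stories are all hypothetical, and the sketch assumes that crawling and text extraction have already produced clean story text for each source.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "is"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def word_counts_by_source(stories):
    """Aggregate word counts per media source.

    `stories` is an iterable of (source_name, story_text) pairs whose text
    has already been extracted (advertisements and navigation removed).
    """
    counts = defaultdict(Counter)
    for source, text in stories:
        counts[source].update(tokenize(text))
    return counts


# Made-up sample stories from two made-up sources.
stories = [
    ("Source A", "Protesters occupy the park. Protesters march on Wall Street."),
    ("Source B", "Police clear the park of demonstrators."),
]
counts = word_counts_by_source(stories)
print(counts["Source A"].most_common(3))
```

Comparing the resulting counters across sources gives a rough view of the differing word choices that the analysis step looks for.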

Uses and application

Media Cloud was used from September 2010 through January 2012 to obtain data for a study at the Berkman Center for Internet & Society that analyzed a set of 9,757 online stories related to the COICA-SOPA-PIPA debate. The open-source application was used for the text and link analysis portion of the research.[8] Findings from this research were published in July 2013.

The Berkman Center for Internet & Society website offers an interactive visualization map from this study, which was created to “depict media sources (“nodes”, which appear as circles on the map with different colors denoting different media types)… [and] track media sources and their linkages within discrete time slices and allows users to zoom into the controversy to see which entities are present in the debate during a given period…”[8] Using link analysis, the map shows how the COICA-SOPA-PIPA controversy evolved over time.
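The idea of tracking linkages within discrete time slices can be sketched in a few lines of Python. This is an illustrative assumption about how such an analysis might be organized, not the study's actual method: the function name, the choice of ISO weeks as slices, and the sample link records are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def inlinks_by_week(links):
    """Count inbound links per target within each ISO (year, week) slice.

    `links` is an iterable of (day, source, target) records, where `day`
    is the date on which `source` published a story linking to `target`.
    """
    slices = defaultdict(Counter)
    for day, source, target in links:
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        slices[(iso[0], iso[1])][target] += 1
    return slices


# Made-up link records between made-up sites.
links = [
    (date(2012, 1, 18), "blog-a.example", "wiki.example"),
    (date(2012, 1, 19), "news-b.example", "wiki.example"),
    (date(2012, 1, 25), "blog-a.example", "news-b.example"),
]
slices = inlinks_by_week(links)
for week in sorted(slices):
    print(week, slices[week].most_common())
```

Sources that attract many inbound links in a given slice would correspond to the prominent nodes for that period of the debate.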

Many companies are taking advantage of the ability to analyze and organize the data that Media Cloud can create. Companies such as RAMP offer a "cloud-based" way to analyze and create every type of metadata.[9]


Media Cloud's key functionality comes from using web crawling to periodically fetch articles from various sources and then break them down into words, which are counted. These word counts are then analyzed to determine what sources are saying about particular news topics.[1] This process is not unique to Media Cloud; it is an application of streaming algorithms, which operate on a continuous, unending stream of data rather than waiting for a complete batch of information to be assembled. Such algorithms are useful because they allow trends to be monitored without knowing in advance which topics will be the most popular. The approach first drew notice among network managers trying to see dynamically which sites had the highest traffic volumes. Streaming algorithms have since been used by programs that act dynamically on financial information, and by researchers whose experiments generate more data than can be analyzed and who therefore filter the initial data as it arrives.[10] Media Cloud has similarly taken advantage of streaming algorithms to dynamically associate words with news as it crawls through various sources, and then to provide its signature service of generating sentences based on words that users are interested in and related media reports.
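As an illustration of the general technique, the sketch below implements the Misra-Gries frequent-items algorithm, a well-known streaming algorithm that finds candidate popular items in a single pass with a fixed number of counters. The source does not say which algorithms Media Cloud uses, so this is an example of the family rather than a description of its implementation; the sample word stream is made up.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Any item that makes up more than 1/k of the stream is guaranteed to be
    among the returned keys. The stored counts are lower bounds on the true
    counts, not exact values.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all of them, dropping any at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def word_stream():
    """Yield words one at a time, as a crawler might encounter them."""
    yield from ["occupy", "wall", "occupy", "street",
                "occupy", "park", "occupy", "occupy"]


candidates = misra_gries(word_stream(), k=3)
print(candidates)
```

Because the algorithm never stores the full stream, memory use depends only on `k`, which is what makes this style of processing suitable for an unending feed of stories.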

Future use

The day that Media Cloud relaunched, Ethan Zuckerman said, "We hope the tools we're providing are a complement to amazing efforts like Project for Excellence in Journalism's News Coverage and New Media indices--we consider their tools the gold standard for understanding what topics are discussed in American media. PEJ works their magic using talented teams of coders, who sample different corners of the media ecosystem to find out what's being discussed. We use huge data sets, algorithms, and automation to give a different picture, one focused on language instead of topic."[7]

Future uses for Media Cloud could include smartphone or tablet applications that bring the platform to users away from a computer. A Media Cloud app could serve as a news source for users on the go. If Media Cloud were to expand to other kinds of information sites, it could target social media sites and incorporate news into them. Twitter and Facebook have incorporated features for trending news and topics similar to what Media Cloud aims to do.


Berkman Center, Cambridge, Massachusetts, USA
  1. ^ a b c d e f g Media Cloud. About. Retrieved 2011-10-12
  2. ^ Patricia Cohen. "Hot Story to Has-Been: Tracking News via Cyberspace." New York Times, August 5, 2009
  3. ^ Berkman Center. Media Cloud. Retrieved 2011-10-12
  4. ^ Alisa Miller. Media Makeover: Improving the News One Click At a Time. TED Books, 2011
  5. ^
  6. ^ Media Cloud. Media sets. Retrieved 2011-10-12
  7. ^ a b c d e Zuckerman, Ethan. "Media Cloud, Relaunched". Retrieved 20 February 2014. 
  8. ^ a b Berkman Center for Internet and Society. "New Publication: 'Social Mobilization and the Networked Public Sphere: Mapping the SOPA-PIPA Debate'". Harvard University. Retrieved 19 March 2014.
  9. ^
  10. ^

External links