Social data revolution

From Wikipedia, the free encyclopedia

The social data revolution is the shift in human communication patterns towards increased personal information sharing and its related implications, made possible by the rise of social networks in the early 2000s. This phenomenon has resulted in the accumulation of unprecedented amounts of public data.[1]

This large and frequently updated data source has been described as a new type of scientific instrument for the social sciences.[2] Several independent researchers have used social data to "nowcast" and forecast trends such as unemployment, flu outbreaks,[3] mood of whole populations,[4] travel spending and political opinions in a way that is faster, more accurate and cheaper than standard government reports or Gallup polls.[2]

Social data is data that individuals create and knowingly and voluntarily share. Cost and overhead previously made this semi-public form of communication unfeasible, but advances in social networking technology from 2004 to 2010 have made broader concepts of sharing possible.[5] The types of data users share include geolocation, medical data,[6] dating preferences, open thoughts, and interesting news articles.

The social data revolution not only enables new business models but also provides large opportunities to improve decision-making for public policy and international development.[7]

The analysis of large amounts of social data leads to the field of computational social science. Classic examples include the study of media content[8] or social media content.[3][4][9]

Evolution of social data

Every internet activity leaves behind traces of data (a digital footprint) which can be used to learn more about the user.[10] As use of the internet becomes more widespread, the datafication of the world is progressing rapidly: according to a 2017 estimate, around 16 zettabytes of data are produced per year, and 163 zettabytes are expected for the year 2025.[11] This has led to data becoming a critical commodity.[10] It ties together all societal actors: public institutions, private firms, and individuals, each relying on data in a unique way.

Governments have been collecting data for centuries to ensure the continuance of institutional systems, by limiting the risk of credit defaults, collecting income-based taxes, and providing the necessary infrastructure in light of their citizens' demographic distribution.[12] Initially, this data consisted of written information kept for record keeping and control, including a census system.[12]

This analogue process was very time- and cost-intensive, leaving little room for interpreting larger data sets.[12] Meanwhile, corporate technological developments have moved this offline data into the digital age, allowing visualization and data analytics.[12][10] In the public sphere, connecting survey and poll methodologies with database computing resulted in the ability to gather and store large data sets on individuals.[10]

Web 2.0 and social network sites

Over the last few decades, the internet has shifted from being used mostly as a source of information about the world to being primarily used for communication, user-generated content, data sharing, and community building.[13] Many consider this shift to be the development of "Web 2.0". Social network sites such as Facebook and YouTube are the foundation of Web 2.0 and of the shift to social data sharing.[13]

Early examples of social data websites are Craigslist and the wishlists offered by online retailers. Both enable users to communicate information to anybody who is looking for it. They differ in their approach to identity. Craigslist leverages the power of anonymity, while a retailer's wishlist leverages the power of persistent identity, based on the history of the customer with the firm. The job market is even being shaped by the information people share about themselves on sites like LinkedIn and Facebook.[14]

Examples of more sophisticated social data sites are Twitter and Facebook. On Twitter, sending a message or tweet is as simple as sending an SMS text message. Twitter made this C2W, customer to the world: Any tweet a user sends can potentially be read by the entire world. Facebook focuses on interactions between friends, C2C in traditional language. It provides many ways for collecting data from its users: "tag" a friend in a photo, "comment" on what they posted, or just "like" it. These data are the basis for sophisticated models of the relationships between users. They can be used to significantly increase the relevance of what is shown to the user, and for advertising purposes.[15]

By 2009, the popularity of social networking sites had grown to four times its 2005 level.[16] As of 2013, Twitter had over 250 million users sharing almost 500 million tweets per day, and Facebook had well over one billion users around the world.[17]

Business sector and social data

Companies, advertisers, and other organizations often use the data that is shared via social networking sites and other data-sharing avenues.[18] Social networking sites, for example, can sell user data to advertisers and other entities, which can then use it to influence consumer decisions.[13] Data mining is also used to gather this information.[18]

While websites and other applications were the origins of this data collection, improvements in technology mean that many devices used in daily life (e.g., smartphones, smartwatches, and music devices) can now collect data on individuals, increasing the amount of personal data that is available.[19][20]

This growth of people's digital identity (the information available via these electronic sources) is being used by companies and organizations to improve products and services and to reduce costs by targeting what consumers want and expect.[20] The data gathered can include shopping experiences, social media preferences, demographic information and more.[18]

Using this data allows for better personalization of products, which has become an expected and vital aspect of product use and production.[19] The data that is accessible about consumers can be used to infer their behavioral patterns.[21] For example, location information is used to assess when and where consumers go, so that ads and promotions can be targeted based on the stores they visit.[21] Online retailers have also gained insight into how to better personalize the online shopping experience through data gathered during the online transaction.[22]

Businesses can even use consumer data to determine whether different shelf spacing of products affects consumer purchasing decisions, and to assess cross-item marketing potential based on items that are often purchased together.[23]
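The idea of finding items that are often purchased together can be illustrated with a minimal sketch that counts how many baskets contain each pair of items. This is only a simple co-occurrence count, not the method of the cited study; the function name `co_purchase_counts` and the transaction data are hypothetical and invented purely for illustration.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicate items and gives every pair
        # a canonical (alphabetical) order, so ("a", "b") == ("b", "a").
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical transactions, invented for illustration only.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "milk", "cereal"],
]

pair_counts = co_purchase_counts(baskets)

# List the pairs bought together most often, most frequent first.
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Pairs with high counts would be candidates for adjacent shelf placement or joint promotion. A real analysis would typically also normalize by how often each item is bought on its own.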

Social commerce

While businesses and advertisers often take advantage of the consumer data available, consumers also use other users' information for their purchase decisions. Social commerce sites are where consumers share product and service experiences, opinions, and other information.[24] A famous example of such a site is Pinterest, which has over 100 million users.[24] These sites and other online sources of product and brand information influence consumers' purchasing decisions.[25] It is estimated that about 67% of online customers use this information in making their purchase decisions.[24] These sites create an environment that consumers consider trustworthy, since the information comes from other consumers.[24]

Other uses of social data

With the vast amount of data about individuals now accessible, the potential uses of this information are growing.

The healthcare sector has many potential uses for this data. Information gathered from social media and other social data sharing sources can be used to predict flu and other disease outbreaks, to evaluate how emergency responses are handled, and more.[26] With the use of Twitter and geotags, medical researchers can evaluate the health of a particular neighborhood and use that information to provide better outreach and services.[26] Medtronic has developed a digital blood glucose meter that lets health care providers and patients know about low levels.[19]

Social data can also be used to assess reactions to crises.[27] After Hurricane Sandy, researchers used Twitter to evaluate the emotions and issues that those affected were facing.[27] This information can potentially be used to help better prepare and respond to future crises.

This data can be used to assist with urban planning. The city of Boston has used rider information from Uber to improve transportation planning and road maintenance.[19]

Computational social science

Using social data for research purposes has led to the development of computational social science, which combines social science, computer science, and network science.[28] This field emerged in 2009.[29] Before the rise of social data and the technological advances that supported it, researchers were limited to a narrow view of information based on individuals, since their primary form of research relied on interviews.[29] With the vast amount of social data available today, researchers can analyze a wider group and obtain a broader view of information. They can use social networks and cell phone data, and perform online experiments, to gather more information than before.[29]

Privacy concerns

With so much data about individuals accessible from many sources, privacy has become a major concern. Security breaches of customer and other social information, such as the compromise of more than 56 million Home Depot customers' credit card information,[19] have heightened privacy concerns about social data. How companies use the personal information they gather, and its potential misuse, is a concern for the majority of consumers.[19][20] Despite this, many people do not know how social networking sites and other sources are using and selling their data.[30] In a 2014 study, only 25% of online users knew that their location could be accessed, and only 14% knew that their web-surfing history could be accessed and shared.[19]

Even though privacy concerns are a critical factor in people's sharing of personal information on the internet and in their overall internet involvement,[22] most people are willing to share this information if the benefits of doing so outweigh the potential privacy and security costs.[18][20] Consumers enjoy the personalization of products and services that this information gathering makes possible and, despite their concerns, continue to use them.[19]

International development

"From a macro-perspective, it is expected that Big Data-informed decision-making will have a similar positive effect on efficiency and productivity as ICT have had during the recent decade."

— Hilbert 2013

In his study of the data revolution in international development, Martin Hilbert, a social sciences professor at UC Davis, argued that the natural next step after the information societies fueled by ICT since the late 1990s is knowledge societies informed by big data analysis. Decision-making informed by big data analysis has improved both efficiency and productivity in the developed world. Hilbert examines the challenges and potential of the data revolution in "the unruly world of international development."[7]

Types of data

Hilbert identified four types of data available in large quantities by 2013: words, locations, nature, and behavior.[7]


Words

Individual interactions with the internet, such as words in comments, social media postings, and Google search term volumes, offer an increasingly large source of big data. Traditionally, statistics have been generated through a census or a probability survey (for example, the Annual Social and Economic Supplement (ASEC), Current Population Survey (CPS), American Community Survey (ACS), and National Health Interview Survey (NHIS) in the United States) or from administrative records, such as payroll, unemployment, Social Security, and income tax records, as well as scanner data, credit card data, and other commercial transaction records.[31]

"Google has analyzed clusters of search terms by region in the United States to predict flu outbreaks faster than was possible using hospital admission records."

— Shaw 2014 "Why "Big Data" Is a Big Deal"

Weatherhead University Professor Gary King described how the revolution is not just about the quantity of data available but about the ability to do something with the data to benefit society.[32]


Locations

Global Positioning System (GPS)-enabled mobile tablets and phones, radio-frequency identification (RFID) chips (part of automatic identification and data capture (AIDC) technologies), telematics, and location-based games provide data on absolute location and relative movement.


Nature

Hilbert categorizes data on natural processes under 'Nature', which includes data from sensors that measure moisture in the air and temperature.[7]


Behavior

Data can be generated from user behavior in multiplayer online games,[7] such as League of Legends, World of Warcraft, Minecraft, Call of Duty, and Dota 2. Nathan Eagle, a computer scientist at the Santa Fe Institute in New Mexico, began using cellphones in the early 2000s to collect accurate, large-scale data about real social interactions.[33][34][35] The project, known as Reality Mining, was named one of the "10 Technologies Most Likely To Change The Way We Live" by the MIT Technology Review.[36]

References


  1. Weigend, Andreas. "The Social Data Revolution". Harvard Business Review. Retrieved July 15, 2009.
  2. Hubbard, Douglas (2011). Pulse: The New Science of Harnessing Internet Buzz to Track Threats and Opportunities. John Wiley & Sons.
  3. Vasileios Lampos; Nello Cristianini (2012). "Nowcasting Events from the Social Web with Statistical Learning". ACM Transactions on Intelligent Systems and Technology. 3 (4): 1–22. doi:10.1145/2337542.2337557. S2CID 8297993. 72.
  4. Thomas Lansdall-Welfare; Vasileios Lampos; Nello Cristianini (August 2012). "Nowcasting the mood of the nation". Significance Magazine. Vol. 9, no. 4. pp. 26–28. doi:10.1111/j.1740-9713.2012.00588.x.
  5. Swathi Dharshana Naidu (December 2009). "Social Data Revolution". Posterous. Retrieved July 8, 2010.
  6. Dyson, Esther (March 23, 2010). "Health, not Health Care!". Huffington Post. Retrieved June 8, 2010.
  7. Hilbert, Martin (2013). "Big Data for Development: From Information- to Knowledge Societies". SSRN Scholarly Paper (2205145). Rochester, NY: Social Science Research Network. SSRN 2205145.
  8. Detecting macropatterns in global media content
  9. Twitter Mood: The Effects of the Recession on Public Mood in the UK
  10. West, Sarah Myers (2017). "Data Capitalism: Redefining the Logics of Surveillance and Privacy". Business & Society: 1–22.
  11. Cave, Andrew (April 13, 2017). "What Will We Do When the World's Data Hits 163 Zettabytes In 2025?". Forbes. Retrieved May 30, 2018.
  12. Mayer-Schönberger, Viktor; Cukier, Kenneth (2013). Big Data: A Revolution That Will Transform How We Live, Work and Think. London, UK: John Murray (Publishers).
  13. Fuchs, Christian. 2011. "Web 2.0, Prosumption, and Surveillance." Surveillance & Society 8(3): 288-309.
  14. Reid Hoffman (June 26, 2009). "Future of Jobs & Social Data Revolution". Retrieved July 2, 2010.
  15. Dyson, Esther (February 11, 2008). "The Coming Ad Revolution". The Wall Street Journal. Retrieved April 10, 2010.
  16. Donde, Deepa S., Chopade, Neha, and Ranjith, P.V. 2012. "Social networking sites: a new era of 21st century." SIES Journal of Management 8(1): 66-73.
  17. Osatuyi, Babajide. 2013. "Information sharing on social media sites." Computers in Human Behavior 29(6): 2622-2631.
  18. Jai, Tun-Min, and King, Nancy J. 2016. "Privacy versus reward: Do loyalty programs increase consumers' willingness to share personal information with third-party advertisers and data brokers?" Journal of Retailing and Consumer Services 28: 296-303.
  19. Morey, Timothy, Forbath, Theodore, and Schoop, Allison. 2015. "Customer data: designing for transparency and trust." Harvard Business Review 93(5): 96-105
  20. Roeber, Bjoern; Rehse, Olaf; Knorrek, Robert; Thomsen, Benjamin (2015). "Personal data: How context shapes consumers' data sharing with organizations from various sectors". Electronic Markets. 25 (2): 95. doi:10.1007/s12525-015-0183-0. S2CID 28025341.
  21. Smith, Natasha. 2015. "The datafication of marketing." DM News: 16+. Retrieved from
  22. Lee, Seungsin; Lee, Younghee; Lee, Joing-In; Park, Jungkun (2015). "Personalized E-Services: Consumer Privacy Concern and Information Sharing". Social Behavior and Personality. 43 (5): 729. doi:10.2224/sbp.2015.43.5.729.
  23. Tsai, Chieh-Yuan; Huang, Sheng-Hsiang (2014). "A data mining approach to optimise shelf space allocation in consideration of customer purchase and moving behaviours". International Journal of Production Research. 53 (3): 850. doi:10.1080/00207543.2014.937011. S2CID 110688389.
  24. Liu, Libo, Cheung, Christy M.K., and Lee, Matthew K.O. 2016. "An empirical investigation of information sharing behavior on social commerce sites." International Journal of Information Management 36(5): 686-699.
  25. Chen, Jie, Teng, Lefa, Yu, Ying, and Yu, Xeer. 2016. "The effect of online information sources on purchase intentions between consumers with high and low susceptibility to informational influence." Journal of Business Research 69(2): 467-475.
  26. Nguyen, Duc T., and Jung, Jai E. 2016. "Real-time event detection for online behavioral analysis of big social data." Future Generation Computer Systems 66: 137-145.
  27. Spence, Patric R., Lachlan, Kenneth A., and Rainear, Adam M. 2016. "Social media and crisis research: Data collection and directions." Computers in Human Behavior 54: 667-672.
  28. Chang, R. M., Kauffman, R.J., and Kwon, Y. 2014. Understanding the paradigm shift to computational social science in the presence of big data. Decision, 63, 67-80.
  29. Mann, A. 2016. Core concept: computational social science. PNAS, 113(3). 468-470. doi: 10.1073/pnas.1524881113
  30. Lilley, Stephen, Frances S. Grodzinsky and Andra Gumbus. 2012. "Revealing the Commercialized and Compliant Facebook User." Journal of Information, Communication & Ethics in Society 10(2):82-92
  31. "Survey Methodology" (PDF), StatsCan, December 19, 2014, retrieved December 19, 2013
  32. Shaw, Jonathan (March 2014), "Why "Big Data" Is a Big Deal: Information science promises to change the world", Harvard Magazine, retrieved December 23, 2016
  33. Nature News, April 2009
  34. Reality Mining downloads
  35. Reality mining whitepaper
  36. Eagle's Harvard Biography