Data aggregation


Data aggregation is the compiling of information from databases with intent to prepare combined datasets for data processing.[1]
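The definition above can be illustrated with a minimal sketch, assuming each source database is exported as a list of records that share an `id` field. The function name `combine_sources`, the `id` key, and the "first source wins" rule are assumptions made for this illustration, not something the cited sources prescribe.

```python
def combine_sources(*sources: list[dict]) -> list[dict]:
    """Merge records from several sources into one dataset keyed on 'id'.

    Assumption: when two sources disagree on a field, the value from the
    earlier source is kept. Real aggregators may use other rules.
    """
    combined: dict = {}
    for source in sources:
        for record in source:
            merged = combined.setdefault(record["id"], {})
            for field, value in record.items():
                # Keep the first value seen for each field.
                merged.setdefault(field, value)
    # Return records in a stable order, sorted by identifier.
    return [combined[key] for key in sorted(combined)]
```

Calling `combine_sources(public_records, licence_records)` on two hypothetical record lists would yield one record per identifier, with fields drawn from both sources.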


The United States Geological Survey explains that, “when data are well documented, you know how and where to look for information and the results you return will be what you expect.”[2] The source information for data aggregation may originate from public records and criminal databases. The information is packaged into aggregate reports and then sold to businesses, as well as to local, state, and other government agencies. This information can also be useful for marketing purposes. In the United States, many data brokers' activities fall under the Fair Credit Reporting Act (FCRA), which regulates consumer reporting agencies. These agencies gather and package personal information into consumer reports that are sold to creditors, employers, insurers, and other businesses.

Database aggregators provide several kinds of reports. Individuals may request their own consumer reports, which contain basic biographical information such as name, date of birth, current address, and phone number. Employee background check reports, which contain highly detailed information such as past addresses and length of residence, professional licenses, and criminal history, may be requested by eligible and qualified third parties. Beyond employee background checks, this data may also be used in decisions about insurance coverage and pricing, and in law enforcement. Privacy activists argue that database aggregators can provide erroneous information.[3]

Role of the Internet

The Internet's capacity to consolidate and manipulate information has found a new application in data aggregation, also known as screen scraping.[4] The Internet gives users the opportunity to consolidate their usernames and passwords, or PINs. Such consolidation enables consumers to access a wide variety of PIN-protected websites containing personal information by using one master PIN on a single website. Online account providers include financial institutions, stockbrokers, airline, frequent flyer and other reward programs, and e-mail services. At an account holder's request, data aggregators can gather account or other information from designated websites by using the account holder's PINs, and then make that information available to the account holder at a single website operated by the aggregator. Aggregation services may be offered on a standalone basis or in conjunction with other financial services, such as portfolio tracking and bill payment provided by a specialized website, or as an additional service to augment the online presence of an enterprise established beyond the virtual world. Many established companies with an Internet presence appear to recognize the value of offering an aggregation service to enhance other web-based services and attract visitors. Offering a data aggregation service on a website may be attractive because it has the potential to draw users of the service to the hosting website frequently.
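The consolidation described above can be sketched as follows. This is an illustrative outline only: `AccountSummary`, `Fetcher`, and `aggregate_accounts` are hypothetical names, each provider is represented by a stand-in function rather than a real website, and credentials are treated as opaque strings supplied by the account holder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AccountSummary:
    provider: str
    account_id: str
    balance: float


# A fetcher takes the account holder's credential for one provider and
# returns that provider's account summaries (hypothetical interface).
Fetcher = Callable[[str], list[AccountSummary]]


def aggregate_accounts(
    fetchers: dict[str, Fetcher],
    credentials: dict[str, str],
) -> list[AccountSummary]:
    """Collect account data only from providers the user has designated."""
    combined: list[AccountSummary] = []
    for provider, fetch in fetchers.items():
        credential = credentials.get(provider)
        if credential is None:
            continue  # the account holder did not designate this provider
        combined.extend(fetch(credential))
    # Present everything in one consistent, single-site view.
    return sorted(combined, key=lambda a: (a.provider, a.account_id))
```

The sketch gathers data only for providers with a supplied credential, mirroring the point that aggregation happens at the account holder's request and from designated websites.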

Local business data aggregation

Several major data aggregators compile location information on local businesses, collecting details such as the business name, address, phone number, website, description and hours of operation. They then validate this information using various methods. Once the business information has been verified as accurate, the data aggregators make it available to publishers like Google and Yelp.

When Yelp, for example, updates its listings, it pulls data from these local data aggregators. Publishers take local business data from different sources and compare it to what they currently have in their database. They then update their database with the information they deem accurate.
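The comparison step described above might look like the following minimal sketch. The article does not say how publishers decide which information is accurate; the strict majority vote across sources used here, the phone normalization, and the names `reconcile_listing` and `_normalize` are all assumptions made for illustration.

```python
from collections import Counter


def _normalize(field: str, value: str) -> str:
    """Make values comparable across sources (assumed formatting rules)."""
    if field == "phone":
        return "".join(ch for ch in value if ch.isdigit())
    return value.strip()


def reconcile_listing(current: dict, candidates: list[dict]) -> dict:
    """Update a publisher's listing with values most sources agree on.

    Assumption: a field is overwritten only when a strict majority of the
    aggregator feeds report the same (normalized) value.
    """
    updated = dict(current)
    fields = {field for candidate in candidates for field in candidate}
    for field in sorted(fields):
        values = [
            _normalize(field, candidate[field])
            for candidate in candidates
            if candidate.get(field)
        ]
        if not values:
            continue
        value, count = Counter(values).most_common(1)[0]
        if count * 2 > len(candidates):
            updated[field] = value
    return updated
```

Fields on which the sources disagree without a majority are left as the publisher already has them, which is one conservative way to avoid replacing a correct value with a contested one.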

The four major data aggregators for local business search were Acxiom, Infogroup, Localeze and Factual.[5] As of January 2020, Acxiom no longer acts as a data aggregator; Foursquare has taken its place among the four primary data aggregators.[6]

Legal implications

Financial institutions are concerned about possible liability arising from data aggregation activities, potential security problems, infringement of intellectual property rights, and the possibility of diminishing traffic to the institution's website. The aggregator and the financial institution may agree on a data feed arrangement, activated at the customer's request, that uses an Open Financial Exchange (OFX) standard to request and deliver information to the site the customer has selected for viewing their account data. Such agreements give institutions an opportunity to negotiate protections for their customers' interests and give aggregators an opportunity to provide a robust service. Aggregators that arrange with information providers to extract data without using an OFX standard may have a less formal consensual relationship. In that case "screen scraping" may be used to obtain account data, although for business or other reasons the aggregator may still decide to obtain prior consent and negotiate the terms on which customer data is made available. "Screen scraping" without the content provider's consent has the advantage of allowing subscribers to view almost all the accounts they have opened anywhere on the Internet through one website.


Over time, the transfer of large amounts of account data from the account provider to the aggregator's server could develop into a comprehensive profile of a user, detailing their banking and credit card transactions, balances, securities transactions and portfolios, and travel history and preferences. As sensitivity to data protection considerations grows, there is likely to be considerable focus on the extent to which data aggregators may use this data for their own purposes or share it with third parties and with the operator(s) of the website on which the service is offered.[7]


  1. ^ Stanley, Jay; Steinhardt, Barry (January 2003). "Bigger Monster, Weaker Chains: The Growth of an American Surveillance Society". American Civil Liberties Union.
  2. ^ "Why Does Data Need to be Managed?". USGS. 2022-06-11. Retrieved 2022-06-11.
  3. ^ Pierce, Deborah; Ackerman, Linda (2005-05-19). "Data Aggregators: A Study of Data Quality and Responsiveness". Archived from the original on 2007-03-19. Retrieved 2007-04-02.
  4. ^ van Oostenrijk, Alex (2004). "Screen scraping web services". The Netherlands: Radboud University of Nijmegen, Department of Computer Science. Nijmegen.
  5. ^ Yuzdepski, Zachary (16 June 2016). "Improve Your Local Search Ranking With Data Aggregators". Vendasta. Archived from the original on 2017-11-25.
  6. ^ Chessall, Erica (22 January 2020). "Listing Distribution: Foursquare as a New Data Aggregator". Archived from the original on 2020-04-25.
  7. ^ Ledig, Robert H.; Vartanian, Thomas P. (2002-09-11). "Scrape It, Scrub It and Show It: The Battle Over Data Aggregation". Fried Frank. Retrieved 2007-04-02.