Machine-generated data is information automatically created by a computer process, application, or other machine without human intervention. There is, however, some disagreement about the breadth of the term. Curt Monash of Monash Research, who is generally credited with introducing the term, defines it as "data that was produced entirely by machines OR data that is more about observing humans than recording their choices." Daniel Abadi, a computer science professor at Yale, proposes a narrower definition: "Machine-generated data is data that is generated as a result of a decision of an independent computational agent or a measurement of an event that is not caused by a human action." Despite this difference, both definitions exclude data manually entered by an end user. Machine-generated data spans all industry sectors, and people increasingly generate it without being aware of doing so.
Machine-generated data tends to be amorphous, and users typically never modify it. Machines usually generate this data as a consistent response to an event that has already occurred. Because the event is historical, the data is rarely updated or modified. Partly because of this quality, U.S. courts consider machine-generated data highly reliable.
In 2009, Gartner projected that data would grow by 650% over the following five years. Most of the growth in data is a byproduct of machine-generated data. IDC estimated that in 2020 there would be 26 times more connected things than people. Wikibon forecast that $514 billion would be spent on the Industrial Internet in 2020.
Because machine-generated data is fairly static yet voluminous, data owners rely on highly scalable tools to process and analyze it. Almost all machine-generated data starts out unstructured and is then parsed into a common structure. These derived structures typically contain many data points, or columns, and the main challenge lies in analyzing them. Given high performance requirements and large data sizes, traditional database indexing and partitioning limit the size and history of the dataset that can be processed. Columnar databases offer an alternative, because a given analysis reads only the particular columns of the dataset that it needs.
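The difference between row-oriented and column-oriented access can be illustrated with a minimal Python sketch. This is not how any particular columnar database is implemented. The records, field names, and the `to_columns` helper are hypothetical and exist only to show why an analysis that needs one column can skip the rest.

```python
from collections import defaultdict

# Hypothetical, already-structured event records (field names are illustrative).
rows = [
    {"ts": 1, "host": "web-1", "status": 200, "bytes": 512},
    {"ts": 2, "host": "web-2", "status": 500, "bytes": 128},
    {"ts": 3, "host": "web-1", "status": 200, "bytes": 2048},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per field (column layout)."""
    columns = defaultdict(list)
    for record in records:
        for field, value in record.items():
            columns[field].append(value)
    return dict(columns)


columns = to_columns(rows)

# Row layout: every full record is visited even though only "bytes" is needed.
total_row_layout = sum(record["bytes"] for record in rows)

# Column layout: only the "bytes" list is read; the other columns are untouched.
total_column_layout = sum(columns["bytes"])

print(total_row_layout, total_column_layout)
```

Both totals are the same. In a real system the column layout reads far less data from storage when a dataset has many columns and the query touches only a few.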
Examples of machine-generated data include:
- Web server logs
- Call detail records
- Financial instrument trades
- Network event logs
- SIEM logs
- Telemetry collected by the government
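Taking web server logs from the list above as an example, the following Python sketch shows one way a raw log line might be turned into a common structure of named fields. It assumes the line follows the Common Log Format. The `parse_log_line` function, the field names, and the sample line are illustrative assumptions rather than part of any specific tool.

```python
import re

# Pattern for the Common Log Format; the named groups become the derived columns.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no body was returned; treat it as 0 bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Made-up sample line, for illustration only.
sample = '192.0.2.10 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
print(parse_log_line(sample))
```

Each parsed record has the same keys, so a stream of such lines can be loaded into a table whose columns are then available for analysis.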
- Abadi, Daniel. "Machine vs. Human generated data". BlogSpot.
- Deloach, Don. "Machine Generated Data". Infobright, Inc.
- Federal Evidence Review. "Machine Generated Data Was Not Statement and Raised no Hearsay or Confrontation".
- Monash, Curt. "Three Broad Categories of Data". Monash Research.
- Monash, Curt. "Examples of Machine Generated Data". Monash Research.
- Monash, Curt. "Examples and definition of machine-generated data". Monash Research.
- Science Logic. "Gartner Ten Technologies to Watch".
- "Column Oriented DBMS". Wikipedia.