Data janitor

A data janitor is a person who takes big data and condenses it into a useful amount of information. Also known as a "data wrangler," a data janitor sifts through data for companies in the information technology industry. Many start-ups rely on large amounts of data, and data janitors help these businesses with the basic but difficult process of interpreting it.

Although data janitor work is commonly believed to be fully automated, many data scientists are employed primarily as data janitors. The information technology industry has increasingly turned to new sources of data gathered on consumers, so data janitors have become more commonplace in recent years.[1]

The work of data janitors largely consists of four steps: selection and defining relationships, extraction and organization, loading, and interpretation.[2] First, data janitors identify sources of data, select which data is relevant, and find the relationships within the data that will be useful to the company's projects. Next, they structure the data to extract the information and put it into a format that can be stored in a secure place for the business. Finally, they work with other employees to create visual aids to present to managers and executives, who will eventually benefit from the conclusions drawn from them. In this way, the work of data janitors is integral to businesses that rely on large amounts of data.
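As a rough illustration, the four steps can be sketched in Python using only the standard library. The sample data, the field names and the function names below are hypothetical and chosen for this example. An in-memory SQLite database stands in for the business's secure storage, and a per-region total stands in for the figures that would feed a visual aid.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, inconsistent casing, a missing amount.
RAW_CSV = """customer,region,amount,notes
 Alice ,north,120.50,first order
BOB,North,80,
carol,south,,refund pending
Dave,SOUTH,42.25,repeat buyer
"""


def select_rows(raw_text):
    """Step 1 (selection): keep only the fields relevant to the analysis."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [(row["customer"], row["region"], row["amount"]) for row in reader]


def organize(rows):
    """Step 2 (extraction and organization): normalize values, drop unusable rows."""
    cleaned = []
    for customer, region, amount in rows:
        if not amount.strip():
            continue  # a row without an amount cannot be totaled
        cleaned.append(
            (customer.strip().title(), region.strip().lower(), float(amount))
        )
    return cleaned


def load(rows):
    """Step 3 (loading): store the cleaned rows in a database table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def summarize(conn):
    """Step 4 (interpretation): total sales per region, ready for charting."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    connection = load(organize(select_rows(RAW_CSV)))
    print(summarize(connection))
```

In this sketch most of the effort sits in the second step, where inconsistent casing, whitespace and missing values are handled before anything is stored or summarized.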


  1. Lohr, Steve. "For Big-Data Scientists, 'Janitor Work' Is Key Hurdle to Insights". The New York Times. The New York Times Company. Retrieved 26 July 2015.
  2. "In Big Data, Preparing Data is Most of the Work". Data Science Central. Sullexis LLC. Retrieved 26 July 2015.