DataOps
Figure: DataOps for Data Analytics.[1]
DataOps is a set of practices and tools used by big data teams to increase the velocity, reliability, and quality of data analytics. It fosters close collaboration between data engineers, data scientists, and IT operations, with the aim of shortening time to market for big data applications and products while keeping them performant and reliable.[2][3][4][5][6]
It emphasizes communication, collaboration, integration, automation, and measurement among data scientists, analysts, data/ETL (extract, transform, load) engineers, information technology (IT), and quality assurance/governance. The method acknowledges the interdependence of the entire end-to-end analytic process. It aims to help organizations rapidly produce insight, turn that insight into operational tools, and continuously improve analytic operations and performance. It enables the whole analytic team to follow the values laid out in the Agile Manifesto.[7]
Inflexibility, poor quality, and other obstacles hinder the successful production of analytics in data-driven organizations.[8] Other kinds of organizations have faced similar challenges, and the lessons learned in those domains can be applied to data analytics. In software development, both Agile development and DevOps have transformed the speed and quality of code creation. In manufacturing, statistical process control (SPC) assures quality and provides early feedback on non-conformances. Applying these methods to data analytics is called DataOps. DataOps is a combination of tools and process improvements that enable rapid-response data analytics at a high level of quality. It adapts more easily to user requirements, even as they evolve, and ultimately supports improved data-driven decision-making.
Put simply, DataOps is DevOps with statistical process control, applied to data analytics: it brings the speed and flexibility of Agile and DevOps, together with manufacturing quality principles, methodologies, and tools, to the data-analytics pipeline. The result is a rapid-response, flexible, and robust data-analytics capability that can keep up with the creativity of internal stakeholders and users.
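As a rough illustration of how statistical process control might be applied to a data pipeline, the sketch below derives control limits from a baseline of daily row counts and flags new loads that fall outside them. The metric, the sample numbers, and the function names `control_limits` and `out_of_control` are assumptions made for this example and do not come from any particular DataOps tool. It uses a simplified three-sigma rule based on the sample standard deviation; production control charts often estimate variation differently, for example from moving ranges.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, center, upper) limits computed from in-control baseline values."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(values, limits):
    """Return (index, value) pairs for observations outside the control limits."""
    lower, _, upper = limits
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily row counts observed while the pipeline was behaving normally.
baseline_row_counts = [1000, 1010, 990, 1005, 995, 1002, 998]
limits = control_limits(baseline_row_counts)

# Hypothetical new loads; the third one is far larger than usual.
new_row_counts = [1001, 997, 1250, 1003]
alerts = out_of_control(new_row_counts, limits)
print(alerts)  # [(2, 1250)]
```

In practice the same check could be run on any operational measure the team tracks, such as run time, null rates, or record counts, with an alert raised whenever a value falls outside the limits.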
== DataOps Principles ==[9]
Twenty principles guide the way DataOps teams work:
- Our highest priority is to satisfy the customer through the early and continuous delivery of valuable analytic insight.
- We believe that analytics teams will always have a variety of skills, tools, and techniques at their disposal, and that the work they create, using tools that range from non-technical self-service graphical tools to the most technical text-editing environments, is just code.
- Reproducible results are required; therefore we version everything: data, hardware and software configuration, and every tool’s code and configuration.
- We welcome changing requirements, even late in development. DataOps harnesses those changes for the customer's competitive advantage.
- We work to deliver working analytic insight frequently, from a couple of minutes to weeks, with a preference for the shorter timescale.
- Business people, operations, and analytic developers must work together daily throughout the project.
- Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
- Minimize the cost for analytic team members to experiment by giving them technical environments that are easy to create, run, and dispose of, and that reflect their production environments.
- DataOps teams strive to minimize the time and effort needed to turn an analytic idea into a prototype, get customer feedback, turn that prototype into a production process, and finally refactor and reuse that process.
- The most efficient and effective method of conveying information to customers and within an analytic team is face-to-face conversation.
- Insightful analytics incorporating accurate data delivered by sound systems is the primary measure of progress.
- A fundamental part of DataOps is a focus on process thinking. Like a manufacturing line, a DataOps process is orchestrated as a directed acyclic graph (DAG) of steps that take data inputs and transform them into valuable analytic insight delivered to customers.
- DataOps teams strive to build quality into their processes as much as possible. Building quality into the process prevents unnecessary rework, errors, and loss of customer confidence in the results. This means that analytic processes are capable of detecting abnormalities (jidoka) and that fixtures have mistake-proofing to avoid mis-assembly (poka-yoke).[10]
- Performance and quality measures are continuously monitored in order to detect unexpected variation and generate operational statistics to support statistical process control.
- The analytic team should strive to reduce heroism and be able to maintain a constant pace indefinitely.
- Continuous attention to technical excellence and good design enhances agility.
- Simplicity--the art of maximizing the amount of work not done--is essential.
- The best analytic insight, algorithms, architectures, requirements and designs emerge from self-organizing teams.
- Automated testing of the code produced by the analytic team, and of the data and artifacts ingested, created, and produced within a DataOps orchestration graph (DAG), is essential to moving fast with high quality.
- At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.
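Several of the principles above (orchestrating the process as a DAG, building quality checks into it, and testing data automatically) can be sketched together in a few lines. The example below is a minimal illustration rather than a description of any specific orchestration product: the step names, the sample records, and the `run_pipeline` helper are all hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to order the steps, and a failing data check raises an exception that stops the run before later steps execute.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract(_results):
    # Hypothetical source data; one record has a missing sales value.
    return [
        {"region": "east", "sales": 120},
        {"region": "west", "sales": 80},
        {"region": "east", "sales": None},
    ]


def clean(results):
    # Drop records with missing sales values.
    return [r for r in results["extract"] if r["sales"] is not None]


def check_clean(results):
    # Data test: stop the pipeline if the cleaned data looks wrong.
    rows = results["clean"]
    if not rows:
        raise ValueError("clean step produced no rows")
    if any(r["sales"] < 0 for r in rows):
        raise ValueError("negative sales value found")
    return True


def summarize(results):
    # Total sales per region.
    totals = {}
    for r in results["clean"]:
        totals[r["region"]] = totals.get(r["region"], 0) + r["sales"]
    return totals


STEPS = {"extract": extract, "clean": clean,
         "check_clean": check_clean, "summarize": summarize}

# Each step maps to the set of steps that must finish before it runs.
DEPENDS_ON = {
    "extract": set(),
    "clean": {"extract"},
    "check_clean": {"clean"},
    "summarize": {"clean", "check_clean"},
}


def run_pipeline(steps, depends_on):
    """Run each step once, in an order that respects the dependency graph."""
    results = {}
    for name in TopologicalSorter(depends_on).static_order():
        results[name] = steps[name](results)
    return results


results = run_pipeline(STEPS, DEPENDS_ON)
print(results["summarize"])  # {'east': 120, 'west': 80}
```

Treating the data check as an ordinary node in the graph means the summary can only run after the check passes, which is one simple way to express the "detect abnormalities and stop" idea in code.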
References
- ^ "What is DataOps (data operations)? - Definition from WhatIs.com". SearchDataManagement. Retrieved 2017-04-05.
- ^ "From DevOps to DataOps, By Andy Palmer - Tamr Inc". Tamr Inc. 2015-05-07. Retrieved 2017-03-21.
- ^ Smith, Alivia. "Why you Need DataOps to Organize Your Data Science Projects". Retrieved 2017-03-21.
- ^ "DataOps: A Modest Proposal for Rethinking Enterprise Data Management | Blue Hill Research". bluehillresearch.com. Retrieved 2017-03-21.
- ^ "Emerging: DataOps and three tips for getting there". SearchCIO. Retrieved 2017-03-21.
- ^ "DataOps – It's a Secret". www.datasciencecentral.com. Retrieved 2017-04-05.
- ^ "Manifesto for Agile Software Development". agilemanifesto.org. Retrieved 2017-03-21.
- ^ "DataOps: The Collaborative Framework for Enterprise Data-Flow Orchestration | Blue Hill Research". bluehillresearch.com. Retrieved 2017-03-21.
- ^ http://dataopsmanifesto.org/
- ^ "Lean Manufacturing Principles". Retrieved 2017-03-21.