DevOps or DataOps – What is the distinction between data and software development
DataOps is a development methodology that combines processes and tools to continuously deliver quality data products and analytics on time and at predictable cost.
In this blog post, we will explore the similarities and differences between DataOps and DevOps, and the specific challenges and considerations of data development. We’ll discuss the dependency on underlying business applications, the constantly changing nature of data, and the different skill sets required for data engineering and software development.
About this DataOps blog post series
To get an overview of the DataOps methodology and its key principles, tools, and best practices, we set out to create a series of blog posts focusing on different aspects of DataOps.
We'll publish the blog series over the coming weeks. If you don't want to wait, you can also download the whole story as a whitepaper.
DevOps: Combining development and operations to improve quality, efficiency, and speed of software delivery
DevOps is a software development methodology that combines processes and practices to improve quality, efficiency, and speed in development, deployment, and delivery while making the software more reliable and the maintenance and operations of the software as smooth as possible.
In other words, DevOps unifies development and operations. It shifts the thinking from project to product and takes the whole life cycle into account. In practice, this means introducing agile development methods, like Scrum, and automating as much as possible, for example testing, version control, CI/CD, and monitoring.
A crucial part of DevOps is measuring the development process and making it leaner. This is done by removing obstacles and waste from the processes and by continuously trying to improve the performance of the development team. Unifying operations in the development team improves the quality of the software as it is in the developers' interest to make the software work well and require as little maintenance and operation work as possible.
One of the core things we should not forget in DevOps comes from agile methods: collaboration and communication. This means there should be clear communication channels between all relevant stakeholders and a continuous feedback loop from end users to developers.
Understanding DataOps: data development and operations
DataOps is not about data and operations, but DevOps adapted to data development. DataOps is a development methodology and a set of processes and tools that aim to continuously provide quality data pipelines, products, and analytics on time and at predictable cost. In practice, this means improved delivery speed and more accurate and reliable data products.
To understand how DataOps differs from DevOps, we need to understand how data development differs from software development. To oversimplify: business applications are about automating and supporting business processes and functions, while data development is about deriving insight from the data created by those business processes and applications.
Navigating the challenges of upstream data sets
One of the major differences is that data and analytics development is always downstream development and dependent on the underlying business applications, while software development is most commonly focused on the functionalities of a stand-alone business application. Getting the data out and combining it with data extracted from other business applications adds an extra layer of complexity to data development.
You also have to keep in mind that data is constantly changing. In most cases, the upstream business applications do not offer integrated development or test data sets, even if the production environment usually does to some extent. This creates different challenges for developing data products, as data development typically focuses on the data content rather than on functionality.
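To make the focus on data content more concrete, here is a minimal sketch of what automated content checks on an extracted batch could look like. The schema (order_id, customer_id, amount), the check names, and the function names are hypothetical and chosen only for illustration; a real pipeline would derive its checks from the actual source systems.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_orders(rows: list[dict[str, Any]]) -> list[CheckResult]:
    """Run content checks on a batch of extracted rows (hypothetical schema)."""
    order_ids = [r.get("order_id") for r in rows]
    duplicates = len(order_ids) - len(set(order_ids))
    missing_customer = [r for r in rows if r.get("customer_id") is None]
    # Treat a missing amount as 0 here; a stricter check could flag it instead.
    negative_amount = [r for r in rows if (r.get("amount") or 0) < 0]

    return [
        CheckResult("batch_not_empty", len(rows) > 0, f"{len(rows)} rows"),
        CheckResult("order_id_unique", duplicates == 0,
                    f"{duplicates} duplicate ids"),
        CheckResult("customer_id_present", not missing_customer,
                    f"{len(missing_customer)} rows without customer_id"),
        CheckResult("amount_non_negative", not negative_amount,
                    f"{len(negative_amount)} negative amounts"),
    ]


def failed_checks(rows: list[dict[str, Any]]) -> list[str]:
    """Return the names of the checks that did not pass."""
    return [c.name for c in check_orders(rows) if not c.passed]
```

Unlike a unit test for application logic, these checks say nothing about whether the code is correct. They describe expectations about the data itself, so they need to run against every new batch, because the upstream content can change even when the pipeline code does not.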
Understanding the required skill set
Another major difference is that working with data emphasizes a slightly different skill set than software development. As a modern data engineer, it helps to have experience and a decent understanding of programming. More importantly, you should have a firm grasp of SQL and relational databases and of some mathematical domains, such as set theory and statistics. On top of that, understanding data modeling is a huge benefit because data modeling is all about communication and semantics, which are necessary when integrating data from several sources.
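As a small illustration of the set-based thinking that SQL encourages, the sketch below uses Python's built-in sqlite3 module to find customer identifiers that appear in one source but not in another. The table names (crm_customers, billing_invoices), the columns, and the sample rows are made up for this example and do not refer to any real system.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical source tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE crm_customers (customer_id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE billing_invoices (
            invoice_id INTEGER PRIMARY KEY,
            customer_id TEXT,
            amount REAL
        );
        INSERT INTO crm_customers VALUES ('C1', 'Acme'), ('C2', 'Globex');
        INSERT INTO billing_invoices VALUES
            (1, 'C1', 100.0), (2, 'C3', 50.0), (3, 'C3', 75.0);
        """
    )
    return conn


def unmatched_customer_ids(conn: sqlite3.Connection) -> list[str]:
    """Customer ids used in billing that are unknown to the CRM.

    EXCEPT is a set difference: it also removes duplicate ids.
    """
    query = """
        SELECT customer_id FROM billing_invoices
        EXCEPT
        SELECT customer_id FROM crm_customers
        ORDER BY customer_id
    """
    return [row[0] for row in conn.execute(query)]


if __name__ == "__main__":
    print(unmatched_customer_ids(build_demo_db()))
```

The query expresses the question as a difference between two sets of keys rather than as a loop over rows, which is the kind of reasoning that comes up constantly when integrating data from several sources.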
That said, the required skill set of data engineers and how you put together your DataOps team heavily depends on the specific needs of your organization.
First of all, you should put some up-front effort into the data platform architecture and consider and analyze at least the following questions:
- How does your data platform architecture fit your enterprise architecture?
- What kind of DevOps/DataOps tools does your organization already have in use?
- What kind of expertise do you already have in-house, and what is available in the market?
In summary, DataOps serves as a bridge between development and operations within the realm of data. By recognizing and understanding the fundamental distinctions between software and data development, we gain deeper insights into what sets the concept of DataOps apart. These differences, such as the dependency on underlying business applications and the ever-changing nature of data, underscore the unique challenges and considerations involved in data development. Embracing DataOps means embracing a methodology that takes these nuances into account.