Keeping Cloud Data Warehouse Costs in Check

Apr 26, 2024 9:31:13 AM

Have you ever wondered who's accountable if your data warehouse starts draining your budget? Can you monitor usage and identify the sources of costs?

Knowing the cost structure and having clear visibility into your spending is vital to controlling operating costs in cloud data warehouses. Remember, however, that operating costs are only one part of the total cost of ownership (TCO) of your cloud data warehouse. Other factors, such as the tools you use, the build-vs-buy decision, the availability of skilled personnel, and your data architecture, also influence costs. In this post, we'll focus on understanding the running costs of your cloud data warehouse.


The Cloud's Variable Cost Model

If you're reading this, you're likely familiar with the cloud's variable cost model: the more you use, the more you pay. This applies to cloud data warehouses as well, though pricing models vary among vendors. Be it Snowflake's time-based billing, BigQuery's on-demand or capacity pricing, or Azure Synapse's pay-as-you-go or reserved pricing, you pay more as your compute and storage use increases. Naturally, you want to eliminate waste and avoid paying for unused capacity.

Building a Cost-Conscious Culture

When you use a cloud data warehouse, every stakeholder, from developers to business users, should understand its costs. Each cloud data warehouse comes with its own pricing scheme, so it's important to know what drives the costs up. A good way to get there is to train your team on the pricing model of your chosen data warehouse.

A useful lens for this is Cloud FinOps:

"FinOps is an evolving cloud financial management discipline and cultural practice that enables organizations to get maximum business value by helping engineering, finance, technology, and business teams to collaborate on data-driven spending decisions."

FinOps Foundation [1]

To build a cost-conscious culture around your data warehouse, it’s essential to keep the practices of FinOps in mind.

What's great about FinOps is that it's not just about cutting costs. It's about making sure the money you spend on the cloud is helping your business. This should be your goal when you're thinking about costs in your cloud data warehouse. You should use the warehouse where it helps your business, but always look for ways to save money with optimization. That's what DataOps and tools like the Agile Data Engine (ADE) are all about. They help you deliver value quickly to keep your business happy. If your costs are getting too high, you can keep improving and adding features to use your data warehouse more efficiently.

Another good thing about FinOps is that it encourages everyone to take responsibility for their own cloud usage. This means that the people who build things should also understand the costs. I've seen teams that take ownership of their data product and its costs, and some that don't. I think the idea of cost ownership in FinOps is very important, and it applies to cloud data warehouses as well.


The Importance of Architecture and Planning

How you use cloud data warehouses matters. You need a robust architecture and a clear plan for implementing your data product. For instance, the cost implications of streaming all your data versus running nightly batch jobs in your cloud data warehouse are vastly different. You must strike a balance between business requirements, solution architecture, and cost management.

Several architectural decisions influence the cost of your data warehouse:

  • What's your data volume and velocity, both currently and projected?
  • Are you using delta loads or full loads from the source?
  • Will you be historizing all data or only a part of it?
  • How quickly is your environment growing, for example how many new data products do you expect in the next 1 to 6 months?
  • How often are workflows executed?
  • Is near-real-time data loading required?

Features of ADE can also affect the cost-effectiveness of your data product. For instance, using delta loading features across your ADE-built data warehouse can improve cost efficiency. We'll explore this in more detail in future posts.
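To get a feel for why the delta-versus-full-load question matters, here is a rough sketch that compares how many rows each approach processes over a month. Every number in it is a made-up assumption, rows processed is only a proxy for compute (not a bill), and the arithmetic is generic rather than a description of how ADE implements delta loading.

```python
# Hypothetical figures for illustration only; plug in your own.
TABLE_ROWS = 50_000_000       # assumed size of the source table
DAILY_CHANGED_ROWS = 200_000  # assumed rows inserted or updated per day
DAYS = 30                     # one load per day for a month


def rows_processed_full(table_rows: int, days: int) -> int:
    """Full load: the whole table is reloaded on every run."""
    return table_rows * days


def rows_processed_delta(table_rows: int, daily_changed: int, days: int) -> int:
    """Delta load: one initial full load, then only the changed rows."""
    return table_rows + daily_changed * (days - 1)


full = rows_processed_full(TABLE_ROWS, DAYS)
delta = rows_processed_delta(TABLE_ROWS, DAILY_CHANGED_ROWS, DAYS)

print(f"full loads:  {full:,} rows")
print(f"delta loads: {delta:,} rows")
print(f"full loads process about {full / delta:.1f}x as many rows")
```

The sketch ignores table growth and the overhead of detecting changes, so treat the ratio as an order-of-magnitude hint rather than a forecast.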

In the end, architectural decisions are crucial to balance performance and costs, ensuring a good return on investment. The business side doesn't concern itself with the details of data loading from source systems into the cloud data warehouse. That responsibility falls on the data team, which must provide high-performing, cost-effective solutions. For instance, it wouldn't make sense to run data pipelines every hour if the business needs data updates only once or twice a day.
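The scheduling example above is easy to put into numbers. The sketch below uses a hypothetical compute rate and run duration (both are assumptions, not any vendor's actual pricing) to compare an hourly schedule with the twice-daily schedule the business actually needs.

```python
# Both constants are hypothetical; substitute your own vendor rates.
HOURLY_COMPUTE_RATE = 4.00  # assumed price per hour of compute, in USD
RUN_DURATION_HOURS = 0.25   # assumed 15-minute pipeline run


def daily_pipeline_cost(runs_per_day: int) -> float:
    """Estimated daily compute cost for a pipeline run `runs_per_day` times."""
    return runs_per_day * RUN_DURATION_HOURS * HOURLY_COMPUTE_RATE


hourly = daily_pipeline_cost(24)      # pipeline runs every hour
twice_daily = daily_pipeline_cost(2)  # pipeline runs twice a day

print(f"hourly schedule:      ${hourly:.2f} per day")
print(f"twice-daily schedule: ${twice_daily:.2f} per day")
print(f"avoidable spend:      ${hourly - twice_daily:.2f} per day")
```

Real bills also depend on things this sketch leaves out, such as minimum billing increments and how long compute stays up between runs, so the point is the proportion, not the exact figures.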


Shining a Light on Costs

So, does your data team have a clear picture of all cost-related details? They should. Your team needs this information both when developing new things and when maintaining existing ones. Visibility into costs is a key part of a continuous feedback loop from development to operations. Being aware, in real time, of how much things cost is one of the key principles of FinOps [2].

Some teams don't have a clear view of how much their cloud data warehouse is costing them, or they only see costs lumped together. That's a problem, especially when costs change with how much you use your warehouse. Without detailed cost information, you can't have a proper feedback loop about how well something works and how much it costs.

You need a detailed breakdown of costs, not just a daily total for compute and storage. The level of detail available will depend on the cloud data warehouse product you're using and how the vendor charges you.

Such a breakdown could answer questions like:

  • Which workloads use a lot of resources?
  • Which queries take a long time?
  • How much do BI tools use the data warehouse?
  • What are the costs grouped by different features or products, or by different users?
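As a minimal sketch of what such a breakdown can look like, the snippet below groups a handful of query records by workload or by user and picks out long-running queries. The `QueryRecord` fields and the sample rows are hypothetical; in practice you would fill them from whatever query history or billing export your warehouse provides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryRecord:
    """One executed query; the field names are illustrative assumptions."""
    workload: str     # e.g. a data product or pipeline name
    user: str         # who (or which tool) ran the query
    runtime_s: float  # query runtime in seconds
    cost_usd: float   # cost attributed to the query


# Hypothetical sample data standing in for a real usage export.
records = [
    QueryRecord("sales_mart_load", "etl_service", 540.0, 1.80),
    QueryRecord("sales_mart_load", "etl_service", 610.0, 2.05),
    QueryRecord("bi_dashboard", "bi_tool", 12.0, 0.04),
    QueryRecord("bi_dashboard", "bi_tool", 950.0, 3.10),
    QueryRecord("adhoc_analysis", "analyst_a", 75.0, 0.25),
]


def cost_by(records: list[QueryRecord], key: str) -> dict[str, float]:
    """Total cost per value of `key` ("workload" or "user"), highest first."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cost_usd
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


def slow_queries(records: list[QueryRecord], threshold_s: float) -> list[QueryRecord]:
    """Queries whose runtime exceeds `threshold_s` seconds."""
    return [record for record in records if record.runtime_s > threshold_s]


for workload, cost in cost_by(records, "workload").items():
    print(f"{workload:<16} ${cost:.2f}")

for record in slow_queries(records, threshold_s=600):
    print(f"slow: {record.workload} by {record.user} ({record.runtime_s:.0f}s)")
```

Grouping by `"user"` instead of `"workload"` answers the BI-tool question from the list, since the BI tool shows up as its own user in this sample.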

The point of knowing all this isn't to find someone to blame for the costs. Instead, it's about understanding what's happening. What's the situation with your cloud data warehouse in terms of costs and how well it's working?

If your business needs are growing and that's why costs are going up, that's great! It means the increase in cost is worth it. But if costs are going up and there's no good reason for it, maybe it's time to look at how much you're using in terms of computing power and storage to see if there is a place for optimization. Sometimes this is a good idea, sometimes it's not. But the important thing is to have up-to-date information about how much of your data warehouse resources you're using.


Wrapping Up

The majority of cloud data warehouse products are easy and enjoyable to use. You can achieve results quickly, and performance is rarely an issue. In most cases, customers are satisfied with their cloud data warehouses and the agility they offer.

However, keeping an eye on costs and maintaining visibility is essential. Even if your data warehouse costs are currently low, the rule of thumb in cloud services is this: the more you have, the more you want. So, it's better to manage and monitor these costs now rather than later!

Cloud FinOps practices provide a useful perspective on managing cloud costs. If you want to keep costs under control and build a cost-conscious culture around your cloud data warehouse, they are worth looking into.

Sources:

[1] FinOps Foundation, https://www.finops.org/introduction/what-is-finops/

[2] J.R. Storment, Mike Fuller, Cloud FinOps: Collaborative, Real-Time Cloud Financial Management