Data Lakes Didn't Fail. We Just Keep Blaming the Wrong Thing.

Written by Matti Karell

30 Years of Data Platforms, 30 Years of Blaming the Wrong Thing - and Why the AI Era Finally Changes the Equation

Every few years, the data industry finds a new scapegoat. The warehouse was too rigid. ETL tools were too slow. The data lake was too messy. Code-first was too fragile. Now AI hallucinates. But the technology was never the real problem - we just keep skipping the fundamentals and blaming the tools when the outcomes disappoint.

I've spent three decades working at the intersection of data, analytics, and business. From roles at Cognos/IBM and SAP through Salesforce's cloud revolution, to now leading Agile Data Engine's journey across the Nordics and EMEA - I've had a front-row seat to every major data platform shift.

The terminology changes. Data mesh, data fabric, lakehouse, AI-first - every cycle brings a new concept that promises to solve what the last one couldn't. But when I sit across the table from a data leader, the underlying challenges are remarkably familiar. The technology isn't the problem. Skipping the fundamentals is.


The 1990s–2000s: When Discipline Was Built In

The modern data platform story begins with Bill Inmon and Ralph Kimball, who laid the foundations that still shape data architecture today. Inmon championed a top-down, normalized enterprise data warehouse. Kimball took the opposite path: start with star-schema data marts, integrate through conformed dimensions. I lived through this debate in real time - larger Nordic enterprises gravitated toward Inmon's rigor; mid-market organizations preferred Kimball's speed.

But beneath the architecture debate, the technology forced discipline. Databases ran on expensive on-premise servers. Storage was scarce, so you modeled carefully. Compute was limited, so performance was a constant battle. The technology punished laziness - and honestly, that discipline produced some of the most reliable data foundations I've ever seen.

As complexity grew, ETL tools - Informatica, DataStage, SSIS - brought governance into the process. Knowledge lived in the system, not just in people's heads. These tools were expensive and rigid, but they solved problems the industry would later have to re-solve.

Then came Data Vault. Dan Linstedt's methodology was designed for change - a modular structure where new sources could be added without redesigning what existed, full history was preserved by default, and patterns were highly repeatable and automatable. For regulated industries across the Nordics and Europe, the auditability wasn't a nice-to-have - it was a compliance requirement. And that repeatability would become central to the metadata-driven approach we'll discuss later.
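The repeatability is what made Data Vault so automatable: every hub follows the same shape, so its DDL can be derived from a handful of metadata fields. Here is a minimal, illustrative Python sketch - the column conventions (hash key, load timestamp, record source) follow common Data Vault practice, but the function and names are hypothetical, not any vendor's implementation:

```python
# Minimal sketch of Data Vault's repeatable hub pattern: given a business
# concept and its business key, the hub DDL is fully determined.
# Names and conventions here are illustrative assumptions.

def hub_ddl(business_concept: str, business_key: str) -> str:
    """Generate standard hub DDL: hash key, business key, load metadata."""
    table = f"hub_{business_concept}"
    return (
        f"CREATE TABLE {table} (\n"
        f"  {business_concept}_hk CHAR(32) NOT NULL,  -- hash of business key\n"
        f"  {business_key} VARCHAR NOT NULL,\n"
        f"  load_dts TIMESTAMP NOT NULL,\n"
        f"  record_source VARCHAR NOT NULL,\n"
        f"  PRIMARY KEY ({business_concept}_hk)\n"
        f")"
    )

print(hub_ddl("customer", "customer_id"))
```

Because the pattern never varies, adding a new source becomes a metadata exercise rather than a design exercise - which is exactly why this methodology pairs so well with automation.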

The business impact across this era: Organizations that invested in proper modeling got reliable reporting, consistent KPIs, and resilience when source systems changed. Those that skipped it spent years reconciling numbers between departments - with nobody knowing which figures were correct.

The Cloud Revolution: When Constraints Disappeared

The 2010s brought Snowflake, BigQuery, Redshift, and Databricks, with Microsoft Fabric arriving later. Separate storage from compute, scale on demand, pay for what you use. The physical constraints that had enforced discipline for two decades evaporated overnight. Then came the data lake. Store everything in raw form, figure out the structure later. The promise was compelling. For many organizations, the reality wasn't.

Without enforced schemas, data lakes became swamps. Without governance, nobody knew what was in them. Without modeling, the same "customer" concept existed in ten different forms. The flexibility that was supposed to liberate data teams actually buried them.

The fundamental mistake was conflating storage scalability with architectural freedom. Just because you could store everything didn't mean you should store it without structure. Many organizations across the Nordics and Europe are still paying that price - sitting on massive data lakes with very little business value to show for it. The lakehouse concept (Delta Lake, Apache Iceberg) emerged as a correction - the industry admitting the pendulum had swung too far from the modeling discipline that Inmon, Kimball, and Linstedt had championed.

The business impact: Cloud unlocked genuine scale. But organizations that moved without a data strategy simply replicated their on-premise mess at cloud scale, with a larger monthly bill.


The Recurring Pattern - and Why Shortcuts Don't Work

Alongside the cloud shift came the code-first revolution - dbt, Airflow, custom SQL pipelines. Fast, flexible, modern. But it reintroduced problems the ETL era had already solved: documentation skipped, governance bolted on as an afterthought, knowledge locked in individuals' heads.

I see this pattern in almost every conversation across Europe. A company invests heavily in code-first engineering. Two years later, the original team has moved on. Maintenance consumes 60-70% of capacity. The CFO asks why costs keep rising while business impact stays flat.

The pattern across three decades is clear. Every few years, a new concept is positioned as the shortcut to data-driven success. Data marts. Data lakes. Code-first. Data mesh. AI. Each contains genuine insight. But when treated as a shortcut - as a way to skip the fundamentals - the outcome is always the same: technical debt, governance gaps, and diminishing returns.

I see it repeatedly with data mesh. Organizations adopt it because it promises to solve scaling problems. But unless the underlying data is modeled, governed, and well-documented, mesh just distributes the mess across more teams. The concept isn't wrong - the mistake is treating it as a shortcut around the fundamentals.

The lesson is always the same: you need to know how you want data to support the business, and then build your data foundations and culture accordingly. No architectural pattern and no buzzword substitutes for that.


Metadata-Driven Development: Discipline Without the Drag

What if the answer wasn't ETL or code, but something that combined the governance of metadata-driven development with the agility of modern engineering?

This is the path Agile Data Engine has pursued for over a decade. The core principle: separate what you want to build from how it gets built. Define your data products as structured metadata - entities, attributes, keys, relationships, transformations - and let the platform generate the SQL, orchestration, CI/CD, and deployments automatically.

This aligned perfectly with Data Vault's repeatable patterns. When your modeling approach is pattern-based and your platform is metadata-driven, the combination unlocks consistency and speed that neither achieves alone.

Model once, generate everywhere. The platform generates code for Snowflake, Databricks, BigQuery, or Redshift. Your intellectual property lives in the metadata, not in vendor-specific code.

Built-in DevOps. Packages are versioned and deployed through environments with built-in CI/CD.

Governance as a byproduct. Lineage, quality rules, and business context are captured automatically.

Knowledge in the system. When team members leave, the knowledge stays.
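A toy sketch makes "model once, generate everywhere" concrete: the entity is described once as structured metadata, and target-specific DDL falls out of a type mapping. The entity, attributes, and type mappings below are simplified illustrations - real platforms handle far more dialect differences than this:

```python
# Hedged sketch: one entity defined as metadata, DDL generated per target.
# Entity and type mappings are illustrative, not a real platform's schema.

ENTITY = {
    "name": "dim_customer",
    "attributes": [
        ("customer_id", "integer", True),   # (name, logical type, is_key)
        ("customer_name", "string", False),
        ("created_at", "timestamp", False),
    ],
}

TYPE_MAP = {
    "snowflake": {"integer": "NUMBER", "string": "VARCHAR",
                  "timestamp": "TIMESTAMP_NTZ"},
    "bigquery": {"integer": "INT64", "string": "STRING",
                 "timestamp": "TIMESTAMP"},
}

def generate_ddl(entity: dict, target: str) -> str:
    """Render CREATE TABLE for a given target from the same metadata."""
    types = TYPE_MAP[target]
    cols = ",\n".join(
        f"  {name} {types[ltype]}" + (" NOT NULL" if is_key else "")
        for name, ltype, is_key in entity["attributes"]
    )
    return f"CREATE TABLE {entity['name']} (\n{cols}\n)"

print(generate_ddl(ENTITY, "snowflake"))
print(generate_ddl(ENTITY, "bigquery"))
```

The point of the sketch: when the model lives in metadata, switching or adding a target engine is a mapping change, not a rewrite of pipeline code.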

Start anywhere. Scale everywhere. One of the most powerful consequences of a metadata-driven approach is that you don't need to build the entire enterprise data warehouse before you see value. Start with finance. Or sales. Or customer service, HR, procurement - whichever domain has the most pressing business need. Build a governed, production-ready data product in days. Then expand.

Because every data product is built on the same metadata foundation - same modeling standards, same governance, same platform - each one connects naturally to the next. Your finance data product doesn't become a silo. It becomes the first piece of an enterprise-wide data platform. The customer data product adds to it. Procurement adds to it. Each domain accelerates the next because the metadata layer grows richer with every product shipped.

This is the difference between a top-down enterprise programme that takes two years before anyone sees a dashboard, and an iterative approach that delivers business value from week one - without ever losing sight of enterprise scale. The old Inmon vs. Kimball tension - rigor vs. speed - dissolves when the metadata layer ensures both. You don't choose between fast and coherent. You get both.

For the organizations that adopted this approach, the benefits compounded over time. The richer the metadata layer became, the more valuable it proved.

The business impact: One customer described shifting from "80% maintenance, 20% new development" to the inverse - freeing engineering capacity for value creation rather than firefighting.


The AI Inflection: When a Decade of Metadata Pays Massive Dividends

Most of the industry's attention is on using AI to generate code directly. This is useful, but it has fundamental limitations. AI-generated code is probabilistic. It can hallucinate. It doesn't know your naming conventions, your modeling methodology, or your governance rules. Every script needs line-by-line review. And once written, the code sits in a repo with no structured metadata behind it.

Sound familiar? It's the code-first paradigm all over again - just faster at producing ungoverned output.

Metadata-driven platforms enable something radically different. Instead of asking AI to write code, you ask AI to generate metadata. A data professional describes their intent. The AI agent, equipped with knowledge of your standards and platform configuration, generates complete structured metadata. The engineer reviews the what and why - not thousands of lines of SQL.

AI doesn't write code. AI creates metadata. The platform generates the code deterministically.

No hallucination risk in production - the code is generated by an engine that has done this reliably for a decade. Standards enforced automatically - the AI works within your platform's guardrails. And every data product enriches the foundation - business context, lineage, quality rules, relationships. You're not just building faster. You're building an AI-ready data foundation as a byproduct.
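The guardrail idea can be sketched in a few lines: the AI agent's output is structured metadata, which a deterministic validator checks against house standards before any code generation happens. Everything below is a hypothetical illustration - the naming rule, required fields, and payload stand in for whatever a real platform would enforce:

```python
# Illustrative sketch of "AI creates metadata, the platform generates code":
# a proposed metadata payload (hard-coded here, where an AI agent would
# respond) is validated deterministically before generation. All rules and
# field names are hypothetical examples.

import re

NAMING_RULE = re.compile(r"^[a-z][a-z0-9_]*$")
REQUIRED_FIELDS = {"name", "attributes", "keys"}

def validate_metadata(meta: dict) -> list[str]:
    """Return a list of guardrail violations (empty list = passes)."""
    errors = []
    missing = REQUIRED_FIELDS - meta.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "name" in meta and not NAMING_RULE.match(meta["name"]):
        errors.append(f"name violates convention: {meta['name']}")
    attr_names = [a["name"] for a in meta.get("attributes", [])]
    for key in meta.get("keys", []):
        if key not in attr_names:
            errors.append(f"key not among attributes: {key}")
    return errors

# Where an AI agent would return structured metadata from a stated intent:
proposed = {
    "name": "hub_customer",
    "attributes": [{"name": "customer_id", "type": "string"}],
    "keys": ["customer_id"],
}
print(validate_metadata(proposed))  # [] -> safe to hand to the generator
```

Because the code itself is produced by the deterministic engine downstream, a hallucinated suggestion either fails validation here or produces metadata the engineer can review at a glance - it never ships unreviewed SQL.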

The business impact: What used to take weeks now takes days. The backlog shrinks. AI initiatives have the governed, trustworthy data they need. And institutional knowledge grows with every product shipped - regardless of team turnover.

Where We're Going

Looking back across thirty years and dozens of enterprise customers across the Nordics and Europe, the pattern is unmistakable. Every era brought genuine innovation - and shortcuts that created debt.

The organizations that thrived were the ones that adopted new capabilities without abandoning proven fundamentals. They used cloud to enable better architecture, not replace it. They adopted Data Vault for resilience and auditability. They invested in metadata because it was better engineering - long before they knew AI was coming.

Now those organizations have something most don't: a rich, structured foundation that AI can work with and enrich further. The gap widens every quarter.

After thirty years, my strongest conviction is simple: the fundamentals always win. The technology changes. The buzzwords cycle. But modeling matters. Governance matters. Metadata matters. Culture matters.

And for the first time, we have the tools to make those fundamentals fast - not just necessary. That changes everything.

Agile Data Engine is a metadata-driven data platform that has helped organizations build governed, scalable data products for over a decade. With AI-powered agentic metadata creation, ADE enables teams to go from business intent to production-ready data products in a fraction of the time - with full governance, full documentation, and an AI-ready data foundation built as a byproduct.

Learn more at agiledataengine.com/future-of-data-engineering