A framework for data quality

This post is a guest contribution by George Siosi Samuels, managing director at Faiā. See how Faiā is committed to staying at the forefront of technological advancements here.

Why data quality—not quantity—will make or break your AI projects

As an enterprise leader navigating the artificial intelligence (AI) or blockchain landscapes, you’re likely facing a sobering reality: 78% of AI projects fail due to poor data quality. The numbers don’t lie, but neither does the solution waiting in the wings. 

I propose a “Treat Data Like Food” framework, a simple analogy for addressing this crisis, though significant gaps remain when the metaphor stands alone. As someone who’s spent years studying such patterns, I’ll break down how to transform this compelling metaphor into an enterprise-ready strategy that delivers measurable results.

The food metaphor: Engaging but incomplete

The analogy of data as “nutrition” brilliantly simplifies complex concepts, making it ideal for executive buy-in and organizational alignment. However, like any powerful metaphor, it risks oversimplifying critical issues that enterprises face daily:

Data Provenance: Just as you might track a tomato’s farm-to-table journey, enterprises need robust lineage tools (e.g., Collibra, Alation) to audit data origins and transformations. Without this visibility, AI outcomes become as questionable as mystery meat.

Evolving Schemas: Menus change seasonally—so do data models. The framework needs tactical approaches for adaptive schema governance, especially as your AI models evolve alongside business requirements.

Cultural Nuances: A “farm-to-table” approach flourishes in Asia’s meticulous data environments but creates friction against Western “all-you-can-eat” data buffets—a tension many global enterprises struggle to reconcile.

If you’re looking to take action in your company, pair the metaphor with concrete implementation examples, such as how Pfizer (NYSE: PFE) used lineage tools to accelerate vaccine research and development (R&D) by ensuring data quality at each stage—from clinical trials to regulatory submission.
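
To make provenance concrete, here is a minimal, tool-agnostic sketch in Python of the kind of lineage record such platforms maintain. The dataset names, fields, and job identifiers below are hypothetical illustrations, not any vendor’s actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageEvent:
    """One hop in a dataset's farm-to-table journey."""
    dataset: str         # e.g. "trial_results" (hypothetical name)
    source: str          # upstream system or file the data came from
    transformation: str  # what was done to the data at this step
    actor: str           # pipeline job or person responsible
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# An audit trail is an append-only list of events per dataset.
audit_trail: list[LineageEvent] = []

def record_hop(dataset: str, source: str, transformation: str, actor: str) -> None:
    """Append a lineage event so any AI outcome can be traced to its origins."""
    audit_trail.append(LineageEvent(dataset, source, transformation, actor))

record_hop("trial_results", "site_a_export.csv", "de-identified patient fields", "etl_job_17")
record_hop("trial_results", "trial_results", "joined with dosage table", "etl_job_18")
```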

From theory to practice: Scaling data quality

The framework’s step-by-step guidelines provide a foundation, but global enterprises need scalability mechanisms that work across diverse ecosystems. Let’s transform theory into practice:

Step 1: Automate data ‘nutrition’ (hygiene) checks

Deploy tools like Great Expectations or Monte Carlo for real-time quality monitoring across your data landscape. Netflix, for example, used automated validation to flag inconsistent viewer data before processing, reducing model retraining needs by 40%.
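
For a sense of what such tools automate, here is a minimal sketch of these “nutrition” checks in plain Python with pandas. The table, column names, and thresholds are hypothetical, and a production setup would use a monitoring tool’s native expectation suites rather than hand-rolled functions:

```python
import pandas as pd

# Hypothetical viewer-events table; columns and values are illustrative only.
df = pd.DataFrame({
    "user_id": [1, 2, None, 4],
    "watch_minutes": [42, -5, 30, 9999],
    "region": ["US", "JP", "US", "DE"],
})

def nutrition_checks(frame: pd.DataFrame) -> dict[str, bool]:
    """Run basic hygiene checks; each maps to an 'expectation' in tools like Great Expectations."""
    return {
        "no_null_user_ids": bool(frame["user_id"].notna().all()),
        "watch_minutes_in_range": bool(frame["watch_minutes"].between(0, 1440).all()),
        "region_in_allowed_set": bool(frame["region"].isin({"US", "JP", "DE", "GB"}).all()),
    }

results = nutrition_checks(df)
failed = [name for name, passed in results.items() if not passed]
if failed:
    # In production this would block the pipeline or alert the data team.
    print(f"Quality gate failed: {failed}")
```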

Step 2: Implement hybrid governance models

Merge “all-you-can-eat” agility (e.g., cloud data lakes) with “farm-to-table” rigor (e.g., GDPR-compliant pipelines) for balanced data management. Unilever’s (NYSE: UL) hybrid model reduced supply chain data errors by 40% while maintaining the flexibility needed for market-specific insights.
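
A minimal sketch of how that hybrid split might look in code, assuming a hypothetical rule that personally identifiable fields take the strict “farm-to-table” path while everything else flows straight into the lake; the field names and pipeline labels are illustrative only:

```python
# Hypothetical list of fields that must take the strict, GDPR-compliant path.
PII_FIELDS = {"email", "date_of_birth", "national_id"}

def route_field(name: str) -> str:
    """Send sensitive fields through rigorous validation; let the rest move fast."""
    return "strict_gdpr_pipeline" if name in PII_FIELDS else "data_lake_fast_path"

for field_name in ["email", "order_total", "national_id", "sku"]:
    print(field_name, "->", route_field(field_name))
```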

Step 3: Standardize ‘data nutrition labels’

Align labels with industry benchmarks (e.g., Data Management Capability Assessment Model (DCAM) for financial services) to create universal understanding. Include data freshness, source reliability, bias risk assessment, and compliance status indicators.
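
As a sketch, such a label can travel with a dataset as a small structured document. The fields below mirror the indicators just listed; the dataset name and values are hypothetical:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DataNutritionLabel:
    """One label per dataset; field meanings mirror the indicators named above."""
    dataset: str
    freshness_hours: float    # time since the last successful refresh
    source_reliability: str   # e.g. "verified", "self-reported", "unknown"
    bias_risk: str            # outcome of a bias risk assessment
    compliance_status: str    # e.g. "GDPR-compliant", "under review"

label = DataNutritionLabel(
    dataset="trades_eu_daily",
    freshness_hours=6.5,
    source_reliability="verified",
    bias_risk="low",
    compliance_status="GDPR-compliant",
)
print(json.dumps(asdict(label), indent=2))  # the label ships alongside the data as JSON
```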

Bridging the cultural divide in data management

Global enterprises face fundamentally conflicting data philosophies that mirror cultural approaches to food:

West: Rapid ingestion, legacy system reliance, and quantity-first approaches.

East: Precision-centric methodologies with meticulous validation, but slower to scale.

So what’s the solution? Adopt a “fusion cuisine” strategy that leverages the strengths of both approaches:

Use Application Programming Interfaces (APIs) and, more recently, the Model Context Protocol (MCP) to harmonize legacy systems (SAP) with modern data warehouses (Snowflake, Google’s (NASDAQ: GOOGL) BigQuery) for seamless integration.

Deploy region-specific governance tiers—e.g., stricter provenance tracking in European Union hubs while maintaining agility in developing markets.
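
A minimal sketch of what such governance tiers might look like as configuration; the regions, control names, and thresholds are assumptions for illustration, not regulatory guidance:

```python
# Hypothetical tier table: each region maps to the controls its data must clear.
GOVERNANCE_TIERS = {
    "EU": {"provenance_tracking": True, "pii_masking": True, "max_staleness_hours": 24},
    "US": {"provenance_tracking": True, "pii_masking": False, "max_staleness_hours": 72},
    "APAC_EMERGING": {"provenance_tracking": False, "pii_masking": False, "max_staleness_hours": 168},
}

def controls_for(region: str) -> dict:
    """Fall back to the strictest tier when a region is unknown."""
    return GOVERNANCE_TIERS.get(region, GOVERNANCE_TIERS["EU"])

print(controls_for("EU"))
print(controls_for("LATAM"))  # unknown region defaults to the strict EU tier
```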

The missing link: Operationalizing data quality

Transitioning from a quantity-first to a quality-first mindset requires operational discipline that many organizations lack. The pathway forward includes:

Phased Migration: Begin with non-critical datasets and employ tools like AWS Glue or Talend for low-risk ETL processes before tackling mission-critical data.

ROI Metrics: Track project success via reduced preprocessing time and enhanced model accuracy. For instance, Toyota (NYSE: TM) cut its AI training costs by 30% following its data quality migration initiative.
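
To make the ROI tracking tangible, here is a small sketch that compares baseline and post-migration pipeline metrics; all numbers are placeholders to be replaced with your own measurements:

```python
# Illustrative before/after measurements; lower is better for hours and retrains,
# higher is better for accuracy.
baseline = {"preprocessing_hours": 120.0, "model_accuracy": 0.81, "retrains_per_month": 6}
current = {"preprocessing_hours": 78.0, "model_accuracy": 0.86, "retrains_per_month": 4}

def pct_change(before: float, after: float) -> float:
    """Percentage change from the baseline value."""
    return (after - before) / before * 100

for metric in baseline:
    print(f"{metric}: {pct_change(baseline[metric], current[metric]):+.1f}%")
```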

The future: AI-generated ‘data nutrition labels’

To truly innovate beyond the framework’s foundations, leverage generative AI to automatically create nutrition labels, flag potential biases, and suggest enrichment opportunities. IBM’s (NYSE: IBM) Watson, for example, now audits healthcare datasets for demographic representation gaps, helping address potential bias before models are trained.
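
As an illustration of the underlying check (without any claim about how Watson implements it), here is a minimal sketch that flags demographic representation gaps by comparing a dataset’s group shares against reference population shares; the records, shares, and threshold are hypothetical:

```python
from collections import Counter

# Hypothetical patient records and census shares; both are illustrative only.
records = ["F", "F", "M", "F", "M", "F", "F", "M", "F", "F"]
reference_share = {"F": 0.51, "M": 0.49}
GAP_THRESHOLD = 0.10  # flag groups off by more than 10 percentage points

counts = Counter(records)
total = len(records)
for group, expected in reference_share.items():
    observed = counts.get(group, 0) / total
    if abs(observed - expected) > GAP_THRESHOLD:
        print(f"Representation gap for {group}: observed {observed:.0%} vs expected {expected:.0%}")
```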

Conclusion: Your data quality roadmap

The “Treat Data Like Food” framework offers a compelling starting point, but enterprise leaders must extend it by:

Automating quality checks and lineage tracking across the data lifecycle.

Customizing governance approaches for regional needs and regulatory environments.

Measuring success through reduced latency, error rates, and model retraining frequency.

By addressing scalability challenges, cultural differences, and automation opportunities, you transform this framework from conceptual to operational—positioning your AI initiatives for success where 78% currently fail.

Ready to get started? Download my free Data Nutrition Label Template to begin auditing your AI inputs today and take the first step toward data quality that nourishes rather than contaminates your AI ecosystem.

For artificial intelligence (AI) to work within the law and thrive in the face of growing challenges, it needs to integrate an enterprise blockchain system that ensures data input quality and ownership, keeping data safe while guaranteeing its immutability. Check out CoinGeek’s coverage of this emerging tech to learn more about why enterprise blockchain will be the backbone of AI.

Watch: Utilizing blockchain tech for data integrity
