
Generative AI is only as good as the data you feed it
Your data is your competitive advantage

Taylor Brown
5th March 2024

Generative AI is the pinnacle of data science. It will boost profits, reduce costs, and help you expand into new markets. To take full advantage of generative AI’s capabilities, train your models on all your data.

The world is being transformed by AI-assisted medicine, education, scientific research, law, and more. Today, researchers at the University of Toronto use generative AI to model proteins that don’t exist in nature; pharmaceutical giant Bayer now uses generative AI to accelerate the process of drug discovery; and education provider Khan Academy has developed an AI chatbot/tutor, Khanmigo, to personalize learning. And with each passing day, the list of AI use cases across all industries only continues to grow.

According to the Capgemini Research Institute, nearly all executives (96 percent) cite generative AI as a hot topic of discussion in their boardrooms. Generative AI is more than an aid for surfacing information the way a search engine does: organizations can combine their proprietary data with foundation models that have been pre-trained on a broad base of public data to create a sustainable competitive advantage.

Generative AI then becomes the most knowledgeable entity within your organization.

However, as with all analytics, generative AI is only as good as its data. To fully leverage AI, an organization needs a solid data foundation and organizational norms that facilitate responsible and effective use of data.

Data readiness for generative AI depends on two key elements:

  1. The ability to move and integrate data from databases, applications, and other sources in an automated, reliable, cost-effective, and secure manner
  2. Knowing, protecting, and accessing data through data governance

Automated data pipeline platforms, like Fivetran, allow enterprises to capture all of their data, irrespective of the source platform. These automated tools reduce the friction and overhead required to maintain the flow of data to continuously train generative AI applications.


To operationalize generative AI effectively, organizations must establish a solid foundation of automated, reliable, and well-governed data operations. Generative AI requires a modern and scalable data infrastructure that can continuously integrate and centralize data from a variety of sources, including both structured and semi-structured data.

However, as businesses start to operationalize generative AI, they may encounter a number of challenges.

  • Data quality and preparation: Generative AI models are only as good as the data they are trained on. It is important to ensure that the data is high-quality, clean, and well-organized. This includes identifying any potential biases in the data that may distort the outputs of any model trained on it.
  • Security and governance: Security and governance in the context of generative AI concern masking sensitive information, controlling data residency, controlling and monitoring access, and being able to track the provenance and lineage of data models.
  • User experience: It is important to design user interfaces that make it easy for people to interact with your models.
  • Scalability: It is important to choose a generative AI platform that can scale to meet your needs at a reasonable cost.
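The data preparation and masking concerns above can be illustrated with a minimal sketch. It assumes text records held in memory; the patterns and the function names `mask_sensitive` and `prepare_records` are hypothetical, and a real pipeline would need far broader coverage of sensitive data than two regular expressions.

```python
import re

# Hypothetical patterns for illustration only; production masking needs
# broader coverage (names, addresses, account numbers, and so on).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_sensitive(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def prepare_records(records: list[str]) -> list[str]:
    """Normalize whitespace, drop empty and duplicate records, mask the rest."""
    seen = set()
    cleaned = []
    for record in records:
        normalized = " ".join(record.split())
        if not normalized:
            continue  # skip records that are empty or whitespace only
        masked = mask_sensitive(normalized)
        key = masked.lower()  # case-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(masked)
    return cleaned


if __name__ == "__main__":
    sample = [
        "Contact  jane@example.com now",
        "",
        "contact jane@example.com now",
        "Call +1 (555) 010-9999 today",
    ]
    for line in prepare_records(sample):
        print(line)
```

Masking before deduplication means two records that differ only in the sensitive value they contain collapse into one, which may or may not be desirable depending on the use case.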

Generative AI models are trained on massive datasets of text, code, images, or other media. Foundation models, which are off-the-shelf generative AI models pre-trained on large volumes of (usually public) data, may be specialized by industry or use case. Choosing the right foundation model can have a significant impact on performance and capabilities. For example, a foundation model that specializes in code generation will generally produce more complete and useful code than a model trained on a general text dataset. Other specialties of foundation models include sentiment analysis, geospatial analysis, image generation, and audio generation.

While you can easily make use of pre-trained, publicly available AI models, your data is a unique asset that differentiates your organization from the competition. To make the most of it, you must additionally supply foundation models with your business’s unique context.
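One way to supply that context, shown here only as a rough sketch, is to retrieve relevant internal documents at question time and place them in the prompt sent to a foundation model. This is retrieval at prompt time rather than training or fine-tuning, and the word-overlap scoring, the function names `score` and `build_prompt`, and the prompt wording are all assumptions made for illustration.

```python
def score(question: str, document: str) -> int:
    """Count distinct question words that also appear in the document."""
    question_words = set(question.lower().split())
    document_words = set(document.lower().split())
    return len(question_words & document_words)


def build_prompt(question: str, documents: list[str], top_k: int = 2) -> str:
    """Pick the top_k most overlapping documents and assemble a prompt."""
    ranked = sorted(documents, key=lambda d: score(question, d), reverse=True)
    context = "\n".join(f"- {doc}" for doc in ranked[:top_k])
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    docs = [
        "The lease for Tower A ends in 2027",
        "Cafeteria opens at 8",
        "Tower A lease includes parking",
    ]
    print(build_prompt("when does the tower a lease end", docs))
```

A production system would more likely rank documents with embeddings and a vector index, but the shape is the same: the quality of the answer depends on the internal data that can be found and supplied.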

With access to your organization’s accumulated data, a properly tuned generative AI model can become the most knowledgeable member of your organization, assisting with analytics, customer assistance, sales and marketing, software engineering, and even product ideation.

The Fivetran product team leverages generative AI and natural language processing technologies to develop Fivetran Lite Connectors in a fraction of the time of Fivetran’s standard connectors, while ensuring the same high quality, data integrity, and security customers expect from Fivetran.

In addition, several notable organizations have already found practical ways to use generative AI. Global commercial real estate and investment management company JLL recently rolled out a proprietary large language model that employees access through a natural language interface to get quick answers to questions about topics such as an office building’s leasing terms. Similarly, the US motor club AAA now uses generative AI to help agents quickly answer questions from customers. Of the 100 tech companies profiled in the Forbes Cloud 100, more than half use generative AI.

According to Carrie Tharp, VP Strategic Industries, Google Cloud, “Generative AI opens up a new avenue, allowing people to think differently about how business works. Whereas AI and ML were more about productivity and efficiency – doing things smarter and faster than before – now it’s about ‘I can do it completely differently than before.’”

Until enterprises get the data right, the nirvana of asking generative AI app-specific and contextual organizational questions in a “Siri-like” way will remain elusive. Get the data right, and it opens up possibilities for all analytics workloads, including generative AI and LLMs.

To make full use of an ever-expanding roster of powerful foundation models, you must first ensure the integrity, accessibility and governance of your own data. Your journey into generative AI and the innovation and change it can bring will be fueled by high-quality, usable, trusted data built on automated, self-healing pipelines.




Operationalization begins with centralizing data and modernizing the data stack to include all available data.


By automating data pipelines, enterprises can focus on improving data models and algorithms to accelerate the efficacy and ROI of investing in a generative AI application.


Generative AI trained on your data will provide insights and guidance driven by your data, creating a unique competitive advantage that cannot be replicated.

Interesting read? Capgemini’s Innovation publication, Data-powered Innovation Review | Wave 7, features 16 such articles, crafted by leading experts from Capgemini and partners like Aible, the Green Software Foundation, and Fivetran. Discover groundbreaking advancements in data-powered innovation, explore the broader applications of AI beyond language models, and learn how data and AI can contribute to creating a more sustainable planet and society.


Taylor Brown

COO and Co-founder, Fivetran
As COO and co-founder, Taylor has helped build Fivetran, the industry leader in data integration, from an idea to a rapidly growing global business valued at more than $5.6 billion. He believes that magic happens when you can build a simple yet powerful product that is truly innovative and helps users solve a hard problem.