{"id":569332,"date":"2026-05-19T16:31:05","date_gmt":"2026-05-19T14:31:05","guid":{"rendered":"https:\/\/www.capgemini.com\/se-en\/?p=569332"},"modified":"2026-05-19T16:31:56","modified_gmt":"2026-05-19T14:31:56","slug":"strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy","status":"publish","type":"post","link":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/","title":{"rendered":"Strange folders in the cloud &#8211; Distributing PDFs to your LLM is no basis for an AI strategy"},"content":{"rendered":"\n<header class=\"wp-block-cg-blocks-hero-blogs header-hero-blogs\"><div class=\"container\"><div class=\"hero-blogs\"><div class=\"hero-blogs-content-wrapper\"><div class=\"row\"><div class=\"col-12\"><div class=\"header-title\"><h1><strong><strong>Strange folders in the cloud &#8211; Distributing PDFs to your LLM is no basis for an AI strategy<\/strong>\u00a0<\/strong><\/h1><\/div><\/div><\/div><\/div><div class=\"hero-blogs-bottom\"><div class=\"header-author\"><div class=\"author-img\"><img decoding=\"async\" src=\"\/wp-content\/themes\/capgemini2025\/assets\/images\/cg-logo-white.svg?w=200&amp;quality=10\" alt=\"\" loading=\"lazy\"\/><\/div><div class=\"author-name-date\"><h5 class=\"author-name\">Capgemini<\/h5><h5 class=\"blog-date\">May 19, 2026<\/h5><\/div><\/div><div class=\"brand-image\"><\/div><\/div><\/div><\/div><\/header>\n\n\n\n<section class=\"wp-block-cg-blocks-group section section--article-content\"><div class=\"article-main-content\"><div class=\"container\"><div class=\"grid-container\"><div class=\"col-12 col-md-2\"><nav class=\"article-social\"><ul class=\"social-nav\"><li class=\"ip-order-fb\"><a href=\"https:\/\/www.facebook.com\/sharer\/sharer.php?u=https:\/\/www.capgemini.com\/se-en\/?p=569332\" target=\"_blank\" rel=\"noopener noreferrer\" title=\"opens in a new window\"><i 
aria-hidden=\"true\" class=\"icon-fb\"><\/i><span class=\"sr-only\">Facebook<\/span><\/a><\/li><li class=\"ip-order-li\"><a href=\"https:\/\/www.linkedin.com\/sharing\/share-offsite\/?url=https:\/\/www.capgemini.com\/se-en\/?p=569332\" target=\"_blank\" rel=\"noopener noreferrer\" title=\"opens in a new window\"><i aria-hidden=\"true\" class=\"icon-li\"><\/i><span class=\"sr-only\">Linkedin<\/span><\/a><\/li><\/ul><\/nav><\/div><div><div class=\"article-text article-quote-text\">\n<h2 class=\"wp-block-heading\" id=\"h-in-the-era-of-ai-driven-decision-making-organizations-are-tempted-to-shortcut-their-data-strategy-by-feeding-large-language-models-llms-with-unstructured-documents-such-as-pdfs-while-this-approach-may-seem-convenient-it-often-results-in-poor-performance-hallucinated-outputs-and-a-lack-of-meaningful-insight-throwing-pdfs-into-an-llm-is-not-a-valid-data-strategy\"><strong>In the era of AI-driven decision making, organizations are tempted to shortcut their data strategy by feeding large language models (LLMs) with unstructured documents such as PDFs. While this approach may seem convenient, it often results in poor performance, hallucinated outputs, and a lack of meaningful insight. Throwing PDFs into an LLM is not a valid data strategy.<\/strong><\/h2>\n\n\n\n<p>We argue that semantic enrichment and domain-specific context are essential components of a scalable strategy. A key enabler in this transformation is the use of knowledge graphs, which organize information into interconnected entities and relationships. By grounding LLMs in structured, contextualized knowledge, organizations can drastically reduce hallucinations and improve the relevance and accuracy of AI-generated responses. 
We explore how knowledge graphs bridge the gap between raw data and intelligent reasoning, offering a foundation for trustworthy, enterprise-grade AI systems.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"has-pale-cyan-blue-color has-text-color has-link-color wp-elements-4adaf8233a36705e5c44f7b5908cf075\"><strong><em>\u201cThrowing documents at an LLM is like swapping the idol for a bag of sand \u2013 looks clever, but you\u2019re still triggering all the traps. Structure your knowledge if you want to escape with the prize.\u201d<\/em><\/strong><\/p>\n<\/blockquote>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em><strong>\u201cA little error in the beginning leads to a great one in the end.\u201d <\/strong><\/em>\u2013 St. Thomas Aquinas<\/p>\n<\/blockquote>\n\n\n\n<p>When we provide an LLM with information as text (either the whole document attached to the prompt, or matching chunks retrieved through retrieval-augmented generation, RAG), we do reduce the risk of hallucination: the model\u2019s responses stay somewhat closer to the source material. But let\u2019s be clear: the LLM is still not using the information. It interprets the text through its world model (which is still just a language model) and generates output from statistical patterns.<\/p>\n\n\n\n<p>That means the risk of hallucination never truly disappears. Even when the system returns a reference to the document chunk it used as input, the burden of validation remains with you. 
You must read the referenced passage, interpret it, and verify that the model reached the same conclusion you would. Ouch. Every answer is still a gamble: the process does not scale, cannot be automated, and is not fit for agentic workflows.<\/p>\n\n\n\n<p>The alternative is to <strong>remove the knowledge domain from the LLM entirely<\/strong>. Let the LLM do what it excels at: understanding natural-language questions and translating them into structured queries. But do not let it invent the answers. Instead, answers should come from a deterministic, rule-based, yet semantically accessible data layer \u2013 a knowledge graph or equivalent structured representation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-the-reality-check-testing-ai-with-structured-knowledge\"><strong>The reality check: Testing AI with structured knowledge<\/strong><\/h3>\n\n\n\n<p>To make this concrete, we set out to test a hypothesis: using an LLM to generate a knowledge model, validating that model manually, and then using an LLM to query it for the data needed to answer a question is superior to letting the LLM ingest and assess the text directly.<\/p>\n\n\n\n<p>To take an example: if you feed the annual report of a major car manufacturer into a cloud LLM and ask about figures or strategy, you\u2019ll often get decent answers. Today\u2019s large models can parse tables, charts, even images. 
But you\u2019ll still feel the need to double-check every response \u2013 which is exactly the point of this article. To make the effect easier to see (and to avoid wading through 14 megabytes of corporate boilerplate), we used a short excerpt with a diminutive local model, where hallucinations are easier to spot.<\/p>\n\n\n\n<p>We tested a local LLM (Gemma 3 1B) with a short excerpt from MegaCorp\u2019s annual report.<br><br><strong>The prompt: \u201cShow all subsidiaries with ESG contributions\u201d<\/strong><br><br>The reply came back smoothly:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"889\" height=\"353\" src=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture2-e1779183591736.png\" alt=\"\" class=\"wp-image-569333\" srcset=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture2-e1779183591736.png 889w, https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture2-e1779183591736.png?resize=300,119 300w, https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture2-e1779183591736.png?resize=768,305 768w\" sizes=\"auto, (max-width: 889px) 100vw, 889px\" \/><\/figure>\n\n\n\n<p>Convincing \u2013 but wrong. SubZ had no ESG contributions in the source. The model simply invented a plausible answer.<\/p>\n\n\n\n<p>Next, we asked the LLM to build a graph from the same document: nodes, edges, nothing else. After validating that graph, we let the LLM query it. This time it issued a Cypher query and returned only the subsidiaries with actual ESG contributions. No hallucinated SubZ. As a bonus, instead of getting merely a reference to a text chunk, you can easily have the application return the query itself (Cypher\/SPARQL) alongside the result.<\/p>\n\n\n\n<p>This illustrates the key point: <strong>When you query a document index, you validate every answer. When you query a graph, you validate the graph once.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-conclusion\"><strong>Conclusion<\/strong><\/h3>\n\n\n\n<p>At its heart, this isn\u2019t just about documents versus graphs. It is about decoupling the LLM from the knowledge domain, and about where you place the burden of validation.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Document-first RAG forces validation at the very end of the pipeline \u2013 every answer must be checked, every hallucination caught, every gap patched. It scales poorly, exhausts human oversight, and cannot safely be handed off to autonomous agents.<\/li>\n\n\n\n<li>An approach with a separate data and knowledge model moves validation upstream. Once information is structured, linked, and governed in a graph, the downstream systems \u2013 LLMs, agents, dashboards \u2013 can operate reliably on a deterministic knowledge base.<\/li>\n<\/ul>\n\n\n\n<p>This shift enables modularity. In a world of autonomous agents and automated decision-making, you want each layer of your architecture to do one thing well and pass on clean, validated outputs. Ontology evolution, knowledge ingestion, reasoning, and generation become separate modules \u2013 each improvable, replaceable, and automatable.<\/p>\n\n\n\n<p>The principle is simple: <strong>validate early, scale later.<\/strong> By moving the validation point as far upstream as possible, you gain the ability to automate reasoning chains, orchestrate agentic frameworks, and build enterprise AI systems that are not just powerful, but trustworthy.<\/p>\n\n\n\n<p>Throwing PDFs at an LLM may give you an answer. 
Building a modular knowledge model gives you a system.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1021\" height=\"761\" src=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture1.png\" alt=\"\" class=\"wp-image-569335\" srcset=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture1.png 1021w, https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture1.png?resize=300,224 300w, https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture1.png?resize=768,572 768w\" sizes=\"auto, (max-width: 1021px) 100vw, 1021px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"682\" height=\"686\" src=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture3.png?w=682\" alt=\"\" class=\"wp-image-569334\" style=\"width:682px;height:auto\" srcset=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture3.png 682w, https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture3.png?resize=150,150 150w, https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture3.png?resize=298,300 298w\" sizes=\"auto, (max-width: 682px) 100vw, 682px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"970\" height=\"360\" src=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture4-copy-e1779192297781.png\" alt=\"\" class=\"wp-image-569352\" srcset=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture4-copy-e1779192297781.png 970w, https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture4-copy-e1779192297781.png?resize=300,111 300w, 
https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/Picture4-copy-e1779192297781.png?resize=768,285 768w\" sizes=\"auto, (max-width: 970px) 100vw, 970px\" \/><\/figure>\n\n\n\n<div style=\"height:32px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-key-take-aways\">Key takeaways<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Stop dumping PDFs into your LLM<\/strong><br>Treat unstructured documents as raw input, not a ready-made knowledge layer.<\/li>\n\n\n\n<li><strong>Validate upstream<\/strong><br>Structure and enrich your data \u2013 supported by LLMs \u2013 into a knowledge graph, so hallucinations are caught once, not every time you query.<\/li>\n\n\n\n<li><strong>Let LLMs translate, not invent<\/strong><br>Use them for natural-language understanding and query generation \u2013 but keep answers grounded in deterministic, governed data.<\/li>\n<\/ul>\n\n\n\n<p>Read the full report: <strong><a href=\"https:\/\/www.capgemini.com\/se-en\/insights\/research-library\/data-powered-innovation-review-wave-11\/\">Data-powered innovation review &#8211; wave 11<\/a><\/strong><\/p>\n<\/div><\/div><\/div><\/div><\/div><\/section>\n\n\n\n<section class=\" section section--expert-slider wrapper-people-slider wp-block-cg-blocks-wrapper-people-slider\"><div class=\"container\"><div class=\"row\"><div class=\"content-title col-12 col-md-8\"><h2 data-maxlength=\"34\" class=\"people-heading-title\">Meet Our Authors<\/h2><\/div><\/div><\/div><div class=\"slider slider-boxed\"><div class=\"container\"><div class=\"slider-window\"><div class=\"slider-list\">\t\t<div class=\"slide\">\n\t\t\t<div class=\"box\">\n\t\t\t\t<div class=\"row\">\n\t\t\t\t\t<div class=\"col-md-6 col-lg-4 box-img-wrapper\">\n\t\t\t\t\t\t<img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2024\/09\/Joakim-Nilsson-high-res.jpg\" alt=\"Joakim 
Nilsson\"\/>\n\t\t\t\t\t<\/div>\n\t\t\t\t\t<div class=\"col-md-6 col-lg-8 box-inner\">\n\t\t\t\t\t\t<div class=\"row title-social-media-header\">\n\t\t\t\t\t\t\t<div class=\"col-md-12 col-lg-6 mbl-social-icon\">\n\t\t\t\t\t\t\t\t<ul class=\"social-nav\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<li><a aria-label=\"Linkedin\" target=\"_blank\" title=\"Opens in a new window\" href=\"https:\/\/www.linkedin.com\/in\/joakim-nilsson-866169180\/\"><i aria-hidden=\"true\" class=\"icon-li\"><\/i><\/a><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/ul>\n\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<div class=\"col-md-12 col-lg-6 box-container\">\n\t\t\t\t\t\t\t\t<div class=\"box-title\">\n\t\t\t\t\t\t\t\t\t<h3 class=\"people-profile-title\">Joakim Nilsson<\/h3>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<span>Knowledge Graph Lead, Insights &amp; Data, Client Partner Lead &#8211; Neo4j Europe, Capgemini\u00a0<\/span>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<div class=\"col-md-12 col-lg-6 social-box-container dkt-social-icon\">\n\t\t\t\t\t\t\t\t<ul class=\"social-nav\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<li><a aria-label=\"Linkedin\" target=\"_blank\" title=\"Opens in a new window\" href=\"https:\/\/www.linkedin.com\/in\/joakim-nilsson-866169180\/\"><i aria-hidden=\"true\" class=\"icon-li\"><\/i><\/a><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/ul>\n\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"people-info\">Joakim is part of both the Swedish and European CTO office where he drives the expansion of Knowledge Graphs forward. 
He is also client partner lead for Neo4j in Europe and has experience running Knowledge Graph projects as a consultant both for Capgemini and Neo4j, both in private and public sector &#8211; in Sweden and abroad.<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\n\n\t\t<div class=\"slide\">\n\t\t\t<div class=\"box\">\n\t\t\t\t<div class=\"row\">\n\t\t\t\t\t<div class=\"col-md-6 col-lg-4 box-img-wrapper\">\n\t\t\t\t\t\t<img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2025\/03\/Johan-Mullern-Aspegren.jpg\" alt=\"Johan M\u00fcllern-Aspegren\"\/>\n\t\t\t\t\t<\/div>\n\t\t\t\t\t<div class=\"col-md-6 col-lg-8 box-inner\">\n\t\t\t\t\t\t<div class=\"row title-social-media-header\">\n\t\t\t\t\t\t\t<div class=\"col-md-12 col-lg-6 mbl-social-icon\">\n\t\t\t\t\t\t\t\t<ul class=\"social-nav\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<li><a aria-label=\"Email\" href=\"mailto:johan.mullern-aspegren@capgemini.com\"><i aria-hidden=\"true\" class=\"mail-ico\"><\/i><\/a><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<li><a aria-label=\"Linkedin\" target=\"_blank\" title=\"Opens in a new window\" href=\"https:\/\/www.linkedin.com\/in\/johan-mullern-aspegren\/\"><i aria-hidden=\"true\" class=\"icon-li\"><\/i><\/a><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/ul>\n\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<div class=\"col-md-12 col-lg-6 box-container\">\n\t\t\t\t\t\t\t\t<div class=\"box-title\">\n\t\t\t\t\t\t\t\t\t<h3 class=\"people-profile-title\">Johan M\u00fcllern-Aspegren<\/h3>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<span>Emerging Tech Lead, Applied Innovation Exchange Nordics, and Core Member of AI Futures Lab, Capgemini<\/span>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<div class=\"col-md-12 col-lg-6 social-box-container dkt-social-icon\">\n\t\t\t\t\t\t\t\t<ul 
class=\"social-nav\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<li><a aria-label=\"Email\" href=\"mailto:johan.mullern-aspegren@capgemini.com\"><i aria-hidden=\"true\" class=\"mail-ico\"><\/i><\/a><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<li><a aria-label=\"Linkedin\" target=\"_blank\" title=\"Opens in a new window\" href=\"https:\/\/www.linkedin.com\/in\/johan-mullern-aspegren\/\"><i aria-hidden=\"true\" class=\"icon-li\"><\/i><\/a><\/li>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/ul>\n\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"people-info\">Johan M\u00fcllern-Aspegren is Emerging Tech Lead at the Applied Innovation Exchange (AIE) Nordics, where he explores, drives and applies innovation, helping organizations navigate emerging technologies and transform them into strategic opportunities. He is also part of Capgemini\u2019s AI Futures Lab, a global centre for AI research and innovation, where he collaborates with industry and academic partners to push the boundaries of AI development and understanding.<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div><\/div><\/div><div class=\"slider-nav\"><button class=\"slider-prev inactive\" aria-label=\"Slider-previous\" tabindex=\"-1\"><\/button><ul class=\"slider-paginator\"><\/ul><button class=\"slider-next\" aria-label=\"Slider-next\"><\/button><\/div><\/div><\/section>\n","protected":false},"excerpt":{"rendered":"<p>In the era of AI-driven decision making, organizations are tempted to shortcut their data strategy by feeding large language models (LLMs) with unstructured documents such as PDFs. While this approach may seem convenient, it often results in poor performance, hallucinated outputs, and a lack of meaningful insight. Throwing PDFs into an LLM is not a valid data strategy. 
<\/p>\n","protected":false},"author":324,"featured_media":569337,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"cg_dt_proposed_to":[],"cg_seo_hreflang_relations":"[]","cg_seo_canonical_relation":"","cg_seo_hreflang_x_default_relation":"","cg_dt_approved_content":true,"cg_dt_mandatory_content":false,"cg_dt_notes":"","cg_dg_source_changed":false,"cg_dt_link_disabled":false,"_yoast_wpseo_primary_brand":"420","_jetpack_memberships_contains_paid_content":false,"footnotes":"","featured_focal_points":""},"categories":[1],"tags":[],"brand":[420],"service":[],"industry":[],"partners":[],"blog-topic":[86],"content-group":[],"class_list":["post-569332","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","brand-capgemini","blog-topic-data-and-ai"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v22.8 (Yoast SEO v22.8) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Strange folders in the cloud - Distributing PDFs to your LLM is no basis for an AI strategy - Capgemini Sweden<\/title>\n<meta name=\"description\" content=\"Throwing PDFs into an LLM might feel like progress, but without a proper data strategy, it results in hallucinations, weak performance, and missed value.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Strange folders in the cloud - Distributing PDFs to your LLM is no basis for an AI strategy\" \/>\n<meta property=\"og:description\" content=\"Throwing PDFs into an LLM might feel like progress, but 
without a proper data strategy, it results in hallucinations, weak performance, and missed value.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/\" \/>\n<meta property=\"og:site_name\" content=\"Capgemini Sweden\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-19T14:31:05+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-19T14:31:56+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Capgemini\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"santanughosh\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/\",\"url\":\"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/\",\"name\":\"Strange folders in the cloud - Distributing PDFs to your LLM is no basis for an AI strategy - Capgemini Sweden\",\"isPartOf\":{\"@id\":\"https:\/\/www.capgemini.com\/se-en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg\",\"datePublished\":\"2026-05-19T14:31:05+00:00\",\"dateModified\":\"2026-05-19T14:31:56+00:00\",\"author\":{\"@id\":\"https:\/\/www.capgemini.com\/se-en\/#\/schema\/person\/d59b76baf50cb949bddc370f7a57f144\"},\"description\":\"Throwing PDFs into an LLM might feel like progress, but without a proper data strategy, it results in hallucinations, weak performance, and missed 
value.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/#primaryimage\",\"url\":\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg\",\"contentUrl\":\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg\",\"width\":1200,\"height\":630},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.capgemini.com\/se-en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Strange folders in the cloud &#8211; Distributing PDFs to your LLM is no basis for an AI strategy\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.capgemini.com\/se-en\/#website\",\"url\":\"https:\/\/www.capgemini.com\/se-en\/\",\"name\":\"Capgemini Sweden\",\"description\":\"Capgemini\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.capgemini.com\/se-en\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.capgemini.com\/se-en\/#\/schema\/person\/d59b76baf50cb949bddc370f7a57f144\",\"name\":\"santanughosh\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.capgemini.com\/se-en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/b9a8a2aa4e433679f9b84dcf0bb036cf744b205a3345a9afb930ecd08b809f90?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/b9a8a2aa4e433679f9b84dcf0bb036cf744b205a3345a9afb930ecd08b809f90?s=96&d=mm&r=g\",\"caption\":\"santanughosh\"},\"url\":\"https:\/\/www.capgemini.com\/se-en\/author\/santanughosh\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Strange folders in the cloud - Distributing PDFs to your LLM is no basis for an AI strategy - Capgemini Sweden","description":"Throwing PDFs into an LLM might feel like progress, but without a proper data strategy, it results in hallucinations, weak performance, and missed value.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/","og_locale":"en_US","og_type":"article","og_title":"Strange folders in the cloud - Distributing PDFs to your LLM is no basis for an AI strategy","og_description":"Throwing PDFs into an LLM might feel like progress, but without a proper data strategy, it results in hallucinations, weak performance, and missed value.","og_url":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/","og_site_name":"Capgemini 
Sweden","article_published_time":"2026-05-19T14:31:05+00:00","article_modified_time":"2026-05-19T14:31:56+00:00","og_image":[{"width":1200,"height":630,"url":"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg","type":"image\/jpeg"}],"author":"Capgemini","twitter_card":"summary_large_image","twitter_misc":{"Written by":"santanughosh","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/","url":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/","name":"Strange folders in the cloud - Distributing PDFs to your LLM is no basis for an AI strategy - Capgemini Sweden","isPartOf":{"@id":"https:\/\/www.capgemini.com\/se-en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/#primaryimage"},"image":{"@id":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/#primaryimage"},"thumbnailUrl":"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg","datePublished":"2026-05-19T14:31:05+00:00","dateModified":"2026-05-19T14:31:56+00:00","author":{"@id":"https:\/\/www.capgemini.com\/se-en\/#\/schema\/person\/d59b76baf50cb949bddc370f7a57f144"},"description":"Throwing PDFs into an LLM might feel like progress, but without a proper data strategy, it results in hallucinations, weak performance, and missed 
value.","breadcrumb":{"@id":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/#primaryimage","url":"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg","contentUrl":"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg","width":1200,"height":630},{"@type":"BreadcrumbList","@id":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.capgemini.com\/se-en\/"},{"@type":"ListItem","position":2,"name":"Strange folders in the cloud &#8211; Distributing PDFs to your LLM is no basis for an AI strategy"}]},{"@type":"WebSite","@id":"https:\/\/www.capgemini.com\/se-en\/#website","url":"https:\/\/www.capgemini.com\/se-en\/","name":"Capgemini Sweden","description":"Capgemini","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.capgemini.com\/se-en\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.capgemini.com\/se-en\/#\/schema\/person\/d59b76baf50cb949bddc370f7a57f144","name":"santanughosh","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.capgemini.com\/se-en\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b9a8a2aa4e433679f9b84dcf0bb036cf744b205a3345a9afb930ecd08b809f90?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b9a8a2aa4e433679f9b84dcf0bb036cf744b205a3345a9afb930ecd08b809f90?s=96&d=mm&r=g","caption":"santanughosh"},"url":"https:\/\/www.capgemini.com\/se-en\/author\/santanughosh\/"}]}},"blog_topic_info":[{"id":86,"name":"Data and AI"}],"taxonomy_info":{"category":[{"id":1,"name":"Uncategorized","slug":"uncategorized"}],"brand":[{"id":420,"name":"Capgemini","slug":"capgemini"}],"blog-topic":[{"id":86,"name":"Data and AI","slug":"data-and-ai"}],"following_users":[{"id":143,"name":"santanughosh","slug":"santanughosh"}]},"parsely":{"version":"1.1.0","canonical_url":"https:\/\/capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/","smart_links":{"inbound":0,"outbound":0},"traffic_boost_suggestions_count":0,"meta":{"@context":"https:\/\/schema.org","@type":"NewsArticle","headline":"Strange folders in the cloud &#8211; Distributing PDFs to your LLM is no basis for an AI 
strategy","url":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/","mainEntityOfPage":{"@type":"WebPage","@id":"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/"},"thumbnailUrl":"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg?w=150&h=150&crop=1","image":{"@type":"ImageObject","url":"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg"},"articleSection":"Uncategorized","author":[],"creator":[],"publisher":{"@type":"Organization","name":"Capgemini Sweden","logo":""},"keywords":[],"dateCreated":"2026-05-19T14:31:05Z","datePublished":"2026-05-19T14:31:05Z","dateModified":"2026-05-19T14:31:56Z"},"rendered":"<meta name=\"parsely-title\" content=\"Strange folders in the cloud &#8211; Distributing PDFs to your LLM is no basis for an AI strategy\" \/>\n<meta name=\"parsely-link\" content=\"https:\/\/www.capgemini.com\/se-en\/insights\/expert-perspectives\/strange-folders-in-the-cloud-distributing-pdfs-to-your-llm-is-no-basis-for-an-ai-strategy\/\" \/>\n<meta name=\"parsely-type\" content=\"post\" \/>\n<meta name=\"parsely-image-url\" content=\"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg?w=150&amp;h=150&amp;crop=1\" \/>\n<meta name=\"parsely-pub-date\" content=\"2026-05-19T14:31:05Z\" \/>\n<meta name=\"parsely-section\" content=\"Uncategorized\" 
\/>","tracker_url":"https:\/\/cdn.parsely.com\/keys\/capgemini.com\/p.js"},"jetpack_featured_media_url":"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg","archive_status":false,"featured_image_src":"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg","featured_image_alt":"","jetpack_sharing_enabled":true,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Capgemini Sweden","distributor_original_site_url":"https:\/\/www.capgemini.com\/se-en","push-errors":false,"featured_image_url":"https:\/\/www.capgemini.com\/se-en\/wp-content\/uploads\/sites\/20\/2026\/05\/unsplash1200x630.jpg","author_title":"Capgemini","author_thumbnail_url":false,"author_thumbnail_alt":false,"_links":{"self":[{"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/posts\/569332","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/users\/324"}],"replies":[{"embeddable":true,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/comments?post=569332"}],"version-history":[{"count":10,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/posts\/569332\/revisions"}],"predecessor-version":[{"id":569355,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/posts\/569332\/revisions\/569355"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/media\/569337"}],"wp:attachment":[{"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/media?parent=569332"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/categories?post=569332"},{"taxonomy":"post_tag","embeddable":true,
"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/tags?post=569332"},{"taxonomy":"brand","embeddable":true,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/brand?post=569332"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/service?post=569332"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/industry?post=569332"},{"taxonomy":"partners","embeddable":true,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/partners?post=569332"},{"taxonomy":"blog-topic","embeddable":true,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/blog-topic?post=569332"},{"taxonomy":"content-group","embeddable":true,"href":"https:\/\/www.capgemini.com\/se-en\/wp-json\/wp\/v2\/content-group?post=569332"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}