Pre-training poisoning targets the initial training corpus. Because foundation models are trained on datasets assembled from web crawls, books, code repositories, and other public sources, an attacker who can publish content that gets included in a future crawl can influence what those models learn. The attack requires patience and volume. Web crawls return enormous amounts of data, so a single poisoned document has minimal impact. But a sustained campaign to seed specific content across many sources, or to position that content in high-weight sources, can produce detectable effects on model behavior.
Fine-tuning poisoning is operationally more accessible. Many organizations fine-tune foundation models on their own data to customize them for specific tasks. If an attacker can influence the fine-tuning dataset, through a supply chain compromise, a compromised data source, or a contribution to a shared dataset used for fine-tuning, they can introduce backdoors into the organization's custom model. The 2025 Lakera research documented a case where code comments on GitHub poisoned a fine-tuned model; when Deepseek's DeepThink-R1 was trained on contaminated repositories, it learned a backdoor that activated months later without any continued external access by the attacker.
RAG poisoning attacks the retrieval component rather than the model itself. In a RAG system, the model retrieves relevant documents from a knowledge base before generating its response. If an attacker can inject malicious content into the knowledge base, that content is retrieved as trusted context for future queries. This is not the same as poisoning training data, since the model's weights are unchanged, but the effect on outputs can be similar: the model generates responses influenced by the attacker's planted content, which users may accept as authoritative because it appears to come from the organization's own knowledge base.