How LLMs Unlearn


Or more accurately, can they?

How Do LLMs Know Anything?

Large Language Models (LLMs) “know” things by learning from vast datasets. How does that process actually work?

It begins with data collection, pulling information from a wide range of sources like books, websites, and social media. This raw data is then broken down into tokens—smaller chunks of text, such as words or phrases. These tokens are turned into numerical representations (embeddings), which allow the model to understand relationships between words.
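To make that concrete, here is a rough sketch of the text-to-tokens-to-embeddings pipeline. It uses the openly available GPT-2 tokenizer and model from the Hugging Face transformers library purely as an illustration; production LLMs use their own tokenizers and much larger embedding matrices.

```python
# Minimal sketch: text -> tokens -> embeddings, using GPT-2 only as an example.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

text = "Large language models learn from text."

# 1. Tokenization: the string is split into subword tokens, each mapped to an integer ID.
encoded = tokenizer(text, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0].tolist()))

# 2. Embedding: each token ID is looked up in the model's embedding matrix,
#    producing a dense vector that captures relationships between tokens.
embeddings = model.get_input_embeddings()(encoded["input_ids"])
print(embeddings.shape)  # (1, number_of_tokens, 768) for GPT-2 small
```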

Once tokenized and embedded, the model undergoes training. During training, the model learns to predict patterns, meaning, and context from billions of text samples. After this initial phase, fine-tuning can be applied. Fine-tuning homes in on specific tasks or corrects certain behaviors by adjusting the model’s parameters based on more targeted data. Through this cycle, LLMs build their “knowledge,” but as they accumulate data, there’s the growing challenge of dealing with private, inaccurate, or harmful information.
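As a heavily simplified illustration, here is what a single training step with the next-token-prediction objective looks like. Real pretraining repeats this over billions of tokens on large GPU clusters; the example sentence and the GPT-2 model are stand-ins.

```python
# Simplified sketch of one training step with the next-token-prediction objective.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer("The capital of France is Paris.", return_tensors="pt")

# With labels equal to the input IDs, the model computes cross-entropy loss
# for predicting each token from the tokens that come before it.
outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
outputs.loss.backward()   # gradients flow through all parameters
optimizer.step()          # parameters shift toward the observed pattern
optimizer.zero_grad()
```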


Why Forget?

The need for LLMs to forget has become more apparent as these models have scaled up. LLMs sometimes inadvertently process private information, which could present legal and ethical issues.

Misinformation can sneak into the training process, leading to outputs that are factually wrong or even dangerous. These models can amplify harmful content if they aren’t (successfully) trained to recognize and avoid it. “Jailbreaking” techniques, which try to push LLMs outside their intended use, are also rampant.

Given these risks, the ability for an LLM to unlearn or forget specific pieces of information is critical to ensure safety, accuracy, and privacy.


Current Methods of Having an LLM Forget

The most direct approach is retraining the model from scratch. While this method guarantees the removal of unwanted information, it’s extremely resource-intensive. Retraining requires massive amounts of computational power, time, and money, which makes it an impractical solution for frequent or minor corrections. It’s also a non-starter for anyone who buys a pre-trained model. (Many pre-trained models are also very secretive about their training data sources.)

Fine-tuning is another approach. Instead of starting over, fine-tuning allows developers to adjust the model’s knowledge by feeding it new data, ideally overwriting the incorrect or harmful information. However, fine-tuning doesn’t always effectively remove the old data—sometimes, the outdated information lingers beneath the surface, and in some cases new fine-tuning can undo earlier fine-tuning.
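To see why, here is a hedged sketch of what “overwriting by fine-tuning” looks like in practice: train on a corrected statement, then probe the model. The company, the fact, and the tiny training loop are all hypothetical; the point is that nothing in this process deletes the old association, it only nudges the weights.

```python
# Sketch: fine-tune on a "corrected" statement, then probe whether the old
# association still surfaces. Model, fact, and prompt are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

corrected = tokenizer("Acme Corp's CEO is Jane Doe.", return_tensors="pt")  # hypothetical fact

model.train()
for _ in range(10):  # a few passes over the replacement data
    loss = model(input_ids=corrected["input_ids"], labels=corrected["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Probe: the old completion may still appear, because fine-tuning adjusts
# weights rather than deleting what was learned before.
model.eval()
prompt = tokenizer("Acme Corp's CEO is", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(out[0]))
```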

And then there’s Retrieval-Augmented Generation (RAG). This method doesn’t actually erase data but works by giving priority to more relevant, up-to-date information when a response is generated. Essentially, RAG overrides the old information by emphasizing what’s currently most important, but the unwanted data still technically exists within the model’s parameters.
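Here is a minimal sketch of the RAG idea, with TF-IDF standing in for a real embedding-based retriever and a hard-coded document list standing in for a real knowledge base:

```python
# Minimal RAG sketch: retrieve the most relevant (and most current) documents,
# then place them in the prompt so they take priority over whatever the model
# memorized during training. The retriever and documents are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "2021 policy: remote work requires manager approval.",
    "2024 policy: remote work is allowed up to three days per week.",  # newer, preferred
    "Office locations: Berlin, Austin, and Singapore.",
]

def retrieve(query, docs, k=1):
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(docs + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).flatten()
    return [docs[i] for i in scores.argsort()[::-1][:k]]

query = "How many days can I work remotely?"
context = "\n".join(retrieve(query, documents))

# The retrieved text is prepended to the question; the model is told to answer
# from it, which masks stale internal knowledge without removing it.
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```

Note that the model itself never changes here; only the prompt does, which is exactly why RAG hides old information rather than erasing it.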


Can LLMs Forget?

The ability for LLMs to truly forget is still an open question. A recent study (2023) found that fine-tuning, while useful for updating a model’s knowledge, isn’t particularly effective at making the model “unlearn” specific information. Even after targeted fine-tuning, traces of the outdated or incorrect data may still influence the model’s behavior.

Up until recently, much of the research on LLMs has focused on how to make them retain as much knowledge as possible. A 2019 study highlighted that the primary concern was optimizing retention, not forgetting. Only now, with growing concerns over privacy and misinformation, are researchers shifting their attention to unlearning. But developing effective techniques for “forgetting” is a complex challenge, and existing methods fall short of truly erasing unwanted data.

Forgetting in LLMs isn’t just a matter of erasing a few data points—it involves rewiring the relationships and patterns the model has learned. And, as models become larger and more complex, selectively unlearning specific pieces of information without harming the model’s overall capabilities remains a major technical hurdle.
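One idea that shows up in the unlearning research is gradient ascent on the data to be forgotten: instead of nudging the model toward a piece of text, you nudge it away. The sketch below is deliberately naive; the model, the “sensitive fact,” and the single update step are placeholders, and real methods pair this with a retain set or other constraints precisely because of the problem described above.

```python
# Heavily simplified sketch of gradient-ascent unlearning on a forget example.
# Real methods add terms that preserve the model's general abilities.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget = tokenizer("Sensitive fact that should be removed.", return_tensors="pt")  # placeholder

model.train()
loss = model(input_ids=forget["input_ids"], labels=forget["input_ids"]).loss
(-loss).backward()  # negate the loss: ascend instead of descend on this example
optimizer.step()
optimizer.zero_grad()
# Without a retain set or utility constraint, repeating this quickly degrades
# the whole model -- which is the selective-forgetting problem described above.
```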

In short, while LLMs can be updated, they can’t yet forget in a human-like way. True unlearning is still a work in progress. As research evolves, finding effective ways to make LLMs forget will become crucial in ensuring their responsible and ethical deployment.

Sources


https://www.lesswrong.com/posts/mFAvspg4sXkrfZ7FA/deep-forgetting-and-unlearning-for-safely-scoped-llms

https://arxiv.org/pdf/2310.10683
