Image by Markus Spiske from Unsplash

Everybody is talking about the revolution caused by recent advances in Generative Artificial Intelligence (GenAI), now capable of producing high-quality text, images, audio, and even video. This is impacting far more fields than any previous technological revolution, from programming to journalism and art.

The systems behind this revolution, generally referred to as “models”, frequently exhibit problematic behaviors, such as generating sexist content or facilitating automated hacking. Yet preventing models from producing such unethical and illegal outputs remains an open challenge. In short, the technology’s impact is not matched by a proportional level of control.

This situation can be explained (if not justified) by the fact that these models are “trained” using gargantuan amounts of data and processing power. Capabilities improve with more data and compute, with minimal human involvement. This automation drives their success: models learn patterns from data largely on their own, which is what makes them so capable. However, it also allows controversial behaviors to emerge, since the training data, which is often undisclosed, can contain implicit or explicit undesirable knowledge.

One main control strategy is unlearning, which aims to remove problematic concepts from the model’s “brain”. This approach faces three challenges: the difficulty of defining what belongs to the target concept (e.g., when does an image start conflicting with copyright law?), our limited understanding of how concepts are stored in the model, and the damage this “lobotomy” causes to the model’s other capabilities. Some methods provide guarantees of forgetting, but they tend to harm the model’s performance so severely that they become unusable. Others offer some degree of forgetting at a reduced performance cost, but without such guarantees.
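To make the idea more concrete, here is a minimal, purely illustrative sketch of one common family of unlearning approaches: fine-tuning with gradient ascent on a “forget set” while preserving behavior on a “retain set”. It is not the method of any particular paper; the toy model and the forget_x/retain_x data are hypothetical placeholders standing in for the unwanted concept and the capabilities we want to keep.

```python
# Illustrative sketch: unlearning via gradient ascent on a forget set.
# The model, data, and hyperparameters below are toy placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# Hypothetical "forget set": examples of the concept we want removed.
forget_x = torch.randn(64, 16)
forget_y = torch.randint(0, 2, (64,))

# Hypothetical "retain set": data representing capabilities we want to keep.
retain_x = torch.randn(64, 16)
retain_y = torch.randint(0, 2, (64,))

for step in range(100):
    optimizer.zero_grad()
    # Ascend the loss on the forget set (note the negative sign) while
    # descending on the retain set, to limit the collateral "lobotomy".
    loss = -loss_fn(model(forget_x), forget_y) + loss_fn(model(retain_x), retain_y)
    loss.backward()
    optimizer.step()
```

Even in this toy form, the tension described above is visible: pushing the loss up on the forget set has no guarantee of erasing the concept, and doing it too aggressively degrades performance on everything else.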

Consequently, we currently lack an effective way to control these models. Greater regulatory pressure on providers is necessary to enforce stronger monitoring of training data and a thorough evaluation of potentially problematic behaviors before release.

By Benet Manzanares Salor

Postdoctoral researcher in Artificial Intelligence for Data Privacy at the CRISES research group at Rovira i Virgili University. Specialized in text anonymisation and unlearning, with interests in explainability and bio-inspired deep learning.