The move against San Francisco-based OpenAI marks the strongest regulatory threat yet to a company that sparked the frenzy in generative artificial intelligence, enthralling consumers and businesses while raising concerns about its potential risks.
The Federal Trade Commission (FTC) has sent a 20-page demand for records about how OpenAI addresses risks related to its AI models. The agency is investigating whether the company engaged in unfair or deceptive practices that resulted in “reputational harm” to consumers.
One question relates to steps OpenAI has taken to address the potential for its products to “generate statements about real individuals that are false, misleading, or disparaging”. In one notorious example, ChatGPT falsely accused a US law professor of sexual harassment, and cited a non-existent Washington Post article in the process.
Chatbots operate in a way akin to predictive text: the underlying models, trained on vast amounts of data taken from the internet, predict the likeliest word or phrase to come after the user’s prompt. This can result in factual errors, but the plausible, human-seeming nature of the responses can trick users into believing an answer is entirely accurate.
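To make the “predictive text” comparison concrete, the sketch below builds a toy next-word predictor from a simple frequency table. This is a deliberate simplification – OpenAI’s models are large neural networks whose workings are not public – and the tiny sample corpus and the continue_text function are invented purely for illustration.

```python
from collections import Counter, defaultdict

# Illustrative only: a toy "predictive text" model, not OpenAI's technology.
# It counts which word follows which in a small sample corpus and always
# picks the most frequent continuation, showing how a purely statistical
# predictor can produce fluent text with no notion of truth.

corpus = (
    "the professor wrote an article for the newspaper "
    "the professor was quoted in the newspaper "
    "the article was cited in the report"
).split()

# Build a table of next-word frequencies for each word in the corpus.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def continue_text(prompt, length=6):
    """Extend the prompt by repeatedly choosing the likeliest next word."""
    words = prompt.split()
    for _ in range(length):
        candidates = next_word_counts.get(words[-1])
        if not candidates:
            break  # the model has never seen this word, so it stops
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(continue_text("the professor"))
```

Fed the prompt “the professor”, the toy model strings together a grammatical sentence that the corpus never actually contained – the same mechanism, in miniature, behind a chatbot confidently asserting something false.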
The FTC has also asked OpenAI to disclose the data it used to train the large language models that underpin products such as ChatGPT, something that OpenAI has declined to do so far. The US comedian Sarah Silverman is among the authors suing OpenAI over claims that ChatGPT’s LLM has been trained on data that includes their work.
The FTC has demanded to know whether OpenAI obtained the data directly from the internet – via a process known as “scraping” – or by purchasing it from third parties. It has also asked for the names of the websites the data was taken from, as well as any steps taken to prevent personal information from being included in the training data.
The Washington Post was first to report the investigation. The FTC declined to comment. OpenAI has also been contacted for comment.
Enza Iannopollo, principal analyst at the research firm Forrester, said poor governance within AI companies risked becoming a “disaster” for consumers and for the businesses themselves, which face investigations and fines.
“As long as large language models (LLMs) remain opaque and rely largely on scraped data for training, the risks of privacy abuses and harm to individuals will continue to grow,” she said.
As the race to develop more powerful AI services accelerates, so does regulatory scrutiny of a technology that could upend the way societies and businesses operate.
Global regulators are aiming to apply existing rules covering everything from copyright and data privacy to two key issues: the data fed into models and the content they produce. In the UK, the prime minister, Rishi Sunak, has convened a global AI safety summit in the autumn, while the domestic competition watchdog is also scrutinising the industry.
In the US, the Senate majority leader, Chuck Schumer, has called for “comprehensive legislation” to advance and ensure safeguards on AI and will hold a series of forums later this year.
OpenAI also ran into trouble in Italy in March, when the regulator had ChatGPT taken offline over accusations that OpenAI had violated the EU’s GDPR – a wide-ranging privacy regime that took effect in 2018.
ChatGPT was later reinstated after the US company agreed to install age verification features and to let European users block their information from being used to train the AI model.