Meta's new AI assistant is trained on public Facebook and Instagram posts

Meta CEO Mark Zuckerberg delivers a speech, as the letters AI for artificial intelligence appear on the screen, at a Meta Connect event at the company’s headquarters in Menlo Park, California, US, September 27, 2023. REUTERS/Carlos Barria/File Photo

MENLO PARK, Calif. (Reuters) - Meta Platforms (META.O) used public posts on Facebook and Instagram to train parts of its new Meta AI virtual assistant, but excluded private posts shared only with family and friends in an effort to respect consumer privacy, the company told Reuters in an interview.

Meta also did not use private chats on its messaging services as training data for the model and took steps to filter out private details from public datasets used for training, said Nick Clegg, Meta’s head of global affairs, speaking on the sidelines of the company’s annual Connect conference this week.

“We tried to exclude datasets with a high preponderance of personal information,” Clegg said, adding that the “vast majority” of the data Meta used for training was publicly available.

He cited LinkedIn as an example of a website whose content Meta intentionally chose not to use due to privacy concerns.

Clegg’s comments come as technology companies, including Meta, OpenAI and Alphabet’s Google, have been criticized for using information obtained from the internet without permission to train their AI models, which ingest massive amounts of data in order to summarize information and generate images.

The companies are weighing how to handle proprietary or copyrighted material that is collected in that process and that their AI systems might reproduce, while also facing lawsuits from authors who accuse them of copyright infringement.

Meta AI was the most significant product among the first consumer-facing AI tools unveiled by CEO Mark Zuckerberg on Wednesday at Meta’s annual Connect product conference. Talk about artificial intelligence dominated this year’s event, unlike previous conferences that focused on augmented and virtual reality.

Meta said it built the assistant using a custom model based on the powerful Llama 2 large language model that the company released for general commercial use in July, as well as a new model called Emu that generates images in response to text prompts.

The product will be able to generate text, audio, and images and will have access to real-time information through a partnership with Microsoft’s Bing search engine.

The public Facebook and Instagram posts used to train Meta AI included both text and images, Clegg said.

A Meta spokesperson told Reuters that these posts were used to train Emu for the image generation elements of the product, while the chat functions were based on Llama 2 with the addition of some publicly available and annotated datasets.

Interactions with Meta AI could also be used to improve features in the future, the spokesperson said.

Meta has imposed safety restrictions on the content the Meta AI tool can create, such as prohibiting the creation of realistic images of public figures, Clegg said.

Regarding copyrighted material, Clegg said he expects “a fair amount of litigation” over the issue of “whether or not creative content is covered under the existing fair use doctrine,” which allows limited use of protected works for purposes such as commentary, research and parody.

“We think it is, but I strongly suspect that’s going to play out in litigation,” Clegg said.

Some companies with image generation tools allow the reproduction of famous characters like Mickey Mouse, while others have paid for the materials or deliberately avoided including them in the training data.

For example, OpenAI signed a six-year deal with content provider Shutterstock this summer to use the company’s photo, video, and music libraries for training.

Asked whether Meta had taken any such steps to avoid reproducing copyrighted images, a Meta spokesperson pointed to the new terms of service that prevent users from creating content that violates privacy and intellectual property rights.

(Reporting by Katie Paul in Menlo Park, California; prepared by Muhammad for the Arabic bulletin; editing by Kenneth Lee, Matthew Lewis and Lincoln Feast)
