Google confirms that it trains artificial intelligence using scraped web data

On Monday, Gizmodo observed that the search giant had updated its privacy policy to disclose that various artificial intelligence services, such as Bard and Cloud AI, may be trained on public data the company has scraped from the web.

“Our privacy policy has always been transparent in that Google uses publicly available information from the open web to train language models for services like Google Translate,” Google spokeswoman Christa Muldoon told The Verge. “This latest update simply shows the inclusion of newer services like Bard. We incorporate privacy principles and safeguards into the development of our AI technologies, in line with our AI Principles.”

These are the latest changes to the Google Privacy Policy. The company now publicly acknowledges where your data is used at least…
Image: Google

Following the update on July 1, 2023, Google’s privacy policy now states that “Google uses the information to improve our services and to develop new products, features, and technologies that benefit our users and the public” and that the company may “use publicly available information to help train Google’s AI models and build products and features such as Google Translate, Bard, and Cloud AI capabilities.”

You can see from the policy’s revision history that the update provides some additional clarity as to which services will be trained using the data collected. For example, the document now says the information can be used for “AI models” rather than “language models,” giving Google more freedom to train and build systems other than LLMs on your public data. And even that note is buried beneath an embedded link to “publicly accessible sources” under the “Your local information” heading, which you have to click to open the relevant section.

The updated policy specifies that “publicly available information” is used to train Google’s AI products but does not explain how (or if) the company will prevent copyrighted material from being included in this data pool. Many publicly accessible websites have policies in place that prohibit data collection or web scraping for the purpose of training large language models and other AI tools. It will be interesting to see how this approach plays out against regulations around the world, such as the General Data Protection Regulation (GDPR), which also protect people from having their data misused without their express permission.

The combination of these laws and increased market competition has made the makers of popular generative AI systems, like OpenAI’s GPT-4, very guarded about where they got the data used to train them and whether it includes social media posts or copyrighted works by human artists and authors.

The question of whether the fair use doctrine extends to this type of application currently falls into a legal gray area. The uncertainty has sparked various lawsuits and prompted lawmakers in some countries to enact stricter laws that are better equipped to regulate how AI companies collect and use their training data. It also raises questions about how this data is processed to ensure that it does not contribute to serious failures within AI systems, with the people tasked with sorting through these huge pools of training data often subject to long hours and harsh working conditions.

Gannett, the largest newspaper publisher in the United States, is suing Google and its parent company, Alphabet, claiming that advancements in artificial intelligence technology have helped the search giant monopolize the digital advertising market. Products such as Google’s AI search beta have also been called “plagiarism engines” and criticized for starving websites of traffic.

Meanwhile, Twitter and Reddit, two social platforms that contain vast amounts of public information, have recently taken drastic measures to try to prevent other companies from freely harvesting their data. The API changes and restrictions placed on the platforms have been met with backlash from their communities, as the anti-scraping changes have negatively affected the core experience of Twitter and Reddit users.
