Our most advanced text-to-image technology
Imagen 2 is our most advanced text-to-image diffusion technology, delivering high-quality, photorealistic output that is closely aligned and consistent with the user’s prompt. It can generate more lifelike images by drawing on the natural distribution of its training data, rather than adopting a pre-programmed style.
Imagen 2’s powerful text-to-image technology is available to developers and Cloud customers via the Imagen API in Google Cloud Vertex AI.
The Google Arts & Culture team is also deploying our Imagen 2 technology in the Cultural Icons experience, allowing users to explore, learn, and test their cultural knowledge with the help of Google AI.
Improved image-caption understanding
Text-to-image models learn to generate images that match a user’s prompt from the details in the images and captions of their training datasets. But the quality and accuracy of these image-caption pairings can vary widely.
To help create higher-quality, more accurate images that better align with the user’s prompt, we added further descriptions to the image captions in Imagen 2’s training dataset, helping Imagen 2 learn different captioning styles and generalize to understand a wider range of user prompts.
These improved image-caption pairings help Imagen 2 better understand the relationship between images and words – increasing its understanding of context and nuance.
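As a rough illustration of this idea (the actual data pipeline is not public, so the structure below is an assumption), pairing each image with several caption styles means a model trained on the flattened pairs sees many ways of describing the same content:

```python
def build_training_pairs(images_to_captions):
    """Flatten an image -> captions mapping into (image, caption) pairs.

    Each image appears once per caption style (short, detailed, ...), so a
    model trained on these pairs learns to match the same visual content
    to many different ways of describing it.
    """
    return [(image, caption)
            for image, captions in images_to_captions.items()
            for caption in captions]
```

For example, `build_training_pairs({"beach.jpg": ["a beach", "a sandy beach at sunset"]})` yields two training pairs that share one image but differ in caption detail.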
Here are examples of Imagen 2’s prompt understanding.
Generating more realistic images
The Imagen 2 dataset and model developments have brought improvements in several areas where text-to-image tools often struggle, including rendering realistic human hands and faces, and keeping images free of distracting visual artifacts.
We’ve trained a specialized image-aesthetics model based on human preferences for qualities like good lighting, framing, exposure, and sharpness. Each image was given an aesthetics score, which helped condition Imagen 2 to give more weight to images in its training dataset that align with the qualities humans prefer. This technique improves Imagen 2’s ability to generate higher-quality images.
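One simple way such scores could influence training is to sample images in proportion to their aesthetics score, so highly rated images appear more often in training batches. This is a minimal sketch under that assumption; the actual Imagen 2 training pipeline is not public:

```python
import random

def aesthetic_weights(scores):
    """Normalize per-image aesthetics scores into sampling weights."""
    total = sum(scores)
    return [s / total for s in scores]

def sample_batch(images, scores, batch_size, seed=0):
    """Draw a training batch, favoring images humans rated more highly."""
    rng = random.Random(seed)
    return rng.choices(images, weights=scores, k=batch_size)
```

An image with three times the aesthetics score of another is three times as likely to be drawn for any given batch slot.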
Fluid style conditioning
Imagen 2’s diffusion-based techniques provide a high degree of flexibility, making it easier to control and adjust the style of an image. By providing reference style images in combination with a text prompt, we can condition Imagen 2 to generate new images that follow the same style.
Advanced inpainting and outpainting
Imagen 2 also offers image-editing capabilities such as “inpainting” and “outpainting”. By providing a reference image and an image mask, users can generate new content directly in the original image (inpainting), or extend the original image beyond its borders (outpainting). This technology is planned for Google Cloud’s Vertex AI in the new year.
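The masking step itself is straightforward: generated pixels replace the original only where the mask is set. A minimal NumPy sketch of that compositing step (the generation of the new content is, of course, the model’s job; the array shapes here are an assumption):

```python
import numpy as np

def composite_inpaint(original, generated, mask):
    """Blend generated content into the original wherever the mask is 1.

    original, generated: (H, W, C) float arrays in [0, 1].
    mask: (H, W) array of 0s (keep original) and 1s (use generated pixels).
    """
    m = mask[..., None].astype(original.dtype)  # broadcast mask over channels
    return original * (1.0 - m) + generated * m
```

Outpainting follows the same pattern, except the mask covers a border region added around the original image rather than a patch inside it.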
Responsible by design
To help mitigate the potential risks and challenges of text-to-image technology, we have robust guardrails in place, from design and development to deployment in our products.
Imagen 2 is integrated with SynthID, our cutting-edge toolkit for watermarking and identifying AI-generated content, enabling authorized Google Cloud customers to add an imperceptible digital watermark directly into the pixels of an image, without compromising image quality. The watermark remains detectable by SynthID even after modifications such as filters, cropping, or saving with lossy compression schemes.
Before we release capabilities to users, we conduct robust safety testing to minimize the risk of harm. From the outset, we invested in training-data safety for Imagen 2, and added technical guardrails to limit problematic outputs such as violent, offensive, or sexually explicit content. We apply safety checks to the training data, to input prompts, and to the outputs the system generates at generation time. For example, we apply comprehensive safety filters to avoid generating potentially problematic content, such as images of named individuals. As we expand Imagen 2’s capabilities and launches, we also continually evaluate it for safety.
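As a toy illustration of an input-side check only: the real filters are far more sophisticated and likely rely on learned classifiers rather than word lists, and the blocklist below is entirely hypothetical.

```python
# Hypothetical blocklist; production safety filters are far more sophisticated.
BLOCKED_TERMS = {"blocked_example"}

def prompt_passes_filter(prompt: str) -> bool:
    """Reject prompts containing any blocked term (toy word-level check)."""
    tokens = set(prompt.lower().split())
    return tokens.isdisjoint(BLOCKED_TERMS)
```

A real system would apply analogous checks on the output side as well, screening generated images before they are returned to the user.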
Acknowledgements
This work was made possible thanks to key research and engineering contributions from:
Aaron van den Oord, Ali Razavi, Benigno Uria, Çağlar Unlu, Charlie Nash, Chris Wolfe, Conor Durkan, David Ding, Dowd Gurney, Evgeni Gladchenko, Felix Riedel, Hang Qi, Jacob Kelly, Jacob Bauer, Jeff Donahue, Junlin Zhang, Mateusz Malinowski, Mikołaj Bińkowski, Pauline Luke, Robert Riacci, Robin Strudel, Sander Dieleman, Tobina Peter Igoe, Jaroslaw Janin, Zach Eaton-Rosen.
Thanks to: Ben Bariach, Don Bloxwich, Ed Hirst, Elspeth White, Gemma Jennings, Jenny Brennan, Komal Singh, Louis C. Kubo, Miaozen Wang, Nick Pizzuti, Nicole Breshtova, Nidhi Vyas, Nina Anderson, Norman Casagrande, Sasha Braun, Sven Jawwal, Tulsi Doshi, Will Hawkins, Yelin Kim, Zahra Ahmed for delivery leadership; Douglas Eck, Nando de Freitas, Oriol Vinyals, Eli Collins, Demis Hassabis for their advice.
Thanks also to the many others who contributed via Google DeepMind, including our partners at Google.