News from the AI & ML world

DeeperML - #aivideogeneration

Ellie Ramirez-Camara@Data Phoenix //
Google recently unveiled a suite of advancements in its AI media generation models at Google I/O 2025, signaling a major leap forward in the field. The highlights include the launch of Veo 3, Google's first video generation model with integrated audio capabilities, alongside Imagen 4 and Flow, an AI filmmaking tool. Together with upgrades to Veo 2, these tools aim to give creators enhanced realism, emotional nuance, and coherence in AI-generated content. They target professional markets and are available to Ultra subscribers via the Gemini app and the Flow platform.

The most notable announcement was Veo 3, which allows users to generate videos with synchronized audio, including ambient sounds, dialogue, and environmental noise. This model understands complex prompts, enabling users to create short stories brought to life with realistic physics and accurate lip-syncing. Veo 2 also received significant updates, including the ability to use images as references for character and scene consistency, precise camera controls, outpainting capabilities, and object manipulation tools. These enhanced features for Veo 2 are aimed at providing filmmakers with greater creative control.

Also introduced was Flow, an AI-powered video creation tool that integrates the Veo, Imagen, and Gemini models into a comprehensive platform. Flow allows creators to manage story elements such as cast, locations, objects, and styles in one interface, enabling them to combine reference media with natural language narratives to generate scenes. Google also introduced "AI Mode" in Google Search and Jules, a powerful new asynchronous coding agent. These advancements are part of Google's broader effort to lead in AI innovation, targeting professional markets with sophisticated tools that simplify the creation of high-quality media content.

Recommended read:
References :
  • pub.towardsai.net: TAI #154: Gemini Deep Think, Veo 3’s Audio Breakthrough, & Claude 4’s Blackmail Drama
  • Data Phoenix: Google announced several updates across its media generation models
  • Ars OpenForum: Google's Veo 3 delivers AI videos of realistic people with sound and music. We put it to the test.
  • hothardware.com: Google I/O was about a week ago, and if you haven't heard, one of Google's biggest announcements was the company's Veo 3 generative AI model for video. Gone are the days of creepy, low-quality clips that vaguely look like Will Smith eating spaghetti and don't traverse the uncanny valley very well. Veo 3 is more than capable of generating that
  • The Tech Basic: Google Veo 3 is a new tool that makes eight-second video clips at 720p resolution with matching sound effects and spoken words. It takes a text description or a still image and turns it into moving pictures. It uses a method called diffusion to learn from real videos that it saw during training.
  • THE DECODER: Google says Veo 3 users have generated millions of AI videos in just a few days

@aigptjournal.com //
Google is making waves in AI video creation with the release of Veo 2, an AI video generator accessible to Gemini Advanced and Google One AI Premium subscribers. This tool empowers users to produce cinema-quality, eight-second, 720p videos in MP4 format with a 16:9 landscape aspect ratio. Veo 2 stands out for its ability to understand real-world physics and human motion, resulting in more fluid character movements, lifelike scenes, and finer visual details across diverse subjects and styles, according to Google.

Users can create videos by simply describing the scene they envision. The more detailed the description, the greater the control over the final video. Users select Veo 2 from the model dropdown in Gemini and can input anything from a short story to a specific visual concept. Once generated, videos can be easily shared on platforms like TikTok and YouTube Shorts using the share button on mobile devices. Google is pushing the boundaries of open-ended AI to ensure people can use it to bring their visions to life.

One of Veo 2's key features is its ability to generate videos at 720p resolution, with architecture that supports up to 4K. The tool accurately reflects camera angles, lighting, and even cinematic effects, giving users of all backgrounds countless creative possibilities. It is designed for accessibility, allowing anyone from marketers to educators and hobbyists to produce professional-looking videos without expensive equipment or technical skills.

Recommended read:
References :
  • AI GPT Journal: Google Veo 2: The Future of Effortless AI Video Creation for Everyone
  • Last Week in AI: OpenAI’s new GPT-4.1 AI models focus on coding, OpenAI launches a pair of AI reasoning models, o3 and o4-mini, Google’s newest Gemini AI model focuses on efficiency, and more!
  • eWEEK: Gemini Advanced users can now create and share high-resolution videos with its newly released Veo 2.
  • Data Phoenix: Google has launched Veo 2, an advanced AI video generation model that creates high-resolution, realistic 8-second videos from text prompts, now available to Google One AI Premium subscribers through both Gemini Advanced and the Whisk creative experiment.

@Google DeepMind Blog //
Google has integrated its Veo 2 AI model into Gemini Advanced, allowing subscribers to generate 8-second, 720p videos directly from text prompts. Gemini Advanced users can now select Veo 2 to create dynamic videos, which can be shared on platforms like TikTok and YouTube. These videos are downloaded as MP4 files with a SynthID watermark, ensuring authenticity and traceability. This integration is currently available to Gemini Advanced subscribers and does not extend to Google Workspace business and educational plans.

Google is also adding Veo 2 to Whisk, an experimental tool in Google Labs, where users can create videos from image prompts using the new Whisk Animate feature. With Veo 2, users can create detailed videos with cinematic realism from text prompts. The model is designed to better understand real-world physics and human motion, delivering fluid character movement, lifelike scenes, and finer visual details across diverse subjects and styles. You can create up to eight-second-long video clips in a 720p resolution, which will then generate an MP4 in a 16:9 aspect ratio.

"By better understanding real-world physics and human motion, it delivers fluid character movement, lifelike scenes, and finer visual details across diverse subjects and styles," Google says. Veo 2 was previously available in early access, where generating 1080p video cost 50 cents per second. Clips are now free to produce for those on Advanced plans, but as with most AI video-generation tools, there's a monthly limit on how many you can request. Google didn't share that limit; it says it will notify users as they approach it. Alongside the Veo 2 video tool, Google is also introducing Whisk Animate, which turns your images into eight-second videos using the same technology as Veo 2. This feature isn't as widely available as Veo 2, but users in the US can access it through Google Labs.

Recommended read:
References :
  • Google DeepMind Blog: Transform text-based prompts into high-resolution eight-second videos in Gemini Advanced and use Whisk Animate to turn images into eight-second animated clips.
  • LearnAI: Starting today, Gemini Advanced users can generate and share videos using our state-of-the-art video model, Veo 2. In Gemini, you can now translate text-based prompts into dynamic videos.
  • PCMag Middle East ai: Google Gemini Advanced Now Lets You Generate 8-Second Video Clips
  • TestingCatalog: Google integrates Veo 2 AI into Gemini Advanced, enabling subscribers to create 8-second, 720p videos for TikTok and YouTube. Download MP4s with SynthID watermark.
  • Shelly Palmer: Google is just about to drop Veo, a video generation model that can create high-quality 1080p footage from text, image, and video prompts. Announced at Google I/O, Veo outputs cinematic shots with accurate physics, realistic motion, and a surprising grasp of visual storytelling — all from a short prompt.
  • eWEEK: Gemini Advanced users can now create and share high-resolution videos with its newly released Veo 2. The AI video generator Veo 2 lets users generate a cinema-quality eight-second, 720p video delivered as an MP4 file in a 16:9 landscape. Veo 2 understands real-world physics and human motion better, which enables it to deliver “fluid character […]
  • Data Phoenix: Google introduces Veo 2 for video generation in Gemini and Whisk

@Google DeepMind Blog //
Google is integrating its Veo 2 video-generating AI model into Gemini Advanced, allowing subscribers to create short, cinematic videos from text prompts. The new feature, launched on April 15, 2025, enables Gemini Advanced users to generate 8-second, 720p videos in a 16:9 aspect ratio, suitable for sharing on platforms like TikTok and YouTube. These videos can be downloaded as MP4 files and include Google's SynthID watermark, ensuring transparency regarding AI-generated content. Currently, this offering is exclusively for Google One AI Premium subscribers and does not extend to Google Workspace business and educational plans.

Veo 2 is also being integrated into Whisk, an experimental tool within Google Labs. This integration includes a new feature called "Whisk Animate" that transforms uploaded images into animated video clips, also utilizing the Veo 2 model. Similar to Gemini, the video output in Whisk is limited to eight seconds and is accessible only to Premium subscribers. The integration of Veo 2 into Gemini Advanced and Whisk represents Google's efforts to compete with other AI video generation platforms.

Google's Veo 2 is designed to turn detailed text prompts into cinematic-quality videos with lifelike motion, natural physics, and visually rich scenes. The system is able to interpret detailed text prompts and turn them into fully animated clips with lifelike elements and a strong visual narrative. To ensure responsible use and transparency, Google employs its proprietary SynthID technology, which embeds an invisible watermark into each video frame. The company also implements red-teaming and additional review processes to prevent the creation of content that violates its content policies. The new video generation features are being rolled out globally and support all languages currently available in Gemini.

Recommended read:
References :
  • Google DeepMind Blog: Generate videos in Gemini and Whisk with Veo 2
  • PCMag Middle East ai: With Veo 2, videos are now free to produce for those on Advanced plans. The Whisk Animate tool also allows you to make images into 8-second videos using the same technology.
  • TestingCatalog: Gemini Advanced subscribers can now generate videos with Veo 2
  • THE DECODER: Google adds AI video generation to Gemini app and Whisk experiment
  • Analytics Vidhya: 3 Ways to Access Google Veo 2
  • www.tomsguide.com: I just tried Google's newest AI video generation features — and I'm blown away
  • www.analyticsvidhya.com: 3 Ways to Access Google Veo 2
  • LearnAI: Starting today, Gemini Advanced users can generate and share videos using our state-of-the-art video model, Veo 2. In Gemini, you can now translate text-based prompts into dynamic videos. Google Labs is also making Veo 2 available through Whisk, a generative AI experiment that allows you to create new images using both text and image prompts,...
  • www.tomsguide.com: Google rolls out Google Photos extension for Gemini — here’s what it can do
  • eWEEK: Gemini Advanced users can now create and share high-resolution videos with its newly released Veo 2.
  • Data Phoenix: Google introduces Veo 2 for video generation in Gemini and Whisk

@Google DeepMind Blog //
Google is expanding its AI video generation capabilities by integrating Veo 2, its most advanced generative video model, into the Gemini app and the experimental Whisk platform. This new functionality allows users to create short, high-resolution videos directly from text prompts, opening up new avenues for creative expression and content creation. Veo 2 is designed to produce realistic motion, natural physics, and visually rich scenes, making it a powerful tool for generating cinematic-quality content.

Currently, access to Veo 2 is primarily available to Google One AI Premium subscribers, who can generate eight-second, 720p videos in MP4 format within Gemini Advanced. The Whisk platform also incorporates Veo 2 through its "Whisk Animate" feature, enabling users to transform uploaded images into animated video clips. Google emphasizes that more detailed and descriptive text prompts generally yield better results, allowing users to fine-tune their creations and explore a wide range of styles, from realistic nature scenes to stylized and surreal sequences.

To ensure responsible AI development, Google is implementing several safeguards. All AI-generated videos created with Veo 2 will feature an invisible watermark embedded using SynthID technology, helping to identify them as AI-generated. Additionally, Google is employing red-teaming and review processes to prevent the creation of content that violates its policies. These new video generation features are being rolled out globally and support all languages currently available in Gemini, although standard Gemini users do not have access at this time.

Recommended read:
References :
  • The Official Google Blog: Video showcasing how you can generate videos in Gemini
  • chromeunboxed.com: Google has announced a significant upgrade to its AI video generation capabilities, integrating the powerful Veo 2 model into both Gemini Advanced and Whisk.
  • Google DeepMind Blog: Transform text-based prompts into high-resolution eight-second videos in Gemini Advanced and use Whisk Animate to turn images into eight-second animated clips.
  • PCMag Middle East ai: A new model called DolphinGemma can analyze sounds and put together sequences, accelerating decades-long research projects. Google is collaborating with researchers to learn how to decode dolphin vocalizations "in the quest for interspecies communication."
  • www.tomsguide.com: I just tried Google's newest AI video generation features — and I'm blown away
  • blog.google: Google's DolphinGemma AI model aims to decode dolphin communication, potentially leading to interspecies communication.
  • PCMag Middle East ai: Google's Gemini Advanced now offers free 8-second video clip generation with Veo 2, and image-to-video animation with Whisk Animate.
  • www.analyticsvidhya.com: Google's new Veo 2 model lets you create cinematic-quality videos from detailed text prompts.
  • www.artificialintelligence-news.com: Google's AI model, DolphinGemma, is designed to interpret and generate dolphin sounds, potentially paving the way for interspecies communication.
  • THE DECODER: Google adds AI video generation to Gemini app and Whisk experiment
  • TestingCatalog: Perplexity adds Gemini 2.5 Pro and voice mode to web platform
  • LearnAI: Try generating video in Gemini, powered by Veo 2
  • TestingCatalog: Gemini Advanced subscribers can now generate videos with Veo 2
  • Analytics Vidhya: Designed to turn detailed text prompts into cinematic-quality videos, Google Veo 2 creates lifelike motion, natural physics, and visually rich scenes across a range of styles. Currently, Google Veo 2 is available only to users in the United States, aged 18 and […]
  • Analytics India Magazine: Google Rolls Out Video AI Model for Gemini Users, Developers
  • shellypalmer.com: Google’s Veo is Almost Here
  • www.tomsguide.com: Google rolls out Google Photos extension for Gemini — here’s what it can do
  • venturebeat.com: VentureBeat reports on Google’s Gemini 2.5 Flash, which introduces adjustable ‘thinking budgets’ that cut AI costs by 600% when turned down
  • eWEEK: Google’s AI Video Generator Veo 2 Delivers Cinematic Results
  • TestingCatalog: Google launches Gemini 2.5 Flash model with hybrid reasoning
  • the-decoder.com: Google is rolling out new AI-powered video generation features in its Gemini app and the experimental tool Whisk.
  • Glenn Gabe: Smart move by Google. They are offering Google One AI Premium for FREE to college students through the spring of 2026 Gives you access to 2 TB of storage and incredible AI models, like Gemini 2.5 Pro and Veo 2, via these products: *Gemini Advanced, including Deep Research, Gemini Live, Canvas, and video generation with Veo 2 *NotebookLM Plus, including five times more Audio Overviews, notebooks and more *Gemini in Google Docs, Sheets and Slides
  • bsky.app: Gemini 2.5 Pro and Flash now have the ability to return image segmentation masks on command, as base64 encoded PNGs embedded in JSON strings I vibe coded an interactive tool for exploring this new capability - it costs a fraction of a cent per image https://simonwillison.net/2025/Apr/18/gemini-image-segmentation/
  • Google DeepMind Blog: Introducing Gemini 2.5 Flash
  • www.marketingaiinstitute.com: Google Cloud just wrapped its Next ‘25 event in Las Vegas, spanning everything from advanced AI models to new ways of connecting your favorite tools with Google’s agentic ecosystem.
  • aigptjournal.com: Google Veo 2: The Future of Effortless AI Video Creation for Everyone
  • Last Week in AI: LWiAI Podcast #207 - GPT 4.1, Gemini 2.5 Flash, Ironwood, Claude Max
  • learn.aisingapore.org: Introducing Gemini 2.5 Flash
  • Data Phoenix: Google introduces Veo 2 for video generation in Gemini and Whisk

@the-decoder.com //
Nvidia is making significant advancements in artificial intelligence, showcasing innovations in both hardware and video generation. A new method developed by Nvidia, in collaboration with Stanford University, UCSD, UC Berkeley, and UT Austin, allows for the creation of AI-generated videos up to one minute long. This breakthrough addresses previous limitations in video length, where models like OpenAI's Sora, Meta's MovieGen, and Google's Veo 2 were capped at 20, 16, and 8 seconds respectively.

The key innovation lies in the introduction of Test-Time Training layers (TTT layers), which are integrated into a pre-trained Transformer architecture. These layers replace simple hidden states in conventional Recurrent Neural Networks (RNNs) with small neural networks that continuously learn during the video generation process. This allows the system to maintain consistency across longer sequences, ensuring elements like characters and environments remain stable throughout the video. This new method has even been showcased with an AI-generated "Tom and Jerry" cartoon.
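The TTT idea described above can be sketched in a few lines: instead of a fixed hidden-state vector, the recurrent state is the weight matrix of a tiny inner model that takes a gradient step on a self-supervised objective at every timestep. The sketch below is a toy illustration under that description, not the paper's actual architecture; the linear inner model, corruption scheme, and learning rate are all illustrative assumptions.

```python
import numpy as np

class TTTLayer:
    """Toy Test-Time Training layer: the 'hidden state' is the weight
    matrix W of a tiny linear model, updated by one gradient-descent step
    on a self-supervised reconstruction loss at every timestep."""

    def __init__(self, dim, lr=0.1, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = rng.normal(scale=0.1, size=(dim, dim))  # learnable state
        self.lr = lr

    def step(self, x):
        # Inner objective (illustrative): reconstruct x from a corrupted view.
        x_corrupt = 0.5 * x                      # stand-in corruption
        err = self.W @ x_corrupt - x             # reconstruction error
        grad = np.outer(err, x_corrupt)          # dL/dW for 0.5*||err||^2
        self.W -= self.lr * grad                 # inner-loop update at test time
        return self.W @ x                        # output using the updated state

# Feeding a sequence through the layer drives the inner loss down,
# which is how the state "remembers" what it has seen so far.
layer = TTTLayer(4)
x = np.ones(4)
err_before = np.linalg.norm(layer.W @ (0.5 * x) - x)
for _ in range(50):
    layer.step(x)
err_after = np.linalg.norm(layer.W @ (0.5 * x) - x)
```

Because the state is a whole network rather than a vector, it can keep characters and scenes consistent over far longer sequences than a conventional RNN hidden state, which is the property the paper exploits for minute-long video.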

Furthermore, Nvidia has unveiled its new Llama-3.1 Nemotron Ultra large language model (LLM), which outperforms DeepSeek R1 despite having fewer than half the parameters. The Llama-3.1-Nemotron-Ultra-253B is a 253-billion-parameter model designed for advanced reasoning, instruction following, and AI assistant workflows. Its architecture includes innovations such as skipped attention layers, fused feedforward networks, and variable FFN compression ratios. The model is publicly available on Hugging Face, reflecting Nvidia's commitment to open-source AI development.
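"Skipped attention layers" can be illustrated with a toy stack: a per-layer schedule decides whether the attention sublayer runs at all, while every layer keeps its feed-forward sublayer. The sketch below is a hypothetical illustration of that idea only; the layer count, shapes, and skip schedule are invented and bear no relation to Nemotron Ultra's actual configuration.

```python
import numpy as np

def self_attention(x):
    # Minimal single-head self-attention over a (seq_len, dim) array.
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

def run_stack(x, num_layers, skip_attention_at, rng=None):
    """Toy transformer stack where layers listed in `skip_attention_at`
    omit their attention sublayer entirely (illustrative only)."""
    rng = rng or np.random.default_rng(0)
    dim = x.shape[1]
    attention_calls = 0
    for layer in range(num_layers):
        if layer not in skip_attention_at:
            x = x + self_attention(x)             # residual attention sublayer
            attention_calls += 1
        W1 = rng.normal(scale=0.02, size=(dim, 4 * dim))
        W2 = rng.normal(scale=0.02, size=(4 * dim, dim))
        x = x + np.maximum(x @ W1, 0.0) @ W2      # residual ReLU feed-forward
    return x, attention_calls

# Skipping attention in alternate layers: the residual stream keeps its
# shape, but the quadratic-in-sequence-length cost is paid less often.
out, calls = run_stack(np.ones((3, 8)), 6, skip_attention_at={1, 3, 5})
```

Dropping attention from selected layers is one way a search over architectures can trade a small amount of quality for large savings in memory and latency, which is the general motivation behind such pruned designs.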

Recommended read:
References :
  • analyticsindiamag.com: NVIDIA-Backed Rescale secures $115 Mn in Series D Round
  • the-decoder.com: AI-generated Tom chases Jerry for a full minute thanks to new method from Nvidia and others
  • AI News | VentureBeat: Nvidia’s new Llama-3.1 Nemotron Ultra outperforms DeepSeek R1 at half the size
  • THE DECODER: AI-generated Tom chases Jerry for a full minute thanks to new method from Nvidia and others

Kara Sherrer@eWEEK //
Runway AI Inc. has launched Gen-4, its latest AI video generation model, addressing the significant challenge of maintaining consistent characters and objects across different scenes. This new model represents a considerable advancement in AI video technology and improves the realism and usability of AI-generated videos. Gen-4 allows users to upload a reference image of an object to be included in a video, along with design instructions, and ensures that the object maintains a consistent look throughout the entire clip.

The Gen-4 model empowers users to place any object or subject in different locations while maintaining consistency, and even allows for modifications such as changing camera angles or lighting conditions. The model combines visual references with text instructions to preserve styles throughout videos. Gen-4 is currently available to paying subscribers and Enterprise customers, with additional features planned for future updates.

Recommended read:
References :
  • Analytics India Magazine: Runway introduces its Next-Gen Image-to-Video Generation AI Model
  • SiliconANGLE: Runway launches new Gen-4 AI video generator
  • THE DECODER: Runway releases Gen-4 video model with focus on consistency
  • venturebeat.com: Runway Gen-4 solves AI video’s biggest problem: character consistency across scenes
  • www.producthunt.com: Product Hunt page for Runway Gen-4.
  • eWEEK: The Gen-4 model aims to solve several problems with AI video generation including inconsistent characters and objects.
  • iThinkDifferent: Runway has released Gen-4, its latest AI model for video generation. The company says the system addresses one of the biggest challenges in AI video generation: maintaining consistent characters and objects throughout scenes.
  • Charlie Fink: Runway’s Gen-4 release overshadows OpenAI’s image upgrade as Higgsfield, Udio, Prodia, and Pika debut powerful new AI tools for video, music, and image generation.

Kara Sherrer@eWEEK //
Runway AI Inc. has launched Gen-4, a new AI model for video generation designed to address a significant limitation in AI video creation: character consistency across scenes. The New York-based startup, backed by investments from tech giants such as Nvidia and Google, aims to transform film production with this new system, which introduces character and scene consistency across multiple shots. This capability has been elusive for most AI video generators until now, potentially opening new avenues for Hollywood and other creative industries.

Gen-4 allows users to upload a reference image of an object or character and then generate videos where that element retains a consistent look throughout the entire clip. The model combines visual references with text instructions to preserve styles throughout videos, even as details like camera angle or lighting conditions change. Initially, users can generate five- and ten-second clips, but Runway's demo videos hint at future updates that could allow for more complex, longer-form content creation. This technology could also function as an image editing tool, allowing users to combine illustrations and generate multiple variations to streamline the revision process.

Recommended read:
References :
  • Analytics India Magazine: Runway Introduces its Next-Gen Image-to-Video Generation AI Model
  • SiliconANGLE: Runway launches new Gen-4 AI video generator
  • THE DECODER: Runway releases Gen-4 video model with focus on consistency
  • venturebeat.com: Runway's new Gen-4 AI creates consistent characters across entire videos from a single reference image, challenging OpenAI's viral Ghibli trend and potentially transforming how Hollywood makes films.
  • eWEEK: AI Gets Cinematic: Runway’s Gen-4 Brings Film-Quality Consistency to Video Generation
  • Charlie Fink: Runway’s Gen-4 release overshadows OpenAI’s image upgrade as Higgsfield, Udio, Prodia, and Pika debut powerful new AI tools for video, music, and image generation.