News from the AI & ML world

DeeperML - #aivideogeneration

@Google DeepMind Blog //
Google has integrated its Veo 2 video-generation model into Gemini Advanced, letting subscribers create short video clips directly from text prompts. The feature generates eight-second, 720p videos in a 16:9 aspect ratio, suitable for sharing on platforms like TikTok and YouTube. Videos can be downloaded as MP4 files and carry Google's SynthID watermark, which identifies them as AI-generated content. Availability is currently limited to Google One AI Premium subscribers and does not extend to Google Workspace business or educational plans.

This enhancement positions Google to compete with other AI video generation platforms, such as OpenAI's Sora. The company emphasizes that Veo 2 delivers "detailed videos with cinematic realism" by better understanding real-world physics and human motion. Users can input a variety of prompts, from realistic nature scenes to stylized or surreal sequences, with more detailed prompts generally yielding better results. Although clips are now free to produce for Advanced plan subscribers, Google has indicated that there will be a monthly video generation limit.

In addition to Gemini Advanced, Veo 2 is also being added to Whisk, an experimental tool in Google Labs, through a feature called "Whisk Animate." This function transforms uploaded images into animated video clips using the same Veo 2 model. As with Gemini, output is restricted to eight seconds and is accessible only to Premium subscribers in the US. These integrations reflect Google's continued investment in AI capabilities and its commitment to providing innovative tools for its users.


@Google DeepMind Blog //
Google is expanding its AI video generation capabilities by integrating Veo 2, its most advanced generative video model, into the Gemini app and the experimental Whisk platform. This new functionality allows users to create short, high-resolution videos directly from text prompts, opening up new avenues for creative expression and content creation. Veo 2 is designed to produce realistic motion, natural physics, and visually rich scenes, making it a powerful tool for generating cinematic-quality content.

Currently, access to Veo 2 is primarily available to Google One AI Premium subscribers, who can generate eight-second, 720p videos in MP4 format within Gemini Advanced. The Whisk platform also incorporates Veo 2 through its "Whisk Animate" feature, enabling users to transform uploaded images into animated video clips. Google emphasizes that more detailed and descriptive text prompts generally yield better results, allowing users to fine-tune their creations and explore a wide range of styles, from realistic nature scenes to stylized and surreal sequences.

To ensure responsible AI development, Google is implementing several safeguards. All AI-generated videos created with Veo 2 will feature an invisible watermark embedded using SynthID technology, helping to identify them as AI-generated. Additionally, Google is employing red-teaming and review processes to prevent the creation of content that violates its policies. These new video generation features are being rolled out globally and support all languages currently available in Gemini, although standard Gemini users do not have access at this time.

Recommended read:
References :
  • The Official Google Blog: Video showcasing how you can generate videos in Gemini
  • Google DeepMind Blog: Transform text-based prompts into high-resolution eight-second videos in Gemini Advanced and use Whisk Animate to turn images into eight-second animated clips.
  • chromeunboxed.com: Google has announced a significant upgrade to its AI video generation capabilities, integrating the powerful Veo 2 model into both Gemini Advanced and Whisk.
  • www.tomsguide.com: I just tried Google's newest AI video generation features — and I'm blown away
  • PCMag Middle East ai: With Veo 2, videos are now free to produce for those on Advanced plans. The Whisk Animate tool also allows you to make images into 8-second videos using the same technology.
  • www.analyticsvidhya.com: Google's new Veo 2 model lets you create cinematic-quality videos from detailed text prompts.
  • THE DECODER: Google adds AI video generation to Gemini app and Whisk experiment
  • Analytics Vidhya: Designed to turn detailed text prompts into cinematic-quality videos, Google Veo 2 creates lifelike motion, natural physics, and visually rich scenes across a range of styles. Currently, Google Veo 2 is available only to users in the United States, aged 18 and […]
  • TestingCatalog: Google integrates Veo 2 AI into Gemini Advanced, enabling subscribers to create 8-second, 720p videos for TikTok and YouTube. Download MP4s with SynthID watermark.
  • LearnAI: Try generating video in Gemini, powered by Veo 2
  • Analytics India Magazine: Google Rolls Out Video AI Model for Gemini Users, Developers
  • shellypalmer.com: Google’s Veo is Almost Here
  • eWEEK: Google’s AI Video Generator Veo 2 Delivers Cinematic Results
  • the-decoder.com: Google is rolling out new AI-powered video generation features in its Gemini app and the experimental tool Whisk.

@the-decoder.com //
Nvidia is making significant advancements in artificial intelligence, showcasing innovations in both hardware and video generation. A new method developed by Nvidia, in collaboration with Stanford University, UCSD, UC Berkeley, and UT Austin, allows for the creation of AI-generated videos up to one minute long. This breakthrough addresses previous limitations in video length, where models like OpenAI's Sora, Meta's MovieGen, and Google's Veo 2 were capped at 20, 16, and 8 seconds respectively.

The key innovation lies in the introduction of Test-Time Training layers (TTT layers), which are integrated into a pre-trained Transformer architecture. These layers replace simple hidden states in conventional Recurrent Neural Networks (RNNs) with small neural networks that continuously learn during the video generation process. This allows the system to maintain consistency across longer sequences, ensuring elements like characters and environments remain stable throughout the video. This new method has even been showcased with an AI-generated "Tom and Jerry" cartoon.
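The idea can be illustrated with a minimal numpy sketch. This is not Nvidia's implementation: the zero initialization, the simple scaled-input "corrupted view," and the single gradient step per token are all illustrative assumptions. What it does show is the core TTT mechanism described above: the recurrent hidden state is itself the weight matrix of a tiny linear model, and that matrix keeps learning, via gradient steps on a self-supervised reconstruction loss, while the sequence is being processed.

```python
import numpy as np

def ttt_layer(tokens, dim, lr=0.1):
    """Toy Test-Time Training layer: the hidden state W is the weight
    matrix of a small linear model, updated by one gradient step of a
    self-supervised reconstruction loss for every token processed."""
    W = np.zeros((dim, dim))             # hidden state = weights of a small model
    outputs = []
    for x in tokens:
        x_view = 0.9 * x                 # stand-in for a learned corrupted view
        err = W @ x_view - x             # self-supervised reconstruction error
        W -= lr * np.outer(err, x_view)  # inner-loop gradient step on 0.5*||err||^2
        outputs.append(W @ x)            # emit the token through the updated state
    return np.stack(outputs), W

rng = np.random.default_rng(0)
seq = rng.standard_normal((16, 8))       # 16 "tokens" of dimension 8
outs, W = ttt_layer(seq, dim=8)
print(outs.shape)                        # (16, 8)
```

Because the state is a trained model rather than a fixed-size activation vector, it can keep absorbing information from the sequence as it runs, which is what lets the approach stay consistent over much longer videos than a conventional RNN hidden state.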

Furthermore, Nvidia has unveiled its new Llama-3.1 Nemotron Ultra large language model (LLM), which outperforms DeepSeek R1 despite having fewer than half as many parameters. Llama-3.1-Nemotron-Ultra-253B is a 253-billion-parameter model designed for advanced reasoning, instruction following, and AI assistant workflows. Its architecture includes innovations such as skipped attention layers, fused feedforward networks, and variable FFN compression ratios. The model's code is publicly available on Hugging Face, reflecting Nvidia's commitment to open-source AI development.

Recommended read:
References :
  • the-decoder.com: AI-generated Tom chases Jerry for a full minute thanks to new method from Nvidia and others
  • AI News | VentureBeat: Nvidia’s new Llama-3.1 Nemotron Ultra outperforms DeepSeek R1 at half the size

Kara Sherrer@eWEEK //
Runway AI Inc. has launched Gen-4, its latest AI video generation model, addressing the significant challenge of maintaining consistent characters and objects across different scenes. This new model represents a considerable advancement in AI video technology and improves the realism and usability of AI-generated videos. Gen-4 allows users to upload a reference image of an object to be included in a video, along with design instructions, and ensures that the object maintains a consistent look throughout the entire clip.

The Gen-4 model empowers users to place any object or subject in different locations while maintaining consistency, and even allows for modifications such as changing camera angles or lighting conditions. The model combines visual references with text instructions to preserve styles throughout videos. Gen-4 is currently available to paying subscribers and Enterprise customers, with additional features planned for future updates.

Recommended read:
References :
  • Analytics India Magazine: Runway introduces its Next-Gen Image-to-Video Generation AI Model
  • SiliconANGLE: Runway launches new Gen-4 AI video generator
  • THE DECODER: Runway releases Gen-4 video model with focus on consistency
  • venturebeat.com: Runway Gen-4 solves AI video’s biggest problem: character consistency across scenes
  • www.producthunt.com: Product Hunt page for Runway Gen-4.
  • eWEEK: The Gen-4 model aims to solve several problems with AI video generation including inconsistent characters and objects.
  • iThinkDifferent: Runway has released Gen-4, its latest AI model for video generation. The company says the system addresses one of the biggest challenges in AI video generation: maintaining consistent characters and objects throughout scenes.
  • Charlie Fink: Runway’s Gen-4 release overshadows OpenAI’s image upgrade as Higgsfield, Udio, Prodia, and Pika debut powerful new AI tools for video, music, and image generation.

Kara Sherrer@eWEEK //
Runway AI Inc. has launched Gen-4, a new AI model for video generation designed to address a significant limitation in AI video creation: character consistency across scenes. The New York-based startup, backed by investments from tech giants such as Nvidia and Google, aims to transform film production with this new system, which introduces character and scene consistency across multiple shots. This capability has been elusive for most AI video generators until now, potentially opening new avenues for Hollywood and other creative industries.

Gen-4 allows users to upload a reference image of an object or character and then generate videos where that element retains a consistent look throughout the entire clip. The model combines visual references with text instructions to preserve styles throughout videos, even as details like camera angle or lighting conditions change. Initially, users can generate five- and ten-second clips, but Runway's demo videos hint at future updates that could allow for more complex, longer-form content creation. This technology could also function as an image editing tool, allowing users to combine illustrations and generate multiple variations to streamline the revision process.

Recommended read:
References :
  • Analytics India Magazine: Runway Introduces its Next-Gen Image-to-Video Generation AI Model
  • SiliconANGLE: Runway launches new Gen-4 AI video generator
  • THE DECODER: Runway releases Gen-4 video model with focus on consistency
  • venturebeat.com: Runway's new Gen-4 AI creates consistent characters across entire videos from a single reference image, challenging OpenAI's viral Ghibli trend and potentially transforming how Hollywood makes films.
  • eWEEK: AI Gets Cinematic: Runway’s Gen-4 Brings Film-Quality Consistency to Video Generation
  • Charlie Fink: Runway’s Gen-4 release overshadows OpenAI’s image upgrade as Higgsfield, Udio, Prodia, and Pika debut powerful new AI tools for video, music, and image generation.