Ellie Ramirez-Camara@Data Phoenix
//
Google has unveiled a suite of advancements in its AI media generation models at Google I/O 2025, signaling a major leap forward in the field. The highlights include Veo 3, Google's first video generation model with integrated audio capabilities, alongside Imagen 4 and Flow, an AI filmmaking tool. Together with upgrades to Veo 2, these releases are designed to give creators enhanced realism, emotional nuance, and coherence in AI-generated content. They target professional markets and are available to Ultra subscribers via the Gemini app and the Flow platform.
The most notable announcement was Veo 3, which generates videos with synchronized audio, including ambient sounds, dialogue, and environmental noise. The model understands complex prompts, enabling users to bring short stories to life with realistic physics and accurate lip-syncing. Veo 2 also received significant updates: images can now serve as references for character and scene consistency, and the model gains precise camera controls, outpainting capabilities, and object manipulation tools, giving filmmakers greater creative control.

Also introduced was Flow, an AI-powered video creation tool that integrates the Veo, Imagen, and Gemini models into a single platform. Flow lets creators manage story elements such as cast, locations, objects, and styles in one interface, combining reference media with natural-language narratives to generate scenes. Google additionally introduced "AI Mode" in Google Search and Jules, a new asynchronous coding agent. These advancements are part of Google's broader effort to lead in AI innovation with sophisticated tools that simplify the creation of high-quality media content.
References :
@aigptjournal.com
//
Google is making waves in AI video creation with the release of Veo 2, an AI video generator accessible to Gemini Advanced and Google One AI Premium subscribers. This tool empowers users to produce cinema-quality, eight-second, 720p videos in MP4 format with a 16:9 landscape aspect ratio. Veo 2 stands out for its ability to understand real-world physics and human motion, resulting in more fluid character movements, lifelike scenes, and finer visual details across diverse subjects and styles, according to Google.
Users can create videos by simply describing the scene they envision; the more detailed the description, the greater the control over the final video. Users select Veo 2 from the model dropdown in Gemini and can input anything from a short story to a specific visual concept. Once generated, videos can be shared on platforms like TikTok and YouTube Shorts using the share button on mobile devices. Google is pushing the boundaries of open-ended AI to ensure people can use it to bring their visions to life.

One of Veo 2's key features is its ability to generate videos at 720p resolution, with an architecture that supports up to 4K. The tool accurately reflects camera angles, lighting, and even cinematic effects, giving users of all backgrounds countless creative possibilities. It is designed for accessibility, allowing anyone from marketers to educators and hobbyists to produce professional-looking videos without expensive equipment or technical skills.
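For developers, the same model can also be reached programmatically. Below is a minimal sketch assuming the google-genai Python SDK and the veo-2.0-generate-001 model identifier; the model name, config fields, and polling pattern should be checked against the current Gemini API documentation rather than taken as a definitive recipe.

```python
# Hedged sketch: text-to-video with Veo 2 via the Gemini API.
# Assumes the google-genai SDK; verify the model id and config fields
# against current documentation before relying on this.
import time
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder credential

# The more detailed the prompt, the more control over the final video.
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",  # assumed model identifier
    prompt=(
        "A slow dolly shot through a rain-soaked neon alley at night, "
        "cinematic lighting, shallow depth of field"
    ),
    config=types.GenerateVideosConfig(
        aspect_ratio="16:9",   # matches the 16:9 MP4 output described above
        number_of_videos=1,
    ),
)

# Video generation is long-running, so poll until the operation completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the resulting ~8-second 720p MP4.
for generated in operation.response.generated_videos:
    client.files.download(file=generated.video)
    generated.video.save("veo2_clip.mp4")
```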
References :
@Google DeepMind Blog
//
Google has integrated its Veo 2 AI model into Gemini Advanced, allowing subscribers to generate 8-second, 720p videos directly from text prompts. Gemini Advanced users can now select Veo 2 to create dynamic videos, which can be shared on platforms like TikTok and YouTube. The videos download as MP4 files with a SynthID watermark that marks them as AI-generated for traceability. This integration is currently available to Gemini Advanced subscribers and does not extend to Google Workspace business and educational plans.
With Veo 2, users can create what Google calls "detailed videos with cinematic realism" from text prompts: clips up to eight seconds long at 720p resolution, generated as MP4 files in a 16:9 aspect ratio. "By better understanding real-world physics and human motion, it delivers fluid character movement, lifelike scenes, and finer visual details across diverse subjects and styles," Google says.

Veo 2 was previously available in early access, allowing users to create 1080p video for 50 cents per second of video generated. Clips are now free to produce for those on Advanced plans, but as with most AI video-generation tools, there's a limit to how many you can request each month. Google didn't share that limit; it says it will tell users as they approach it.

Google is also adding Veo 2 to Whisk, an experimental tool in Google Labs, where the new Whisk Animate feature turns uploaded images into 8-second videos using the same technology. Whisk Animate isn't as widely available as the Gemini integration, but users in the US can access it through Google Labs.
References :
@Google DeepMind Blog
//
Google is integrating its Veo 2 video-generating AI model into Gemini Advanced, allowing subscribers to create short, cinematic videos from text prompts. The new feature, launched on April 15, 2025, enables Gemini Advanced users to generate 8-second, 720p videos in a 16:9 aspect ratio, suitable for sharing on platforms like TikTok and YouTube. These videos can be downloaded as MP4 files and include Google's SynthID watermark, ensuring transparency regarding AI-generated content. Currently, this offering is exclusively for Google One AI Premium subscribers and does not extend to Google Workspace business and educational plans.
Veo 2 is also being integrated into Whisk, an experimental tool within Google Labs. This integration includes a new feature called "Whisk Animate" that transforms uploaded images into animated video clips using the same Veo 2 model. As in Gemini, the video output in Whisk is limited to eight seconds and is accessible only to Premium subscribers.

The integration of Veo 2 into Gemini Advanced and Whisk represents Google's effort to compete with other AI video generation platforms. Veo 2 is designed to turn detailed text prompts into cinematic-quality videos with lifelike motion, natural physics, and visually rich scenes that sustain a strong visual narrative.

To ensure responsible use and transparency, Google employs its proprietary SynthID technology, which embeds an invisible watermark into each video frame. The company also applies red-teaming and additional review processes to prevent the creation of content that violates its content policies. The new video generation features are rolling out globally and support all languages currently available in Gemini.
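SynthID's internals are proprietary, so the sketch below is emphatically not SynthID. It is a toy least-significant-bit watermark applied to a single frame, included only to make the idea of an invisible, per-frame mark concrete; production schemes are designed to survive compression and editing, which this toy does not.

```python
# Toy illustration of an invisible per-frame watermark (NOT SynthID).
# Hides a bit pattern in the least significant bits of the blue channel:
# imperceptible to viewers, trivially readable back out, and (unlike
# production schemes) destroyed by re-encoding or resizing.
import numpy as np

def embed_watermark(frame: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Embed `bits` (0/1 array) into the blue-channel LSBs of an RGB frame."""
    marked = frame.copy()
    blue = marked[:, :, 2]
    flat = blue.ravel()                         # copy of the channel
    n = min(bits.size, flat.size)
    flat[:n] = (flat[:n] & 0xFE) | bits[:n]     # overwrite the LSBs
    marked[:, :, 2] = flat.reshape(blue.shape)  # write the channel back
    return marked

def extract_watermark(frame: np.ndarray, n_bits: int) -> np.ndarray:
    """Read the first n_bits back out of the blue-channel LSBs."""
    return frame[:, :, 2].ravel()[:n_bits] & 1

# Demo on a random 720p RGB frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(720, 1280, 3), dtype=np.uint8)
payload = rng.integers(0, 2, size=64, dtype=np.uint8)

marked = embed_watermark(frame, payload)
assert np.array_equal(extract_watermark(marked, 64), payload)
# Max per-pixel change is 1 in one channel -- invisible to the eye.
assert int(np.abs(marked.astype(int) - frame.astype(int)).max()) <= 1
```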
References :
@Google DeepMind Blog
//
Google is expanding its AI video generation capabilities by integrating Veo 2, its most advanced generative video model, into the Gemini app and the experimental Whisk platform. This new functionality allows users to create short, high-resolution videos directly from text prompts, opening up new avenues for creative expression and content creation. Veo 2 is designed to produce realistic motion, natural physics, and visually rich scenes, making it a powerful tool for generating cinematic-quality content.
Currently, access to Veo 2 is primarily available to Google One AI Premium subscribers, who can generate eight-second, 720p videos in MP4 format within Gemini Advanced. The Whisk platform also incorporates Veo 2 through its "Whisk Animate" feature, enabling users to transform uploaded images into animated video clips. Google emphasizes that more detailed and descriptive text prompts generally yield better results, allowing users to fine-tune their creations and explore a wide range of styles, from realistic nature scenes to stylized and surreal sequences.

To ensure responsible AI development, Google is implementing several safeguards. All AI-generated videos created with Veo 2 carry an invisible watermark embedded using SynthID technology, helping to identify them as AI-generated. Google is also employing red-teaming and review processes to prevent the creation of content that violates its policies. These new video generation features are being rolled out globally and support all languages currently available in Gemini, although standard Gemini users do not have access at this time.
References :
@the-decoder.com
//
Nvidia is making significant advances in artificial intelligence, with new results in both video generation and large language models. A new method developed by Nvidia in collaboration with Stanford University, UCSD, UC Berkeley, and UT Austin allows for the creation of AI-generated videos up to one minute long. This addresses a previous limitation on video length: models like OpenAI's Sora, Meta's MovieGen, and Google's Veo 2 were capped at 20, 16, and 8 seconds respectively.
The key innovation is the introduction of Test-Time Training (TTT) layers, which are integrated into a pre-trained Transformer architecture. These layers replace the simple hidden states of conventional recurrent neural networks (RNNs) with small neural networks that continue to learn during the video generation process itself (a toy sketch of the mechanism appears after this summary). This lets the system maintain consistency across longer sequences, keeping elements like characters and environments stable throughout the video. The method has been showcased with an AI-generated "Tom and Jerry" cartoon.

Separately, Nvidia has unveiled its new Llama-3.1 Nemotron Ultra large language model (LLM), which outperforms DeepSeek R1 despite having less than half the parameters. Llama-3.1-Nemotron-Ultra-253B is a 253-billion-parameter model designed for advanced reasoning, instruction following, and AI assistant workflows. Its architecture includes innovations such as skipped attention layers, fused feedforward networks, and variable FFN compression ratios. The model is publicly available on Hugging Face, reflecting Nvidia's commitment to open-source AI development.
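To make the TTT mechanism concrete, here is a minimal, self-contained PyTorch sketch of a TTT-style layer. It is not the authors' implementation: the inner "hidden state" here is a single linear map updated by one gradient step of a self-supervised reconstruction loss per timestep (the paper also uses small MLPs as the inner model), but that per-step inner update is the core idea the article describes.

```python
# Minimal sketch of a Test-Time Training (TTT) layer -- NOT the paper's code.
# The layer's hidden state is itself a tiny model (a linear map W) that
# takes one self-supervised gradient step per timestep, so it keeps
# learning while the sequence is being generated.
import torch

class TTTLayerSketch(torch.nn.Module):
    def __init__(self, dim: int, inner_lr: float = 0.1):
        super().__init__()
        self.inner_lr = inner_lr
        # Learned projections that define the inner model's self-supervised
        # task: reconstruct one view of the input from another.
        self.proj_in = torch.nn.Linear(dim, dim)
        self.proj_target = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        batch, seq_len, dim = x.shape
        # Inner-model weights, re-initialized per sequence: the "hidden state".
        W = x.new_zeros(batch, dim, dim)
        outputs = []
        for t in range(seq_len):
            xt = x[:, t, :]                                # (batch, dim)
            inp = self.proj_in(xt)                         # input view
            tgt = self.proj_target(xt)                     # target view
            err = torch.bmm(inp.unsqueeze(1), W).squeeze(1) - tgt
            # One SGD step on 0.5 * ||inp @ W - tgt||^2 with respect to W.
            W = W - self.inner_lr * torch.bmm(inp.unsqueeze(2), err.unsqueeze(1))
            # Query the freshly updated inner model for this timestep's output.
            outputs.append(torch.bmm(xt.unsqueeze(1), W).squeeze(1))
        return torch.stack(outputs, dim=1)

# Smoke test: the state W persists and adapts across a long sequence.
layer = TTTLayerSketch(dim=32)
print(layer(torch.randn(2, 64, 32)).shape)  # torch.Size([2, 64, 32])
```

Because the inner update is itself differentiable, the surrounding Transformer can be trained end-to-end through these steps, which is what lets TTT layers slot into a pre-trained architecture.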
References :
Kara Sherrer@eWEEK
//
Runway AI Inc. has launched Gen-4, its latest AI video generation model, addressing the significant challenge of maintaining consistent characters and objects across different scenes. This new model represents a considerable advancement in AI video technology and improves the realism and usability of AI-generated videos. Gen-4 allows users to upload a reference image of an object to be included in a video, along with design instructions, and ensures that the object maintains a consistent look throughout the entire clip.
The Gen-4 model empowers users to place any object or subject in different locations while maintaining consistency, and even allows for modifications such as changing camera angles or lighting conditions. The model combines visual references with text instructions to preserve styles throughout videos. Gen-4 is currently available to paying subscribers and Enterprise customers, with additional features planned for future updates.
References :
Kara Sherrer@eWEEK
//
Runway AI Inc. has launched Gen-4, a new AI model for video generation designed to address a significant limitation in AI video creation: character consistency across scenes. The New York-based startup, backed by investments from tech giants such as Nvidia and Google, aims to transform film production with this new system, which introduces character and scene consistency across multiple shots. This capability has been elusive for most AI video generators until now, potentially opening new avenues for Hollywood and other creative industries.
Gen-4 allows users to upload a reference image of an object or character and then generate videos in which that element retains a consistent look throughout the entire clip. The model combines visual references with text instructions to preserve styles throughout videos, even as details like camera angle or lighting conditions change (a sketch of this reference-plus-text workflow appears below). Initially, users can generate five- and ten-second clips, but Runway's demo videos hint at future updates that could allow for more complex, longer-form content creation. The technology could also function as an image editing tool, allowing users to combine illustrations and generate multiple variations to streamline the revision process.
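For developers, Runway also exposes its models through an API. The sketch below assumes the runwayml Python SDK and a gen4_turbo model identifier; the model name, parameter names, and accepted values are assumptions to verify against Runway's current API reference, not a definitive recipe.

```python
# Hedged sketch: reference image + text prompt with Runway's API.
# Assumes the runwayml SDK; model id and parameters are assumptions.
import time
from runwayml import RunwayML

client = RunwayML(api_key="YOUR_RUNWAY_KEY")  # placeholder credential

# Kick off an image-to-video task: the reference image anchors the
# character/object's look, the text steers motion, camera, and lighting.
task = client.image_to_video.create(
    model="gen4_turbo",                       # assumed Gen-4 model id
    prompt_image="https://example.com/reference-character.jpg",
    prompt_text="The character walks through a rainy street at night, "
                "handheld camera, neon reflections",
    ratio="1280:720",                         # assumed resolution string
    duration=10,                              # five- or ten-second clips
)

# Tasks run asynchronously; poll until the clip is ready.
while True:
    task = client.tasks.retrieve(task.id)
    if task.status in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)

if task.status == "SUCCEEDED":
    print("Video URL(s):", task.output)       # downloadable result URLs
```

Polling rather than a blocking call mirrors how most hosted video models expose generation, since a clip can take tens of seconds to render.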