News from the AI & ML world

DeeperML - #aidata

Megan Crouse@eWEEK //
OpenAI's ChatGPT is expanding its reach with new integrations, allowing users to connect directly to tools like Google Drive and Dropbox. This update allows ChatGPT to access and analyze data from these cloud storage services, enabling users to ask questions and receive summaries with cited sources. The platform is positioning itself as a user interface for data, offering one-click access to files, effectively streamlining the search process for information stored across various documents and spreadsheets. In addition to cloud connectors, ChatGPT has also introduced a "Record" feature for Team accounts that can record meetings, generate summaries, and offer action items.

These new features come with data privacy considerations. OpenAI states that files accessed through the Google Drive or Dropbox connectors are not used to train its models for ChatGPT Team, Enterprise, and Education accounts, but concerns remain about data usage for free users and ChatGPT Plus subscribers. OpenAI also confirms that audio recorded by the tool is deleted immediately after transcription and that transcripts are subject to workspace retention policies. Content from Team, Enterprise, and Edu workspaces, including audio recordings and transcripts from ChatGPT Record, is excluded from model training by default.

Meanwhile, Reddit has filed a lawsuit against Anthropic, alleging that the AI company scraped Reddit's data without permission to train its Claude AI models. Reddit accuses Anthropic of accessing its servers more than 100,000 times after promising to stop scraping, and claims Anthropic intentionally trained on the personal data of Reddit users without requesting their consent. Reddit has licensing deals with OpenAI and Google, but Anthropic has no such deal. Reddit is seeking an injunction to force Anthropic to stop using any Reddit data immediately, and is also asking the court to prohibit Anthropic from selling or licensing any product built using that data. Separately, Microsoft CEO Satya Nadella has stated that Microsoft makes money every time ChatGPT is used, highlighting the success of its investment in OpenAI.

References :
  • shellypalmer.com: OpenAI's latest update to ChatGPT lets it read your files in Google Drive and Dropbox. Just like that, your cloud storage is now part of your prompt.
  • www.artificialintelligence-news.com: Reddit sues Anthropic over AI data scraping
  • Tech News | Euronews RSS: Social media company Reddit sued artificial intelligence (AI) company Anthropic for allegedly scraping user comments to train its chatbot Claude.
  • www.itpro.com: Latest ChatGPT update lets users record meetings and connect to tools like Dropbox and Google Drive
  • Maginative: Reddit Sues Anthropic for Allegedly Scraping Its Data Without Permission
  • www.windowscentral.com: Satya Nadella says Microsoft makes money every time you use ChatGPT: "Every day that ChatGPT succeeds is a fantastic day"

Chris McKay@Maginative //
Snowflake is aggressively expanding its footprint in the cloud data platform market, moving beyond its traditional data warehousing focus to become a comprehensive AI platform. This strategic shift was highlighted at Snowflake Summit 2025, where the company showcased its vision of empowering business users with advanced AI capabilities for data exploration and analysis. A key element of this transformation is the recent acquisition of Crunchy Data, a move that brings enterprise-grade PostgreSQL capabilities into Snowflake’s AI Data Cloud. This acquisition is viewed as both a defensive and offensive maneuver in the competitive landscape of cloud-native data intelligence platforms.

The acquisition of Crunchy Data for a reported $250 million marks a significant step in Snowflake’s strategy to enable more complex data pipelines and enhance its AI-driven data workflows. Crunchy Data's expertise in PostgreSQL, a well-established open-source database, provides Snowflake with a FedRAMP-compliant, developer-friendly, and AI-ready database solution. Snowflake intends to provide enhanced scalability, operational governance, and performance tooling for its wider enterprise client base by incorporating Crunchy Data's technology. This strategy is meant to address the need for safe and scalable databases for mission-critical AI applications and also places Snowflake in closer competition with Databricks.

Furthermore, Snowflake introduced new AI-powered services at the Summit, including Snowflake Intelligence and Cortex AISQL, designed to make business data more accessible and actionable. Snowflake Intelligence enables users to query data in natural language and take actions based on the insights, while Cortex AISQL embeds AI operations directly into SQL. These initiatives, coupled with the integration of Crunchy Data’s PostgreSQL capabilities, indicate Snowflake's ambition to be the operating system for enterprise AI. With these features, Snowflake is trying to transform from a data warehouse into a full platform for AI-native apps and workflows.

References :
  • futurumgroup.com: Is Snowflake’s Crunchy Data Acquisition a Game-Changer in the AI Data Platform Race?
  • BigDATAwire: Why Snowflake Bought Crunchy Data
  • Maginative: The Biggest Announcements from Snowflake Summit 2025
  • futurumgroup.com: Brad Shimmin, Vice President & Practice Lead, Data and Analytics at Futurum, shares insights on Snowflake’s acquisition of Crunchy.
  • www.bigdatawire.com: Monday brought the first surprise from Snowflake Summit 25: the acquisition of Crunchy Data for a reported $250 million.
  • futurumgroup.com: Snowflake Summit ’25: Accelerating AI with Unified Data & Compute
  • Maginative: Snowflake has announced its acquisition of Crunchy Data, a company specializing in enterprise-grade PostgreSQL solutions.
  • SiliconANGLE: Snowflake up its AI game, Circle IPO blasts off, and Elon splits with Trump
  • WhatIs: Customers pleased with Snowflake plans for AI
  • BigDATAwire: Agentic AI Spurs Data Stack Updates at Snowflake Summit

Chris McKay@Maginative //
Snowflake has announced the acquisition of Crunchy Data, a leading provider of enterprise-grade PostgreSQL solutions. This strategic move is designed to enhance Snowflake's AI Data Cloud by integrating robust PostgreSQL capabilities, making it easier for developers to build and deploy AI applications and agentic systems. The acquisition brings approximately 100 employees from Crunchy Data into Snowflake, signaling a significant expansion of Snowflake's capabilities in the database realm. This positions Snowflake to better compete with rivals like Databricks in the rapidly evolving AI infrastructure market, driven by the increasing demand for databases that can power AI agents.

This acquisition comes amidst a "PostgreSQL gold rush," as major platforms recognize the critical role of the data layer in feeding AI agents. Just weeks prior, Databricks acquired Neon, another Postgres startup, and other companies like Salesforce and ServiceNow have also made acquisitions in the data management space. Snowflake's SVP of Engineering, Vivek Raghunathan, highlighted the massive $350 billion market opportunity, underscoring the trend where AI agents, rather than humans, are increasingly driving database usage. PostgreSQL's popularity among developers and its suitability for rapid, automated provisioning make it an ideal choice for AI agent demands.

Crunchy Data brings enterprise-grade operational database capabilities that complement Snowflake's existing strengths. While Snowflake has excelled in analytical workloads involving massive datasets, it has been comparatively weaker on the transactional side, where real-time data storage and retrieval are essential. Crunchy Data's expertise in enterprise and regulated markets, including federal agencies and financial institutions, aligns well with Snowflake's existing customer base. The integration of Crunchy Data's PostgreSQL capabilities will enable Snowflake to provide a more comprehensive solution for organizations looking to leverage AI in their operations.

References :
  • Maginative: Snowflake has announced its acquisition of Crunchy Data, a company specializing in enterprise-grade PostgreSQL solutions.
  • www.infoworld.com: Snowflake acquires Crunchy Data for enterprise-grade PostgreSQL to counter Databricks’ Neon buy
  • BigDATAwire: Monday brought the first surprise from Snowflake Summit 25: the acquisition of Crunchy Data for a reported $250 million.

Berry Zwets@Techzine Global //
Snowflake has unveiled a significant expansion of its AI capabilities at its annual Snowflake Summit 2025, solidifying its transition from a data warehouse to a comprehensive AI platform. CEO Sridhar Ramaswamy emphasized that "Snowflake is where data does more," highlighting the company's commitment to providing users with advanced AI tools directly integrated into their workflows. The announcements showcase a broad range of features aimed at simplifying data analysis, enhancing data integration, and streamlining AI development for business users.

Snowflake Intelligence and Cortex AI are central to the company's new AI-driven approach. Snowflake Intelligence acts as an agentic experience that enables business users to query data using natural language and take actions based on the insights they receive. Cortex Agents, Snowflake’s orchestration layer, supports multistep reasoning across both structured and unstructured data. A key advantage is governance inheritance, which automatically applies Snowflake's existing access controls to AI operations, removing a significant barrier to enterprise AI adoption.

In addition to Snowflake Intelligence, Cortex AISQL allows analysts to process images, documents, and audio within their familiar SQL syntax using native functions. Snowflake is also addressing legacy data workloads with SnowConvert AI, a new tool designed to simplify the migration of data, data warehouses, BI reports, and code to its platform. This AI-powered suite includes a migration assistant, code verification, and data validation, aiming to reduce migration time by half and ensure seamless transitions to the Snowflake platform.

References :
  • www.bigdatawire.com: Snowflake Widens Analytics and AI Reach at Summit 25
  • WhatIs: AI tools highlight latest swath of Snowflake capabilities
  • Techzine Global: Snowflake enters enterprise PostgreSQL market with acquisition of Crunchy Data
  • www.infoworld.com: Snowflake has introduced a new multi-modal data ingestion service, Openflow, designed to help enterprises solve challenges around data integration and engineering in the wake of demand for generative AI and agentic AI use cases.
  • MarkTechPost: San Francisco, CA – The data cloud landscape is buzzing as Snowflake, a heavyweight in data warehousing and analytics, today pulled the wraps off two potentially transformative AI solutions: Cortex AISQL and Snowflake Intelligence. Revealed at the prestigious Snowflake Summit, these innovations are engineered to fundamentally alter how organizations interact with and derive intelligence from
  • Maginative: Snowflake Acquires Crunchy Data to Bring PostgreSQL to Its AI Cloud
  • www.marktechpost.com: The data cloud landscape is buzzing as Snowflake, a heavyweight in data warehousing and analytics, today pulled the wraps off two potentially transformative AI solutions: Cortex AISQL and Snowflake Intelligence.
  • Latest news: Snowflake's new AI agents make it easier for businesses to make sense of their data
  • Maginative: At Snowflake Summit 25, the cloud data giant unveiled its most ambitious product vision yet: transforming from a data warehouse into an AI platform that puts advanced capabilities directly into business users’ hands.
  • siliconangle.com: Snowflake up its AI game, Circle IPO blasts off, and Elon splits with Trump
  • futurumgroup.com: Nick Patience, AI Practice Lead at Futurum, shares his insights on Snowflake Summit 2025. Key announcements like Cortex AI, OpenFlow, and Adaptive Compute aim to accelerate enterprise AI by unifying data and enhancing compute efficiency.
  • BigDATAwire: How do you move, store, access, and track data in the age of AI? It sounds like a simple question, but it has complex implications for the companies that are
  • futurumgroup.com: Nick Patience, AI Practice Lead at Futurum, shares his insights on Snowflake Summit 2025.
  • www.bigdatawire.com: How do you move, store, access, and track data in the age of AI?
  • SiliconANGLE: TS Imagine’s trading platform receives AI makeover from Snowflake
  • www.digitimes.com: At a recent summit hosted by US cloud database provider Snowflake, OpenAI CEO Sam Altman shared his vision for the future of artificial intelligence (AI), predicting that by 2026, AI agents will evolve to assist in solving complex business challenges and even help humans "discover" new knowledge.

@openssf.org //
Global cybersecurity agencies, including the U.S. Cybersecurity and Infrastructure Security Agency (CISA), the National Security Agency (NSA), the Federal Bureau of Investigation (FBI), and international partners, have jointly released guidance on AI data security best practices. The new Cybersecurity Information Sheet (CSI) aims to address the critical importance of securing data used to train and operate AI systems, emphasizing that the accuracy, integrity, and trustworthiness of AI outcomes are directly linked to the quality and security of the underlying data. The guidance identifies potential risks related to data security and integrity throughout the AI lifecycle, from initial planning and design to post-deployment operation and monitoring.

Building on previous guidance, the new CSI provides ten general best practices organizations can implement to enhance AI data security. These steps include sourcing data from trusted, reliable sources and using provenance tracking to verify data changes; using checksums and cryptographic hashes to maintain data integrity during storage and transport; and employing quantum-resistant digital signatures to authenticate and verify trusted revisions during training and other post-training processes. The guidance also recommends using only trusted infrastructure, such as computing environments leveraging zero trust architecture, classifying data based on sensitivity to define proper access controls, and encrypting data using quantum-resistant methods like AES-256.
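
As an illustration of the checksum recommendation, here is a minimal Python sketch that records a SHA-256 digest when data is stored and recomputes it after transport. It is not taken from the CSI: the helper names (`sha256_hex`, `verify`) and the sample record are hypothetical, and a real pipeline would typically hash whole files and keep the digests in a separate, trusted location.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """Check data against a previously recorded digest."""
    # compare_digest performs a constant-time comparison of the two strings
    return hmac.compare_digest(sha256_hex(data), expected_hex)


# Hypothetical training record; the digest is recorded when the data is stored.
original = b"label=cat,pixels=0a1b2c"
recorded_digest = sha256_hex(original)

# Later, after storage or transport, recompute the digest and compare.
tampered = b"label=dog,pixels=0a1b2c"
print(verify(original, recorded_digest))  # True
print(verify(tampered, recorded_digest))  # False
```

A plain hash only detects changes; it does not prove who produced the data, which is why the guidance pairs hashes with digital signatures.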

The guidelines also emphasize the importance of secure data storage using certified devices compliant with NIST FIPS 140-3, which covers security requirements for cryptographic modules, and privacy preservation of sensitive data through methods like data masking. Furthermore, the agencies advise secure deletion of AI training data from repurposed or decommissioned storage devices. Owners and operators of National Security Systems, the Defense Industrial Base, federal agencies, and critical infrastructure sectors are urged to review the publication and implement its recommended best practices to mitigate risks like data supply chain poisoning and malicious data tampering.
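
To make the data masking recommendation concrete, the following Python sketch hides most of an email address and all but the last digits of an identifier before text is used for training. It is an assumption-laden illustration rather than anything prescribed by the agencies: the function names, the regular expression, and the sample strings are all hypothetical, and production masking usually relies on vetted tooling.

```python
import re

# Simplified email pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(
    r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)


def mask_emails(text: str) -> str:
    """Keep the first character and the domain of each email, hide the rest."""
    return EMAIL_RE.sub(r"\1***@\2", text)


def mask_digits(value: str, keep_last: int = 4) -> str:
    """Replace all but the last keep_last digits with '*', keeping separators."""
    total = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            # Only the final keep_last digits remain visible.
            out.append(ch if seen > total - keep_last else "*")
        else:
            out.append(ch)
    return "".join(out)


print(mask_emails("Contact jane.doe@example.com today"))
print(mask_digits("4111-1111-1111-1234"))
```

Masking of this kind keeps records usable for format checks and debugging while removing most of the sensitive value.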

References :
  • industrialcyber.co: Global cybersecurity agencies release AI data security guidelines, highlight data integrity as weakness
  • www.scworld.com: AI data security best practices outlined by CISA and partners
  • Tenable Blog: Check out expert recommendations for protecting your AI system data. Plus, boost your IT department’s cybersecurity skills with a new interactive framework.

@insidehpc.com //
NVIDIA and Dataiku are collaborating on the NVIDIA AI Data Platform reference design to support organizations' generative AI strategies by simplifying unstructured data storage and access. The collaboration aims to democratize analytics, models, and agents within enterprises by enabling more users to harness high-performance NVIDIA infrastructure. Because Dataiku is a validated component of the full-stack reference architecture, any agentic application developed in Dataiku will work on the latest NVIDIA-Certified Systems, including NVIDIA RTX PRO Server and NVIDIA HGX B200 systems.

DDN (DataDirect Networks) also announced its collaboration with NVIDIA on the NVIDIA AI Data Platform reference design. This collaboration aims to simplify how unstructured data is stored, accessed, and activated to support generative AI strategies. The DDN-NVIDIA offering combines DDN Infinia, an AI-native data platform, with NVIDIA NIM and NeMo Retriever microservices, NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, and NVIDIA Networking. This enables enterprises to deploy Retrieval-Augmented Generation (RAG) pipelines and intelligent AI applications grounded in their own proprietary data—securely, efficiently, and at scale.

Starburst is also adding agentic AI capabilities to its platform, including a pre-built agent for insight exploration as well as tools for building custom agents. The new capabilities include Starburst AI Workflows, a collection of features comprising vector-native AI search, AI SQL functions, and AI model access governance functions. The AI search functions include a built-in vector store that allows users to convert data into vector embeddings and then search against them. Starburst stores the vector embeddings in Apache Iceberg, the table format around which it has built its lakehouse.
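
The general idea behind a vector store (convert text into vectors, then rank stored items by similarity to a query vector) can be sketched in a few lines of Python. This is not Starburst's API: the `embed`, `cosine`, and `search` functions and the sample documents are hypothetical, and the word-count "embedding" is a toy stand-in for the learned embedding models that real systems use.

```python
import math
from collections import Counter


def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy embedding: count how often each vocabulary word appears in text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: str, docs: list[str], vocab: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query, vocab)
    scored = [(cosine(q, embed(d, vocab)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


# Hypothetical documents standing in for rows in a data lake.
docs = [
    "quarterly revenue grew in europe",
    "new hires joined the sales team",
    "server outage affected checkout",
]
vocab = sorted({word for d in docs for word in d.split()})

print(search("revenue in europe", docs, vocab))
# ['quarterly revenue grew in europe']
```

A production vector store would persist the embeddings alongside the source data and use an index rather than scanning every document per query.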


@Dataconomy //
Databricks has announced its acquisition of Neon, an open-source database startup specializing in serverless Postgres, in a deal reportedly valued at $1 billion. This strategic move is aimed at enhancing Databricks' AI infrastructure, specifically addressing the database bottleneck that often hampers the performance of AI agents. Neon's technology allows for the rapid creation and deployment of database instances, spinning up new databases in milliseconds, which is critical for the speed and scalability required by AI-driven applications. The integration of Neon's serverless Postgres architecture will enable Databricks to provide a more streamlined and efficient environment for building and running AI agents.

Databricks plans to incorporate Neon's scalable Postgres offering into its existing big data platform, eliminating the need to scale separate server and storage components in tandem when responding to AI workload spikes. This resolves a common issue in modern cloud architectures where users are forced to over-provision either compute or storage to meet the demands of the other. With Neon's serverless architecture, Databricks aims to provide instant provisioning, separation of compute and storage, and API-first management, enabling a more flexible and cost-effective solution for managing AI workloads. Databricks cites Neon's figure that 80% of Neon's database instances are provisioned by software rather than humans.

The acquisition of Neon is expected to give Databricks a competitive edge, particularly against rivals like Snowflake. While Snowflake currently lacks similar AI-driven database provisioning capabilities, Databricks' integration of Neon's technology positions it as a leader in the next generation of AI application building. The combination of Databricks' existing data intelligence platform with Neon's serverless Postgres database will allow databases to be provisioned programmatically in response to the needs of AI agents, overcoming the limitations of traditional, manually provisioned databases.

References :
  • Databricks: Today, we are excited to announce that we have agreed to acquire Neon, a developer-first, serverless Postgres company.
  • www.infoworld.com: Databricks to acquire open-source database startup Neon to build the next wave of AI agents
  • www.bigdatawire.com: Databricks Nabs Neon to Solve AI Database Bottleneck
  • Dataconomy: Databricks has agreed to acquire Neon, an open-source database startup, for approximately $1 billion.
  • BigDATAwire: Databricks today announced its intent to buy Neon, a database startup founded by Nikita Shamgunov that develops a serverless and infinitely scalable version of the open source Postgres database.
  • Techzine Global: Neon’s technology can spin up a Postgres instance in less than 500 milliseconds, which is crucial for AI agents’ fast working methods.
  • AI News | VentureBeat: The $1 Billion database bet: What Databricks’ Neon acquisition means for your AI strategy
  • analyticsindiamag.com: Databricks to Acquire Database Startup Neon for $1 Billion