Our team just returned from GTC 2026 in San Jose. Three days of connecting with others building at the intersection of AI and the physical world, and Jensen didn't disappoint. Meanwhile: Anthropic gave Claude a mouse and a keyboard. OpenAI killed Sora. Cursor got caught running a Chinese model under the hood. And Mamba 3 quietly posted results that should make every Transformer-only lab nervous. The theme this week isn't any single announcement; it's the speed at which the assumptions underneath the AI stack are being replaced. Architectures, business models, product surfaces, even the definition of what an "AI agent" is: all of it shifted in seven days.
Read time: ~5 min
What’s new and interesting in AI/ML this week.
Meta published Hyperagents, a framework where AI agents don't just optimize tasks, they optimize the process by which they optimize tasks. Self-improving agents are the logical endpoint of the agentic AI trend. The paper shows measurable gains on research workflows where the agent iterates on its own methodology between runs.
OpenAI is raising an additional $10B from a16z, D.E. Shaw, MGX, TPG, and others. This comes weeks after killing Sora to redirect compute. The message: even with $10B+ in annual revenue, the compute bill is outpacing the business. Every AI company is now in a race between revenue growth and infrastructure costs.
Coming out of GTC 2026, NVIDIA released OpenClaw (open-source agent framework) and NemoClaw (enterprise layer with privacy, monitoring, and policy controls). Jensen's line: "Every SaaS will become an agentic company." The enterprise agent infrastructure layer is now a three-way race between Anthropic (Claude Cowork), NVIDIA (NemoClaw), and Microsoft (Copilot). The question for vertical builders: which substrate do you build on, or do you build your own?
The Model Under the Hood
Cursor launched Composer 2 last week as "frontier-level coding intelligence" — $0.50 per million input tokens, a new self-summarization technique that compresses working memory mid-task, and benchmark scores that rivaled Claude and GPT-4. Developers noticed it was fast. Unusually fast. Then someone intercepted the API traffic.
Composer 2 is running Moonshot AI's Kimi K2.5 — a 1 trillion parameter model backed by Alibaba and Tencent. Cursor did heavy continued pre-training (they estimate 75% of the compute was theirs), and the self-summarization technique is genuinely novel: it cuts compaction errors by 50% compared to naive context truncation. But the provenance question is the story. The best open coding model available to a Western startup right now is Chinese. Meta's Llama 4 Behemoth is delayed indefinitely. Mistral and Cohere haven't shipped a competitive code model. The Western open-source gap isn't theoretical anymore — it's showing up in production products that millions of developers use daily.
For any company building AI products on top of foundation models, the Cursor episode is a warning: know what's under the hood. Model provenance matters for compliance, for data sovereignty, and for understanding where your supply chain actually runs. If your AI vendor can't tell you which model they're using, that's a risk factor, not a feature.
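To make the compaction contrast above concrete: naive truncation drops the oldest turns outright, while summarization-based compaction replaces them with a short summary so key details survive. What follows is a minimal Python sketch under stated assumptions, not Cursor's implementation, which isn't described here. Message, truncate, compact, and the placeholder first_words summarizer are all hypothetical names; a real system would have a model write the summary and would count tokens rather than words.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str


def size(messages: List[Message]) -> int:
    # Word count as a crude stand-in for a token count (assumption).
    return sum(len(m.text.split()) for m in messages)


def truncate(messages: List[Message], budget: int) -> List[Message]:
    """Naive truncation: drop the oldest messages until the rest fits."""
    kept = list(messages)
    while kept and size(kept) > budget:
        kept.pop(0)
    return kept


def compact(
    messages: List[Message],
    trigger: int,
    summarize: Callable[[List[Message]], str],
    keep_recent: int = 2,
) -> List[Message]:
    """Once the history exceeds `trigger`, replace everything except the
    last `keep_recent` messages with a single summary message.
    The result is not guaranteed to fit any particular budget."""
    if size(messages) <= trigger:
        return list(messages)
    split = max(0, len(messages) - keep_recent)
    old, recent = messages[:split], messages[split:]
    if not old:
        return list(recent)
    return [Message("summary", summarize(old))] + recent


def first_words(messages: List[Message], n: int = 4) -> str:
    # Placeholder summarizer: keeps the first n words of each message.
    return " | ".join(" ".join(m.text.split()[:n]) for m in messages)


history = [
    Message("user", "fix the failing test in parser module please"),
    Message("tool", "test_parse_empty fails with IndexError at line 10 of parser"),
    Message("assistant", "patched parser to handle empty input"),
    Message("tool", "all tests pass now"),
]

truncated = truncate(history, budget=12)
compacted = compact(history, trigger=12, summarize=first_words)
```

In this toy history, truncation keeps only the last two messages and loses the original error, while the compacted version still carries a trace of it in the summary message.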
Quick hits worth your attention this week.
Claude can now control your Mac: it opens apps, navigates browsers, and fills out forms. Pair your phone via Dispatch and send instructions remotely. The /schedule command runs tasks on a cron schedule. MacStories reported roughly a 50% success rate on complex multi-step workflows but near-perfect results on single-app retrieval. The agent layer is moving to native surfaces: desktop for knowledge workers, messaging for field operations.
OpenAI is killing Sora, its video generation app. Disney has exited its $1B partnership. The Sora research team is being redirected as compute demand grows. Even OpenAI can't sustain every product line. Video generation is expensive, and the gap between impressive demos and sustainable business models just claimed its highest-profile casualty.
Luma AI released Uni-1, the first model that reasons and generates images simultaneously in a single autoregressive transformer. It doesn't think then create, it does both at once. Outscores Google and OpenAI on image benchmarks at 30% lower cost. The architecture is the interesting part: unified intelligence that collapses the pipeline from multi-step to single-pass.
Carnegie Mellon and Princeton released Mamba 3, the first efficient architecture to consistently match or exceed Transformer baselines. Nearly 4% better on language modeling, 6% faster inference, half the memory. Open source.
Google launched Stitch, an AI-native design canvas that generates full UIs from natural language. Free. Exports to AI Studio. Figma's stock dropped on the announcement. "Vibe coding" was last month's phrase. "Vibe design" is this month's. The pattern: every creative tool is being rebuilt with generation at the core, not bolted on as a feature.
The Surface and the Skill
Anthropic gave Claude a mouse and a keyboard. We gave Derrick a phone number. OpenAI gave Sora a eulogy.
Three bets on where AI meets the real world. Anthropic says the desktop. We say the text thread. OpenAI said video, and this week they admitted the economics don't work. The lesson: the surface has to be native to the workflow. Claude Cowork works because knowledge workers already use Macs. Derrick works because geologists already use iMessage.
But the surface is only half the story. The other half is what the agent carries.
Every agent platform announced this week gives the model access to your environment. Files, apps, APIs. None of them give the model your team's institutional knowledge, your real-time operational context, or purpose-built tools that connect to the systems already running your operation.
We call the knowledge layer skills: validated, versioned, operator-owned. But skills without live context are just documents. And context without the right tools is just data. The three have to work together: skills that encode what your team knows, context that flows in from the wellsite in real time, and tools, from specialized models to MCP integrations, that let the agent actually act on both.
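One way to picture the three layers working together is a small data structure in which an agent acts only when a validated skill, a live reading, and a tool are all present. This is an illustrative Python sketch, not DrillSense's actual design: Skill, Agent, the field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Skill:
    # Hypothetical shape for "validated, versioned, operator-owned".
    name: str
    version: str
    owner: str
    validated: bool
    guidance: str


@dataclass
class Agent:
    skills: Dict[str, Skill] = field(default_factory=dict)
    context: Dict[str, float] = field(default_factory=dict)  # live readings
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def add_skill(self, skill: Skill) -> None:
        # Only validated skills are allowed into the knowledge layer.
        if not skill.validated:
            raise ValueError(f"skill {skill.name!r} is not validated")
        self.skills[skill.name] = skill

    def act(self, skill_name: str, reading: str, tool_name: str) -> Optional[str]:
        """Combine all three layers; return None if any one is missing."""
        skill = self.skills.get(skill_name)
        value = self.context.get(reading)
        tool = self.tools.get(tool_name)
        if skill is None or value is None or tool is None:
            return None
        return tool(f"[{skill.name} v{skill.version}] {reading}={value}: {skill.guidance}")


agent = Agent()
agent.add_skill(
    Skill(
        name="pressure_check",
        version="1.2",
        owner="ops-team",
        validated=True,
        guidance="notify the on-call engineer",
    )
)
agent.context["pressure_psi"] = 3100.0
agent.tools["notify"] = lambda msg: "NOTIFY " + msg

result = agent.act("pressure_check", "pressure_psi", "notify")
missing = agent.act("pressure_check", "flow_rate", "notify")
```

Here result carries the skill version, the live value, and the guidance through the tool, while missing is None because no flow_rate reading has arrived. That is the sense in which a skill without live context is just a document.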
That's the layer we're building at DrillSense. Not another dashboard. Not another copilot. An operational intelligence stack, and Derrick is the first agent running on it.
The stack is being rebuilt from the bottom up. The question isn't which model wins. It's what your agent knows, what it sees, and what it can do.
Data ≠ Decisions. Context changes everything. DrillSense is the intelligence layer for drilling operations, built for the people who make the calls.
Know someone who should be reading this? Send them the archive, or subscribe from drillsense.com.