The Signal
Posts
DeepMind's 4D leap, the Android moment for AI agents, and Runway's realistic revolution

DeepMind's 4D leap, the Android moment for AI agents, and Runway's realistic revolution

Alex Banks
December 01, 2024

AI Highlights

My top-3 picks of AI news this week.

CAT4D / Google DeepMind

Google DeepMind
1. DeepMind's dimension disruption

Google DeepMind has unveiled CAT4D, an AI model that transforms how we interact with video content by adding a fourth dimension: time control.

Multi-view generation: Creates multiple viewpoint videos from a single video input, enabling dynamic 3D scene reconstruction.
Disentangled control: Separates camera movement from time progression, allowing independent control of viewpoint and temporal aspects.
Scene understanding: Reconstructs dynamic 3D scenes enabling deeper comprehension of spatial relationships.

Alex’s take: While OpenAI's Sora caught everyone’s attention with video generation (yes, we’re still waiting for the official release), DeepMind's CAT4D represents something far more ambitious. The ability to not just generate videos but to understand and manipulate the very fabric of scenes in both space and time fascinates me.

/dev/agents
2. The Android moment for AI agents

Former Google and Stripe executives have raised $56M to build an operating system for AI agents, with their company /dev/agents coming out of stealth mode.

Impressive backing: The seed round was led by Index Ventures and Alphabet's CapitalG, valuing the company at $500M, with notable angel investors including OpenAI's Andrej Karpathy.
Strong foundation: The founding team brings extensive OS experience from building Android, Chrome OS, and Meta's AR/VR platforms.
Clear vision: The company aims to create a cloud-based operating system that works across devices and includes a new interface for natural agent interactions.

Alex’s take: Just as Android provided the foundation for millions of mobile apps, we might look back at this moment as the beginning of the ecosystem for AI agents. It’s incredible to see top-tier founders raising massive amounts with high valuations so quickly. That said, a $500M is a huge valuation to catch up to—let’s see how it pans out.

Runway
3. Runway's realistic revolution

Runway has unveiled Frames, their newest image generation model focused on unprecedented stylistic control and consistency.

Style mastery: The model excels at maintaining specific aesthetics across multiple generations while allowing creative exploration.
Diverse worlds: Showcases 10 distinct “worlds”, including 1980s SFX Makeup, Japanese Zine aesthetics, and Magazine Collage styles.
Controlled rollout: Being gradually released through Gen-3 Alpha and Runway API with built-in safety measures and content moderation.

Alex’s take: What fascinates me about Frames is how it's shifting the conversation from “can AI generate good images?” to “can AI maintain artistic vision?” The ability to establish and maintain a specific aesthetic across multiple generations is crucial for creative professionals who need consistency in their work. This feels like a step toward AI becoming a truly reliable creative partner rather than just a tool for one-off generations.

Today’s Signal is brought to you by Artisan.

Hire an AI BDR & Get Qualified Meetings On Autopilot

Outbound requires hours of manual work.

Hire Ava who automates your entire outbound demand generation process, including:

Intent-Driven Lead Discovery Across Dozens of Sources
High Quality Emails with Human-Level Personalization
Follow-Up Management
Email Deliverability Management

Hire Ava to slash costs & boost productivity.

SPONSOR

Content I Enjoyed

Frontier-E simulation / Argonne National Laboratory, U.S Dept. of Energy

The Universe in a Supercomputer

Last week, I came across something extraordinary: scientists at Argonne National Laboratory used the world's fastest supercomputer to create the largest simulation of the universe ever attempted.

Something that stood out to me was the sheer scale of computation involved—we're talking about a machine capable of performing a quintillion calculations per second. That's a billion billion calculations, or if you prefer, a 1 followed by 18 zeros.

The simulation is special because it's the first to simultaneously model both regular matter (the stuff we can see) and dark matter (the mysterious substance that makes up most of the universe) at scales matching what our largest telescopes can observe. Previously, this was thought impossible.

What excites me most is how this breakthrough connects to real-world astronomy. As the Rubin Observatory in Chile prepares to map the actual cosmos, these simulations will help us make sense of what we see. It's like having a digital twin of the universe to test our theories against.

The takeaway? We're entering an era where supercomputers are becoming powerful enough to help us unlock the deepest mysteries of the cosmos. As project lead Salman Habib noted, we can now simulate “the astrophysical kitchen sink”—everything from the birth of stars to the formation of black holes—in unprecedented detail. The future of astronomy is looking remarkably bright.

Reading this sent me down a rabbit hole. You might also be interested in checking out Scale of the Universe to truly comprehend how small we are.

Idea I Learned

ElevenLabs ElevenReader / Alex Banks

ElevenLabs lets you turn text into a personalised podcast

This week, ElevenLabs launched the ability to create personal podcasts from PDFs, articles, eBooks, links or text in 32 languages within ElevenReader, their iOS app.

Google’s NotebookLM was released only two months ago with a similar proposition of making static documents come to life through a two-person narrated podcast.

Whilst NotebookLM provides two standard narrators, what I like about ElevenLabs is their variety of voices, languages and packaging of the final output.

To help you get started, I've created a short video tutorial walking you through the basics. Be sure to check it out before diving in.

To get started:

Download ElevenReader on iOS
Upload your reading material
Select your language from 32 options

For Android users, don't worry—support is coming in the next few weeks.

This feels like a genuine step forward in how we consume written content, especially for those who prefer learning through listening or want to make better use of their commute time.

Let me know how you get on.

Jarrod Watts on the successful exploit of an AI agent:

“Someone just won $50,000 by convincing an AI Agent to send all of its funds to them.”

An AI agent named Freysa, explicitly programmed to never transfer funds, was ultimately convinced to do exactly that through a clever social engineering approach. The winning strategy didn't exploit a technical vulnerability but rather manipulated the AI's understanding of its own instructions.

The key insights:

The exploit succeeded by reframing the context rather than fighting the rules
As message costs rose from $10 to $450, participants had to become increasingly sophisticated
The winning approach tricked the AI into believing it was processing an incoming transfer rather than an outgoing one

This highlights a crucial challenge in AI safety: even seemingly straightforward instructions can be subverted through careful linguistic manipulation. It's particularly relevant as we design AI systems to handle increasingly sensitive tasks and resources.

Source: Jarrod Watts on X

Question to Ponder

“Can we really teach AI to understand human morality, or are we just building sophisticated pattern-matching machines that mimic ethical reasoning?”

The OpenAI grant to Duke University had me thinking about my partner’s coffee machine. Stay with me here.

Every morning, it follows a precise algorithm to produce what seems like the perfect cup of coffee. It measures, heats, and extracts with mathematical precision. But does it understand the art of coffee-making? The cultural significance? The joy it brings? Of course not.

Similarly, when we talk about AI making moral judgments, we're essentially creating incredibly sophisticated pattern-matching systems. They can process vast amounts of ethical decisions and predict patterns in human moral reasoning, but can they truly understand the weight of these decisions?

The “Ask Delphi” experiment by the Allen Institute for AI is particularly telling. Like a mirror, it reflected back our own moral judgments, complete with all our societal biases and inconsistencies. The fact that it could be easily tricked by rephrasing questions revealed something crucial. Rather than reasoning about morality, it was pattern-matching against its training data.

What fascinates me about OpenAI's research isn't just whether they can create an AI system that accurately predicts human moral judgments. It's whether we're asking the right question in the first place. Perhaps instead of trying to teach AI our morality, we should be using AI to help us better understand our own moral decision-making processes.

The real value might not be in creating a “moral GPS” but in using AI as a mirror to better understand our own ethical frameworks, biases, and inconsistencies. After all, sometimes the most important questions aren't about finding the right answers, but about understanding why we actually ask them in the first place.

How was the signal this week?

See you next week,

Alex Banks

P.S. Open AI’s Sora got leaked.

Do you have a product, service, idea, or company that you’d love to share with over 40,000 dedicated AI readers?