Anthropic Unleashes Claude 3.7, OpenAI Drops GPT-4.5, and AI Defeats Deadly Snake Venom


AI Highlights
My top-3 picks of AI news this week.
Anthropic
1. Claude’s Power Boost
Anthropic has released Claude 3.7 Sonnet, their most intelligent model to date, featuring hybrid reasoning capabilities.
Hybrid reasoning model: Combines quick standard responses with deeper reflection through an extended thinking mode for complex problem-solving, with 128K token “thinking budget” via API.
Performance leader: Achieves state-of-the-art results on SWE-bench Verified (70.3%) and leads coding evaluations from platforms like Cursor, Cognition, and Vercel.
Claude Code: New agentic coding tool that collaborates on development, searches/edits code, writes/runs tests, and handles GitHub operations.
Alex’s take: Claude 3.7 Sonnet is up to 5x cheaper than OpenAI's o1 model. At $3/million input tokens and $15/million output tokens, they're making the dawn of hybrid reasoning accessible to a much wider audience. I think this signals a fundamental shift in how AI systems will develop from here: combining “fast” and “slow” thought in a single model. A minimal API sketch follows below.
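For those who want to try the extended thinking mode, here's a minimal sketch using Anthropic's Python SDK. The parameter names and model string follow Anthropic's launch documentation, but treat the exact values as illustrative:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from your environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # Claude 3.7 Sonnet at launch
    max_tokens=20_000,
    # The "thinking budget" caps how many tokens the model may spend
    # reasoning before it answers; it must stay below max_tokens.
    thinking={"type": "enabled", "budget_tokens": 16_000},
    messages=[{"role": "user", "content": "Plan a zero-downtime database migration."}],
)

# The reply interleaves "thinking" blocks with the final "text" blocks.
for block in response.content:
    if block.type == "text":
        print(block.text)
```

Dropping the `thinking` parameter gives you the quick standard responses; the same model serves both modes, which is the whole point of the hybrid design.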
OpenAI
2. GPT-4.5 Enters The Chat
OpenAI has released GPT-4.5, its newest and largest language model yet, available now to Pro users and developers worldwide, and has opened its video model Sora to users across Europe.
Dual intelligence approach: OpenAI is advancing AI along two axes—unsupervised learning (GPT-4.5) for intuition and world knowledge, and reasoning models (OpenAI o1, o3-mini) for step-by-step thinking.
Reduced hallucinations: GPT-4.5 shows significant improvement in factual accuracy, with a 37.1% hallucination rate compared to GPT-4o's 61.8%.
Sora expansion: In the same week, OpenAI expanded its AI video generation tool Sora to the EU, UK, Switzerland, Norway, Liechtenstein, and Iceland, making it available to Plus, Pro, and Team users in these regions.
Alex’s take: I find OpenAI's explicit framing of their two-track approach fascinating, especially how it stacks up against Anthropic, who are combining fast responses and deep reasoning in a single hybrid model. It reminds me of the left brain/right brain concept—one side for creativity and intuition, the other for logical reasoning.
Biology
3. AI Decodes Deadly Snake Venom
Researchers at the University of Washington's Baker Lab have created AI-designed proteins that can neutralise lethal snake venom toxins, potentially revolutionising treatment for the 400,000 people affected by snakebites annually.
Computational breakthrough: Using deep learning models like RFdiffusion and ProteinMPNN on NVIDIA GPUs, the team generated and screened millions of potential antitoxin structures virtually, compressing years of work into weeks.
Superior performance: The AI-designed proteins demonstrated 80-100% survival rates in mouse studies exposed to lethal neurotoxins, specifically targeting the deadliest three-finger toxins (3FTx).
Accessibility advantages: Unlike traditional antivenoms that require refrigeration and trained medical staff, these proteins are small, heat-resistant, and potentially much cheaper to produce.
Alex’s take: While we often focus on AI's impact in content creation and knowledge work, this research shows AI's profound capability to solve some of our most neglected health challenges. I believe we're only scratching the surface of what's possible when AI drives molecular design—from venom antidotes today to treatments for viral infections and autoimmune diseases tomorrow.
Today’s Signal is brought to you by Recraft.
I've wasted hours wrestling with Midjourney and DALL-E trying to get the output I'm looking for.
That's when I found Recraft. A premium AI image generation and editing tool that gives you control over your designs.
→ Seamless integration of text into designs regardless of size or scale
→ Create images in your brand style and colours
→ Professional editing that doesn’t look like “AI”
Recraft is trusted by over 3 million users across 200 countries, including designers from some of the world's most innovative companies, like Ogilvy, HubSpot, Netflix, Asana, and Airbus.
Get $12 off any plan with code ALEX12.
Content I Enjoyed
The All-In Podcast's Take on Our Autonomous Future
As a regular listener to the All-In Podcast, I thoroughly enjoyed their segment this week on the state of autonomous robots.
Brett Adcock, the CEO of Figure AI, wrote a tweet this week saying they have accelerated their in-home timeline by two years, with alpha testing of robots in homes expected by the end of this year.
On the pod, Chamath stated “…the model is not perfect yet to be general purpose” and “the actuators are good, they’re not great”. The physical dexterity of the actuators still lags behind what’s needed for genuinely functional home robots. Even if they’re not super functional over the next year, their dexterity should increase dramatically over the next two and find its way into exciting applications across laundry, cleaning, and dog-walking. The list is limitless and makes for a very interesting roadmap.
The besties also noted the timeline for middle-class adoption (5-10 years) and compared general-purpose humanoids with single-purpose robots like the $1,000 autonomous lawnmowers already appearing in Austin neighbourhoods.
As someone following this space closely, it’s refreshing to hear practical, grounded discussions about robotics rather than just the typical hype cycle. The race to bring capable robots into our homes is accelerating faster than many truly realise.
Idea I Learned
The first commercial-grade diffusion LLM (dLLM)
Inception Labs has launched Mercury, the first commercial-grade diffusion large language model (dLLM).
Unlike traditional autoregressive large language models (LLMs) that predict text sequentially, what I love about Mercury is it uses a diffusion process to generate entire text sequences simultaneously through a coarse-to-fine refinement method.
In essence, it doesn’t predict tokens left-to-right; it drafts the whole sequence at once and then refines it. This diffusion process is actually how most image/video generation models work today.
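To make that loop concrete, here's a toy sketch of coarse-to-fine decoding. The `predict` stub stands in for Mercury's actual denoising network (which isn't public), so everything below is illustrative rather than how Inception Labs implements it:

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
MASK = "<mask>"

def predict(tokens):
    """Stand-in for the trained denoising network: for every masked
    position, return a (token, confidence) guess. A real dLLM would
    run a transformer over the whole sequence in parallel here."""
    return {
        i: (random.choice(VOCAB), random.random())
        for i, t in enumerate(tokens) if t == MASK
    }

def diffusion_decode(length=8, steps=4):
    """Coarse-to-fine: begin fully masked, then commit the most
    confident positions a chunk at a time until nothing is masked."""
    tokens = [MASK] * length
    per_step = max(1, length // steps)
    while MASK in tokens:
        guesses = predict(tokens)
        best = sorted(guesses.items(), key=lambda kv: -kv[1][1])[:per_step]
        for i, (tok, _conf) in best:
            tokens[i] = tok
        print(" ".join(tokens))  # watch the sequence sharpen each pass
    return tokens

diffusion_decode()
```

Each pass fills in the positions the model is most confident about, so the whole sequence sharpens over a handful of steps instead of growing one token at a time, which is where the speed advantage comes from.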
Mercury is up to 10x faster and cheaper than current LLMs, reaching over 1,000 tokens per second on NVIDIA H100 GPUs. In addition, “Mercury Coder” has demonstrated competitive performance in code completion, matching or surpassing models like GPT-4o Mini and Claude 3.5 Haiku.
This feels like we’re really pushing the frontier of intelligence and speed for language models—looking to diffusion instead of autoregression could mean new, unique, and undiscovered use cases right around the corner.
Georgi Gerganov on the “gibberlink” project:
Today I was sent the following cool demo:
Two AI agents on a phone call realize they’re both AI and switch to a superior audio signal ggwave
— Georgi Gerganov (@ggerganov)
4:11 PM • Feb 24, 2025
The “Gibberlink” project, developed by Anton Pidkuiko and Boris Starkov, enables AI systems to communicate through audio channels at speeds vastly exceeding human speech capabilities.
Instead of being constrained by human language patterns, these AI systems can transmit data through sound waves in a way that optimises for machine-to-machine efficiency. Using the “ggwave” library, the AI agents effectively create a high-bandwidth communication channel that bypasses traditional linguistic constraints.
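The ggwave library behind the demo ships Python bindings, and a round trip looks roughly like this (the function names follow the library's own examples; in the live demo the waveform travels through speakers and a microphone rather than in memory):

```python
import ggwave  # pip install ggwave

# Encode a short message into an audio waveform (bytes of float32
# samples at 48 kHz) using one of the library's audible protocols.
waveform = ggwave.encode("hello from agent A", protocolId=1, volume=20)

# Decode on the receiving side. Real deployments stream microphone
# chunks into the decoder; replaying the buffer in chunks mimics that.
instance = ggwave.init()
decoded = None
chunk = 4096 * 4  # 4096 float32 samples per chunk
for i in range(0, len(waveform), chunk):
    result = ggwave.decode(instance, waveform[i:i + chunk])
    if result is not None:
        decoded = result
ggwave.free(instance)

print(decoded)  # b'hello from agent A'
```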
I think this demonstrates the potential of communication protocols that are optimised for machine efficiency rather than human comprehension. It also raises important considerations about transparency and oversight as AI systems potentially develop communication methods that operate outside human interpretability. As a society, it’s important we have sufficient monitoring and standards to make sure these interactions remain auditable and aligned with human interests moving forward.
Source: Georgi Gerganov on X
Question to Ponder
“What's the deal with o1 in ChatGPT? I can never get a good output and I find myself using Claude all the time now and thinking about ditching my subscription.”
The frustration with o1 is real, and you’re not alone in feeling this way.
However, something that struck me only recently is that o1 isn't a chat model like Claude 3.7 or GPT-4o. It's designed as a reasoning engine, which means interacting with it is fundamentally different.
Instead of writing prompts, I recommend writing “briefs” and including more context than you think necessary. I’ve distilled the perfect o1 prompt into 4 key components (a worked example follows the list):
Goal: Have a clear, specific objective statement.
Return Format: Explain exactly how you want the information structured. Cover specific fields, exact data points, and how you want it organised.
Warnings: Add explicit guardrails to prevent common errors and hallucinations.
Context Dump: Write out your personal situation and preferences. This is the “why” behind your request that provides rich, relevant context for o1 to reason through.
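Put together, a brief might look like this. The template is my own toy construction, not an official format:

```python
def build_brief(goal: str, return_format: str, warnings: str, context: str) -> str:
    """Assemble the four components above into one o1-style brief."""
    return (
        f"Goal:\n{goal}\n\n"
        f"Return format:\n{return_format}\n\n"
        f"Warnings:\n{warnings}\n\n"
        f"Context:\n{context}\n"
    )

prompt = build_brief(
    goal="Recommend a database for my analytics side project.",
    return_format="A ranked shortlist with monthly cost and trade-offs for each option.",
    warnings="Only include options with a free tier; flag any pricing you're unsure of.",
    context="Solo developer, roughly 1M rows/day of event data, comfortable with Postgres.",
)
print(prompt)
```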
But should we need all this structure just to get “good enough” results?
I personally use Claude as a daily driver and only switch to a reasoning model like o1 if I’m really wanting to get into the weeds on a tough problem.
If Claude consistently works better for you with less hassle, then that’s valuable information about where to invest your subscription dollars.
If you want to dive deeper into this idea of reasoning models vs chatbots, I really enjoyed reading Ben Hylak’s deep dive into o1.

How was the signal this week?
See you next week, Alex Banks