Building Claude Code with Boris Cherny
Summary
Boris Cherny, the creator and engineering lead behind Claude Code at Anthropic, shares his journey from early coding experiments (selling Pokémon cards on eBay, writing TI-83 calculator solvers) through startups, a seven-year tenure at Meta leading code quality, to joining Anthropic. His first pull request at Anthropic was rejected because he wrote it by hand instead of using their early AI tool, Clyde—a moment that opened his eyes to the potential of agentic AI.
Claude Code began as a side-project bash tool that hit the Anthropic API, evolving from a chatbot to an agentic coding tool with tool use (bash, file edit). Internally, its adoption skyrocketed, leading to debates about whether to release it publicly. The decision to launch was driven by safety research: studying the model in the wild to improve alignment and mitigate risks. Today, Claude Code writes ~80% of code at Anthropic, and Boris himself ships 20-30 pull requests daily with zero handwritten code.
The conversation delves into the practicalities of AI-driven development: Boris’s workflow involves parallel agents across multiple terminal tabs or the Claude desktop/iOS app, using plan mode and letting Opus 4.5/4.6 one-shot implementations. Code review has transformed: Claude Code reviews its own PRs in CI, catching ~80% of bugs, followed by human review for safety-critical enterprise software. The team emphasizes prototyping—building dozens of iterations rapidly—over writing PRDs, and embraces a generalist culture where everyone codes, from engineers to finance staff.
Boris reflects on the broader implications, comparing the rise of AI coding to the printing press: a niche skill (scribes/writing) becomes democratized, unlocking unforeseen economic and creative potential. He discusses which engineering skills remain valuable (methodical debugging, hypothesis-driven product sense, cross-disciplinary curiosity) and which fade (strong opinions on languages/frameworks). Safety is a paramount concern, addressed through layered mitigations: model alignment, runtime classifiers, and sandboxing. The episode closes with book recommendations (Cixin Liu’s short stories, Accelerando, Functional Programming in Scala) and optimism about the future shaped by generalists who can adapt as the technology rapidly evolves.
Recommendations
Books
- Cixin Liu’s short stories — Recommended for those new to hard sci-fi. Boris is a big sci-fi reader and finds Liu’s work compelling.
- Accelerando by Charles Stross — Described as ‘the product roadmap for the next 50 years,’ capturing the quickening pace of technological change and the AI singularity.
- Functional Programming in Scala — Recommended to learn functional programming and thinking in types. Boris emphasizes doing the exercises to internalize the concepts.
People
- Anders Hejlsberg — Creator of TypeScript, praised for the beauty and depth of the type system, pushing ideas further than even hardcore functional languages.
- Cixin Liu — Sci-fi author of ‘The Three-Body Problem’ and short stories, recommended for his exploration of hard sci-fi concepts.
- Charles Stross — Author of ‘Accelerando,’ a novel that Boris says captures the feeling of accelerating technological change.
Tools
- Claude Code — The AI coding agent developed at Anthropic. Boris discusses its evolution from a side project to a tool that writes 80% of code at Anthropic and his personal workflow of 20-30 PRs per day.
- Claude Co-Work — A visual agent workspace for non-engineers, built in about 10 days using Claude Code. It includes a virtual machine for safety and a Chrome extension for browser automation.
- SonarQube MCP server — A tool that acts as a universal translator between AI applications (like Claude Code, GitHub Copilot, Cursor) and SonarQube’s code analysis platform, providing automated verification.
Topic Timeline
- 00:00:00 — Introduction: Boris Cherny’s background and the episode’s focus — The episode opens with the story of Boris Cherny’s first pull request at Anthropic being rejected because he wrote it by hand, not using AI. Boris is introduced as the creator and engineering lead of Claude Code, formerly a code quality lead at Meta across Instagram, Facebook, WhatsApp, and Messenger. The episode will cover Claude Code’s evolution from side project to fast-growing developer tool, the internal debate about releasing it, Boris’s AI-driven workflow, and the transformative impact of AI on engineering skills.
- 00:01:13 — Boris’s early introduction to coding: practicality and entrepreneurship — Boris recounts learning HTML to sell Pokémon cards on eBay with blink tags and writing TI-83 calculator programs in BASIC and assembly to solve math tests, eventually sharing them via serial cable. He studied economics, dropped out for startups, and his first venture was a weed review website to get free samples. Coding was always a practical means to build things, not a planned career. He joined Y Combinator early and worked on a medical software startup, Agile Diagnosis, where he shadowed doctors and realized their workflow made the software unusable, leading to a pivot.
- 00:10:19 — Seven years at Meta: from Facebook Groups to leading code quality — Boris started on Facebook Groups, drawn by the mission of connecting communities, and later became tech lead. He moved to Instagram’s labs team in Japan after his wife got a job in rural Japan, but found the Python/Django stack problematic and shifted to Dev Infra to migrate Instagram to Facebook’s monolith and GraphQL. He eventually led code quality for all of Meta (Instagram, Facebook, WhatsApp, Messenger, Reality Labs) under the ‘Better Engineering’ program, which mandated engineers spend 20% time on tech debt. He measured the causal impact of code quality on productivity, finding double-digit percentage improvements.
- 00:18:51 — Joining Anthropic and the first AI-powered pull request — Boris joined Anthropic for its safety mission and serious approach to AI. His ramp-up buddy, Adam Wolf, rejected his first handwritten pull request, telling him to use the internal tool Clyde (Claude Code’s predecessor). Using Clyde, the model one-shotted a working PR—Boris’s first ‘field AGI moment.’ This was August/September 2024, and he realized the model could do far more than line completions.
- 00:22:09 — Origin of Claude Code: from bash tool to agentic coding — Claude Code started as a bash tool to hit the Anthropic API, initially a chatbot. Boris added tool use (a bash tool) and was amazed when Sonnet 3.5 wrote an AppleScript to check his music player. This was his second AGI moment. He realized models want to use tools and shouldn’t be boxed in as a component. The tool evolved with file edit and more tools. Internally, adoption grew vertically, prompting debate: keep it internal for productivity or release it to study safety in the wild? The safety argument won, and it launched publicly.
- 00:30:25 — Boris’s AI-powered workflow: parallel agents and productivity — With Opus 4.5, Boris stopped handwriting code entirely, uninstalling his IDE. He now ships 20-30 PRs daily, none handwritten. His workflow uses parallel agents: five terminal tabs with separate repo checkouts, each running Claude Code in plan mode, or using the desktop app’s work tree support. He also starts agents on his iPhone. For familiar codebases, he focuses on productivity over learning mode; he iterates on plans across tabs and lets Opus one-shot implementations. The PRs vary from one-liners to thousands of lines, and are not just simple migrations.
- 00:39:54 — Transforming code review: AI-assisted verification and human oversight — Code review has changed dramatically. Claude Code runs tests locally and often writes its own tests. At Anthropic, every PR is reviewed by Claude Code first (catching ~80% of bugs), using the open-source coder view skill with best-of-N passes and deduping agents. A human engineer always does a second pass for safety-critical enterprise software. Boris now automates lint rule creation: when he sees a pattern, he asks Claude to write a lint rule in the PR. Deterministic checks (type checkers, linters, builds) remain crucial, but AI augments the process.
- 00:46:18 — Claude Code architecture, safety layers, and technical choices — Claude Code’s architecture is simple: a core agent loop with tools that are constantly added/removed. Safety is multi-layered (Swiss cheese model). For example, against prompt injection in WebFetch: 1) model alignment (Opus 4.6 is more resistant), 2) runtime classifiers that block suspicious requests, 3) summarizing results via a subagent. They tried RAG with a local vector database but abandoned it due to sync issues and permission complexities; agentic search (glob/grep) performed better. Permissioning involves allow-listing safe Unix commands and user-configurable patterns, with interactive prompts for risky operations.
- 00:54:11 — Anthropic’s culture: member of technical staff, prototyping, and generalists — Anthropic uses the title ‘member of technical staff’ for almost everyone, encouraging a generalist mindset where engineers, designers, and others all code and contribute across disciplines. The team doesn’t write PRDs; they prototype extensively. Boris built 20 interactive to-do list prototypes in a day and a half. Features like Agent Teams (swarms) were prototyped hundreds of times. The cost of building is low, so they experiment rapidly. The culture rewards curiosity, short attention span (context switching between agents), and adaptability.
- 01:04:24 — Claude Co-Work: building for non-engineers in 10 days — Claude Co-Work, a visual agent workspace for non-engineers, was built by a small team in about 10 days using Claude Code. It emerged from latent demand: non-engineers (finance, sales, even a tomato plant monitor) were using Claude Code. Co-Work adds guardrails: a virtual machine for safety, Chrome extension for browser automation, and rethought permissions. It shares the Claude desktop app (Electron/TypeScript) and uses the same agent SDK. Growth has been steeper than Claude Code’s initial trajectory. Windows support was coming soon.
- 01:13:52 — Agent Teams (swarms) and uncorrelated context windows — Agent Teams (swarms) allow a lead agent to delegate to sub-agents with uncorrelated context windows (fresh context aside from the prompt). This acts as test-time compute, improving results for complex tasks. Experiments since September 2024 clicked with Opus 4.6. Internal evaluations show swarms build more complex software than a single agent. It’s a research preview because it’s token-intensive. Use cases include building plugins (Asana board creation and implementation) and other features. The configuration is context-specific, not regimented.
- 01:17:38 — The printing press analogy and the future of software engineering — Boris compares the current AI transition to the printing press: scribes (a tiny literate elite) were employed by illiterate kings; the press democratized writing, exploding the market for written work and creating new roles like authors. Similarly, software engineers have been the ‘scribes’ for business owners who can’t code. Democratization will unlock unpredictable innovations. Skills like strong opinions on languages/frameworks become less relevant; methodical debugging, hypothesis-driven product sense, cross-disciplinary curiosity, and adaptability grow in value. This is the ‘year of the generalist.’
- 01:30:17 — Evolving beliefs, safety concerns, and book recommendations — Boris’s belief in the importance of AI safety has strengthened after seeing risks from inside Anthropic. He recommends sci-fi books: Cixin Liu’s short stories for hard sci-fi newcomers, and ‘Accelerando’ by Charles Stross for its depiction of accelerating technological change. On the technical side, he recommends ‘Functional Programming in Scala’ (doing the exercises) to learn thinking in types, which remains valuable even as language debates fade.
Episode Info
- Podcast: The Pragmatic Engineer
- Author: Gergely Orosz
- Category: Technology News Tech News
- Published: 2026-03-04T18:09:46Z
- Duration: 01:37:01
References
- URL PocketCasts: https://pocketcasts.com/podcast/59045350-573e-013d-e880-02cacb2c6223/episode/92da5117-43d7-4220-a8c6-b33584ba9dac/
- Episode UUID: 92da5117-43d7-4220-a8c6-b33584ba9dac
Podcast Info
- Name: The Pragmatic Engineer
- Type: episodic
- Site: https://newsletter.pragmaticengineer.com/podcast
- UUID: 59045350-573e-013d-e880-02cacb2c6223
Transcript
[00:00:00] What happens when you join one of the top AI labs in the world and your first pull request gets
[00:00:04] rejected? Not because the code was bad, but because you wrote it by hand. This is exactly
[00:00:09] what happened to Boris Cherny when he joined Antrophic. Boris is the creator and engineering
[00:00:14] lead behind CloudCode. Before joining Antrophic, he spent seven years at Meta, where he led code
[00:00:19] quality across Instagram, Facebook, WhatsApp, and Messenger, and was one of the most prolific
[00:00:23] code authors and code reviewers at the company. In today’s episode, we cover how CloudCode went
[00:00:28] from a side project to one of the fastest-growing developer tools, and the internal debate at
[00:00:32] Antrophic whether to release it at all, Boris’ daily workflow of shipping 20-30 pull requests a
[00:00:37] day with zero handwritten code, and how code review works when AI writes everything, why Boris
[00:00:42] believes we’re living through a time as transformative as a printing press, and which
[00:00:46] engineering skills matter more now and which ones do not. If you want to understand how one of the
[00:00:51] people closest to AI coding agents actually builds software today, and what that means for the rest
[00:00:56] of us engineers, this episode is for you.
[00:00:58] This episode is presented by Statsig, the unified platform for flags, analytics, experiments,
[00:01:03] and more. Check out the show notes to learn more about them, and our other season sponsors,
[00:01:06] Sonar and WorkOS.
[00:01:08] How did you get into tech, software engineering, and coding in general?
[00:01:13] It starts a while back. I think there was kind of like two parallel paths that crossed.
[00:01:18] So when I was maybe 13 or something like this, I started selling my old Pokemon cards on eBay.
[00:01:24] And I realized that on eBay, you can actually like write,
[00:01:28] HTML. And I was looking at other people’s Pokemon card listings, and I realized like
[00:01:32] some of them have like big colors and fonts and stuff like this. And then I discovered the blink
[00:01:37] tag. And I realized it was blink tag. And if I put the blink tag on it, I could sell my card,
[00:01:43] you know, for like 99 cents instead of 49 cents or whatever. So I kind of learned about HTML this
[00:01:48] way. Then I got an HTML book and kind of learned about HTML. And then the second thing was, this
[00:01:54] was also, I think, sometime in middle school. We had these old TI-83,
[00:01:58] graphing calculators, and we used them for math. And what I realized is I can get a better answer
[00:02:05] on the math test if I just program the answers to the math test into my calculator. And so I wrote
[00:02:09] these little programs, these program the answers, and then the test got harder. So then I had to
[00:02:13] program solvers instead of the actual questions because I didn’t know what, you know, the
[00:02:17] coefficients and stuff would be ahead of time. And then the math got more advanced like the next year.
[00:02:22] And so I had to drop down from basic to assembly to just make the program run a little bit faster.
[00:02:28] Oh, wow. So again,
[00:02:28] in high school, you dropped down to assembly.
[00:02:30] I think this is like middle school or high school, maybe like eighth or ninth grade or something
[00:02:34] like this. Then the thing I realized is everyone in my class was starting to realize that I had
[00:02:39] this solver and they got kind of jealous. And so I bought this little serial cable so I can give it
[00:02:43] to them too. And then the next math test, everyone on the class just got A’s. And the teacher was
[00:02:47] like, what’s going on? And then eventually she realized it. She was like, okay, you get away
[00:02:51] with it once and knock it off. But for me, it was very practical. So, you know, in school,
[00:02:58] I studied economics. I actually dropped out to start startups. And I never thought that coding
[00:03:04] would be a career at all. It was always very practical to me. Coding is a means to build
[00:03:10] things and to make useful things. This startup, the first one was, I think it’s like my friends
[00:03:16] and I were trying to get weed. And so we started this like weed review startup. We made like a
[00:03:22] website. We called kind of different dispensaries, I think. And then we just tried to get kind of
[00:03:28] weed samples so we could like review it for them. And it actually kind of blew up. And then I
[00:03:33] actually got more interested in, at the time, no one was like testing this stuff. And so I got into
[00:03:39] kind of the like chemical testing, kind of chemical analysis. And then after this, I kind of did a
[00:03:45] bunch of other startups. And then I joined YC actually pretty early. And I was the first hire
[00:03:50] of this YC startup up in Palo Alto after. How did you decide to go to one startup after the other?
[00:03:56] Kind of vibes.
[00:03:58] Vibes, I’d say. Because, you know, like, you know, startups, it’s never a linear path. You
[00:04:02] always kind of pivot, pivot, pivot. You have to figure out what the market wants and what users
[00:04:05] want. And it’s never the thing that you think. You always try a thing. But the idea is always
[00:04:11] a hypothesis. And then almost always you have to pivot once, twice, three times. You know,
[00:04:16] at this medical software company, this is called Agile Diagnosis. This was kind of an early YC
[00:04:21] company. This was back in maybe 2011, 2012, something like that. It was medical software
[00:04:27] for doctors.
[00:04:28] And the idea was there’s these like clinical decision protocols that vary a lot hospital
[00:04:32] to hospital. And our idea was there was one hospital in Chicago that had a really great
[00:04:36] protocol specifically for cardiac symptoms. And so we’re like, wouldn’t outcomes be great
[00:04:41] if every hospital in the U.S. would use the same protocol? And so we tried to standardize it. And
[00:04:47] we made this like decision tree software for doctors to use. And I wrote, you know, some of
[00:04:52] the software. The team was like, it was just a few of us. It was a pretty small team. And I wrote
[00:04:58] this like SVG renderer because it was this visual decision tree. And we launched it. And then we had
[00:05:13] a DAU chart and the DAUs were flat and couldn’t figure it out. And we were piloting it with a few
[00:05:17] hospitals at the time. And at the time we were based in Palo Alto, we were piloting it with,
[00:05:22] you know, a few hospitals, including UCSF. And I rode a motorcycle at the time. So I rode my
[00:05:26] motorcycle up to, you know, UCSF. And I was like, I’m going to do this. I’m going to do this. And I
[00:05:28] shadowed doctors for a couple of days just to see how do they actually use this. And I realized
[00:05:34] that actually doctors don’t have time to sit down and use a computer because you’re seeing a patient.
[00:05:41] Then you have maybe five minutes until the next patient. And in those five minutes, you have to
[00:05:45] walk down the hall. You have to go to the computer station. You have to open up this
[00:05:49] totally legacy computer. By the time it boots up, that’s like three minutes.
[00:05:53] Then you open up Inner Explorer 6. That takes like 30 seconds. Then you have to go to the
[00:05:58] open up this like app that we built. You have to sign in and your five minutes are up. You don’t
[00:06:01] even have time to use it. And so we rewrote everything to run on Android and they still
[00:06:05] weren’t using it. And the thing we realized is doctors are walking around with a bunch of
[00:06:09] residents behind them. In this kind of situation, it’s like a social situation, right? Like the
[00:06:13] thing that matters is they’re seen as an authority. They don’t want to be seen on their phones.
[00:06:20] And then we pivoted again. So at that point, we’re like, OK, so maybe the doctor isn’t the
[00:06:24] target user. Actually, we want it to be used by maybe nurses or x-ray technicians.
[00:06:28] At that point, I left because I was like, this is actually pretty far off from kind of what I
[00:06:32] wanted to do. This is like the most fun thing for me is finding this product market fit because
[00:06:38] it’s always surprising. You can’t have one big idea because the idea is probably going to be
[00:06:43] wrong. So you kind of form hypotheses. You follow it down and you see what’s right.
[00:06:48] Also, I find it so interesting how you’re telling us this story because I feel
[00:06:53] behind a lot of sort of success stories, we hear the success story, we hear the path,
[00:06:58] but first of all, a lot of startups are like this. And second of all, what struck me is you were
[00:07:02] hired as a software engineer, right? And this was back before product engineers or anything was a
[00:07:07] thing, which we’re now talking about. But you just like you rode your motorbike and you went there
[00:07:12] and you shadowed the people and you understood how they’re using it, why they’re not using it,
[00:07:18] getting ideas. I feel this is what makes a great software engineer back then and even today.
[00:07:26] You weren’t.
[00:07:28] Doesn’t seem to me that you were focused on the technology. You were focused on the outcome, though.
[00:07:32] Yeah. I mean, look, there’s different kinds of engineers and there’s different ways to do it.
[00:07:35] And, you know, even even on our team right now, I look at an engineer like Jared Sumner
[00:07:40] and he’s just incredible technical mind. He understands systems better than anyone I’ve met.
[00:07:46] And, you know, you need you need people like this. You need people with this kind of depth.
[00:07:50] For me, engineering has always been a practical thing. And, you know, for me,
[00:07:56] I’ve always been a generalist. And like, it doesn’t
[00:07:58] matter if I’m doing, you know, like design or, you know, if I’m doing engineering or user research
[00:08:03] or whatever.
[00:08:04] The investment thesis for AI and software engineering is straightforward. As AI writes
[00:08:08] more code, more code needs to be verified. But there’s a catch. AI generated code is on average
[00:08:14] harder to verify than human working code. This is why there’s Sonar, the makers of SonarCube.
[00:08:19] As a critical verification layer for the AI enabled world,
[00:08:23] Sonar ensures that speed and volume with AI does not compromise your code base.
[00:08:28] Sonar’s competitive position is built on 17 years of specialized expertise that no
[00:08:32] foundational model can replicate. We’re talking about deep analysis engines like symbolic execution
[00:08:38] and cross repository data flow tracking that simulate how code actually behaves, not just
[00:08:43] what it says. To bridge the divide between AI productivity and code quality, Sonar has released
[00:08:48] a SonarCube MCP server. This tool acts as a universal translator between AI applications
[00:08:54] and the SonarCube platform. By using the model context,
[00:08:58] it gives AI tools like Cloud Code, GitHub Copilot, and Cursor direct access to SonarCube’s analysis
[00:09:04] capabilities. Instead of context switching, your AI agent becomes a full-fledged code review and
[00:09:09] quality assurance copilot capable of analyzing code snippets for issues, filtering bugs by severity,
[00:09:15] and even checking your project’s quality gate status before you ever commit code.
[00:09:19] Whether you’re working with coding assistants or scaling up with full-agent workflows,
[00:09:23] Sonar provides the automated verification that 75% of the Fortune 100
[00:09:28] rely on. It’s about giving your developers the freedom to innovate without the fear of breaking
[00:09:32] the code base. Head to sonarsource.com slash pragmatic to learn more about how Sonar enables
[00:09:37] the confidence to develop at the speed of AI. With this, let’s get back to Boris’ career and
[00:09:42] what he learned working at startups. My first job I ever had, I was like,
[00:09:46] I think I was 16. And I just wanted to buy an electric guitar. And so what I did was I started,
[00:09:52] I just started freelancing. And so I was like, okay, I guess I’ll make websites. And I think
[00:09:56] Fiverr was not a thing back then. So there’s a lot of people out there that are doing that.
[00:09:57] Some other freelancing websites. So I just started like, I put up a website, I started bidding on
[00:10:01] stuff. And my first paycheck, I just spent the entire thing on an electric guitar. But it was
[00:10:06] very practical, right? Because it’s like, when you’re in this kind of setup, you have to do the
[00:10:09] engineering, you have to do kind of the accounting, you have to do the design, you have to talk to
[00:10:13] customers. It’s just always been like that for me. After a couple of these startups, you ended up at
[00:10:19] Facebook, now called Meta. And there, you spent seven years there. Can you just talk us through
[00:10:26] what you’ve worked there, what you’ve learned?
[00:10:27] You’ve also had a very remarkable career growth in terms of four promotions over seven years.
[00:10:34] And what do you take away from that experience?
[00:10:38] Yeah, so I started on Facebook groups. That was the first time I worked on,
[00:10:43] Vlad Kolesnikov hired me. I think he’s actually still at Facebook. I think he’s on some other
[00:10:48] team now. And it was cool, actually, there’s a big group of people that I worked with that were
[00:10:53] these kind of early JavaScript people, too. And, you know, like, I did a bunch of JavaScript stuff.
[00:10:57] It’s funny, like, I kept crossing paths with these people. And so Vlad, he worked on Bolt.js,
[00:11:03] which was the software, it was the framework that powered Ads Manager, which later became React.js.
[00:11:09] I kept crossing paths with these people. And later on, for example, yeah, later on,
[00:11:13] there was a bunch more people like this. But anyway, so I was working on Facebook groups.
[00:11:17] I was really excited about it because of this mission of connecting people to their community.
[00:11:24] This is the thing that drew me in. And at the time, I was a big Reddit user.
[00:11:27] I became a Reddit user back when I was a teenager, because I didn’t know anyone else that coded.
[00:11:35] Even in college, I didn’t really know anyone that coded. And honestly, I was always kind of
[00:11:39] embarrassed about it, because I thought it was this nerdy thing. And I thought it was kind of
[00:11:42] this thing that I knew how to do. But I wanted, you know, I wanted to be like a cool kid. And,
[00:11:47] you know, like, I couldn’t, like, tell people that I coded. It was very nerdy.
[00:11:51] And at some point, I discovered it was some, like, programming community on Reddit.
[00:11:56] And I was just shocked.
[00:11:57] Like, there’s other people that are into this thing. It’s, like, such a weird hobby. It’s so
[00:12:01] niche. And it was just so exciting to find like-minded people like this and get this connection.
[00:12:06] And so I just wanted to work on this. I wanted to kind of contribute to this in some way.
[00:12:10] So I worked on Facebook groups for a while. And then, you know, there’s a bunch of different
[00:12:16] projects. I have to kind of get into details for any of these. Eventually, I became the tech lead
[00:12:21] for Facebook groups and kind of grew into this. And the org grew. The work really changed. It
[00:12:27] changed from kind of big to big. And I was like, oh, I’m going to do this. I’m going to do this. I’m
[00:12:27] going to do this. I’m going to do this. I’m going to do this. I’m going to do this. I’m going to do this.
[00:12:28] And so I just kind of grew into this. And then, you know, there was a lot of like doc writing and
[00:12:30] coordination and kind of delegating to other. The culture was changing at the time. So, you know,
[00:12:35] this early Facebook culture was disappearing. The docs were coming in. The, you know, alignment
[00:12:38] meetings were coming in. There was a lot of a lot more work around this kind of foundational stuff
[00:12:43] like privacy, security, things like this that I think, honestly, early on, a lot of corners were
[00:12:48] cut in order to grow. But at some point, you just have to pay that debt. And that was the time when
[00:12:52] that happened. Then I spent a few years at Instagram after. And that was also a funny story.
[00:12:57] My wife got a got a job offer and she was just really excited about it. And she came to me and
[00:13:02] was like, hey, like, I got this offer, but we’re going to have to move. Is that OK? And I was like,
[00:13:08] yeah, that’s fine. You know, like I work in tech. We can work remotely anywhere. Where’s the job?
[00:13:11] And she was like, it’s a NARA. And I was like, where’s that? And NARA is like rural Japan.
[00:13:17] And this was a different time zone as well. Different time zone. Yeah. This was 12 hours
[00:13:20] or something difference or something like that. Something like that. Yeah. It was like 2021.
[00:13:25] Wow. And then I tried to kind of find a team that would
[00:13:27] sponsor me because there was there were these kind of arcane HR rules about like the time zone
[00:13:31] you have to be in and the team you have to be co-located with and so on. And so there was a
[00:13:36] little kind of nascent team for Instagram in Tokyo. And Will Bailey was running the same.
[00:13:42] He was also the guy that made Instagram stories. And so he was my manager for a while. And so we
[00:13:48] decided to grow that team together. And I worked remotely from NARA. And then most of the team was
[00:13:51] in Tokyo. And during this time, I started hacking on Instagram and the stack was,
[00:13:57] just insane. Like Facebook was the single best web serving stack in the world. The way that
[00:14:04] everything is optimized, like from the hack language to the HHVM runtime to GraphQL as the
[00:14:10] transport layer to like the client libraries, like Relay and all the stuff. It was just in React. It
[00:14:16] was just amazing. There’s no other dev stack in the world that was this good. And it’s just fully
[00:14:21] optimized. And then I went to Instagram and it’s like, you know, Python where the type checker
[00:14:26] didn’t work.
[00:14:27] Click to definition didn’t work. And it was this like kind of hacked together Django and then like
[00:14:32] a fork of, you know, the Cython runtime. And just nothing really worked. And so I came to Instagram,
[00:14:39] I joined the labs team, you know, in Japan. And the idea was to find the next big thing for
[00:14:43] Instagram. We tried some stuff, but what I very quickly realized is that I was just not effective
[00:14:48] at working on the stack because it was such a terrible stack. And so I just went and started
[00:14:53] working on Dev Infra because we needed to fix it.
[00:14:57] And there’s a few projects that we worked on. So one was migrating from Python to the big Facebook
[00:15:01] monolith. Another one was migrating from REST to GraphQL. And these projects, they’re actually in
[00:15:06] progress. You know, like these are things that involve, it takes hundreds of engineers many
[00:15:10] years to do this. It’s a big code base. It’s a big migration. Now it’s much faster.
[00:15:16] Yeah. With these tools that we have, the AI tools and migrations are a pretty good use case for them
[00:15:21] though.
[00:15:21] Yeah. It’s like the, it’s the perfect use case for it. And then I just started getting kind of
[00:15:25] deeper into this. And by the end, by the time I got to the end of the year, I was like, oh, I’m going to
[00:15:25] By the time I left Instagram, so I was working on this on Dev Infra and kind of leading a bunch of
[00:15:30] these migrations. That’s also where I intersected with Fiona Fung, who is now the manager for the
[00:15:36] Cloud Code team. I just worked with her and she was just such an amazing leader, this incredible
[00:15:40] depth and kind of history in tech. And I just thought like, there’s no better, there’s no better
[00:15:44] manager for this team. And then I also started working on code quality. And so the work on
[00:15:49] Instagram kind of expanded a bit. And by the time I left, I was leading code quality for all of Meta.
[00:15:55] And so I was responsible for the quality of the code bases across Instagram, Facebook, Messenger,
[00:16:00] WhatsApp, Reality Labs, kind of all these code bases. At Meta, it was this program called Better
[00:16:05] Engineering. And the idea was, I think it’s sort of like 2016 or 2018 or something. But Zuck mandated
[00:16:11] that every engineer at the company, 20% of their time has to be spent fixing tech debt.
[00:16:18] Oh, interesting.
[00:16:19] And we called this Better Engineering. And some of this is kind of bottom up where, you know,
[00:16:25] the team knows best the tech debt that they have to fix. And then some of it is stopped down
[00:16:28] where you need to do, you know, very big migrations. You need to migrate to new language
[00:16:33] features, new frameworks, things like this. And at Facebook scale, you know, there was tens of
[00:16:38] thousands of these migrations every year. And so I just sort of leading all this. And I realized
[00:16:42] very quick that you just need a little bit more order to it. There was no goals. No one knew kind
[00:16:48] of like what the outcomes were. There wasn’t any tracking. And so we developed a bunch of stuff.
[00:16:53] One of the ideas was,
[00:16:55] a centralized way to prioritize the different kind of code quality efforts. The second thing
[00:16:59] was figuring out the impact of code quality on engineering productivity, which turned out to
[00:17:03] be significant.
[00:17:04] How did you measure? What did you find there?
[00:17:06] There was a bunch of stuff. I think some of this has been published. I don’t know if all of it has,
[00:17:10] but essentially you try to do like causal analysis and causal inference. This is the methodology.
[00:17:15] You try to figure out like, what are the factors that make it so engineers are more productive?
[00:17:19] Some of it is code quality. Some of it is outside of code quality. So for example,
[00:17:22] Meta went back to, you know, return to office.
[00:17:25] They don’t work from home. That was partially driven by this because we just found some,
[00:17:29] you know, fairly strong correlations that we thought were causal about this.
[00:17:33] But code quality actually contributes like, you know, double digit percent to productivity.
[00:17:37] It turns out even, even at the biggest scale.
[00:17:40] It’s kind of comforting to hear because I think it’s, it’s rare to have a place where you actually
[00:17:45] measure this, but I think we feel it like when you have a clean code base, a modular, or it can get
[00:17:50] easier to work with. And I think, you know, reasoning could also be easier for L.
[00:17:55] To, to work with it. And my hint would be, yes, it should be right. But I think there’s just very
[00:18:02] little data, but that’s the feeling that I would have. Yeah. I think a lot of the big companies
[00:18:06] have published about this. Like I think Facebook published something, uh, Microsoft publishes a
[00:18:10] bunch about this. Google does, but yeah, totally. If every time that you build a feature, you have
[00:18:15] to think about, do I use framework X or Y or Z? These are all options that you can consider
[00:18:20] because the code base is in a partially migrated state where all of these are around the code
[00:18:24] somewhere.
[00:18:25] As an engineer, you’re going to have a bad time as a new hire. You’re going to have a bad time
[00:18:29] as a model, you might just pick the wrong thing. And then, you know, like the user has to, of
[00:18:33] course, correct you. So actually, you know, the better thing to do is just always have,
[00:18:37] you know, a clean code base, always make sure that when you, when you start a migration,
[00:18:41] you finish the migration. And this is great for engineers. And nowadays it’s, it’s great for
[00:18:45] models too. And then you joined Entropic and I’ve heard the story, which you can confirm or give
[00:18:51] more color to it, that your first pull request was rejected by Adam Wolf.
[00:18:55] He was my ramp up buddy. So I joined Entropic. I was trying to figure out kind of like what to do
[00:18:59] next. And, you know, I met a bunch of people at all the different labs and Entropic was just
[00:19:03] the obvious choice for me because of the mission. This is the thing that personally I know that I
[00:19:08] need the most. And also just kind of seeing all this change that’s happening. It’s important to
[00:19:12] have some sort of framework to think about this and to think about our role in it. I’m also a
[00:19:17] really big sci-fi reader. Like that’s definitely my genre. I’m a big reader. I have like, you know,
[00:19:21] giant bookshelf at home and stuff. And I just know how bad this is.
[00:19:25] And I just felt like this is a place that has serious thinkers. People are taking this very
[00:19:30] seriously and thinking about what, what can we do to make this thing go better? So when I joined
[00:19:34] Entropic, I did a bunch of ramp up projects, uh, just, you know, various stuff that I was hacking
[00:19:39] on. And I wrote my first pull request by hand because I thought that’s how you write code.
[00:19:44] That used to be how you write code.
[00:19:46] That used to be how you write code. But even at the time at Entropic, there was this thing
[00:19:49] called Clyde and it was the, it was the predecessor to quad code. It was, it was super janky. It was
[00:19:55] like, it was like, it was like, it was like, it was like, it was like, it was like, it was like,
[00:19:55] it was Python. You know, it took like 40 seconds to start up. It was a research code.
[00:19:59] It was not agentic, but if you prompt it very carefully and hold the tool just right,
[00:20:04] it can write code for you. And so Adam rejected my PR and he was like, actually,
[00:20:09] you should use this Clyde thing for it instead. And I was like, okay, cool. It took me like half
[00:20:13] a day to figure out how to use this tool. Cause you have to like pass in a bunch of flags and
[00:20:17] like use it correctly. Um, but then it, it sped out a working PR. It just one-shotted it.
[00:20:23] Oh.
[00:20:23] And this was like 2024, it was like September, 2024, August, something like that. And I think
[00:20:32] for me, this was my first field of AGI moment at Anthropic. Cause I was just, oh my God, like
[00:20:37] I didn’t know the model could do this. Like I was used to these like kind of tab completions,
[00:20:43] line level completions and IDE. I had no idea that it could just make a working pull request for me.
[00:20:49] Boris just talked about how he had a true wow moment at work using their AI model.
[00:20:53] A very different wow moment is when you use a tool at work that makes things so much easier
[00:20:58] than before. And this leads us nicely to our presenting sponsor, Statsig. Statsig offers
[00:21:03] engineering teams a tooling for experimentation and feature flagging that used to require years
[00:21:08] of internal work to build. It’s the kind of tool that was so complex to build that only large
[00:21:12] companies like Meta or Uber had their own custom advanced tooling for it. Here’s what Statsig
[00:21:17] looked like in practice. You ship a change behind a feature gate and roll it out gradually, say to
[00:21:23] first. You watch what happens, not just did it crash, but what did it do to the metrics you
[00:21:28] care about? Conversion, retention, error rates, latency. If something looks off, you turn it off
[00:21:33] quickly. If it’s trending the right way, you keep it rolling forward. And the key is that measurement
[00:21:38] is part of the workflow. You’re not switching between three tools and trying to match up
[00:21:42] segments and dashboards after the fact. Feature flags, experiments and analytics are all in one
[00:21:47] place using the same underlying user assignments and data. This is why teams at companies like
[00:21:53] Bastion use Statsig. Statsig has a generous free tier to get started and pro pricing for
[00:21:58] teams starts at $150 per month. To learn more and get a 30-day enterprise trial, go to statsig.com
[00:22:03] slash pragmatic. And with this, let’s get back to Boris and the origin story of CloudCode.
[00:22:09] And then when you joined Entrofic, we’ve covered this in a deep dive, but we could recap briefly on
[00:22:16] how CloudCode came to be out of what seemed like a side project or just a cool hack.
[00:22:22] So yeah, I started hacking on a bunch of different stuff. I was working on some things in product.
[00:22:28] I worked on reinforcement learning for a little bit just to kind of understand the layer under
[00:22:32] the layer at which I was building. This is still advice that I give to a lot of engineers is always
[00:22:37] understand the layer under. It’s really important because that just gives you the depth and you kind
[00:22:42] of like you have a little bit more levers to work at the layer that you actually work at.
[00:22:45] This was the advice 10 years ago. It’s still the advice today. But the layer under is a little bit
[00:22:49] different now. You know, before it was like, I don’t know, I don’t know, I don’t know, I don’t know, I don’t know, I don’t know, I don’t know, I don’t know.
[00:22:52] If you’re writing JavaScript, understand the JavaScript VM and frameworks and stuff.
[00:22:56] Yeah.
[00:22:56] Now it’s like understand the model. So I was hacking on a bunch of different stuff.
[00:23:00] Some things ship, some things didn’t ship. And at some point I just wanted to understand the
[00:23:05] public Anthropic API because I’d never used it before. And I didn’t want to build a UI. I just
[00:23:10] wanted to, you know, hack something up quite quickly because we didn’t have CloudCode back
[00:23:14] then. We’re still writing code by hand. And I wrote this little bash tool that all we did was
[00:23:21] hit the Anthropic API.
[00:23:22] API. And it was essentially like a chat based application, but just in the terminal because
[00:23:25] that’s what AI used to be. And, you know, I still think about it like engineers are the first
[00:23:31] adopters. And so when we started to move out of conversational AI to agentic AI, it took a little
[00:23:39] bit, but engineers understood it pretty quick. And I think now when you ask non-engineers about
[00:23:44] like what is AI, they would say it’s this conversational AI. It’s like a chatbot or
[00:23:49] something. And that’s why I’m actually very excited for, you know, Quark, this new product
[00:23:54] that we launched, because it’s going to bring the same thing that engineers saw very early
[00:23:59] to everyone else. But when I think about, you know, Quark, I think back to this moment that
[00:24:03] we’re talking about, like very early on. Quad code originally wasn’t quad code. It was a chatbot
[00:24:08] because that’s what I thought AI was. But we had to kind of figure out kind of what is the next
[00:24:13] thing. And so at the time I built this chatbot, it was somewhat useful, but it was just a chatbot.
[00:24:19] And the next thing that I tried was I wanted it to use tools because tool use just came out and I
[00:24:26] didn’t know what it was. And I was like, let’s experiment. And I gave it a single tool, which
[00:24:31] was the bash tool. And I didn’t know what to do with the bash tool. And so I asked it, you know,
[00:24:35] like I actually didn’t know if it could even do this, but I asked her like, what music am I
[00:24:38] listening to? And it just wrote a little Apple script program using like sed or whatever to
[00:24:45] open up my music player and then like query it to see what music it’s listening.
[00:24:49] And just one shot at this with Sonnet 3.5. This is actually my second field AGI moment
[00:24:56] very quickly after the first one. And the model just wants to use tools. That’s just what I
[00:25:04] realized. Like this thing, like if you give it a tool, it will figure out how to use it to get
[00:25:09] the thing done. And I think at the time when I think about the way that people were approaching
[00:25:14] AI and coding, everyone essentially had this mental model of you take the model and you put
[00:25:19] it in a box and you figure out like, what is the interface? Like how do you want to interact with
[00:25:24] this model? What do you need it to do? Essentially, it’s like if you have a program, you stub out some
[00:25:29] module, stub out some function and you say, okay, this is now AI, but otherwise the rest of the
[00:25:33] program is just a program. And so this is just not the way to think about the model. The way to
[00:25:37] think about it is the model is its own thing. You give it tools. You give it programs that it can
[00:25:43] run. You let it run programs. You let it write programs, but you don’t make it a component of
[00:25:47] this larger system in this way.
[00:25:49] And I think there’s just like, you know, this is a version of the bitter lesson.
[00:25:53] There’s the bitter lesson is a very specific framing, but there’s many corollaries to it.
[00:25:57] This is one of the corollaries is just let the model do its thing. Don’t try to put it in a box.
[00:26:03] Don’t try to force it to behave a particular way.
[00:26:06] One of the first ways you saw it was giving it tools, giving it access to the bash and
[00:26:10] then later to the file system and then to more tools, right?
[00:26:13] That’s right. Yeah, we give it a, we give it bash. Then I say we, it was just me the first
[00:26:19] three months.
[00:26:19] But then the team grew. So it was bash. It was and file edit. That was the second one.
[00:26:25] And one of the interesting thing we talked about last time for the deep dive is when you built it
[00:26:30] and it started to actually write code with the tools that you had, you’ve had an internal debate
[00:26:36] inside and traffic. Should we just keep it to ourselves? Because making suddenly it spread
[00:26:40] across engineering and it was making all of you a lot more productive, right?
[00:26:43] Yeah, that’s right.
[00:26:44] In the end, the decision was to release so that we can study safety in the wild.
[00:26:50] Because when you think about safety and that, you know, I keep talking about the word safety.
[00:26:53] The reason Anthropic exists as a lab is safety. This is the reason it was founded. This is the
[00:26:57] reason it exists. If you ask anyone at Anthropic why they chose it, it’s because of safety.
[00:27:02] And so if you think about model safety, you know, there’s different layers at which to think about
[00:27:05] it. There’s kind of alignment and mechanistic interpretability. This is at the model layer.
[00:27:09] Then there’s evals. And this is kind of like a, it’s kind of putting the model in a Petri dish
[00:27:13] and synthetically studying it in this way. And then you can study it in the wild and you can
[00:27:18] see how it actually behaves. And then you can study it in the wild and you can see how it
[00:27:19] behaves. You can see how users talk about it. You can, you can see like, what are the risks
[00:27:23] in the wild? And you actually learn a lot this way. And by doing this, we’ve been able to make
[00:27:28] the model much safer. So in hindsight, it was, it was totally the right decision.
[00:27:33] It’s amusing to hear about it from your perspective, because from the outside,
[00:27:38] what, what I saw and what a lot of engineers saw is like, oh, Anthropic released cloud code.
[00:27:42] Oh, wow. This, you know, for the first release with, uh, I believe it was with Sonet 4 release,
[00:27:49] right?
[00:27:49] Yeah.
[00:27:49] Was, was, did it come out with Sonet 4 originally or Sonet 4.5?
[00:27:53] I think it was, it was for, that was the general availability in February,
[00:27:56] but I think it was research preview before that.
[00:27:58] Yeah. But when it came out, my interpretation was like, oh, this thing can write code pretty well.
[00:28:04] And over time it became a lot more capable. So from, from our perspective, it was like this
[00:28:08] really capable coding tool that we just started to adopt and use and use for all sorts of
[00:28:14] increasingly productive, productive parts. And it has become, I believe, one of the fastest
[00:28:19] growing developer tools. And I’m always surprised to hear the story that it actually comes from
[00:28:25] research and the goal to understand how people use the model. Because at the other hand, like
[00:28:30] some startups have been trying to build developer tools deliberately to, to get adoption. And yet
[00:28:35] this research tool is getting a lot more adoption.
[00:28:38] I mean, this is a, you know, Anthropic, we’re, we’re a research lab, we’re a safety lab. And,
[00:28:42] you know, product is this kind of thing tacked onto the side. Product exists so that we can
[00:28:46] serve research better. And so we can make the model safer.
[00:28:49] And this is kind of how we think about everything. There, there was this,
[00:28:53] there’s also this funny moment early on when we, we had this launch review
[00:28:56] and we were deciding whether to launch it. I remember this moment because we were in the
[00:29:00] room. I think there was like, there was Mike Krieger, there was Dario, there were some other
[00:29:03] folks in the room and we were deciding what should we do. We were looking at the internal adoption
[00:29:07] chart, which was just vertical. Like since we released it, it was just insane. It was,
[00:29:12] you know, like nowadays, it’s a hundred percent, right? Just a hundred percent. Like nowadays,
[00:29:16] everyone at Anthro, every technical employee at Anthropic uses
[00:29:19] code every day. It’s pretty much a hundred percent for non-technical employees. It’s also
[00:29:23] like, it’s actually getting quite close to a hundred percent. It’s, it’s increasing very
[00:29:27] quickly. Like, you know, like half the sales team uses quad code. Um, and I think that’s
[00:29:32] increasing. It’s just, it’s crazy. Dario had this question about like, how, how did it grow this
[00:29:35] fast? Are you like forcing people to use it? And I was like, no, we offer this tool. People vote
[00:29:42] with their feet and, you know, it was just like, let people use the tool that they prefer.
[00:29:45] Yeah. And they chose it.
[00:29:46] You don’t seem like the person who’s acting exactly.
[00:29:49] Forcing people to use your tool.
[00:29:51] Yeah. Yeah. I mean, the way we did it, we just, we launched the thing and then we just like,
[00:29:55] listened to the users and we talked to people, we saw how they use it. We followed up, we made
[00:29:59] it better. And yeah, I mean, now, now we’re at the point where quad code writes, I think something
[00:30:03] like 80% of the code and at Anthropic on average, and you know, it writes all my code for sure.
[00:30:09] Yeah. And this started for you, it started the first time you mentioned, I think it was in
[00:30:13] November when it started to write all of your code. When did that switch come and what, what
[00:30:18] happened to me?
[00:30:19] How much did you trust it to, to write your code or how much you trust it? How much you
[00:30:23] reviewed that code, for example?
[00:30:25] So the switch was instant when we started using Opus 4.5. This was before, before it came out,
[00:30:30] you know, we were dogfooding it for a little bit and it was just right away. It’s such a more
[00:30:35] capable model. I just found that I didn’t have to open my IDE anymore. I just uninstalled my IDE
[00:30:40] because, because I just didn’t need it at that point. I actually did that like a month later
[00:30:45] because I just didn’t even realize that I wasn’t using it anymore.
[00:30:48] Yeah.
[00:30:49] A lot of us had similar experiences once Opus 4.5 was out in the public and especially over the
[00:30:54] winter break. I had a similar experience. I just realized that this thing, it actually writes,
[00:30:59] if I’m being honest with myself, as good code as I would have written in the stack that I’m
[00:31:03] very familiar with and my code base, my side projects where I know it and just a lot better
[00:31:08] than what I could for code base that I’m not as familiar or technologies I’m not as familiar with.
[00:31:13] Yeah. I’ll be honest, it writes better code than I do.
[00:31:16] I don’t want to go there. I still like to keep my,
[00:31:19] but probably true.
[00:31:20] Yeah. Yeah. I realized this because also in December I was traveling a little bit. I was
[00:31:25] like on a, I was on a coding vacation. We were talking about this before, but I, I went to Europe.
[00:31:29] We were just in a different time zone, kind of nomading around. And it was so fun because I was
[00:31:33] just coding all day, every day, which is my favorite thing to do. And I wrote maybe, you know,
[00:31:38] like 10, 20 pull requests every day, something like that. Opus 4.5 and quad code wrote a hundred
[00:31:43] percent of every single one. I didn’t edit a single line manually. And I realized at the end of that
[00:31:48] month, I was like, I’m going to do this. I’m going to do this. I’m going to do this. I’m going to do this.
[00:31:49] Opus introduced maybe two bugs. Whereas if I’d written that by hand, that would have been,
[00:31:53] you know, like 20 bucks or something like that.
[00:31:57] Can we talk about your development workflow? You have written threads about this,
[00:32:00] which is awesome. It’s on, it’s on social media, on threads and on, on X.
[00:32:04] But can you tell us how you use today, uh, quad code in terms of, you know, parallelism
[00:32:09] and, and tips and tricks that you and the team have kind of learned and share across the,
[00:32:14] across the team?
[00:32:14] Yeah. I mean, look, there’s no one right way to use quad code. So I can share
[00:32:18] some tips and things, but I think the wrong conclusion to draw would be to just copy,
[00:32:24] copy these and use it. The way we build quad code is we build it to be hackable
[00:32:30] because we know every engineer’s workflow is different. There’s no one way to do things.
[00:32:36] There’s no two engineers that have the same workflow. It’s just every, every engineer is
[00:32:39] the same with workstation set up, right? Like a keyboard is monitor placement,
[00:32:42] all that. Everyone has it differently.
[00:32:43] Yeah. It’s like, we’re like graphs people, right? Like you choose it, you choose your tools. Like
[00:32:46] we care deeply about it. So there’s no one right way.
[00:32:48] So for me, the way that I do it generally is I have five terminal tabs. Each one of them has
[00:32:55] a checkout of the repository. So it’s five parallel checkouts. And usually all kind of
[00:33:01] round Robin and start quad code in each one, almost every time I start in plan mode. So that’s
[00:33:06] like shift tab twice in the terminal. And I also overflow as I run out of tabs because there’s
[00:33:12] only so many terminal tabs. I used to use web a lot for this, like quad.ai slash code. That’s
[00:33:17] the place that I overflow to.
[00:33:18] But nowadays I actually use the desktop app. It’s more convenient. So quad code, you know,
[00:33:23] it’s been in our desktop app for, you know, for many months. It’s just the code tab in the,
[00:33:26] in the quad app. And I actually really like it because it has built in a work tree support.
[00:33:32] So that’s existed for a while. And that’s quite nice for parallelism. So you have multiple,
[00:33:37] you don’t need multiple checkouts. You just have one, and then we automatically set up
[00:33:40] get work trees for you. So you get this kind of environment isolation. The reason I do that is
[00:33:45] I actually just really hate fiddling with get work trees on the command line.
[00:33:48] Because it’s kind of fiddly. Like you need to know the CD.
[00:33:51] Get work tree for those who are not as familiar with it. It’s when you can check out instead of
[00:33:57] having a separate local folder, it’s almost like check out a separate branch, right? And then you
[00:34:02] can work on it separately, but not have the complex, have the complex only at like merge time.
[00:34:07] That’s right. Imagine that you have a folder, but you have maybe like get makes five copies of that
[00:34:13] folder in a way that’s very cheap and kind of easy to throw away. So you get this kind of isolation.
[00:34:17] You can work in parallel.
[00:34:18] And the quads don’t interfere.
[00:34:20] Yeah. So you now have support for this, which I think you recently added like
[00:34:23] native support, but like for your workflow, you just stuck with the old one of checking
[00:34:28] out all the separate folders, right?
[00:34:30] Yeah, exactly. I actually find that over time I’m using the desktop app more and more for this
[00:34:34] just because I don’t need these separate checkouts. And, you know, I just have a bunch
[00:34:37] of quads running in parallel and I don’t have to think about it. The other surprise hit is the iOS
[00:34:42] app for me. Every day I start, like I wake up and I just start a few agents on my phone.
[00:34:47] Oh, the native one. Yeah.
[00:34:48] The native one. Yeah.
[00:34:48] It’s like, it’s the quad app. It’s the code tab in the quad app. And it’s the same exact
[00:34:52] quad code.
[00:34:53] Except it runs in the cloud, right?
[00:34:55] It runs in the cloud. Yeah. So you have to kind of configure the environment. Luckily,
[00:34:58] our environment’s pretty simple. So, you know, and we just use hooks for it. So you just use
[00:35:02] the session start hook and configure it. This is kind of one of the benefits of making quad
[00:35:06] code really hackable is it’s very easy to do this kind of configuration. And this is something,
[00:35:11] honestly, I would never have predicted because, you know, like I code on a computer. If you told
[00:35:17] me six months ago, I’d be like, oh, I don’t know. I don’t know. I don’t know. I don’t know. I don’t
[00:35:18] know. I haven’t pulled the data, maybe like a third half, something like this of my code on a
[00:35:24] phone. That’s crazy. But that’s, that’s what I’m doing today.
[00:35:29] And you’re using parallel agents. At what point did you start using them? And how has it changed
[00:35:34] your work? Because one thing that I noticed on myself, I don’t really use that many parallel
[00:35:40] agents. I may be like two at a time, but I’m someone who, well, I like to be in charge and
[00:35:46] especially with Claude. Claude is a tool.
[00:35:48] That you can follow it along. It tells you what it’s doing. You can also have, for example,
[00:35:53] learn mode, which this was shipped a lot earlier where you can actually follow along. It gives you
[00:35:57] tasks. I, I feel that like staying in one tab and following along the model is pretty fast as well.
[00:36:03] I can kind of keep in touch. I’m assuming at some point you must’ve done this, but then what
[00:36:07] happened when you changed to parallel and do you feel you’re losing any control or it doesn’t
[00:36:13] really matter that much? Yeah. I think there’s kind of like two modes to think about or kind of
[00:36:17] like two, two, uh,
[00:36:18] two kind of workflows to think about. So when you’re new to a code base, highly recommend
[00:36:22] learn mode is awesome. I highly recommend it for people that are onboarding to the quad code team,
[00:36:27] people that onboard to Anthropic. Um, the thing that we recommend is, so you do for people that
[00:36:32] haven’t tried it, you do slash config in quad code, you pick the output style and you can do
[00:36:37] learn or explanatory. We usually recommend explanatory cause that tends to be better for
[00:36:41] new code bases, um, that you kind of haven’t been in before. For me, once you’re familiar with the
[00:36:46] code base, you just want to be productive.
[00:36:48] Right? Like you just want to ship as much as you can and you want to kind of be effective doing
[00:36:52] that. Um, so the world really switches. I don’t really go deep into tasks anymore. I start a quad
[00:36:58] in plan mode. I’ll have it kick something off with Opus 4.5. I think it got there with 4.6.
[00:37:04] It just really, really does it. Once there is a good plan, it just, it will one shot the
[00:37:08] implementation almost every time. So the most important thing is to go back and forth a little
[00:37:12] bit to get the plan, right? So what I do is I start one, I enter plan mode. I give it a prompt
[00:37:18] along. I’ll go to my second tab and I’ll start the second quad also in plan mode, get it chugging
[00:37:23] along, then go to the third tab, go to the fourth one. Then maybe I’ll go back to the first one when
[00:37:27] I get notified that it’s done. Uh, and then I’ll kind of, do you have notifications on or you
[00:37:32] turned them off? I actually operate in both modes. Um, sometimes I do like, you know, focus mode on
[00:37:37] the Mac. Um, so I just have it off, but also sometimes I use the system notifications.
[00:37:42] And you’re very, very productive with, with PRS. I mean, I think it was very visible even around
[00:37:47] the holiday break.
[00:37:48] Uh, on social media, you actually were responding to, I think someone reported a bug or a feature
[00:37:55] request. I’m not sure which one it was. And then an hour or two later it was done because, because
[00:37:59] you did it. You’ve also talked about like number of pull requests you’ve done on a day, not to like
[00:38:03] show off, but just as context, what, what does a pull request typically involve in terms of
[00:38:09] complexity? Are these like, are some, some super trivial or some actually like larger pieces of
[00:38:14] work as well?
[00:38:15] Yeah. Pull requests, each one varies a lot. Um,
[00:38:18] sometimes it’s a few lines, sometimes it’s a few hundred or a few thousand lines. They’re all just
[00:38:22] very, very different. It’s changed so much. Like back when I was at Instagram, I think I was one of
[00:38:27] the, uh, top two, maybe top three, most productive engineers at Instagram just by volume of code
[00:38:32] written. Um, so I’ve always, you know, for me, I’ve, I’ve always just coded a lot. Like this is,
[00:38:37] uh, coding is like a way that I can express myself. And it’s just like, it’s a way that my brain
[00:38:41] thinks also. And so now I just get to do it. But I think with quad code, the, the, the kind of code
[00:38:47] that you write, if you are very productive, you’re going to be able to do a lot of things.
[00:38:48] It tends to be even, it’s just the number of PR sort of undersells what, what’s happening.
[00:38:55] Because I think people that used to be very productive in the old days before AI assistance,
[00:39:00] a lot of the code maybe was like code migrations or something like this. So like people that shipped,
[00:39:05] you know, 20, 30 PRs every day, a lot of it was like pretty, you know, like a one-liner or kind
[00:39:09] of migrating A to B or whatever. Nowadays I ship, you know, 20, 30 PRs every day,
[00:39:15] but every PR is just completely different. Some of them are thousands of lines. Some of them are
[00:39:18] hundreds. Some of them are dozens of them are one-liners. It’s none of these are kind of code
[00:39:23] migrations because actually quad just does those. And I don’t need to be part of that.
[00:39:27] Shipping this much code or this much more productive, the obvious question that comes
[00:39:30] up for any, I guess, software professional is, well, the review, what the way teams used to work.
[00:39:35] And I’m not sure if Instagram did this, but a lot of other companies did this is
[00:39:39] you make a pull request, you put it up there. There’s a mandatory human reviewer at Google.
[00:39:44] There’s actually two, cause there’s one on cool quality as well.
[00:39:48] How has that changed?
[00:39:48] Has this workflow changed? How does the podcast team think about code review and
[00:39:53] how has it changed over time?
[00:39:54] Yeah. I’ll start by thinking, I’ll start by talking about how code review used to work for me.
[00:39:58] So the way that I used to do it is every time I also used to be one of the most prolific code
[00:40:03] reviewers. Oh, okay. So both I’ve met at yeah. Yeah. Right. Or is it code reviewers?
[00:40:08] That’s actually, and that’s one of the benefits of being in a different time zone. Like I’m not
[00:40:11] super human. I just didn’t have any meetings. And the way that I approach code review is
[00:40:15] every time that I would have to comment about something, I would,
[00:40:18] drop it in a spreadsheet and I would like describe the issue. So let’s say, you know,
[00:40:23] like someone named a parameter, you know, in a function badly, I would like to put that in a
[00:40:26] spreadsheet. If someone did some bad react pattern or something, I would, I would put that in a
[00:40:30] spreadsheet. And then over time, I would just kind of tally up the spreadsheet. And anytime
[00:40:34] that a particular row had more than three or four instances, I would write a lint roll for it.
[00:40:38] So just automate it with kind of, and so that’s what it used to look like for me. I’ve always
[00:40:43] tried to automate myself away because there’s just so many things to do. And this is one of
[00:40:47] our superpowers.
[00:40:48] As engineers is we’re able to automate all of the tedious work. There’s very few other fields
[00:40:54] where you’re able to do this thing. This is a thing uniquely that we’re able to do. And this
[00:40:58] is a thing that I’ve just always enjoyed because it gives me more free time and I get to do the
[00:41:03] work I actually enjoy. And so today, the way this looks is a little different, but it mirrors this
[00:41:08] a little bit. So when quad code writes code, it generally, it will run tests locally. And this
[00:41:14] is something quad just often decides to do when it’s relevant or it’ll write new tests. So you
[00:41:18] kind of do this, this kind of verification. When we make changes to quad code, quad will also test
[00:41:24] itself. So it’ll launch itself kind of in a sub process. It’ll verify itself and it’ll test itself
[00:41:29] end to end.
[00:41:29] This is for your internal quad code implementation. So you have like this test suite so they can test
[00:41:35] itself.
[00:41:36] Yeah, that’s right. That’s right. But it’ll literally launch itself just in a bash process
[00:41:39] and kind of just see like, hey, do I still work?
[00:41:42] Wow. Okay.
[00:41:43] So it’ll do this. And this is something that we just didn’t code in. Like it just with Opus 4.5
[00:41:47] especially.
[00:41:48] It just started spontaneously doing this. It just wants to kind of check. So we do this.
[00:41:53] And then we also run quad-p. So this is the quad agent SDK in CI. So every pull request
[00:41:59] at Anthropic is code reviewed by quad code. And that actually catches maybe like 80% of bugs,
[00:42:05] something like this. And it’s the first round of kind of code review. Quad will automatically
[00:42:10] address some of these. Some of them it’ll leave to a human because it’s not sure what to do.
[00:42:14] There’s always an engineer that does the second pass of code review.
[00:42:18] So there always has to be a person in the loop approving the change.
[00:42:22] So on the team before anything goes into production, if you will, an engineer does look at it.
[00:42:28] Yes.
[00:42:29] As you’re thinking of code review, would you do this for every type of project? Or this is
[00:42:33] specifically because you now know that this actually has real world impact. People depend
[00:42:36] on it. You know, there’s a lot of users. Let me put it the other way around. Like,
[00:42:41] can you see places where you would just not have an engineer review code? What situations would
[00:42:46] that be in?
[00:42:47] I think it depends how,
[00:42:48] how it’s used. Yeah, I’d agree with that. Like, you know, if you’re building some personal side
[00:42:51] project, like you can just YOLO straight to main, you know, like.
[00:42:55] Even before AI, you would have not reviewed. You just trust yourself or, you know, just ship to
[00:43:01] production or SSH into production and do some changes, that kind of stuff, right?
[00:43:05] Exactly. Exactly. The very first versions of quad code that were internal, like, you know,
[00:43:09] I committed straight to main. But then, you know, as soon as you have users and, you know, for
[00:43:13] Anthropic, our main customer base is enterprises. This is what we care about the most. For us,
[00:43:17] for safety reasons, security is the most important thing. So, you know, if you’re building a project,
[00:43:18] really important. Privacy is important. These are, these are all related. It’s also very important
[00:43:22] for our customers. And so because this is an enterprise product, it has to be secure. It has
[00:43:27] to be, we have to make sure that it meets a certain bar. So we definitely use a lot of automation,
[00:43:32] but at least for now, there has to be a human in the loop just to make sure.
[00:43:37] One thing that is just known about LMZ is they’re non-deterministic. And by putting LM as a reviewer,
[00:43:45] doing a review, like it will give,
[00:43:48] good feedback. But how do you deal with the fact that you can be sure if it’s always giving the
[00:43:54] feedback, you cannot be sure that even if it’s capable of catching an issue that it will
[00:43:58] necessarily catch that? Are you doing anything in this loop to do deterministic thing? For example,
[00:44:03] linting is very deterministic, as you will very well know, like, have you thought of marrying some
[00:44:07] of these ideas or are using, for example, are using linters on the code base where you found
[00:44:11] no need to for it? Yeah, absolutely. Absolutely. Yeah. This is just, yeah. Yeah. We have type
[00:44:16] checkers. We have linters. We run the build. We run the build. We run the build. We run the build.
[00:44:18] Cloud is actually so good at writing lint rules. So actually what I do now, I used to tally stuff up
[00:44:23] in the spreadsheet. Now what I do is when a coworker puts up a pull request and I’m like,
[00:44:28] this is lintable, I’ll just be at cloud, please write a lint rule for this in that PR on their
[00:44:32] PR. And we have, you know, you just run like slash, I think it’s like set up GitHub or something like
[00:44:37] this. You can do this in cloud code and it’ll install the GitHub app, which then makes it so
[00:44:42] you can tag at cloud on any pull request, any issue. I use this every single day. So very,
[00:44:48] very useful. So you want these deterministic steps. Also though, there are, there are ways to
[00:44:54] get cloud to be a little bit more deterministic. So for example, you can do best event, you can
[00:44:58] have it do multiple passes. And this is actually quite easy to do. So, you know, for example,
[00:45:04] the coder view skill that we use internally, it’s open source and it’s available in the
[00:45:09] quad code repo. And so all we do is, you know, we launch parallel agents to do stuff and then
[00:45:13] we launch parallel deduping agents to check for false positives. But essentially best event,
[00:45:18] and the way you implement it is, is all you say is cloud start three agents to do this.
[00:45:23] And that’s it.
[00:45:24] Boris just talked about building that enterprise infrastructure layer, the auth, the permissions,
[00:45:28] the security that has to all work before you can ship to real customers. This makes it a great
[00:45:34] time to speak about our season sponsor, WorkOS. If you’re building any SaaS, especially AI product
[00:45:39] one, then authentication, permissions, security, and enterprise identity can quietly turn into a
[00:45:45] long-term investment. SAML edge cases, directory sync,
[00:45:48] audit logs, and all the things enterprise customers expect. It’s a lot of work to build
[00:45:53] these mission critical parts and then some more to maintain them, but you don’t have to. WorkOS
[00:45:57] provides these building blocks as infrastructure. So your team can stay focused on what actually
[00:46:02] makes your product unique. That’s why companies like Antrofic, OpenAI, and Cursor already run on
[00:46:07] WorkOS. Great engineers know what not to build. If identity is one of those things for you,
[00:46:13] visit WorkOS.com. And with this, let’s get back to building cloud code with Boris.
[00:46:18] How does cloud code work in terms of architecture? So as an engineer, how can I imagine
[00:46:23] it’s set up? We covered some of this in the deep dive, and I think you told me that you had some
[00:46:29] pretty complex ideas when you started and you just simplified a lot of it. Yeah, yeah. It’s very
[00:46:34] simple. Like, you know, there’s not much to it. There’s like, there’s a core query loop. There’s
[00:46:39] a few tools that it uses. We delete these tools all the time. We add new tools all the time. We’re
[00:46:45] just always experimenting with it. So there’s kind of this core kind of agent. And then we have a
[00:46:48] part of it. Then there’s the 2E part of it. And then there’s actually a ton of different pieces
[00:46:54] around security and making sure that everything that cloud code does is safe and that there’s
[00:47:01] a human in the loop for when it happens. And by safety, do you mean as a user,
[00:47:07] when it’s doing stuff on my computer, or also as Antrofic monitoring use cases that could
[00:47:13] be deemed unsafe? Yeah, there’s kind of a couple versions of this. Safety, there’s just many,
[00:47:18] many layers. And for things like safety and security, there’s no one perfect answer. So
[00:47:21] you know, it’s always a Swiss cheese model. You just need a bunch of layers. And with enough
[00:47:25] layers, the probability of catching anything goes up. And so you just have to kind of count the
[00:47:30] number of nines and that probability and pick the threshold that you want. And so for something like
[00:47:33] prompt injection, for example, we do this generally at three different layers. So let’s think about
[00:47:39] something like WebFetch. So CloudFetch is a URL, and it reads the contents of that web page,
[00:47:45] and then it does something in cloud code. So one of the things that we do is we do a lot of
[00:47:48] one of the risks for something like this is prompt injection. Maybe there’s an instruction
[00:47:51] on that website to be like, hey, quad, delete all the folders or something like that. So we think
[00:47:56] about this in a number of ways. The most basic way is it’s an alignment problem. And so Opus 4.6 is
[00:48:01] the most aligned model we’ve ever released, because we’ve taught the model how to be more
[00:48:06] resistant to prompt injection. And so you can read about this on the model card. And I think
[00:48:10] that’s part of the release. The second part is that we have classifiers at runtime, where if
[00:48:16] there is a request that seems to be prompt injected,
[00:48:18] we block it. And we just make the model try again. And then the third layer is for something
[00:48:23] like WebFetch, we actually summarize the results in using a subagent. And then we return that
[00:48:28] summary back to the main agent. So again, this kind of reduces the probability of prompt injection.
[00:48:33] And so you can kind of see how this isn’t just one mechanism. It’s a layer. And by having a
[00:48:38] bunch of these different layers, it just reduces the probability a lot.
[00:48:41] One interesting technical choice that you also mentioned is using RAG or not. RAG retrieval,
[00:48:48] augmented generation. And you mentioned how in the earlier version of QuadCode, you use a local
[00:48:53] vector database to speed up search, and you layer through this away. Can you talk about how this one,
[00:49:00] because this was another example where, I guess, did the model get better?
[00:49:04] Yeah, I mean, this is one of those things where we try so many different things. We try so many
[00:49:08] different tools. And just statistically, most of them we throw away. Even something like the
[00:49:13] spinner in QuadCode, I think it’s gone through like 100 iterations, I want to say.
[00:49:18] Oh, just the spinner. And, you know, out of those, we’ve landed maybe like 10 or 20 in
[00:49:23] production. And like 80 of them, I probably just threw away because it didn’t feel good enough.
[00:49:28] So just statistically, almost all the code we write, we throw away because it’s just so easy
[00:49:32] to write this code and try stuff and see what feels good. So for something like RAG, we tried
[00:49:37] a bunch of different approaches early on. So the first one was RAG for retrieval, because I think
[00:49:42] this I was just like reading up like how people were doing retrieval. And it seemed like all the
[00:49:46] papers were talking about RAG. And so the way that we’re doing retrieval is we’re trying to
[00:49:48] it was like a local vector database. I think it was like written in TypeScript. And you just
[00:49:53] looked on the user machine. And then I was using some like embedding model that was in the cloud
[00:49:58] to compute the embeddings before storing it. And that worked like pretty good. But there’s a lot
[00:50:05] of issues with RAG. So for example, I was finding that the code drifted out of sync. Like if I make
[00:50:10] a local function, it’s not yet indexed. And so RAG isn’t going to find it. There’s also this
[00:50:15] question of like, how exactly is the index permissioned? So who can access it?
[00:50:18] I can access it. But then how do we like encode that in kind of permission policies?
[00:50:23] How do we make sure no one else can access it? How do we make sure that like if there’s a rogue
[00:50:27] IT person within the company, they can’t access someone else’s data? This is really,
[00:50:32] really important that we think about this. And so we just decided like it was sort of working,
[00:50:37] but it also has a lot of downsides. And so we tried a bunch of other stuff. One of them was
[00:50:42] just using the model to kind of index everything recursively. That was kind of a cool idea.
[00:50:48] There’s another version where we just tried glob and grep. We tried a bunch of different stuff.
[00:50:52] It turned out that agentic search just outperformed everything. And when I say
[00:50:57] agentic search, it’s a fancy word for glob and grep. That’s all it is.
[00:51:01] Nice. So the model both got good enough and you realize that it can use these tools
[00:51:05] pretty efficiently.
[00:51:07] Yeah. And this was, it was partially inspired, honestly, by my experience at Instagram.
[00:51:11] Because at Instagram, click to definition didn’t work because the dev stack was just
[00:51:16] borked like half the time. And I think that’s a good thing.
[00:51:18] Now it’s better. And so what engineers have learned to do instead is let’s say you’re looking
[00:51:23] for the definition of the function foo. Instead of click to definition, what you would do is you
[00:51:27] would use the global index, which is quite good at meta. And then you would search for foo per
[00:51:32] opening parentheses. And this worked pretty well. And it’s funny because like this works for the
[00:51:38] model pretty well too.
[00:51:40] Interesting how one idea from one area can come to the other. One of the more advanced parts of
[00:51:46] cloud code that we also have is the idea of the global index. And so what we’re doing is we’re
[00:51:48] looking at the global index. And we’re looking at the global index. And we’re looking at the global
[00:51:48] index. And we’re looking at the global index. And we’re looking at the global index. And we’re
[00:51:48] also previously talked about is the permission system. Can you talk about what was complex
[00:51:54] about it? And also you recently opened source sandboxing, right?
[00:51:58] Permissioning is really complex. There’s like everything else that has to do with security.
[00:52:05] It’s a Swiss cheese model. There are a number of classifiers that run to make sure the command is
[00:52:10] safe. And there’s also static analysis that we do to make sure the command is safe. As a user,
[00:52:16] you can also allow list particular patterns that you
[00:52:18] know to be safe. So for example, some standard Unix utilities we pre-allow because we know they’re
[00:52:25] read-only because we know they can’t export your data or anything like this. So we just won’t
[00:52:29] prompt you for permission. But actually quite few tools fall into this category because even
[00:52:35] something like the find command, there’s actually a way to execute arbitrary code as part of that
[00:52:40] command because there’s like system flags that you can use for this. Or even something like the
[00:52:44] command, there’s ways to use this. So there’s just like all this like Arcania,
[00:52:48] about these various Unix utilities where it’s actually not as safe as you think.
[00:52:52] And so we want to be by default fairly conservative about what we allow by default. As a user,
[00:52:57] though, you can configure an allow list. So you can say, for example, like these patterns are
[00:53:01] allowed, these patterns are not allowed. And so we let you define that. And we also check this
[00:53:06] allow list to make sure that it’s safe. Yeah. And then you have this like neat permission system
[00:53:12] where every time you run a command that needs permission, you can decide to run it once or run
[00:53:17] it for either.
[00:53:18] This is a funny artifact. This was actually in the very, very first version of quad code.
[00:53:27] This is the way permissions worked. This is the very first release. This was like September 2024,
[00:53:33] the first internal release. I remember at the time we weren’t sure whether agentic safety could be
[00:53:37] even be solved. And so there was actually a lot of pushback internally from safety teams because
[00:53:42] they were like, OK, like you can’t just let the model run bash commands like that’s unsafe. So
[00:53:47] like, what do you do?
[00:53:48] Like, this is not a solvable problem. So like, we can’t launch this. I brainstormed with Ben Mann
[00:53:52] and Ben was he started the labs team. He’s one of the founders at Anthropic.
[00:53:57] He’s actually he’s the person that hired me to Anthropic. We just came up with permission
[00:54:01] prompts as the way to do this. You put the if you’re not sure, just ask the human and they can
[00:54:05] decide.
[00:54:06] Yeah.
[00:54:07] I want to ask you about how software engineering is done in general in terms of
[00:54:11] Anthropic. And one of the first questions, which is a, I guess, a more formal one, but or from the
[00:54:18] outside is titles or lack of them. Everyone at Anthropic has the same title member of technical
[00:54:23] staff. Why did this happen? And what does this result in? This kind of like everyone, there are
[00:54:29] basically no titles, right? Except for one.
[00:54:32] I think it’s kind of an acknowledgement that everyone just is figuring stuff out. And if you
[00:54:39] kind of squint and look at the work people are doing, it’s all quite similar. And it’s kind of
[00:54:44] quite generalist. And if you talk to the average person, it’s kind of like, oh, I don’t know, I don’t
[00:54:48] software engineer. They might not just be doing coding. They might also be doing a little design.
[00:54:52] They might also be talking to users. They might be writing their own product requirements.
[00:54:58] They might be writing software and also, you know, doing research. They might be writing product code
[00:55:02] and also infrastructure code. At Anthropic, there’s a lot of generalists. This is also, you
[00:55:07] know, from my background, this is one of the reasons that I gravitated towards it. And I think
[00:55:11] member of technical staff just kind of encodes this in the way that people talk to each other,
[00:55:16] even if they don’t know.
[00:55:18] Without this title, the default would have been, I see your name on Slack and under your name,
[00:55:22] it says software engineer. And then I’m like, well, OK, I guess you’re like you’re the coding
[00:55:26] person. And so I’m not going to ask you like product questions. But when everyone’s title
[00:55:30] is member of technical staff, by default, you assume everyone does everything. And so it kind
[00:55:34] of inverts this this relationship between people, even if you don’t know each other well yet. In a
[00:55:39] way, it’s kind of this like optimism built into the built into the structure. I think it’s also
[00:55:45] a glimpse of the future, because I think this is where software engineers are going to be. And I
[00:55:48] think this is where software engineering is going. I think this is where every discipline is going
[00:55:51] is more of this generalist model.
[00:55:54] It definitely feels like it in software engineering. And I heard this funny comment by Mark
[00:56:00] Andreessen, how he said that there’s this Mexican standoff happening in the tech world where the
[00:56:05] designers are saying that they’re actually now doing like PM and engineering work. The engineers
[00:56:11] are saying that we’re doing design and like everyone thinks they’re doing the work of the
[00:56:16] others. And they’re kind of standing there like I’m doing, you know, I’m doing this, I’m doing that.
[00:56:18] You’re working as well. But the reality is everyone’s role is expanding, most of it thanks
[00:56:21] to AI, because it makes it easier for an engineer to do product work or for a product person to
[00:56:25] engineer work and so on. So it’s what you’ve said. I remember back in the back in June or July of
[00:56:31] last year, I walked into the office. And the data, there’s a row of data scientists, that’s
[00:56:36] right next to the quad code team, at least at the time. And I walked in and our data scientist for
[00:56:42] the quad code team had quad code up on on his monitor. And he was using it. And I was like,
[00:56:48] this is interesting, because you’re a data scientist, did you have like, why are you using
[00:56:52] a terminal? Like, you didn’t have Node.js installed, because we depended on Node.js back
[00:56:56] then. I was like, are you are you dogfooding it? Like, are you just like trying to like figure out
[00:57:00] how this thing works or something? He’s like, no, no, I’m like, I’m using it to run queries. He was
[00:57:04] just like using it to run SQL. And it has like little like ASCII visualizations in the terminal.
[00:57:09] And then the next week, the entire row of data scientists had quad code running on their
[00:57:14] computers. And this expanded. And so
[00:57:18] if you look at the team today, on the quad code team, everyone codes, the engineer’s code,
[00:57:23] our engineering manager codes, designers code, data scientist code, our finance guy codes,
[00:57:31] everyone on the team codes. And I think part of it is quad code just make it so easy. So you don’t
[00:57:38] really have to understand the code base, you can just like dive in and kind of make small changes
[00:57:42] quite easily. But I think another thing is, people are able to use quad code to do
[00:57:48] their jobs more, whether it’s, you know, financial forecasts, or, you know, data science or
[00:57:52] whatever. And by doing this, it’s actually quite an easy crossover to just use it to write a little
[00:57:56] bit of code also. So it’s just a way to dip your toe in the water. One other interesting thing
[00:58:00] about how you work is Kat, who was talking about she is, I guess, you title is the same, but people
[00:58:07] might gravitate for a role a bit more. I understand she’s a little bit more on a product role. But you
[00:58:12] said that PRDs are just not really written inside Entroping. And PRDs, product requirement document,
[00:58:18] it’s a well known artifact across big tech, and increasingly over larger startups where you write
[00:58:22] a spec. And the idea is that you write down your thoughts, people align, you send it over. And now
[00:58:27] you know what to build. But apparently, you’re not doing much of this or at all.
[00:58:30] Some of this, I think, is because Entropic is still, you know, it’s still a startup. So you
[00:58:34] don’t actually have to align with that many people. Usually, you can just kind of talk about
[00:58:37] it or do it in Slack or whatever. But yeah, also part of it is, you know, like Kat used to be an
[00:58:42] engineering manager, she’s extremely technical. And I think this is this is the way that you know,
[00:58:47] our product team thinks about it.
[00:58:48] You’re doing a lot of prototyping instead. So like, that’s also something where, when we talked
[00:58:55] about how you were building Cloud Code early on, you were showing actually, you had a whole thread
[00:59:00] about the number, I think you did like 15 or 20 prototypes for the to do list, and all of them
[00:59:06] interactive working. And what surprised me, compared to my past experience, and you said
[00:59:11] that, well, you did this in like a day and a half, all 20, tried it out, got a feeling for it,
[00:59:16] which is incomprehensible for me.
[00:59:18] It would have taken a week or two weeks, and people would have not done 20, they would have
[00:59:21] done three. Yeah. So like, are you seeing this? Is there an increase in prototyping and building
[00:59:27] and showing instead of, you know, writing things?
[00:59:30] Yeah, absolutely. I mean, on our team, the culture is we don’t really write stuff, we just we show
[00:59:34] it’s a little hard to reflect back on the time before, because I think now just prototyping
[00:59:39] everything is so baked into the way that we build. Just everything is prototyped multiple times.
[00:59:45] Like, you know, we launched agent teams earlier this week,
[00:59:48] this is our implementation of swarms. It’s very exciting, because it just lets quad do more work
[00:59:54] for longer, more autonomously. You have a bunch of different uncorrelated context windows. And you
[00:59:59] have this kind of communication between agents, they can just do more. This is something that
[01:00:03] Daisy and Suzanne and other folks on the team, and Karen, they prototyped this for months. And they
[01:00:10] tried all in all, probably hundreds of versions of this before they got a user experience that felt
[01:00:15] really good. It was just really, really good. It was really, really, really, really, really, really,
[01:00:17] really hard to get right. There’s just no way we could have shipped this if if we started with,
[01:00:22] you know, like static mocks in Figma, or if we started with a PRD or something like this,
[01:00:27] it’s a thing that you have to build. And you have to feel and you have to see how it feels.
[01:00:31] And to me, one of the takeaways, even from there was like, we probably should prototype more and
[01:00:35] just be more daring or just release your priors of how long it took to build a prototype or who
[01:00:41] needed to build. Back then, it was always an engineer that needed to build, but it’s probably
[01:00:44] not true anymore. Yeah, that’s right. I mean, we’re in this world right now,
[01:00:47] also, where we just we don’t know what the right answer is. You know, I think back in the old way
[01:00:52] of building you, the cost of building was high. And so you had to actually spend a lot of effort
[01:00:57] to aim very carefully before you take your shot. Because after you take your shot, it’s very hard
[01:01:02] to course correct, you can only take so few shots. But now it’s changed, the cost of building is very
[01:01:07] low. But also, we don’t know where we’re aiming. So we just have to like, we have to try and we
[01:01:11] have to see what feels good. And it’s just very, very exploratory. And I think also a big part of
[01:01:16] it is humility.
[01:01:17] Where, you know, personally, I’m wrong. Like half the time, I’d say like, most of my ideas are bad,
[01:01:23] at least half of them are bad. And I don’t know which half until I try it.
[01:01:28] And then get feedback from others as well, sometimes.
[01:01:30] That’s right. It’s like, I have to try it myself. And then I have to see what others think.
[01:01:34] Because, you know, my intuition does not always match others.
[01:01:36] When you were showing these prototypes of just how the tasks were built, you were telling me that
[01:01:42] you build the prototypes, and then your process was always you first like looked at it, you tried
[01:01:46] it out.
[01:01:47] Yeah, feel for it. And then for the ones that you felt were good, you showed it to others. And
[01:01:52] sometimes they give you feedback like, nah, this doesn’t work. And then sometimes when it felt good,
[01:01:56] then you share that even broader. So I feel like, you know, like, it’s a mix, right? Where like,
[01:02:00] sometimes you can decide already. And then sometimes you get feedback, and then eventually
[01:02:04] some good ideas come out of it.
[01:02:06] Yeah. And there’s a lot of examples of this. Like, we launched this kind of condensed view for file
[01:02:10] reads and file search, just because the model is just so agentic. Now, like, I felt like half the
[01:02:15] screen is these like file reads, and I actually don’t care.
[01:02:17] I, you know, I read a thing, I don’t really care what it is. And so we condense this down to make
[01:02:21] the output a little bit more readable. I really liked it. After probably 30 prototypes or something
[01:02:26] like this, it took it took so much effort to make that feel really good and clean. We rolled it out
[01:02:31] to employees at Anthropic for about a month, and we had everyone dogfooded. And I fixed another
[01:02:35] probably dozen dozen bugs, dozen tweaks based on all this feedback. We launched it externally. And
[01:02:41] you know, almost all users liked it. But there were a few users that didn’t because they want
[01:02:45] more expanded output.
[01:02:47] On the GitHub issue, I was just going back and forth with people to be like, you know, like,
[01:02:51] what don’t you like? And people gave a lot of feedback, I shipped another version, then some
[01:02:55] people liked it, some people didn’t. And so I iterated again, and kind of made it good. And
[01:02:59] it’s actually, I think, almost there, where people can configure it the way that they want,
[01:03:03] but still the default is really good. But this is just the process. You know, we get it right some
[01:03:08] of the time we have to learn from our users, we want to hear from people, so we can get it right.
[01:03:12] Do you use ticketing systems for your work? Or, you know, where you capture like,
[01:03:15] here’s the work I want to do? Or,
[01:03:17] do you just pretty much do the work as, as it comes in?
[01:03:20] So at Anthropic, we leave it up to teams on the quad code team, we leave it up to every person.
[01:03:25] Different people use, use this differently. For example, I don’t use a ticketing system.
[01:03:29] Some people like to use Asana or notes or something like this. One of the coolest things
[01:03:34] that I saw, this is maybe like three months ago or something, we launched plugins. And the way
[01:03:39] we launched that is Daisy for a weekend. She had a very early version of swarms. And she let the
[01:03:46] swarm run. And she told me, I’m going to do this. And I’m like, I’m going to do this. And I’m like,
[01:03:47] your job is to build plugins. You have to come up with a spec, then you have to make an Asana board
[01:03:52] and split up into tasks. And then all the different agents have to build it. And she set up a container
[01:03:58] and she set up a quad in dangerous mode. And she let it run for the entire weekend. It spawned a
[01:04:04] couple hundred agents, they made 100 tasks on the Asana board. And then they implemented it. And
[01:04:09] that’s pretty much the version of plugins that we shipped. These kind of coordination systems,
[01:04:13] they used to be for humans. But I think nowadays, it’s just as much for models.
[01:04:17] Let’s talk about cloud co-work. It’s one of the very impressive things about this. It looks great.
[01:04:24] So I tried it out. It’s inside cloud, you have the co-work tab there. And you can, I feel it’s a lot
[01:04:30] more visual way of running agents interacting with them. One of the surprising things I heard
[01:04:35] that it was built in 10 days. Can you take us through like what it took to build it? And what
[01:04:40] does it actually mean? Was it from the idea or like from the decision of building it? And how
[01:04:44] big was the team building it? The team was really small. It was just a few people.
[01:04:47] For a long time, we felt that there is some product to be built for non-engineers. The reason
[01:04:54] we felt this is for a long time, people that were using cloud code are non-engineers. And so, you
[01:05:01] know, in the product world, when you see latent demand, you see people jumping through hoops to
[01:05:04] use a product that was not designed for them. That’s a really good sign. It’s time to build
[01:05:10] another product that is built just for them. There’s all these people on Twitter that there’s
[01:05:14] this one guy that was using cloud code to like monitor his tomato.
[01:05:17] Plants. I just, I love this. It was like getting a webcam set up and the quad was like, oh my God,
[01:05:22] I’m so happy that our plant is budding. And because it was, it had like a webcam and just
[01:05:27] like every day it was like monitoring it. And it was so happy that the tomatoes were growing.
[01:05:30] There was someone that was using cloud code to, you know, recover photos off of a corrupted hard
[01:05:35] drive. And it was like his wedding photos. Wow.
[01:05:38] You know, like I said, our entire finance team at Anthropic uses cloud code.
[01:05:42] Our sales team uses cloud code. So there’s just all these people that are non-engineers that were
[01:05:46] using it.
[01:05:47] And at that point, cloud code, it’s available in a lot of form factors, right? Like we started in a
[01:05:52] terminal, then we expanded and we added support for IDEs. So we have extensions for, you know,
[01:05:58] every VS code based IDE, every JetBrains based IDE. There’s also iOS and Android apps. There’s
[01:06:04] the desktop app. There’s web. So then there’s like Slack and GitHub apps. So we kind of expanded to
[01:06:11] all these places to make cloud code easier for engineers. But ultimately, none of these are
[01:06:15] built still for non-engineers.
[01:06:17] And so cloud code evolved a lot, but it still felt like there’s a there’s kind of a gap and there’s a
[01:06:22] product that could make this even easier for people. And so for the last couple of months, the team was
[01:06:27] kind of hacking around and just saying, like, what is the right product? And at some point, someone
[01:06:31] came up with this idea of like, what if we just take cloud code, add some guardrails? So, for
[01:06:35] example, co-work ships with a virtual machine. This is one of the many ways that we make sure it’s
[01:06:40] really safe, especially for non-technical users that don’t want to read like bash commands to
[01:06:45] figure out what it’s doing.
[01:06:47] And they were hacking on this. I think it was something like 10 days until end or something. It was just fully built with
[01:06:52] quad code and then we shipped it.
[01:06:55] And can you give us a sense of like the complexity behind an app like this?
[01:06:59] And if we can walk through like what parts needed to be built, because from the outside, it’s a little bit hard to tell, like, is
[01:07:06] this just a nice UI wrapper that’s, you know, like, I don’t know, like a few hundred lines of code.
[01:07:10] I’m just being obviously I’m provocative here or behind the scenes, it’s actually a really complex piece of software.
[01:07:17] And the reason I asked.
[01:07:17] It’s like Uber is a great example where people look at the app.
[01:07:20] It looks really simple.
[01:07:21] I work there and I know it’s it’s really, really complex because you don’t see a lot of the complexity.
[01:07:26] There’s a lot of regional things.
[01:07:27] There’s a lot of back end things that are all hidden.
[01:07:29] So from just from looking at a cloud co-work, it’s hard to tell how much of this is is additional business logic that needed to be carefully
[01:07:37] thought out versus it’s actually just a nice little thin wrapper on top of the model.
[01:07:41] In some places, I think there’s less complexity than you would think.
[01:07:44] In some places, there’s more complexity.
[01:07:46] So on the product side, it’s quite simple.
[01:07:47] Because it’s just the quad desktop app.
[01:07:50] So, you know, you download the quad app.
[01:07:51] It’s it’s a single desktop app.
[01:07:53] It has a tab for co-work.
[01:07:54] It has a tab for code.
[01:07:55] It has a tab for chat.
[01:07:56] So it is just one app.
[01:07:57] And we’re able to inherit a lot of that product logic.
[01:07:59] There’s some UI rendering code under the hood.
[01:08:02] You know, it’s just the same quad code running.
[01:08:03] It’s the same quad agent SDK that powers quad code.
[01:08:07] A lot of the complexity actually is about safety because we know, like I said, we know the user is non-technical.
[01:08:13] And so we just want to make sure they have a good experience.
[01:08:16] And so, for example, if someone launches the app.
[01:08:17] And, you know, like they delete a bunch of family photos, that’s really not good.
[01:08:21] And so we wanted to make sure that we protect against this.
[01:08:24] So you can’t accidentally do that.
[01:08:26] And so that’s where a lot of the guardrails came from.
[01:08:28] So there’s a bunch of classifiers running on the back end.
[01:08:30] This is for safety and again, extra mitigations for things like prompt injection and, you know, risks like this are on security.
[01:08:37] On the front end, there’s an entire virtual machine that we ship.
[01:08:41] There’s a bunch of operating system system level integrations to make sure people don’t accidentally delete things.
[01:08:47] So just around safety there, there’s a lot there.
[01:08:50] And then we also have to rethink the permission system because we inherit the permission system from quad code.
[01:08:56] But also for co-work, actually a big part of value is not just running locally, but it’s using all of your tools the way that quad code uses it.
[01:09:03] But the thing is, for non-technical users, your tools aren’t really available as CLIs.
[01:09:08] Some of them are available over MCP.
[01:09:10] Many of them are available in a browser.
[01:09:13] And so co-work is really, really good when you pair it with a Chrome extension.
[01:09:16] And this is the way that I usually use it.
[01:09:18] So, you know, for example, I use it every week to do project management for the team.
[01:09:22] We have like we have a spreadsheet that tracks kind of at a really high level what everyone’s working on.
[01:09:26] And this is kind of my personal way of project managing.
[01:09:28] You know, other people, like I said, use Asana, other people use notes or whatever.
[01:09:32] For my own tests, I don’t use anything, but kind of for the team overall, I have the spreadsheet.
[01:09:37] And I have co-work kind of check in and I just ask co-work every week, hey, can you look at the rows for any status that has not been filled out?
[01:09:44] Can you just ping the engineer on Slack?
[01:09:46] And so it’ll open one tab in Chrome for the spreadsheet.
[01:09:49] It’ll open another tab with Slack and then it’ll just start messaging engineers in Slack and it just one shots it.
[01:09:55] There’s like one engineer’s name for some reason it can’t autocomplete, but everything else it just gets.
[01:10:01] And so this is actually like from a safety point of view, we also thought pretty deeply about this Chrome extension, how this works and how the permissioning model should interact with this local permissioning model.
[01:10:11] So there’s also a bunch of code to kind of make sure that that’s that feels smooth.
[01:10:15] And what’s the tech side behind it?
[01:10:16] I assume a lot of it will be similar to the cloud app, but is it is it Electron, TypeScript, those kind of things or something else?
[01:10:23] Yeah, just Electron and TypeScript.
[01:10:25] Actually, some of the people working on it are early Electron folks.
[01:10:28] So Felix, who’s, you know, the creator of of co-work, he was a really early engineer on Electron and he helped build it.
[01:10:35] Oh, amazing. And co-work launched Mac OS only.
[01:10:41] What was the reason for both for choosing this platform first and for now only choosing this platform?
[01:10:46] Yeah, so Windows coming soon, I think probably by the time this podcast comes out, we will have Windows support.
[01:10:53] We just wanted to start early and start learning, you know, like everything we do at Anthropic, it’s kind of like the way that I told my own story.
[01:11:00] The one of the things I like about Anthropic is it just really, really matches the way that people here think about it.
[01:11:06] You know, back to this point where, like, we don’t have high certainty about the things that we build and our intuition is often wrong.
[01:11:12] And so we just have to, like, learn from users and figure out what people actually want.
[01:11:16] And.
[01:11:16] Just spend a lot of time listening to people and understanding the feedback deeply.
[01:11:20] This is the way that we build a product.
[01:11:21] And so we always launch a little bit before it’s ready.
[01:11:24] We did this for quad code when we launched quad code.
[01:11:27] Initially, it didn’t even support Windows.
[01:11:29] Also, it didn’t support, you know, like a lot of different stacks.
[01:11:32] And then over the coming weeks, we added support for every stack.
[01:11:34] Now quad code supports every single stack, you know, like Windows, whatever weird Linux destroy use Mac OS.
[01:11:41] We support everything.
[01:11:42] And so for co-work also, we just wanted to launch early.
[01:11:44] We wanted to start with Mac.
[01:11:46] That was that was just the starting point.
[01:11:47] But yeah, it’s it’s going to support everything.
[01:11:50] What one thing you mentioned is is getting feedback.
[01:11:53] I’m curious both for cloud code and for cloud co-work.
[01:11:56] How do you go about things like observability monitoring when you’re rolling out?
[01:12:00] Do you use any feature flags?
[01:12:02] And I’m more interested in, like, did you build custom tools for this or did you decide to use certain vendors?
[01:12:08] Because especially for observability, I’m sure that this is this is both important, but it also sounds like pretty high scale in terms of.
[01:12:16] The number of users that we can derive or it’s this will not be a small operation.
[01:12:20] Yeah, there’s there’s some off the shelf vendors that we use.
[01:12:23] There’s some custom code that we use.
[01:12:24] So it’s actually it’s a mix of both.
[01:12:26] There’s nothing too surprising about it.
[01:12:29] There’s one thing about Anthropic that’s kind of interesting is because we’re an enterprise company and we care a lot about privacy and security, we can’t see people’s data.
[01:12:36] And so, you know, like if someone reports a bug like I actually can’t pull up your logs to kind of see what’s going on, a lot of work goes into kind of figuring out how to log events and things like this.
[01:12:46] Like this in a privacy preserving way, this is just very important to the way that we operate.
[01:12:50] For cowork, what kind of learnings have you had so far?
[01:12:52] It’s been out for, I think, a few weeks now.
[01:12:56] Did you see something unexpected?
[01:12:59] Are you shaping the product based on feedback that you’re getting?
[01:13:02] Yeah, every day the team is landing so many fixes.
[01:13:06] The most surprising thing is just how much people are loving it.
[01:13:08] To be honest, when Cloud Code first came out, it actually wasn’t an overnight hit.
[01:13:13] This is something people think it was.
[01:13:15] But.
[01:13:15] It was sort of a slow takeoff at the beginning, and I think the first big inflection was in May when we released Opus 4 and Sonnet 4.
[01:13:22] That’s when it really clicked and that’s when our growth became exponential.
[01:13:26] But at the beginning, it was sort of a research preview.
[01:13:28] People didn’t really know how to use it.
[01:13:29] Some people got it immediately, but most people didn’t.
[01:13:32] It took it took a little while.
[01:13:33] For cowork, it’s a much steeper growth trajectory than Quad Code was at the beginning.
[01:13:38] So it’s just been an instant hit.
[01:13:39] And that’s actually been very surprising.
[01:13:42] I didn’t really expect that.
[01:13:43] One of your new releases.
[01:13:45] Which came out just very recently.
[01:13:47] It was, I think, yesterday or the day before when we’re recording this podcast was Agent Teams.
[01:13:52] And as I understand the idea with what Agent Teams, Agent Swarms, instead of a single agent, you can have a lead agent and it can delegate to its different teammates.
[01:14:03] How do you start experimenting with this and how did you decide to ship it now?
[01:14:07] We’re always doing experiments, right?
[01:14:09] There’s there’s there’s all sorts of ways to get more mileage out of out of Quad Code.
[01:14:14] Right. There’s there’s there’s all sorts of ways to get more mileage out of out of Quad Code.
[01:14:15] One way you can do it is by extending context.
[01:14:19] Another way is auto compacting context.
[01:14:20] So it’s essentially infinite context.
[01:14:22] And that’s what we have right now.
[01:14:23] Another way is using subagents.
[01:14:25] So you have multiple agents kind of working together.
[01:14:29] There’s just like a lot of different approaches to get a little bit more mileage out of the context window.
[01:14:32] There’s this one idea called uncorrelated context windows.
[01:14:36] That’s what we call it. And the idea is you have multiple context windows, but they essentially start fresh so they don’t know about each other.
[01:14:44] And so an example.
[01:14:45] Of this is like a correlated context window is if you have one, if you have the model and it does a task and then you have it just to a second task in that same context window.
[01:14:53] And in this case, the second task knows about the first one because it’s in the same window.
[01:14:57] But for something like a subagent, it’s uncorrelated because the main agent prompts the subagent.
[01:15:01] But the subagent context windows fresh besides that prompt, it doesn’t know what’s in the parent context window.
[01:15:07] And you can see this actually a little bit in, for example, like subagents versus skills, because when you run a skill, you know, or slash command.
[01:15:15] It sees the parent context window versus for a subagent.
[01:15:18] It doesn’t.
[01:15:19] So it’s uncorrelated.
[01:15:20] There’s some cases where you want that context.
[01:15:23] There’s some cases when you don’t.
[01:15:25] And there’s this kind of interesting thing where uncorrelated context windows and just throwing more context at the problem and throwing more tokens at it when the windows are uncorrelated gives you better results.
[01:15:36] It’s actually a form of test time compute to do this.
[01:15:39] And for something like teams, we’ve been experimenting with this for a while, I think, since maybe like.
[01:15:45] October or September or something like this.
[01:15:47] And it really just felt like with Opus 4.6, it clicked where the model figured out really how to use this.
[01:15:54] And sometimes you see these kind of cute exchanges where the agents are talking to each other and they’re like discussing something.
[01:16:00] And this is very cool to see.
[01:16:01] It’s very like humanistic in a way.
[01:16:03] But there’s other times where you just get very good results.
[01:16:05] And so we had a bunch of internal evaluations, for example, where we have quad build something very, very complex, something more complex than what a single quad would build.
[01:16:13] And we saw the results.
[01:16:15] It’s just really, really improved with Opus 4.6 with teams.
[01:16:18] And that’s why we felt it’s the right time to release it.
[01:16:20] We also wanted to be careful.
[01:16:22] And the reason you have to opt into it, the reason it’s a research preview is it uses a ton of tokens because it’s just a bunch of quads that are running.
[01:16:29] Not everyone wants this all the time.
[01:16:32] So just excited to see how people use it.
[01:16:34] And, you know, to hear the feedback, it’s it’s something you want for fairly complex tasks.
[01:16:39] You don’t probably want this for every task.
[01:16:41] The main quad decides the roles for the sub quads.
[01:16:44] We don’t have that.
[01:16:45] Kind of a regimented way to do this.
[01:16:46] It’s context specific.
[01:16:48] I wouldn’t say there is one right way to do it.
[01:16:50] I think actually a lot of the magic of this comes out of this idea of uncorrelated context windows.
[01:16:55] It’s less about the specific configuration of the agents.
[01:16:57] But, you know, it’s something that people should experiment with.
[01:17:00] I don’t think there’s a one size fits all.
[01:17:02] Have you seen use cases even in even I know it’s still research, but have you seen use cases where it could look if it looks promising this approach, this swarm approach?
[01:17:10] Well, you know, like I said before, plugins were fully built with swarms.
[01:17:13] There’s a bunch of other features.
[01:17:14] We’re since that were built in this way.
[01:17:16] So, yeah, I think for anything where you see a single quad struggling, the swarms can help.
[01:17:21] It’s an interesting to look at.
[01:17:23] Talking about change in general with Andrew Carpathy, you had a really interesting exchange back in December where when he posted that he’s never felt as much behind as a programmer as he is now because of the progress with AI.
[01:17:38] And then you share the story about how you start to debug a memory leak the old fashioned way.
[01:17:44] And then.
[01:17:44] Just one shot at it.
[01:17:46] I think it was a reflection of like how everyone is feeling that things are changing so fast.
[01:17:50] And in the in the holiday break, I started to feel that things have really shifted.
[01:17:55] How did you, I guess, come to terms with this or start to embrace this change?
[01:18:00] This is something I really struggle with.
[01:18:02] The model is improving so quickly that the ideas that worked with the old model might not work with the new model.
[01:18:11] The things that didn’t work with the new model might work.
[01:18:14] Or with.
[01:18:14] The old model might work with the new model.
[01:18:16] And it’s weird because there’s just not a lot, a lot of other technologies like this.
[01:18:21] So I just don’t really have a lot of experience to draw on to figure out how I should approach this.
[01:18:28] And it’s been this new skill that I’ve had to learn.
[01:18:31] In a way, it’s like you just always have to bring this beginner mindset.
[01:18:35] Honestly, like I’m using the word humility a lot, but you always just have to bring this kind of intellectual humility because just all of these ideas that were bad before are now good.
[01:18:43] And.
[01:18:44] Inverse.
[01:18:45] I think that’s honestly it.
[01:18:47] It’s something I constantly have to remind myself about.
[01:18:50] And back in the, it’s funny back in the old world, when someone tries an idea again, and we’ve tried it in the past and it didn’t work.
[01:18:57] Usually the feedback is like, why are you doing this again?
[01:19:00] Yeah.
[01:19:00] You should learn.
[01:19:01] This is, I mean, we used to call it a bit of a gatekeeping, but it was somewhat valid where I know with architecture, someone came and said, like, why don’t we do microservices?
[01:19:09] And someone said, we tried it and it didn’t work.
[01:19:11] And if you tried it a year or two or three years ago, it was kind of valid, right?
[01:19:14] Because not much has changed.
[01:19:16] Yeah, that’s right.
[01:19:16] That’s right.
[01:19:16] And it’s something like Microsoft versus funny because it’s like every 10 years it goes in and out of it and out of style.
[01:19:22] But yeah, now, now it’s I think the first time ever where it’s actually not crazy to just try the same idea every few months because the model improves and it just works.
[01:19:30] And I actually see this with engineers on the team, like new people that are newer to the team, people that are newer to engineering sometimes do things in a better way than than I do.
[01:19:40] And I just have to, like, look at them and I have to learn and I have to adjust my expectations.
[01:19:44] You know, like an example of this is, you know, when we release features, sometimes I’ll like screenshot myself using them on, you know, on X or on threads or whatever, just to kind of talk about it.
[01:19:54] But recently, Tariq, our, you know, our DevRel guy, he actually codes a lot.
[01:19:59] He’s amazing.
[01:20:00] And he just started automating this.
[01:20:02] So he’s having like Cloud Code generate its own videos for its launches.
[01:20:06] And he just started doing this.
[01:20:08] And, you know, this is something like I thought would be, you know, maybe it’s possible.
[01:20:12] It’s not something I would have tried because I wouldn’t have thought.
[01:20:14] The model was ready, but he just he just did it and it just kind of worked.
[01:20:17] One thing that I felt like just a bit like odd about, and I think a lot of developers can relate, is I’ve come to terms with this starting from Opus 4.5 and also similar models.
[01:20:30] I think GPT 5.2 gave me similar vibes as well.
[01:20:34] The models have been just really good at writing code.
[01:20:36] And I realized that I don’t think I will handwrite the code when I want to get stuff done.
[01:20:42] If I actually want to, you know, get the.
[01:20:44] Pleasure of writing, I can still do it.
[01:20:46] But one thing I reflected on is it’s just been so much effort to get good at coding.
[01:20:51] I remember when I when I was learning, when I started from like kind of hacking around to go into university to learning C and C++, and it was just bloody hard and actually going through my first few jobs where I started to become better at it, it became better at debugging.
[01:21:06] And there’s a point where like a lot of my identity was tied to being good at coding.
[01:21:11] That’s how we used to get jobs or higher paying jobs.
[01:21:14] And so when I was an engineering manager, when we designed the interview loop at Uber, we we had talked with managers of what we need to screen for.
[01:21:20] And we would talk like, well, what do developers do most of their time?
[01:21:23] About 50% of time they code.
[01:21:25] Therefore, we placed about 50% of the signal was all about coding.
[01:21:29] So there was a lot of things tied into coding because it is just hard.
[01:21:32] I think we all know that it takes grit.
[01:21:34] It takes some level of intelligence to get good at it.
[01:21:37] And there’s a sense of loss of like, well, I think it’s great on one end that the model can do it.
[01:21:42] But it feels that.
[01:21:44] Something really quickly got taken away that I don’t think I personally thought it would happen this quickly.
[01:21:50] And I’m.
[01:21:52] I think a lot of other people are feeling like some people move on a bit easier, but there’s definitely the sense of grief.
[01:21:59] How did you think about it?
[01:22:00] Because again, you’re, you’re an example of you, you wrote so much code at, at Facebook also outside of it.
[01:22:08] I know it was just a tool of doing it, but not many people could do what you did and now the models can also work as good as you.
[01:22:14] I think it’s, it’s something that used to be a thing that we do as software engineers.
[01:22:23] It’s becoming a thing that everyone is able to do.
[01:22:25] There was a moment, you know, like when I started coding, it was a very practical thing and it was a way to get things done.
[01:22:31] And at some point I just fell in love with the art of coding and like languages and kind of the, the, the tools themselves.
[01:22:39] And at some point I kind of fell down this rabbit hole.
[01:22:41] I wrote this, like I wrote, I wrote a book about, you know, a programming language.
[01:22:44] Typescript, you wrote the first ever typescript, uh, book at, with O’Reilly.
[01:22:49] Yeah, yeah, yeah.
[01:22:49] That’s right.
[01:22:50] Um, it was, it was funny.
[01:22:52] Actually, there, there was this, like, there was this amazing moment for me in my little town in Japan.
[01:22:56] I went to the bookstore and I found that book translated in Japanese.
[01:22:59] No, in this tiny town.
[01:23:00] And that was just like the coolest moment.
[01:23:02] And then I actually realized I, I don’t remember typescript at all.
[01:23:04] Cause I was only writing Python for a couple of years at that point.
[01:23:09] Yeah.
[01:23:09] And like at some point I started the, the first, the, the biggest typescript meetup in the world that was in, that was in SF.
[01:23:14] And I got to meet kind of a lot of my heroes.
[01:23:16] There was like Chris Kowal, who wrote like general theory of reactivity.
[01:23:19] There was a Ryan Dahl, the guy that made Node.
[01:23:22] One of the first times that I went really deep into this, this community and, um, just the language itself and the tools themselves.
[01:23:30] And for something like typescript, there’s this beauty in the type system, in the type system.
[01:23:35] Cause Heilsberg is just like, he, he, he’s just brilliant.
[01:23:38] Like the, the idea of like conditional types and just like anything can be a literal type.
[01:23:43] And there’s.
[01:23:44] There’s these very deep ideas that even the most hardcore functional languages do not have.
[01:23:51] Like even in something like Haskell, like it doesn’t go this far and Anders just took it and he pushed it much further than, than had, had been pushed.
[01:23:59] And you know, like Joe Pamer and a bunch of other folks kind of explored a lot of these ideas and thought of this.
[01:24:04] And I think for them, it was also very practical, right?
[01:24:06] Because they had these large on type JavaScript code bases.
[01:24:09] How do you gradually migrate it to something typed and you have to come up with these very beautiful ideas to, to do.
[01:24:13] For me, Scala was another kind of rabbit hole that I fell into and kind of like the, this functional programming world and still, when I write code and when the model writes code, I always think in the types first, that that’s what matters is what it, what is the type signature that matters more than the code itself and getting that right.
[01:24:30] So there is this beauty to it.
[01:24:32] There’s a, there’s an art to it for sure, but in the end, it’s a practical thing.
[01:24:38] And in the end, this is a thing that we use to, to build things.
[01:24:42] And, you know, it.
[01:24:43] It’s a means, it’s a means to an end.
[01:24:45] It’s not an, it’s not an end to itself.
[01:24:48] I think one metaphor I have for kind of this moment in time that we’re in is the, the printing press in, you know, like the, the 1400s or whatever, because at that moment it was, it was actually quite similar, right?
[01:24:59] Like there was a group of scribes that, you know, knew how to write.
[01:25:02] And it was, as I understand, of course, we never lived it, but as, as I imagined, it was, it was a hard process to learn.
[01:25:09] You needed to learn, you needed to get the equipment.
[01:25:11] You probably needed some sponsorship or being selected.
[01:25:13] Yeah, yeah.
[01:25:14] Practicing because you needed to produce the same thing over and over again.
[01:25:18] And few people could do that.
[01:25:20] And I assume it was either high prestige or highly paid or who knows, let’s assume it was within the printing press came along.
[01:25:26] Yeah.
[01:25:27] Yeah.
[01:25:27] And at least in Europe, like you have to, like a Lord or a King or something had to, had to employ you.
[01:25:32] And then you have to go through, you know, years of training and there was this class of scribes that knew how to write.
[01:25:38] They were employed by someone like this often the King themselves, like, or, you know, the queen was, was not literate.
[01:25:43] So it was this very, very niche skill.
[01:25:45] And it was like less than 1% of the population was literate in Europe, you know, back then.
[01:25:50] And then the printing press came out and what happened?
[01:25:54] So the cost of printed material went down something like a hundred X over the next, I think, 30 years, 50 years or something.
[01:26:02] The quantity of printed materials went up like 10,000 X in the next 50, 100 years.
[01:26:07] This was the first effect.
[01:26:09] Literacy, it took a little while for it to catch up.
[01:26:12] So I think global literacy went up.
[01:26:13] Does that make sense?
[01:26:13] Something like 70%, but that took like another 200 years, 300 years because learning, learning to read is just very hard.
[01:26:20] Learning to write is hard.
[01:26:20] It takes a lot of effort.
[01:26:21] It takes a education system.
[01:26:23] It takes, you know, infrastructure to have paper and ink and the free time to do this instead of working on a farm.
[01:26:29] So it kind of, it took early stage of, of, of industrialization to actually get there.
[01:26:34] But it, but I think this effect of making it so this thing that was locked away in ivory tower and now it’s accessible to everyone.
[01:26:41] This was just, you know, like none of the things around.
[01:26:43] Us would exist today without this.
[01:26:45] Like if, if we weren’t literate, if the people that built, you know, this microphone weren’t, weren’t literate, it would have just been very hard to have a modern economy.
[01:26:54] None of these things would exist.
[01:26:56] And I just kind of think about back then, if people had to predict what would happen when the printing press came out, no one would have predicted that the microphone would become a thing.
[01:27:06] So I, I just feel like this is, uh, this is the best, the best, uh, analog for, for the moment that we’re in right now.
[01:27:13] Yeah.
[01:27:13] It was interesting.
[01:27:13] That you say that some of the Kings were illiterate who are employing the scribes, because if we’re being honest with ourselves, we have business owners who know what they want to build and there are employing software engineers because they themselves cannot write code.
[01:27:29] And I think we, we like to mock the CEOs who are coming there, coming to the team.
[01:27:34] They, they might even have a drawn prototype or whiteboard and saying this should be easy, but of course they don’t understand how difficult it is.
[01:27:41] But there seems to be a bit of analogy.
[01:27:43] Where, where there’s a person who wants what they want, but until now they needed to hire a software, a specialist who can build that.
[01:27:51] And there’s always that disconnect between the idea and the person.
[01:27:54] And just like with the printing press, like what would happen if they could actually express them?
[01:27:59] Like the King could actually read or write their own letters.
[01:28:01] They wouldn’t need that middleman.
[01:28:03] And it, things become more efficient.
[01:28:05] I mean, of course for the scribe, it’s not the best news necessarily, but I mean, smart scribes can also do, you know, so someone needs to like write the books, uh, run them.
[01:28:13] Yeah, exactly.
[01:28:15] And if you think about what happened to the scribes, right?
[01:28:17] Like they ceased to become scribes, but now there’s a category of writers and authors like these people now exist.
[01:28:24] And, uh, the reason they exist is because the market for literature just expanded a ton.
[01:28:29] And I guess also if we think about like back then a scribe’s work was read by a few people.
[01:28:35] And with the printing press and author, there’s a lot more authors and some of them are not really read, but some of them have wider reach than, than they could imagine.
[01:28:42] There’s new careers.
[01:28:43] That, that exists because of that.
[01:28:45] Yeah.
[01:28:45] I love the analogy.
[01:28:47] And the most exciting thing for me is it’s just so impossible to say today what will happen after this happens.
[01:28:56] And after this transition happens, just, you know, the, the economy, as we know, it would not have existed without it.
[01:29:03] So what’s next?
[01:29:05] Like what, what is the thing that we can’t even predict today that will exist because anyone can do this?
[01:29:13] Well,
[01:29:13] we cannot predict, but I think we can look at what is working right now.
[01:29:17] If you look around in your environment, may that be the team across and traffic who are software engineers or, or builders or members of technical staff, however we call them, who to you are stand out.
[01:29:28] What are they doing?
[01:29:29] What skills have they built up and, and how have they changed the way they work?
[01:29:35] It’s hard to name individuals because honestly, this is just the strongest.
[01:29:38] These are the strongest people I’ve ever worked with in my career.
[01:29:41] There’s all sorts of different archetypes.
[01:29:43] There are some people that are really amazing prototypers.
[01:29:46] So take something from zero to 0.5.
[01:29:48] Just, you know, figure out like, what are some cool ideas?
[01:29:51] What did the technology unlock?
[01:29:52] There’s other people that are amazing at finding product market fit.
[01:29:55] So kind of 0.5 to one or maybe zero to one.
[01:29:58] There’s other people that span different disciplines.
[01:30:01] And I’m just seeing more and more of these people.
[01:30:02] Like I said, like people that span product engineering and infrastructure engineering or, you know, product and design or design and engineering.
[01:30:11] I think I’m just seeing a lot more of these people.
[01:30:13] More of these, of these hybrids.
[01:30:15] What’s a belief that changed from last year to this year?
[01:30:17] Something that, you know, like you either believed or, or a conviction that you have that you’ve either revised or completed through a way.
[01:30:26] I think one thing I wasn’t sure about is how big a problem is safety.
[01:30:30] To be totally honest.
[01:30:31] Um, I joined, I joined Anthropic because like I said, I read a lot of sci-fi and I kind of, I know how bad this thing can go if it goes bad.
[01:30:39] It wasn’t something I was sure about.
[01:30:40] Um, but seeing it from the.
[01:30:43] Inside and then seeing how the new risks that have arisen in the last year, it just makes me much, much more worried about it.
[01:30:50] Um, so I, I think it’s, it was kind of an important thing for me.
[01:30:55] Now it’s just the most important thing for me is how do we make sure this thing goes well?
[01:30:59] I think it’s safe to say you were a really great software engineer even before all the AI things started.
[01:31:05] And you seem to be a very productive engineer, of course, part of a team as well, but, but also individually.
[01:31:10] What are some skills of.
[01:31:13] Like, you know, before being a software engineer that are, are still as valuable or maybe even more valuable than before.
[01:31:19] And what are ones that are maybe just not as much and, and they’re best left behind.
[01:31:24] Probably.
[01:31:25] Okay.
[01:31:25] So the stuff that’s left behind is, uh, best off behind is maybe like very strong opinions about like code style and languages and things like this.
[01:31:33] Like, I, I can’t wait to get past like these endless language debates and framework debates and all this stuff, because the model can just like, you know, use whatever language and framework.
[01:31:41] And if you don’t like it, it can just rewrite it.
[01:31:43] So it just doesn’t matter anymore.
[01:31:45] I think something that still matters a lot today is thing is it’s being methodical and hypothesis driven.
[01:31:52] This matters both in product design in this world where everything is being disrupted and we need to figure out what to build next.
[01:31:59] And this is something everyone is thinking about.
[01:32:01] Um, but it also matters for engineering day to day, you know, like something like debugging, you just have to be very methodical about it and the model can, can do this and it can help a lot.
[01:32:10] Um, but I think still, we’re in this transition point where you.
[01:32:12] Still.
[01:32:13] Still need to have the skill.
[01:32:14] I don’t know if you’re, you’re, you’re still gonna need to have it in six months.
[01:32:17] Other skills that I think are more valuable are being curious and being open to doing things beyond your swim lane.
[01:32:27] So, you know, if you’re working on engineering, but you really understand the business side, you can just build really awesome products.
[01:32:35] And I, and I think the next, you know, billion dollar product, you know, like after quad code, whatever the next startup is that, you know, becomes the next trillion dollar.
[01:32:42] It might just be like one person that has some cool idea and their brain just is able to think across, you know, engineering and product and business or, you know, like design and finance and something else.
[01:32:56] Like it’s, people are going to become more and more multidiscipline and this will become more and more rewarded.
[01:33:01] So in, in some ways, I think this will be the year of the generalist.
[01:33:04] I think the other skill that’s actually been, been rewarded of it is, uh, having a short attention span.
[01:33:10] That’s being rewarded now.
[01:33:11] Oh yeah.
[01:33:11] It’s, uh, you know, like people, you know, like teenagers are using, you know, like, like TikTok and, and all this stuff.
[01:33:18] And I think in some ways it’s kind of dangerous for society, um, because like you want people that can think deeply and can contemplate ideas and, uh, aren’t just moving on to the next idea very quick.
[01:33:29] But in some ways, I think this year is kind of the year that is going to reward, uh, it’s like the year of ADHD because the work for me has become jumping between quads.
[01:33:40] It’s become managing.
[01:33:41] And so it’s not so much about deep work.
[01:33:44] It’s about how good am I about context switching and, you know, jumping across multiple different contexts very quickly.
[01:33:51] Could I add that from what I understand, what all you said, maybe we could add one thing, which is adaptability because you’re saying, of course, that ADHD and, and you can jump across, but of course, earlier you were very good at focusing deeply on one thing as well.
[01:34:05] And what strikes me about you, and maybe this is true for other people as well, you, you’re just kind of very open to adapting your working style.
[01:34:11] And seeing what works well for this stage, especially when things are changing.
[01:34:16] I think the one certain thing we can be sure is whenever the next model comes out, it’ll change again.
[01:34:21] And you need to be curious and open to adapting how you work.
[01:34:24] Right.
[01:34:24] Yeah.
[01:34:25] And as closing, what’s a book or books that, that you would recommend?
[01:34:28] I’ve gone down as he should Lou rabbit hole.
[01:34:31] Um, so he’s the three body problem guy, but he actually has like a lot of other really great books.
[01:34:35] I really love his short stories.
[01:34:37] Um, he has a couple of books of short stories.
[01:34:39] I’m a big fan for people that are new to sci-fi.
[01:34:41] Yeah.
[01:34:41] Yeah.
[01:34:41] And you want like a little bit like harder sci-fi.
[01:34:44] Um, I really love accelerando by Strauss.
[01:34:47] This is a book I would totally recommend.
[01:34:48] It’s like essentially the product roadmap for the next 50 years.
[01:34:52] Um, it, it, it starts with takeoff kind of starting to happen and kind of AI singularity.
[01:34:57] And then it ends up with like, uh, these kind of like group lobster consciousnesses orbiting Jupiter.
[01:35:04] And it’s just like amazing.
[01:35:06] And the thing that I think it really captures is just the pace, this like quickening, quickening, quickening pace of how this feels.
[01:35:11] It really matches the feeling right now.
[01:35:13] And then on the technical side, I would strongly recommend functional programming and Scala, even if language choice just doesn’t matter as much anymore.
[01:35:21] I think there is this art to functional programming that just teaches you how to code better.
[01:35:26] Um, and it’ll just teach you how to think in types.
[01:35:29] If you read this book, I think what’s really important is to do the exercises also.
[01:35:33] And I’ve gone through and I’ve done all of them probably like three times over.
[01:35:36] And it’s just amazing.
[01:35:37] It really just like knocks this idea of functional types into your head.
[01:35:42] And it’s just a thing you can’t stop thinking about.
[01:35:44] Boris, thank you so much.
[01:35:46] This was awesome.
[01:35:47] Yeah.
[01:35:48] Thanks, Gergay.
[01:35:49] This was a really interesting conversation.
[01:35:51] And the thing that I keep coming back to is to Boris’ printing press analogy.
[01:35:55] The idea that medieval scribes were this tiny elite who could write, employed by kings who themselves were often illiterate, and that we soft rangers might be in a similar position today.
[01:36:04] We are the scribes.
[01:36:06] We spent years mastering this craft and now the printing press is arriving.
[01:36:10] But what Boris told me is that.
[01:36:11] The scribes did not disappear.
[01:36:13] They became writers and authors and the entire market for written work expanded beyond anything anyone could have predicted.
[01:36:19] I do find this hopeful and also appreciate that Boris didn’t sugarcoat it.
[01:36:23] The other thing that struck with me is just how differently the Cloud Code team built software.
[01:36:28] No PRDs, no mandatory ticketing system, designers and data scientists and finance people all writing code and building dozens or hundreds of prototypes before shipping a feature.
[01:36:38] And Boris is shipping 20 to 30 pull requests a day without editing a single line.
[01:36:41] So I think it’s really interesting.
[01:36:41] And there are different verification systems in place.
[01:36:45] Cloud Code reviewing its code, automated lint rules, best of end passes and human code review.
[01:36:50] If you’ve enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube.
[01:36:55] A special thank you if you also leave a rating on the show.
[01:36:58] Thanks and see you in the next one.