AI agents for your digital chores


Summary

The episode explores the concept of proactive AI agents, moving beyond reactive chatbots. Dhruv Batra, co-founder and chief scientist at Yutori, explains that most current AI agents are triggered by a prompt, similar to ChatGPT. His company is building proactive agents that monitor the web on your behalf, aiming to free users from mundane digital tasks like checking for product price drops, campground reservations, or specific news events.

Yutori’s first product, called Scouts, is a team of agents that act like ‘Google Alerts for the AI era.’ Users describe in natural language what they want to track, and the agents proactively monitor the web and report back. This is framed as a step towards a future where ‘no human has to interact with a web page again,’ replaced by a personal digital concierge or ‘AI chief of staff’ that coordinates workflows.

The conversation delves into the technical and economic challenges. Technically, agents must interact with a web designed for humans, often requiring multimodal LLMs to interpret screenshots and navigate sites. Economically, Batra argues that agent traffic should be seen differently from adversarial scraping bots; it represents a human with high intent, potentially leading to new value exchanges like agents paying for API access.

Batra also discusses the long-term vision, where these monitoring agents can escalate to taking actions (like making a booking) with user permission. He highlights the unique nature of persistent agents that can run for weeks or months, tracking evolving narratives—a shift from the short-lived interactions typical of today’s coding or chat agents. The episode concludes by touching on the resistance from entrenched platforms and the potential for AI to fundamentally rethink humanity’s interface with the digital world.


Recommendations

Concepts

  • Yutori — A Japanese word meaning ‘a sense of mental spaciousness’ or ‘elbow room in your mind.’ It’s the namesake of Batra’s company and represents the feeling of having time and space for important things, which their AI agents aim to deliver.
  • Model Context Protocol (MCP) — Mentioned as a protocol that allows AI agents to absorb information directly through APIs, providing a more efficient alternative to browser automation for accessing structured data from services.

People

  • Doug Engelbart — Referenced for his seminal 1960s demonstration (the ‘mother of all demos’) that introduced foundational concepts of modern knowledge work like GUIs, the mouse, and collaborative editing, suggesting a similar paradigm shift is coming with AI interfaces.

Tools

  • Scouts by Yutori — Described as a team of proactive AI agents that monitor the web for anything you care about, using natural language queries. It’s likened to ‘Google Alerts for the AI era’ or an ‘RSS feed for the web described in natural language.’
  • Habitat Simulator — Mentioned as the ‘world’s fastest 3D simulator’ developed by Batra’s team at Meta FAIR for training virtual robots in simulation before deployment on physical robots like the Boston Dynamics Spot.

Topic Timeline

  • 00:00:17 Introduction to Proactive AI Agents — Host Ryan Donovan introduces the topic of proactive AI agents, distinguishing them from the reactive agents discussed previously. He welcomes guest Dhruv Batra, co-founder and chief scientist at Yutori, who will explain what a proactive agent is.
  • 00:00:49 Dhruv Batra’s Background in AI — Dhruv Batra shares his 20-year journey in AI research, spanning computer vision, NLP, and embodied AI at Meta and Georgia Tech. He discusses the evolution of the field from the ‘AlexNet moment’ to the current wave, noting that terms like AI and AGI were once considered unserious.
  • 00:04:51 The Redefinition of AGI and Current Hype — Batra critiques a ‘sleight of hand’ in redefining AGI to exclude areas like robotics and physical intelligence, effectively limiting it to digital, language-based tasks we can do today. He calls this an ‘amazing natural language interface’ but not true AGI, applying a ‘wet blanket’ to the hype.
  • 00:08:25 Introducing Yutori and Proactive Agents — Batra introduces his company, Yutori (meaning ‘mental spaciousness’), and its mission to build AI agents that handle mundane web tasks. He describes the vision of a ‘webless future’ where a personal AI concierge executes workflows so humans don’t have to interact with web pages directly.
  • 00:10:52 The Scouts Product: Proactive Monitoring — Batra details Yutori’s first product, Scouts—a team of agents that monitor the web for anything a user cares about, described in natural language. Examples include tracking campground reservations, product prices, band tours, or specific news. It’s positioned as a proactive, read-only monitoring service.
  • 00:12:55 Technical and Economic Challenges of Agent Traffic — The discussion addresses the load AI agents put on websites. Batra explains Scouts uses APIs where available but needs browser automation for the long tail of the web. He argues agent traffic represents high-intent human users, suggesting a need for new economic models beyond the ad-based attention economy.
  • 00:16:56 How Proactive Agents Function Technically — Batra explains that proactive agents like Scouts are essentially ‘agentic search wrapped in a cron job.’ They check the web at intelligent intervals based on the query. The goal is to evolve from read-only monitoring to taking actions (like bookings) with user consent, requiring careful trust escalation.
  • 00:19:38 Technology Stack: LLMs and the Need for Intelligence — When asked about using non-LLM systems, Batra states Yutori uses multimodal LLMs because websites are designed for human visual consumption. He contrasts this with narrow, hard-coded scrapers, advocating for a general, ‘intelligence-first’ approach that can handle anything a human can do with a browser.
  • 00:22:30 The Future Interface: AI as the UI — Batra agrees with the idea of a single, intelligent interface to everything—a reimagining of humanity’s interaction with the digital world. He describes a future of ‘generative user interfaces’ created on-the-fly for user queries, consolidating information from multiple sources into a high-bandwidth visual medium.
  • 00:26:51 Resistance from Incumbents and Path to Change — The conversation addresses potential resistance from large platforms wanting to keep users in their ecosystems. Batra cites the innovator’s dilemma but believes the fundamental shift enabled by AI—towards personal, paid assistants that serve the user—will drive change, as users ultimately seek control and value.
  • 00:30:48 The Nature of Persistent, Long-Running Agents — Batra highlights a unique aspect of proactive agents: they can be persistent entities running for weeks or months. He shares an example of a Scout tracking the narrative arc of a Meta acquisition and subsequent lab developments over 10 weeks, contrasting this with typical short-lived coding or chat agents.

Episode Info

  • Podcast: The Stack Overflow Podcast
  • Author: The Stack Overflow Podcast
  • Category: Technology, Society & Culture, Business
  • Published: 2025-10-14T04:30:00Z
  • Duration: 00:34:43

Transcript

[00:00:00] Hello, everyone, and welcome to the Stack Overflow podcast, a place to talk all things

[00:00:15] software and technology.

[00:00:17] I’m your host, Ryan Donovan, and today we are talking about AI agents.

[00:00:21] Now I know we’ve talked about it before, but today we’re talking about proactive agents

[00:00:25] instead of reactive agents.

[00:00:27] And we’re joined today by Dhruv Batra, who is co-founder and chief scientist at Yutori.

[00:00:34] He’s going to tell us all about what a proactive agent is.

[00:00:38] So welcome to the show, Dhruv.

[00:00:40] Thanks, Ryan.

[00:00:40] Happy to be here.

[00:00:41] So top of the show, we like to get to know our guests.

[00:00:45] Tell us a little bit about how you got into software and technology.

[00:00:49] Sure.

[00:00:49] Again, thank you for having me.

[00:00:51] I’m an AI researcher, and I’ve been in the field coming up on 20 years at this

[00:00:57] point.

[00:00:57] People think of the current wave as marked by an epochal event of ChatGPT launching.

[00:01:05] We had a similar event about 12 years ago at this point, which in the community we refer

[00:01:11] to as the AlexNet moment or the deep learning wave.

[00:01:14] I got into AI in 2005, which is significantly before that.

[00:01:21] Back then, it was not respectable to use the phrase AI or AGI.

[00:01:27] You were not considered a serious scientist if you used those phrases.

[00:01:31] So you said you were working on machine learning with applications to domains like computer

[00:01:36] vision and other aspects of AI.

[00:01:38] Over the years, I’ve worked in core computer vision problems like recognizing objects and

[00:01:44] images, building chatbots, which are core NLP problems or natural language processing

[00:01:50] or understanding problems.

[00:01:53] I’ve been a professor since 2009.

[00:01:57] In 2016, when I got to Georgia Tech, I created their deep learning class.

[00:02:01] I’m coming off of spending eight years at Meta.

[00:02:05] I was a senior director leading FAIR Embodied AI.

[00:02:10] FAIR is Meta’s fundamental AI research division, and Embodied AI refers to AI for robotics

[00:02:19] and AI for smart glasses.

[00:02:21] So one of my teams developed the image question answering model.

[00:02:26] That’s in the early 2000s.

[00:02:27] In the earliest days, they collaborated with the product team and shipped it on the Ray-Ban

[00:02:30] Meta sunglasses.

[00:02:31] Other teams of mine developed the world’s fastest 3D simulator called Habitat for training

[00:02:37] virtual robots in simulation, deploying them on the Boston Dynamics Spot robot.

[00:02:43] And that team took it to a White House Correspondents’ Dinner to show to congressional staffers

[00:02:47] the technology that’s coming.

[00:02:49] So over the years, I’ve spanned the spectrum of all areas of AI.

[00:02:57] I’ve seen, at this point, two completely distinct epochal waves of technology coming in.

[00:03:06] And it’s been a fascinating journey.

[00:03:09] Most people who’ve been in the area this long would tell you that we didn’t think we’d be

[00:03:15] at this point.

[00:03:17] And it’s simultaneously true that we have made tremendous progress, but there is still

[00:03:24] plenty to be done.

[00:03:25] I am not one.

[00:03:27] I’m not one of those people who, I think, are playing word games around the phrasing

[00:03:32] AGI.

[00:03:33] I think the original visions from the 1950s of an intelligent agent that can interact

[00:03:39] with the world and accomplish goals is still significantly far ahead of us.

[00:03:45] Yeah.

[00:03:46] It’s interesting to talk about spanning the whole gamut.

[00:03:48] I remember my first AI programming class was in 1997.

[00:03:53] So that was even older.

[00:03:56] A lot of Bayes.

[00:03:57] A lot of genetic algorithms.

[00:03:59] And even a neural net there.

[00:04:02] This is a question my now former colleagues, now that I’ve resigned from Georgia Tech,

[00:04:06] my now former colleagues at Georgia Tech, and I’m sure this is happening at other universities

[00:04:10] as well, they’re having to grapple with the phrase AI.

[00:04:15] And there’s a course called Intro to AI that is typically taught to undergraduates.

[00:04:20] And it today does not teach the most promising methods and, at most places, needs to be completely revamped.

[00:04:27] Because the set of ideas that we thought in the 80s and the 90s and the 2000s that would

[00:04:34] lead us to developing general purpose intelligence systems have not so far panned out.

[00:04:39] And the set of ideas that are the most promising are not featured in that course that is today

[00:04:47] called Intro to AI.

[00:04:50] Yeah.

[00:04:51] Well, you talked about how AI, AGI were dirty words.

[00:04:57] Do you think we’re on a track that’ll pan out now?

[00:05:01] Do you think it’s still something we should say with a little more care?

[00:05:07] So the way I think about that is, I think what’s been happening is we’ve gone through

[00:05:14] two phases.

[00:05:15] Phase number one, it is certainly true that in the 2010s, a renewed emphasis on the

[00:05:24] phrase AGI came about.

[00:05:27] It was trying to fight against a pattern in literature about focusing on narrow purpose

[00:05:35] problems.

[00:05:35] So it is certainly true that if you go back 15, 20 years ago, the computer vision community

[00:05:41] was focused on one set of methods.

[00:05:43] The natural language understanding community was focused on one set of methods.

[00:05:47] The robotics community was focused on another set of methods.

[00:05:50] And it was exceedingly hard to cross over discipline boundaries.

[00:05:54] You had to learn entirely new things.

[00:05:57] And even within those discipline boundaries, people were developing hyper-specialized methods.

[00:06:03] Like if you wanted to build a chess-playing bot or a go-playing agent, as DeepMind did,

[00:06:10] you focused on one specific set of techniques that told you nothing about building chatbots,

[00:06:15] that told you nothing about recognizing objects and images.

[00:06:17] And progress in one domain did nothing for other domains.

[00:06:21] And so an emphasis on generality was needed.

[00:06:24] That, hey, we’re here to solve the bigger problem, not

[00:06:27] to solve these narrow problems.

[00:06:29] But what I noticed in the last few years, a handful, two to three years, is a certain

[00:06:35] sleight of hand happening where we are redefining AGI to mean, well, robotics is out of scope.

[00:06:45] Like physical intelligence is not AGI.

[00:06:49] Tactile sensing is not AGI.

[00:06:52] Maybe all of video understanding is not AGI either.

[00:06:56] Right.

[00:06:57] Right.

[00:06:57] We’ve sort of defined AGI to mean the set of things we can do today.

[00:07:04] Like digital environments, largely language-based interaction, where language is defined more

[00:07:12] generally than just English or commonly spoken languages, to mean even programming languages

[00:07:18] and any tokenized sequences that follow a particular pattern in a grammar.

[00:07:24] But that’s not AGI.

[00:07:26] Right.

[00:07:27] It’s a specific form of intelligence, right?

[00:07:30] Yeah.

[00:07:30] It’s an amazing natural language interface at worst, right?

[00:07:35] Yeah.

[00:07:36] It is fascinating how much progress has been made.

[00:07:39] And it is economically valuable.

[00:07:42] It is intellectually interesting.

[00:07:44] And I am not one of those who says that this is a dead end or that this is headed towards a wall.

[00:07:51] I’m intellectually humble enough to realize that I don’t know.

[00:07:55] I can’t predict that far out.

[00:07:56] But I can say that there’s a certain opportunistic redefinition happening here.

[00:08:04] I do love the wet blanket on the hype because we are going into a very hyped topic.

[00:08:11] You know, everybody is talking about AI agents.

[00:08:14] But, you know, most of the agents are kicked off by a prompt, right?

[00:08:20] Pretty much the same as ChatGPT.

[00:08:22] But you’re talking about today proactive agents.

[00:08:25] Can you talk about how that works in practice?

[00:08:28] So at Yutori, we’re building proactive agents.

[00:08:31] Before I dive into proactive agents, let me just, you know, tell you about the company and why we started this line of work.

[00:08:38] The phrase or the word Yutori is a Japanese word for a sense of mental spaciousness.

[00:08:45] It literally translates to when you have elbow room or leeway in your mind.

[00:08:49] It’s the opposite of a feeling of mental fragmentation.

[00:08:54] Yutori is the feeling that you experience when you feel you have time and space to do the things that are important to you.

[00:09:00] Whether that be stepping into a state of flow, spending time with loved ones, you know, pursuing a particular activity, whatever.

[00:09:08] When you have the time and space to be able to do that.

[00:09:11] And we named the company that because we want to build AI agents that can deliver Yutori to our users.

[00:09:21] And where we’re starting is we think.

[00:09:24] That the web is, you know, one of humanity’s greatest inventions and also a clunky mess due for an overhaul.

[00:09:35] We spend hours of our time on these mundane tasks, filling forms, tracking appointments, buying and returning things, tracking information online, coordinating events, securing information from web pages.

[00:09:48] And ultimately, we’re all capped by our own bandwidth.

[00:09:52] How many times?

[00:09:54] Can we go through those same workflows?

[00:09:56] How many times can you check a web page?

[00:09:58] How many different web pages can you check?

[00:10:00] And what we imagine is a future where in some sense, no human has to interact with a web page again.

[00:10:09] No human has to interact with the web again.

[00:10:11] That you have every human on the planet has a team of AI assistants that are executing workflows on the web.

[00:10:18] Coordinated by your own personal digital concierge or an AI chief of staff.

[00:10:23] That you talk to.

[00:10:25] That understands your context.

[00:10:27] That understands what you are working through.

[00:10:30] And then executes and coordinates workflows for you on the web.

[00:10:33] You know, the analogy would be riding horses.

[00:10:37] You can probably do it for entertainment, but you’re not going to do it for utility.

[00:10:42] Those days are gone.

[00:10:43] And that’s where we have to get to.

[00:10:45] But no human should have to interact with the web.

[00:10:48] So that’s the vision.

[00:10:50] In terms of answering your question about proactive assistance.

[00:10:52] We’ve launched our first product, which is the instantiation of proactive agents.

[00:10:58] Our first product is called Scouts.

[00:11:01] Scouts is a team of agents that monitor the web for you.

[00:11:05] For anything that you may care about.

[00:11:07] So you can come to a scout and say, hey, I’m interested in campground reservations in this national park.

[00:11:15] The dates become available on a certain day.

[00:11:18] Let me know whenever those dates become available.

[00:11:21] It could be a certain product whose price you’re tracking.

[00:11:26] It could be a trip that you want to take with a certain configuration.

[00:11:31] It could be an extremely hyper-specific news event that only you are interested in.

[00:11:36] Let me know whenever there’s news about this event or a band coming into town.

[00:11:43] It could be things related to your work.

[00:11:48] So maybe you’re tracking this.

[00:11:50] Maybe you’re tracking the announcement of AI agent startups that are announcing raising funding.

[00:11:56] Maybe because those can be potential customers.

[00:11:58] Maybe you’re tracking people announcing that they are quitting their positions because those can be potential hires.

[00:12:05] So I think the abstraction is there are lots of digital workflows where you go to execute them.

[00:12:16] There is a piece of information that is not yet available.

[00:12:19] It will be available.

[00:12:21] And what you would like to do is for an agent to monitor them.

[00:12:24] And so the proactive nature comes in the monitoring aspect of it.

[00:12:29] And the description is completely in natural language.

[00:12:32] You can think of this as Google Alerts for the AI era, if you will, or an RSS feed for the web described in natural language.

[00:12:43] There’s your dream of the webless future.

[00:12:48] But we still have the web pages now.

[00:12:50] And it sounds like this is a lot of visit and report back.

[00:12:55] I’ve heard of a lot of folks complaining about the sort of load that AI agents are putting on websites.

[00:13:02] Is there any thought to how to mitigate that?

[00:13:05] Absolutely.

[00:13:07] So you made a couple of points.

[00:13:09] One, that the web today is designed for human consumption.

[00:13:14] And in some sense, agents are having to consume it.

[00:13:17] They’re having to consume information the way humans would consume information because there isn’t a parallel pathway for the entire web.

[00:13:26] In the Scouts product that we’ve built, actually, we allow both.

[00:13:31] Any time there is an API surface that agents can talk to directly through, let’s say, an MCP interface, which is the Model Context Protocol, so that the AI agents can absorb information through APIs, we use that.

[00:13:44] So your web search and your weather and your phone.

[00:13:46] Your weather and your finance APIs.

[00:13:49] There is no need for an agent to spin up a browser and type things literally into a Google.com homepage.

[00:13:55] You can get that information.

[00:13:57] However, today there is a long tail of the web that is just not designed for agentic flow.

[00:14:05] Your indie developer who wrote a tennis court reservation system in the Golden Gate Park.

[00:14:13] There’s not going to be an API available for that.

[00:14:15] Right.

[00:14:16] You can’t access it like a human would.

[00:14:18] And for that, we do.

[00:14:20] We have browser use, in-house browser use agents that operate browsers like humans would, perceive web pages through screenshots, click buttons, operate those forms.
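
This two-track approach can be sketched roughly: prefer a structured API or MCP endpoint when one exists, and fall back to driving a browser for the long tail. A minimal illustration only; `check_via_api` and `check_via_browser` are hypothetical stand-ins for a real MCP client and a real browser-automation agent, not Yutori's actual code:

```python
# Sketch of the "API where available, browser for the long tail" pattern.
# Both helpers are stubs standing in for real integrations.

def check_via_api(api_url: str, query: str) -> str:
    # A real agent would issue an MCP tool call or HTTP request here.
    return f"api-result for '{query}' from {api_url}"

def check_via_browser(page_url: str, query: str) -> str:
    # A real agent would spin up a browser, screenshot the page,
    # and let a multimodal model read and operate it.
    return f"browser-result for '{query}' from {page_url}"

def check_source(source: dict, query: str) -> str:
    """Prefer structured access; only drive a browser when there is no API."""
    if source.get("api_url"):
        return check_via_api(source["api_url"], query)
    return check_via_browser(source["page_url"], query)
```

A weather query would hit the API branch, while the indie tennis-court reservation site from the example would fall through to browser automation.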

[00:14:33] Now, it is the case that when you do that, you are accessing this information and contributing to the load on that website.

[00:14:41] However, I think it’s important to note that in this case, this is exactly what a user needs.

[00:14:45] This is exactly what a user wanted.

[00:14:47] And I would distinguish this.

[00:14:49] Classically, there is this historical understanding of automated systems on the web as adversarial.

[00:14:56] That the value exchange is only one way.

[00:15:00] That bots come to your website.

[00:15:02] They scrape your content.

[00:15:04] They do not contribute to anything valuable on your website.

[00:15:06] They just contribute to your traffic and your bills.

[00:15:09] I think we have to rethink this going forward.

[00:15:13] If we imagine a world.

[00:15:14] Imagine a world in which most traffic on the web is user-issued agentic traffic.

[00:15:28] In that world, the value exchange can be much fairer.

[00:15:28] Somebody arrived to your web page because a human told them to.

[00:15:33] They were monitoring.

[00:15:35] They would have done it themselves.

[00:15:36] They would have opened the browser on their laptop.

[00:15:38] Instead of doing that on their laptop, they asked an AI agent to do it on a remote browser.

[00:15:43] Functionally, there is a difference.

[00:15:46] But I think intentionally, there isn’t a difference.

[00:15:49] And in that world, new economic incentives have to be created.

[00:15:55] Today, the incentives are the way they are because the web world is set up for advertisements to be served to human eyeballs.

[00:16:04] It’s an attention economy, right?

[00:16:06] Exactly.

[00:16:07] And when there aren’t human eyeballs visiting your web page, there can still be a value exchange.

[00:16:12] Because somebody is sending an agent.

[00:16:15] You can think of that agent as a buyer’s agent.

[00:16:18] They are there because they are representing an actual human with a very high degree of intent.

[00:16:24] And then you can talk about value exchange.

[00:16:27] Maybe in some cases, the agent pays for access to the website.

[00:16:32] Maybe in some cases, the website pays for attracting the agent to that website.

[00:16:38] Because you want to offer something that is relevant to the intent.

[00:16:42] Maybe the agent or the underlying LLM licenses the data.

[00:16:46] We’ve seen that too.

[00:16:47] That would fall under the paying for access that I mentioned.

[00:16:50] Yeah, absolutely.

[00:16:52] So I want to talk about the sort of the functional nature of this, right?

[00:16:56] Traditional agents are at rest, not spinning up EC2 instances until a prompt comes along.

[00:17:05] What are the proactive agents doing?

[00:17:08] Yeah.

[00:17:09] So in scouts today, the proactive agents,

[00:17:12] are proactive after you’ve told us what you care about, what you want to monitor.

[00:17:19] So what they are doing, you know, in purely technical terms,

[00:17:23] you can think of it as agentic search wrapped in a cron job.

[00:17:28] Meaning that, you know, we’re going to go out to the world.

[00:17:31] You were interested in some piece of information.

[00:17:33] We’re going to go out with some frequency.

[00:17:35] That’s the simplest way of understanding it.

[00:17:37] There are more technically challenging and interesting ways

[00:17:41] in which you can optimize this.

[00:17:43] Because depending on the query, you should decide intelligently

[00:17:47] how often you want to go out into the world.

[00:17:50] Right.

[00:17:51] And this is where there are unique challenges that lie in this sort of product.

[00:17:56] If someone told you that they’re interested in a particular piece of data

[00:18:02] that is correlated with the markets, that only happens 9 a.m. to 4 p.m.,

[00:18:06] then you shouldn’t go out outside of that window.

[00:18:09] But they’re not going to write all of this down.

[00:18:10] You should be intelligent enough to figure this out yourself.

[00:18:14] Right.

[00:18:15] And, you know, in other queries, depending on what you’re finding in the world,

[00:18:20] maybe you’re tracking whether there’s a band that has come into town.

[00:18:25] You don’t need to check every hour.

[00:18:27] You can check every day.

[00:18:28] You can check every week.

[00:18:29] And you can tell based on what you are getting as feedback from the world

[00:18:34] when you did go out the last time.

[00:18:36] So there’s proactivity in that.
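
The "agentic search wrapped in a cron job" idea, with check intervals adapted to the query and to what the last check found, might look something like the following heuristic. The specific numbers and the market-hours rule are assumptions for the sketch, not Yutori's actual scheduler:

```python
import datetime

def within_window(now: datetime.datetime, query_kind: str) -> bool:
    # Market-linked data only moves 9 a.m. to 4 p.m. on weekdays,
    # so an intelligent scheduler skips checks outside that window.
    if query_kind == "markets":
        return now.weekday() < 5 and 9 <= now.hour < 16
    return True

def next_interval_hours(current: float, last_check_changed: bool) -> float:
    # Back off when nothing changed; tighten up when the page is moving.
    if last_check_changed:
        return max(1.0, current / 2)   # floor: check hourly at most
    return min(168.0, current * 2)     # ceiling: back off toward weekly
```

A band-tour scout that keeps finding nothing drifts from daily toward weekly checks; one tracking an evolving story tightens toward hourly.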

[00:18:38] That’s the problem.

[00:18:39] That’s the product as it is today.

[00:18:41] It is a read-only product that does not go past auth walls

[00:18:45] and does not buy, book, reserve anything on your behalf.

[00:18:49] However, where we’re headed is exactly that world

[00:18:53] because the reason why you issued this monitoring query

[00:18:57] is because you care about something.

[00:18:59] The reason why you’re monitoring a band coming into town is because you want a ticket.

[00:19:03] The reason why you’re monitoring a tennis court reservation system

[00:19:06] is because you want an appointment.

[00:19:08] And the next time these agents are going to come to you and say,

[00:19:11] hey, we found that time slot you were looking for.

[00:19:14] Do you want us to just buy it for you?

[00:19:16] That’s an escalation of trust.

[00:19:19] Still, you’re in control.

[00:19:21] But it is proactivity in the sense that they’re going to then go ahead

[00:19:24] and make that booking.
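
That escalation from read-only monitoring to consented write actions is essentially a permission gate. A hedged sketch of the idea; the field names and return shape are invented for illustration:

```python
def handle(action: dict, user_approved: bool, read_only_mode: bool = True) -> dict:
    # Reads (checking a page, reporting back) are always safe to perform.
    if action["kind"] == "read":
        return {"status": "done", "action": action["what"]}
    # Write actions (buy, book, reserve) require the product to have left
    # read-only mode AND the user to have said yes to this specific action.
    if read_only_mode or not user_approved:
        return {"status": "needs_approval", "action": action["what"]}
    return {"status": "done", "action": action["what"]}
```

The "we found that time slot, do you want us to buy it?" prompt corresponds to the `needs_approval` branch: the agent surfaces the opportunity but leaves the user in control.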

[00:19:26] So for some of these write actions for AI agents,

[00:19:31] I’ve seen some organizations use things other than LLMs for that.

[00:19:38] Obviously, LLMs have some hallucination built in.

[00:19:42] Are you using or planning on using something other than LLMs?

[00:19:46] Or do you think this is something LLMs can do by themselves?

[00:19:49] We are using LLMs, multimodal LLMs,

[00:19:54] because as I mentioned, websites are laid out for human consumption.

[00:19:59] And so you have to see the website like a human would.

[00:20:02] I think the thing you’re referring to is for any specific narrow use case,

[00:20:08] and workflow.

[00:20:09] Like if all you care about is this one particular tennis court

[00:20:13] and you’re looking for a 7 a.m. reservation.

[00:20:15] Just fill in my address, right?

[00:20:17] Yeah.

[00:20:18] Fill in my address.

[00:20:19] Click this button.

[00:20:20] You don’t need intelligence.

[00:20:22] All of the intelligence lies in the head of the programmer

[00:20:24] that writes this particular scraper out.

[00:20:27] And you just run it unintelligently in a cron job.

[00:20:31] That is the world we live in today.

[00:20:33] That is what people have been doing.

[00:20:34] There are entire communities of people writing, you know,

[00:20:37] scrapers and bots for, you know, restaurant reservations

[00:20:40] and catching shoe drops when…

[00:20:43] Comments on websites.

[00:20:45] Comments on websites.

[00:20:46] We are taking an intelligence first and a completely general approach.

[00:20:51] Anything.

[00:20:52] Anything on the web.

[00:20:53] If there is a piece of information out there,

[00:20:56] if you can do it as a human with a browser,

[00:20:59] we should be able to do it.
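
The "operate a browser like a human would" loop reduces to perceive, decide, act. Everything below is a toy stand-in: `propose_action` replaces a real multimodal-LLM policy reading screenshots, and `Browser` replaces a real driver such as Playwright:

```python
class Browser:
    """Toy stand-in for a browser driver holding a half-filled form."""
    def __init__(self, form_fields):
        self.form_fields = list(form_fields)   # fields still left to fill
        self.submitted = False

    def screenshot(self) -> dict:
        # A real driver returns pixels; we return a symbolic summary.
        return {"empty_fields": list(self.form_fields), "submitted": self.submitted}

    def apply(self, action):
        kind, arg = action
        if kind == "fill":
            self.form_fields.remove(arg)
        elif kind == "submit":
            self.submitted = True

def propose_action(observation: dict, goal: str):
    # Stand-in for the multimodal-LLM policy: fill each field, then submit.
    if observation["empty_fields"]:
        return ("fill", observation["empty_fields"][0])
    if not observation["submitted"]:
        return ("submit", None)
    return ("done", None)

def run_agent(browser: Browser, goal: str, max_steps: int = 10) -> bool:
    # Perceive -> decide -> act, until done or out of budget.
    for _ in range(max_steps):
        action = propose_action(browser.screenshot(), goal)
        if action[0] == "done":
            return True
        browser.apply(action)
    return False
```

The jaggedness Batra mentions lives in the policy step: on some pages the model's proposed actions are reliably correct, on others they are not, which is part of why the first product is read-only.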

[00:21:01] It’s not there yet uniformly.

[00:21:04] It is generally this…

[00:21:07] phenomenon in AI that we tend to have jagged surfaces.

[00:21:10] There are some things that we are going to be superhuman at.

[00:21:13] There are other things that we are going to be worse than human at.

[00:21:16] Which is why our first product is a read-only product.

[00:21:19] Mistakes are less costly.

[00:21:21] Right.

[00:21:22] When you go to write actions,

[00:21:25] certain mistakes are going to be far more costly than other mistakes.

[00:21:28] And so you are going to have to sequence that.

[00:21:30] That’s a product decision.

[00:21:32] From a technology perspective,

[00:21:34] what that means is we are going to have to create sandboxes.

[00:21:37] Where we can practice those things.

[00:21:39] These agents are trained with a set of techniques.

[00:21:42] For example, reinforcement learning.

[00:21:45] Which is learning by interacting with the world.

[00:21:47] And learning from feedback.

[00:21:49] And what you often need in those cases are sandboxes.

[00:21:52] So that the mistakes aren’t costly.

[00:21:54] This actually refers back to something that I said at the top of the show.

[00:21:57] Which is this is how we train robots as well.

[00:21:59] This is why there are simulators of…

[00:22:01] 3D simulators of physical worlds.

[00:22:03] Because you don’t break yourself.

[00:22:05] You don’t harm others and yourselves.

[00:22:06] In simulation.

[00:22:08] Right.

[00:22:09] You don’t want the accidental terminators.

[00:22:11] You don’t want that.

[00:22:13] But also the methods that we have today are extremely data hungry.

[00:22:22] And it is easy to generate that data in simulation.
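
The sandbox idea can be illustrated with a minimal trial-and-error loop: the agent acts in simulation (where a wrong click costs nothing), receives a reward signal, and comes to prefer actions that worked. This is a bare-bones bandit-style sketch of learning from feedback, not a description of how Yutori actually trains its agents:

```python
import random

def train_in_sandbox(actions, simulate, episodes=500, seed=0):
    """Learn action values by trial and error inside a sandbox.
    `simulate` returns a reward; mistakes in here are free."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}
    count = {a: 0 for a in actions}
    for _ in range(episodes):
        # Epsilon-greedy: mostly exploit the best-known action,
        # sometimes explore a random one.
        if rng.random() < 0.2:
            a = rng.choice(actions)
        else:
            a = max(actions, key=lambda x: value[x])
        r = simulate(a)
        count[a] += 1
        value[a] += (r - value[a]) / count[a]   # running mean reward
    return max(actions, key=lambda x: value[x])
```

Cheap simulated episodes are the data source; the same reasoning is why physical robots are trained in 3D simulators like Habitat before deployment.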

[00:22:27] A lot of this you are talking about.

[00:22:30] Something I have sort of posited to folks.

[00:22:32] Is that there is going to be one entry point to everything.

[00:22:35] In the future.

[00:22:37] You have one piece of interface.

[00:22:39] And we had a writer write something that was the AI is the UI.

[00:22:45] It is not the program itself.

[00:22:47] It is the UI.

[00:22:49] What do you think that one final interface will be?

[00:22:54] Wonderful question.

[00:22:56] I don’t have a pretty short answer to that question.

[00:23:01] But internally at Yutori this is exactly how we think about it.

[00:23:04] That we are reimagining humanity’s interface with the digital world and the web.

[00:23:11] There are two key components we need.

[00:23:13] Intelligence and generative user interfaces.

[00:23:17] Today a human sits down.

[00:23:20] And it is typically a designer.

[00:23:22] Thinks about that workflow that a consumer goes through.

[00:23:25] Where are the friction points?

[00:23:27] What makes sense?

[00:23:28] What is natural?

[00:23:30] What is aesthetically pleasing?

[00:23:32] What is frictionless?

[00:23:33] And they design that workflow.

[00:23:36] Tomorrow that is not going to be the case.

[00:23:39] You are going to have interfaces generated for you.

[00:23:42] You are going to talk to intelligent systems.

[00:23:45] They are going to fan out and secure information from multiple websites and sources.

[00:23:52] So there isn’t a single address that you are going to.

[00:23:55] When you say tell me about a band coming into town.

[00:23:59] Or tell me about my meetings today.

[00:24:02] Or tell me about something.

[00:24:04] You want an interface that compactly represents that information.

[00:24:08] That you can interact with.

[00:24:10] You maybe want to zoom into.

[00:24:12] It is going to be a visual medium.

[00:24:14] Because just the way we are wired up.

[00:24:17] It is a high bandwidth pathway into our brains.

[00:24:20] Pixels are much higher bandwidth than trying to talk to humans.

[00:24:26] That is going to happen.

[00:24:28] But it is going to be an interface generated for you.

[00:24:31] And your query.

[00:24:33] What did you ask us to do?
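
The fan-out described here can be sketched as concurrent queries against several sources, merged into one compact structure that a generated interface could render. Everything below is a hypothetical illustration: the source functions are invented stand-ins for a ticketing site, a calendar API, and a news feed, not real services.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real sources an agent might fan out to.
# None of these names come from a real product; they illustrate the pattern.
def ticketing_site(query):
    return {"source": "tickets", "hits": [f"{query}: 2 shows found"]}

def calendar_api(query):
    return {"source": "calendar", "hits": [f"{query}: no conflicts"]}

def news_feed(query):
    return {"source": "news", "hits": [f"{query}: 1 recent article"]}

SOURCES = [ticketing_site, calendar_api, news_feed]

def fan_out(query):
    """Query every source concurrently, then merge the results into one
    compact structure a generated interface could render."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        results = list(pool.map(lambda fetch: fetch(query), SOURCES))
    return {r["source"]: r["hits"] for r in results}

summary = fan_out("band coming to town")
```

The key design point is that there is no single address the user visits: the query drives the fan-out, and the merged summary is what the generated interface presents.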

[00:24:35] This is a dream that many people have thought about.

[00:24:39] And it is going to be a front end to the web.

[00:24:42] What does that look like?

[00:24:44] What is the front end to everything?

[00:24:46] It sounds a little bit like the UI in Minority Report, if you have seen it.

[00:24:51] I have.

[00:24:54] It actually goes back much longer than that.

[00:24:59] There is this.

[00:25:01] Doug Engelbart of Xerox PARC gave a talk, I think in 1967 or 1968.

[00:25:11] Which is now retroactively known as the Mother of All Demos.

[00:25:16] In that one talk.

[00:25:19] This man and that team introduced basically the fundamentals of what we today consider modern knowledge work.

[00:25:29] In this one talk,

[00:25:30] this man introduces graphical user interfaces, the mouse, a collaborative document

[00:25:40] editing system, a video calling interface, two people get on a call, they are editing the same

[00:25:47] document simultaneously. Each one of those features, each one of those ideas over the next

[00:25:53] 50 years becomes its own 100 billion to a trillion dollar company. Today, we think of

[00:26:01] that as knowledge work. This group of people imagined this interaction in the late 1950s,

[00:26:08] early 1960s. And I do think with AI, we have that ability now, we’re going to imagine what does

[00:26:18] knowledge work or interaction with a digital surface look like.

[00:26:23] It’s not going to look like what we think of it today. This is a culmination of the last 50 years,

[00:26:30] the paradigm of the last 50 years, but there is a new paradigm coming, which is, talk to it.

[00:26:35] Yeah. So I think this is, you promised a little wet blanket, but this is some good

[00:26:42] idealism for the AI era. But I think there’s going to be a bit of a resistance from the folks

[00:26:51] who have built businesses on this. I’ve seen…

[00:26:53] You know, a lot of the larger enterprise companies who are building AI agents stuff,

[00:26:59] they want it to stay within their world, their ecosystem. Why, you know, obviously we know why

[00:27:05] they’re resisting, but how can they get on board, you know, become part of the one world soup?

[00:27:13] I think we both understand why entrenched interests resist change. And it’s not even,

[00:27:22] I generally don’t

[00:27:23] ascribe malice to these things. This is the classic innovator’s dilemma mixed with the principal-

[00:27:30] agent problem: you’re a large enterprise, you have distribution, you have an existing product,

[00:27:37] it’s a mature market, you have to serve your existing customers, you have optimized your

[00:27:42] product to the market and the customers. It’s hard from that point to do a fundamental rethink

[00:27:48] that is going to immediately cannibalize your existing revenue. And understood, that’s hard.

[00:27:56] Couple that with the fact that in large existing bureaucracies there are fractured

[00:28:01] interests, where as a middle manager you’re trying to optimize your local pathway

[00:28:08] as opposed to the bigger picture. It’s a hard problem. That idea is not new. People have understood that

[00:28:16] this is where smaller players have an advantage.

[00:28:18] I think what really matters is, is there a change that is beyond both the bigger and the smaller

[00:28:27] player coming? Is there a fundamental rethink that can trigger this? Is there a new technology? Is

[00:28:34] there a new regulation? Is there something new that can actually change behaviors? I do think

[00:28:39] we are in that moment. This goes back to some of our earlier conversation. We’re not at AGI

[00:28:45] as broadly defined by the original thinkers of AI.

[00:28:48] But we are at something special. We have created general purpose interaction machines with

[00:28:56] digital content, not with physical content yet, but with digital content. We have not yet

[00:29:02] productionized it to a degree that every problem is solved, but we have line of sight. And what

[00:29:08] this means is that we can actually rethink our relationship. For the last 20, 30 years,

[00:29:16] the consumer has basically been

[00:29:18] at the mercy of this economic incentive where the way you pay for things is, you know, you are

[00:29:25] served advertisements and in exchange you get access to free services. I do think that incentive is ripe

[00:29:34] for change. The consumer has demonstrated they’re willing to pay for subscriptions and

[00:29:40] services. The arrival of AI systems means we can actually build personal assistants and AI chiefs

[00:29:48] of staff

[00:29:48] that serve you and that you’re willing to pay for, because they’re delivering value to you.

[00:29:54] I agree with you that your data today, for historical reasons, is locked into various

[00:30:00] services. The existing incumbents have a few choices. You can either start putting up walls,

[00:30:06] as some incumbents have, saying: I will not let you take your data out, even though you want to.

[00:30:11] And I think these moves will play poorly. You have

[00:30:18] users that want that value. I do not think that you can trap people for long. You may

[00:30:25] be able to do it for a short period of time, trapping them in a service where they feel

[00:30:31] they’re no longer in control.

[00:30:33] Right. And nothing changes until that business threat becomes existential, and then

[00:30:38] maybe it’s too late at that point.

[00:30:40] Maybe it’s too late. And that is how change happens.

[00:30:43] Yeah. Hopeful and dire.

[00:30:46] Yeah.

[00:30:48] One thing that we, understandably so, didn’t get a chance to cover is the unique and interesting

[00:30:57] nature of these kinds of agents. Our Scouts, for example: our product has only

[00:31:05] existed at this point for 10 weeks, but I have scouts that have been running for those 10 weeks.

[00:31:13] That is an extremely long horizon reinforcement learning problem.

[00:31:17] Sure.

[00:31:17] I have agents that have been running for those 10 weeks.

[00:31:18] Agents that have been interacting with the world and keeping me updated. I created a scout

[00:31:23] 10 weeks ago, or maybe just over that, when Meta had recently announced its acquisition,

[00:31:32] or acqui-hire, of the Scale AI co-founder. This predated the term Meta Superintelligence.

[00:31:41] At that point of time, I created a scout. Hey, let me know if there’s any future news about

[00:31:47] this.

[00:31:47] This acquisition. That scout has, for the last 10 weeks, gone on this narrative arc. It interacts

[00:31:55] with the world frequently. It discovered that following the acquisition from Scale, there is

[00:32:00] a new lab that Meta created called Meta Superintelligence. Then it began tracking Meta

[00:32:05] Superintelligence: who’s getting hired at the MSL lab, what places it is hiring from, what is

[00:32:13] happening to the labs that these people are moving from.

[00:32:18] happening to the startups that these people are moving from. Most recently: what is

[00:32:24] happening with the departures from MSL, the people who in the last two and a half months have

[00:32:30] decided to leave. This is an extremely long-running agent. This is not how we typically

[00:32:39] build agents. Most coding agents, most LLMs, they are extremely short-lived. One interaction,

[00:32:46] a few turns in a chat.

[00:32:47] A few hundred lines of code. We’re moving towards a world where there are going to be persistent,

[00:32:54] always-on entities that are tracking the evolution of something that’s happening in the world.

[00:33:02] That’s an interesting world.
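
A persistent scout of this kind can be sketched as a polling loop with a semantic relevance check. Everything below is a hypothetical illustration: the `semantic_match` stub (naive word overlap) stands in for a real LLM relevance judgment, and the feed items are invented; a real scout would run these cycles for weeks, not three iterations.

```python
# A minimal sketch of a persistent "scout": it remembers what it has seen,
# checks each new item against its goal, and accumulates reports over time.
def semantic_match(goal, item):
    """Stand-in for an LLM relevance call: naive word overlap, for
    illustration only. A real system would ask a model instead."""
    goal_terms = set(goal.lower().split())
    item_terms = set(item.lower().split())
    return len(goal_terms & item_terms) >= 2

class Scout:
    def __init__(self, goal):
        self.goal = goal
        self.seen = set()
        self.reports = []

    def poll(self, fresh_items):
        """One monitoring cycle: surface new items relevant to the goal."""
        for item in fresh_items:
            if item not in self.seen and semantic_match(self.goal, item):
                self.reports.append(item)
            self.seen.add(item)

scout = Scout("news about the Meta acquisition")
feed_over_time = [
    ["Meta announces new acquisition"],     # cycle 1: relevant
    ["Unrelated gadget review"],            # cycle 2: filtered out
    ["Meta acquisition leads to new lab"],  # cycle 3: relevant
]
for items in feed_over_time:
    scout.poll(items)
```

The contrast with a keyword alert is the matcher: swap the overlap stub for a model call and the scout can follow an evolving narrative rather than a fixed search string.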

[00:33:03] Yeah, it’s the semantic tracking instead of just having the keyword search. It’s like you said,

[00:33:08] it’s that AI search, LLM search applied to a more proactive agent, right?

[00:33:14] Yeah.

[00:33:15] That’s awesome.

[00:33:17] All right, everyone. It’s that time of the show where we shout out somebody who came

[00:33:24] onto Stack Overflow, dropped a little knowledge, shared some curiosity, and earned themselves a

[00:33:28] badge. Today we’re shouting out the winner of a Populist badge, somebody who came to a question

[00:33:33] and dropped an answer that was so good, it outscored the accepted answer. So congrats to

[00:33:39] Don Kirkby for answering “Find all references to an object in Python.” If you’re curious about that

[00:33:47] as well, we’ll have a link for you in the show notes. I am Ryan Donovan. I host the podcast, edit the blog here at

[00:33:53] Stack Overflow. If you have questions, concerns, topics, comments, et cetera, et cetera, email me at

[00:33:59] podcast at stackoverflow.com. And if you want to reach out to me directly, you can find me on LinkedIn.

[00:34:05] Thank you for having me, Ryan. And I’m Dhruv Batra. I can be found at dhruvbatra.com. And my company is

[00:34:13] Yutori. We can be found at yutori.com. Thank you for having me.

[00:34:17] All right, everyone. Thanks for listening. We’ll talk to you next time.