What is a Principal Engineer at Amazon? With Steve Huynh
Summary
Steve Huynh, a former Amazon Principal Engineer with 17 years at the company, shares insights into the unique role of a Principal Engineer at Amazon. He explains why the promotion from Senior to Principal Engineer is exceptionally difficult, despite hundreds of openings and thousands of senior engineers vying for these positions. The discussion covers the tight-knit Principal Engineer community, including in-person events, Slack groups, and the internal “Principles of Amazon” presentation series.
Steve details the scale and engineering challenges at Amazon, such as handling brownouts and COEs (Correction of Errors), and the impact of latency on revenue. He describes the transition from a monolithic architecture to microservices due to scaling limits, and how this shift introduced new performance trade-offs. The conversation also touches on Amazon’s internal freedom of movement policy, which allows engineers to transfer teams more easily, fostering an internal talent marketplace.
Steve outlines the paradoxes of being a Principal Engineer: belonging to all teams yet none, having significant autonomy but high accountability for impact, and managing overwhelming bandwidth demands while staying present. He highlights the visibility and influence the role provides, as well as the expectation to solve problems through various means—code, system design, process changes, or resource allocation—rather than just writing software.
He reflects on Amazon’s culture, emphasizing principled thinking and the writing culture centered around six-page memos. Steve also shares his experience with patents, using the example of a high-performance ticket-selling system he helped design. Finally, he discusses his post-Amazon career, focusing on content creation and sharing advice on meta-learning and career development.
Recommendations
Books
- So Good They Can’t Ignore You — Steve recommends this book by Cal Newport for its focus on building career capital—developing in-demand skills to gain more control over your career path and lifestyle.
- Designing Data-Intensive Applications — Steve mentions this tech book (often abbreviated as DDIA) as a valuable resource, noting that a new edition is expected soon.
Concepts
- Meta-learning — Steve emphasizes the importance of learning how to learn quickly as a key skill for career resilience and growth, making you ‘recession-proof’ and always valuable.
Topic Timeline
- 00:00:00 — Introduction to Amazon’s Principal Engineer role — Steve Huynh introduces the topic of Amazon’s Principal Engineer level, highlighting its uniqueness and difficulty to achieve. He mentions the strong internal community and the challenges of scaling and reliability at Amazon.
- 00:01:10 — Steve’s career journey and team transfers at Amazon — Steve discusses his 17-year tenure at Amazon, working across multiple teams like Search Inside the Book, Kindle, Prime Video, payments, and live sports streaming. He explains Amazon’s internal freedom of movement policy, which allows engineers to transfer teams more easily, creating an internal talent marketplace.
- 00:09:03 — Engineering scale and challenges at Amazon — Steve describes the massive scale at Amazon, where a single request can spawn hundreds of downstream calls, leading to self-inflicted DDoS scenarios. He explains concepts like brownouts and the importance of availability and resilience in transactional systems like payments and live sports streaming.
- 00:16:53 — Latency’s impact on revenue and architectural evolution — Steve discusses Amazon’s discovery that lower latency directly increases revenue, leading to a focus on performance optimization. He traces Amazon’s architectural journey from a monolithic C++ system constrained by 32-bit limits to a microservices architecture, highlighting the trade-offs between maintainability and latency.
- 00:26:47 — The difficulty of promotion to Principal Engineer — Steve explains why the jump from Senior to Principal Engineer at Amazon is so challenging, describing it as a ‘two and a half level jump.’ He notes the high demand for principals and the brain drain of talented engineers leaving for other companies with more sane progression ladders.
- 00:30:36 — The Principal Engineer community and internal resources — Steve describes the tight-knit Principal Engineer community at Amazon, including Slack channels, offsites, and the internal ‘Principles of Amazon’ presentation series. He highlights the value of internal resources like COEs (Correction of Errors) postmortems, which provide blameless learning from outages.
- 00:39:00 — Paradoxes of the Principal Engineer role — Steve discusses the paradoxes outlined by his peer Bhavik Kothari: belonging to all teams yet none, having freedom but high accountability, bandwidth challenges, and difficulty staying present. He shares his experience reporting to a VP and the expectation to define and solve high-impact problems autonomously.
- 00:51:51 — Pros and cons of being a Principal Engineer — Steve reflects on the benefits of the role, including unprecedented visibility across the organization and the status that comes with it. He also warns about the trap of overconfidence—feeling like an expert in everything—and the challenge of staying focused amid overwhelming demands.
- 00:54:56 — Amazon’s culture: principled thinking and writing — Steve highlights Amazon’s secret sauce: principled thinking based on fixed leadership principles like customer obsession and bias for action. He also praises the writing culture centered on six-page memos, which enable efficient communication and context-setting across the company.
- 01:01:06 — Patents and high-performance system design — Steve explains Amazon’s defensive patent strategy and shares an example patent for a high-performance ticket-selling system. The design involves distributing inventory to edge nodes, using CPU cache and bit manipulation for fast contiguous seat allocation, showcasing innovative solutions to real-world scaling problems.
- 01:07:14 — Life after Amazon and career advice — Steve talks about his current work creating content on YouTube and through a newsletter. He offers career advice focused on meta-learning—developing the skill of learning quickly—and recommends books like ‘So Good They Can’t Ignore You’ by Cal Newport and ‘Designing Data-Intensive Applications.’
Episode Info
- Podcast: The Pragmatic Engineer
- Author: Gergely Orosz
- Category: Technology News Tech News
- Published: 2025-07-09T18:44:48Z
- Duration: 01:13:17
References
- URL PocketCasts: https://pocketcasts.com/podcast/59045350-573e-013d-e880-02cacb2c6223/episode/cabb38e4-b91e-4950-be2a-151f7e8d19c0/
- Episode UUID: cabb38e4-b91e-4950-be2a-151f7e8d19c0
Podcast Info
- Name: The Pragmatic Engineer
- Type: episodic
- Site: https://newsletter.pragmaticengineer.com/podcast
- UUID: 59045350-573e-013d-e880-02cacb2c6223
Transcript
[00:00:00] If you’re going to optimize for performance, you’re saying, why can’t we be at one millisecond?
[00:00:04] Or why can’t we be at 10 milliseconds and start from there?
[00:00:06] Instead of sort of saying, hey, let’s try to decrease latencies by 50% or 25%.
[00:00:11] Let’s just start from what is the conceptually fastest thing that we could do.
[00:00:15] And that’s actually how Amazon was created.
[00:00:17] Amazon’s principal engineering level is unique in many ways across big tech.
[00:00:21] Steve Huynh was a software engineer at Amazon for 17 years
[00:00:23] and worked his last four years as a principal engineer.
[00:00:27] Today, we talk about the ins and outs of this role,
[00:00:29] including why being promoted from senior to principal is so hard,
[00:00:33] even though Amazon usually has hundreds of principal engineering openings
[00:00:36] and thousands of seniors trying to get into these positions.
[00:00:40] The Amazon principal engineering community, the in-person events,
[00:00:43] the Slack group, and the principles of Amazon internal presentation series.
[00:00:46] Engineering concepts at Amazon around reliability,
[00:00:49] such as brownouts and COE, correction of errors, and many more topics.
[00:00:53] If you’re interested in understanding one of the hardest engineering levels to get into across big tech,
[00:00:58] together with stories,
[00:00:59] you’ll be able to learn a lot more about how Amazon was created and what it’s like to be a part of it.
[00:01:02] So, if you’re interested in learning more,
[00:01:03] subscribe on YouTube and in your favorite podcast player;
[00:01:05] it greatly helps more people discover this show.
[00:01:07] If you enjoyed it, thanks for doing so.
[00:01:10] So, Steve, welcome to the podcast.
[00:01:12] Thanks for having me.
[00:01:13] How long were you at Amazon? 17 years?
[00:01:16] Yeah, I was there for 17 and a half years.
[00:01:19] And yeah, I just quit last year.
[00:01:22] So, I’ve been basically a year doing other things now.
[00:01:26] And what were the things that you worked on while you were there?
[00:01:28] Yeah.
[00:01:29] People always talk about my long tenure there.
[00:01:32] But, you know, I feel like I’ve had like five or six jobs over that time period.
[00:01:38] I started off on, you know, a project called Search Inside the Book.
[00:01:42] I worked on the first Kindle launch.
[00:01:45] Wow.
[00:01:45] I worked on the precursor to Prime Video.
[00:01:49] I sort of like worked there at the beginning part of my career.
[00:01:51] And then I sort of ended my career there for the last five years of my time there.
[00:01:55] I worked in payments.
[00:01:58] I worked in Amazon Local, which was sort of our Groupon project when that type of business was looking like it was going to take over.
[00:02:07] I worked on Amazon Restaurants.
[00:02:09] I worked on Amazon Tickets, which was our Ticketmaster clone.
[00:02:13] And then my last five years was working on Live Sports Streaming on Prime Video.
[00:02:19] If you want to build a great product, you have to ship quickly.
[00:02:21] But how do you know what works?
[00:02:23] More importantly, how do you avoid shipping things that don’t work?
[00:02:27] The answer, StatSig.
[00:02:30] StatSig is a unified platform for flags, analytics, experiments, and more.
[00:02:34] Combining five plus products into a single platform with a unified set of data.
[00:02:39] Here’s how it works.
[00:02:41] First, StatSig helps you ship a feature via Feature Flag or Config.
[00:02:45] Then it measures how it’s working from alerts and errors to replays of people using that feature to measurement of top line impact.
[00:02:53] Then you get your analytics, user account metrics, and dashboards to track your progress over time.
[00:02:58] All linked to the stuff you ship.
[00:02:59] Even better, StatSig is incredibly affordable.
[00:03:02] With the super generous free tier, a starter program with $50,000 of free credits,
[00:03:07] and custom plans to help you consolidate your existing spend on flags, analytics, or A/B testing tools.
[00:03:12] To get started, go to statsig.com slash pragmatic.
[00:03:16] That is S-T-A-T-S-I-G dot com slash pragmatic.
[00:03:19] Happy building.
[00:03:20] This episode was brought to you by Graphite,
[00:03:22] the developer productivity platform that helps developers create, review, and merge smaller code changes,
[00:03:27] stay unblocked, and ship faster.
[00:03:31] Code review is a huge time sink for engineering teams.
[00:03:34] Most developers spend about a day per week or more reviewing code or blocked waiting for a review.
[00:03:39] It doesn’t have to be this way.
[00:03:41] Graphite brings stacked pull requests,
[00:03:43] the workflow at the heart of the best-in-class internal code review tools at companies like Meta and Google,
[00:03:48] to every software company on GitHub.
[00:03:51] Graphite also leverages high-signal, codebase-aware AI to give developers immediate actionable feedback on their pull requests,
[00:03:57] allowing teams to cut down on review cycles.
[00:04:00] Tens of thousands of developers at top companies like Asana, Ramp, Tecton, and Vercel rely on Graphite every day.
[00:04:07] Start stacking with Graphite today for free and reduce your time to merge from days to hours.
[00:04:12] Get started at gt.dev slash pragmatic.
[00:04:15] That is G for Graphite, T for technology.dev slash pragmatic.
[00:04:20] So that’s a lot of different teams.
[00:04:22] How did you work on so many teams?
[00:04:24] Is it just like there’s a lot of internal transfer?
[00:04:27] Did you get bored?
[00:04:28] Was it just you followed your manager?
[00:04:30] How does it work inside Amazon?
[00:04:31] Because when people think about companies, people who have not worked at Amazon,
[00:04:34] they would kind of assume you go, you work there, you’re on a team for like, you know, four, five, six years.
[00:04:39] Clearly not the case.
[00:04:40] You know, it depends a little bit on like corporate policy and then where you are with your career.
[00:04:44] I started as a support engineer, so sort of like operationally focused person.
[00:04:51] And then, you know, I was basically like, I want to be a software developer.
[00:04:55] And so, you know, I think.
[00:04:57] Getting into the company was pretty difficult.
[00:05:00] But once I was there, sort of set that target and changed roles.
[00:05:04] And when I changed the role, you know, it was a natural time to move to another team.
[00:05:11] There’s also some internal policy.
[00:05:14] So basically at Amazon, it used to be that you had to stay on a team for at least a year before you transferred.
[00:05:21] And if you wanted to transfer.
[00:05:24] Like a senior manager or director, whoever.
[00:05:27] At the top could block your transfer.
[00:05:29] And what that ended up meaning was that, like certain teams that were just terrible to work on, those teams actually had more than 100 percent attrition over the course of a year because you measured attrition with a year long time unit.
[00:05:43] Amazon did something actually smart at the corporate level.
[00:05:47] They basically said, OK, well, you have freedom of movement now.
[00:05:52] This sort of happened, I don’t know, probably like 13 years ago, 10, 13 years ago.
[00:05:57] And so they said, you have freedom of movement now.
[00:05:59] A VP or a director can’t block you.
[00:06:03] They can say, OK, well, we need another month to get like a transition plan going.
[00:06:07] But essentially, you have freedom of movement as long as you’re not on a performance improvement plan, which meant that certain teams were sources of high quality engineering talent and certain teams were sinks of high quality engineering talent.
[00:06:19] And it sort of created an internal marketplace for different roles.
[00:06:25] Now, what that ended up meaning was.
[00:06:27] That certain teams, they basically didn’t want you to know what the policy was.
[00:06:32] They wanted you to sort of think that you were kind of stuck.
[00:06:36] But, you know, despite that sort of like local gamesmanship that was going.
[00:06:41] Yeah, like basically some managers didn’t want their best people to leave.
[00:06:43] Exactly.
[00:06:43] Let’s just say it how it is.
[00:06:44] But ultimately, I think it’s a great strategy because if there was a team that was difficult to staff, the problem was on the management.
[00:06:57] It wasn’t something that had to be borne by the employees themselves.
[00:07:03] And so, you know, getting back to my own career journey at a very large company like Amazon, there are so many awesome things that are going on.
[00:07:11] And, you know, I decided to just kind of go where my curiosity took me.
[00:07:17] Now, there were some times where, you know, there were reorgs or, you know, a line of business got spun down.
[00:07:25] But ultimately, you know, I think.
[00:07:27] Freedom of movement was one of the smartest things that Amazon did.
[00:07:30] And I think this is something that people don’t really appreciate about some large companies.
[00:07:34] So, you know, not all companies are like Amazon and every company changes, right?
[00:07:37] Like today, I’m assuming it will be hard to move across as many teams within Amazon.
[00:07:42] Depending on where you are, you know, if you’re in a satellite office where there’s two teams, you can probably move on to the other team at max.
[00:07:49] But I think this is one of the underrated things of large companies.
[00:07:52] Like once you are in, it’s almost always easier to get that job at another team.
[00:07:56] From the inside, especially because you can talk to them.
[00:08:00] You know, I talked with the Reddit mobile team and I asked, like, oh, how can you become a platform engineer on the mobile team?
[00:08:06] And they said, like, well, you know, most of our hires have been internal.
[00:08:09] They just helped us out on hackathons.
[00:08:11] They come around, they commit stuff.
[00:08:13] We know them.
[00:08:14] It’s a low risk hire.
[00:08:15] I think it’s just nice to remember that when you think of like a big company like Amazon or Meta or Microsoft, it’s just so many small teams.
[00:08:21] And once you’re in, you actually have almost priority access to those teams.
[00:08:25] If you play your cards right.
[00:08:28] And, you know, you might interview for that team, but it’s such lower stakes than an external interview.
[00:08:34] And, you know, just all things being equal.
[00:08:37] Would you rather take somebody that’s, you know, internal and knows the culture?
[00:08:42] They know how software is developed within a particular context or somebody that’s just as good but doesn’t, you know, hasn’t been onboarded.
[00:08:50] And I think ultimately you’re going to pick the person that’s internal, all things being equal.
[00:08:54] Yeah.
[00:08:55] It’s just kind of like business rationality for the most part.
[00:08:57] So one thing about Amazon and about large companies like Amazon is people talk about externally about the scale.
[00:09:03] And it’s hard to imagine.
[00:09:05] But can you give us a sense of the scale that you’ve seen or like some tough engineering challenges that you worked on that would have been just really hard to work at a smaller startup?
[00:09:13] Yeah, I think that’s the thing that you just you will not see at most other places is the scale of things.
[00:09:22] I’ll give you a couple of examples.
[00:09:24] So, you know.
[00:09:25] Prime is the exclusive club that everybody is a member of.
[00:09:29] And, you know, in the U.S., the shipping benefit is probably, you know, the most popular.
[00:09:35] But globally, Prime Video is, you know, it’s the thing that people use the most with their subscription.
[00:09:45] And so if you think about, you know, our service-oriented architecture and, you know, just loading up the app.
[00:09:52] The gateway page is the place where all of our.
[00:09:55] Requests come in.
[00:09:57] Right.
[00:09:57] And so it’s just it’s just like Netflix.
[00:09:59] It’s this infinite scroll of carousels.
[00:10:02] So the gateway page is the Amazon Prime landing page.
[00:10:04] Yeah, it’s the landing page there.
[00:10:06] And so you’re like, OK, cool.
[00:10:08] If let’s say 90, 95, 99 percent of all of your requests are coming from that page and that page needs to be personalized.
[00:10:15] You know, and you have a service-oriented architecture with a bunch of microservices.
[00:10:22] One request to that page turns into.
[00:10:25] Let’s just say hundreds of downstream requests to different services.
[00:10:30] It might even be more than that.
[00:10:31] It’s actually kind of hard to count.
[00:10:33] Yeah.
[00:10:33] And is this page right?
[00:10:35] Like all the stuff flowing in, all personalized stuff.
[00:10:37] So that’s the retail one.
[00:10:39] But I was talking about the Prime Video one.
[00:10:41] The Prime Video.
[00:10:41] But essentially, it’s the same thing.
[00:10:43] Yeah.
[00:10:43] And so, you know, same thing for the retail website as well.
[00:10:46] And so if you have one request sort of spidering out into, you know, two orders of magnitude more requests internally.
[00:10:54] You start to see like really, really large scale for these microservices.
[00:10:59] So a microservice will have like a reverse proxy or load balancer in front of it.
[00:11:03] And you are sort of unironically talking about things like tens of thousands of requests per second or hundreds of thousands of requests per second coming into your service.
[00:11:13] So it’s like the services are like behind, you know, like there’s the Prime.
[00:11:17] There’s all the things loading.
[00:11:18] They’re spidering out like making, you know, to render that one recommendation, for example, for I don’t know, the video that you would like.
[00:11:24] It will make a lot of requests to different services.
[00:11:27] And then so when you’re operating a smaller service inside of Amazon, suddenly you’re going to be hit with what you just said.
[00:11:34] 10K, 100K requests per second, that kind of scale.
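The fan-out arithmetic described here can be sketched in a few lines. This is only an illustration: the function name `internal_rps` and the numbers (1,000 page loads per second, 100 downstream calls per page) are assumptions chosen to match the "two orders of magnitude" mentioned in the conversation, not actual Amazon figures.

```python
def internal_rps(page_rps: float, fanout: int) -> float:
    """Internal requests per second produced by `page_rps` page loads
    when each page load triggers `fanout` downstream service calls."""
    return page_rps * fanout


# Hypothetical numbers: 1,000 page loads per second, 100 downstream calls each.
total = internal_rps(1_000, 100)
print(f"{total:,.0f} internal requests per second")
```

Even modest front-door traffic lands in the "hundreds of thousands of requests per second" range once it is multiplied by the fan-out factor.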
[00:11:37] And you will essentially be DDoSing yourself.
[00:11:41] You’re just like, okay, cool.
[00:11:44] Let’s change a caching configuration on some item details.
[00:11:49] And it turns out you’ve just browned out like a critical service.
[00:11:54] What does browned out mean?
[00:11:57] Oh, sorry.
[00:11:58] I’m using some jargon.
[00:11:59] So if you want to talk about availability, if you suppose you are DDoSing a service or sending a lot of requests over to them, you can, you know, you can just take them down.
[00:12:11] That would be like a blackout.
[00:12:12] Yeah.
[00:12:13] And so like you send a request.
[00:12:15] Oh, you can’t establish a connection.
[00:12:17] It immediately comes back.
[00:12:18] But there’s a type of outage where they brown out.
[00:12:22] So basically they’re reachable.
[00:12:24] They might accept a connection.
[00:12:25] But, you know, they’ll essentially time out or they might return partial results or bad results.
[00:12:32] Or the only thing that they do return is a 500 for some percentage or proportion.
[00:12:37] After you waited a bunch of time for that.
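The blackout versus brownout distinction can be made concrete with a small classifier over a window of observed calls. This is a minimal sketch under stated assumptions: the `CallResult` type, the `classify` function, and the 5% error threshold are hypothetical and are not taken from any Amazon tooling.

```python
from dataclasses import dataclass


@dataclass
class CallResult:
    connected: bool          # could a connection be established at all?
    timed_out: bool = False  # connection accepted, but no answer in time
    status: int = 200        # HTTP status code, if a response came back


def classify(results: list[CallResult], error_threshold: float = 0.05) -> str:
    """Label a window of calls as 'healthy', 'brownout', or 'blackout'."""
    if not results:
        return "unknown"
    # Blackout: nothing is reachable, so every call fails fast.
    if not any(r.connected for r in results):
        return "blackout"
    # Brownout: the service is reachable, but a meaningful share of calls
    # time out or come back as server errors.
    bad = sum(1 for r in results
              if not r.connected or r.timed_out or r.status >= 500)
    return "brownout" if bad / len(results) > error_threshold else "healthy"
```

The practical difference is cost to the caller: a blackout fails immediately, while a brownout ties up connections and threads until the timeout fires.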
[00:12:39] Yeah.
[00:12:39] And so, you know, now we start talking about availability and resilience in the face of all of this DDoSing that you’re doing to yourself.
[00:12:48] And so the thing on top of scale that is going to really complicate things is your deployability.
[00:12:55] Right.
[00:12:55] And so, you know, your service is a dependency of some of the process that’s going on.
[00:13:02] It depends on, you know, maybe AWS.
[00:13:04] It may depend on another service.
[00:13:06] You know, how do you make sure that if, you know, suppose there’s a failure for a primary dependency and that dependency comes back up, how do you make sure you don’t just like inundate it with a bunch of requests as it’s trying to recover?
[00:13:19] Yeah.
[00:13:20] And so you have all of these sort of like odd dynamics that occur.
[00:13:24] I used a brownout as something that is a perennial problem that we have, right, where there’s maybe a dependency on a base service like S3 or DynamoDB or whatever it is.
[00:13:36] There might be some increased latency that may cause a chain reaction of a dependency going down.
[00:13:43] And then one of these sort of middle tier services would brownout.
[00:13:46] So what are like, you know, you’re an owner of the services for your team.
[00:13:52] And so then it’s like, OK.
[00:13:54] What?
[00:13:54] What do we do in those situations?
[00:13:56] How do we know that they’re browning out?
[00:13:58] What do we do in the face of, you know, a dependency outage?
[00:14:01] And then critically, if there is an outage and then the service comes back up, how do we make sure that we give it enough space so that it can breathe so that, you know, you know, as they’re trying to recover from some sort of outage, we don’t just take them down immediately again.
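One common way to give a recovering dependency "space to breathe" is exponential backoff with jitter, so that callers spread their retries out instead of returning all at once. The conversation does not say which technique Amazon teams used; the sketch below is a generic illustration, and `backoff_delays`, `base_s`, and `cap_s` are hypothetical names and defaults.

```python
import random
from typing import List, Optional


def backoff_delays(attempts: int, base_s: float = 0.1, cap_s: float = 10.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return one randomized wait (in seconds) per retry attempt.

    The upper bound doubles with each attempt and is capped at `cap_s`;
    the actual wait is drawn uniformly between 0 and that bound, so
    many clients retrying at once do not synchronize.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        delays.append(rng.uniform(0.0, ceiling))
    return delays
```

A caller would sleep for each delay in turn before retrying, and stop as soon as a call succeeds. The randomization matters as much as the doubling: without it, every client that failed at the same moment retries at the same moment.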
[00:14:17] And I guess for like most of us who are not working right now on these services, like these sound pretty cool in theory.
[00:14:24] But you’re saying this was actually like, like, this is not theory.
[00:14:27] This actually was like, oh, this service is going down.
[00:14:30] We are literally having 100K requests per second.
[00:14:33] And we’re pushing that on to three other services with the same load, because we need to invoke three other services.
[00:14:39] One of them has browned out.
[00:14:40] What do we do now?
[00:14:41] How do we fix it?
[00:14:42] Yeah.
[00:14:43] And I think for certain other large tech companies, you know, you can do best effort.
[00:14:52] Right.
[00:14:52] Which is basically like, hey, we’re temporarily down.
[00:14:56] But, you know, you have some sort of degraded service that makes sense.
[00:15:01] But if you’re on, say, a website that does purchases, now we’re talking about transactions.
[00:15:07] Yeah.
[00:15:07] Or if you’re in the prime video, like live video streaming use case.
[00:15:11] Now we’re talking about a football game that you’re unable to see.
[00:15:16] And then when we recover, the game might be over.
[00:15:19] Yeah.
[00:15:20] Right.
[00:15:20] And so it’s much higher stakes.
[00:15:21] And so I think the scale with.
[00:15:24] Transactional semantics, right?
[00:15:27] Like, that’s actually the challenge that you’re not going to see unless you sort of like work for a payment processor or something like that.
[00:15:34] Yeah.
[00:15:34] I guess that’s a real-world pressure challenge.
[00:15:38] Like you are losing money.
[00:15:39] This is.
[00:15:39] I’m starting to understand why.
[00:15:41] Like, I have noticed that startups love to hire from certain companies.
[00:15:45] They usually love to hire from other startups because it’s a similar environment. With large tech companies,
[00:15:49] maybe I’m generalizing a bit.
[00:15:51] Obviously, this will not be true 100% of the time.
[00:15:53] But for example, hiring from Google.
[00:15:55] A lot of startups are not as happy because the people coming from Google are used to having this amazing team around them, internal tools, but most startups love hiring from Amazon.
[00:16:03] And I’m starting to get a sense of why this actually is.
[00:16:06] Yeah, I think that’s part of the culture.
[00:16:08] You know, you get hired as a software developer and they hand you a pager.
[00:16:13] And before, you know, phone apps and things like that, it was like this pager from the 90s.
[00:16:19] But it’s really great because you have to, like, operate
[00:16:24] the software that you write. You cannot write the software, hand it over to the testing team, and then throw it over to the SRE team after you’re done.
[00:16:34] Like, you own that piece of software.
[00:16:36] Yeah. Yeah.
[00:16:37] At every team, right?
[00:16:38] One interesting thing that we talked about yesterday over dinner with Casey Muratori is you said something interesting on how Amazon measured, on their retail website, I think it was retail, maybe Amazon Prime, that the lower the latency of something loading, like a page loading, like a purchase
[00:16:53] page,
[00:16:54] the more people converted, the more revenue they got, and they started to measure and there was a linear correlation: the faster it was, the more people converted.
[00:17:01] And it seemed that had no end.
[00:17:04] And the question Casey asked is like, OK, if this is the case, what would stop Amazon?
[00:17:09] Because you have the best technologies in the world.
[00:17:11] You have AWS, you know, you can build whatever you want to get the latency of the website down to, let’s say, like 10 milliseconds or or even one millisecond, because if this goes up, you would maximize revenue.
[00:17:22] So can you tell me about how that
[00:17:24] measurement actually happened and, you know, why is Amazon’s website still maybe not the fastest in the world, even though it would generate so many more billions, right?
[00:17:36] Yeah.
[00:17:37] Well, there are a couple of questions embedded in there, but we’ll start with the, you know, the latency to gross revenue measurement.
[00:17:45] So essentially somebody way back when, you know, because we invest in logs and telemetry, started tracking how much gross revenue we would make
[00:17:54] based off of the latency for detail pages, based off the latency of gateway, based off of latency of the checkout pages.
[00:18:01] And they noticed this dynamic where it’s like if you’re faster, you just make more money.
[00:18:06] It’s a pretty clear correlation.
[00:18:09] I think you could even go as far as to say it’s causation.
[00:18:12] And so there was this really big focus on latencies.
[00:18:17] I love the idea that, you know, if you’re going to optimize for performance, you’re saying, like, why can’t we be at one millisecond?
[00:18:24] Why can’t we be at 10 milliseconds and start from there instead of sort of saying, like, hey, let’s try to decrease latencies by 50 percent or 25 percent?
[00:18:32] Like, let’s just start from what is the conceptually fastest thing that we could do.
[00:18:37] And I think in a vacuum, the conceptually fastest thing that we could do is sort of like a monolith, which is how Amazon started, where, you know, you have a web server with all of your catalog information.
[00:18:52] So all of the items that are there and then transactions.
[00:18:55] That would be the fastest way to run.
[00:18:59] And basically, like a web request would be it opens the HTTP or HTTPS handshake.
[00:19:05] It hits the server.
[00:19:06] The server in an ideal world has everything cached or calculated.
[00:19:10] It sends it back.
[00:19:11] So the total latency would be the time for this request, the time to transfer that data based on your internet speed.
[00:19:18] And that’s it.
[00:19:18] That is the absolute.
[00:19:19] You cannot be faster than that.
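The baseline described here, connection setup plus server time plus transfer time, can be written as a simple lower-bound formula. This is a back-of-the-envelope sketch: the function `min_page_latency_ms`, its parameters, and the example numbers are all assumptions, and real page loads involve many more factors (DNS, rendering, multiple resources).

```python
def min_page_latency_ms(rtt_ms: float, handshake_round_trips: int,
                        server_ms: float, payload_bytes: int,
                        bandwidth_bytes_per_s: float) -> float:
    """Lower bound on page latency for a single request to one server."""
    handshake = rtt_ms * handshake_round_trips   # TCP/TLS setup
    request = rtt_ms                             # send request, get first byte
    transfer = payload_bytes * 1000 / bandwidth_bytes_per_s
    return handshake + request + server_ms + transfer


# Hypothetical example: 20 ms round trip, 2 handshake round trips,
# 1 ms of server time, a 100 kB page over a 100 Mbit/s (12.5 MB/s) link.
print(min_page_latency_ms(20, 2, 1, 100_000, 12_500_000))
```

With everything cached on the server, the network terms dominate, which is why the monolith is the conceptual floor in this argument.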
[00:19:20] I don’t think so.
[00:19:21] Maybe there’s some exotic sort of thing.
[00:19:23] Maybe you can do some exotic.
[00:19:24] Like a protocol that, I don’t know, predicts the future,
[00:19:26] like, sends it with UDP.
[00:19:27] But yeah, this is your baseline.
[00:19:29] I guess the optimal would be like zero click instead of like a one click checkout.
[00:19:33] Right.
[00:19:33] So we just send you stuff before like, you know, you want it.
[00:19:37] That would be the I guess the theoretical maximum.
[00:19:39] But, you know, if there’s some sort of like web request.
[00:19:42] Right.
[00:19:42] So some HTTP request and then some sort of like buy button.
[00:19:46] That would be the fastest.
[00:19:47] Right.
[00:19:48] And that’s actually how Amazon was created.
[00:19:50] We bought this, you know, sort of the opposite of horizontal scaling is vertical scaling.
[00:19:54] We bought these big Sun boxes and, you know, we hacked up our own web server in C++.
[00:20:01] And, you know, to scale up, we bought bigger hardware.
[00:20:05] And then when that didn’t work, you know, we bought like six of these big boxes and that ran Amazon.
[00:20:10] And we ran that way up until the early 2000s.
[00:20:14] And then what we realized, we ran into a wall, which was that, you know, when you built the C++ binary,
[00:20:23] the binary could only be four gigabytes.
[00:20:26] And that was a hard limit based off of the 32-bit software, the architecture that we were running on before.
[00:20:32] We could not get above four gigabytes.
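As a rough illustration of where a four-gigabyte ceiling comes from on a 32-bit architecture (an added sketch, not something from the conversation): a 32-bit pointer can only name 2^32 distinct byte addresses. The variable names are made up for this example.

```python
# A 32-bit pointer can address 2**32 distinct bytes.
POINTER_BITS = 32

addressable_bytes = 2 ** POINTER_BITS
addressable_gib = addressable_bytes / (1024 ** 3)

print(f"{addressable_bytes} bytes = {addressable_gib} GiB")
```

Whatever the exact toolchain limit was, nothing in a 32-bit process can sit outside that address range, which is why this kind of limit is a hard constraint rather than a tuning problem.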
[00:20:35] And so these product managers would come and just be like, well, just make a change for me.
[00:20:39] Right.
[00:20:39] To the devs.
[00:20:40] And then they would just be like, I don’t think you understand that this is a hard constraint.
[00:20:44] And so we.
[00:20:44] The size of the code or the binary code, the compiled one, it was there.
[00:20:48] And you had so much business logic by then that it just filled up four gigabytes.
[00:20:52] Yeah.
[00:20:52] Yeah.
[00:20:53] And, you know, we had a distributed C++ build.
[00:20:56] So, you know, you could, you know, it would take many, many hours for it to compile.
[00:21:00] And so we would distribute it across desktops.
[00:21:02] And it was this whole big thing.
[00:21:04] But we ran into that wall.
[00:21:06] And so what we decided to do, and I think this was super smart, was like to lean into service oriented architectures.
[00:21:12] Right.
[00:21:13] And microservices.
[00:21:14] Yep.
[00:21:15] And when you break it down, a web service call is essentially it’s a remote procedure call.
[00:21:21] Right.
[00:21:21] So you have this execution pointer.
[00:21:23] And then you’re like, OK, well, I need to do some computation or I need to gather some data.
[00:21:27] I’m going to, in turn, make an HTTP request downstream to another service.
[00:21:31] And then you can sort of chain those things together.
[00:21:34] And so getting back to the original thing about performance.
[00:21:37] In a world where you have to, because you have thousands and thousands of developers building, you know, this stuff.
[00:21:43] And the fact that you cannot have a monolith as big as Amazon retail, you know, past something that’s sort of like circa 2002 Amazon size.
[00:21:53] You have to lean into remote procedure call.
[00:21:55] You have to say that there is a web service.
[00:21:58] The best performance that you can actually get is always going to be bounded by the number of web requests that you end up making.
[00:22:04] Whether it’s the, you know, the first order calls to say, go get the item details.
[00:22:09] But then also any blocking call that happens downstream.
[00:22:13] And by blocking call, we mean like you need to wait for this to finish to get your data.
[00:22:17] Like, you know, if there’s a service that, like, returns,
[00:22:20] I don’t know, your top five most-likely-to-buy things.
[00:22:22] It might need to make those, let’s say, five requests or just one request.
[00:22:26] It needs to wait for that before it can return.
[00:22:28] Exactly.
[00:22:28] Exactly.
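The point about blocking calls can be sketched with a toy model (an added illustration under stated assumptions, not Amazon's actual tooling). Each call has its own processing time plus whatever it must wait on downstream; the `Call` class, the service names, and every millisecond figure below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """One service call: its own work plus the downstream calls it blocks on."""
    name: str
    own_ms: float
    blocking: list = field(default_factory=list)
    parallel: bool = False  # True if the downstream calls are issued concurrently


def latency_ms(call):
    """Latency of a call = own work + time spent waiting on blocking calls."""
    waits = [latency_ms(child) for child in call.blocking]
    if not waits:
        return call.own_ms
    # Sequential calls add up; concurrent calls cost as much as the slowest one.
    downstream = max(waits) if call.parallel else sum(waits)
    return call.own_ms + downstream


# Hypothetical "top five most likely to buy" service making five item lookups.
lookups = [Call(f"item-details-{i}", own_ms=20) for i in range(5)]
one_by_one = Call("top-five", own_ms=5, blocking=lookups, parallel=False)
fanned_out = Call("top-five", own_ms=5, blocking=lookups, parallel=True)

print(latency_ms(one_by_one))  # 105
print(latency_ms(fanned_out))  # 25
```

Either way the caller cannot return before its blocking calls finish, so the number and arrangement of those calls sets the floor on response time.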
[00:22:29] And you can do this telemetry stuff.
[00:22:30] You can do this observability stuff to figure out, you know, within that service call chain, what the blocking call is.
[00:22:37] And you can get some, you know, some amount of visualization on it.
[00:22:40] And so then you can get down to the point where it’s like, okay, if we’re going to start from first principles, what’s the least amount of latency that you can get for, say, like a web request or a checkout page call?
[00:22:51] You’re going to run into like the absolute minimum, right?
[00:22:56] And it’s going to be based off of like, what are the required operations, you know, evaluation or transactions or whatever for that particular request?
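A minimal sketch of how trace data could be used to find the chain of blocking calls that sets that minimum (an added illustration, not the telemetry system being described). It assumes each span records its own time in a `self_ms` field and that sibling calls run concurrently, so only the slowest child matters; the span layout, names, and timings are hypothetical.

```python
def critical_path(span):
    """Return (total_ms, [span names]) for the slowest chain of blocking calls.

    Assumes sibling calls run concurrently, so a span waits only on its
    slowest child.
    """
    best_total, best_path = 0.0, []
    for child in span.get("children", []):
        total, path = critical_path(child)
        if total > best_total:
            best_total, best_path = total, path
    return span["self_ms"] + best_total, [span["name"]] + best_path


# Hypothetical trace of a checkout page request.
trace = {
    "name": "checkout-page",
    "self_ms": 8,
    "children": [
        {"name": "pricing", "self_ms": 12,
         "children": [{"name": "tax", "self_ms": 30}]},
        {"name": "inventory", "self_ms": 25},
        {"name": "recommendations", "self_ms": 15},
    ],
}

total_ms, path = critical_path(trace)
print(total_ms, " -> ".join(path))  # 50.0 checkout-page -> pricing -> tax
```

In this made-up trace, speeding up `inventory` or `recommendations` would not change the page latency at all; only work on the `pricing` and `tax` chain would.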
[00:23:05] Yeah.
[00:23:06] And then basically, so as I understand, like as it became, like, more microservices and services, this is great for maintainability.
[00:23:12] And also you just start, well, you first just solve the issue of the monolith size.
[00:23:16] And, you know, as we know, as with history, of course, like now teams could be more autonomous.
[00:23:21] They’re not as dependent.
[00:23:22] They could do the APIs, but it was a trade-off for latency.
[00:23:25] And now, like you had to go back and figure out the blocking calls, how to speed those up, how to do, I guess, you know, trade-off things like caching.
[00:23:35] Like, you know, you can have things fast, but it might not be as correct on the first one.
[00:23:38] Or like just tricky UI where you don’t show the data just yet, but it’s coming.
[00:23:43] And the users get a sense of, like, progress, those kind of things.
[00:23:47] And it also, I think, forces teams to really, and products, to really say, okay.
[00:23:51] Like, what is the strictly necessary processing that happens on this page?
[00:23:56] Some of the work that I was doing before I left Prime Video was basically like you have these really, really big, heavy gateway page, you know, or landing page requests.
[00:24:06] And, you know, if you’re in a situation with high load, can you preemptively reduce the amount of, say, personalization that’s going on to sort of speed up that page?
[00:24:19] Or, you know, to increase.
[00:24:21] You know, to increase the amount of, like, throughput that you’re able to have to serve more customers.
[00:24:25] Can you do that in a smart way, right, that sort of anticipates load that’s coming onto that page, say, if there’s a football game coming up or something like that?
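The idea of preemptively dialing back personalization under load can be sketched roughly as follows (an added illustration, not the Prime Video implementation). The function name, the three levels, and the 60% and 85% thresholds are all assumptions chosen for the example.

```python
def personalization_level(current_rps, capacity_rps, expected_spike_rps=0.0):
    """Pick how much personalization to render, given load relative to capacity.

    expected_spike_rps lets the caller anticipate load that has not arrived
    yet, for example a scheduled live event.
    """
    projected_rps = max(current_rps, expected_spike_rps)
    utilization = projected_rps / capacity_rps
    if utilization < 0.60:
        return "full"     # all personalized rows
    if utilization < 0.85:
        return "reduced"  # fewer personalized rows, more cached content
    return "static"       # same cached page for everyone


print(personalization_level(300, 1000))                          # full
print(personalization_level(700, 1000))                          # reduced
print(personalization_level(300, 1000, expected_spike_rps=950))  # static
```

The third call shows the anticipatory part: current traffic is low, but a known upcoming spike is enough to switch to the cheapest page before the load arrives.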
[00:24:36] Yeah.
[00:24:37] Sounds like these are just like, A, they seem just hard to solve, but now you have to solve them.
[00:24:43] So it sounds like this kept you busy, and keeps everyone else busy at Amazon to this day, right?
[00:24:49] Like, is this, do you think this is an ongoing engineering challenge for Amazon?
[00:24:53] Because, you know, what I would imagine the tricky thing being here is like, okay, you can optimize whatever you have.
[00:24:59] You can find the critical paths.
[00:25:01] But Amazon keeps growing, right?
[00:25:03] Like, there’s new teams, new services, new everything coming on.
[00:25:05] So this thing will change all the time.
[00:25:07] It’s an ongoing puzzle to solve.
[00:25:09] Yeah, absolutely.
[00:25:10] Yeah, I think, you know, they definitely have a ton of work in front of them.
[00:25:15] Also, you know, it’s part of their ethos to really, like, launch new lines.
[00:25:19] And so, you know, the ability for a team to go from zero to launch product within the confines and the context of a large corporate entity, I think that’s, you know, part of the DNA that’s there.
[00:25:33] So as long as they’re planting seeds, as the sort of like internal terminology is, I think that, you know, software developers will be in demand for quite some time.
[00:25:43] Yeah, and I guess it’s a good reminder that, you know, there’s every now and then we have the monolith versus microservices debate.
[00:25:47] It sounds like it kind of just makes sense for a startup to start with a monolith.
[00:25:51] Like, you can always do what Amazon did, and you have the benefits of latency, everything is in one place.
[00:25:56] Like, I’m sure there might be reasons to start with microservices to start with.
[00:25:59] But if you’re a small team, I mean, even today, I don’t think that argument changes, right?
[00:26:04] Like, Amazon got really big wins by starting with a monolith back in the day.
[00:26:09] Yeah, absolutely.
[00:26:10] I think it just makes a ton of sense to start with a monolith, wait till it breaks.
[00:26:17] And then…
[00:26:17] The part where it breaks is when you have like 50 developers working on the same piece of code.
[00:26:22] Once that sort of breaking point occurs, then you start to like try to figure out like how you can sort of break things up.
[00:26:29] But starting with a microservice architecture, especially when you’re small, like what a waste of time and energy.
[00:26:35] Totally.
[00:26:35] So you were a principal engineer at Amazon, and apparently I learned that, you know, most companies, they have different levels.
[00:26:43] And again, this principal engineer, some companies have like staff level, but it’s usually like entry level.
[00:26:47] Yeah.
[00:26:47] Entry level, mid-level, senior, and then you have staff, or in the case of Amazon, it’s principal.
[00:26:53] I’ve learned that Amazon’s principal level is both really hard to get into compared to a lot of other companies.
[00:26:59] And it’s pretty special in some ways.
[00:27:00] So we’ll talk about that.
[00:27:01] But can you tell me like how is the career kind of development?
[00:27:06] Because most people imagine like, oh, it should be pretty straightforward.
[00:27:09] I spend like, I don’t know, two years as a junior, two years as a mid, roughly, and two years as a senior, then I get to principal.
[00:27:14] How does it actually work at Amazon?
[00:27:16] I think it’s linear.
[00:27:17] Up until you hit principal, right?
[00:27:19] So, you know, you join, you’re a junior developer, you get promoted to mid.
[00:27:23] At mid, you know, you’re starting to influence the team, but then you get to senior.
[00:27:28] And so now your expected impact is at the team level.
[00:27:33] And then there’s this jump that you get to principal.
[00:27:37] And principal is L6?
[00:27:39] Principal is L7.
[00:27:40] L7, yes.
[00:27:41] Yeah.
[00:27:41] And so I think you really have to start with like, why is that jump so big?
[00:27:46] Because I think at every…
[00:27:46] At pretty much any other company, it’s just a linear progression.
[00:27:50] Like there’s nothing necessarily special about staff.
[00:27:53] You know, you can just sort of go to that level of senior staff and then principal.
[00:27:56] But for some reason, Amazon decided that they weren’t going to have a staff level.
[00:28:03] And so, and I think they sort of like couched it around like having high standards.
[00:28:08] Basically, to get from senior to principal, you have to do like a two-and-a-half-level jump.
[00:28:14] From L6 to L7.
[00:28:15] Yes.
[00:28:15] Technically, it sounds like…
[00:28:16] It’s like one level, but at some other companies, this might be like, you know, L8, L9 or L8 and a half.
[00:28:23] Yeah.
[00:28:24] And, you know, so the hand wavy argument is like, hey, we have high standards.
[00:28:27] And like, you know, it means something to get to that level.
[00:28:30] It’s like, fine.
[00:28:31] But I noticed that some of the best engineers that I’d ever worked with were having such problems getting to principal engineer that they ended up moving to Facebook or to Meta or to all these other places where the progression was just sane.
[00:28:46] Now they’re…
[00:28:46] Staff or senior staff.
[00:28:47] Now they’re senior staff and, you know, principal and distinguished engineer at other companies.
[00:28:52] And so, because we had high standards, we actually had this brain drain.
[00:28:57] And it wasn’t a brain drain at lower levels.
[00:28:59] It was the brain drain at sort of like the higher levels.
[00:29:03] And it was just an example of something where it’s just like, why did you do that to yourself?
[00:29:08] And so, that’s the context for being a principal at Amazon.
[00:29:12] It’s safe to say it’s wicked hard to get internally, right?
[00:29:16] So, you know…
[00:29:16] You know, I’m colleagues with Ethan Evans.
[00:29:19] And so, we talk about what’s the hardest promotion at Amazon.
[00:29:24] And, you know, I had made the argument that it was, you know, it was senior engineer to principal.
[00:29:29] And he’s like, yeah, that’s hard.
[00:29:31] Actually, the hardest one, Steve, is, you know, VP to senior VP.
[00:29:34] Because there’s only eight spots or ten spots for that.
[00:29:38] And maybe 300 VPs that are all trying to get this.
[00:29:42] That’s more of a supply and demand thing.
[00:29:44] I will say that at Amazon…
[00:29:46] There is gigantic demand for principal engineers.
[00:29:50] And so, there are roles that have been open for years.
[00:29:53] I think something on the order of like 13 months or 17 months or something like that
[00:29:58] to get an external hire to join as a principal engineer.
[00:30:02] But that metric is only calculated when the role is filled.
[00:30:05] And so, probably, you know, there are hundreds of principal engineer openings at Amazon.
[00:30:11] And there are thousands of senior engineers…
[00:30:14] Who desperately want to get there.
[00:30:15] That would love to be…
[00:30:16] Putting in the work.
[00:30:17] You know?
[00:30:17] And so, there’s this sort of like…
[00:30:19] There’s this tension, right?
[00:30:22] And I don’t think you see that at the lower levels.
[00:30:25] I don’t think that that’s happening at senior or mid or junior.
[00:30:28] And so, like that incongruity, I think, is super interesting.
[00:30:32] But once you do get to principal engineer, one thing that I’ve never heard any other company have is
[00:30:36] there is apparently a principal engineer community.
[00:30:39] Which is, I’ve heard, again, from other people, that it’s tightly knit.
[00:30:43] It’s actually special.
[00:30:44] It’s actually just really nice organization.
[00:30:46] Can you talk about that?
[00:30:47] So, like, you know, once you got in there somehow, I don’t know.
[00:30:51] Was it blood, sweat, and tears at promotion?
[00:30:53] There is a community.
[00:30:54] I think it’s actually really great.
[00:30:56] My own history, you know, I went from support engineer to senior engineer in like four years at Amazon.
[00:31:03] But then from senior to principal, it took me eight years.
[00:31:07] And I got promoted in Q1 of 2020.
[00:31:11] Turns out to be a consequential, like, year for the industry, for the world.
[00:31:15] That was when forced,
[00:31:16] full remote work started.
[00:31:18] And so, you know, I got promoted and everybody’s like, you know, congratulations.
[00:31:21] They used to have, like, a principal engineer offsite where they just flew everybody into Seattle or nearby.
[00:31:26] And then to sort of, like, you know, mingle and to talk to other folks.
[00:31:32] That stopped during the pandemic.
[00:31:34] And then, you know, by the time the pandemic restrictions started lifting,
[00:31:38] the population of principal engineers had essentially doubled.
[00:31:42] That’s still to say, like, there are still hundreds and hundreds of openings
[00:31:45] for principal engineer.
[00:31:47] But then the, you know, the sort of, like, offsite community shifted over to the senior principals
[00:31:52] that I didn’t have access to.
[00:31:54] But, you know, at the moment, the manifestation of the principal engineering community
[00:31:58] is essentially through the Slack channel, which is absolutely awesome.
[00:32:04] And then we had principal offsites for, like, our local organization.
[00:32:09] So, like, Amazon Music, Prime Video, Twitch, that sort of thing.
[00:32:12] Those meetups were amazing.
[00:32:14] So, the reason.
[00:32:15] The reason they were is because of this high standard that Amazon had created.
[00:32:20] And so, what it meant is that everybody that was able to achieve that overly high standard,
[00:32:26] there’s something exceptional about them.
[00:32:28] There’s, you know, they’re super deep in a particular technology
[00:32:32] or they were associated with, you know, the growth of a really large line of business,
[00:32:39] either within Amazon or externally.
[00:32:41] They were essentially leaders within the industry.
[00:32:44] And you could just literally, you could just scoop out five people and then put them into a room.
[00:32:52] And the conversation is just, it’s just amazing, right?
[00:32:55] And I would, I would sort of be like, I don’t even belong here.
[00:32:58] Like, look at this guy, you know, he wrote a book on, you know, on a particular topic.
[00:33:03] And this guy, you know, he, you know, he is, you know, a luminary in a particular field.
[00:33:10] And then this person just, like, is an amazing code machine and can just write,
[00:33:14] you know, an entire application over a weekend.
[00:33:17] And then you’re like, what am I doing here?
[00:33:19] You know?
[00:33:20] I do wonder if that community might be coming back now.
[00:33:23] I know you’ve left, but now Amazon is back in person.
[00:33:26] Because it sounds like a lot of the benefit was the in-person part as well.
[00:33:30] Because this is what I never heard.
[00:33:31] Again, even before the pandemic, I didn’t hear other companies, say, for example, Uber.
[00:33:35] I’ve heard that the senior staff engineers do get together every now and then.
[00:33:39] But it was very, like, grassroots.
[00:33:41] So it was bottom-up.
[00:33:43] But my understanding is that Amazon,
[00:33:44] Amazon actually invested not just, you know, some principal engineers saying,
[00:33:48] hey, let’s get together, but also just kind of, you know,
[00:33:51] like making sure that that group really had something.
[00:33:55] Like, I think it’s smart.
[00:33:56] I think more companies should do it, but I’m just not seeing it.
[00:33:59] The investment was also in terms of headcount.
[00:34:04] So there are program managers and, like, product managers, essentially,
[00:34:09] that are, you know, bringing the folks together.
[00:34:12] Oh, awesome.
[00:34:13] There’s a wonderful.
[00:34:14] It’s called the Principles of Amazon series, where, you know, principal engineers will just, you know,
[00:34:20] they’ll do a presentation and it’s recorded.
[00:34:22] That’s been happening for, you know, 20 years.
[00:34:25] And, you know, we record everything that’s there.
[00:34:27] But it takes work to actually.
[00:34:29] That’s an internal series.
[00:34:31] And is that open to, like, everyone at Amazon or it’s for the principals?
[00:34:35] Oh, it’s open for everybody at Amazon to consume.
[00:34:38] To consume, yeah.
[00:34:39] And then, you know, there might be some senior engineers and stuff like that that would make a presentation.
[00:34:43] That’s part of their program.
[00:34:44] It’s not just a promotion packet.
[00:34:45] It was to be able to make an Amazon-wide presentation on a particular thing.
[00:34:49] My point was, though, that that stuff doesn’t just happen on its own.
[00:34:53] Like, you have to, like, you need a program manager or multiple folks to sort of, like, herd the cats.
[00:34:59] And to, like, schedule the off-sites.
[00:35:02] And to make sure that the, you know, the Slack channel doesn’t go off the rails, right?
[00:35:06] And it’s still useful.
[00:35:07] And it’s just not going to happen, like, grassroots with just, like, throwing a bunch of people into a room.
[00:35:13] This episode is brought to you by Augment Code.
[00:35:17] You’re a professional software engineer.
[00:35:18] Vibes will not cut it.
[00:35:20] Augment Code is the AI assistant built for real engineering teams.
[00:35:24] It ingests your entire repo, millions of lines, tens of thousands of files, so every suggestion lands in context and keeps you in flow.
[00:35:32] With Augment’s new remote agent, queue parallel tasks like bug fixes, features, and refactors, close your laptop, and return to ready-for-review pull requests.
[00:35:40] Where other tools stall, Augment Code sprints.
[00:35:44] Augment Code never trains or sells your code, so your team’s intellectual property stays yours.
[00:35:49] And you don’t have to switch tooling.
[00:35:50] Keep using VS Code, JetBrains, Android Studio, or even Vim.
[00:35:54] Don’t hire an AI for vibes.
[00:35:56] Get the agent that knows you and your code base best.
[00:35:59] Start your 14-day free trial at AugmentCode.com slash Pragmatic.
[00:36:03] I think, you know, these are the things, I mean, we’re now exposing a few of these things here and there.
[00:36:09] But some of these companies, like, you know, Amazon is a great example.
[00:36:12] Well, there’s more.
[00:36:13] There’s more to it than what meets the surface.
[00:36:15] So, like, once you’re inside Amazon, for example, you now, as an engineer, even if not a principal engineer, you now have access to the whole, you know, 20 years of principal presentations.
[00:36:23] Like, when I joined Uber, I was amazed at how we had the RFCs available.
[00:36:28] Like, I could read all historic ones.
[00:36:30] So, I think there is, and every company has its own.
[00:36:32] Of course, once you’re in there, you have access to this, like, knowledge base, which it will just never be published.
[00:36:37] It cannot because it has, you know, business-sensitive things, etc.
[00:36:40] So, I think as an engineer, like, you can just really just.
[00:36:43] Like, be a sponge when you join, especially one of the companies that is known to be a bit more open internally.
[00:36:50] Amazon is, I think, a really interesting one because externally, it’s very closed, is my sense.
[00:36:54] They’re very careful about what they share.
[00:36:56] For example, the postmortems for AWS, very few are published externally.
[00:37:00] But internally, they’re all there.
[00:37:02] As I understand, there, as an engineer, you can access them, you can learn from them, like, really cool real-world learnings.
[00:37:08] Absolutely.
[00:37:08] You know, it is an open place internally.
[00:37:12] And we are so selective
[00:37:13] about what we, I say “we” as though I still work there.
[00:37:16] What they publish externally and, you know, the postmortems, we call them COEs.
[00:37:22] A COE stands for?
[00:37:23] It’s a correction of error.
[00:37:25] Yeah.
[00:37:25] It’s, you know, it’s this idea that, you know, you have, like, holes in Swiss cheese and you have, like, a failure requires that there’s a hole across layers.
[00:37:36] That’s the best reading.
[00:37:37] Like, I would just subscribe to the email list where they were published internally.
[00:37:41] So, you have this, like, stream of, like.
[00:37:43] Of disasters that are going on within the company.
[00:37:46] And you just, you know, you grab some popcorn and you pop open one of these COEs and you learn so much from that.
[00:37:52] And I think that that’s part of the secret sauce.
[00:37:55] The idea, and I don’t know if it’s like this for 100% of them, is that it’s a blameless culture sort of thing.
[00:38:02] And so, to really screw up requires that multiple people drop the ball.
[00:38:08] Yeah.
[00:38:08] And you learn so much from that sort of stuff.
[00:38:12] You know, the...
[00:38:13] Brownouts.
[00:38:13] You know, these lessons that you would learn from, you know, trying to recover from really large dependencies.
[00:38:20] Those things are immortalized inside some of these COEs.
[00:38:22] So, there’s some very famous outages that happened within Amazon.
[00:38:27] And, you know, they were egg on our face.
[00:38:29] And we really, really learned those lessons through those postmortems.
[00:38:33] They’re absolutely wonderful.
[00:38:34] As a principal engineer, so far we kind of glamorized the role saying, you know, it is hard to get into.
[00:38:39] But once you’re there, you have the community, you do this really impactful work.
[00:38:43] But one of the principal engineers at Amazon who’s still there called Bhavik Kothari,
[00:38:48] he collected some things that are maybe not as glamorous or more challenging about principal engineering.
[00:38:54] He had five of these things, or five or six.
[00:38:57] I just want to go through with you and your take on this.
[00:39:00] So, first he wrote, there is this paradox of belonging, that you’re part of all teams, yet you’re part of none.
[00:39:06] What does that mean?
[00:39:08] Yeah, no.
[00:39:08] So, Bhavik was actually a peer of mine.
[00:39:13] We worked in Prime Video together.
[00:39:14] Oh, awesome.
[00:39:15] So, he’s an awesome dude.
[00:39:17] Yeah, there are all of these paradoxes.
[00:39:20] And this paradox of belonging is a really interesting one.
[00:39:26] You know, you work for the organization.
[00:39:28] You’re working cross teams, right?
[00:39:30] So, as a senior engineer, you’re embedded on a team.
[00:39:34] And, you know, you own the team’s architecture, the operations, you know, the software development lifecycle, and the design.
[00:39:41] Right?
[00:39:43] So, when you get to that next level where you’re working across teams, you kind of operate in this weird layer where, you know, you’re not on pager duty for a particular team.
[00:39:54] You have visibility across all of these teams that are there.
[00:39:58] You’re helping to guide and make decisions, but you’re literally not on the ground floor anymore.
[00:40:04] And so, you know, when you work with a particular team, you know, you might call the senior engineers or the mid-level engineers in and be like,
[00:40:11] Hey, let’s whiteboard some stuff.
[00:40:12] Like, let’s try to figure out what’s going on.
[00:40:14] You’re not on the team.
[00:40:16] You’re kind of this, like, advisor that’s sort of coming in.
[00:40:19] Right?
[00:40:20] But then, you know, maybe a director or a VP would call you in and say, like,
[00:40:24] Hey, what do I own?
[00:40:25] Like, what’s going on?
[00:40:26] Explain to me this outage or tell me why we can’t build this thing.
[00:40:30] And then you’re trying to whiteboard the architecture and the system.
[00:40:34] And you’re trying to say, like,
[00:40:35] Hey, you know, this is what’s going on on the ground floor.
[00:40:40] But you weren’t, you know, you weren’t part of that team.
[00:40:42] Right?
[00:40:42] You’re just sort of operating in this sort of strata where, you know, you don’t really belong on a team.
[00:40:48] You know, I’m an immigrant.
[00:40:50] I think you are as well.
[00:40:52] And, you know, my parents came from Asia.
[00:40:55] I’m not Asian.
[00:40:57] Right?
[00:40:57] So when I go back to Asia, I’m definitely from the U.S.
[00:41:00] And then growing up in this country is just like, you know, I’m, you know, not quite an American.
[00:41:06] Right?
[00:41:07] And so you sort of operate in this sort of, you know, area in the gaps where your identity,
[00:41:12] your identity is really defined by not being squarely in one of these predefined categories.
[00:41:18] So it’s very similar to that as a principal engineer.
[00:41:21] You’re not on the ground floor.
[00:41:22] You’re not checking in.
[00:41:23] You will check in code, but you’re not necessarily part of that team, embedded on that team.
[00:41:28] And even if you are for a short time, it’s usually a short time.
[00:41:31] And, like, tomorrow the director will call you up and say, like,
[00:41:33] Hey, Steve, we need you on this other team.
[00:41:35] They’re in trouble.
[00:41:36] Move over.
[00:41:37] Yeah, and you parachute in.
[00:41:39] And then, you know, then they’re like, oh, who’s this guy?
[00:41:41] You know?
[00:41:42] And then your director is like, what’s going on?
[00:41:45] What happened during this outage?
[00:41:47] Why is, you know, why is the press writing about us?
[00:41:50] And then you’re like, well, you know, here’s what’s happening on the ground floor.
[00:41:53] But you’re not really embedded on that team.
[00:41:57] Which leads us to the next paradox that Bhavik said.
[00:41:59] He lists a few of the paradoxes, and this one is the freedom of responsibility.
[00:42:02] And he writes that you enjoy significant autonomy in being able to choose what you work on.
[00:42:06] However, there’s an implicit expectation and accountability for resounding impact.
[00:42:11] Yeah.
[00:42:12] So, you know, I reported to a VP right before I left the company.
[00:42:18] So they were your manager, basically.
[00:42:19] Yeah, my manager was a VP.
[00:42:21] Oh, wow.
[00:42:23] That’s…
[00:42:23] I don’t hear many companies having engineers report into VPs.
[00:42:29] Yeah.
[00:42:29] That doesn’t seem very standard.
[00:42:31] You know, and so the org that he owned, you know, I considered myself the tech advisor for that organization.
[00:42:36] It was about 450 people, 450 software developers.
[00:42:41] And what did our one-on-ones consist of, right?
[00:42:45] Like when I would have our one-on-one, it wasn’t like, hey, here’s…
[00:42:49] You know, he didn’t assign me work.
[00:42:51] He wasn’t like, hey, I need you to build this thing.
[00:42:54] I need you to design this thing.
[00:42:56] The context that he set was basically like, here’s a direction, right, that you need to go.
[00:43:01] And the way that you can achieve that type of impact was up to me, right?
[00:43:08] So he might say something like, hey, availability is…
[00:43:11] So important for, you know, live sports.
[00:43:15] We just signed, you know, billion-dollar contracts with these sports leagues.
[00:43:19] And so we need to increase our availability posture.
[00:43:23] And then I would be like, okay.
[00:43:26] And then I would go away and I would come back and I would be like, you know, here’s what I’m working on, right?
[00:43:33] Like that type of dynamic does not exist at the senior engineer and below level, where you’re basically telling your boss what’s happening.
[00:43:41] I was about to say that when you said my manager one-on-ones, he didn’t tell me what to do.
[00:43:46] I’m like, most engineers would be like, sign me up.
[00:43:48] Like, I don’t want, you know, we all hate micromanagement.
[00:43:50] But now when you’re telling me, like, he would say like, oh, so we just signed a billion-dollar contract.
[00:43:55] Availability is important.
[00:43:56] And then stops talking.
[00:43:58] I’m like, that sounds uncomfortable.
[00:44:01] And basically, like, you’re kind of expected a little bit to, like, understand what he’s expecting, even though he doesn’t know.
[00:44:06] And then, and I’m assuming, you know, there’s two ways of going, right?
[00:44:09] You go back on the next one-on-one and you say something.
[00:44:11] And he’s like, like, Steve, like, you’re a principal engineer.
[00:44:14] This is not what I expect of you.
[00:44:16] And you don’t want that.
[00:44:17] Whereas this, you know, if you bring back the right things, it sounds like you really need to up-level in, like, understanding how, like, these people think.
[00:44:25] Absolutely.
[00:44:25] And so he’s, you know, he’s accountable to his boss as well.
[00:44:29] And, you know, don’t get me wrong.
[00:44:30] I didn’t, you know, I had a, I owned aspects of availability.
[00:44:34] You know, there’s a multi-thousand person organization at Prime Video doing this stuff.
[00:44:38] But we own the live sports aspect of this.
[00:44:41] And, you know, there are playback teams.
[00:44:44] There are, you know, recommendation teams.
[00:44:46] There are, you know, there’s so many different teams that are there that had to really step up and make sure that availability was good.
[00:44:53] But he would say something like, hey, you know, what is our availability posture for certain aspects?
[00:45:00] And I would have to go and figure it out.
[00:45:02] Yeah.
[00:45:02] Like, what are we measuring?
[00:45:04] What are we not measuring?
[00:45:05] There’s a deadline for, you know, the start of a season where we’re expecting, you know, millions and millions of concurrents to come in.
[00:45:13] What can we do between now and then?
[00:45:16] Right.
[00:45:16] And then if we do write some software, like what is the highest leverage piece of software that we could create that would increase our availability posture?
[00:45:24] And so the way that I sort of describe it to people is you are assigned not a problem, not even a problem space.
[00:45:32] You’re assigned a direction.
[00:45:33] You can solve the problem with code.
[00:45:34] You can solve the problem with system design and architecture.
[00:45:38] But you could also solve the problem, say, by, you know, I don’t know.
[00:45:41] Hey, maybe there’s some off-the-shelf software we should purchase.
[00:45:44] Maybe there’s a dev team that we should start to spin up right now whose job it is to do this particular thing.
[00:45:52] Maybe we’ve identified a piece of software and it’s already been scoped that this team needs to go and build, but it’s not a priority for them.
[00:46:01] Now we need to go and figure out, like, you know, how we can get them to do it.
[00:46:05] Can we shuffle around resources?
[00:46:06] That sort of thing.
[00:46:07] And so the way I describe it is, like, there’s so many more things on the menu.
[00:46:11] That you can use to solve the problem.
[00:46:14] And I don’t think people recognize that.
[00:46:16] They think that it’s just, oh, when you’re a principal, like, you just, like, code a lot and it’s just really complicated.
[00:46:22] Or do more meetings.
[00:46:23] You know, that’s what often happens.
[00:46:24] I mean, at the end of the day, like, don’t get me wrong.
[00:46:26] There’s a ton of meetings that go on.
[00:46:28] Yeah, yeah.
[00:46:28] But this is, I think it’s good to, like, shine light.
[00:46:31] Because it sounds like a big change, but I also kind of feel if you get good at this, you might not really want to go back to, you know, having a manager
[00:46:41] who’s just like, all right, here’s a project.
[00:46:43] We need to solve, like, you know, scope it up.
[00:46:45] And which you can do, right?
[00:46:46] Yeah.
[00:46:46] That’s cool.
[00:46:47] And now, the next challenge that Bhavik said was, this all sounds great, but there’s apparently a bandwidth challenge.
[00:46:52] So, it’s easy to become this, like, social resource where people just pull you into everything and you’re breathing.
[00:46:59] Yeah.
[00:47:00] No, you know, I think, I wish I had taken a screenshot.
[00:47:03] But, you know, I have my Outlook calendar, right?
[00:47:05] So, it’s my schedule.
[00:47:06] My day looked like most people’s week.
[00:47:10] So, it looked like somebody
[00:47:11] just, like, blew up a Tetris factory.
[00:47:14] Like, there was, like, I would have triple or quadruple booked on a Monday all through the day.
[00:47:19] So, you would have the manager calendar as an IC.
[00:47:21] Yeah.
[00:47:22] And it’s absolutely crazy because, you know, for that large org that I was supporting, everybody just added me as optional.
[00:47:29] Or they might try to say, like, no, you’re actually required for all of these meetings.
[00:47:34] But when you have a triple booked calendar and you’re required for this stuff, you just learn that you’re going to have to disappoint.
[00:47:40] You’re going to disappoint a lot of people.
[00:47:42] And so, it’s this sort of, like, you know, this thing where it’s, like, it’s almost easier to say no now that you’re obscenely overbooked versus when you’re a senior engineer.
[00:47:52] You’re, like, I don’t have time to write code.
[00:47:55] But there’s just barely enough time in between the cracks.
[00:47:59] Yeah.
[00:47:59] And so, I think that it’s almost like when your schedule breaks, that’s when you are finally freed because you know that you can sort of say no to stuff.
[00:48:08] But ultimately, if I just went to all of the meetings.
[00:48:10] That everybody said that I would have to go to.
[00:48:12] I would be a professional meeting attender.
[00:48:14] And I would literally have no time to do the work.
[00:48:17] And then, Bhavik follows up on this next challenge, which is being truly present.
[00:48:21] And he writes, I think it’s almost like, you know, he was sitting next to you.
[00:48:24] You find yourself physically present in one meeting while your mind is already racing against the next three.
[00:48:29] You know, it’s a really big challenge.
[00:48:32] You know, I pride myself on being a good communicator and being present.
[00:48:36] And when there are 20 things that are going on in the air.
[00:48:40] Or 100 things that are going on.
[00:48:42] It’s just really, really difficult to stay single-threaded.
[00:48:47] And what I ended up having to do is to sort of say like, okay, I could do all of these things.
[00:48:53] And they would be really impactful.
[00:48:54] But I just had to aggressively prioritize and say, you know, for the availability.
[00:48:59] I’m just looking at availability.
[00:49:00] There’s all these other fires that are going on.
[00:49:03] Which is disappointing.
[00:49:05] Because there’s so many things that, you know, you could be focusing on.
[00:49:08] And it’s super difficult.
[00:49:11] And so, you know, I work with a lot of people to try to get them to the next level.
[00:49:14] And they say, Steve, I’m completely overwhelmed.
[00:49:16] There are like 20 things that are going on.
[00:49:19] And I tell them, like, do you think it gets easier when you get higher level?
[00:49:24] There’s just going to be more and more things on your plate.
[00:49:27] Why wait until you burn out or you break?
[00:49:30] You can just start implementing these things now.
[00:49:31] So every high-level tech IC I know, and managers included,
[00:49:35] they have a wonderful system in order to
[00:49:38] isolate signal and then cut out the noise.
[00:49:41] And if you don’t have that, you literally won’t survive.
[00:49:44] But at the principal level and above, it’s just amplified that much more.
[00:49:48] I’m getting a sense that a lot of the work you do as a principal engineer,
[00:49:52] I mean, there’s huge amounts of software engineering.
[00:49:55] And you need to be, you know, just really good at building resilient systems.
[00:50:01] Learning about new technologies.
[00:50:03] You know, for example, today, I’m assuming whoever’s a principal engineer at Amazon.
[00:50:06] They’re expected to just know everything.
[00:50:08] Everything about LLMs, trade-offs, characteristics, et cetera. But you also need to develop the skills that managers have, which is managing your time, context switching, figuring out how to get that focus time.
[00:50:23] Like, you know, contrary to popular belief, like managers actually need focus time.
[00:50:27] So, you know, I will always try to carve out some time. But you’re now doing it while your title is not manager, and it actually feels like you combine a lot of managerial responsibilities.
[00:50:36] Yeah.
[00:50:59] So I reported to a VP, you know, one of my peers was a director and he was basically
[00:51:03] like, hey Steve, I would like you to show up to my performance review for my entire
[00:51:07] org of 100-something people.
[00:51:09] And I’m like, I can’t do that for you and for everybody else.
[00:51:12] Okay, so now it makes sense why, as a principal engineer,
[00:51:16] your compensation package will be similar to, like,
[00:51:19] is it a senior engineering manager or something like that?
[00:51:21] Around that.
[00:51:22] Around that, but basically, like, the job has a lot of overlaps.
[00:51:27] Okay, the benefit is you’re not the one delivering the performance reviews
[00:51:30] to the direct report, but you’re doing almost everything else
[00:51:33] in terms of the effort I’m talking about.
[00:51:36] Yeah.
[00:51:36] Okay.
[00:51:37] Okay, so having been a principal engineer for four years,
[00:51:40] what are the good things that you really, really liked about Amazon,
[00:51:44] specifically Amazon’s principal engineer role,
[00:51:46] and what are some of the, you know, not so good
[00:51:49] or it could have been better things?
[00:51:51] I mean, the great parts are you get visibility
[00:51:55] that you just couldn’t possibly have at the team level.
[00:51:58] You know, within a large organization like Prime Video
[00:52:00] or wherever you’re at, there are many thousands of people
[00:52:04] that are working within that organization.
[00:52:07] Doing so many things, right?
[00:52:09] And typically, the performance of these people is really high.
[00:52:12] There’s so many different directions that are going on.
[00:52:14] And so to survive, you kind of have to look inward.
[00:52:17] And you say, okay, well, here’s my service boundary.
[00:52:19] Here’s all the software I own.
[00:52:21] I’m going to own everything within the sphere of ownership.
[00:52:24] Because you’ve built this wall up,
[00:52:26] you tend not to be able to see, like, that broader picture.
[00:52:29] And so as a principal engineer, I think it’s really awesome
[00:52:32] to be able to sort of, like, spelunk and be able to go to different teams
[00:52:35] and sort of see what’s going on.
[00:52:36] Yeah.
[00:52:37] To be able to see that broader picture.
[00:52:38] And I just don’t see a way that you would be able to get
[00:52:41] that type of visibility that’s super interesting at a lower level.
[00:52:47] You know, I think the other thing is, like, you know,
[00:52:49] whether it’s warranted or not,
[00:52:51] you do get some amount of status when you go to a meeting.
[00:52:53] People just listen to you.
[00:52:55] They listen to your harebrained ideas.
[00:52:57] And it’s kind of nice because you don’t necessarily have to, like,
[00:53:00] prove yourself over and over again.
[00:53:02] There’s a bit less, like, professional, like, not fights,
[00:53:06] but just established.
[00:53:07] There’s a bit less establishing that you know what you’re talking about.
[00:53:09] Yeah.
[00:53:10] Yeah.
[00:53:11] Now, the bad things are, you know, there’s a lot of folks
[00:53:15] that are really good in tech and being really effective
[00:53:17] as a principal engineer.
[00:53:19] But then they also, you know, myself included,
[00:53:22] they’re like, okay, cool.
[00:53:23] Well, that sort of makes me an expert in pretty much everything.
[00:53:26] And so you would get these principal engineers together.
[00:53:29] We had a weekly meeting.
[00:53:30] And so it would be like, okay, if you wanted to talk about, like,
[00:53:33] establishing a constitution for a small island nation,
[00:53:36] you know, you’re going to have to do a lot of work.
[00:53:37] And all of a sudden, they would just be like, well, like,
[00:53:38] here are the main considerations.
[00:53:39] It’s like nobody has a background in government policy.
[00:53:43] But all of a sudden, like, just because you’re sort of trained to do so,
[00:53:46] you start to, like, pitch in.
[00:53:48] You’re like, well, actually, you know,
[00:53:49] maybe we should have two branches of government
[00:53:51] or three branches of government.
[00:53:52] And it just sounds like we would know what we’re doing, but we don’t.
[00:53:57] And so there’s this trap, and, again, I’ve fallen into it many times,
[00:54:01] where you actually think you’re an expert in one thing,
[00:54:05] but you’re actually not, right?
[00:54:06] Right.
[00:54:06] So, you know, take LLMs.
[00:54:08] There’s a ton of folks that understand AI.
[00:54:11] I left before it was sort of, like, allowed to use internally,
[00:54:14] but I think you can use it now.
[00:54:17] I’m not an expert in LLMs at all,
[00:54:20] but I do think that the expectation would be that you understand,
[00:54:26] you know, how they work.
[00:54:27] But then the expectation’s also like, hey, what should our policy be?
[00:54:31] How should we be thinking about this stuff?
[00:54:34] And I think that’s fine for mature technologies, potentially, like you can ramp yourself up
[00:54:39] for it.
[00:54:40] But as like that particular landscape is changing so quickly, I think there’s this sort of trap
[00:54:45] where you sort of, you speak as an authority, even though you haven’t had the requisite
[00:54:49] time to ramp up at something.
[00:54:51] And you’ve been there for 17 years at Amazon.
[00:54:54] What are your favorite parts of the culture?
[00:54:56] Like, you know, there are a lot of values that we all know, like
[00:55:01] frugality, customer obsession.
[00:55:03] What were the things
[00:55:05] that you found to be the most interesting or the ones that have lasting impact, and how
[00:55:10] did they change?
[00:55:11] How did Amazon change over 17 years?
[00:55:13] They must have changed.
[00:55:14] No, I, I think the, the things I missed the most and, and the secret sauce, yeah,
[00:55:19] the, the leadership principles are good, but I think the actual secret sauce there is principled
[00:55:25] thinking.
[00:55:26] Right.
[00:55:27] And so, yeah.
[00:55:28] So, you know, there’s, you know, invent and simplify and bias for action and all of this
[00:55:32] stuff.
[00:55:33] But.
[00:55:34] The thing that is amazing about those leadership principles, aren’t the specific stances that
[00:55:40] they took.
[00:55:41] So they decided that customer obsession is a big deal.
[00:55:43] They decided that bias for action is a big deal, all of these things.
[00:55:47] But really, if you, if you looked at a meta level, you’d be like, oh, these guys have
[00:55:51] principles that they won’t budge on.
[00:55:53] I sort of think about it in terms of math and axioms, like you just take certain things
[00:55:58] to be true.
[00:55:59] You know, two lines that are parallel, if you extend them out to infinity, won’t
[00:56:03] touch.
[00:56:04] They won’t touch each other.
[00:56:05] Yeah.
[00:56:06] You assume that’s true.
[00:56:07] Yeah.
[00:56:08] You, you don’t, you don’t prove that.
[00:56:09] It’s an axiom.
[00:56:10] And then based off of that, you’re able to build a system of mathematics.
[00:56:13] Right.
[00:56:14] And so it’s the same thing with the corporate leadership principles at Amazon.
[00:56:18] They basically said, okay, we are going to fix these things to be true.
[00:56:23] There are 16 or 12, or I don’t know, they just sort of bolted some on.
[00:56:25] They were 14 and now they’re 16.
[00:56:28] But there are like four or five that are just really core to Amazon.
[00:56:33] And we just fix those things to be true.
[00:56:35] Which, which ones were the ones that you felt were the most present?
[00:56:36] Customer obsession.
[00:56:37] We are absolutely customer obsessed.
[00:56:38] We’ll just burn money to delight the customer.
[00:56:39] You can, you can be in a meeting with a VP as an intern and you say, Hey, that’s a bad
[00:56:40] customer experience.
[00:56:41] It would be like a needle coming off a record.
[00:56:42] It would just be like, what, what are you talking about?
[00:56:43] Like immediately.
[00:56:44] Right.
[00:56:45] You know, bias for action.
[00:56:46] So like, just get some stuff done.
[00:56:47] Stop asking for permission.
[00:56:48] Just like go and do it.
[00:56:49] Right.
[00:56:50] Okay.
[00:57:24] Right.
[00:57:25] Okay.
[00:57:26] Okay.
[00:57:27] All right.
[00:57:28] Cool.
[00:57:29] Cool.
[00:57:30] Yep.
[00:57:31] I feel like we got this.
[00:57:32] It worked out okay.
[00:57:33] or not being customer obsessed.
[00:57:36] I think it’s, you know, like being about your staff.
[00:57:39] Yeah.
[00:57:39] Which is Google.
[00:57:41] It could be like, hey, we really care about our people above everything else.
[00:57:45] Or it could be, you know, let’s not mince around it.
[00:57:48] We care about top line or bottom line revenue.
[00:57:50] Yeah.
[00:57:51] That’s totally valid, right?
[00:57:53] And then you could just fix that.
[00:57:54] You can’t prove that, you know, being, you know, staff focused is a bad thing.
[00:57:59] You just build that.
[00:58:00] And then, you know, a certain set of things will happen.
[00:58:02] Like great things are going to happen.
[00:58:04] And then like not so great things are going to happen.
[00:58:06] Those not great things that happen, you can try to mitigate them, but you can’t fix them
[00:58:10] because you have started with this principled approach to everything.
[00:58:14] Yeah.
[00:58:14] Yeah.
[00:58:15] It all goes like everything has.
[00:58:18] Yeah.
[00:58:18] I see what you mean.
[00:58:19] But I think what you’re saying is like it might be less about what the specific principles are.
[00:58:24] I mean, Amazon has theirs and we know about them, but it’s just sticking to them and not
[00:58:28] wiggling, because if you keep wiggling, it’s like, what was the point, right?
[00:58:32] Then you’re going to have a really kind of mediocre, not truly standout company, whatever you do.
[00:58:38] What does it actually mean to be principled and to not bend?
[00:58:41] It could be really easy to do so.
[00:58:43] So that’s an amazing secret sauce of Amazon’s.
[00:58:46] People look at the leadership principles.
[00:58:47] I’m like, no, it’s principled thinking.
[00:58:49] Another thing.
[00:58:49] A lot of this, honestly, from what I understand, talking to you earlier and some other people,
[00:58:53] a lot of it probably comes from Jeff Bezos, from the top down, being very principled
[00:58:58] on not giving in, saying we will do this, whatever
[00:59:02] it takes. Sounds like it was customer obsession initially and then some other things.
[00:59:06] Yeah.
[00:59:06] Yeah, absolutely.
[00:59:07] And he’s he was he was an absolute genius when it came through.
[00:59:11] So I’m a, you know, I’m a Jeff Bezos fanboy for sure.
[00:59:15] Like it just it just worked.
[00:59:17] Another thing that’s Amazon’s secret sauce is just the writing culture.
[00:59:22] And so, you know, I spent on the order of like one to four hours every day reading while
[00:59:28] I was a principal engineer.
[00:59:30] And it was.
[00:59:31] We had a standard format.
[00:59:32] It was a six-page memo.
[00:59:35] And, you know, that would be our business strategy.
[00:59:38] That would be a system design.
[00:59:40] That would be, you know, what we call the PR FAQ.
[00:59:44] So a press release and frequently asked questions for like a new line of business or a new
[00:59:48] initiative.
[00:59:49] And everybody was sort of constrained to the six page format.
[00:59:53] And everybody just produces documents in that format for whatever they need to do.
[00:59:57] And so when I would try to get up to speed on a particular
[01:00:01] thing, I would just be like, give me your six pagers, give me all your documents.
[01:00:05] And I just got really, really good at just reading these documents to get up to speed,
[01:00:10] which is a self-fulfilling and virtuous cycle, which is just like, OK, well, now I need
[01:00:15] to express myself.
[01:00:16] And so I will write a six pager and that will set the context for whatever we’re working
[01:00:20] on.
[01:00:21] We’d go to a meeting.
[01:00:22] You would read the six pager.
[01:00:24] And it was just super great to to just actually just have people do study hall at the beginning
[01:00:30] part of a meeting.
[01:00:30] Mm hmm.
[01:00:31] Where you just everybody just gets fast forwarded and then you have a really great discussion
[01:00:35] at the end.
[01:00:36] That is an amazing culture that I think almost every other company should replicate
[01:00:42] if they could.
[01:00:44] But I think that the difficulty would be like you actually have to be disciplined and
[01:00:48] principled and have a reading culture and then actually
[01:00:53] value writing.
[01:00:55] Yeah.
[01:00:55] I almost wonder if unless it comes from the top, some of these things might just be really,
[01:00:59] really hard to do.
[01:01:00] Yeah.
[01:01:00] So one thing that I figured is we’re in your studio right now and you have a lot of these
[01:01:06] blocks.
[01:01:07] And I asked you what they are.
[01:01:08] Are they for promotions or projects or whatever?
[01:01:10] They’re for patents.
[01:01:12] Yeah.
[01:01:13] And this is for patent number 10,824,964.
[01:01:19] Can you tell me about why you have these, how they come about?
[01:01:23] Yeah.
[01:01:24] What you needed to do for them?
[01:01:25] So the highest order bit is like, you know, for better or for worse, they’re software
[01:01:30] patents.
[01:01:32] Amazon, they’ll say that basically the reason they have them is defensively because, you
[01:01:37] know, other people will assert that, hey, you’re in violation of our patents or our
[01:01:42] IP.
[01:01:43] And then, you know, we’ll use them reactively.
[01:01:45] Okay, fine.
[01:01:46] But, you know, you’re also in violation of these other things.
[01:01:49] And so, you know, there’s a, there is a culture of, of trying to make sure that, you know,
[01:01:54] we protect ourselves in that way.
[01:01:56] But, you know, there’s the other part of software patents, which is basically like, hey, can
[01:01:59] you really patent?
[01:02:00] Like math or whatever.
[01:02:02] And so what I learned over time is that, you know, I’m just a really bad IP lawyer, even
[01:02:06] though, you know, as a principal engineer, I might cosplay as somebody that really
[01:02:10] understands software patents, right?
[01:02:12] At the end of the day, you know, what we would do is we would take our important six
[01:02:17] pagers and we would hand them over to the legal team.
[01:02:19] And then they would just be like, oh, this stuff is really interesting.
[01:02:22] Like, let’s explore that.
[01:02:23] And so it, it turned into this awesome thing where like, we just had ready inputs to go
[01:02:28] into like the.
[01:02:30] You know, into that particular system.
[01:02:32] A writing culture turns out has a bunch of benefits.
[01:02:35] Exactly.
[01:02:36] And I think there’s this concept, it’s called the
[01:02:40] curse of knowledge, which is essentially like, if you understand something, you discount
[01:02:44] how hard, you know, that concept is.
[01:02:48] Yep.
[01:02:48] And so it’s just like, you don’t get it, you don’t get it, you don’t get it.
[01:02:51] And then you get it.
[01:02:52] And then you’re like, oh, that’s trivial.
[01:02:54] Right.
[01:02:54] Even though, you know, there could have been, you know, it could actually be novel or it
[01:02:57] could actually be interesting.
[01:02:59] Yeah.
[01:02:59] So what ends up happening is that you would just throw these documents over to the lawyers
[01:03:03] and then they would basically be like, oh, this stuff is great.
[01:03:06] And you would just be like, well, that’s just, that’s just regular software development
[01:03:10] or that’s just the context and domain that we were living in.
[01:03:12] You know, it turns out that there’s some, some interesting stuff.
[01:03:15] This particular patent I’m, I’m, I’m proud of.
[01:03:17] So there’s a, a system design interview question that seems to be popular right now, which
[01:03:23] is like, design Ticketmaster.
[01:03:25] Right.
[01:03:25] And so I worked on Amazon Tickets and, you know, we ended up shuttering that business,
[01:03:29] but, you know, we ended up building one of the world’s fastest ticket-selling
[01:03:33] systems, right?
[01:03:35] We can do many, many orders per second.
[01:03:38] So the use case is basically at T zero, you know, for a really big ticket on sale,
[01:03:42] that’s when the maximum amount of demand and requests are coming in.
[01:03:47] And you want to sell out all of your ticket supply as quickly as possible.
[01:03:52] The problem is I think one where you have seated concerts.
[01:03:58] Mm-hmm.
[01:03:58] And so when you purchase a ticket, you know, most of the time with the system design stuff,
[01:04:04] it’ll be like general admission or it won’t be a high ticket on, you know, like one with
[01:04:08] a bunch of demand.
[01:04:10] You have to find contiguous seats.
[01:04:12] Yeah.
[01:04:12] So the ones that are right next to each other.
[01:04:14] Yes, exactly.
[01:04:16] And so, you know, it’s, it’s actually really hard.
[01:04:20] Like suppose it was a SQL database as your backing store.
[01:04:23] Like, how do you come up with a SQL query
[01:04:25] that’s just like, hey, give me the best four,
[01:04:28] you know, within this particular price range that are sitting next to each other?
[01:04:33] Now, now you’re thinking, so this is a real, real world thing where you need to,
[01:04:37] you want to be as efficient as possible in terms of resource usage.
[01:04:41] May that be, maybe you want to minimize your CPU or memory depending on, on what you have,
[01:04:44] I assume.
[01:04:45] And you need to do this quick, as rapidly as possible to give this to people.
[01:04:50] Okay.
[01:04:51] So, so now we’re talking about a problem that is, seems like pretty novel in some ways,
[01:04:56] right?
[01:04:56] Yeah.
[01:04:57] And so, you know, I was, I,
[01:04:58] I did this patent with a senior principal.
[01:05:00] I was a senior engineer at the time, but the, the idea is like, you know, what is the theoretical
[01:05:06] maximum speed by which we could, you know, show this inventory to people.
[01:05:12] And it turns out that, you know, even if you have a high ticket on sale, you only have
[01:05:17] like thousands of tickets at the end of the day.
[01:05:19] Yeah.
[01:05:20] So instead of making a request to like a backend that would conduct some sort of search across
[01:05:25] the space.
[01:05:26] Yeah.
[01:05:27] What if you actually inverted it and then you basically had each of the individual hosts
[01:05:32] have like some view on the entire arena or a venue that was there and you loaded up all
[01:05:40] of that availability and inventory into like L2 cache on a CPU.
[01:05:44] Yeah.
[01:05:45] Because it’s actually not that many.
[01:05:46] So if you have this compact representation.
[01:05:47] Yeah, yeah.
[01:05:48] Well, the cache is pretty big.
[01:05:49] Yeah.
[01:05:49] Then what you can do is you can do bit manipulation to like really, really quickly
[01:05:55] get the contiguous
[01:05:56] seats that are there.
[01:05:58] And then what you do is you can like send in that particular request and try to like
[01:06:03] reserve those particular seats.
[01:06:04] Now it is a locking problem,
[01:06:06] which is much more tractable than like, hey, there’s, you know, 2 million people that
[01:06:13] have just hit your on-sale page.
[01:06:15] And each of them, I’m going to search for each of them.
[01:06:17] Yes.
[01:06:18] So the inversion of that ordering process, by which you actually send out
[01:06:22] the inventory to the individual nodes.
[01:06:24] And then like.
[01:06:26] Load it up into CPU cache and then just do bit manipulation.
[01:06:29] And then try to lock that resource from the individual nodes.
[01:06:33] That was, that was the basis of this particular patent.
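The idea Steve describes (hold a compact copy of the seat map on each host, use bit manipulation to find adjacent free seats, then try to lock them) can be sketched roughly as below. This is only an illustration under stated assumptions, not the patented design: one Python integer stands in for a row of seats (bit i set means seat i is free), and the names `find_contiguous` and `try_reserve` are hypothetical. In a real system the reservation step would be an atomic operation against an authoritative store rather than a local function call.

```python
from typing import Optional, Tuple


def find_contiguous(row_mask: int, k: int) -> Optional[int]:
    """Return the lowest seat index that starts a run of k free seats, or None.

    Assumption: bit i of row_mask is 1 when seat i in the row is free.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    runs = row_mask
    # After j iterations, bit p of `runs` is set only if seats p..p+j are all free.
    for _ in range(k - 1):
        runs &= runs >> 1
    if runs == 0:
        return None
    # Isolate the lowest set bit and convert it to an index.
    return (runs & -runs).bit_length() - 1


def try_reserve(row_mask: int, start: int, k: int) -> Tuple[bool, int]:
    """Try to take k seats beginning at `start`.

    Returns (success, new_mask). Stands in for the "lock" step: it fails
    if any of the requested seats was taken since the local view was read.
    """
    block = ((1 << k) - 1) << start
    if row_mask & block != block:
        return False, row_mask
    return True, row_mask & ~block


if __name__ == "__main__":
    row = 0b01110110  # seats 1, 2, 4, 5, 6 are free
    start = find_contiguous(row, 3)
    print("first block of 3 starts at seat", start)
    if start is not None:
        ok, row = try_reserve(row, start, 3)
        print("reserved:", ok, "remaining mask:", bin(row))
```

The search costs a handful of shifts and ANDs per row regardless of how many buyers are waiting, which is why the remaining hard part is the contention on the reservation itself.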
[01:06:36] Awesome.
[01:06:37] That’s clever.
[01:06:38] And like, that sounds like some, you know, people are always asking like, Oh, you know,
[01:06:42] on my job, I don’t use the algorithm stuff or any of the formal methods.
[01:06:46] It sounds like there are some uses of it, especially when you’re trying to figure out,
[01:06:50] like, just taking away from the patent, just having a problem
[01:06:54] like this and saying,
[01:06:56] what is the theoretical limit that we can do?
[01:06:58] What is the fastest possible? To answer that you probably want to have access to these
[01:07:02] tools. So it’s not always a waste of time and effort to actually get
[01:07:07] into these things.
[01:07:08] And so what are you up to now that you’ve, you’ve, you’ve left Amazon a year ago after
[01:07:14] like 17, 18, very long years.
[01:07:16] You know, I’m just, you know, I’m, I’m just making content.
[01:07:19] I’m just sort of living the dream there, you know, making YouTube videos.
[01:07:22] I started up a newsletter.
[01:07:24] I have a Discord.
[01:07:25] Yeah.
[01:07:28] And we’re, we’re going to link all, all of those below.
[01:07:30] I actually first got to know you before we started talking.
[01:07:33] This was probably a few years ago, from your YouTube videos, where, you know,
[01:07:38] you shared a lot about Amazon things, software engineering things, and just
[01:07:42] your general thinking. But yeah, your newsletter is a new one.
[01:07:45] So we’ll link it in the show notes below.
[01:07:48] It’s, it’s always a good way to keep in touch and also, you know, like on your YouTube
[01:07:51] channel.
[01:07:51] Awesome.
[01:07:52] So as closing, I have some, some rapid questions.
[01:07:55] Okay.
[01:07:55] So I’ll, I’ll just ask and you just shoot what comes to mind.
[01:07:57] What is career advice that greatly helped you in your path?
[01:08:02] Yeah.
[01:08:02] I mean, this is, I, you know, I, I talk a lot about this.
[01:08:05] It’s kind of like, oh, what’s, what’s your favorite food or your favorite movie?
[01:08:08] It’s just like, there’s so much there and it’s hard to pick one.
[01:08:11] What I would say is instead of saying like, Hey, what’s the technology that I should learn
[01:08:16] that’s really gonna, you know, make my career, you know, solid, instead sort of flip it around
[01:08:23] and say like,
[01:08:24] how can I quickly learn skills that make me sort of like recession
[01:08:30] proof, right?
[01:08:31] That, that sort of makes you valuable.
[01:08:32] It’s essentially meta learning.
[01:08:33] It’s like, how can I learn something faster and faster?
[01:08:36] If, if that’s your focus, then you’ll always be, you’ll never have a problem finding a
[01:08:42] job and you’ll never have a problem progressing in your career.
[01:08:46] Now, some of the skills may be difficult to find resources on online, but you know, I
[01:08:51] think if you just sort of think about like, what’s a valuable skill that, if
[01:08:54] I knew right now, would, you know, make my, you know, job search easier or would like
[01:09:00] make me, you know, perform better on the job.
[01:09:03] And then just sort of thinking about acquiring that skill as quickly as possible.
[01:09:07] And do it now.
[01:09:08] Like, don’t wait.
[01:09:09] Yeah.
[01:09:09] Well, people tend to postpone themselves.
[01:09:11] They’ll be like, Oh, well I’ll start when, you know, everything is lined up.
[01:09:15] But like to begin, you just need to begin.
[01:09:18] Like when you start something, only then will you know what you need to do instead
[01:09:22] of saying like, Oh.
[01:09:24] I need to get everything that I need to do first before I start.
[01:09:27] You’ve used a lot of programming languages.
[01:09:29] Which one’s your favorite and why, and which one do you dislike most?
[01:09:33] Yeah.
[01:09:34] You know, I, you know, I, I have like a, you know, obviously there’s no perfect
[01:09:38] programming language.
[01:09:39] Um, what I would say is like, I really enjoyed Perl and nobody would ever give that
[01:09:46] answer, but I just like this concept of like, there’s just so many different ways to do
[01:09:51] it.
[01:09:51] It’s a, it’s a write-only language.
[01:09:52] Like you can’t read anybody else’s Perl.
[01:09:54] And I, it’s, it’s actually one of the languages that like uses up the most power.
[01:09:58] It’s like the least efficient.
[01:10:00] It’s interpreted.
[01:10:01] It’s, it’s just like terrible.
[01:10:03] Also, most of Booking.com still runs on it, or some of it.
[01:10:06] Yeah.
[01:10:07] Amazon’s backend was, you know, for a long time, it still might be, um, you know, sort
[01:10:11] of like Perl Mason is sort of like a web technology bolted onto Perl, but I just kind
[01:10:15] of like it.
[01:10:16] I just feel like I can express myself and there’s just like, there’s just what, however
[01:10:20] you’d like to express yourself, you can.
[01:10:21] Um, it also looked like an ASCII factory blew up sometimes.
[01:10:25] And so it’s just like, it’s, it’s, you know, now that it’s on a podcast, you know, I wouldn’t
[01:10:29] really, you know, advertise that fact.
[01:10:32] The best programming languages right now, I think Rust is pretty interesting.
[01:10:35] So I might, you know, pick that up.
[01:10:37] Um, at the end of the day, like I really love the boring languages.
[01:10:42] Yeah.
[01:10:43] Um, so, you know, Java with, you know, for all of its stuff, like its verbosity and
[01:10:48] I think it’s just a great language.
[01:10:50] I think a JVM-based language, um, that has
[01:10:54] essentially like great, like library support and a bunch of stuff written for it, but it’s
[01:10:59] just like super boring.
[01:11:00] Maybe it’s just cause I’m from Amazon and we do this like enterprise stuff.
[01:11:03] Like it’s a fine language.
[01:11:05] And then I see your, you, you have a large bookshelf here.
[01:11:08] You also read a lot, especially at Amazon, all those internal documents.
[01:11:12] What is a book that you would recommend?
[01:11:14] So something around software engineering that, that you enjoyed and it cannot be that book.
[01:11:18] It can’t be your book.
[01:11:19] Um, what I would say is, you know, uh, you know, I just gave the advice about, you know,
[01:11:23] meta learning and, and career growth.
[01:11:25] I, I think that most software developers should read a book by Cal Newport.
[01:11:29] It’s called So Good They Can’t Ignore You.
[01:11:31] And so the concept there is around career capital.
[01:11:33] So like, what are the skills that are in the most demand?
[01:11:36] And if you can just like learn those skills, then you become in demand.
[01:11:40] And then, you know, from there you can choose what type of lifestyle that you’d like.
[01:11:44] You know, you can also like sort of lean into, you know, some of the science of meta learning.
[01:11:49] So deliberate practice, spaced repetition, and that sort of thing.
[01:11:55] Um, in terms of like tech books, I think the new, uh, AI Engineering book, uh, by Chip
[01:12:01] Huyen is, is amazing.
[01:12:03] Um, I think, uh, DDIA, so, uh, Designing Data-Intensive Applications.
[01:12:09] It’s so good.
[01:12:10] A new, new version is coming the end of the year actually.
[01:12:12] Yeah.
[01:12:13] And I’m excited about that.
[01:12:14] I think that’ll be pretty good.
[01:12:15] Um, but you know, at the end of the day, like you don’t want one book on your bookshelf,
[01:12:19] you want 50 books on your bookshelf.
[01:12:21] Um.
[01:12:22] So, you know, I think within a particular sub genre of tech books, you know, I’d have
[01:12:28] recommendations there, but.
[01:12:29] Steve, this was great.
[01:12:30] Awesome.
[01:12:31] Really enjoyed it.
[01:12:32] Yeah.
[01:12:33] Great.
[01:12:34] Thanks so much for having me.
[01:12:35] Thanks a lot to Steve for sharing all these details.
[01:12:36] Although Amazon’s principal engineering level feels surprisingly difficult to get promoted
[01:12:40] to, I have yet to hear of a principal engineering community as strong as the one Amazon builds
[01:12:44] and keeps investing in.
[01:12:45] This community itself could be reason enough to consider the company at the principal-plus
[01:12:50] level, should you have the opportunity to do so.
[01:12:53] For a deep dive into Amazon’s engineering culture, including the details on compensation,
[01:12:57] career ladders, performance reviews and engineering processes, check out the Pragmatic Engineer
[01:13:02] deep dive linked in the show notes below.
[01:13:04] If you’ve enjoyed this podcast, please do subscribe on your favorite podcast platform
[01:13:08] and on YouTube.
[01:13:09] This helps more people discover the podcast and a special thank you if you leave a rating.
[01:13:13] Thanks and see you in the next one.