Adding tests to a big untested codebase - Where do I start?


Summary

The episode addresses a common challenge faced by development teams: how to begin adding tests to a large, untested codebase. The host, Jonathan Cutrell, argues against the intuitive approach of refactoring the code first to make it more understandable, as this can be dangerous without a safety net. He emphasizes that in a complex system where you cannot hold all the information in your head, testing becomes critical before any significant changes.

Cutrell distinguishes between integration tests (or end-to-end tests) and unit tests, stating that the first tests to write for an untested codebase should be integration tests. These tests verify the critical pathways: the essential user journeys and functionalities of the application, such as loading a home page or logging in. This provides a baseline verification that the most important parts of the system still work, creating a safety net for subsequent refactoring.
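
As an illustration of what such pathway-level tests might look like, the sketch below drives a tiny stand-in application through its public entry point and checks the home page and login flows. The episode names no language or framework, so everything here is an assumption made for the example: the `legacy_app` WSGI function, the `/login` route, the `USERS` table, and the `request` helper are all hypothetical, and only the Python standard library is used.

```python
import io
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical stand-in for the untested legacy application.
USERS = {"ada": "s3cret"}


def legacy_app(environ, start_response):
    """A deliberately plain WSGI app: home page, login, and a 404 fallback."""
    path = environ["PATH_INFO"]
    method = environ["REQUEST_METHOD"]
    if path == "/" and method == "GET":
        start_response("200 OK", [("Content-Type", "text/html")])
        return [b"<h1>Welcome</h1>"]
    if path == "/login" and method == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(size).decode())
        user = form.get("user", [""])[0]
        password = form.get("password", [""])[0]
        if USERS.get(user) == password:
            start_response("302 Found", [("Location", "/dashboard")])
            return [b""]
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"bad credentials"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def request(app, method, path, body=b""):
    """Send one request through the whole app, the way a user's browser would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    payload = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], payload


# Critical pathway 1: the home page loads.
def test_home_page_loads():
    status, _, body = request(legacy_app, "GET", "/")
    assert status == "200 OK"
    assert b"Welcome" in body


# Critical pathway 2: a known user can log in.
def test_valid_login_redirects():
    status, headers, _ = request(
        legacy_app, "POST", "/login", b"user=ada&password=s3cret"
    )
    assert status == "302 Found"
    assert headers["Location"] == "/dashboard"


# Edge case agreed with the team: a wrong password is rejected.
def test_wrong_password_is_rejected():
    status, _, _ = request(
        legacy_app, "POST", "/login", b"user=ada&password=nope"
    )
    assert status == "401 Unauthorized"
```

These checks deliberately say nothing about how the application is built internally, which is what lets them keep passing while the code underneath is refactored.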

The key takeaway is that functionality is paramount; beautiful, well-designed code that breaks existing functionality makes the project worse off. By starting with integration tests around agreed-upon critical pathways, teams can then refactor with confidence, knowing they have a warning system if something breaks. The process likely involves refactoring code to make it more testable for future unit and integration tests, but the initial focus must be on securing the core user experience.
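
One way to picture the refactoring step: logic that is tangled with input parsing and output is pulled into a small pure function, which can then get its own unit test, while a pathway-level check confirms that the visible behaviour has not changed. The names below (`legacy_checkout`, `order_total`, `PRICES_IN_CENTS`, the discount rule) are invented for this sketch and do not come from the episode.

```python
import io
from contextlib import redirect_stdout

# Hypothetical catalogue data, kept in integer cents to avoid float rounding.
PRICES_IN_CENTS = {"tea": 350, "scone": 225}


def legacy_checkout(order_lines):
    """Before: parsing, pricing, discounting, and printing in one function."""
    total = 0
    for line in order_lines:
        name, qty = line.split(":")
        total += PRICES_IN_CENTS[name] * int(qty)
    if total > 1000:
        total = total * 9 // 10  # 10% discount on orders above 10.00
    print(f"Total: {total / 100:.2f}")


def order_total(items, prices):
    """After: the pricing rule as a pure function that a unit test can target."""
    subtotal = sum(prices[name] * qty for name, qty in items)
    return subtotal * 9 // 10 if subtotal > 1000 else subtotal


def refactored_checkout(order_lines):
    """After: the same entry point, now delegating to order_total."""
    pairs = (line.split(":") for line in order_lines)
    items = [(name, int(qty)) for name, qty in pairs]
    print(f"Total: {order_total(items, PRICES_IN_CENTS) / 100:.2f}")


def captured_output(func, *args):
    """Run func and return whatever it printed, as the user would see it."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


# Pathway-level check: the refactor must not change what the user sees.
def test_refactor_preserves_output():
    for order in (["tea:2", "scone:1"], ["tea:3", "scone:1"]):
        before = captured_output(legacy_checkout, order)
        after = captured_output(refactored_checkout, order)
        assert before == after


# Unit test: only possible once the rule lives in its own function.
def test_order_total_applies_discount_above_threshold():
    assert order_total([("tea", 2), ("scone", 1)], PRICES_IN_CENTS) == 925
    assert order_total([("tea", 3), ("scone", 1)], PRICES_IN_CENTS) == 1147
```

The order matters here: the pathway-level check is written against the legacy function first, and the extraction happens only once that check is in place.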


Recommendations

Tools

  • Tea Break Challenge — A website offering daily soft skills challenges for developers, mentioned as a free resource for personal and career growth.

Topic Timeline

  • 00:00:00 Introducing the challenge of testing a large untested codebase — The episode sets up a common scenario: a team recognizes the need to add tests to a large, untested legacy codebase. The host poses the central question: where do you start when faced with this mountain of work? He frames the discussion as a practical guide for improvement, not judgment of past decisions.
  • 00:02:34 Why refactoring first is a dangerous approach — The host explains that the intuitive approach, refactoring code to make it more understandable before adding tests, is like walking blindfolded through a minefield. For small projects, this might be feasible, but for sufficiently complex systems where you can’t hold all the code in your head, testing becomes a critical prerequisite for safe refactoring.
  • 00:04:46 The importance of starting with integration tests — Cutrell introduces the two key test types: integration (end-to-end) and unit tests. He strongly advocates that the first tests to write for an untested codebase are integration tests. These tests verify the critical pathways (the essential user journeys and functionalities), providing a safety net that the most important parts of the application still work.
  • 00:06:53 Functionality over beautiful code — The host emphasizes that the primary value of code is its function. A perfectly designed class that breaks existing functionality makes the project worse, not better. Integration tests ensure that functionality remains intact as you begin the refactoring process, allowing you to improve the code’s design without regressing user experience.
  • 00:08:03 Identifying and building tests for critical pathways — The practical step is to have discussions with your team to identify the application’s critical pathways (e.g., loading the home page, logging in). Build integration tests around these agreed-upon critical functions. These tests should remain passing throughout the refactoring process, which will likely involve making the code more testable for future unit and integration tests.

Episode Info

  • Podcast: Developer Tea
  • Author: Jonathan Cutrell
  • Category: Technology, Business, Careers, Society & Culture
  • Published: 2019-01-11T10:00:00Z
  • Duration: 00:10:49

Transcript

[00:00:00] Imagine yourself sitting in a stand-up meeting or a sprint planning meeting or whatever kind

[00:00:11] of meeting that you have with your team and discussing an initiative.

[00:00:19] Your team has taken on a legacy code base or perhaps you’ve run really fast at creating

[00:00:26] a product and unfortunately as a part of that, the team didn’t add tests.

[00:00:35] And for whatever reason, we aren’t here to judge why the team didn’t add tests, we aren’t

[00:00:40] here to judge the legacy code, but we are here to improve.

[00:00:46] And so as you’re sitting in this meeting, you or maybe a team member, maybe the whole

[00:00:52] team brings up the need to increase your test coverage from zero to anything really.

[00:01:01] And now you are faced with the mountain in front of you of introducing tests into an

[00:01:11] untested large code base.

[00:01:15] Where do you start?

[00:01:16] That’s what we’re talking about in today’s episode.

[00:01:18] My name is Jonathan Cutrell and you’re listening to Developer Tea.

[00:01:21] My goal on this show is to help driven developers connect to their career purpose and do better

[00:01:25] work so they can have a positive influence on the people around them.

[00:01:29] Today’s episode is a practical episode.

[00:01:33] This is not as much about the things that we always talk about on the show, but we will

[00:01:38] end up talking about how this approach, what we’re getting ready to talk about in approaching

[00:01:45] an untested code base and introducing tests, how this affects or is affected by

[00:01:51] the way that you approach this and the mindset that you have during that approach.

[00:01:57] Before we get started, I want to mention teabreakchallenge.com.

[00:02:01] If you haven’t signed up yet, these are daily soft skills challenges that I’m writing and

[00:02:06] releasing on a daily basis and I encourage you to go and subscribe.

[00:02:10] It’s totally free.

[00:02:11] Of course, you’ll get an email when you subscribe every single day.

[00:02:16] You’ll get an email with those challenges in it.

[00:02:19] You can also find the challenges on teabreakchallenge.com and the newest one goes up on that homepage

[00:02:26] every day.

[00:02:27] Go and check it out, teabreakchallenge.com.

[00:02:29] Okay, so how do you approach this problem?

[00:02:34] The difficulty is that our intuition says, well, the first thing that we need to do is

[00:02:39] make this code easier to understand.

[00:02:43] We need to figure out how to refactor this code into a more maintainable state.

[00:02:50] We can see all kinds of code smells in every file that we open up and so only once we’ve

[00:02:56] gotten the code to where we can understand it better, where we can wrap our heads around

[00:03:01] it, then we can introduce tests, right?

[00:03:04] We need to improve the code first and the problem with this, the problem with this is

[00:03:09] that if you try to go down this road, you are kind of walking in a minefield blindfolded

[00:03:17] and without any armor.

[00:03:20] The reality is when you put yourself in this situation, changing code becomes dangerous.

[00:03:26] You’ll notice that I mentioned that this is a large code base and that’s an operative

[00:03:30] and important part of this discussion.

[00:03:33] Sometimes when we have very small projects that are untested, there may be a legitimate

[00:03:39] reason for how it ended up that way, number one, and number two, because the project is

[00:03:45] so small, it might be a little bit easier to approach it from that refactor first perspective

[00:03:53] because you can hold all the necessary information in mind.

[00:03:57] You can see all of the different pieces of the code from the top down, but when you get

[00:04:02] into a project with sufficient complexity, when you have enough information that you

[00:04:08] can’t hold it all in your mind, you can’t memorize it all, testing becomes critical

[00:04:16] before any kind of refactoring.

[00:04:21] In other words, the only way that you can reasonably begin to refactor is if you have

[00:04:29] some kind of validations, some kind of testing to warn you to identify when you’ve written

[00:04:39] some kind of refactoring code that breaks something that previously was working.

[00:04:44] But here’s the key.

[00:04:46] There’s different types of tests.

[00:04:47] If you’ve written software for very long, then you know this.

[00:04:51] The most important for the sake of this discussion are two different types.

[00:04:55] One is an integration test, or also known as an end-to-end test, and the other is a

[00:05:01] unit test.

[00:05:02] The unit test is going to test more specific pieces of your code to ensure that they are

[00:05:09] working as they should.

[00:05:11] A unit test might, for example, test a specific class and its public methods.

[00:05:17] So that’s kind of a testing 101 idea.

[00:05:21] When you approach a large code base that is untested, the first types of tests that you

[00:05:28] need to run are integration tests.

[00:05:32] This is a strong opinion that I have, and there’s certainly valid differing opinions

[00:05:37] and different approaches that other people may take, but here’s the reality.

[00:05:42] If you have integration tests that verify that your code still does the important and

[00:05:50] critical jobs, the critical pathways that users need to take in your projects, if you

[00:05:59] have integration tests that walk through those critical pathways and verify that whatever

[00:06:05] is on those paths is working, regardless of how ugly those building blocks may be, you

[00:06:13] start from a position of having verification that the most important thing in your code

[00:06:21] remains intact, and that is functionality.

[00:06:25] If you go down the road of refactoring and you write a perfect class that follows all

[00:06:31] of the SOLID principles and everything is beautiful from the perspective of software

[00:06:38] design, object-oriented design, or even functional programming design, if you walk down that

[00:06:45] road and then something breaks, it doesn’t really matter how good that design is.

[00:06:53] If you break the code and you leave it less functional than it was before, but with a

[00:07:00] beautiful class, then that project is worse off, and this is something that we all kind

[00:07:06] of know.

[00:07:07] This isn’t surprising news. This isn’t new information for you as a developer, but it

[00:07:13] is something that we often don’t practice. It’s kind of a weird juxtaposition. We believe

[00:07:20] that the value of our code is disconnected in some way from the point, the function of

[00:07:29] the code.

[00:07:30] Now, that’s not to say that your work on that class couldn’t be leveraged into high-level

[00:07:37] high value, right? If you cleaned up whatever it is that you broke, then certainly you have

[00:07:43] better off code, and that can be a win. But if you break something and you don’t fix it,

[00:07:51] then it doesn’t really matter how good the code is that is breaking things.

[00:07:55] So as you approach a code base that is large, or at least sufficiently complex, and doesn’t

[00:08:03] have tests, the first tests that you need to write are the integration tests that verify

[00:08:09] all of those critical pathways. You can probably Google critical pathways. Some of the most

[00:08:15] important ones, of course, you should be thinking about the most commonly used things like loading

[00:08:20] a home page, that’s a critical pathway, logging in, logging out may be a critical pathway,

[00:08:26] or actually I’ve seen that a lot of people don’t identify logging out as a critical

[00:08:32] pathway, but they do identify logging in as a critical pathway.

[00:08:38] So there’s some thought experiments you can have, some good discussion that you should

[00:08:41] have with your team about which things are most critical in the application, which things

[00:08:46] should be kind of elevated, and what kind of edge case support and coverage do you want

[00:08:53] to have for those critical pathways, and build your tests around the answers that you have

[00:08:58] in those discussions. Build those initial tests. And here’s the thing, those tests should

[00:09:04] retain their working status as you go through the refactoring process. Now it’s going to

[00:09:11] be important that you refactor code to the degree that you can test it. It’s possible

[00:09:17] that your product only requires those integration tests. This is very unlikely, but it may be

[00:09:24] that you can write those integration tests and then kind of stand back and things are

[00:09:29] fine, but it’s much more likely that you’re going to need to adjust your code in some

[00:09:35] way. If your code was not written with tests in the beginning, then it’s very likely that

[00:09:40] you’ll be doing some refactoring to make that code more easily tested in those unit tests

[00:09:46] and further integration tests into the future. Thank you so much for listening to today’s

[00:09:51] episode of Developer Tea. This was an unusually concrete kind of episode of this show, but

[00:09:58] if you are interested in this kind of content as well as less concrete information like,

[00:10:04] for example, practicing those soft skills that are so critical to your career and your

[00:10:09] personal growth and finding your purpose, then I encourage you to subscribe in whatever

[00:10:15] podcasting app you’re listening to this episode with right now. Also, another quick plug for

[00:10:20] the Tea Break Challenge. This is teabreakchallenge.com. These are soft skills exercises that you can

[00:10:27] do on a daily basis every single day of this year. Go and check it out, teabreakchallenge.com.

[00:10:32] You can get those delivered directly to your inbox. You can also find them on Twitter at

[00:10:37] Developer Tea. Thanks so much for listening, and until next time, enjoy your tea.