Taming the Beast

Pragmatic, Proactive Debt Management at Procore R&D

Tim Doherty is Staff Software Engineer at Procore Technologies, where he’s tirelessly championing TDD and sustainable development. He also runs the Santa Barbara JavaScript Meetup and speaks at technical conferences.

Background

I’ve been building and shipping software professionally, in one capacity or another, for a little over two decades. In that time, I have had the good fortune to work with some very capable, passionate teams across a variety of business domains. One challenge all of those teams met, with varying degrees of success, is what’s commonly referred to as technical debt.

The idea of technical debt as a financial metaphor was first described by Ward Cunningham, ironically while working on financial software. The financial debt metaphor is a powerful tool since it’s so easily understood. Most of us carry some kind of financial debt, commonly in the form of a home mortgage. Borrowing is often the only way to make such a large purchase, but it’s a worthwhile tradeoff since the value of owning a home easily offsets the cost of the loan. To be clear, this is true in software development as well: incurring a little technical debt can help quickly prove the viability of a product or feature.

What would happen though, if we decided not to pay our mortgages next month so that we could have more spending money? For two months? Three? The idea is absurd, since the inevitable consequences would include late fees, calls from bill collectors, and eventually the loss of our homes and ruined credit.

Unpaid debt in software engineering is just like that. To quote Ward Cunningham, “Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite… The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation”.

Framing the Solution

Taming the beast of technical debt is hard. It’s not just that the actual effort of repaying it can be substantial or even impossible. The bigger challenge is often getting buy-in to do so at all. The business value of paying off technical debt is hard to sell when compared to the value of delivering features.

After joining Procore, and seeing the core values of Openness, Optimism, and Ownership not just preached but practiced in our daily work, I saw a unique opportunity to tackle technical debt head on in an environment where the effort could really succeed.

At our R&D Operational Excellence Event, I championed technical debt management as our most important priority in R&D. At the close of the event, my proposal was selected via crowdsourcing as the top initiative, and a team of my colleagues joined me to lead a 90-day effort to operationalize an ongoing solution for debt management.

Experience told me that our biggest challenge would be cultural rather than technical. We needed to drive a shift toward debt management as a first-class citizen, to treat technical debt just like a mortgage. But in order to do this we needed a framework that could be scaled across the organization for describing and measuring our debt.

Describing Debt

At the highest level, it’s easy to split debt into two broad categories: the debt we already have and the debt we have yet to incur. The newly formed debt leadership team, affectionately known as Debtheads, decided to start by tackling our existing debt. We found Martin Fowler’s Debt Quadrants concept invaluable in starting to further qualify existing debt items.

[Figure: Martin Fowler’s Technical Debt Quadrant]

What’s more, we decided to break down what’s usually called “technical debt” into categories more strongly identifiable by each discipline in R&D:

Technical Debt
This is Ward Cunningham’s original idea of technical debt: not-quite-right architecture, design, and code that hinders development. Tightly coupled code, large multi-responsibility functions, and code duplication are common examples of technical debt.

UX Debt
UX debt is the gap between what the product delivers and what is expected by the user. For the most part it’s obvious: something doesn’t “look right”, feels broken, or is broken.

QA Debt
QA debt is the cost incurred when the desired level of quality is not maintained in every stage of the process, usually due to gaps in communication and/or test coverage. For example, omitting test cases from story grooming can produce gaps in both manual and automated test cases and missed bugs that would otherwise have been caught early in the process.

Innovation Debt
Innovation debt is the cost that companies incur when they fail to proactively drive change and stay abreast of new technologies. Being a major version or more behind a primary language or framework is the most common example. Failure to stay current affects not just the work itself, but morale and recruiting efforts also.

Process Debt
Process debt is the abandonment, in part or in its entirety, of an essential process without a better process in its place, in a way that can impact the delivery of software. For example, teams that forgo retrospectives are missing a critical opportunity to learn and improve their process.

Measuring Debt

We really liked the ideas laid out in A Taxonomy of Tech Debt, which proposed measuring debt based on three criteria:

  • Impact: what’s the impact of this debt on our staff or customers?
  • Contagion: how widespread is this debt, or how much will it spread if not fixed?
  • Fix cost: what’s the cost, both in time and risk, to fix this debt?

Impact and contagion are a great starting point for prioritizing debt. If either or both of these is high, a debt item probably warrants a closer look. Conversely, if both are low, it’s probably safe to defer further discussion.

After working with these measurement criteria for several weeks, we found value in breaking “fix cost” out into its constituent parts:

  • Time to fix: how much time and effort is required to pay off this debt?
  • Risk of fix: how risky is paying off this debt?

This allowed us to be a little more granular in our measurement, and let either metric skew the resulting fix cost independently:
Fix cost = (time to fix + risk of fix) / 2
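
As a rough illustration, the formula can be sketched in Python. The `DebtItem` class, its field names, and the 1 to 5 scoring scale are assumptions made for this example; the article does not say what scale we scored on, and this is not the tooling we actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    """One debt ticket scored on the criteria above (hypothetical 1-5 scale)."""
    title: str
    impact: int
    contagion: int
    time_to_fix: int
    risk_of_fix: int

    @property
    def fix_cost(self) -> float:
        # Fix cost = (time to fix + risk of fix) / 2, so a high score on
        # either input raises the result even when the other is low.
        return (self.time_to_fix + self.risk_of_fix) / 2


# Example ticket: slow to fix (5) but not very risky (2).
item = DebtItem("Upgrade framework major version", impact=4, contagion=5,
                time_to_fix=5, risk_of_fix=2)
print(item.fix_cost)  # 3.5
```

Averaging keeps fix cost on the same scale as its two inputs, which makes it easy to compare against impact and contagion.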

Paying it Off

With this measurement framework as a starting point, we could start dividing our existing debt further into three broad buckets:

  1. High impact/contagion, low risk, low effort: these are no-brainers; we should just pay them off.
  2. Low impact/contagion, high risk, high effort: likewise, these are simple; we likely can’t, or shouldn’t, pay them off.
  3. Everything else: this is the hard part.
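
The three buckets above could be expressed as a small sorting function. This is only a sketch: the `bucket` function, the `HIGH` and `LOW` thresholds, and the 1 to 5 scale are all assumptions for illustration, and in practice the boundaries were a matter of team judgment rather than fixed numbers.

```python
HIGH = 4  # assumed threshold on a hypothetical 1-5 scale: >= HIGH counts as "high"
LOW = 2   # assumed threshold: <= LOW counts as "low"


def bucket(impact: int, contagion: int, time_to_fix: int, risk_of_fix: int) -> str:
    """Sort a scored debt item into one of the three buckets described above."""
    matters = impact >= HIGH or contagion >= HIGH      # either score is high
    negligible = impact <= LOW and contagion <= LOW    # both scores are low
    cheap_and_safe = time_to_fix <= LOW and risk_of_fix <= LOW
    costly_and_risky = time_to_fix >= HIGH and risk_of_fix >= HIGH

    if matters and cheap_and_safe:
        return "pay off now"        # bucket 1: the no-brainers
    if negligible and costly_and_risky:
        return "leave alone"        # bucket 2: not worth paying off
    return "needs discussion"       # bucket 3: everything else


print(bucket(impact=5, contagion=3, time_to_fix=1, risk_of_fix=2))  # pay off now
print(bucket(impact=1, contagion=2, time_to_fix=5, risk_of_fix=4))  # leave alone
print(bucket(impact=4, contagion=4, time_to_fix=5, risk_of_fix=3))  # needs discussion
```

Note how much lands in the third bucket: anything with mixed scores falls through both checks, which matches our experience that most debt items need a conversation rather than a rule.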

The Easy Stuff
For the first bucket, we recommended that teams create a debt budget and spend it to pay these items off as part of their standard process. A budget frees technical roles to do the maintenance and technical work they know needs to be done, without having to justify, prioritize, or plan such work.

The Hard Stuff
Figuring out when the expensive and/or risky stuff takes priority over business features is harder. Our measurement framework gave us a common vocabulary for debt and a shared understanding of it, which helped set the stage for more substantive discussions around prioritization and strategy, but that was only the beginning. Technical staff still needed to persuade stakeholders of how, when, and even whether the work should be done.

Having a plan, or several alternative plans, for how to tackle a debt item tends to achieve the best outcome. It’s important to be thorough, practical, and willing to compromise: a staged, gradual refactoring of a system is a lot easier to sell than a stop-everything rewrite. Whatever the actual plan, forward progress and partial debt payoff are always better than none at all.

Making it Happen
During the 90-day window of the Operational Excellence initiative, with broad support from R&D leadership and high visibility, we were able to make real progress both culturally and tactically:

  • We kept up regular messaging of our efforts from the Debtheads team and R&D leadership
  • We printed promotional one-page posters on each debt category and posted them around campus
  • We held a “Lunch and Learn” to share our findings and keep people engaged
  • We had Debtheads stickers printed (engineers love stickers)
  • We had Debtheads t-shirts printed, and used them to incentivize debt work
  • We worked directly with squads to validate and refine our measurement framework by applying it to tickets in their backlogs
  • We added labels and measurements to our agile ticketing system to capture debt measurements inline and start reporting on the data
  • We started tracking debt items in our internal release notes, and made a point of calling out and celebrating debt work as it shipped
  • We solicited case studies of successful debt work and shared them with the wider R&D organization

At the Operational Excellence closing event we were able to demonstrate a substantial workload of debt tickets completed during the initiative. We shared our blueprint for a scalable approach to operationalizing debt management, and Procore was abuzz… had we finally figured out how to tame this technical debt beast?

The Path Forward

An Ounce of Prevention…
Technical debt isn’t a problem you fix once. Like financial debt, it calls for effective ongoing management, which leads us to the debt we have yet to incur. Prudent, responsible debt management going forward is arguably more important than paying off existing debt. I like the battlefield triage metaphor here: a medic must first stop the bleeding and stabilize a patient before assessing any further treatment. Minimizing the incurrence of new debt gives us some measure of breathing room to assess and plan repayment of our existing debt.

To this end, I started a series of in-house Sustainable Development workshops aimed at improving process health. Since unsustainable development processes almost guarantee technical debt, taking the time to assess and improve predictability and sustainability is an effective preventative measure. I also started teaching in-house courses on Test-Driven Development, which has both technical and non-technical advantages for reducing technical debt.

What’s Next?
The conversation around technical debt is alive and well here at Procore. We were very successful in beginning the cultural shift required for us to succeed at this long term. The idea of debt budgets is gaining traction, and across the organization we’re having healthy discussions about the balance between feature development and technical maintenance work. With the cultural shift underway, we’re ready to tackle the next phase in this journey: making technical debt management an integral part of our standard operating procedure.