Intro and problem statement
Early this year, the book Software Architecture Metrics: Case Studies to Improve the Quality of Your Architecture was published. This book is very special for Apiumhub, as Christian Ciceri, Apiumhub's co-founder and chief architect, is one of its co-authors.
Christian wrote a chapter that is particularly useful in contexts where the architecture and environment still have many opportunities for improvement. He emphasizes the importance of private builds in the development process, and the principle behind his reasoning is easy to understand: you should run private builds and their tests on your local machine, in an environment as close as possible to production, to get feedback on your build as soon as possible.
This allows every member of a development team to push reliable code to the pipeline, minimizing the number of bugs and their impact in later stages, namely QA and production, as seen in the next picture:
Graphic by Christian Ciceri, from the "Private Builds and Metrics" presentation at GSAS’22
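To make the idea more concrete, here is a minimal sketch of a private build script in Python. It assumes a pytest-based project whose test pyramid lives in three hypothetical directories (`tests/unit`, `tests/integration`, `tests/e2e`); the names `PYRAMID_STAGES`, `private_build`, and `main` are illustrative and not part of any tool. The script runs the cheapest layer of the pyramid first and stops at the first failure, so the developer gets feedback as early as possible. It does not cover making the local environment production-like, which is a separate concern.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Hypothetical layout: one directory per layer of the test pyramid,
# ordered from the fastest and most numerous tests to the slowest and fewest.
PYRAMID_STAGES: list[tuple[str, list[str]]] = [
    ("unit", [sys.executable, "-m", "pytest", "tests/unit", "-q"]),
    ("integration", [sys.executable, "-m", "pytest", "tests/integration", "-q"]),
    ("end-to-end", [sys.executable, "-m", "pytest", "tests/e2e", "-q"]),
]


def run_command(cmd: Sequence[str]) -> int:
    """Run one stage as a subprocess and return its exit code (0 means success)."""
    return subprocess.run(cmd).returncode


def private_build(
    stages: Sequence[tuple[str, Sequence[str]]],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> tuple[bool, list[str]]:
    """Run the stages in order and stop at the first failing one (fail fast).

    Returns (success, names of the stages that were actually run).
    """
    executed: list[str] = []
    for name, cmd in stages:
        executed.append(name)
        print(f"[private build] running {name} tests...")
        if runner(cmd) != 0:
            print(f"[private build] {name} tests failed; fix them before pushing.")
            return False, executed
    print("[private build] all stages passed; safe to push.")
    return True, executed


def main() -> int:
    """Entry point returning a process exit code, e.g. for a Git pre-push hook."""
    ok, _ = private_build(PYRAMID_STAGES)
    return 0 if ok else 1
```

One way to enforce it is to call `main()` from a Git `pre-push` hook and exit with its return value: Git aborts the push when the hook exits with a non-zero status, so unverified code never reaches the shared pipeline.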
The idea in Christian’s chapter got me thinking, and I concluded that not using private builds is a particular case of a broader problem that affects many development teams.
Not using private builds can be seen as a type of process debt, and this is what this article is about.
While preparing it, I also tried to figure out who within a company could have the most impact on reducing this debt, which is why this text is mainly addressed to technical managers (engineering managers, directors, and VPs of engineering).
What is process debt? Technical debt and process debt
Process debt is the implied cost of having to perform additional work or actions caused by not having the right elements in place in our development process, environment, and/or workflows.
It can be seen as an analogy to (or an extension of) technical debt, a concept coined by Ward Cunningham: the implied cost of reworking a part of our software because we chose a limited approach that we could code faster to solve an immediate need, instead of the approach that would best suit the project in the medium or long term.
Technical debt is a conscious choice by the team. It is not messy code (Uncle Bob talks about this difference); it is an informed decision to implement a suboptimal technical solution at a given point in time in order to prioritize other factors (normally, minimizing feature delivery time).
The biggest problem with technical debt is that it involves a toll that you pay until you fix it.
When creating technical debt, we choose to build a house on a foundation that will need improvements in the future, and in doing so we pay a twofold price.
First, we pay a recurring price: the more we build upon the suboptimal foundation, the more effort we have to invest every day to make sure that everything we build works despite the missing solution that should have been there in the first place.
Second, we pay a cumulative price: the more we build upon the foundation and the more time passes without fixing it, the more effort we will have to invest to fix it in the future. On top of that, we will probably also have to refactor all the workarounds we have been adding day after day to keep building on our technical debt.
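The twofold price can be illustrated with a toy model. Everything below is hypothetical, including the class name `DebtModel` and all of its parameters; it is not a measurement method, only a way to show the shape of the two costs: a recurring overhead proportional to the work already built on the suboptimal foundation, and a fix cost that grows with every workaround added on top of it.

```python
from dataclasses import dataclass


@dataclass
class DebtModel:
    """Toy model of the two costs of technical debt (all parameters hypothetical)."""

    work_per_sprint: float  # feature work built on the suboptimal foundation each sprint
    interest_rate: float    # recurring overhead per sprint, as a fraction of the work built so far
    base_fix_cost: float    # cost of the proper fix if nothing had been built on top yet
    rework_factor: float    # extra fix cost per unit of work built on top (workarounds to undo)

    def recurring_cost(self, sprints: int) -> float:
        """Total overhead paid sprint after sprint while the debt stays unfixed."""
        built = 0.0
        total = 0.0
        for _ in range(sprints):
            built += self.work_per_sprint
            total += self.interest_rate * built
        return total

    def fix_cost(self, sprints: int) -> float:
        """Cost of paying the debt back after `sprints` sprints of building on it."""
        return self.base_fix_cost + self.rework_factor * self.work_per_sprint * sprints

    def total_cost(self, sprints: int) -> float:
        """Recurring price already paid plus the cumulative price of fixing it then."""
        return self.recurring_cost(sprints) + self.fix_cost(sprints)
```

With `DebtModel(work_per_sprint=10, interest_rate=0.1, base_fix_cost=5, rework_factor=0.2)`, fixing the debt right away costs 5 units, fixing it after 3 sprints costs 6 + 11 = 17, and after 6 sprints 21 + 17 = 38. The exact numbers are arbitrary; the point is that both terms keep growing for as long as the debt is left in place.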
Process debt is quite similar to technical debt. Teams choose not to properly prepare and maintain certain elements of their environment or process to favor feature delivery, and they pay a price until this debt is fixed, usually in terms of time and/or reliability of the value delivery.
Process debt and daily team dynamics
In my experience, process debt is widespread in many companies and is often much more harmful than technical debt. Technical debt is usually known and acknowledged, and fixing it is frequently scheduled for the future; process debt, however, is something that a development team “learns to live with”, usually without realizing its negative impact on the pace of value delivery.
Some examples of process debt are:
- Not having production-like datasets to run proper tests in QA
- Not using private builds to run most of the test pyramid locally
- Not having logs, monitoring tools, and metrics in place to detect problems in an environment
- Using manual steps that could be automated
- Knowing that some things do not work in one environment and have to be tested in other environments
- Having inconsistencies between environments
- Not having fast/standardized ways of deploying new instances or containers
- Not having suites of load or stress tests in place
In general, process debt is anything that differs from what a development team would consider an ideal workflow and environment given the project context.
I am aware that it is sometimes almost impossible to have a 100% ideal environment or process, but that does not mean that the development team should settle for whatever it has and stop striving to improve it.
I have seen and experienced many of the problems above while working with different companies. As I mentioned before, the biggest challenge I have found in addressing them is that, although most developers are aware that the situation is far from ideal, they stoically accept it. This is mostly because the priorities set by the product side always focus on value delivery, and product stakeholders see the time allocated to fixing processes or technical debt as something that will prevent or delay feature deliveries, instead of an investment that will improve the pace of value delivery in the future.
This is where engineering managers should step in and push for change.
The role of tech management
People in tech management positions, such as tech leads and, most importantly, engineering directors and VPs of engineering, are key figures in solving process debt.
In Agile environments, development teams are self-managed and their main focus is delivering value, so the focus of tech managers should be improving processes. We should never micromanage people; instead, we should always be looking for ways to improve the way we do things so that teams can work more efficiently and have a better experience working on the project.
Both the development team and the tech managers should always question and challenge the company's processes, but developers already carry a heavy load: at the end of the day, they are the ones delivering value. It is therefore the responsibility of tech management to push for and facilitate these improvements and reduce process debt.
Following this idea, tech managers should advocate within the company and educate the rest of the management team on the importance of investing time in process improvement.
The false dichotomy of value vs process/tools
Let’s step out of software development for a moment and imagine you are in the kitchen of a top-level restaurant, with a head chef and a team of chefs.
Could you imagine them asking for time or permission to sharpen their knives?
Or asking permission to clean the oil and leftover food off the worktop?
The answer is no, obviously.
Everyone knows that a chef who is sharpening a knife is not cooking any dish, yet nobody disputes that, to produce the best end product, time must be invested in keeping the chef's tools and processes in the best possible shape.
Another example of this principle, much more directly related to software development, comes from game development.
The man in the picture above is John Romero, co-founder of id Software and co-creator of some of the most successful games ever (Wolfenstein 3D, Doom, and Quake), during a recent talk about the software development principles his team followed in those days.
In any software development company, the statement above translates directly into: “Great workflows make great products. Take care of your processes as much as you take care of your code.” I admit I have taken some liberties with the analogy, but I think it is quite valid.
Sadly, this culture is rare in software development companies today. In many environments the mantra is always “deliver new features as soon as possible”, minimizing or ruling out any initiative aimed at keeping a healthy workflow that helps the team perform its tasks.
The problem with this approach is that the situation degrades quickly. The longer you go without paying attention to your environment and tools, the more time it will take to put them back in good shape once you start addressing them, and the more resistance you will face from other departments when asking to invest that time in the project. This is the price we pay, as mentioned before.
Culture of CI/CD
Keeping your process in shape should follow the same philosophy as CI. In Martin Fowler’s words: if it hurts, do it more often (frequency reduces difficulty).
The development team should regularly ask the following questions to map the pain points described in the preceding sections:
- Do we have a meaningful testing pyramid in place?
- Do we have proper datasets to run the testing pyramid both locally and in other environments?
- Are we using private builds that allow us to run most of the test pyramid locally?
- Do we have meaningful logs, monitoring tools, and metrics in place?
- Have we automated all the manual steps that can be automated?
- Are our environments consistent? Do all our tests behave the same way regardless of the environment?
- Are our load and stress tests up to date?
To push improvements, tech managers should always keep an eye on the questions above and discover the areas that hurt the development teams' delivery pace and delivery reliability.
Nowadays, top organizations deploy software at an almost insane rate. According to the book The Phoenix Project, the highest-performing organizations deploy considerably more frequently than other companies.
This deployment frequency is possible not just due to the development team’s seniority, but also because the processes within the companies are exceptionally healthy, efficient, and continually maintained.
Not only CI culture but also testing capabilities, rollback policies, production-like datasets, and so on are often the key differences between a typical enterprise or startup and a high-performance company.
When we consider the time lost to unreliable testing, manual steps, manual deployments, bugs caught in later stages of our pipeline, and so on, we may admit that some of the projects we have worked on could have been finished a few weeks earlier, with a better developer experience along the way.
This is where tech management can make a difference. Tech managers should always strive to drive their teams towards this level of perfection, which is often overlooked.