TSM - Our Fight against Technical Debt

Septimiu Mitu - Development Lead

When implementing new functionality you have two options: blue or red - quick and dirty or smart and clean. If you choose the first option you create a debt that has to be paid at some point. If you go with the second option it takes longer to implement new features, but it makes change easier in the future.

The graphics were adapted from Scrum.org, PSPO course

Awareness

Our project started from scratch, so in the beginning all code was new. As the project progressed and sprints came and went, we were sometimes hard-pressed for time, usually at the end of the sprint, because we had overestimated our ability to deliver. At this point we started to care a bit less about sound engineering practice and we just got the job done.

The client architect would provide input very late and when he did, we saw that some of our code was not very close to his guidelines. Sometimes we patched things up to make them work and get the thing released.

We love the concept of emergent architecture, whereby the "big picture" of how the system should work emerges with the system, during its creation, and people add and change the system components as it grows out of the requirements. This is a very nice way of fighting uncertainty - we don't know the final requirements of the system, so we are building and designing it as we find out more.

By adding a component here, a component there, our application was starting to resemble Frankenstein. It did all it needed to do, that is for sure, but it was becoming harder and harder to understand what each piece did and if it still had a purpose.

We took another quality hit when some of our specialized people were on leave - for example Alex, our only strong BPM developer. When he was away, the other guys writing BPM would make some slightly uninformed decisions, which decreased the code quality. People call this the "bus factor" - how many people would need to be run over by a bus in order to cause the team to stop working properly. Our "bus factor" for certain things was one. To mitigate this issue we tried to pair Alex with Stefan and Cosmin, who started to learn BPM.
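To make the idea concrete, here is a minimal sketch of how a team could track its bus factor per skill area. It is not something we ran on the project; the skills map, the "Backend" area and the function names are hypothetical, and only the people's names come from our team.

```python
# Hypothetical skills map: who can work confidently in each area.
# The names come from the article; the assignments are illustrative only.
skills = {
    "BPM": {"Alex"},
    "Backend": {"Alex", "Stefan", "Cosmin"},
}

def bus_factor(skills):
    """Return the number of people covering each skill area."""
    return {area: len(people) for area, people in skills.items()}

def at_risk(skills, threshold=1):
    """Return the areas covered by at most `threshold` people, sorted by name."""
    return sorted(area for area, people in skills.items() if len(people) <= threshold)

print(bus_factor(skills))
print(at_risk(skills))
```

Adding Stefan and Cosmin to the "BPM" set, which is what pairing aims for, takes that area off the at-risk list.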

We need to change

As we continued sprinting, the project codebase became larger and larger. Whatever we had swept under the rug was starting to surface. People were unable to write code until they "fixed" something or they "made it work again". What used to be a quick fix now became quicksand.

We started to hear a recurring beat during the daily Scrum: my task took twice as long because I had to dig, fix and refactor the system to make it stable enough to work on. Needless to say, people started to get frustrated. Developers felt they were being held back and were getting angry because they were fixing stuff more than writing new code. The project leadership was getting alarmed by the lower team velocity. At the same time, everyone ended up working overtime to try to get as much as possible done in the sprint.

We were using both a physical board and Jira because our client, including the Product Owner, was not co-located with the team. The internal communication revolved around the physical board, the heart of our team. The communication with the PO and the outside, as well as reporting, was done using Jira. Over time, Jira became more and more difficult to follow because of unclosed tasks, changed user stories and unplanned tasks. From Jira it became impossible to tell the real state of the project.

At a certain point, the problem became so painful that we were no longer able to work.

Make it visible

We needed to identify the extent of the technical debt and to display it - on the board, in Jira, on Confluence, the tools we used to share information. The tough part was discussing it with the client, admitting that we had a problem and getting their support.

Some of the actions we started were a full architecture review with the client architect and team code reviews. We also extended the use of Sonar, our Code Quality tool, and Jenkins dashboards to let us know the state of the Continuous Integration system. Our colleague Madalin from Endava Cluj even created a dashboard that gives stakeholders a one-page view of the most significant metrics around quality for all teams working on the same account. It shows the number and types of bugs and the trends in their number, the unit test code coverage and how the automated build system is faring, as you can see in the picture below.

Pay it back

There are three steps in this process, according to the PSPO course from Scrum.org:

  1. Stop creating debt
  2. Make a small payment
  3. Repeat from 2

After acknowledging the size of the debt problem and putting the related tasks up on the physical board, we started paying back some of it every sprint. At some point we realized there was so much technical debt that we needed to take a sprint off from building new functionality and just pay it back. We've stopped creating technical debt.
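The arithmetic behind the three steps can be sketched in a few lines. This is only an illustration under stated assumptions: measuring debt in hours, the function name and its parameters are all hypothetical, not figures or tooling from our project.

```python
def sprints_to_clear(debt_hours, payment_per_sprint, new_debt_per_sprint=0):
    """Count the sprints needed to bring the debt to zero.

    Returns None when each sprint adds at least as much debt as it repays,
    which is why step 1 is "stop creating debt".
    """
    if payment_per_sprint <= new_debt_per_sprint:
        return None  # the debt never shrinks
    sprints = 0
    while debt_hours > 0:
        # Step 2: make a small payment (net of any newly created debt).
        debt_hours += new_debt_per_sprint - payment_per_sprint
        sprints += 1  # Step 3: repeat from step 2
    return sprints

print(sprints_to_clear(100, 10))
print(sprints_to_clear(100, 10, new_debt_per_sprint=10))
```

With no new debt, a steady payment always clears the balance eventually. As soon as new debt matches the payment, the loop never ends, so the function reports that case instead of running forever.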