TSM - Zero Bug Software Development and Four Amigos

Tiberiu Cifor - Engineering Manager

We all know that the way in which software products are delivered has changed a lot in the last few years. There is an increasing need to deliver things faster and to ensure that what we deliver is of high quality. Some years ago, development teams had enough time to design an architecture and to plan carefully what they developed. In the last few years, the same teams no longer enjoy that amount of time. For this reason, work methodologies have come to focus heavily on the time required to deliver a product.

We live in a globalized world, a state of affairs which can be seen in the way teams are built. Some time ago, development teams shared the same location. This is no longer the case. The majority of development teams are distributed across different time zones and various cultural contexts. In spite of all this, the standard expected of software product quality is as high as ever.

Working in this complex environment, teams must deliver results more efficiently, must respond rapidly to market requirements, and must deliver products of impeccable quality. Why am I mentioning this? Nowadays, a product which does not answer customer needs, which does not work correctly or which contains many errors will be penalized by the market very quickly. There is a very simple reason for it: there are many alternatives on the market. The competition is high, no matter what field you work in, which puts extra pressure on the teams which must deliver high-quality products.

Quality - a small analysis

I hear the following expressions all the time: "high quality", "product quality", "process quality", etc. Over the years, experience has shown me that we all have a different perception of quality. It would be simpler if this term meant something that left no room for interpretation. It seems things are more difficult than we would like to believe. From the point of view of CEOs, product quality is a matter of how well the product sells. If CEOs see that there is a healthy cash flow in their business, they know they have a good, high-quality product. Let's take a developer's case: if developers notice that there are only a few bugs reported for their code, they believe they must have written high-quality code, right? Moving on, if testers keep discovering errors in the application they are testing, they believe they are doing quality work. What about the marketing director? If marketing directors notice that the market feedback is positive as a result of the campaigns they built and implemented, they will believe they did a fine job. The examples could continue, but what I want to highlight is that people have their own perception of what quality means. In the end, we can have a team where each member produces quality results individually, but where things actually look different as a whole.

I believe there is a close relationship between value and quality. For example, in the development process, everything starts with an idea based on a detailed and refined analysis. The idea is then translated into clear specifications. Next, these specifications are implemented and tested. Obviously, all these steps are monitored. This is what the development process looks like in general terms. In the example above, for each step of the process, we consider that the idea and its implementation have value. In other words, we have an assumed value.

When a customer receives this value, starting off from an idea, we can say that we bring value through what we deliver. Only then can we say that the idea is valuable. In other words, we started off from an assumed value and we got to the real value. It sounds a bit complicated, but if we analyze things a bit, we notice that it all adds up. What matters is to get to the real value, the only one which indicates whether we generated quality. The image below shows how we progressed from idea to value.

Quality functions

In close connection to the things mentioned above, I read some interesting things about quality not so long ago, especially about the way we transform assumed value into real value. The mechanism behind this transition is called "quality functions". Since it looks interesting, I will detail what these quality functions are.

The final result of a project is of high quality when these quality functions are applied at each step of the project: idea, research, specifications, implementation and testing, installation, production, monitoring. These quality functions are a mechanism that can transform assumed value into real value. Value is impossible without quality, and quality is impossible without value. These elements have a Yin-Yang relationship.

In the IT world, we like to express ourselves in terms of programming languages. These quality functions can be sketched as the following function:

Quality()  {
  researchAndRefine()
  specify()
  buildAndTest()
  deploy()
  monitor()
}

This code merely highlights the fact that quality is nothing more than the application of these quality functions in each phase of project development.

Quality and 0-bug-software

0-bug software? How is this possible? We know that software products contain bugs, no matter how well the code is written and no matter how perfectionist we are. At some point, bugs manifest themselves, either after the product is used intensively or when corner cases are discovered. How can these errors be best managed? Have you ever wondered if there is something extra that can be done to make these frustrating situations isolated or nonexistent?

"0-bug-software" is an interesting approach to this problem. This way of doing work requires more discipline at all levels, starting with developers, testers, business owners and ending with product managers. Actually, I believe that this is the greatest challenge of this way of work: imposing discipline on all work levels.

How does the process work?

We start by inviting product owners to classify all existing tasks into the following categories:

The next step is for the development team to prioritize these tasks. To describe priority levels I will make an analogy with a physical store which sells consumers certain products:

Critical problems. These problems prevent the store from functioning. The store is on fire, and unless we extinguish the fire, we will not have a store. This is pretty clear, right? The priority given to solving such problems is very high: customers no longer get product value. In other words, it is like throwing money out the window. Message for the development team: stop whatever you are doing and solve the problem.

Errors. The following analogy is suggestive: errors are like a water leak in your store. If you let too much water in, you might end up with a flood and you will have to close the store. Errors therefore prevent customers from enjoying the products they should benefit from (I refer to functionalities). What must the development team do? They must fix the errors as soon as they finish the things they are working on at the moment.

Functionalities. Functionalities are like the products marketed in stores. Stores cannot exist without products, and products cannot exist without a store where they are sold. It is the combination of the two that ensures success. A task is a functionality when what it describes does not already exist in the actual product. When should work be done on these functionalities? They are tackled according to backlog priority.

Improvements. Here we discuss the things that bring your store something extra. These are the things which attract customers or which make customers return to your store. Think of the things that differentiate one store from other stores. These are the tasks that improve an already existing functionality. As is the case with any other functionality, they must be worked on according to their backlog priority.

This is the "0-bug-software" approach in brief. As I mentioned at the beginning of this section, discipline must be flawless. No exceptions are allowed, and no deviations from this behavior. At first sight, this looks like a very rigid system, but it is not. It ensures an ordered way of tackling tasks and anything else that must be implemented in the software product.

We are usually accustomed to certain error classifications. Development teams work on errors according to their priority, which means that there can be critical errors, major errors and minor errors. In the "0-bug-software" approach, a task is either an error or it is not. This picture has no shades of grey: it is either black or white. In other words, if the answer to the question "Can you live without this task?" is "yes", then the task is not an error, but an improvement of the system. Error classification is binary.

Of course, we cannot put all errors together. They must be filtered and differentiated, because they are not all the same. Therefore, this way of work implies a classification of errors:

Please read carefully through the way errors are classified. Isn't it right that you can put an error into any of these categories? Of course, there might be debates regarding this classification. Yet, it is important that we be honest within the team and that we admit each time an error is part of a group enumerated above. It's important to know that any error can be classified as one of the following:

I liked the following definition: "By enforcing a strict set of classification and handling rules, you get prioritization discipline for free."

Here are some examples regarding the way in which we can classify errors in the categories above.

Example 1.

Is the encountered error one we can live with? Does the error still allow the application to run correctly as a whole? For example: Images should be downloaded and cached in another thread. In this case, the signaled error will be reclassified as improvement.

Example 2.

Does the lack of specifications entail a new functionality? For example: In the page that displays users, some fields cannot be edited in line. In this case, the signaled error will be reclassified as a new functionality.

Example 3.

The incomplete description of functionality has a major impact on the business. For example: A page counts the number of times a product was clicked, when we in fact want to know how many times the product was actually ordered. This is the hypothetical example of an online store. In this case, the signaled error is reclassified as a critical problem.
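The three examples above can be sketched as a small triage function. This is only an illustration under stated assumptions: the category names mirror the ones used in this article, while the `Report` fields and the `triage` function are hypothetical names introduced here and do not belong to any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CRITICAL = "critical problem"
    ERROR = "error"
    FUNCTIONALITY = "new functionality"
    IMPROVEMENT = "improvement"


@dataclass
class Report:
    """A reported 'error' plus the answers given during triage (hypothetical fields)."""
    title: str
    blocks_business: bool   # does it have a major impact on the business?
    feature_exists: bool    # does the affected functionality already exist in the product?
    can_live_without: bool  # "Can you live without it?"


def triage(report: Report) -> Category:
    """Put a reported error into exactly one category."""
    if report.blocks_business:
        return Category.CRITICAL  # Example 3
    if report.can_live_without:
        # Not an error: either an improvement of something that exists
        # (Example 1) or a functionality that does not exist yet (Example 2).
        if report.feature_exists:
            return Category.IMPROVEMENT
        return Category.FUNCTIONALITY
    return Category.ERROR


caching = Report("Download and cache images in another thread",
                 blocks_business=False, feature_exists=True, can_live_without=True)
print(triage(caching).value)
```

The order of the checks is a design choice of this sketch: the business-impact question is asked first, so that a critical problem is never downgraded by the later questions.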

As mentioned earlier, this system requires discipline from all those involved in the process, from the management team and business analysts down to the development and test teams. The greatest effort occurs when the work style mentioned earlier is actually initiated. From the get-go, all tasks must be analyzed and classified as detailed above. No exception! Ok, you might say, but our backlog contains hundreds or thousands of tasks. I would say this is a good time to start the cleanup. I am convinced that many of those tasks are very old and that many of them are no longer reproducible. Of course, the development team must not wait until all tasks are properly classified. After 10-20% of the tasks are revised and reclassified, the development team can start working. It's important that all tasks be classified as detailed above. It requires a huge effort in the beginning, but things will then work quite easily.

The great disadvantage of this system is that all the players need to be involved 100%, even people who are part of management or who are oriented towards the business side. This is where we stumble upon their limited availability in terms of time. There are solutions for this situation as well. I recommend weekly meetings in which these people are involved, thus helping with the reclassification. These meetings must be quick. During these meetings, the tasks will be revised and reclassified as per the system outlined above. After a couple of such sessions, things will fall into place and the task list will shrink. For large products, where development teams are large, I recommend the "Three Amigos" model (which I will detail briefly in what follows). The model brings together people who have different responsibilities, which will help usher in a more efficient communication process among them.

Now we should clearly specify the order in which we process these tasks. The critical situations come first. Then come the tasks already in progress, then the errors, and finally the backlog tasks, namely the enhancements and new functionalities.
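This processing order can be sketched as a sort key. The sketch below is a minimal illustration, not a prescribed implementation: `Task`, `ORDER`, `processing_order` and the `backlog_rank` field are hypothetical names, and the numeric ranks are an assumption made only to express the order described above.

```python
from dataclasses import dataclass

# Lower number = handled earlier. The order follows the text;
# the numeric values themselves are an assumption of this sketch.
ORDER = {
    "critical": 0,
    "in_progress": 1,
    "error": 2,
    "backlog": 3,  # enhancements and new functionalities
}


@dataclass
class Task:
    title: str
    kind: str              # one of the keys in ORDER
    backlog_rank: int = 0  # position in the backlog; only relevant for "backlog" tasks


def processing_order(tasks: list[Task]) -> list[Task]:
    """Return the tasks in the order the team should pick them up."""
    return sorted(tasks, key=lambda t: (ORDER[t.kind], t.backlog_rank))


tasks = [
    Task("Inline editing of user fields", "backlog", backlog_rank=2),
    Task("Checkout page returns an error", "error"),
    Task("Site is down", "critical"),
    Task("Cache images in a background thread", "backlog", backlog_rank=1),
]
for task in processing_order(tasks):
    print(task.kind, "-", task.title)
```

Errors carry no rank of their own here, which reflects the binary classification: an error is simply handled before any backlog item.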

Kanban board would look like this

This concept is not new. It was first mentioned in the '60s by Philip Crosby, a legendary quality expert from the USA. Like many other concepts, it was first used by the aviation industry, and then, in the '90s, by the auto industry. In the beginning, Crosby called this the Zero-Defect methodology.

Three Amigos

Now, let's have a look at an interesting concept, which is implemented in some companies: "Three Amigos". It is important to note the connection between this concept and "0-Bug-Software". These two concepts go hand-in-hand and it is interesting to see how we can deliver them faster and more efficiently.

It is worth clarifying what this concept is, how it came about and what its main features are. We know that many companies have adopted Scrum or Kanban. Entire departments align themselves to function according to well-established rituals like those of Scrum.

Where did the "Three Amigos" concept originate? Given that different actors are involved in the entire Scrum process, each comes with their own baggage of knowledge, that is, their own level of understanding. However, people understand things in their own language. More exactly, product owners speak about UAT and user stories, business analysts speak about specifications or acceptance criteria, developers speak about code and unit tests, while QA people speak about scenarios and test cases. Each of these players has their own language and their own level of understanding, even if they work within the same team.

This environment is prone to generating confusion, complexity and ambiguity. To be more exact, confusion emerges when the team needs to establish what is needed for a functionality to be complete. Complexity manifests itself when one of the players involved modifies the process without estimating the impact this will have on the other players. Ambiguity may emerge when it is unclear whether one player's work duplicates something another player has already done. For example, do the unit tests fully cover the functionality or do we also need manual tests?

Therefore, we have some players involved in the process, who, given their knowledge and totally different language can generate several levels of confusion, complexity or ambiguity.

What does the "Three Amigos" model entail? This is another meeting, which is added to the Scrum meetings, and which should solve all the problems enumerated above. Another meeting? You may find it absurd, because teams which adopt Scrum have a high level of resistance to accepting a new meeting. It is not absurd. This meeting happens anyway, but the people who are part of the development teams are not involved (I refer to testers and developers). This is a meeting where certain specifications are discussed, and where the functionality to be implemented is detailed and analyzed. This meeting usually happens before tasks reach the backlog. It is assumed that a task reaches the backlog only after it is validated.

How is a task validated? This is easy to do. In this meeting, there will be one representative from each of the business analysis, development and testing teams. Therefore, this meeting will include 1 BA, 1 developer and 1 QA.
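The rule that a task reaches the backlog only after the three representatives have validated it can be sketched as a simple gate. This is a minimal sketch under assumptions: `Story`, `REQUIRED_ROLES`, `approve` and `add_to_backlog` are hypothetical names chosen for illustration, and a real team would record validation in whatever tracking tool it already uses.

```python
from dataclasses import dataclass, field

# The three roles named in the text; the identifiers are an assumption of this sketch.
REQUIRED_ROLES = frozenset({"BA", "developer", "QA"})


@dataclass
class Story:
    title: str
    approvals: set[str] = field(default_factory=set)

    def approve(self, role: str) -> None:
        """Record that the representative of `role` has validated the story."""
        if role not in REQUIRED_ROLES:
            raise ValueError(f"unknown role: {role}")
        self.approvals.add(role)

    def is_validated(self) -> bool:
        """True only when every required role has approved."""
        return REQUIRED_ROLES <= self.approvals


def add_to_backlog(backlog: list[Story], story: Story) -> bool:
    """Append the story only if all three amigos have validated it."""
    if not story.is_validated():
        return False
    backlog.append(story)
    return True


backlog: list[Story] = []
story = Story("Count orders instead of clicks")
story.approve("BA")
story.approve("developer")
print(add_to_backlog(backlog, story))  # QA has not approved yet
story.approve("QA")
print(add_to_backlog(backlog, story))
```

Keeping the required roles in one set means the gate can be adapted by editing a single line if the group of participants changes.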

There are many advantages to having all these people in the meeting. Remember that, at this stage, specifications are detailed, questions are answered and things are clarified. This is very important because people get to see whether they are on the same page in terms of development, testing and business. A common language slowly emerges, so that everybody understands the bigger picture as well as the tasks that fall to them. The development team will also understand why a certain functionality is implemented and how it will be tested, and the BA will get the opportunity to see exactly how the development team interprets everything.

I would recommend that any new functionality or any enhancement of an existing functionality go through this filter, the Three Amigos one. This ensures consistency in the way tasks are tackled. It helps guarantee that a given functionality is ready to enter the development stage with complete specifications. It also provides answers to open questions, which makes the estimates that follow more accurate.

It is worth mentioning what the benefits of this approach are. Three Amigos:

Given that things evolve constantly and rapidly, I would like to make a recommendation. I would add another player to this group, so that the group becomes Four Amigos. This is for good reason. Taking into account that today's applications contain pretty complex interfaces, I would add a UI/UX person or a frontend developer. The great advantage of having this user-experience person among us is that we can identify, at this stage, various interface problems, limitations and issues which would increase development time. The UI part takes up a large share of the implementation time, becomes very complex and generates many problems. To mitigate the impact of such problems, go with the Four Amigos approach. You will start seeing the benefits too.

Last but not least, I would go even further and apply what we discussed at the beginning of this paper on "0-bug-software". We can implement this methodology at group level, namely at "Three Amigos" or "Four Amigos" level. Everything we discussed about errors, functionalities and enhancements still applies, but at story/epic level.

The idea is to identify how we can function better and more efficiently in each team. The experience of team members is very important, as is how open these people are to change and how well they can juggle such methodologies. It is important to try out new things, to understand whether the system works or not and then to adapt. Nobody wants fully-fledged processes which do not ensure the level of efficiency desired by both the customer and the team.

It is important to make sure that every process we use is scalable. Products evolve, processes evolve, things move at a fast pace and we must attempt to make processes scalable every time we reach a critical point, no matter what that critical point may be. In a future paper, I will share the way we can make processes scalable. Until then, I wish you the best of luck in everything you do, and remember to be brave enough to try out new things, things that may lead to more efficient processes.