Over the past six months, I've been working with a customer who develops hardware appliances. They were struggling with user complaints about one of their products: bugs that only appeared after the device had been running for a few months. Needless to say, bugs like these require long and thorough investigation.
The appliance runs a real-time OS that is currently transitioning from a proprietary system to FreeRTOS. Most of the code is written in C, and it uses system calls to create and synchronize threads. The codebase is mature: it has been in production for about 10 years.
The first step was to perform a technical assessment of the code. This was done with a mix of visual inspection, analysis tools, and interviews with team members. The assessment established that the code was well written; the only significant issue was that inter-process communication was implemented inconsistently. Every team member recognized the lack of testing and agreed that having a safety net in place would be very useful. It was, therefore, time for the next step.
Defining the testing strategy typically begins with the pyramid of tests. Due to the unusual structure of this particular program, the strategy was less obvious than usual.
Real-time operating systems share one characteristic: instead of writing programs, developers write so-called "tasks", which the scheduler runs according to various policies. Tasks communicate with each other using IPC (inter-process communication) mechanisms such as queues and semaphores. We decided to try writing unit tests for one task, and see if any patterns emerged that could then be applied throughout the application.
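To make this concrete, here is a minimal sketch of two communicating FreeRTOS tasks. The task names, the queue, and the payload are invented for illustration; the customer's real tasks are more involved:

```cpp
#include "FreeRTOS.h"
#include "queue.h"
#include "task.h"

static QueueHandle_t sensorQueue;

// Producer task: pushes a reading into the queue every 100 ms.
static void sensorTask(void *params)
{
    (void)params;
    for (;;) {
        int reading = 42; // a real task would read the hardware here
        xQueueSend(sensorQueue, &reading, portMAX_DELAY);
        vTaskDelay(pdMS_TO_TICKS(100));
    }
}

// Consumer task: blocks until a reading arrives, then processes it.
static void loggerTask(void *params)
{
    (void)params;
    int reading;
    for (;;) {
        if (xQueueReceive(sensorQueue, &reading, portMAX_DELAY) == pdTRUE) {
            // process the reading
        }
    }
}

void startTasks(void)
{
    sensorQueue = xQueueCreate(8, sizeof(int));
    xTaskCreate(sensorTask, "sensor", configMINIMAL_STACK_SIZE, NULL, 1, NULL);
    xTaskCreate(loggerTask, "logger", configMINIMAL_STACK_SIZE, NULL, 1, NULL);
}
```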
But even when every task is correct in isolation, concurrency issues can appear: deadlocks, starvation, etc., which cannot be caught by testing the tasks individually. Another type of test is needed to find these kinds of issues. Our solution was stress tests: tests that send increasingly more requests, or speed up the system timer, up to the point where something breaks, giving us something concrete to investigate. I think of this technique as "speeding up time": issues that normally appear after six months can be surfaced in a week or less.
We have successfully implemented task tests, as you'll see below. The stress tests are, for now, a working idea that may change during implementation.
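To give an idea of the direction, a first sketch of the request-ramping variant could be a small host-side driver like the one below. Everything in it is hypothetical: the sendRequest hook, the rates, and the step duration are placeholders, not the customer's code:

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical hook into the system under test; a real harness would
// send an IPC message or a network request and report success/failure.
// Stubbed out here so the sketch compiles and links.
static bool sendRequest() { return true; }

int main()
{
    using namespace std::chrono;

    // Double the request rate until something breaks, then report the
    // failing rate so the underlying issue can be investigated.
    for (int rate = 10; rate <= 100000; rate *= 2) {
        const auto interval = microseconds(1000000 / rate);
        for (int i = 0; i < rate * 10; ++i) { // roughly 10 seconds per step
            if (!sendRequest()) {
                std::printf("failure at %d requests/second\n", rate);
                return 1;
            }
            std::this_thread::sleep_for(interval);
        }
        std::printf("survived %d requests/second\n", rate);
    }
    return 0;
}
```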
I paired with one of the customer's developers for 4 hours a week, in two sessions of 2 hours each. We started by setting up a test harness using CppUTest, a unit testing framework designed for testing embedded C/C++ code. Its advantage is that it consumes very little memory. The main alternative is GoogleTest, which is the framework the team eventually decided to adopt. Either works fine.
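For reference, a CppUTest harness needs very little scaffolding. A minimal setup looks like this (the test group name is illustrative):

```cpp
#include "CppUTest/CommandLineTestRunner.h"
#include "CppUTest/TestHarness.h"

// One test group per task under test; setup/teardown run around each test.
TEST_GROUP(DispatcherTask)
{
    void setup() {}
    void teardown() {}
};

// An empty test: if this compiles, links, and runs, the harness works.
TEST(DispatcherTask, HarnessRuns)
{
}

int main(int argc, char **argv)
{
    return CommandLineTestRunner::RunAllTests(argc, argv);
}
```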
We used the classic approach for writing characterization tests: first, write a test that initializes the task and see what breaks; then call a function from the test and see what fails; and so on. Whenever a test failed, we looked for ways to break the offending dependency so that the test could pass.
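In practice, the first characterization tests looked roughly like the sketch below; dispatcher_init and dispatcher_handle_message are hypothetical stand-ins for the real entry points of the task we tested:

```cpp
#include "CppUTest/TestHarness.h"

extern "C" {
#include "dispatcher.h" // hypothetical task under test
}

TEST_GROUP(DispatcherCharacterization)
{
    void setup()
    {
        // Step 1: just initializing the task reveals which
        // dependencies (OS calls, hardware access) get in the way.
        dispatcher_init();
    }
};

TEST(DispatcherCharacterization, RecordsObservedBehaviour)
{
    // Step 2: call one function and assert on whatever it actually
    // does today; the test documents behaviour, it doesn't judge it.
    int result = dispatcher_handle_message(0 /* message id */);
    LONGS_EQUAL(0, result);
}
```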
Luckily, dependencies on the OS were already extracted into macros. Breaking such a dependency just meant creating a header file, including it in the test build, and redefining the macro with whatever the test needed. It is much more time-consuming when direct calls to the OS are scattered around the code.
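As an illustration, a test override header might look like the following. OS_SEND_MESSAGE and the recorded counter are invented names, not the customer's actual macros:

```cpp
// os_port_test.h -- included by the test build instead of the real
// OS port header. OS_SEND_MESSAGE is a stand-in for a production
// macro that normally expands to the RTOS queue-send call.
#ifndef OS_PORT_TEST_H
#define OS_PORT_TEST_H

// Record what the task tried to send so tests can assert on it
// (defined once in the test runner).
extern int test_messages_sent;

#define OS_SEND_MESSAGE(queue, msg) (++test_messages_sent)

#endif
```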
Another type of dependency we encountered was related to starting, synchronizing, and stopping threads. One option was to replace the start, stop, and synchronization functions with empty or dummy implementations, thus keeping everything on the same thread. The problem was that threading is an important part of the system, so we wanted to test it. We ended up separating the threaded code from the synchronization code by extracting functions from each piece of code that runs in another thread. We then replaced all thread-related primitives with their pthread equivalents for the tests. This way, the choreography was tested separately from the implementation.
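Concretely, the mapping can live in a single header, so the same task code compiles against the RTOS primitives on the device and against pthreads on a developer machine. The macro names below are illustrative:

```cpp
// thread_port_host.h -- host-side replacements for the RTOS thread
// primitives (macro names are illustrative, not the customer's).
#ifndef THREAD_PORT_HOST_H
#define THREAD_PORT_HOST_H

#include <pthread.h>

#define TASK_HANDLE            pthread_t
#define TASK_START(h, fn, arg) pthread_create(&(h), NULL, (fn), (arg))
#define TASK_JOIN(h)           pthread_join((h), NULL)

#define MUTEX_HANDLE           pthread_mutex_t
#define MUTEX_INIT(m)          pthread_mutex_init(&(m), NULL)
#define MUTEX_LOCK(m)          pthread_mutex_lock(&(m))
#define MUTEX_UNLOCK(m)        pthread_mutex_unlock(&(m))

#endif
```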
All in all, writing tests for the tasks proved to be relatively easy. We also identified the patterns we could use to roll out the practice to all the other tasks in the system. A few task-specific issues are still expected, but those will be dealt with one by one.
After testing the task, my pairing partner presented the technique and the results to the rest of the team. They noticed a few benefits:
It is possible to write tests for this type of system. The team was initially very skeptical of the idea, but seeing it work completely changed the conversation.
We can now run tests on our own computers. The team is distributed, and some locations lack access to the device. With these tests, developers can make a change, run the tests on their own computers, and be more confident that nothing broke. (Of course, this is not enough by itself, but more on that in the next section.)
They also decided to use GoogleTest instead of CppUTest, because it simplifies certain testing tasks and because the device has enough resources to run it.
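For comparison, the CppUTest characterization test sketched earlier translates almost mechanically to GoogleTest (same hypothetical dispatcher functions; link against gtest_main or provide your own main):

```cpp
#include <gtest/gtest.h>

extern "C" {
#include "dispatcher.h" // hypothetical task under test
}

class DispatcherCharacterization : public ::testing::Test {
protected:
    void SetUp() override { dispatcher_init(); }
};

TEST_F(DispatcherCharacterization, RecordsObservedBehaviour)
{
    EXPECT_EQ(0, dispatcher_handle_message(0 /* message id */));
}
```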
Running tests on your own computer is great for fast feedback, but it is not enough, especially when the two configurations use different threading libraries (pthreads on the local computer, the real-time OS primitives on the device). So the next step will be to find a way to automatically deploy the software to a device and run the task tests there. Once we have that, it's a small step to a continuous integration system that builds, deploys, and runs the tests. Such a system will let developers who lack access to the device test their code on the real thing in a matter of hours.
To complete the testing strategy, we'll also need to write stress tests and figure out a schedule for running them. And, to improve the design, we'll need to refactor the IPC so that it is consistent throughout the codebase.
Writing automated tests on a real-time, embedded C codebase is not only possible, but very beneficial. The usual techniques apply: the pyramid of tests, characterization tests, and dependency-breaking techniques. There is additional complexity in having two testing configurations, one on the developers' computers and one on the device, but that is easy to solve with a well-built continuous integration system.
The most important factor in adopting tests is design. A clear integration layer around external dependencies and an easy way to inject test doubles make a big difference in the time spent writing the first tests. C and C++ have an advantage in this context: macros provide a way of breaking dependencies and injecting test doubles that is unavailable in most other languages. In my experience, the inherent complexity of C/C++ pushes developers toward clearer separation in design compared to Java, PHP, or C#, which simplifies the task of writing tests on existing code.