Securing Opensource Code via Static Analysis (I)

Raghudeep Kannavara
Security Researcher, Software and Services Group
@Intel USA

OTHERS

Static code analysis (SCA) is the analysis of computer programs that is performed without actually executing the programs, usually by using an automated tool. SCA has become an integral part of the software development life cycle and one of the first steps to detect and eliminate programming errors early in the software development stage. Although SCA tools are routinely used in proprietary software development environment to ensure software quality, application of such tools to the vast expanse of opensource code presents a forbidding albeit interesting challenge, especially when opensource code finds its way into commercial software. Although there have been recent efforts in this direction, in this paper, we address this challenge to some extent by applying static analysis on a popular opensource project, i.e., Linux kernel, discuss the results of our analysis and based on our analysis, we propose an alternate workflow that can be adopted while incorporating opensource software in a commercial software development process. Further, we discuss the benefits and the challenges faced while adopting the proposed alternate workflow. Keywords-Static Code Analysis; Opensource; Software Testing; Software Development Life Cycle

Although SCA is not new technology, of late, it has been receiving a lot of attention and is rapidly being adopted as an integral activity in the Software Development Lifecycle to improve quality, reliability and security of the software. Most proprietary software development establishments have a dedicated team of software validation professionals whose job responsibilities also include running various static analysis tools on the software under development, either in an agile environment or in a traditional waterfall model. SCA tools are capable of analyzing code developed using several programming languages and frameworks that include and not limited to Java, .Net and C/C++. Whether the goal is to develop application software or a mission critical embedded firmware, static analysis tools have proved to be one of the first steps towards identifying and eliminating software bugs and are critical to the overall success of the software project.

Having said that, it is clear that there are typically either commercial interests and/or security issues involved in licensing and exhaustively applying expensive static analysis tools in software development efforts with very little or no scope for failure. What then remains as an expansive, unexplored void are the endless realms of opensource code that is usually assumed to be well reviewed since opensource software has been used and reused in countless number of applications by numerous academic and commercial institutions across the globe. "Expansive" indicating that it is continually adding new source code and "Unexplored" indicating the absence of a dedicated, unbiased entity to statically analyze every line of code that is opensourced. Of late, SCA companies have taken up the initiative in this direction. But even then, the rate at which newer versions of the opensource software gets released or updated, presents a barrier to such efforts. Although such an effort would be attributed to paranoia by many, it is evident that financial meltdown and shrinking IT budgets have led to opensource software entering into commercial applications, for example, Linux, Apache, MySQL or OpenSSL

Figure 1. Usual software development process segment incorporating static analysis

The conventional workflow where SCA tool"s output is included in the formal software review package is as shown in figure 1. In most software development and testing institutions, although the software product as a whole is subjected to dynamic analysis, like fuzzing or black box testing and newly developed software code is subjected to stringent static analysis and review, opensource code incorporated in the software is usually not subjected to the same stringent static analysis and review as newly developed proprietary code, as depicted in figure 1. Usually, a binary of the necessary opensource software is incorporated in the complete software package. This may be based on the assumption that opensource software is more secure and less buggy than closed source software because the source is freely available, lots of people will look for security flaws and other software bugs in it in a way that isn"t going to happen in the commercial world. Despite the fact that this assumption has withstood the test of time and probably cannot be proven wrong, it would still be an interesting exercise to apply static analysis tools on opensource code and analyze the results. Although there have been previous efforts to statically analyze opensource code using commercial or opensource SCA tools, in this paper, we seek to verify if certain code issues (or bugs) can be detected early on by SCA. Further, we propose an alternate workflow that can be adopted while incorporating opensource code in a commercial software development process. We discuss the benefits, challenges and possible trade-offs to adopting the proposed alternate workflow. Such an effort will be of interest to both the software engineering community and the opensource community in general. Therefore, in order to experiment, we pick Klocwork Insight as our tool of choice, since it is one of the industry leaders in SCA. We pick a popular opensource project, i.e., Linux kernel, for our code analysis. We then proceed to run Klocwork Insight against Linux kernel code and we discuss the results of our analysis. In general, Klocwork is a representative tool for SCA and Linux is a representative opensource project for our analysis, which can be extended to other SCA tools and other opensource projects.

Paper organization

The rest of the paper is organized as follows. Section 2 provides the background to choosing Linux kernel code for analysis and SCA using Klocwork Insight. Section 3 discusses the results of SCA. In section 4, we discuss an alternate workflow that can be followed while incorporating opensource software in a commercial software development process and discuss the benefits and the challenges faced while following the proposed alternate workflow. Finally, in Sections 5 we conclude by summarizing important observations.

PRELIMINARIES

A. Linux kernel

The Linux kernel is an operating system kernel used by the Linux family of Unix-like operating systems. It is one of the most prominent examples of free and open source software and hence a natural choice for our analysis. The Linux kernel is released under the GNU General Public License version 2 (GPLv2) and is developed by contributors worldwide. Day-to-day development discussions take place on the Linux kernel mailing list. The Linux kernel has received contributions from thousands of programmers. Many Linux distributions have been released based upon the Linux kernel [8]. We chose the Linux kernel version 2.6.32.9, released in February 2010, for our analysis.

Static Code Analysis

The unique benefit of static analysis is its ability to scan complete codebases to identify logic and security bugs. It is much more comprehensive in reach than black-box systems testing. However, some issues are runtime dependent and can only be found by actually executing code, so static analysis cannot stand alone. A representational effort-benefit curve for using a SCA tool is shown in figure 2. It is observed that as more checks are enforced, the fraction of errors detected increases along with increase in the amount of effort required to enforce these checks. Typical compilers such as gcc incorporate static analysis to a certain extent in the form of warnings or errors reported during the compilation process. These generally require the least amount of effort and likewise report the least number of bugs. While on the other hand, formal verification of software, for example using model checking, is a much more complex approach towards automation of SCA, requiring greater amount of effort, but capable of detecting greater number of bugs of higher complexity. Although formal verification is not extensively adopted in the software industry, due to the law of diminishing returns, there are several operating systems that are formally verified such as NICTA"s Secure Embedded L4 microkernel [6].