Last time, I promised some more thoughts on the paper that we recently published in CACM. One thing some readers noticed is that the article spends a lot of time on issues that are specific to C/C++. Let's dive into a question I sometimes hear: why don't companies just write software in a better language where these problems can't happen?
Part of the answer is that many companies have already switched. But many more have stuck with C/C++ (see the TIOBE data on language popularity for one measure). A few of the many reasons, drawn from observing our customers, are:
- Legacy code. Consider the ton of code written in C/C++ out there. Coverity Scan analyzes over 300 of the most popular open source projects written in C/C++, including the Linux kernel and Firefox browser. There are about 60 million lines of code (MLOC) in these projects combined. In the commercial world, some of our larger customers have over 30 MLOC in a single project. To give you some perspective, 1 MLOC amounts to a stack of printed paper about 6 feet high. There would need to be a really, really good reason to consider rewriting anything of that magnitude.
- Risk aversion. Code "hardens" as it ages and enters repeated production use. Large code bases are tested over a period of years, if not decades. The reality is that all production software, no matter what language it's written in, needs to be tested for functionality, stability, and performance reasons. Using a great language might help with stability to some degree, but rarely functionality or performance. Rewriting a (sub)system reaps certain benefits, but also introduces risks to existing functionality, stability, and performance. Sometimes, the risk-reward trade-off is worthwhile, but often it is not.
- Embedded software. Think mobile phones, cable set-top boxes, internet routers, wireless base stations, firewalls, network-attached storage systems, weapons control systems, and medical devices. They aren't running on commodity PC hardware, processors, and operating systems. The hardware is often chosen for power-performance and cost reasons, and not infrequently there are large chunks of kernel-mode code. Sometimes these systems start from the Linux or FreeBSD kernel, or perhaps a commercial offering like Wind River VxWorks. Specialized hardware devices are common in this world, and low-level direct memory access is frequently necessary. Much of this software is written in C/C++.
- Performance and control. C/C++ compilers generally produce very efficient object code. Still, the choice of algorithm, together with careful tuning and performance evaluation, can trump language choice. However, in my experience, C/C++ provides a degree of flexibility and control that can sometimes help in optimizing the performance of a system in a way that many other languages don't allow. Of course, the same flexibility and control can lead to bugs that are hard to diagnose.
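To make the "flexibility and control" point concrete, here is a minimal sketch of a fixed-size arena allocator in C. It is an illustration only and not code from any product mentioned here; the names `Arena`, `arena_init`, `arena_alloc`, and `arena_reset` are hypothetical. The program decides exactly where each object lives and releases everything in one step, which is the kind of control that can help performance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size arena: one caller-supplied buffer,
   handed out front to back with no per-object bookkeeping. */
typedef struct {
    unsigned char *base;  /* start of the buffer */
    size_t capacity;      /* total bytes in the buffer */
    size_t used;          /* bytes handed out so far, including padding */
} Arena;

void arena_init(Arena *a, void *buffer, size_t capacity) {
    a->base = buffer;
    a->capacity = capacity;
    a->used = 0;
}

/* Returns `size` bytes aligned to `align` (which must be a power of two),
   or NULL if the request does not fit in the remaining space. */
void *arena_alloc(Arena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Bytes needed to round the next free address up to `align`. */
    uintptr_t start = (uintptr_t)a->base + a->used;
    size_t padding = (size_t)(-start & (align - 1));

    /* Check the padding and the payload separately to avoid overflow. */
    size_t remaining = a->capacity - a->used;
    if (padding > remaining || size > remaining - padding) {
        return NULL;
    }

    void *p = a->base + a->used + padding;
    a->used += padding + size;
    return p;
}

/* Releases every allocation at once by rewinding the arena. */
void arena_reset(Arena *a) {
    a->used = 0;
}
```

The flip side is visible in the same sketch: the language does nothing to stop a caller from writing past the `size` bytes it asked for, or from using a pointer after `arena_reset`, and either mistake silently corrupts a neighboring object instead of failing at the point of the error.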
When I was a researcher back at Stanford, my favorite programming language was O'Caml. I felt wonderfully productive writing code in a language that had type safety, type inference, pattern matching, and purely functional data structures (still a novelty at the time). We initially wrote most of our core functionality in O'Caml, but eventually had to abandon that implementation in favor of one written in C++. It was just too hard to hire programmers who knew O'Caml, and we also found that certain features that helped eliminate defects also reduced our control of the program's performance, scalability behavior, and flexibility. Writing in C++ was often more verbose and less elegant, but it was possible to get to the result needed for the business.
At the end of the day, the features of a programming language are only one aspect of what makes it suitable or not for a particular job.