When C is the Best (Tool for the Job)
Table of Contents
- Document Information
- Introduction
-
Reasons for Wanting to Prefer C
-
Case Study: Freecell Solver
- Conclusion
Document Information
Introduction
- "Java and/or .NET are better than C in every way.".
- "Perl/Python/PHP/Ruby/Tcl are better than C."
- "LISP is better than C".
- "If you need speed, use O'Caml or Haskell - don't bother with C."
- "C is for kernels."
We hear these things all the time and still people insist on using C for some tasks. OK, these sentences are not entirely false. Most of the time (I don't know if it's 90%, 80% or slightly above 60%, I didn't count), it is preferable to use something else. But what about the minority of these cases?
The purpose of this article is to point to the cases where C really shines and is the best tool for the job. At first I'll enumerate some of the strengths of C, and then I'll give a case study which illustrates some of these points.
Before that - a note. By "C" I mean either ANSI C (with most common and portable extensions) or C99 or C++ with many C primitives used. The so-called Standard C++ is entirely different.
Reasons for Wanting to Prefer C
Portability
ANSI C is portable for every architecture there's a gcc backend or any other compiler for. With Java you rely on Sun's whims. Open-source high level languages are better (especially if they can be compiled using gcc or supply a gcc back-end), but nonetheless, C is still more portable than anything written using it.
Open-Source Considerations
gcc is a full-featured and robust compiler for ANSI C, and it's open-source. Sun's Java distribution has a very problematic license. It's not distributed as part of most Linux distributions due to this.
Low-levelness
You cannot effectively duplicate a struct containing other structs in Java or in Perl as you can in C. This is due to the fact that in these languages every struct is a reference. You cannot work with interrupts, DMAs, and other hardware elements. It's much harder to write your own custom fine-grained memory-management routines.
Many problem domains call for ANSI C.
Self-contained-ness
You can compile a C library and distribute it as is. With Java or Perl one often needs a substantial run-time.
Making use of UNIXisms
There's no fork() in Java, and not many other UNIX system calls. You need to create bindings for them (which require JNI) or work around them. Perl and other languages, usually don't have this limitation.
To some extent, this is also true when working on Microsoft Windows, where some of the system calls and APIs are not available in the Java distribution by Sun.
Integrability
A C library can be used by Perl code, Python code, Tcl code, Ruby code, to a lesser extent Java code, etc. Code written in a high-level language is mostly usable only in that language.
Speed and Memory Consumption
And naturally there's speed and memory consumption. Often when speed or low memory consumption is the issue, one cannot use a high-level language, because it may have a lot of overhead in one area or both. Many real-time applications (i.e: applications that should respond to events within a given time) must be written in a low-level language for that reason. Other applications like that are applications that simply can never be fast enough: various APIs, virtual machines, simulations, graphics, audio, video and 3-D programs, etc.
Case Study: Freecell Solver
Freecell Solver is an ANSI C library that can be used to solve games of the Solitaire Card Game Freecell, and similar Solitaire variants. Started as a way for me to see if a theory I had about how a computer can solve Freecell would work in practice, it ended up as a highly-customizable, very feature-rich and pretty fast program. It is the only one with a high quality web-site dedicated to it, has been integrated into at least three GUI games so far, and is the only program of its kind with binaries and source packages distributed for common distributions. I am pretty confident in saying it has become the "category killer" of its kind.
Freecell Solver (also FCS from now on) is written in ANSI C - mostly C89 but with some necessary extensions that are common in most modern ANSI C compilers. It's been successfully compiled with gcc, Visual C++, and the Intel C Compiler . The code of Freecell Solver is very "funky": hard to understand, uses preprocessor macros all over the place (inline and gcc's statement expressions are not portable enough unfortunately) with many tricks to the untrained eye.
There are some comments, but many of them may be out of date. I happen to agree with the Extreme Programming camp that it is really better to refactor and make the code easier to understand than to comment redundantly. But since it was essential that it would run as fast as possible, I could not structure it appropriately. However, I can assure you that the code is modular, and of good quality and maintainable. So don't believe those who claim it's bad.
Luckily, I have written an architecture document that covers the main data structures and algorithms used.
Why Freecell Solver should be written in C?
The best way to realize why C is the ideal language for Freecell Solver is to think how to port it to Java, which a high-level language that is similar to C. Freecell Solver represents the states of the board as an array of pointers to columns, where each column is an array of bytes, where each byte represents a single card. The states themselves are allocated from blocks of contiguous memory. The blocks themselves are not realloced, and thus remain in the same memory address. If more states need to be allocated, then more blocks are created. This is all done to reduce the amount of allocated memory, as individually allocating the states may incur a lot of overhead.
A similar technique is used for the columns themselves. This time they are allocated, out of also contiguous blocks of memory, but this time they are not of fixed size. Each column follows the other one. The columns themselves are placed in a hash, to make sure duplicate columns are eliminated.
Now, how do we do it in Java? In Java every reference to a structure or an array is allocated and manipulated individually. We cannot have them chained one by one in memory. This (together with the fact that the garbage collection also incurs some memory overhead), will cause them to occupy more memory. And more memory means more cache misses, potentially more swapping and less speed.
One way I can think of overriding it is allocating a very large array, and then giving an index. But that would be slower than a simple C pointer.
That's not all there is to it. The Freecell Solver hash implementation itself is very optimized. The chains of the hashes themselves are compactly allocated and from then on re-used in the manner described above. This too is very hard to achieve in Java.
In order to efficiently process and manipulate the states and the individual cards, Freecell Solver contains a great deal of preprocessor macros. Java does not have a preprocessor, and therefore one will need to either:
- Replace them with methods. This would mean lower speed.
- Use a preprocessor (such as filepp) to process the code. This would mean that problems would be harder to track, line numbers may be skewed, and the development environment may not necessarily support it.
- Use the expressions encapsulated by the macros verbatim in the code. This would make it less modular.
Another fact that may incur a lot of overhead in Java is the fact that in some functions Freecell Solver uses a great deal of nested loops and conditionals to search for suitable moves. A loop in Java has more speed overhead than its C equivalent, and together it will probably add up.
Potentially, also all the arrays that are extensively used by Freecell Solver may incur speed overheads due to the fact Java also does bounds checking.
Finally, Freecell Solver uses a lot of C-isms. It compares structs using memcmp(), copies adjacent regions of memory using memcpy(), etc. These have a more limited support in Java, and may not necessarily make sense all of the time.
To sum up, while one can write a program that will be compatible in its output to Freecell Solver in Java, it will feel less native there, and will probably be much slower and more memory hungry. All of these things add up to a lot. C is the best language for writing a Freecell solver in.
Conclusion
Despite the fact that computers have become much faster, and that there are now high level languages whose run-time speed is very adequate, C still has its uses. Whether you want your code to be self-contained (and not contain a huge run-time), or you want to have a fine-grained memory allocation, or you want to interoperate with your system's native built-ins, C may be the best choice. It is insightful to know, and often proves very useful. And it still has its place.