The Economics of Language Scalability
The other day I read an article written by Mike Vanier, a faculty member in Caltech’s computer science department. In it, he discusses what he calls “language scalability”, or the ability of a programming language to support large-scale development.
I have to agree with almost all of the points in the article, particularly his discussion of garbage collection. Having written large applications in both C++ and C#, I can attest that memory management is by far the most dangerous part of software development, not to mention the most tedious. It’s not too bad when you know exactly where and when memory is being used. But once you get into object-oriented development in C++ and start building large object maps or trees in memory, the likelihood that you’re going to free all of that memory correctly approaches zero.
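To make the object-tree problem concrete, here is a minimal sketch in Python (not a language the article discusses; I picked it only because it is short). The `Node` class and `build_tree` helper are hypothetical names made up for this illustration. Every child points back at its parent, so the tree is full of reference cycles, which is exactly the kind of structure that is painful to free by hand. The sketch assumes CPython, whose cycle collector can be triggered with `gc.collect()`.

```python
import gc
import weakref


class Node:
    """A tree node that points to its children and back to its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def build_tree(depth, fanout):
    """Build a small tree; the parent/child links form reference cycles."""
    root = Node("root")
    level = [root]
    for _ in range(depth):
        level = [Node(f"{p.name}.{i}", p) for p in level for i in range(fanout)]
    return root


root = build_tree(depth=3, fanout=2)
probe = weakref.ref(root)  # observes the root without keeping it alive
del root                   # drop the only outside reference to the tree
gc.collect()               # ask the collector to reclaim the cycles now
tree_was_freed = probe() is None
```

Nothing in this code walks the tree to release it node by node. Dropping the last outside reference is enough, and the collector works out the rest.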
Another thing I need before I’ll consider a language “scalable” is the availability of flexible, easy-to-use data structures, such as linked lists and hash tables. When writing data-driven applications, most of the time data will relate in a non-deterministic way; that is, elements will often relate 1-to-N where N is unknown. In these situations, it’s vital that the language you’re using not force you to define the amount of memory you need before you need it.
Take, for example, the task of loading data out of a database into some type of structures. If a language requires that you tell it how much space you’re going to need before you load the information out of the database, one of three things will happen:
- You’ll allocate a bunch of memory up to the maximum that you “think” you’ll need. Of course, if you’re wrong, you’ll end up with buffer-overflow errors, and if your language sucks, you won’t have any sort of bounds checking on your arrays. Not only will your program get crashy, it’ll open up some nice security flaws as well.
- You’ll come up with a brilliant way to figure out how many elements you have, and allocate precisely the right amount. Unfortunately, this will almost definitely require an additional call to the database (for example, a COUNT(*) query), which adds overhead. Also, I hope you’re not going to let your user add or delete any elements, or that fixed array is going to get reallocated pretty quickly.
- You’ll start to write your 143rd implementation of a dynamically-resizing array-based linked list, chew on your keyboard a few times, and then give up and take a job as an alpaca farmer.
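For contrast, here is a rough sketch of the same task in a language with growable built-in collections. It is in Python, and it assumes a hypothetical `orders` table in an in-memory SQLite database; the table, its columns, and the `load_orders_by_customer` function are invented for illustration. The point is that each customer maps to N items, N is never declared, and no `COUNT(*)` query is needed.

```python
import sqlite3
from collections import defaultdict


def load_orders_by_customer(conn):
    """Group order items by customer without knowing any counts up front."""
    orders = defaultdict(list)  # each list grows as rows arrive
    query = "SELECT customer, item FROM orders ORDER BY id"
    for customer, item in conn.execute(query):
        orders[customer].append(item)
    return dict(orders)


# Hypothetical sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, item TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer, item) VALUES (?, ?)",
    [("alice", "lamp"), ("bob", "chair"), ("alice", "desk")],
)

result = load_orders_by_customer(conn)
conn.close()
```

Letting the user add or delete an element afterwards is just another `append` or `remove` on the relevant list, with no reallocation code to write.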
At any rate, while I do agree with the vast majority of what Mike is saying, I think his argument holds up better in theory than in practice. In the commercial sector, large, complex software projects are generally written by a large number of people. As a result, I believe that for a language to truly be scalable in the real world, there must be a “critical mass” of developers who understand the language: so many that a company can find enough of them to staff a single development team. When a company chooses to implement a large project in a certain language, it’s making a long-term investment in the life of that language.
A lot of the languages that Mike refers to as scalable, such as OCaml, lack the critical mass necessary to make it worthwhile for companies to invest in using them. Even languages that are closer to the mainstream, like Common Lisp and Eiffel, don’t have enough developers behind them to make them useful in a commercial environment. This isn’t to say, of course, that you have to write every commercial application in Visual Basic just because there are more VB developers around than anyone else. (If this were the case, I would already be an alpaca farmer.) However, it’s much more likely that a company would take the plunge and support a language like Java or C#, both of which are arguably “less scalable” than a fringe language like OCaml, simply because the talent required to develop in them is more readily available.
Also, for the record, I can prove that LISP doesn’t scale well at all: the parentheses keys would snap off your keyboard after about 50,000 lines of LISP. :)









