Performance and automation

Posted: February 21st, 2009 | Author: Mars | Filed under: Design | Comments Off

This rant, Why Should I Ever Use C Again, is a funny, irreverent, spot-on critique of a certain strain of micro-optimized thinking about software performance. A sample:

Okay, C is fast. But C is not the only fast language. It’s not even the fastest. If speed was everything, we would be (1) on a race track (2) using assembly. The C-is-fast-so-use-C crowd doesn’t know there is a clan that says the same for assembly. But, in 1980, it was clear that _slower_ C made more sense than _faster_ assembly.

Computers are really good at automating repetitive work. Every decade or so, the time comes to delegate to the computers some entire category of task that programmers have spent years sweating over. When this happens, it’s not because the computer can suddenly do a better job than a clever human being could, on a one-by-one comparison – it’s because the computer can do an adequate job a billion times a day without breaking a sweat. Computers keep getting faster, and for any problem there comes a point where the run-time cost of an automatic solution disappears against the development-time cost of human brainpower.

Programming language compilers themselves are an early example of this process in action. People take the notion of a high-level language for granted now, enough so that nobody uses the term “high-level language” anymore, but writing entire applications in assembly language was once routine. When I was beginning my programming career, the need to think in terms of machine instructions was still common enough that every serious development tool offered some way to embed “raw” assembly instructions in your C or Pascal or Basic code. Nobody does that anymore, at all; you could probably still beat the compiler if you worked very hard at it, but it simply is not worth your time to bother.

A similar change occurred with the rise of Java, which pushed the idea of automatic memory management into the mainstream. You youngsters may not believe this, but paying careful attention to the lifetime of every last scrap of data was once a normal part of programming practice. Nobody would design such a language now: memory management is complex, error-prone, and amenable to automatic analysis – therefore it makes no sense to waste any more human brainpower on the problem. We delegate that task to the computers, which will do it in a simple-minded but utterly reliable way, and move on to more interesting things.

We’re about to see another of these shifts. As always, it’s going to start out looking like a nerf toy. People will wonder: how can you get any real work done with something like that? Where’s the go-faster button? Where are the sharp edges? The answer, as always, is that when you have a button, you have to decide whether, and when, to press it. We will choose to give up another layer of fine-grained control in exchange for the opportunity to spend our brainpower solving more interesting problems.

This generational shift is being driven by the ubiquity of multi-processor computing. Everyone has a dual-core machine now, and in two years everyone will have a quad-core machine. Meanwhile, the really big problems are being solved on server farms with thousands of networked processors. The tools we use today leave control of concurrency largely in the hands of the programmer, so programmers have to solve the same set of concurrency design problems over and over. This situation is just begging for automation. In the next few years we will see a new level of abstraction come to prominence: we will choose to give up a few old familiar tools, and in exchange we will be able to delegate the concurrency problem to our new tools.

I think I know what the recipe for this new style will be, and I’m convinced enough to spend a lot of my free time building a working example, but the details are less important than the process. A new generation of candy-coated, dumbed-down, kid-stuff programming tools are about to come down the pike. They will look limited and faintly embarrassing, but get ready to jump on board anyway: those limitations are the key to the future.

Typing is more complex than “dynamic” or “static”

Posted: February 9th, 2009 | Author: Mars | Filed under: Design | Comments Off

This PDF slideshow introduces a concept the author calls “gradual typing” for Python. I’ve been pursuing a similar notion of typing in Radian. You can think of the notion of “type” as the information we have about a value. “Static type” is the information the compiler has about a value, and “dynamic type” is the information the running program has about the value.

From this perspective, one can see the ubiquitous ‘assert’ statement as a supplementary typing system, evaluated at run time. I frequently assert that various parameters are non-null, or that the objects in those parameters are related in some standard way; the C++ language may not include this information in its notion of a type system, but those attributes are very much part of my concept of the types I expect the method to receive. When you look at a type system in these broader terms, little islands of dynamic typing start to show up even in the most “static” languages like C++ and Java.
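The idea is easy to demonstrate. Here is a minimal sketch, in Python rather than C++ and with invented names, of assertions serving as that supplementary, run-time-checked type system:

```python
class Queue:
    """A shared work queue (placeholder for the example)."""
    pass

class Task:
    def __init__(self, queue):
        self.queue = queue

class Worker:
    def __init__(self, queue):
        self.queue = queue
        self.completed = []

    def run(self, task):
        self.completed.append(task)

def schedule(task, worker):
    # These assertions state facts that the nominal parameter types never
    # capture: non-null-ness, and a relationship between the arguments.
    assert task is not None
    assert worker is not None
    assert task.queue is worker.queue, "task and worker must share a queue"
    worker.run(task)
```

A call like `schedule(Task(q), Worker(q))` passes the checkpoint; pass a worker bound to a different queue and the last assertion fires, exactly as a type error would.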

Wouldn’t it be nice if those ad-hoc assertions I place in my code by habit could be as much a part of the language, and have as much of an effect on the compiler’s knowledge about the values I am working with, as the built-in parameter and variable type annotations?

The Radian type system approaches something much like the linked presentation’s “gradual typing” from the opposite direction. It starts with a completely wide-open type system, where any value could have any type, then allows the programmer to limit the range of options by making assertions about values. Each assertion functions as a guard, or checkpoint; in order to continue, the given conditions must hold.
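A Python analogue makes the guard behavior concrete; this is my sketch of the idea, not Radian syntax:

```python
def reciprocal(x):
    # Before this line, x is wide open: it could have any type at all.
    assert isinstance(x, float) and x != 0.0
    # Past the checkpoint, everything below may rely on x being a
    # nonzero float, so the division cannot fail.
    return 1.0 / x
```

The assertion is the checkpoint: control only reaches the division when the condition holds, so everything downstream inherits that knowledge.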

We then rely on dataflow analysis, constant propagation, and constant folding to resolve as many of these assertions as possible at compile time. The compiler can use the resulting knowledge about the content of each value to produce more efficient code specialized for that type. Each assertion condition can be resolved one of three ways: known true, in which case the assertion has been satisfied at compile time and can be omitted from the output; known false, in which case the compiler knows that the assertion cannot succeed and can report an error; or unknown, in which case the assertion will be compiled into the output and checked at run time.
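A toy sketch of that three-way outcome, assuming constant propagation has already reduced each assertion’s condition to either a known constant or an “unknown” marker (the names here are invented for illustration):

```python
UNKNOWN = object()  # the condition could not be folded to a constant

def resolve_assertion(folded_condition):
    """Classify a constant-folded assertion condition."""
    if folded_condition is UNKNOWN:
        return "emit run-time check"   # keep the guard in the output
    if folded_condition:
        return "omit"                  # satisfied at compile time
    return "report error"              # can never succeed
```

A real compiler would track a lattice of facts per value, but the decision at each assertion reduces to exactly this three-way classification.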

Let’s make the compiler do all the tedious bookkeeping work! That’s what it’s there for. The result, I hope, will be a language which allows programs to start small, simple, and generic, then specialize over time for safety and efficiency – thus gaining the advantages of both “dynamic” and “static” typing.