Functions and linguistic style

Posted: November 30th, 2009 | Author: Mars | Filed under: Design

The goal of the Radian project is to provide the concurrency benefits of a pure functional language inside the syntax and conceptual style of a high-level imperative language. This suggests two questions: what is it about a pure functional language that makes it useful for developing concurrent software, and what is it about an imperative syntax that makes the marriage of these two divergent linguistic styles worth the effort?

A pure functional language is one where functions cannot have side effects. That is, a function can accept parameter values, perform some computation, and return a value – but it cannot make any change to the state of the machine outside itself. The value returned may be an arbitrarily complex data structure, but everything the function accomplishes must somehow be expressed in it. There is no such thing as a “side effect”, because all potential effects must be part of the function’s explicit interface.
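A short Python sketch of the distinction, for illustration only (Python itself does not enforce purity, and the function names are hypothetical): `add_item` reports everything it does through its return value, while `add_item_impure` changes a list that lives outside it.

```python
# A pure function: its only effect is the value it returns.
def add_item(items: tuple, item) -> tuple:
    # Build and return a new tuple; the caller's tuple is left untouched.
    return items + (item,)


# An impure function, for contrast: it changes state outside itself.
log = []


def add_item_impure(item) -> None:
    # Mutates a module-level list; nothing in the signature reveals this.
    log.append(item)


original = ("a",)
updated = add_item(original, "b")
print(original)  # ('a',)
print(updated)   # ('a', 'b')
```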

This restriction, onerous as it may seem, has a great deal of value for concurrent programming. Problems of concurrency are all about coordination of access to shared state. In an imperative language, any function may have side effects which alter any piece of shared state at any time. In a functional language, however, pure functions can only touch what you give them, so it is possible to confidently compose pieces of a program without worrying that they will conflict with each other in some hidden way.
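Here is a minimal Python sketch of why this matters, assuming a hypothetical pure function `score`. Because `score` reads only its argument and writes nothing shared, a thread pool can evaluate it in any order with no locks, and the result is guaranteed to match a sequential run. (CPython threads will not necessarily make this faster; the point is correctness, not speed.)

```python
from concurrent.futures import ThreadPoolExecutor


def score(word: str) -> int:
    # Pure: depends only on its argument and touches no shared state.
    return sum(ord(c) for c in word)


words = ["alpha", "beta", "gamma", "delta"]

# No locks are needed: the calls cannot interfere with one another.
# pool.map returns results in input order, whatever order they finish in.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(score, words))

sequential = [score(w) for w in words]
print(parallel == sequential)  # True
```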

This is not an original observation; the combination of pure functions, immutable objects, and transactions has been tried and found useful in several functional languages. My goal with Radian is to present this set of tools in a clean, clear, accessible package suitable for general programming by non-specialists. In syntax, in nomenclature, and in style, therefore, I am trying to create something that will feel clear, straightforward, and familiar. I am not trying to change the way people write software; I am trying to develop a practical tool which will automate away the rat’s nest of difficult problems currently facing anyone trying to write concurrent software.


Variables and SSA

Posted: November 27th, 2009 | Author: Mars | Filed under: Design

There’s an important property in compiler engineering called static single-assignment, or “SSA”. This term describes code which assigns exactly one value to each of its variables. Instead of assigning newly-computed values to existing variables, the code defines a new variable each time it makes an assignment.
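For example, here is the same straight-line computation written normally and in SSA style, sketched in Python; the `x1`, `x2`, `x3` naming is just an illustrative convention.

```python
# Ordinary code: the variable x is assigned three times.
def normal(a: int) -> int:
    x = a + 1
    x = x * 2
    x = x - 3
    return x


# The same computation in SSA style: each assignment defines a new name,
# so every name has exactly one definition.
def ssa(a: int) -> int:
    x1 = a + 1
    x2 = x1 * 2
    x3 = x2 - 3
    return x3


print(normal(5), ssa(5))  # 9 9
```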

SSA is interesting because it dramatically simplifies a wide variety of compiler optimizations. One of the first things a typical modern optimizing compiler does is transform its intermediate representation of the code into SSA form. Because every variable then has exactly one definition, the flow of data through the code becomes explicit, which makes many optimizations both simpler to write and more powerful than they would be without that knowledge.
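As an illustration of the kind of optimization SSA simplifies, here is a minimal constant-propagation pass in Python over a hypothetical three-address instruction list. Because every name has exactly one definition, a single forward pass with a plain dictionary is enough for this straight-line code; on non-SSA code the pass would first have to work out which of several assignments reaches each use.

```python
import operator

# Hypothetical three-address instructions in SSA form:
# (destination, operator, left operand, right operand).
# Operands are int constants or names; "a" is a runtime input.
program = [
    ("x1", "+", 2, 3),
    ("y1", "*", "x1", 4),
    ("z1", "-", "y1", "a"),
]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def propagate_constants(instrs):
    """Fold every instruction whose operands are known constants."""
    known = {}      # name -> constant; safe because each name has one definition
    residual = []   # instructions that must remain, with constants substituted
    for dest, op, left, right in instrs:
        lhs = known.get(left, left) if isinstance(left, str) else left
        rhs = known.get(right, right) if isinstance(right, str) else right
        if isinstance(lhs, int) and isinstance(rhs, int):
            known[dest] = OPS[op](lhs, rhs)  # evaluate at compile time
        else:
            residual.append((dest, op, lhs, rhs))
    return known, residual


known, residual = propagate_constants(program)
print(known)     # {'x1': 5, 'y1': 20}
print(residual)  # [('z1', '-', 20, 'a')]
```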

The Erlang programming language goes a step further: the rules of the language enforce single assignment. Once you’ve defined a variable, you cannot assign a new value to it. The Erlang compiler does not need to transform your code into SSA form, because you can’t write it any other way.
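Erlang’s `=` is really a pattern match, so binding an already-bound variable succeeds only when the values are equal. The following Python class is a hypothetical sketch that mimics that rule; it is not how the Erlang runtime is implemented.

```python
class SingleAssignmentScope:
    """Hypothetical environment enforcing Erlang-style single assignment."""

    def __init__(self):
        self._bindings = {}

    def bind(self, name, value):
        # Like Erlang's `X = V`: an unbound name gets bound, re-matching
        # the same value succeeds, and a different value is an error.
        if name in self._bindings:
            if self._bindings[name] != value:
                raise ValueError(
                    f"no match: {name} is already bound to {self._bindings[name]!r}"
                )
            return
        self._bindings[name] = value

    def lookup(self, name):
        return self._bindings[name]


scope = SingleAssignmentScope()
scope.bind("X", 1)
scope.bind("X", 1)      # same value: the match succeeds
try:
    scope.bind("X", 2)  # different value: Erlang raises a badmatch error here
except ValueError as err:
    print(err)          # no match: X is already bound to 1
```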

Radian used to work the same way, but I found it to be an obnoxious restriction. Sometimes one variable holds different versions of the same notional value as you transform it from one state to another; if I know that this “foo” is the descendant of that “foo”, and ought to occupy the same conceptual space in my program, why do I have to name one “foo_1” and the other “foo_2”?

I decided to step back and think about variables more deeply. In assembly language, a variable is just a label for a piece of memory; variables in C are defined in much the same way, and languages descended from C pretend that their variables are names for chunks of memory, too. It has been years since this was literally true: modern compilers perform lifetime analysis on their variables, and will split a variable across multiple slots, join two variables into the same slot, or simply eliminate the storage altogether if it is possible to keep the value in a register.

Radian abandons the fiction that variables have anything to do with specific storage mechanisms. A variable is nothing more than a name associated with a value; it is up to the compiler to work out how, when, and where the value ought to be stored. These variables are really more like symbolic macros than traditional variables: they are labels for the result of some computation. Radian code therefore has the same property that Erlang code has by construction, and that C code (or code in any other language) acquires when a compiler transforms it into SSA form: the compiler can always tell where data comes from, what happens to it, and where it goes.
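One way a compiler could honor rebinding while keeping the SSA property is to version each name behind the scenes. The Python sketch below does this for straight-line code; the statement representation, the `_N` suffix scheme, and the `load`/`trim`/`sort` calls in the comment are all hypothetical, and this is not a description of the Radian compiler’s actual internals.

```python
def rename_to_ssa(statements):
    """Rename straight-line assignments so every name is defined exactly once.

    Each statement is (target, used_names). Branches and loops, which need
    phi functions, are deliberately out of scope for this sketch.
    """
    version = {}   # source name -> latest version number
    renamed = []
    for target, used in statements:
        # Each use refers to whichever version of the name is current now;
        # names never assigned (inputs such as "path") stay unversioned.
        new_used = [f"{n}_{version[n]}" if n in version else n for n in used]
        # The assignment itself introduces a fresh version of the target.
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}_{version[target]}", new_used))
    return renamed


# Source that rebinds "foo" twice (hypothetical functions):
#   foo = load(path)
#   foo = trim(foo)
#   foo = sort(foo)
source = [("foo", ["path"]), ("foo", ["foo"]), ("foo", ["foo"])]
for stmt in rename_to_ssa(source):
    print(stmt)
# ('foo_1', ['path'])
# ('foo_2', ['foo_1'])
# ('foo_3', ['foo_2'])
```

The programmer keeps writing “foo”, while the compiler still sees three distinct values and exactly which one each use depends on.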


There’s an interesting project called Reia which builds a Python/Ruby-style language on top of Erlang itself, and has taken a similar approach to variables:

Reia supports the ability to rebind new values to the same variable name, even if that variable is already bound. Contrast this to Erlang, which has single assignment and doesn’t let you rebind new values to the same variable name. This is, to date, the most controversial feature of Reia.


Hello world!

Posted: November 25th, 2009 | Author: admin | Filed under: Meta

Radian is a programming language. It began as a component of my 2004 graphics project, “Starfish”, and has grown in periodic bursts ever since. I’ve been digging in with increasing commitment over the past few months, and I’m starting this blog so I can discuss the design in depth as I push on to the point where Radian is ready to meet the world.

I intend to copy over the last year’s worth of Radian-related posts from my personal blog. I’m sure I will continue to make references to the project over there, but from now on all of the detailed technical discussion will happen here.


Modules

Posted: November 12th, 2009 | Author: Mars | Filed under: Design, Progress

I have just about finished putting together a basic module system for Radian, which will allow programs to span multiple source files. Designing this system was more difficult than I had anticipated; the module system says a lot about developer workflow, so I had to think pretty far ahead about the things I expect people to do with this language and the ways they will likely want to do it.

The module system came about because I want to start building a standard utility library, and in order to do that the compiler needs to link a program together from multiple source files. I want to have some semi-automatic mechanism for linking against the standard library, but it seemed more sensible to build that on top of a generic linking mechanism than to start with the special case and generalize it.

Lessons learned from experiences with other languages:

  • Modules should not be able to define global identifiers. A client program can always import a qualified symbol into its unqualified namespace, but there’s no way to prevent imported global identifiers from conflicting with each other.
  • Source files should import dependencies explicitly. Interpretation of a source file should not depend on any external context, like a project file, an environment variable, or the contents of some shared directory.
  • Support modules should be initialized and finalized explicitly, in the main program, so that the programmer can control the dependency order.
  • The structure of the program should be visible in the filesystem. Don’t trip people up by introducing a parallel-but-different structural hierarchy.
  • Makefiles are evil. The language must allow the programmer to describe the program in such a way that the compiler can identify all of its parts and build a finished executable in one step.
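Because every source file names its own dependencies, a compiler can discover the whole program by following `import` statements from the program file, with no makefile or project file. A minimal Python sketch, assuming a hypothetical in-memory dictionary stands in for the project directory and handling only the simple `import NAME` form:

```python
import re

# Hypothetical project: file name -> source text.
# A real compiler would read these from the program's directory.
files = {
    "main.radian": "import parser\nimport render\n",
    "parser.radian": "import lexer\n",
    "lexer.radian": "",
    "render.radian": "import parser\n",
}

IMPORT_RE = re.compile(r"^[ \t]*import[ \t]+(\w+)[ \t]*$", re.MULTILINE)


def collect_modules(program, sources):
    """Return every file reachable from `program` through import statements."""
    found, pending = set(), [program]
    while pending:
        name = pending.pop()
        if name in found:
            continue  # already visited; this also makes import cycles harmless
        found.add(name)
        for module in IMPORT_RE.findall(sources[name]):
            pending.append(f"{module}.radian")
    return found


print(sorted(collect_modules("main.radian", files)))
# ['lexer.radian', 'main.radian', 'parser.radian', 'render.radian']
```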

The system I’ve built works like this. You invoke Radian on a program file, which plays the role of the “main” function in C. This program file may import module files, which may in turn import other module files, using the import statement:

import foo

This statement declares the name “foo”, representing the contents of the file “foo.radian”, which the compiler expects to find in the same directory as the client file. Imports are simply placeholders, to be resolved at link time, so circular references are not a problem.
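The post does not spell out how link-time resolution works, so the following is only a sketch of one possible two-phase scheme, in Python with hypothetical names: each `import` first creates an empty placeholder for the file it names, and a later link step fills in every placeholder. Since the placeholders exist before any of them is filled, two modules that import each other cause no difficulty.

```python
from pathlib import PurePosixPath


class ModuleRef:
    """Hypothetical placeholder created by `import NAME`, filled in at link time."""

    def __init__(self, path):
        self.path = path
        self.members = None  # unresolved until the link step runs


def import_path(client_file, name):
    # `import foo` names foo.radian in the same directory as the importing file.
    return str(PurePosixPath(client_file).parent / f"{name}.radian")


def link(placeholders, compiled):
    # Every placeholder already exists before this step, so modules that
    # refer to each other can simply hold references to one another.
    for ref in placeholders.values():
        ref.members = compiled[ref.path]


# src/a.radian contains `import b`, and src/b.radian contains `import a`.
placeholders = {
    path: ModuleRef(path)
    for path in (import_path("src/a.radian", "b"), import_path("src/b.radian", "a"))
}
compiled = {
    "src/a.radian": {"helper": placeholders["src/b.radian"]},
    "src/b.radian": {"helper": placeholders["src/a.radian"]},
}
link(placeholders, compiled)
print(placeholders["src/a.radian"].members["helper"].path)  # src/b.radian
```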

The Radian compiler treats the contents of an imported module file as the body of an object declaration. The top-level functions and other declarations in the file become the members of the imported object. A source file cannot simultaneously be a program and a module, since only a program file gets the implicit “io” variable allowing interaction with the rest of the system.

That’s all I’ve built for now. As far as the standard library goes, I think I’ll throw an implicit “radian” import into the top-level namespace of every file. All of the standard utilities will be members of this namespace – much like “std” in C++. This will allow me to extend the standard library in future versions without introducing name conflicts. To make this more convenient, I want to extend the import statement, like Python but in a more sensible order:

import stack from radian

This would define a new item named “stack”, equal to “radian.stack”; you could extend the “from” expression arbitrarily to handle deeper nesting. You wouldn’t have to have imported the “from” identifier on its own; you could import only the item you wanted and leave the rest of the package unimported.
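A hypothetical Python sketch of how `import stack from radian` might resolve, assuming deeper nesting is written as a dotted `from` path (the post does not fix that syntax) and using made-up library members. Only the requested item is bound in the client’s namespace; the `radian` namespace itself is not.

```python
from types import SimpleNamespace

# Hypothetical standard-library tree: "radian" contains "stack" and a nested
# "collections" namespace. The member names are illustrative only.
radian = SimpleNamespace(
    stack="<stack module>",
    collections=SimpleNamespace(queue="<queue module>"),
)
roots = {"radian": radian}


def import_from(item, from_path, roots):
    """Resolve `import ITEM from A.B...` to roots[A].B...ITEM.

    Returns the single binding the client namespace would receive.
    """
    head, *rest = from_path.split(".")
    obj = roots[head]
    for part in rest:
        obj = getattr(obj, part)  # walk down one level of nesting
    return {item: getattr(obj, item)}


print(import_from("stack", "radian", roots))               # {'stack': '<stack module>'}
print(import_from("queue", "radian.collections", roots))   # {'queue': '<queue module>'}
```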

Proceeding onward, I expect that you’ll be able to treat subdirectories as modules – if you had a subdirectory “foo” next to your program file, containing a module file named “bar.radian”, you could import it like this:


# get access to the module file only, resulting in 'bar'
import bar from foo
# import the whole directory, resulting in 'foo.bar'
import foo
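A Python sketch of the proposed lookup rule, assuming a hypothetical in-memory set of paths in place of the real filesystem and a made-up dictionary representation for directory modules: a file `NAME.radian` resolves to a module file, while a subdirectory `NAME` resolves to a module whose members are the module files it contains.

```python
from pathlib import PurePosixPath

# Hypothetical project layout, relative to the program file's directory.
project = {"main.radian", "foo/bar.radian", "foo/baz.radian"}


def resolve(name, prefix=""):
    """Resolve `import NAME` within the directory given by `prefix`."""
    file = f"{prefix}{name}.radian"
    if file in project:
        return file  # a module file
    directory = f"{prefix}{name}/"
    members = sorted(
        p for p in project
        if p.startswith(directory) and "/" not in p[len(directory):]
    )
    if members:
        # A directory module: one member per module file directly inside it.
        return {PurePosixPath(p).stem: p for p in members}
    raise ImportError(f"no module {name!r}")


def import_module_from(item, directory):
    # `import bar from foo` binds only the one module file, as 'bar'.
    return {item: resolve(item, prefix=f"{directory}/")}


print(resolve("foo"))                    # {'bar': 'foo/bar.radian', 'baz': 'foo/baz.radian'}
print(import_module_from("bar", "foo"))  # {'bar': 'foo/bar.radian'}
```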

I’ll need to design a package system as well, but that is still some distance ahead. It’ll probably look something like Python’s packages, which remind me of Mac OS X bundles. I am fairly well convinced that the convenience of a standard central package directory is outweighed by the configuration hassles and dependency-tracking issues, so I think I will require packages to be included in the project folder. If people want to keep a central repository of useful libraries, or several such repositories, they can always put a symbolic link or alias to it inside the project folder.