Update

Posted: October 3rd, 2011 | Author: Mars | Filed under: Uncategorized | Comments Off

Three months since the last update – it was a busy summer!

The garbage collector / multithreaded allocator project is stalled at about 80% complete. It took a global change to calling conventions, which inevitably turned up a lot of odd corners and difficult debugging problems.

On another branch, I’ve built a new non-LLVM-based X64 code generator. This is not to imply anything negative about LLVM, which is a fine piece of software engineering, but it is an extremely large, complex library, dwarfing the rest of the Radian compiler by an order of magnitude, and to justify that footprint it must offer correspondingly extreme benefits. Yet I suspect that much of LLVM’s value is in its array of sophisticated solutions to problems that Radian doesn’t have. As part of the parallelization architecture, I’ve pushed a lot of the analysis work which typically happens in the backend up into the frontend semantics engine, and what reaches the code generator is a long stream of tiny functions which contain no loops, branches, aliasing, or other complications. Loop analysis is finished, register allocation is almost trivial due to the low level of register pressure, and there is basically no type information left. What optimization opportunities I do see tend to be specific to Radian semantics, and I doubt whether LLVM is equipped to do much about them.

I’ll benchmark the two code generators against each other, then, and see whether this Radian-specific EmitX64 module can hold its own against LLVM. If so, I’ll be able to radically simplify the Radian compiler build process and reduce its memory and disk footprint; if not, I’ll learn exactly what it is that makes LLVM worth all that weight.


Latest updates

Posted: May 11th, 2011 | Author: Mars | Filed under: Uncategorized | 2 Comments »

The foreign-function interface can just about do something useful now. You can load a function pointer from some external library, explain how you’d like to marshal values, then invoke it as an IO action. The value it returns will be marshalled back, according to your previous specification, into some kind of Radian object.

I’ve been sort of picking gently at the remaining bugs in the system while I think about a better syntax for doing asynchronous I/O. All Radian I/O operations are asynchronous, by nature, since the only way to execute an IO action is to yield it back to the system. Thus you must construct I/O activity as a callback-driven state machine.

This is actually the way I learned to do I/O back in the ’80s, on the non-preemptive classic Mac OS, and I’ve used similar techniques in microcontroller code. Nostalgic as the style may be, however, Radian’s lack of shared mutable state makes doing actual work this way something of a mind-bender.

There’s no getting around the fact that IO is asynchronous and callback-driven, but the pattern for writing sequential IO code in such an environment is predictable, and it should be possible to build in some support that makes the process less tedious. I’d like to introduce an async statement, or an async operator, which does all the relevant housekeeping work. It would split the current function apart: anything downstream of the returned value would be captured and passed in as an implicit callback function, which would carry on with the results of the process whenever the initial IO action had completed.

It has been difficult to wrap my head around this semantic transformation, but I think I’ve come up with a way to do it. Along the way I accidentally found a way to implement C-style continue, break, and return statements, and worked out most of what I would need to do in order to introduce Python-style generator functions (yield, or in C# yield return), so it’s been some productive thinking-time even if I haven’t written much code.

It always feels like there is a lot of work piled up ahead of me, but I’ve been keeping a to-do list, and after the async system is done, the only significant task left before “first public beta” is the garbage collector. At that point I plan to spend some time polishing things up, writing documentation, and preparing an installer package.


FFI implementation

Posted: April 11th, 2011 | Author: Mars | Filed under: Uncategorized | Comments Off

Now that the built in container types are finished, the next most important item on the to-do list appears to be a foreign-function interface: that is, a mechanism for calling functions from external libraries.

We can break this problem down into several pieces. We need:

  • IO function to load a function pointer, by name, from some library file
  • Mechanism to marshal Radian values into C types, and to create Radian objects from C values
  • Annotation scheme so we can describe the parameter and return value signature for some function pointer
  • IO function to invoke some annotated function pointer with some values to be used as parameters

It’s not clear what use a bare function pointer would be, so perhaps the annotation operation should occur at the same time as function-loading. Even still, the result cannot be an invokable, since we can’t know which external functions might have side-effects or thread interactions; the external function pointer must be a type of IO action.

As usual with a big, complex problem, I’m going to start by implementing the piece that seems simplest and most obvious. That looks like the marshaling scheme: I will develop a system of type-objects, which can produce a byte buffer from a Radian value, or can create a Radian value from a byte buffer. I will need to consider endianness, and provide some support for variable-length buffers. This system will probably also incorporate support for text encodings, since those can be seen as schemes for marshaling an abstract string of characters into some concrete byte representation.


List implementation (finally!) finished

Posted: April 9th, 2011 | Author: Mars | Filed under: Uncategorized | Comments Off

It turned out to be a much bigger project than I had expected, but that really just demonstrates why this list object is such an important part of the Radian language. The Radian list does everything you’d expect a list to do, in a language of this type, but it’s an immutable (or “persistent”) data structure, so it is memory efficient and plays nicely with multiple threads.

Immutability has another advantage: it’s a bad idea to return an array from an object method in REALbasic, or a list from a method in Python, because you must either make a copy of the array every time, or you give any caller the ability to modify your object’s internal data storage. With an immutable list, however, giving away a reference to a list is effectively the same as making a complete copy, without actually taking up any additional memory. You can safely return a list from a method – the caller can make any changes they like to their new copy of the list without altering your object’s internal data.


Access specifiers

Posted: March 31st, 2011 | Author: Mars | Filed under: Uncategorized | 1 Comment »

Radian objects and modules currently expose all of their members, since I’ve yet to invent any way to hide them. This is obviously not great, since it means there is no way to distinguish between the elements of an object which are supposed to be part of its public interface, and elements which are mere implementation details, not to be used by outsiders. Radian clearly needs some kind of member access specifier scheme, but what should it look like?

Java and its cousin C#, as well as my old friend REALbasic, use a leading keyword on each declaration. One declares a private void foo() or a public int bar(). This is certainly clear, but it’s verbose, and it makes scope a more prominent feature of the declaration than it really deserves to be.

In C++ and Ruby, access specifiers are independent statements. One typically declares groups of public or private members together, and a single private or public changes access rights for all declarations which follow, until the next access specifier. This tends to result in classes where all the public declarations are grouped together, which is convenient, but in long class declarations it can become difficult to remember what the current access level is.

Python, typically the first place I look for language ideas, has an unusual approach: the language semantics do not actually include any such thing as a private member, and the distinction between public and private members is maintained by convention. Everyone agrees that members whose names begin with a double underscore are private, and that’s that. Private identifiers look different enough from regular identifiers that it’s hard to do the wrong thing by accident.

Go also uses a naming convention, though it is language-enforced and not just a convention: methods beginning with uppercase letters are public, while methods beginning with lowercase letters are private.

What’s the right choice for Radian? I am drawn to the cleverness of the Go approach, but it’s a bit too clever, and I think the distinction will be too easy to miss. I’ve mulled over the idea of a private: / end private statement block, all of whose contents would become private members, but attempting to implement it quickly became a hairy mess, which I tend to take as a sign that the concept is fundamentally flawed. Using per-declaration access specifier keywords, while time-tested, just doesn’t look good to me: it’s too noisy, it doesn’t suit the language.

This leaves me with something like the Python approach: define a naming convention for private members, make it obvious enough that you can’t violate the convention by accident, then leave it up to Radian programmers to respect the rule and do the right thing. I can always come back later and build in some language-level enforcement if I think of an elegant way to implement it, but for now the language can get by without it.

What, then, should this convention be? It’d be easy enough to copy Python’s convention and use a leading double underscore, but I don’t really like it. There are a handful of unused ASCII punctuation marks still: @, $, and % are all too bulky, and neither ;, `, nor \ feel right, but ~ or ^ could work. It’s pretty much arbitrary, so I will try a few things out and codify whatever looks best.


Map implementation finished

Posted: March 16th, 2011 | Author: Mars | Filed under: Uncategorized | 2 Comments »

Radian’s built in map object is done, and it’s hooked up to the brace syntax for construction. The construction expression should be a tuple; the constructor adds each item in the tuple to the map as a separate element. This is exactly the same as calling the map’s add method, passing that element in as its parameter.

var some_data = { "foo" => 1, "bar" => 2, "baz" => 3}
is the same thing as

var some_data = { "foo" => 1 }
some_data->add( "bar" => 2 )
some_data->add( "baz" => 3 )

Lookup uses the indexed-container protocol – that is, it works the same way for a map as it does for a tuple, using postfix square brackets:

var num = some_data["bar"]

This is the same protocol used for object members, which are looked up using symbols, so you can actually make your own object-like maps by associating symbols with captured functions.

Finally, you can remove an element from the map:

some_data->remove( "foo" )

I decided not to have remove raise an exception if the key is not present. Instead, it simply leaves the map unmodified. Similarly, adding an element whose key is already present does not raise an exception; it replaces the old element’s value with the new one.

Map is a container, so it can tell you if it is empty?, but it’s not a sized container, so it can’t (currently) tell you what its size might be. Nor is it currently possible to iterate over the map elements, though I expect that will come along eventually as it is the sort of thing which can’t be implemented without access to the map’s internal data structure.


Map syntax revised again

Posted: March 14th, 2011 | Author: Mars | Filed under: Uncategorized | 2 Comments »

The original syntax for map containers used a special overloading of the colon token as a key-value separator:

var map = {1: "foo", 2: "bar", 3 : "baz"}

While I like the way this reads, I don’t like the way I’ve given the colon three different syntactic roles, and I feel stingy introducing a useful syntax that only works with the built-in map object. Surely people will want to implement their own key-value map datatypes and initialize them in a familiar-looking way!

The solution I’ve chosen is to use the Ruby/Perl style => token to express the key-value grouping. This is a right-associative binary operator, with precedence above the comma and below the logical operators, and its function is to group its operands into a 2-tuple. With this change, the above code sample reads like this:

var map = {1 => "foo", 2 => "bar", 3 => "baz"}

The map-construction syntax is now much simpler: it accepts any expression, and calls the map constructor with that expression as the single argument. The map constructor expects its parameter to be an N-tuple, where each element represents one key-value pair; if the element is a 2-tuple, element 0 is the key and element 1 the value, otherwise the element itself is used as both key and value. This way the map can also be used as a set with no additional fuss.


Exception handling in Go

Posted: February 11th, 2011 | Author: Mars | Filed under: Uncategorized | Comments Off

I like the way Go handles exceptions. It’s a simpler model than most languages use; to my eyes, it offers only the most useful part and leaves out excess complexity. This article explains it well: it’s just another mechanism of control flow, less of a magic alternate world with lots of automatic behavior.


Library design

Posted: November 23rd, 2010 | Author: Mars | Filed under: Uncategorized | Comments Off

With symbol literals and the “has” operator in place, I can start building a type system for the standard library. The library consists of modules, which live in a central directory; each module represents some related package of utilities.

I’ve decided to build a module around each abstract datatype; for example, you can import sequence from radian to use functions relevant to sequences, or import stack from radian to get a collection of stack functions. Each of these modules has a type? function, which examines some arbitrary object and tells you whether it implements the member functions necessary for that interface. I’ve also included an empty function which returns an appropriate null object for the interface. The empty sequence, for example, can still be iterated over, but it ends immediately and yields no values; the empty stack has no items.

For a type like stack, the empty function works like a constructor: you start with stack.empty, then push items onto it. After you’ve popped all the items back off, what you end up with is – once again – stack.empty. I intend to do the same for all the stock container types: dictionary, set, array, stack, queue, deque, tuple, and anything else I end up including.

There’s nothing obligating you to start with the built-in stack.empty, of course: as long as your own object implements the appropriate methods, stack.type?(your_object) will return true, and any function which works on stacks will accept it.


IO system design

Posted: November 10th, 2010 | Author: Mars | Filed under: Uncategorized | 2 Comments »

Functional languages are good at describing pure data transformations, but how do you write a functional program that causes changes to occur in the rest of the computer, or even in the world outside the computer? The whole point of a purely functional language is that functions can only compute values; they cannot cause changes to occur. Doesn’t this mean that actions taken outside the program have to be inherently “impure”?

The solution is to think of the entire state of the outside world as an object; a value which can be described and manipulated. The program itself becomes a function which accepts a state-of-the-world object as a parameter, transforms it, and returns a new state-of-the-world as its result value. The job of a purely functional program is to describe the transformation that should occur on the state of the world outside itself: how to get from point A to point B. It’s up to the operating system, which has invoked the program, to decide what to do with that information.

Radian already has a system for describing transformations of objects; we can represent IO actions the same way. The program receives an implicit parameter named io, which is its representation of the state-of-the-world; the program can do whatever it wants to this object, and whatever value is left in the io variable when the program ends becomes the program’s description of the IO state it wants to create. (We’ll ignore for the moment the question of this object’s internal structure; suffice it to say this is just a pretty UI wrapped around all the old familiar IO streams and system calls.)

This means that an IO action doesn’t actually occur at the point where the program calls an IO method. Instead, the IO object adds the action to its internal queue. As the program continues, the IO queue grows; when the program is finished, it executes the queue, thus applying its state-changes. This is analogous to buffered I/O, where writes may actually live in a buffer for a while, only going to disk via flush() or when the IO subsystem decides it’s time to commit. This buffering approach is what makes Radian’s IO system thread-safe and exception-safe: you can always roll back to a previous IO state.

The problem with this pretty plan – a very big problem! – is that programs don’t just read from the outside world, perform computations, then write their results back to the outside world. It’s usually more of a conversation, where the data that comes in affects the data that goes out and vice versa. If your IO object only represents the state of the world as it was before the program began, how can you implement an interactive process? How, for that matter, can you read in data when the act of reading data itself has side effects? (Think of a socket buffer.)

One option might be to stop the program and flush the IO object before reading any data. The problem is that this doesn’t play well with parallelism, and gives us no way to roll back if an exception occurs – once you’ve flushed a set of IO actions, there’s no way to recall them.

Instead, I’m going to implement an asynchronous IO system. We can think of a read operation as an IO task like any other: the difference is that it doesn’t just change the world-state, but also creates some new information. We can’t just fire and forget; something needs to receive that information and do something with it. That something will be an IO continuation: a function which receives, as its parameters, an IO state and some value. This function is effectively a little program of its own: its job is to interpret the value, do something interesting with it, and return a new IO-state in its turn.

So, you want to read some data: you call io->read, specifying the input stream and the completion. This queues up an IO action which is to read data from that stream; when the read action finishes, it will call the completion routine, passing in the newly-modified world (moving the read pointer may be a tiny change but it is a “side effect” all the same!), plus the data that was read from the stream. Now it’s the completion function’s job: it can parse the data, add it to a buffer, do whatever it wants – write it to a socket, read more data, whatever. In the end it all comes down to another queue of IO actions, which it can return back to its caller. The process continues until some completion, somewhere down the line, decides that all the work is done and returns the IO object without changing it. If there are no actions left to perform, that’s the end – the program exits.

One of the larger problems with this architecture is that I believe people will find it inconvenient. I have observed that most people approaching IO like to think in terms of steps, and like to lay their operations out in a series of commands; recasting the work as a completion-driven state machine does not come naturally. I think I am going to have to implement another code-transformation engine, like the loop hoisting system, that can unroll a function into an IO state machine. There would be some new statement which marked a break, a wait point: it would first commit the IO actions up to that point, then await the result and resume executing.

It’s possible that I can build a single architecture which does something like Python’s sequence generators and this IO system all at once. For what is this IO queue but another sequence, after all? The problem of yielding a sequence of data values asynchronously is not so different from the problem of yielding a sequence of IO actions asynchronously; the key difference is that the IO completion function needs to get a new world-state object along with the data it has read.