Release 0.1.1

Posted: February 1st, 2012 | Author: Mars | Filed under: Uncategorized | No Comments »

I made a couple of stupid mistakes in yesterday’s build, so I’ve tagged and uploaded version 0.1.1.

  • Removed debug spew when inserting or doing lookups on a map object.
  • String literal comparison no longer returns spurious “false” results when comparing two equal strings. This also caused map lookups to fail.

Rational numbers

Posted: January 28th, 2012 | Author: Mars | Filed under: Uncategorized | No Comments »

Scientific computing seems like a reasonable class of uses for a tool like Radian, but an integer-only number system will only take us so far. I’ve just finished adding on a rational number system which handles arithmetic on fractional numbers. Dividing one integer by another no longer discards the remainder; if the denominator is not a factor of the numerator, the result is a fractional number instead of an integer.

“Rational” sits below “integer” on the numeric tower; that is, all integers are rationals, but not all rationals are integers. The type system works as follows:

Number has number?, rational?, integer?

Rational extends number with add, subtract, multiply, divide, modulus, exponentiate, numerator, denominator, and compare_to

Integer extends rational with shift_left, shift_right, bit_and, bit_or, bit_xor, and to_char

The next step for numerics will be the addition of an arbitrary-precision “bigint” implementation. Integers are currently limited to the range of a machine word, which is either 32 or 64 bits, and they offer no provision for overflow; the new system will detect overflow and shift up to a larger storage format. If this works out well it should be inexpensive for word-sized integers but allow seamless scaling up to thousands of bits of precision.


Multiline expressions

Posted: January 25th, 2012 | Author: Mars | Filed under: Uncategorized | 2 Comments »

I had a few extra minutes last night and decided that it would make more sense to tackle something small and complete it than to simply peck away at the big async project. I cherry-picked something easy from the low end of my priority list and implemented support for multiple-line expressions.

Radian is a line-oriented language, not a curly-brace language, so end-of-line signals the end of a statement. This is awkward when you are composing a long expression; the line has to keep sprawling off toward the right.

Of course there are many places in an expression where end-of-line would always be an error. Once you’ve opened a parenthesis, for example, you must close the subexpression before the outer expression can end.

The scheme I chose was to skip any end-of-line which occurs after a binary operator token. That is, in any expression of the form exp 'foo' exp, you can insert an end-of-line after the ‘foo’ and continue the expression on the following line.

Another strategy might have been to focus on grouping tokens: parentheses, brackets, and braces. Any end-of-line found inside such a group must also be an error, no matter where in the expression it appears, so it could reasonably be stripped.

I chose the binop-focused syntax instead, though, because it makes life easier for editing tools. With the grouping approach, you’d need to identify the beginning of the statement – which may be an arbitrary number of lines above – before you can be certain whether the current line is the beginning of a statement or the continuation of some expression.

With the binop approach, you need only look at the last token of the previous line: if it is one of a known list of operators, then you know the target line is a continuation rather than a new statement. This will make life a lot easier on developers of syntax highlighters and code-formatting tools.


Sequence generators

Posted: January 23rd, 2012 | Author: Mars | Filed under: Uncategorized | No Comments »

I’ve done some work on the asynchronous sequence and task scheme and perhaps a quarter of the job is finished. I started with generators, as they are a bit simpler than tasks, and have implemented the generator block structure and the yield statement. The compiler can break a block down into what is effectively a series of callbacks, and represent that as a sequence-type object. It’s neat to watch it work.

The next step is to implement a delegating yield from statement. This is important because the compiler will recursively decompose control-flow blocks like if statements and loops into sub-generators; it will the internally “yield from” these sub-generators, thus allowing you to place yield statements inside these nested blocks – an important part of the system!

This whole asynchronous-callback scheme is a pretty big job but I think it is worth the effort. Sequence generator functions will be a big addition to the language; sequences are the whole point of Radian, really, and you can think of the generator block as being the converse of a for-loop. The new IO system is obviously a bit more syntactically involved than the monadic IO system, but I think it will be more approachable. I expect that it will be substantially easier to figure out how to compose complex operations with it. And while the task block is intended primarily for IO, I’m sure people will find many other interesting things to do with it.


First binary release

Posted: December 30th, 2011 | Author: Mars | Filed under: Uncategorized | Comments Off

I’ve just posted the first binary release package for Radian, which means that you can now try the language out without having to build the compiler yourself. It’s a Mac OS X build and the version number is 0.0.0. It includes the Radian compiler, the standard library, and a sample program.


Garbage collector working

Posted: November 16th, 2011 | Author: Mars | Filed under: Uncategorized | Comments Off

I pushed up the first working implementation of the new garbage collector last night. It’s a tracing copy collector, designed to efficiently compact small heaps. Radian has a branching generational memory model: a subtask can operate in its own allocation zone, with read-only access to objects from other zones. When the task completes, we can copy its live set into the parent zone, then release the subtask zone as a whole block.

By running tasks in separate allocation zones, we can avoid the cost of mutex locking, as threads no longer need to contend for access to the same allocator. Further, since allocation zones operate on a page-level granularity, tasks running on separate processors need not touch the same cache lines. Only at the point when subtask results are reintegrated into the parent do we need a lock – but we already have to lock that operation anyway.

This allocator/collector project has been a lot of work, and I really hope it turns out to have been worth the engineering time. Radian’s semantics are not very much like those of a conventional object-oriented language, and I believe that a memory management system designed around its needs will perform much better than a generic allocator & collector ever could have. The whole point of this project is to maximize processor throughput, after all – the fancy loop splitting stuff is not going to do us any good if the cores spend all their time contending over allocator locks.


Posted: November 12th, 2011 | Author: Mars | Filed under: Uncategorized | Comments Off

Relaxed radix balanced trees: efficient immutable vectors

Immutable vectors are a convenient data structure for func-
tional programming and part of the standard library of modern languages
like Clojure and Scala. The common implementation is based on wide
trees with a xed number of children per node, which allows fast in-
dexed lookup and update operations. In this paper we extend the vector
data type with a new underlying data structure, Relaxed Radix Balanced
Trees (RRB-Trees), and show how this structure allows immutable vector
concatenation, insert-at and splits in O(logN) time while maintaining the
index, update and iteration speeds of the original vector data structure.

They compare this new structure to finger trees, which I used for Radian’s array container:

2-3 fi nger trees achieve a lgN time for indexing and update while at the sametime maintaining an amortized constant time for adding items to the vector. Concatenation can be accomplished in lgN time too. Although attractive, using this data structure for vectors does compromise the 1/5 lgN time for index, making it theoretically 5 times slower.


Update

Posted: October 3rd, 2011 | Author: Mars | Filed under: Uncategorized | Comments Off

Three months since the last update – it was a busy summer!

The garbage collector / multithreaded allocator project is stalled at about 80% complete. It took a global change to calling conventions, which inevitably turned up a lot of odd corners and difficult debugging problems.

On another branch, I’ve built a new non-LLVM-based X64 code generator. This is not to imply anything negative about LLVM, which is a fine piece of software engineering, but it is an extremely large, complex library, dwarfing the rest of the Radian compiler by an order of magnitude, and to justify that footprint it must offer correspondingly extreme benefits. Yet I suspect that much of LLVM’s value is in its array of sophisticated solutions to problems that Radian doesn’t have. As part of the parallelization architecture, I’ve pushed a lot of the analysis work which typically happens in the backend up into the frontend semantics engine, and what reaches the code generator is a long stream of tiny functions which contain no loops, branches, aliasing, or other complications. Loop analysis is finished, register allocation is almost trivial due to the low level of register pressure, and there is basically no type information left. What optimization opportunities I do see tend to be specific to Radian semantics, and I doubt whether LLVM is equipped to do much about them.

I’ll benchmark the two code generators against each other, then, and see whether this Radian-specific EmitX64 module can hold its own against LLVM. If so, I’ll be able to radically simplify the Radian compiler build process and reduce its memory and disk footprint; if not, I’ll learn exactly what it is that makes LLVM worth all that weight.


Latest updates

Posted: May 11th, 2011 | Author: Mars | Filed under: Uncategorized | 2 Comments »

The foreign-function interface can just about do something useful now. You can load a function pointer from some external library, explain how you’d like to marshal values, then invoke it as an IO action. The value it returns will be marshalled back, according to your previous specification, into some kind of Radian object.

I’ve been sort of picking gently at the remaining bugs in the system while I think about a better syntax for doing asynchronous I/O. All Radian I/O operations are asynchronous, by nature, since the only way to execute an IO action is to yield it back to the system. Thus you must construct I/O activity as a callback-driven state machine.

This is actually the way I learned to do I/O back in the ’80s, on the non-preemptive classic Mac OS, and I’ve used similar techniques in microcontroller code. Nostalgic as the style may be, however, Radian’s lack of shared mutable state makes doing actual work this way something of a mind-bender.

There’s no getting around the fact that IO is asynchronous and callback-driven, but the pattern for writing sequential IO code in such an environment is predictable, and it should be possible to build in some support that makes the process less tedious. I’d like to introduce an async statement, or an async operator, which does all the relevant housekeeping work. It would split the current function apart: anything downstream of the returned value would be captured and passed in as an implicit callback function, which would carry on with the results of the process whenever the initial IO action had completed.

It has been difficult to wrap my head around this semantic transformation, but I think I’ve come up with a way to do it. Along the way I accidentally found a way to implement C-style continue, break, and return statements, and worked out most of what I would need to do in order to introduce Python-style generator functions (yield, or in C# yield return), so it’s been some productive thinking-time even if I haven’t written much code.

It always feels like there is a lot of work piled up ahead of me, but I’ve been keeping a to-do list, and after the async system is done, the only significant task left before “first public beta” is the garbage collector. At that point I plan to spend some time polishing things up, writing documentation, and preparing an installer package.


FFI implementation

Posted: April 11th, 2011 | Author: Mars | Filed under: Uncategorized | Comments Off

Now that the built in container types are finished, the next most important item on the to-do list appears to be a foreign-function interface: that is, a mechanism for calling functions from external libraries.

We can break this problem down into several pieces. We need:

  • IO function to load a function pointer, by name, from some library file
  • Mechanism to marshal Radian values into C types, and to create Radian objects from C values
  • Annotation scheme so we can describe the parameter and return value signature for some function pointer
  • IO function to invoke some annotated function pointer with some values to be used as parameters

It’s not clear what use a bare function pointer would be, so perhaps the annotation operation should occur at the same time as function-loading. Even still, the result cannot be an invokable, since we can’t know which external functions might have side-effects or thread interactions; the external function pointer must be a type of IO action.

As usual with a big, complex problem, I’m going to start by implementing the piece that seems simplest and most obvious. That looks like the marshaling scheme: I will develop a system of type-objects, which can produce a byte buffer from a Radian value, or can create a Radian value from a byte buffer. I will need to consider endianness, and provide some support for variable-length buffers. This system will probably also incorporate support for text encodings, since those can be seen as schemes for marshaling an abstract string of characters into some concrete byte representation.