Release 0.1.0

Posted: January 31st, 2012 | Author: Mars | Filed under: Progress | Comments Off

I’ve just tagged v0.1.0 in the git repository and uploaded a new Mac OS X binary build. Changes since last month’s release 0.0.0:

  • A string is a sequence of characters, not a sequence of bytes. When iterating over the characters in a string literal, the contents are no longer returned as individual UTF-8 bytes, but as Unicode code points.
  • Symbol lookup time is now O(log n) in the worst case, not O(n), since the index is now an AA-tree rather than an unbalanced binary tree.
  • A race condition in the symbol allocator has been corrected; it is no longer possible to get duplicate symbols when the first lookup happens simultaneously on different processors.
  • io.print function emits UTF-8 byte streams instead of mangling non-ASCII characters.
  • The identifiers task, generator, and yield have been reserved for use as syntax keywords.
  • String literals implement a compare_to method, which allows them to be compared and used as keys in a map container. This method uses a simple ordinal comparison, not one of the specific Unicode-defined lexical comparisons.
  • Import statement allows an optional aliasing clause, where the local symbol has a different name than the symbol defined in the target module. The complete syntax is:
    'import' identifier ['=' identifier] ['from' expression]
    As with all declarations, the first identifier token is the declared name; the optional second identifier specifies the target to import.
  • Expressions can now be split and continued on the next line after any binop token, including the comma and period tokens. Comments between a binop token and the end of line will be ignored.
  • Tuples implement the sequence interface.
  • Map constructor no longer tries to double up as a set constructor. Entries must have both a key and a value.
  • Unary operators (negate, not, poison, and yield) no longer parse an entire expression instead of a single term. This caused baffling precedence errors.
  • Numerics package supports rational as well as integer arithmetic. Division of two integers will now yield a rational number instead of truncating the remainder and returning an integer. Rational numbers are represented as a pair of integers, numerator and denominator. Rational numbers are normalized such that the numerator and denominator have no common factors and the denominator is always positive and greater than 1.
    Number type has two new predicates: rational?, which is currently true for all numbers, and integer?, which is true only for integral values. The shifting and bitwise arithmetic operations are only defined for integers, not for all rationals.
  • Functions in the number module allow rounding of rational numbers in three modes: ceiling rounds toward positive infinity, floor rounds toward negative infinity, and truncate rounds toward zero.
  • Sign function in the number module returns -1 for negative numbers, 1 for positive numbers, and 0 for all numbers which are equal to zero.
  • New generator block type lets you define a function which returns a
    sequence. The generator supplies each value in the sequence by executing a yield statement. This is not very useful, as you cannot put yield statements inside nested loops or conditionals, but it is a sign of things to come.

Rational numbers

Posted: January 28th, 2012 | Author: Mars | Filed under: Uncategorized | Comments Off

Scientific computing seems like a reasonable class of uses for a tool like Radian, but an integer-only number system will only take us so far. I’ve just finished adding on a rational number system which handles arithmetic on fractional numbers. Dividing one integer by another no longer discards the remainder; if the denominator is not a factor of the numerator, the result is a fractional number instead of an integer.

“Rational” sits below “integer” on the numeric tower; that is, all integers are rationals, but not all rationals are integers. The type system works as follows:

Number has number?, rational?, integer?

Rational extends number with add, subtract, multiply, divide, modulus, exponentiate, numerator, denominator, and compare_to

Integer extends rational with shift_left, shift_right, bit_and, bit_or, bit_xor, and to_char

The next step for numerics will be the addition of an arbitrary-precision “bigint” implementation. Integers are currently limited to the range of a machine word, which is either 32 or 64 bits, and they offer no provision for overflow; the new system will detect overflow and shift up to a larger storage format. If this works out well it should be inexpensive for word-sized integers but allow seamless scaling up to thousands of bits of precision.

Multiline expressions

Posted: January 25th, 2012 | Author: Mars | Filed under: Uncategorized | 2 Comments »

I had a few extra minutes last night and decided that it would make more sense to tackle something small and complete it than to simply peck away at the big async project. I cherry-picked something easy from the low end of my priority list and implemented support for multiple-line expressions.

Radian is a line-oriented language, not a curly-brace language, so end-of-line signals the end of a statement. This is awkward when you are composing a long expression; the line has to keep sprawling off toward the right.

Of course there are many places in an expression where end-of-line would always be an error. Once you’ve opened a parenthesis, for example, you must close the subexpression before the outer expression can end.

The scheme I chose was to skip any end-of-line which occurs after a binary operator token. That is, in any expression of the form exp 'foo' exp, you can insert an end-of-line after the ‘foo’ and continue the expression on the following line.

Another strategy might have been to focus on grouping tokens: parentheses, brackets, and braces. Any end-of-line found inside such a group must also be an error, no matter where in the expression it appears, so it could reasonably be stripped.

I chose the binop-focused syntax instead, though, because it makes life easier for editing tools. With the grouping approach, you’d need to identify the beginning of the statement – which may be an arbitrary number of lines above – before you can be certain whether the current line is the beginning of a statement or the continuation of some expression.

With the binop approach, you need only look at the last token of the previous line: if it is one of a known list of operators, then you know the target line is a continuation rather than a new statement. This will make life a lot easier on developers of syntax highlighters and code-formatting tools.

Sequence generators

Posted: January 23rd, 2012 | Author: Mars | Filed under: Uncategorized | Comments Off

I’ve done some work on the asynchronous sequence and task scheme and perhaps a quarter of the job is finished. I started with generators, as they are a bit simpler than tasks, and have implemented the generator block structure and the yield statement. The compiler can break a block down into what is effectively a series of callbacks, and represent that as a sequence-type object. It’s neat to watch it work.

The next step is to implement a delegating yield from statement. This is important because the compiler will recursively decompose control-flow blocks like if statements and loops into sub-generators; it will the internally “yield from” these sub-generators, thus allowing you to place yield statements inside these nested blocks – an important part of the system!

This whole asynchronous-callback scheme is a pretty big job but I think it is worth the effort. Sequence generator functions will be a big addition to the language; sequences are the whole point of Radian, really, and you can think of the generator block as being the converse of a for-loop. The new IO system is obviously a bit more syntactically involved than the monadic IO system, but I think it will be more approachable. I expect that it will be substantially easier to figure out how to compose complex operations with it. And while the task block is intended primarily for IO, I’m sure people will find many other interesting things to do with it.

Collection objects in Groovy

Posted: January 17th, 2012 | Author: Mars | Filed under: Reference | Comments Off

Examples of some pleasant little syntax tricks for working with range and sequence objects in the Groovy language, which runs on top of Java and its class library.

IO, coroutines, generators

Posted: January 11th, 2012 | Author: Mars | Filed under: Design | 1 Comment »

Radian’s functional semantics make synchronous I/O operations difficult, perhaps impossible; I certainly haven’t figured out how one would implement them. Instead, all I/O operations are performed asynchronously. A program’s job is not to perform I/O actions, but to describe them. You could think of a program as a function which returns a list of changes the external world ought to apply to itself. Each change has a continuation: another program to run once the change has been applied.

This lets you construct chains of I/O actions: read the contents of a directory, open all the files in it, transform the contents of those files, write them back. Each step is a little program which runs as the callback for the previous action, and then returns the next action.

The scheme works well in a mathematical sense but it’s a right nuisance in actual practice. Javascript programmers will be familiar with the issue: you end up with callback nested in callback nested in callback, and you can’t use the language’s built in flow control structures. You have to decompose your loops and conditionals into groups of functions by hand, and the result is hard to read.

The whole point of the Radian project is that the compiler should perform this kind of mechanical, predictable translation for you.You should be able to write straightforward straight-line code that expresses the algorithm you want to execute, and let the compiler figure out how to decompose that into a set of interrelated callback functions.

Let’s start by describing the IO process as an interface. It looks a lot like a sequence: a sequence contains a chain of iterators, where each iterator has a current value and a method letting you move on to the next iterator. An IO task contains a chain of sub-tasks, where each one returns an IO action to perform, and offers a method letting you move on to the next sub-task. The chief difference is that the “next” method needs some value as input, that being whatever came back from the IO action. It might be the contents of a file, the state of a socket, or an error condition flag, but whatever it is, it determines what the next IO subtask is going to do.

I’ll call this thing a “task”, and the subtasks I’ll call “steps”. Task will have a “run” method, which returns the first step; this is analogous to the sequence’s “iterate” method. Each step will have a “current” function, returning some IO action, and a “next” method which accepts the result of that action. Finally, a “valid?” method will return true when the IO task is complete; the terminal step has no “current” or “next”.

It is obviously be possible to use the existing object system to construct such a task, but that doesn’t solve the problem of having to decompose all your flow control by hand. The language needs a new bit of syntax: a task block, which defines a new task, and a yield operation, which marks the divisions between steps. The yield operation is like a function which accepts a parameter – that’s the IO action returned by “current” – and yields a value – which is of course the parameter the caller will pass in to “next”. It’s a doubly apt name: seen as a generator, the yield operator yields the generator’s output; seen as a thread, or coroutine, the yield operator yields control back to some other process.

Once this structure exists, I plan to redefine the IO system in terms of the “task” and “step” interfaces. The program itself will become one big task block, broken down into steps between each “yield” operation. The task syntax block will let you define these asynchronous state-machine tasks in the same way you can currently define ordinary functions and methods, and you’ll be able to create complex sequential IO operations by composing task instances in much the same way that ordinary functions can call each other.

This will completely change Radian IO syntax, of course. Right now one uses the ‘io’ object to create sequences of actions: it notionally represents the state of the world, and you wrap it in a series of actions to transform it. Once you return the resulting, modified state-of-the-world, that commits your changes to reality, and a new state-of-the-world (still called ‘io’) gets passed in to the next continuation function.

In the new system, there’ll be no need to represent the state of the world explicitly. IO actions will be simple objects; you won’t need to have a world-token in order to create them. It will be much more practical to have modules that manage specific types of IO, and return specific types of IO actions. Instead of making calls on the “io” object, you’ll yield things from function calls into various modules. For example, the hello world program will change from this:
io->print("Hello, world!")
to this:
yield console.print("Hello, world!")

This does mean that “yield” and the task system will have to be one of the first things people learn about Radian, but I suppose that’s worth the vast increase in ease of IO programming.

This scheme also opens the door for a lot of useful concurrency features. Tasks are basically coroutines; they’re like lightweight, cooperative threads which don’t need permanently-allocated stacks. You could have thousands of them running at once, Erlang-style.

State of the project

Posted: January 6th, 2012 | Author: Mars | Filed under: Progress | Comments Off

In 2009, I launched this language project and built its compiler. In 2010, I fleshed out the structure and built the automatic parallelization system. In 2011, I finished the core semantic model, implemented the standard container structures, added an IO interface and a memory management system. Now it’s 2012: what comes next?

I’ve accomplished the primary goal for this project, which was to demonstrate that this approach to parallel programming actually works. You can have a language that looks and feels like Python, Ruby, or JavaScript whose semantic properties allow automated parallelism in a manner similar to Haskell or Erlang. You don’t need to invert your thinking and write your code in functional style; given a few carefully chosen semantic restrictions, a compiler can make that transformation for you.

The next step is to build this tool up into a software ecosystem, so that it can actually do some real people some real good. Who are its users going to be? What tools are they currently using? What does Radian need to offer in order to make life easier than whatever they are doing right now? The job facing me in 2012 is to answer these questions and build the components those answers suggest.

I need to find people doing experimental work with huge amounts of data: people who are writing smallish programs that crunch through gigabytes of stuff. Radian should be good for CPU-bound processes involving varied data: it will do well with embarrassingly parallel problems, but the whole point is that it should also do a good job with problems where the parallelism takes a bit more work to find.

Maybe people are analyzing web server logs, or extracting fingerprints from MP3s, or sorting EXIF tags from hundreds of thousands of pictures. Are there scientific applications for this sort of work? There must be – people use Python for scientific work, after all; scipy is a big deal.