Posted: January 31st, 2012 | Author: Mars | Filed under: Progress | Comments Off
I’ve just tagged v0.1.0 in the git repository and uploaded a new Mac OS X binary build. Changes since last month’s release, 0.0.0:
- A string is a sequence of characters, not a sequence of bytes. When iterating over the characters in a string literal, the contents are no longer returned as individual UTF-8 bytes, but as Unicode code points.
- Symbol lookup time is now O(log n) in the worst case, not O(n), since the index is now an AA-tree rather than an unbalanced binary tree.
- A race condition in the symbol allocator has been corrected; it is no longer possible to get duplicate symbols when the first lookup happens simultaneously on different processors.
- The io.print function emits UTF-8 byte streams instead of mangling non-ASCII characters.
- The identifiers generator and yield have been reserved for use as syntax keywords.
- String literals implement a compare_to method, which allows them to be compared and used as keys in a map container. This method uses a simple ordinal comparison, not one of the Unicode-defined lexical comparisons.
- The import statement allows an optional aliasing clause, where the local symbol has a different name than the symbol defined in the target module. The complete syntax is:
  'import' identifier ['=' identifier] ['from' expression]
  As with all declarations, the first identifier token is the declared name; the optional second identifier specifies the target to import.
- Expressions can now be split and continued on the next line after any binop token, including the comma and period tokens. Comments between a binop token and the end of line will be ignored.
- Tuples implement the compare_to method, so they can also be compared and used as map keys.
- Map constructor no longer tries to double up as a set constructor. Entries must have both a key and a value.
- Unary operators (negate, not, poison, and yield) no longer parse an entire expression instead of a single term. This caused baffling precedence errors.
- The numerics package supports rational as well as integer arithmetic. Division of two integers will now yield a rational number instead of truncating the remainder and returning an integer. Rational numbers are represented as a pair of integers, numerator and denominator, normalized such that they have no common factors and the denominator is always greater than 1 (a value whose denominator would reduce to 1 is simply an integer).
- The number type has two new predicates: rational?, which is currently true for all numbers, and integer?, which is true only for integral values. The shifting and bitwise arithmetic operations are defined only for integers, not for all rationals.
- Functions in the number module allow rounding of rational numbers in three modes: ceiling rounds toward positive infinity, floor rounds toward negative infinity, and truncate rounds toward zero.
- The sign function in the number module returns -1 for negative numbers, 1 for positive numbers, and 0 for zero.
- A generator block type lets you define a function which returns a sequence. The generator supplies each value in the sequence by executing a yield statement. This is not yet very useful, since you cannot put yield statements inside nested loops or conditionals, but it is a sign of things to come.
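The last item above can be illustrated with an ordinary Python generator, which behaves the way the new generator block does: each yield supplies one value of the resulting sequence. Matching the current limitation, every yield here sits at the top level of the block, outside any loop or conditional.

```python
def small_sequence():
    # one value per yield, all at the top level of the block
    yield 1
    yield 2
    yield 3

values = list(small_sequence())
```

Iterating the generator produces the values in yield order: `[1, 2, 3]`.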
Posted: January 28th, 2012 | Author: Mars | Filed under: Uncategorized | Comments Off
Scientific computing seems like a reasonable class of uses for a tool like Radian, but an integer-only number system will only take us so far. I’ve just finished adding on a rational number system which handles arithmetic on fractional numbers. Dividing one integer by another no longer discards the remainder; if the denominator is not a factor of the numerator, the result is a fractional number instead of an integer.
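The representation can be sketched directly: a rational is a (numerator, denominator) pair, normalized so the two share no common factor and the sign lives on the numerator. A minimal sketch in Python (the function name and tuple encoding are illustrative, not Radian's internals):

```python
from math import gcd

def make_rational(num, den):
    """Build a normalized rational from an integer division.

    Invariants: the denominator stays positive, the pair shares no
    common factor, and a result with denominator 1 collapses back
    to a plain integer.
    """
    if den == 0:
        raise ZeroDivisionError("rational with zero denominator")
    if den < 0:                      # keep the sign on the numerator
        num, den = -num, -den
    g = gcd(num, den)
    num, den = num // g, den // g
    return num if den == 1 else (num, den)
```

For example, dividing 6 by -4 yields the normalized pair (-3, 2), while 8 divided by 4 is just the integer 2.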
“Rational” sits below “integer” on the numeric tower; that is, all integers are rationals, but not all rationals are integers. The type system works as follows:
Number implements the predicates number?, rational?, and integer?. Rational extends number with add, subtract, multiply, divide, modulus, exponentiate, numerator, and denominator. Integer extends rational with shift_left, shift_right, bit_and, bit_or, and bit_xor.
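Python's standard fractions module models a similar tower and can illustrate both the integral/rational split and the three rounding modes from the changelog, which map onto math.floor, math.ceil, and math.trunc:

```python
import math
from fractions import Fraction

x = Fraction(-7, 2)

# -7/2 is rational but not integral; 4/1 reduces to an integral value
assert x.denominator != 1
assert Fraction(4).denominator == 1

# The three rounding modes:
assert math.ceil(x) == -3    # ceiling: toward positive infinity
assert math.floor(x) == -4   # floor: toward negative infinity
assert math.trunc(x) == -3   # truncate: toward zero

# Exact arithmetic: dividing integers never silently drops a remainder
assert Fraction(1, 3) + Fraction(1, 6) == Fraction(1, 2)
```

Note how ceiling and truncate agree for negative values while floor differs, which is exactly why all three modes are worth exposing.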
The next step for numerics will be the addition of an arbitrary-precision “bigint” implementation. Integers are currently limited to the range of a machine word, which is either 32 or 64 bits, and they offer no provision for overflow; the new system will detect overflow and shift up to a larger storage format. If this works out well it should be inexpensive for word-sized integers but allow seamless scaling up to thousands of bits of precision.
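The planned promotion can be sketched as follows. Python integers are already arbitrary-precision, so the word-size check below only simulates the overflow detection; the 64-bit bound and the "fixnum"/"bignum" tags are illustrative, not Radian's actual names.

```python
WORD_MIN, WORD_MAX = -(2**63), 2**63 - 1   # one 64-bit machine word

def add_promoting(a, b):
    """Add two integers, tagging the result by storage format.

    A result that still fits in a machine word stays a cheap
    'fixnum'; anything larger shifts up to a 'bignum'.
    """
    result = a + b
    tag = "fixnum" if WORD_MIN <= result <= WORD_MAX else "bignum"
    return tag, result
```

Small sums stay word-sized, while adding two values near 2**62 overflows the word and would promote to the larger format.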
Posted: January 25th, 2012 | Author: Mars | Filed under: Uncategorized | 2 Comments »
I had a few extra minutes last night and decided that it would make more sense to tackle something small and complete it than to simply peck away at the big async project. I cherry-picked something easy from the low end of my priority list and implemented support for multiple-line expressions.
Radian is a line-oriented language, not a curly-brace language, so end-of-line signals the end of a statement. This is awkward when you are composing a long expression; the line has to keep sprawling off toward the right.
Of course there are many places in an expression where end-of-line would always be an error. Once you’ve opened a parenthesis, for example, you must close the subexpression before the outer expression can end.
The scheme I chose was to skip any end-of-line which occurs after a binary operator token. That is, in any expression of the form exp 'foo' exp, you can insert an end-of-line after the ‘foo’ and continue the expression on the following line.
Another strategy might have been to focus on grouping tokens: parentheses, brackets, and braces. Any end-of-line found inside such a group must also be an error, no matter where in the expression it appears, so it could reasonably be stripped.
I chose the binop-focused syntax instead, though, because it makes life easier for editing tools. With the grouping approach, you’d need to identify the beginning of the statement – which may be an arbitrary number of lines above – before you can be certain whether the current line is the beginning of a statement or the continuation of some expression.
With the binop approach, you need only look at the last token of the previous line: if it is one of a known list of operators, then you know the target line is a continuation rather than a new statement. This will make life a lot easier on developers of syntax highlighters and code-formatting tools.
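The editing-tool argument can be made concrete: classifying a line requires looking only at the last token of the line before it. A rough sketch, in which the operator set, the whitespace tokenizing, and the '#' comment syntax are all placeholders rather than Radian's actual lexical rules:

```python
BINOPS = {"+", "-", "*", "/", "and", "or", ",", "."}

def is_continuation(previous_line):
    """True if the next line continues the previous statement,
    i.e. the previous line ends with a binary-operator token."""
    code = previous_line.split("#", 1)[0]   # ignore a trailing comment
    tokens = code.split()                   # crude whitespace tokenizing
    return bool(tokens) and tokens[-1] in BINOPS
```

A syntax highlighter needs only this one-line lookback, whereas the grouping-based rule would force it to scan upward an arbitrary distance to find the start of the statement.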
Posted: January 23rd, 2012 | Author: Mars | Filed under: Uncategorized | Comments Off
I’ve done some work on the asynchronous sequence and task scheme, and perhaps a quarter of the job is finished. I started with generators, as they are a bit simpler than tasks, and have implemented the generator block structure and the yield statement. The compiler can break a block down into what is effectively a series of callbacks, and represent that as a sequence-type object. It’s neat to watch it work.
The next step is to implement a delegating yield from statement. This is important because the compiler will recursively decompose control-flow blocks like if statements and loops into sub-generators; it will then internally “yield from” these sub-generators, thus allowing you to place yield statements inside these nested blocks – an important part of the system!
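Python grew exactly this construct, so its behavior can stand in for the intended semantics: a sub-generator's yields flow through the delegating generator to the outer caller, which is what lets yield appear inside decomposed loops and conditionals.

```python
def body(items):
    # a sub-generator: what the compiler would produce for a nested loop
    for item in items:
        yield item * 2

def whole():
    yield "start"
    yield from body([1, 2])   # delegate: inner yields reach our caller
    yield "end"

result = list(whole())
```

Iterating `whole()` yields `"start"`, then the doubled items from the nested loop, then `"end"`, all as one flat sequence.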
This whole asynchronous-callback scheme is a pretty big job but I think it is worth the effort. Sequence generator functions will be a big addition to the language; sequences are the whole point of Radian, really, and you can think of the generator block as being the converse of a for-loop. The new IO system is obviously a bit more syntactically involved than the monadic IO system, but I think it will be more approachable. I expect that it will be substantially easier to figure out how to compose complex operations with it. And while the task block is intended primarily for IO, I’m sure people will find many other interesting things to do with it.
Posted: January 17th, 2012 | Author: Mars | Filed under: Reference | Comments Off
Examples of some pleasant little syntax tricks for working with range and sequence objects in the Groovy language, which runs on top of Java and its class library.
Posted: January 11th, 2012 | Author: Mars | Filed under: Design | 1 Comment »
Radian’s functional semantics make synchronous I/O operations difficult, perhaps impossible; I certainly haven’t figured out how one would implement them. Instead, all I/O operations are performed asynchronously. A program’s job is not to perform I/O actions, but to describe them. You could think of a program as a function which returns a list of changes the external world ought to apply to itself. Each change has a continuation: another program to run once the change has been applied.
This lets you construct chains of I/O actions: read the contents of a directory, open all the files in it, transform the contents of those files, write them back. Each step is a little program which runs as the callback for the previous action, and then returns the next action.
The whole point of the Radian project is that the compiler should perform this kind of mechanical, predictable translation for you. You should be able to write straightforward straight-line code that expresses the algorithm you want to execute, and let the compiler figure out how to decompose that into a set of interrelated callback functions.
Let’s start by describing the IO process as an interface. It looks a lot like a sequence: a sequence contains a chain of iterators, where each iterator has a current value and a method letting you move on to the next iterator. An IO task contains a chain of sub-tasks, where each one returns an IO action to perform, and offers a method letting you move on to the next sub-task. The chief difference is that the “next” method needs some value as input, that being whatever came back from the IO action. It might be the contents of a file, the state of a socket, or an error condition flag, but whatever it is, it determines what the next IO subtask is going to do.
I’ll call this thing a “task”, and the subtasks I’ll call “steps”. Task will have a “run” method, which returns the first step; this is analogous to the sequence’s “iterate” method. Each step will have a “current” function, returning some IO action, and a “next” method which accepts the result of that action. Finally, a “valid?” method will return false once the IO task is complete; the terminal step has no “current” or “next”.
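A minimal object sketch of that interface, using the post's own method names (run, current, next, valid?; the trailing ? becomes a trailing underscore here, and the read/write action tuples are hypothetical stand-ins for real IO actions):

```python
class EchoTask:
    """A task that reads one line and writes it back."""
    def run(self):
        return ReadStep()

class ReadStep:
    def valid_(self): return True
    def current(self): return ("read_line",)           # the IO action
    def next(self, result): return WriteStep(result)   # result feeds in

class WriteStep:
    def __init__(self, text): self.text = text
    def valid_(self): return True
    def current(self): return ("write_line", self.text)
    def next(self, result): return DoneStep()

class DoneStep:
    def valid_(self): return False    # terminal: no current or next
```

A driver simply walks the chain: perform each step's current action and hand the result to next until valid? goes false.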
It is obviously possible to use the existing object system to construct such a task, but that doesn’t solve the problem of having to decompose all your flow control by hand. The language needs a new bit of syntax: a task block, which defines a new task, and a yield operation, which marks the divisions between steps. The yield operation is like a function which accepts a parameter – that’s the IO action returned by “current” – and yields a value – which is of course the parameter the caller will pass in to “next”. It’s a doubly apt name: seen as a generator, the yield operator yields the generator’s output; seen as a thread, or coroutine, the yield operator yields control back to some other process.
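Python generators already give yield this two-way shape: resuming a paused generator with send delivers the IO action's result back into the task. A sketch of a driver under that analogy, where the perform callback stands in for the runtime's IO layer and the action tuples are hypothetical:

```python
def run_task(task, perform):
    """Drive a generator-based task: each yielded IO action is
    executed by 'perform', and its result resumes the task."""
    try:
        action = next(task)                         # first step's action
        while True:
            action = task.send(perform(action))     # result flows back in
    except StopIteration:
        pass                                        # terminal step reached

def hello_task(log):
    name = yield ("read_line",)     # yield an action, receive its result
    log.append(name)
    yield ("write_line", "Hello, " + name + "!")
```

With a perform that answers "world" to the read action, the task records "world" and then issues the matching write action.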
Once this structure exists, I plan to redefine the IO system in terms of the “task” and “step” interfaces. The program itself will become one big task block, broken down into steps between each “yield” operation. The task syntax block will let you define these asynchronous state-machine tasks in the same way you can currently define ordinary functions and methods, and you’ll be able to create complex sequential IO operations by composing task instances in much the same way that ordinary functions can call each other.
This will completely change Radian IO syntax, of course. Right now one uses the ‘io’ object to create sequences of actions: it notionally represents the state of the world, and you wrap it in a series of actions to transform it. Once you return the resulting, modified state-of-the-world, that commits your changes to reality, and a new state-of-the-world (still called ‘io’) gets passed in to the next continuation function.
In the new system, there’ll be no need to represent the state of the world explicitly. IO actions will be simple objects; you won’t need to have a world-token in order to create them. It will be much more practical to have modules that manage specific types of IO, and return specific types of IO actions. Instead of making calls on the “io” object, you’ll yield things from function calls into various modules. For example, the hello world program will become:
yield console.print("Hello, world!")
This does mean that “yield” and the task system will have to be one of the first things people learn about Radian, but I suppose that’s worth the vast increase in ease of IO programming.
This scheme also opens the door for a lot of useful concurrency features. Tasks are basically coroutines; they’re like lightweight, cooperative threads which don’t need permanently-allocated stacks. You could have thousands of them running at once, Erlang-style.
Posted: January 6th, 2012 | Author: Mars | Filed under: Progress | Comments Off
In 2009, I launched this language project and built its compiler. In 2010, I fleshed out the structure and built the automatic parallelization system. In 2011, I finished the core semantic model, implemented the standard container structures, and added an IO interface and a memory management system. Now it’s 2012: what comes next?
The next step is to build this tool up into a software ecosystem, so that it can actually do some real people some real good. Who are its users going to be? What tools are they currently using? What does Radian need to offer in order to make life easier than whatever they are doing right now? The job facing me in 2012 is to answer these questions and build the components those answers suggest.
I need to find people doing experimental work with huge amounts of data: people who are writing smallish programs that crunch through gigabytes of stuff. Radian should be good for CPU-bound processes involving varied data: it will do well with embarrassingly parallel problems, but the whole point is that it should also do a good job with problems where the parallelism takes a bit more work to find.
Maybe people are analyzing web server logs, or extracting fingerprints from MP3s, or sorting EXIF tags from hundreds of thousands of pictures. Are there scientific applications for this sort of work? There must be – people use Python for scientific work, after all; scipy is a big deal.