Documentation for map and string objects

Posted: December 17th, 2012 | Author: Mars | Filed under: Progress | Comments Off

I have added two new entries under Documentation: a writeup about the map object and another about the string object. These pages describe the syntax, list the methods and functions offered by the objects, and discuss the computational complexity of the available operations.


Change from ‘const’ to ‘def’

Posted: October 25th, 2012 | Author: Mars | Filed under: Design, Progress, Syntax | 2 Comments »

Radian offers two simple symbol types: var lets you define a symbol to which you can later assign a new value, while const is a definition which cannot later be changed. I had expected to make heavy use of const in Radian code since it echoes a pattern I use frequently in C or C++, but in practice I’ve found myself shying away from it. The reason is entirely superficial: it doesn’t feel right, because the values I would be assigning just aren’t constants. Instead, most of the consts I would define are intermediate values – things that will change on every invocation of the function or every pass through the loop, but which can remain unchanged once I’ve defined them. As such it just feels weird to call them constants, and so I tend to define them as var even if I have no intention of ever redefining them.

I still think that const has a good place; in fact I think that using it heavily is good style. I’ve decided therefore to rename it. Stealing a keyword from Python, “constants” are now “definitions”, using the keyword def. I’d avoided def since Python uses it for function definitions, specifically, while Radian functions use function, but sometimes one’s nice clean abstract ideas don’t pan out in practice.

It’s about time to freeze the syntax for a while. Aside from the half-finished regex literals, which are actually present in 0.6, I don’t see any further syntax changes on the horizon. All the upcoming work is in libraries and the toolchain.


File reading API

Posted: September 27th, 2012 | Author: Mars | Filed under: Design, Progress | Comments Off

The regex system is turning out to be a larger project than I had anticipated. It’s still important, but as the length of time it appears likely to consume continues to grow, its immediate priority is dropping. I’m still working on it, but I’m not going to let it delay the long list of smaller pieces of functionality impeding other use-cases.

I am continuing to move away from the original monadic IO system. The latest change is the file-input mechanism: the function that used to be io.read_file is now file.read_bytes. I want it to be clear that the result of this function is a byte buffer, not a string. The buffer object implements the sequence interface, so if I just called it file.read an unobservant ASCII-using programmer might be able to get disturbingly far along without noticing that what they’d read was not actually text decoded from its byte form, but merely a sequence of bytes. By naming the function read_bytes I hope to plant a seed of puzzlement which will lead the programmer to its eventual sibling, read_string, which will require you to specify the encoding of the text file you are reading.
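
The bytes-versus-text distinction can be seen in a rough Python analogy. This is only a sketch, not Radian's API: the helper names read_bytes and read_string echo the ones above, but their Python signatures and the demo function are assumptions made for illustration.

```python
import os
import tempfile


def read_bytes(path):
    """Return the raw contents of a file as bytes, with no decoding."""
    with open(path, "rb") as f:
        return f.read()


def read_string(path, encoding):
    """Return the file's contents decoded as text; the encoding is required."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()


def demo():
    # Write a short UTF-8 file containing one non-ASCII character.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
        f.write("café".encode("utf-8"))
        path = f.name
    try:
        raw = read_bytes(path)
        text = read_string(path, "utf-8")
    finally:
        os.remove(path)
    return raw, text


if __name__ == "__main__":
    raw, text = demo()
    print(len(raw), len(text))  # 5 bytes, but only 4 characters
```

With pure ASCII input the two lengths would match, which is exactly how a programmer could go a long way without noticing the difference.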

Another change is the elimination of the filespec object. I’d intended to use an abstract mechanism for describing a file, but it’s ultimately nothing but a thin wrapper around a path string. Since every platform I care about uses path strings to identify files, I’ve decided to drop the wrapper. Perhaps there will eventually be a module in the library which implements platform-localized transformations on path strings.


Release 0.5.0

Posted: September 9th, 2012 | Author: Mars | Filed under: Progress | Comments Off

A new version of Radian is available for download. Changes since 0.4.0:

  • --dump switch now supports llvm option, producing LLVM IR as output.
  • IO system rewritten to use asynchronous tasks. IO methods no longer mutate the implicit IO object, but simply return asynchronous task objects which you can then sync to execute. It is no longer necessary to pass in a separate callback expression; the program will continue when task execution completes.
  • Former IO object methods load_external, describe_function, and call have been moved to the FFI (“foreign function interface”) module.
  • Number type predicates have been renamed from number? to is_number, integer? to is_integer, and rational? to is_rational.
  • All type? functions in the standard library have been renamed to type.
  • Question marks are no longer allowed as identifier and symbol suffixes. The category “suffix character” no longer exists. An identifier may begin with any character in the Unicode category XID_Start, or an underscore, and may continue with any number of characters in the Unicode category XID_Continue.
  • Methods of built-in objects check the number of incoming arguments and report an exception when there are too many or too few. Previous behavior was undefined.
  • List member indexed lookup no longer dies with strange “member not found” exception after the list grows larger than 8 items.
  • A list, once reversed, can now concatenate another list without throwing an “unimplemented” exception.
  • Number module now offers a range_with_step function, accepting parameters min, max, and step. Like the normal range function, this counts from min to max. If step is positive, it continues while current <= max; if step is negative, the sequence continues while current >= max.
  • sync operator no longer needs to be the root of its expression: you can now use the result of the sync in a compound expression involving other values, other function calls, and even other syncs. Expressions are processed in deepest-to-shallowest, left-to-right order, and syncs are currently the only expression operator which can cause an observable side-effect.
  • No longer fails to include line number and position when reporting errors with parameter definitions.
  • Functions inside a module no longer refer to the module as self; instead they refer to it using the module’s name, derived from its file name, just as other files which import that module would do.
  • set object in the library no longer returns an exception when you try to add an element: that is, the set object will now actually work as a set container.
  • No longer accepts linebreak characters inside a string literal: that is now an error, as it should have been all along.
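
The stopping rule described for range_with_step can be sketched in Python. This is a reading of the release note, not the library's source: the parameter names are adapted to avoid shadowing Python built-ins, and the zero-step check is an assumption, since the note does not say what a zero step does.

```python
def range_with_step(min_value, max_value, step):
    """Yield min_value, min_value + step, ... up to and including max_value."""
    if step == 0:
        # Assumption: a zero step is not covered by the note, so reject it.
        raise ValueError("step must be nonzero")
    current = min_value
    if step > 0:
        # Positive step: continue while current <= max.
        while current <= max_value:
            yield current
            current += step
    else:
        # Negative step: continue while current >= max.
        while current >= max_value:
            yield current
            current += step


if __name__ == "__main__":
    print(list(range_with_step(1, 10, 3)))   # [1, 4, 7, 10]
    print(list(range_with_step(10, 1, -4)))  # [10, 6, 2]
```

Note that both bounds are inclusive under this reading, unlike Python's own range.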

Module identifiers

Posted: August 10th, 2012 | Author: Mars | Filed under: Design, Progress | Comments Off

Modules are a lot like objects, and the implementation of module files in Radian’s compiler shares a great deal of code with the implementation of object blocks. One common element they’ve had is the use of self to refer to the current instance, the object on which the function or method was called.

This works fine until you define an object inside a module, something I’ve had occasion to do once or twice, and which I imagine other Radian programmers may also find to be a useful practice: the object’s definition of “self” shadows the module’s “self”, making it awkward to reach the other members of the module. There are workarounds, of course, but they suck.

I’ve just committed some code which changes modules so that the implicit parameter referring to the current module is now simply the name of the module file, minus its “.radian” suffix: that is, it’s the same name you would use to import the module from another file. This has the pleasant implication that references to module members look the same inside the module as they would from outside – though of course code inside the module can refer to private members, while code outside the module cannot.

It does feel just a little strange to have the identifiers available inside a source file depend on a piece of metadata like the file’s name, but the import system is already committed to the idea that filenames matter. It’s conceptually weird, but in practice it’s just requiring you to do something you were probably going to do anyway.


Next demo target

Posted: August 10th, 2012 | Author: Mars | Filed under: Progress | Comments Off

Now that I’ve tackled 99 Bottles, the next target on my radar is Tim Bray’s Wide Finder. This benchmark and blog series was actually one of the major inspirations for reimplementing what was formerly the “starfish rendering language” as a general-purpose parallel computing language. For this, I’ll need a regular-expressions engine, and for that, I’ll need Unicode character class support. I’m looking into the ICU library, which offers a suite of features all of which belong in Radian’s standard library. I’m just not sure yet whether I can repackage its Java-oriented API, full of mutable iterators and heavyweight objects, into a form which can coexist with Radian’s flyweight, throw-away immutable object style. It’d be a shame to reimplement such a comprehensive library!


IO system rewritten

Posted: August 1st, 2012 | Author: Mars | Filed under: Progress | Comments Off

At long last, I’ve finished rewriting the IO system. It is no longer necessary to capture and pass in a continuation procedure every time you invoke an IO action; instead, you sync IO actions back to the system, which suspends your program until the task is complete. It looks a lot like synchronous IO, and you can compose actions in the same way you could with synchronous IO, but the compiler transparently turns it all into asynchronous, thread-friendly, callback-driven code.

This was the last big technological challenge on the to-do list for my initial concept of Radian as a programming tool. There is still plenty of work to be done – filling out the support library, writing documentation, bulking up the validation suite, cleaning up a few dozen other loose ends – but this was the last big area of unknown potential problems. As a proof of concept, Radian is now complete.

This change means Radian has a new “hello world” – it goes like this:

sync io.print( "Hello, world" )


Release 0.4.0

Posted: July 25th, 2012 | Author: Mars | Filed under: Progress | Comments Off

I’ve built and uploaded x86_64-macosx and i386-linux versions of Radian version 0.4.0, including these changes since the last release:

  • Yield statement works inside while-loops.
  • No longer fails an assertion when a program defines two different functions with the same name in the same scope; instead reports an appropriate error.
  • Validation suite works again; check.sh or make check runs all tests.
  • New relation module includes greater, less, and equal relations and functions which determine whether a given relation is_greater, is_less, is_equal, is_greater_or_equal, is_less_or_equal, or is_not_equal. Relations are the values returned by compare functions such as sequence.compare.
  • string.join function concatenates a sequence of strings into a single string; string.join_with function inserts a delimiter between every pair of strings in a sequence, returning a single string.
  • Array methods which take index parameters no longer stop working after reversing the array.
  • New sync operator turns the current function into an asynchronous task generator; it yields its (optional) argument as the current response, then returns the next value sent in by the caller. A statement form of sync lets you ignore the next value and just yield back a response. Response values should be other asynchronous tasks.
  • Object constructor parameters no longer become members of the result object; they are just ordinary parameters now.
  • Compiler reports an error when it finds a direct reference to an object member – one that does not go through self. Such references almost certainly wouldn’t do what you expect them to do, so they are now forbidden.
  • string.from_sequence function accepts a sequence of characters and turns it into a string object.
  • Passing the wrong number of arguments to a function no longer produces an undefined result; it will now raise an exception.
  • Command-line --dumptokens switch has been replaced with --dump option: supported output types are tokens, flowgraph, and lic.
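
The relation module's idea, a compare function that returns a relation value plus predicates that inspect it, can be sketched in Python. Everything here is a hypothetical stand-in: the Relation enum, the compare function, and the predicate bodies are assumptions modelled on the names in the release note, not Radian's implementation.

```python
from enum import Enum


class Relation(Enum):
    """Hypothetical stand-in for the three relation values."""
    LESS = -1
    EQUAL = 0
    GREATER = 1


def compare(a, b):
    """Return the relation of a to b (assumed behaviour of a compare function)."""
    if a < b:
        return Relation.LESS
    if a > b:
        return Relation.GREATER
    return Relation.EQUAL


def is_greater(rel):
    return rel is Relation.GREATER


def is_less(rel):
    return rel is Relation.LESS


def is_equal(rel):
    return rel is Relation.EQUAL


def is_greater_or_equal(rel):
    return rel is not Relation.LESS


def is_less_or_equal(rel):
    return rel is not Relation.GREATER


def is_not_equal(rel):
    return rel is not Relation.EQUAL


if __name__ == "__main__":
    print(is_less_or_equal(compare(2, 2)))  # True
```

Separating the comparison from the question asked about it means one compare call can answer any of the six predicates.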

Asynchronous tasks

Posted: July 16th, 2012 | Author: Mars | Filed under: Design, Progress | Comments Off

I’ve gone back and forth and back again on the nomenclature: the current implementation adds a sync operator. A function which contains a sync becomes a task generator in exactly the same way that a function which contains a yield becomes a sequence generator.

A task generator is a function which returns a task; a task represents a series of related actions. Each action holds a response from the previous action; if the task is_running, you may send a new value. This updates the action pointer, creating a new response.
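
Python's generators offer a loose analogy for this send-and-respond cycle. The sketch below is not Radian code: the action tuples, the run driver, and the fake perform callback are all hypothetical, and Python generators are mutable objects, whereas the description above has each send producing a new action.

```python
def greet():
    # Each yield hands an action to the driver and suspends; the value
    # sent back in becomes the result of the yield expression.
    name = yield ("read_line", "Name? ")
    yield ("print", "Hello, " + name)
    return len(name)


def run(task, perform):
    """Drive a generator-based task to completion and return its result.

    perform is a hypothetical callback that carries out one action and
    returns that action's response.
    """
    try:
        action = next(task)  # run up to the first yielded action
        while True:
            # Perform the action, then resume the task with the response.
            action = task.send(perform(action))
    except StopIteration as stop:
        return stop.value


def make_fake_perform(log):
    """Build a perform callback that records actions instead of doing IO."""
    def perform(action):
        log.append(action)
        kind, _ = action
        return "world" if kind == "read_line" else None
    return perform


if __name__ == "__main__":
    log = []
    print(run(greet(), make_fake_perform(log)))  # 5
    print(log)
```

The task body reads top to bottom like ordinary imperative code, while the driver decides how and when each action actually happens.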

This scheme allows a program to describe a complex chain of asynchronous actions and continuations using normal imperative syntax. You don’t need to break your code up into a lot of nested callbacks, or laboriously transform a simple loop into some object with state; instead you can use the sync operator and let the compiler do that work for you.

This is very similar to the async function system in C# or Visual Basic, with Radian’s sync operator taking the place of C#’s await. There’s no need to explicitly declare that the function is async, though; the compiler will figure that out. It is also very similar to Python’s enhanced generators, though Python fuses yield and sync into a single operator, reusing iterators as asynchronous tasks. I considered this approach for Radian, but extending iterators in that way turned out to significantly impede the compiler’s ability to extract map/reduce operations out of loops. The constraints are an important part of the design, so I kept the two mechanisms separate.

The point of all this engineering, of course, is that I can now redesign the I/O API around the asynchronous task system. At present, writing a Radian program which performs any kind of I/O interaction or touches global system state in any way is a masochistic exercise in long chains of callbacks. You can’t really use the language the way it’s meant to be used, since you have to turn your code inside out just to talk to the filesystem. With the new I/O model, your entire program will effectively be one big asynchronous task, and only the presence of the sync keyword will distinguish a normal function call from one which performs some IO action.

Inside a sequence generator, one can either yield a single value into the sequence output, or yield from another sequence to splice all of its values in as though the current generator had yielded them itself. Inside an asynchronous task, however, the sync operator expects that everything you return will be another asynchronous task. It’s as though sync is always doing yield from: you are always syncing from another asynchronous task. If you want to create a new atomic action which just returns some value, there will be a utility function in the task module which creates such a task which you can then sync from.
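
Python has the same pair of forms, so the difference can be shown there. This is an analogy only: the just helper below is a hypothetical stand-in for the planned utility function, and result_of is a minimal driver invented for this sketch.

```python
def inner():
    yield 2
    yield 3


def outer():
    yield 1             # yield a single value into the output
    yield from inner()  # splice in every value of another sequence
    yield 4


def just(value):
    """Hypothetical atomic task: performs no actions, only returns a value."""
    return value
    yield  # unreachable, but its presence makes this function a generator


def task():
    # Every step delegates to another task, as sync always does.
    x = yield from just(20)
    y = yield from just(22)
    return x + y


def result_of(t):
    """Run a task that yields no actions and return its final value."""
    try:
        next(t)
    except StopIteration as stop:
        return stop.value
    raise RuntimeError("task yielded an action; this sketch has no driver")


if __name__ == "__main__":
    print(list(outer()))      # [1, 2, 3, 4]
    print(result_of(task()))  # 42
```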


Asynchronous loops are finished

Posted: July 2nd, 2012 | Author: Mars | Filed under: Progress | Comments Off

I’ve finished implementing asynchronous loops. You can now yield values or entire sequences from within a while or for loop. You can nest if, while, and for blocks arbitrarily deep, and the compiler will generate all the necessary sequencing apparatus.

Generator functions are a great way to save memory and reduce processor time, compared to the alternative strategy of doing all the work up front and returning an array. If you generate a whole array, you have to keep all the memory for the whole array around until you’re done with the array, even if you’re only stepping through the array looking at a handful of its elements at a time.

Any time you might want to append a bunch of values to an output array, consider using a generator function instead. By yielding values instead of appending them, whatever process downstream is consuming the sequence can effectively single-step through your function, running only as much as necessary to produce the next value. This saves memory, since the sequence consumer can release everything it is done with and doesn’t need to use memory for data it has yet to reach. It also improves performance – since the sequence consumer is probably a for-loop, Radian’s parallel scheduler can stack your sequence generator on top of the for-loop and dispatch larger chunks to the worker threads. Larger chunks of work means fewer context switches means less overhead means higher throughput, and that’s the whole point of having multiple cores.
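
The memory half of this argument can be demonstrated with Python generators. The sketch assumes nothing about Radian's scheduler, and all function names are made up for illustration; it only shows that a consumer pulls as many values as it needs and no more.

```python
def squares_as_list(n):
    """Eager version: builds and returns the whole array up front."""
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def squares_as_generator(n, counter):
    """Lazy version: yields each value on demand.

    counter is a one-element list used to record how many values were
    actually produced.
    """
    for i in range(n):
        counter[0] += 1
        yield i * i


def first_over(values, limit):
    """Return the first value greater than limit, or None if there is none."""
    for v in values:
        if v > limit:
            return v
    return None


if __name__ == "__main__":
    counter = [0]
    print(first_over(squares_as_generator(1_000_000, counter), 50))  # 64
    print(counter[0])  # 9 values produced, not a million
```

The eager version would have allocated a million-element list before the search even began.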