New intrinsic parameter: argv

Posted: November 15th, 2010 | Author: Mars | Filed under: Design | Comments Off

I previously discussed the idea of an IO continuation: a two-parameter function which accepts a world-state and some value, then combines them, creating a sequence of IO actions to return. Each of these actions, when executed, may yield another continuation, which can in its turn generate more actions. It’s a self-extending sequence of little programlets.

The first programlet the IO engine executes is of course the main body of the program itself. This is no different than any other IO continuation; it just happens to be the place we start. It may start up as many IO state-machines as it likes, which will keep working and passing data along until the program has accomplished its job.

The parameter value could be a buffer read from a file, a packet received from a socket, a directory listing, a success/failure code, or whatever it was that the IO action was intended to produce or retrieve. For the main program body, the parameter value is the list of command-line arguments. I’ve followed the Ruby approach here and defined an intrinsic global parameter named argv: this and io are the two parameters the main program receives from the IO engine.

This simple program will print all of its arguments to the console:

for s in argv:
    io->print( s )
end s

Modules, which are singleton objects and not programs or IO continuations, do not have any such parameters; the only place argv is directly available is the main program file.


Making the parallel scheduler useful

Posted: October 21st, 2010 | Author: Mars | Filed under: Design, Progress | Comments Off

To elaborate on yesterday’s somewhat bleary-eyed burst of enthusiasm, the Radian parallelization scheme relies on collaboration between a compile-time optimization and a run-time thread scheduler. The goal is to maximize throughput by spreading work across all available cores; the compiler’s job is to refactor loops into map and reduce stages, and the run-time scheduler’s job is to distribute the mapping work across cores.

Since it’s possible to build map operations by hand, I decided to start with the scheduler. The demo programs I’ve used to test this out all generate some sequence using list comprehension expressions, then iterate over the sequence with a for-loop and do something simple with each value, such as printing out the results. The printing still has to happen on a single core, but the scheduler can get other cores in on the work of evaluating the sequence expression.

The scheduler is designed to regulate itself, depending on the ratio of time taken by the ‘map’ stage versus the ‘reduce’ stage and the number of available cores. What’s more, multiple levels of loop can take advantage of the scheduler simultaneously: if you are using a four-core machine, and the outer loop’s “reduce” stage takes long enough that there’s no need to run more than one of its “map” stages at a time, the inner loop can take over the other two cores to parallelize its behavior. Allocation of cores will shift around depending on how long each stage takes- it doesn’t assume that list elements are uniform.

Now that the scheduler works, the next step is to make the compiler take advantage of it. After generating a loop body, the compiler will first perform invariant code motion (extracting everything that stays the same on every iteration), then perform another hoisting pass to extract induction variables. These induction variables will be grouped into a tuple, and that will be the mapping function on the input sequence; the remaining expressions, all of which depend on values calculated in previous loop iterations, will become the reduction function.


Change to member access syntax

Posted: October 15th, 2010 | Author: Mars | Filed under: Design, Progress | Comments Off

While working on the sequence module I discovered a fatal flaw in my implementation of object member access. I had tried to implement member access in the C++/C#/Java style, where members of “this” are available as unqualified identifiers within the object scope, but the implementation was turning into a kludge-ridden mess. This is generally a good indication that I’ve taken a wrong turn in the design, so I’ve decided to ax the feature for now. One must instead access members explicitly through the self variable, as in Python or JavaScript.

On the bright side, I’ve overhauled the assignment and method-invocation code, and it is now possible to assign values to object properties and invoke methods of object properties, stacked any number of levels deep:

foo->bar->baz = 42
foo->bar->baz( "hello", capture(x: uppercase(x)))


What’s next?

Posted: October 10th, 2010 | Author: Mars | Filed under: Design | 6 Comments »

Finishing the initial set of object features feels like a significant milestone. I took a long enough development break that the roadmap has grown hazy, and I’m having to think through the next few steps again in order to figure out what to start building next. Some of the choices I am mulling over:

  • Standard containers, array and map. These would be implemented in terms of a finger tree and a red-black tree, respectively.
  • String encoding system. Functions to parse a bytestream as UTF8, UTF16, or some other text encoding, and complementary functions to render a string as a bytestream in some encoding.
  • Error management: syntax to create an error value – this is Radian’s equivalent of an exception – and to detect an error and unpack the message it contains.
  • IO file management – ability to read and write files, and to manipulate the file system.
  • Continued development of object features: inheritance, interfaces, possible control over public/private elements.
  • Parallelization kernel: that is, the whole point of this exercise.
  • Sequence-generating loop to go along with the sequence-consuming loop.

All of this stuff needs to happen before Radian can even start becoming a useful tool; the parallelization kernel is the major point of the project, and it is useless without a program structure based on sequences; but sequences represent data, which means we need some way of pulling data in and transforming it so that we have something for the sequence engine to work on.

Perhaps I should create a “before the initial release” task list so I can track my progress against it. Other items, off the top of my head:

  • Garbage collector
  • Rational and real numeric types
  • FFI system for invoking C-api libraries
  • Numerics library with trig functions and roots/exponentials
  • JSON parser, possibly access to an XML parser too
  • ability to execute other programs through the shell

Update

Posted: October 5th, 2010 | Author: Mars | Filed under: Design, Progress | Comments Off

Latest implementation progress: I’ve tracked down and fixed the bug that resulted in malformed object member indexes. Objects are indexed collections of related functions, and the index is implemented as a binary search tree. I had been generating the BST by accumulating the member list in a std::set, then copying the members into an array, in order, and partitioning; but I neglected to create an appropriate comparison function for the set, so “in order” was meaningless. That obstacle down, I fixed a couple of problems in the backend which came up when you tried to loop on a constant instead of a conditional expression.

Speaking of endless loops, Radian currently has nothing like the break statement familiar from C, Python, JavaScript, and the like. Goto-like jumps are awkward in a language with functional semantics; it can be done, but I’d rather find a similar-looking alternative construction that fits more readily onto the language’s natural semantics. I’m thinking of adding an implicit break variable to the body of every loop. If this value is true, the loop will not iterate further. This would be less useful in a while loop, but a for loop could certainly use it.


Data structures implementation

Posted: January 6th, 2010 | Author: Mars | Filed under: Design, Progress | Comments Off

Radian offers syntax for five types of container data structures: tuple, array, set, map, and object. The tuple is a primitive element, implemented in the compiler backend, but the others must be built up from simpler components. I’ve been thinking a lot about the fundamentals of data organization in Radian lately, therefore, and while I’ve spent a lot of time working on the project, I’ve written little code, all of which I have thrown away.

It’s less the implementation that has been puzzling me – though immutability certainly complicates matters – than the fact that I have to design an API for the whole thing, and API design is hard. Objects, like maps, are associative containers; but how much of the map interface should an object share? How much support should I build in for runtime inspection or mutation of object structure? Should arrays serve double-duty as queues and stacks?

I’m making progress, and I think the next round of code I write may actually stick. I’ve settled on a list of standard containers, interfaces, and method names. Key->value containers like maps and objects will be implemented with red-black trees; ordered containers, like arrays, will be finger trees. Both structures offer O(log n) average performance for lookup, insert, and delete, and both are compatible with immutability.

It’s proving difficult to implement any useful API using only the current set of primitives. I’m thinking of upgrading the red-black tree to become one of the primitives, alongside tuples and closures. Tonight’s project is a persistent implementation in Python, as a prototype for the version I intend to embed in the compiler’s semantics engine.


Numerics in Radian

Posted: December 29th, 2009 | Author: Aaron Ballman | Filed under: Design | Comments Off

Initially, Radian handled numbers as basic integers only.  The way floating-point support was implemented was by using the dot operator as a function which would take two integers (the left-hand side and the right-hand side) and return a floating-point value.  However, this also meant that it was basically impossible for us to write any backend support for special processing of numerics.  Also, it meant that it was possible to write some funky things, like:

var wahoo = 12
var blah = 1.wahoo
if blah = 1.12 then
end if

In theory, that would work fine because the dot operator function would concatenate 1 and wahoo (12) together.

Now Radian has lexical support for more numeric formats than just integers.  The support is highly based off of Python, so those of you coming from that language should feel right at home.  We now support integers, floating-point values, hexadecimal, octal and binary literals.  All of the various literal formats are still considered to be a “number” internally, and there basically is no support for coercion aside from what the backend does via C.  Eventually, we’d like to see a numeric tower like Scheme, where each numeric format builds on top of a lower abstraction.  However, that project will wait for another day.

Currently, the syntax for numeric literals is:

Integers:    [1-9][0-9]* | 0
Floats:        [0-9]+.[0-9]+
Hexadecimal:    0[Xx][0-9A-Fa-f]+
Octal:        0[Oo][0-7]+
Binary:        0[Bb][0-1]+

You’ll notice that there is not support for scientific notation with floats, which will likely change in the future.  Also, I’m not supporting sign information as part of the numeric literal.  When you apply a sign, we treat it as an operator currently.  I’m not too keen on that right now because I think it will make constant folding a bit more difficult.  Also, I’m still on the fence as to whether I want to support a unary + as well as a unary – in the same way that Python (and JavaScript, etc) does.  I can’t think of reasonable use cases that aren’t contrived or too clever.


Subscripts and invocations

Posted: December 20th, 2009 | Author: Mars | Filed under: Design | 4 Comments »

In C, the name of a function returns a pointer to the function. A function call is a combination of the function-reference expression with a parameter subscript. Thus, parentheses are required whether they contain any arguments or not; it is the parentheses that distinguish a call from simple reference. In the same language, however, the name of a variable returns that variable’s value. In order to get a reference to the variable, you must prefix the name with the ampersand. This seems a little inconsistent, but in practice it works well, and it makes the use of function pointers feel natural and convenient.

I started out with the same system for the Radian grammar, but decided it made less sense here. I believe that heavy reliance on punctuation tends to make the learning process more difficult. It’s much easier to look up an unfamiliar term or to consult the documentation for some unfamiliar module than it is to guess at the meaning of some novel piece of punctuation. I banished the empty parentheses, therefore, and decided that naming a function invokes it. One must use the capture operator to get a reference to some function.

The problem is that I want to be able to subscript container types (like tuples, arrays, and maps) in order to get element values back, like this:

var foo = ["zero", "one", "two", "three"]
io->print(foo(1))

This doesn’t work, because foo is a var, not a function. The subscript expression is no longer an independent operator, but an adjunct to the act of invoking the function, and has no definition for symbols which are not functions.

One solution would be to define a meaning for the parentheses, when applied to a variable or constant name. This would work, but it gets ugly fast. You can’t tell, when you look at a name followed by a subscript, whether that is a function call with a parameter, or a reference to an element of some container. REALbasic had this problem, since Basic traditionally uses parentheses for both types of subscript, and I was never happy with the grammar compromise we were stuck with.

Instead, I’m going to introduce a second type of subscript, using square brackets. I’m already using square brackets in a non-subscript context as an array literal, as in Python or Javascript, so I think it makes sense to borrow the subscript syntax as well. This will be a postfix operator, not bound to an identifier, so it can be applied to any expression.
io->print(foo[1])

The semantics are not completely clear. Radian has an intrinsic type, the tuple, which I have intended to be a primitive container and not an object. Once you’ve created a tuple, the only thing you can do is ask for one of its elements, by index. The implementation is that the tuple is a function which accepts one parameter, the index. This suggests that the subscript operation should simply call the function reference the expression yields, passing in the value as the sole parameter: exactly the same thing the invoke operator already does. Is there any need for an invoke operator, then? It seems an unfortunate conflation: the square brackets feel right for “get an element from this container”, but arbitrary for “invoke this function reference”.

Further, it’s less clear that this implementation would work for more complex containers, which are likely to be objects. An object is a function which accepts a single parameter, which is a selector identifying a member; the object returns a reference to the function representing that member. A container object, then, would need to accept either a selector representing a member, or an index value representing one of the contained values – how is it to know the difference?

Perhaps it doesn’t matter. Instead of thinking of containers as one subtype of objects, perhaps objects are a subtype of containers! Perhaps the object member access syntax is just a quick shorthand for a common use of a common type of container. The interesting consequence of this approach is that you could create objects out of other containers: if you had some existing map/dictionary type, you could stuff it full of symbol keys mapped to function reference values, and that would be just as legitimate an object as any created through the built-in syntax.


String concatenation, operator overloading

Posted: December 14th, 2009 | Author: Mars | Filed under: Design, Progress | Comments Off

I’ve just added a concatenation operator, using the ampersand character. It is a simple bit of syntactic sugar:

foo = bar & baz
foo = bar.concatenate(baz)

The string type implements a Concatenate method, which returns a new string, as you’d expect. This operator is intended for sequences, generally; specific objects may implement specific optimized concatenations, but you should be able to concatenate any two sequences.

I’ve been intending to implement some kind of overloading for binary operators in general, but wanted to think about multimethods for a while first. I’ve decided not to go that way; multiple dispatch would simplify certain semantic problems at the expense of a much more complicated design, and my principle here is very much “build what you need and defer the rest”. So I expect to reimplement the other binops in the same style: the parser will take care of precedence, but the implementation is just an ordinary method call.

There is a tension in the design here. Objects should be simple, based on a single concept, and each method on the object should be fundamental, an indispensable tool for manipulating that concept. But method calls should be similarly simple: why should you have to know, when you want to concatenate one object with another, where the concatenation code is actually located? The logical operation is the concatenation; the implementation may be specific to the object, or it may apply generally to a wide range of objects, but you shouldn’t have to make that decision when you invoke it.

One solution might be some kind of extension-method system, as found in REALbasic and C#: utility libraries can declare methods which “extend” some existing type. A class’ built in methods take precedence, but if a class lacks a “foo” method, the compiler falls back to any applicable extension method named “foo”. This is particularly useful when combined with interfaces: you can declare methods which work for any instance of that interface, regardless of its implementation type.

Another, less elegant solution would be to define some method corresponding to each operator, located in the standard library, which calls the object’s method if present and falls back to some generic behavior otherwise. This would work for the binary operators, where the actual calling mechanism is hidden, but it wouldn’t help for named methods (what if you wanted to sort some list, for example, which didn’t define its own sort method?).

Well – as always I’m going to do the simplest thing first, and expand on it later if it becomes necessary. For now a simple method call will do the job, so that’s all I’m going to implement. I’ll revisit the issue later if it becomes necessary.


Samples: Hello World and Fibonacci

Posted: December 5th, 2009 | Author: Mars | Filed under: Design | 2 Comments »

Alright then, what does this language actually look like? Let’s dig into some examples. First, of course, the classic Hello World:

io->print("Hello, world!")

The io object is an implicit program argument which lets you talk to the outside world. The print method sends a string to the standard output.

Here’s the Fibonacci function:

function fibonacci(limit):
  var a = 0
  var b = 1
  for i in range(0, limit):
    result = a
    a = b
    b = result + b
  end i
end fibonacci

It’s a keyword-oriented, block-structured language with significant newlines. I considered Python-style indents, but stuck with explicit ends because I like the way they re-synchronize expectations.

Symbols must be declared, and declarations cannot be forward-referenced. Inner blocks can use symbols defined in outer blocks; but code inside a function can only read and not write to a variable defined outside the function.

I’ll probably build a return statement, but for now every function comes with an implicit result variable; the function returns whatever its final value is. Same with multiple assignment – plan to do it, but not a top priority.