Numerics in Radian

Posted: December 29th, 2009 | Author: Aaron Ballman | Filed under: Design | Comments Off

Initially, Radian handled numbers as basic integers only. Floating-point support was implemented by treating the dot operator as a function that took two integers (the left-hand side and the right-hand side) and returned a floating-point value. However, this meant that it was basically impossible for us to write any backend support for special processing of numerics. It also meant that it was possible to write some funky things, like:

var wahoo = 12
var blah = 1.wahoo
if blah = 1.12 then
end if

In theory, that would work fine, because the dot operator function would join 1 and wahoo (12) together into 1.12.
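To make the old behavior concrete, here is a minimal Python sketch of a dot operator implemented as a function of two integers. The name `dot` and the digit-joining approach are assumptions for illustration, not the original Radian code:

```python
def dot(lhs, rhs):
    """Hypothetical reconstruction of the old dot operator.

    Joins the decimal digits of two non-negative integers into a float,
    so dot(1, 12) behaves like the literal 1.12.
    """
    return float(f"{lhs}.{rhs}")


wahoo = 12
blah = dot(1, wahoo)  # the equivalent of `1.wahoo`
```

Because the function only ever sees two integers, it cannot tell a literal `1.12` from `1.wahoo`, which is why the funky example above compiles.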

Now Radian has lexical support for more numeric formats than just integers. The support is heavily based on Python’s, so those of you coming from that language should feel right at home. We now support integer, floating-point, hexadecimal, octal and binary literals. All of these literal formats are still treated as a single “number” type internally, and there is basically no support for coercion aside from what the backend does via C. Eventually, we’d like to see a numeric tower like Scheme’s, where each numeric type builds on top of a lower abstraction. However, that project will wait for another day.

Currently, the syntax for numeric literals is:

Integers:     [1-9][0-9]* | 0
Floats:       [0-9]+\.[0-9]+
Hexadecimal:  0[Xx][0-9A-Fa-f]+
Octal:        0[Oo][0-7]+
Binary:       0[Bb][0-1]+
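As a rough illustration of how these patterns might drive a lexer, here is a Python sketch (not the Radian lexer) that tries every pattern at a given position and keeps the longest match, so that `0x1F` is not read as the integer `0` followed by `x1F`. The function name `scan_number` is hypothetical, and the float pattern escapes the dot so it matches only a literal period:

```python
import re

# Patterns transcribed from the table above (float dot escaped).
NUMBER_PATTERNS = {
    "int":    re.compile(r"[1-9][0-9]*|0"),
    "float":  re.compile(r"[0-9]+\.[0-9]+"),
    "hex":    re.compile(r"0[Xx][0-9A-Fa-f]+"),
    "octal":  re.compile(r"0[Oo][0-7]+"),
    "binary": re.compile(r"0[Bb][0-1]+"),
}


def scan_number(source, pos=0):
    """Return (kind, value, end) for the longest numeric literal at pos."""
    best = None
    for kind, pattern in NUMBER_PATTERNS.items():
        m = pattern.match(source, pos)
        # Maximal munch: keep whichever pattern consumed the most text.
        if m and (best is None or m.end() > best[1].end()):
            best = (kind, m)
    if best is None:
        raise ValueError(f"no numeric literal at position {pos}")
    kind, m = best
    text = m.group()
    if kind == "float":
        value = float(text)
    elif kind == "int":
        value = int(text)
    else:
        # int() with base 0 accepts the 0x / 0o / 0b prefixes (either case).
        value = int(text, 0)
    return kind, value, m.end()
```

Under this sketch, `scan_number("1.wahoo")` returns `("int", 1, 1)`: the float pattern needs digits after the dot, so the dot is left for the parser to handle.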

You’ll notice that there is no support for scientific notation in floats, which will likely change in the future. Also, I’m not supporting sign information as part of the numeric literal; when you apply a sign, we currently treat it as an operator. I’m not too keen on that, because I think it will make constant folding a bit more difficult. I’m also still on the fence as to whether I want to support a unary + as well as a unary - in the same way that Python (and JavaScript, etc.) does. I can’t think of reasonable use cases for unary + that aren’t contrived or too clever.
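Treating the sign as an operator means that `-5` reaches the compiler as a negation wrapped around the literal `5`, so a constant folder needs an extra rule to collapse the pair into a single constant. A minimal Python sketch, with hypothetical node classes `Num`, `Neg`, and `Var` standing in for Radian’s actual syntax tree:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    value: float  # a numeric literal (ints work too)


@dataclass(frozen=True)
class Var:
    name: str  # a non-constant operand


@dataclass(frozen=True)
class Neg:
    operand: object  # unary minus applied to any expression


def fold(expr):
    """Collapse unary minus applied to a constant into a single constant."""
    if isinstance(expr, Neg):
        inner = fold(expr.operand)
        if isinstance(inner, Num):
            return Num(-inner.value)
        # The operand is not constant, so the negation must stay.
        return Neg(inner)
    return expr
```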


Hello!

Posted: December 29th, 2009 | Author: Aaron Ballman | Filed under: Uncategorized | Comments Off

Introductory posts always scare me — I feel like I’m applying for a job.  :-P

My name is Aaron Ballman, and I’m one of the contributors to the Radian project (and blog).  I started working on compilers with Mars while we were both at REAL Software, and eventually worked my way up to being the lead compiler architect on REALbasic once he moved on to greener pastures at Microsoft.  I’m currently employed by 4D, Inc doing compiler-ish type work on several languages, including JavaScript and SQL.

I’ve been involved with Radian in one way or another for a few years now.  However, it’s only recently that my involvement has started to ramp up into actual productivity instead of just acting as a sounding board.  I’m really looking forward to working with Mars on the language design and implementation work for Radian.

Don’t be too shocked if you see blog posts from me in the future.  At least you now know who I am!  ;-)


Experiences with copyright assignment

Posted: December 22nd, 2009 | Author: Mars | Filed under: Reference | Comments Off

Michael Meeks of the GNOME project has a long and informative article about his experiences with open source licenses that require copyright assignment.


Subscripts and invocations

Posted: December 20th, 2009 | Author: Mars | Filed under: Design | 4 Comments »

In C, the name of a function evaluates to a pointer to the function. A function call is a combination of the function-reference expression with a parameter subscript. Thus, parentheses are required whether or not they contain any arguments; it is the parentheses that distinguish a call from a simple reference. In the same language, however, the name of a variable evaluates to that variable’s value. To get a reference to the variable, you must prefix the name with an ampersand. This seems a little inconsistent, but in practice it works well, and it makes function pointers feel natural and convenient to use.

I started out with the same system for the Radian grammar, but decided it made less sense here. I believe that heavy reliance on punctuation tends to make the learning process more difficult. It’s much easier to look up an unfamiliar term or to consult the documentation for some unfamiliar module than it is to guess at the meaning of some novel piece of punctuation. I banished the empty parentheses, therefore, and decided that naming a function invokes it. One must use the capture operator to get a reference to some function.

The problem is that I want to be able to subscript container types (like tuples, arrays, and maps) in order to get element values back, like this:

var foo = ["zero", "one", "two", "three"]
io->print(foo(1))

This doesn’t work, because foo is a var, not a function. The subscript expression is no longer an independent operator, but an adjunct to the act of invoking the function, and has no definition for symbols which are not functions.

One solution would be to define a meaning for the parentheses, when applied to a variable or constant name. This would work, but it gets ugly fast. You can’t tell, when you look at a name followed by a subscript, whether that is a function call with a parameter, or a reference to an element of some container. REALbasic had this problem, since Basic traditionally uses parentheses for both types of subscript, and I was never happy with the grammar compromise we were stuck with.

Instead, I’m going to introduce a second type of subscript, using square brackets. I’m already using square brackets in a non-subscript context for array literals, as in Python or JavaScript, so I think it makes sense to borrow the subscript syntax as well. This will be a postfix operator, not bound to an identifier, so it can be applied to any expression:

io->print(foo[1])

The semantics are not completely clear. Radian has an intrinsic type, the tuple, which I intend to be a primitive container rather than an object. Once you’ve created a tuple, the only thing you can do is ask for one of its elements, by index. The implementation is that the tuple is a function which accepts one parameter, the index. This suggests that the subscript operation should simply call the function reference the expression yields, passing in the index as the sole parameter: exactly the same thing the invoke operator already does. Is there any need for an invoke operator, then? It seems an unfortunate conflation: the square brackets feel right for “get an element from this container”, but arbitrary for “invoke this function reference”.
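A minimal Python sketch of that model, assuming a tuple is a closure over its elements (`make_tuple`, `subscript`, and `invoke` are hypothetical names). The two operator functions end up with identical bodies, which is exactly the conflation described above:

```python
def make_tuple(*elements):
    """Model a tuple as a function of one parameter: the index."""
    def tuple_fn(index):
        return elements[index]
    return tuple_fn


def subscript(container, index):
    """The proposed postfix [] operator: call the function with the index."""
    return container(index)


def invoke(function_ref, argument):
    """The existing invoke operator: it also just calls the function."""
    return function_ref(argument)


foo = make_tuple("zero", "one", "two", "three")
```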

Further, it’s less clear that this implementation would work for more complex containers, which are likely to be objects. An object is a function which accepts a single parameter, which is a selector identifying a member; the object returns a reference to the function representing that member. A container object, then, would need to accept either a selector representing a member, or an index value representing one of the contained values – how is it to know the difference?
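To make the ambiguity concrete, here is a hypothetical Python sketch of a list object that receives both selectors and indices through the same parameter. It assumes strings are selectors and integers are indices; that guess breaks down as soon as the container may hold keys that look like selectors, such as a map with string keys:

```python
def make_list_object(*elements):
    """A container object modelled as a single function of one parameter."""
    items = list(elements)
    members = {
        "count": lambda: len(items),  # member selector -> function reference
        "append": items.append,       # bound method used as a function reference
    }

    def list_object(key):
        # Selectors and indices arrive through the same parameter, so the
        # object must guess; here, strings are treated as member selectors.
        if isinstance(key, str):
            return members[key]
        return items[key]

    return list_object
```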

Perhaps it doesn’t matter. Instead of thinking of containers as one subtype of objects, perhaps objects are a subtype of containers! Perhaps the object member access syntax is just a quick shorthand for a common use of a common type of container. The interesting consequence of this approach is that you could create objects out of other containers: if you had some existing map/dictionary type, you could stuff it full of symbol keys mapped to function reference values, and that would be just as legitimate an object as any created through the built-in syntax.
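Here is a Python sketch of that idea, assuming member access `obj.selector` is simply shorthand for the subscript `obj[selector]`. The helpers `make_counter` and `member` are hypothetical; the point is that an ordinary dict of selector keys mapped to function references behaves as an object:

```python
def make_counter():
    """Build an 'object' as a plain dict of selector -> function reference."""
    state = {"n": 0}

    def increment():
        state["n"] += 1
        return state["n"]

    def current():
        return state["n"]

    return {"increment": increment, "current": current}


def member(obj, selector):
    """Hypothetical desugaring of obj.selector into the subscript obj[selector]."""
    return obj[selector]
```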


Developing the import system

Posted: December 19th, 2009 | Author: Mars | Filed under: Uncategorized | 5 Comments »

I’ve implemented a little more of the module import system. The import statement has gained a from clause, which lets you specify the subdirectory where the desired module can be found. This search is always relative to the directory containing your program’s main file, no matter which file the import statement appears in. Imagine that your project had the following file structure, with foo.radian as the main file:

foo:
    foo.radian
    bar.radian
    sub:
        baz.radian
        quux.radian

From any file in this project, you could import modules like this:

import bar
import baz from sub
import quux from sub

That is, import statements inside baz.radian also start from the foo directory, not the sub directory. I’ve done it this way so that you can break a project up into subdirectories which can then refer to each other’s contents.

There is one special subdirectory name: if you import a module from radian, the compiler will look for it in the common library directory. This directory will contain containers, formatters, IO services, and other utilities that are likely to be useful in nearly every program. I do not intend for users to extend this directory; it should be read-only if at all possible, and tied firmly to a single version of the Radian language distribution.
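The lookup rules can be summarized in a short Python sketch. `resolve_import` and the library path `LIBRARY_DIR` are hypothetical (the post does not say where the common library lives); note that the importing file is not even a parameter, since every search starts from the main file’s directory:

```python
from pathlib import Path

# Hypothetical location of the common library; the real path is unspecified.
LIBRARY_DIR = Path("/usr/local/lib/radian")


def resolve_import(main_file, module, subdir=None):
    """Map `import module [from subdir]` to a .radian file path.

    Paths are resolved against the directory of the program's main file,
    regardless of which file contains the import statement. The special
    subdirectory name "radian" selects the common library instead.
    """
    if subdir == "radian":
        base = LIBRARY_DIR
    else:
        base = Path(main_file).parent
        if subdir is not None:
            base = base / subdir
    return base / f"{module}.radian"
```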


Language grammar

Posted: December 17th, 2009 | Author: Mars | Filed under: Progress | Comments Off

Pat asked for a copy of the grammar, so I’ve written one up and placed it on the documentation page, along with a list of reserved words.


Source code available

Posted: December 17th, 2009 | Author: Mars | Filed under: Meta, Progress | Comments Off

I’ve created a public repository for the Radian source code, using Git. If you are interested in having a look, you can clone it like this:

git clone http://www.radian-lang.org/git/radian.git

This is the development trunk. Because it is served over HTTP, it is read-only. I’m still getting used to the Git workflow, but I think the idea is that you can either email me a diff or send me the address of your repository, and I can merge your changes into the public repository.


Struggling with Git

Posted: December 16th, 2009 | Author: Mars | Filed under: Uncategorized | Comments Off

I think I have set up a local repository, and I thought I had set up a repository on the server, but I can’t push from local to remote. I am still working on this problem; source code will be available as soon as I figure it out.


String concatenation, operator overloading

Posted: December 14th, 2009 | Author: Mars | Filed under: Design, Progress | Comments Off

I’ve just added a concatenation operator, using the ampersand character. It is a simple bit of syntactic sugar:

foo = bar & baz
foo = bar.concatenate(baz)

The string type implements a concatenate method, which returns a new string, as you’d expect. This operator is intended for sequences in general; specific objects may implement their own optimized concatenations, but you should be able to concatenate any two sequences.

I’ve been intending to implement some kind of overloading for binary operators in general, but wanted to think about multimethods for a while first. I’ve decided not to go that way; multiple dispatch would simplify certain semantic problems at the expense of a much more complicated design, and my principle here is very much “build what you need and defer the rest”. So I expect to reimplement the other binops in the same style: the parser will take care of precedence, but the implementation is just an ordinary method call.
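A minimal Python sketch of that style, with a hypothetical operator table: only the `&` to `concatenate` mapping comes from this post, and the `add`/`subtract` names are placeholders. Dispatch happens on the left operand alone, which is the single-dispatch simplification chosen over multimethods:

```python
# Hypothetical table mapping each binary operator to a method name.
BINOP_METHODS = {
    "&": "concatenate",  # from the post
    "+": "add",          # placeholder
    "-": "subtract",     # placeholder
}


def eval_binop(op, lhs, rhs):
    """Evaluate `lhs op rhs` as the ordinary method call lhs.<method>(rhs)."""
    method = getattr(lhs, BINOP_METHODS[op])
    return method(rhs)


class RString:
    """Toy string type whose concatenate method returns a new string."""

    def __init__(self, text):
        self.text = text

    def concatenate(self, other):
        return RString(self.text + other.text)
```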

There is a tension in the design here. Objects should be simple, based on a single concept, and each method on the object should be fundamental, an indispensable tool for manipulating that concept. But method calls should be similarly simple: why should you have to know, when you want to concatenate one object with another, where the concatenation code is actually located? The logical operation is the concatenation; the implementation may be specific to the object, or it may apply generally to a wide range of objects, but you shouldn’t have to make that decision when you invoke it.

One solution might be some kind of extension-method system, as found in REALbasic and C#: utility libraries can declare methods which “extend” some existing type. A class’s built-in methods take precedence, but if a class lacks a “foo” method, the compiler falls back to any applicable extension method named “foo”. This is particularly useful when combined with interfaces: you can declare methods which work for any instance of that interface, regardless of its implementation type.
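A rough Python sketch of that lookup order, using a hypothetical registry of extension methods. Python’s abstract base classes stand in for interfaces, so an extension registered for `Sequence` applies to lists, tuples, and strings alike, while a type’s own methods still win:

```python
from collections.abc import Sequence

# Hypothetical registry: (method name, type or ABC) -> function.
EXTENSIONS = {}


def extension(cls, name):
    """Decorator registering fn as extension method `name` for cls."""
    def register(fn):
        EXTENSIONS[(name, cls)] = fn
        return fn
    return register


def call_method(obj, name, *args):
    """Prefer the object's own method; otherwise fall back to an extension."""
    own = getattr(obj, name, None)
    if callable(own):
        return own(*args)
    for (ext_name, cls), fn in EXTENSIONS.items():
        if ext_name == name and isinstance(obj, cls):
            return fn(obj, *args)
    raise AttributeError(f"{type(obj).__name__} has no method {name!r}")


@extension(Sequence, "second")
def second(seq):
    """Extension method available to any Sequence."""
    return seq[1]
```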

Another, less elegant solution would be to define some method corresponding to each operator, located in the standard library, which calls the object’s method if present and falls back to some generic behavior otherwise. This would work for the binary operators, where the actual calling mechanism is hidden, but it wouldn’t help for named methods (what if you wanted to sort some list, for example, which didn’t define its own sort method?).
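For comparison, here is a hypothetical library-level `concatenate` function in Python that defers to the left operand’s own method and otherwise falls back to generic sequence concatenation:

```python
from collections.abc import Sequence


def concatenate(a, b):
    """Hypothetical standard-library function backing the `&` operator.

    Calls the left operand's own concatenate method when it has one, and
    otherwise falls back to generic sequence concatenation.
    """
    own = getattr(a, "concatenate", None)
    if callable(own):
        return own(b)
    if isinstance(a, Sequence) and isinstance(b, Sequence):
        # Generic fallback: a new list of all elements of a, then of b.
        return [*a, *b]
    raise TypeError("both operands must be sequences")
```

This works for `&` because the compiler would hide the call, but a caller wanting to sort a list would have to know to call a library `sort` function rather than a method, which is the limitation noted above.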

Well, as always, I’m going to do the simplest thing first and expand on it later if it becomes necessary. For now a simple method call will do the job, so that’s all I’m going to implement.


Update

Posted: December 13th, 2009 | Author: Mars | Filed under: Progress | 1 Comment »

I’m back from a week in Montreal. I’ve been doing some work for XSilva Systems, and it was time for some planning meetings. The week kept me busy, but I did get some Radian work done on the plane and during a couple of late evenings.

One minor syntactic change: the end statement’s identifier is now optional. You can just say end to close a block, as in Ruby.

I spent a few hours yesterday rewriting the loop implementations; I’d let them fall out of date. I’ve also been setting up a validation suite, which will help with quality control and will be a simple way to demonstrate how the language works.

It’s clear that I am going to miss my end-of-year goal, which was to have strings, basic math, objects, file I/O, and shell-exec capabilities working. I still think that’s a reasonably minimal definition of a working language, but at the current, somewhat contemplative pace I’m not going to have it done by January 1st.

I can still get the source code online, however. I’m increasingly comfortable with the idea of releasing the compiler code under GPL, and the runtime/standard-library code under the MIT/simplified BSD license. It almost certainly isn’t going to matter much, but it’s so hard to change licenses later that I want to get this right up front.

The source control system will almost certainly be Git. I’ve been using SVN, but I don’t think it makes sense for a public project. I’d rather host the repository myself, but I may end up using GitHub just because it’s easy.

Aaron B. has ported the compiler shell to Windows. It’s great to see support for a new platform showing up. The code isn’t polished yet (it depends on hard-coded paths to MinGW), but it’s a great start, and I’m sure it’ll improve rapidly.

Joe R. has been working on LLVM integration; he’s started an alternate backend that emits LLVM code instead of C source code. It’ll be nice not to depend on gcc.