Readability and a small change

Posted: October 14th, 2010 | Author: Mars | Filed under: Progress, Reference

I’ll have to dig up the reference tomorrow, but I recently read a paper on programming language usability as it relates to natural language conventions; its general drift was that structuring a grammar in ways similar to natural language tends to help more than it hurts. With this in mind I’ve decided to try an experiment, borrowing a convention from Ruby to see how I like it: I’ve extended the identifier syntax to allow a question mark as a suffix, which I’ll use as a convention for functions that return a boolean. For example, the sequence interface’s valid function is now spelled valid?. I considered this idea early on but discarded it as part of a general effort to minimize punctuation; now I think it may not be punctuation itself that I want to reduce, but the use of unfamiliar symbols.
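As a rough illustration of what "a question mark as a suffix" could mean for a lexer, here is a minimal Python sketch. The identifier pattern (a letter or underscore, then letters, digits, or underscores) is an assumption made for the example, not Radian's actual grammar, and the name is_identifier is hypothetical.

```python
import re

# Assumed identifier shape: a letter or underscore, then letters, digits,
# or underscores, optionally followed by exactly one trailing '?'.
# This is an illustrative pattern, not Radian's real identifier rule.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\??")


def is_identifier(text):
    """Return True if the whole string matches the assumed identifier rule."""
    return IDENTIFIER.fullmatch(text) is not None


accepted = [name for name in ["valid", "valid?", "va?lid", "?valid", "valid??"]
            if is_identifier(name)]
```

Under this rule the question mark is only legal in the final position, so it reads as part of the name rather than as an operator.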

[Edit: The paper was Bruckman and Edwards (1999), "Should We Leverage Natural-Language Knowledge? An Analysis of User Errors in a Natural-Language-Style Programming Language". Thanks to Kelly Caine for a list of language-related HCI papers.]


Sequences and sequence operations

Posted: October 14th, 2010 | Author: Mars | Filed under: Progress, Reference

A new library module named sequence contains two functions for working on entire sequences of values. Both functions accept a sequence and a captured expression (aka lambda, closure, or delegate) as parameters, and return a new sequence as a result. map applies its expression to each element of a sequence, returning a new sequence composed of the expression’s result values. filter also applies its expression to each element, but it expects the expression to produce a boolean value, which determines whether the input element is included in the output sequence.
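To make the two operations concrete, here is a small Python sketch of what map and filter produce. The names seq_map and seq_filter are hypothetical stand-ins for the Radian functions, and this version builds plain lists eagerly; it only shows which values end up in the result, not how the Radian library evaluates them.

```python
def seq_map(sequence, expression):
    """Return a new list holding expression(item) for each input item."""
    return [expression(item) for item in sequence]


def seq_filter(sequence, predicate):
    """Return a new list holding only the items for which predicate is true."""
    return [item for item in sequence if predicate(item)]


numbers = [1, 2, 3, 4]
squares = seq_map(numbers, lambda x: x * x)
evens = seq_filter(numbers, lambda x: x % 2 == 0)
```

Neither call modifies the input: numbers is unchanged, and each function hands back a fresh result.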

The sequence interface is a central part of the Radian design; the whole point of the parallelization scheme is to allow the compiler to refactor parts of ordinary for-loops into map and filter operations. The language doesn’t actually have any such thing as an interface contract yet, but if it did, the sequence interface might look like this:

interface sequence:
    function iterate    # returns iterator
end sequence

interface iterator:
    function valid    # returns boolean
    function current    # returns the value at the current position
    method next    # advances to the next value
end iterator

That is, a sequence is a thing which can be iterated over, and an iterator is a pointer to some value within the sequence, plus a means of advancing to the next value. The functions in the sequence library will happily process any object that implements this simple interface.
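One way to picture this contract is the following Python sketch. The classes ArraySequence and ArrayIterator and the helper collect are hypothetical, and the sketch assumes that next advances the iterator in place, which may not match how a Radian method behaves.

```python
class ArrayIterator:
    """Points at one position in a list: valid / current / next."""

    def __init__(self, values):
        self._values = values
        self._index = 0

    def valid(self):
        # True while the iterator still points at an element.
        return self._index < len(self._values)

    def current(self):
        # The value at the current position.
        return self._values[self._index]

    def next(self):
        # Advance to the next position (assumed to mutate in place).
        self._index += 1


class ArraySequence:
    """A sequence whose only job is to supply a fresh iterator."""

    def __init__(self, values):
        self._values = list(values)

    def iterate(self):
        return ArrayIterator(self._values)


def collect(sequence):
    """Walk any object with iterate() and gather its values into a list."""
    result = []
    it = sequence.iterate()
    while it.valid():
        result.append(it.current())
        it.next()
    return result


values = collect(ArraySequence([10, 20, 30]))
```

The collect loop never looks at the list directly, so it would work unchanged with any other object that offers the same four operations.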

Why so much abstraction? These interfaces represent an array of values, but not only are we abstracting away the array by providing an iterator, we’re also abstracting away the iterator by defining a sequence, whose only characteristic is that it can supply an iterator! The key is that this allows lazy evaluation, which saves memory and reduces the overhead of parallelization. “Lazy evaluation” simply means that we don’t actually have to calculate values until some future process asks for them.

This is key to the way map and filter work. When you map a sequence, the library doesn’t actually calculate anything: it just generates a wrapper object, itself a sequence, which knows how to apply the mapping to each element of the input sequence. The output looks exactly like an array of precomputed values – but you don’t pay the computational cost of those values until you need to use them.
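The wrapper idea can be sketched in Python as follows. All class names here are hypothetical, and the sketch assumes the iterate / valid / current / next interface described above; it is not the Radian library's implementation. The calls list exists only to record when the mapping expression actually runs.

```python
class ListIterator:
    def __init__(self, values):
        self._values = values
        self._index = 0

    def valid(self):
        return self._index < len(self._values)

    def current(self):
        return self._values[self._index]

    def next(self):
        self._index += 1


class ListSequence:
    """A sequence backed by a Python list."""

    def __init__(self, values):
        self._values = list(values)

    def iterate(self):
        return ListIterator(self._values)


class MappedIterator:
    """Wraps another iterator and applies the expression on demand."""

    def __init__(self, inner, expression):
        self._inner = inner
        self._expression = expression

    def valid(self):
        return self._inner.valid()

    def current(self):
        # The expression runs here, only when a value is requested.
        return self._expression(self._inner.current())

    def next(self):
        self._inner.next()


class MappedSequence:
    """Itself a sequence; creating it computes nothing."""

    def __init__(self, source, expression):
        self._source = source
        self._expression = expression

    def iterate(self):
        return MappedIterator(self._source.iterate(), self._expression)


calls = []  # records each input the expression was applied to


def noisy_double(x):
    calls.append(x)
    return x * 2


mapped = MappedSequence(ListSequence([1, 2, 3]), noisy_double)
calls_after_wrapping = list(calls)   # nothing has been computed yet

it = mapped.iterate()
first = it.current()                 # the expression runs for one element
calls_after_first = list(calls)
```

In this sketch current() recomputes the value every time it is called; a real library might cache it, which is a design choice the source does not describe.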

If we did all this transformation using arrays of values, we’d have to use up a lot more memory: we would need enough storage for all of the input values and all of the output values, all at once. By stacking up sequence-transforming wrapper objects, we only need to keep enough storage around for one element at a time.

If we did these transformations as separate loops through a sequence, it would be harder to parallelize efficiently. In order to get the most out of a multicore processor, you need to break the work up into the largest chunks possible, so that the necessary overhead involved in coordinating threads forms as small a percentage of the overall time as possible. By waiting to actually perform the computations as long as possible, we get a chance to stack up a thick pile of sequence-transformers, parallelizing fewer loops and doing more work in each one.
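The effect of stacking transformers can be seen in a small Python sketch that uses generators as a stand-in for lazy sequences. The names lazy_map, lazy_filter, and trace are hypothetical, and the example runs on a single thread: it only shows that a stacked pipeline makes one pass over the input, handling one element at a time, and does not demonstrate parallel execution.

```python
trace = []  # records the order in which work actually happens


def source():
    for n in range(1, 5):
        trace.append(f"read {n}")
        yield n


def lazy_map(sequence, expression):
    for item in sequence:
        yield expression(item)


def lazy_filter(sequence, predicate):
    for item in sequence:
        if predicate(item):
            yield item


def square(x):
    trace.append(f"square {x}")
    return x * x


def is_even(x):
    trace.append(f"test {x}")
    return x % 2 == 0


# Stacking the wrappers does no work yet.
pipeline = lazy_filter(lazy_map(source(), square), is_even)
work_before_consuming = list(trace)

# Consuming the pipeline drives a single loop over the source.
result = list(pipeline)
```

Afterwards trace shows each element being read, squared, and tested before the next one is read, so the two transformations share one loop instead of running as two separate passes with an intermediate array between them.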

The whole point of the Radian project is to construct a language whose compiler can manage most of these details for you, so that you can write code representing the work you want to do and let the machine work out what order to do things in. You will always be able to dig in and build your own sequence-transforming operations yourself, of course, but if things work out the way I plan, you should be able to write ordinary for-loops full of ordinary step-by-step procedural operations and still get most of the parallelization benefit. It does mean that the language is going to be full of sequences, however, and you will have to work with them instead of against them; if you’re constantly crunching sequences down into arrays, the compiler will not be able to give you very much help.


String module works

Posted: October 5th, 2010 | Author: Mars | Filed under: Reference

Now available in the Radian standard library:


module string:
    function decimal(number)
    function hex(number)
    function binary(number)
    function octal(number)
end string

To use the string library, import it into your program, then call each function as a member of the string object:

import string from radian
io->print("Hello, world! The answer is " & string.decimal(42))

The fun part is that these functions are implemented in Radian code.
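For a sense of what such functions involve when written in ordinary language code rather than as built-ins, here is a Python sketch using repeated division. The names to_base, to_decimal, to_hex, to_binary, and to_octal are hypothetical, and the output conventions (uppercase hex digits, no prefix, integers only, a leading minus for negatives) are assumptions; the source does not say what the Radian versions produce.

```python
DIGITS = "0123456789ABCDEF"


def to_base(number, base):
    """Convert an integer to a digit string by repeated division."""
    if number == 0:
        return "0"
    sign = "-" if number < 0 else ""
    remaining = abs(number)
    digits = []
    while remaining > 0:
        remaining, remainder = divmod(remaining, base)
        digits.append(DIGITS[remainder])
    # Digits come out least significant first, so reverse them.
    return sign + "".join(reversed(digits))


def to_decimal(number):
    return to_base(number, 10)


def to_hex(number):
    return to_base(number, 16)


def to_binary(number):
    return to_base(number, 2)


def to_octal(number):
    return to_base(number, 8)


message = "Hello, world! The answer is " + to_decimal(42)
```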


Posted: February 27th, 2010 | Author: Mars | Filed under: Reference

Documentation about the DWARF debugging data format.


Experiences with copyright assignment

Posted: December 22nd, 2009 | Author: Mars | Filed under: Reference

Michael Meeks of the GNOME project has a long and informative article about his experiences with open source licenses that require copyright assignment.


Open-source licenses for programming languages

Posted: December 6th, 2009 | Author: Mars | Filed under: Reference

Python: custom, BSD-like, plus a clause requiring documentation of changes
Ruby: GPL plus some alternate, less restrictive terms
Erlang: MPL plus modifications; I’m not familiar enough with the original to spot the changes
Perl: Artistic License, dual-licensed with the GPL
Go: BSD
Haskell (GHC): BSD
Clojure: EPL (Eclipse Public License) – very wordy, but doesn’t seem to require much beyond BSD
Scala: BSD

I’ve always been a fan of the GPL, but I can’t find any compiler codebase that uses it, save GCC. Perhaps this is because compilers are easier to build than language ecosystems.


Build systems, and version control with Git

Posted: October 21st, 2009 | Author: Mars | Filed under: Reference

You can publish a Git repository on any ordinary web server, without needing a special git daemon, using the command ‘git update-server-info’ (older Git versions spell it ‘git-update-server-info’). This is useful if you want to publish some code but cannot offer arbitrary users an ssh login, perhaps because you are using a hosting service instead of running your own server. Here are some how-to guides:
git-server for the poor: git-update-server-info, rsync, and remote repository
How to publish a Git repository
SourceForge FAQ for Git development

It is also possible to create a patch file with git (for example with ‘git format-patch’), which you can send via email. I am thinking about setting up an auto-build-verify system that checks a dedicated email inbox, downloads patch files, builds them on a temporary branch, runs a validation suite, and either pushes the changes up to another repository or sends back an email describing the errors that occurred.

We were working toward a system like that just before I left Real Software, using Buildbot. It was a big improvement over the ad-hoc practices we’d always used before, but it was still a reactive notifier rather than an active filter. You checked in code first, then the buildbot would run its tests to see whether you broke anything. It was much better than hearing about it from one’s irritated colleagues a day or two later, but still too fragile.

It was much harder to break the build at Microsoft, where no change could be committed until it included a new test suite and had been shown to pass every existing test. This was unfortunately a completely manual process and thus extremely time-consuming, but it did create an unusual degree of confidence in the checkins, when they finally did happen.

Grunt work is a waste of human time: that’s what robots are for. I want to send my code off to the build system whenever I think it’s ready, let it do the repetitive validation, and then either pass the code on to the development trunk or let me know what went wrong. If I screw up, I’ll have a chance to fix it before I waste anyone else’s time dealing with the problem, and I’ll know that code I pull from the trunk will always work as far as the test suite is concerned.


Home pages for some interesting languages

Posted: October 21st, 2009 | Author: Mars | Filed under: Reference

Python: python.org
Perl: perl.org (does not appear to be an official home page, but this is close)
Ruby: ruby-lang.org
Scala: scala-lang.org
Clojure: clojure.org
Factor: factorcode.org
Haskell: haskell.org
Erlang: erlang.org
OCaml: caml.inria.fr


Verb- versus noun-based models in cognitive psychology

Posted: September 18th, 2009 | Author: Mars | Filed under: Reference

The blog Psychology of Programming has posted excerpts from a 1995 paper on object-oriented programming, published in Human-Computer Interaction, with some fascinating comments from cognitive psychology research as applied to programming language design:

In careful experiments, Gentner (1981; Gentner & France, 1988) showed that, when people are asked to repair a simple sentence with an anomalous subject-verb combination, they almost always change the verb and leave the noun as it is, independent of their relative positions. This suggests that people take the noun (i.e. the object) as the basic reference point. Models based on objects may be superior to models based on other primitives, such as behaviours.


LLVM tutorial

Posted: July 16th, 2009 | Author: Mars | Filed under: Reference

This tutorial describes the implementation of a simple compiler using LLVM, an increasingly robust code generation library which is rapidly becoming the obvious solution for any compiler targeting x86.