Arithmetic types as lexically scoped declarations

Posted: January 15th, 2015 | Author: Mars | Filed under: Design | 2 Comments »

There are many algorithms for evaluating arithmetic operations, each useful in different situations. Sometimes you want simple, machine-word integers, and sometimes you want floating-point doubles; but sometimes it would be nice to have infinite-precision fixed-point arithmetic instead.

The usual solution for this problem is a system of related numeric types, with various rules about implicit or explicit conversions between those types, such that the output type for any given operation may be inferred from its inputs. Radian does exactly this, though its type implementations are nameless internal details and not explicitly declared or referenced.

This all works well enough, but every design has its tradeoffs, and I wonder if another strategy might be more convenient. Since Radian types are implicit, there’s no straightforward way for a programmer to specify the kind of computation they want, and thus the math package has to maintain as much precision as it can – whether that precision is actually useful or not. For the great majority of arithmetic operations, simple machine-word integers are plenty, but the Radian library has to overflow into bignums just in case the programmer later decides to care.

Another weakness of the traditional arithmetic model is that there is no interface for specifying behavior when there are multiple valid possibilities. What should the math package do when a program divides by zero? Should it raise an exception, return some special NaN code, or merely approximate infinity and get on with life? Each could be the right answer for some situation, but language designers are generally obligated to pick one and hope it will work for everyone.

What if we separated the ideas of numeric type and arithmetic type? What if, instead of delegating arithmetic operations to number objects, we delegated them to some “calculator” object? One might have a “machine-word integer” calculator, an “IEEE double” calculator, or a “4x IEEE single vector” calculator, each one implementing the various arithmetic operators using some consistent mechanism. Perhaps “calculator” is an interface, with methods named “add”, “subtract”, “multiply”, and so on. The standard library might provide some common calculators, as listed above, but programmers with specialized needs could implement their own calculators in any fashion they saw fit, with any specialized rounding, approximation, or exception-handling behavior they might happen to need. Instead of specifying the types of variables, then, one would specify the types of computations.
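To make the idea concrete, here is a minimal sketch of the “calculator” interface in Python. Everything here is hypothetical illustration – the names Calculator, WrappingIntCalculator, and FloatCalculator are invented, and the real design would live in Radian:

```python
# Hypothetical sketch of the "calculator" idea; all names are invented
# for illustration and are not part of any real library.

class Calculator:
    def add(self, a, b): raise NotImplementedError
    def subtract(self, a, b): raise NotImplementedError
    def multiply(self, a, b): raise NotImplementedError

class WrappingIntCalculator(Calculator):
    """Machine-word integers: wrap modulo 2**64 instead of promoting to bignum."""
    MASK = (1 << 64) - 1
    def add(self, a, b):      return (a + b) & self.MASK
    def subtract(self, a, b): return (a - b) & self.MASK
    def multiply(self, a, b): return (a * b) & self.MASK

class FloatCalculator(Calculator):
    """IEEE doubles: division by zero would approximate infinity, not raise."""
    def add(self, a, b):      return float(a) + float(b)
    def subtract(self, a, b): return float(a) - float(b)
    def multiply(self, a, b): return float(a) * float(b)

ints = WrappingIntCalculator()
print(ints.add(2**63, 2**63))   # wraps to 0 rather than growing into a bignum
```

The point of the interface is that a program selects a computation strategy once, rather than encoding it into the type of every operand.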

Let’s explore how this might work in Radian. At present, arithmetic operators are syntactic sugar for a method call on the left operand, where each operator has a specific name, so these statements are equivalent:

def foo = bar + baz
def foo = bar.add(baz)

Let’s do something else instead: we’ll still call an “add” method, but we’ll imagine that there is some object named “arithmetic” which implements it. These two statements would then become equivalent:

def foo = bar + baz
def foo = arithmetic.add(bar, baz)

Perhaps the language would provide an implicit global definition for “arithmetic” which links against the existing standard library code. Outer-scope symbols can always be overridden by local symbols, so any function or object or control structure would be free to apply its own calculator object by merely defining its own “arithmetic” symbol:

function float_add(x, y):
  import fancy_arithmetic
  def arithmetic = fancy_arithmetic.configure(my_handler, 42, true)
  result = x + y
end function

The addition still compiles to the same arithmetic.add(x, y) as ever, but the function has provided its own definition of arithmetic, so it can control the algorithm used. In this example I am imagining that the configure call might accept parameters detailing the desired exception behavior and precision.
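The shadowing mechanism can be mimicked in Python, where the nearest lexical binding of a name wins just as described. This is only an analogy – SaturatingArithmetic and the other names are invented to show the shape of the desugaring:

```python
# Sketch of the desugaring: "x + y" compiles to arithmetic.add(x, y),
# and the nearest lexical binding of "arithmetic" wins.
# All class and function names here are illustrative.

class DefaultArithmetic:
    def add(self, a, b):
        return a + b                      # ordinary, precision-preserving addition

class SaturatingArithmetic:
    """Clamp results to a signed 32-bit range instead of overflowing."""
    LO, HI = -(2**31), 2**31 - 1
    def add(self, a, b):
        return max(self.LO, min(self.HI, a + b))

arithmetic = DefaultArithmetic()          # the implicit global definition

def saturating_add(x, y):
    arithmetic = SaturatingArithmetic()   # local shadow, as in the Radian example
    return arithmetic.add(x, y)           # what "x + y" would compile to here

print(arithmetic.add(2**31 - 1, 1))       # global calculator: 2147483648
print(saturating_add(2**31 - 1, 1))       # local calculator:  2147483647
```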

Since this is a lexical structure, not a dynamic part of the call stack, it is still possible to determine a value’s type at compile time – in fact it becomes much easier to determine what kind of result a given arithmetic operation will have, since the compiler can always tell which calculator object is currently in play. If the language library defined some standard calculator type which could be implemented efficiently using hardware primitives, the compiler might be able to detect that and generate correspondingly more efficient code, instead of leaving all the decisions to runtime as it currently must.

It seems unlikely that this is a new idea, but I haven’t been able to find any references to previous experiments along these lines. If you’re familiar with such a project I would love to hear about it.

Posted: December 6th, 2014 | Author: Mars | Filed under: Reference | Comments Off

Empirical Analysis of Programming Language Adoption

We report several prominent findings. First, language adoption follows a power law; a small number of languages account for most language use, but the programming market supports many languages with niche user bases. Second, intrinsic features have only secondary importance in adoption. Open source libraries, existing code, and experience strongly influence developers when selecting a language for a project. Language features such as performance, reliability, and simple semantics do not. Third, developers steadily learn and forget languages, and the overall number of languages developers are familiar with is independent of age. Developers select more varied languages if their education exposed them to different language families. Finally, when considering intrinsic aspects of languages, developers prioritize expressivity over correctness. They perceive static types as more valuable for properties such as expressivity rather than for correctness checking.

Posted: November 24th, 2014 | Author: Mars | Filed under: Reference | 1 Comment »

Dec64: Decimal Floating Point Number Type

DEC64 is a number type. It can precisely represent decimal fractions with 16 decimal places, which makes it very well suited to all applications that are concerned with money. It can represent values as gargantuan as 3.6028797018963967E+143 or as measly as 1.0E-127, which makes it well suited to most scientific applications. It can provide very fast performance on integer values, eliminating the need for a separate int type and avoiding the terrible errors that can result from int truncation.

DEC64 is intended to be the only number type in the next generation of application programming languages.
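The problem DEC64 is aiming at is easy to demonstrate with Python’s standard library decimal module (which is not DEC64, but illustrates the same decimal-versus-binary distinction):

```python
from decimal import Decimal

# Binary doubles cannot represent 0.1 exactly, so cents drift:
print(0.1 + 0.2)                     # 0.30000000000000004
print(0.1 + 0.2 == 0.3)              # False

# Decimal representation, as DEC64 proposes, keeps such fractions exact:
print(Decimal("0.1") + Decimal("0.2"))                     # 0.3
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
```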

Eye tracking study on identifier name styles

Posted: October 28th, 2013 | Author: Mars | Filed under: Reference | Comments Off

An Eye Tracking Study on camelCase and under_score Identifier Styles [PDF]:

An empirical study to determine if identifier-naming conventions (i.e., camelCase and under_score) affect code comprehension is presented. An eye tracker is used to capture quantitative data from human subjects during an experiment. The intent of this study is to replicate a previous study published at ICPC 2009 (Binkley et al.) that used a timed response test method to acquire data. The use of eye-tracking equipment gives additional insight and overcomes some limitations of traditional data gathering techniques. Similarities and differences between the two studies are discussed. One main difference is that subjects were trained mainly in the underscore style and were all programmers. While results indicate no difference in accuracy between the two styles, subjects recognize identifiers in the underscore style more quickly.

Rust language has a great entry page

Posted: October 1st, 2013 | Author: Mars | Filed under: Meta | 1 Comment »

The home page for the Rust language is a great piece of design: it explains what’s going on and points you to all the most interesting resources, without wasting any time. The “short summary of features” and “taste of what it looks like” answer the first questions anyone might have when investigating the language. I should build something similar for Radian.

Target constraints and module names

Posted: September 10th, 2013 | Author: Mars | Filed under: Design | Comments Off

Following from yesterday’s design discussion, here is the plan I am considering for platform-specific code.

At present, a statement of the form import foo generates an extern reference to an object named foo, then pushes “foo.radian” onto the compile queue. When the compiler reaches “foo.radian”, it generates a static singleton object named foo whose members are the declarations found in the file.

The new system will add awareness of the target OS and architecture. Instead of searching for “foo.radian”, the compiler will search through a sequence of possible file names. Where $OS is the OS component of the LLVM target and $ARCH is the architecture component, the compiler will search for these files, in order, compiling the first one it finds and ignoring the rest:

  1. foo-$OS-$ARCH.radian
  2. foo-$OS.radian
  3. foo-$ARCH.radian
  4. foo.radian

We already have a distinction between the module name, foo, and the file name, “foo.radian”; this change will simply add an additional, optional suffix to the file name, leaving the module name unchanged. Since the compiler will always search for each possible implementation file in order, it will be impossible to import more than one implementation of the same module in a single target build.
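The lookup described above can be sketched in a few lines of Python. The function name resolve_module and its signature are my own invention; the candidate list mirrors the four-step search order from the post:

```python
import os

# Sketch of the proposed module lookup, assuming the LLVM target triple
# has already been split into OS and architecture components.
# resolve_module and its parameters are illustrative names.
def resolve_module(name, os_name, arch, search_dir="."):
    candidates = [
        f"{name}-{os_name}-{arch}.radian",
        f"{name}-{os_name}.radian",
        f"{name}-{arch}.radian",
        f"{name}.radian",
    ]
    for filename in candidates:
        path = os.path.join(search_dir, filename)
        if os.path.exists(path):
            return path   # first match wins; later candidates are ignored
    raise FileNotFoundError(f"no implementation of module '{name}' found")
```

Because the search always proceeds in the same order, a fully specialized foo-linux-x86_64.radian shadows foo-linux.radian, which in turn shadows the generic foo.radian.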

I chose the hyphen instead of Go’s underscore because it is already forbidden in module names. This means it is impossible to accidentally import against a specific platform implementation; even if you supply an implementation file for only a single platform, you must always import the module name alone.

Currently supported values of $OS would be “linux” and “macosx”, while $ARCH can be “i386” or “x86_64”.

Conventions for platform-specific source files

Posted: September 9th, 2013 | Author: Mars | Filed under: Design | Comments Off

The Go language uses OS- and architecture-specific file name suffixes to identify platform-specific implementations of a module. There is a more complex system of build constraints of which the name suffix is only one example, but it’s a pretty powerful convention.

I wonder if a similar scheme would work for Radian. Import statements are already abstracted from filenames, after all – one imports a library named “foo.radian” with the statement import foo and not a more C-style import "foo.radian". Perhaps the build tool could first check for a file named “foo_macos.radian”, if one happens to be targeting Mac OS, and only fall back to “foo.radian” if no platform-specific file exists.

In C++ code, one typical pattern is to define an abstract base class whose methods implement common behavior, calling pure virtual functions for platform-specific operations. Each platform then defines a subclass which implements those methods. Another similar strategy uses the bridge pattern: instead of subclassing, the general class delegates its operations to a platform-specific implementation object.
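Both patterns translate directly into Python, which is compact enough to show them side by side. The FileWatcher classes below are invented purely to illustrate the structure, not taken from any real library:

```python
# Two patterns for isolating platform-specific code; all names invented.

# Pattern 1: abstract base class with a platform-specific hook method
# (the Python analogue of a pure virtual function).
class FileWatcherBase:
    def watch(self, path):
        handle = self._platform_open(path)        # hook for subclasses
        return f"watching {path} via {handle}"

    def _platform_open(self, path):
        raise NotImplementedError

class LinuxFileWatcher(FileWatcherBase):
    def _platform_open(self, path):
        return f"inotify:{path}"                  # stand-in for a real inotify call

# Pattern 2: the bridge - the general class delegates to a
# platform-specific implementation object instead of subclassing.
class FileWatcher:
    def __init__(self, impl):
        self.impl = impl                          # platform-specific delegate
    def watch(self, path):
        return f"watching {path} via {self.impl.open(path)}"

class LinuxWatcherImpl:
    def open(self, path):
        return f"inotify:{path}"
```

With the bridge, the platform-specific object can be chosen at import time rather than baked into a class hierarchy, which is what makes it a good fit for an import statement that resolves per-platform files.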

How might we use this strategy in Radian, if its import statement could transparently pick an appropriate implementation file for the target platform?

In order to use the abstract-base-class approach, we’d have to introduce the notion of module inheritance. Your import foo would actually target foo_linux.radian, or whatever was appropriate, and each implementation would then somehow declare itself to be an extension of foo_common.radian. Perhaps this would all happen magically if the build tool observed the presence of both a foo_platform.radian file and a foo.radian file, but it seems like a lot of magic – and if we’re going to postulate the introduction of a module-inheritance scheme, it seems likely that people might want to use it for purposes other than simply isolating platform-dependent code; it should be called out explicitly.

This makes me think the bridge pattern is a better bet. If you were to design a network library, for example, you might define a network.radian which provides the public API. This file might then be packaged with network_internal_macos.radian, network_internal_windows.radian, and network_internal_linux.radian. The main network.radian module would then import network_internal, automatically picking up the appropriate version for the current target. External users would import network.radian and use some reasonable library API, while the library itself would delegate its grungy details to the methods in the platform-specific files.

Given that Radian requires source file names to consist of a legal identifier followed by the extension “.radian”, it might make more sense to use a hyphen or a period than an underscore. Imagine network_internal-linux.radian, for example, which you’d load with a plain import network_internal and then refer to as network_internal in code.

Case-folding function names

Posted: April 30th, 2013 | Author: Mars | Filed under: Design | Comments Off

Every programming language which manipulates strings offers a pair of functions which convert text to upper or lower case. These functions are often used to perform case-insensitive comparisons – you just convert both strings to either upper or lower case first. This generally works, but it fails for some scripts, and so the Unicode standard defines a case-folding transform which produces a normalized string suitable for caseless matching.

Radian’s string library implements to_upper and to_lower, and it seems reasonable that it should offer a case-folding function too. But what to call it? Nobody else seems to be offering such a function: I can’t find one in .NET, in Python, in PHP, in Ruby – Go has a mysterious function called SimpleFold, but whatever it’s doing, it isn’t what I’m trying to do.
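For what it’s worth, Python did grow such a function shortly before this post: str.casefold() was added in Python 3.3, which suggests casefold may be the emerging name for the operation. The classic demonstration is German ß:

```python
# Unicode case folding vs. lowercasing, using Python 3.3+'s str.casefold().
# German ß lowercases to itself, so lower() misses this caseless match:
print("straße".lower() == "strasse".lower())        # False
# Case folding maps ß to "ss", giving a correct caseless comparison:
print("straße".casefold() == "strasse".casefold())  # True
```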

Release 0.7.0

Posted: April 24th, 2013 | Author: Mars | Filed under: Uncategorized | Comments Off
  • Fixed a bug in the garbage collector copy-forwarding mechanism which caused strange runtime library assertion failures after sync statements.
  • No longer misassigns a terminator byte when performing concatenations of certain very short strings.
  • const statement renamed to def. The semantics are the same; only the keyword has changed. The statement never created a constant in the mathematical sense; it simply defines a name for a value which cannot be redefined.
  • Importing a file whose name begins with an underscore no longer fails, trying to load a mangled version of the file name instead. While the compiler does not enforce this convention, prepending a file name with an underscore is a way to show that the file contains private implementation details and should not be imported from outside the directory which contains it.
  • No longer misprocesses hex character escapes inside string literals.
  • More specific error message when you try to modify the self object inside a function and not a method.
  • More helpful error message when an import statement fails: now points at the location of the import statement which failed. This might happen if you typo the import name and accidentally specify a file which does not exist.
  • Fixed a race condition in the parallel work dispatcher which sometimes caused a null dereference crash.

Documentation for map and string objects

Posted: December 17th, 2012 | Author: Mars | Filed under: Progress | Comments Off

I have added two new entries under Documentation: a writeup about the map object and another about the string object. These pages describe the syntax, list the methods and functions offered by the objects, and discuss the computational complexity of the available operations.