Regex literals in various languages

Posted: August 22nd, 2012 | Author: Mars | Filed under: Design, Syntax

Languages in which regexes are first-class syntax elements:

Awk: /I (love|hate) regexe(s|n)/
Perl: /I (love|hate) regexe(s|n)/ or m|I (love\|hate) regexe(s\|n)|, where the m prefix allows a delimiter other than the slash
Ruby: /I (love|hate) regexe(s|n)/ or %r!I (love|hate) regexe(s|n)!, where the bang mark can be any delimiter
JavaScript: /I (love|hate) regexe(s|n)/
Clojure: #"I (love|hate) regexe(s|n)"

Languages which offer “raw” strings with no internal escapes:

Scala: """I (love|hate) regexe(s|n)"""
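Python: r"""I (love|hate) regexe(s|n)""" – the r prefix is what makes it raw; a plain triple-quoted string still processes backslash escapes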

Languages which offer minimally escaped strings:

PHP: 'I (love|hate) regexe(s|n)' – backslash escapes backslash and single-quote, but no other characters
Python: r"I (love|hate) regexe(s|n)" – can use either single or double quote
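
The difference between Python's plain and raw strings is easy to check directly. The following is a minimal sketch using only the standard `re` module; the variable names are invented for illustration, and `\b` is just a convenient escape to demonstrate with.

```python
import re

# The example pattern from the lists above, written as a raw string.
PATTERN_TEXT = r"I (love|hate) regexe(s|n)"

# A plain triple-quoted string still processes backslash escapes:
# here "\b" becomes a single backspace character (0x08).
plain = """\bword\b"""

# With the r prefix the backslashes survive, so the regex engine
# sees "\b" and treats it as a word boundary.
raw = r"""\bword\b"""

regex = re.compile(PATTERN_TEXT)
word_boundary = re.compile(raw)
```

Here `plain` is six characters long and `raw` is eight, which is why regex-heavy Python code tends to use the `r` prefix everywhere.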

The oldest example of a first-class regex literal I can find appears to be in Awk. Ruby and JavaScript copied it from there by way of Perl.


3 Comments on “Regex literals in various languages”

  1. Pat Lasswell said at 21:24 on August 22nd, 2012:

    I think awk got that from sed, which took it from ed.

  2. Mars said at 21:26 on August 22nd, 2012:

    An ancient and noble lineage indeed. I think I’ll follow suit.

  3. Mars said at 21:13 on September 4th, 2012:

    Turns out that Perl, Ruby, and JavaScript all employ brutal parser kludges to distinguish the slash character as the beginning of a regex literal from the slash on its own as a division operator. I can’t see how I’d do any better in Radian, but I’m deeply reluctant to introduce any such cruft into what has until now been a simple, clean, one-way token-to-grammar structure.

    Radian already supports the ÷ character as a division operator; the slash was originally supposed to be merely an “ASCII respelling” of the nominal division character. I could just drop that aliasing and use the slash for regular expressions alone – but I’m not entirely sure how anyone would write Radian code on a non-Mac OS system then!
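
To make the slash ambiguity from the last comment concrete, here is a rough sketch of the usual heuristic: the lexer remembers whether the previous token could end an operand, and reads a slash as division if so and as the start of a regex literal otherwise. This is a toy tokenizer for a made-up expression language, not the actual logic of Perl, Ruby, JavaScript, or Radian; the token kinds and the `tokenize` function are hypothetical.

```python
import re

# Token shapes for a toy language (an assumption for this sketch).
TOKEN_RE = re.compile(r"(?P<num>\d+)|(?P<ident>[A-Za-z_]\w*)|(?P<punct>[()=+*,-])")

def tokenize(src):
    """Split src into (kind, text) pairs, resolving '/' by context."""
    tokens = []
    pos = 0
    prev_is_operand = False  # lexer state that feeds back into tokenizing
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        if src[pos] == "/":
            if prev_is_operand:
                # Something like `a / b`: the slash follows an operand.
                tokens.append(("div", "/"))
                pos += 1
                prev_is_operand = False
            else:
                # Something like `x = /.../`: scan to the closing slash,
                # skipping over backslash-escaped characters.
                end = pos + 1
                while end < len(src) and src[end] != "/":
                    end += 2 if src[end] == "\\" else 1
                if end >= len(src):
                    raise SyntaxError("unterminated regex literal")
                tokens.append(("regex", src[pos + 1:end]))
                pos = end + 1
                prev_is_operand = True
            continue
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        kind = m.lastgroup
        tokens.append((kind, m.group()))
        pos = m.end()
        # Numbers, identifiers, and ')' can all end an operand.
        prev_is_operand = kind in ("num", "ident") or m.group() == ")"
    return tokens
```

With this, `tokenize("a / b")` yields a division token, while `tokenize("x = /I (love|hate)/")` yields a single regex token. The `prev_is_operand` flag is exactly the kind of context-dependent state that a simple one-way token-to-grammar design avoids.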