Runes

1 Introduction

The runes interface in Closure arose from the need for Unicode characters and strings at a time when no available Lisp implementation offered such characters. The runes API is very similar to the character and string interface of standard Common Lisp.

There are two implementations:

rune-is-character

This is for Unicode-aware Lisp implementations; rune is a synonym for character and rod is a synonym for string.

rune-is-integer

This is for environments that are not Unicode-aware. Here runes really are integers of type (unsigned-byte 16), and rods are vectors specialized on that type.

Note that in either model a rod is a vector of rune objects, so you can use all the standard Common Lisp sequence functions on rod objects. Furthermore, it is guaranteed that eql works on rune objects.
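To make the point about sequence functions concrete, here is a minimal sketch of the rune-is-integer model in plain Common Lisp, with no runes library loaded. It assumes, as described above, that a rune is an (unsigned-byte 16) and a rod is a vector specialized on that type; *example-rod* and the example- functions are hypothetical names, not part of the runes API.

```lisp
;;; Sketch of the rune-is-integer model in plain Common Lisp.
;;; *EXAMPLE-ROD* and the EXAMPLE- functions are hypothetical names.
(defparameter *example-rod*
  (make-array 3 :element-type '(unsigned-byte 16)
                :initial-contents '(72 105 33))   ; code points of "Hi!"
  "A rod in the rune-is-integer model: a specialized vector of integers.")

;; Standard sequence functions work directly on the rod.
(defun example-position-of-bang ()
  (position 33 *example-rod*))                    ; index of #x21, "!"

(defun example-reversed-rod ()
  (reverse *example-rod*))

;; EQL compares integers of the same type by value, so it is reliable on runes.
(defun example-runes-eql-p ()
  (eql (aref *example-rod* 0) 72))
```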

Additionally, there are two reader macros: #/… to read runes and #"…" to read rods.

Although most Common Lisp implementations have Unicode support these days, using runes is still a good idea in applications or libraries that aim to be highly portable among different implementations. For one thing, you get Unicode support even on the occasional implementation that is not Unicode-aware. For another, you can be sure that certain things stay constant, such as the input syntax (is the ASCII form feed character called #\Page or #\Formfeed?) or the code point used to represent the end of a line. Additionally, the behavior of string-upcase and friends can vary a lot between implementations.

2 Runes

Runes are like characters. However, unlike Common Lisp characters, a rune is specified to be exactly one Unicode code point, and for every Unicode code point there is a rune.

Implementation Note: In the current implementation a rune might be represented as an (unsigned-byte 16). This is a historical accident: at the time of writing, the original author was not aware that Unicode would define code points beyond 2^16.

Depending on the model chosen, a rune might be represented as an unsigned-byte, a character, or some otherwise opaque structure.

[Type] rune
[Function] code-rune code

Returns the rune which identifies the Unicode code point code.

[Function] rune-code rune

Returns the code point that rune identifies.

[Function] char-rune char

Returns the rune that corresponds to the Common Lisp character char.

[Function] rune-char rune &optional (default *invalid-rune*)

Returns the Common Lisp character that corresponds to the rune rune. If the rune is not representable as a character in the implementation at hand, default is returned. If, in this case, default is nil, an error is signaled.
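A minimal sketch of how these four conversions behave, assuming the rune-is-integer model where a rune is simply its code point. The sketch- prefixed functions and *sketch-invalid-rune* are hypothetical stand-ins written in plain Common Lisp, not the runes library's implementation; the point is the default/nil handling described for rune-char.

```lisp
;;; Hypothetical sketch of the conversion functions in the rune-is-integer
;;; model. The SKETCH- prefixed names are illustrative, not the runes API.
(defvar *sketch-invalid-rune* nil
  "Replacement returned for unrepresentable runes; NIL means signal an error.")

(defun sketch-code-rune (code)
  "In this model a rune simply is its code point."
  code)

(defun sketch-rune-code (rune)
  rune)

(defun sketch-char-rune (char)
  (char-code char))

(defun sketch-rune-char (rune &optional (default *sketch-invalid-rune*))
  "Return the character for RUNE, DEFAULT if there is none, or signal an error."
  (let ((char (and (< rune char-code-limit) (code-char rune))))
    (cond (char char)
          (default default)
          (t (error "Rune ~D is not representable as a character." rune)))))
```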

[Special Variable] *invalid-rune*

Rune to use as a replacement in rune-char and rod-string for runes not representable as characters. If nil, an error is signaled instead.

Predicates

[Function] runep object

Returns true if object is a rune. Note that unless the rune-is-structure model is selected, we can't tell runes apart from characters or integers, depending on the model chosen.
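The ambiguity is easy to see in a sketch of the rune-is-integer model: a type test on (unsigned-byte 16) is the best such a predicate can do, so any small integer passes, even one that was never meant as a rune. sketch-runep is a hypothetical name, not the library's definition.

```lisp
;;; Hypothetical sketch: RUNEP in the rune-is-integer model.
(defun sketch-runep (object)
  "True for any (unsigned-byte 16), whether or not it was meant as a rune."
  (typep object '(unsigned-byte 16)))
```

For example, (sketch-runep (length "abc")) is true, because the length 3 is indistinguishable from the rune with code point 3.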

[Function] white-space-rune-p rune

Returns true if the rune rune is white space. White space is defined as ASCII Space, Linefeed, Carriage Return, or Tab (code points 32, 10, 13, and 9 decimal).
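A minimal sketch of this definition for integer runes (the rune-is-integer model is an assumption here; sketch-white-space-rune-p is a hypothetical name). Note that form feed (12) is deliberately not white space under this definition.

```lisp
;;; Hypothetical sketch: WHITE-SPACE-RUNE-P for integer runes.
(defun sketch-white-space-rune-p (rune)
  "True if RUNE is Space (32), Linefeed (10), Carriage Return (13) or Tab (9)."
  (and (member rune '(32 10 13 9)) t))
```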

[Function] digit-rune-p rune &optional (radix 10)

If rune is a digit in the given radix, the weight of the digit is returned; otherwise nil is returned. radix should be an integer in the range [2, 36].

Only Arabic digits and Latin letters are ever considered to be digits.
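The restriction to Arabic digits and Latin letters matters because the standard digit-char-p may accept other Unicode decimal digits in some implementations. A minimal sketch for integer runes that checks the ASCII ranges explicitly; sketch-digit-rune-p is hypothetical, and treating upper- and lowercase letters alike is an assumption carried over from digit-char-p.

```lisp
;;; Hypothetical sketch: DIGIT-RUNE-P for integer runes, ASCII only.
(defun sketch-digit-rune-p (rune &optional (radix 10))
  "Return the weight of RUNE in RADIX, or NIL if it is not such a digit."
  (let ((weight (cond ((<= 48 rune 57)  (- rune 48))           ; 0-9
                      ((<= 65 rune 90)  (+ 10 (- rune 65)))    ; A-Z
                      ((<= 97 rune 122) (+ 10 (- rune 97)))))) ; a-z
    (and weight (< weight radix) weight)))
```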

[Function] rune= x y
[Function] rune<= rune &rest more
[Function] rune>= rune &rest more
[Function] rune-equal rune1 rune2

Returns true if rune1 and rune2 are the same rune or differ only by case.
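A minimal sketch of the comparison functions for integer runes, mirroring the char= / char<= / char-equal family. All names are hypothetical, and the case folding covers ASCII letters only; real Unicode case folding is considerably more involved.

```lisp
;;; Hypothetical sketch of the rune comparison functions for integer runes.
;;; SKETCH-ASCII-FOLD folds ASCII letters only.
(defun sketch-ascii-fold (rune)
  (if (<= 65 rune 90) (+ rune 32) rune))   ; A-Z -> a-z

(defun sketch-rune= (x y) (= x y))

(defun sketch-rune<= (rune &rest more) (apply #'<= rune more))

(defun sketch-rune>= (rune &rest more) (apply #'>= rune more))

(defun sketch-rune-equal (rune1 rune2)
  "True if RUNE1 and RUNE2 are equal or differ only by (ASCII) case."
  (= (sketch-ascii-fold rune1) (sketch-ascii-fold rune2)))
```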

3 Rods

Rods are vectors of runes. We deliberately chose not to wrap rods in some further structure, so that the whole range of Common Lisp sequence functions works on rods.

[Type] rod

This type refers to a vector of runes. Depending on the implementation model chosen, it is the appropriate subtype of vector.

[Type] simple-rod

This type refers to a simple vector of runes. Depending on the implementation model chosen, it is the appropriate one-dimensional simple-array type.

[Function] make-rod size

Returns a freshly allocated rod of length size. The initial content of the rod returned is unspecified.

[Function] sloopy-rod-p object

Returns true if object looks like a rod.

[Function] rune rod index

Returns the index'th rune of rod. It is an error if index is not a non-negative integer less than the length of rod.

[Function] (setf rune) new-value rod index

Modifies the index'th rune of rod to become new-value. It is an error if index is not a non-negative integer less than the length of rod. It is also an error if new-value is not a rune.

[Function] %rune rod index

Like the rune accessor, but rod is assumed to be a simple-rod and index is assumed to be a valid index. This is a low-safety variant for speed; use with care.
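A minimal sketch contrasting the checked accessor with the unchecked %rune, assuming the rune-is-integer model. Every sketch- name is hypothetical; the explicit bounds check in sketch-rune is one way to honor the "it is an error" contract, whereas sketch-%rune trusts its arguments and compiles with safety 0.

```lisp
;;; Hypothetical sketch of MAKE-ROD, RUNE, (SETF RUNE) and %RUNE in the
;;; rune-is-integer model. SKETCH- names are illustrative only.
(deftype sketch-simple-rod () '(simple-array (unsigned-byte 16) (*)))

(defun sketch-make-rod (size)
  "Return a fresh simple rod of length SIZE; its contents are unspecified."
  (make-array size :element-type '(unsigned-byte 16)))

(defun sketch-check-index (rod index)
  "Signal an error unless INDEX is a non-negative integer below (LENGTH ROD)."
  (unless (and (integerp index) (< -1 index (length rod)))
    (error "Index ~S is out of range for a rod of length ~D."
           index (length rod))))

(defun sketch-rune (rod index)
  (sketch-check-index rod index)
  (aref rod index))

(defun (setf sketch-rune) (new-value rod index)
  (check-type new-value (unsigned-byte 16))   ; NEW-VALUE must be a rune
  (sketch-check-index rod index)
  (setf (aref rod index) new-value))

(defun sketch-%rune (rod index)
  "Unchecked variant: ROD and INDEX are trusted, as for %RUNE."
  (declare (type sketch-simple-rod rod)
           (type fixnum index)
           (optimize (speed 3) (safety 0)))
  (aref rod index))
```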

[Function] (setf %rune) new rod index
[Function] rod object
[Function] rod-subseq rod start &optional end

Returns a freshly allocated rod containing, in order, the runes of rod from index start up to, but not including, index end. end defaults to the length of rod.
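Because rods are ordinary vectors, a sketch of rod-subseq can lean on the standard subseq, which already returns a fresh simple vector with the same actual element type and treats an end of nil as the length of the sequence. sketch-rod-subseq is a hypothetical name, not the library's definition.

```lisp
;;; Hypothetical sketch of ROD-SUBSEQ. Standard SUBSEQ returns a freshly
;;; allocated vector with the same array element type as ROD, and an END
;;; of NIL means the length of ROD.
(defun sketch-rod-subseq (rod start &optional end)
  (subseq rod start end))
```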

[Function] rod= rod1 rod2

Returns true if rod1 and rod2 represent the same sequence of runes.

[Function] rod< rod1 rod2
[Function] rod-equal rod1 rod2

Returns true if the two sequences of runes are rune-equal; that is, rod1 and rod2 are compared with each other while ignoring the case of the runes.
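A minimal sketch of rod= and rod-equal as element-wise comparisons over integer runes, assuming the rune-is-integer model. The names are hypothetical, and sketch-fold-rune folds ASCII letters only, standing in for full rune-equal semantics.

```lisp
;;; Hypothetical sketch of ROD= and ROD-EQUAL for integer runes.
;;; Case folding here covers ASCII letters only.
(defun sketch-fold-rune (rune)
  (if (<= 65 rune 90) (+ rune 32) rune))   ; A-Z -> a-z

(defun sketch-rod= (rod1 rod2)
  "True if ROD1 and ROD2 hold the same runes in the same order."
  (and (= (length rod1) (length rod2))
       (every #'= rod1 rod2)))

(defun sketch-rod-equal (rod1 rod2)
  "Like SKETCH-ROD=, but comparing runes after case folding."
  (and (= (length rod1) (length rod2))
       (every (lambda (a b) (= (sketch-fold-rune a) (sketch-fold-rune b)))
              rod1 rod2)))
```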

4 Case Conversion

Note that, unlike string-upcase and string-downcase, we do not promise that a case-converted rod has the same length as the original.

[Function] rune-downcase rune

Returns the downcase equivalent of rune. If the rune has no case or no downcase equivalent is available, the original rune is returned.

[Function] rune-upcase rune

Returns the upcase equivalent of rune. If the rune has no case or has no upcase equivalent, the original rune is returned.

[Function] rod-downcase rod

Converts each rune from rod to downcase and returns the resulting rod. Note that the result can have a length different from the input argument.

[Function] rod-upcase rod

Converts each rune from rod to upcase and returns the resulting rod. Note that the result can have a length different from the input argument.
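Why the length can change: some characters have no single-rune upcase form. The German sharp s (U+00DF) upcases to the two letters "SS" under Unicode's full case mapping. A minimal sketch for integer runes, mapping only ASCII a-z and U+00DF; the names and the tiny mapping table are hypothetical, not the runes library's case tables.

```lisp
;;; Hypothetical sketch: ROD-UPCASE in the rune-is-integer model, showing why
;;; the result length can differ. Only ASCII a-z and U+00DF are mapped here.
(defun sketch-rune-upcase (rune)
  "Upcase ASCII letters; return every other rune unchanged."
  (if (<= 97 rune 122) (- rune 32) rune))

(defun sketch-rod-upcase (rod)
  "Return a fresh rod; U+00DF (sharp s) expands to the two runes S S."
  (let ((out (make-array (length rod) :element-type '(unsigned-byte 16)
                                      :adjustable t :fill-pointer 0)))
    (loop for rune across rod
          do (if (= rune #xDF)
                 (progn (vector-push-extend 83 out)   ; #x53, "S"
                        (vector-push-extend 83 out))
                 (vector-push-extend (sketch-rune-upcase rune) out)))
    ;; Copy the active elements into a simple rod of the final length.
    (coerce out '(simple-array (unsigned-byte 16) (*)))))
```

With this sketch, the six runes of "straße" upcase to the seven runes of "STRASSE", while sketch-rune-upcase alone leaves U+00DF unchanged because it has no single-rune equivalent.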

5 Character and String Conversion

There are some convenience functions provided to convert from vanilla Common Lisp characters and strings to rune and rod objects.

[Function] rod-string rod &optional (default-char *invalid-rune*)

Turns the rod rod into a Common Lisp string. default-char has the same meaning as the default argument of rune-char.

[Function] string-rod string

Converts the Common Lisp string string into a rod.
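A minimal sketch of the two conversions, assuming integer runes and using plain general vectors as rods for simplicity. The names are hypothetical; default-char defaults to nil here rather than to *invalid-rune*, and follows the same rule as rune-char: substitute it if given, otherwise signal an error.

```lisp
;;; Hypothetical sketch of STRING-ROD and ROD-STRING with runes as integers.
(defun sketch-string-rod (string)
  "Return a fresh vector holding the code point of each character of STRING."
  (map 'vector #'char-code string))

(defun sketch-rod-string (rod &optional (default-char nil))
  "Return a string for ROD, substituting DEFAULT-CHAR for unrepresentable runes;
if DEFAULT-CHAR is NIL, signal an error instead (as for RUNE-CHAR)."
  (map 'string
       (lambda (rune)
         (let ((char (and (< rune char-code-limit) (code-char rune))))
           (cond (char char)
                 (default-char default-char)
                 (t (error "Rune ~D has no character equivalent." rune)))))
       rod))
```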

6 Syntax

[Syntax] #/…

This syntax is used to read a rune. It is similar to the Common Lisp #\… syntax.

#/U+nnnn    #xnnnn    rune with the hexadecimal code nnnn

The following semi-standard rune names are defined:

#/Null       #x0000
#/Space      #x0020
#/Newline    #x000A
#/Return     #x000D
#/Tab        #x0009
#/Page       #x000C

The following ASCII runes are defined:

#/nul    #x0000    null character
#/soh    #x0001    start of header
#/stx    #x0002    start of text
#/etx    #x0003    end of text
#/eot    #x0004    end of transmission
#/enq    #x0005    enquiry
#/ack    #x0006    acknowledgment
#/bel    #x0007    bell
#/bs     #x0008    backspace
#/ht     #x0009    horizontal tab
#/lf     #x000A    line feed
#/vt     #x000B    vertical tab
#/ff     #x000C    form feed
#/cr     #x000D    carriage return
#/so     #x000E    shift out
#/si     #x000F    shift in
#/dle    #x0010    data link escape
#/dc1    #x0011    device control 1
#/dc2    #x0012    device control 2
#/dc3    #x0013    device control 3
#/dc4    #x0014    device control 4
#/nak    #x0015    negative acknowledgement
#/syn    #x0016    synchronous idle
#/etb    #x0017    end of transmission block
#/can    #x0018    cancel
#/em     #x0019    end of medium
#/sub    #x001A    substitute
#/esc    #x001B    escape
#/fs     #x001C    file separator
#/gs     #x001D    group separator
#/rs     #x001E    record separator
#/us     #x001F    unit separator
#/del    #x007F    delete

Additional named runes:

#/nbsp   #x00A0    no-break space
#/shy    #x00AD    soft hyphen
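A minimal sketch of how the text after #/ could be resolved to a code point, covering the U+nnnn form and a small subset of the names listed above. It is not the library's reader: the name table is abridged, matching is assumed to be case-sensitive (the tables use both Null and nul), and treating a single character such as a like #\a is an assumption based on the similarity to #\… syntax.

```lisp
;;; Hypothetical sketch: resolve the text following #/ to a code point.
;;; The table is an abridged subset of the names listed above.
(defparameter *sketch-rune-names*
  '(("Null" . #x0000) ("Space" . #x0020) ("Newline" . #x000A)
    ("Return" . #x000D) ("Tab" . #x0009) ("Page" . #x000C)
    ("nul" . #x0000) ("lf" . #x000A) ("cr" . #x000D) ("ff" . #x000C)
    ("del" . #x007F) ("nbsp" . #x00A0) ("shy" . #x00AD)))

(defun sketch-parse-rune-name (name)
  "Return the code point denoted by NAME as written after #/, or NIL."
  (cond ((and (> (length name) 2) (string= "U+" name :end2 2))
         (parse-integer name :start 2 :radix 16))       ; U+nnnn, hexadecimal
        ((= (length name) 1)                            ; assumed: like #\a
         (char-code (char name 0)))
        (t (cdr (assoc name *sketch-rune-names* :test #'string=)))))
```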

[Syntax] #"…"

The printed representation of a rod is #" followed by its runes and a closing ". A double-quote character inside the rod is escaped with a backslash, as \".
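A minimal sketch of producing this printed representation for an integer rod. As the text above describes, only the double quote is escaped; whether the real printer also escapes backslashes is not stated here, so the sketch does not. It assumes every rune is representable as a character, and sketch-write-rod-syntax is a hypothetical name.

```lisp
;;; Hypothetical sketch: write an integer rod using the #"..." syntax,
;;; escaping only the double quote. Assumes every rune is representable
;;; as a character.
(defun sketch-write-rod-syntax (rod &optional (stream *standard-output*))
  (write-string "#\"" stream)
  (loop for rune across rod
        for char = (code-char rune)
        do (when (char= char #\") (write-char #\\ stream))
           (write-char char stream))
  (write-char #\" stream)
  rod)
```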

7 Usage Hints