SEXPRS(6) SEXPRS(6)
NAME
sexprs - symbolic expressions
DESCRIPTION
S-expressions (`symbolic expressions') provide a way for
programs to store and exchange tree-structured text and
binary data. The Limbo module sexprs(2) provides the variant
defined by Rivest in Internet Draft `draft-rivest-sexp-00.txt'
(4 May 1997), as used for instance by the Simple
Public Key Infrastructure (SPKI). It provides a `canonical'
form of S-expression, and an `advanced' form for display.
They can convey binary data directly and efficiently, unlike
some other schemes such as XML. The forms are closely
related, and all can be read or written by sexprs(2), including
a variant sometimes used for transport on links that are
not 8-bit safe.
An S-expression is either a sequence of bytes (a byte
string), or a parenthesised list of smaller S-expressions.
All forms start with the fundamental rules below, in
extended BNF:
sexpr ::= string | list
list ::= '(' sexpr* ')'
They give the recursive structure. The various representations
ultimately differ only in how the byte string is represented
and whether white space such as blanks or newlines
can appear.
Furthermore, the definition of string is also common to all
forms:
string ::= display? simple-string
display ::= '[' simple-string ']'
The optional bracketed display string provides information
on how to present the associated byte string to a user.
(``It has no other function. Many of the MIME types work
here.'') Although supported by sexprs(2), it is largely
unused by Inferno applications and is usually left out. The
canonical and advanced forms differ in their definitions of
simple-string. They always denote sequences of 8-bit bytes,
but with different syntax (encodings). Two strings are
equal iff their simple-strings encode the same byte strings
(for both data and display).
Canonical form must be used when exchanging S-expressions
between computers, and when digitally signing an expression.
It is defined by the complete set of rules below:
Page 1 Plan 9 (printed 10/28/25)
sexpr ::= string | list
list ::= '(' sexpr* ')'
string ::= display? simple-string
display ::= '[' simple-string ']'
simple-string ::= raw
raw ::= nbytes ':' byte*
nbytes ::= [1-9][0-9]* | 0
Its simple-string is a raw byte string. The primitive byte
represents an 8-bit byte. The length of every byte string
is given explicitly by a preceding decimal value nbytes
(with no leading zeroes). There is no white space. It is
`canonical' because it is uniquely defined for each
S-expression. It is efficient to parse even on small computers.
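The canonical rules above are simple enough to parse with a short recursive function. The following is a minimal sketch in Python, not the sexprs(2) implementation; the names parse_canonical and _raw, and the choice to return a display string as a (display, bytes) tuple, are assumptions made for illustration.

```python
def _raw(data: bytes, pos: int):
    """Parse one raw string (nbytes ':' byte*) at pos; return (bytes, next_pos)."""
    colon = data.find(b":", pos)
    if colon < 0:
        raise ValueError("missing ':' after length")
    digits = data[pos:colon]
    # nbytes is decimal with no leading zeroes (a lone "0" is allowed).
    if not digits.isdigit() or (len(digits) > 1 and digits[0:1] == b"0"):
        raise ValueError("bad length")
    end = colon + 1 + int(digits)
    if end > len(data):
        raise ValueError("truncated byte string")
    return data[colon + 1:end], end


def parse_canonical(data: bytes, pos: int = 0):
    """Parse one canonical S-expression at pos; return (value, next_pos).

    A list becomes a Python list. A string becomes bytes, or a
    (display, bytes) tuple when a bracketed display string is present.
    """
    c = data[pos:pos + 1]
    if c == b"(":
        pos += 1
        items = []
        while data[pos:pos + 1] != b")":
            if pos >= len(data):
                raise ValueError("unterminated list")
            item, pos = parse_canonical(data, pos)
            items.append(item)
        return items, pos + 1
    if c == b"[":
        display, pos = _raw(data, pos + 1)
        if data[pos:pos + 1] != b"]":
            raise ValueError("missing ']' after display string")
        value, pos = _raw(data, pos + 1)
        return (display, value), pos
    return _raw(data, pos)
```

Because every byte string carries its length, the parser never inspects the string contents, so parentheses or colons inside the data need no escaping.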
Advanced form is more elaborate, and has two main differences:
not all byte strings need an explicit length, and
binary data can be represented in printable form, either
using hexadecimal or base 64 encodings, or using quoted
strings (with escape sequences similar to those of Limbo or
C). Unquoted text is called a token, and is restricted by
the standard to a specific alphabet: it must contain only
letters, digits, or characters from the set `-./_:*+=', and
must not start with a digit. The latter restriction is
imposed to allow byte counts to be distinguished from tokens
without lookahead, but has the consequence that decimal numbers
must be quoted, as must non-ASCII characters in utf(6)
encoding. Upper- and lower-case letters are distinct. The
advanced transport syntax is defined by the complete set of
rules below:
sexpr ::= string | list
list ::= '(' ( sexpr | whitespace )* ')'
string ::= display? simple-string
display ::= '[' simple-string ']'
simple-string ::= raw | token | base-64 | hexadecimal | quoted-string
raw ::= nbytes ':' byte*
nbytes ::= [1-9][0-9]* | 0
token ::= token-start token-char*
base-64 ::= nbytes? '|' ( base-64-char | whitespace )* '|'
hexadecimal ::= '#' ( hex-digit | whitespace )* '#'
quoted-string ::= nbytes? quoted-string-body
quoted-string-body ::= '"' byte* '"'
token-start ::= [-./_:*+=a-zA-Z]
token-char ::= token-start | [0-9]
hex-digit ::= [0-9a-fA-F]
base-64-char ::= [a-zA-Z0-9+/=]
Whitespace is any sequence of blank, tab, newline or
carriage-return characters; note that it can appear only at
the places shown. The bytes in a quoted-string-body are
interpreted according to the quoting rules for Limbo (or C).
That is, the bytes are enclosed in quotes, and may contain
the escape sequences for the following characters: backspace
(\b), form-feed (\f), newline (\n), carriage-return (\r),
tab (\t), and vertical tab (\v), octal escape \ooo (all
three digits must be given), hexadecimal escape \xhh (both
digits must be given), \\ for backslash, \' for single
quote, and \" to include a quote in a string. Note that
a quoted string can have an optional nbytes, but it gives
the length of the byte string resulting after interpreting
character escapes.
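The token and quoting rules can be illustrated by a writer that chooses a representation for one byte string. This is a sketch in Python under stated assumptions, not the behaviour of sexprs(2): the function name advanced_string is hypothetical, and the choice to emit every non-printable or non-ASCII byte as a three-digit octal escape (rather than verbatim, or in hexadecimal or base 64) is made only to keep the output printable.

```python
import re

# token ::= token-start token-char*  (must not start with a digit)
TOKEN = re.compile(rb"[-./_:*+=a-zA-Z][-./_:*+=a-zA-Z0-9]*\Z")

# Named escapes listed in the quoting rules, plus backslash and double quote.
ESCAPES = {
    0x08: "\\b", 0x0C: "\\f", 0x0A: "\\n", 0x0D: "\\r",
    0x09: "\\t", 0x0B: "\\v", 0x5C: "\\\\", 0x22: '\\"',
}


def advanced_string(value: bytes) -> str:
    """Return an advanced-form simple-string for value (token or quoted string)."""
    if TOKEN.match(value):
        return value.decode("ascii")      # safe to write unquoted
    out = []
    for b in value:
        if b in ESCAPES:
            out.append(ESCAPES[b])
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))            # printable ASCII is copied as is
        else:
            out.append("\\%03o" % b)      # octal escape, all three digits given
    return '"' + "".join(out) + '"'
```

With this sketch, hello-world and best-of-3 are written as tokens, while 3 and 5.6 are quoted because a token must not start with a digit.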
Both canonical and advanced forms can contain binary data
verbatim. Sometimes that is troublesome for storage or
transport. At the lexical level any sexpr can therefore be
replaced by the following:
'{' ( base-64-char | whitespace )* '}'
where the text between the braces is the base-64 encoding of
the sexpr expressed in canonical or advanced form. The
S-expression parser will replace the sequence by its decoded
bytes, and resume parsing at the start of that byte string. Note
the difference in syntax and interpretation from rule base-64
above, which encodes a simple-string, not an sexpr.
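The brace form can be sketched as a pair of helpers that wrap and unwrap an already encoded S-expression. This is an illustrative Python sketch only; the names to_transport and from_transport are assumptions, and the helpers handle a whole expression rather than a brace sequence embedded in a larger one.

```python
import base64


def to_transport(sexpr: bytes) -> bytes:
    """Wrap an encoded S-expression as '{' base-64 '}' for links that are not 8-bit safe."""
    return b"{" + base64.b64encode(sexpr) + b"}"


def from_transport(text: bytes) -> bytes:
    """Recover the encoded S-expression from its brace form."""
    if not (text.startswith(b"{") and text.endswith(b"}")):
        raise ValueError("not in brace form")
    # White space may appear anywhere between the braces; drop it before decoding.
    body = bytes(c for c in text[1:-1] if c not in b" \t\r\n")
    return base64.b64decode(body, validate=True)
```

For example, the canonical empty string 0: becomes {MDo=}, and the decoded bytes can then be handed to an ordinary canonical or advanced parser.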
EXAMPLES
The following S-expression is in canonical form:
(12:hello world!(5:inner0:))
It is a list of two elements: the string hello world!, and
another list also with two elements, the string inner and an
empty string. All the bytes in the example are printable
characters, but they could have been arbitrary binary values.
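The canonical example can be reproduced with a few lines of code. The sketch below is in Python rather than Limbo and does not use sexprs(2); the function name encode_canonical and the use of nested Python lists to stand for S-expression lists are assumptions for illustration. Display strings are left out.

```python
def encode_canonical(expr) -> bytes:
    """Encode nested lists of str/bytes as a canonical S-expression."""
    if isinstance(expr, list):
        # list ::= '(' sexpr* ')', with no white space between elements
        return b"(" + b"".join(encode_canonical(e) for e in expr) + b")"
    if isinstance(expr, str):
        expr = expr.encode("utf-8")       # text is stored as its UTF-8 bytes
    # raw ::= nbytes ':' byte*, with the length in decimal, no leading zeroes
    return str(len(expr)).encode("ascii") + b":" + expr


print(encode_canonical(["hello world!", ["inner", ""]]))
# prints b'(12:hello world!(5:inner0:))'
```

The same function accepts arbitrary binary values, since the length prefix is counted in bytes and the contents are never interpreted.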
The following is an S-expression in advanced form:
(hello-world
(* "3" "5.6")
(best-of-3 (5:inner0:)))
Note that advanced form contains canonical form as a subset;
here it is used for the innermost list.
SEE ALSO
sexprs(2), json(6), ubfa(6)
R. Rivest, ``S-expressions'', Network Working Group Internet
Draft (4 May 1997), reproduced in /lib/sexp.