XML(2) XML(2)
NAME
xmlattr, xmlcalloc, xmlelem, xmlfind, xmlfree, xmllook,
xmlmalloc, xmlnew, xmlparse, xmlprint, xmlstrdup, xmlvalue -
XML parser
SYNOPSIS
#include <u.h>
#include <libc.h>
#include <xml.h>
enum {
Fcrushwhite = 1,
Fstripnamespace = 2,
};
struct Xml {
Elem *root; /* root of tree */
char *doctype; /* DOCTYPE structured comment, or nil */
...
};
struct Elem {
Elem *next; /* next element at this hierarchy level */
Elem *child; /* first child of this node */
Elem *parent; /* parent of this node */
Attr *attrs; /* linked list of attributes */
char *name; /* element name */
char *pcdata; /* pcdata following this element */
int line; /* Line number (for errors) */
};
struct Attr {
Attr *next; /* next attribute */
Elem *parent; /* parent element */
char *name; /* attribute's name */
char *value; /* attribute's value */
};
Attr* xmlattr(Xml *xp, Attr **root, Elem *parent,
char *name, char *value)
Elem* xmlelem(Xml *xp, Elem **root, Elem *parent, char *name)
Elem* xmlfind(Xml *xp, Elem *ep, char *path)
Elem* xmllook(Elem *ep, char *path, char *attr, char *value)
Xml* xmlnew(int blksize)
Xml* xmlparse(int fd, int blksize, int flags)
char* xmlvalue(Elem *ep, char *name)
void* xmlmalloc(Xml *xp, usize size)
void* xmlcalloc(Xml *xp, usize nelem, usize elemsz)
void* xmlstrdup(Xml *xp, char *s)
void xmlfree(Xml *xp)
Page 1 Plan 9 (printed 10/29/25)
void xmlprint(Xml *xp, int fd)
DESCRIPTION
Libxml is a library for manipulating an XML document in
memory (known as the DOM model). Each element may have a
number of child elements and a number of attributes; each
attribute has a single value. Every element contains a
pointer to its parent element; the root element has a nil
parent pointer. Pcdata (free-form text) found between
elements is attached to the element which follows it. The
line number at which each element was found is stored to
allow unambiguous error messages during later analysis.
Strings are stored in one of two data structures: common
names, such as element and attribute names, are kept in a
binary tree, while less common strings, such as attribute
values and pcdata, are stored in a simple, unmanaged heap.
These measures vastly reduce the memory footprint of the
parsed file and the time needed to free the XML data.
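The binary tree of common names can be pictured as an interning table. The following is a minimal sketch in portable C (using NULL rather than nil so it builds outside Plan 9), not the libxml source: the type Name and the function intern are hypothetical, and the tree is assumed to be an unbalanced binary search tree ordered by strcmp, in which each distinct name is stored once and every caller receives the same pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node of a name-interning tree (not libxml's own type). */
typedef struct Name Name;
struct Name {
	Name *left, *right;	/* subtrees holding smaller / larger names */
	char *str;		/* the single stored copy of this name */
};

/*
 * Hypothetical interning routine: return the stored copy of s,
 * inserting it on first use.  Equal names always yield the same
 * pointer, so names that repeat throughout a document are kept once.
 * Returns NULL if memory runs out.
 */
char *
intern(Name **root, const char *s)
{
	Name **np, *n;
	int c;

	for(np = root; *np != NULL; ){
		c = strcmp(s, (*np)->str);
		if(c == 0)
			return (*np)->str;	/* already present: share it */
		np = c < 0 ? &(*np)->left : &(*np)->right;
	}
	n = malloc(sizeof *n);
	if(n == NULL)
		return NULL;
	n->str = malloc(strlen(s) + 1);
	if(n->str == NULL){
		free(n);
		return NULL;
	}
	strcpy(n->str, s);
	n->left = n->right = NULL;
	*np = n;	/* hang the new leaf where the search ended */
	return n->str;
}
```

Under this assumption, names could be compared by pointer rather than with strcmp, since two calls with equal strings return the same address.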
Xmlparse reads the XML document from the file descriptor fd
and builds an in-memory tree. Blksize controls the
granularity of allocation of the string heap described
above; 8192 is typically used. The flags argument allows
some control over the parser; it is a bitwise OR of the
following values:
Fcrushwhite
Each run of whitespace in pcdata is replaced by a
single space, and leading and trailing whitespace
is removed.
Fstripnamespace
Remove leading namespace prefixes from all element
and attribute names; this effectively ignores
namespaces, which can lead to parsing ambiguities,
though in practice this has not yet been a problem.
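The effect of the two flags on individual strings can be sketched as below. This is an illustration in portable C, not the parser's own code; crushwhite and stripnamespace are hypothetical names, and stripnamespace assumes that a namespace prefix is everything up to the last colon in a name.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical Fcrushwhite-style cleanup, done in place: each run
 * of whitespace becomes one space, and leading and trailing
 * whitespace is dropped.  Returns s.
 */
char *
crushwhite(char *s)
{
	char *r, *w;
	int pending = 0;	/* whitespace seen since the last kept char */

	for(r = w = s; *r != '\0'; r++){
		if(isspace((unsigned char)*r)){
			pending = 1;
			continue;
		}
		if(pending && w != s)	/* no space before the first word */
			*w++ = ' ';
		pending = 0;
		*w++ = *r;
	}
	*w = '\0';	/* trailing whitespace is simply never copied */
	return s;
}

/*
 * Hypothetical Fstripnamespace-style cleanup: return the part of a
 * name after its last ':' (so "dc:title" yields "title"), or the
 * whole name if it has no prefix.  (Assumed prefix rule.)
 */
const char *
stripnamespace(const char *name)
{
	const char *p = strrchr(name, ':');

	return p != NULL ? p + 1 : name;
}
```

For instance, crushwhite turns "  hello \t\n world  " into "hello world", and stripnamespace("dc:title") returns a pointer to "title" within the original string.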
Xml trees may also be built up by calling xmlnew to create
the XML tree, followed by xmlelem and xmlattr to create
individual elements and attributes respectively. Xmlelem
takes the address of the root of an element list to which
the new element should be appended, the address of the
parent node the new element should reference, and the name
of the node to create; it returns the address of the created
element.
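A minimal sketch of the appending behavior described for xmlelem, in portable C. The function newelem is hypothetical: it takes the same list, parent, and name arguments, but allocates with malloc instead of the tree's own heap, and Elem is trimmed to the fields it needs.

```c
#include <stdlib.h>
#include <string.h>

/* Trimmed copy of the Elem structure from the SYNOPSIS. */
typedef struct Elem Elem;
struct Elem {
	Elem *next;	/* next element at this hierarchy level */
	Elem *child;	/* first child of this node */
	Elem *parent;	/* parent of this node */
	char *name;	/* element name */
};

/*
 * Hypothetical xmlelem look-alike: append a new element called
 * name to the list headed by *root and point it at parent.
 * Returns the new element, or NULL if memory runs out.
 */
Elem *
newelem(Elem **root, Elem *parent, const char *name)
{
	Elem *ep, **lp;

	ep = calloc(1, sizeof *ep);	/* next and child start out empty */
	if(ep == NULL)
		return NULL;
	ep->name = malloc(strlen(name) + 1);
	if(ep->name == NULL){
		free(ep);
		return NULL;
	}
	strcpy(ep->name, name);
	ep->parent = parent;
	for(lp = root; *lp != NULL; lp = &(*lp)->next)
		;	/* walk to the tail so siblings keep document order */
	*lp = ep;
	return ep;
}
```

For example, newelem(&root, NULL, "doc") creates a root element, and newelem(&doc->child, doc, "a") then adds a child beneath it.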
Xmlattr attaches an attribute to an existing element. It
takes a list pointer and parent pointer like xmlelem, but
requires both an attribute name and value, and returns the
address of the new attribute.
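The same pattern applies to attributes. The sketch below is hypothetical portable C: newattr mirrors the argument order of xmlattr and copies both strings with malloc rather than into the Xml heap, and it assumes that, as with xmlelem, the new attribute is appended to the end of the list.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Elem Elem;
typedef struct Attr Attr;

/* Trimmed copies of the SYNOPSIS structures. */
struct Elem {
	Attr *attrs;	/* linked list of attributes */
	char *name;	/* element name */
};
struct Attr {
	Attr *next;	/* next attribute */
	Elem *parent;	/* parent element */
	char *name;	/* attribute's name */
	char *value;	/* attribute's value */
};

/* malloc-based string copy; NULL if memory runs out. */
static char *
copystr(const char *s)
{
	char *t = malloc(strlen(s) + 1);

	if(t != NULL)
		strcpy(t, s);
	return t;
}

/*
 * Hypothetical xmlattr look-alike: append name=value to the list
 * at *root, owned by parent.  Returns the attribute or NULL.
 */
Attr *
newattr(Attr **root, Elem *parent, const char *name, const char *value)
{
	Attr *ap, **lp;

	ap = calloc(1, sizeof *ap);
	if(ap == NULL)
		return NULL;
	ap->name = copystr(name);
	ap->value = copystr(value);
	if(ap->name == NULL || ap->value == NULL){
		free(ap->name);		/* free(NULL) is harmless */
		free(ap->value);
		free(ap);
		return NULL;
	}
	ap->parent = parent;
	for(lp = root; *lp != NULL; lp = &(*lp)->next)
		;	/* keep attributes in the order they were added */
	*lp = ap;
	return ap;
}
```

A typical call passes the element's own list head, as in newattr(&ep->attrs, ep, "href", "a.html").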
Xmllook descends through the tree rooted at ep using the
path specified in path. If attr is nil, it returns the
first element found at the end of the path; if attr and
value are not nil, the search continues for an element
which contains this attribute and value pair.
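The path syntax accepted by xmllook is not specified above, so the following portable-C sketch is built on an explicit assumption: path is a list of element names separated by '/', each matched against the children of the element reached so far. The function lookpath is hypothetical; it backtracks across same-named siblings so that an attribute filter can select a later match.

```c
#include <stddef.h>
#include <string.h>

typedef struct Attr Attr;
typedef struct Elem Elem;
struct Attr {
	Attr *next;
	char *name;
	char *value;
};
struct Elem {
	Elem *next;	/* next sibling */
	Elem *child;	/* first child */
	Attr *attrs;	/* attribute list */
	char *name;
};

/* Does ep carry the attribute pair attr=value? */
static int
hasattr(const Elem *ep, const char *attr, const char *value)
{
	const Attr *ap;

	for(ap = ep->attrs; ap != NULL; ap = ap->next)
		if(strcmp(ap->name, attr) == 0 && strcmp(ap->value, value) == 0)
			return 1;
	return 0;
}

/*
 * Hypothetical lookup under the ASSUMED "a/b/c" path syntax.
 * With attr or value NULL, the first element completing the path
 * is returned; otherwise the search goes on until such an element
 * also has attr=value.  Returns NULL when nothing matches.
 */
Elem *
lookpath(Elem *ep, const char *path, const char *attr, const char *value)
{
	const char *slash;
	size_t n;
	Elem *c, *r;

	if(ep == NULL)
		return NULL;
	slash = strchr(path, '/');
	n = slash != NULL ? (size_t)(slash - path) : strlen(path);
	for(c = ep->child; c != NULL; c = c->next){
		if(strlen(c->name) != n || strncmp(c->name, path, n) != 0)
			continue;	/* this child is not the next path step */
		if(slash != NULL){
			r = lookpath(c, slash + 1, attr, value);
			if(r != NULL)
				return r;
		}else if(attr == NULL || value == NULL || hasattr(c, attr, value))
			return c;
	}
	return NULL;
}
```

With an rss element containing channel, which in turn holds two item elements, lookpath(rss, "channel/item", NULL, NULL) would give the first item and lookpath(rss, "channel/item", "id", "2") the one carrying id="2".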
Xmlvalue searches the given element's attribute list for an
attribute called name and returns its value, or nil if that
attribute is not found.
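A sketch of the attribute search performed by xmlvalue, in portable C. To stay self-contained, the hypothetical attrvalue takes the head of the attribute list rather than an Elem, and it assumes the first attribute with a matching name is the one returned.

```c
#include <stddef.h>
#include <string.h>

typedef struct Attr Attr;
struct Attr {
	Attr *next;	/* next attribute */
	char *name;	/* attribute's name */
	char *value;	/* attribute's value */
};

/*
 * Hypothetical xmlvalue look-alike: walk the attribute list and
 * return the value of the first attribute called name, or NULL
 * if there is none.
 */
const char *
attrvalue(const Attr *ap, const char *name)
{
	for(; ap != NULL; ap = ap->next)
		if(strcmp(ap->name, name) == 0)
			return ap->value;
	return NULL;
}
```

Because a missing attribute yields NULL rather than an empty string, callers can tell an absent attribute from one whose value is "".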
Xmlprint writes the XML tree xp as text to the file
descriptor fd.
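A minimal sketch of such a printer in portable C, writing to a stdio FILE rather than a file descriptor. The function printelems is hypothetical. Its tab indentation and use of the empty-element form are assumptions, pcdata is left out for brevity, and only the four characters &, <, > and " are escaped.

```c
#include <stdio.h>

typedef struct Attr Attr;
typedef struct Elem Elem;
struct Attr {
	Attr *next;
	char *name;
	char *value;
};
struct Elem {
	Elem *next;	/* next sibling */
	Elem *child;	/* first child */
	Attr *attrs;	/* attribute list */
	char *name;
};

/* Write s with the characters that are special in XML escaped. */
static void
putesc(FILE *f, const char *s)
{
	for(; *s != '\0'; s++)
		switch(*s){
		case '&':	fputs("&amp;", f); break;
		case '<':	fputs("&lt;", f); break;
		case '>':	fputs("&gt;", f); break;
		case '"':	fputs("&quot;", f); break;
		default:	fputc(*s, f); break;
		}
}

/*
 * Hypothetical printer: write ep and its siblings, recursing into
 * children and indenting one tab per level.  Childless elements
 * use the empty-element form <name/>.
 */
void
printelems(FILE *f, const Elem *ep, int depth)
{
	const Attr *ap;
	int i;

	for(; ep != NULL; ep = ep->next){
		for(i = 0; i < depth; i++)
			fputc('\t', f);
		fprintf(f, "<%s", ep->name);
		for(ap = ep->attrs; ap != NULL; ap = ap->next){
			fprintf(f, " %s=\"", ap->name);
			putesc(f, ap->value);
			fputc('"', f);
		}
		if(ep->child == NULL){
			fputs("/>\n", f);
			continue;
		}
		fputs(">\n", f);
		printelems(f, ep->child, depth + 1);
		for(i = 0; i < depth; i++)
			fputc('\t', f);
		fprintf(f, "</%s>\n", ep->name);
	}
}
```

Given a top element with one child leaf carrying q="a<b", calling printelems(stdout, &top, 0) writes three lines: <top>, a tab-indented <leaf q="a&lt;b"/>, and </top>.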
Xmlmalloc, xmlcalloc, and xmlstrdup allocate memory within
the Xml tree. Xmlfree frees all memory used by the given
Xml tree.
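The allocation scheme can be sketched as a simple arena, or bump allocator, with blksize as the minimum chunk size. Arena, amalloc, acalloc, astrdup and afree are hypothetical portable-C stand-ins for xmlmalloc, xmlcalloc, xmlstrdup and xmlfree, not their actual implementation: memory is carved from the newest chunk, a new chunk is added when that one runs out, and only afree releases anything.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: a union aligned for any common scalar type. */
typedef union Maxalign {
	long double ld;
	long long ll;
	void *p;
	void (*fp)(void);
} Maxalign;

typedef struct Blk Blk;
struct Blk {
	Blk *next;		/* older chunk, freed along with this one */
	size_t used, size;	/* bytes handed out / capacity */
	Maxalign data[];	/* the chunk's storage, suitably aligned */
};

typedef struct Arena {
	Blk *blks;		/* newest chunk first */
	size_t blksize;		/* minimum chunk size, e.g. 8192 */
} Arena;

/* Hypothetical xmlmalloc stand-in: bump-allocate n bytes. */
void *
amalloc(Arena *a, size_t n)
{
	Blk *b = a->blks;
	size_t cap;
	char *p;

	/* round up so every returned pointer stays aligned */
	n = (n + sizeof(Maxalign) - 1) / sizeof(Maxalign) * sizeof(Maxalign);
	if(b == NULL || b->size - b->used < n){
		cap = n > a->blksize ? n : a->blksize;
		b = malloc(sizeof *b + cap);
		if(b == NULL)
			return NULL;
		b->used = 0;
		b->size = cap;
		b->next = a->blks;
		a->blks = b;
	}
	p = (char *)b->data + b->used;
	b->used += n;
	return p;
}

/* Hypothetical xmlcalloc stand-in: zeroed, overflow-checked. */
void *
acalloc(Arena *a, size_t nelem, size_t elsz)
{
	void *p;

	if(elsz != 0 && nelem > (size_t)-1 / elsz)
		return NULL;	/* nelem*elsz would overflow */
	p = amalloc(a, nelem * elsz);
	if(p != NULL)
		memset(p, 0, nelem * elsz);
	return p;
}

/* Hypothetical xmlstrdup stand-in. */
char *
astrdup(Arena *a, const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = amalloc(a, n);

	if(p != NULL)
		memcpy(p, s, n);
	return p;
}

/* Hypothetical xmlfree stand-in: release every chunk at once. */
void
afree(Arena *a)
{
	Blk *b, *next;

	for(b = a->blks; b != NULL; b = next){
		next = b->next;
		free(b);
	}
	a->blks = NULL;
}
```

Freeing is then a walk over the chunk list rather than over every node and string, which fits the statement above that this arrangement reduces the time needed to free the XML data. The cost is that the unused tail of a chunk is wasted when a new one is started.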
SOURCE
/sys/src/libxml
SEE ALSO
xb(1).
BUGS
Namespaces should be handled properly.
A SAX model parser will probably be needed sometime (e.g.
for Ebooks).
UTF-16 headers should be respected, but UTF-16 files seem
rare.