XML(2)                                                     XML(2)

     NAME
          xml - XML navigation

     SYNOPSIS
          include "xml.m";

          xml := load Xml Xml->PATH;
          Parser, Item, Location, Attributes, Mark: import xml;

          init:   fn(): string;
          open: fn(f: string, warning: chan of (Locator, string),
                          preelem: string): (ref Parser, string);
          fopen: fn(iob: ref Bufio->Iobuf, f: string, warning: chan of (Locator, string),
                          preelem: string): (ref Parser, string);

          Parser: adt {
              fileoffset: int;

              next:       fn(p: self ref Parser): ref Item;
              down:       fn(p: self ref Parser);
              up:         fn(p: self ref Parser);
              mark:       fn(p: self ref Parser): ref Mark;
              atmark:     fn(p: self ref Parser, m: ref Mark): int;
              goto:       fn(p: self ref Parser, m: ref Mark);
              str2mark:   fn(p: self ref Parser, s: string): ref Mark;
          };

          Item: adt {
              fileoffset: int;
              pick {
              Tag =>
                  name:   string;
                  attrs:  Attributes;
              Text =>
                  ch:     string;
                  ws1:   int;
                    ws2:    int;
              Process =>
                  target: string;
                  data:   string;
              Doctype =>
                  name:   string;
                  public: int;
                  params: list of string;
              Stylesheet =>
                  attrs:  Attributes;
              Error =>
                  loc:    Locator;
                  msg:    string;
              }

     Page 1                       Plan 9            (printed 12/22/24)

     XML(2)                                                     XML(2)

          };

          Locator: adt {
              line:       int;
              systemid:   string;
              publicid:   string;
          };

          Attribute: adt {
              name:       string;
              value:      string;
          };

          Attributes: adt {
              all:        fn(a: self Attributes): list of Attribute;
              get:        fn(a: self Attributes, name: string): string;
          };

          Mark: adt {
              offset:     int;
              str:        fn(m: self ref Mark): string;
          };

     DESCRIPTION
          Xml provides an interface for navigating XML files (`docu-
          ments'). Once loaded, the module must first be initialised
          by calling init.  A new parser instance is created by call-
          ing open(f, warning, preelem), which opens the file f for
          parsing as an XML document, or
          fopen(iob, name, warning, preelem), which does the same for
          an already open Iobuf (the string name will be used in diag-
          nostics).  Both functions return a tuple (p, err).  If there
          is an error opening the document, p is nil, and err contains
          a description of the error; otherwise p can be used to exam-
          ine the contents of the document.  If warning is not nil,
          non-fatal errors encountered when parsing will be sent on
          this channel - a separate process will be needed to received
          them. Each error is represented by a tuple (loc, msg), con-
          taining the location loc, and the description, msg, of the
          error encountered. One XML tag, preelem, may be marked for
          special treatment by the XML parser: within this tag all
          white space will be passed through as-is.

          Once an XML document has been opened, the following Parser
          methods may be used to examine the items contained within:

          p.next()  An XML document is represented by a tree-
                    structure.  Next returns the next item in the doc-
                    ument at the current level of the tree within the
                    current parent element. If there are no more such
                    items, it returns nil.

     Page 2                       Plan 9            (printed 12/22/24)

     XML(2)                                                     XML(2)

          p.down()  Down descends into the element that has just been
                    returned by next, which should be a Tag item. Sub-
                    sequent items returned by next will be those
                    within that tag.

          p.up()    Up moves up one level in the XML tree.

          p.mark()  Mark returns a mark that can be used to return
                    later to the current position in the document. The
                    underlying file must be seekable for this to work.

          p.goto(m) Goes back to a previously marked position, m, in
                    the document.

          p.atmark(m)
                    Atmark returns non-zero if the current position in
                    the document is the same as that marked by m. The
                    current tree level is ignored in the comparison.

          p.str2mark(s)
                    Str2mark turns a string as created by Mark.str
                    back into a mark as returned by Parser.mark.

        Items
          Various species of items live in XML documents; they are
          encapsulated in the Item adt. This contains one member in
          common to all its subtypes: fileoffset, the position in the
          XML document of the start of the item.  The various kinds of
          item are as follows:

          Tag  A generic XML tag.  Name names the tag, and attrs holds
               its attributes, if any.

          Text Text represents inline text in the XML document.  With
               the exception of text inside the tag named by preelem
               in open, any runs of white space are compressed to a
               single space, and white space at the start or end of
               the text is elided.  Ch contains the resulting text;
               ws1 and ws2 are non-zero if there was originally white
               space at the start or end of the text respectively.

          Process
               Process represents an XML document processing direc-
               tive.  Target is the processing instruction's target,
               and data holds the rest of the text inside the direc-
               tive.  XML stylesheet directives are recognised
               directly and have their own item type.

          Doctype
               Doctype should only occur at the start of an xml docu-
               ment, and represents the type of the XML document.

     Page 3                       Plan 9            (printed 12/22/24)

     XML(2)                                                     XML(2)

          Stylesheet
               Stylesheet represents an XML stylesheet processing
               request. The data of the processing request is parsed
               as per the RFC into attribute-value pairs.

          Error
               If an unrecoverable error occurs processing the docu-
               ment, an Error item is returned holding the location
               (loc), and description (msg) of the error.  This will
               be the last item returned by the parser.

          The attribute-value pairs in Tag and Stylesheet items are
          held in an Atttributes adt, say a. A.all() yields a list
          holding all the attributes; a.get(name) yields the value of
          the attribute name.

          The location returned when an error is reported is held
          inside a Locator adt, which holds the line number on which
          the error occurred, the ``system id'' of the document (in
          this implementation, its file name), and the "public id" of
          the document (not currently used).

          A Mark m may be converted to a string with m.str(); this
          enables marks to be written out to external storage, to
          index a large XML document, for example.  Note that if the
          XML document changes, any stored marks will no longer be
          valid.

     SOURCE
          /appl/lib/xml.b

     SEE ALSO
          ``Extensible Markup Language (XML) 1.0 (Second Edition)'',
          http://www.w3.org/TR/REC-xml

     BUGS
          XML's definition makes it tricky to handle leading and
          trailing white space efficiently; ws1 and ws2 in Item.Text
          is the current compromise.

     Page 4                       Plan 9            (printed 12/22/24)