DOC2TXT(1) DOC2TXT(1)
NAME
doc2txt, olefs, mswordstrings - extract printable strings
from Microsoft Word documents
SYNOPSIS
doc2txt [ file.doc ]
aux/olefs [ -m mtpt ] file.doc
aux/mswordstrings /mnt/doc/WordDocument
DESCRIPTION
Doc2txt is a shell script that uses olefs and mswordstrings
to extract the printable text from the body of a Microsoft
Word document.
Microsoft Office documents are stored in OLE (Object Linking
and Embedding) format, which is a scaled-down version of
Microsoft's FAT file system. Olefs presents the contents of
an Office document as a file system on mtpt, which defaults
to /mnt/doc. Mswordstrings parses the WordDocument file
inside an Office document, extracting the text stream.
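The FAT-like layout mentioned above can be illustrated with a minimal sketch. It is not taken from olefs.c. It assumes the publicly documented compound file layout: an 8-byte signature at the start of a 512-byte header, a 16-bit sector shift at offset 30, and a 32-bit first directory sector at offset 48, all little-endian. The names parse_header and follow_chain are hypothetical, and the allocation table is passed in as a plain list rather than read from a file.

```python
import struct

# Signature found at the start of an OLE compound file.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
# Marker stored in the allocation table at the end of a sector chain.
ENDOFCHAIN = 0xFFFFFFFE


def parse_header(data):
    """Return (sector_size, first_dir_sector) from the file header.

    Offsets are assumptions based on the published format,
    not on the olefs source.
    """
    if data[:8] != OLE_MAGIC:
        raise ValueError("not an OLE compound file")
    (sector_shift,) = struct.unpack_from("<H", data, 30)
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    return 1 << sector_shift, first_dir_sector


def follow_chain(fat, start):
    """Yield the sector numbers of one stream, in order.

    As in FAT, entry n of the table names the sector that
    follows sector n, until ENDOFCHAIN is reached.
    """
    seen = set()
    sector = start
    while sector != ENDOFCHAIN:
        if sector in seen or sector >= len(fat):
            raise ValueError("corrupt sector chain")
        seen.add(sector)
        yield sector
        sector = fat[sector]
```

A stream such as WordDocument would then be reassembled by concatenating the sectors that follow_chain yields, starting from the sector recorded in the stream's directory entry.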
SOURCE
/sys/src/cmd/aux/mswordstrings.c
/sys/src/cmd/aux/olefs.c
/rc/bin/doc2txt
SEE ALSO
strings(1)
``Microsoft Word 97 Binary File Format'', available online
at Microsoft's developer home page.
``LAOLA Binary Structures'',
snake.cs.tu-berlin.de:8081/~schwartz/pmh.