DOC2TXT(1)                                                  DOC2TXT(1)

NAME
     doc2txt, olefs, mswordstrings - extract printable strings from
     Microsoft Word documents

SYNOPSIS
     doc2txt [ file.doc ]

     aux/olefs [ -m mtpt ] file.doc

     aux/mswordstrings /mnt/doc/WordDocument

DESCRIPTION
     Doc2txt is a shell script that uses olefs and mswordstrings to
     extract the printable text from the body of a Microsoft Word
     document.

     Microsoft Office documents are stored in OLE (Object Linking and
     Embedding) format, which is a scaled down version of Microsoft's
     FAT file system.  Olefs presents the contents of an Office
     document as a file system on mtpt, which defaults to /mnt/doc.
     Mswordstrings parses the WordDocument file inside an Office
     document, extracting the text stream.

SOURCE
     /sys/src/cmd/aux/mswordstrings.c
     /sys/src/cmd/aux/olefs.c
     /rc/bin/doc2txt

SEE ALSO
     strings(1)

     ``Microsoft Word 97 Binary File Format'', available on line at
     Microsoft's developer home page.

     ``LAOLA Binary Structures'',
     snake.cs.tu-berlin.de:8081/~schwartz/pmh.
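
EXAMPLE
     The description above says that Office documents are stored in
     OLE format.  A program can recognize such a file before handing
     it to olefs by checking the 8-byte signature that begins an OLE
     compound file.  The following is a minimal sketch in portable C,
     not code taken from olefs.c; the names has_ole_signature and
     is_ole_file are hypothetical, chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* The 8-byte signature found at offset 0 of an OLE compound file. */
static const unsigned char ole_magic[8] = {
	0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1
};

/*
 * Hypothetical helper (not part of olefs): report whether the first
 * n bytes in buf begin with the OLE signature.
 * Returns 1 on a match, 0 otherwise (including when n is too short).
 */
int
has_ole_signature(const unsigned char *buf, size_t n)
{
	if(n < sizeof ole_magic)
		return 0;
	return memcmp(buf, ole_magic, sizeof ole_magic) == 0;
}

/*
 * Hypothetical helper: read the start of the file at path and test it.
 * Returns 1 if it looks like an OLE file, 0 if it does not,
 * and -1 if the file cannot be opened.
 */
int
is_ole_file(const char *path)
{
	unsigned char buf[8];
	size_t n;
	FILE *f;

	f = fopen(path, "rb");
	if(f == NULL)
		return -1;
	n = fread(buf, 1, sizeof buf, f);
	fclose(f);
	return has_ole_signature(buf, n);
}
```

     A caller might run is_ole_file("file.doc") and invoke doc2txt
     only when the result is 1.  A matching signature shows only that
     the file is an OLE container; it does not show that the container
     holds a WordDocument stream.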