[9fans] MS office XML to txt/troff
Steve Simon
2013-02-26 11:26:31 UTC
New toys in my contrib to convert modern
microsoft office XML files to text or troff/tbl source.

these live in a directory opc, as the standard is known as Open
Packaging Conventions, and there may be more tools to come.
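
for anyone who has not looked inside one of these files: an Open Packaging Conventions package is a ZIP archive holding named parts, most of them XML. the sketch below is not taken from opc.tgz; it builds a toy package in memory (the helper names and the two-part layout are assumptions for illustration) and lists what is inside.

```python
import io
import zipfile


def list_parts(package_bytes):
    """Return the sorted part names stored in an OPC package (a ZIP archive)."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as z:
        return sorted(z.namelist())


def make_toy_package():
    # Hypothetical minimal package built in memory for illustration only.
    # A real .docx or .xlsx carries more parts (relationships, styles, ...).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


print(list_parts(make_toy_package()))
```

a converter would then pull out the part it cares about (for a word processing document, typically word/document.xml) and walk the XML.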

docx2troff works pretty well, the formatting is imperfect but
looks OK, embedded drawings are ignored (sorry, too hard).
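
to give a feel for the job, here is a rough sketch of the core of such a conversion. it is not how docx2troff is written: it assumes the -ms macro .PP for paragraphs, ignores all formatting, tables and drawings, and the function and sample names are made up. it only joins the text runs of each paragraph and protects troff's control characters.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by the w: prefix in document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def to_troff(document_xml):
    """Turn the paragraphs of a document.xml string into plain -ms troff."""
    root = ET.fromstring(document_xml)
    out = []
    for p in root.iter(f"{{{W}}}p"):
        # A paragraph's text is spread over runs; join every w:t below it.
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        if not text:
            continue
        text = text.replace("\\", "\\e")  # literal backslash in troff
        if text[0] in ".'":
            text = "\\&" + text  # stop troff reading the line as a request
        out.append(".PP")
        out.append(text)
    return "\n".join(out) + "\n"


# Hypothetical sample input, not taken from a real document.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body>'
    "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world.</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>.dot at line start</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

print(to_troff(SAMPLE), end="")
```

paragraphs nested inside other paragraphs (text boxes, for instance) would be emitted twice by this sketch; a real tool has to walk the tree more carefully.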

xlsx2txt works fine for text output but custom number formats are
not handled, which is disappointing - this means it works fine
for most documents but "clever" spreadsheets can cause problems.
This may get fixed one day - feel free if you want to try.
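
the reason this matters: a spreadsheet cell stores the raw number, and a separate format code such as 0.00 or #,##0 says how to display it. a minimal sketch of applying such codes follows. it handles only a tiny subset (digit placeholders, a thousands comma, a trailing %), apply_format is a made-up name, and it is not code from xlsx2txt. real format strings also have sections, colours, dates and literal text.

```python
def apply_format(value, code):
    """Render a number using a tiny subset of Excel-style format codes."""
    percent = code.endswith("%")
    if percent:
        value *= 100
        code = code[:-1]
    grouping = "," in code
    # Digits shown after the point = length of the placeholder run there.
    decimals = len(code.split(".")[1]) if "." in code else 0
    if grouping:
        text = f"{value:,.{decimals}f}"
    else:
        text = f"{value:.{decimals}f}"
    # Note: rounding of exact ties may differ from what a spreadsheet shows.
    return text + ("%" if percent else "")


print(apply_format(1234.5, "#,##0.00"))
print(apply_format(3.14159, "0.00"))
print(apply_format(0.25, "0%"))
```

without this step a converter can only print the stored value, so a cell meant to show 25% comes out as 0.25.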

the code is in /n/sources/contrib/steve/opc.tgz and depends on
/n/sources/contrib/steve/libxml.tgz

fixes and extensions gratefully received. please don't reformat
the code without contacting me first.

-Steve
erik quanstrom
2013-03-02 20:53:58 UTC
this has been included in 9atom.

it works pretty well for my purposes, and has reduced the need
to switch to google docs for much of anything.

it would be nice to have a troff2docx as well. i've recommended
parsing excel format strings as a gsoc project.

- erik
