NAME
docx2txt - convert Microsoft OOXML files to plain text.
SYNOPSIS
docx2txt [ infile.docx|-|-h ] [ outfile.txt|- ]
docx2txt < infile.docx
docx2txt < infile.docx > outfile.txt
DESCRIPTION
This manual page briefly documents the docx2txt command.
docx2txt is a tool that attempts to generate equivalent (ASCII) text files
from Microsoft .docx documents. It preserves some formatting and document
information (which MS text conversion drops) and applies appropriate
character conversions for a good (ASCII) text experience. It is a
platform-independent solution consisting of (core) Perl and (wrapper)
Unix/Windows shell scripts, plus a configuration file that controls the
appearance of the output text to a fair extent. It can conveniently be used
to build a Web-based docx document conversion service. With unzippers like
CakeCmd that can deal with corrupt Zip archives, this tool can in many cases
extract text from corrupt docx documents that the MS word processor fails to
even open.
OPTIONS
-h
    As the first argument, to get this usage information.
-
    As the infile name, to read the docx file from STDIN.
-
    As the outfile name, to dump the text on STDOUT.
Output is saved in infile.txt if the second argument is omitted.
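
The argument forms described above can be combined as in the following sketch. It is illustrative only: the file names report.docx and copy.txt are hypothetical, and the guard simply skips the conversions when docx2txt is not installed or the input file does not exist.

```shell
#!/bin/sh
# Hypothetical input file name; replace it with a real document.
infile="report.docx"

if command -v docx2txt >/dev/null 2>&1 && [ -f "$infile" ]; then
    # One argument: the outfile is omitted, so the text is saved in
    # report.txt, per the rule stated above.
    docx2txt "$infile"

    # "-" as the outfile: dump the text on STDOUT and show the first lines.
    docx2txt "$infile" - | head -n 5

    # "-" as the infile: read the docx file from STDIN and write the text
    # to a named output file (copy.txt is a hypothetical name).
    docx2txt - copy.txt < "$infile"

    status="converted"
else
    # Nothing to do without the tool and an input document.
    status="skipped"
fi

echo "$status"
```

Writing to STDOUT with "-" is the form that fits pipelines, for example when the text is passed straight to a search or indexing tool.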
AUTHOR
docx2txt was written by Sandeep Kumar <shimple0@yahoo.com>.
This manual page was written by Khalid El Fathi <khalid@elfathi.fr>, for
the Debian project (and may be used by others).