GroupDocs.Parser can extract data from Microsoft Office Word documents. Both the classic (DOC, DOT) and Open XML (DOCX, DOTX) formats are supported, as are LibreOffice Writer (OpenOffice.org Writer) formats and RTF.
The following table provides the list of supported formats:
Format    Description
DOC       Microsoft Office Word Document
DOT       Microsoft Office Word Document Template
DOCX      Microsoft Office Open XML Document
DOCM      Microsoft Office Open XML Macro-Enabled Document
DOTX      Microsoft Office Open XML Document Template
DOTM      Microsoft Office Open XML Document Macro-Enabled Template
TXT       Plain text
ODT       Open Document Text
OTT       Open Document Text Template
RTF       Rich Text Format
To extract hyperlinks from a Microsoft Office Word document, use the getStructure method. This method returns an XML representation of the document. Hyperlinks are represented by the “hyperlink” tag; its “link” attribute contains the hyperlink’s URL. For more details, see Extract text structure. A hyperlink can contain text:
google.com
Warning: The getStructure method returns null if text structure extraction isn’t supported for the document. For example, text structure extraction isn’t supported for TXT files, so getStructure returns null for a TXT file.
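A minimal sketch of reading hyperlinks out of such a structure, using only the JDK's DOM classes. The hand-written XML string stands in for the document that getStructure would return. Its root element and the names HyperlinkSketch, parse, and extractLinks are illustrative assumptions. Only the “hyperlink” tag and the “link” attribute come from the description above. A null structure is treated as "no hyperlinks", in line with the warning.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of GroupDocs.Parser.
class HyperlinkSketch {

    // Parses an XML string into a DOM document.
    // Stand-in for the structure that getStructure would return.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    // Collects the "link" attribute of every "hyperlink" element.
    // A null structure (extraction not supported, e.g. TXT) yields an empty list.
    static List<String> extractLinks(Document structure) {
        List<String> links = new ArrayList<>();
        if (structure == null) {
            return links;
        }
        NodeList nodes = structure.getElementsByTagName("hyperlink");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element hyperlink = (Element) nodes.item(i);
            links.add(hyperlink.getAttribute("link"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Assumed sample structure; the real layout may contain more elements.
        String xml = "<document><p><hyperlink link=\"https://www.google.com\">"
                + "google.com</hyperlink></p></document>";
        for (String link : extractLinks(parse(xml))) {
            System.out.println(link);
        }
    }
}
```

Checking for null before walking the structure keeps the same code path usable for formats that don't support text structure extraction.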
To extract tables from a Microsoft Office Word document, use the getStructure method. This method returns an XML representation of the document. Tables are represented by the “table” tag. For more details, see Extract text structure.
Warning: The getStructure method returns null if text structure extraction isn’t supported for the document. For example, text structure extraction isn’t supported for TXT files, so getStructure returns null for a TXT file. If the Microsoft Office Word document has no text, the getStructure method returns an empty org...
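A minimal sketch of walking the tables in such a structure, again with the JDK's DOM classes only. The source names just the “table” tag. The row and cell tags used here ("tr" and "td"), the sample XML, and the names TableSketch, findTables, and readTable are assumptions made for illustration and should be checked against the actual structure output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of GroupDocs.Parser.
class TableSketch {

    // Parses an XML string into a DOM document.
    // Stand-in for the structure that getStructure would return.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    // Returns every "table" element; a null structure yields an empty list.
    static List<Element> findTables(Document structure) {
        List<Element> tables = new ArrayList<>();
        if (structure == null) {
            return tables;
        }
        NodeList nodes = structure.getElementsByTagName("table");
        for (int i = 0; i < nodes.getLength(); i++) {
            tables.add((Element) nodes.item(i));
        }
        return tables;
    }

    // Reads one table as rows of cell text.
    // ASSUMPTION: rows are "tr" elements and cells are "td" elements.
    static List<List<String>> readTable(Element table) {
        List<List<String>> rows = new ArrayList<>();
        NodeList rowNodes = table.getElementsByTagName("tr");
        for (int r = 0; r < rowNodes.getLength(); r++) {
            NodeList cellNodes = ((Element) rowNodes.item(r)).getElementsByTagName("td");
            List<String> cells = new ArrayList<>();
            for (int c = 0; c < cellNodes.getLength(); c++) {
                cells.add(cellNodes.item(c).getTextContent());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed sample structure with a single 2x2 table.
        String xml = "<document><table><tr><td>Name</td><td>Qty</td></tr>"
                + "<tr><td>Pen</td><td>3</td></tr></table></document>";
        for (Element table : findTables(parse(xml))) {
            for (List<String> row : readTable(table)) {
                System.out.println(String.join(" | ", row));
            }
        }
    }
}
```

Because getElementsByTagName searches all descendants, this sketch does not treat nested tables separately; a table inside a cell would contribute its rows to the outer table as well.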
If your corporate policy for removing sensitive data is defined as a list of redaction rules, you don't need to specify those rules in your code. Instead, you can supply an XML document containing a list of pre-configured redactions.