GroupDocs.Parser can extract data from Microsoft Office Word documents. Both the classic (doc, dot) and Open XML (docx, dotx) formats are supported, as are the LibreOffice Writer (OpenOffice.org Writer) formats and RTF.
The following table provides the list of supported formats:
Format  Description
DOC     Microsoft Office Word Document
DOT     Microsoft Office Word Document Template
DOCX    Microsoft Office Open XML Document
DOCM    Microsoft Office Open XML Macro-Enabled Document
DOTX    Microsoft Office Open XML Document Template
DOTM    Microsoft Office Open XML Document Macro-Enabled Template
TXT     Plain text
ODT     Open Document Text
OTT     Open Document Text Template
RTF     Rich Text Format
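The format list above can be turned into a simple pre-check before a file is handed to the parser. The following is a minimal sketch that uses only the Java standard library; the class name SupportedWordFormats and the method describe are hypothetical helpers written for illustration and are not part of the GroupDocs.Parser API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not a GroupDocs class): maps a file extension
// to the description given in the supported formats table.
public class SupportedWordFormats {
    private static final Map<String, String> FORMATS = new LinkedHashMap<>();

    static {
        FORMATS.put("doc", "Microsoft Office Word Document");
        FORMATS.put("dot", "Microsoft Office Word Document Template");
        FORMATS.put("docx", "Microsoft Office Open XML Document");
        FORMATS.put("docm", "Microsoft Office Open XML Macro-Enabled Document");
        FORMATS.put("dotx", "Microsoft Office Open XML Document Template");
        FORMATS.put("dotm", "Microsoft Office Open XML Document Macro-Enabled Template");
        FORMATS.put("txt", "Plain text");
        FORMATS.put("odt", "Open Document Text");
        FORMATS.put("ott", "Open Document Text Template");
        FORMATS.put("rtf", "Rich Text Format");
    }

    // Returns the format description for a file name, or an empty Optional
    // when the name has no extension or the extension is not in the table.
    public static Optional<String> describe(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        // Extensions are compared case-insensitively.
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(FORMATS.get(extension));
    }

    public static void main(String[] args) {
        System.out.println(describe("report.DOCX").orElse("unsupported"));
        System.out.println(describe("slides.pptx").orElse("unsupported"));
    }
}
```

A check like this only inspects the file name. It does not verify the file's actual content, so the parser may still reject a file that has a listed extension.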
OCR support means the ability to connect an external module (library) for the recognition of printed text (optical character recognition, OCR) on images, either separate or embedded in documents.
To connect OCR, you need to implement the IOcrConnector interface in the client code.
The following example demonstrates how to implement the OCR connector using the com.aspose.ocr library for text recognition in images.
String indexFolder = "c:\\MyIndex";
String documentFolder = "c:\\MyDocuments";

// Creating an index
Index index = new Index(indexFolder);

// Setting the OCR indexing options
IndexingOptions options = new IndexingOptions();
options....
This article demonstrates how to save computing resources: you can notify the index that a document has been renamed, and the document will then not be reindexed during the update operation...
This article describes the API methods that can be used to work with the spelling corrector...
This article shows how to search for different word forms: nouns in the singular or plural, adjectives in degrees of comparison, forms of regular and irregular verbs, etc...
This article explains how to optimize an index to reduce its number of segments using Java...
This article explains how to extract attachments from PDF documents...
An interface is used to receive information about errors, warnings, and events that occur during data extraction...
This article describes the API methods that can be used to work with alphabets using Java...