Character replacement during indexing can be used, for example, to convert all text to lowercase characters or to remove diacritics from text.
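The two replacements mentioned above (lowercasing and diacritic removal) can be illustrated with a minimal sketch that uses only the Java standard library. It does not use the GroupDocs.Search API; the class name CharacterReplacementSketch and the method normalize are hypothetical and exist only for this illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of GroupDocs.Search: shows the kind of
// character replacement an index might apply to text before storing it.
final class CharacterReplacementSketch {

    // Returns the text in lowercase with combining diacritical marks removed.
    static String normalize(String text) {
        // NFD decomposition splits an accented letter into a base letter
        // followed by one or more combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the combining marks, which are dropped here.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Locale.ROOT keeps the lowercasing independent of the default locale.
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Cr\u00e8me Br\u00fbl\u00e9e" is "Creme Brulee" written with accents.
        System.out.println(normalize("Cr\u00e8me Br\u00fbl\u00e9e")); // prints "creme brulee"
    }
}
```

Letters that have no canonical decomposition, such as "ø" or "ß", pass through this sketch unchanged (apart from lowercasing), so a real replacement table would need explicit entries for them.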
Using the GroupDocs.Metadata search engine, you can extract the desired metadata properties from files of different types. You don’t need to worry about the exact file format or the metadata standards it uses. The same code works in the same way for all supported formats. The most commonly used metadata properties are marked with tags, which allow you to search for them across all supported files and metadata packages. All tags defined in GroupDocs...
This article describes the API methods that can be used to manage the stop word dictionary.
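As a rough illustration of what a stop word dictionary does, the sketch below models one with a plain set of lowercase words. It is not the GroupDocs.Search API: the class StopWordDictionarySketch and its methods addAll, removeAll, contains, size and filter are hypothetical names chosen for this example only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of a stop word dictionary, not the GroupDocs.Search class.
final class StopWordDictionarySketch {
    private final Set<String> words = new HashSet<>();

    // Words are stored in lowercase so that lookups ignore case.
    private static String key(String word) {
        return word.toLowerCase(Locale.ROOT);
    }

    // Adds the given words to the dictionary.
    void addAll(List<String> newWords) {
        for (String word : newWords) {
            words.add(key(word));
        }
    }

    // Removes the given words from the dictionary.
    void removeAll(List<String> oldWords) {
        for (String word : oldWords) {
            words.remove(key(word));
        }
    }

    // Reports whether the word is currently a stop word.
    boolean contains(String word) {
        return words.contains(key(word));
    }

    // Returns the number of stop words in the dictionary.
    int size() {
        return words.size();
    }

    // Returns the tokens that are not stop words, in their original order.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

In this model, a dictionary holding "the", "a" and "of" would reduce the tokens "The", "quick", "fox" to "quick", "fox", which is the kind of filtering an index typically applies so that very common words are not stored or searched.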
OCR support means the ability to connect an external module (library) for the recognition of printed text (optical character recognition, OCR) on images, either separate or embedded in documents.
To connect OCR, you need to implement the IOcrConnector interface in the client code.
The following example demonstrates how to implement the OCR connector using the com.aspose.ocr library to recognize text in images.
String indexFolder = "c:\\MyIndex";
String documentFolder = "c:\\MyDocuments";

// Creating an index
Index index = new Index(indexFolder);

// Setting the OCR indexing options
IndexingOptions options = new IndexingOptions();
options....
This article explains how to detect the document file type and calculate the number of pages when converting a file with GroupDocs.Conversion for Java.
This article shows how spell checking works during a search.
This article explains how to use the SetProperties method to update or add metadata. You can easily add metadata to photos or PDFs, or update or add data in MP3 files.
It supports DOCX, DOCM, DOC, DOT, DOTM, XLS, XLSX, PDF, PPT, JPG, PNG, HTML, EML and many more...
To extract emails from Outlook Storage, the getContainer method is used. This method returns a collection of ContainerItem objects.
An Outlook Storage item can contain the following metadata:
Name          Description
date          The time and date at which the Outlook Storage item was last modified.
email-sender  The value of the “sender” field.
email-to      The value of the “to” field.
subject       The value of the “subject” field.

An Outlook Storage container consists of email documents (msg files).
Here are the steps to extract the text of an email from Outlook Storage:...
To extract text from Microsoft OneNote sections, the getText and getText(int) methods are used. These methods extract text from the entire document or from a selected page. Raw mode is not supported for Microsoft OneNote.
Here are the steps to extract text from a Microsoft OneNote section:
Instantiate a Parser object for the initial section;
Call the getText method and obtain a TextReader object;
Read the text from the reader.