Character replacement during indexing can be used, for example, to convert all text to lowercase characters or to remove diacritics from text.
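The two replacements mentioned above (lowercasing and removing diacritics) can be illustrated with plain JDK calls. This is a minimal sketch that does not use the GroupDocs.Search API; the class name CharacterReplacementSketch and the method normalize are hypothetical names chosen for illustration only.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration; not part of any GroupDocs library.
class CharacterReplacementSketch {

    // Decomposes accented characters (NFD), removes the combining marks,
    // then lowercases the result with a locale-independent rule.
    static String normalize(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        return withoutMarks.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "\u00E8" is e-grave, "\u00FB" is u-circumflex, "\u00E9" is e-acute.
        System.out.println(normalize("Cr\u00E8me Br\u00FBl\u00E9e")); // prints "creme brulee"
    }
}
```

Applying the same normalization to both indexed text and search queries is what makes a query such as "creme" match text written with accents.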
This article explains how to detect the file type of a container item.
OCR support means the ability to connect an external module (library) for the recognition of printed text (optical character recognition, OCR) on images, either separate or embedded in documents.
To connect OCR, you need to implement the IOcrConnector interface in the client code.
The following example demonstrates how to implement the OCR connector using the com.aspose.ocr library for text recognition in images.
String indexFolder = "c:\\MyIndex";
String documentFolder = "c:\\MyDocuments";

// Creating an index
Index index = new Index(indexFolder);

// Setting the OCR indexing options
IndexingOptions options = new IndexingOptions();
options....
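The connector idea can also be sketched on its own. The signature of IOcrConnector is not shown in this text, so the sketch below uses a hypothetical stand-in interface, SketchOcrConnector, with an assumed recognize(byte[]) method. The fake engine and the extractText helper are likewise illustrative and are not GroupDocs or Aspose calls.

```java
// Hypothetical stand-in for an OCR connector; the real IOcrConnector
// signature is an assumption and may differ.
interface SketchOcrConnector {
    String recognize(byte[] imageBytes);
}

class OcrConnectorSketch {

    // A fake engine that only reports how many bytes it received.
    // A real connector would pass the image to an OCR library here.
    static class FakeOcrConnector implements SketchOcrConnector {
        @Override
        public String recognize(byte[] imageBytes) {
            if (imageBytes == null || imageBytes.length == 0) {
                return "";
            }
            return "recognized " + imageBytes.length + " bytes";
        }
    }

    // Client code depends only on the interface, so any OCR engine
    // can be plugged in without changing the indexing logic.
    static String extractText(SketchOcrConnector connector, byte[] image) {
        return connector.recognize(image);
    }

    public static void main(String[] args) {
        SketchOcrConnector connector = new FakeOcrConnector();
        System.out.println(extractText(connector, new byte[] {1, 2, 3}));
    }
}
```

Keeping the recognition step behind an interface is the design choice that lets the client code supply any paid or free OCR engine.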
It supports DOCX, DOCM, DOC, DOT, DOTM, XLS, XLSX, PDF, PPT, JPG, PNG, HTML, EML and many more.
GroupDocs.Metadata for Java provides functionality for working with different kinds of WordProcessing documents such as DOC, DOCX, ODT, etc. For the full list of supported document formats, please refer to Supported document formats.
Detecting the exact type of a document
The following code sample will help you detect the exact type of a loaded document and extract some additional file format information.
1. Load a WordProcessing document
2. Extract the root metadata package
3. Use the getWordProcessingType method to obtain file format information
advanced_usage....
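For context on what format detection involves, the sketch below checks the leading bytes (the file signature) of a file. This is a different and much coarser technique than the getWordProcessingType method named above, and it does not use the GroupDocs.Metadata API. The class FileSignatureSketch and its method names are hypothetical.

```java
// Hypothetical illustration of signature-based detection;
// not the GroupDocs.Metadata approach.
class FileSignatureSketch {

    // Returns a coarse container label based on well-known leading bytes.
    static String detect(byte[] header) {
        if (startsWith(header, new byte[] {'%', 'P', 'D', 'F'})) {
            return "PDF";
        }
        // ZIP local file header: used by DOCX, XLSX, PPTX and ODT packages.
        if (startsWith(header, new byte[] {'P', 'K', 3, 4})) {
            return "ZIP";
        }
        // OLE2 compound file: used by legacy DOC, XLS and PPT files.
        if (startsWith(header, new byte[] {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0})) {
            return "OLE2";
        }
        return "UNKNOWN";
    }

    // True when data begins with every byte of prefix, in order.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A signature check only identifies the container, which is why distinguishing DOCX from DOTM, for example, still requires inspecting the document itself.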
This article explains how to integrate any paid or free OCR solution.
...redactor = new Redactor("\\Sample.docx", new LoadOptions(), new RedactorSettings...
Use advanced rasterization options
To use the advanced rasterization options, you have to pass one of the options to the Save method. In this case the document will be rasterized to PDF, but the scan-like effects will be applied to its pages.
The following example demonstrates how to apply the AdvancedRasterizationOptions with default settings.
Python
import groupdocs.redaction as gr
import groupdocs.redaction.options as gro
import groupdocs.redaction.redactions as grr

def run():
    # Specify the redaction options
    repl_opt = grr....
    ...
    with gr.Redactor("source.docx") as redactor:
        # Apply the...
This tutorial provides step-by-step instructions for converting ODG to PDF using Java, along with working sample code for an ODG to PDF file converter in Java.
This page describes how to get information about document content using the GroupDocs.Annotation for .NET API.
...annotator = new Annotator("input.docx")) { IDocumentInfo documentInfo...
This tutorial describes the step-by-step procedure to extract text from a Markdown file in C# and how to use the workflow to get text from Markdown using C#.