OCR support is the ability to connect an external module (library) that recognizes printed text (optical character recognition, OCR) in images, whether they are standalone files or embedded in documents.
To connect OCR, you need to implement the IOcrConnector interface in the client code.
The following example demonstrates how to implement the OCR connector using the com.aspose.ocr library for text recognition in images.
String indexFolder = "c:\\MyIndex";
String documentFolder = "c:\\MyDocuments";
// Creating an index
Index index = new Index(indexFolder);
// Setting the OCR indexing options
IndexingOptions options = new IndexingOptions();
options....
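Because the snippet above is truncated, the connector idea can be sketched independently of the library. The following is a minimal, self-contained illustration of the pattern only: `SketchOcrConnector`, `StubOcrConnector`, and `SketchIndexer` are hypothetical names invented for this sketch and are not the GroupDocs `IOcrConnector` or `Index` types, whose actual signatures should be taken from the API reference. The stub "recognizes" text by decoding the bytes as UTF-8, which stands in for a call to a real OCR engine.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical connector contract for this sketch; NOT the GroupDocs IOcrConnector interface.
interface SketchOcrConnector {
    String recognize(byte[] imageBytes);
}

// Stub implementation: a real connector would pass the bytes to an OCR engine.
// Here the bytes are simply decoded as UTF-8 so the example runs without dependencies.
class StubOcrConnector implements SketchOcrConnector {
    @Override
    public String recognize(byte[] imageBytes) {
        return new String(imageBytes, StandardCharsets.UTF_8);
    }
}

// Hypothetical indexer that delegates image text extraction to the injected connector.
class SketchIndexer {
    private final SketchOcrConnector connector;
    private final Map<String, String> textByDocument = new LinkedHashMap<>();

    SketchIndexer(SketchOcrConnector connector) {
        this.connector = connector;
    }

    // Stores the text returned by the connector under the document name.
    void addImage(String name, byte[] imageBytes) {
        textByDocument.put(name, connector.recognize(imageBytes));
    }

    // Returns true if the recognized text of the named document contains the term.
    boolean contains(String name, String term) {
        String text = textByDocument.get(name);
        return text != null && text.contains(term);
    }
}

class OcrConnectorSketch {
    public static void main(String[] args) {
        SketchIndexer indexer = new SketchIndexer(new StubOcrConnector());
        indexer.addImage("scan.png", "invoice total".getBytes(StandardCharsets.UTF_8));
        System.out.println(indexer.contains("scan.png", "invoice"));
    }
}
```

The point of the design is that the indexing code depends only on the connector interface, so the OCR engine can be swapped without touching the indexer.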
This article explains how document filters are used during a search with the Java search API.
This section describes how to import OLE objects into Word DOC/DOCX documents using Java...
This article demonstrates how to save edited text documents, spreadsheets and presentations with the GroupDocs.Editor for Java API.
This article explains how to use the Aspose.OCR for Cloud SDK in Java.
In some cases a document is available only as a stream, with no copy on the local disk. To avoid the overhead of saving such documents to disk, GroupDocs.Parser can extract data directly from streams.
The following example shows how to load the document from the stream:
// Create the stream
try (InputStream stream = new FileInputStream(Constants.SamplePdf)) {
    // Create an instance of Parser class with the stream
    try (Parser parser = new Parser(stream)) {
        // Extract a text into the reader
        try (TextReader reader = parser....
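The example above opens a `FileInputStream`, but the same pattern applies when the bytes never touch the disk. The sketch below uses only `java.io` to show a document held in memory and consumed through an `InputStream`. The `firstLine` helper is a hypothetical stand-in for the parsing step and is not part of GroupDocs.Parser; in real code the stream would be passed to the `Parser` constructor as shown above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class InMemoryStreamSketch {
    // Hypothetical stand-in for a parser: reads the first line of text from the stream.
    static String firstLine(InputStream stream) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            return line == null ? "" : line;
        } catch (IOException e) {
            // Rethrow as unchecked so callers are not forced to handle it.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // The document exists only in memory, for example after a network download.
        byte[] documentBytes = "first line\nsecond line".getBytes(StandardCharsets.UTF_8);
        try (InputStream stream = new ByteArrayInputStream(documentBytes)) {
            System.out.println(firstLine(stream));
        }
    }
}
```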
GroupDocs.Parser provides the functionality to extract data from HTML documents and other markup formats.
The following table provides the list of supported formats:
Format    Description
HTML      Hypertext Markup Language File
XHTML     Extensible Hypertext Markup Language File
MHTML     MIME HTML File
MD        Markdown
XML       XML File

More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples:
GroupDocs.Parser for .NET examples
GroupDocs...
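As an illustration of how the supported-formats table above might be used in client code, the following sketch checks whether a file name corresponds to one of the listed markup formats before it is handed to a parser. `MarkupFormatLookup` is a hypothetical helper, not part of GroupDocs.Parser, and it assumes that the file extension equals the lowercase format name from the table, which will not hold for every file (for example, alternative extensions are not covered).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper built from the table above; not a GroupDocs.Parser API.
class MarkupFormatLookup {
    private static final Map<String, String> FORMATS = new LinkedHashMap<>();

    static {
        // Assumption: the extension is the lowercase format name from the table.
        FORMATS.put("html", "Hypertext Markup Language File");
        FORMATS.put("xhtml", "Extensible Hypertext Markup Language File");
        FORMATS.put("mhtml", "MIME HTML File");
        FORMATS.put("md", "Markdown");
        FORMATS.put("xml", "XML File");
    }

    // Returns the format description for a file name, or empty if it is not listed.
    static Optional<String> describe(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(FORMATS.get(extension));
    }

    public static void main(String[] args) {
        System.out.println(describe("README.md").orElse("unsupported"));
        System.out.println(describe("report.pdf").orElse("unsupported"));
    }
}
```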
This article explains how to export metadata properties to an Excel workbook in Java...
This article shows how to load and convert CSV documents with advanced options using the GroupDocs.Conversion for Java API.