The following tables indicate the file formats from which GroupDocs.Parser for Java can extract data. You can use the input below to filter supported formats by extension.
Tip: Can’t find your file format? We’re here to help! Please post a request on our Free Support Forum, and our team will assist you.

Each table has the same columns: Document Type, Parse Document by Template, Extract Text (Accurate), Extract Text (Raw), Extract Structured Text and Formatted Text, Extract Text Areas, Extract Metadata, Extract Images, Extract Containers and Attachments, Parse Form Data, Extract Table of Contents, Scan Barcode.

Word Processing
DOC: Microsoft Word Document
DOT: Microsoft Word Document Template
DOCX: Office Open XML Document
DOCM: Office Open XML Macro-Enabled Document
DOTX: Office Open XML Document Template
DOTM: Office Open XML Document Macro-Enabled Template
TXT: Plain Text
ODT: Open Document Text
OTT: Open Document Text Template
RTF: Rich Text Format

PDF
PDF: Portable Document Format File

Markup
XHTML: Extensible HyperText Markup Language File
MHTML: MIME HTML File
MD: Markdown (formatted text is not supported)
XML: XML File

Ebook
CHM: Compiled HTML Help File
EPUB: Digital E-Book File Format
FB2: FictionBook 2...
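The extension filter mentioned above can be sketched as a simple lookup. This is only an illustration that assumes the category and extension names listed in the tables. `SupportedFormats` and `categoryOf` are hypothetical names and are not part of the GroupDocs.Parser API, and the lookup says nothing about which features a format supports.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, NOT part of GroupDocs.Parser: maps a file extension
// to the category heading under which it appears in the tables above.
class SupportedFormats {
    private static final Map<String, String> CATEGORY_BY_EXTENSION = new LinkedHashMap<>();

    static {
        register("Word Processing", "doc", "dot", "docx", "docm", "dotx", "dotm", "txt", "odt", "ott", "rtf");
        register("PDF", "pdf");
        register("Markup", "xhtml", "mhtml", "md", "xml");
        register("Ebook", "chm", "epub", "fb2");
    }

    private static void register(String category, String... extensions) {
        for (String ext : extensions) {
            CATEGORY_BY_EXTENSION.put(ext, category);
        }
    }

    // Returns the category for a file name, or empty if the extension is not listed.
    static Optional<String> categoryOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(CATEGORY_BY_EXTENSION.get(ext));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"report.DOCX", "book.epub", "photo.xyz"}) {
            System.out.println(name + " -> " + categoryOf(name).orElse("not listed"));
        }
    }
}
```

Running `main` reports `Word Processing` for `report.DOCX`, `Ebook` for `book.epub`, and `not listed` for `photo.xyz`. The match is case-insensitive because extensions are lower-cased before the lookup.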
Find Answers by API: GroupDocs.Total Product Family, GroupDocs.Conversion Product Family, GroupDocs.Annotation Product F... Answers: How to Extract Text from PDF in Java; How to Extract Metadata ... from PDF using Java; How to Extract Metadata from Word Document...
Find answers about extracting text, images, and metadata from different files using code on any platform. ... using C#; Extract Text from DOCM using Java; Extract Text from MHTML ... using Java; Extract Text from TXT using Java; Extract Text from EPUB...
Answers: Extract Text from MHTML using Java; Extract Text from TXT ... using Java; Extract Text from EPUB using Java; Extract Text from PPTX...
This article explains how to integrate an OCR solution into GroupDocs.Parser... Using OCR to extract text from images and PDFs ... RecognizeText: extracts text from the provided image...
This API lets you perform text search and index files of any format using C# .NET on any platform. ... Answers: Extract Text from RTF using C#; Extract Text from ODT ... using C#; Extract Text from XLS using C#; Extract Text from PPT...
It gives us immense pleasure to announce the release of version 18.4 of GroupDocs.Text for .NET. The latest version allows extracting the table of contents from EPUB documents. Furthermore, we have added the ability to detect the media type of .one files. The following sections provide details about the newly added features.
Extracting TOC from EPUB Documents
Using version 18.4, you can now extract the TOC from EPUB documents. To access the TOC, use the TableOfContents property of the **EpubPackage** class.
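To illustrate what a table of contents inside an EPUB looks like, here is a minimal Java sketch that reads entry titles from an EPUB 2 `toc.ncx` document using only the JDK's XML parser. It is not the GroupDocs.Text API and does not use `EpubPackage` or `TableOfContents`. The class name `NcxToc` and method `readTitles` are hypothetical, and the sketch assumes the NCX file has already been pulled out of the EPUB archive and carries no external DOCTYPE.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, NOT part of any GroupDocs product: lists the titles
// of all navPoint entries in an EPUB 2 NCX table of contents.
class NcxToc {
    static List<String> readTitles(InputStream ncx) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(ncx);

            // navPoint elements come back in document order, so nested
            // entries follow their parent entry.
            NodeList navPoints = doc.getElementsByTagNameNS("*", "navPoint");
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < navPoints.getLength(); i++) {
                Element navPoint = (Element) navPoints.item(i);
                // The first <text> descendant is the entry's own label,
                // because navLabel precedes any nested navPoint.
                NodeList texts = navPoint.getElementsByTagNameNS("*", "text");
                if (texts.getLength() > 0) {
                    titles.add(texts.item(0).getTextContent().trim());
                }
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read NCX document", e);
        }
    }
}
```

The result is a flat list of titles in reading order. Keeping the nesting would require walking the `navPoint` tree recursively instead of using a flat element query.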
Inspect, modify, and remove shapes in Excel documents using Python via .NET. ... Extracting information about all shapes ... worksheets and prints shape metadata, text, positioning, size, rotation...
It is our pleasure to announce the release of version 18.12 of GroupDocs.Parser for .NET. The latest version allows you to extract tables from PDF documents. Furthermore, we have added support for extracting text and metadata from text and presentation templates. For more details, please have a look at the release notes of version 18.12.
Features Introduced
Extracting Tables from PDF Documents
This feature is very useful when you want to extract only the tables from a PDF document.
Export PDF, Word, Excel, HTML, and more to Markdown with an on‑premises .NET API. The first public release is published. ... work well with HTML or plain text, these native document formats ... that generates text and answers based on large text corpora. RAG...