GroupDocs.Parser provides the functionality to extract data from HTML documents and other markup formats.
The following table provides the list of supported formats:
Format   Description
HTML     Hypertext Markup Language File
XHTML    Extensible Hypertext Markup Language File
MHTML    MIME HTML File
MD       Markdown
XML      XML File

More resources

GitHub examples

You may easily run the code above and see the feature in action in our GitHub examples:
GroupDocs.Parser for .NET examples
In this article, you will learn how to convert documents to HTML format with GroupDocs.Conversion for Node.js via Java.
The following tables list the file formats from which GroupDocs.Parser for Java can extract data.
Tip: Can't find your file format?
We're here to help! Please post a request on our Free Support Forum, and our team will assist you.

Columns in each table: Document Type, Parse Document by Template, Extract Text (Accurate), Extract Text (Raw), Extract Structured Text and Formatted Text, Extract Text Areas, Extract Metadata, Extract Images, Extract Containers and Attachments, Parse Form Data, Extract Table of Contents, Scan Barcode.

Word Processing

DOC      Microsoft Word Document
DOT      Microsoft Word Document Template
DOCX     Office Open XML Document
DOCM     Office Open XML Macro-Enabled Document
DOTX     Office Open XML Document Template
DOTM     Office Open XML Document Macro-Enabled Template
TXT      Plain text
ODT      Open Document Text
OTT      Open Document Text Template
RTF      Rich Text Format

PDF

PDF      Portable Document Format File

Markup

XHTML    Extensible Hypertext Markup Language File
MHTML    MIME HTML File
MD       Markdown (Formatted Text is not supported)
XML      XML File

Ebook

CHM      Compiled HTML Help File
EPUB     Digital E-Book File Format
FB2      FictionBook 2...
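As a small illustration of looking up a markup format by file extension, the sketch below uses plain Python and does not call GroupDocs.Parser. The dictionary holds only the markup formats and descriptions listed above; the names `MARKUP_FORMATS`, `describe_format`, and `filter_formats` are hypothetical and chosen for this example only.

```python
from pathlib import Path
from typing import List, Optional

# Markup formats and descriptions copied from the list above.
# This is an illustrative lookup table, not part of any GroupDocs API.
MARKUP_FORMATS = {
    "html": "Hypertext Markup Language File",
    "xhtml": "Extensible Hypertext Markup Language File",
    "mhtml": "MIME HTML File",
    "md": "Markdown",
    "xml": "XML File",
}


def describe_format(filename: str) -> Optional[str]:
    """Return the listed description for a file name, or None if it is not listed."""
    ext = Path(filename).suffix.lstrip(".").lower()
    return MARKUP_FORMATS.get(ext)


def filter_formats(prefix: str) -> List[str]:
    """Return the listed extensions that start with the typed prefix, sorted."""
    needle = prefix.strip().lstrip(".").lower()
    return sorted(ext for ext in MARKUP_FORMATS if ext.startswith(needle))
```

For example, `describe_format("notes.MD")` matches the Markdown row regardless of case, while a file such as `report.docx` returns `None` here because only the markup rows are included in the dictionary.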
Render documents to HTML, PNG, JPEG, PDF. Extract text, list attachments, and transform pages with GroupDocs.Viewer for Python.
In this article, you will find complete instructions on how to convert DOCX to SVG using C#, along with a sample .NET application that transforms DOCX to SVG.
Learn how to merge PDF files and combine them into one file programmatically in C# using the GroupDocs.Merger for .NET library.
Learn how to convert PDF to TXT using C# without installing extra software. The library used to export PDF to TXT is platform-independent.
Open password-protected files and streams using load options in GroupDocs.Parser for Python via .NET.
This topic describes how to set the image resolution in a PDF file using the GroupDocs.Viewer .NET API (C#).
In this article, you'll get guidance on how to convert TXT to DOCX using C#, including a code example that exports TXT to DOCX on any operating system.