GroupDocs.Parser can extract data from Microsoft Office Word documents. Both the classic (DOC, DOT) and Open XML (DOCX, DOTX) formats are supported, as are the LibreOffice Writer (OpenOffice.org Writer) formats and RTF.
The following table provides the list of supported formats:
Format   Description
DOC      Microsoft Office Word Document
DOT      Microsoft Office Word Document Template
DOCX     Microsoft Office Open XML Document
DOCM     Microsoft Office Open XML Macro-Enabled Document
DOTX     Microsoft Office Open XML Document Template
DOTM     Microsoft Office Open XML Document Macro-Enabled Template
TXT      Plain text
ODT      Open Document Text
OTT      Open Document Text Template
RTF      Rich Text Format
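The format list above can be turned into a simple pre-check before a file is handed to a parser. The following is a minimal sketch in plain Java, not GroupDocs.Parser code: the class name WordFormats and the method describe are hypothetical, and the map only mirrors the table.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not part of GroupDocs.Parser): it only mirrors the table above.
final class WordFormats {
    private static final Map<String, String> FORMATS = new LinkedHashMap<>();

    static {
        FORMATS.put("doc", "Microsoft Office Word Document");
        FORMATS.put("dot", "Microsoft Office Word Document Template");
        FORMATS.put("docx", "Microsoft Office Open XML Document");
        FORMATS.put("docm", "Microsoft Office Open XML Macro-Enabled Document");
        FORMATS.put("dotx", "Microsoft Office Open XML Document Template");
        FORMATS.put("dotm", "Microsoft Office Open XML Document Macro-Enabled Template");
        FORMATS.put("txt", "Plain text");
        FORMATS.put("odt", "Open Document Text");
        FORMATS.put("ott", "Open Document Text Template");
        FORMATS.put("rtf", "Rich Text Format");
    }

    // Returns the format description for a file name, or empty if the
    // extension is missing or not in the table.
    static Optional<String> describe(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(FORMATS.get(extension));
    }

    public static void main(String[] args) {
        System.out.println(describe("report.DOCX").orElse("unsupported"));
        System.out.println(describe("slides.pptx").orElse("unsupported"));
    }
}
```

An extension check like this is only a cheap first filter. It says nothing about the actual file contents, so a renamed file would still pass.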
Let's convert a PDF document to HTML using a few lines of code. Automate PDF conversion within a .NET application to convert the whole document or selected pages....easily edited, searched, and indexed by search engines, and allows...
We are pleased to announce that the first version of GroupDocs.Parser for Java has been released. GroupDocs.Parser for Java allows Java developers to extract raw and formatted text from popular document formats. The API also supports working with containers such as ZIP and email containers. You can also access the metadata attached to the documents using a few lines of code. Please continue reading to learn more about the features and the file formats supported by the API....ability to connect the logger Search text in documents Text analysis...
Good news for Orchard CMS users! We’ve introduced a GroupDocs Annotation app plugin for Orchard. This plugin lets you embed GroupDocs’ online document annotation app, as well as selected documents, into Orchard pages. Once the document is embedded, you can annotate it using GroupDocs’ efficient annotation app. Annotate your documents using easy-to-use tools or collaborate on documents online by sharing them with your colleagues.
[Image: Announcing GroupDocs Annotation Plugin for Orchard]
GroupDocs Annotation is a powerful online document annotation app that allows you to view and annotate documents online....from the Orchard Gallery: search for GroupDocs Annotation in...
Detecting the version of a PDF document
The following code sample will help you detect the PDF version of a loaded document and extract some additional file format information.
1. Load a PDF document
2. Extract the root metadata package
3. Use the FileType property to obtain file format information
Sample: AdvancedUsage.ManagingMetadataForSpecificFormats.Document.Pdf.PdfReadFileFormatProperties
    // Constants.InputPdf is the input file path used by the sample (defined outside this snippet).
    using (Metadata metadata = new Metadata(Constants.InputPdf))
    {
        // The root package exposes the file format information
        var root = metadata.GetRootPackage();

        Console.WriteLine(root.FileType.FileFormat);
        Console.WriteLine(root.FileType.Version);
        Console.WriteLine(root.FileType.MimeType);
        Console.WriteLine(root.FileType.Extension);
    }

Reading built-in metadata properties
To access built-in metadata of a PDF document, please use the DocumentProperties property defined in the DocumentRootPackage class....uses the GroupDocs.Metadata search engine to retrieve all properties...
Detecting the version of a PDF document
The following code sample will help you detect the PDF version of a loaded document and extract some additional file format information.
1. Load a PDF document
2. Extract the root metadata package
3. Use the getPdfType method to obtain file format information
Sample: advanced_usage.managing_metadata_for_specific_formats.document.pdf.PdfReadFileFormatProperties
    // Constants.InputPdf is the input file path used by the sample (defined outside this snippet).
    try (Metadata metadata = new Metadata(Constants.InputPdf)) {
        // The PDF root package exposes the file format information
        PdfRootPackage root = metadata.getRootPackageGeneric();

        System.out.println(root.getPdfType().getFileFormat());
        System.out.println(root.getPdfType().getVersion());
        System.out.println(root.getPdfType().getMimeType());
        System.out.println(root.getPdfType().getExtension());
    }

Reading built-in metadata properties
To access built-in metadata of a PDF document, please use the getDocumentProperties method defined in the DocumentRootPackage class....uses the GroupDocs.Metadata search engine to retrieve all properties...
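For context on what version detection involves, a PDF file begins with a header of the form %PDF-x.y, so the declared version can be read with the standard library alone. The sketch below is not how GroupDocs.Metadata works internally (the source does not say), and the class name PdfHeaderVersion and method parseVersion are hypothetical. It assumes the header sits within the first 1024 bytes and ignores the optional Version entry in the document catalog, which can override the header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the version declared in a PDF file header.
final class PdfHeaderVersion {
    // Matches headers such as %PDF-1.4, %PDF-1.7 or %PDF-2.0
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    // Looks for the header in the leading bytes of a file.
    static Optional<String> parseVersion(byte[] leadingBytes) {
        int length = Math.min(leadingBytes.length, 1024);
        // ISO-8859-1 maps each byte to one char, so binary data cannot break decoding.
        String head = new String(leadingBytes, 0, length, StandardCharsets.ISO_8859_1);
        Matcher matcher = HEADER.matcher(head);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        byte[] head;
        if (args.length > 0) {
            // Read up to 1024 bytes from the file given on the command line.
            byte[] buffer = new byte[1024];
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                int read = in.read(buffer);
                head = Arrays.copyOf(buffer, Math.max(read, 0));
            }
        } else {
            // Fall back to an in-memory sample header.
            head = "%PDF-1.7\n".getBytes(StandardCharsets.ISO_8859_1);
        }
        System.out.println(parseVersion(head).orElse("no PDF header found"));
    }
}
```

A header check like this is useful as a quick sanity test. For MIME type, extension, and the other properties shown in the samples above, the library calls remain the appropriate route.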
Hi there,
We are not 100% sure whether it’s maybe caused by the license mechanism, but we noticed that it usually takes 20-30 seconds for the first GroupDocs.Search call to complete.
Is there a way to speed this up?
In this tutorial, you will learn how to redact text in PDF using Java. You will also be given straightforward code to replace text in PDF using Java....becoming proficient in how to search and redact text in PDF using...
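The tutorial's own PDF code is not reproduced in this excerpt. As a rough illustration of the search-and-replace idea behind text redaction, here is a minimal sketch that works on a plain string, for example text already extracted from a document. It does not touch PDF files or use any GroupDocs API, and the class name TextRedactionSketch and method redact are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: redacts a phrase in plain text, not in a PDF file.
final class TextRedactionSketch {
    // Replaces every case-insensitive occurrence of phrase with replacement.
    static String redact(String text, String phrase, String replacement) {
        // Pattern.quote treats the phrase literally, so characters like '.' are not regex syntax.
        Pattern pattern = Pattern.compile(Pattern.quote(phrase), Pattern.CASE_INSENSITIVE);
        // quoteReplacement keeps '$' and '\' in the replacement from being interpreted.
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "Contact John Smith or john smith for details.";
        System.out.println(redact(text, "John Smith", "[redacted]"));
    }
}
```

Redacting a real PDF is more involved than this, because the text has to be removed from the page content itself rather than from an extracted copy.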