GroupDocs.Parser lets you extract data from Microsoft Office Word documents. Both the classic (DOC, DOT) and Open XML (DOCX, DOTX) formats are supported, as are the LibreOffice Writer (OpenOffice.org Writer) formats and RTF.
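As a rough illustration of the extraction described above, the following sketch reads the plain text of a Word document. It assumes the GroupDocs.Parser for .NET package is referenced and that `Parser.GetText()` returns `null` when text extraction is not available for the file; the path `sample.docx` is a hypothetical placeholder.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;

class ExtractWordText
{
    static void Main()
    {
        // "sample.docx" is a hypothetical input path used only for illustration.
        using (Parser parser = new Parser("sample.docx"))
        using (TextReader reader = parser.GetText())
        {
            // Assumption: a null reader means text extraction is not supported for this file.
            Console.WriteLine(reader == null
                ? "Text extraction isn't supported."
                : reader.ReadToEnd());
        }
    }
}
```

The same call should apply to the other formats in the list, since the parser is expected to detect the format from the file itself.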
The following table provides the list of supported formats:
Format    Description
DOC       Microsoft Office Word Document
DOT       Microsoft Office Word Document Template
DOCX      Microsoft Office Open XML Document
DOCM      Microsoft Office Open XML Macro-Enabled Document
DOTX      Microsoft Office Open XML Document Template
DOTM      Microsoft Office Open XML Document Macro-Enabled Template
TXT       Plain text
ODT       Open Document Text
OTT       Open Document Text Template
RTF       Rich Text Format

More resources
GitHub examples
You may easily run the code above and see the feature in action in our GitHub examples:...data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, Emails and more...
Not all metadata properties extracted from a file are marked with tags. Some file formats and metadata standards allow fully custom properties, which the library cannot tag properly because their purpose is not clearly defined in the corresponding format or standard specification. In such cases, you can use the name of the property to locate and remove it. The following example demonstrates some advanced usage scenarios of the GroupDocs.Metadata search engine that let you remove metadata properties....metadata of PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, images...
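Removal by name, as described above, might look like the sketch below. It assumes a recent GroupDocs.Metadata for .NET version in which `Metadata.RemoveProperties` accepts a predicate over `MetadataProperty` and returns the number of affected properties. The file paths and the property name `customProperty` are hypothetical.

```csharp
using System;
using GroupDocs.Metadata;

class RemoveCustomPropertyByName
{
    static void Main()
    {
        // "input.docx" and "output.docx" are hypothetical paths.
        using (Metadata metadata = new Metadata("input.docx"))
        {
            // Match an untagged custom property by its name.
            // "customProperty" is a hypothetical name, not one defined by the library.
            int affected = metadata.RemoveProperties(
                p => string.Equals(p.Name, "customProperty", StringComparison.OrdinalIgnoreCase));

            Console.WriteLine("Properties removed: {0}", affected);

            // Write the cleaned document to a new file.
            metadata.Save("output.docx");
        }
    }
}
```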
Java API to remove all or selected metadata properties from DOCX, XLSX, PPTX, and PDF documents, JPEG, PNG, and WebP images, emails, eBooks, Visio drawings, ZIP archives, etc....spreadsheets, presentations, PDF files, images, emails, eBooks, drawings...
The easiest way to remove metadata properties from a file is to use the corresponding tags, which let you locate the desired properties across all metadata packages.
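A minimal sketch of tag-based removal follows. It assumes that the tags `Tags.Person.Creator` and `Tags.Person.Editor` live in the `GroupDocs.Metadata.Tagging` namespace and that `RemoveProperties` takes a predicate, as in recent .NET releases of the library; the file paths are hypothetical.

```csharp
using System;
using System.Linq;
using GroupDocs.Metadata;
using GroupDocs.Metadata.Tagging;

class RemoveByTag
{
    static void Main()
    {
        // "input.docx" and "output.docx" are hypothetical paths.
        using (Metadata metadata = new Metadata("input.docx"))
        {
            // Remove every property tagged as the document creator or last editor,
            // regardless of which metadata package it belongs to.
            int affected = metadata.RemoveProperties(
                p => p.Tags.Contains(Tags.Person.Creator) ||
                     p.Tags.Contains(Tags.Person.Editor));

            Console.WriteLine("Properties removed: {0}", affected);
            metadata.Save("output.docx");
        }
    }
}
```

Because the predicate tests tags rather than names, the same code should work unchanged across formats that store author information under different property names.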
This article explains how to extract images from Microsoft Office Word (.doc, .docx) documents.
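For orientation, image extraction might be sketched as follows. The example assumes that `Parser.GetImages()` in GroupDocs.Parser for .NET returns a collection of `PageImageArea` objects, or `null` when image extraction is not supported; `sample.docx` is a hypothetical path.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

class ExtractWordImages
{
    static void Main()
    {
        // "sample.docx" is a hypothetical input path.
        using (Parser parser = new Parser("sample.docx"))
        {
            IEnumerable<PageImageArea> images = parser.GetImages();

            // Assumption: null signals that image extraction is not supported.
            if (images == null)
            {
                Console.WriteLine("Image extraction isn't supported.");
                return;
            }

            // Print the detected file type of each embedded image.
            foreach (PageImageArea image in images)
            {
                Console.WriteLine(image.FileType);
            }
        }
    }
}
```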
Detecting the GIF version
The following code sample shows how to detect the version of a loaded GIF image and extract some additional file format information.
1. Load a GIF image
2. Extract the root metadata package
3. Use the FileType property to obtain file format information
AdvancedUsage.ManagingMetadataForSpecificFormats.Image.Gif.GifReadFileFormatProperties
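    using System;
    using GroupDocs.Metadata;
    using GroupDocs.Metadata.Formats.Image;

    // Constants.InputGif is the input path defined in the examples project
    using (Metadata metadata = new Metadata(Constants.InputGif))
    {
        // Request the GIF-specific root package so that FileType exposes
        // Version, Width and Height (the type argument appears to have been
        // lost from the original snippet)
        var root = metadata.GetRootPackage<GifRootPackage>();

        // General file format information
        Console.WriteLine(root.FileType.FileFormat);
        Console.WriteLine(root.FileType.Version);
        Console.WriteLine(root.FileType.ByteOrder);
        Console.WriteLine(root.FileType.MimeType);
        Console.WriteLine(root.FileType.Extension);

        // Image dimensions
        Console.WriteLine(root.FileType.Width);
        Console.WriteLine(root.FileType.Height);
    }

Working with XMP Metadata GroupDocs....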
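Returning to the GIF sample: the version and dimensions it prints are stored in the first ten bytes of the file (the signature `GIF`, a three-character version such as `87a` or `89a`, then the width and height as little-endian 16-bit integers). The library-free sketch below reads those fields directly. It is not how GroupDocs.Metadata is implemented, and `GifHeader` is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Text;

static class GifHeader
{
    // Parses the version, width and height from the first 10 bytes of a GIF stream.
    public static (string Version, int Width, int Height) Read(Stream stream)
    {
        byte[] header = new byte[10];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop until full.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0)
                throw new InvalidDataException("Stream is too short to be a GIF.");
            read += n;
        }

        // Bytes 0..2 hold the ASCII signature "GIF".
        if (Encoding.ASCII.GetString(header, 0, 3) != "GIF")
            throw new InvalidDataException("Missing GIF signature.");

        // Bytes 3..5 hold the version, "87a" or "89a".
        string version = Encoding.ASCII.GetString(header, 3, 3);

        // Bytes 6..9 hold the logical screen width and height, little-endian.
        int width = header[6] | (header[7] << 8);
        int height = header[8] | (header[9] << 8);

        return (version, width, height);
    }
}
```

A caller could pass `File.OpenRead(path)` and compare the result with what `root.FileType.Version` reports.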
This article shows how to access IPTC metadata in a file of any supported format....pdf. Reading basic IPTC IIM properties...
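Reading basic IPTC IIM properties might look like the sketch below. It assumes that root packages of IPTC-capable formats implement the `IIptc` interface from `GroupDocs.Metadata.Standards.Iptc`, and that the package exposes `EnvelopeRecord` and `ApplicationRecord` with the properties shown; `input.jpg` is a hypothetical path.

```csharp
using System;
using GroupDocs.Metadata;
using GroupDocs.Metadata.Standards.Iptc;

class ReadBasicIptc
{
    static void Main()
    {
        // "input.jpg" is a hypothetical input path.
        using (Metadata metadata = new Metadata("input.jpg"))
        {
            // Assumption: formats that can carry IPTC data expose it through IIptc.
            IIptc root = metadata.GetRootPackage() as IIptc;
            if (root == null || root.IptcPackage == null)
            {
                Console.WriteLine("No IPTC metadata found.");
                return;
            }

            // Envelope record: transmission-related fields.
            if (root.IptcPackage.EnvelopeRecord != null)
            {
                Console.WriteLine(root.IptcPackage.EnvelopeRecord.DateSent);
                Console.WriteLine(root.IptcPackage.EnvelopeRecord.Destination);
            }

            // Application record: descriptive fields such as headline and author.
            if (root.IptcPackage.ApplicationRecord != null)
            {
                Console.WriteLine(root.IptcPackage.ApplicationRecord.Headline);
                Console.WriteLine(root.IptcPackage.ApplicationRecord.ByLine);
            }
        }
    }
}
```

Casting the generic root package to an interface, rather than to a format-specific class, is what lets the same code run against any supported format.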
This article shows how to extract data from Microsoft Office Excel spreadsheets.