GroupDocs.Parser can extract data from HTML documents and other markup formats.
The following table provides the list of supported formats:
Format   Description
HTML     Hypertext Markup Language File
XHTML    Extensible Hypertext Markup Language File
MHTML    MIME HTML File
MD       Markdown
XML      XML File

More resources

GitHub examples

You may easily run the code above and see the feature in action in our GitHub examples:

GroupDocs.Parser for .NET examples

Working with XMP metadata
GroupDocs.Metadata for Java allows managing XMP metadata in TIFF images. For more details, please refer to the following guide: Working with XMP Metadata.

Working with EXIF metadata
The GroupDocs.Metadata API supports handling EXIF metadata in TIFF images. Please find appropriate code samples in the Working with EXIF Metadata section.

Working with IPTC metadata
GroupDocs.Metadata for Java is also able to work with IPTC metadata in TIFF images.

Working with XMP metadata
GroupDocs.Metadata for .NET allows managing XMP metadata in TIFF images. For more details, please refer to the following guide: Working with XMP metadata.

Working with EXIF metadata
The GroupDocs.Metadata API supports handling EXIF metadata in TIFF images. Please find appropriate code samples in the Working with EXIF metadata section.

Working with IPTC metadata
GroupDocs.Metadata for .NET is also able to work with IPTC metadata in TIFF images.

This article demonstrates how to save the redacted document, replacing the original file.

This code snippet demonstrates how to extract information about the known properties that can be encountered in a particular package.

1. Load a file to examine
2. Get a collection of PropertyDescriptor instances for any desired metadata package
3. Iterate through the extracted descriptors

advanced_usage.GettingKnownPropertyDescriptors

JavaScript

    const metadata = new groupdocs.metadata.Metadata("input.doc");
    var root = metadata.getRootPackageGeneric();
    // Collection of descriptors for the known document properties
    var descriptors = root.getDocumentProperties().getKnowPropertyDescriptors();
    for (var i = 0; i < descriptors.getCount(); i++) {
        var descriptor = descriptors.get_Item(i);
        console.log(descriptor.getName());
        console.log(descriptor.getType().getRawValue());
        console.log(descriptor.getAccessLevel().getRawValue());
        var tags = descriptor.getTags();
        // ... (the snippet continues with a loop over tags)
    }

This article shows you how to view and edit metadata of PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails, images and more with our free online...

Reading DICOM metadata properties
The GroupDocs.Metadata API supports extracting format-specific information from DICOM images.
The following are the steps to read the native DICOM metadata.

1. Load a DICOM image
2. Get the root metadata package
3. Extract the native metadata package using DicomRootPackage.DicomPackage
4. Read the DICOM metadata properties

AdvancedUsage.ManagingMetadataForSpecificFormats.Image.Dicom.DicomReadNativeMetadataProperties

    using (Metadata metadata = new Metadata(Constants.InputDicom))
    {
        // The typed root package exposes the DicomPackage property
        var root = metadata.GetRootPackage<DicomRootPackage>();
        if (root.DicomPackage != null)
        {
            Console.WriteLine(root.DicomPackage.BitsAllocated);
            Console.WriteLine(root.DicomPackage.LengthValue);
            Console.WriteLine(root.DicomPackage.DicomFound);
            Console.WriteLine(root.DicomPackage.HeaderOffset);
            Console.WriteLine(root.DicomPackage.NumberOfFrames);
            // ...
        }
    }

In some cases the document format must be specified manually to guarantee that GroupDocs.Parser produces correct output. These cases are:

- Markdown documents
- MHTML documents
- OTP documents (OpenDocument Presentation Template)
- Databases
- Emails from remote servers

Here are the steps to specify the document format for a markup document:

1. Instantiate the LoadOptions object and pass the document format to the LoadOptions(FileFormat) constructor;
2. Create a Parser object and call any method.
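
The two steps above can be sketched in JavaScript as follows. This is only an illustration of the call pattern: the classes below are small stand-ins written for this example, not the GroupDocs.Parser API. Only the names LoadOptions, FileFormat and Parser come from the text; the enum values, the `fileFormat` field and the `guessFormat` fallback are assumptions made for the sketch.

```javascript
// Stand-in types for illustration only; they are NOT the GroupDocs.Parser API.
// Only the names LoadOptions, FileFormat and Parser are taken from the text.
const FileFormat = Object.freeze({
  Unknown: "Unknown",
  Markup: "Markup",
  WordProcessing: "WordProcessing",
});

class LoadOptions {
  // Step 1: the document format is passed to the constructor.
  constructor(fileFormat) {
    this.fileFormat = fileFormat;
  }
}

class Parser {
  // Step 2: the parser receives the options and uses the explicit format.
  constructor(filePath, loadOptions) {
    this.filePath = filePath;
    this.fileFormat =
      loadOptions !== undefined
        ? loadOptions.fileFormat
        : Parser.guessFormat(filePath);
  }

  // Hypothetical fallback: a naive guess based on the file extension.
  static guessFormat(filePath) {
    const ext = filePath.split(".").pop().toLowerCase();
    if (ext === "doc" || ext === "docx") return FileFormat.WordProcessing;
    if (ext === "html" || ext === "xhtml") return FileFormat.Markup;
    return FileFormat.Unknown;
  }
}

// Without options, the stand-in falls back to its naive guess.
const guessed = new Parser("notes.md");
// With options, the format stated by the caller is used as-is.
const explicit = new Parser("notes.md", new LoadOptions(FileFormat.Markup));

console.log(guessed.fileFormat);
console.log(explicit.fileFormat);
```

In this sketch the explicit format always takes precedence over the guess, which is the point of passing it manually; how the real library detects formats is not described in the text.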

Learn how to extract text from PDF, Word, Excel, PowerPoint, and 50+ document formats using GroupDocs.Parser for .NET. Includes simple C# code examples for extracting text from PDF.

Sometimes you may need to remove or clean all metadata properties without applying any filters. The best way to do this is to use the Sanitize method.