Detecting the version of a PDF document
The following code sample detects the PDF version of a loaded document and extracts some additional file format information.
1. Load a PDF document
2. Extract the root metadata package
3. Use the getPdfType method to obtain file format information

advanced_usage.managing_metadata_for_specific_formats.document.pdf.PdfReadFileFormatProperties
// Load the PDF document
try (Metadata metadata = new Metadata(Constants.InputPdf)) {
    // Extract the root metadata package
    PdfRootPackage root = metadata.getRootPackageGeneric();
    // Print the file format details exposed by getPdfType
    System.out.println(root.getPdfType().getFileFormat());
    System.out.println(root.getPdfType().getVersion());
    System.out.println(root.getPdfType().getMimeType());
    System.out.println(root.getPdfType().getExtension());
}

Reading built-in metadata properties
To access the built-in metadata of a PDF document, use the getDocumentProperties method defined in the DocumentRootPackage class.
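As a cross-check for the version reported by getPdfType in the first sample of this section, the version can also be read straight from the file header: a PDF file starts with a marker of the form %PDF-x.y. The sketch below uses only the JDK and is not part of GroupDocs.Metadata; the class name PdfHeaderVersion and its detect method are hypothetical names chosen for illustration. Note that from PDF 1.4 onward the document catalog may carry a Version entry that overrides the header, which this sketch does not inspect.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a GroupDocs class: reads the version from the PDF header.
class PdfHeaderVersion {
    // The header marker has the form "%PDF-<major>.<minor>".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    // Returns the header version (for example "1.7"), or empty if no header is found.
    static Optional<String> detect(byte[] head) {
        // The header normally sits at offset 0; scanning the first 1024 bytes
        // is a lenient assumption made for this sketch.
        int len = Math.min(head.length, 1024);
        // ISO-8859-1 maps every byte to one char, so binary content cannot break decoding.
        String text = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // In-memory sample bytes; in practice pass the first bytes of a real file.
        byte[] sample = "%PDF-1.7\n1 0 obj\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(sample).orElse("no PDF header found"));
    }
}
```

To try it on a real document, read the first kilobyte of the file into a byte array and pass it to detect.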
With GroupDocs.Viewer for Node.js you can render files to HTML, PNG, JPEG and PDF formats, list and save attachments, embedded files and compressed files, and extract document text.
With GroupDocs.Viewer for Java you can render files to HTML, PNG, JPEG and PDF formats, list and save attachments, embedded files and compressed files, and extract document text.
Convert Outlook (PST/OST) files to HTML, PDF, PNG, or JPEG using the GroupDocs.Viewer Python API.
Note: This feature is only compatible with GroupDocs.Assembly for Java 19.10 or later releases. To access XML data while building a report, you can read the XML into a DataSet and then pass it to the assembler as a data source. However, if your scenario does not permit specifying an XML schema while loading XML into the DataSet, all attributes and text values of XML elements are then loaded as strings.
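To illustrate why schema-less XML values arrive as strings, here is a minimal sketch that uses the JDK's DOM parser rather than the GroupDocs.Assembly DataSet. The class name UntypedXmlValues, its helper methods, and the sample XML are assumptions made for this illustration. Without a schema nothing tells the parser that id is an integer or that total is a decimal, so the caller has to convert the values explicitly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical illustration using the JDK DOM API, not the GroupDocs.Assembly DataSet.
class UntypedXmlValues {
    // Sample document made up for this sketch; no schema is attached to it.
    static final String XML = "<order id=\"42\"><total>19.90</total></order>";

    // Parses the XML and returns its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Attribute values always come back as String.
    static String attribute(String xml, String name) {
        return root(xml).getAttribute(name);
    }

    // Element text always comes back as String as well.
    static String childText(String xml, String tag) {
        return root(xml).getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String id = attribute(XML, "id");          // the string "42", not a number
        String total = childText(XML, "total");    // the string "19.90", not a decimal
        // Typed use requires an explicit conversion by the caller.
        int nextId = Integer.parseInt(id) + 1;
        double doubled = Double.parseDouble(total) * 2;
        System.out.println(nextId);
        System.out.println(doubled);
    }
}
```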
To extract data from PDF documents, use the parseForm and parseByTemplate(Template) methods. Both methods return a DocumentData object. For details, see Working With Extracted Data.
Here are the steps to extract data from a PDF form:
1. Instantiate a Parser object for the initial document;
2. Call the parseForm method and obtain the DocumentData object;
3. Check that the data isn't null (that is, form parsing is supported for the document);
4. Iterate over the field data to obtain the form data.
The following example shows the use case where a user fills in a PDF form and sends it by email (for example).
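The control flow of these steps (parse, check for null, iterate over the fields) can be sketched with stand-in types. FormExtractionSketch, FormSource, and Field below are hypothetical names that only mimic the shape of the steps; they are not GroupDocs.Parser classes, and the real Parser and DocumentData types should be used in their place.

```java
import java.util.List;

// Hypothetical stand-ins that mirror the steps; not GroupDocs.Parser classes.
class FormExtractionSketch {
    // Stand-in for one extracted form field.
    static final class Field {
        final String name;
        final String text;

        Field(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    // Stand-in for a parser: parseForm returns null when form extraction is unsupported.
    interface FormSource {
        List<Field> parseForm();
    }

    // Applies the steps: call parseForm, check for null, iterate over the fields.
    static String describe(FormSource source) {
        List<Field> data = source.parseForm();
        if (data == null) {
            return "Form extraction isn't supported";
        }
        StringBuilder out = new StringBuilder();
        for (Field field : data) {
            out.append(field.name).append(": ").append(field.text).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A filled-in form with made-up values.
        FormSource filled = () -> List.of(
                new Field("Name", "John"),
                new Field("Email", "john@example.com"));
        System.out.print(describe(filled));
        // A document for which form extraction is not supported.
        System.out.println(describe(() -> null));
    }
}
```

The null check comes before the loop because, per the steps above, a null result signals that form parsing is not supported for the document rather than that the form is empty.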