The parseForm and parseByTemplate(Template) methods are used to extract data from PDF documents. Both methods return a DocumentData object. For details, see Working With Extracted Data.
Here are the steps to extract data from a PDF form:

1. Instantiate a Parser object for the initial document.
2. Call the parseForm method to obtain a DocumentData object.
3. Check that the data isn't null (a null result means form parsing isn't supported for the document).
4. Iterate over the field data to obtain the form data.

The following example shows the use case in which a user fills in a PDF form and sends it, for example, by email...
Learn how to search for keywords and use regular expressions to find text in documents using GroupDocs.Parser for .NET. Search text with case sensitivity and whole word options in C#.
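GroupDocs.Parser provides its own search API, which the sketch below does not use. It is a minimal, library-free illustration, assuming only .NET's System.Text.RegularExpressions, of what the case-sensitivity and whole-word options change about a keyword search, plus a plain regular-expression search. The helper names FindKeyword and FindPattern are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not the GroupDocs.Parser API: returns the start index of
// every occurrence of `keyword` in `text`.
List<int> FindKeyword(string text, string keyword, bool matchCase, bool wholeWord)
{
    // Escape the keyword so characters such as '.' or '+' are matched literally.
    string pattern = Regex.Escape(keyword);
    if (wholeWord)
    {
        // \b is a word boundary, so "form" does not match inside "forms" or "platform".
        pattern = @"\b" + pattern + @"\b";
    }
    RegexOptions options = matchCase ? RegexOptions.None : RegexOptions.IgnoreCase;
    var positions = new List<int>();
    foreach (Match m in Regex.Matches(text, pattern, options))
    {
        positions.Add(m.Index);
    }
    return positions;
}

// Hypothetical helper: returns the text of every match of a caller-supplied regular expression.
List<string> FindPattern(string text, string pattern)
{
    var matches = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern))
    {
        matches.Add(m.Value);
    }
    return matches;
}

string sample = "Parser parses forms. The platform uses a FORM parser.";

Console.WriteLine(string.Join(", ", FindKeyword(sample, "form", matchCase: false, wholeWord: false))); // 14, 29, 41
Console.WriteLine(string.Join(", ", FindKeyword(sample, "form", matchCase: true, wholeWord: false)));  // 14, 29
Console.WriteLine(string.Join(", ", FindKeyword(sample, "form", matchCase: false, wholeWord: true)));  // 41
Console.WriteLine(string.Join(", ", FindPattern(sample, @"\bpars\w*")));                                // parses, parser
```

Because \b is defined in terms of word characters, this whole-word check behaves differently for keywords that start or end with punctuation; a dedicated search library may define word boundaries differently.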
This article shows how to access XMP metadata in a file of any supported format.
GroupDocs.Metadata for .NET provides functionality for working with various spreadsheet formats, such as XLS, XLSX, and ODS. For the full list of supported document formats, please refer to Supported Document Formats.
Detecting the exact type of a document

The following code sample shows how to detect the exact type of a loaded spreadsheet and extract some additional file format information.
1. Load a spreadsheet document.
2. Extract the root metadata package.
3. Use the FileType property to obtain file format information.

AdvancedUsage...
First, you need to create an index. An index can be created in memory or on disk. An index created in memory is not preserved after your program exits. In contrast, an index created on disk can be loaded later to continue working with it.
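To make the in-memory versus on-disk distinction concrete, here is a toy inverted index written with only the .NET base class library. It is not the GroupDocs.Search API: BuildIndex, Search, SaveIndex, LoadIndex, and the toy-index.json file name are all hypothetical. The dictionary itself disappears with the process, while the JSON copy written to disk can be loaded again by a later run.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Toy inverted index (hypothetical, not the GroupDocs.Search API):
// maps each lower-cased word to the ids of the documents that contain it.
Dictionary<string, List<int>> BuildIndex(IList<string> documents)
{
    var index = new Dictionary<string, List<int>>();
    char[] separators = { ' ', '.', ',' };
    for (int id = 0; id < documents.Count; id++)
    {
        foreach (string token in documents[id].Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            string word = token.ToLowerInvariant();
            if (!index.TryGetValue(word, out var ids))
            {
                ids = new List<int>();
                index[word] = ids;
            }
            // Ids are visited in increasing order, so checking the last entry avoids duplicates.
            if (ids.Count == 0 || ids[ids.Count - 1] != id)
            {
                ids.Add(id);
            }
        }
    }
    return index;
}

// Returns the ids of documents containing the word (case-insensitive).
List<int> Search(Dictionary<string, List<int>> index, string word) =>
    index.TryGetValue(word.ToLowerInvariant(), out var ids) ? ids : new List<int>();

// Persists the index as JSON so that a later run of the program can reuse it.
void SaveIndex(Dictionary<string, List<int>> index, string path) =>
    File.WriteAllText(path, JsonSerializer.Serialize(index));

// Loads a previously saved index from disk.
Dictionary<string, List<int>> LoadIndex(string path) =>
    JsonSerializer.Deserialize<Dictionary<string, List<int>>>(File.ReadAllText(path))
        ?? new Dictionary<string, List<int>>();

var documents = new List<string> { "Invoice for March", "March report", "Annual report" };

// In-memory index: exists only while this process is running.
var inMemory = BuildIndex(documents);

// On-disk copy: survives the process and can be loaded later.
string indexPath = Path.Combine(Path.GetTempPath(), "toy-index.json");
SaveIndex(inMemory, indexPath);
var reloaded = LoadIndex(indexPath);

Console.WriteLine(string.Join(",", Search(reloaded, "report"))); // 1,2
```

A real search library stores far richer structures on disk, but the trade-off is the same: the in-memory form is fast to create and discard, while the persisted form lets you resume work without re-indexing.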
This page contains a description of all the properties of the IndexingOptions class.
This page contains information about building text search queries of various types. More examples on building search queries are provided on the page...
Detecting the version of a PDF document

The following code sample shows how to detect the PDF version of a loaded document and extract some additional file format information.
1. Load a PDF document.
2. Extract the root metadata package.
3. Use the FileType property to obtain file format information.

AdvancedUsage.ManagingMetadataForSpecificFormats.Document.Pdf.PdfReadFileFormatProperties
```csharp
using System;
using GroupDocs.Metadata;
using GroupDocs.Metadata.Formats.Document;

// Constants.InputPdf is assumed to hold the path to a sample PDF file
using (Metadata metadata = new Metadata(Constants.InputPdf))
{
    // Request the PDF-specific root package so that FileType exposes PDF details such as Version
    var root = metadata.GetRootPackage<PdfRootPackage>();
    Console.WriteLine(root.FileType.FileFormat);
    Console.WriteLine(root.FileType.Version);
    Console.WriteLine(root.FileType.MimeType);
    Console.WriteLine(root.FileType.Extension);
}
```

Reading built-in metadata properties

To access the built-in metadata of a PDF document, use the DocumentProperties property defined in the DocumentRootPackage class.