This article describes case-sensitive search in Java, which finds words while treating uppercase and lowercase letters as distinct....search over your PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX and more...
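To illustrate what "treating uppercase and lowercase letters as distinct" means in practice, here is a minimal sketch using only the Java standard library. It is not the search API the article refers to; the class name `CaseSensitiveSearchSketch` and the method `countMatches` are hypothetical names introduced for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseSensitiveSearchSketch {

    // Hypothetical helper (not part of any library): counts whole-word
    // occurrences of `word` in `text`. When caseSensitive is true,
    // "Java" and "java" are treated as different words.
    static int countMatches(String text, String word, boolean caseSensitive) {
        int flags = caseSensitive ? 0 : (Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // Pattern.quote makes the word a literal; \b restricts to whole words.
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", flags);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Java is not java, and JAVA is not Java.";
        System.out.println(countMatches(text, "Java", true));   // case-sensitive
        System.out.println(countMatches(text, "Java", false));  // case-insensitive
    }
}
```

With the sample sentence, the case-sensitive call matches only the two exact spellings of "Java", while the case-insensitive call matches all four variants.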
An interface is used to receive information about errors, warnings and events that occur during data extraction....extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, Emails...
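The general pattern of receiving errors, warnings and events through an interface can be sketched in plain Java as follows. Everything here is an assumption made for illustration: `ExtractionLogger`, `CollectingLogger` and `extract` are hypothetical names, and `extract` is only a stand-in for a real extraction routine, not the library's API.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtractionLoggerSketch {

    // Hypothetical callback interface: the caller implements it and the
    // extraction code reports what happens through it.
    interface ExtractionLogger {
        void error(String message);
        void warning(String message);
        void event(String message);
    }

    // Simple implementation that collects every message in memory.
    static class CollectingLogger implements ExtractionLogger {
        final List<String> entries = new ArrayList<>();

        @Override public void error(String message)   { entries.add("ERROR: " + message); }
        @Override public void warning(String message) { entries.add("WARNING: " + message); }
        @Override public void event(String message)   { entries.add("EVENT: " + message); }
    }

    // Stand-in for a data extraction routine (hypothetical): it trims the
    // input and reports problems to the logger instead of throwing.
    static String extract(String input, ExtractionLogger logger) {
        logger.event("extraction started");
        if (input == null) {
            logger.error("no input supplied");
            return "";
        }
        if (input.trim().isEmpty()) {
            logger.warning("input contains no text");
        }
        logger.event("extraction finished");
        return input.trim();
    }

    public static void main(String[] args) {
        CollectingLogger logger = new CollectingLogger();
        extract("   ", logger);
        logger.entries.forEach(System.out::println);
    }
}
```

Passing the logger as a parameter keeps the extraction code independent of how messages are stored or displayed; a different implementation could write to a file or a logging framework instead.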
This code snippet demonstrates how to extract information about the known properties that can be encountered in a particular package.
Load a file to examine. Get a collection of PropertyDescriptor instances for any desired metadata package. Iterate through the extracted descriptors. advanced_usage.GettingKnownPropertyDescriptors
    // Open the document and release it automatically when done
    try (Metadata metadata = new Metadata(Constants.InputDoc)) {
        WordProcessingRootPackage root = metadata.getRootPackageGeneric();
        // Walk through every known property descriptor of the package
        for (PropertyDescriptor descriptor : root.getDocumentProperties().getKnowPropertyDescriptors()) {
            System.out.println(descriptor.getName());
            System.out.println(descriptor.getType());
            System.out.println(descriptor.getAccessLevel());
            // Print the tags attached to the property
            for (PropertyTag tag : descriptor.getTags()) {
                System.out.println(tag);
            }
            System.out.println();
        }
    }
Note: Not all possible properties are present in the getKnowPropertyDescriptors collection....edit metadata of PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails...
The API allows creating a full-text and/or metadata index on documents. To index only metadata, without the main content of documents, you only need to set IndexType.MetadataIndex when creating the index....search over your PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX and more...
This article explains the method to use when files have non-standard extensions, or when their format is supported but not pre-configured....document formats like PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, Emails...
This article demonstrates how to save the redacted document, replacing the original file... rasterize_to_pdf = False result_path = redactor...document formats like PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, Emails...