Stop words are frequently used words that do not carry semantic meaning and can be removed from an index to reduce its size.
You can enable or disable the use of stop words by calling the setUseStopWords method of the IndexSettings class. The default value is true, meaning that stop words are filtered out during indexing and not added to the index.
A list of stop words to use during indexing can be specified in the stop word dictionary....search over your PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX and more...
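The effect of stop word filtering can be illustrated with a minimal, standalone sketch. It does not call the GroupDocs API: the class name `StopWordFilterSketch`, the method `termsToIndex`, and the small dictionary are hypothetical and chosen only for this example, and the `useStopWords` flag merely mirrors the idea behind the setting described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Standalone illustration; this is NOT the GroupDocs implementation.
class StopWordFilterSketch {
    // Hypothetical stop word dictionary chosen for this example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "to", "and"));

    // Returns the lowercased terms that would be added to the index.
    // When useStopWords is true, dictionary words are skipped.
    static List<String> termsToIndex(String text, boolean useStopWords) {
        List<String> terms = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            if (useStopWords && STOP_WORDS.contains(token)) {
                continue; // filtered out, so it never reaches the index
            }
            terms.add(token);
        }
        return terms;
    }
}
```

With this sketch, `termsToIndex("The size of the index", true)` yields `[size, index]`, while passing `false` keeps all five words.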
Learn how to extract metadata from Microsoft Excel spreadsheets (.xls, .xlsx) in C# using GroupDocs.Parser for .NET. Step-by-step guide with code example....extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, Emails...
Get ZIP format metadata
The API allows detecting ZIP archives and reading format metadata. Follow these steps:
Load a ZIP archive; Get the root metadata package; Extract the native metadata package using ZipRootPackage.ZipPackage; Read the ZIP archive properties; Loop through ZipPackage.Files to extract information about archived files.
The following code snippet shows how to get metadata from a ZIP archive.
AdvancedUsage.ManagingMetadataForSpecificFormats.Archive.ZipReadNativeMetadataProperties
Encoding encoding = Encoding....edit metadata of PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails...
To extract text from Microsoft OneNote sections, the getText and getText(int) methods are used. These methods extract text from the entire document or from a selected page. Raw mode is not supported for Microsoft OneNote.
Here are the steps to extract text from a Microsoft OneNote section:
Instantiate Parser object for the initial section; Call getText method and obtain TextReader object; Read a text from reader....extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, Emails...
To extract emails from Outlook Storage, the getContainer method is used. This method returns a collection of ContainerItem objects.
An Outlook Storage item can contain the following metadata:
Name: Description
date: The time and date at which the Outlook Storage item was last modified.
email-sender: The value of the “sender” field.
email-to: The value of the “to” field.
subject: The value of the “subject” field.
An Outlook Storage container consists of email documents (msg files).
Here are the steps to extract an email text from Outlook Storage:...extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, Emails...
Get ZIP format metadata
The API allows detecting ZIP archives and reading format metadata. Follow these steps:
Load a ZIP archive; Get the root metadata package; Extract the native metadata package using the ZipRootPackage.getZipPackage method; Read the ZIP archive properties; Loop through ZipPackage.getFiles to extract information about the archived files.
The following code snippet shows how to get metadata from a ZIP archive.
advanced_usage.managing_metadata_for_specific_formats.archive.ZipReadNativeMetadataProperties...edit metadata of PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, emails...
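Because the referenced snippet is truncated here, the kind of information involved (archive-level properties plus per-file details) can be sketched with the JDK's own `java.util.zip` package. This is a standalone illustration under stated assumptions, not the GroupDocs.Metadata API: the class `ZipMetadataSketch` and the method `describe` are hypothetical names introduced for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Standalone illustration using java.util.zip; NOT the GroupDocs.Metadata API.
class ZipMetadataSketch {
    // Returns one summary line for the archive, followed by one line per entry.
    static List<String> describe(Path zipPath) {
        List<String> lines = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            // Archive-level properties: the optional comment (null if absent)
            // and the total number of entries.
            lines.add("comment=" + zip.getComment() + ", entries=" + zip.size());

            // Loop through the archived files and collect their properties.
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                lines.add(entry.getName()
                        + ", size=" + entry.getSize()
                        + ", compressed=" + entry.getCompressedSize()
                        + ", modified=" + entry.getLastModifiedTime());
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers of this sketch need no throws clause.
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

Calling `ZipMetadataSketch.describe(Paths.get("archive.zip"))` on an existing archive returns the summary line first and then one line per archived file; the file name here is only a placeholder.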
To search for a keyword in HTML documents, the Search(String) method is used. This method returns a collection of SearchResult objects....extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, Emails...
To search for a keyword in HTML documents, the search(String) method is used. This method returns a collection of SearchResult objects....extract data from PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, Emails...
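The idea of a keyword search that returns a collection of result objects can be sketched in plain Java. This is a rough illustration only, not the GroupDocs.Parser implementation: `KeywordSearchSketch` and its nested `Hit` type are hypothetical stand-ins, and the sketch assumes the text has already been extracted from the HTML document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone illustration; NOT the GroupDocs.Parser search implementation.
class KeywordSearchSketch {
    // Hypothetical result type: where the keyword was found and the matched text.
    static final class Hit {
        final int position;
        final String text;

        Hit(int position, String text) {
            this.position = position;
            this.text = text;
        }
    }

    // Finds every case-insensitive occurrence of the keyword in the content.
    static List<Hit> search(String content, String keyword) {
        List<Hit> hits = new ArrayList<>();
        if (keyword.isEmpty()) {
            return hits; // an empty keyword would match at every position
        }
        // Pattern.quote treats the keyword literally, not as a regular expression.
        Pattern pattern = Pattern.compile(
                Pattern.quote(keyword),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            hits.add(new Hit(matcher.start(), matcher.group()));
        }
        return hits;
    }
}
```

For example, `search("Lorem ipsum. LOREM again", "lorem")` returns two hits, at positions 0 and 13, each carrying the text exactly as it appears in the content.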