By using the Advanced Parser, you are accepting the Cognigy.AI Additional Data Privacy Terms.
Key Features
- Improved quality of responses. Breaking down the text into Knowledge Chunks allows AI Agents to understand the context and provide more accurate responses.
- Recognition of a wide range of text elements. The Advanced Parser uses optical character recognition (OCR) to recognize plain text and more complex elements, such as tables, titles, footers, and images.
- Effective use of Markdown. The Advanced Parser converts the text to Markdown and splits it into Knowledge Chunks. AI Agents can process Markdown, recognize text elements such as tables and images, and distinguish them from plain text. This approach gives AI Agents more context and helps them better understand how information is organized in the text.
- Improved ability to reference the source. The Advanced Parser adds the page numbers of the source file to Knowledge Sources as Chunk metadata, which helps you trace information back to its location in large documents.
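To illustrate the idea behind Markdown chunks with page metadata, here is a minimal sketch. It is not the Advanced Parser's implementation or the Cognigy.AI data model: the `Chunk` class, the `chunk_pages` function, and the `page` metadata key are hypothetical names chosen for this example, and splitting on blank lines is an assumed, simplified chunking rule.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a Knowledge Chunk (not the Cognigy.AI data model)."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_pages(pages: list[str]) -> list[Chunk]:
    """Split each page's Markdown on blank lines and tag every chunk with its page number."""
    chunks = []
    for page_number, markdown in enumerate(pages, start=1):
        for block in markdown.split("\n\n"):
            block = block.strip()
            if block:  # skip empty blocks caused by extra blank lines
                chunks.append(Chunk(text=block, metadata={"page": page_number}))
    return chunks
```

Because each chunk carries its page number, a response built from a chunk can point back to the page it came from, even in a long document.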
Prerequisites
- Cognigy.AI 4.71 or later.
Supported Formats
| Cognigy.AI versions | Formats |
|---|---|
| 4.80 or later | PDF, DOCX, PPTX, TXT¹, JPEG, JPG, PNG, BMP, HEIF, TIFF files |
| 4.79 | PDF, DOCX, PPTX, and TXT¹ files |
| 4.71-4.78 | PDF and DOCX files |
| 4.71 or earlier | PDF files |
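The table above can be expressed as a small lookup, for example to check a file type before uploading it. This is a sketch, not part of Cognigy.AI: the function name `supported_formats` is hypothetical, and because the table lists 4.71 in two rows, the code assumes that 4.71 belongs to the 4.71-4.78 row.

```python
def supported_formats(version: str) -> list[str]:
    """Return the formats listed in the table for a Cognigy.AI version such as "4.79"."""
    # Compare versions numerically; comparing strings would sort "4.8" after "4.71".
    major, minor = (int(part) for part in version.split(".")[:2])
    release = (major, minor)

    formats = ["PDF"]
    if release >= (4, 71):  # assumption: 4.71 is treated as part of the 4.71-4.78 row
        formats.append("DOCX")
    if release >= (4, 79):
        formats += ["PPTX", "TXT"]
    if release >= (4, 80):
        formats += ["JPEG", "JPG", "PNG", "BMP", "HEIF", "TIFF"]
    return formats
```

For example, `supported_formats("4.75")` returns `["PDF", "DOCX"]`.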
Availability
The Advanced Parser is available in all Cognigy.AI installations. For the on-premises installation, you need to activate the Advanced Parser.
Activate the Advanced Parser for On-Premises
Activating the Advanced Parser varies depending on which version you use.
Cognigy.AI 4.80 or Later
To activate the Advanced Parser, add the Azure AI Document Intelligence parameters to the knowledgeSearch section in the cognigy-ai-values.yaml file. In these parameters, replace API_KEY with your Azure AI Document Intelligence API key and ENDPOINT_URL with your Azure AI Document Intelligence endpoint URL.
Cognigy.AI 4.79 or Earlier
To activate the Advanced Parser for all or specific organizations in your Cognigy.AI installation, set the following environment variables in the cognigy-ai-values.yaml file:
- FEATURE_ENABLE_AZURE_DOCUMENT_INTELLIGENCE_ORG_WHITELIST:
  - For all organizations, set to *.
  - For specific organizations, enter the organization IDs separated by a comma, for example, 63bda3d4tlp3c95977bb8604,63babf6e92asd791923e17b7,670k1273fx492448dc288b36.
- AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT
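The two value formats for FEATURE_ENABLE_AZURE_DOCUMENT_INTELLIGENCE_ORG_WHITELIST can be read as follows. This sketch only illustrates how such a value could be interpreted; it does not show how Cognigy.AI evaluates the variable internally, and the function name `is_org_enabled` is hypothetical.

```python
def is_org_enabled(whitelist: str, organization_id: str) -> bool:
    """Return True if the whitelist value covers the given organization ID."""
    value = whitelist.strip()
    if value == "*":
        # The wildcard enables the feature for all organizations.
        return True
    # Otherwise the value is a comma-separated list of organization IDs.
    allowed = {item.strip() for item in value.split(",") if item.strip()}
    return organization_id in allowed
```

With the example value from the list above, `is_org_enabled("63bda3d4tlp3c95977bb8604,63babf6e92asd791923e17b7", "63babf6e92asd791923e17b7")` returns `True`, while an ID that is not in the list returns `False`.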
¹ TXT files are internally handled by the Basic Parser.