Text Extraction with the Basic Parser¶
Text Extraction with the Basic Parser is a Cognigy solution that extracts content from files and splits it into chunks of a fixed token length. Based on our research, this chunking approach returns the best results.
We recommend using this parser in combination with Top K set to 5 in the Search Extract Output Node.
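To illustrate what splitting by a fixed token length means, here is a minimal Python sketch. It is not the Basic Parser's implementation: the function name `chunk_by_token_count`, the whitespace tokenizer, and the chunk size of 3 are assumptions made only for this example, since the actual tokenizer and chunk size are not stated here.

```python
from typing import List


def chunk_by_token_count(text: str, chunk_size: int) -> List[str]:
    """Split text into chunks of at most `chunk_size` tokens.

    Assumption: a token is a whitespace-separated word. The real parser
    may use a different tokenizer and a different chunk size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    tokens = text.split()
    # Step through the token list in strides of chunk_size;
    # the final chunk may be shorter than the others.
    return [
        " ".join(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), chunk_size)
    ]


chunks = chunk_by_token_count("one two three four five six seven", chunk_size=3)
print(chunks)  # ['one two three', 'four five six', 'seven']
```

Every chunk except possibly the last has exactly `chunk_size` tokens, which keeps chunk sizes predictable when a fixed number of chunks (for example, Top K set to 5) is passed on for answering.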
Supported Formats¶
Cognigy.AI versions | Formats |
---|---|
4.79 or later | .pdf, .docx, .pptx, and .txt |
4.78 and earlier | .pdf, .docx, and .txt |
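If you prepare files in a script before uploading them, you could check the table above programmatically. The following Python sketch is only an illustration: the helper names (`parse_version`, `supported_formats`, `is_supported`) are hypothetical and not part of any Cognigy API, and it assumes version strings of the form `major.minor` or `major.minor.patch`.

```python
from pathlib import PurePath
from typing import Set, Tuple

# Formats taken from the table above.
FORMATS_UP_TO_4_78 = {".pdf", ".docx", ".txt"}
FORMATS_FROM_4_79 = FORMATS_UP_TO_4_78 | {".pptx"}


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a string such as '4.79' or '4.79.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supported_formats(version: str) -> Set[str]:
    """Return the file extensions supported by the given Cognigy.AI version."""
    if parse_version(version) >= (4, 79):
        return FORMATS_FROM_4_79
    return FORMATS_UP_TO_4_78


def is_supported(filename: str, version: str) -> bool:
    """Check a file name's extension (case-insensitive) against the table."""
    suffix = PurePath(filename).suffix.lower()
    return suffix in supported_formats(version)


print(is_supported("slides.pptx", "4.79"))  # True
print(is_supported("slides.pptx", "4.78"))  # False
```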
Availability¶
- Available in all environments.
How to Use¶
By default, the Advanced Parser is used for text extraction. To use the Basic Parser, first switch to it in the Project settings, then upload a file for text extraction.
Switch to the Basic Parser¶
To switch to the Basic Parser, follow these steps:
- In your Project, navigate to Manage > Settings.
- On the Settings page, go to Knowledge AI Settings > Document Processing.
- From the Content Parser list, select Basic.
- Click Save.
Upload a File for Text Extraction¶
To upload a file for text extraction with the Basic Parser, follow these steps:
- In your Project, navigate to Build > Knowledge.
- Open the existing Knowledge Store or create a new one.
- On the Knowledge Store page, click + New Knowledge Sources in the upper-left corner.
- In the New Knowledge Sources window, select File (basic). The label basic means that you will be using the Basic Parser for text extraction.
- Drag and drop a .pdf, .docx, .txt, or .pptx file, or click Browse Files to select a file from your computer.
- Click Create.