
Fulltext & Vector Search Functions


Available from version 11.48.0

These functions require the OPENSEARCH_ENABLED license/preference to be activated for your organization. Without it, every function raises RuntimeError("Fulltext search license is missing").

Functions for searching document archives, finding similar documents, and querying ERP master data. They search across all documents of the organization, unlike get_document_content(), which reads only the current document's text.


Source: module/script/helper/document_script_functions.py


fulltext_search()

Searches the full OCR text of all documents in the organization. The fulltextsearch microservice matches the query against the pages.pageText, tfidfCustomPageText, and ai_text fields.

fulltext_search(query, **kwargs)

Parameters:

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | Search term (searched in OCR text of all documents) |
| search_type | str | "match_phrase" | "match_phrase" (exact phrase), "fuzzy" (typo-tolerant, up to 2 char difference), "prefix" (starts with) |
| doc_type | str | None | Filter by document type (comma-separated, e.g. "INVOICE,CREDIT_NOTE") |
| status | str | None | Filter by document status (comma-separated, e.g. "ready_for_validation,exported") |
| vendor_name | str | None | Filter by vendor name |
| date_range | str | None | "last_30_days", "last_90_days", "last_180_days", "last_365_days" |
| size | int | 10 | Max results (capped at 50) |

Returns: list[dict] — Each dict contains:

| Field | Description |
| --- | --- |
| doc_id | Document UUID |
| name | Filename (e.g. "INV-2026-001.pdf") |
| doc_type | Document type ("INVOICE", "ORDER_CONFIRMATION", etc.) |
| vendor_name | Vendor name |
| status | Document status |
| total_amount | Total amount |
| ocr_content | Matched text excerpt from the document |
| highlights | Dict with highlighted matches per field |

Example — Search for exact phrase:
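A minimal sketch of an exact-phrase search, using only parameters from the table above. The `fulltext_search` defined at the top is a hypothetical stand-in with one invented document so the snippet runs outside the platform; in a real script the function is already provided and the stand-in should be omitted.

```python
# Hypothetical stand-in so this sketch runs on its own.
# Inside a platform script, fulltext_search already exists: omit this function.
def fulltext_search(query, **kwargs):
    sample = {
        "doc_id": "doc-1",  # real values are UUIDs
        "name": "INV-2026-001.pdf",
        "doc_type": "INVOICE",
        "vendor_name": "Example Supplies GmbH",
        "ocr_content": "Payment terms: 30 days net",
    }
    # Mimic an exact-phrase match against the invented OCR text.
    return [sample] if query.lower() in sample["ocr_content"].lower() else []


# "match_phrase" is the default search_type; it is spelled out here for clarity.
hits = fulltext_search(
    "30 days net",
    search_type="match_phrase",
    doc_type="INVOICE,CREDIT_NOTE",
    date_range="last_90_days",
    size=5,
)

summary = [(hit["name"], hit["vendor_name"]) for hit in hits]
print(summary)
```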

Example — Fuzzy search (OCR typo tolerant):
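A minimal sketch of why fuzzy search helps with OCR misreads. The stand-in `fulltext_search` below is hypothetical and approximates "up to 2 characters difference" with a per-word edit distance; how the real service tokenizes and scores matches is not described here and may differ.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hypothetical stand-in; inside a platform script, omit this function.
def fulltext_search(query, **kwargs):
    ocr_text = "lnvoice number 2026-0042"  # OCR misread "I" as "l"
    words = ocr_text.lower().split()
    if kwargs.get("search_type") == "fuzzy":
        matched = any(edit_distance(query.lower(), word) <= 2 for word in words)
    else:
        matched = query.lower() in ocr_text.lower()
    doc = {"doc_id": "doc-42", "name": "scan-0042.pdf", "ocr_content": ocr_text}
    return [doc] if matched else []


exact_hits = fulltext_search("Invoice")                       # default: match_phrase
fuzzy_hits = fulltext_search("Invoice", search_type="fuzzy")  # tolerates the misread
print(len(exact_hits), len(fuzzy_hits))
```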

Example — Prefix search:
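A minimal sketch of a prefix search for a partially known reference number. The stand-in `fulltext_search` is hypothetical, its documents are invented, and it ignores the `status` filter; it only illustrates the call shape and the size cap from the parameter table.

```python
# Hypothetical stand-in; inside a platform script, omit this function.
def fulltext_search(query, **kwargs):
    docs = [
        {"doc_id": "doc-1", "name": "a.pdf", "ocr_content": "Reference INV-2026-001"},
        {"doc_id": "doc-2", "name": "b.pdf", "ocr_content": "Reference INV-2025-877"},
    ]
    if kwargs.get("search_type") != "prefix":
        return []
    needle = query.lower()
    hits = [
        doc for doc in docs
        if any(word.startswith(needle) for word in doc["ocr_content"].lower().split())
    ]
    return hits[: min(kwargs.get("size", 10), 50)]  # size is capped at 50


hits = fulltext_search(
    "INV-2026",
    search_type="prefix",
    status="ready_for_validation,exported",
    size=20,
)
names = [hit["name"] for hit in hits]
print(names)
```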


Error handling: If the fulltextsearch service is unreachable, the function returns [] and logs a warning. It does not raise an exception.
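Because an unreachable service and a search with no matches both produce an empty list, a script cannot tell them apart from the return value alone. A minimal sketch that treats an empty result as inconclusive instead of as proof of absence; `classify_search_result` is a hypothetical helper, not part of the platform.

```python
def classify_search_result(hits):
    """Label a fulltext_search() result for downstream checks."""
    if not hits:
        # Either nothing matched or the service was unreachable.
        return "inconclusive"
    return "found"


print(classify_search_result([]))
print(classify_search_result([{"doc_id": "doc-1"}]))
```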


vector_search()

Finds semantically similar documents using vector embeddings (k-NN search over 384-dimensional vectors). Useful for finding documents with similar content regardless of exact wording.

Parameters:

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| doc_id | str | required | Source document UUID (the document to find similar matches for) |
| k | int | 5 | Number of similar documents to return (capped at 50) |

Returns: list[dict] — Each dict contains:

| Field | Description |
| --- | --- |
| doc_id | Similar document UUID |
| name | Filename |
| doc_type | Document type |
| similarity_score | Raw similarity score (0-1) |
| similarity_percent | Similarity as percentage (0-100) |

Example — Find similar documents:
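A minimal sketch that keeps only close matches from the result list. The stand-in `vector_search` is hypothetical and returns invented records; the 80 percent threshold is an arbitrary illustration, not a recommended value.

```python
# Hypothetical stand-in; inside a platform script, omit this function.
def vector_search(doc_id, k=5):
    canned = [
        {"doc_id": "doc-b", "name": "INV-2026-002.pdf", "doc_type": "INVOICE",
         "similarity_score": 0.93, "similarity_percent": 93},
        {"doc_id": "doc-c", "name": "contract.pdf", "doc_type": "CONTRACT",
         "similarity_score": 0.61, "similarity_percent": 61},
    ]
    return canned[: min(k, 50)]  # k is capped at 50


similar = vector_search("doc-a", k=10)

# Keep only documents at or above the chosen similarity threshold.
close_matches = [doc["name"] for doc in similar if doc["similarity_percent"] >= 80]
print(close_matches)
```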


How it works: Each document is converted to a 384-dimensional vector when indexed. The vector search finds the nearest neighbors in this vector space, which correspond to semantically similar documents.
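The nearest-neighbor idea can be sketched in a few lines. This is an illustration only: it assumes cosine similarity as the distance measure and uses toy 3-dimensional vectors, whereas the metric used by the actual index is not stated here and production k-NN indexes are typically approximate.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbors(source_id, vectors, k=5):
    """Return the k (doc_id, score) pairs most similar to the source document."""
    source = vectors[source_id]
    scored = [
        (other_id, cosine_similarity(source, vec))
        for other_id, vec in vectors.items()
        if other_id != source_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones have 384 dimensions.
vectors = {
    "invoice-a": [0.9, 0.1, 0.0],
    "invoice-b": [0.8, 0.2, 0.1],
    "contract-c": [0.0, 0.2, 0.9],
}
neighbors = nearest_neighbors("invoice-a", vectors, k=2)
print([doc_id for doc_id, _ in neighbors])
```

Documents whose vectors point in nearly the same direction score close to 1, which is why two invoices rank above an unrelated contract.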


fulltext_search_erp()

Searches ERP master data (vendors, purchase orders, customers, materials) indexed in OpenSearch.

Parameters:

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | Search term |
| entity_types | str | None | Filter by entity type (comma-separated: "vendor", "purchase_order", "customer", "material") |
| vendor_number | str | None | Filter by vendor number |
| vendor_name | str | None | Filter by vendor name |
| company_code | str | None | Filter by company code |
| size | int | 10 | Max results (capped at 50) |

Returns: list[dict] — Entity-type-specific fields (vendor records have vendor_number, vendor_name, etc.)

Example — Validate vendor in ERP:
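A minimal sketch of a vendor check against ERP master data. The stand-in `fulltext_search_erp` is hypothetical and holds one invented vendor; `vendor_exists` is an illustrative helper, and the only record fields it relies on are `vendor_number` and `vendor_name`, which are the ones named in the return description.

```python
# Hypothetical stand-in; inside a platform script, omit this function.
def fulltext_search_erp(query, **kwargs):
    vendors = [{"vendor_number": "100234", "vendor_name": "Example Supplies GmbH"}]
    requested = (kwargs.get("entity_types") or "vendor").split(",")
    if "vendor" not in requested:
        return []
    return [v for v in vendors if query.lower() in v["vendor_name"].lower()]


def vendor_exists(name):
    """True if the ERP search returns a vendor with exactly this name."""
    matches = fulltext_search_erp(name, entity_types="vendor", size=5)
    return any(m.get("vendor_name", "").lower() == name.lower() for m in matches)


print(vendor_exists("Example Supplies GmbH"))
print(vendor_exists("Unknown Trading Ltd"))
```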

Example — Search purchase orders:
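A minimal sketch of a purchase order lookup narrowed by vendor number and company code. The stand-in `fulltext_search_erp` is hypothetical, and the field names in its canned records (`po_number`, `entity_type`) are invented for illustration, since the fields of purchase order records are not listed here.

```python
# Hypothetical stand-in; inside a platform script, omit this function.
def fulltext_search_erp(query, **kwargs):
    # Canned purchase orders; these field names are invented for illustration.
    canned = [
        {"entity_type": "purchase_order", "po_number": "4500001234", "vendor_number": "100234"},
        {"entity_type": "purchase_order", "po_number": "4500009999", "vendor_number": "200777"},
    ]
    wanted_vendor = kwargs.get("vendor_number")
    return [
        order for order in canned
        if query in order["po_number"]
        and (wanted_vendor is None or order["vendor_number"] == wanted_vendor)
    ]


po_hits = fulltext_search_erp(
    "45000",
    entity_types="purchase_order",
    vendor_number="100234",
    company_code="1000",
    size=20,
)
print(len(po_hits))
```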


fulltext_suggestions()

Returns autocomplete suggestions for search terms. Groups results by category (vendors, filenames, invoice numbers).

Parameters:

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| query | str | required | Prefix / search term |
| limit | int | 10 | Max suggestions per category (capped at 20) |

Returns: dict of suggestions grouped by category.

Example — Get vendor suggestions:
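A minimal sketch of reading vendor suggestions from the grouped result. The exact keys of the returned dict are not shown here, so the keys used below (`vendors`, `filenames`, `invoice_numbers`) are assumptions derived from the category names in the description; the stand-in `fulltext_suggestions` and its data are hypothetical as well.

```python
# Hypothetical stand-in; inside a platform script, omit this function.
def fulltext_suggestions(query, limit=10):
    # Category keys are assumed, not confirmed by the documentation.
    grouped = {
        "vendors": ["Example Supplies GmbH", "Exact Tools AG"],
        "filenames": ["example-scan.pdf"],
        "invoice_numbers": [],
    }
    limit = min(limit, 20)  # limit is capped at 20 per category
    needle = query.lower()
    return {
        category: [s for s in items if s.lower().startswith(needle)][:limit]
        for category, items in grouped.items()
    }


suggestions = fulltext_suggestions("exa", limit=5)

# Use .get() so a missing category does not raise a KeyError.
vendor_suggestions = suggestions.get("vendors", [])
print(vendor_suggestions)
```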


Quick Reference

| Function | Purpose | Returns |
| --- | --- | --- |
| fulltext_search(query, ...) | Search OCR text across all documents | list[dict] |
| vector_search(doc_id, ...) | Find semantically similar documents | list[dict] |
| fulltext_search_erp(query, ...) | Search ERP master data | list[dict] |
| fulltext_suggestions(query, ...) | Autocomplete suggestions | dict |


Common Patterns

License Check

All four functions automatically check the OPENSEARCH_ENABLED preference. If it is not enabled, they raise RuntimeError("Fulltext search license is missing").

To handle this gracefully in scripts:
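A minimal sketch of a wrapper that falls back to an empty result when the license is missing. `safe_fulltext_search` is a hypothetical helper name, and the stand-in `fulltext_search` simulates an organization without the license so the snippet runs on its own.

```python
# Hypothetical stand-in that behaves like an unlicensed organization.
# Inside a platform script, omit this function.
def fulltext_search(query, **kwargs):
    raise RuntimeError("Fulltext search license is missing")


def safe_fulltext_search(query, **kwargs):
    """Return search hits, or an empty list when the license is missing."""
    try:
        return fulltext_search(query, **kwargs)
    except RuntimeError as exc:
        # Swallow only the documented license error; re-raise anything else.
        if "license is missing" in str(exc):
            return []
        raise


hits = safe_fulltext_search("30 days net", size=5)
print(hits)
```

Matching on the message text keeps unrelated RuntimeErrors visible instead of silently hiding them.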

Combining with Field Functions
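A minimal sketch of a duplicate check that feeds a field value into a search. The names of the field functions are not given in this section, so the invoice number and current document ID are passed in as plain arguments; in a real script they would come from whichever field functions the platform provides. The stand-in `fulltext_search`, its data, and `find_possible_duplicates` are hypothetical.

```python
# Hypothetical stand-in; inside a platform script, omit this function.
def fulltext_search(query, **kwargs):
    canned = [
        {"doc_id": "doc-1", "name": "INV-2026-001.pdf", "status": "exported"},
        {"doc_id": "doc-2", "name": "INV-2026-001-copy.pdf", "status": "ready_for_validation"},
    ]
    return canned if query == "INV-2026-001" else []


def find_possible_duplicates(invoice_number, current_doc_id):
    """Search other invoices for the same invoice number, excluding the current one."""
    hits = fulltext_search(
        invoice_number,
        search_type="match_phrase",
        doc_type="INVOICE",
        size=20,
    )
    return [hit for hit in hits if hit["doc_id"] != current_doc_id]


duplicates = find_possible_duplicates("INV-2026-001", current_doc_id="doc-2")
print([doc["name"] for doc in duplicates])
```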
