Update AI Function Skills based on product updates by qian-yu-db · Pull Request #551 · databricks-solutions/ai-dev-kit

qian-yu-db · 2026-05-27T17:46:36Z


Update ai_parse_document, ai_extract, and ai_classify, and add ai_prep_search (DBR 18.2+). This refresh aligns the skill with the published references and rewrites the custom RAG pipeline around ai_prep_search instead of manual variant_get + explode + md5 chunking.

- ai_parse_document: DBR 17.1+ -> 17.3+; add TIFF/DOC/PPT formats, pageRange option, per-element id/confidence, expanded element type list, 500-page and 100MB limits.
- ai_extract: add object/array schema types, v2.1 options (enableCitations, enableConfidenceScores, version), 150-char field-name and 128k context limits, plus a v2.1 citations/confidence example.
- ai_classify: add version option and 128k context-limit note.
- ai_prep_search: new reference section in 1-task-functions.md (syntax, return shape, SQL + PySpark end-to-end), wired into SKILL.md function table, selection rules, prereqs, and Common Issues.
- 4-document-processing-pipeline.md: rewrite "Custom RAG Pipeline" as parse -> prep -> index -> query; streaming Stage 2 now calls ai_prep_search; Vector Search step explicitly names chunk_to_embed (embedding source) and chunk_id (PK); retain legacy manual-chunking fallback for DBR < 18.2.
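
For readers unfamiliar with the legacy fallback, the idea behind manual chunking with md5 keys can be sketched in plain Python. This is not the code in the PR, which works on ai_parse_document output in SQL/PySpark; it is only a rough illustration, assuming the parsed elements are already available as a list of strings. The `Chunk` class, `chunk_elements`, `_make_chunk`, and the `max_chars` parameter are hypothetical names; only `chunk_id` and `chunk_to_embed` come from the description above.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str        # deterministic key, usable as the index primary key
    chunk_to_embed: str  # text handed to the embedding model


def _make_chunk(doc_path: str, index: int, text: str) -> Chunk:
    # Hash path + position + text so re-running the job yields the same id.
    key = f"{doc_path}:{index}:{text}".encode("utf-8")
    return Chunk(chunk_id=hashlib.md5(key).hexdigest(), chunk_to_embed=text)


def chunk_elements(doc_path: str, elements: list[str], max_chars: int = 200) -> list[Chunk]:
    """Greedily pack text elements into chunks of roughly max_chars.

    A single element longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[Chunk] = []
    buffer = ""
    for text in elements:
        candidate = f"{buffer}\n{text}" if buffer else text
        if buffer and len(candidate) > max_chars:
            # Adding this element would overflow: flush what we have so far.
            chunks.append(_make_chunk(doc_path, len(chunks), buffer))
            buffer = text
        else:
            buffer = candidate
    if buffer:
        chunks.append(_make_chunk(doc_path, len(chunks), buffer))
    return chunks
```

Because the id is derived from the document path, chunk position, and chunk text, re-processing an unchanged document reproduces the same keys, which is what makes the column usable as a primary key. On DBR 18.2+ the PR replaces this kind of hand-rolled logic with ai_prep_search.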

Live AWS docs for ai_parse_document, ai_extract, and ai_classify had drifted from the skill, and ai_prep_search (DBR 18.2+) was missing entirely.

Co-authored-by: Isaac

qian-yu-db assigned dustinvannoy-db and calreynolds May 27, 2026

qian-yu-db wants to merge 1 commit into main from update-ai-functions-skill

qian-yu-db commented May 27, 2026

3 participants
