Update AI Function Skills based on product updates #551

Open
qian-yu-db wants to merge 1 commit into main from update-ai-functions-skill
@qian-yu-db
Collaborator

…search

Update ai_parse_document, ai_extract, and ai_classify, and add ai_prep_search (DBR 18.2+). This refresh aligns the skill with the published references and rewrites the custom RAG pipeline around ai_prep_search instead of manual variant_get + explode + md5 chunking.

  • ai_parse_document: DBR 17.1+ -> 17.3+; add TIFF/DOC/PPT formats, pageRange option, per-element id/confidence, expanded element type list, 500-page and 100MB limits.
  • ai_extract: add object/array schema types, v2.1 options (enableCitations, enableConfidenceScores, version), 150-char field-name and 128k context limits, plus a v2.1 citations/confidence example.
  • ai_classify: add version option and 128k context-limit note.
  • ai_prep_search: new reference section in 1-task-functions.md (syntax, return shape, SQL + PySpark end-to-end), wired into SKILL.md function table, selection rules, prereqs, and Common Issues.
  • 4-document-processing-pipeline.md: rewrite "Custom RAG Pipeline" as parse -> prep -> index -> query; streaming Stage 2 now calls ai_prep_search; Vector Search step explicitly names chunk_to_embed (embedding source) and chunk_id (PK); retain legacy manual-chunking fallback for DBR < 18.2.
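For reviewers unfamiliar with the legacy path, the manual chunking that ai_prep_search replaces amounts to splitting parsed text into windows and hashing each window into a stable key. The sketch below illustrates that idea in plain Python and is not the skill's code (the skill does this in SQL with variant_get + explode + md5). The function name `manual_chunks`, the `size` and `overlap` defaults, and the choice to hash path, position, and content together are all assumptions made for illustration; only the column names chunk_id and chunk_to_embed come from this PR.

```python
import hashlib


def manual_chunks(doc_path: str, text: str, size: int = 1000, overlap: int = 100):
    """Split text into overlapping windows and give each a stable md5-based id.

    Hypothetical helper: the window size, the overlap, and the hashed key layout
    are illustrative assumptions, not values taken from the skill.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    rows = []
    for position, start in enumerate(range(0, len(text), step)):
        chunk = text[start:start + size]
        # Hash path + position + content so the same input always yields the same key.
        key = f"{doc_path}:{position}:{chunk}".encode("utf-8")
        rows.append({
            "chunk_id": hashlib.md5(key).hexdigest(),  # primary key for the index
            "chunk_to_embed": chunk,                   # text handed to the embedder
        })
        # Stop once this window reaches the end, so no trailing window
        # is emitted that only repeats already-covered text.
        if start + size >= len(text):
            break
    return rows
```

Because the id depends only on the inputs, re-running the sketch over an unchanged document produces the same keys, which is the property a primary-key column needs when chunks are re-indexed.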

…search

Live AWS docs for ai_parse_document, ai_extract, and ai_classify had drifted
from the skill, and ai_prep_search (DBR 18.2+) was missing entirely.
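The two runtime thresholds in this PR (17.3+ for ai_parse_document, 18.2+ for ai_prep_search) imply a simple branch when choosing between the ai_prep_search path and the legacy manual-chunking fallback. A minimal sketch of that decision follows, assuming the runtime is available as a version string such as "18.2" or "17.3.x-scala2.12". The names `parse_dbr` and `choose_chunking_path` are hypothetical, and raising an error below 17.3 is an assumption about how a caller might treat runtimes where ai_parse_document is unavailable.

```python
# Minimum Databricks Runtime per function, as stated in this PR.
MIN_DBR = {
    "ai_parse_document": (17, 3),
    "ai_prep_search": (18, 2),
}


def parse_dbr(version: str) -> tuple:
    """Return (major, minor) from a string like '18.2' or '17.3.x-scala2.12'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def choose_chunking_path(dbr_version: str) -> str:
    """Pick the chunking stage of the pipeline for a given runtime (hypothetical helper)."""
    dbr = parse_dbr(dbr_version)
    if dbr < MIN_DBR["ai_parse_document"]:
        # Assumption: without ai_parse_document there is nothing to chunk.
        raise ValueError(f"DBR {dbr_version} is below 17.3; ai_parse_document is unavailable")
    if dbr >= MIN_DBR["ai_prep_search"]:
        return "ai_prep_search"
    return "manual_chunking"
```

Comparing (major, minor) tuples rather than raw strings avoids the lexicographic trap where "9.1" would sort after "18.2".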

Co-authored-by: Isaac
3 participants