diff --git a/databricks-skills/databricks-ai-functions/1-task-functions.md b/databricks-skills/databricks-ai-functions/1-task-functions.md
index a94159ea..53afece7 100644
--- a/databricks-skills/databricks-ai-functions/1-task-functions.md
+++ b/databricks-skills/databricks-ai-functions/1-task-functions.md
@@ -1,6 +1,6 @@
 # Task-Specific AI Functions — Full Reference
 
-These functions require no model endpoint selection. They call pre-configured Foundation Model APIs optimized for each task. All require DBR 15.1+ (15.4 ML LTS for batch); `ai_parse_document` requires DBR 17.1+.
+These functions require no model endpoint selection. They call pre-configured Foundation Model APIs optimized for each task. All require DBR 15.1+ (15.4 ML LTS for batch); `ai_parse_document` requires DBR 17.3+; `ai_prep_search` requires DBR 18.2+ (serverless env v3+).
 
 ---
 
@@ -34,10 +34,13 @@ df.withColumn("sentiment", expr("ai_analyze_sentiment(review_text)")).display()
   - With descriptions: `'{"billing_error": "Payment, invoice, or refund issues", "product_defect": "Any malfunction or bug"}'` (descriptions up to 1000 chars each)
   - 2–500 labels, each 1–100 characters
 - `options`: optional MAP\<STRING, STRING\>:
+  - `version`: `"2.0"` (recommended) or `"1.0"` for backward compatibility
   - `instructions`: task context to improve accuracy (max 20,000 chars)
   - `multilabel`: `"true"` to return multiple matching labels (default `"false"`)
 
-Returns VARIANT. Returns `NULL` if content is `NULL`.
+Returns VARIANT `{"response": ["label", ...], "error_message": null}`. Returns `NULL` if content is `NULL`.
+
+**Constraints:** total input + labels context capped at **128,000 tokens**; not available on Databricks SQL Classic.
 
 ```sql
 -- simple labels
@@ -91,12 +94,15 @@ df.withColumn(
       "line_items": {"type": "array", "items": {"type": "object", "properties": {...}}}
     }
     ```
-  - Supported types: `string`, `integer`, `number`, `boolean`, `enum`
-  - Max 128 fields, 7 nesting levels, 500 enum values
+  - Supported types: `string`, `integer`, `number`, `boolean`, `enum`, `object` (with `properties`), `array` (with `items`)
+  - Max 128 fields, field names up to 150 chars, 7 nesting levels, 500 enum values, 128,000 token total context
 - `options`: optional MAP\<STRING, STRING\>:
+  - `version`: `"2.1"` (recommended) / `"2.0"` / `"1.0"`
   - `instructions`: task context to improve extraction quality (max 20,000 chars)
+  - `enableCitations`: `"true"` to attach `citation_ids` to each extracted field
+  - `enableConfidenceScores`: `"true"` to attach a per-field `confidence_score` (0–1)
 
-Returns VARIANT `{"response": {...}, "error_message": null}`. Returns `NULL` if content is `NULL`.
+Returns VARIANT `{"response": {...}, "error_message": null}`. With `enableCitations` or `enableConfidenceScores` enabled, each scalar field becomes an object `{"value": ..., "citation_ids": [...], "confidence_score": 0.x}` and a `metadata` block is added at the top level. Returns `NULL` if content is `NULL`.
 
 ```sql
 -- simple schema
@@ -129,6 +135,32 @@ df = df.withColumn(
 df.display()
 ```
 
+### Version 2.1: citations and confidence scores
+
+Set `version` to `'2.1'` in the `options` map and enable `enableCitations` and/or `enableConfidenceScores` to attach provenance and reliability metadata to each extracted field. This is useful for review queues and downstream filtering by confidence.
+
+```sql
+SELECT ai_extract(
+    document_text,
+    '["invoice_id", "vendor_name", "total_amount"]',
+    MAP(
+        'version', '2.1',
+        'enableCitations', 'true',
+        'enableConfidenceScores', 'true'
+    )
+) AS extracted
+FROM parsed_documents;
+
+-- Each scalar field is now an object: {value, citation_ids, confidence_score}
+-- Access (assumes the result of the query above was saved to a table named extracted_invoices):
+SELECT
+    extracted:response:invoice_id:value::STRING       AS invoice_id,
+    extracted:response:invoice_id:confidence_score::DOUBLE AS invoice_id_conf,
+    extracted:response:total_amount:value::DOUBLE     AS total_amount,
+    extracted:metadata                                AS metadata
+FROM extracted_invoices;
+```
+
 ---
 
 ## `ai_fix_grammar`
@@ -300,38 +332,44 @@ df.withColumn(
 
 **Docs:** https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_parse_document
 
-**Requires:** DBR 17.1+
+**Requires:** DBR 17.3+ (serverless env v3+ for VARIANT). Region-restricted — check feature availability.
 
 **Syntax:** `ai_parse_document(content [, options])`
 - `content`: BINARY — document content loaded from `read_files()` or `spark.read.format("binaryFile")`
 - `options`: MAP\<STRING, STRING\> (optional) — parsing configuration
 
-**Supported formats:** PDF, JPG/JPEG, PNG, DOCX, PPTX
+**Supported formats:** PDF, JPG/JPEG, PNG, TIFF/TIF, DOC/DOCX, PPT/PPTX
 
-Returns a VARIANT with pages, elements (text paragraphs, tables, figures, headers, footers), bounding boxes, and error metadata.
+Returns a VARIANT with pages, elements (text, tables, figures, titles, captions, section headers, page headers/footers, page numbers, footnotes), bounding boxes, confidence scores, and error metadata.
 
 **Options:**
 
 | Key | Values | Description |
 |-----|--------|-------------|
 | `version` | `'2.0'` | Output schema version |
-| `imageOutputPath` | Volume path | Save rendered page images |
-| `descriptionElementTypes` | `''`, `'figure'`, `'*'` | AI-generated descriptions (default: `'*'` for all) |
+| `imageOutputPath` | Volume path | Save rendered page images to a UC Volume |
+| `descriptionElementTypes` | `''`, `'figure'`, `'*'` | AI-generated descriptions (default: `'*'` for all). Set to `''` to disable and reduce cost. |
+| `pageRange` | e.g. `'1,3,5-10'` | Restrict parsing to a subset of pages (1-indexed) |
 
-**Output schema:**
+**Output schema (v2.0):**
 
 ```
 document
-├── pages[]          -- page id, image_uri
+├── pages[]          -- id, image_uri
 └── elements[]       -- extracted content
-    ├── type         -- "text", "table", "figure", etc.
+    ├── id           -- per-element id
+    ├── type         -- text | table | figure | title | caption | section_header
+    │                --   | page_header | page_footer | page_number | footnote
     ├── content      -- extracted text
-    ├── bbox         -- bounding box coordinates
-    └── description  -- AI-generated description
-metadata             -- file info, schema version
-error_status[]       -- errors per page (if any)
+    ├── confidence   -- DOUBLE 0–1
+    ├── bbox         -- [{coord:[...], page_id}]
+    └── description  -- AI-generated description (figures/tables when enabled)
+metadata             -- id, version, file_metadata
+error_status[]       -- {error_message, page_id} per page (if any)
 ```
 
+**Limits:** max 500 pages per document, max 100 MB file size.
+
 ```sql
 -- Parse and extract text blocks
 SELECT
@@ -353,6 +391,13 @@ SELECT ai_parse_document(
     )
 ) AS parsed
 FROM read_files('/Volumes/catalog/schema/volume/invoices/', format => 'binaryFile');
+
+-- Parse only specific pages (cheaper for large documents)
+SELECT ai_parse_document(
+    content,
+    map('version', '2.0', 'pageRange', '1,3,5-10')
+) AS parsed
+FROM read_files('/Volumes/catalog/schema/volume/contracts/', format => 'binaryFile');
 ```
 
 ```python
@@ -380,6 +425,106 @@ df.display()
 ```
 
 **Limitations:**
+- Max 500 pages per document, max 100 MB file size
 - Processing is slow for dense or low-resolution documents
-- Suboptimal for non-Latin alphabets and digitally signed PDFs
+- Suboptimal for non-Latin alphabets (e.g., Japanese, Korean in images) and digitally signed PDFs
 - Custom models not supported — always uses the built-in parsing model
+
+---
+
+## `ai_prep_search`
+
+**Docs:** https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_prep_search
+
+**Requires:** DBR **18.2+** (serverless env v3+ for VARIANT support).
+
+Takes the VARIANT output of `ai_parse_document` and returns RAG-ready chunks. The function performs:
+1. **Semantic chunking** — splits document content into retrieval-sized chunks at natural boundaries (paragraphs, sections, tables).
+2. **Context enrichment** — adds document title, section headers, page numbers, and captions to each chunk's embedding text so Vector Search can match on context, not just chunk content.
+
+Use this instead of hand-rolled `variant_get` + `explode` + `md5` chunking when feeding `ai_parse_document` output into Databricks Vector Search.
+
+**Syntax:** `ai_prep_search(parsed [, options])`
+- `parsed`: VARIANT — output from `ai_parse_document`
+- `options`: optional MAP\<STRING, STRING\>:
+  - `version`: output schema version (major.minor; minor upgrades are backward-compatible)
+
+**Returns:** VARIANT with chunks ready for Vector Search:
+
+```
+chunks[]
+├── chunk_id            -- unique id (document_id + position) — use as PK
+├── chunk_position      -- ordinal within the document
+├── chunk_to_retrieve   -- raw chunk text (return this to the LLM)
+└── chunk_to_embed      -- context-enriched text (use this as the embedding source)
+pages[]                 -- page index + image_uri (when imageOutputPath was set on ai_parse_document)
+source_uri              -- input document path
+error_status            -- per-page error info, if any
+```
+
+**End-to-end SQL — parse, prep, persist for Vector Search:**
+
+```sql
+CREATE OR REPLACE TABLE catalog.schema.parsed_chunks AS
+WITH parsed AS (
+  SELECT
+    path AS source_path,
+    ai_parse_document(content) AS parsed
+  FROM read_files('/Volumes/catalog/schema/docs/', format => 'binaryFile')
+),
+prepped AS (
+  SELECT
+    source_path,
+    ai_prep_search(parsed) AS prep
+  FROM parsed
+),
+chunks AS (
+  SELECT
+    source_path,
+    explode(variant_get(prep, '$.chunks', 'ARRAY<VARIANT>')) AS chunk
+  FROM prepped
+)
+SELECT
+  variant_get(chunk, '$.chunk_id',          'STRING') AS chunk_id,
+  variant_get(chunk, '$.chunk_position',    'INT')    AS chunk_position,
+  variant_get(chunk, '$.chunk_to_retrieve', 'STRING') AS chunk_to_retrieve,
+  variant_get(chunk, '$.chunk_to_embed',    'STRING') AS chunk_to_embed,
+  source_path,
+  current_timestamp() AS prepped_at
+FROM chunks;
+
+-- Enable CDF so Vector Search Delta Sync picks up incremental changes
+ALTER TABLE catalog.schema.parsed_chunks
+SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
+```
+
+**PySpark equivalent:**
+
+```python
+from pyspark.sql.functions import expr, current_timestamp
+
+chunks_df = (
+    spark.read.format("binaryFile")
+    .load("/Volumes/catalog/schema/docs/")
+    .withColumn("parsed", expr("ai_parse_document(content)"))
+    .withColumn("prep",   expr("ai_prep_search(parsed)"))
+    .withColumn("chunk",  expr("explode(variant_get(prep, '$.chunks', 'ARRAY<VARIANT>'))"))
+    .selectExpr(
+        "variant_get(chunk, '$.chunk_id',          'STRING') AS chunk_id",
+        "variant_get(chunk, '$.chunk_position',    'INT')    AS chunk_position",
+        "variant_get(chunk, '$.chunk_to_retrieve', 'STRING') AS chunk_to_retrieve",
+        "variant_get(chunk, '$.chunk_to_embed',    'STRING') AS chunk_to_embed",
+        "path AS source_path",
+    )
+    .withColumn("prepped_at", current_timestamp())
+)
+
+chunks_df.write.format("delta").mode("overwrite").saveAsTable("catalog.schema.parsed_chunks")
+```
+
+**Vector Search integration:** Point a Delta Sync index at this table with `chunk_to_embed` as the embedding source column and `chunk_id` as the primary key. The `chunk_to_retrieve` column is what you return to the LLM at query time.
+
+**Tips:**
+- Pass `imageOutputPath` on the upstream `ai_parse_document` call if you want page image URIs available in the prep output for multimodal retrieval.
+- Schema is versioned major.minor; minor upgrades are backward-compatible — pin `version` only if you need to lock schema across deployments.
+- On DBR < 18.2, fall back to manual chunking via `variant_get` + `explode` on `ai_parse_document` output.
diff --git a/databricks-skills/databricks-ai-functions/4-document-processing-pipeline.md b/databricks-skills/databricks-ai-functions/4-document-processing-pipeline.md
index 37498f49..4ff06da7 100644
--- a/databricks-skills/databricks-ai-functions/4-document-processing-pipeline.md
+++ b/databricks-skills/databricks-ai-functions/4-document-processing-pipeline.md
@@ -13,6 +13,7 @@ When processing documents with AI Functions, apply this order of preference for
 | Stage | Preferred function | Use `ai_query` when... |
 |---|---|---|
 | Parse binary docs (PDF, DOCX, images) | `ai_parse_document` | Need image-level reasoning |
+| Prepare parsed docs for Vector Search | `ai_prep_search` (DBR 18.2+) | Need a custom chunking strategy or DBR < 18.2 |
 | Extract fields from text (flat or nested) | `ai_extract` | Schema exceeds 128 fields or 7 nesting levels |
 | Classify document type or status | `ai_classify` | More than 20 categories |
 | Score item similarity / matching | `ai_similarity` | Need cross-document reasoning |
@@ -263,37 +264,39 @@ def processing_errors():
 
 ---
 
-## Custom RAG Pipeline — Parse → Chunk → Index → Query
+## Custom RAG Pipeline — Parse → Prep → Index → Query
 
-When the goal is retrieval-augmented generation rather than field extraction, use this pipeline to parse documents, chunk them into a Delta table, and index with Vector Search.
+When the goal is retrieval-augmented generation rather than field extraction, use this pipeline: `ai_parse_document` to read binary files, `ai_prep_search` to chunk and enrich, then a Vector Search Delta Sync index over the result.
 
-### Step 1 — Parse and Chunk into a Delta Table
+**Requires DBR 18.2+** (for `ai_prep_search`). On older runtimes, see the legacy manual-chunking fallback at the end of this section.
 
-`ai_parse_document` returns a VARIANT. Use `variant_get` with an explicit `ARRAY<VARIANT>` cast before calling `explode`, since `explode()` does not accept raw VARIANT values.
+### Step 1 — Parse and Prep into a Delta Table
+
+`ai_prep_search` takes the VARIANT output of `ai_parse_document` and returns RAG-ready chunks (`chunk_id`, `chunk_position`, `chunk_to_retrieve`, `chunk_to_embed`). The `chunk_to_embed` column is enriched with document title, section headers, page numbers, and captions — Vector Search will match on that context, not just chunk text.
 
 ```sql
 CREATE OR REPLACE TABLE catalog.schema.parsed_chunks AS
 WITH parsed AS (
   SELECT
-    path,
-    ai_parse_document(content) AS doc
+    path AS source_path,
+    ai_parse_document(content) AS parsed
   FROM read_files('/Volumes/catalog/schema/volume/docs/', format => 'binaryFile')
 ),
-elements AS (
+prepped AS (
   SELECT
-    path,
-    explode(variant_get(doc, '$.document.elements', 'ARRAY<VARIANT>')) AS element
+    source_path,
+    ai_prep_search(parsed) AS prep
   FROM parsed
 )
 SELECT
-  md5(concat(path, variant_get(element, '$.content', 'STRING'))) AS chunk_id,
-  path AS source_path,
-  variant_get(element, '$.content', 'STRING') AS content,
-  variant_get(element, '$.type', 'STRING') AS element_type,
-  current_timestamp() AS parsed_at
-FROM elements
-WHERE variant_get(element, '$.content', 'STRING') IS NOT NULL
-  AND length(trim(variant_get(element, '$.content', 'STRING'))) > 10;
+  variant_get(chunk, '$.chunk_id',          'STRING') AS chunk_id,
+  variant_get(chunk, '$.chunk_position',    'INT')    AS chunk_position,
+  variant_get(chunk, '$.chunk_to_retrieve', 'STRING') AS chunk_to_retrieve,
+  variant_get(chunk, '$.chunk_to_embed',    'STRING') AS chunk_to_embed,
+  source_path,
+  current_timestamp() AS prepped_at
+FROM prepped
+LATERAL VIEW explode(variant_get(prep, '$.chunks', 'ARRAY<VARIANT>')) c AS chunk;
 ```
 
 ### Step 1a (Production) — Incremental Parsing with Structured Streaming
@@ -310,7 +313,7 @@ from pyspark.sql.functions import col, current_timestamp, expr
 
 files_df = (
     spark.readStream.format("binaryFile")
-    .option("pathGlobFilter", "*.{pdf,jpg,jpeg,png}")
+    .option("pathGlobFilter", "*.{pdf,jpg,jpeg,png,tif,tiff,doc,docx,ppt,pptx}")
     .option("recursiveFileLookup", "true")
     .load("/Volumes/catalog/schema/volume/docs/")
 )
@@ -338,49 +341,46 @@ parsed_df = (
 )
 ```
 
-**Stage 2 — Extract text from parsed VARIANT (streaming):**
+**Stage 2 — Prep chunks for Vector Search (streaming):**
 
-Uses `transform()` to extract element content from the VARIANT array, and `try_cast` for safe access. Error rows are preserved but flagged.
+`ai_prep_search` handles semantic chunking and context enrichment in one call. Rows whose `error_status` is non-null are filtered out before prepping, so they do not reach the chunk table.
 
 ```python
-from pyspark.sql.functions import col, concat_ws, expr, lit, when
+from pyspark.sql.functions import expr
 
 parsed_stream = spark.readStream.format("delta").table("catalog.schema.parsed_documents_raw")
 
-text_df = (
+prepped_df = (
     parsed_stream
-    .withColumn("text",
-        when(
-            expr("try_cast(parsed:error_status AS STRING)").isNotNull(), lit(None)
-        ).otherwise(
-            concat_ws("\n\n", expr("""
-                transform(
-                    try_cast(parsed:document:elements AS ARRAY),
-                    element -> try_cast(element:content AS STRING)
-                )
-            """))
-        )
+    .filter(expr("try_cast(parsed:error_status AS STRING) IS NULL"))
+    .withColumn("prep", expr("ai_prep_search(parsed)"))
+    .withColumn("chunk", expr("explode(variant_get(prep, '$.chunks', 'ARRAY<VARIANT>'))"))
+    .selectExpr(
+        "variant_get(chunk, '$.chunk_id',          'STRING') AS chunk_id",
+        "variant_get(chunk, '$.chunk_position',    'INT')    AS chunk_position",
+        "variant_get(chunk, '$.chunk_to_retrieve', 'STRING') AS chunk_to_retrieve",
+        "variant_get(chunk, '$.chunk_to_embed',    'STRING') AS chunk_to_embed",
+        "path AS source_path",
+        "parsed_at",
     )
-    .withColumn("error_status", expr("try_cast(parsed:error_status AS STRING)"))
-    .select("path", "text", "error_status", "parsed_at")
 )
 
 (
-    text_df.writeStream.format("delta")
+    prepped_df.writeStream.format("delta")
     .outputMode("append")
-    .option("checkpointLocation", "/Volumes/catalog/schema/checkpoints/02_text")
+    .option("checkpointLocation", "/Volumes/catalog/schema/checkpoints/02_prep")
     .option("mergeSchema", "true")
     .trigger(availableNow=True)
-    .toTable("catalog.schema.parsed_documents_text")
+    .toTable("catalog.schema.parsed_chunks")
 )
 ```
 
 Key techniques:
 - **`repartition` by file hash** — parallelizes `ai_parse_document` across workers
 - **`trigger(availableNow=True)`** — processes all pending files then stops (batch-like)
-- **Checkpoints** — exactly-once guarantee; no re-parsing on re-runs
-- **`transform()` + `try_cast`** — safer than `explode` + `variant_get` for text extraction
-- **Separate stages with independent checkpoints** — parse and text extraction can fail/retry independently
+- **Checkpoints** — exactly-once guarantee; no re-parsing or re-prepping on re-runs
+- **`ai_prep_search`** — handles semantic chunking + context enrichment; no manual `transform()` + length filters needed
+- **Separate stages with independent checkpoints** — parse and prep can fail/retry independently
 
 ### Step 1b — Enable Change Data Feed
 
@@ -393,16 +393,47 @@ SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
 
 ### Step 2 — Create a Vector Search Index and Query It
 
-Use the **[databricks-vector-search](../databricks-vector-search/SKILL.md)** skill to create a Delta Sync index on the chunked table and query it. Ensure CDF is enabled first (Step 1b above).
+Use the **[databricks-vector-search](../databricks-vector-search/SKILL.md)** skill to create a Delta Sync index on `catalog.schema.parsed_chunks`:
+- **Primary key:** `chunk_id`
+- **Embedding source column:** `chunk_to_embed` (context-enriched text — do not embed `chunk_to_retrieve`)
+- **Return column at query time:** `chunk_to_retrieve` (raw chunk text for the LLM)
+
+Ensure CDF is enabled first (Step 1b above).
 
 ### RAG-Specific Issues
 
 | Issue | Solution |
 |-------|----------|
-| `explode()` fails with VARIANT | `explode()` requires ARRAY, not VARIANT. Use `variant_get(doc, '$.document.elements', 'ARRAY<VARIANT>')` to cast before exploding |
-| Short/noisy chunks | Filter with `length(trim(...)) > 10` — parsing produces tiny fragments (page numbers, headers) that pollute the index |
-| Re-parsing unchanged documents | Use Structured Streaming with checkpoints — see Step 1a above |
-| Region not supported | US/EU regions only, or enable cross-geography routing |
+| `ai_prep_search` not found | Requires DBR **18.2+** (serverless env v3+). Use the legacy manual-chunking fallback below on older runtimes. |
+| Embedding the wrong column | Embed `chunk_to_embed` (enriched with doc title/headers/page), **not** `chunk_to_retrieve`. Return `chunk_to_retrieve` to the LLM. |
+| `explode()` fails with VARIANT | `explode()` requires ARRAY, not VARIANT. Cast first: `explode(variant_get(prep, '$.chunks', 'ARRAY<VARIANT>'))` |
+| Region not supported | `ai_parse_document` / `ai_prep_search` are region-restricted. Check feature availability or enable cross-geography routing. |
+
+### Legacy fallback — DBR < 18.2 (no `ai_prep_search`)
+
+If `ai_prep_search` is unavailable, fall back to manual chunking on `ai_parse_document` element output. Filter out short/noisy fragments (page numbers, headers) that pollute the index:
+
+```sql
+CREATE OR REPLACE TABLE catalog.schema.parsed_chunks AS
+WITH parsed AS (
+  SELECT path, ai_parse_document(content) AS doc
+  FROM read_files('/Volumes/catalog/schema/volume/docs/', format => 'binaryFile')
+),
+elements AS (
+  SELECT path, explode(variant_get(doc, '$.document.elements', 'ARRAY<VARIANT>')) AS element
+  FROM parsed
+)
+SELECT
+  md5(concat(path, variant_get(element, '$.content', 'STRING'))) AS chunk_id,
+  path AS source_path,
+  variant_get(element, '$.content', 'STRING') AS chunk_to_retrieve,
+  variant_get(element, '$.content', 'STRING') AS chunk_to_embed,  -- no enrichment
+  variant_get(element, '$.type', 'STRING') AS element_type,
+  current_timestamp() AS parsed_at
+FROM elements
+WHERE variant_get(element, '$.content', 'STRING') IS NOT NULL
+  AND length(trim(variant_get(element, '$.content', 'STRING'))) > 10;
+```
 
 ---
 
@@ -497,7 +528,7 @@ with mlflow.start_run():
 
 ## Tips
 
-1. **Parse first, enrich second** — always run `ai_parse_document` as the first stage. Feed its text output to task-specific functions; never pass raw binary to `ai_query`.
+1. **Parse → prep → enrich** — run `ai_parse_document` first. For RAG, pipe its VARIANT into `ai_prep_search` (DBR 18.2+) for chunking + context enrichment. For extraction, feed its text output to task-specific functions. Never pass raw binary to `ai_query`.
 2. **Flat or nested fields → `ai_extract`; deeply nested JSON exceeding 7 levels → `ai_query`** — pass `MAP('version', '2.0')` and access results through `:response`.
 3. **`failOnError => false` is mandatory in batch** — write errors to a sidecar `_errors` table rather than crashing the pipeline.
 4. **Truncate before sending to `ai_query`** — use `LEFT(text, 6000)` or chunk long documents to stay within context window limits.
diff --git a/databricks-skills/databricks-ai-functions/SKILL.md b/databricks-skills/databricks-ai-functions/SKILL.md
index 19897d8a..2cbf2b70 100644
--- a/databricks-skills/databricks-ai-functions/SKILL.md
+++ b/databricks-skills/databricks-ai-functions/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: databricks-ai-functions
-description: "Use Databricks built-in AI Functions (ai_classify, ai_extract, ai_summarize, ai_mask, ai_translate, ai_fix_grammar, ai_gen, ai_analyze_sentiment, ai_similarity, ai_parse_document, ai_query, ai_forecast) to add AI capabilities directly to SQL and PySpark pipelines without managing model endpoints. Also covers document parsing and building custom RAG pipelines (parse → chunk → index → query)."
+description: "Use Databricks built-in AI Functions (ai_classify, ai_extract, ai_summarize, ai_mask, ai_translate, ai_fix_grammar, ai_gen, ai_analyze_sentiment, ai_similarity, ai_parse_document, ai_prep_search, ai_query, ai_forecast) to add AI capabilities directly to SQL and PySpark pipelines without managing model endpoints. Also covers document parsing and building custom RAG pipelines (parse → prep_search → index → query)."
 ---
 
 # Databricks AI Functions
@@ -16,7 +16,7 @@ There are three categories:
 
 | Category | Functions | Use when |
 |---|---|---|
-| **Task-specific** | `ai_analyze_sentiment`, `ai_classify`, `ai_extract`, `ai_fix_grammar`, `ai_gen`, `ai_mask`, `ai_similarity`, `ai_summarize`, `ai_translate`, `ai_parse_document` | The task is well-defined — prefer these always |
+| **Task-specific** | `ai_analyze_sentiment`, `ai_classify`, `ai_extract`, `ai_fix_grammar`, `ai_gen`, `ai_mask`, `ai_similarity`, `ai_summarize`, `ai_translate`, `ai_parse_document`, `ai_prep_search` | The task is well-defined — prefer these always |
 | **General-purpose** | `ai_query` | Complex nested JSON, custom endpoints, multimodal — **last resort only** |
 | **Table-valued** | `ai_forecast` | Time series forecasting |
 
@@ -34,13 +34,15 @@ There are three categories:
 | Free-form generation | `ai_gen` | Need structured JSON output |
 | Semantic similarity | `ai_similarity` | Never |
 | PDF / document parsing | `ai_parse_document` | Need image-level reasoning |
+| RAG chunk preparation (from `ai_parse_document`) | `ai_prep_search` (semantic chunking + context enrichment) | Need custom chunking strategy or DBR < 18.2 |
 | Complex JSON / reasoning | — | **This is the intended use case for `ai_query`** |
 
 ## Prerequisites
 
 - Databricks SQL warehouse (**not Classic**) or cluster with DBR **15.1+**
 - DBR **15.4 ML LTS** recommended for batch workloads
-- DBR **17.1+** required for `ai_parse_document`
+- DBR **17.3+** required for `ai_parse_document`
+- DBR **18.2+** required for `ai_prep_search` (serverless requires environment version **3+** for VARIANT support)
 - `ai_forecast` requires a **Pro or Serverless** SQL warehouse
 - Workspace in a supported AWS/Azure region for batch AI inference
 - Models run under Apache 2.0 or LLAMA 3.3 Community License — customers are responsible for compliance
@@ -176,7 +178,7 @@ FROM ai_forecast(
 
 ## Reference Files
 
-- [1-task-functions.md](1-task-functions.md) — Full syntax, parameters, SQL + PySpark examples for all 9 task-specific functions (`ai_analyze_sentiment`, `ai_classify`, `ai_extract`, `ai_fix_grammar`, `ai_gen`, `ai_mask`, `ai_similarity`, `ai_summarize`, `ai_translate`) and `ai_parse_document`
+- [1-task-functions.md](1-task-functions.md) — Full syntax, parameters, SQL + PySpark examples for the task-specific functions (`ai_analyze_sentiment`, `ai_classify`, `ai_extract`, `ai_fix_grammar`, `ai_gen`, `ai_mask`, `ai_similarity`, `ai_summarize`, `ai_translate`), plus `ai_parse_document` and `ai_prep_search`
 - [2-ai-query.md](2-ai-query.md) — `ai_query` complete reference: all parameters, structured output with `responseFormat`, multimodal `files =>`, UDF patterns, and error handling
 - [3-ai-forecast.md](3-ai-forecast.md) — `ai_forecast` parameters, single-metric, multi-group, multi-metric, and confidence interval patterns
 - [4-document-processing-pipeline.md](4-document-processing-pipeline.md) — End-to-end batch document processing pipeline using AI Functions in a Lakeflow Declarative Pipeline; includes `config.yml` centralization, function selection logic, custom RAG pipeline (parse → chunk → Vector Search), and DSPy/LangChain guidance for near-real-time variants
@@ -185,7 +187,8 @@ FROM ai_forecast(
 
 | Issue | Solution |
 |---|---|
-| `ai_parse_document` not found | Requires DBR **17.1+**. Check cluster runtime. |
+| `ai_parse_document` not found | Requires DBR **17.3+**. Check cluster runtime. |
+| `ai_prep_search` not found | Requires DBR **18.2+** (serverless env v3+). On older runtimes, fall back to manual chunking via `variant_get` + `explode` on `ai_parse_document` output. |
 | `ai_forecast` fails | Requires **Pro or Serverless** SQL warehouse — not available on Classic or Starter. |
 | All functions return NULL | Input column is NULL. Filter with `WHERE col IS NOT NULL` before calling. |
 | `ai_translate` fails for a language | Supported: English, German, French, Italian, Portuguese, Hindi, Spanish, Thai. Use `ai_query` with a multilingual model for others. |