diff --git a/docs/custom_ops.md b/docs/custom_ops.md
index cacbaba47..7bf8b026e 100644
--- a/docs/custom_ops.md
+++ b/docs/custom_ops.md
@@ -531,674 +531,621 @@ expect(node, inputs=[inputs],
-## String operators
-
-### StringEqual
+### CLIPTokenizer
-StringEqual details
+CLIPTokenizer details
-Compares two strings and returns true if they are equal and false if not.
+Byte-pair-encoding (BPE) tokenizer matching the CLIP text encoder from HuggingFace/OpenAI. Converts input strings into token id sequences.
-#### Inputs
+#### Attributes
-***x: tensor(string)***
+***vocab: string***
-The first string input
+JSON vocabulary mapping tokens to ids (contents of `vocab.json`).
-***x: tensor(string)***
+***merges: string***
-The second string input
+Merge rules (contents of `merges.txt`).
+
+***padding_length: int64_t*** (default is -1)
+
+If positive, the output is right-padded (or truncated) to this length. When -1, the output is padded to the maximum sequence length in the batch; the operator still returns a dense tensor with a dynamic second dimension.
+
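The padding behavior can be sketched in a few lines of Python (illustrative only; `pad_id` is a placeholder here, the real pad id comes from the tokenizer's vocabulary):

```python
def pad_batch(batch_ids, padding_length=-1, pad_id=0):
    # padding_length > 0: right-pad or truncate every row to that length.
    # padding_length == -1: pad to the longest row in the batch.
    target = padding_length if padding_length > 0 else max(len(r) for r in batch_ids)
    return [(row + [pad_id] * target)[:target] for row in batch_ids]
```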
+#### Inputs
+
+***input: tensor(string)***
+
+1D string tensor containing the input texts.
#### Outputs
-***z: tensor(boolean)***
+***input_ids: tensor(int64)***
+
+Tensor of token ids.
-String with replacements.
+***attention_mask: tensor(int64)*** (optional)
+
+Mask with the same shape as `input_ids` (1 for real tokens, 0 for padding).
+
+***offset_mapping: tensor(int64)*** (optional)
+
+If requested, per-token `(begin, end)` byte offsets into the corresponding input string.
-### StringHash
+### RobertaTokenizer
-StringHash details
+RobertaTokenizer details
+BPE tokenizer compatible with HuggingFace's RoBERTa tokenizer. Uses the same attributes and I/O contract as `CLIPTokenizer`.
-Hashes the input string based on the number of buckets
+#### Attributes
-#### Inputs
+***vocab: string***
-***input: tensor(string)***
+JSON vocabulary (contents of `vocab.json`).
-The string to hash
+***merges: string***
-***num_buckets: tensor(int64)***
+BPE merge rules (contents of `merges.txt`).
-The number of buckets (must be equal to 1?)
+***padding_length: int64_t*** (default is -1)
-#### Outputs
+See `CLIPTokenizer`.
-***name: tensor(int64)***
+#### Inputs
-The hash value of the string
+***input: tensor(string)***
-
+1D string tensor of input texts.
+#### Outputs
-### StringHashFast
+***input_ids: tensor(int64)***
-
-StringHashFast details
+Token ids.
+***attention_mask: tensor(int64)*** (optional)
-A faster implementation of StringHash.
+Attention mask, same shape as `input_ids`.
-
+***offset_mapping: tensor(int64)*** (optional)
+Per-token byte offsets into each input string.
-### StringJoin
+
-
-StringJoin details
+### SpmTokenizer
-Join an array of strings
+
+SpmTokenizer details
-#### Inputs
+SentencePiece-compatible tokenizer built on top of the shared BPE kernel. Produces tokens equivalent to HuggingFace's "fast" SentencePiece tokenizers (e.g. Llama, T5, XLM-RoBERTa).
-***input_X: tensor(string)***
+#### Attributes
-The input array of strings
+***vocab: string***
-***input_sep: tensor(string)***
+JSON vocabulary produced from a SentencePiece model.
-The string separator for the resulting joing
+***merges: string***
-***input_axis: tensor(int64)***
+SentencePiece merge rules.
-The axis along which to joing
+***padding_length: int64_t*** (default is -1)
-#### Outputs
+See `CLIPTokenizer`.
-***out: tensor(string)***
+#### Inputs
-The resulting joined string
+***input: tensor(string)***
-#### Examples
+1D string tensor of inputs.
+#### Outputs
-```bash
+***input_ids: tensor(int64)***
-input_X = [["a", "b", "c"], ["aa", "bb", ""]]
-input_sep=";"
-input_axis = 1
+Tensor of token ids.
-out = ["a;b;c", "aa;bb;"]
+***attention_mask: tensor(int64)*** (optional)
-input_axis = 0
+Attention mask with the same shape as `input_ids`.
-out = ['a;aa', 'b;bb', 'c;']
+***offset_mapping: tensor(int64)*** (optional)
+Per-token byte offsets.
-### StringRegexReplace
+### HfBertTokenizer
-StringRegexReplace details
+HfBertTokenizer details
+HuggingFace-compatible BERT WordPiece tokenizer. Behaves like `BertTokenizer`'s `__call__` method but with a smaller attribute surface. Produces ids, attention masks and token type ids in a single op.
-String replacement based on [Re2-format](https://github.com/google/re2/wiki/Syntax) regular expressions.
-
-#### Inputs
+#### Attributes
-***text: tensor(string)***
+***vocab_file: string***
-String tensor to extract slices from.
+Contents of `vocab.txt`.
-***pattern: tensor(string)***
+***do_lower_case: int64_t*** (default is 1)
-Pattern of the regular expression.
+Lowercase inputs before tokenization.
-***rewrite: tensor(string)***
+***strip_accents: int64_t*** (default is 0)
-Replacement.
+Strip accents as part of normalization.
-#### Attributes
+#### Inputs
-***global_replace: int64*** (default is 1)
+***input: tensor(string)***
-Replace all strings matching the pattern or the first one.
+1D string tensor containing the texts to tokenize.
#### Outputs
-***output: tensor(string)***
+***input_ids: tensor(int64)***
-String with replacements.
+Token ids.
-#### Examples
+***attention_mask: tensor(int64)***
-```python
+Attention mask, same shape as `input_ids`.
-node = onnx.helper.make_node(
- 'StringRegexReplace',
- inputs=['text', 'pattern', 'rewrite'],
- outputs=['y'],
-)
+***token_type_ids: tensor(int64)*** (optional)
-text = np.array([['def myfunc():'], ['def dummy():']])
-pattern = np.array([r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'])
-rewrite = np.array([r'static PyObject* py_\1(void) {'])
-y = [['static PyObject* py_myfunc(void) {'],
- ['static PyObject* py_dummy(void) {']]
+Segment ids. All zero for single-sentence input.
-expect(node, inputs=[text, pattern, rewrite], outputs=[y],
- name='test_string_regex_replace')
-```
+***offset_mapping: tensor(int64)*** (optional)
-
+Per-token `(begin, end)` byte offsets into the corresponding input string.
-### StringECMARegexReplace
+
-
-StringECMARegexReplace details
-String replacement based on [ECMA-format](https://en.cppreference.com/w/cpp/regex/ecmascript) regular expressions.
+### HfJsonTokenizer
-#### Inputs
+
+HfJsonTokenizer details
-***text: tensor(string)***
+Loads a HuggingFace `tokenizer.json` directly and dispatches to the appropriate kernel (BPE or Unigram). Matches HuggingFace fast tokenizers at inference time.
-String tensor to extract slices from.
+#### Attributes
-***pattern: tensor(string)***
+***tokenizer_config: string***
-Pattern of the regular expression.
+Contents of `tokenizer.json` (and optionally `tokenizer_config.json`).
-***rewrite: tensor(string)***
+***tokenizer_vocab: string*** (optional)
-Replacement.
+Additional vocabulary data when the tokenizer uses an external vocab file.
-#### Attributes
+#### Inputs
-***global_replace: int64*** (default is 1)
+***input: tensor(string)***
-Replace all strings matching the pattern or the first one.
+1D string tensor of inputs.
+#### Outputs
-***ignore_case: int64*** (default is 0)
+***input_ids: tensor(int64)***
-Replace
+Token ids.
-#### Outputs
+***attention_mask: tensor(int64)*** (optional)
-***output: tensor(string)***
+Attention mask matching `input_ids`.
-String with replacements.
+***offset_mapping: tensor(int64)*** (optional)
-#### Examples
+Per-token byte offsets.
+
-```python
-node = onnx.helper.make_node(
- 'StringRegexReplace',
- inputs=['text', 'pattern', 'rewrite'],
- outputs=['y'],
-)
+### SentencepieceDecoder
-text = np.array([['def myfunc():'], ['def dummy():']])
-pattern = np.array([r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'])
-rewrite = np.array([r'static PyObject* py_$1(void) {'])
-y = [['static PyObject* py_myfunc(void) {'],
- ['static PyObject* py_dummy(void) {']]
+
+SentencepieceDecoder details
-expect(node, inputs=[text, pattern, rewrite], outputs=[y],
- name='test_string_regex_replace')
-```
+Decodes a sequence of SentencePiece ids back into a string.
-
+#### Attributes
+***model: string***
+Serialized SentencePiece model (`*.model`).
-### StringSplit
+#### Inputs
-TODO
+***ids: tensor(int64)***
-### StringUpper
+1D or 2D tensor of ids. When 2D the leading dimension must be 1.
-TODO
+***fairseq: tensor(bool)*** (optional)
-### StringLower
+Scalar flag. When true the `fairseq` vocab-id offset convention is applied.
-TODO
+#### Outputs
-### StringLength
+***output: tensor(string)***
-
-StringECMARegexReplace details
+1D tensor with one string element containing the decoded text.
-Get the length of each string element in input tensor. Similar to the function `len("abcde"")` in python.
+
-#### Inputs
-***data: tensor(string)***
+### BpeDecoder
-String tensor to get length of its each string element.
+
+BpeDecoder details
-#### Outputs
+Decodes BPE token ids (GPT-2 / CLIP / RoBERTa style) back into text.
-***output: tensor(int64)***
+#### Attributes
-Data length tensor.
+***id_vocab: string***
-#### Examples
+Newline-separated token strings indexed by id.
+***byte_decoder: string***
-```python
+Reverse byte-to-unicode mapping used by GPT-2 BPE encoders.
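The GPT-2 byte-to-unicode mapping that `byte_decoder` inverts is well known; a sketch of how it is constructed (the op consumes the serialized reverse of this table):

```python
def bytes_to_unicode():
    # GPT-2's reversible byte -> unicode mapping: printable bytes map to
    # themselves, the rest are shifted into unused code points above 255.
    bs = (list(range(ord('!'), ord('~') + 1))
          + list(range(0xA1, 0xAC + 1))
          + list(range(0xAE, 0xFF + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# byte_decoder is the inverse table: unicode char -> original byte value
byte_decoder = {c: b for b, c in bytes_to_unicode().items()}
```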
-node = onnx.helper.make_node(
- 'StringLength',
- inputs=['x'],
- outputs=['y']
-)
+***added_tokens: string*** (optional)
-x = ["abcdef", "hijkl"]
-y = np.array([len(x[0]), len(x[1])], dtype=np.int64)
+Extra tokens appended to the base vocabulary.
+***all_special_ids: string*** (optional)
-expect(node, inputs=[x], outputs=[y],
- name='test_string_length')
-```
-
-
-### StringConcat
+Comma-separated list of special token ids.
-
-StringConcat details
+***skip_special_tokens: int64_t*** (default is 0)
-Concat the corresponding string in the two string tensor. Two input tensors should have the same dimension.
+When 1, ids in `all_special_ids` are skipped during decoding.
-```python
- output = []
- shape = input1.shape
- input1 = input1.flatten()
- input2 = input2.flatten()
- for i in range(len(input1)):
- output.append(input1[i] + input2[i])
- output = np.array(output).reshape(shape)
-```
+***en_normalization: int64_t*** (default is 0)
-#### Inputs
+Apply a minimal English-oriented post-processing step (e.g. undo leading-space markers).
-***input_1: tensor(string)***
+***whitespace_token: string*** (optional)
+***bos_token: string*** (optional)
+***eos_token: string*** (optional)
+***unk_token: string*** (optional)
-The first string tensor.
+Optional overrides for well-known special tokens.
-***input_2: tensor(string)***
+#### Inputs
-The second string tensor.
+***ids: tensor(int64)***
+1D or 2D tensor of token ids.
#### Outputs
***output: tensor(string)***
-The result.
-
-#### Examples
-
+Decoded string tensor.
-```python
+
-node = onnx.helper.make_node(
- 'StringConcat',
- inputs=['x', 'y'],
- outputs=['result'],
-)
-x = np.array(["abcd", "efgh"])
-y = np.array(["wxyz", "stuv"])
-result = np.array([x[0] + y[0], x[1] + y[1]])
+### TrieTokenizer
-expect(node, inputs=[x, y], outputs=[result],
- name='test_string_concat')
-```
+
+TrieTokenizer details
-
+Trie-based longest-match tokenizer used by RWKV-style models.
-### StringRegexSplitWithOffsets
+#### Attributes
-
-StringRegexSplitWithOffsets details
+***vocab: string***
-Splits string based on regular expressions.
+Newline-separated vocab where each line has the form `index token length`. `token` is a Python-repr-encoded byte string.
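A vocab line can be parsed like this (a sketch; assumes the single-space-separated `index token length` layout described above):

```python
import ast

def parse_trie_vocab_line(line):
    # "<index> <token-repr> <length>": the id, a Python-repr-encoded token,
    # and the token's byte length, used here as a consistency check.
    idx_str, rest = line.split(' ', 1)
    token_repr, length_str = rest.rsplit(' ', 1)
    token = ast.literal_eval(token_repr)
    if isinstance(token, str):
        token = token.encode('utf-8')
    assert len(token) == int(length_str)
    return int(idx_str), token
```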
#### Inputs
-***text: tensor(string)***
+***input: tensor(string)***
-String tensor to extract slices from.
+1D string tensor of inputs.
-***delim_regex_pattern: tensor(string)***
+#### Outputs
-Splitting attern of the regular expression.
+***output: tensor(int64)***
-***keep_delim_regex_pattern: tensor(string)***
+2D right-padded tensor of token ids; padding uses id `0`.
-By default, delimiters are not included in the split string results. Delimiters may be included by specifying a regex pattern keep_delim_regex_pattern.
+
-#### Outputs
-***words: tensor(string)*** Tensor of words.
+### TrieDetokenizer
-***offsets: tensor(int64)*** 2D tensor with 3 columns:
-sentence index, position of the first character, position of the last one (excluded)
+
+TrieDetokenizer details
-***row_indices: tensor(int64)*** Indices of every first token of input sentences.
-`row_indices[i+1] - row_indices[i]` is the number of tokens in input `i`.
-These are updates row indices given as inputs or new ones if the second input is empty.
+Inverse of `TrieTokenizer`. Converts 1D or 2D id tensors back to strings using the same trie vocabulary.
+#### Attributes
-#### Examples
+***vocab: string***
+Same vocabulary format as `TrieTokenizer`.
-```python
+#### Inputs
-node = onnx.helper.make_node(
- 'StringRegexSplit',
- inputs=['text', 'pattern', 'rewrite'],
- outputs=['y', 'begin_end', 'indices'],
-)
+***ids: tensor(int64)***
-text = np.array(["hello there"])
-pattern = np.array([r'\s'])
-rewrite = np.array([r'\s'])
-y = np.array(["hello", " ", "there"])
-z1 = np.array([[0, 0, 5],
- [0, 5, 6],
- [0, 6, 11]], dtype=np.int64)
-z2 = np.array([0, 2], dtype=np.int64)
-
-expect(node, inputs=[text, pattern, rewrite], outputs=[y, z1, z2],
- name='test_string_regex_replace')
-```
+1D or 2D tensor of token ids.
-
+#### Outputs
+***output: tensor(string)***
-### StringECMARegexSplitWithOffsets
+Decoded text, one string per row.
-TODO
+
-### VectorToString
+
+### BlingFireSentenceBreaker
-VectorToString details
+BlingFireSentenceBreaker details
-VectorToString is the contrary operation to the `StringToVector` , they share same format of mapping table:
+Segments an input string into sentences using a compiled [BlingFire](https://github.com/microsoft/BlingFire) model.
- \t\s\s...
+#### Attributes
-Unmapped vector will output the value of the attribute `unk`.
+***model: string***
-Example:
+Raw bytes of the compiled BlingFire sentence-breaking model (`*.bin`).
-*Attributes:*
+***max_sentence: int64_t*** (default is -1)
-- `map`:
- ```
- a 0 0 1 2
- b 0 1 2 3
- d 0 1 3 4
- ```
+If positive, limits the number of returned sentences.
-- `unk`: "unknown_word"
+#### Inputs
-*Inputs:*
-- data: [[0,0,1,2],[0,1,3,4],[0,0,0,0]]
+***input: tensor(string)***
-*Ouputs:*
-- output: ["a", "d", "unknown_word" ]
+Scalar input string.
-#### Attributes
+#### Outputs
-***mapping_file_name***
+***output: tensor(string)***
-the formative mapping table
+1D tensor of sentences.
-***unmapping_value***
+
-the result returned when a vector aren't found in the map
+
+## String operators
+
+### StringEqual
+
+
+StringEqual details
+
+Compares two string tensors elementwise and returns true where they are equal.
#### Inputs
-***data: tensor(T)***
+***x: tensor(string)***
-Input tensor
+The first string input.
+
+***y: tensor(string)***
+
+The second string input. Must have the same shape as `x` (or be broadcastable).
#### Outputs
-***output: tensor(string)***
+***z: tensor(bool)***
-The mapping result of the input
+Boolean tensor with the same shape as the broadcasted inputs; `true` where the inputs are equal.
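The semantics match NumPy's elementwise string comparison, shown here as a reference rather than the op itself:

```python
import numpy as np

x = np.array(["apple", "pear"])
y = np.array(["apple", "peach"])
z = x == y  # elementwise comparison -> boolean tensor
```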
-#### Type Constraints
-***T:tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(bool)***
+
-Constrain input and output types to numerical tensors.
+### StringToHashBucket
-#### Examples
+
+StringToHashBucket details
+Hashes each input string into one of `num_buckets` buckets using the internal FarmHash-like 64-bit hash implementation.
-```python
-mapping_table = \
- """
- a 0 0 1 2
- b 0 1 2 3
- d 0 1 3 4
- """
+#### Inputs
-node = onnx.helper.make_node(
- 'VectorToString',
- inputs=['x'],
- outputs=['y'],
- map=mapping_table,
- unk="unknown_word"
-)
+***input: tensor(string)***
+The input string tensor to hash.
-x = np.array([[0,0,1,2],[0,1,3,4],[0,0,0,0]], type=np.int64)
-y = ["a", "d", "unknown_word"]
+***num_buckets: tensor(int64)***
+Scalar number of hash buckets. Must be greater than 0.
+
+#### Outputs
+
+***output: tensor(int64)***
+
+Tensor of the same shape as `input` containing the hash-bucket index for each input string. Each value lies in the range `[0, num_buckets)`.
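A reference sketch of the contract. The stand-in hash below is SHA-256-based, so the bucket values will differ from the op's FarmHash-like output, but the range guarantee is the same:

```python
import hashlib

def string_to_hash_bucket(strings, num_buckets):
    # Deterministic stand-in hash: take the first 8 bytes of SHA-256 as a
    # 64-bit integer, then reduce into [0, num_buckets).
    out = []
    for s in strings:
        h = int.from_bytes(hashlib.sha256(s.encode('utf-8')).digest()[:8], 'big')
        out.append(h % num_buckets)
    return out
```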
-expect(node, inputs=[x], outputs=[y],
- name='test_vector_to_string')
-```
-### StringToVector
+### StringToHashBucketFast
-StringToVector details
+StringToHashBucketFast details
-StringToVector will map each string element in the input to the corresponding vector according to the mapping file. The mapping file is a utf-8 encoding text file in tsv format:
+A faster variant of `StringToHashBucket` that uses `std::hash` internally. Hash values are not stable across platforms or compilers, so the op is intended for stateless in-process hashing rather than persisted lookup tables.
- \t\s\s...
+#### Inputs
-Unmapped string will output the value of the attribute `unmapping_value`.
+***input: tensor(string)***
-Example:
+The strings to hash.
-*Attributes:*
+***num_buckets: tensor(int64)***
-- `mapping_file_name`: vocabulary.txt
- ```
- a 0 0 1 2
- b 0 1 2 3
- d 0 1 3 4
- ```
-
-- `unmapping_value`: [0 0 0 0]
+Scalar number of hash buckets. Must be greater than 0.
-*Inputs:*
-- data: ["a", "d", "e"]
+#### Outputs
-*Ouputs:*
-- output: [[0,0,1,2],[0,1,3,4],[0,0,0,0]]
+***output: tensor(int64)***
-#### Attributes
+The hashed values, with the same shape as `input`.
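An in-process sketch; like the op's `std::hash`, Python's built-in `hash` for strings is salted per process, which illustrates why these buckets must not be persisted:

```python
def string_to_hash_bucket_fast(strings, num_buckets):
    # hash() is randomized across processes (PYTHONHASHSEED), mirroring the
    # platform-dependence of std::hash: results are valid only within one run.
    return [hash(s) % num_buckets for s in strings]
```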
-***mapping_file_name:string***
+
-The name of your string to vector mapping file.
-***unmapping_value:list(int)***
+### StringJoin
-Mapping result for unmapped string
+
+StringJoin details
+
+
+Joins the strings of the input tensor along an axis, inserting a separator between elements.
#### Inputs
-***data: tensor(string)***
+***input_X: tensor(string)***
-Input tensor
+The input array of strings
-#### Outputs
+***input_sep: tensor(string)***
-***output: tensor(T)***
+The string separator inserted between joined elements.
-The mapping result of the input
+***input_axis: tensor(int64)***
-#### Type Constraints
-***T:tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(bool)***
+The axis along which to join.
-Constrain input and output types to numerical tensors.
+#### Outputs
-#### Examples
+***out: tensor(string)***
+The tensor of joined strings.
-```python
-# what's in vocabulary.txt
+#### Examples
-mapping_table = \
-"""
-a 0 0 1 2
-b 0 1 2 3
-d 0 1 3 4
-"""
-node = onnx.helper.make_node(
- 'StringToVector',
- inputs=['x'],
- outputs=['y'],
- mapping_table=mapping_table,
- unmapping_value=[0,0,0,0]
-)
+```python
+input_X = [["a", "b", "c"], ["aa", "bb", ""]]
+input_sep=";"
+input_axis = 1
-x = ["a", "d", "e"]
-y = np.array([[0,0,1,2],[0,1,3,4],[0,0,0,0]], type=np.int64)
+out = ["a;b;c", "aa;bb;"]
+input_axis = 0
-expect(node, inputs=[x], outputs=[y],
- name='test_string_to_vector')
-```
+out = ['a;aa', 'b;bb', 'c;']
+```
-
+
-### StringSlice
+### StringRegexReplace
-StringSlice details
+StringRegexReplace details
-Do the slice operation to each string element in input tensor. Similar to string slice in python
-```python
-a = "abcdef"
-b = a[1:2]
-c = a[3:1:-1]
-```
+String replacement based on [Re2-format](https://github.com/google/re2/wiki/Syntax) regular expressions.
#### Inputs
-***data: tensor(string)***
+***text: tensor(string)***
String tensor to extract slices from.
-***starts: tensor(int64/int32)***
+***pattern: tensor(string)***
-The tensor of starting indices of corresponding string in data, which has same dimension of data.
+Pattern of the regular expression.
-***ends: tensor(int64/int32)***
+***rewrite: tensor(string)***
-The tensor of ending indices of corresponding string in data, which has same dimension of data.
+Replacement.
-***steps(optional): tensor(int64/int32)***
+#### Attributes
-The tensor of slice step of corresponding string in data, which has same dimension of data.If steps is empty tensor, we will use default value 1 for each string
+***global_replace: int64*** (default is 1)
+
+Replace all strings matching the pattern or the first one.
#### Outputs
***output: tensor(string)***
-Sliced data tensor.
+String with replacements.
#### Examples
-
```python
node = onnx.helper.make_node(
- 'StringSlice',
- inputs=['x', 'starts', 'ends', 'steps'],
+ 'StringRegexReplace',
+ inputs=['text', 'pattern', 'rewrite'],
outputs=['y'],
)
-x = np.array(["abcdef", "hijkl"])
-y = np.array([x[0][1:3:1], x[1][3:1:-1]])
-starts = np.array([1, 3], dtype=np.int64)
-ends = np.array([3, 1], dtype=np.int64)
-axes = np.array([0, 1], dtype=np.int64)
-steps = np.array([1, 1], dtype=np.int64)
+text = np.array([['def myfunc():'], ['def dummy():']])
+pattern = np.array([r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'])
+rewrite = np.array([r'static PyObject* py_\1(void) {'])
+y = [['static PyObject* py_myfunc(void) {'],
+ ['static PyObject* py_dummy(void) {']]
-expect(node, inputs=[x, starts, ends, axes, steps], outputs=[y],
- name='test_string_slice')
+expect(node, inputs=[text, pattern, rewrite], outputs=[y],
+ name='test_string_regex_replace')
```
-
-### MaskedFill
+### StringECMARegexReplace
-MaskedFill details
+StringECMARegexReplace details
+String replacement based on [ECMA-format](https://en.cppreference.com/w/cpp/regex/ecmascript) regular expressions.
-Fills elements of self tensor with value where mask is True. The operator is similar with [`Tensor.masked_fill_`](https://pytorch.org/docs/stable/generated/torch.Tensor.masked_fill_.html#torch.Tensor.masked_fill_) in pytorch.
+#### Inputs
+***text: tensor(string)***
-#### Inputs
+String tensor to extract slices from.
-***value: tensor(string)***
+***pattern: tensor(string)***
-The value to fill in with, currently we only support string type and vector&scalar dimension.
+Pattern of the regular expression.
-***mask: tensor(bool)***
+***rewrite: tensor(string)***
+
+Replacement.
+
+#### Attributes
+
+***global_replace: int64*** (default is 1)
+
+Replace all strings matching the pattern or the first one.
-The boolean mask, the dimension of mask tensor should be same with value.
+
+***ignore_case: int64*** (default is 0)
+
+Whether to perform case-insensitive ECMAScript regular expression matching.
#### Outputs
***output: tensor(string)***
-The filled output of input tensor.
-
+String with replacements.
#### Examples
@@ -1206,59 +1153,1427 @@ The filled output of input tensor.
```python
node = onnx.helper.make_node(
- 'MaskedFill',
- inputs=['value', 'mask'],
- outputs=['output']
+ 'StringECMARegexReplace',
+ inputs=['text', 'pattern', 'rewrite'],
+ outputs=['y'],
)
+text = np.array([['def myfunc():'], ['def dummy():']])
+pattern = np.array([r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'])
+rewrite = np.array([r'static PyObject* py_$1(void) {'])
+y = [['static PyObject* py_myfunc(void) {'],
+ ['static PyObject* py_dummy(void) {']]
-value = np.array(["a", "b", "c", "d"])
-mask = np.array([True, False, True, False], dtype=bool)
-output = np.array(["a", "c"])
-
-
-expect(node, inputs=[value, mask], outputs=[output],
- name='test_masked_fill')
+expect(node, inputs=[text, pattern, rewrite], outputs=[y],
+ name='test_string_regex_replace')
```
+
-### StringRaggedTensorToDense
-TODO
+### StringSplit
-### StringMapping
+
+StringSplit details
-TODO
+Splits each string in the input by a separator, producing a ragged (sparse) representation of the resulting tokens.
-## Math operators
+#### Inputs
+***input: tensor(string)***
-### Inverse
+1D string tensor to split.
-TODO
+***sep: tensor(string)***
-### NegPos
+Scalar string separator used to split each element of `input`. If empty, the string is split on whitespace.
-TODO
+***skip_empty: tensor(bool)***
-### SegmentExtraction
+Scalar boolean. When true, empty substrings are removed from the output.
-TODO
+#### Outputs
+
+***indices: tensor(int64)***
+
+2D tensor of shape `[N, 2]` containing `(row, col)` coordinates of each output token in the ragged representation.
+
+***values: tensor(string)***
+
+1D tensor of `N` tokens produced by splitting, in row-major order.
+
+***shape: tensor(int64)***
+
+2-element tensor describing the dense shape `[num_rows, max_row_width]` of the ragged tensor.
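The ragged output can be sketched in pure Python (a reference for the I/O contract, not the implementation):

```python
def string_split(strings, sep, skip_empty=False):
    # Returns (indices, values, shape) in the ragged/sparse layout:
    # indices are (row, col) pairs, values are tokens in row-major order.
    indices, values = [], []
    max_cols = 0
    for row, s in enumerate(strings):
        parts = s.split(sep) if sep else s.split()
        if skip_empty:
            parts = [p for p in parts if p]
        for col, tok in enumerate(parts):
            indices.append((row, col))
            values.append(tok)
        max_cols = max(max_cols, len(parts))
    return indices, values, (len(strings), max_cols)
```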
+
+
+
+### StringUpper
+
+
+StringUpper details
+
+Converts every ASCII character in each string of the input tensor to uppercase using `std::toupper`. This operator is ASCII-only; non-ASCII bytes are passed through unchanged. For full Unicode case folding, pre-process inputs accordingly or use `StringLower` as a reference for Unicode handling.
+
+#### Inputs
+
+***input: tensor(string)***
+
+String tensor of arbitrary shape.
+
+#### Outputs
+
+***output: tensor(string)***
+
+String tensor of the same shape as `input` with uppercased strings.
+
+
+
+### StringLower
+
+
+StringLower details
+
+Converts each string in the input tensor to lowercase. Unlike `StringUpper`, this operator decodes input bytes as UTF-8 and performs Unicode-aware case folding on each code point before re-encoding the result.
+
+#### Inputs
+
+***input: tensor(string)***
+
+String tensor of arbitrary shape.
+
+#### Outputs
+
+***output: tensor(string)***
+
+String tensor of the same shape as `input` with lowercased strings.
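The ASCII-only vs Unicode-aware distinction between the two case ops can be illustrated in Python (`ascii_upper` is a sketch of `StringUpper`'s behavior; `str.lower` mirrors `StringLower`'s Unicode-aware folding):

```python
def ascii_upper(s):
    # ASCII-only uppercase, like StringUpper: non-ASCII characters pass through.
    return ''.join(chr(ord(c) - 32) if 'a' <= c <= 'z' else c for c in s)
```

For example, `ascii_upper("résumé")` leaves the accented characters untouched, while `"RÉSUMÉ".lower()` folds them to `"résumé"`.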
+
+
+
+### StringStrip
+
+
+StringStrip details
+
+Removes leading and trailing whitespace characters from every string in the input tensor. Similar to `str.strip()` in Python.
+
+#### Inputs
+
+***input: tensor(string)***
+
+String tensor of arbitrary shape.
+
+#### Outputs
+
+***output: tensor(string)***
+
+String tensor of the same shape as `input` with whitespace stripped.
+
+
+
+### StringLength
+
+
+StringLength details
+
+Get the length of each string element in the input tensor. Similar to the function `len("abcde")` in Python.
+
+#### Inputs
+
+***input: tensor(string)***
+
+String tensor to get the length of each string element from.
+
+#### Outputs
+
+***output: tensor(int64)***
+
+Tensor of string lengths, with the same shape as the input.
+
+#### Examples
+
+
+```python
+
+node = onnx.helper.make_node(
+ 'StringLength',
+ inputs=['x'],
+ outputs=['y']
+)
+
+x = ["abcdef", "hijkl"]
+y = np.array([len(x[0]), len(x[1])], dtype=np.int64)
+
+
+expect(node, inputs=[x], outputs=[y],
+ name='test_string_length')
+```
+
+
+### StringConcat
+
+
+StringConcat details
+
+Concatenates the corresponding strings of the two input tensors elementwise. The two input tensors must have the same shape.
+
+```python
+ output = []
+ shape = input1.shape
+ input1 = input1.flatten()
+ input2 = input2.flatten()
+ for i in range(len(input1)):
+ output.append(input1[i] + input2[i])
+ output = np.array(output).reshape(shape)
+```
+
+#### Inputs
+
+***input_1: tensor(string)***
+
+The first string tensor.
+
+***input_2: tensor(string)***
+
+The second string tensor.
+
+
+#### Outputs
+
+***output: tensor(string)***
+
+The elementwise-concatenated result, with the same shape as the inputs.
+
+#### Examples
+
+
+```python
+
+node = onnx.helper.make_node(
+ 'StringConcat',
+ inputs=['x', 'y'],
+ outputs=['result'],
+)
+
+x = np.array(["abcd", "efgh"])
+y = np.array(["wxyz", "stuv"])
+result = np.array([x[0] + y[0], x[1] + y[1]])
+
+expect(node, inputs=[x, y], outputs=[result],
+ name='test_string_concat')
+```
+
+
+
+### StringRegexSplitWithOffsets
+
+
+StringRegexSplitWithOffsets details
+
+Splits strings based on regular expressions (RE2 dialect) and reports the byte offsets of each produced token.
+
+#### Inputs
+
+***text: tensor(string)***
+
+String tensor to split.
+
+***delim_regex_pattern: tensor(string)***
+
+Splitting pattern of the regular expression.
+
+***keep_delim_regex_pattern: tensor(string)***
+
+By default, delimiters are not included in the split string results. Delimiters may be included by specifying a regex pattern via `keep_delim_regex_pattern`.
+
+#### Outputs
+
+***tokens: tensor(string)***
+
+1D tensor of tokens produced by splitting, in row-major order.
+
+***begin_offsets: tensor(int64)***
+
+1D tensor with the begin byte offset of each token in the corresponding input string.
+
+***end_offsets: tensor(int64)***
+
+1D tensor with the end byte offset (exclusive) of each token in the corresponding input string.
+
+***row_offsets: tensor(int64)***
+
+1D tensor of row offsets such that tokens of the i-th input string occupy `[row_offsets[i], row_offsets[i+1])` in `tokens`.
+
+#### Examples
+
+
+```python
+
+node = onnx.helper.make_node(
+ 'StringRegexSplitWithOffsets',
+ inputs=['text', 'pattern', 'keep_pattern'],
+ outputs=['tokens', 'begin_offsets', 'end_offsets', 'row_offsets'],
+)
+
+text = np.array(["hello there"])
+pattern = np.array([r'\s'])
+keep_pattern = np.array([""])
+tokens = np.array(["hello", "there"])
+begin_offsets = np.array([0, 6], dtype=np.int64)
+end_offsets = np.array([5, 11], dtype=np.int64)
+row_offsets = np.array([0, 2], dtype=np.int64)
+
+expect(node, inputs=[text, pattern, keep_pattern],
+ outputs=[tokens, begin_offsets, end_offsets, row_offsets],
+ name='test_string_regex_split_with_offsets')
+```
+
+
+
+
+### StringECMARegexSplitWithOffsets
+
+
+StringECMARegexSplitWithOffsets details
+
+Splits strings using a regular expression in the ECMAScript dialect and reports the byte offsets of every produced token. Provides the same functionality as `StringRegexSplitWithOffsets` but uses `std::regex` instead of `re2`, allowing ECMAScript regex features.
+
+#### Inputs
+
+***input: tensor(string)***
+
+String tensor to split.
+
+***pattern: tensor(string)***
+
+Scalar string containing the ECMAScript regex splitting pattern.
+
+***keep_pattern: tensor(string)***
+
+Scalar string. Delimiter matches that also match this pattern are preserved as tokens in the output. Pass an empty string to drop all delimiters.
+
+#### Attributes
+
+***ignore_case: int64_t*** (default is 0)
+
+When set to 1 the regex is matched case-insensitively.
+
+#### Outputs
+
+***tokens: tensor(string)***
+
+1D tensor containing the split tokens.
+
+***begin_offsets: tensor(int64)***
+
+1D tensor with the begin byte offset of each token in the corresponding input string.
+
+***end_offsets: tensor(int64)***
+
+1D tensor with the end byte offset (exclusive) of each token in the corresponding input string.
+
+***row_offsets: tensor(int64)***
+
+1D tensor of row offsets such that tokens of the i-th input string occupy `[row_offsets[i], row_offsets[i+1])` in `tokens`.
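A reference sketch of the splitting contract using Python's `re` module (the ECMAScript and Python regex dialects differ, so treat this as illustrative):

```python
import re

def regex_split_with_offsets(texts, pattern, keep_pattern=""):
    # Splits each text on `pattern`; delimiter matches that also fully match
    # `keep_pattern` are kept as tokens. Offsets index into the source string.
    tokens, begins, ends, row_offsets = [], [], [], [0]
    delim = re.compile(pattern)
    keep = re.compile(keep_pattern) if keep_pattern else None
    for text in texts:
        pos = 0
        for m in delim.finditer(text):
            if m.start() > pos:
                tokens.append(text[pos:m.start()])
                begins.append(pos)
                ends.append(m.start())
            if keep and keep.fullmatch(m.group()):
                tokens.append(m.group())
                begins.append(m.start())
                ends.append(m.end())
            pos = m.end()
        if pos < len(text):
            tokens.append(text[pos:])
            begins.append(pos)
            ends.append(len(text))
        row_offsets.append(len(tokens))
    return tokens, begins, ends, row_offsets
```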
+
+
+
+### VectorToString
+
+
+VectorToString details
+
+VectorToString is the inverse operation of `StringToVector`; the two share the same mapping-table format:
+
+    <string>\t<scalar_1>\s<scalar_2>\s...\s<scalar_n>
+
+A vector not found in the mapping table outputs the value of the attribute `unk`.
+
+Example:
+
+*Attributes:*
+
+- `map`:
+ ```
+ a 0 0 1 2
+ b 0 1 2 3
+ d 0 1 3 4
+ ```
+
+- `unk`: "unknown_word"
+
+*Inputs:*
+- data: [[0,0,1,2],[0,1,3,4],[0,0,0,0]]
+
+*Outputs:*
+- output: ["a", "d", "unknown_word"]
+
+#### Attributes
+
+***map: string***
+
+The mapping table, in the format described above.
+
+***unk: string***
+
+The string returned when an input vector is not found in the map.
+
+#### Inputs
+
+***data: tensor(T)***
+
+Input tensor
+
+#### Outputs
+
+***output: tensor(string)***
+
+The mapping result of the input
+
+#### Type Constraints
+***T:tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(bool)***
+
+Constrains the input type to numerical tensors.
+
+
+#### Examples
+
+
+```python
+mapping_table = \
+ """
+ a 0 0 1 2
+ b 0 1 2 3
+ d 0 1 3 4
+ """
+
+node = onnx.helper.make_node(
+ 'VectorToString',
+ inputs=['x'],
+ outputs=['y'],
+ map=mapping_table,
+ unk="unknown_word"
+)
+
+
+x = np.array([[0,0,1,2],[0,1,3,4],[0,0,0,0]], dtype=np.int64)
+y = ["a", "d", "unknown_word"]
+
+
+expect(node, inputs=[x], outputs=[y],
+ name='test_vector_to_string')
+```
+
+
+
+### StringToVector
+
+
+StringToVector details
+
+StringToVector maps each string element in the input to the corresponding vector according to the mapping file. The mapping file is a UTF-8 encoded text file in TSV format:
+
+    <string>\t<scalar_1>\s<scalar_2>\s...\s<scalar_n>
+
+A string not found in the mapping file outputs the value of the attribute `unmapping_value`.
+
+Example:
+
+*Attributes:*
+
+- `mapping_file_name`: vocabulary.txt
+ ```
+ a 0 0 1 2
+ b 0 1 2 3
+ d 0 1 3 4
+ ```
+
+- `unmapping_value`: [0 0 0 0]
+
+*Inputs:*
+- data: ["a", "d", "e"]
+
+*Outputs:*
+- output: [[0,0,1,2],[0,1,3,4],[0,0,0,0]]
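Parsing the TSV mapping format can be sketched as follows (assumes one `key\tvector` entry per line with space-separated scalars):

```python
def parse_mapping_table(text):
    # Each line: "<string>\t<scalar_1> <scalar_2> ... <scalar_n>"
    table = {}
    for line in text.strip().splitlines():
        key, _, vector = line.strip().partition('\t')
        table[key] = [int(v) for v in vector.split()]
    return table
```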
+
+#### Attributes
+
+***mapping_file_name: string***
+
+The name of the string-to-vector mapping file.
+
+***unmapping_value: list(int)***
+
+The vector returned for unmapped strings.
+
+#### Inputs
+
+***data: tensor(string)***
+
+Input tensor
+
+#### Outputs
+
+***output: tensor(T)***
+
+The mapping result of the input
+
+#### Type Constraints
+***T:tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(bool)***
+
+Constrain input and output types to numerical tensors.
+
+#### Examples
+
+
+```python
+# contents of vocabulary.txt:
+#   a 0 0 1 2
+#   b 0 1 2 3
+#   d 0 1 3 4
+
+node = onnx.helper.make_node(
+    'StringToVector',
+    inputs=['x'],
+    outputs=['y'],
+    mapping_file_name='vocabulary.txt',
+    unmapping_value=[0, 0, 0, 0]
+)
+
+
+x = ["a", "d", "e"]
+y = np.array([[0,0,1,2],[0,1,3,4],[0,0,0,0]], dtype=np.int64)
+
+
+expect(node, inputs=[x], outputs=[y],
+ name='test_string_to_vector')
+```
+
+
+
+
+
+### StringSlice
+
+
+StringSlice details
+
+Performs a slice operation on each string element of the input tensor, similar to Python string slicing:
+
+```python
+a = "abcdef"
+b = a[1:2]
+c = a[3:1:-1]
+```
+
+#### Inputs
+
+***data: tensor(string)***
+
+String tensor to extract slices from.
+
+***starts: tensor(int64/int32)***
+
+Tensor of per-string start indices; must have the same shape as `data`.
+
+***ends: tensor(int64/int32)***
+
+Tensor of per-string end indices; must have the same shape as `data`.
+
+***steps (optional): tensor(int64/int32)***
+
+Tensor of per-string slice steps; must have the same shape as `data`. If `steps` is an empty tensor, a default step of 1 is used for every string.
+
+#### Outputs
+
+***output: tensor(string)***
+
+Sliced data tensor.
+
+#### Examples
+
+
+```python
+node = onnx.helper.make_node(
+    'StringSlice',
+    inputs=['x', 'starts', 'ends', 'steps'],
+    outputs=['y'],
+)
+
+x = np.array(["abcdef", "hijkl"])
+starts = np.array([1, 3], dtype=np.int64)
+ends = np.array([3, 1], dtype=np.int64)
+steps = np.array([1, -1], dtype=np.int64)
+y = np.array([x[0][1:3:1], x[1][3:1:-1]])
+
+expect(node, inputs=[x, starts, ends, steps], outputs=[y],
+       name='test_string_slice')
+```
+
+
+
+
+### MaskedFill
+
+
+MaskedFill details
+
+
+Fills elements of the input tensor with `value` where `mask` is True. The operator is similar to [`Tensor.masked_fill_`](https://pytorch.org/docs/stable/generated/torch.Tensor.masked_fill_.html#torch.Tensor.masked_fill_) in PyTorch.
+
+
+#### Inputs
+
+***value: tensor(string)***
+
+The value to fill in with. Currently only the string type is supported, and `value` may be a scalar or a 1D vector.
+
+***mask: tensor(bool)***
+
+The boolean mask; it must have the same shape as `value`.
+
+#### Outputs
+
+***output: tensor(string)***
+
+The filled output tensor.
+
+
+#### Examples
+
+
+```python
+
+node = onnx.helper.make_node(
+ 'MaskedFill',
+ inputs=['value', 'mask'],
+ outputs=['output']
+)
+
+
+value = np.array(["a", "b", "c", "d"])
+mask = np.array([True, False, True, False], dtype=bool)
+output = np.array(["a", "c"])
+
+
+expect(node, inputs=[value, mask], outputs=[output],
+ name='test_masked_fill')
+```
+
+
+
+### StringRaggedTensorToDense
+
+
+StringRaggedTensorToDense details
+
+Converts a ragged string tensor to a dense 2D string tensor, padding shorter rows with a fill value.
+
+#### Inputs
+
+***row_splits: tensor(int64)***
+
+1D tensor with the starting position of each row in `values`. Row `i` contains `values[row_splits[i]:row_splits[i+1]]`.
+
+***values: tensor(string)***
+
+1D flat string tensor holding the concatenated row values.
+
+***default_value_shape: tensor(int64)***
+
+1D tensor describing the target dense shape. Only used to determine the number of columns.
+
+***default_value: tensor(string)***
+
+Scalar string used to pad rows that are shorter than the longest row.
+
+#### Outputs
+
+***output: tensor(string)***
+
+2D dense string tensor with padding applied.
+
+
+
+### StringMapping
+
+
+StringMapping details
+
+Maps each element of the input string tensor to another string using a user-supplied dictionary. Strings not found in the dictionary are passed through unchanged.
+
+#### Attributes
+
+***map: string***
+
+A string containing one mapping per line. Each line has the form `key\tvalue`, where key and value are separated by a tab character.
+
+#### Inputs
+
+***input: tensor(string)***
+
+Input string tensor of arbitrary shape.
+
+#### Outputs
+
+***output: tensor(string)***
+
+Output string tensor of the same shape as `input` after mapping.
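+
+The lookup semantics can be sketched in NumPy (an illustration only, not the actual kernel; the helper name `string_mapping` is ours):
+
+```python
+import numpy as np
+
+def string_mapping(inputs, map_str):
+    # Parse "key\tvalue" lines into a dictionary.
+    table = dict(line.split("\t", 1) for line in map_str.splitlines() if line)
+    # Strings absent from the table pass through unchanged.
+    flat = [table.get(s, s) for s in inputs.ravel()]
+    return np.array(flat).reshape(inputs.shape)
+
+mapped = string_mapping(np.array(["cat", "dog", "bird"]),
+                        "cat\tfeline\ndog\tcanine")
+# mapped == ["feline", "canine", "bird"]
+```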
+
+
+
+## Math operators
+
+
+### Inverse
+
+
+Inverse details
+
+Computes the matrix inverse of a 2D floating-point tensor.
+
+#### Inputs
+
+***input: tensor(float)***
+
+A 2D square matrix of shape `[N, N]`.
+
+#### Outputs
+
+***output: tensor(float)***
+
+The inverse of the input matrix, of shape `[N, N]`.
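+
+Functionally this matches NumPy's `np.linalg.inv`; a minimal sketch of the expected behaviour:
+
+```python
+import numpy as np
+
+# Reference behaviour: input @ output == identity (up to float tolerance).
+m = np.array([[4.0, 7.0],
+              [2.0, 6.0]], dtype=np.float32)
+m_inv = np.linalg.inv(m)
+assert np.allclose(m @ m_inv, np.eye(2), atol=1e-5)
+```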
+
+
+
+### NegPos
+
+
+NegPos details
+
+Splits an input tensor into its negative and positive parts. Equivalent to `min(x, 0)` and `max(x, 0)` returned separately.
+
+#### Inputs
+
+***input: tensor(float)***
+
+Input tensor of arbitrary shape.
+
+#### Outputs
+
+***neg: tensor(float)***
+
+Tensor with the same shape as `input`; contains `x` where `x < 0`, else `0`.
+
+***pos: tensor(float)***
+
+Tensor with the same shape as `input`; contains `x` where `x >= 0`, else `0`.
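+
+A NumPy sketch of the split (the two outputs always sum back to the input):
+
+```python
+import numpy as np
+
+def neg_pos(x):
+    # Negative part keeps x where x < 0; positive part keeps x where x >= 0.
+    return np.minimum(x, 0), np.maximum(x, 0)
+
+neg, pos = neg_pos(np.array([-1.5, 0.0, 2.0], dtype=np.float32))
+# neg == [-1.5, 0.0, 0.0], pos == [0.0, 0.0, 2.0]
+```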
+
+
+
+### SegmentExtraction
+
+
+SegmentExtraction details
+
+Extracts contiguous non-zero segments from a 1D integer input. For every maximal run of non-zero values, the start and end positions are returned.
+
+#### Inputs
+
+***input: tensor(int64)***
+
+1D input tensor.
+
+#### Outputs
+
+***position: tensor(int64)***
+
+2D tensor of shape `[num_segments, 2]` where each row is `(begin, end)` (end exclusive).
+
+***value: tensor(int64)***
+
+1D tensor of length `num_segments` with the value inside each segment.
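+
+A NumPy sketch of the run detection (illustration only; taking the segment's value from its first element is an assumption of this sketch):
+
+```python
+import numpy as np
+
+def segment_extraction(x):
+    # Mark run boundaries by diffing the zero-padded non-zero mask.
+    nz = np.concatenate(([0], (x != 0).astype(np.int64), [0]))
+    diff = np.diff(nz)
+    begins = np.where(diff == 1)[0]
+    ends = np.where(diff == -1)[0]        # exclusive end
+    position = np.stack([begins, ends], axis=1)
+    value = x[begins]                     # value at the start of each run
+    return position, value
+
+pos, val = segment_extraction(np.array([0, 2, 2, 0, 3, 0], dtype=np.int64))
+# pos == [[1, 3], [4, 5]], val == [2, 3]
+```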
+
+
### SegmentSum
-TODO
+
+SegmentSum details
+
+Computes sums along segments of the first axis of a tensor, similar to TensorFlow's `tf.math.segment_sum`.
+
+#### Inputs
+
+***data: tensor(float)***
+
+The values to reduce. The first dimension is the segment axis.
+
+***segment_ids: tensor(int64)***
+
+1D tensor with the same length as `data.shape[0]`. Must be non-decreasing.
+
+#### Outputs
+
+***output: tensor(float)***
+
+Tensor where `output[i]` is the sum of all rows of `data` whose corresponding `segment_ids` equal `i`.
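+
+The reduction can be sketched in NumPy with `np.add.at` (a reference illustration, not the kernel):
+
+```python
+import numpy as np
+
+def segment_sum(data, segment_ids):
+    # Accumulate rows of `data` by their (non-decreasing) segment id.
+    num_segments = int(segment_ids[-1]) + 1
+    out = np.zeros((num_segments,) + data.shape[1:], dtype=data.dtype)
+    np.add.at(out, segment_ids, data)
+    return out
+
+out = segment_sum(np.array([[1., 2.], [3., 4.], [5., 6.]], dtype=np.float32),
+                  np.array([0, 0, 1], dtype=np.int64))
+# out == [[4., 6.], [5., 6.]]
+```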
+
+
+
+### StftNorm
+
+
+StftNorm details
+
+Computes a short-time Fourier transform (STFT) of a 1D signal and returns the magnitude spectrogram. The implementation uses a Hann-style sliding window.
+
+#### Attributes
+
+***onesided: int64_t*** (default is 1)
+
+If 1, only the non-redundant positive-frequency half of the spectrum is returned (length `n_fft / 2 + 1`). If 0, the full spectrum is returned.
+
+#### Inputs
+
+***pcm: tensor(float)***
+
+1D audio signal.
+
+***n_fft: tensor(int64)***
+
+Scalar FFT size.
+
+***hop_length: tensor(int64)***
+
+Scalar hop length between consecutive frames.
+
+***window: tensor(float)***
+
+1D window function of length `frame_length`.
+
+***frame_length: tensor(int64)***
+
+Scalar frame length (must equal `n_fft`).
+
+#### Outputs
+
+***output: tensor(float)***
+
+3D tensor of shape `[1, num_frames, num_freq_bins]` containing the magnitude spectrogram.
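+
+The framing and magnitude math can be sketched in NumPy (an illustration of the shape contract only; the real kernel's padding and normalization details may differ):
+
+```python
+import numpy as np
+
+def stft_norm(pcm, n_fft, hop_length, window, frame_length):
+    # Frame the signal, apply the window, return one-sided FFT magnitudes.
+    num_frames = 1 + (len(pcm) - frame_length) // hop_length
+    frames = np.stack([pcm[i * hop_length:i * hop_length + frame_length]
+                       for i in range(num_frames)])
+    mag = np.abs(np.fft.rfft(frames * window, n=n_fft))
+    return mag[np.newaxis, ...].astype(np.float32)   # [1, frames, bins]
+
+pcm = np.sin(2 * np.pi * np.arange(1024) / 64).astype(np.float32)
+out = stft_norm(pcm, n_fft=256, hop_length=128,
+                window=np.hanning(256), frame_length=256)
+# out.shape == (1, 7, 129), i.e. [1, num_frames, n_fft // 2 + 1]
+```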
+
+
+
+### SplitSignalSegments
+
+
+SplitSignalSegments details
+
+Partitions an audio signal into segments of voiced/high-energy regions based on a simple short-time energy threshold.
+
+#### Inputs
+
+***input: tensor(float)***
+
+1D audio signal.
+
+***sr: tensor(int64)***
+
+Scalar sample rate in Hz.
+
+***frame_ms: tensor(int64)***
+
+Scalar analysis frame length in milliseconds.
+
+***hop_ms: tensor(int64)***
+
+Scalar hop length between analysis frames in milliseconds.
+
+***energy_threshold_db: tensor(float)***
+
+Scalar energy threshold in dBFS. Frames with average energy below this are treated as silence.
+
+#### Outputs
+
+***segments: tensor(int64)***
+
+2D tensor of shape `[num_segments, 2]` where each row contains the `(begin_sample, end_sample)` indices of a detected segment.
+
+
+
+### MergeSignalSegments
+
+
+MergeSignalSegments details
+
+Merges adjacent audio segments whose gap is shorter than a configurable threshold. Typically used as a post-processing step after `SplitSignalSegments`.
+
+#### Inputs
+
+***segments: tensor(int64)***
+
+2D tensor of shape `[N, 2]` with `(begin, end)` indices, as produced by `SplitSignalSegments`.
+
+***merge_gap_ms: tensor(int64)***
+
+Scalar gap threshold in milliseconds. Segments separated by less than this value are merged.
+
+#### Outputs
+
+***output: tensor(int64)***
+
+2D tensor of shape `[M, 2]` (M <= N) of the merged segment boundaries.
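+
+A sketch of the merging logic (for simplicity the gap threshold here is in samples; the operator's `merge_gap_ms` would first be converted using the sample rate):
+
+```python
+import numpy as np
+
+def merge_segments(segments, max_gap):
+    # Fold a (begin, end) row into the previous one when the gap is small.
+    merged = [list(segments[0])]
+    for begin, end in segments[1:]:
+        if begin - merged[-1][1] < max_gap:
+            merged[-1][1] = end
+        else:
+            merged.append([begin, end])
+    return np.array(merged, dtype=np.int64)
+
+out = merge_segments(np.array([[0, 100], [110, 200], [500, 600]]), max_gap=50)
+# out == [[0, 200], [500, 600]]
+```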
+
+
+
+## Tensor operators
+
+### RaggedTensorToSparse
+
+
+RaggedTensorToSparse details
+
+Converts a ragged tensor's row lengths to a COO-style sparse indexing representation.
+
+#### Inputs
+
+***n_element: tensor(int64)***
+
+1D tensor holding the number of elements in each row.
+
+#### Outputs
+
+***output_0: tensor(int64)***
+
+2D tensor of `(row, col)` indices for every element.
+
+***output_1: tensor(int64)***
+
+1D tensor of length 2 containing the dense shape `[num_rows, max_row_width]`.
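+
+A NumPy sketch of the conversion (illustration only; `ragged_to_sparse` is our name):
+
+```python
+import numpy as np
+
+def ragged_to_sparse(n_element):
+    # One (row, col) pair per element, plus the dense shape.
+    indices = np.array([(row, col)
+                        for row, n in enumerate(n_element)
+                        for col in range(n)], dtype=np.int64)
+    dense_shape = np.array([len(n_element), int(n_element.max())],
+                           dtype=np.int64)
+    return indices, dense_shape
+
+idx, shape = ragged_to_sparse(np.array([2, 1, 3], dtype=np.int64))
+# idx == [[0,0],[0,1],[1,0],[2,0],[2,1],[2,2]], shape == [3, 3]
+```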
+
+
+
+### RaggedTensorToDense
+
+
+RaggedTensorToDense details
+
+Converts a ragged int64 tensor to a dense 2D tensor, padding shorter rows with a configurable value.
+
+#### Attributes
+
+***missing_value: int64_t*** (default is -1)
+
+Value used to pad short rows.
+
+#### Inputs
+
+***input0: tensor(int64)***
+
+1D row-splits tensor indicating the start index of each row within `input3`.
+
+***input1: tensor(int64)***
+
+1D tensor of flat indices (unused by some consumers; reserved).
+
+***input2: tensor(int64)***
+
+1D tensor of length 2 describing the target dense shape `[num_rows, max_row_width]`.
+
+***input3: tensor(int64)***
+
+1D flat values tensor.
+
+#### Outputs
+
+***output: tensor(int64)***
+
+2D dense tensor with missing elements filled by `missing_value`.
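+
+A sketch of the padding logic using only the row splits and flat values (the shape and reserved inputs are ignored here; `ragged_to_dense` is our name):
+
+```python
+import numpy as np
+
+def ragged_to_dense(row_splits, values, missing_value=-1):
+    # Pad every ragged row out to the widest row with `missing_value`.
+    lengths = np.diff(row_splits)
+    out = np.full((len(lengths), int(lengths.max())), missing_value,
+                  dtype=np.int64)
+    for i in range(len(lengths)):
+        out[i, :lengths[i]] = values[row_splits[i]:row_splits[i + 1]]
+    return out
+
+dense = ragged_to_dense(np.array([0, 2, 3]), np.array([7, 8, 9]))
+# dense == [[7, 8], [9, -1]]
+```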
+
+
+
+## Audio operators
+
+### AudioDecoder
+
+
+AudioDecoder details
+
+Decodes a byte stream containing an encoded audio file (WAV, MP3, or FLAC) into a float PCM tensor. Optionally resamples the audio to a target sample rate.
+
+#### Attributes
+
+***downsampling_rate: int64_t*** (default is -1)
+
+Target sample rate to resample the decoded audio to. When -1, the native sample rate of the decoded stream is used.
+
+***stereo_to_mono: int64_t*** (default is 0)
+
+If set to 1, multi-channel audio is mixed down to a single mono channel.
+
+#### Inputs
+
+***input: tensor(uint8)***
+
+1D tensor of raw bytes representing the encoded audio file.
+
+***format: tensor(string)*** (optional)
+
+Scalar describing the container format. Accepted values: `"wav"`, `"mp3"`, `"flac"`. When absent the format is detected from the file header.
+
+#### Outputs
+
+***output: tensor(float)***
+
+2D tensor of shape `[1, num_samples]` with the decoded (and optionally resampled) PCM samples in the range `[-1, 1]`.
+
+
+
+## Vision operators
+
+### DecodeImage
+
+
+DecodeImage details
+
+Decodes an encoded image (PNG, JPEG, BMP, TIFF, …) into an `HxWx3` uint8 tensor.
+
+#### Attributes
+
+***color_space: string*** (default is "bgr")
+
+Color ordering of the output. Valid values are `"rgb"` and `"bgr"` (case-insensitive).
+
+#### Inputs
+
+***input: tensor(uint8)***
+
+1D tensor containing the raw encoded image bytes.
+
+#### Outputs
+
+***output: tensor(uint8)***
+
+3D tensor of shape `[H, W, 3]`.
+
+
+
+### EncodeImage
+
+
+EncodeImage details
+
+Encodes a 3-channel `HxWx3` uint8 image tensor to image bytes.
+
+#### Attributes
+
+***format: string*** (default is "png")
+
+Output image format. Valid values are `"png"` and `"jpg"`.
+
+***color_space: string*** (default is "bgr")
+
+Color space / channel order of the input image. Supported values are `"bgr"` and `"rgb"`.
+
+#### Inputs
+
+***input: tensor(uint8)***
+
+3D tensor of shape `[H, W, 3]`. The expected channel order depends on `color_space`: BGR for `"bgr"` and RGB for `"rgb"`.
+
+#### Outputs
+
+***output: tensor(uint8)***
-## Tensor operators
+1D tensor of encoded image bytes.
-### RaggedTensorToSparse
+
-TODO
+### DrawBoundingBoxes
-### RaggedTensorToDense
+
+DrawBoundingBoxes details
+
+Draws bounding boxes on a BGR image tensor.
+
+#### Attributes
+
+***thickness: int64_t*** (default is 4)
+
+Line thickness of the drawn rectangles, in pixels.
+
+***num_classes: int64_t*** (default is 10)
+
+Number of class colors to cycle through.
+
+***mode: string*** (default is "XYXY")
+
+Interpretation of the box coordinates. One of `"XYXY"`, `"XYWH"`, or `"CENTER_XYWH"`.
+
+***colour_by_classes: int64_t*** (default is 1)
+
+When 1, boxes of the same class share a colour. When 0, each box gets a unique colour from the palette.
+
+#### Inputs
+
+***image: tensor(uint8)***
+
+3D tensor of shape `[H, W, 3]` in BGR order.
+
+***boxes: tensor(float)***
+
+2D tensor of shape `[N, 6]`. Each row is `(class_id, score, x0, y0, x1, y1)` (or equivalent depending on `mode`).
+
+#### Outputs
+
+***output: tensor(uint8)***
+
+Image tensor with boxes drawn, same shape as `image`.
+
+
+
+### GaussianBlur
+
+
+GaussianBlur details
+
+Applies a 2D Gaussian blur to an image tensor using OpenCV's `cv::GaussianBlur`. The current kernel wraps the input buffer as a single `CV_32FC3` matrix, so inputs must have `N == 1` and `C == 3` channels.
+
+#### Inputs
+
+***input: tensor(float)***
+
+4D image tensor of shape `[1, H, W, 3]`.
+
+***ksize: tensor(int64)***
+
+1D tensor of length 2 specifying the kernel size `[kx, ky]` (odd positive integers).
+
+***sigma: tensor(double)***
+
+1D tensor of length 2 specifying the Gaussian standard deviation along X and Y.
+
+#### Outputs
+
+***output: tensor(float)***
+
+Blurred tensor with the same shape as `input`.
+
+
+
+### ImageDecoder
+
+
+ImageDecoder details
+
+Decodes raw encoded image bytes using OpenCV's `cv::imdecode`. Similar to `DecodeImage` but always returns BGR and does not expose a color-space attribute.
+
+#### Inputs
+
+***input: tensor(uint8)***
+
+1D tensor of encoded image bytes.
+
+#### Outputs
+
+***output: tensor(uint8)***
+
+3D tensor of shape `[H, W, C]` containing the decoded BGR image.
+
+
+
+### ImageReader
+
+
+ImageReader details
+
+Reads an image from a file path using OpenCV's `cv::imread` and returns the decoded tensor.
+
+#### Inputs
+
+***input: tensor(string)***
+
+1D string tensor of shape `[1]` containing the path of the image file to read.
+
+#### Outputs
+
+***output: tensor(uint8)***
+
+4D tensor of shape `[1, H, W, C]` containing the decoded BGR image.
+
+
+
+## CUDA operators
+
+The following operators execute on CUDA devices only. They are only registered when the library is built with `USE_CUDA`. Unless otherwise noted each op supports `float`, `float16` (`MFloat16`), and in some cases `bfloat16` (`BFloat16`).
+
+### FastGelu
+
+
+FastGelu details
+
+Fused CUDA kernel computing `gelu(x + bias)` using the fast tanh-based approximation.
+
+#### Inputs
+
+***input: tensor(T)***
+
+Input tensor of any shape. `T` is one of `float`, `float16`, `bfloat16`.
+
+***bias: tensor(T)*** (optional)
+
+Bias added elementwise before applying Gelu. Broadcast to the shape of `input`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Same shape as `input`.
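+
+A NumPy sketch of the tanh approximation the kernel fuses (`fast_gelu` is our name for the reference function):
+
+```python
+import numpy as np
+
+def fast_gelu(x, bias=None):
+    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
+    if bias is not None:
+        x = x + bias
+    c = np.sqrt(2.0 / np.pi)
+    return 0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x ** 3)))
+
+y = fast_gelu(np.array([-1.0, 0.0, 1.0], dtype=np.float32))
+# y[1] == 0.0; y[2] is close to 0.841
+```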
+
+
+
+### MulSigmoid
+
+
+MulSigmoid details
+
+Computes `x * sigmoid(x)` (the SiLU / Swish activation) in a single fused CUDA kernel.
+
+#### Inputs
+
+***input: tensor(T)***
+
+Input tensor. `T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Same shape as `input`.
+
+
+
+### MulMulSigmoid
+
+
+MulMulSigmoid details
+
+Computes `x * y * sigmoid(y)` in a single fused CUDA kernel. Tensors must have the same shape.
+
+#### Inputs
+
+***x: tensor(T)***, ***y: tensor(T)***
+
+`T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Tensor with the same shape as the inputs.
+
+
+
+### NegXPlus1
+
+
+NegXPlus1 details
-TODO
+Computes `1 - x` elementwise on CUDA.
+
+#### Inputs
+
+***input: tensor(T)***
+
+`T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Same shape as `input`.
+
+
+
+### ReplaceZero
+
+
+ReplaceZero details
+
+Replaces every zero element of the input with a scalar value.
+
+#### Attributes
+
+***by: float*** (default is 0.0)
+
+Replacement value for zero entries.
+
+#### Inputs
+
+***input: tensor(T)***
+
+`T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Same shape as `input`.
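+
+A one-line NumPy equivalent of the replacement:
+
+```python
+import numpy as np
+
+def replace_zero(x, by=0.0):
+    # Swap exact zeros for the attribute value `by`.
+    return np.where(x == 0, np.array(by, dtype=x.dtype), x)
+
+y = replace_zero(np.array([0.0, 1.5, 0.0], dtype=np.float32), by=9.0)
+# y == [9.0, 1.5, 9.0]
+```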
+
+
+
+### AddSharedInput
+
+
+AddSharedInput details
+
+Computes `A + B` and `A + C` in one kernel launch, sharing the read of `A`.
+
+#### Inputs
+
+***A: tensor(T)***, ***B: tensor(T)***, ***C: tensor(T)***
+
+`T` is one of `float`, `float16`, `bfloat16`. `B` and `C` must have the same shape as `A`.
+
+#### Outputs
+
+***AB: tensor(T)***, ***AC: tensor(T)***
+
+Elementwise sums `A + B` and `A + C`.
+
+
+
+### MulSharedInput
+
+
+MulSharedInput details
+
+Computes `A * B` and `A * C` in one kernel launch, sharing the read of `A`.
+
+#### Inputs
+
+***A: tensor(T)***, ***B: tensor(T)***, ***C: tensor(T)***
+
+`T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***AB: tensor(T)***, ***AC: tensor(T)***
+
+Elementwise products `A * B` and `A * C`.
+
+
+
+### ScatterNDOfShape
+
+
+ScatterNDOfShape details
+
+Allocates a zero tensor of the given shape and applies a `ScatterND` reduction. Equivalent to `ScatterND(ConstantOfShape(shape, 0), indices, updates, reduction=...)` but fused.
+
+#### Attributes
+
+***reduction: string*** (default is "add")
+
+Reduction to apply to scattered updates. One of `"add"`, `"mul"`, `"min"`, `"max"`.
+
+#### Inputs
+
+***shape: tensor(int64)***
+
+1D tensor describing the output shape. Must live on CPU.
+
+***indices: tensor(int64)***
+
+Indices into the output, as in standard ScatterND.
+
+***updates: tensor(T)***
+
+Values to scatter. `T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Tensor of the requested shape with updates applied.
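+
+A NumPy sketch of the default `add` reduction path (other reductions substitute the corresponding `ufunc.at`; `scatter_nd_of_shape` is our name):
+
+```python
+import numpy as np
+
+def scatter_nd_of_shape(shape, indices, updates):
+    # Zero-filled output, then scatter-add updates at the given indices.
+    out = np.zeros(shape, dtype=updates.dtype)
+    np.add.at(out, tuple(indices.T), updates)
+    return out
+
+out = scatter_nd_of_shape(np.array([4]),
+                          np.array([[1], [1], [3]]),
+                          np.array([2.0, 3.0, 5.0], dtype=np.float32))
+# out == [0.0, 5.0, 0.0, 5.0]  (duplicate indices accumulate)
+```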
+
+
+
+### MaskedScatterNDOfShape
+
+
+MaskedScatterNDOfShape details
+
+Variant of `ScatterNDOfShape` that ignores entries of `indices` equal to a configurable mask value.
+
+#### Attributes
+
+***reduction: string*** (default is "add")
+
+Same as `ScatterNDOfShape`.
+
+***maskedValue: int64_t***
+
+Index value that causes the corresponding update to be skipped.
+
+#### Inputs
+
+Same as `ScatterNDOfShape`.
+
+#### Outputs
+
+Same as `ScatterNDOfShape`.
+
+
+
+### Transpose2DCastFP16
+
+
+Transpose2DCastFP16 details
+
+Fused 2D transpose + cast from `float` to `float16`.
+
+#### Inputs
+
+***input: tensor(float)***
+
+2D tensor of shape `[M, N]`.
+
+#### Outputs
+
+***output: tensor(float16)***
+
+2D tensor of shape `[N, M]`.
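+
+Functionally equivalent NumPy (the CUDA kernel performs the transpose and downcast in one pass instead of two ops):
+
+```python
+import numpy as np
+
+x = np.arange(6, dtype=np.float32).reshape(2, 3)
+y = x.T.astype(np.float16)
+# y.shape == (3, 2), y.dtype == float16
+```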
+
+
+
+### Transpose2DCastFP32
+
+
+Transpose2DCastFP32 details
+
+Fused 2D transpose + cast from `float16` to `float`.
+
+#### Inputs
+
+***input: tensor(float16)***
+
+2D tensor of shape `[M, N]`.
+
+#### Outputs
+
+***output: tensor(float)***
+
+2D tensor of shape `[N, M]`.
+
+
### Template