diff --git a/docs/custom_ops.md b/docs/custom_ops.md
index cacbaba47..7bf8b026e 100644
--- a/docs/custom_ops.md
+++ b/docs/custom_ops.md
@@ -531,674 +531,621 @@ expect(node, inputs=[inputs],
-## String operators
-
-### StringEqual
+### CLIPTokenizer
-StringEqual details
+CLIPTokenizer details
-Compares two strings and returns true if they are equal and false if not.
+Byte-pair-encoding (BPE) tokenizer matching the CLIP text encoder from HuggingFace/OpenAI. Converts input strings into token id sequences.
-#### Inputs
+#### Attributes
-***x: tensor(string)***
+***vocab: string***
-The first string input
+JSON vocabulary mapping tokens to ids (contents of `vocab.json`).
-***x: tensor(string)***
+***merges: string***
-The second string input
+Merge rules (contents of `merges.txt`).
+
+***padding_length: int64_t*** (default is -1)
+
+If positive, the output is right-padded (or truncated) to this length. When -1, the output is padded to the maximum sequence length in the batch; the operator still returns a dense tensor with a dynamic second dimension.
+
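The padding behavior can be sketched in a few lines of Python (illustrative only; `pad_id` is a placeholder here, the real pad id comes from the tokenizer's vocabulary):

```python
def pad_batch(batch_ids, padding_length=-1, pad_id=0):
    # padding_length > 0: right-pad or truncate every row to that length.
    # padding_length == -1: pad to the longest row in the batch.
    target = padding_length if padding_length > 0 else max(len(r) for r in batch_ids)
    return [(row + [pad_id] * target)[:target] for row in batch_ids]
```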
+#### Inputs
+
+***input: tensor(string)***
+
+1D string tensor containing the input texts.
#### Outputs
-***z: tensor(boolean)***
+***input_ids: tensor(int64)***
+
+Tensor of token ids.
-String with replacements.
+***attention_mask: tensor(int64)*** (optional)
+
+Mask with the same shape as `input_ids` (1 for real tokens, 0 for padding).
+
+***offset_mapping: tensor(int64)*** (optional)
+
+If requested, per-token `(begin, end)` byte offsets into the corresponding input string.
-### StringHash
+### RobertaTokenizer
-StringHash details
+RobertaTokenizer details
+BPE tokenizer compatible with HuggingFace's RoBERTa tokenizer. Uses the same attributes and I/O contract as `CLIPTokenizer`.
-Hashes the input string based on the number of buckets
+#### Attributes
-#### Inputs
+***vocab: string***
-***input: tensor(string)***
+JSON vocabulary (contents of `vocab.json`).
-The string to hash
+***merges: string***
-***num_buckets: tensor(int64)***
+BPE merge rules (contents of `merges.txt`).
-The number of buckets (must be equal to 1?)
+***padding_length: int64_t*** (default is -1)
-#### Outputs
+See `CLIPTokenizer`.
-***name: tensor(int64)***
+#### Inputs
-The hash value of the string
+***input: tensor(string)***
-
+1D string tensor of input texts.
+#### Outputs
-### StringHashFast
+***input_ids: tensor(int64)***
-
-StringHashFast details
+Token ids.
+***attention_mask: tensor(int64)*** (optional)
-A faster implementation of StringHash.
+Attention mask, same shape as `input_ids`.
-
+***offset_mapping: tensor(int64)*** (optional)
+Per-token byte offsets into each input string.
-### StringJoin
+
-
-StringJoin details
+### SpmTokenizer
-Join an array of strings
+
+SpmTokenizer details
-#### Inputs
+SentencePiece-compatible tokenizer built on top of the shared BPE kernel. Produces tokens equivalent to HuggingFace's "fast" SentencePiece tokenizers (e.g. Llama, T5, XLM-RoBERTa).
-***input_X: tensor(string)***
+#### Attributes
-The input array of strings
+***vocab: string***
-***input_sep: tensor(string)***
+JSON vocabulary produced from a SentencePiece model.
-The string separator for the resulting joing
+***merges: string***
-***input_axis: tensor(int64)***
+SentencePiece merge rules.
-The axis along which to joing
+***padding_length: int64_t*** (default is -1)
-#### Outputs
+See `CLIPTokenizer`.
-***out: tensor(string)***
+#### Inputs
-The resulting joined string
+***input: tensor(string)***
-#### Examples
+1D string tensor of inputs.
+#### Outputs
-```bash
+***input_ids: tensor(int64)***
-input_X = [["a", "b", "c"], ["aa", "bb", ""]]
-input_sep=";"
-input_axis = 1
+Tensor of token ids.
-out = ["a;b;c", "aa;bb;"]
+***attention_mask: tensor(int64)*** (optional)
-input_axis = 0
+Attention mask with the same shape as `input_ids`.
-out = ['a;aa', 'b;bb', 'c;']
+***offset_mapping: tensor(int64)*** (optional)
+Per-token byte offsets.
-### StringRegexReplace
+### HfBertTokenizer
-StringRegexReplace details
+HfBertTokenizer details
+HuggingFace-compatible BERT WordPiece tokenizer. Behaves like `BertTokenizer`'s `__call__` method but with a smaller attribute surface. Produces ids, attention masks and token type ids in a single op.
-String replacement based on [Re2-format](https://github.com/google/re2/wiki/Syntax) regular expressions.
-
-#### Inputs
+#### Attributes
-***text: tensor(string)***
+***vocab_file: string***
-String tensor to extract slices from.
+Contents of `vocab.txt`.
-***pattern: tensor(string)***
+***do_lower_case: int64_t*** (default is 1)
-Pattern of the regular expression.
+Lowercase inputs before tokenization.
-***rewrite: tensor(string)***
+***strip_accents: int64_t*** (default is 0)
-Replacement.
+Strip accents as part of normalization.
-#### Attributes
+#### Inputs
-***global_replace: int64*** (default is 1)
+***input: tensor(string)***
-Replace all strings matching the pattern or the first one.
+1D string tensor containing the texts to tokenize.
#### Outputs
-***output: tensor(string)***
+***input_ids: tensor(int64)***
-String with replacements.
+Token ids.
-#### Examples
+***attention_mask: tensor(int64)***
-```python
+Attention mask, same shape as `input_ids`.
-node = onnx.helper.make_node(
- 'StringRegexReplace',
- inputs=['text', 'pattern', 'rewrite'],
- outputs=['y'],
-)
+***token_type_ids: tensor(int64)*** (optional)
-text = np.array([['def myfunc():'], ['def dummy():']])
-pattern = np.array([r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'])
-rewrite = np.array([r'static PyObject* py_\1(void) {'])
-y = [['static PyObject* py_myfunc(void) {'],
- ['static PyObject* py_dummy(void) {']]
+Segment ids. All zero for single-sentence input.
-expect(node, inputs=[text, pattern, rewrite], outputs=[y],
- name='test_string_regex_replace')
-```
+***offset_mapping: tensor(int64)*** (optional)
-
+Per-token `(begin, end)` byte offsets into the corresponding input string.
-### StringECMARegexReplace
+
-
-StringECMARegexReplace details
-String replacement based on [ECMA-format](https://en.cppreference.com/w/cpp/regex/ecmascript) regular expressions.
+### HfJsonTokenizer
-#### Inputs
+
+HfJsonTokenizer details
-***text: tensor(string)***
+Loads a HuggingFace `tokenizer.json` directly and dispatches to the appropriate kernel (BPE or Unigram). Matches HuggingFace fast tokenizers at inference time.
-String tensor to extract slices from.
+#### Attributes
-***pattern: tensor(string)***
+***tokenizer_config: string***
-Pattern of the regular expression.
+Contents of `tokenizer.json` (and optionally `tokenizer_config.json`).
-***rewrite: tensor(string)***
+***tokenizer_vocab: string*** (optional)
-Replacement.
+Additional vocabulary data when the tokenizer uses an external vocab file.
-#### Attributes
+#### Inputs
-***global_replace: int64*** (default is 1)
+***input: tensor(string)***
-Replace all strings matching the pattern or the first one.
+1D string tensor of inputs.
+#### Outputs
-***ignore_case: int64*** (default is 0)
+***input_ids: tensor(int64)***
-Replace
+Token ids.
-#### Outputs
+***attention_mask: tensor(int64)*** (optional)
-***output: tensor(string)***
+Attention mask matching `input_ids`.
-String with replacements.
+***offset_mapping: tensor(int64)*** (optional)
-#### Examples
+Per-token byte offsets.
+
-```python
-node = onnx.helper.make_node(
- 'StringRegexReplace',
- inputs=['text', 'pattern', 'rewrite'],
- outputs=['y'],
-)
+### SentencepieceDecoder
-text = np.array([['def myfunc():'], ['def dummy():']])
-pattern = np.array([r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'])
-rewrite = np.array([r'static PyObject* py_$1(void) {'])
-y = [['static PyObject* py_myfunc(void) {'],
- ['static PyObject* py_dummy(void) {']]
+
+SentencepieceDecoder details
-expect(node, inputs=[text, pattern, rewrite], outputs=[y],
- name='test_string_regex_replace')
-```
+Decodes a sequence of SentencePiece ids back into a string.
-
+#### Attributes
+***model: string***
+Serialized SentencePiece model (`*.model`).
-### StringSplit
+#### Inputs
-TODO
+***ids: tensor(int64)***
-### StringUpper
+1D or 2D tensor of ids. When 2D the leading dimension must be 1.
-TODO
+***fairseq: tensor(bool)*** (optional)
-### StringLower
+Scalar flag. When true the `fairseq` vocab-id offset convention is applied.
-TODO
+#### Outputs
-### StringLength
+***output: tensor(string)***
-
-StringECMARegexReplace details
+1D tensor with one string element containing the decoded text.
-Get the length of each string element in input tensor. Similar to the function `len("abcde"")` in python.
+
-#### Inputs
-***data: tensor(string)***
+### BpeDecoder
-String tensor to get length of its each string element.
+
+BpeDecoder details
-#### Outputs
+Decodes BPE token ids (GPT-2 / CLIP / RoBERTa style) back into text.
-***output: tensor(int64)***
+#### Attributes
-Data length tensor.
+***id_vocab: string***
-#### Examples
+Newline-separated token strings indexed by id.
+***byte_decoder: string***
-```python
+Reverse byte-to-unicode mapping used by GPT-2 BPE encoders.
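The GPT-2 byte-to-unicode mapping that `byte_decoder` inverts is well known; a sketch of how it is constructed (the op consumes the serialized reverse of this table):

```python
def bytes_to_unicode():
    # GPT-2's reversible byte -> unicode mapping: printable bytes map to
    # themselves, the rest are shifted into unused code points above 255.
    bs = (list(range(ord('!'), ord('~') + 1))
          + list(range(0xA1, 0xAC + 1))
          + list(range(0xAE, 0xFF + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# byte_decoder is the inverse table: unicode char -> original byte value
byte_decoder = {c: b for b, c in bytes_to_unicode().items()}
```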
-node = onnx.helper.make_node(
- 'StringLength',
- inputs=['x'],
- outputs=['y']
-)
+***added_tokens: string*** (optional)
-x = ["abcdef", "hijkl"]
-y = np.array([len(x[0]), len(x[1])], dtype=np.int64)
+Extra tokens appended to the base vocabulary.
+***all_special_ids: string*** (optional)
-expect(node, inputs=[x], outputs=[y],
- name='test_string_length')
-```
-
-
-### StringConcat
+Comma-separated list of special token ids.
-
-StringConcat details
+***skip_special_tokens: int64_t*** (default is 0)
-Concat the corresponding string in the two string tensor. Two input tensors should have the same dimension.
+When 1, ids in `all_special_ids` are skipped during decoding.
-```python
- output = []
- shape = input1.shape
- input1 = input1.flatten()
- input2 = input2.flatten()
- for i in range(len(input1)):
- output.append(input1[i] + input2[i])
- output = np.array(output).reshape(shape)
-```
+***en_normalization: int64_t*** (default is 0)
-#### Inputs
+Apply a minimal English-oriented post-processing step (e.g. undo leading-space markers).
-***input_1: tensor(string)***
+***whitespace_token: string*** (optional)
+***bos_token: string*** (optional)
+***eos_token: string*** (optional)
+***unk_token: string*** (optional)
-The first string tensor.
+Optional overrides for well-known special tokens.
-***input_2: tensor(string)***
+#### Inputs
-The second string tensor.
+***ids: tensor(int64)***
+1D or 2D tensor of token ids.
#### Outputs
***output: tensor(string)***
-The result.
-
-#### Examples
-
+Decoded string tensor.
-```python
+
-node = onnx.helper.make_node(
- 'StringConcat',
- inputs=['x', 'y'],
- outputs=['result'],
-)
-x = np.array(["abcd", "efgh"])
-y = np.array(["wxyz", "stuv"])
-result = np.array([x[0] + y[0], x[1] + y[1]])
+### TrieTokenizer
-expect(node, inputs=[x, y], outputs=[result],
- name='test_string_concat')
-```
+
+TrieTokenizer details
-
+Trie-based longest-match tokenizer used by RWKV-style models.
-### StringRegexSplitWithOffsets
+#### Attributes
-
-StringRegexSplitWithOffsets details
+***vocab: string***
-Splits string based on regular expressions.
+Newline-separated vocab where each line has the form `index token length`. `token` is a Python-repr-encoded byte string.
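A vocab line can be parsed like this (a sketch; assumes the single-space-separated `index token length` layout described above):

```python
import ast

def parse_trie_vocab_line(line):
    # "<index> <token-repr> <length>": the id, a Python-repr-encoded token,
    # and the token's byte length, used here as a consistency check.
    idx_str, rest = line.split(' ', 1)
    token_repr, length_str = rest.rsplit(' ', 1)
    token = ast.literal_eval(token_repr)
    if isinstance(token, str):
        token = token.encode('utf-8')
    assert len(token) == int(length_str)
    return int(idx_str), token
```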
#### Inputs
-***text: tensor(string)***
+***input: tensor(string)***
-String tensor to extract slices from.
+1D string tensor of inputs.
-***delim_regex_pattern: tensor(string)***
+#### Outputs
-Splitting attern of the regular expression.
+***output: tensor(int64)***
-***keep_delim_regex_pattern: tensor(string)***
+2D right-padded tensor of token ids; padding uses id `0`.
-By default, delimiters are not included in the split string results. Delimiters may be included by specifying a regex pattern keep_delim_regex_pattern.
+
-#### Outputs
-***words: tensor(string)*** Tensor of words.
+### TrieDetokenizer
-***offsets: tensor(int64)*** 2D tensor with 3 columns:
-sentence index, position of the first character, position of the last one (excluded)
+
+TrieDetokenizer details
-***row_indices: tensor(int64)*** Indices of every first token of input sentences.
-`row_indices[i+1] - row_indices[i]` is the number of tokens in input `i`.
-These are updates row indices given as inputs or new ones if the second input is empty.
+Inverse of `TrieTokenizer`. Converts 1D or 2D id tensors back to strings using the same trie vocabulary.
+#### Attributes
-#### Examples
+***vocab: string***
+Same vocabulary format as `TrieTokenizer`.
-```python
+#### Inputs
-node = onnx.helper.make_node(
- 'StringRegexSplit',
- inputs=['text', 'pattern', 'rewrite'],
- outputs=['y', 'begin_end', 'indices'],
-)
+***ids: tensor(int64)***
-text = np.array(["hello there"])
-pattern = np.array([r'\s'])
-rewrite = np.array([r'\s'])
-y = np.array(["hello", " ", "there"])
-z1 = np.array([[0, 0, 5],
- [0, 5, 6],
- [0, 6, 11]], dtype=np.int64)
-z2 = np.array([0, 2], dtype=np.int64)
-
-expect(node, inputs=[text, pattern, rewrite], outputs=[y, z1, z2],
- name='test_string_regex_replace')
-```
+1D or 2D tensor of token ids.
-
+#### Outputs
+***output: tensor(string)***
-### StringECMARegexSplitWithOffsets
+Decoded text, one string per row.
-TODO
+
-### VectorToString
+
+### BlingFireSentenceBreaker
-VectorToString details
+BlingFireSentenceBreaker details
-VectorToString is the contrary operation to the `StringToVector` , they share same format of mapping table:
+Segments an input string into sentences using a compiled [BlingFire](https://github.com/microsoft/BlingFire) model.
- \t\s\s...
+#### Attributes
-Unmapped vector will output the value of the attribute `unk`.
+***model: string***
-Example:
+Raw bytes of the compiled BlingFire sentence-breaking model (`*.bin`).
-*Attributes:*
+***max_sentence: int64_t*** (default is -1)
-- `map`:
- ```
- a 0 0 1 2
- b 0 1 2 3
- d 0 1 3 4
- ```
+If positive, limits the number of returned sentences.
-- `unk`: "unknown_word"
+#### Inputs
-*Inputs:*
-- data: [[0,0,1,2],[0,1,3,4],[0,0,0,0]]
+***input: tensor(string)***
-*Ouputs:*
-- output: ["a", "d", "unknown_word" ]
+Scalar input string.
-#### Attributes
+#### Outputs
-***mapping_file_name***
+***output: tensor(string)***
-the formative mapping table
+1D tensor of sentences.
-***unmapping_value***
+
-the result returned when a vector aren't found in the map
+
+## String operators
+
+### StringEqual
+
+
+StringEqual details
+
+Compares two string tensors elementwise and returns true where they are equal.
#### Inputs
-***data: tensor(T)***
+***x: tensor(string)***
-Input tensor
+The first string input.
+
+***y: tensor(string)***
+
+The second string input. Must have the same shape as `x` (or be broadcastable).
#### Outputs
-***output: tensor(string)***
+***z: tensor(bool)***
-The mapping result of the input
+Boolean tensor with the same shape as the broadcasted inputs; `true` where the inputs are equal.
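The semantics match NumPy's elementwise string comparison, shown here as a reference rather than the op itself:

```python
import numpy as np

x = np.array(["apple", "pear"])
y = np.array(["apple", "peach"])
z = x == y  # elementwise comparison -> boolean tensor
```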
-#### Type Constraints
-***T:tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(bool)***
+
-Constrain input and output types to numerical tensors.
+### StringToHashBucket
-#### Examples
+
+StringToHashBucket details
+Hashes each input string into one of `num_buckets` buckets using the internal FarmHash-like 64-bit hash implementation.
-```python
-mapping_table = \
- """
- a 0 0 1 2
- b 0 1 2 3
- d 0 1 3 4
- """
+#### Inputs
-node = onnx.helper.make_node(
- 'VectorToString',
- inputs=['x'],
- outputs=['y'],
- map=mapping_table,
- unk="unknown_word"
-)
+***input: tensor(string)***
+The input string tensor to hash.
-x = np.array([[0,0,1,2],[0,1,3,4],[0,0,0,0]], type=np.int64)
-y = ["a", "d", "unknown_word"]
+***num_buckets: tensor(int64)***
+Scalar number of hash buckets. Must be greater than 0.
+
+#### Outputs
+
+***output: tensor(int64)***
+
+Tensor of the same shape as `input` containing the hash-bucket index for each input string. Each value lies in the range `[0, num_buckets)`.
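A reference sketch of the contract. The stand-in hash below is SHA-256-based, so the bucket values will differ from the op's FarmHash-like output, but the range guarantee is the same:

```python
import hashlib

def string_to_hash_bucket(strings, num_buckets):
    # Deterministic stand-in hash: take the first 8 bytes of SHA-256 as a
    # 64-bit integer, then reduce into [0, num_buckets).
    out = []
    for s in strings:
        h = int.from_bytes(hashlib.sha256(s.encode('utf-8')).digest()[:8], 'big')
        out.append(h % num_buckets)
    return out
```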
-expect(node, inputs=[x], outputs=[y],
- name='test_vector_to_string')
-```
-### StringToVector
+### StringToHashBucketFast
-StringToVector details
+StringToHashBucketFast details
-StringToVector will map each string element in the input to the corresponding vector according to the mapping file. The mapping file is a utf-8 encoding text file in tsv format:
+A faster variant of `StringToHashBucket` that uses `std::hash` internally. Hash values are not stable across platforms or compilers, so the op is intended for stateless in-process hashing rather than persisted lookup tables.
- \t\s\s...
+#### Inputs
-Unmapped string will output the value of the attribute `unmapping_value`.
+***input: tensor(string)***
-Example:
+The strings to hash.
-*Attributes:*
+***num_buckets: tensor(int64)***
-- `mapping_file_name`: vocabulary.txt
- ```
- a 0 0 1 2
- b 0 1 2 3
- d 0 1 3 4
- ```
-
-- `unmapping_value`: [0 0 0 0]
+Scalar number of hash buckets. Must be greater than 0.
-*Inputs:*
-- data: ["a", "d", "e"]
+#### Outputs
-*Ouputs:*
-- output: [[0,0,1,2],[0,1,3,4],[0,0,0,0]]
+***output: tensor(int64)***
-#### Attributes
+The hashed values, with the same shape as `input`.
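An in-process sketch; like the op's `std::hash`, Python's built-in `hash` for strings is salted per process, which illustrates why these buckets must not be persisted:

```python
def string_to_hash_bucket_fast(strings, num_buckets):
    # hash() is randomized across processes (PYTHONHASHSEED), mirroring the
    # platform-dependence of std::hash: results are valid only within one run.
    return [hash(s) % num_buckets for s in strings]
```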
-***mapping_file_name:string***
+
-The name of your string to vector mapping file.
-***unmapping_value:list(int)***
+### StringJoin
-Mapping result for unmapped string
+
+StringJoin details
+
+
+Joins the strings of the input tensor along an axis, inserting a separator between elements.
#### Inputs
-***data: tensor(string)***
+***input_X: tensor(string)***
-Input tensor
+The input array of strings
-#### Outputs
+***input_sep: tensor(string)***
-***output: tensor(T)***
+The string separator inserted between joined elements.
-The mapping result of the input
+***input_axis: tensor(int64)***
-#### Type Constraints
-***T:tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(bool)***
+The axis along which to join.
-Constrain input and output types to numerical tensors.
+#### Outputs
-#### Examples
+***out: tensor(string)***
+The tensor of joined strings.
-```python
-# what's in vocabulary.txt
+#### Examples
-mapping_table = \
-"""
-a 0 0 1 2
-b 0 1 2 3
-d 0 1 3 4
-"""
-node = onnx.helper.make_node(
- 'StringToVector',
- inputs=['x'],
- outputs=['y'],
- mapping_table=mapping_table,
- unmapping_value=[0,0,0,0]
-)
+```python
+input_X = [["a", "b", "c"], ["aa", "bb", ""]]
+input_sep=";"
+input_axis = 1
-x = ["a", "d", "e"]
-y = np.array([[0,0,1,2],[0,1,3,4],[0,0,0,0]], type=np.int64)
+out = ["a;b;c", "aa;bb;"]
+input_axis = 0
-expect(node, inputs=[x], outputs=[y],
- name='test_string_to_vector')
-```
+out = ['a;aa', 'b;bb', 'c;']
+```
-
+
-### StringSlice
+### StringRegexReplace
-StringSlice details
+StringRegexReplace details
-Do the slice operation to each string element in input tensor. Similar to string slice in python
-```python
-a = "abcdef"
-b = a[1:2]
-c = a[3:1:-1]
-```
+String replacement based on [Re2-format](https://github.com/google/re2/wiki/Syntax) regular expressions.
#### Inputs
-***data: tensor(string)***
+***text: tensor(string)***
String tensor to extract slices from.
-***starts: tensor(int64/int32)***
+***pattern: tensor(string)***
-The tensor of starting indices of corresponding string in data, which has same dimension of data.
+Pattern of the regular expression.
-***ends: tensor(int64/int32)***
+***rewrite: tensor(string)***
-The tensor of ending indices of corresponding string in data, which has same dimension of data.
+Replacement.
-***steps(optional): tensor(int64/int32)***
+#### Attributes
-The tensor of slice step of corresponding string in data, which has same dimension of data.If steps is empty tensor, we will use default value 1 for each string
+***global_replace: int64*** (default is 1)
+
+Replace all strings matching the pattern or the first one.
#### Outputs
***output: tensor(string)***
-Sliced data tensor.
+String with replacements.
#### Examples
-
```python
node = onnx.helper.make_node(
- 'StringSlice',
- inputs=['x', 'starts', 'ends', 'steps'],
+ 'StringRegexReplace',
+ inputs=['text', 'pattern', 'rewrite'],
outputs=['y'],
)
-x = np.array(["abcdef", "hijkl"])
-y = np.array([x[0][1:3:1], x[1][3:1:-1]])
-starts = np.array([1, 3], dtype=np.int64)
-ends = np.array([3, 1], dtype=np.int64)
-axes = np.array([0, 1], dtype=np.int64)
-steps = np.array([1, 1], dtype=np.int64)
+text = np.array([['def myfunc():'], ['def dummy():']])
+pattern = np.array([r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'])
+rewrite = np.array([r'static PyObject* py_\1(void) {'])
+y = [['static PyObject* py_myfunc(void) {'],
+ ['static PyObject* py_dummy(void) {']]
-expect(node, inputs=[x, starts, ends, axes, steps], outputs=[y],
- name='test_string_slice')
+expect(node, inputs=[text, pattern, rewrite], outputs=[y],
+ name='test_string_regex_replace')
```
-
-### MaskedFill
+### StringECMARegexReplace
-MaskedFill details
+StringECMARegexReplace details
+String replacement based on [ECMA-format](https://en.cppreference.com/w/cpp/regex/ecmascript) regular expressions.
-Fills elements of self tensor with value where mask is True. The operator is similar with [`Tensor.masked_fill_`](https://pytorch.org/docs/stable/generated/torch.Tensor.masked_fill_.html#torch.Tensor.masked_fill_) in pytorch.
+#### Inputs
+***text: tensor(string)***
-#### Inputs
+String tensor to extract slices from.
-***value: tensor(string)***
+***pattern: tensor(string)***
-The value to fill in with, currently we only support string type and vector&scalar dimension.
+Pattern of the regular expression.
-***mask: tensor(bool)***
+***rewrite: tensor(string)***
+
+Replacement.
+
+#### Attributes
+
+***global_replace: int64*** (default is 1)
+
+Replace all strings matching the pattern or the first one.
-The boolean mask, the dimension of mask tensor should be same with value.
+
+***ignore_case: int64*** (default is 0)
+
+Whether to perform case-insensitive ECMAScript regular expression matching.
#### Outputs
***output: tensor(string)***
-The filled output of input tensor.
-
+String with replacements.
#### Examples
@@ -1206,59 +1153,1427 @@ The filled output of input tensor.
```python
node = onnx.helper.make_node(
- 'MaskedFill',
- inputs=['value', 'mask'],
- outputs=['output']
+ 'StringECMARegexReplace',
+ inputs=['text', 'pattern', 'rewrite'],
+ outputs=['y'],
)
+text = np.array([['def myfunc():'], ['def dummy():']])
+pattern = np.array([r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'])
+rewrite = np.array([r'static PyObject* py_$1(void) {'])
+y = [['static PyObject* py_myfunc(void) {'],
+ ['static PyObject* py_dummy(void) {']]
-value = np.array(["a", "b", "c", "d"])
-mask = np.array([True, False, True, False], dtype=bool)
-output = np.array(["a", "c"])
-
-
-expect(node, inputs=[value, mask], outputs=[output],
- name='test_masked_fill')
+expect(node, inputs=[text, pattern, rewrite], outputs=[y],
+ name='test_string_regex_replace')
```
+
-### StringRaggedTensorToDense
-TODO
+### StringSplit
-### StringMapping
+
+StringSplit details
-TODO
+Splits each string in the input by a separator, producing a ragged (sparse) representation of the resulting tokens.
-## Math operators
+#### Inputs
+***input: tensor(string)***
-### Inverse
+1D string tensor to split.
-TODO
+***sep: tensor(string)***
-### NegPos
+Scalar string separator used to split each element of `input`. If empty, the string is split on whitespace.
-TODO
+***skip_empty: tensor(bool)***
-### SegmentExtraction
+Scalar boolean. When true, empty substrings are removed from the output.
-TODO
+#### Outputs
+
+***indices: tensor(int64)***
+
+2D tensor of shape `[N, 2]` containing `(row, col)` coordinates of each output token in the ragged representation.
+
+***values: tensor(string)***
+
+1D tensor of `N` tokens produced by splitting, in row-major order.
+
+***shape: tensor(int64)***
+
+2-element tensor describing the dense shape `[num_rows, max_row_width]` of the ragged tensor.
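The ragged output can be sketched in pure Python (a reference for the I/O contract, not the implementation):

```python
def string_split(strings, sep, skip_empty=False):
    # Returns (indices, values, shape) in the ragged/sparse layout:
    # indices are (row, col) pairs, values are tokens in row-major order.
    indices, values = [], []
    max_cols = 0
    for row, s in enumerate(strings):
        parts = s.split(sep) if sep else s.split()
        if skip_empty:
            parts = [p for p in parts if p]
        for col, tok in enumerate(parts):
            indices.append((row, col))
            values.append(tok)
        max_cols = max(max_cols, len(parts))
    return indices, values, (len(strings), max_cols)
```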
+
+
+
+### StringUpper
+
+
+StringUpper details
+
+Converts every ASCII character in each string of the input tensor to uppercase using `std::toupper`. This operator is ASCII-only; non-ASCII bytes are passed through unchanged. For full Unicode case folding, pre-process inputs accordingly or use `StringLower` as a reference for Unicode handling.
+
+#### Inputs
+
+***input: tensor(string)***
+
+String tensor of arbitrary shape.
+
+#### Outputs
+
+***output: tensor(string)***
+
+String tensor of the same shape as `input` with uppercased strings.
+
+
+
+### StringLower
+
+
+StringLower details
+
+Converts each string in the input tensor to lowercase. Unlike `StringUpper`, this operator decodes input bytes as UTF-8 and performs Unicode-aware case folding on each code point before re-encoding the result.
+
+#### Inputs
+
+***input: tensor(string)***
+
+String tensor of arbitrary shape.
+
+#### Outputs
+
+***output: tensor(string)***
+
+String tensor of the same shape as `input` with lowercased strings.
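The ASCII-only vs Unicode-aware distinction between the two case ops can be illustrated in Python (`ascii_upper` is a sketch of `StringUpper`'s behavior; `str.lower` mirrors `StringLower`'s Unicode-aware folding):

```python
def ascii_upper(s):
    # ASCII-only uppercase, like StringUpper: non-ASCII characters pass through.
    return ''.join(chr(ord(c) - 32) if 'a' <= c <= 'z' else c for c in s)
```

For example, `ascii_upper("résumé")` leaves the accented characters untouched, while `"RÉSUMÉ".lower()` folds them to `"résumé"`.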
+
+
+
+### StringStrip
+
+
+StringStrip details
+
+Removes leading and trailing whitespace characters from every string in the input tensor. Similar to `str.strip()` in Python.
+
+#### Inputs
+
+***input: tensor(string)***
+
+String tensor of arbitrary shape.
+
+#### Outputs
+
+***output: tensor(string)***
+
+String tensor of the same shape as `input` with whitespace stripped.
+
+
+
+### StringLength
+
+
+StringLength details
+
+Get the length of each string element in the input tensor. Similar to the function `len("abcde")` in Python.
+
+#### Inputs
+
+***input: tensor(string)***
+
+String tensor to get the length of each string element from.
+
+#### Outputs
+
+***output: tensor(int64)***
+
+Tensor of string lengths, with the same shape as the input.
+
+#### Examples
+
+
+```python
+
+node = onnx.helper.make_node(
+ 'StringLength',
+ inputs=['x'],
+ outputs=['y']
+)
+
+x = ["abcdef", "hijkl"]
+y = np.array([len(x[0]), len(x[1])], dtype=np.int64)
+
+
+expect(node, inputs=[x], outputs=[y],
+ name='test_string_length')
+```
+
+
+### StringConcat
+
+
+StringConcat details
+
+Concatenates the corresponding strings of the two input tensors elementwise. The two input tensors must have the same shape.
+
+```python
+ output = []
+ shape = input1.shape
+ input1 = input1.flatten()
+ input2 = input2.flatten()
+ for i in range(len(input1)):
+ output.append(input1[i] + input2[i])
+ output = np.array(output).reshape(shape)
+```
+
+#### Inputs
+
+***input_1: tensor(string)***
+
+The first string tensor.
+
+***input_2: tensor(string)***
+
+The second string tensor.
+
+
+#### Outputs
+
+***output: tensor(string)***
+
+The elementwise-concatenated result, with the same shape as the inputs.
+
+#### Examples
+
+
+```python
+
+node = onnx.helper.make_node(
+ 'StringConcat',
+ inputs=['x', 'y'],
+ outputs=['result'],
+)
+
+x = np.array(["abcd", "efgh"])
+y = np.array(["wxyz", "stuv"])
+result = np.array([x[0] + y[0], x[1] + y[1]])
+
+expect(node, inputs=[x, y], outputs=[result],
+ name='test_string_concat')
+```
+
+
+
+### StringRegexSplitWithOffsets
+
+
+StringRegexSplitWithOffsets details
+
+Splits strings based on regular expressions (RE2 dialect) and reports the byte offsets of each produced token.
+
+#### Inputs
+
+***text: tensor(string)***
+
+String tensor to split.
+
+***delim_regex_pattern: tensor(string)***
+
+Splitting pattern of the regular expression.
+
+***keep_delim_regex_pattern: tensor(string)***
+
+By default, delimiters are not included in the split string results. Delimiters may be included by specifying a regex pattern via `keep_delim_regex_pattern`.
+
+#### Outputs
+
+***tokens: tensor(string)***
+
+1D tensor of tokens produced by splitting, in row-major order.
+
+***begin_offsets: tensor(int64)***
+
+1D tensor with the begin byte offset of each token in the corresponding input string.
+
+***end_offsets: tensor(int64)***
+
+1D tensor with the end byte offset (exclusive) of each token in the corresponding input string.
+
+***row_offsets: tensor(int64)***
+
+1D tensor of row offsets such that tokens of the i-th input string occupy `[row_offsets[i], row_offsets[i+1])` in `tokens`.
+
+#### Examples
+
+
+```python
+
+node = onnx.helper.make_node(
+ 'StringRegexSplitWithOffsets',
+ inputs=['text', 'pattern', 'keep_pattern'],
+ outputs=['tokens', 'begin_offsets', 'end_offsets', 'row_offsets'],
+)
+
+text = np.array(["hello there"])
+pattern = np.array([r'\s'])
+keep_pattern = np.array([""])
+tokens = np.array(["hello", "there"])
+begin_offsets = np.array([0, 6], dtype=np.int64)
+end_offsets = np.array([5, 11], dtype=np.int64)
+row_offsets = np.array([0, 2], dtype=np.int64)
+
+expect(node, inputs=[text, pattern, keep_pattern],
+ outputs=[tokens, begin_offsets, end_offsets, row_offsets],
+ name='test_string_regex_split_with_offsets')
+```
+
+
+
+
+### StringECMARegexSplitWithOffsets
+
+
+StringECMARegexSplitWithOffsets details
+
+Splits strings using a regular expression in the ECMAScript dialect and reports the byte offsets of every produced token. Provides the same functionality as `StringRegexSplitWithOffsets` but uses `std::regex` instead of `re2`, allowing ECMAScript regex features.
+
+#### Inputs
+
+***input: tensor(string)***
+
+String tensor to split.
+
+***pattern: tensor(string)***
+
+Scalar string containing the ECMAScript regex splitting pattern.
+
+***keep_pattern: tensor(string)***
+
+Scalar string. Delimiter matches that also match this pattern are preserved as tokens in the output. Pass an empty string to drop all delimiters.
+
+#### Attributes
+
+***ignore_case: int64_t*** (default is 0)
+
+When set to 1 the regex is matched case-insensitively.
+
+#### Outputs
+
+***tokens: tensor(string)***
+
+1D tensor containing the split tokens.
+
+***begin_offsets: tensor(int64)***
+
+1D tensor with the begin byte offset of each token in the corresponding input string.
+
+***end_offsets: tensor(int64)***
+
+1D tensor with the end byte offset (exclusive) of each token in the corresponding input string.
+
+***row_offsets: tensor(int64)***
+
+1D tensor of row offsets such that tokens of the i-th input string occupy `[row_offsets[i], row_offsets[i+1])` in `tokens`.
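A reference sketch of the splitting contract using Python's `re` module (the ECMAScript and Python regex dialects differ, so treat this as illustrative):

```python
import re

def regex_split_with_offsets(texts, pattern, keep_pattern=""):
    # Splits each text on `pattern`; delimiter matches that also fully match
    # `keep_pattern` are kept as tokens. Offsets index into the source string.
    tokens, begins, ends, row_offsets = [], [], [], [0]
    delim = re.compile(pattern)
    keep = re.compile(keep_pattern) if keep_pattern else None
    for text in texts:
        pos = 0
        for m in delim.finditer(text):
            if m.start() > pos:
                tokens.append(text[pos:m.start()])
                begins.append(pos)
                ends.append(m.start())
            if keep and keep.fullmatch(m.group()):
                tokens.append(m.group())
                begins.append(m.start())
                ends.append(m.end())
            pos = m.end()
        if pos < len(text):
            tokens.append(text[pos:])
            begins.append(pos)
            ends.append(len(text))
        row_offsets.append(len(tokens))
    return tokens, begins, ends, row_offsets
```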
+
+
+
+### VectorToString
+
+
+VectorToString details
+
+VectorToString is the inverse operation of `StringToVector`; the two share the same mapping-table format:
+
+    <string>\t<scalar_1>\s<scalar_2>\s...\s<scalar_n>
+
+A vector not found in the mapping table outputs the value of the attribute `unk`.
+
+Example:
+
+*Attributes:*
+
+- `map`:
+ ```
+ a 0 0 1 2
+ b 0 1 2 3
+ d 0 1 3 4
+ ```
+
+- `unk`: "unknown_word"
+
+*Inputs:*
+- data: [[0,0,1,2],[0,1,3,4],[0,0,0,0]]
+
+*Outputs:*
+- output: ["a", "d", "unknown_word"]
+
+#### Attributes
+
+***map: string***
+
+The mapping table, in the format described above.
+
+***unk: string***
+
+The string returned when an input vector is not found in the map.
+
+#### Inputs
+
+***data: tensor(T)***
+
+Input tensor
+
+#### Outputs
+
+***output: tensor(string)***
+
+The mapping result of the input
+
+#### Type Constraints
+***T:tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(bool)***
+
+Constrains the input type to numerical tensors.
+
+
+#### Examples
+
+
+```python
+mapping_table = \
+ """
+ a 0 0 1 2
+ b 0 1 2 3
+ d 0 1 3 4
+ """
+
+node = onnx.helper.make_node(
+ 'VectorToString',
+ inputs=['x'],
+ outputs=['y'],
+ map=mapping_table,
+ unk="unknown_word"
+)
+
+
+x = np.array([[0,0,1,2],[0,1,3,4],[0,0,0,0]], dtype=np.int64)
+y = ["a", "d", "unknown_word"]
+
+
+expect(node, inputs=[x], outputs=[y],
+ name='test_vector_to_string')
+```
+
+
+
+### StringToVector
+
+
+StringToVector details
+
+StringToVector maps each string element in the input to the corresponding vector according to the mapping file. The mapping file is a UTF-8 encoded text file in TSV format:
+
+    <string>\t<scalar_1>\s<scalar_2>\s...\s<scalar_n>
+
+A string not found in the mapping file outputs the value of the attribute `unmapping_value`.
+
+Example:
+
+*Attributes:*
+
+- `mapping_file_name`: vocabulary.txt
+ ```
+ a 0 0 1 2
+ b 0 1 2 3
+ d 0 1 3 4
+ ```
+
+- `unmapping_value`: [0 0 0 0]
+
+*Inputs:*
+- data: ["a", "d", "e"]
+
+*Outputs:*
+- output: [[0,0,1,2],[0,1,3,4],[0,0,0,0]]
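Parsing the TSV mapping format can be sketched as follows (assumes one `key\tvector` entry per line with space-separated scalars):

```python
def parse_mapping_table(text):
    # Each line: "<string>\t<scalar_1> <scalar_2> ... <scalar_n>"
    table = {}
    for line in text.strip().splitlines():
        key, _, vector = line.strip().partition('\t')
        table[key] = [int(v) for v in vector.split()]
    return table
```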
+
+#### Attributes
+
+***mapping_file_name: string***
+
+The name of the string-to-vector mapping file.
+
+***unmapping_value: list(int)***
+
+The vector returned for unmapped strings.
+
+#### Inputs
+
+***data: tensor(string)***
+
+Input tensor
+
+#### Outputs
+
+***output: tensor(T)***
+
+The mapping result of the input
+
+#### Type Constraints
+***T:tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(bool)***
+
+Constrain input and output types to numerical tensors.
+
+#### Examples
+
+
+```python
+# contents of vocabulary.txt:
+#   a 0 0 1 2
+#   b 0 1 2 3
+#   d 0 1 3 4
+
+node = onnx.helper.make_node(
+    'StringToVector',
+    inputs=['x'],
+    outputs=['y'],
+    mapping_file_name='vocabulary.txt',
+    unmapping_value=[0, 0, 0, 0]
+)
+
+
+x = ["a", "d", "e"]
+y = np.array([[0,0,1,2],[0,1,3,4],[0,0,0,0]], dtype=np.int64)
+
+
+expect(node, inputs=[x], outputs=[y],
+ name='test_string_to_vector')
+```
+
+
+
+
+
+### StringSlice
+
+
+StringSlice details
+
+Performs a slice operation on each string element of the input tensor, similar to Python string slicing:
+
+```python
+a = "abcdef"
+b = a[1:2]
+c = a[3:1:-1]
+```
+
+#### Inputs
+
+***data: tensor(string)***
+
+String tensor to extract slices from.
+
+***starts: tensor(int64/int32)***
+
+Tensor of per-string start indices; must have the same shape as `data`.
+
+***ends: tensor(int64/int32)***
+
+Tensor of per-string end indices; must have the same shape as `data`.
+
+***steps (optional): tensor(int64/int32)***
+
+Tensor of per-string slice steps; must have the same shape as `data`. If `steps` is an empty tensor, a default step of 1 is used for every string.
+
+#### Outputs
+
+***output: tensor(string)***
+
+Sliced data tensor.
+
+#### Examples
+
+
+```python
+node = onnx.helper.make_node(
+    'StringSlice',
+    inputs=['x', 'starts', 'ends', 'steps'],
+    outputs=['y'],
+)
+
+x = np.array(["abcdef", "hijkl"])
+starts = np.array([1, 3], dtype=np.int64)
+ends = np.array([3, 1], dtype=np.int64)
+steps = np.array([1, -1], dtype=np.int64)
+y = np.array([x[0][1:3:1], x[1][3:1:-1]])
+
+expect(node, inputs=[x, starts, ends, steps], outputs=[y],
+       name='test_string_slice')
+```
+
+
+
+
+### MaskedFill
+
+
+MaskedFill details
+
+
+Fills elements of the input tensor with `value` where `mask` is True. The operator is similar to [`Tensor.masked_fill_`](https://pytorch.org/docs/stable/generated/torch.Tensor.masked_fill_.html#torch.Tensor.masked_fill_) in PyTorch.
+
+
+#### Inputs
+
+***value: tensor(string)***
+
+The value to fill in with. Currently only the string type is supported, and `value` may be a scalar or a 1D vector.
+
+***mask: tensor(bool)***
+
+The boolean mask; it must have the same shape as `value`.
+
+#### Outputs
+
+***output: tensor(string)***
+
+The filled output tensor.
+
+
+#### Examples
+
+
+```python
+
+node = onnx.helper.make_node(
+ 'MaskedFill',
+ inputs=['value', 'mask'],
+ outputs=['output']
+)
+
+
+value = np.array(["a", "b", "c", "d"])
+mask = np.array([True, False, True, False], dtype=bool)
+output = np.array(["a", "c"])
+
+
+expect(node, inputs=[value, mask], outputs=[output],
+ name='test_masked_fill')
+```
+
+
+
+### StringRaggedTensorToDense
+
+
+StringRaggedTensorToDense details
+
+Converts a ragged string tensor to a dense 2D string tensor, padding shorter rows with a fill value.
+
+#### Inputs
+
+***row_splits: tensor(int64)***
+
+1D tensor with the starting position of each row in `values`. Row `i` contains `values[row_splits[i]:row_splits[i+1]]`.
+
+***values: tensor(string)***
+
+1D flat string tensor holding the concatenated row values.
+
+***default_value_shape: tensor(int64)***
+
+1D tensor describing the target dense shape. Only used to determine the number of columns.
+
+***default_value: tensor(string)***
+
+Scalar string used to pad rows that are shorter than the longest row.
+
+#### Outputs
+
+***output: tensor(string)***
+
+2D dense string tensor with padding applied.
+
+
+
+### StringMapping
+
+
+StringMapping details
+
+Maps each element of the input string tensor to another string using a user-supplied dictionary. Strings not found in the dictionary are passed through unchanged.
+
+#### Attributes
+
+***map: string***
+
+A string containing one mapping per line. Each line has the form `key\tvalue`, where key and value are separated by a tab character.
+
+#### Inputs
+
+***input: tensor(string)***
+
+Input string tensor of arbitrary shape.
+
+#### Outputs
+
+***output: tensor(string)***
+
+Output string tensor of the same shape as `input` after mapping.
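+
+The lookup semantics can be sketched in NumPy (an illustration only, not the actual kernel; the helper name `string_mapping` is ours):
+
+```python
+import numpy as np
+
+def string_mapping(inputs, map_str):
+    # Parse "key\tvalue" lines into a dictionary.
+    table = dict(line.split("\t", 1) for line in map_str.splitlines() if line)
+    # Strings absent from the table pass through unchanged.
+    flat = [table.get(s, s) for s in inputs.ravel()]
+    return np.array(flat).reshape(inputs.shape)
+
+mapped = string_mapping(np.array(["cat", "dog", "bird"]),
+                        "cat\tfeline\ndog\tcanine")
+# mapped == ["feline", "canine", "bird"]
+```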
+
+
+
+## Math operators
+
+
+### Inverse
+
+
+Inverse details
+
+Computes the matrix inverse of a 2D floating-point tensor.
+
+#### Inputs
+
+***input: tensor(float)***
+
+A 2D square matrix of shape `[N, N]`.
+
+#### Outputs
+
+***output: tensor(float)***
+
+The inverse of the input matrix, of shape `[N, N]`.
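+
+Functionally this matches NumPy's `np.linalg.inv`; a minimal sketch of the expected behaviour:
+
+```python
+import numpy as np
+
+# Reference behaviour: input @ output == identity (up to float tolerance).
+m = np.array([[4.0, 7.0],
+              [2.0, 6.0]], dtype=np.float32)
+m_inv = np.linalg.inv(m)
+assert np.allclose(m @ m_inv, np.eye(2), atol=1e-5)
+```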
+
+
+
+### NegPos
+
+
+NegPos details
+
+Splits an input tensor into its negative and positive parts. Equivalent to `min(x, 0)` and `max(x, 0)` returned separately.
+
+#### Inputs
+
+***input: tensor(float)***
+
+Input tensor of arbitrary shape.
+
+#### Outputs
+
+***neg: tensor(float)***
+
+Tensor with the same shape as `input`; contains `x` where `x < 0`, else `0`.
+
+***pos: tensor(float)***
+
+Tensor with the same shape as `input`; contains `x` where `x >= 0`, else `0`.
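+
+A NumPy sketch of the split (the two outputs always sum back to the input):
+
+```python
+import numpy as np
+
+def neg_pos(x):
+    # Negative part keeps x where x < 0; positive part keeps x where x >= 0.
+    return np.minimum(x, 0), np.maximum(x, 0)
+
+neg, pos = neg_pos(np.array([-1.5, 0.0, 2.0], dtype=np.float32))
+# neg == [-1.5, 0.0, 0.0], pos == [0.0, 0.0, 2.0]
+```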
+
+
+
+### SegmentExtraction
+
+
+SegmentExtraction details
+
+Extracts contiguous non-zero segments from a 1D integer input. For every maximal run of non-zero values, the start and end positions are returned.
+
+#### Inputs
+
+***input: tensor(int64)***
+
+1D input tensor.
+
+#### Outputs
+
+***position: tensor(int64)***
+
+2D tensor of shape `[num_segments, 2]` where each row is `(begin, end)` (end exclusive).
+
+***value: tensor(int64)***
+
+1D tensor of length `num_segments` with the value inside each segment.
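+
+A NumPy sketch of the run detection (illustration only; taking the segment's value from its first element is an assumption of this sketch):
+
+```python
+import numpy as np
+
+def segment_extraction(x):
+    # Mark run boundaries by diffing the zero-padded non-zero mask.
+    nz = np.concatenate(([0], (x != 0).astype(np.int64), [0]))
+    diff = np.diff(nz)
+    begins = np.where(diff == 1)[0]
+    ends = np.where(diff == -1)[0]        # exclusive end
+    position = np.stack([begins, ends], axis=1)
+    value = x[begins]                     # value at the start of each run
+    return position, value
+
+pos, val = segment_extraction(np.array([0, 2, 2, 0, 3, 0], dtype=np.int64))
+# pos == [[1, 3], [4, 5]], val == [2, 3]
+```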
+
+
### SegmentSum
-TODO
+
+SegmentSum details
+
+Computes sums along segments of the first axis of a tensor, similar to TensorFlow's `tf.math.segment_sum`.
+
+#### Inputs
+
+***data: tensor(float)***
+
+The values to reduce. The first dimension is the segment axis.
+
+***segment_ids: tensor(int64)***
+
+1D tensor with the same length as `data.shape[0]`. Must be non-decreasing.
+
+#### Outputs
+
+***output: tensor(float)***
+
+Tensor where `output[i]` is the sum of all rows of `data` whose corresponding `segment_ids` equal `i`.
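+
+The reduction can be sketched in NumPy with `np.add.at` (a reference illustration, not the kernel):
+
+```python
+import numpy as np
+
+def segment_sum(data, segment_ids):
+    # Accumulate rows of `data` by their (non-decreasing) segment id.
+    num_segments = int(segment_ids[-1]) + 1
+    out = np.zeros((num_segments,) + data.shape[1:], dtype=data.dtype)
+    np.add.at(out, segment_ids, data)
+    return out
+
+out = segment_sum(np.array([[1., 2.], [3., 4.], [5., 6.]], dtype=np.float32),
+                  np.array([0, 0, 1], dtype=np.int64))
+# out == [[4., 6.], [5., 6.]]
+```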
+
+
+
+### StftNorm
+
+
+StftNorm details
+
+Computes a short-time Fourier transform (STFT) of a 1D signal and returns the magnitude spectrogram. The implementation uses a Hann-style sliding window.
+
+#### Attributes
+
+***onesided: int64_t*** (default is 1)
+
+If 1, only the non-redundant positive-frequency half of the spectrum is returned (length `n_fft / 2 + 1`). If 0, the full spectrum is returned.
+
+#### Inputs
+
+***pcm: tensor(float)***
+
+1D audio signal.
+
+***n_fft: tensor(int64)***
+
+Scalar FFT size.
+
+***hop_length: tensor(int64)***
+
+Scalar hop length between consecutive frames.
+
+***window: tensor(float)***
+
+1D window function of length `frame_length`.
+
+***frame_length: tensor(int64)***
+
+Scalar frame length (must equal `n_fft`).
+
+#### Outputs
+
+***output: tensor(float)***
+
+3D tensor of shape `[1, num_frames, num_freq_bins]` containing the magnitude spectrogram.
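+
+The framing and magnitude math can be sketched in NumPy (an illustration of the shape contract only; the real kernel's padding and normalization details may differ):
+
+```python
+import numpy as np
+
+def stft_norm(pcm, n_fft, hop_length, window, frame_length):
+    # Frame the signal, apply the window, return one-sided FFT magnitudes.
+    num_frames = 1 + (len(pcm) - frame_length) // hop_length
+    frames = np.stack([pcm[i * hop_length:i * hop_length + frame_length]
+                       for i in range(num_frames)])
+    mag = np.abs(np.fft.rfft(frames * window, n=n_fft))
+    return mag[np.newaxis, ...].astype(np.float32)   # [1, frames, bins]
+
+pcm = np.sin(2 * np.pi * np.arange(1024) / 64).astype(np.float32)
+out = stft_norm(pcm, n_fft=256, hop_length=128,
+                window=np.hanning(256), frame_length=256)
+# out.shape == (1, 7, 129), i.e. [1, num_frames, n_fft // 2 + 1]
+```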
+
+
+
+### SplitSignalSegments
+
+
+SplitSignalSegments details
+
+Partitions an audio signal into segments of voiced/high-energy regions based on a simple short-time energy threshold.
+
+#### Inputs
+
+***input: tensor(float)***
+
+1D audio signal.
+
+***sr: tensor(int64)***
+
+Scalar sample rate in Hz.
+
+***frame_ms: tensor(int64)***
+
+Scalar analysis frame length in milliseconds.
+
+***hop_ms: tensor(int64)***
+
+Scalar hop length between analysis frames in milliseconds.
+
+***energy_threshold_db: tensor(float)***
+
+Scalar energy threshold in dBFS. Frames with average energy below this are treated as silence.
+
+#### Outputs
+
+***segments: tensor(int64)***
+
+2D tensor of shape `[num_segments, 2]` where each row contains the `(begin_sample, end_sample)` indices of a detected segment.
+
+
+
+### MergeSignalSegments
+
+
+MergeSignalSegments details
+
+Merges adjacent audio segments whose gap is shorter than a configurable threshold. Typically used as a post-processing step after `SplitSignalSegments`.
+
+#### Inputs
+
+***segments: tensor(int64)***
+
+2D tensor of shape `[N, 2]` with `(begin, end)` indices, as produced by `SplitSignalSegments`.
+
+***merge_gap_ms: tensor(int64)***
+
+Scalar gap threshold in milliseconds. Segments separated by less than this value are merged.
+
+#### Outputs
+
+***output: tensor(int64)***
+
+2D tensor of shape `[M, 2]` (M <= N) of the merged segment boundaries.
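+
+A sketch of the merging logic (for simplicity the gap threshold here is in samples; the operator's `merge_gap_ms` would first be converted using the sample rate):
+
+```python
+import numpy as np
+
+def merge_segments(segments, max_gap):
+    # Fold a (begin, end) row into the previous one when the gap is small.
+    merged = [list(segments[0])]
+    for begin, end in segments[1:]:
+        if begin - merged[-1][1] < max_gap:
+            merged[-1][1] = end
+        else:
+            merged.append([begin, end])
+    return np.array(merged, dtype=np.int64)
+
+out = merge_segments(np.array([[0, 100], [110, 200], [500, 600]]), max_gap=50)
+# out == [[0, 200], [500, 600]]
+```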
+
+
+
+## Tensor operators
+
+### RaggedTensorToSparse
+
+
+RaggedTensorToSparse details
+
+Converts a ragged tensor's row lengths to a COO-style sparse indexing representation.
+
+#### Inputs
+
+***n_element: tensor(int64)***
+
+1D tensor holding the number of elements in each row.
+
+#### Outputs
+
+***output_0: tensor(int64)***
+
+2D tensor of `(row, col)` indices for every element.
+
+***output_1: tensor(int64)***
+
+1D tensor of length 2 containing the dense shape `[num_rows, max_row_width]`.
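+
+A NumPy sketch of the conversion (illustration only; `ragged_to_sparse` is our name):
+
+```python
+import numpy as np
+
+def ragged_to_sparse(n_element):
+    # One (row, col) pair per element, plus the dense shape.
+    indices = np.array([(row, col)
+                        for row, n in enumerate(n_element)
+                        for col in range(n)], dtype=np.int64)
+    dense_shape = np.array([len(n_element), int(n_element.max())],
+                           dtype=np.int64)
+    return indices, dense_shape
+
+idx, shape = ragged_to_sparse(np.array([2, 1, 3], dtype=np.int64))
+# idx == [[0,0],[0,1],[1,0],[2,0],[2,1],[2,2]], shape == [3, 3]
+```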
+
+
+
+### RaggedTensorToDense
+
+
+RaggedTensorToDense details
+
+Converts a ragged int64 tensor to a dense 2D tensor, padding shorter rows with a configurable value.
+
+#### Attributes
+
+***missing_value: int64_t*** (default is -1)
+
+Value used to pad short rows.
+
+#### Inputs
+
+***input0: tensor(int64)***
+
+1D row-splits tensor indicating the start index of each row within `input3`.
+
+***input1: tensor(int64)***
+
+1D tensor of flat indices (unused by some consumers; reserved).
+
+***input2: tensor(int64)***
+
+1D tensor of length 2 describing the target dense shape `[num_rows, max_row_width]`.
+
+***input3: tensor(int64)***
+
+1D flat values tensor.
+
+#### Outputs
+
+***output: tensor(int64)***
+
+2D dense tensor with missing elements filled by `missing_value`.
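+
+A sketch of the padding logic using only the row splits and flat values (the shape and reserved inputs are ignored here; `ragged_to_dense` is our name):
+
+```python
+import numpy as np
+
+def ragged_to_dense(row_splits, values, missing_value=-1):
+    # Pad every ragged row out to the widest row with `missing_value`.
+    lengths = np.diff(row_splits)
+    out = np.full((len(lengths), int(lengths.max())), missing_value,
+                  dtype=np.int64)
+    for i in range(len(lengths)):
+        out[i, :lengths[i]] = values[row_splits[i]:row_splits[i + 1]]
+    return out
+
+dense = ragged_to_dense(np.array([0, 2, 3]), np.array([7, 8, 9]))
+# dense == [[7, 8], [9, -1]]
+```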
+
+
+
+## Audio operators
+
+### AudioDecoder
+
+
+AudioDecoder details
+
+Decodes a byte stream containing an encoded audio file (WAV, MP3, or FLAC) into a float PCM tensor. Optionally resamples the audio to a target sample rate.
+
+#### Attributes
+
+***downsampling_rate: int64_t*** (default is -1)
+
+Target sample rate to resample the decoded audio to. When -1, the native sample rate of the decoded stream is used.
+
+***stereo_to_mono: int64_t*** (default is 0)
+
+If set to 1, multi-channel audio is mixed down to a single mono channel.
+
+#### Inputs
+
+***input: tensor(uint8)***
+
+1D tensor of raw bytes representing the encoded audio file.
+
+***format: tensor(string)*** (optional)
+
+Scalar describing the container format. Accepted values: `"wav"`, `"mp3"`, `"flac"`. When absent the format is detected from the file header.
+
+#### Outputs
+
+***output: tensor(float)***
+
+2D tensor of shape `[1, num_samples]` with the decoded (and optionally resampled) PCM samples in the range `[-1, 1]`.
+
+
+
+## Vision operators
+
+### DecodeImage
+
+
+DecodeImage details
+
+Decodes an encoded image (PNG, JPEG, BMP, TIFF, …) into an `HxWx3` uint8 tensor.
+
+#### Attributes
+
+***color_space: string*** (default is "bgr")
+
+Color ordering of the output. Valid values are `"rgb"` and `"bgr"` (case-insensitive).
+
+#### Inputs
+
+***input: tensor(uint8)***
+
+1D tensor containing the raw encoded image bytes.
+
+#### Outputs
+
+***output: tensor(uint8)***
+
+3D tensor of shape `[H, W, 3]`.
+
+
+
+### EncodeImage
+
+
+EncodeImage details
+
+Encodes a 3-channel `HxWx3` uint8 image tensor to image bytes.
+
+#### Attributes
+
+***format: string*** (default is "png")
+
+Output image format. Valid values are `"png"` and `"jpg"`.
+
+***color_space: string*** (default is "bgr")
+
+Color space / channel order of the input image. Supported values are `"bgr"` and `"rgb"`.
+
+#### Inputs
+
+***input: tensor(uint8)***
+
+3D tensor of shape `[H, W, 3]`. The expected channel order depends on `color_space`: BGR for `"bgr"` and RGB for `"rgb"`.
+
+#### Outputs
+
+***output: tensor(uint8)***
-## Tensor operators
+1D tensor of encoded image bytes.
-### RaggedTensorToSparse
+
-TODO
+### DrawBoundingBoxes
-### RaggedTensorToDense
+
+DrawBoundingBoxes details
+
+Draws bounding boxes on a BGR image tensor.
+
+#### Attributes
+
+***thickness: int64_t*** (default is 4)
+
+Line thickness of the drawn rectangles, in pixels.
+
+***num_classes: int64_t*** (default is 10)
+
+Number of class colors to cycle through.
+
+***mode: string*** (default is "XYXY")
+
+Interpretation of the box coordinates. One of `"XYXY"`, `"XYWH"`, or `"CENTER_XYWH"`.
+
+***colour_by_classes: int64_t*** (default is 1)
+
+When 1, boxes of the same class share a colour. When 0, each box gets a unique colour from the palette.
+
+#### Inputs
+
+***image: tensor(uint8)***
+
+3D tensor of shape `[H, W, 3]` in BGR order.
+
+***boxes: tensor(float)***
+
+2D tensor of shape `[N, 6]`. Each row is `(class_id, score, x0, y0, x1, y1)` (or equivalent depending on `mode`).
+
+#### Outputs
+
+***output: tensor(uint8)***
+
+Image tensor with boxes drawn, same shape as `image`.
+
+
+
+### GaussianBlur
+
+
+GaussianBlur details
+
+Applies a 2D Gaussian blur to an image tensor using OpenCV's `cv::GaussianBlur`. The current kernel wraps the input buffer as a single `CV_32FC3` matrix, so inputs must have `N == 1` and `C == 3` channels.
+
+#### Inputs
+
+***input: tensor(float)***
+
+4D image tensor of shape `[1, H, W, 3]`.
+
+***ksize: tensor(int64)***
+
+1D tensor of length 2 specifying the kernel size `[kx, ky]` (odd positive integers).
+
+***sigma: tensor(double)***
+
+1D tensor of length 2 specifying the Gaussian standard deviation along X and Y.
+
+#### Outputs
+
+***output: tensor(float)***
+
+Blurred tensor with the same shape as `input`.
+
+
+
+### ImageDecoder
+
+
+ImageDecoder details
+
+Decodes raw encoded image bytes using OpenCV's `cv::imdecode`. Similar to `DecodeImage` but always returns BGR and does not expose a color-space attribute.
+
+#### Inputs
+
+***input: tensor(uint8)***
+
+1D tensor of encoded image bytes.
+
+#### Outputs
+
+***output: tensor(uint8)***
+
+3D tensor of shape `[H, W, C]` containing the decoded BGR image.
+
+
+
+### ImageReader
+
+
+ImageReader details
+
+Reads an image from a file path using OpenCV's `cv::imread` and returns the decoded tensor.
+
+#### Inputs
+
+***input: tensor(string)***
+
+1D string tensor of shape `[1]` containing the path of the image file to read.
+
+#### Outputs
+
+***output: tensor(uint8)***
+
+4D tensor of shape `[1, H, W, C]` containing the decoded BGR image.
+
+
+
+## CUDA operators
+
+The following operators execute on CUDA devices only. They are only registered when the library is built with `USE_CUDA`. Unless otherwise noted each op supports `float`, `float16` (`MFloat16`), and in some cases `bfloat16` (`BFloat16`).
+
+### FastGelu
+
+
+FastGelu details
+
+Fused CUDA kernel computing `gelu(x + bias)` using the fast tanh-based approximation.
+
+#### Inputs
+
+***input: tensor(T)***
+
+Input tensor of any shape. `T` is one of `float`, `float16`, `bfloat16`.
+
+***bias: tensor(T)*** (optional)
+
+Bias added elementwise before applying Gelu. Broadcast to the shape of `input`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Same shape as `input`.
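+
+A NumPy sketch of the tanh approximation the kernel fuses (`fast_gelu` is our name for the reference function):
+
+```python
+import numpy as np
+
+def fast_gelu(x, bias=None):
+    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
+    if bias is not None:
+        x = x + bias
+    c = np.sqrt(2.0 / np.pi)
+    return 0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x ** 3)))
+
+y = fast_gelu(np.array([-1.0, 0.0, 1.0], dtype=np.float32))
+# y[1] == 0.0; y[2] is close to 0.841
+```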
+
+
+
+### MulSigmoid
+
+
+MulSigmoid details
+
+Computes `x * sigmoid(x)` (the SiLU / Swish activation) in a single fused CUDA kernel.
+
+#### Inputs
+
+***input: tensor(T)***
+
+Input tensor. `T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Same shape as `input`.
+
+
+
+### MulMulSigmoid
+
+
+MulMulSigmoid details
+
+Computes `x * y * sigmoid(y)` in a single fused CUDA kernel. Tensors must have the same shape.
+
+#### Inputs
+
+***x: tensor(T)***, ***y: tensor(T)***
+
+`T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Tensor with the same shape as the inputs.
+
+
+
+### NegXPlus1
+
+
+NegXPlus1 details
-TODO
+Computes `1 - x` elementwise on CUDA.
+
+#### Inputs
+
+***input: tensor(T)***
+
+`T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Same shape as `input`.
+
+
+
+### ReplaceZero
+
+
+ReplaceZero details
+
+Replaces every zero element of the input with a scalar value.
+
+#### Attributes
+
+***by: float*** (default is 0.0)
+
+Replacement value for zero entries.
+
+#### Inputs
+
+***input: tensor(T)***
+
+`T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Same shape as `input`.
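+
+A one-line NumPy equivalent of the replacement:
+
+```python
+import numpy as np
+
+def replace_zero(x, by=0.0):
+    # Swap exact zeros for the attribute value `by`.
+    return np.where(x == 0, np.array(by, dtype=x.dtype), x)
+
+y = replace_zero(np.array([0.0, 1.5, 0.0], dtype=np.float32), by=9.0)
+# y == [9.0, 1.5, 9.0]
+```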
+
+
+
+### AddSharedInput
+
+
+AddSharedInput details
+
+Computes `A + B` and `A + C` in one kernel launch, sharing the read of `A`.
+
+#### Inputs
+
+***A: tensor(T)***, ***B: tensor(T)***, ***C: tensor(T)***
+
+`T` is one of `float`, `float16`, `bfloat16`. `B` and `C` must have the same shape as `A`.
+
+#### Outputs
+
+***AB: tensor(T)***, ***AC: tensor(T)***
+
+Elementwise sums `A + B` and `A + C`.
+
+
+
+### MulSharedInput
+
+
+MulSharedInput details
+
+Computes `A * B` and `A * C` in one kernel launch, sharing the read of `A`.
+
+#### Inputs
+
+***A: tensor(T)***, ***B: tensor(T)***, ***C: tensor(T)***
+
+`T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***AB: tensor(T)***, ***AC: tensor(T)***
+
+Elementwise products `A * B` and `A * C`.
+
+
+
+### ScatterNDOfShape
+
+
+ScatterNDOfShape details
+
+Allocates a zero tensor of the given shape and applies a `ScatterND` reduction. Equivalent to `ScatterND(ConstantOfShape(shape, 0), indices, updates, reduction=...)` but fused.
+
+#### Attributes
+
+***reduction: string*** (default is "add")
+
+Reduction to apply to scattered updates. One of `"add"`, `"mul"`, `"min"`, `"max"`.
+
+#### Inputs
+
+***shape: tensor(int64)***
+
+1D tensor describing the output shape. Must live on CPU.
+
+***indices: tensor(int64)***
+
+Indices into the output, as in standard ScatterND.
+
+***updates: tensor(T)***
+
+Values to scatter. `T` is one of `float`, `float16`, `bfloat16`.
+
+#### Outputs
+
+***output: tensor(T)***
+
+Tensor of the requested shape with updates applied.
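+
+A NumPy sketch of the default `add` reduction path (other reductions substitute the corresponding `ufunc.at`; `scatter_nd_of_shape` is our name):
+
+```python
+import numpy as np
+
+def scatter_nd_of_shape(shape, indices, updates):
+    # Zero-filled output, then scatter-add updates at the given indices.
+    out = np.zeros(shape, dtype=updates.dtype)
+    np.add.at(out, tuple(indices.T), updates)
+    return out
+
+out = scatter_nd_of_shape(np.array([4]),
+                          np.array([[1], [1], [3]]),
+                          np.array([2.0, 3.0, 5.0], dtype=np.float32))
+# out == [0.0, 5.0, 0.0, 5.0]  (duplicate indices accumulate)
+```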
+
+
+
+### MaskedScatterNDOfShape
+
+
+MaskedScatterNDOfShape details
+
+Variant of `ScatterNDOfShape` that ignores entries of `indices` equal to a configurable mask value.
+
+#### Attributes
+
+***reduction: string*** (default is "add")
+
+Same as `ScatterNDOfShape`.
+
+***maskedValue: int64_t***
+
+Index value that causes the corresponding update to be skipped.
+
+#### Inputs
+
+Same as `ScatterNDOfShape`.
+
+#### Outputs
+
+Same as `ScatterNDOfShape`.
+
+
+
+### Transpose2DCastFP16
+
+
+Transpose2DCastFP16 details
+
+Fused 2D transpose + cast from `float` to `float16`.
+
+#### Inputs
+
+***input: tensor(float)***
+
+2D tensor of shape `[M, N]`.
+
+#### Outputs
+
+***output: tensor(float16)***
+
+2D tensor of shape `[N, M]`.
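+
+Functionally equivalent NumPy (the CUDA kernel performs the transpose and downcast in one pass instead of two ops):
+
+```python
+import numpy as np
+
+x = np.arange(6, dtype=np.float32).reshape(2, 3)
+y = x.T.astype(np.float16)
+# y.shape == (3, 2), y.dtype == float16
+```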
+
+
+
+### Transpose2DCastFP32
+
+
+Transpose2DCastFP32 details
+
+Fused 2D transpose + cast from `float16` to `float`.
+
+#### Inputs
+
+***input: tensor(float16)***
+
+2D tensor of shape `[M, N]`.
+
+#### Outputs
+
+***output: tensor(float)***
+
+2D tensor of shape `[N, M]`.
+
+
### Template