enhancement(codecs): advanced syslog Structured Data & RFC compliance fixes #24662

Open
vparfonov wants to merge 10 commits into vectordotdev:master from vparfonov:fix_sd_encoding
@vparfonov
Contributor

@vparfonov vparfonov commented Feb 16, 2026

Summary

This PR upgrades the syslog encoding transform to support complex structured data (nested objects, arrays, scalars) and improves compliance with RFC 3164 and RFC 5424. It also fixes critical panics related to UTF-8 truncation.

Key Changes

1. Structured Data Improvements (RFC 5424)

Previously, the serializer only supported simple key-value pairs in structured data. This PR adds support for:

  • Scalars: Top-level scalars are now automatically wrapped in a value parameter (e.g., [id value="123"]).
  • Nested Objects: Deeply nested objects are flattened using dot notation to preserve hierarchy (e.g., {"meta": {"id": 1}} becomes [meta id="1"]).
  • Arrays: Arrays are safely serialized as JSON strings within the parameter value (e.g.,
    tags="[\"tag1\",\"tag2\",\"tag3\"]").
    ⚠️ RFC 5424 does not define how to handle arrays in structured data; it only specifies SD-ID and key-value pairs.
  • Validation: Added strict validation for SD-ID and PARAM-NAME fields to ensure they contain only printable ASCII characters (33-126) and exclude invalid characters (=, ], ", spaces), replacing invalid chars with _.
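The scalar wrapping, dot-notation flattening, and name validation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Val` enum, `sanitize_sd_name`, `escape_param_value`, `flatten`, and `render_element` are all hypothetical names, arrays are left out, and the 32-character cap and the escaping of `"`, `\`, and `]` in values are taken from RFC 5424 rather than from the PR description.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a log value; this is not Vector's own value type.
#[derive(Debug, Clone)]
enum Val {
    Str(String),
    Int(i64),
    Object(BTreeMap<String, Val>),
}

/// Replace characters that are not allowed in an SD-ID or PARAM-NAME with `_`.
/// Allowed: printable ASCII 33-126, excluding `=`, `]`, and `"`.
/// RFC 5424 also caps these names at 32 characters.
fn sanitize_sd_name(name: &str) -> String {
    name.chars()
        .map(|c| match c {
            '=' | ']' | '"' => '_',
            c if ('!'..='~').contains(&c) => c,
            _ => '_',
        })
        .take(32)
        .collect()
}

/// Backslash-escape the three characters RFC 5424 reserves inside a PARAM-VALUE.
fn escape_param_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        if matches!(c, '"' | '\\' | ']') {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Walk a value and collect (dotted key, string value) pairs.
fn flatten(prefix: &str, val: &Val, out: &mut Vec<(String, String)>) {
    match val {
        Val::Object(map) => {
            for (key, child) in map {
                let path = if prefix.is_empty() {
                    key.clone()
                } else {
                    format!("{}.{}", prefix, key)
                };
                flatten(&path, child, out);
            }
        }
        Val::Str(s) => out.push((prefix.to_string(), s.clone())),
        Val::Int(i) => out.push((prefix.to_string(), i.to_string())),
    }
}

/// Render one SD-ELEMENT. A top-level scalar is wrapped in a `value` parameter.
fn render_element(sd_id: &str, val: &Val) -> String {
    let mut params = Vec::new();
    match val {
        Val::Object(_) => flatten("", val, &mut params),
        _ => flatten("value", val, &mut params),
    }
    let mut element = format!("[{}", sanitize_sd_name(sd_id));
    for (name, value) in params {
        element.push_str(&format!(
            " {}=\"{}\"",
            sanitize_sd_name(&name),
            escape_param_value(&value)
        ));
    }
    element.push(']');
    element
}
```

Under these assumptions, `render_element("id", &Val::Int(123))` yields `[id value="123"]`, and an object `{"id": 1}` under the SD-ID `meta` yields `[meta id="1"]`, matching the examples in the list above. A deeper object such as `{"user": {"id": 1}}` would come out with the dotted parameter name `user.id`.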

2. UTF-8 Safety (Panic Fix)

  • Fix: Replaced byte-based string truncation with character-based truncation.
  • Why: Previously, truncating a string at a fixed byte offset could split a multi-byte UTF-8 character (e.g., emoji or non-Latin scripts), causing the serializer to panic. The new truncate_chars helper ensures we always split on valid character boundaries.
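For illustration, a character-boundary-safe truncation can be written as below. The helper name `truncate_chars` comes from this PR, but the signature and body here are assumptions, not the merged implementation. Slicing a `&str` at a byte index that falls inside a multi-byte character (for example `&s[..n]`) panics, which is the failure mode described above.

```rust
/// Keep at most `max_chars` characters of `s`, always cutting on a
/// UTF-8 character boundary. Signature is assumed for this sketch.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    // `char_indices` yields the byte offset at which each character starts,
    // so the offset of the character at position `max_chars` is a safe cut point.
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        // The string has `max_chars` characters or fewer: nothing to cut.
        None => s,
    }
}
```

Note that "character" here means a Unicode scalar value (Rust `char`), not a grapheme cluster, so a multi-codepoint emoji sequence can still be split between its code points, but the result is always valid UTF-8.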

3. RFC 3164 Compliance

  • Sanitization: The TAG field (App Name/Proc ID) in RFC 3164 is now strictly sanitized to ASCII printable characters. Non-ASCII characters are replaced with underscores to prevent protocol violations.
  • Structured Data: RFC 3164 does not support structured data. The serializer now correctly ignores structured data fields when rfc = "rfc3164" is selected, rather than emitting malformed headers.
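The TAG sanitization can be sketched as a per-character map. The function name `sanitize_tag` is hypothetical, and treating the space character as invalid is an assumption of this sketch (the description above only says "ASCII printable").

```rust
/// Replace every character outside the visible ASCII range (33-126) with `_`.
/// Hypothetical helper; the space character is also replaced, which is an
/// assumption of this sketch rather than something the PR states.
fn sanitize_tag(tag: &str) -> String {
    tag.chars()
        .map(|c| if c.is_ascii_graphic() { c } else { '_' })
        .collect()
}
```

Because the mapping is one output character per input character, a tag made of multi-byte characters shrinks in bytes but keeps its character count, so any length limit applied afterwards behaves predictably.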

Vector configuration

How did you test this PR?

  • added tests for nested objects and arrays in structured data.
  • added tests for UTF-8 truncation to verify panic safety with emoji and Cyrillic characters.
  • added tests for verifying RFC 3164 sanitization.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

…ve RFC compliance

- Support scalars, nested objects and arrays in structured data
- Fix UTF-8 safety: use character-based truncation (prevents panics)
- Fix RFC 3164: ignore structured data instead of prepending
- Add ASCII sanitization for RFC 3164 fields
- Add RFC 5424 SD-ID/PARAM-NAME validation
@vparfonov vparfonov requested a review from a team as a code owner February 16, 2026 18:19
@vparfonov vparfonov changed the title from "enhancement(syslog encoding): advanced syslog Structured Data & RFC compliance fixes" to "enhancement(codecs): advanced syslog Structured Data & RFC compliance fixes" Feb 17, 2026
 fix SD-ID use char count instead of byte length
 use workspace dependency for toml
 changelog with detailed breakdown
Member

@pront pront left a comment

Thanks!

@pront pront enabled auto-merge February 24, 2026 14:05
@pront pront disabled auto-merge February 24, 2026 14:05
@pront
Member

pront commented Feb 24, 2026

After the failing checks pass, we can enqueue. I attempted to fix one but I don't have perms to push to this branch.

@pront pront enabled auto-merge February 26, 2026 18:43
auto-merge was automatically disabled February 26, 2026 18:54

Head branch was pushed to by a user without write access

@github-actions github-actions bot added the domain: ci Anything related to Vector's CI environment label Feb 26, 2026
@pront pront enabled auto-merge February 26, 2026 19:21
auto-merge was automatically disabled February 26, 2026 19:31

Head branch was pushed to by a user without write access

@pront pront enabled auto-merge February 26, 2026 20:41
@pront
Member

pront commented Feb 26, 2026

You can run make check-markdown locally to see all the errors

Labels

domain: ci Anything related to Vector's CI environment

2 participants