feat(codecs): Support more avro types#24774
feat(codecs): Support more avro types#24774omwbennett wants to merge 12 commits intovectordotdev:masterfrom
Conversation
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
1ecf525 to
40d0d6a
Compare
40d0d6a to
aa24899
Compare
|
Hey @vectordotdev/vector, appreciate if you could review this PR, thanks! |
|
@codex review Hi @omwbennett, thanks for contributing to Vector. I enabled the workflows and requested a first pass from codex. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 857b4b3903
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -0,0 +1 @@ | |||
| {"name":"test","type":"record","fields":[{"name":"date_field","type":{"type":"int"}}]} No newline at end of file | |||
There was a problem hiding this comment.
Preserve logicalType in date fixture schema
roundtrip_avro_fixtures reads this .avsc as the schema, but the new fixture stores date_field as plain int ({"type":{"type":"int"}}) rather than an Avro date logical type. That means decoding this fixture produces AvroValue::Int, so the new AvroValue::Date branch added in try_from is never exercised by tests; regressions in real logicalType: "date" inputs could pass CI unnoticed. Please keep the fixture schema with logicalType: "date" on the field type object so the test validates the intended path.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
This seems to be because the apache-avro dependency was updated to 0.20.0 which changed/fixed canonical_form() to remove logicalTypes, however the avro test fixtures were not re-generated until now.
I've updated the fixture generator to pass the schema as-is into the .avsc fixture file so we don't lose any parts of the schema.
This uncovered a bug in the Avro encoder where it's unable to encode a vrl Integer back to an Avro Date, so I've addressed this by making the encoder aware of the schema using a new method to_avro() in the encode path that maps different vrl values to Avro types based on the schema. I also went ahead and added support for a few other types as those looked straightforward.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cb2cbd735b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
check-spelling run (pull_request_target) for support-avro-date-values Signed-off-by: check-spelling-bot <check-spelling-bot@users.noreply.github.com> on-behalf-of: @check-spelling <check-spelling-bot@check-spelling.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5336c62721
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Hi @omwbennett, happy to review this once all codex comments are addressed (feel free to decline those with a justification if they do not make sense) |
Thanks @pront, I have addressed all the codex review comments now, appreciate if you could take another look. |
Summary
Addresses #24773 by adding support for decoding Avro
Datevalues, in addition toFixed,TimeMillis,TimestampNanos,LocalTimestampNanostypes.Due to the way VRL stores integers, an extra step was added to the avro encoder path to coerce Avro types based on the schema field's LogicalType.
Vector configuration
Also used similar configs for the other Avro data types.
How did you test this PR?
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changeloglabel to this PR.References
Closes #24773
Notes
@vectordotdev/vectorto reach out to us regarding this PR.pre-pushhook, please see this template.make fmtmake check-clippy(if there are failures it's possible some of them can be fixed withmake clippy-fix)make testgit merge origin masterandgit push.Cargo.lock), pleaserun
make build-licensesto regenerate the license inventory and commit the changes (if any). More details here.