Feature description
After profiling a few Conduit instances running with the new architecture enabled, we discovered that JSON serialization took a significant amount of time, and a large share of that came from tracking record-size metrics.
The reason for that appears to be that, when obtaining metrics for a record, we serialize the record into a JSON object first (here and here).
Running our benchmarks with the source metrics disabled showed a throughput improvement of ~25%. Disabling metrics in both the source and destination tasks in the new architecture results in ~50% higher throughput. For example:
| pipeline | metrics | throughput |
|---|---|---|
| generator-chaos-structured | enabled | 40684.56 |
| generator-chaos-structured | disabled source metrics | 48041 |
| generator-chaos-structured | disabled source and dest. metrics | 63490.6 |
| postgres-to-log-cdc | enabled | 84632 |
| postgres-to-log-cdc | disabled source metrics | 108289 |
| postgres-to-log-cdc | disabled source and dest. metrics | 121403.43 |
We should find a faster way to obtain a record's size for metrics. Until then, it probably makes sense for size metrics to be disabled.