Dataflow partitioning by HenriSchulte-MS · Pull Request #108 · microsoft/bc2adls

HenriSchulte-MS · 2023-03-30T07:57:07Z

Currently, data is not deliberately partitioned in the dataflow. Partitioning based on a unique identifier (systemid + company) can reduce data shuffling between worker nodes and reduce execution time.
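
To illustrate the idea, here is a minimal Python sketch, not the dataflow's actual implementation. Rows are hash-partitioned on the combined key so that every version of a record lands in the same partition. A per-key step can then run inside each partition without moving rows between workers; keeping the newest version is used here only as an assumed example of such downstream work. The field names `systemId`, `company` and `timestamp`, the helper names, and the use of CRC32 are all assumptions made for illustration.

```python
from collections import defaultdict
from zlib import crc32


def partition_index(system_id, company, num_partitions):
    """Map a (systemId, company) key to a partition using a stable hash."""
    key = f"{company}|{system_id}".encode("utf-8")
    return crc32(key) % num_partitions


def hash_partition(rows, num_partitions):
    """Distribute rows so that equal keys always share a partition."""
    partitions = defaultdict(list)
    for row in rows:
        index = partition_index(row["systemId"], row["company"], num_partitions)
        partitions[index].append(row)
    return partitions


def latest_per_key(partition):
    """Keep the newest row per key; needs no data from other partitions."""
    latest = {}
    for row in partition:
        key = (row["systemId"], row["company"])
        if key not in latest or row["timestamp"] > latest[key]["timestamp"]:
            latest[key] = row
    return list(latest.values())


# Hypothetical sample data: two versions of one record plus two other records.
rows = [
    {"systemId": "a1", "company": "CRONUS", "timestamp": 1},
    {"systemId": "a1", "company": "CRONUS", "timestamp": 2},
    {"systemId": "a1", "company": "Contoso", "timestamp": 1},
    {"systemId": "b2", "company": "CRONUS", "timestamp": 1},
]

partitions = hash_partition(rows, num_partitions=4)
merged = [row for part in partitions.values() for row in latest_per_key(part)]
```

In a mapping data flow the equivalent is configured on a transformation's Optimize tab rather than written as code; the sketch only shows why co-locating rows by key avoids a shuffle.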

* Removing tracked deleted records should be treated as separate from the export process (#79)
* First draft
* Adding notable change
* update app.json
* adding tooltip help
Co-authored-by: Soumya Dutta <soudutta@microsoft.com>
* TryFunction should not make DB calls (#82)
* first draft
* Further changes
* Correcting the telemetry IDs
Co-authored-by: Soumya Dutta <soudutta@microsoft.com>
* Separate changelog (#85) Added separate changelog to shorten readme
* Added ADLS Run API page (#90)
* Merge branch 'main' of https://github.com/microsoft/bc2adls
* Adjusted version
* Improvements to logging (#89)
* Improvements to logging
* LockTable in Try function
---------
Co-authored-by: Soumya Dutta <soudutta@microsoft.com>
* Access denied issue on spark notebook (#92)
* Added step
* minor
* Update SharedMetadataTables.md Clarified instructions reg. naming of the managed identity and reason for adding the permissions
---------
Co-authored-by: Soumya Dutta <soudutta@microsoft.com>
Co-authored-by: Henri Schulte <77101781+HenriSchulte-MS@users.noreply.github.com>
* Warn user before making schema changes if data already exported. (#96)
Co-authored-by: Soumya Dutta <soudutta@microsoft.com>
* Internal Fields cannot be exported (#98)
Co-authored-by: Soumya Dutta <soudutta@microsoft.com>
* Only start export for Enabled tables (#97)
Co-authored-by: Soumya Dutta <soudutta@microsoft.com>
* Skip global trigger event subscriber on missing license or permissionset (#100)
* Skip event subscribers when no license or permissions
* Increase version
---------
Co-authored-by: Ron Koppelaar <Ron.Koppelaar@cegeka-dsa.nl>
* Update Execution.md Adding link to Microsoft documentation to consume ADLS Gen 2 resources
* Allow telemetry to be logged at all outputs. (#102)
Co-authored-by: Soumya Dutta <soudutta@microsoft.com>
* Adding the file path to the telemetry
* Add the testimonials received (#103)
* Add the testimonials received
* remove logos
---------
Co-authored-by: Soumya Dutta <soudutta@microsoft.com>
---------
Co-authored-by: Soumya Dutta <38040179+DuttaSoumya@users.noreply.github.com>
Co-authored-by: Soumya Dutta <soudutta@microsoft.com>
Co-authored-by: Bert Verbeek <71499421+Bertverbeek4PS@users.noreply.github.com>
Co-authored-by: Ron Koppelaar <33791875+RonKoppelaar@users.noreply.github.com>
Co-authored-by: Ron Koppelaar <Ron.Koppelaar@cegeka-dsa.nl>


Arthurvdv · 2023-08-18T06:37:54Z

This sounds promising. May I ask a question about it?

"..If you plan on using non-equality comparisons in your custom expression, you should utilize the 'Fixed' broadcast setting and specify a minimum of 1 stream to be broadcast. If broadcasting, ensure that your Integration Runtime is sized appropriately.."

Is this a warning we should consider? And if so, what would be the best setting for the Broadcast options?

HenriSchulte-MS · 2023-08-22T11:48:55Z

@Arthurvdv The custom expression in the "Remove Deleted" step does not involve any non-equality comparisons, so I have not paid any mind to this warning.
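
As a rough sketch of why an equality-only condition sidesteps that warning (in Python, with hypothetical field and function names, and not the actual "Remove Deleted" logic): when rows are matched on equal keys, both inputs can be bucketed by a hash of the key and each bucket handled on its own. A non-equality condition such as `<` allows no such bucketing, so every row has to be checked against the whole of the other input, which is what broadcasting provides.

```python
from collections import defaultdict
from zlib import crc32


def bucket_by_key(rows, key, num_buckets):
    """Group rows by a stable hash of row[key]."""
    buckets = defaultdict(list)
    for row in rows:
        index = crc32(str(row[key]).encode("utf-8")) % num_buckets
        buckets[index].append(row)
    return buckets


def drop_matches_equality(records, deletions, num_buckets=4):
    """Equality condition: a record can only match deletions in its own bucket."""
    record_buckets = bucket_by_key(records, "systemId", num_buckets)
    deletion_buckets = bucket_by_key(deletions, "systemId", num_buckets)
    kept = []
    for index, bucket in record_buckets.items():
        deleted_ids = {d["systemId"] for d in deletion_buckets.get(index, [])}
        kept.extend(r for r in bucket if r["systemId"] not in deleted_ids)
    return kept


def drop_matches_non_equality(records, cutoffs):
    """Non-equality condition: each record is compared with every cutoff row,
    so the whole cutoff input must be available on every worker (broadcast)."""
    return [
        r for r in records
        if not any(r["timestamp"] < c["timestamp"] for c in cutoffs)
    ]


# Hypothetical sample data for illustration only.
records = [{"systemId": str(i), "timestamp": i} for i in range(1, 6)]
deletions = [{"systemId": "2"}, {"systemId": "4"}]
remaining = drop_matches_equality(records, deletions)
```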

Arthurvdv · 2023-08-22T16:38:18Z

@HenriSchulte-MS, thank you for sharing. I'll update our pipeline ahead of the merge of this PR.

HenriSchulte-MS and others added 3 commits January 11, 2023 09:41

Added partitioning on dataflow to reduce data shuffle

087ecfc

Merge branch 'main' of https://github.com/microsoft/bc2adls into Data…

be27d38

…FlowPartitioning

Bertverbeek4PS mentioned this pull request Aug 29, 2023

Dataflow partitioning Bertverbeek4PS/bc2adls#14

Merged

HenriSchulte-MS wants to merge 3 commits into main from DataflowPartitioning
