Fabric OneLake Multi-tenant demo using ADF

This demo is provided as-is and is not supported in any way by me or Microsoft. Feel free to provide feedback, but I cannot guarantee that it will be addressed.

The demo is designed to highlight one way, in one scenario, to build multi-tenant, reusable pipelines using Microsoft Fabric OneLake and Azure Data Factory. It showcases using ADF pipelines to bring in data from three different SQL databases and copy that data to OneLake. Once Fabric pipelines are able to parameterize connections, I'll create another repo using only Fabric. Until that feature releases, we will use ADF.

Prerequisites

  • An Active Azure Subscription
  • A Microsoft Fabric Capacity. If you or your organization do not have a Fabric capacity, you can take advantage of the trial capacity.

Demo Guide

Deploy Azure resources

Please use the Deploy to Azure button to deploy the Azure resources needed for this demo. All resources should be deployed to the same Azure region. Since this is a demo only, I would also recommend deploying to the same resource group.

Deploy to Azure

Parameters that need to be provided for the template:

  • Unique lowercase data factory name
  • Unique lowercase storage account name
  • Unique lowercase SQL server name
  • Admin login
  • Admin password

-For this demo we deploy three Azure SQL Databases with sample data, all hosted on the same logical SQL server. I use the S2 tier during the demo but scale down to S1 when I am not actively using it. I also enable SQL Authentication. The pipeline will use SQL Authentication, so for ease of use I recommend using the same admin account for all of your databases. The security best practice would be to use a Managed Identity in Data Factory and grant it access to the SQL databases. Also make sure that your databases are using a public endpoint for this demo.

-We will also deploy an Azure Data Factory and an Azure Storage account.

Note: Once deployed, you should see 1 SQL server, 3 SQL DBs, 1 storage account, and 1 data factory instance in your resource group.


Deploy the pipeline ARM template

Note: This step is required and is not covered by the previous ARM template deployment. The previous section deploys Azure resources, while this step deploys your Data Factory pipeline.

  1. Open Azure Data Factory in the Azure Portal and click on Open Azure Data Factory Studio.

  2. Select the Manage icon on the left, choose ARM template, and select Import ARM template. This should launch the Custom deployment page in the Azure portal.

  3. Select Build your own template in the editor and leave it open for now.

  4. Open the arm_templates\adf_pipeline_arm_template.json file in this repository. Select all of the text and then paste it into the Edit template page and click Save.

  5. Now choose the resource group and region that you are deploying into, and update the Factory Name to reflect your data factory name. There's one connection string you will need to populate. Below is an example of what it should look like (without the quotes). The highlighted value will need to be changed to the admin login you chose at the beginning of this tutorial. The rest of the template can be ignored.

    Tenant Databases_connectionString = integrated security=False;encrypt=True;connection timeout=30;data source=@{linkedService().ServerName};initial catalog=@{linkedService().DatabaseName};user id=dblogin

    Note: The tenant database connection string is parameterized for the data source and initial catalog values. The pipeline will automatically fill in these values at run time.

  6. Select Review + create, then choose the Create button. Give the template a few minutes to deploy. Close and reopen your Azure Data Factory Studio and verify that your ARM template has successfully deployed by navigating to the Author page. Here you should now have two pipelines and three datasets.
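To make the parameterized connection string from step 5 concrete, here is a small Python sketch of the substitution that happens at run time. This is only an analogy for how the @{linkedService().ServerName} and @{linkedService().DatabaseName} placeholders get filled in, not ADF's actual expression engine.

```python
# Hypothetical sketch of how the parameterized connection string resolves
# at run time; the placeholder names mirror @{linkedService().*} above.
CONN_TEMPLATE = (
    "integrated security=False;encrypt=True;connection timeout=30;"
    "data source={ServerName};initial catalog={DatabaseName};user id=dblogin"
)

def resolve_connection_string(server_name: str, database_name: str) -> str:
    """Fill in the ServerName and DatabaseName placeholders for one tenant."""
    return CONN_TEMPLATE.format(ServerName=server_name, DatabaseName=database_name)

print(resolve_connection_string("db-host.database.windows.net", "db-tenant1"))
```

Each tenant row from the metadata table produces a different resolved string, which is what makes one linked service reusable across all three databases.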

Configure Fabric Workspace

  1. Navigate to https://app.fabric.microsoft.com. This will bring up the Fabric home screen. We're first going to create a new workspace, so click Workspaces on the left-side blade and then click + New Workspace.


  2. Go ahead and give the workspace a Name, something like Multitenant Demo if it's available. Everything else can be skipped; click Apply to create the workspace.

Create Data Warehouse

  1. You should now be inside of your newly created workspace. Click on the workload switcher in the bottom left corner (it most likely shows Power BI). You should see a list of all the Fabric capabilities. Click Data Warehouse.


  2. Now click on the Warehouse button at the top to create a new Warehouse.


  3. Provide your Warehouse a name and click Apply. Your new Warehouse should automatically open up, give it a minute for the warehouse to load.

Configure your metadata tables

This solution leverages metadata tables in the OneLake warehouse to store the server names, database names, and tenant IDs for the source databases. It also stores a list of the table names we would like to copy in our pipeline.
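As a rough sketch of what the metadata script sets up, the following uses SQLite in place of the Fabric warehouse. The column names are taken from the lookup outputs shown later in this guide; the actual tables are created by SQL Queries\meta-driven-pipeline.sql.

```python
import sqlite3

# Illustrative stand-in for the warehouse metadata tables (SQLite, not Fabric).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TenantMetaData (
    TenantPriority INTEGER,
    TenantID TEXT,
    ServerName TEXT,
    DatabaseName TEXT
);
CREATE TABLE SchemaMetadata (
    CopyPriority INTEGER,
    SchemaName TEXT,
    TableName TEXT,
    CopyFlag INTEGER
);
INSERT INTO TenantMetaData VALUES (1, 'tenant1', 'db-host.database.windows.net', 'db-tenant1');
INSERT INTO SchemaMetadata VALUES (1, 'SalesLT', 'Address', 1);
""")

# This is the query the TenantLookup activity runs against the warehouse.
tenants = conn.execute(
    "SELECT * FROM TenantMetaData ORDER BY TenantPriority").fetchall()
print(tenants)
```

One row per tenant drives the whole pipeline; adding a fourth tenant is just another INSERT, with no pipeline changes.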

  1. In your newly created Warehouse, Click New SQL Query at the top of the screen.

  2. Open the SQL Queries\meta-driven-pipeline.sql file found in this repo in a text editor. You WILL need to update the insert statements for the TenantMetadata table with the correct ServerNames for your environment. The parts you must change are highlighted below.

    INSERT INTO TenantMetaData VALUES (1, 'tenant1', 'db-host.database.windows.net', 'db-tenant1');

    INSERT INTO TenantMetaData VALUES (2, 'tenant2', 'db-host.database.windows.net', 'db-tenant2');

    INSERT INTO TenantMetaData VALUES (3, 'tenant3', 'db-host.database.windows.net', 'db-tenant3');

  3. The rest of the script may remain as-is. Paste the SQL query into Query editor in your warehouse and click Run.

  4. You can now verify that your new metadata tables have been created in your warehouse. Expand Schemas -> dbo -> Tables. You may need to open the more options menu for Tables and click refresh.


Configure App Registration for service principal authentication

  1. Return to the Azure portal and search for "App Registrations" in the top search bar. Then click + New registration at the top left.

  2. Provide a Name for your app registration. All other default values can be left.

  3. In your newly created App registration, click Certificates & Secrets and then click + New client secret. Finish adding the client secret by clicking Add.

  4. Open a text editor and copy and paste the following values for future configuration steps.

     -Application(client) ID: xxxx
    
     -Directory(tenant) ID: xxxx
    
     -client secret value: xxxx
    
  5. Return to your Fabric workspace and click Manage Access, then click + Add people or groups. Search for the name of the app registration you just created, and provide it with contributor access.


Configure your Pipelines and linked services

  1. To run the pipeline you will need to supply passwords/configurations for the linked services. Linked services hold the connection information for your sources and destinations. From the Azure Data Factory Studio, navigate to Manage->Linked Services->TenantDatabases. Update the User name and Password to match your source databases.

    Note: At this point the Test connection will not work for this linked service unless you manually supply values for the fully qualified domain name @{linkedService().ServerName} and the DatabaseName @{linkedService().DatabaseName} parameters. If you scroll down on the Edit linked service blade, you should see a Parameters section with the parameterized values we are using. If you want to test the connection, a popout will let you provide those parameters manually (you may leave tenant-id set to the default for connection testing). When we run the pipeline, these parameterized values will be automatically populated from the tables we created earlier.

  2. Select Apply to accept the changes.

  3. Open the Warehouse1 linked service. Here you will update the configuration for your Fabric Warehouse. Change the Warehouse selection method to From Selection. Verify you are in the correct tenant -> Choose the workspace name that you created for this demo -> Choose the Warehouse name you created.

  4. For the authentication reference method, leave it at Inline. Next we will provide the values from our app registration.

    -Tenant = Insert the directory(tenant) ID value
    
    -Service principal ID = Insert the Application (client) ID
    
    -Service principal credential type =  Leave it at __Service principal key__
    
    -Service principal key = enter the client secret value
    
  5. Confirm everything was entered correctly by testing the connection using the button at the bottom right. Upon success, click Apply to accept the changes.

  6. Lastly, we will configure our Staging Blob linked service. Click on the StagingBlob linked service. Here you can simply change the Account selection method to From Azure subscription. Choose the subscription your resource group is in, and then choose the storage account name that was created by the ARM template. Test the connection again. Upon completion, apply the changes.

Understanding the pipelines, activities, datasets and linked services

This solution contains two pipelines and three datasets. Each will be described in detail below.

The first pipeline is the TenantPipeline. This is the master pipeline. When running the demo, it is only necessary to trigger this pipeline, all other components of the demo are automated.

The TenantPipeline consists of three activities described below.

  1. The TenantLookup lookup activity. This activity is responsible for pulling the list of tenants by running the SELECT * FROM TenantMetadata ORDER BY TenantPriority query in the warehouse database. There is no parameterization in this activity, but the output, which is passed on to the next activity, is JSON formatted with the results of the query and will look something like this:

    {
    "count": 3,
    "value": [
        {
            "TenantPriority": 1,
            "TenantID": "tenant1",
            "ServerName": "db-host.database.windows.net",
            "DatabaseName": "db-tenant1"
        },
        {
            "TenantPriority": 2,
            "TenantID": "tenant2",
            "ServerName": "db-host.database.windows.net",
            "DatabaseName": "db-tenant2"
        },
        {
            "TenantPriority": 3,
            "TenantID": "tenant3",
            "ServerName": "db-host.database.windows.net",
            "DatabaseName": "db-tenant3"
        }
    ]
    }

    Note: In many production workloads, these metadata tables would live in a dedicated config database, especially if you have many rows, many pipelines leveraging the tables, or source and/or destination databases in another region.

  2. The next activity is the ForEachTenant ForEach activity. If you select this activity, then choose the Settings tab, you will see that under Items we have added dynamic content representing the output of the previous lookup activity. The value is @activity('TenantLookup').output.value. If you delete this item, click in the empty field, and choose the Add dynamic content link, you will get a popup showing all of the accessible options for dynamic content. Choose TenantLookup value array, which will give you the entire array of results from your lookup. The ForEach activity will then iterate through each element of the array. Note that ForEach loops are not recursive, so nested arrays will not be processed by the loop.

    On the Settings tab of your ForEachTenant activity, notice there is a Sequential checkbox. Enabling this allows your loop to process only one iteration at a time: the loop waits for an iteration to complete before the next one begins. The default behavior is for the iterations to all run as soon as possible. Since we would like to process all tenants in parallel, we have left this option disabled.

  3. Within your ForEachTenant activity you have the ExecCopyDatabasePipeline activity. This is an Execute Pipeline activity. We are using it to execute the DatabaseCopyPipeline for each tenant. If you open the ExecCopyDatabasePipeline activity and click on the Settings tab, you will see the name of the invoked pipeline. You will also notice that we have defined three parameters here: ServerName, DatabaseName and TenantID. These parameters will be passed into the invoked pipeline. If you delete one of these items and then click the Add dynamic content link, you will be taken to the add dynamic content popup. Choose ForEachTenant under the ForEach iterator section. This will show @item() in the editing box. But the item that we are iterating on is actually an object with one property for each column. So to properly assign the value to the parameter, type .column-name, replacing column-name with the name of the parameter you deleted; your line should look something like @item().ServerName. Click OK, then publish any changes.
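The fan-out described in activities 1-3 can be sketched in plain Python. This is a simulation of what the ADF expressions do, not the expression engine itself; the sample tenant rows match the lookup output shown above.

```python
# Hypothetical simulation of the ForEachTenant loop: each element of the
# TenantLookup output array becomes @item(), and its properties are passed
# as parameters to DatabaseCopyPipeline via ExecCopyDatabasePipeline.
lookup_output = {
    "count": 2,
    "value": [
        {"TenantPriority": 1, "TenantID": "tenant1",
         "ServerName": "db-host.database.windows.net", "DatabaseName": "db-tenant1"},
        {"TenantPriority": 2, "TenantID": "tenant2",
         "ServerName": "db-host.database.windows.net", "DatabaseName": "db-tenant2"},
    ],
}

invocations = []
for item in lookup_output["value"]:            # @activity('TenantLookup').output.value
    invocations.append({                       # ExecCopyDatabasePipeline parameters
        "ServerName": item["ServerName"],      # @item().ServerName
        "DatabaseName": item["DatabaseName"],  # @item().DatabaseName
        "TenantID": item["TenantID"],          # @item().TenantID
    })

print(invocations)
```

With Sequential disabled, all of these invocations run in parallel rather than one after another.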

  4. Open the DatabaseCopyPipeline under the Factory Resources menu. This pipeline is responsible for collecting the list of tables that we want to copy from each source and then copying each of those tables to the warehouse staging tables. Notice that on the Parameters tab, we have the same parameters that we had in our Execute Pipeline activity. These parameters will be populated by the calling pipeline.

  5. The LookupTables lookup activity. This activity is responsible for pulling the list of tables we would like to copy from the source databases by running the SELECT * FROM SchemaMetadata WHERE CopyFlag = 1 ORDER BY CopyPriority query in the warehouse. The ORDER BY on the CopyPriority column allows us to copy tables in a certain order if necessary by changing the values in the table. The SchemaName and TableName columns should be self-explanatory. The CopyFlag column can be used to exclude certain tables from the copy process; the lookup query filters on it, though in the demo every row has it set to true, so nothing is excluded. There is no parameterization in this activity, but the output, which is passed on to the next activity, is JSON formatted with the results of the query and will look something like this:

    {
    "count": 10,
    "value": [
        {
            "CopyPriority": 1,
            "SchemaName": "SalesLT",
            "TableName": "Address",
            "CopyFlag": true
        },
        {
            "CopyPriority": 2,
            "SchemaName": "SalesLT",
            "TableName": "Customer",
            "CopyFlag": true
        },
        {
            "CopyPriority": 3,
            "SchemaName": "SalesLT",
            "TableName": "CustomerAddress",
            "CopyFlag": true
        },
        {
            "CopyPriority": 4,
            "SchemaName": "SalesLT",
            "TableName": "ProductCategory",
            "CopyFlag": true
        },
        {
            "CopyPriority": 5,
            "SchemaName": "SalesLT",
            "TableName": "ProductModel",
            "CopyFlag": true
        },
        {
            "CopyPriority": 6,
            "SchemaName": "SalesLT",
            "TableName": "ProductDescription",
            "CopyFlag": true
        },
        {
            "CopyPriority": 7,
            "SchemaName": "SalesLT",
            "TableName": "ProductModelProductDescription",
            "CopyFlag": true
        },
        {
            "CopyPriority": 8,
            "SchemaName": "SalesLT",
            "TableName": "Product",
            "CopyFlag": true
        },
        {
            "CopyPriority": 9,
            "SchemaName": "SalesLT",
            "TableName": "SalesOrderHeader",
            "CopyFlag": true
        },
        {
            "CopyPriority": 10,
            "SchemaName": "SalesLT",
            "TableName": "SalesOrderDetail",
            "CopyFlag": true
        }
    ]
    }
  6. The next activity is the ForEachTable ForEach activity. If you select this activity, then choose the Settings tab, you will see that under Items we have added dynamic content representing the output of the previous lookup activity. The value is @activity('LookupTables').output.value. Notice that on this ForEach loop we have chosen to make it sequential, so that we only copy one table at a time. This is not strictly necessary here, as no referential integrity is enforced on the destination tables, but in cases where it is, this can be used to load tables in the correct order.
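The table-selection logic behind LookupTables and ForEachTable boils down to a filter and a sort, sketched below with a few hypothetical rows (the real rows come from the SchemaMetadata warehouse table):

```python
# Sketch of: SELECT * FROM SchemaMetadata WHERE CopyFlag = 1 ORDER BY CopyPriority
schema_metadata = [
    {"CopyPriority": 2, "SchemaName": "SalesLT", "TableName": "Customer", "CopyFlag": True},
    {"CopyPriority": 1, "SchemaName": "SalesLT", "TableName": "Address", "CopyFlag": True},
    {"CopyPriority": 3, "SchemaName": "SalesLT", "TableName": "Excluded", "CopyFlag": False},
]

to_copy = sorted(
    (row for row in schema_metadata if row["CopyFlag"]),  # WHERE CopyFlag = 1
    key=lambda row: row["CopyPriority"],                  # ORDER BY CopyPriority
)

# Because ForEachTable is sequential, tables are copied strictly in this order.
copy_order = [row["TableName"] for row in to_copy]
print(copy_order)
```

Flipping a row's CopyFlag to false drops that table from the run without touching the pipeline, which is the point of keeping this list in metadata.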

  7. Finally we have the CopyTable activity. This copy data activity is what actually moves data from the source to the destination.

    Open the CopyTable activity and select the Source tab. Here we define the source dataset TenantData that we will copy data from. TenantData is a parameterized dataset that in turn uses the TenantDatabases parameterized Linked Service. You can see the list of dataset properties that we are using here. The SchemaName and TableName properties are being populated by values from our lookup and DatabaseName, ServerName and TenantID are pipeline parameters that were passed in via the execute pipeline activity in the TenantPipeline.

    If you scroll to the bottom of the Source tab you will notice a section called Additional columns. This adds a new column to every table and populates it with the TenantID pipeline parameter. This allows us to differentiate similar rows in the warehouse that belong to different tenants. For example, if two tenants have an order with the same orderID, you would need a way to differentiate one from the other.

    Now move to the Sink tab. Here we are copying data into the StagingData dataset. It takes one parameter, StagingTable, which is populated with the TableName value passed in during the lookup. If you open the dataset you will see that we are using the warehouse1 linked service. For the table, we have hard coded the schema to staging and are using the dataset property StagingTable for the table name.

    If you move to the Settings tab, you will see that we have enabled Staging. This is why we created the storage account and its linked service: the data will be temporarily stored in blob storage before it's copied to the OneLake warehouse.
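The Additional columns behavior described above can be illustrated with a small sketch. The rows and order ID below are hypothetical; only the TenantID stamping mirrors what the copy activity does.

```python
# Illustration of the Additional columns setting: stamp every copied row
# with the TenantID so identical keys from different tenants stay distinct
# once they land in the shared warehouse tables.
def add_tenant_column(rows, tenant_id):
    """Return rows with a TenantID column appended, as the copy activity does."""
    return [{**row, "TenantID": tenant_id} for row in rows]

# Two tenants can legitimately have the same order ID in their own databases.
tenant1_orders = add_tenant_column([{"SalesOrderID": 1001}], "tenant1")
tenant2_orders = add_tenant_column([{"SalesOrderID": 1001}], "tenant2")
print(tenant1_orders + tenant2_orders)
```

In the warehouse, (SalesOrderID, TenantID) together identify a row, so the collision on SalesOrderID alone is harmless.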

Running the pipeline

  1. If you have any unpublished changes, publish them now.

  2. Navigate to the TenantPipeline. Remember, this is our master pipeline, and as such we kick off our copy process from here. This pipeline takes no input from the user; it collects all the information it needs to complete the copy from our metadata tables.

  3. Click the Add trigger button and choose Trigger now from the dropdown and click OK on the popup.

Monitor the pipelines

You can monitor the progress of the pipelines by selecting the Monitor tab on the left menu. If your pipeline is not making progress, click the refresh button near the top of the page. You should see your TenantPipeline run once, and the DatabaseCopyPipeline run three times.

Clicking into the CopyTable jobs we can see the journey our data went on.


Your DatabaseCopyPipeline runs should have all failed. If you click into the runs, you'll see that only one of the CopyTable operations failed, causing the entire ForEachTable operation to fail.


The reason this failed is that the AdventureWorks sample dataset we deployed to our databases has a table column of type varbinary(MAX). This data type is not currently available in Fabric. As you build in Fabric, it's important to review the limitations and differences; this will help guide your design decisions.
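One way to catch this before a run, sketched here as a hypothetical pre-flight check that is not part of the demo, is to scan the source column types for ones the failure above tells us Fabric can't ingest:

```python
# Hypothetical pre-flight check: flag source columns whose types are known
# to be unavailable in Fabric (varbinary(MAX), per the failure above).
UNSUPPORTED_TYPES = {"varbinary(max)"}

def find_unsupported(columns):
    """columns: list of (table, column, type) tuples from the source schema."""
    return [(t, c) for t, c, ty in columns if ty.lower() in UNSUPPORTED_TYPES]

# Sample schema rows; SalesLT.Product's ThumbNailPhoto is the varbinary(MAX)
# column that broke the demo's copy.
source_columns = [
    ("SalesLT.Product", "ThumbNailPhoto", "varbinary(MAX)"),
    ("SalesLT.Product", "Name", "nvarchar(50)"),
]
print(find_unsupported(source_columns))
```

In a real setup you would feed this from the source database's INFORMATION_SCHEMA.COLUMNS and either exclude the offending columns or flip the table's CopyFlag to false in SchemaMetadata.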
