🔧Azure Databricks Series: Mounting Azure Data Lake Storage Gen 2 using App Registration and Service Principal🔧
Ever run into this 403 error when trying to list files in ADLS Gen 2 from a Databricks notebook? This tutorial walks through the setup that fixes it:

An error occurred while calling o412.ls.
: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, GET, [link], AuthorizationPermissionMismatch
📈 Key Benefits of Mounting ADLS Gen 2 in Azure Databricks:
Security: Use Azure Active Directory and Service Principal for secure access.
Scalability: Handle large volumes of data and massive parallel processing.
Efficiency: Automate workflows and reduce data access overhead.
Data Integration: Easy integration of structured and unstructured data.
🔧 Prerequisites for this Tutorial 🔧
Before we dive into the mounting process, make sure you have the following prerequisites:
Azure Databricks Workspace 🧑💻
A Databricks workspace where you'll configure your notebooks and clusters.
Azure Active Directory App Registration 🔐
An Azure AD app registration to set up your Service Principal. You will need to create an app and assign it appropriate permissions to access your Azure Data Lake Storage Gen 2.
Azure Data Lake Storage Gen 2 💾
An existing ADLS Gen 2 storage account containing your data. We will be mounting it to your Databricks workspace for easy access.
Service Principal with Permissions 🛡️
Create and configure a Service Principal (SP) within Azure AD with the necessary role-based access control (RBAC) to your Data Lake Storage.
Databricks Cluster 🖥️
A running Azure Databricks cluster to execute your notebooks and run the mount operation.
Once you've completed these prerequisites, you're all set to start mounting ADLS Gen 2 using the Service Principal.
⚙️ Step-by-Step Guide: Mounting ADLS Gen 2 in Azure Databricks ⚙️
Now, let’s dive into the actual tutorial where we will mount Azure Data Lake Storage Gen 2 to Azure Databricks using Service Principal authentication.
Step 1: Register an Application in Azure Active Directory (AAD) 📜
Navigate to the Azure Portal and go to Azure Active Directory.
Under App registrations, click New registration.
Provide a name for the app (e.g., DatabricksApp), and select the supported account types.
Once registered, take note of the Application (client) ID and Directory (tenant) ID for later use.
Step 2: Create a Service Principal and Assign Permissions 🏅
Go to Azure Active Directory -- App Registrations, and find your newly created app.
Click Certificates & Secrets, then New client secret. Take note of the value, as this will be your Client Secret.
In the Azure Data Lake Storage Gen 2 storage account, open Access control (IAM) and assign the necessary role (e.g., Storage Blob Data Contributor or Storage Blob Data Owner) to the Service Principal.
Step 3: Configure Azure Databricks Cluster 🖥️
In your Azure Databricks workspace, navigate to Clusters.
Click Create Cluster and configure the necessary resources for your workload.
Once the cluster is running, move to the next step.
Step 4: Generate the Secret Scope in Databricks 🛠️
Databricks uses Secret Scopes to securely manage sensitive data (like passwords or secrets).
Open Azure Databricks and go to the Create Secret Scope page (append #secrets/createScope to your workspace URL), or use the Databricks CLI.
Create a new Secret Scope that allows you to store and access your credentials, such as the Client Secret for the Service Principal.
Use the Databricks CLI or UI to store the Application (client) ID, Client Secret, and Tenant ID in the secret scope.
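Before moving on, it's worth confirming the scope from a notebook. Here is a minimal sketch, assuming the scope is named adls-scope and the keys are client-id, client-secret, and tenant-id (example names, reused in the rest of this tutorial):

# List the available secret scopes and the keys in our scope
# (secret *values* are always redacted in notebook output).
print(dbutils.secrets.listScopes())        # expect adls-scope in the list
print(dbutils.secrets.list("adls-scope"))  # expect client-id, client-secret, tenant-id

# Retrieve a secret for use in Spark configs; the raw value
# is never displayed in plain text in the notebook.
client_id = dbutils.secrets.get(scope="adls-scope", key="client-id")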
Step 5: Mounting the ADLS Gen 2 to Databricks 🔄
Here’s the code to mount your Azure Data Lake Storage Gen 2:
# Scope and key names below (adls-scope, client-id, client-secret, tenant-id)
# are example names; use whatever you stored in Step 4.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    # Application (client) ID from the app's Overview blade in App registrations
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get(scope="adls-scope", key="client-id"),
    # Client secret value (shown only once; create a new secret if you did not write it down)
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="adls-scope", key="client-secret"),
    # OAuth token endpoint; the Directory (tenant) ID is found under Azure Active Directory -- Properties
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/"
        + dbutils.secrets.get(scope="adls-scope", key="tenant-id") + "/oauth2/token"}

# Optionally, append <your-directory-name> to the source URI to mount a specific folder.
dbutils.fs.mount(
    source = "abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point = "/mnt/adlsgen2",
    extra_configs = configs)
This code snippet securely mounts your Azure Data Lake Storage Gen 2 using the Service Principal credentials. It uses OAuth for authentication, leveraging the Client ID, Client Secret, and Tenant ID stored in Databricks Secrets.
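Note that dbutils.fs.mount fails if the mount point is already in use. A small, optional guard keeps the notebook safely re-runnable (same placeholder names as above):

# Skip the mount if /mnt/adlsgen2 is already mounted, so re-running
# the notebook does not fail with an "already mounted" error.
if not any(m.mountPoint == "/mnt/adlsgen2" for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source = "abfss://<container>@<storage-account>.dfs.core.windows.net/",
        mount_point = "/mnt/adlsgen2",
        extra_configs = configs)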
Step 6: Verifying the Mount ✅
After mounting, you can verify the mount by listing the files or folders within the container:
dbutils.fs.ls("/mnt/adlsgen2/")
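From here, any notebook attached to the cluster can read the data through standard paths. A quick sketch, assuming a hypothetical sample.csv sits at the container root:

# Read a CSV through the mount point just like any other path.
df = spark.read.csv("/mnt/adlsgen2/sample.csv", header=True, inferSchema=True)
df.show(5)

# To remove the mount later (e.g., when rotating credentials):
# dbutils.fs.unmount("/mnt/adlsgen2")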