What is a secret scope in Azure Databricks and how do you use it?

2021-09-15

Managing secrets in Azure Databricks begins with creating a secret scope. A secret scope is a collection of secrets identified by a name. A workspace is limited to a maximum of 100 secret scopes. There are two types of secret scope: Azure Key Vault-backed and Databricks-backed.

To reference secrets stored in an Azure Key Vault, you can create a secret scope in Databricks that is backed by that Key Vault. Such a scope is read-only from the Databricks side: to add or change the secrets themselves, you must use the Azure SetSecret REST API, PowerShell or the Azure portal UI.
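As a minimal sketch of what creating a Key Vault-backed scope could look like with the legacy Databricks CLI: the helper functions below only assemble the Key Vault resource ID and DNS name and pass them to create-scope. The function names and the scope name KvScope are hypothetical, chosen for illustration, and as far as I know this call has to be made with an Azure AD token rather than a personal access token.

```shell
# Build the two Key Vault identifiers that a Key Vault-backed scope needs.
kv_resource_id() {
  # $1 = subscription ID, $2 = resource group, $3 = vault name
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.KeyVault/vaults/%s' "$1" "$2" "$3"
}

kv_dns_name() {
  # $1 = vault name
  printf 'https://%s.vault.azure.net/' "$1"
}

# Create a Key Vault-backed scope (legacy Databricks CLI syntax).
# $1 = scope name, $2 = subscription ID, $3 = resource group, $4 = vault name
create_kv_scope() {
  databricks secrets create-scope \
    --scope "$1" \
    --scope-backend-type AZURE_KEYVAULT \
    --resource-id "$(kv_resource_id "$2" "$3" "$4")" \
    --dns-name "$(kv_dns_name "$4")"
}

# Example call (KvScope is a hypothetical scope name):
# create_kv_scope KvScope "$subscriptionId" "$resourceGroupName" "$keyVaultName"
```

Secrets in a scope created this way are added in Azure Key Vault itself and are read from a notebook with dbutils.secrets.get, the same way as for a Databricks-backed scope.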

A Databricks-backed secret scope is stored in (backed by) an encrypted database owned and managed by Azure Databricks.

The next step is:

1. How to create a scope within Databricks and how to store secrets in it?

The Databricks CLI snippet below creates a Databricks-backed scope named AdbScope and then stores secrets in that scope using the put command.

Note: every name prefixed with $ is a variable/parameter that you pass to the PowerShell or Databricks CLI script. The values can be hardcoded, or they can be defined in an Azure DevOps library and supplied when the PowerShell script is called from a YAML pipeline.
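However the values arrive, a secret written from an empty variable is easy to miss. A small guard at the top of the script is one way to catch that early. This is only a sketch, assuming the commands run in a POSIX shell (in PowerShell the check would look different); the function name require_vars is hypothetical.

```shell
# Return non-zero if any of the named variables is unset or empty.
# Usage: require_vars name1 name2 ...
require_vars() {
  rv_missing=0
  for rv_name in "$@"; do
    # Indirect lookup of the variable whose name is stored in rv_name.
    eval "rv_value=\${$rv_name:-}"
    if [ -z "$rv_value" ]; then
      echo "Missing required variable: $rv_name" >&2
      rv_missing=1
    fi
  done
  return "$rv_missing"
}

# Example: stop before any secret is written if a value is missing.
# require_vars activeDirectoryClientId activeDirectoryClientSecret \
#   ActiveDirectoryTenantId adlsGen2StorageAccountName || exit 1
```

The check reports every missing name on stderr rather than stopping at the first one, so a pipeline log shows the full list in a single run.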

# If the scope already exists, delete it first
databricks secrets delete-scope --scope AdbScope
# Create a Databricks-backed scope called AdbScope.
# (--resource-id and --dns-name apply only to AZURE_KEYVAULT scopes, and secrets
# in a Key Vault-backed scope cannot be written with "secrets put".)
databricks secrets create-scope --scope AdbScope --scope-backend-type DATABRICKS --initial-manage-principal users
# Store the secrets in the scope using the put command
databricks secrets put --scope AdbScope --key AdclientId --string-value "$activeDirectoryClientId"
databricks secrets put --scope AdbScope --key AdclientSecret --string-value "$activeDirectoryClientSecret"
databricks secrets put --scope AdbScope --key endpoint --string-value "https://login.microsoftonline.com/$ActiveDirectoryTenantId/oauth2/token"
databricks secrets put --scope AdbScope --key storageAccountName --string-value "$adlsGen2StorageAccountName"
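Once the put commands have run, it can be worth checking that every key actually landed in the scope. The sketch below wraps the CLI's secrets list command and prints any expected key that is missing. The function name missing_secret_keys is hypothetical, and the parsing assumes the default table output with the key name in the first column.

```shell
# Print each expected key that is NOT present in the given secret scope.
# Usage: missing_secret_keys <scope> <key>...
missing_secret_keys() (
  scope="$1"
  shift
  # Assumption: the default table output lists the key name in column 1.
  listed="$(databricks secrets list --scope "$scope" | awk '{print $1}')"
  for key in "$@"; do
    # -x matches whole lines, -F treats the key as a literal string.
    printf '%s\n' "$listed" | grep -qxF -- "$key" || printf '%s\n' "$key"
  done
)

# Example: prints nothing when all four keys exist in AdbScope.
# missing_secret_keys AdbScope AdclientId AdclientSecret endpoint storageAccountName
```

Only key names are listed. The CLI does not print secret values, so this check is safe to leave in pipeline logs.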

2. How to retrieve and use the secrets stored in the scope?

The Scala snippet below retrieves the stored secrets from the scope with dbutils.secrets.get. It then checks whether the mount point already exists and, if it does not, creates it so that the storage becomes accessible from the Databricks workspace.

Here, a container (file system) named xyz exists in an ADLS Gen2 storage account. It is made accessible from Azure Databricks by creating the mount point /mnt/abc/xyz, authenticating through an Azure AD application.

// Scala snippet
// OAuth settings for the Azure AD application, read from the AdbScope secret scope
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> dbutils.secrets.get("AdbScope", "AdclientId"),
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get("AdbScope", "AdclientSecret"),
  "fs.azure.account.oauth2.client.endpoint" -> dbutils.secrets.get("AdbScope", "endpoint")
)
val adlsStorageAccountName = dbutils.secrets.get("AdbScope", "storageAccountName")
val existingMounts = dbutils.fs.mounts()
// Mount the container only if it is not mounted already
if (existingMounts.exists(mount => mount.mountPoint == "/mnt/abc/xyz")) {
  println("xyz filesystem is already mounted.")
} else {
  dbutils.fs.mount(
    source = s"abfss://xyz@${adlsStorageAccountName}.dfs.core.windows.net/",
    mountPoint = "/mnt/abc/xyz",
    extraConfigs = configs)
}

I hope this article was useful and that the scripts help you. Do not hesitate to contact me if you have any further questions!

About the Author

Sanjeev Nayak

sanjeev.nayak@capgemini.com

My name is Sanjeev Nayak and I work as a BI & Azure consultant at Capgemini in Stockholm. I have over 15 years of experience in Microsoft BI and Azure. I have worked with various technologies throughout my career, including the MSBI stack (SSIS, SSAS, SQL Server etc.), Azure, Azure SQL Data Warehouse, Azure Data Factory (ADF), Databricks, Scala, Python, CI/CD and YAML scripts.