Create datastores

APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

In this article, you learn how to connect to Azure data storage services by using Azure Machine Learning datastores.

Prerequisites

Note

Machine Learning datastores don't create the underlying storage account resources. Instead, they link an existing storage account for Machine Learning use. Machine Learning datastores aren't required: if you have access to the underlying data, you can use storage URIs directly.
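For example, blob and Data Lake Storage Gen2 data can be addressed with storage URIs and no datastore at all. The following sketch builds the two common URI formats; the account, container, and file names are illustrative placeholders, not real resources:

```python
def blob_uri(account: str, container: str, path: str) -> str:
    """Build a wasbs:// URI for a file in an Azure Blob Storage container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path}"

def adls_gen2_uri(account: str, filesystem: str, path: str) -> str:
    """Build an abfss:// URI for a file in a Data Lake Storage Gen2 filesystem."""
    return f"abfss://{filesystem}@{account}.dfs.core.windows.net/{path}"

# Illustrative values only; substitute your own account, container, and path.
print(blob_uri("mytestblobstore", "data-container", "titanic.csv"))
# wasbs://data-container@mytestblobstore.blob.core.windows.net/titanic.csv
```

URIs in these formats can be passed anywhere Machine Learning accepts a data path, at the cost of repeating the account and container details that a datastore would otherwise store once.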

Create an Azure Blob datastore

from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = AzureBlobDatastore(
    name="blob_example",  # example datastore name; choose your own
    description="Datastore pointing to a blob container.",
    account_name="mytestblobstore",  # replace with your storage account name
    container_name="data-container"  # replace with your blob container name
)

ml_client.create_or_update(store)
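The datastore above relies on identity-based access. If you want the datastore to carry its own credential instead, one variant uses a SAS token via `SasTokenConfiguration`. This is a sketch: the names are illustrative and the token is a non-working placeholder you'd replace with a SAS token generated for your storage account.

```python
from azure.ai.ml.entities import AzureBlobDatastore, SasTokenConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Placeholder names and token; replace with values from your subscription.
store = AzureBlobDatastore(
    name="blob_sas_example",
    description="Datastore pointing to a blob container, using a SAS token.",
    account_name="mytestblobstore",
    container_name="data-container",
    credentials=SasTokenConfiguration(
        sas_token="?sv=placeholder&sig=placeholder"  # non-working placeholder
    ),
)

ml_client.create_or_update(store)
```

Credentials stored this way are saved with the datastore, so anyone with workspace access to the datastore can use the data without holding storage-account permissions themselves.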

Create an Azure Data Lake Storage Gen2 datastore

from azure.ai.ml.entities import AzureDataLakeGen2Datastore
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = AzureDataLakeGen2Datastore(
    name="adls_gen2_example",  # example datastore name; choose your own
    description="Datastore pointing to an Azure Data Lake Storage Gen2 filesystem.",
    account_name="mytestdatalakegen2",  # replace with your storage account name
    filesystem="my-gen2-container"  # replace with your filesystem (container) name
)

ml_client.create_or_update(store)
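Once a datastore exists, jobs and data assets can reference files on it through the short-form `azureml://` datastore URI rather than a raw storage URI. A sketch of that format (the datastore and path names are illustrative):

```python
def datastore_uri(datastore_name: str, path: str) -> str:
    """Build a short-form azureml:// URI pointing at a path on a datastore."""
    return f"azureml://datastores/{datastore_name}/paths/{path}"

print(datastore_uri("adls_gen2_example", "folder/file.csv"))
# azureml://datastores/adls_gen2_example/paths/folder/file.csv
```

Because the URI names only the datastore, the storage account, container, and credentials stay defined in one place.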

Create an Azure Files datastore

from azure.ai.ml.entities import AzureFileDatastore
from azure.ai.ml.entities import AccountKeyConfiguration
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = AzureFileDatastore(
    name="file_example",
    description="Datastore pointing to an Azure File Share.",
    account_name="mytestfilestore",
    file_share_name="my-share",
    credentials=AccountKeyConfiguration(
        account_key="aaaaaaaa-0b0b-1c1c-2d2d-333333333333"  # placeholder; use your account key
    ),
)

ml_client.create_or_update(store)

Create an Azure Data Lake Storage Gen1 datastore

Important

Azure Data Lake Storage Gen1 retired on February 29, 2024. You can't create new Gen1 accounts, and existing Gen1 resources are no longer accessible. The following content is provided for reference only. For new datastores, use Azure Data Lake Storage Gen2 instead. To learn about migrating existing data, see Migrate Azure Data Lake Storage from Gen1 to Gen2.

from azure.ai.ml.entities import AzureDataLakeGen1Datastore
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = AzureDataLakeGen1Datastore(
    name="adls_gen1_example",  # example datastore name; choose your own
    store_name="mytestdatalakegen1",  # replace with your Data Lake Storage Gen1 store name
    description="Datastore pointing to an Azure Data Lake Storage Gen1 account.",
)

ml_client.create_or_update(store)

Create a OneLake (Microsoft Fabric) datastore (preview)

This section describes options for creating a OneLake datastore. OneLake is part of Microsoft Fabric. At this time, Machine Learning supports connections to Microsoft Fabric lakehouse artifacts in the "Files" folder, including folders, files, and Amazon S3 shortcuts. For more information about lakehouses, see What is a lakehouse in Microsoft Fabric?.

OneLake datastore creation requires the following information from your Microsoft Fabric instance:

  • Endpoint
  • Workspace GUID
  • Artifact GUID

The following screenshots show how to retrieve this required information from your Microsoft Fabric instance.

Screenshot that shows how to click into artifact properties of Microsoft Fabric workspace artifact in Microsoft Fabric UI.

You can find the "Endpoint", "Workspace GUID", and "Artifact GUID" in the "URL" and "ABFS path" values on the "Properties" page:

  • URL format: https://{your_one_lake_endpoint}/{your_one_lake_workspace_guid}/{your_one_lake_artifact_guid}/Files
  • ABFS path format: abfss://{your_one_lake_workspace_guid}@{your_one_lake_endpoint}/{your_one_lake_artifact_guid}/Files

Screenshot that shows URL and ABFS path of a OneLake artifact in Microsoft Fabric UI.
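As a sanity check, the endpoint, workspace GUID, and artifact GUID can be extracted from the ABFS path with a small parser. This is a sketch built on the format above; the function name and sample GUIDs are illustrative:

```python
from urllib.parse import urlparse

def parse_onelake_abfs(abfs_path: str) -> dict:
    """Split an abfss:// OneLake path into endpoint, workspace GUID, and artifact GUID."""
    parsed = urlparse(abfs_path)
    # netloc has the form {workspace_guid}@{endpoint}
    workspace_guid, endpoint = parsed.netloc.split("@", 1)
    # the first path segment is the artifact GUID
    artifact_guid = parsed.path.lstrip("/").split("/", 1)[0]
    return {
        "endpoint": endpoint,
        "workspace_guid": workspace_guid,
        "artifact_guid": artifact_guid,
    }

# Illustrative GUIDs matching the placeholders used in the creation example.
sample = ("abfss://bbbbbbbb-7777-8888-9999-cccccccccccc"
          "@msit-onelake.dfs.fabric.microsoft.com"
          "/cccccccc-8888-9999-0000-dddddddddddd/Files")
print(parse_onelake_abfs(sample)["endpoint"])
# msit-onelake.dfs.fabric.microsoft.com
```

The three returned values map directly onto the `endpoint`, `one_lake_workspace_name`, and artifact `name` parameters used when creating the datastore.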

Create a OneLake datastore

from azure.ai.ml.entities import OneLakeDatastore, OneLakeArtifact
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

store = OneLakeDatastore(
    name="onelake_example_id",
    description="Datastore pointing to a Microsoft Fabric artifact.",
    one_lake_workspace_name="bbbbbbbb-7777-8888-9999-cccccccccccc", #{your_one_lake_workspace_guid}
    endpoint="msit-onelake.dfs.fabric.microsoft.com", #{your_one_lake_endpoint}
    artifact=OneLakeArtifact(
        name="cccccccc-8888-9999-0000-dddddddddddd/Files", #{your_one_lake_artifact_guid}/Files
        type="lake_house"
    )
)

ml_client.create_or_update(store)

Next steps