How to add Data Lake Gen 2 ACL information to an Azure AI Search index?

Background:

I have successfully deployed the chat application for Azure OpenAI On Your Data. The Azure OpenAI Studio wizard automatically created and configured the indexes and indexers, including vector search. My document data source is Azure Data Lake Storage Gen2. I now want to restrict access to documents to specific Entra user groups, which are assigned to particular blob storage locations through access control lists (ACLs). I have researched this extensively and reviewed the available documentation, including the Microsoft community article "Access Control in Generative AI applications with Azure AI Search" and its sample scripts.

What steps can I take to extend the indexes and indexers created by the AOAI Studio wizard so that they include ACL information from Data Lake Gen 2? As far as I can tell, there is no straightforward field mapping that imports ACL data from a blob into the index's group_ids field.

If possible, I would rather not handle document preprocessing myself, since Microsoft already provides it as a built-in feature.

Solution:

There is no straightforward way to integrate Azure Data Lake Storage Gen2 (ADLS Gen2) ACLs directly into Azure AI Search. Instead, I'll propose an alternative approach. First, retrieve the access control information with the Azure Storage SDKs or REST APIs provided by Microsoft; this gives you the ACL entries associated with each file (blob).
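A minimal sketch of the "read the ACL" step, assuming the `azure-storage-file-datalake` package: `DataLakeFileClient.get_access_control()` returns a dict whose `"acl"` entry is the POSIX-style ACL string (for example `user::rwx,group:<object-id>:r-x,mask::r-x,other::---`). The helper names `fetch_acl` and `readable_group_ids` are hypothetical. The parser keeps only named groups on the file's own access ACL that can effectively read it (both their entry and the `mask` entry grant `r`). It ignores default ACL entries, the owning group, and the execute permission every parent directory must also grant, so treat it as a starting point rather than a complete access check.

```python
def fetch_acl(file_client) -> str:
    """Return the ACL string of one ADLS Gen2 file.

    `file_client` is expected to be an azure.storage.filedatalake.DataLakeFileClient;
    with the default upn=False, identities in the ACL are Entra object IDs.
    """
    return file_client.get_access_control()["acl"]


def readable_group_ids(acl: str) -> list[str]:
    """Object IDs of named groups whose access-ACL entry effectively grants read."""
    access_entries = []
    for raw in acl.split(","):
        parts = raw.strip().split(":")
        if not parts[0] or parts[0] == "default":
            continue  # skip empty items and default (inherited-by-children) entries
        entry_type, entity_id, perms = parts
        access_entries.append((entry_type, entity_id, perms))

    # The mask entry caps the effective permissions of named users and groups.
    mask = next((p for t, _, p in access_entries if t == "mask"), None)
    mask_allows_read = mask is None or mask.startswith("r")

    return [
        entity_id
        for entry_type, entity_id, perms in access_entries
        # an empty entity_id is the owning group, which this sketch skips
        if entry_type == "group" and entity_id and perms.startswith("r") and mask_allows_read
    ]
```

To cover a whole container, `FileSystemClient.get_paths(recursive=True)` lists every path; each result has a `name` and an `is_directory` flag, so you can call `fetch_acl` on the files only.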

Next comes the complex part: building a custom process. You need to map these ACL details to your Entra user groups and store them in a form that Azure AI Search can use, such as a list of group IDs on each index document.
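One way to do the mapping, sketched under assumptions: the wizard splits each file into several chunk documents, so every chunk that came from a given file needs the same group list. The field names `id` (the index key) and `filepath` (the source path) are placeholders; check your wizard-generated schema for the real ones. `build_acl_updates` and `in_batches` are hypothetical helpers. The result is a list of partial documents for `SearchClient.merge_documents`, which only updates the fields you supply; Azure AI Search accepts at most 1,000 documents per indexing batch, hence the batching.

```python
from collections.abc import Iterable, Iterator


def build_acl_updates(
    index_docs: Iterable[dict],
    groups_by_path: dict[str, list[str]],
    key_field: str = "id",          # hypothetical key field name
    path_field: str = "filepath",   # hypothetical source-path field name
) -> list[dict]:
    """Partial documents that set group_ids on every chunk of every known file."""
    updates = []
    for doc in index_docs:
        groups = groups_by_path.get(doc[path_field])
        if groups is None:
            continue  # no ACL information for this file: leave the document untouched
        updates.append({key_field: doc[key_field], "group_ids": sorted(set(groups))})
    return updates


def in_batches(items: list[dict], size: int = 1000) -> Iterator[list[dict]]:
    """Split updates into batches no larger than the 1,000-document indexing limit."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage with an existing azure.search.documents.SearchClient:
# for batch in in_batches(build_acl_updates(existing_docs, groups_by_path)):
#     search_client.merge_documents(documents=batch)
```

Merging replaces the whole `group_ids` collection on each document, so rerunning the process after ACL changes overwrites stale group lists rather than appending to them.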

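Once that is done, update your Azure AI Search index: use the Azure AI Search API or SDK to push the new ACL data into the existing index documents. Also make sure you have added the necessary ACL-related fields (such as group_ids) to your index schema.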
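For the schema part, a sketch assuming you edit the index definition as JSON (for example, a definition fetched with the REST `GET /indexes/{name}` call and sent back with `PUT`). Azure AI Search lets you add a new field to an existing index without rebuilding it. The `group_ids` definition below is along the lines of Microsoft's security-trimming guidance: a filterable `Collection(Edm.String)` that is not returned in results. The helper name `with_group_ids_field` is hypothetical. With the Python SDK, the equivalent field is `SearchField(name="group_ids", type=SearchFieldDataType.Collection(SearchFieldDataType.String), filterable=True)`, applied through `SearchIndexClient.create_or_update_index`.

```python
import copy

# Filterable string collection for Entra group object IDs; not returned in search results.
GROUP_IDS_FIELD = {
    "name": "group_ids",
    "type": "Collection(Edm.String)",
    "searchable": False,
    "filterable": True,
    "retrievable": False,
}


def with_group_ids_field(index_definition: dict) -> dict:
    """Return an index definition that includes the group_ids field.

    If group_ids is already present, the same definition object is returned;
    otherwise a deep copy is made and the field is appended to its field list.
    """
    fields = index_definition.get("fields", [])
    if any(field.get("name") == GROUP_IDS_FIELD["name"] for field in fields):
        return index_definition
    updated = copy.deepcopy(index_definition)
    updated["fields"] = updated.get("fields", []) + [dict(GROUP_IDS_FIELD)]
    return updated
```

Documents indexed before the field existed simply have no `group_ids` value until the update process above fills it in.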

I'll share a sample Python snippet below.

```python
# Fetch ACLs, process the data, and update the Azure AI Search index
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
# POSIX-style ACLs are exposed by the Data Lake (dfs) API, not the Blob API
from azure.storage.filedatalake import DataLakeServiceClient

# Fetch ACLs and process data (fill in the blanks), e.g. with
# DataLakeServiceClient(...).get_file_system_client(...).get_file_client(path).get_access_control()
# ...

# Update Azure AI Search index
search_service_name = "your-search-service-name"
index_name = "your-index-name"
api_key = "your-search-service-api-key"

# SearchClient targets a single index; azure-search-documents has no SearchServiceClient
search_client = SearchClient(
    endpoint=f"https://{search_service_name}.search.windows.net",
    index_name=index_name,
    credential=AzureKeyCredential(api_key),
)

# Update documents in the index with ACL information (customize this part), e.g.
# search_client.merge_documents(documents=[{"<key field>": "...", "group_ids": ["..."]}])
# ...
```
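Once `group_ids` is populated, the stored ACL data is enforced at query time with a filter. A minimal sketch, assuming you already have the signed-in user's Entra group object IDs (for example, from the token's `groups` claim or Microsoft Graph). It builds the `group_ids/any(g:search.in(g, '...'))` expression used for security trimming in Azure AI Search; `security_filter` is a hypothetical helper name. Whether the wizard-deployed chat app applies such a filter for you depends on its configuration; this only shows the search-side expression.

```python
def security_filter(user_group_ids: list[str], field: str = "group_ids") -> str:
    """OData filter matching documents shared with at least one of the user's groups."""
    ids = sorted(set(user_group_ids))
    if not ids:
        # No groups means no access; do not fall back to an unfiltered query.
        raise ValueError("user has no group IDs; deny access instead of querying")
    if any("," in gid for gid in ids):
        raise ValueError("group IDs must not contain the ',' delimiter")
    # Single quotes inside an OData string literal are escaped by doubling them.
    values = ",".join(gid.replace("'", "''") for gid in ids)
    return f"{field}/any(g:search.in(g, '{values}', ','))"


# Usage with an existing azure.search.documents.SearchClient:
# results = search_client.search(search_text="...", filter=security_filter(user_groups))
```

Passing `','` explicitly as the third `search.in` argument makes the delimiter unambiguous, and raising on an empty group list avoids accidentally running an unrestricted query.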
