Skip to content

Add Azure credential providers#343

Merged
kylebarron merged 14 commits intodevelopmentseed:mainfrom
daviewales:azure-credential
Mar 17, 2025
Merged

Add Azure credential providers#343
kylebarron merged 14 commits intodevelopmentseed:mainfrom
daviewales:azure-credential

Conversation

@daviewales
Copy link
Copy Markdown
Contributor

@daviewales daviewales commented Mar 12, 2025

What I am changing

  • Add Azure credential providers

How I did it

  • Implement sync provider with azure.identity and async provider with azure.identity.aio
  • Note: I used credential rather than credentials for the __init__ argument, as this matches other Azure libraries. Let me know if I should change this to credentials to match the existing Google credential provider.
  • I added simple token caching based on the token expiry time. If the token does not exist or expires in less than 5 minutes, fetch a new token, otherwise reuse the cached one.

How you can test it

  • Follow the examples in the docstrings.

Related Issues

WIP

This PR is WIP because although the credential_provider classes appear to work when testing them directly, I get the following error I try to list items after initialising a store with them:

import obstore
from obstore.auth.azure import AzureAuthCredentialProvider
credential_provider = AzureAuthCredentialProvider()
store = obstore.from_url("abfs://test@example.dfs.core.windows.net", credential_provider=credential_provider)
obstore.list(store).collect()

# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# obstore.exceptions.GenericError: Generic External Azure credential provider error: RuntimeError: no running event loop
# 
# Debug source:
# Generic {
#     store: "External Azure credential provider",
#     source: PyErr {
#        type: <class 'RuntimeError'>,
#         value: RuntimeError('no running event loop'),
#         traceback: None,
#     },
# }

This sounds like an async error, but I'm not sure why, because I'm not using the async credential provider here.
Also, even if I do use the async credential provider, I get the same error.
This error appears to be raised from: pyo3-object_store\src\azure\credentials.rs:227
Is there a trick to enable tracebacks with line numbers, etc?
Can you shed some light on this?

Comment thread obstore/python/obstore/auth/azure.py
Comment thread obstore/python/obstore/auth/azure.py Outdated
Comment thread obstore/python/obstore/auth/azure.py Outdated
Comment thread obstore/python/obstore/auth/azure.py Outdated
@kylebarron
Copy link
Copy Markdown
Member

Thanks for the PR!

  • I added simple token caching based on the token expiry time. If the token does not exist or expires in less than 5 minutes, fetch a new token, otherwise reuse the cached one.

You don't need to do this! This is automatically done on the Rust side.

/// A temporary authentication token with an associated expiry
#[derive(Debug, Clone)]
pub(crate) struct TemporaryToken<T> {
/// The temporary credential
pub token: T,
/// The instant at which this credential is no longer valid
/// None means the credential does not expire
pub expiry: Option<DateTime<Utc>>,
}
/// Provides [`TokenCache::get_or_insert_with`] which can be used to cache a
/// [`TemporaryToken`] based on its expiry
#[derive(Debug)]
pub(crate) struct TokenCache<T> {
/// A temporary token and the instant at which it was fetched
cache: Mutex<Option<(TemporaryToken<T>, DateTime<Utc>)>>,
min_ttl: TimeDelta,
/// How long to wait before re-attempting a token fetch after receiving one that
/// is still within the min-ttl
fetch_backoff: TimeDelta,
}

  • Note: I used credential rather than credentials for the __init__ argument, as this matches other Azure libraries. Let me know if I should change this to credentials to match the existing Google credential provider.

I think whatever is most similar to the native library is best.

This PR is WIP because although the credential_provider classes appear to work when testing them directly, I get the following error I try to list items after initialising a store with them:

Hmm thanks, I'm not sure at a glance what to make of this.

I was wondering if maybe my manual testing had worked because I was using ipython instead of python, but the Boto3CredentialProvider worked for me in the normal Python shell as well.

That does look like an async error, but I'm not exactly sure why it's happening.

I think the most likely error is that it's failing to convert the result here:

if let Ok(credentials) = ob.extract() {
Ok(Self::Sync(credentials))
} else {

so it's probably assuming that your result is an awaitable, and then awaiting on it fails.

We should probably figure out a way to test that data conversion directly

@kylebarron
Copy link
Copy Markdown
Member

kylebarron commented Mar 12, 2025

I think the most likely error is that it's failing to convert the result here:

Indeed, this happens when there's no event loop running and it tries to await the result of a non-awaitable:

>>> def credential_provider():
...     return "test"
...
>>> store = S3Store(
...     "ds-wheels",
...     credential_provider=credential_provider,
...     region="us-east-1",
... )
>>> obs.list(store).collect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
obstore.exceptions.GenericError: Generic External AWS credential provider error: RuntimeError: no running event loop

Debug source:
Generic {
    store: "External AWS credential provider",
    source: PyErr {
        type: <class 'RuntimeError'>,
        value: RuntimeError('no running event loop'),
        traceback: None,
    },
}

When I run that code in IPython, the terminal hangs forever, because it's trying to await the result of a non-awaitable. Let me add better error handling, and then we should be able to see where the Python -> Rust conversion of those Azure credentials is going wrong.

@kylebarron
Copy link
Copy Markdown
Member

Can you try on latest main, with #345? That should at least give an error for what's not working instead of trying to await your value. Perhaps the datetime conversion isn't going through or something

@daviewales
Copy link
Copy Markdown
Contributor Author

On latest main, the sync credential provider now works!

import obstore
from obstore.auth.azure import AzureCredentialProvider
credential_provider = AzureCredentialProvider()
store = obstore.store.from_url("abfs://test@example.dfs.core.windows.net", credential_provider=credential_provider)
obstore.list(store).collect()

# [{'path': 'example/directory/example.parquet', ...

The async credential provider fails with this error message:

import obstore
from obstore.auth.azure import AzureAsyncCredentialProvider
credential_provider = AzureAsyncCredentialProvider()
store = obstore.store.from_url("abfs://test@example.dfs.core.windows.net", credential_provider=credential_provider)
obstore.list(store).collect()

# <ipython-input-9-2f208d6687d5>:1: RuntimeWarning: coroutine 'AzureAsyncCredentialProvider.__call__' was never awaited
#   obstore.list(store).collect()
# RuntimeWarning: Enable tracemalloc to get the object allocation traceback
# ---------------------------------------------------------------------------
# UnauthenticatedError                      Traceback (most recent call last)
# Cell In[9], line 1
# ----> 1 obstore.list(store).collect()
# 
# UnauthenticatedError: The operation lacked valid authentication credentials for path External Azure credential provider: RuntimeError: no running event loop
# 
# Debug source:
# Unauthenticated {
#     path: "External Azure credential provider",
#     source: PyErr {
#         type: <class 'RuntimeError'>,
#         value: RuntimeError('no running event loop'),
#         traceback: None,
#     },
# }

I think that RuntimeWarning: coroutine 'AzureAsyncCredentialProvider.__call__' was never awaited is probably the issue.
I think that UnauthenticatedError: The operation lacked valid authentication credentials for path External Azure credential provider is a consequence of this, not a cause, because await credential_provider() happily returns a token.

@kylebarron
Copy link
Copy Markdown
Member

kylebarron commented Mar 13, 2025

The async credential provider fails with this error message:

When using async credential providers, I think you need to use async methods.

Can you try with

asyncio.run(obstore.list(store).collect_async())

We should document that better.

Edit: made #356

Comment thread docs/authentication.md
Comment thread obstore/python/obstore/auth/azure.py Outdated
@kylebarron
Copy link
Copy Markdown
Member

I'd like to release obstore 0.5 on Monday. If you can just resolve the last two comments, it would be awesome to get this in.

@daviewales
Copy link
Copy Markdown
Contributor Author

Is first thing on Monday morning (AEST) OK?

@kylebarron
Copy link
Copy Markdown
Member

Is first thing on Monday morning (AEST) OK?

yeah! That'll be 15 hours before I look at it 😄

@daviewales daviewales marked this pull request as ready for review March 17, 2025 03:24
Copy link
Copy Markdown
Member

@kylebarron kylebarron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thank you!

Comment thread docs/authentication.md Outdated
Comment thread docs/authentication.md Outdated
Comment thread docs/authentication.md Outdated
Comment thread docs/authentication.md Outdated
@kylebarron
Copy link
Copy Markdown
Member

I moved the types inline so that they'd show up directly as hyperlinks in the docs:
image

@kylebarron kylebarron merged commit 8ae4bbf into developmentseed:main Mar 17, 2025
4 checks passed
@kylebarron kylebarron mentioned this pull request Mar 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants