scitex_scholar.auth

Authentication module for Scholar.

class scitex_scholar.auth.ScholarAuthManager(email_openathens=None, email_ezproxy=None, email_shibboleth=None, config=None)[source]

Bases: object

Manages multiple authentication providers.

This class coordinates between different authentication methods (OpenAthens, Lean Library, etc.) and provides a unified interface.

__init__(email_openathens=None, email_ezproxy=None, email_shibboleth=None, config=None)[source]

Initialize the authentication manager.

Parameters:
  • email_openathens (Optional[str]) – User’s institutional email for OpenAthens authentication

  • email_ezproxy (Optional[str]) – User’s institutional email for EZProxy authentication

  • email_shibboleth (Optional[str]) – User’s institutional email for Shibboleth authentication

  • config (Optional[ScholarConfig]) – ScholarConfig instance (a new one is created if None)

async ensure_authenticate_async(provider_name=None, verify_live=True, **kwargs)[source]
Return type:

bool

async is_authenticate_async(verify_live=True)[source]

Check if authenticated with any provider.

Return type:

bool

async authenticate_async(provider_name=None, **kwargs)[source]

Authenticate with specified or active provider.

Return type:

dict

async get_auth_headers_async()[source]

Get authentication headers from active provider.

Return type:

Dict[str, str]

async get_auth_options()[source]
Return type:

dict

async get_auth_cookies_async(essential_only=True)[source]

Get authentication cookies from active provider.

Return type:

List[Dict[str, Any]]

_register_provider(name, provider)[source]

Register an authentication provider with email context.

Return type:

None

set_active_provider(name)[source]

Set the active authentication provider.

Return type:

None

get_active_provider()[source]

Get the currently active provider.

Return type:

Optional[BaseAuthenticator]

async logout_async()[source]

Log out from all providers.

Return type:

None

list_providers()[source]

List all registered providers.

Return type:

List[str]
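The provider-management methods above (register, set active, get active, list) follow a common registry pattern. The sketch below is a toy stand-in for that pattern only, not the ScholarAuthManager implementation: the class name, the plain-object providers, and the rule that the first registered provider becomes active are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class ProviderRegistrySketch:
    """Toy provider registry; illustrative only, not the library's code."""

    def __init__(self) -> None:
        self._providers: Dict[str, object] = {}
        self._active: Optional[str] = None

    def register(self, name: str, provider: object) -> None:
        self._providers[name] = provider
        # Assumption: the first registered provider becomes active by default.
        if self._active is None:
            self._active = name

    def set_active_provider(self, name: str) -> None:
        # Reject names that were never registered.
        if name not in self._providers:
            raise ValueError(f"Unknown provider: {name}")
        self._active = name

    def get_active_provider(self) -> Optional[object]:
        if self._active is None:
            return None
        return self._providers[self._active]

    def list_providers(self) -> List[str]:
        # Dicts preserve insertion order, so this is registration order.
        return list(self._providers)


registry = ProviderRegistrySketch()
openathens, ezproxy = object(), object()  # placeholders for authenticator objects
registry.register("openathens", openathens)
registry.register("ezproxy", ezproxy)
registry.set_active_provider("ezproxy")
```

Keeping a single active provider means callers such as get_auth_headers_async() do not need to name a provider on every call.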

class scitex_scholar.auth.AuthenticationGateway(auth_manager, browser_manager, config=None)[source]

Bases: object

Transparent authentication layer for Scholar operations.

Responsibilities:
  • Determine if URL requires authentication (config-based, no hardcoding)

  • Prepare authenticated browser context

  • Visit authentication gateways (OpenURL) to establish publisher sessions

  • Cache authentication state for performance

This gateway sits between Scholar and URL/Download operations, preparing authentication transparently before content access.

property name
__init__(auth_manager, browser_manager, config=None)[source]

Initialize authentication gateway.

Parameters:
  • auth_manager – ScholarAuthManager instance

  • browser_manager – ScholarBrowserManager instance

  • config (ScholarConfig) – ScholarConfig instance

async prepare_context_async(doi, context, title=None)[source]

Prepare URL context with authentication if needed.

This is the main entry point, called before URL finding.

Flow:
  1. Build OpenURL (authentication gateway)
  2. Check if DOI needs authentication (based on known publishers)
  3. If auth needed: Visit OpenURL to establish publisher cookies
  4. Resolve to final publisher URL
  5. Return prepared context with authenticated session

Parameters:
  • doi (str) – Paper DOI

  • context (BrowserContext) – Browser context (will be updated with auth cookies)

  • title (Optional[str]) – Optional paper title

Return type:

URLContext

Returns:

URLContext with authentication prepared and ready
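The five-step flow above can be sketched with stubs in place of the browser and resolver. Everything here is hypothetical: the `Ctx` class, the `resolver_base` parameter, the `?doi=` query string (a placeholder, not real OpenURL syntax), the prefix rule in step 2, and the use of a doi.org link as the "resolved" URL. The sketch shows only the ordering and the conditional gateway visit.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import quote


@dataclass
class Ctx:
    """Hypothetical stand-in for the documented URLContext."""

    doi: str
    url: Optional[str] = None
    requires_auth: Optional[bool] = None
    auth_gateway_url: Optional[str] = None


async def prepare_context_sketch(doi: str, resolver_base: str, steps: List[str]) -> Ctx:
    ctx = Ctx(doi=doi)
    # 1. Build the gateway URL (placeholder query format, not real OpenURL).
    ctx.auth_gateway_url = f"{resolver_base}?doi={quote(doi, safe='')}"
    steps.append("build_openurl")
    # 2. Decide from the DOI whether authentication is needed (placeholder rule).
    ctx.requires_auth = doi.startswith(("10.1109/", "10.1007/"))
    steps.append("check_auth")
    # 3. Only for paywalled DOIs: visit the gateway (stubbed; no browser here).
    if ctx.requires_auth:
        await asyncio.sleep(0)
        steps.append("visit_gateway")
    # 4. Resolve the publisher URL (stubbed with the doi.org link).
    ctx.url = f"https://doi.org/{doi}"
    steps.append("resolve_url")
    # 5. Return the prepared context.
    return ctx


steps: List[str] = []
ctx = asyncio.run(
    prepare_context_sketch("10.1109/EXAMPLE", "https://resolver.example.edu/openurl", steps)
)
```

For a DOI that does not match the placeholder rule, step 3 is skipped and the context is returned without a gateway visit.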

async _resolve_publisher_url_async(url_context, context)[source]

Resolve DOI to publisher landing page URL.

Uses the existing OpenURLResolver. The OpenURL serves as the authentication gateway for paywalled content.

Parameters:
  • url_context (URLContext) – URLContext with DOI

  • context (BrowserContext) – Browser context

Return type:

URLContext

Returns:

URLContext with url and auth_gateway_url populated

_check_auth_requirements_from_doi(url_context)[source]

Determine if DOI requires authentication based on DOI prefix patterns.

This allows early detection before resolving URL. IEEE DOIs start with 10.1109, Springer with 10.1007, etc.

Parameters:

url_context (URLContext) – URLContext with doi populated

Return type:

URLContext

Returns:

URLContext with requires_auth and auth_provider populated
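Prefix-based detection can be illustrated as a lookup on the part of the DOI before the first slash. This is a minimal sketch, not the method's actual code: the table and function names are hypothetical, only the IEEE and Springer prefixes come from the text above, and the input is assumed to be a bare DOI (no `doi:` or `https://doi.org/` prefix).

```python
from typing import Dict, Optional

# Hypothetical table; only these two prefixes are named in the documentation.
DOI_PREFIX_TO_PUBLISHER: Dict[str, str] = {
    "10.1109": "IEEE",
    "10.1007": "Springer",
}


def publisher_from_doi(doi: str) -> Optional[str]:
    """Return the publisher for a known DOI prefix, or None if unknown."""
    # The registrant prefix is everything before the first "/".
    prefix = doi.strip().split("/", 1)[0]
    return DOI_PREFIX_TO_PUBLISHER.get(prefix)


def requires_auth_from_doi(doi: str) -> bool:
    """Assumption: any DOI from a listed publisher is treated as paywalled."""
    return publisher_from_doi(doi) is not None
```

Because the lookup needs only the DOI string, it can run before any network request is made, which is the early detection the method description refers to.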

_check_auth_requirements(url_context)[source]

Determine if URL requires authentication based on config.

The check is config-based (no hardcoded domain lists): the URL is matched against paywalled_publishers in the config.

Parameters:

url_context (URLContext) – URLContext with url populated

Return type:

URLContext

Returns:

URLContext with requires_auth and auth_provider populated
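A config-driven URL check might look like the sketch below. The shape of `paywalled_publishers` is an assumption (a plain list of domain strings; the real config schema is not shown here), and `url_requires_auth` is a hypothetical helper name.

```python
from typing import Iterable
from urllib.parse import urlparse


def url_requires_auth(url: str, paywalled_domains: Iterable[str]) -> bool:
    """Return True if the URL's host equals or is a subdomain of a listed domain."""
    host = (urlparse(url).hostname or "").lower()
    for domain in paywalled_domains:
        domain = domain.lower()
        # Match "ieee.org" and "ieeexplore.ieee.org", but not "notieee.org".
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Assumed config shape: a list of registrable domains.
config = {"paywalled_publishers": ["ieee.org", "springer.com"]}
```

Matching on the parsed hostname rather than a substring of the whole URL avoids false positives from domains that merely contain a publisher's name.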

async _establish_authentication_async(url_context, context)[source]

Establish authentication by visiting gateway URL and clicking through to publisher.

This is the KEY OPERATION that solves the IEEE issue:
  1. Visit OpenURL (library resolver)
  2. Find publisher link on resolver page
  3. Click link → redirects through OpenAthens → lands at publisher
  4. Publisher session cookies established in browser context

Without this step:
  • OpenAthens cookies exist at openathens.net

  • NO cookies exist at ieee.org

  • Chrome PDF viewer opens but download fails

With this step:
  • Visit OpenURL

  • Click IEEE link → redirect through OpenAthens

  • Land at ieee.org → IEEE session cookies established

  • Now ieee.org has cookies, Chrome PDF viewer works

Parameters:
  • url_context (URLContext) – URLContext with auth_gateway_url and doi

  • context (BrowserContext) – Browser context (will receive publisher cookies)

Return type:

Optional[str]

Returns:

Publisher URL if successful, None otherwise
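The cookie situation described above comes down to domain scoping: a cookie set for openathens.net is never sent to ieee.org. The sketch below illustrates that with simplified domain matching. It assumes cookies are dicts with a `domain` key; the cookie names and the `cookies_for_host` helper are hypothetical.

```python
from typing import Any, Dict, List


def cookies_for_host(cookies: List[Dict[str, Any]], host: str) -> List[Dict[str, Any]]:
    """Return cookies whose domain matches the host (exact or parent domain)."""
    host = host.lower()
    matched = []
    for cookie in cookies:
        # A leading dot means "this domain and its subdomains".
        domain = str(cookie.get("domain", "")).lstrip(".").lower()
        if domain and (host == domain or host.endswith("." + domain)):
            matched.append(cookie)
    return matched


# Before the click-through: only the identity provider has set cookies.
before = [{"name": "oa_session", "value": "x", "domain": ".openathens.net"}]
# After the click-through: the publisher has set its own session cookie.
after = before + [{"name": "pub_session", "value": "y", "domain": ".ieee.org"}]
```

With `before`, nothing matches an ieee.org host, so the publisher sees an unauthenticated request. With `after`, the publisher session cookie is available for that host.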

class scitex_scholar.auth.URLContext(doi, title=None, url=None, pdf_urls=<factory>, requires_auth=None, auth_provider=None, auth_gateway_url=None)[source]

Bases: object

Context for URL operations with authentication information.

This dataclass carries all information needed for URL resolution and PDF download, including authentication state.

doi: str
title: str | None = None
url: str | None = None
pdf_urls: List[str]
requires_auth: bool | None = None
auth_provider: str | None = None
auth_gateway_url: str | None = None
__init__(doi, title=None, url=None, pdf_urls=<factory>, requires_auth=None, auth_provider=None, auth_gateway_url=None)
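In the signature above, `pdf_urls=<factory>` is how a dataclass field with a `default_factory` is rendered. A stand-in with the same fields, assuming the factory is `list` (the documentation does not state it), could look like this. `URLContextSketch` and the example DOIs are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class URLContextSketch:
    """Illustrative stand-in mirroring the documented URLContext fields."""

    doi: str
    title: Optional[str] = None
    url: Optional[str] = None
    # Assumption: the documented <factory> default produces an empty list.
    pdf_urls: List[str] = field(default_factory=list)
    requires_auth: Optional[bool] = None
    auth_provider: Optional[str] = None
    auth_gateway_url: Optional[str] = None


ctx = URLContextSketch(doi="10.1109/EXAMPLE")
other = URLContextSketch(doi="10.1007/EXAMPLE")
ctx.pdf_urls.append("https://example.org/paper.pdf")
```

Using a factory gives each instance its own list, so appending to `ctx.pdf_urls` leaves `other.pdf_urls` empty. The `None` default on `requires_auth` distinguishes "not yet checked" from an explicit `False`.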