issue/Insufficient and Inconsistent API Key Validation Across Providers #268

@MdRahmatUllah

Issue Description

The API key validation in the Python bindings (python/src/validation.rs) uses simplistic length-based checks that are insufficient for detecting invalid or malformed API keys. Different providers have different key formats and requirements, but the current validation only checks minimum length without validating format, structure, or provider-specific patterns.

Affected Components

  • Python Layer: python/src/validation.rs - Validation logic
  • LLM Configuration: python/src/llm/config.rs - Configuration creation
  • Integration: All LLM provider configurations

Related Files

  • python/src/validation.rs (lines 6-29) - Validation function
  • python/src/llm/config.rs (lines 18-260) - All provider configuration methods that call validate_api_key
  • core/src/llm/openai.rs (lines 23-43) - OpenAI provider implementation (no validation)
  • core/src/llm/anthropic.rs (lines 22-32) - Anthropic provider implementation (no validation)
  • core/src/llm/azure_openai.rs (lines 26-53) - Azure OpenAI provider implementation (no validation)
  • tests/python_integration_tests/tests_validation.py (lines 14-100) - Tests expecting format validation

Code Snippets

Current Validation (Length-Only, No Format Checking)

// From python/src/validation.rs (lines 6-29)
pub(crate) fn validate_api_key(api_key: &str, provider: &str) -> PyResult<()> {
    if api_key.is_empty() {
        return Err(PyErr::new::<pyo3::exceptions::PyValueError, _>(format!(
            "{} API key cannot be empty",
            provider
        )));
    }

    let min_length = match provider.to_lowercase().as_str() {
        "openai" => 20,
        "anthropic" => 15,
        "huggingface" => 10,
        _ => 8,
    };

    if api_key.len() < min_length {
        return Err(PyErr::new::<pyo3::exceptions::PyValueError, _>(format!(
            "{} API key too short",
            provider
        )));
    }

    Ok(())
}

How Validation is Called (All Providers)

// From python/src/llm/config.rs (examples from lines 18-19, 31-32, 50, etc.)
#[staticmethod]
fn openai(api_key: String, model: Option<String>) -> PyResult<Self> {
    validate_api_key(&api_key, "OpenAI")?;  // ← Only length check, no format validation
    // ... rest of config creation
}

#[staticmethod]
fn anthropic(api_key: String, model: Option<String>) -> PyResult<Self> {
    validate_api_key(&api_key, "Anthropic")?;  // ← Only length check, no format validation
    // ... rest of config creation
}

Core Layer Has No Validation

// From core/src/llm/openai.rs (lines 23-43)
impl OpenAiProvider {
    pub fn new(api_key: String, model: String) -> GraphBitResult<Self> {
        // ← No validation of api_key format or content
        let client = Client::builder()
            .timeout(std::time::Duration::from_secs(60))
            .build()
            .map_err(|e| {
                GraphBitError::llm_provider("openai", format!("Failed to create HTTP client: {e}"))
            })?;

        Ok(Self {
            client,
            api_key,  // ← Accepted as-is, no validation
            model,
            base_url: "https://api.openai.com/v1".to_string(),
            organization: None,
        })
    }
}

Impact Assessment

Affected Areas:

  • User Experience: Invalid keys are accepted during configuration and fail only at runtime, when the first API call is made.
  • Error Detection: Because the failure surfaces at the first LLM call rather than at configuration time, it appears far from its cause, which makes debugging harder.
  • Debugging: Users receive cryptic authentication errors from the LLM provider (e.g., "Unauthorized", "Invalid API key") instead of clear validation messages that explain what format is expected.
  • Developer Experience: Developers cannot quickly identify common mistakes like:
    • Using the wrong provider's API key (e.g., OpenAI key for Anthropic)
    • Typos in API keys
    • Incomplete or truncated keys
  • Test Coverage Gap: Integration tests expect format validation (e.g., test_invalid_api_key_error expects "invalid-format-key" to fail) but current implementation doesn't validate formats

Real-World Example:

# This currently succeeds at config time but fails at runtime.
# Any string of 20 or more characters passes the OpenAI length check, whatever its format.
config = LlmConfig.openai("invalid-format-key-000", "gpt-4")  # ✅ Passes validation (length >= 20)
client = LlmClient(config)
response = client.complete("Hello")  # ❌ Fails with cryptic "Unauthorized" error

Expected Behavior:

# Should fail at config time with a clear message
config = LlmConfig.openai("invalid-format-key-000", "gpt-4")  # ❌ Should fail: "OpenAI API keys must start with 'sk-'"

Potential Solutions

Recommendation 1: Implement Provider-Specific Format Validation

Enhance the validate_api_key() function to check format patterns for each provider:

pub(crate) fn validate_api_key(api_key: &str, provider: &str) -> PyResult<()> {
    if api_key.is_empty() {
        return Err(PyErr::new::<pyo3::exceptions::PyValueError, _>(format!(
            "{} API key cannot be empty",
            provider
        )));
    }

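    // Each arm yields (minimum length, required prefix, human-readable format description).
    let (min_length, expected_prefix, pattern_desc) = match provider.to_lowercase().as_str() {
        // Note: "sk-" is also a prefix of Anthropic keys ("sk-ant-"), so this check alone
        // does not catch an Anthropic key passed to the OpenAI configuration.
        "openai" => (20, Some("sk-"), "OpenAI keys start with 'sk-' followed by alphanumeric characters"),
        "anthropic" => (15, Some("sk-ant-"), "Anthropic keys start with 'sk-ant-' followed by alphanumeric characters"),
        "huggingface" => (10, Some("hf_"), "HuggingFace keys start with 'hf_' followed by alphanumeric characters"),
        "azure openai" => (8, None, "Azure OpenAI keys are typically hex strings or UUIDs"),
        // Note: the is_empty() check above still rejects an empty key for Ollama.
        "ollama" => (0, None, "Ollama does not require an API key"),
        _ => (8, None, "Generic API key format"),
    };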

    if api_key.len() < min_length {
        return Err(PyErr::new::<pyo3::exceptions::PyValueError, _>(format!(
            "{} API key too short (minimum {} characters). {}",
            provider, min_length, pattern_desc
        )));
    }

    if let Some(prefix) = expected_prefix {
        if !api_key.starts_with(prefix) {
            return Err(PyErr::new::<pyo3::exceptions::PyValueError, _>(format!(
                "{} API key has invalid format. {}",
                provider, pattern_desc
            )));
        }
    }

    Ok(())
}

Recommendation 2: Add Character Set Validation

Validate that API keys contain only expected characters (alphanumeric, hyphens, underscores):

  • Reject keys with spaces, special characters, or control characters
  • Catch common copy-paste errors (e.g., extra whitespace)
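
A minimal sketch of such a check, written without PyO3 so it stands alone. The helper name `validate_key_charset` and the exact allowed set (ASCII letters, digits, hyphens, underscores) are assumptions for illustration. Some providers may issue keys with other characters, so the set would need to be confirmed per provider before adopting this.

```rust
/// Hypothetical helper: checks that a key contains only ASCII letters,
/// digits, hyphens, and underscores. The allowed set is an assumption.
fn validate_key_charset(api_key: &str) -> Result<(), String> {
    // Surrounding whitespace is a common copy-paste mistake, so report it separately.
    if api_key.trim() != api_key {
        return Err("API key has leading or trailing whitespace".to_string());
    }

    // Find the first character outside the assumed allowed set.
    let bad = api_key
        .char_indices()
        .find(|(_, c)| !(c.is_ascii_alphanumeric() || *c == '-' || *c == '_'));

    match bad {
        // Report the position and the character, but never the key itself.
        Some((pos, c)) => Err(format!(
            "API key contains unexpected character {:?} at byte offset {}",
            c, pos
        )),
        None => Ok(()),
    }
}
```

In the bindings this could run right after the empty-key check, with the `String` error mapped to a `PyValueError` in the same way as the existing messages.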

Recommendation 3: Provide Detailed Error Messages

Include specific guidance in error messages:

  • Show the expected format for the provider
  • Suggest common mistakes (e.g., "Did you use the wrong provider's key?")
  • Link to provider documentation
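
One way to produce the "wrong provider" hint is to compare the key against the prefixes this issue proposes for the other providers. The sketch below is PyO3-free. `KNOWN_PREFIXES`, `guess_provider`, and `check_key` are hypothetical names, and the prefix table only reflects the patterns listed in Recommendation 1, not a verified list of provider formats.

```rust
/// Hypothetical prefix table, taken from the patterns proposed in this issue.
/// Longer prefixes come first so "sk-ant-" is matched before "sk-".
const KNOWN_PREFIXES: [(&str, &str); 3] = [
    ("sk-ant-", "Anthropic"),
    ("hf_", "HuggingFace"),
    ("sk-", "OpenAI"),
];

/// Guess which provider a key looks like it belongs to, if any.
fn guess_provider(api_key: &str) -> Option<&'static str> {
    KNOWN_PREFIXES
        .iter()
        .find(|(prefix, _)| api_key.starts_with(*prefix))
        .map(|(_, name)| *name)
}

/// Check the prefix and build a message that states the expected format and,
/// where possible, names the provider the key appears to belong to.
fn check_key(provider: &str, expected_prefix: &str, api_key: &str) -> Result<(), String> {
    // Some(name) only when the key looks like it belongs to a different provider.
    let other = guess_provider(api_key).filter(|name| !name.eq_ignore_ascii_case(provider));

    if api_key.starts_with(expected_prefix) && other.is_none() {
        return Ok(());
    }

    // The key itself is deliberately left out of the message to avoid leaking secrets.
    let mut msg = format!(
        "{provider} API key has invalid format. Expected a key starting with '{expected_prefix}'."
    );
    if let Some(name) = other {
        msg.push_str(&format!(
            " The key looks like it belongs to {name}. Did you use the wrong provider's key?"
        ));
    }
    Err(msg)
}
```

Because the table is ordered longest prefix first, `check_key("OpenAI", "sk-", ...)` rejects a key that starts with `sk-ant-`, which a plain `starts_with("sk-")` check would accept.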

Recommendation 4: Add Optional Strict Mode

Provide a configuration option to enable/disable strict validation:

  • Default: Strict validation enabled (fail fast at config time)
  • Optional: Lazy validation (defer to first API call) for advanced use cases
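
The two modes could be modelled as a small enum passed to the validator. This is a sketch under assumptions: `ValidationMode` and `validate_with_mode` are hypothetical names, how the option would be exposed through `LlmConfig` is not decided here, and rejecting empty keys in both modes is a design choice rather than something the issue specifies.

```rust
/// Hypothetical switch between fail-fast and deferred validation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ValidationMode {
    /// Check the key format at configuration time (the proposed default).
    Strict,
    /// Skip format checks and let the provider reject a bad key on the first call.
    Lazy,
}

impl Default for ValidationMode {
    fn default() -> Self {
        ValidationMode::Strict
    }
}

fn validate_with_mode(
    api_key: &str,
    expected_prefix: Option<&str>,
    mode: ValidationMode,
) -> Result<(), String> {
    // Assumption: an empty key is rejected in both modes.
    if api_key.is_empty() {
        return Err("API key cannot be empty".to_string());
    }

    // Lazy mode stops here and defers everything else to the first API call.
    if mode == ValidationMode::Lazy {
        return Ok(());
    }

    // Strict mode enforces the provider prefix when one is defined.
    if let Some(prefix) = expected_prefix {
        if !api_key.starts_with(prefix) {
            return Err(format!("API key must start with '{prefix}'"));
        }
    }
    Ok(())
}
```

Lazy mode would cover cases such as a proxy or gateway that issues its own tokens in a different format from the upstream provider.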

Recommendation 5: Add Validation Tests

Ensure comprehensive test coverage for format validation:

  • Valid formats for each provider
  • Invalid formats (wrong prefix, too short, invalid characters)
  • Edge cases (empty, whitespace, special characters)
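
A table-driven layout keeps these cases easy to extend. The sketch below uses a PyO3-free stand-in, `check`, that mirrors the length and prefix rules proposed in Recommendation 1; it is not the project's actual function. All keys are made-up placeholders, and `CASES` and `failing_cases` are hypothetical names.

```rust
/// Stand-in for the proposed validator: same length and prefix rules as
/// Recommendation 1, but returning a plain String error instead of a PyErr.
fn check(api_key: &str, provider: &str) -> Result<(), String> {
    if api_key.is_empty() {
        return Err("empty".to_string());
    }
    let (min_length, prefix) = match provider.to_lowercase().as_str() {
        "openai" => (20, Some("sk-")),
        "anthropic" => (15, Some("sk-ant-")),
        "huggingface" => (10, Some("hf_")),
        _ => (8, None),
    };
    if api_key.len() < min_length {
        return Err("too short".to_string());
    }
    if let Some(p) = prefix {
        if !api_key.starts_with(p) {
            return Err("invalid format".to_string());
        }
    }
    Ok(())
}

/// (provider, placeholder key, should validation succeed?)
const CASES: [(&str, &str, bool); 8] = [
    ("OpenAI", "sk-0123456789abcdefghij", true),       // valid prefix and length
    ("OpenAI", "invalid-format-key-000", false),       // long enough, wrong prefix
    ("OpenAI", "sk-short", false),                     // right prefix, too short
    ("OpenAI", "  sk-0123456789abcdefghij", false),    // leading whitespace
    ("Anthropic", "sk-ant-0123456789", true),          // valid prefix and length
    ("Anthropic", "sk-0123456789abcdef", false),       // OpenAI-style prefix
    ("HuggingFace", "hf_0123456789", true),            // valid prefix and length
    ("HuggingFace", "", false),                        // empty key
];

/// Returns a description of every case whose outcome differs from the expectation.
fn failing_cases() -> Vec<String> {
    let mut failures = Vec::new();
    for &(provider, key, expect_ok) in CASES.iter() {
        if check(key, provider).is_ok() != expect_ok {
            failures.push(format!("{provider}: {key:?} (expected ok = {expect_ok})"));
        }
    }
    failures
}
```

A unit test would then assert that `failing_cases()` is empty. The same table could be mirrored as parametrized cases in the Python integration tests.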

Current Strengths

  • ✅ Empty key detection works correctly
  • ✅ Length validation is provider-aware
  • ✅ Validation happens at configuration time (not at runtime)
  • ✅ Clear error messages for empty/short keys

Key Finding: The core layer (core/src/llm/) has no validation; all validation happens in the Python bindings layer (python/src/). This is appropriate since the Python layer is the user-facing API.

Notes

This validation enhancement should be non-breaking and should provide clear, actionable error messages to help users quickly identify and resolve configuration issues. The fix is localized to python/src/validation.rs and requires no changes to the core layer.
