Issue Description
When errors occur in the Rust core library and are converted to Python exceptions through the bindings layer, significant context information is lost. The current error conversion mechanism in python/src/errors.rs performs lossy transformations that discard valuable debugging information such as provider names, field names, and retry information.
Affected Components
- Rust Core: core/src/errors.rs - Error type definitions
- Python Layer: python/src/errors.rs - Error conversion functions
- Integration: Error propagation between layers
Related Files
python/src/errors.rs (lines 129-161)
core/src/errors.rs (lines 14-129)
python/src/llm/client.rs (lines 985-1002)
Code Snippets
Current Error Conversion (Lossy)
// From python/src/errors.rs (lines 144-152)
GraphBitError::RateLimit { .. } => PythonBindingError::RateLimit {
    message: "Rate limit exceeded".to_string(),
    retry_after: None, // ← Context lost: provider name and retry_after_seconds discarded
},
GraphBitError::Validation { message, .. } => PythonBindingError::Validation {
    message: message.clone(),
    field: "unknown".to_string(), // ← Context lost: actual field name discarded
    value: None,
},
Rust Core Error Definition (Rich Context)
// From core/src/errors.rs (lines 100-107)
#[error("Rate limit exceeded: {provider} - retry after {retry_after_seconds}s")]
RateLimit {
    provider: String,         // ← Available in core
    retry_after_seconds: u64, // ← Available in core
},
#[error("Validation error: {field} - {message}")]
Validation {
    field: String, // ← Available in core
    message: String,
},
Python Binding Error Definition (Partial Context)
// From python/src/errors.rs (lines 42-46)
RateLimit {
    message: String,
    retry_after: Option<u64>, // ← Could be populated but isn't
},
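The loss can be reproduced in isolation. The following is a minimal sketch, not the project's actual code: both enums are trimmed to the variants quoted above, the type of `value` is assumed to be `Option<String>` (the issue does not show it), and `to_python_error_lossy` is a hypothetical stand-in for the conversion function in python/src/errors.rs.

```rust
// Trimmed stand-ins for the real types; only the variants quoted above are modeled.
#[derive(Debug, Clone, PartialEq)]
enum GraphBitError {
    RateLimit { provider: String, retry_after_seconds: u64 },
    Validation { field: String, message: String },
}

#[derive(Debug, Clone, PartialEq)]
enum PythonBindingError {
    RateLimit { message: String, retry_after: Option<u64> },
    // Assumption: `value` is Option<String>; its real type is not shown in the issue.
    Validation { message: String, field: String, value: Option<String> },
}

// Hypothetical name; the match arms mirror the current lossy conversion.
fn to_python_error_lossy(err: &GraphBitError) -> PythonBindingError {
    match err {
        // `..` ignores provider and retry_after_seconds entirely.
        GraphBitError::RateLimit { .. } => PythonBindingError::RateLimit {
            message: "Rate limit exceeded".to_string(),
            retry_after: None,
        },
        // `..` ignores the real field name, which is replaced by a placeholder.
        GraphBitError::Validation { message, .. } => PythonBindingError::Validation {
            message: message.clone(),
            field: "unknown".to_string(),
            value: None,
        },
    }
}
```

Whatever provider and delay the core error carries, the converted error looks the same, which is the behavior this issue describes.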
Impact Assessment
Severity: Medium
Affected Areas:
- Debugging: Developers lose critical context when troubleshooting failures. For example, a RateLimit error loses the provider name and retry_after_seconds, making it harder to identify which provider is rate-limiting and when to retry.
- Error Recovery: Applications cannot implement provider-aware retry logic. The retry_after field in PythonBindingError::RateLimit is always None, even though the core error contains retry_after_seconds. This forces applications to use generic retry delays instead of provider-specific recommendations.
- Observability: Monitoring systems cannot distinguish between different error scenarios. A Validation error always shows field: "unknown" instead of the actual field name from the core error.
- User Experience: Generic error messages reduce clarity for end users. Error messages don't include provider context that would help users understand which service failed.
- Production Debugging: In production, logs show generic messages like "Rate limit exceeded" instead of "Rate limit exceeded: openai - retry after 30s", making it harder to correlate errors with specific providers.
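To make the error-recovery impact concrete, here is a sketch of how a caller might pick a retry delay. `retry_delay` and the fallback value are hypothetical and not part of the codebase; the point is only that with retry_after always None, the fallback branch is the only one ever taken.

```rust
use std::time::Duration;

// Hypothetical helper: prefer the provider's hint, otherwise use a generic delay.
fn retry_delay(retry_after: Option<u64>, fallback: Duration) -> Duration {
    retry_after.map(Duration::from_secs).unwrap_or(fallback)
}
```

Once the conversion populates retry_after, the same helper would return the provider-recommended delay without any change on the caller's side.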
Potential Solutions
Recommendation 1: Populate All Available Fields in Error Conversion
The most straightforward fix is to extract and preserve all available context when converting from GraphBitError to PythonBindingError. Specific improvements:
- RateLimit errors: Extract provider and retry_after_seconds from the core error and populate the retry_after field in the Python error
- Validation errors: Extract the field name from the core error instead of using "unknown"
- Authentication errors: Extract and populate the provider field
- LlmProvider errors: Create a new Python error variant or extend existing ones to capture provider-specific context
Example fix for RateLimit:
GraphBitError::RateLimit { provider, retry_after_seconds } => PythonBindingError::RateLimit {
    message: format!("Rate limit exceeded: {} - retry after {}s", provider, retry_after_seconds),
    retry_after: Some(retry_after_seconds), // ← Now populated
},
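The same pattern extends to the Validation arm. The sketch below is self-contained and rests on assumptions: the enums are trimmed to the variants quoted in this issue, `value` is assumed to be `Option<String>`, and `to_python_error` is a hypothetical function name. It matches on a reference, so the `u64` has to be dereferenced; if the real function matches by value, the `*` is not needed.

```rust
#[derive(Debug, Clone, PartialEq)]
enum GraphBitError {
    RateLimit { provider: String, retry_after_seconds: u64 },
    Validation { field: String, message: String },
}

#[derive(Debug, Clone, PartialEq)]
enum PythonBindingError {
    RateLimit { message: String, retry_after: Option<u64> },
    // Assumption: `value` is Option<String>; its real type is not shown in the issue.
    Validation { message: String, field: String, value: Option<String> },
}

// Hypothetical name; shows the context-preserving version of both arms.
fn to_python_error(err: &GraphBitError) -> PythonBindingError {
    match err {
        GraphBitError::RateLimit { provider, retry_after_seconds } => {
            PythonBindingError::RateLimit {
                message: format!(
                    "Rate limit exceeded: {} - retry after {}s",
                    provider, retry_after_seconds
                ),
                // Matching on a reference binds &u64, hence the dereference.
                retry_after: Some(*retry_after_seconds),
            }
        }
        GraphBitError::Validation { field, message } => PythonBindingError::Validation {
            message: message.clone(),
            field: field.clone(), // real field name instead of "unknown"
            value: None,          // the core error carries no value to forward
        },
    }
}
```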
Recommendation 2: Add Provider Context to Python Error Types
Extend PythonBindingError variants to include provider information where applicable:
- Add provider: Option<String> field to RateLimit variant
- Add provider: Option<String> field to LlmProvider variant (if created)
- Update the Display implementation to include provider context in error messages
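One possible shape for the extended RateLimit variant and its Display arm is sketched below. The enum is trimmed to a single variant, and the output format (parenthesized provider and retry hint) is an assumption made for illustration; the existing Display implementation may format things differently.

```rust
use std::fmt;

#[derive(Debug)]
enum PythonBindingError {
    RateLimit {
        message: String,
        retry_after: Option<u64>,
        provider: Option<String>, // proposed new field
    },
}

impl fmt::Display for PythonBindingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PythonBindingError::RateLimit { message, retry_after, provider } => {
                write!(f, "{}", message)?;
                // Append each piece of context only when it is present.
                if let Some(p) = provider {
                    write!(f, " (provider: {})", p)?;
                }
                if let Some(secs) = retry_after {
                    write!(f, " (retry after {}s)", secs)?;
                }
                Ok(())
            }
        }
    }
}
```

Keeping provider as an Option means conversion paths that have no provider information can still construct the variant.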
Recommendation 3: Implement Error Metadata Accessor
Add methods to PythonBindingError that allow Python code to programmatically access error context:
impl PythonBindingError {
    pub fn get_provider(&self) -> Option<String> { ... }
    pub fn get_retry_after(&self) -> Option<u64> { ... }
    pub fn get_field(&self) -> Option<String> { ... }
}
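A possible implementation of these accessors is sketched below. It assumes the provider field from Recommendation 2 has been added to RateLimit, trims the enum to two variants, and assumes `value` is `Option<String>`. Exposing the methods to Python (for example through the bindings layer) is left out.

```rust
#[derive(Debug)]
enum PythonBindingError {
    // Assumption: `provider` has been added as proposed in Recommendation 2.
    RateLimit { message: String, retry_after: Option<u64>, provider: Option<String> },
    // Assumption: `value` is Option<String>; its real type is not shown in the issue.
    Validation { message: String, field: String, value: Option<String> },
}

impl PythonBindingError {
    /// Provider name, if this variant carries one.
    pub fn get_provider(&self) -> Option<String> {
        match self {
            PythonBindingError::RateLimit { provider, .. } => provider.clone(),
            _ => None,
        }
    }

    /// Provider-recommended retry delay in seconds, if known.
    pub fn get_retry_after(&self) -> Option<u64> {
        match self {
            PythonBindingError::RateLimit { retry_after, .. } => *retry_after,
            _ => None,
        }
    }

    /// Name of the field that failed validation, if applicable.
    pub fn get_field(&self) -> Option<String> {
        match self {
            PythonBindingError::Validation { field, .. } => Some(field.clone()),
            _ => None,
        }
    }
}
```

Each accessor returns None for variants that do not carry the requested context, so callers can query any error uniformly.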
Recommendation 4: Add Comprehensive Error Logging
Ensure that the full error context is logged before conversion (already done at line 133), and consider adding structured logging that captures all error fields for observability systems.
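As a rough illustration of what "structured" could mean here, the sketch below flattens a core error into key/value pairs that a logger could emit as separate fields. `log_fields` and the key names are hypothetical, the enum is trimmed to the two variants quoted in this issue, and no particular logging crate is assumed.

```rust
#[derive(Debug)]
enum GraphBitError {
    RateLimit { provider: String, retry_after_seconds: u64 },
    Validation { field: String, message: String },
}

// Hypothetical helper: one (key, value) pair per piece of error context.
fn log_fields(err: &GraphBitError) -> Vec<(&'static str, String)> {
    match err {
        GraphBitError::RateLimit { provider, retry_after_seconds } => vec![
            ("kind", "rate_limit".to_string()),
            ("provider", provider.clone()),
            ("retry_after_seconds", retry_after_seconds.to_string()),
        ],
        GraphBitError::Validation { field, message } => vec![
            ("kind", "validation".to_string()),
            ("field", field.clone()),
            ("message", message.clone()),
        ],
    }
}
```

Emitting these pairs before conversion would let observability systems filter by provider or field even if the Python-side error stays unchanged.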
Current Strengths
The error handling infrastructure is well-designed:
- ✅ Core errors capture rich context (provider names, field names, retry information)
- ✅ Python binding errors have fields to hold this context (e.g., retry_after: Option<u64>)
- ✅ Error logging is comprehensive (line 133 logs full error before conversion)
- ✅ Display implementation for Python errors is well-structured and includes context when available
- ✅ Error conversion function exists and is centralized (single point of improvement)
Verification Notes
Issue Confirmed: The code at lines 140-152 in python/src/errors.rs demonstrates the context loss:
- Lines 140-143: Authentication error discards provider context
- Lines 144-147: RateLimit error discards provider and retry_after_seconds
- Lines 148-152: Validation error discards the actual field name
Positive Finding: The PythonBindingError enum already has fields to hold this context (e.g., retry_after: Option<u64> at line 46), so the fix only requires populating these fields during conversion.
Notes
This issue is particularly important for production deployments where detailed error information is essential for monitoring, alerting, and automated recovery mechanisms. The good news is that the core library already captures all necessary context; it is just not being propagated to Python. The minimal fix (Recommendation 1) is straightforward and low-risk, requiring only changes to the error conversion logic in python/src/errors.rs (lines 129-161).