Improve dependency handling for python-magic/libmagic in book parsing infrastructure #304

@akaszubski

Summary

Tests fail when the libmagic system library is not installed, despite the book_parser.py code gracefully handling the missing dependency via try/except and runtime error messages. This creates a poor developer experience where 17 tests fail unnecessarily when libmagic is absent, even though the core functionality correctly falls back to extension-based file type detection.

What Does NOT Work

Patterns that FAIL:

  1. Current test approach without conditional skipping - Tests that depend on python-magic fail hard when libmagic system library is not installed, even though the production code handles this gracefully with fallback logic.

  2. Mock patching system-level dependencies - Attempting to mock libmagic at the system level doesn't work well because the dependency is loaded at module import time, and the missing C library causes import failures before tests can mock anything.

  3. Assuming all developers have libmagic installed - This creates friction for new contributors who may not have the system library, leading to confusing test failures that don't reflect actual functionality problems.

Scenarios

Fresh Install

  • Developer installs autonomous-dev: May not have libmagic system library installed
  • Runs tests: 17 tests fail even though book_parser.py works correctly with extension-based fallback
  • Expected behavior: Tests should skip cleanly with clear message about optional dependency
  • Actual behavior: Tests fail with ImportError/AttributeError traces

Update/Upgrade

  • Existing users with working book_parser.py: May have been using extension-based detection without issues
  • Pull new test changes: Tests suddenly fail when libmagic not present
  • Expected behavior: Tests pass or skip based on dependency availability
  • Actual behavior: Test failures block development workflow

User Customizations

  • Not applicable: This is testing infrastructure, no user customizations involved

Implementation Approach

Files to modify:

  1. Tests using python-magic (identify via grep for `import magic` or `magic.from_file`):

    • Add `pytest.importorskip('magic')` at module level or in fixtures
    • Separate tests into magic-dependent and magic-independent suites
  2. pyproject.toml:
    ```toml
    [project.optional-dependencies]
    mime-detection = ["python-magic>=0.4.27"]
    ```

  3. plugins/autonomous-dev/lib/book_parser.py (enhance error messages):

    • Add platform-specific installation instructions to ImportError message
    • Consider adding puremagic as pure-Python fallback (optional enhancement)
  4. Documentation:

    • Create or update `docs/DEPENDENCIES.md` with libmagic installation instructions
    • Document optional dependencies in README.md or INSTALLATION.md

Technical approach:

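```python
import pytest

# Option 1: in test files that use magic, skip the whole module if the import fails
magic = pytest.importorskip("magic", reason="python-magic not installed - install libmagic system library")

# Option 2: for specific test functions, probe the import once and use skipif
try:
    import magic
    has_magic = True
except ImportError:  # python-magic is missing, or libmagic failed to load
    has_magic = False

@pytest.mark.skipif(not has_magic, reason="Requires python-magic and libmagic")
def test_magic_based_detection():
    ...
```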
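
A possible refinement of the `has_magic` flag, as a minimal sketch: the helper name `probe_magic` is hypothetical and not part of the current code base. It attempts a real import rather than only checking that the package is present, because python-magic can be installed while the libmagic shared library is missing, and it keeps the failure text so the skip reason can repeat it.

```python
from typing import Optional, Tuple


def probe_magic() -> Tuple[bool, Optional[str]]:
    """Return (available, reason); reason is None when the import works.

    Hypothetical helper. A real import is attempted because python-magic
    raises ImportError at import time when it cannot load libmagic.
    """
    try:
        import magic  # noqa: F401  (imported only to check availability)
    except ImportError as exc:
        return False, f"python-magic/libmagic unavailable: {exc}"
    return True, None


# Evaluate once at module import so decorators can reuse the result.
has_magic, magic_skip_reason = probe_magic()
```

A test module could then use `@pytest.mark.skipif(not has_magic, reason=magic_skip_reason or "")`. Note that newer pytest releases (8.2 and later, to my knowledge) warn when `importorskip` hits an `ImportError` that is not a `ModuleNotFoundError`, and accept an `exc_type=ImportError` argument to opt in; that matters here because the package and the system library can be missing independently.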

Test Scenarios

Multiple test paths to verify:

  1. With libmagic installed:

    • All tests pass
    • Magic-based MIME detection tests run and pass
    • Extension-based fallback tests run and pass
  2. Without libmagic (fresh install):

    • Magic-dependent tests skip cleanly with informative message
    • Extension-based detection tests run and pass
    • No test failures, only skips
    • Skip message includes installation instructions
  3. Mock approach (optional):

    • Tests with mocked magic module pass
    • Helps test error handling paths without system dependency
  4. CI/CD environments:

    • Tests pass in environment with libmagic (full test suite)
    • Tests pass in environment without libmagic (reduced test suite with skips)
    • Optional: Run matrix testing both scenarios
  5. Error message verification:

    • Runtime error (when magic needed but absent) shows platform-specific install command
    • Error message mentions optional-dependencies installation method
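
The mock approach in scenario 3 could look roughly like the sketch below. It assumes the code under test imports `magic` lazily (at call time) or that the stub is registered before the module is imported; `make_fake_magic` and `detect_with_magic` are hypothetical names used only for illustration. The only python-magic call relied on is `magic.from_file(path, mime=True)`.

```python
import sys
import types
from unittest import mock


def make_fake_magic(mime_type="application/epub+zip"):
    """Build a stand-in module exposing the one python-magic call used here."""
    fake = types.ModuleType("magic")
    fake.from_file = lambda path, mime=False: mime_type
    return fake


def detect_with_magic(path):
    """Hypothetical code under test; imports magic lazily, at call time."""
    import magic
    return magic.from_file(path, mime=True)


def run_demo():
    # Case 1: a fake module is registered, so the magic-based path runs
    # without the libmagic system library.
    with mock.patch.dict(sys.modules, {"magic": make_fake_magic("application/pdf")}):
        mocked = detect_with_magic("book.pdf")

    # Case 2: a None entry in sys.modules makes `import magic` raise
    # ImportError, which exercises the error-handling path.
    with mock.patch.dict(sys.modules, {"magic": None}):
        try:
            detect_with_magic("book.pdf")
            missing = False
        except ImportError:
            missing = True

    return mocked, missing
```

`mock.patch.dict` restores `sys.modules` on exit, so the stub does not leak into other tests.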

Acceptance Criteria

Tests

  • Tests skip cleanly when libmagic missing (pytest.importorskip or skipif decorators)
  • Zero test failures when libmagic not installed (only skips)
  • All core functionality tests pass without libmagic (extension-based detection validated)
  • Magic-dependent tests run successfully when libmagic is available
  • Test output clearly indicates skipped tests and reason (e.g., "SKIPPED: python-magic not installed")

Documentation

  • Installation instructions for libmagic added to docs/ (platform-specific: macOS, Linux, Windows)
  • Optional dependencies declared in pyproject.toml under [project.optional-dependencies]
  • README.md or INSTALLATION.md mentions optional mime-detection feature and how to enable it
  • Documentation explains extension-based fallback behavior when libmagic absent

Error Messages

  • Runtime ImportError in book_parser.py includes platform-specific installation commands
  • Error message mentions pip install with optional-dependencies syntax
  • Error message is actionable (copy-paste command to fix)
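
One way to build such a message, as a sketch only: `libmagic_install_hint` is a hypothetical helper, the commands are the ones listed under Dependencies in this issue, and the extras name assumes the `mime-detection` entry proposed for pyproject.toml.

```python
import platform

# Commands copied from the Dependencies section of this issue.
_INSTALL_COMMANDS = {
    "Darwin": ["brew install libmagic"],
    "Linux": [
        "apt-get install libmagic1   # Ubuntu/Debian",
        "yum install file-libs       # RHEL/CentOS",
    ],
    "Windows": ["pip install python-magic-bin"],
}


def libmagic_install_hint(system=None):
    """Return a copy-paste install hint for the given (or current) platform."""
    system = system or platform.system()
    lines = ["MIME detection needs python-magic and the libmagic system library."]
    commands = _INSTALL_COMMANDS.get(system)
    if commands:
        lines.append(f"Install libmagic on {system}:")
        lines.extend(f"  {command}" for command in commands)
    else:
        lines.append("Install libmagic with your platform's package manager.")
    lines.append("Then install the Python package:")
    lines.append("  pip install 'autonomous-dev[mime-detection]'")
    return "\n".join(lines)
```

The existing `except ImportError` branch in book_parser.py could append this text to the error it already raises, for example `raise ImportError(libmagic_install_hint())`.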

Validation

  • Run tests with libmagic installed - all pass
  • Run tests without libmagic - no failures, only skips with clear messages
  • Verify extension-based detection still works without libmagic
  • Confirm error messages display correct installation commands on each platform

Dependencies

Python package: `python-magic>=0.4.27`

System library: `libmagic` (file type identification library)

  • Part of the `file` command utilities
  • Platform-specific installation:
    • macOS: `brew install libmagic`
    • Ubuntu/Debian: `apt-get install libmagic1`
    • RHEL/CentOS: `yum install file-libs`
    • Windows: Use `python-magic-bin` package (bundles libmagic DLL)

Alternative (pure Python): `puremagic` (optional consideration)

  • No system dependencies
  • Less accurate than libmagic but works everywhere
  • Could be used as fallback layer before extension-based detection
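
The layering described above (libmagic, then puremagic, then file extension) might be sketched as follows. All function names and the `EXTENSION_MIME` table are hypothetical and not taken from book_parser.py. The puremagic call assumes `puremagic.from_file(path, mime=True)` behaves like its python-magic counterpart, which should be checked against the puremagic documentation before relying on it.

```python
from pathlib import Path

# Illustrative mapping only; the real parser may support other formats.
EXTENSION_MIME = {
    ".epub": "application/epub+zip",
    ".pdf": "application/pdf",
    ".txt": "text/plain",
}


def detect_with_libmagic(path):
    import magic  # raises ImportError if python-magic or libmagic is missing
    return magic.from_file(str(path), mime=True)


def detect_with_puremagic(path):
    import puremagic  # optional pure-Python fallback (assumed API)
    return puremagic.from_file(str(path), mime=True)


def detect_by_extension(path):
    return EXTENSION_MIME.get(Path(path).suffix.lower())


DEFAULT_DETECTORS = (detect_with_libmagic, detect_with_puremagic, detect_by_extension)


def detect_mime(path, detectors=DEFAULT_DETECTORS):
    """Try each detector in order and return the first non-empty result."""
    for detector in detectors:
        try:
            result = detector(path)
        except Exception:
            # Broad on purpose: a missing library, an unreadable file, or a
            # detector-specific error all mean "try the next layer".
            continue
        if result:
            return result
    return None
```

Passing the detectors as a parameter keeps the function testable without libmagic: a test can supply a failing first detector and assert that the extension-based result is returned.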

Environment Requirements

Verified environments:

  • Python 3.8+ (assumed based on autonomous-dev requirements)
  • pytest 7.0+ (assumed baseline; `pytest.importorskip` itself is also available in older releases)

System library availability:

  • libmagic is available on all major platforms but requires separate installation
  • Not included in the Python standard library or bundled with PyPI packages (except python-magic-bin for Windows)

Testing without libmagic:

  • Should work on fresh Python environments
  • Should work in minimal Docker containers
  • Should work in CI/CD without apt-get/brew install steps

Source of Truth

Solution verified from:

  1. pytest documentation - `pytest.importorskip()` pattern for optional dependencies

  2. python-magic GitHub - Installation instructions and error handling patterns

  3. autonomous-dev book_parser.py implementation - Current try/except pattern

    • File: `plugins/autonomous-dev/lib/book_parser.py`
    • Already handles ImportError gracefully at runtime
    • Tests need to match this graceful handling
  4. Best practices - `[project.optional-dependencies]` extras for optional features

    • PEP 621 specification for pyproject.toml
    • Allows `pip install autonomous-dev[mime-detection]` syntax
