
Add DocumentDB functional testing framework #1

Open
nitinahuja89 wants to merge 9 commits into documentdb:main from nitinahuja89:framework-setup

Conversation

@nitinahuja89
Collaborator

  • Implement complete test framework with pytest

    • Add sample tests covering find, aggregate, and insert operations
    • Multi-engine support with custom connection strings
    • Automatic test isolation and cleanup
    • Tag-based test organization and filtering
    • Parallel execution support with pytest-xdist
  • Add smart result analyzer

    • Automatic marker detection using heuristics
    • Filters test names, file names, and engine names
    • Categorizes failures: PASS/FAIL/UNSUPPORTED/INFRA_ERROR
    • CLI tool: docdb-analyze with text and JSON output
  • Configure development tools

    • Black for code formatting
    • isort for import sorting
    • flake8 for linting
    • mypy for type checking
    • pytest-cov for coverage reporting
  • Add comprehensive documentation

    • README with usage examples and best practices
    • CONTRIBUTING guide for writing tests
    • result_analyzer/README explaining analyzer behavior
    • All code formatted and linted
  • Add Docker support

    • Dockerfile for containerized testing
    • .dockerignore for clean builds

Nitin Ahuja added 5 commits January 9, 2026 14:28
- Implement complete test framework with pytest
  - 36 tests covering find, aggregate, and insert operations
  - Multi-engine support with custom connection strings
  - Automatic test isolation and cleanup
  - Tag-based test organization and filtering
  - Parallel execution support with pytest-xdist

- Add smart result analyzer
  - Automatic marker detection using heuristics
  - Filters test names, file names, and engine names
  - Categorizes failures: PASS/FAIL/UNSUPPORTED/INFRA_ERROR
  - CLI tool: docdb-analyze with text and JSON output

- Configure development tools
  - Black for code formatting
  - isort for import sorting
  - flake8 for linting
  - mypy for type checking
  - pytest-cov for coverage reporting

- Add comprehensive documentation
  - README with usage examples and best practices
  - CONTRIBUTING guide for writing tests
  - result_analyzer/README explaining analyzer behavior
  - All code formatted and linted

- Add Docker support
  - Dockerfile for containerized testing
  - .dockerignore for clean builds

Test Results: All 36 tests passed (100%) against DocumentDB

- Update test_find_empty_projection to use documents marker instead of manual insert
- Update test_match_empty_result to use documents marker instead of manual insert
- Ensures consistent test data setup and automatic cleanup
- All 36 tests still passing

- Remove json_report, json_report_indent, json_report_omit from config
- These are command-line options, not pytest.ini settings
- Add comment explaining proper usage
- All 36 tests still passing with no warnings

- Add GitHub Actions workflow for automated Docker builds
- Build for linux/amd64 and linux/arm64 platforms
- Push to GitHub Container Registry (ghcr.io)
- Auto-tags images: latest, sha-*, version tags
- Update README with pre-built image pull instructions
- Fix Dockerfile casing warning (FROM...AS)

Workflow Features:
- Runs on push to main and on pull requests
- Multi-platform support for Intel/AMD and ARM/Graviton
- Automatic versioning from git tags
- GitHub Actions cache for faster builds
- Uses dynamic repository variable (works on forks and upstream)

- Remove the 'Image digest' step that was causing exit code 127
- The metadata and tags are already captured by the build step
- Build step itself will show all relevant information in logs

@eerxuan eerxuan left a comment


Thanks for laying the foundation of the test framework. We can incrementally update from here as we identify improvements through test case development.

Nitin Ahuja added 4 commits February 4, 2026 10:57
- Remove custom @pytest.mark.documents pattern (29 tests refactored)
- Use direct data insertion with Arrange-Act-Assert structure
- Simplify collection fixture (remove marker handling logic)

- Rename FailureType → TestOutcome (more accurate)
- Fix infrastructure error detection (exception-based, not keyword-based)
- Add dynamic marker loading from pytest.ini (eliminate duplication)
- Optimize analyzer with module-level constants and simplified logic

- Fix database/collection name collisions for parallel execution
- Fix SKIPPED categorization (raise ConnectionError for infra issues)

All tests passing (36/37, 1 expected unsupported feature).

@xgerman xgerman left a comment


Summary

This PR introduces a comprehensive pytest-based functional testing framework for DocumentDB with multi-engine support, parallel execution, tag-based organization, and a result analyzer CLI. The overall architecture is well-designed — the fixture hierarchy, marker system, and result categorization are thoughtful. However, there are a few issues to address before merging.


🔴 Critical Issues

1. Typo: misspelled AssertionError in tests/common/assertions.py

In assert_field_not_exists, line 85 raises the exception with a misspelled class name. The intended call is:

raise AssertionError(f"Field '{field_path}' exists in document but should not")

Because the misspelled name is not a defined built-in, the line raises a NameError at runtime instead of the intended assertion.

Fix: correct the exception name to AssertionError in tests/common/assertions.py:85.


🟠 Major Issues

2. engine_client fixture is scope="function" — creates a new connection per test

The engine_client fixture in conftest.py creates a new MongoClient and pings the server for every single test. This is expensive, especially with parallel execution.

Suggestion: Change to scope="session" (or at minimum scope="module"). The client is stateless and safe to share. The database_client and collection fixtures already handle per-test isolation.

@pytest.fixture(scope="session")
def engine_client(request):
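For illustration, a fuller sketch of the session-scoped variant. The body is an assumption based on the fixture's description above; connection_string mirrors the pytest_configure snippet later in this review:

import pytest
from pymongo import MongoClient

@pytest.fixture(scope="session")
def engine_client(request):
    # One client and one ping per session instead of per test
    client = MongoClient(request.config.connection_string)
    client.admin.command("ping")  # fail fast if the engine is unreachable
    yield client
    client.close()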

3. conftest.py: the @pytest.mark.documents marker is referenced in docs but never implemented

The CONTRIBUTING.md and README.md both document a @pytest.mark.documents([...]) marker for automatic test data insertion, but the collection fixture does not read or apply this marker. The fixture yields an empty collection and expects tests to insert data manually (which they do).

Recommendation: Either:

  • (a) Remove references to @pytest.mark.documents from the docs and CONTRIBUTING.md, OR
  • (b) Implement it in the collection fixture (read marker, insert documents before yield; see the sketch below)

Currently this is misleading for contributors.
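If option (b) is chosen, a rough sketch of the fixture change. The fixture and marker names come from the docs; the collection naming and teardown here are simplified assumptions (the real fixture uses hash-based names):

import pytest

@pytest.fixture
def collection(database_client, request):
    coll = database_client["test_collection"]  # real fixture: hash-based name
    marker = request.node.get_closest_marker("documents")
    if marker and marker.args:
        coll.insert_many(marker.args[0])  # seed documents before the test runs
    yield coll
    coll.drop()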

4. pass_rate calculation in analyze_results excludes skipped tests from the denominator but includes them in by_tag counters

In analyzer.py, the pass_rate calculation is:

total = counts["passed"] + counts["failed"] + counts["unsupported"] + counts["infra_error"]

This excludes skipped from total, so skipped tests don't affect pass rate — that's intentional and reasonable. However, the total field in the tag stats won't match the sum of all status counters. Consider documenting this or renaming it to total_executed for clarity.
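For example, a sketch of the renamed computation (only the counter keys are taken from analyzer.py; the rest is illustrative):

total_executed = (counts["passed"] + counts["failed"]
                  + counts["unsupported"] + counts["infra_error"])
pass_rate = counts["passed"] / total_executed if total_executed else 0.0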

5. Global mutable state: _REGISTERED_MARKERS_CACHE in analyzer.py

The extract_markers function uses a module-level global _REGISTERED_MARKERS_CACHE that is set once and never reset. This makes the module hard to test (tests can't override pytest.ini path) and is problematic if the analyzer is used across multiple contexts.

Suggestion: Accept pytest_ini_path as a parameter in extract_markers or use a class-based approach.
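A sketch of the parameterized approach, using a hypothetical helper name (the current extract_markers signature isn't shown in this diff, so all names here are assumptions):

import configparser
from pathlib import Path

def load_registered_markers(pytest_ini_path="pytest.ini"):
    # Read markers per call instead of stashing them in a module global;
    # callers that need memoization can wrap this in functools.lru_cache.
    parser = configparser.ConfigParser()
    if not parser.read(Path(pytest_ini_path)):
        return set()
    raw = parser.get("pytest", "markers", fallback="")
    return {line.split(":", 1)[0].strip() for line in raw.splitlines() if line.strip()}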


🟡 Minor Issues

6. Dockerfile runs as root

The Dockerfile copies packages to /root/.local and runs as root. While this is a test-runner container, it's best practice to use a non-root user.

RUN useradd -m testrunner
USER testrunner

7. setup.py duplicates requirements.txt

setup.py lists install_requires that duplicates requirements.txt. Consider reading from requirements.txt or using a single source of truth (e.g., pyproject.toml with build system).
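If setup.py stays, a common pattern is to read requirements.txt at build time (the package name below is a placeholder):

from pathlib import Path
from setuptools import find_packages, setup

requirements = [
    line.strip()
    for line in Path("requirements.txt").read_text().splitlines()
    if line.strip() and not line.startswith("#")
]

setup(
    name="documentdb-functional-tests",  # placeholder
    packages=find_packages(),
    install_requires=requirements,
)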

8. pytest.ini addopts includes -v by default

Having -v (verbose) always on makes CI output noisy. Consider removing it from defaults and letting users opt in.

9. docker-build.yml — no vulnerability scanning step

The CI workflow builds and pushes Docker images but doesn't include any image scanning (e.g., Trivy, Grype). Consider adding a scan step before push.

10. Missing LICENSE file reference

CONTRIBUTING.md and README.md reference "MIT License" and a LICENSE file, but this PR doesn't add or modify a LICENSE file. Verify one exists in the repo.

11. conftest.py: redundant default logic in pytest_configure

if not connection_string:
    config.connection_string = "mongodb://localhost:27017"
    if engine_name == "default":
        config.engine_name = "default"

The engine_name is already "default" from getoption — the inner if is a no-op.
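With the no-op removed, the block reduces to:

if not connection_string:
    config.connection_string = "mongodb://localhost:27017"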


🟢 Nitpicks

12. test_insert_duplicate_id_fails — overly broad exception catch

with pytest.raises(Exception):

This could hide unexpected errors. Consider:

from pymongo.errors import DuplicateKeyError
with pytest.raises(DuplicateKeyError):
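A complete version for reference (fixture name from this framework; the document shape is illustrative):

import pytest
from pymongo.errors import DuplicateKeyError

def test_insert_duplicate_id_fails(collection):
    collection.insert_one({"_id": 1, "value": "first"})
    with pytest.raises(DuplicateKeyError):
        collection.insert_one({"_id": 1, "value": "second"})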

13. report_generator.py: datetime.now() without timezone

datetime.now() returns a naive datetime. Consider datetime.now(timezone.utc) for consistency across environments.
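For example (variable name illustrative):

from datetime import datetime, timezone

generated_at = datetime.now(timezone.utc)  # aware; serializes with an explicit offset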

14. Missing __all__ or re-exports in tests/__init__.py

The tests/__init__.py has a docstring but no exports. This is fine, just noting for consistency.


Questions

  1. documents marker: Is the plan to implement the @pytest.mark.documents auto-insertion in a follow-up PR? If so, please note that in the PR description.

  2. conftest.py parallel safety: The database_client fixture uses getattr(request.config, 'workerinput', {}).get('workerid', 'main') — have you verified this works correctly with pytest-xdist? The attribute path changed between xdist versions.

  3. Result analyzer pytest.ini path: The analyzer hardcodes pytest.ini as the default path. If the tool is run from a different directory than the repo root, marker extraction will silently return no markers. Should this be configurable via CLI?


Positive Feedback

  • 🎯 Excellent test isolation: The hash-based naming for databases and collections in parallel execution is well thought out
  • 📐 Clean project structure: The horizontal/vertical tag taxonomy is a great design for test organization
  • 🔍 Smart result categorization: The error code 115 detection and infra error classification by exception type (rather than keyword matching) is robust
  • 📖 Thorough documentation: README, CONTRIBUTING, and result_analyzer README are comprehensive
  • 🐳 Multi-stage Docker build: Clean separation of build and runtime stages
  • Good test patterns: Arrange/Act/Assert structure, descriptive names, and meaningful assertions throughout

/request-changes

"pymongo.errors.AutoReconnect",
"pymongo.errors.ExecutionTimeout",
# Generic network/OS errors
"OSError",


Should we also have an UnknownError for failures we can't classify yet?

return TestOutcome.SKIPPED

# Unknown outcome, treat as infrastructure error
return TestOutcome.INFRA_ERROR


Should we differentiate?
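If we do, one option is a dedicated enum member (a sketch; the existing member names are inferred from this diff and the categories above):

from enum import Enum

class TestOutcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    UNSUPPORTED = "unsupported"
    SKIPPED = "skipped"
    INFRA_ERROR = "infra_error"
    UNKNOWN = "unknown"  # hypothetical: results we can't classify yet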


## Questions?

- Open an issue for questions about contributing


Also point people to the Discord.
