Skip to content

fix: return 400 for Elasticsearch query parse errors instead of 500#106

Merged
DominicBM merged 1 commit into
mainfrom
fix/es-400-as-bad-request
Apr 23, 2026
Merged

fix: return 400 for Elasticsearch query parse errors instead of 500#106
DominicBM merged 1 commit into
mainfrom
fix/es-400-as-bad-request

Conversation

@DominicBM

@DominicBM DominicBM commented Apr 20, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Malformed Lucene syntax in the q parameter (e.g. MARC bibliographic subject headings with trailing )- like García,, Eloy,, 1958)-) causes Elasticsearch to return HTTP 400. Previously ElasticSearchClient.processSearch and processRandom used a wildcard case ElasticSearchHttpError(_) that discarded the status code and mapped all non-200 ES responses to SearchFailure → HTTP 500.
  • Added SearchQueryParseFailure to SearchProtocol and updated processSearch/processRandom in ElasticSearchClient to emit it when ES returns 400 (mirroring how processFetch already handles 404).
  • Wired SearchQueryParseFailure through SearchRegistryBehaviorValidationFailure("The q parameter contains invalid search syntax.") → HTTP 400 with JSON body for search and random paths; added a defensive handler in the fetch path.
  • Added MockEsClientQueryParseError and an end-to-end test asserting /v2/items?q=invalid%29- returns 400.

Root cause

Observed in production: a Basque library scraper (Jakarta Commons-HttpClient/3.1) sending MARC subject headings as Lucene queries, generating 7 HTTP 500s that should have been 400s. The ES 400 responses were silently swallowed and re-emitted as 500s.

Test plan

  • sbt "compile; test" passes (274/274)
  • New end-to-end test: Elasticsearch query parse errorreturn BadRequest for /v2/items
  • Existing tests for SearchFailure (non-400 ES errors) unchanged

🤖 Generated with Claude Code

Overview

This PR fixes incorrect HTTP status codes returned when Elasticsearch receives malformed Lucene query syntax in the q parameter. Previously, Elasticsearch 400 responses were being converted to HTTP 500 errors; the fix ensures these now correctly return HTTP 400 with a descriptive validation error message. This resolves an issue where certain valid use cases (e.g., MARC subject headings with trailing punctuation like García,, Eloy,, 1958)-) were being rejected with 500 errors instead of 400 validation failures.

Changes

Core Changes

  • Protocol Update: Added a new SearchQueryParseFailure response type to the SearchProtocol sealed trait to distinguish ES 400 errors from other search failures.
  • Error Handling: Updated ElasticSearchClient.processSearch() and processRandom() to explicitly match HTTP 400 status codes from Elasticsearch and respond with SearchQueryParseFailure (consistent with how processFetch handles 404 errors); all other HTTP errors continue to return SearchFailure.
  • Registry Behavior: Propagated SearchQueryParseFailure through SearchRegistryBehavior to convert it to ValidationFailure("The q parameter contains invalid search syntax.") for the /v2/items search and random endpoints, producing HTTP 400 responses with JSON error bodies. Added a defensive handler in the fetch path for completeness.

Testing

  • Added MockEsClientQueryParseError to simulate Elasticsearch 400 responses in tests.
  • Added end-to-end test validating that /v2/items?q=invalid%29- returns HTTP 400 with appropriate error message.
  • All 274 existing tests pass; no regression in error handling for other HTTP status codes.

Public API Changes

Modifies public-facing API response shape: The /v2/items search and random endpoints now return HTTP 400 (BadRequest) instead of HTTP 500 (InternalServerError) when the q parameter contains invalid Lucene query syntax. Error response body format remains consistent with other validation errors.

…ead of 500

Malformed Lucene syntax (e.g. MARC subject headings with trailing `)-`)
causes ES to return HTTP 400. Previously any non-200 ES response in the
search and random paths was blindly mapped to SearchFailure → HTTP 500.
Added SearchQueryParseFailure response type, wired it through the ES
client, registry, and added end-to-end test coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 20, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6bccdb72-e772-47bd-8fb5-42a8edae5f1c

📥 Commits

Reviewing files that changed from the base of the PR and between 64d519e and f2f1a5a.

📒 Files selected for processing (5)
  • src/main/scala/dpla/api/v2/registry/SearchRegistryBehavior.scala
  • src/main/scala/dpla/api/v2/search/ElasticSearchClient.scala
  • src/main/scala/dpla/api/v2/search/SearchProtocol.scala
  • src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala
  • src/test/scala/dpla/api/v2/search/MockEsClientQueryParseError.scala

Walkthrough

The changes introduce explicit handling for Elasticsearch query parse failures (HTTP 400 errors) by adding a new SearchQueryParseFailure response type to the protocol and propagating it through the search request handling pipeline, allowing the API to distinguish parse errors from other failures and return appropriate HTTP responses to clients.

Changes

Cohort / File(s) Summary
Protocol Definition
src/main/scala/dpla/api/v2/search/SearchProtocol.scala
Added new SearchQueryParseFailure case object as a SearchResponse type to represent query parse errors.
Error Handling Implementation
src/main/scala/dpla/api/v2/search/ElasticSearchClient.scala, src/main/scala/dpla/api/v2/registry/SearchRegistryBehavior.scala
Updated Elasticsearch HTTP error handling to distinguish HTTP 400 errors as SearchQueryParseFailure; updated registry handlers to catch and map this failure type to appropriate validation or internal failures depending on context.
Testing Infrastructure
src/test/scala/dpla/api/v2/search/MockEsClientQueryParseError.scala, src/test/scala/dpla/api/v2/endToEnd/InvalidParamsTest.scala
Added mock Elasticsearch client that simulates query parse failures and extended test coverage with a scenario validating 400 BadRequest response for invalid q parameter syntax.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant SearchRegistry
    participant ElasticSearchClient
    participant Elasticsearch

    Client->>SearchRegistry: SearchQuery (with malformed q param)
    SearchRegistry->>ElasticSearchClient: Execute search
    ElasticSearchClient->>Elasticsearch: HTTP request
    Elasticsearch-->>ElasticSearchClient: HTTP 400 Bad Request
    ElasticSearchClient-->>SearchRegistry: SearchQueryParseFailure
    SearchRegistry-->>Client: 400 BadRequest + ValidationFailure(message)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately summarizes the main change: fixing the error response status code from 500 to 400 for Elasticsearch query parse errors, which is the core objective of this PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/es-400-as-bad-request

Comment @coderabbitai help to get the list of available commands and usage tips.

@DominicBM DominicBM merged commit 557525d into main Apr 23, 2026
5 checks passed
@DominicBM DominicBM deleted the fix/es-400-as-bad-request branch April 23, 2026 12:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant