Support public search for Cosmetology#1305

Closed
landonshumway-ia wants to merge 11 commits into csg-org:main from InspiringApps:feat/public-search

Conversation

@landonshumway-ia
Collaborator

@landonshumway-ia landonshumway-ia commented Feb 27, 2026

Cosmetology has slightly different requirements for public search, given the expected volume of practitioners with similar names. We need to support additional search parameters, such as license number. To accommodate these patterns, we decided to leverage the OpenSearch domain for public search while keeping the request schema similar in shape to the public search request body used by JCC.

This requires us to support custom pagination tracking on the backend to ensure that all matching license records from a query are accounted for when paging through large numbers of matches.
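The custom pagination tracking described above could be sketched as an opaque cursor that wraps the OpenSearch search_after sort values, plus a resume position, in base64-encoded JSON. This is an illustrative sketch only, not the PR's actual implementation; the helper names and payload fields are assumptions:

```python
import base64
import json


def encode_cursor(sort_values: list, provider_id: str, offset: int) -> str:
    """Pack search_after sort values and a resume position into an
    opaque, URL-safe cursor string handed back to the client."""
    payload = {'sort': sort_values, 'providerId': provider_id, 'offset': offset}
    return base64.urlsafe_b64encode(json.dumps(payload).encode('utf-8')).decode('ascii')


def decode_cursor(cursor: str) -> tuple:
    """Reverse of encode_cursor; raises ValueError on malformed input
    so the handler can return a 400 instead of crashing."""
    try:
        payload = json.loads(base64.urlsafe_b64decode(cursor.encode('ascii')))
    except (ValueError, json.JSONDecodeError) as exc:
        raise ValueError('Invalid pagination cursor') from exc
    return payload['sort'], payload['providerId'], payload['offset']
```

Because the cursor carries the resume position alongside the sort values, the backend can account for every matching license record even when a provider's licenses straddle a page boundary.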

This PR also removes the financial-summary-related compact configuration fields, which were missed in the previous PR that removed all unneeded compact config fields.

Testing List

  • yarn test:unit:all should run without errors or warnings
  • yarn serve should run without errors or warnings
  • yarn build should run without errors or warnings
  • For API configuration changes: CDK tests added/updated in backend/compact-connect/tests/unit/test_api.py
  • For API endpoint changes: OpenAPI spec updated to show the latest endpoint configuration (run compact-connect/bin/download_oas30.py)
  • Code review

Closes #1295

Summary by CodeRabbit

  • New Features

    • Public provider search: license-number filtering, cursor pagination, and provider search endpoint.
    • Indexing/docs: one-index-per-license documents, document IDs per license, and support for dateOfBirth in search/indexing.
    • Authorization: queries referencing private fields (e.g., dateOfBirth) now require explicit read-private scope.
  • Bug Fixes

    • Removed compact/jurisdiction summary-report notification email fields from APIs, schemas, responses, defaults, and models.
  • Infrastructure

    • Added search persistent resources and a dedicated public search handler; API wired to use the new search stack.
  • Tests & Docs

    • Expanded/updated tests and docs to cover new search, indexing, auth, and removed notification fields.

@coderabbitai
Contributor

coderabbitai bot commented Feb 27, 2026

📝 Walkthrough

Walkthrough

Removes summary-report notification email fields across models, schemas, docs, and tests; adds an unauthenticated public license-search Lambda with OpenSearch integration, licenseNumber filtering, cursor pagination, and associated CDK wiring and tests.

Changes

Cohort / File(s) Summary
Notification email field removals
backend/cosmetology-app/lambdas/nodejs/lib/models/compact.ts, backend/cosmetology-app/lambdas/nodejs/lib/models/jurisdiction.ts, backend/cosmetology-app/lambdas/nodejs/lib/models/email-notification-service-event.ts, backend/cosmetology-app/docs/internal/api-specification/latest-oas30.json, backend/cosmetology-app/docs/internal/postman/postman-collection.json
Deleted compactSummaryReportNotificationEmails and jurisdictionSummaryReportNotificationEmails from TypeScript models, OpenAPI/Postman specs and related schema references.
Python data models & schemas
backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/..., .../jurisdiction/..., .../license/api.py, .../provider/api.py
Removed summary-report email fields from compact/jurisdiction schemas and records; added OpenSearch-oriented schemas (LicenseOpenSearchDocumentSchema, ProviderOpenSearchDocumentSchema, PublicLicenseSearchResponseSchema) and tightened/extended provider query validation (pageSize range, licenseNumber filter, post_load renaming).
Public search handler & utils
backend/cosmetology-app/lambdas/python/search/handlers/public_search.py, backend/cosmetology-app/lambdas/python/search/utils.py, backend/cosmetology-app/lambdas/python/search/opensearch_client.py
Added public_search_api_handler with validation, query building, search_after cursor encoding/decoding, sanitization and pagination; renamed/updated provider document generation to produce one document per license and added delete_provider_documents to OpenSearch client.
Search ingest/indexing flows
backend/cosmetology-app/lambdas/python/search/handlers/populate_provider_documents.py, .../provider_update_ingest.py, .../manage_opensearch_indices.py
Switched to one-doc-per-license indexing, updated bulk indexing to use documentId as id_field, added dateOfBirth mapping, changed update/remove handling to delete by provider and re-index remaining docs, and adjusted stats counters.
Tests (Python & Node)
backend/cosmetology-app/lambdas/python/search/tests/*, backend/cosmetology-app/lambdas/python/common/tests/*, backend/cosmetology-app/lambdas/nodejs/tests/*, backend/cosmetology-app/tests/resources/snapshots/*
Added extensive tests for public search, OpenSearch document generation, delete_provider_documents, and provider-record utilities; updated many fixtures/snapshots to remove summary email fields and reflect license-number/dateOfBirth changes.
CDK wiring & API models
backend/cosmetology-app/pipeline/backend_stage.py, backend/cosmetology-app/stacks/search_persistent_stack/..., backend/cosmetology-app/stacks/api_stack/..., backend/cosmetology-app/stacks/api_lambda_stack/public_lookup_api.py, backend/cosmetology-app/tests/app/test_api/test_public_lookup_api.py
Instantiated and threaded a SearchPersistentStack earlier, added PublicSearchProvidersFunction and exports/alarms/permissions, passed search_persistent_stack through ApiStack → V1Api → PublicLookupApi, and updated tests to reference the new handler and wiring.
Test runner inclusion
backend/cosmetology-app/bin/run_python_tests.py
Added lambdas/python/search to the python test directories scanned by the unified pytest runner.
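The one-document-per-license indexing change above could be sketched as follows. This is a hypothetical illustration, not code from the PR; the field names (providerId, licenses, jurisdiction, licenseType) and the documentId format are assumptions:

```python
def build_license_documents(provider: dict) -> list[dict]:
    """Produce one OpenSearch document per license record, each carrying a
    deterministic documentId so re-indexing overwrites rather than duplicates."""
    documents = []
    for license_record in provider.get('licenses', []):
        doc = {
            # Deterministic id: provider + jurisdiction + license type
            'documentId': (
                f"{provider['providerId']}/"
                f"{license_record['jurisdiction']}/"
                f"{license_record['licenseType']}"
            ),
            'providerId': provider['providerId'],
            **license_record,
        }
        documents.append(doc)
    return documents
```

A deterministic documentId is what lets the bulk indexer use it as the id_field, and lets delete-by-provider followed by re-indexing remain idempotent.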

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant APIG as ApiGateway
    participant Handler as PublicSearchHandler
    participant OS as OpenSearch
    participant DB as DynamoDB

    Client->>APIG: POST /v1/public/compacts/{compact}/providers/query (query + pagination)
    APIG->>Handler: public_search_api_handler(event, context)
    Handler->>Handler: _parse_and_validate_public_query_body()
    Handler->>Handler: _decode_public_cursor() (if pagination.lastKey)
    Handler->>Handler: _build_public_license_search_body()
    Handler->>OS: search(index: compact_{compact}_providers, body)
    OS-->>Handler: hits[] (documents per-license)
    Handler->>Handler: sanitize via PublicLicenseSearchResponseSchema
    Handler->>Handler: _encode_public_cursor() -> lastKey
    Handler-->>APIG: {providers, pagination:{pageSize,lastKey,prevLastKey}, query}
    APIG-->>Client: 200 OK (paginated results)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related issues

  • #1295 — cosmetology public search - BE: PR adds licenseNumber filtering and public search endpoint, which aligns with the issue's API/search objectives.
  • cosmetology - Support searching by date of birth for staff user search #1310: Adds dateOfBirth indexing and enforces readPrivate permission checks; this PR implements both dateOfBirth indexing and authorization checks tied to private fields.

Possibly related PRs

Suggested reviewers

  • jlkravitz

Poem

🐰 Hopping through schemas with care,
I sniff out licenses in open air,
Summary emails quietly gone,
Public search now hops along,
Cursor in paw, results to share.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 79.31%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check — ✅ Passed: the title 'Support public search for Cosmetology' directly and clearly summarizes the main objective of the PR: adding public search capabilities for the Cosmetology application using OpenSearch.
  • Description check — ✅ Passed: the PR description includes all key sections: requirements (search parameters, pagination), the approach (OpenSearch, pagination tracking), removal of config fields, and a testing list matching the template.
  • Linked Issues check — ✅ Passed: the changes implement issue #1295 requirements: license-number queries in addition to name/jurisdiction searches, OpenSearch-based cursor pagination, public search API endpoints, and automated tests with updated API documentation.
  • Out of Scope Changes check — ✅ Passed: all code changes align with stated objectives: the OpenSearch indexing refactor (one doc per license), removal of summary-report notification email fields, provider document schema updates, and the public search handler. No out-of-scope changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Also added custom cursor logic for tracking
accurate page size of licenses.
@landonshumway-ia landonshumway-ia changed the title WIP - public search with OpenSearch Support public search for Cosmetology Mar 3, 2026
@landonshumway-ia landonshumway-ia marked this pull request as ready for review March 3, 2026 21:39
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (3)
backend/cosmetology-app/lambdas/python/search/tests/function/test_public_search_providers.py (1)

15-31: Preserve explicit empty request bodies in the event helper.

Line 29 currently turns {} into None because of a truthy check. Use an explicit is not None check so tests can intentionally send an empty JSON object when needed.

🔧 Proposed fix
-            'body': json.dumps(body) if body else None,
+            'body': json.dumps(body) if body is not None else None,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/cosmetology-app/lambdas/python/search/tests/function/test_public_search_providers.py`
around lines 15 - 31, The helper _create_public_api_event currently converts an
explicit empty dict body ({}) into None by using a truthy check; change the body
serialization to use an explicit "is not None" check so that
body=json.dumps(body) if body is not None else None, ensuring tests can send an
intentional empty JSON object; update the 'body' field generation inside
_create_public_api_event to use this explicit check and leave all other fields
unchanged.
backend/cosmetology-app/lambdas/python/search/handlers/public_search.py (2)

74-78: Consider logging when resume_provider_id is not found.

If resume_provider_id from the cursor doesn't match any provider in the current result set (e.g., due to data changes between pagination requests), the resume logic silently processes all providers without skipping. This could lead to duplicate results across pages.

Adding a warning log after the main loop when resume_provider_id is still set (not None) would help with debugging pagination inconsistencies in production.

🔍 Proposed enhancement
         prev_sort = last_sort
         last_sort = hit.get('sort')
 
+    if resume_provider_id is not None:
+        logger.warning(
+            'Resume provider not found in results, possible data change during pagination',
+            resume_provider_id=resume_provider_id,
+        )
+
     last_key = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/cosmetology-app/lambdas/python/search/handlers/public_search.py`
around lines 74 - 78, Add a warning log when the resume cursor's provider id is
never found in the current result set: after the loop that inspects each hit
(where inner_hits, provider_id, skip, resume_offset and resume_provider_id are
used and possibly reset), check if resume_provider_id is still not None and if
so emit a warning via the existing logger (include the unresolved
resume_provider_id and resume_offset) so pagination inconsistencies are visible
in production.

94-98: Redundant pop('jurisdiction') call.

The PublicLicenseSearchResponseSchema.rename_jurisdiction_to_license_jurisdiction @post_load method already pops 'jurisdiction' from the data dict. This line is a no-op.

♻️ Proposed fix
             try:
                 sanitized = license_schema.load(license_source)
-                sanitized.pop('jurisdiction', None)
                 providers.append(sanitized)
                 consumed_from_this_provider += 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/cosmetology-app/lambdas/python/search/handlers/public_search.py`
around lines 94 - 98, Remove the redundant pop call: after deserializing with
license_schema.load(license_source) (in the try block that appends to providers
and increments consumed_from_this_provider), delete the line
sanitized.pop('jurisdiction', None) because
PublicLicenseSearchResponseSchema.rename_jurisdiction_to_license_jurisdiction
already removes 'jurisdiction'; leave the rest of the block
(providers.append(sanitized) and consumed_from_this_provider += 1) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py`:
- Line 235: The schema field licenseNumber currently validates with
Length(min=1, max=100) which conflicts with the API contract's maxLength of 500;
update the validation on the licenseNumber field (the String(...) call in
provider/api.py) to use max=500 so incoming payloads allowed by the request
schema are accepted (keep min=1 and allow_none=False as-is).

In `@backend/cosmetology-app/lambdas/python/search/handlers/public_search.py`:
- Around line 19-21: The inline comment above the OpenSearch client
instantiation is inconsistent: it says "20 seconds" but the code sets
timeout=25; update the comment that precedes opensearch_client =
OpenSearchClient(timeout=25) to state "25 seconds" (or otherwise match the
actual timeout value set on OpenSearchClient) so the comment and the
instantiation of OpenSearchClient are consistent.

In `@backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py`:
- Around line 113-118: The query schema's licenseNumber field in api_model.py
allows max_length=500 which is inconsistent with the public query/response
schemas capped at 100; update the JsonSchema for 'licenseNumber' used in the
query model (the entry with key 'licenseNumber' in the query schema block) to
use max_length=100 (and ensure min_length stays 1) so all related models
(query.licenseNumber and the public response/query licenseNumber schemas) share
the same 1..100 constraint and avoid contract drift.

In `@backend/cosmetology-app/stacks/search_persistent_stack/search_handler.py`:
- Around line 80-99: PublicSearchProvidersFunction is being created with the
shared lambda_role which grants export-bucket and other internal privileges;
create a dedicated IAM role (e.g., public_search_lambda_role) for the public
handler with only the minimal permissions it needs (CloudWatch Logs,
network/VPC, and specific OpenSearch access) and do not attach the export-bucket
or internal datastore policies; then pass that new role into the PythonFunction
call for PublicSearchProvidersFunction instead of lambda_role and ensure any
code that previously relied on the shared role’s extra policies is updated or
kept out of the public function.

---

Nitpick comments:
In `@backend/cosmetology-app/lambdas/python/search/handlers/public_search.py`:
- Around line 74-78: Add a warning log when the resume cursor's provider id is
never found in the current result set: after the loop that inspects each hit
(where inner_hits, provider_id, skip, resume_offset and resume_provider_id are
used and possibly reset), check if resume_provider_id is still not None and if
so emit a warning via the existing logger (include the unresolved
resume_provider_id and resume_offset) so pagination inconsistencies are visible
in production.
- Around line 94-98: Remove the redundant pop call: after deserializing with
license_schema.load(license_source) (in the try block that appends to providers
and increments consumed_from_this_provider), delete the line
sanitized.pop('jurisdiction', None) because
PublicLicenseSearchResponseSchema.rename_jurisdiction_to_license_jurisdiction
already removes 'jurisdiction'; leave the rest of the block
(providers.append(sanitized) and consumed_from_this_provider += 1) unchanged.

In
`@backend/cosmetology-app/lambdas/python/search/tests/function/test_public_search_providers.py`:
- Around line 15-31: The helper _create_public_api_event currently converts an
explicit empty dict body ({}) into None by using a truthy check; change the body
serialization to use an explicit "is not None" check so that
body=json.dumps(body) if body is not None else None, ensuring tests can send an
intentional empty JSON object; update the 'body' field generation inside
_create_public_api_event to use this explicit check and leave all other fields
unchanged.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1c77a3 and 9114378.

📒 Files selected for processing (47)
  • backend/cosmetology-app/bin/run_python_tests.py
  • backend/cosmetology-app/docs/internal/api-specification/latest-oas30.json
  • backend/cosmetology-app/docs/internal/postman/postman-collection.json
  • backend/cosmetology-app/lambdas/nodejs/email-notification-service/README.md
  • backend/cosmetology-app/lambdas/nodejs/lib/models/compact.ts
  • backend/cosmetology-app/lambdas/nodejs/lib/models/email-notification-service-event.ts
  • backend/cosmetology-app/lambdas/nodejs/lib/models/jurisdiction.ts
  • backend/cosmetology-app/lambdas/nodejs/tests/email-notification-service.test.ts
  • backend/cosmetology-app/lambdas/nodejs/tests/lib/compact-configuration-client.test.ts
  • backend/cosmetology-app/lambdas/nodejs/tests/lib/email/encumbrance-notification-service.test.ts
  • backend/cosmetology-app/lambdas/nodejs/tests/lib/email/investigation-notification-service.test.ts
  • backend/cosmetology-app/lambdas/nodejs/tests/lib/jurisdiction-client.test.ts
  • backend/cosmetology-app/lambdas/nodejs/tests/sample-records.ts
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/__init__.py
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/api.py
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/record.py
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/__init__.py
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/api.py
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/record.py
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py
  • backend/cosmetology-app/lambdas/python/common/common_test/test_data_generator.py
  • backend/cosmetology-app/lambdas/python/common/tests/resources/dynamo/compact.json
  • backend/cosmetology-app/lambdas/python/common/tests/resources/dynamo/jurisdiction.json
  • backend/cosmetology-app/lambdas/python/compact-configuration/handlers/compact_configuration.py
  • backend/cosmetology-app/lambdas/python/compact-configuration/tests/function/test_compact_configuration.py
  • backend/cosmetology-app/lambdas/python/search/handlers/public_search.py
  • backend/cosmetology-app/lambdas/python/search/tests/__init__.py
  • backend/cosmetology-app/lambdas/python/search/tests/function/__init__.py
  • backend/cosmetology-app/lambdas/python/search/tests/function/test_public_search_providers.py
  • backend/cosmetology-app/pipeline/backend_stage.py
  • backend/cosmetology-app/stacks/api_lambda_stack/public_lookup_api.py
  • backend/cosmetology-app/stacks/api_stack/__init__.py
  • backend/cosmetology-app/stacks/api_stack/api.py
  • backend/cosmetology-app/stacks/api_stack/v1_api/api.py
  • backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py
  • backend/cosmetology-app/stacks/api_stack/v1_api/public_lookup_api.py
  • backend/cosmetology-app/stacks/search_persistent_stack/search_handler.py
  • backend/cosmetology-app/tests/app/test_api/test_public_lookup_api.py
  • backend/cosmetology-app/tests/resources/snapshots/GET_COMPACT_CONFIGURATION_RESPONSE_SCHEMA.json
  • backend/cosmetology-app/tests/resources/snapshots/GET_JURISDICTION_CONFIGURATION_RESPONSE_SCHEMA.json
  • backend/cosmetology-app/tests/resources/snapshots/PUBLIC_QUERY_PROVIDERS_REQUEST_SCHEMA.json
  • backend/cosmetology-app/tests/resources/snapshots/PUBLIC_QUERY_PROVIDERS_RESPONSE_SCHEMA.json
  • backend/cosmetology-app/tests/resources/snapshots/PUT_COMPACT_CONFIGURATION_REQUEST_SCHEMA.json
  • backend/cosmetology-app/tests/resources/snapshots/PUT_JURISDICTION_CONFIGURATION_REQUEST_SCHEMA.json
  • backend/cosmetology-app/tests/resources/snapshots/QUERY_PROVIDERS_REQUEST_SCHEMA.json
  • backend/cosmetology-app/tests/smoke/compact_configuration_smoke_tests.py
  • backend/cosmetology-app/tests/smoke/rollback_license_upload_smoke_tests.py
💤 Files with no reviewable changes (26)
  • backend/cosmetology-app/tests/resources/snapshots/GET_JURISDICTION_CONFIGURATION_RESPONSE_SCHEMA.json
  • backend/cosmetology-app/lambdas/python/common/tests/resources/dynamo/jurisdiction.json
  • backend/cosmetology-app/lambdas/nodejs/lib/models/email-notification-service-event.ts
  • backend/cosmetology-app/lambdas/nodejs/tests/email-notification-service.test.ts
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/api.py
  • backend/cosmetology-app/tests/resources/snapshots/GET_COMPACT_CONFIGURATION_RESPONSE_SCHEMA.json
  • backend/cosmetology-app/lambdas/python/common/common_test/test_data_generator.py
  • backend/cosmetology-app/tests/smoke/rollback_license_upload_smoke_tests.py
  • backend/cosmetology-app/tests/resources/snapshots/PUT_JURISDICTION_CONFIGURATION_REQUEST_SCHEMA.json
  • backend/cosmetology-app/tests/resources/snapshots/PUT_COMPACT_CONFIGURATION_REQUEST_SCHEMA.json
  • backend/cosmetology-app/lambdas/python/compact-configuration/handlers/compact_configuration.py
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/record.py
  • backend/cosmetology-app/lambdas/nodejs/lib/models/compact.ts
  • backend/cosmetology-app/tests/smoke/compact_configuration_smoke_tests.py
  • backend/cosmetology-app/lambdas/nodejs/tests/lib/jurisdiction-client.test.ts
  • backend/cosmetology-app/lambdas/nodejs/email-notification-service/README.md
  • backend/cosmetology-app/lambdas/nodejs/tests/lib/compact-configuration-client.test.ts
  • backend/cosmetology-app/lambdas/nodejs/lib/models/jurisdiction.ts
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/__init__.py
  • backend/cosmetology-app/lambdas/python/common/tests/resources/dynamo/compact.json
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/__init__.py
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/record.py
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/api.py
  • backend/cosmetology-app/docs/internal/api-specification/latest-oas30.json
  • backend/cosmetology-app/lambdas/nodejs/tests/sample-records.ts
  • backend/cosmetology-app/lambdas/python/compact-configuration/tests/function/test_compact_configuration.py

jurisdiction = Jurisdiction(required=False, allow_none=False)
givenName = String(required=False, allow_none=False, validate=Length(min=1, max=100))
familyName = String(required=False, allow_none=False, validate=Length(min=1, max=100))
licenseNumber = String(required=False, allow_none=False, validate=Length(min=1, max=100))
Contributor


⚠️ Potential issue | 🟠 Major

Align licenseNumber max length with the request contract.

Line 235 enforces max=100, but the request schema snapshots in this PR advertise maxLength: 500. That mismatch will reject payloads the API contract says are valid.

Proposed fix
-        licenseNumber = String(required=False, allow_none=False, validate=Length(min=1, max=100))
+        licenseNumber = String(required=False, allow_none=False, validate=Length(min=1, max=500))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py`
at line 235, The schema field licenseNumber currently validates with
Length(min=1, max=100) which conflicts with the API contract's maxLength of 500;
update the validation on the licenseNumber field (the String(...) call in
provider/api.py) to use max=500 so incoming payloads allowed by the request
schema are accepted (keep min=1 and allow_none=False as-is).

Comment on lines +113 to +118
'licenseNumber': JsonSchema(
type=JsonSchemaType.STRING,
min_length=1,
max_length=500,
description='Filter for licenses with a specific license number',
),
Contributor


⚠️ Potential issue | 🟡 Minor

Align licenseNumber length constraints across related models.

query.licenseNumber here allows up to 500 chars, while the public query/response licenseNumber schemas cap at 100. This creates avoidable contract drift between request and public response surfaces.

🔧 Proposed fix
                             'licenseNumber': JsonSchema(
                                 type=JsonSchemaType.STRING,
                                 min_length=1,
-                                max_length=500,
+                                max_length=100,
                                 description='Filter for licenses with a specific license number',
                             ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py` around lines
113 - 118, The query schema's licenseNumber field in api_model.py allows
max_length=500 which is inconsistent with the public query/response schemas
capped at 100; update the JsonSchema for 'licenseNumber' used in the query model
(the entry with key 'licenseNumber' in the query schema block) to use
max_length=100 (and ensure min_length stays 1) so all related models
(query.licenseNumber and the public response/query licenseNumber schemas) share
the same 1..100 constraint and avoid contract drift.

Comment on lines +80 to +99
self.public_handler = PythonFunction(
self,
'PublicSearchProvidersFunction',
description='Public search handler for OpenSearch license queries',
index=os.path.join('handlers', 'public_search.py'),
lambda_dir='search',
handler='public_search_api_handler',
role=lambda_role,
log_retention=RetentionDays.ONE_MONTH,
environment={
'OPENSEARCH_HOST_ENDPOINT': opensearch_domain.domain_endpoint,
**stack.common_env_vars,
},
timeout=Duration.seconds(29),
memory_size=2048,
vpc=vpc_stack.vpc,
vpc_subnets=vpc_subnets,
security_groups=[vpc_stack.lambda_security_group],
alarm_topic=alarm_topic,
)
Contributor


⚠️ Potential issue | 🟠 Major

Avoid sharing the internal search role with the public search Lambda.

PublicSearchProvidersFunction currently reuses lambda_role, so it inherits export-bucket permissions granted for the internal handler. That violates least privilege for a public-facing function.

Proposed fix
         self.public_handler = PythonFunction(
             self,
             'PublicSearchProvidersFunction',
             description='Public search handler for OpenSearch license queries',
             index=os.path.join('handlers', 'public_search.py'),
             lambda_dir='search',
             handler='public_search_api_handler',
-            role=lambda_role,
             log_retention=RetentionDays.ONE_MONTH,
             environment={
                 'OPENSEARCH_HOST_ENDPOINT': opensearch_domain.domain_endpoint,
                 **stack.common_env_vars,
             },
@@
         )
         opensearch_domain.grant_read(self.public_handler)
+
+        NagSuppressions.add_resource_suppressions_by_path(
+            stack,
+            f'{self.public_handler.role.node.path}/DefaultPolicy/Resource',
+            [
+                {
+                    'id': 'AwsSolutions-IAM5',
+                    'reason': 'Public search lambda needs wildcard index permissions for OpenSearch read operations.',
+                },
+            ],
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/cosmetology-app/stacks/search_persistent_stack/search_handler.py`
around lines 80 - 99, PublicSearchProvidersFunction is being created with the
shared lambda_role which grants export-bucket and other internal privileges;
create a dedicated IAM role (e.g., public_search_lambda_role) for the public
handler with only the minimal permissions it needs (CloudWatch Logs,
network/VPC, and specific OpenSearch access) and do not attach the export-bucket
or internal datastore policies; then pass that new role into the PythonFunction
call for PublicSearchProvidersFunction instead of lambda_role and ensure any
code that previously relied on the shared role’s extra policies is updated or
kept out of the public function.
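To make the least-privilege intent above concrete, here is a sketch of the policy document such a dedicated role might carry. The domain ARN and the exact action list are assumptions for illustration, not taken from this PR:

```python
def public_search_policy(domain_arn: str) -> dict:
    """Build a minimal IAM policy for the public search lambda: read-style
    OpenSearch HTTP calls against the domain only -- no export-bucket or
    internal datastore statements."""
    return {
        'Version': '2012-10-17',
        'Statement': [
            {
                'Effect': 'Allow',
                # _search requests are issued via GET or POST
                'Action': ['es:ESHttpGet', 'es:ESHttpPost'],
                'Resource': [domain_arn, f'{domain_arn}/*'],
            },
        ],
    }
```

CloudWatch Logs and VPC networking permissions would come from the usual CDK-managed policies; the point is that nothing beyond the OpenSearch domain appears in this role.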

Contributor

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

♻️ Duplicate comments (2)
backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py (1)

113-118: ⚠️ Potential issue | 🟠 Major

Align licenseNumber length constraints across API model and runtime validation.

Line 116 allows up to 500 chars, but runtime validation in backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py (Line 236) caps at 100. Requests in the 101–500 range will pass API model validation and fail in Lambda.

🔧 Proposed fix
                             'licenseNumber': JsonSchema(
                                 type=JsonSchemaType.STRING,
                                 min_length=1,
-                                max_length=500,
+                                max_length=100,
                                 description='Filter for licenses with a specific license number',
                             ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py` around lines
113 - 118, The API model's JsonSchema for 'licenseNumber' allows max_length=500
but runtime validation in cc_common.data_model.schema.provider.api (validator
for licenseNumber) enforces max 100, causing mismatched behavior; update the
JsonSchema definition (the 'licenseNumber' JsonSchema entry) to set
max_length=100 to match the runtime validator (or alternatively raise the
runtime cap to 500 if that is desired across the system) and ensure both
JsonSchema and the validator in cc_common.data_model.schema.provider.api use the
same max length value.
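One way to keep the two limits from drifting again is a shared constant that both the CDK JsonSchema and the runtime validator import. The module path below is hypothetical; only the 100-character cap comes from the review comment:

```python
# Hypothetical shared module (e.g. cc_common/constants.py) imported by both
# the CDK api_model.py (for max_length) and the marshmallow schema.
LICENSE_NUMBER_MAX_LENGTH = 100


def validate_license_number(value: str) -> str:
    """Mirror of the runtime Length(1, 100) rule, using the shared constant."""
    if not 1 <= len(value) <= LICENSE_NUMBER_MAX_LENGTH:
        raise ValueError(f'licenseNumber must be 1-{LICENSE_NUMBER_MAX_LENGTH} characters')
    return value
```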
backend/cosmetology-app/lambdas/python/search/handlers/public_search.py (1)

19-21: ⚠️ Potential issue | 🟡 Minor

Fix timeout comment drift.

Line 20 says “20 seconds” while Line 21 configures timeout=25.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/cosmetology-app/lambdas/python/search/handlers/public_search.py`
around lines 19 - 21, Comment and code disagree about the OpenSearch client
timeout: update the comment above the OpenSearchClient instantiation
(referencing OpenSearchClient(timeout=25)) so it accurately states "25 seconds"
(or alternatively change the timeout arg to 20 if you prefer the comment) — make
the comment wording consistent with the configured timeout value.
🧹 Nitpick comments (1)
backend/cosmetology-app/stacks/api_lambda_stack/public_lookup_api.py (1)

55-58: Add a concrete cleanup owner/timeline for this temporary bridge.

Since this keeps a legacy lambda/export alive during cutover, please attach a linked cleanup issue (or explicit release/date + owner) to ensure it cannot linger past migration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/cosmetology-app/stacks/api_lambda_stack/public_lookup_api.py` around
lines 55 - 58, Add a concrete cleanup owner and timeline to the temporary export
for the legacy lambda: update the TODO near
stack.export_value(self.query_providers_handler.function_arn) to include a
JIRA/GitHub issue link or issue ID, the responsible owner (team or username),
and a specific removal date/release (e.g., "remove by 2026-06-01 or release
vX.Y"), and add a short comment that the export should be deleted after
SearchPersistentStack.public_handler is fully live; ensure the comment
references query_providers_handler.function_arn and stack.export_value so future
readers can locate and remove the bridge.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/cosmetology-app/lambdas/python/search/handlers/public_search.py`:
- Around line 76-91: The cursor math is incorrect: when building the next cursor
inside the inner_hits loop update next_cursor_license_offset to reflect the
cumulative offset for the provider (e.g., resume_offset +
consumed_from_this_provider or consumed_from_this_provider if no resume_offset)
instead of only current-page consumption, and set next_cursor_search_after to
the current provider's sort value (use last_sort or hit.get('sort')) rather than
prev_sort so the resume uses the latest sort key; apply the same fixes to the
corresponding block around the other occurrence (the block at lines ~106-110)
and ensure consumed_from_this_provider is incremented consistently before
computing the next cursor.
- Around line 167-172: The code currently only checks presence of resume keys in
decoded but not their types; add explicit type validation after computing
has_resume: if has_resume ensure decoded['license_offset'] is an int (or
convertible to int) and decoded['resume_provider_sort'] is a list (and
optionally its items are of expected type), and if decoded.get('search_after')
is present ensure it is the expected type (e.g., list); for any type mismatch
raise CCInvalidRequestException('Invalid lastKey') so malformed lastKey values
return 400 instead of later causing 500; apply these checks where decoded is
validated (the block that sets has_resume and returns decoded in
public_search.py).

---

Duplicate comments:
In `@backend/cosmetology-app/lambdas/python/search/handlers/public_search.py`:
- Around line 19-21: Comment and code disagree about the OpenSearch client
timeout: update the comment above the OpenSearchClient instantiation
(referencing OpenSearchClient(timeout=25)) so it accurately states "25 seconds"
(or alternatively change the timeout arg to 20 if you prefer the comment) — make
the comment wording consistent with the configured timeout value.

In `@backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py`:
- Around line 113-118: The API model's JsonSchema for 'licenseNumber' allows
max_length=500 but runtime validation in
cc_common.data_model.schema.provider.api (validator for licenseNumber) enforces
max 100, causing mismatched behavior; update the JsonSchema definition (the
'licenseNumber' JsonSchema entry) to set max_length=100 to match the runtime
validator (or alternatively raise the runtime cap to 500 if that is desired
across the system) and ensure both JsonSchema and the validator in
cc_common.data_model.schema.provider.api use the same max length value.

---

Nitpick comments:
In `@backend/cosmetology-app/stacks/api_lambda_stack/public_lookup_api.py`:
- Around line 55-58: Add a concrete cleanup owner and timeline to the temporary
export for the legacy lambda: update the TODO near
stack.export_value(self.query_providers_handler.function_arn) to include a
JIRA/GitHub issue link or issue ID, the responsible owner (team or username),
and a specific removal date/release (e.g., "remove by 2026-06-01 or release
vX.Y"), and add a short comment that the export should be deleted after
SearchPersistentStack.public_handler is fully live; ensure the comment
references query_providers_handler.function_arn and stack.export_value so future
readers can locate and remove the bridge.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9114378 and 3406a5d.

📒 Files selected for processing (5)
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py
  • backend/cosmetology-app/lambdas/python/search/handlers/public_search.py
  • backend/cosmetology-app/lambdas/python/search/tests/function/test_public_search_providers.py
  • backend/cosmetology-app/stacks/api_lambda_stack/public_lookup_api.py
  • backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py

We are updating our OpenSearch indexing design to index one document
per license, rather than one per provider. This lets us use native
OpenSearch pagination and return exactly the number of results the
client requests whenever enough matches exist. It also simplifies
returning only the licenses that match, instead of mapping inner hits.
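With one document per license, each hit is one result row, so a page can be built directly from the hits. A minimal sketch of the opaque-cursor pattern this enables (function and field names are illustrative, not the PR's actual implementation):

```python
import base64
import json


def encode_cursor(sort_values: list) -> str:
    """Encode the last hit's OpenSearch sort values as an opaque lastKey token."""
    return base64.urlsafe_b64encode(json.dumps(sort_values).encode()).decode()


def decode_cursor(token: str) -> list:
    return json.loads(base64.urlsafe_b64decode(token.encode()))


def build_page(hits: list, page_size: int):
    """One hit == one license, so no inner-hit mapping is needed."""
    results = [hit['_source'] for hit in hits]
    last_key = None
    if hits and len(hits) >= page_size:
        # A full page means there may be more; resume with search_after.
        last_key = encode_cursor(hits[-1]['sort'])
    return results, last_key
```

A follow-up query would pass `decode_cursor(last_key)` as the `search_after` value of the next OpenSearch request.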
Contributor

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 6

🧹 Nitpick comments (1)
backend/cosmetology-app/lambdas/python/search/tests/function/test_populate_provider_documents.py (1)

138-160: Add a 2-license fixture to this handler test.

With only one license, this still passes if the handler regresses from extend() back to append() or only forwards the first generated document to bulk_index(). A provider with two licenses would exercise the core 1-license→1-document behavior introduced here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/cosmetology-app/lambdas/python/search/tests/function/test_populate_provider_documents.py`
around lines 138 - 160, The test test_populate_indexes_document_with_document_id
currently only inserts one license so it won't catch regressions that use
append() or only forward the first document; update the test setup to insert a
second license for the same provider (call the existing helper
_put_test_provider_and_license_record_in_dynamodb_table twice or add a new
fixture via that helper) so the handler must produce two documents, then assert
bulk_index was called with two documents and verify both expected documents via
_generate_expected_document('cosm') for each license (update indexed_documents
length/assertions accordingly) while keeping the same checks for index_name and
id_field on mock_client_instance.bulk_index call.
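The regression the nitpick describes can be pinned down with a mock-based check along these lines. The document-id format and helper names here are assumptions for illustration, not the project's real fixtures:

```python
from unittest.mock import MagicMock


def generate_documents(provider: dict) -> list:
    """Hypothetical stand-in for the handler's document generation:
    one document per license, using extend-style accumulation."""
    docs = []
    for lic in provider['licenses']:
        doc_id = f"{provider['providerId']}#{lic['jurisdiction']}#{lic['licenseNumber']}"
        docs.append({'documentId': doc_id, **lic})
    return docs


def populate(client, provider: dict) -> None:
    docs = generate_documents(provider)
    # A regression to append()/first-doc-only would forward fewer documents here.
    client.bulk_index(index_name='compact_cosm_providers', documents=docs, id_field='documentId')
```

With a two-license provider, the test can then assert `bulk_index` received exactly two documents, which a single-license fixture cannot verify.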
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py`:
- Around line 220-229: Make licenseJurisdiction required and normalized: replace
the current String(...) declaration for licenseJurisdiction with the existing
Jurisdiction field type and set required=True, allow_none=False so
malformed/missing values fail validation; add a `@pre_load` method that, if
'jurisdiction' exists in the incoming data, copies it to 'licenseJurisdiction'
before field validation (so the Jurisdiction field enforces lowercase
normalization), and keep the existing `@post_load`
rename_jurisdiction_to_license_jurisdiction for safety.

In
`@backend/cosmetology-app/lambdas/python/search/handlers/populate_provider_documents.py`:
- Around line 79-81: The code currently treats failed_ids as document-level and
increments providers_failed per failed document, causing double-counting when
multiple license docs for the same provider fail; update the failure counting to
count unique provider IDs instead. Locate where failed_ids is used to update
compacts_processed['providers_failed'] and total_providers_failed and replace
the per-document increment with computing the set of provider IDs from the
failed documents (e.g., extract provider_id from each failed doc entry or map)
and increment by len(set(provider_ids)); apply the same change for the other
occurrences referenced (the compacts_processed update block and the
total_providers_failed update at the other locations) so both per-batch
compacts_processed['providers_failed'] and total_providers_failed reflect unique
providers failed rather than failed documents.

In
`@backend/cosmetology-app/lambdas/python/search/handlers/provider_update_ingest.py`:
- Around line 202-239: The try/except around
generate_provider_opensearch_documents(...) in provider_update_ingest.py
currently only handles CCNotFoundException and CCInternalException; add a except
block for ValidationError (the exception raised by
generate_provider_opensearch_documents) to log the validation failure (include
provider_id, compact, and index_name), mark the provider as failed by calling
failed_providers[compact].add(provider_id), and avoid letting the
ValidationError bubble up and cause the whole SQS batch to retry.

In `@backend/cosmetology-app/lambdas/python/search/handlers/public_search.py`:
- Around line 85-89: The pagination cursor logic sets last_key based on the raw
OpenSearch hits and page_size which is incorrect when you filter hits; change
the condition that sets last_key (currently using last_sort, hits and page_size)
to use the number of accepted results (providers) instead of hits — i.e., only
set last_key = _encode_public_cursor(last_sort) when last_sort is not None and
len(providers) >= page_size (or implement fetching additional documents to fill
providers before setting last_key) so filtered-out documents don’t cause
premature cursors and empty subsequent pages.

In `@backend/cosmetology-app/lambdas/python/search/handlers/search.py`:
- Around line 66-67: The DOB permission check in
_validate_date_of_birth_permission currently only scans dict keys and misses
field references embedded in scalar values and the top-level sort parameter;
update validation to gate both the 'query' and 'sort' inputs by recursively
extracting exact field references from dict keys and from scalar fields named
'field' or 'fields', as well as parsing query-string values (e.g.,
"licenses.dateOfBirth:...") and exists/terms style clauses; ensure
_validate_date_of_birth_permission (and the places that call it from the search
handler) rejects any reference to 'dateOfBirth' (including
"licenses.dateOfBirth") found in nested structures, scalar 'field'/'fields'
entries, query_string values, and in sort arrays before allowing a caller with
only readGeneral scope to proceed.
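A conservative version of the check described above treats any occurrence of the field name in a dict key or string value as a reference, which covers exists/field clauses, query_string values, and bare sort strings in one pass. This is a sketch, not the handler's actual code; it will also block free text that literally contains "dateOfBirth", which is usually an acceptable trade-off for a deny-by-default gate:

```python
def references_field(obj, field_name: str) -> bool:
    """True if field_name appears in any dict key or any string value,
    recursing through nested dicts and lists."""
    if isinstance(obj, dict):
        return any(
            field_name in key or references_field(value, field_name)
            for key, value in obj.items()
        )
    if isinstance(obj, list):
        return any(references_field(item, field_name) for item in obj)
    if isinstance(obj, str):
        # Catches {'exists': {'field': 'licenses.dateOfBirth'}}, query_string
        # values, and sort entries like 'licenses.dateOfBirth'.
        return field_name in obj
    return False


def body_references_dob(body: dict) -> bool:
    """Gate both the query and the top-level sort parameter."""
    return references_field(body.get('query', {}), 'dateOfBirth') or references_field(
        body.get('sort', []), 'dateOfBirth'
    )
```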

In `@backend/cosmetology-app/lambdas/python/search/utils.py`:
- Around line 47-50: The code uses jurisdiction as-is when setting
serializable['licenses'][0] and composing serializable['documentId'], which can
produce inconsistent IDs for mixed-case input; normalize by lowercasing the
jurisdiction value before storing and before composing documentId (e.g., set
jurisdiction = license_info['jurisdiction'].lower() and update
serializable['licenses'][0]['jurisdiction'] accordingly) so
serializable['documentId'] uses the normalized jurisdiction; also ensure
provider_record_util.py and related modules follow the same lowercase
normalization pattern for licenses.jurisdiction.

---

Nitpick comments:
In
`@backend/cosmetology-app/lambdas/python/search/tests/function/test_populate_provider_documents.py`:
- Around line 138-160: The test test_populate_indexes_document_with_document_id
currently only inserts one license so it won't catch regressions that use
append() or only forward the first document; update the test setup to insert a
second license for the same provider (call the existing helper
_put_test_provider_and_license_record_in_dynamodb_table twice or add a new
fixture via that helper) so the handler must produce two documents, then assert
bulk_index was called with two documents and verify both expected documents via
_generate_expected_document('cosm') for each license (update indexed_documents
length/assertions accordingly) while keeping the same checks for index_name and
id_field on mock_client_instance.bulk_index call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 51d72262-cfa3-440a-98dd-363564cb6cb7

📥 Commits

Reviewing files that changed from the base of the PR and between 3406a5d and 73cdb84.

📒 Files selected for processing (19)
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/provider_record_util.py
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/license/api.py
  • backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py
  • backend/cosmetology-app/lambdas/python/common/tests/unit/test_data_model/test_schema/test_license.py
  • backend/cosmetology-app/lambdas/python/common/tests/unit/test_data_model/test_schema/test_provider.py
  • backend/cosmetology-app/lambdas/python/common/tests/unit/test_provider_record_util.py
  • backend/cosmetology-app/lambdas/python/search/handlers/manage_opensearch_indices.py
  • backend/cosmetology-app/lambdas/python/search/handlers/populate_provider_documents.py
  • backend/cosmetology-app/lambdas/python/search/handlers/provider_update_ingest.py
  • backend/cosmetology-app/lambdas/python/search/handlers/public_search.py
  • backend/cosmetology-app/lambdas/python/search/handlers/search.py
  • backend/cosmetology-app/lambdas/python/search/opensearch_client.py
  • backend/cosmetology-app/lambdas/python/search/tests/function/test_manage_opensearch_indices.py
  • backend/cosmetology-app/lambdas/python/search/tests/function/test_populate_provider_documents.py
  • backend/cosmetology-app/lambdas/python/search/tests/function/test_provider_update_ingest.py
  • backend/cosmetology-app/lambdas/python/search/tests/function/test_public_search_providers.py
  • backend/cosmetology-app/lambdas/python/search/tests/function/test_search_providers.py
  • backend/cosmetology-app/lambdas/python/search/tests/unit/test_opensearch_client.py
  • backend/cosmetology-app/lambdas/python/search/utils.py
✅ Files skipped from review due to trivial changes (2)
  • backend/cosmetology-app/lambdas/python/search/handlers/manage_opensearch_indices.py
  • backend/cosmetology-app/lambdas/python/search/tests/function/test_manage_opensearch_indices.py

Comment on lines +220 to +229
    jurisdiction = String(required=False, allow_none=False, load_only=True)  # OpenSearch uses jurisdiction
    licenseJurisdiction = String(required=False, allow_none=False, load_default=None)
    compact = Compact(required=True, allow_none=False)
    licenseNumber = String(required=True, allow_none=False, validate=Length(1, 100))

    @post_load
    def rename_jurisdiction_to_license_jurisdiction(self, data, **kwargs):
        if 'jurisdiction' in data:
            data['licenseJurisdiction'] = data.pop('jurisdiction')
        return data
⚠️ Potential issue | 🟠 Major

Make licenseJurisdiction fail fast and normalized.

backend/cosmetology-app/lambdas/python/search/handlers/public_search.py:55-76 feeds this schema from licenses[0].copy(). With jurisdiction optional here, a malformed OpenSearch document is accepted instead of rejected, and using String skips the existing Jurisdiction normalization that the other provider response schemas already enforce.

🛠️ Proposed fix
-    jurisdiction = String(required=False, allow_none=False, load_only=True)  # OpenSearch uses jurisdiction
-    licenseJurisdiction = String(required=False, allow_none=False, load_default=None)
+    jurisdiction = Jurisdiction(required=True, allow_none=False, load_only=True)  # OpenSearch uses jurisdiction
+    licenseJurisdiction = Jurisdiction(required=False, allow_none=False)

     @post_load
     def rename_jurisdiction_to_license_jurisdiction(self, data, **kwargs):
-        if 'jurisdiction' in data:
-            data['licenseJurisdiction'] = data.pop('jurisdiction')
+        data['licenseJurisdiction'] = data.pop('jurisdiction')
         return data

Based on learnings: In provider_record_util.py (and related Python modules under backend/cosmetology-app), rely on and enforce the existing normalization pattern where jurisdiction values are stored in lowercase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py`
around lines 220 - 229, Make licenseJurisdiction required and normalized:
replace the current String(...) declaration for licenseJurisdiction with the
existing Jurisdiction field type and set required=True, allow_none=False so
malformed/missing values fail validation; add a `@pre_load` method that, if
'jurisdiction' exists in the incoming data, copies it to 'licenseJurisdiction'
before field validation (so the Jurisdiction field enforces lowercase
normalization), and keep the existing `@post_load`
rename_jurisdiction_to_license_jurisdiction for safety.
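The pre-load step the prompt asks for amounts to a small dict transform that runs before field validation. A plain-function sketch of that hook's logic (the surrounding marshmallow schema is omitted; names are illustrative):

```python
def pre_load_normalize_jurisdiction(data: dict) -> dict:
    """Sketch of the @pre_load hook: copy 'jurisdiction' (as emitted by the
    OpenSearch documents) onto 'licenseJurisdiction' and lowercase it before
    field validation runs, so a Jurisdiction field can enforce the value."""
    data = dict(data)  # avoid mutating the caller's input
    if 'jurisdiction' in data and isinstance(data['jurisdiction'], str):
        data['licenseJurisdiction'] = data['jurisdiction'].lower()
    return data
```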

Comment on lines 79 to 81
'total_providers_processed': 0,
'total_providers_indexed': 0,
'total_licenses_indexed': 0,
'total_providers_failed': 0,
⚠️ Potential issue | 🟡 Minor

Count failed providers separately from failed license documents.

failed_ids is document-level now. If two license docs for the same provider fail in one batch, providers_failed is incremented twice, so compacts_processed[*].providers_failed and total_providers_failed no longer match their names.

🩹 Proposed fix
 def _index_records_and_track_stats(
     documents_to_index: list[dict], compact: str, opensearch_client: OpenSearchClient, compact_stats: dict
 ):
     index_name = f'compact_{compact}_providers'
     if documents_to_index:
         failed_ids = _bulk_index_documents(opensearch_client, index_name, documents_to_index)
         compact_stats['licenses_indexed'] += len(documents_to_index) - len(failed_ids)
         if failed_ids:
-            compact_stats['providers_failed'] += len(failed_ids)
+            failed_provider_ids = {doc_id.split('#', 1)[0] for doc_id in failed_ids}
+            compact_stats['providers_failed'] += len(failed_provider_ids)
             logger.warning(
                 'Some documents failed to index in batch',
                 compact=compact,

Also applies to: 112-114, 294-296

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/cosmetology-app/lambdas/python/search/handlers/populate_provider_documents.py`
around lines 79 - 81, The code currently treats failed_ids as document-level and
increments providers_failed per failed document, causing double-counting when
multiple license docs for the same provider fail; update the failure counting to
count unique provider IDs instead. Locate where failed_ids is used to update
compacts_processed['providers_failed'] and total_providers_failed and replace
the per-document increment with computing the set of provider IDs from the
failed documents (e.g., extract provider_id from each failed doc entry or map)
and increment by len(set(provider_ids)); apply the same change for the other
occurrences referenced (the compacts_processed update block and the
total_providers_failed update at the other locations) so both per-batch
compacts_processed['providers_failed'] and total_providers_failed reflect unique
providers failed rather than failed documents.
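The unique-provider counting boils down to a set comprehension over the failed ids. This sketch assumes the `'<provider_id>#<suffix>'` document-id shape implied by the proposed fix's `split('#', 1)`:

```python
def count_failed_providers(failed_document_ids: list) -> int:
    """Count unique providers among failed license documents, assuming
    document ids of the form '<provider_id>#<jurisdiction>#<license>'."""
    return len({doc_id.split('#', 1)[0] for doc_id in failed_document_ids})
```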

Comment on lines +202 to +239
            # Re-check DynamoDB -- the REMOVE may have been for a single record while
            # the provider still has other records remaining.
            try:
                docs = generate_provider_opensearch_documents(compact, provider_id)
                if docs:
                    response = opensearch_client.bulk_index(
                        index_name=index_name, documents=docs, id_field='documentId'
                    )
                    logger.info(
                        'Re-indexed remaining documents after delete',
                        index_name=index_name,
                        provider_id=provider_id,
                        document_count=len(docs),
                    )
                    if response.get('errors'):
                        for item in response.get('items', []):
                            index_result = item.get('index', {})
                            if index_result.get('error'):
                                logger.error(
                                    'Document re-indexing failed after delete',
                                    document_id=index_result.get('_id'),
                                    error=index_result.get('error'),
                                )
                                failed_providers[compact].add(provider_id)
            except CCNotFoundException:
                logger.info(
                    'Provider no longer exists after REMOVE event, delete is complete',
                    provider_id=provider_id,
                    compact=compact,
                )
            except CCInternalException as e:
                logger.error(
                    'Failed to re-index remaining documents after delete',
                    index_name=index_name,
                    provider_id=provider_id,
                    error=str(e),
                )
                failed_providers[compact].add(provider_id)
⚠️ Potential issue | 🟠 Major

Catch ValidationError when re-indexing after a REMOVE.

generate_provider_opensearch_documents() can fail validation in this branch too, but only CCNotFoundException and CCInternalException are handled. A malformed surviving provider will currently escape the handler, causing the whole SQS batch to retry instead of producing a targeted batch-item failure.

🩹 Proposed fix
             except CCNotFoundException:
                 logger.info(
                     'Provider no longer exists after REMOVE event, delete is complete',
                     provider_id=provider_id,
                     compact=compact,
                 )
+            except ValidationError as e:
+                logger.warning(
+                    'Failed to process remaining provider for re-index after delete',
+                    index_name=index_name,
+                    provider_id=provider_id,
+                    compact=compact,
+                    error=str(e),
+                )
+                failed_providers[compact].add(provider_id)
             except CCInternalException as e:
                 logger.error(
                     'Failed to re-index remaining documents after delete',
                     index_name=index_name,
                     provider_id=provider_id,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@backend/cosmetology-app/lambdas/python/search/handlers/provider_update_ingest.py`
around lines 202 - 239, The try/except around
generate_provider_opensearch_documents(...) in provider_update_ingest.py
currently only handles CCNotFoundException and CCInternalException; add a except
block for ValidationError (the exception raised by
generate_provider_opensearch_documents) to log the validation failure (include
provider_id, compact, and index_name), mark the provider as failed by calling
failed_providers[compact].add(provider_id), and avoid letting the
ValidationError bubble up and cause the whole SQS batch to retry.

Comment on lines +85 to +89
last_sort = hits[-1].get('sort') if hits else None
# Full page from OpenSearch => may have more results; use last hit's sort values for search_after
last_key = None
if last_sort is not None and len(hits) >= page_size:
last_key = _encode_public_cursor(last_sort)
⚠️ Potential issue | 🟡 Minor

Pagination cursor may be set incorrectly when hits are filtered out.

The last_key is set based on len(hits) >= page_size, but the loop filters out hits where the compact doesn't match (line 58), the licenses array is empty (line 66), or schema validation fails (line 78). As a result, the providers list can contain fewer items than page_size while a last_key is still returned, which can produce empty pages on subsequent requests when many consecutive documents are filtered out.

Consider checking len(providers) >= page_size instead, or fetching additional documents to compensate for filtered hits.

📝 Proposed fix
     last_sort = hits[-1].get('sort') if hits else None
     # Full page from OpenSearch => may have more results; use last hit's sort values for search_after
     last_key = None
-    if last_sort is not None and len(hits) >= page_size:
+    if last_sort is not None and len(providers) >= page_size:
         last_key = _encode_public_cursor(last_sort)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/cosmetology-app/lambdas/python/search/handlers/public_search.py`
around lines 85 - 89, The pagination cursor logic sets last_key based on the raw
OpenSearch hits and page_size which is incorrect when you filter hits; change
the condition that sets last_key (currently using last_sort, hits and page_size)
to use the number of accepted results (providers) instead of hits — i.e., only
set last_key = _encode_public_cursor(last_sort) when last_sort is not None and
len(providers) >= page_size (or implement fetching additional documents to fill
providers before setting last_key) so filtered-out documents don’t cause
premature cursors and empty subsequent pages.
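The corrected condition is easy to exercise with plain data. A self-contained sketch with simplified document shapes (not the handler's real code):

```python
def page_with_filtering(hits: list, page_size: int, compact: str):
    """Build a page from OpenSearch hits, dropping documents for other
    compacts or with no licenses; the cursor is based on accepted results."""
    providers = []
    for hit in hits:
        source = hit['_source']
        if source.get('compact') != compact or not source.get('licenses'):
            continue  # filtered out -- must not count toward the page
        providers.append(source)
    last_sort = hits[-1].get('sort') if hits else None
    last_key = None
    # Only advertise another page when we actually filled this one.
    if last_sort is not None and len(providers) >= page_size:
        last_key = last_sort
    return providers, last_key
```

With the original `len(hits) >= page_size` condition, a page whose hits were mostly filtered out would still hand back a cursor, sending the client on to potentially empty pages.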

Comment on lines +66 to +67
# If the query references dateOfBirth, verify the caller has readPrivate permission
_validate_date_of_birth_permission(body.get('query', {}), compact, get_event_scopes(event))
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n backend/cosmetology-app/lambdas/python/search/handlers/search.py | head -300

Repository: csg-org/CompactConnect

Length of output: 11653


🏁 Script executed:

# Check the SearchProvidersRequestSchema to understand request validation
fd -type f -name "*.py" | xargs grep -l "SearchProvidersRequestSchema" | head -5

Repository: csg-org/CompactConnect

Length of output: 236


🏁 Script executed:

fd "SearchProvidersRequestSchema" --type f

Repository: csg-org/CompactConnect

Length of output: 48


🏁 Script executed:

rg "class SearchProvidersRequestSchema" -A 30

Repository: csg-org/CompactConnect

Length of output: 8606


🏁 Script executed:

# Examine how the dateOfBirth field appears in the provider document structure
rg "dateOfBirth" backend/cosmetology-app/lambdas/python --type py -B 2 -A 2 | head -100

Repository: csg-org/CompactConnect

Length of output: 11650


🏁 Script executed:

# Look at the specific test case for nested dateOfBirth queries
rg "test_search_with_nested_date_of_birth_query_rejected_without_read_private_scope" -A 40 backend/cosmetology-app/lambdas/python/search/tests/function/test_search_providers.py

Repository: csg-org/CompactConnect

Length of output: 1320


🏁 Script executed:

# Check if there are any tests that try to bypass the dateOfBirth check using sort or query_string
rg "query_string|sort.*dateOfBirth|dateOfBirth.*sort" backend/cosmetology-app/lambdas/python/search/tests/ -B 3 -A 3

Repository: csg-org/CompactConnect

Length of output: 48


🏁 Script executed:

# Check the test file to see if there are any attempted bypasses documented
cat -n backend/cosmetology-app/lambdas/python/search/tests/function/test_search_providers.py | grep -A 100 "class.*Test"

Repository: csg-org/CompactConnect

Length of output: 4825


🏁 Script executed:

# List all test methods in the test_search_providers.py file
rg "def test_" backend/cosmetology-app/lambdas/python/search/tests/function/test_search_providers.py | head -20

Repository: csg-org/CompactConnect

Length of output: 1437


🏁 Script executed:

# Check if there are any tests for exists clause with dateOfBirth bypass
rg "exists.*field.*dateOfBirth|dateOfBirth.*exists" backend/cosmetology-app/lambdas/python/search/tests/

Repository: csg-org/CompactConnect

Length of output: 48


🏁 Script executed:

# Let's trace through the _query_references_field logic with actual examples to verify the bypass claims
python3 << 'EOF'
# Simulate the _query_references_field function
def _query_references_field(obj, field_name: str) -> bool:
    """
    Recursively check if any key in the query DSL references the given field name.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            if field_name in key:
                return True
            if _query_references_field(value, field_name):
                return True
    elif isinstance(obj, list):
        for item in obj:
            if _query_references_field(item, field_name):
                return True
    return False

# Test case 1: exists clause with field value (bypass claim)
query1 = {
    "exists": {
        "field": "licenses.dateOfBirth"
    }
}
result1 = _query_references_field(query1, "dateOfBirth")
print(f"Test 1 - exists with field value: {result1} (Expected: False, Vulnerable: True)")

# Test case 2: term query with field key (should catch)
query2 = {
    "term": {
        "licenses.dateOfBirth": "1985-06-06"
    }
}
result2 = _query_references_field(query2, "dateOfBirth")
print(f"Test 2 - term query with field key: {result2} (Expected: True)")

# Test case 3: query_string with query value (bypass claim)
query3 = {
    "query_string": {
        "query": "licenses.dateOfBirth:1985"
    }
}
result3 = _query_references_field(query3, "dateOfBirth")
print(f"Test 3 - query_string with dateOfBirth in value: {result3} (Expected: False, Vulnerable: True)")

# Test case 4: range query with field key (should catch)
query4 = {
    "range": {
        "licenses.dateOfBirth": {"gte": "1985-01-01"}
    }
}
result4 = _query_references_field(query4, "dateOfBirth")
print(f"Test 4 - range query with field key: {result4} (Expected: True)")

# Test case 5: wildcard in field value (bypass claim)
query5 = {
    "wildcard": {
        "field": "licenses.dateOfBirth*"
    }
}
result5 = _query_references_field(query5, "dateOfBirth")
print(f"Test 5 - wildcard field value: {result5} (Expected: False, Vulnerable: True)")

print("\n--- Summary ---")
print("The function ONLY checks dict keys, not string values.")
print("Bypass vectors confirmed:")
print("1. exists clause: Yes")
print("2. query_string: Yes")
print("3. wildcard/other operators with field in value: Yes")
EOF

Repository: csg-org/CompactConnect

Length of output: 616


🏁 Script executed:

# Check if sort parameter is validated for dateOfBirth references
rg "sort" backend/cosmetology-app/lambdas/python/search/handlers/search.py | grep -A 2 -B 2 "dateOfBirth\|_validate_date_of_birth"

Repository: csg-org/CompactConnect

Length of output: 48


🏁 Script executed:

# Confirm that _validate_date_of_birth_permission is only called on the query parameter, not sort
rg "_validate_date_of_birth_permission" backend/cosmetology-app/lambdas/python/search/handlers/search.py -B 2 -A 5

Repository: csg-org/CompactConnect

Length of output: 716


🏁 Script executed:

# Verify if there are any tests that attempt to bypass using sort with dateOfBirth
rg "test.*sort" backend/cosmetology-app/lambdas/python/search/tests/function/test_search_providers.py -i

Repository: csg-org/CompactConnect

Length of output: 536


🏁 Script executed:

# Let's verify the actual sort handling with a potential bypass
python3 << 'EOF'
def _query_references_field(obj, field_name: str) -> bool:
    """
    Recursively check if any key in the query DSL references the given field name.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            if field_name in key:
                return True
            if _query_references_field(value, field_name):
                return True
    elif isinstance(obj, list):
        for item in obj:
            if _query_references_field(item, field_name):
                return True
    return False

# Simulate what happens when sort contains dateOfBirth
# This is NOT checked because _validate_date_of_birth_permission only checks body.get('query', {})
sort_bypass = [
    {"licenses.dateOfBirth": "asc"}
]

# The check at line 67 only validates query parameter
query = {}  # Empty query
result = _query_references_field(query, "dateOfBirth")
print(f"Query check result: {result}")
print(f"Sort parameter is: {sort_bypass}")
print(f"Sort parameter check: SKIPPED (not validated)")
print("\nBypass: A readGeneral caller can use sort parameter to reference dateOfBirth")
EOF

Repository: csg-org/CompactConnect

Length of output: 268


🏁 Script executed:

# Verify the exact flow - check if sort is validated anywhere else in the codebase
rg "sort.*dateOfBirth|dateOfBirth.*sort" backend/cosmetology-app/lambdas/python/search/ -i

Repository: csg-org/CompactConnect

Length of output: 48


DOB authorization is bypassable through value-based field references and the sort parameter.

The permission check only scans dict keys and misses field references embedded in scalar values. A readGeneral caller can bypass the guard via:

  • {"exists": {"field": "licenses.dateOfBirth"}}
  • {"query_string": {"query": "licenses.dateOfBirth:..."}}
  • [{"licenses.dateOfBirth": "asc"}] in the top-level sort parameter (not validated at all)

Gate both query and sort parameters, and inspect exact field references in scalar field, fields, and query-string values instead of only checking dict keys.

Also applies to: 188-196, 204-254
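A stricter check along the lines recommended above could look like the following sketch. Function names and the exact set of value-bearing keys ('field', 'fields', 'query') are illustrative assumptions, not the handler's actual implementation.

```python
def references_field(obj, field_name: str) -> bool:
    """Recursively detect a field reference in dict keys AND in scalar values
    of clauses that name fields by value (exists, wildcard, query_string)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if field_name in key:
                return True
            # exists/wildcard/query_string-style clauses carry the field
            # name inside a value rather than a key.
            if key in ('field', 'fields', 'query'):
                values = value if isinstance(value, list) else [value]
                if any(field_name in v for v in values if isinstance(v, str)):
                    return True
            if references_field(value, field_name):
                return True
    elif isinstance(obj, list):
        return any(references_field(item, field_name) for item in obj)
    return False


def body_references_field(body: dict, field_name: str) -> bool:
    # Gate both the query DSL and the top-level sort parameter.
    return references_field(body.get('query', {}), field_name) or references_field(
        body.get('sort', []), field_name
    )
```

This catches all three bypass vectors listed above while still passing ordinary queries through.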

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/cosmetology-app/lambdas/python/search/handlers/search.py` around
lines 66 - 67, The DOB permission check in _validate_date_of_birth_permission
currently only scans dict keys and misses field references embedded in scalar
values and the top-level sort parameter; update validation to gate both the
'query' and 'sort' inputs by recursively extracting exact field references from
dict keys and from scalar fields named 'field' or 'fields', as well as parsing
query-string values (e.g., "licenses.dateOfBirth:...") and exists/terms style
clauses; ensure _validate_date_of_birth_permission (and the places that call it
from the search handler) rejects any reference to 'dateOfBirth' (including
"licenses.dateOfBirth") found in nested structures, scalar 'field'/'fields'
entries, query_string values, and in sort arrays before allowing a caller with
only readGeneral scope to proceed.

Comment on lines +47 to +50
license_info = serializable['licenses'][0]
jurisdiction = license_info['jurisdiction']
license_type = license_info['licenseType']
serializable['documentId'] = f'{provider_id}#{jurisdiction}#{license_type}'
Contributor

⚠️ Potential issue | 🟠 Major

Normalize jurisdiction before writing the document and documentId.

If mixed-case data slips through here, the same logical license can get a different _id on a later reindex and exact licenses.jurisdiction filters become inconsistent. Lowercase the stored field before composing documentId.

🩹 Proposed fix
         license_info = serializable['licenses'][0]
-        jurisdiction = license_info['jurisdiction']
+        jurisdiction = license_info['jurisdiction'].lower()
+        license_info['jurisdiction'] = jurisdiction
         license_type = license_info['licenseType']
         serializable['documentId'] = f'{provider_id}#{jurisdiction}#{license_type}'
Based on learnings, in provider_record_util.py (and related Python modules under backend/cosmetology-app), rely on and enforce the existing normalization pattern where jurisdiction values are stored in lowercase.
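A minimal sketch of the normalization, extracted into a hypothetical helper (the name build_document_id and the sample values are illustrative, not the module's API):

```python
def build_document_id(provider_id: str, license_info: dict) -> str:
    # Lowercase jurisdiction so the same logical license always maps to the
    # same OpenSearch _id regardless of input casing, and store the
    # normalized value so exact licenses.jurisdiction filters stay consistent.
    jurisdiction = license_info['jurisdiction'].lower()
    license_info['jurisdiction'] = jurisdiction
    return f'{provider_id}#{jurisdiction}#{license_info["licenseType"]}'


lic = {'jurisdiction': 'KY', 'licenseType': 'esthetician'}
doc_id = build_document_id('89a6377e-c3a5-40e5-bca5-317ec854c570', lic)
```

Mixed-case input ('KY') and already-lowercase input ('ky') now yield the same documentId.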
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/cosmetology-app/lambdas/python/search/utils.py` around lines 47 - 50,
The code uses jurisdiction as-is when setting serializable['licenses'][0] and
composing serializable['documentId'], which can produce inconsistent IDs for
mixed-case input; normalize by lowercasing the jurisdiction value before storing
and before composing documentId (e.g., set jurisdiction =
license_info['jurisdiction'].lower() and update
serializable['licenses'][0]['jurisdiction'] accordingly) so
serializable['documentId'] uses the normalized jurisdiction; also ensure
provider_record_util.py and related modules follow the same lowercase
normalization pattern for licenses.jurisdiction.

@landonshumway-ia
Collaborator Author

moved into #1317 to resolve merge conflicts


Development

Successfully merging this pull request may close these issues.

cosmetology public search - BE
