Support public search for Cosmetology #1305
landonshumway-ia wants to merge 11 commits into csg-org:main
Conversation
📝 Walkthrough

Removes summary-report notification email fields across models, schemas, docs, and tests; adds an unauthenticated public license-search Lambda with OpenSearch integration, licenseNumber filtering, cursor pagination, and associated CDK wiring and tests.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client
    participant APIG as ApiGateway
    participant Handler as PublicSearchHandler
    participant OS as OpenSearch
    participant DB as DynamoDB
    Client->>APIG: POST /v1/public/compacts/{compact}/providers/query (query + pagination)
    APIG->>Handler: public_search_api_handler(event, context)
    Handler->>Handler: _parse_and_validate_public_query_body()
    Handler->>Handler: _decode_public_cursor() (if pagination.lastKey)
    Handler->>Handler: _build_public_license_search_body()
    Handler->>OS: search(index: compact_{compact}_providers, body)
    OS-->>Handler: hits[] (documents per-license)
    Handler->>Handler: sanitize via PublicLicenseSearchResponseSchema
    Handler->>Handler: _encode_public_cursor() -> lastKey
    Handler-->>APIG: {providers, pagination:{pageSize,lastKey,prevLastKey}, query}
    APIG-->>Client: 200 OK (paginated results)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Also added custom cursor logic for tracking accurate page size of licenses.
Force-pushed from 4ad6962 to 5f375cd
Actionable comments posted: 4
🧹 Nitpick comments (3)
backend/cosmetology-app/lambdas/python/search/tests/function/test_public_search_providers.py (1)
15-31: Preserve explicit empty request bodies in the event helper.

Line 29 currently turns `{}` into `None`. Use an explicit `is not None` check so tests can intentionally send an empty JSON object when needed.

🔧 Proposed fix

```diff
-        'body': json.dumps(body) if body else None,
+        'body': json.dumps(body) if body is not None else None,
```
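The truthiness pitfall flagged here is easy to demonstrate in isolation (a standalone illustration, not code from the PR):

```python
import json


def serialize_body(body):
    # Truthy check: an explicit empty dict {} is falsy, so it collapses to None.
    truthy = json.dumps(body) if body else None
    # Explicit identity check: {} survives as the string '{}'.
    explicit = json.dumps(body) if body is not None else None
    return truthy, explicit


print(serialize_body({}))    # (None, '{}')
print(serialize_body(None))  # (None, None)
```

Only the `is not None` form lets a test distinguish "no body at all" from "an intentionally empty JSON object".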
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `backend/cosmetology-app/lambdas/python/search/tests/function/test_public_search_providers.py` around lines 15-31, the helper `_create_public_api_event` currently converts an explicit empty dict body (`{}`) into `None` by using a truthy check; change the body serialization to use an explicit `is not None` check so that `'body': json.dumps(body) if body is not None else None`, ensuring tests can send an intentional empty JSON object; update the `'body'` field generation inside `_create_public_api_event` to use this explicit check and leave all other fields unchanged.

backend/cosmetology-app/lambdas/python/search/handlers/public_search.py (2)
74-78: Consider logging when resume_provider_id is not found.

If `resume_provider_id` from the cursor doesn't match any provider in the current result set (e.g., due to data changes between pagination requests), the resume logic silently processes all providers without skipping. This could lead to duplicate results across pages. Adding a warning log after the main loop, when `resume_provider_id` is still set (not None), would help with debugging pagination inconsistencies in production.

🔍 Proposed enhancement

```diff
         prev_sort = last_sort
         last_sort = hit.get('sort')
+    if resume_provider_id is not None:
+        logger.warning(
+            'Resume provider not found in results, possible data change during pagination',
+            resume_provider_id=resume_provider_id,
+        )
+        last_key = None
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `backend/cosmetology-app/lambdas/python/search/handlers/public_search.py` around lines 74-78, add a warning log when the resume cursor's provider id is never found in the current result set: after the loop that inspects each hit (where inner_hits, provider_id, skip, resume_offset and resume_provider_id are used and possibly reset), check if resume_provider_id is still not None and, if so, emit a warning via the existing logger (include the unresolved resume_provider_id and resume_offset) so pagination inconsistencies are visible in production.
94-98: Redundant `pop('jurisdiction')` call.

The `PublicLicenseSearchResponseSchema.rename_jurisdiction_to_license_jurisdiction` `@post_load` method already pops `'jurisdiction'` from the data dict, so this line is a no-op.

♻️ Proposed fix

```diff
         try:
             sanitized = license_schema.load(license_source)
-            sanitized.pop('jurisdiction', None)
             providers.append(sanitized)
             consumed_from_this_provider += 1
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `backend/cosmetology-app/lambdas/python/search/handlers/public_search.py` around lines 94-98, remove the redundant pop call: after deserializing with `license_schema.load(license_source)` (in the try block that appends to providers and increments consumed_from_this_provider), delete the line `sanitized.pop('jurisdiction', None)` because `PublicLicenseSearchResponseSchema.rename_jurisdiction_to_license_jurisdiction` already removes `'jurisdiction'`; leave the rest of the block (providers.append(sanitized) and consumed_from_this_provider += 1) unchanged.
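The no-op can be shown with a plain dict standing in for the schema's `@post_load` output — values are illustrative only:

```python
# What rename_jurisdiction_to_license_jurisdiction effectively does (sketch):
data = {'jurisdiction': 'oh', 'licenseNumber': 'A-123'}
data['licenseJurisdiction'] = data.pop('jurisdiction')

# The handler's follow-up pop then finds nothing to remove:
removed = data.pop('jurisdiction', None)
print(removed)  # None
print(data)     # {'licenseNumber': 'A-123', 'licenseJurisdiction': 'oh'}
```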
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (47)
- backend/cosmetology-app/bin/run_python_tests.py
- backend/cosmetology-app/docs/internal/api-specification/latest-oas30.json
- backend/cosmetology-app/docs/internal/postman/postman-collection.json
- backend/cosmetology-app/lambdas/nodejs/email-notification-service/README.md
- backend/cosmetology-app/lambdas/nodejs/lib/models/compact.ts
- backend/cosmetology-app/lambdas/nodejs/lib/models/email-notification-service-event.ts
- backend/cosmetology-app/lambdas/nodejs/lib/models/jurisdiction.ts
- backend/cosmetology-app/lambdas/nodejs/tests/email-notification-service.test.ts
- backend/cosmetology-app/lambdas/nodejs/tests/lib/compact-configuration-client.test.ts
- backend/cosmetology-app/lambdas/nodejs/tests/lib/email/encumbrance-notification-service.test.ts
- backend/cosmetology-app/lambdas/nodejs/tests/lib/email/investigation-notification-service.test.ts
- backend/cosmetology-app/lambdas/nodejs/tests/lib/jurisdiction-client.test.ts
- backend/cosmetology-app/lambdas/nodejs/tests/sample-records.ts
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/__init__.py
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/api.py
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/record.py
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/__init__.py
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/api.py
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/record.py
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py
- backend/cosmetology-app/lambdas/python/common/common_test/test_data_generator.py
- backend/cosmetology-app/lambdas/python/common/tests/resources/dynamo/compact.json
- backend/cosmetology-app/lambdas/python/common/tests/resources/dynamo/jurisdiction.json
- backend/cosmetology-app/lambdas/python/compact-configuration/handlers/compact_configuration.py
- backend/cosmetology-app/lambdas/python/compact-configuration/tests/function/test_compact_configuration.py
- backend/cosmetology-app/lambdas/python/search/handlers/public_search.py
- backend/cosmetology-app/lambdas/python/search/tests/__init__.py
- backend/cosmetology-app/lambdas/python/search/tests/function/__init__.py
- backend/cosmetology-app/lambdas/python/search/tests/function/test_public_search_providers.py
- backend/cosmetology-app/pipeline/backend_stage.py
- backend/cosmetology-app/stacks/api_lambda_stack/public_lookup_api.py
- backend/cosmetology-app/stacks/api_stack/__init__.py
- backend/cosmetology-app/stacks/api_stack/api.py
- backend/cosmetology-app/stacks/api_stack/v1_api/api.py
- backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py
- backend/cosmetology-app/stacks/api_stack/v1_api/public_lookup_api.py
- backend/cosmetology-app/stacks/search_persistent_stack/search_handler.py
- backend/cosmetology-app/tests/app/test_api/test_public_lookup_api.py
- backend/cosmetology-app/tests/resources/snapshots/GET_COMPACT_CONFIGURATION_RESPONSE_SCHEMA.json
- backend/cosmetology-app/tests/resources/snapshots/GET_JURISDICTION_CONFIGURATION_RESPONSE_SCHEMA.json
- backend/cosmetology-app/tests/resources/snapshots/PUBLIC_QUERY_PROVIDERS_REQUEST_SCHEMA.json
- backend/cosmetology-app/tests/resources/snapshots/PUBLIC_QUERY_PROVIDERS_RESPONSE_SCHEMA.json
- backend/cosmetology-app/tests/resources/snapshots/PUT_COMPACT_CONFIGURATION_REQUEST_SCHEMA.json
- backend/cosmetology-app/tests/resources/snapshots/PUT_JURISDICTION_CONFIGURATION_REQUEST_SCHEMA.json
- backend/cosmetology-app/tests/resources/snapshots/QUERY_PROVIDERS_REQUEST_SCHEMA.json
- backend/cosmetology-app/tests/smoke/compact_configuration_smoke_tests.py
- backend/cosmetology-app/tests/smoke/rollback_license_upload_smoke_tests.py
💤 Files with no reviewable changes (26)
- backend/cosmetology-app/tests/resources/snapshots/GET_JURISDICTION_CONFIGURATION_RESPONSE_SCHEMA.json
- backend/cosmetology-app/lambdas/python/common/tests/resources/dynamo/jurisdiction.json
- backend/cosmetology-app/lambdas/nodejs/lib/models/email-notification-service-event.ts
- backend/cosmetology-app/lambdas/nodejs/tests/email-notification-service.test.ts
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/api.py
- backend/cosmetology-app/tests/resources/snapshots/GET_COMPACT_CONFIGURATION_RESPONSE_SCHEMA.json
- backend/cosmetology-app/lambdas/python/common/common_test/test_data_generator.py
- backend/cosmetology-app/tests/smoke/rollback_license_upload_smoke_tests.py
- backend/cosmetology-app/tests/resources/snapshots/PUT_JURISDICTION_CONFIGURATION_REQUEST_SCHEMA.json
- backend/cosmetology-app/tests/resources/snapshots/PUT_COMPACT_CONFIGURATION_REQUEST_SCHEMA.json
- backend/cosmetology-app/lambdas/python/compact-configuration/handlers/compact_configuration.py
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/record.py
- backend/cosmetology-app/lambdas/nodejs/lib/models/compact.ts
- backend/cosmetology-app/tests/smoke/compact_configuration_smoke_tests.py
- backend/cosmetology-app/lambdas/nodejs/tests/lib/jurisdiction-client.test.ts
- backend/cosmetology-app/lambdas/nodejs/email-notification-service/README.md
- backend/cosmetology-app/lambdas/nodejs/tests/lib/compact-configuration-client.test.ts
- backend/cosmetology-app/lambdas/nodejs/lib/models/jurisdiction.ts
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/__init__.py
- backend/cosmetology-app/lambdas/python/common/tests/resources/dynamo/compact.json
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/__init__.py
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/compact/record.py
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/jurisdiction/api.py
- backend/cosmetology-app/docs/internal/api-specification/latest-oas30.json
- backend/cosmetology-app/lambdas/nodejs/tests/sample-records.ts
- backend/cosmetology-app/lambdas/python/compact-configuration/tests/function/test_compact_configuration.py
```python
jurisdiction = Jurisdiction(required=False, allow_none=False)
givenName = String(required=False, allow_none=False, validate=Length(min=1, max=100))
familyName = String(required=False, allow_none=False, validate=Length(min=1, max=100))
licenseNumber = String(required=False, allow_none=False, validate=Length(min=1, max=100))
```
Align licenseNumber max length with the request contract.

Line 235 enforces `max=100`, but the request schema snapshots in this PR advertise `maxLength: 500`. That mismatch will reject payloads the API contract says are valid.

Proposed fix

```diff
-    licenseNumber = String(required=False, allow_none=False, validate=Length(min=1, max=100))
+    licenseNumber = String(required=False, allow_none=False, validate=Length(min=1, max=500))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py`
at line 235, The schema field licenseNumber currently validates with
Length(min=1, max=100) which conflicts with the API contract's maxLength of 500;
update the validation on the licenseNumber field (the String(...) call in
provider/api.py) to use max=500 so incoming payloads allowed by the request
schema are accepted (keep min=1 and allow_none=False as-is).
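The practical effect of the mismatch can be sketched with the two bounds side by side. The constants mirror the review's numbers; the checks are simplified stand-ins for API Gateway request-model validation and the marshmallow `Length` validator, not the actual implementations:

```python
GATEWAY_MAX = 500   # maxLength advertised by the request model / snapshots
RUNTIME_MAX = 100   # Length(min=1, max=100) enforced in provider/api.py


def gateway_accepts(value):
    """Simplified stand-in for the API Gateway model check."""
    return 1 <= len(value) <= GATEWAY_MAX


def runtime_accepts(value):
    """Simplified stand-in for the marshmallow Length validator."""
    return 1 <= len(value) <= RUNTIME_MAX


license_number = 'A' * 150
print(gateway_accepts(license_number))  # True  — passes the published contract
print(runtime_accepts(license_number))  # False — rejected inside the Lambda
```

Any value in the 101-500 range falls into this gap, which is why the two bounds need to agree, whichever direction the fix goes.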
```python
'licenseNumber': JsonSchema(
    type=JsonSchemaType.STRING,
    min_length=1,
    max_length=500,
    description='Filter for licenses with a specific license number',
),
```
Align licenseNumber length constraints across related models.

`query.licenseNumber` here allows up to 500 chars, while the public query/response `licenseNumber` schemas cap at 100. This creates avoidable contract drift between request and public response surfaces.

🔧 Proposed fix

```diff
 'licenseNumber': JsonSchema(
     type=JsonSchemaType.STRING,
     min_length=1,
-    max_length=500,
+    max_length=100,
     description='Filter for licenses with a specific license number',
 ),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py` around lines
113 - 118, The query schema's licenseNumber field in api_model.py allows
max_length=500 which is inconsistent with the public query/response schemas
capped at 100; update the JsonSchema for 'licenseNumber' used in the query model
(the entry with key 'licenseNumber' in the query schema block) to use
max_length=100 (and ensure min_length stays 1) so all related models
(query.licenseNumber and the public response/query licenseNumber schemas) share
the same 1..100 constraint and avoid contract drift.
```python
self.public_handler = PythonFunction(
    self,
    'PublicSearchProvidersFunction',
    description='Public search handler for OpenSearch license queries',
    index=os.path.join('handlers', 'public_search.py'),
    lambda_dir='search',
    handler='public_search_api_handler',
    role=lambda_role,
    log_retention=RetentionDays.ONE_MONTH,
    environment={
        'OPENSEARCH_HOST_ENDPOINT': opensearch_domain.domain_endpoint,
        **stack.common_env_vars,
    },
    timeout=Duration.seconds(29),
    memory_size=2048,
    vpc=vpc_stack.vpc,
    vpc_subnets=vpc_subnets,
    security_groups=[vpc_stack.lambda_security_group],
    alarm_topic=alarm_topic,
)
```
Avoid sharing the internal search role with the public search Lambda.

`PublicSearchProvidersFunction` currently reuses `lambda_role`, so it inherits export-bucket permissions granted for the internal handler. That violates least privilege for a public-facing function.

Proposed fix

```diff
 self.public_handler = PythonFunction(
     self,
     'PublicSearchProvidersFunction',
     description='Public search handler for OpenSearch license queries',
     index=os.path.join('handlers', 'public_search.py'),
     lambda_dir='search',
     handler='public_search_api_handler',
-    role=lambda_role,
     log_retention=RetentionDays.ONE_MONTH,
     environment={
         'OPENSEARCH_HOST_ENDPOINT': opensearch_domain.domain_endpoint,
         **stack.common_env_vars,
     },
@@
 )
 opensearch_domain.grant_read(self.public_handler)
+
+NagSuppressions.add_resource_suppressions_by_path(
+    stack,
+    f'{self.public_handler.role.node.path}/DefaultPolicy/Resource',
+    [
+        {
+            'id': 'AwsSolutions-IAM5',
+            'reason': 'Public search lambda needs wildcard index permissions for OpenSearch read operations.',
+        },
+    ],
+)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/cosmetology-app/stacks/search_persistent_stack/search_handler.py`
around lines 80 - 99, PublicSearchProvidersFunction is being created with the
shared lambda_role which grants export-bucket and other internal privileges;
create a dedicated IAM role (e.g., public_search_lambda_role) for the public
handler with only the minimal permissions it needs (CloudWatch Logs,
network/VPC, and specific OpenSearch access) and do not attach the export-bucket
or internal datastore policies; then pass that new role into the PythonFunction
call for PublicSearchProvidersFunction instead of lambda_role and ensure any
code that previously relied on the shared role’s extra policies is updated or
kept out of the public function.
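A dedicated role along the lines the prompt describes could be sketched in CDK as follows. This is a hypothetical infrastructure fragment, not the stack's actual code: the names `public_search_lambda_role` and `opensearch_domain` come from the review text, and the snippet assumes it runs inside the construct's `__init__` where `self` is the stack scope.

```python
# Hypothetical sketch of a least-privilege role for the public handler.
from aws_cdk import aws_iam as iam

public_search_lambda_role = iam.Role(
    self,
    'PublicSearchLambdaRole',
    assumed_by=iam.ServicePrincipal('lambda.amazonaws.com'),
    managed_policies=[
        # CloudWatch Logs + ENI management for a VPC-attached Lambda
        iam.ManagedPolicy.from_aws_managed_policy_name(
            'service-role/AWSLambdaVPCAccessExecutionRole'
        ),
    ],
)
# Grant only OpenSearch read access; no export-bucket or DynamoDB policies.
opensearch_domain.grant_read(public_search_lambda_role)
# Then pass role=public_search_lambda_role into the PythonFunction call.
```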
Actionable comments posted: 2
♻️ Duplicate comments (2)
backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py (1)
113-118: ⚠️ Potential issue | 🟠 Major — Align `licenseNumber` length constraints across API model and runtime validation.

Line 116 allows up to 500 chars, but runtime validation in `backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py` (line 236) caps at 100. Requests in the 101-500 range will pass API model validation and fail in Lambda.

🔧 Proposed fix

```diff
 'licenseNumber': JsonSchema(
     type=JsonSchemaType.STRING,
     min_length=1,
-    max_length=500,
+    max_length=100,
     description='Filter for licenses with a specific license number',
 ),
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py` around lines 113-118, the API model's JsonSchema for 'licenseNumber' allows max_length=500 but runtime validation in cc_common.data_model.schema.provider.api enforces max 100, causing mismatched behavior; update the JsonSchema definition to set max_length=100 to match the runtime validator (or alternatively raise the runtime cap to 500 if that is desired across the system) and ensure both the JsonSchema and the validator use the same max length value.

backend/cosmetology-app/lambdas/python/search/handlers/public_search.py (1)

19-21: ⚠️ Potential issue | 🟡 Minor — Fix timeout comment drift.

Line 20 says "20 seconds" while Line 21 configures `timeout=25`.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `backend/cosmetology-app/lambdas/python/search/handlers/public_search.py` around lines 19-21, the comment and code disagree about the OpenSearch client timeout: update the comment above the `OpenSearchClient(timeout=25)` instantiation so it accurately states "25 seconds" (or alternatively change the timeout arg to 20 if you prefer the comment), making the comment consistent with the configured timeout value.
🧹 Nitpick comments (1)
backend/cosmetology-app/stacks/api_lambda_stack/public_lookup_api.py (1)
55-58: Add a concrete cleanup owner/timeline for this temporary bridge.

Since this keeps a legacy lambda/export alive during cutover, please attach a linked cleanup issue (or an explicit release/date + owner) to ensure it cannot linger past migration.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `backend/cosmetology-app/stacks/api_lambda_stack/public_lookup_api.py` around lines 55-58, add a concrete cleanup owner and timeline to the temporary export for the legacy lambda: update the TODO near `stack.export_value(self.query_providers_handler.function_arn)` to include a JIRA/GitHub issue link or issue ID, the responsible owner (team or username), and a specific removal date/release (e.g., "remove by 2026-06-01 or release vX.Y"), and add a short comment that the export should be deleted after `SearchPersistentStack.public_handler` is fully live, so future readers can locate and remove the bridge.
🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:

In `backend/cosmetology-app/lambdas/python/search/handlers/public_search.py`:

- Around lines 76-91: The cursor math is incorrect. When building the next cursor inside the inner_hits loop, update next_cursor_license_offset to reflect the cumulative offset for the provider (e.g., resume_offset + consumed_from_this_provider, or consumed_from_this_provider if there is no resume_offset) instead of only current-page consumption, and set next_cursor_search_after to the current provider's sort value (use last_sort or hit.get('sort')) rather than prev_sort so the resume uses the latest sort key; apply the same fixes to the corresponding block around lines ~106-110 and ensure consumed_from_this_provider is incremented consistently before computing the next cursor.
- Around lines 167-172: The code currently checks only the presence of resume keys in decoded, not their types. After computing has_resume: if has_resume, ensure decoded['license_offset'] is an int (or convertible to int) and decoded['resume_provider_sort'] is a list (optionally checking its items' types); if decoded.get('search_after') is present, ensure it is the expected type (e.g., a list). For any type mismatch raise CCInvalidRequestException('Invalid lastKey') so malformed lastKey values return 400 instead of later causing 500.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py
- backend/cosmetology-app/lambdas/python/search/handlers/public_search.py
- backend/cosmetology-app/lambdas/python/search/tests/function/test_public_search_providers.py
- backend/cosmetology-app/stacks/api_lambda_stack/public_lookup_api.py
- backend/cosmetology-app/stacks/api_stack/v1_api/api_model.py
We are updating our OpenSearch indexing design to index one document per license, rather than per provider. This means that we can use native OpenSearch pagination and return the exact number of results the client requests if there are enough matches found. It also reduces the complexity of only returning licenses that match, rather than having to map inner hits.
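With one document per license, the query body can lean on OpenSearch's native `search_after` pagination. A hedged sketch of what such a body builder might look like — the field names (`providerId`, `licenseNumber.keyword`) are illustrative assumptions, not the PR's actual index mapping:

```python
def build_public_license_search_body(page_size, license_number=None, search_after=None):
    """Build an OpenSearch query body for per-license documents (sketch)."""
    body = {
        'size': page_size,
        # A deterministic sort is required for search_after to page correctly.
        'sort': [
            {'providerId': {'order': 'asc'}},
            {'licenseNumber.keyword': {'order': 'asc'}},
        ],
        'query': {'bool': {'filter': []}},
    }
    if license_number:
        body['query']['bool']['filter'].append(
            {'term': {'licenseNumber.keyword': license_number}}
        )
    if search_after:
        # Sort values of the last hit on the previous page.
        body['search_after'] = search_after
    return body
```

Because each hit is now one license, `size` maps directly to the requested page size, with no inner-hits bookkeeping or custom per-provider offsets.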
Actionable comments posted: 6
🧹 Nitpick comments (1)
backend/cosmetology-app/lambdas/python/search/tests/function/test_populate_provider_documents.py (1)
138-160: Add a 2-license fixture to this handler test. With only one license, this still passes if the handler regresses from extend() back to append() or only forwards the first generated document to bulk_index(). A provider with two licenses would exercise the core 1-license→1-document behavior introduced here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/cosmetology-app/lambdas/python/search/tests/function/test_populate_provider_documents.py` around lines 138 - 160, The test test_populate_indexes_document_with_document_id currently only inserts one license so it won't catch regressions that use append() or only forward the first document; update the test setup to insert a second license for the same provider (call the existing helper _put_test_provider_and_license_record_in_dynamodb_table twice or add a new fixture via that helper) so the handler must produce two documents, then assert bulk_index was called with two documents and verify both expected documents via _generate_expected_document('cosm') for each license (update indexed_documents length/assertions accordingly) while keeping the same checks for index_name and id_field on mock_client_instance.bulk_index call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py`:
- Around line 220-229: Make licenseJurisdiction required and normalized: replace
the current String(...) declaration for licenseJurisdiction with the existing
Jurisdiction field type and set required=True, allow_none=False so
malformed/missing values fail validation; add a `@pre_load` method that, if
'jurisdiction' exists in the incoming data, copies it to 'licenseJurisdiction'
before field validation (so the Jurisdiction field enforces lowercase
normalization), and keep the existing `@post_load`
rename_jurisdiction_to_license_jurisdiction for safety.
In
`@backend/cosmetology-app/lambdas/python/search/handlers/populate_provider_documents.py`:
- Around line 79-81: The code currently treats failed_ids as document-level and
increments providers_failed per failed document, causing double-counting when
multiple license docs for the same provider fail; update the failure counting to
count unique provider IDs instead. Locate where failed_ids is used to update
compacts_processed['providers_failed'] and total_providers_failed and replace
the per-document increment with computing the set of provider IDs from the
failed documents (e.g., extract provider_id from each failed doc entry or map)
and increment by len(set(provider_ids)); apply the same change for the other
occurrences referenced (the compacts_processed update block and the
total_providers_failed update at the other locations) so both per-batch
compacts_processed['providers_failed'] and total_providers_failed reflect unique
providers failed rather than failed documents.
In
`@backend/cosmetology-app/lambdas/python/search/handlers/provider_update_ingest.py`:
- Around line 202-239: The try/except around
generate_provider_opensearch_documents(...) in provider_update_ingest.py
currently only handles CCNotFoundException and CCInternalException; add a except
block for ValidationError (the exception raised by
generate_provider_opensearch_documents) to log the validation failure (include
provider_id, compact, and index_name), mark the provider as failed by calling
failed_providers[compact].add(provider_id), and avoid letting the
ValidationError bubble up and cause the whole SQS batch to retry.
In `@backend/cosmetology-app/lambdas/python/search/handlers/public_search.py`:
- Around line 85-89: The pagination cursor logic sets last_key based on the raw
OpenSearch hits and page_size which is incorrect when you filter hits; change
the condition that sets last_key (currently using last_sort, hits and page_size)
to use the number of accepted results (providers) instead of hits — i.e., only
set last_key = _encode_public_cursor(last_sort) when last_sort is not None and
len(providers) >= page_size (or implement fetching additional documents to fill
providers before setting last_key) so filtered-out documents don’t cause
premature cursors and empty subsequent pages.
In `@backend/cosmetology-app/lambdas/python/search/handlers/search.py`:
- Around line 66-67: The DOB permission check in
_validate_date_of_birth_permission currently only scans dict keys and misses
field references embedded in scalar values and the top-level sort parameter;
update validation to gate both the 'query' and 'sort' inputs by recursively
extracting exact field references from dict keys and from scalar fields named
'field' or 'fields', as well as parsing query-string values (e.g.,
"licenses.dateOfBirth:...") and exists/terms style clauses; ensure
_validate_date_of_birth_permission (and the places that call it from the search
handler) rejects any reference to 'dateOfBirth' (including
"licenses.dateOfBirth") found in nested structures, scalar 'field'/'fields'
entries, query_string values, and in sort arrays before allowing a caller with
only readGeneral scope to proceed.
In `@backend/cosmetology-app/lambdas/python/search/utils.py`:
- Around line 47-50: The code uses jurisdiction as-is when setting
serializable['licenses'][0] and composing serializable['documentId'], which can
produce inconsistent IDs for mixed-case input; normalize by lowercasing the
jurisdiction value before storing and before composing documentId (e.g., set
jurisdiction = license_info['jurisdiction'].lower() and update
serializable['licenses'][0]['jurisdiction'] accordingly) so
serializable['documentId'] uses the normalized jurisdiction; also ensure
provider_record_util.py and related modules follow the same lowercase
normalization pattern for licenses.jurisdiction.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 51d72262-cfa3-440a-98dd-363564cb6cb7
📒 Files selected for processing (19)
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/provider_record_util.py
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/license/api.py
- backend/cosmetology-app/lambdas/python/common/cc_common/data_model/schema/provider/api.py
- backend/cosmetology-app/lambdas/python/common/tests/unit/test_data_model/test_schema/test_license.py
- backend/cosmetology-app/lambdas/python/common/tests/unit/test_data_model/test_schema/test_provider.py
- backend/cosmetology-app/lambdas/python/common/tests/unit/test_provider_record_util.py
- backend/cosmetology-app/lambdas/python/search/handlers/manage_opensearch_indices.py
- backend/cosmetology-app/lambdas/python/search/handlers/populate_provider_documents.py
- backend/cosmetology-app/lambdas/python/search/handlers/provider_update_ingest.py
- backend/cosmetology-app/lambdas/python/search/handlers/public_search.py
- backend/cosmetology-app/lambdas/python/search/handlers/search.py
- backend/cosmetology-app/lambdas/python/search/opensearch_client.py
- backend/cosmetology-app/lambdas/python/search/tests/function/test_manage_opensearch_indices.py
- backend/cosmetology-app/lambdas/python/search/tests/function/test_populate_provider_documents.py
- backend/cosmetology-app/lambdas/python/search/tests/function/test_provider_update_ingest.py
- backend/cosmetology-app/lambdas/python/search/tests/function/test_public_search_providers.py
- backend/cosmetology-app/lambdas/python/search/tests/function/test_search_providers.py
- backend/cosmetology-app/lambdas/python/search/tests/unit/test_opensearch_client.py
- backend/cosmetology-app/lambdas/python/search/utils.py
✅ Files skipped from review due to trivial changes (2)
- backend/cosmetology-app/lambdas/python/search/handlers/manage_opensearch_indices.py
- backend/cosmetology-app/lambdas/python/search/tests/function/test_manage_opensearch_indices.py
jurisdiction = String(required=False, allow_none=False, load_only=True)  # OpenSearch uses jurisdiction
licenseJurisdiction = String(required=False, allow_none=False, load_default=None)
compact = Compact(required=True, allow_none=False)
licenseNumber = String(required=True, allow_none=False, validate=Length(1, 100))

@post_load
def rename_jurisdiction_to_license_jurisdiction(self, data, **kwargs):
    if 'jurisdiction' in data:
        data['licenseJurisdiction'] = data.pop('jurisdiction')
    return data
Make licenseJurisdiction fail fast and normalized.
backend/cosmetology-app/lambdas/python/search/handlers/public_search.py:55-76 feeds this schema from licenses[0].copy(). With jurisdiction optional here, a malformed OpenSearch document is accepted instead of rejected, and using String skips the existing Jurisdiction normalization that the other provider response schemas already enforce.
🛠️ Proposed fix
- jurisdiction = String(required=False, allow_none=False, load_only=True) # OpenSearch uses jurisdiction
- licenseJurisdiction = String(required=False, allow_none=False, load_default=None)
+ jurisdiction = Jurisdiction(required=True, allow_none=False, load_only=True) # OpenSearch uses jurisdiction
+ licenseJurisdiction = Jurisdiction(required=False, allow_none=False)
`@post_load`
def rename_jurisdiction_to_license_jurisdiction(self, data, **kwargs):
- if 'jurisdiction' in data:
- data['licenseJurisdiction'] = data.pop('jurisdiction')
+ data['licenseJurisdiction'] = data.pop('jurisdiction')
return data

Based on learnings: In provider_record_util.py (and related Python modules under backend/cosmetology-app), rely on and enforce the existing normalization pattern where jurisdiction values are stored in lowercase.
'total_providers_processed': 0,
'total_providers_indexed': 0,
'total_licenses_indexed': 0,
'total_providers_failed': 0,
Count failed providers separately from failed license documents.
failed_ids is document-level now. If two license docs for the same provider fail in one batch, providers_failed is incremented twice, so compacts_processed[*].providers_failed and total_providers_failed no longer match their names.
🩹 Proposed fix
def _index_records_and_track_stats(
documents_to_index: list[dict], compact: str, opensearch_client: OpenSearchClient, compact_stats: dict
):
index_name = f'compact_{compact}_providers'
if documents_to_index:
failed_ids = _bulk_index_documents(opensearch_client, index_name, documents_to_index)
compact_stats['licenses_indexed'] += len(documents_to_index) - len(failed_ids)
if failed_ids:
- compact_stats['providers_failed'] += len(failed_ids)
+ failed_provider_ids = {doc_id.split('#', 1)[0] for doc_id in failed_ids}
+ compact_stats['providers_failed'] += len(failed_provider_ids)
logger.warning(
'Some documents failed to index in batch',
compact=compact,

Also applies to: 112-114, 294-296
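Deriving a unique-provider count from document-level failures is straightforward when the document id embeds the provider id, as this PR's `providerId#jurisdiction#licenseType` ids do; the helper below is an illustrative sketch, not the project's actual code:

```python
def count_failed_providers(failed_document_ids: list[str]) -> int:
    """Count distinct providers among failed license-document ids shaped
    like 'providerId#jurisdiction#licenseType'."""
    # Splitting on the first '#' recovers the provider id from each document id.
    return len({doc_id.split('#', 1)[0] for doc_id in failed_document_ids})

failed = [
    'prov-1#oh#cosmetologist',
    'prov-1#ky#cosmetologist',
    'prov-2#oh#esthetician',
]
```

Here three failed license documents collapse to two failed providers, which is the distinction the stats fields' names imply.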
# Re-check DynamoDB -- the REMOVE may have been for a single record while
# the provider still has other records remaining.
try:
    docs = generate_provider_opensearch_documents(compact, provider_id)
    if docs:
        response = opensearch_client.bulk_index(
            index_name=index_name, documents=docs, id_field='documentId'
        )
        logger.info(
            'Re-indexed remaining documents after delete',
            index_name=index_name,
            provider_id=provider_id,
            document_count=len(docs),
        )
        if response.get('errors'):
            for item in response.get('items', []):
                index_result = item.get('index', {})
                if index_result.get('error'):
                    logger.error(
                        'Document re-indexing failed after delete',
                        document_id=index_result.get('_id'),
                        error=index_result.get('error'),
                    )
                    failed_providers[compact].add(provider_id)
except CCNotFoundException:
    logger.info(
        'Provider no longer exists after REMOVE event, delete is complete',
        provider_id=provider_id,
        compact=compact,
    )
except CCInternalException as e:
-   # All deletes for this compact failed
    logger.error(
-       'Failed to bulk delete documents after retries',
+       'Failed to re-index remaining documents after delete',
        index_name=index_name,
-       document_count=len(providers_to_delete),
+       provider_id=provider_id,
        error=str(e),
    )
-   # Mark all providers to delete as failed
-   for provider_id in providers_to_delete:
-       failed_providers[compact].add(provider_id)
+   failed_providers[compact].add(provider_id)
Catch ValidationError when re-indexing after a REMOVE.
generate_provider_opensearch_documents() can fail validation in this branch too, but only CCNotFoundException and CCInternalException are handled. A malformed surviving provider will currently escape the handler and retry the whole SQS batch instead of producing a targeted batch failure.
🩹 Proposed fix
except CCNotFoundException:
logger.info(
'Provider no longer exists after REMOVE event, delete is complete',
provider_id=provider_id,
compact=compact,
)
+ except ValidationError as e:
+ logger.warning(
+ 'Failed to process remaining provider for re-index after delete',
+ index_name=index_name,
+ provider_id=provider_id,
+ compact=compact,
+ error=str(e),
+ )
+ failed_providers[compact].add(provider_id)
except CCInternalException as e:
logger.error(
'Failed to re-index remaining documents after delete',
index_name=index_name,
provider_id=provider_id,
last_sort = hits[-1].get('sort') if hits else None
# Full page from OpenSearch => may have more results; use last hit's sort values for search_after
last_key = None
if last_sort is not None and len(hits) >= page_size:
    last_key = _encode_public_cursor(last_sort)
Pagination cursor may be set incorrectly when hits are filtered out.
The last_key is set based on len(hits) >= page_size, but the loop filters out hits where compact doesn't match (line 58), licenses array is empty (line 66), or schema validation fails (line 78). This means providers could have fewer items than page_size while still returning a last_key, potentially causing empty pages on subsequent requests if many consecutive documents are filtered.
Consider checking len(providers) >= page_size instead, or fetching additional documents to compensate for filtered hits.
📝 Proposed fix
last_sort = hits[-1].get('sort') if hits else None
# Full page from OpenSearch => may have more results; use last hit's sort values for search_after
last_key = None
- if last_sort is not None and len(hits) >= page_size:
+ if last_sort is not None and len(providers) >= page_size:
last_key = _encode_public_cursor(last_sort)
# If the query references dateOfBirth, verify the caller has readPrivate permission
_validate_date_of_birth_permission(body.get('query', {}), compact, get_event_scopes(event))
🧩 Analysis chain
🏁 Script executed:
cat -n backend/cosmetology-app/lambdas/python/search/handlers/search.py | head -300
Repository: csg-org/CompactConnect
Length of output: 11653
🏁 Script executed:
# Check the SearchProvidersRequestSchema to understand request validation
fd -type f -name "*.py" | xargs grep -l "SearchProvidersRequestSchema" | head -5
Repository: csg-org/CompactConnect
Length of output: 236
🏁 Script executed:
fd "SearchProvidersRequestSchema" --type f
Repository: csg-org/CompactConnect
Length of output: 48
🏁 Script executed:
rg "class SearchProvidersRequestSchema" -A 30
Repository: csg-org/CompactConnect
Length of output: 8606
🏁 Script executed:
# Examine how the dateOfBirth field appears in the provider document structure
rg "dateOfBirth" backend/cosmetology-app/lambdas/python --type py -B 2 -A 2 | head -100
Repository: csg-org/CompactConnect
Length of output: 11650
🏁 Script executed:
# Look at the specific test case for nested dateOfBirth queries
rg "test_search_with_nested_date_of_birth_query_rejected_without_read_private_scope" -A 40 backend/cosmetology-app/lambdas/python/search/tests/function/test_search_providers.py
Repository: csg-org/CompactConnect
Length of output: 1320
🏁 Script executed:
# Check if there are any tests that try to bypass the dateOfBirth check using sort or query_string
rg "query_string|sort.*dateOfBirth|dateOfBirth.*sort" backend/cosmetology-app/lambdas/python/search/tests/ -B 3 -A 3
Repository: csg-org/CompactConnect
Length of output: 48
🏁 Script executed:
# Check the test file to see if there are any attempted bypasses documented
cat -n backend/cosmetology-app/lambdas/python/search/tests/function/test_search_providers.py | grep -A 100 "class.*Test"
Repository: csg-org/CompactConnect
Length of output: 4825
🏁 Script executed:
# List all test methods in the test_search_providers.py file
rg "def test_" backend/cosmetology-app/lambdas/python/search/tests/function/test_search_providers.py | head -20
Repository: csg-org/CompactConnect
Length of output: 1437
🏁 Script executed:
# Check if there are any tests for exists clause with dateOfBirth bypass
rg "exists.*field.*dateOfBirth|dateOfBirth.*exists" backend/cosmetology-app/lambdas/python/search/tests/
Repository: csg-org/CompactConnect
Length of output: 48
🏁 Script executed:
# Let's trace through the _query_references_field logic with actual examples to verify the bypass claims
python3 << 'EOF'
# Simulate the _query_references_field function
def _query_references_field(obj, field_name: str) -> bool:
"""
Recursively check if any key in the query DSL references the given field name.
"""
if isinstance(obj, dict):
for key, value in obj.items():
if field_name in key:
return True
if _query_references_field(value, field_name):
return True
elif isinstance(obj, list):
for item in obj:
if _query_references_field(item, field_name):
return True
return False
# Test case 1: exists clause with field value (bypass claim)
query1 = {
"exists": {
"field": "licenses.dateOfBirth"
}
}
result1 = _query_references_field(query1, "dateOfBirth")
print(f"Test 1 - exists with field value: {result1} (Expected: False, Vulnerable: True)")
# Test case 2: term query with field key (should catch)
query2 = {
"term": {
"licenses.dateOfBirth": "1985-06-06"
}
}
result2 = _query_references_field(query2, "dateOfBirth")
print(f"Test 2 - term query with field key: {result2} (Expected: True)")
# Test case 3: query_string with query value (bypass claim)
query3 = {
"query_string": {
"query": "licenses.dateOfBirth:1985"
}
}
result3 = _query_references_field(query3, "dateOfBirth")
print(f"Test 3 - query_string with dateOfBirth in value: {result3} (Expected: False, Vulnerable: True)")
# Test case 4: range query with field key (should catch)
query4 = {
"range": {
"licenses.dateOfBirth": {"gte": "1985-01-01"}
}
}
result4 = _query_references_field(query4, "dateOfBirth")
print(f"Test 4 - range query with field key: {result4} (Expected: True)")
# Test case 5: wildcard in field value (bypass claim)
query5 = {
"wildcard": {
"field": "licenses.dateOfBirth*"
}
}
result5 = _query_references_field(query5, "dateOfBirth")
print(f"Test 5 - wildcard field value: {result5} (Expected: False, Vulnerable: True)")
print("\n--- Summary ---")
print("The function ONLY checks dict keys, not string values.")
print("Bypass vectors confirmed:")
print("1. exists clause: Yes")
print("2. query_string: Yes")
print("3. wildcard/other operators with field in value: Yes")
EOF
Repository: csg-org/CompactConnect
Length of output: 616
🏁 Script executed:
# Check if sort parameter is validated for dateOfBirth references
rg "sort" backend/cosmetology-app/lambdas/python/search/handlers/search.py | grep -A 2 -B 2 "dateOfBirth\|_validate_date_of_birth"
Repository: csg-org/CompactConnect
Length of output: 48
🏁 Script executed:
# Confirm that _validate_date_of_birth_permission is only called on the query parameter, not sort
rg "_validate_date_of_birth_permission" backend/cosmetology-app/lambdas/python/search/handlers/search.py -B 2 -A 5
Repository: csg-org/CompactConnect
Length of output: 716
🏁 Script executed:
# Verify if there are any tests that attempt to bypass using sort with dateOfBirth
rg "test.*sort" backend/cosmetology-app/lambdas/python/search/tests/function/test_search_providers.py -i
Repository: csg-org/CompactConnect
Length of output: 536
🏁 Script executed:
# Let's verify the actual sort handling with a potential bypass
python3 << 'EOF'
def _query_references_field(obj, field_name: str) -> bool:
"""
Recursively check if any key in the query DSL references the given field name.
"""
if isinstance(obj, dict):
for key, value in obj.items():
if field_name in key:
return True
if _query_references_field(value, field_name):
return True
elif isinstance(obj, list):
for item in obj:
if _query_references_field(item, field_name):
return True
return False
# Simulate what happens when sort contains dateOfBirth
# This is NOT checked because _validate_date_of_birth_permission only checks body.get('query', {})
sort_bypass = [
{"licenses.dateOfBirth": "asc"}
]
# The check at line 67 only validates query parameter
query = {} # Empty query
result = _query_references_field(query, "dateOfBirth")
print(f"Query check result: {result}")
print(f"Sort parameter is: {sort_bypass}")
print(f"Sort parameter check: SKIPPED (not validated)")
print("\nBypass: A readGeneral caller can use sort parameter to reference dateOfBirth")
EOF
Repository: csg-org/CompactConnect
Length of output: 268
🏁 Script executed:
# Verify the exact flow - check if sort is validated anywhere else in the codebase
rg "sort.*dateOfBirth|dateOfBirth.*sort" backend/cosmetology-app/lambdas/python/search/ -i
Repository: csg-org/CompactConnect
Length of output: 48
DOB authorization is bypassable through value-based field references and the sort parameter.
The permission check only scans dict keys and misses field references embedded in scalar values. A readGeneral caller can bypass the guard via:
- {"exists": {"field": "licenses.dateOfBirth"}}
- {"query_string": {"query": "licenses.dateOfBirth:..."}}
- [{"licenses.dateOfBirth": "asc"}] in the top-level sort parameter (not validated at all)
Gate both query and sort parameters, and inspect exact field references in scalar field, fields, and query-string values instead of only checking dict keys.
Also applies to: 188-196, 204-254
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/cosmetology-app/lambdas/python/search/handlers/search.py` around
lines 66 - 67, The DOB permission check in _validate_date_of_birth_permission
currently only scans dict keys and misses field references embedded in scalar
values and the top-level sort parameter; update validation to gate both the
'query' and 'sort' inputs by recursively extracting exact field references from
dict keys and from scalar fields named 'field' or 'fields', as well as parsing
query-string values (e.g., "licenses.dateOfBirth:...") and exists/terms style
clauses; ensure _validate_date_of_birth_permission (and the places that call it
from the search handler) rejects any reference to 'dateOfBirth' (including
"licenses.dateOfBirth") found in nested structures, scalar 'field'/'fields'
entries, query_string values, and in sort arrays before allowing a caller with
only readGeneral scope to proceed.
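One way to close these gaps is to collect field references from dict keys, `field`/`fields` values, `query_string` strings, and the sort clause alike. The sketch below illustrates the approach; the function names and the exact set of inspected keys are assumptions, not the project's actual helper:

```python
def collect_field_references(obj) -> set:
    """Recursively collect field names referenced anywhere in a query or sort body."""
    refs = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            refs.add(key)
            if key in ('field', 'fields', 'default_field'):
                # exists/terms-style clauses name fields in scalar or list values.
                values = value if isinstance(value, list) else [value]
                refs.update(v for v in values if isinstance(v, str))
            elif key == 'query' and isinstance(value, str):
                # query_string syntax embeds fields as "name:value" tokens.
                refs.update(token.split(':', 1)[0] for token in value.split() if ':' in token)
            refs.update(collect_field_references(value))
    elif isinstance(obj, list):
        for item in obj:
            refs.update(collect_field_references(item))
    return refs

def references_date_of_birth(query: dict, sort: list) -> bool:
    """Gate both the query body and the sort clause on dateOfBirth references."""
    refs = collect_field_references(query) | collect_field_references(sort)
    return any('dateOfBirth' in ref for ref in refs)
```

Calling this on both `body.get('query', {})` and `body.get('sort', [])` before the readGeneral path proceeds would catch all three bypass vectors listed above.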
license_info = serializable['licenses'][0]
jurisdiction = license_info['jurisdiction']
license_type = license_info['licenseType']
serializable['documentId'] = f'{provider_id}#{jurisdiction}#{license_type}'
Normalize jurisdiction before writing the document and documentId.
If mixed-case data slips through here, the same logical license can get a different _id on a later reindex and exact licenses.jurisdiction filters become inconsistent. Lowercase the stored field before composing documentId.
🩹 Proposed fix
license_info = serializable['licenses'][0]
- jurisdiction = license_info['jurisdiction']
+ jurisdiction = license_info['jurisdiction'].lower()
+ license_info['jurisdiction'] = jurisdiction
license_type = license_info['licenseType']
serializable['documentId'] = f'{provider_id}#{jurisdiction}#{license_type}'
moved into #1317 to resolve merge conflicts
Cosmetology has slightly different requirements for public search, given the expected volume of practitioners with similar names: we need to support additional search parameters, such as license number. To accommodate these patterns, we have decided to use the OpenSearch domain for public search, while keeping the request schema similar in shape to the public search request body used by JCC.
This requires us to support custom pagination tracking on the backend to ensure that all matching license records from a query are accounted for when paging through large numbers of matches.
This PR also removes the financial-summary-related compact configuration fields, which were missed in the previous PR that removed the other unneeded compact config fields.
Testing List
- yarn test:unit:all should run without errors or warnings
- yarn serve should run without errors or warnings
- yarn build should run without errors or warnings
- backend/compact-connect/tests/unit/test_api.py
- run compact-connect/bin/download_oas30.py

Closes #1295
Summary by CodeRabbit
New Features
Bug Fixes
Infrastructure
Tests & Docs