chore: switch from sonnet to haiku in maintainers processing [CM-1049]#3915
Conversation
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.
Please add a Jira issue key to your PR title.
Pull request overview
This PR updates the Git Integration maintainer-extraction pipeline to use a different AWS Bedrock Claude model, adjusts the extraction prompt guidance, and updates response parsing/cost calculation to better match the new model’s behavior and pricing.
Changes:
- Switched Bedrock inference from Claude Sonnet 4 to Claude Haiku 4.5 and updated token cost calculations accordingly.
- Added markdown code-fence stripping to handle cases where the model wraps JSON in ``` blocks.
- Refined the maintainer/contributor extraction prompt rules (role/title derivation, filename-to-normalized_title mapping, and inclusion rules for people without emails).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| services/apps/git_integration/src/crowdgit/services/maintainer/maintainer_service.py | Tightens the LLM extraction prompt to improve role/title normalization and ensure people aren’t omitted when email is missing. |
| services/apps/git_integration/src/crowdgit/services/maintainer/bedrock.py | Switches Bedrock model + updates cost calculation and adds response post-processing for markdown-wrapped JSON outputs. |
```python
if raw_text.startswith("```"):
    raw_text = raw_text.split("\n", 1)[-1]
if raw_text.endswith("```"):
    raw_text = raw_text.rsplit("```", 1)[0]
raw_text = raw_text.strip()
```
The markdown code-fence stripping can still leave a trailing closing fence in common outputs like ```` ```json\n{...}\n```\n ```` (note the newline after the closing fence). In that case `` raw_text.endswith("```") `` is false before `.strip()`, so the closing fence remains and `json.loads(raw_text)` will fail. Consider stripping whitespace before checking for the closing fence (or using a small regex that removes the leading ```` ```.*\n ```` and the trailing ```` ``` ```` with surrounding whitespace).
Suggested change:

```python
stripped_text = raw_text.strip()
if stripped_text.startswith("```"):
    # Remove the opening code fence (and optional language specifier)
    stripped_text = stripped_text.split("\n", 1)[-1]
# Tolerate trailing whitespace/newlines after the closing fence
if stripped_text.rstrip().endswith("```"):
    stripped_text = stripped_text.rstrip().rsplit("```", 1)[0]
raw_text = stripped_text.strip()
```
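The regex alternative mentioned in the review could be sketched as follows (the helper name `strip_code_fences` is hypothetical, not from the PR):

```python
import re


def strip_code_fences(raw_text: str) -> str:
    """Remove an optional leading ```lang fence and trailing ``` fence,
    tolerating surrounding whitespace (illustrative sketch)."""
    text = raw_text.strip()
    # Drop an opening fence plus optional language specifier, e.g. ```json
    text = re.sub(r"^```[^\n]*\n", "", text)
    # Drop a closing fence, even when followed by trailing whitespace/newlines
    text = re.sub(r"\n?```\s*$", "", text)
    return text.strip()
```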
```python
try:
    body_bytes = await response["body"].read()
    response_body = json.loads(body_bytes.decode("utf-8"))
    raw_text = response_body["content"][0]["text"].replace('"""', "").strip()
```
If decoding/parsing the Bedrock response body fails at `response_body = json.loads(...)`, the later `except` block will try to log `response_body["content"][0]["text"]`, but `response_body` will be undefined, masking the original error. Consider initializing `response_body`/`raw_text` to a safe default before the `try`, and in the exception path logging the raw `body_bytes` (or decoded text) when JSON parsing fails.
```python
# Strip markdown code fences if present (Haiku sometimes ignores the system prompt)
if raw_text.startswith("```"):
    raw_text = raw_text.split("\n", 1)[-1]
if raw_text.endswith("```"):
    raw_text = raw_text.rsplit("```", 1)[0]
raw_text = raw_text.strip()

output = json.loads(raw_text)
```
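A minimal sketch of the fix suggested above, with the body taken as bytes for simplicity (the function name and the `ValueError` wrapping are illustrative, not the PR's actual code):

```python
import json


def parse_bedrock_body(body_bytes: bytes) -> dict:
    """Parse a Bedrock response body, keeping the raw bytes available
    for error reporting when JSON decoding fails (sketch only)."""
    response_body = None  # safe default so the error path never hits an undefined name
    try:
        response_body = json.loads(body_bytes.decode("utf-8"))
        raw_text = response_body["content"][0]["text"].replace('"""', "").strip()
        return {"raw_text": raw_text}
    except (json.JSONDecodeError, UnicodeDecodeError, KeyError, IndexError) as exc:
        # Report the raw bytes instead of dereferencing the possibly-undefined body
        snippet = body_bytes[:200].decode("utf-8", errors="replace")
        raise ValueError(f"Failed to parse Bedrock response: {snippet}") from exc
```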
The new parsing behavior (stripping markdown code fences before `json.loads`) is a regression-prone edge case that would benefit from a unit test (e.g., outputs with ```` ```json ```` fences, trailing newlines, and plain JSON). There is an existing pytest suite under services/apps/git_integration/src/test, but no coverage around this Bedrock response parsing currently.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
# Before (Claude Sonnet 4, per-1K pricing)
input_cost = (input_tokens / 1000) * 0.003
output_cost = (output_tokens / 1000) * 0.015

# After (Claude Haiku 4.5, per-1M pricing)
input_cost = (input_tokens / 1_000_000) * 0.80
output_cost = (output_tokens / 1_000_000) * 4.00
```
Incorrect Haiku 4.5 token pricing underestimates costs
Medium Severity
The cost calculation uses $0.80 per 1M input tokens and $4.00 per 1M output tokens, but the actual AWS Bedrock pricing for Claude Haiku 4.5 is $1.00 per 1M input tokens and $5.00 per 1M output tokens. Both the comment and the multiplier values are wrong, causing cost tracking to underreport actual spend by 20%.
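With the rates Bugbot cites ($1.00 / $5.00 per 1M tokens — worth verifying against the current AWS Bedrock price list before merging), the corrected calculation would look like:

```python
# Claude Haiku 4.5 on AWS Bedrock, per Bugbot's cited figures (verify before use):
# $1.00 per 1M input tokens, $5.00 per 1M output tokens
INPUT_COST_PER_M_TOKENS = 1.00
OUTPUT_COST_PER_M_TOKENS = 5.00


def bedrock_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single invocation (illustrative helper)."""
    input_cost = (input_tokens / 1_000_000) * INPUT_COST_PER_M_TOKENS
    output_cost = (output_tokens / 1_000_000) * OUTPUT_COST_PER_M_TOKENS
    return input_cost + output_cost
```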


This pull request updates the Bedrock model used for maintainer extraction, improves handling of model output, updates cost calculations, and clarifies the extraction prompt for maintainers and contributors. The most important changes are grouped below:
Bedrock Model and Output Handling:
- Switched from `claude-sonnet-4` to `claude-haiku-4.5` for faster and potentially more cost-effective inference (`bedrock.py`).

Cost Calculation:
- Updated cost calculations to `claude-haiku-4.5` pricing ($0.80 per 1M input tokens, $4.00 per 1M output tokens), replacing the previous Sonnet 4 rates.

Maintainer Extraction Prompt Improvements:
- Derives `normalized_title` based on filename patterns, making the mapping from filename to role more explicit and robust.

Note
Medium Risk
Medium risk because it changes the underlying LLM model and parsing behavior for maintainer extraction, which can alter extracted results and downstream maintainer updates/cost reporting.
Overview
- Switches the Bedrock invocation used for maintainer extraction from Claude Sonnet to Claude Haiku 4.5, and updates token cost calculation to Haiku's per-1M-token pricing.
- Hardens response handling by stripping optional markdown code fences before `json.loads()` to tolerate non-compliant model output.
- Refines the maintainer extraction prompt to derive consistent per-person titles when roles aren't explicitly labeled, to map `normalized_title` more deterministically from filename patterns, and to require including people even when emails are missing.

Written by Cursor Bugbot for commit 1dea030. This will update automatically on new commits.