MSBuild Weekly Report — 2026-05-12 #19
🔍 Investigation: Issue #13734 - ToolTask drops final stdout/stderr lines after Wave18_6 EOF wait
Upstream: dotnet/msbuild#13734
### Summary
This is a race condition bug in MSBuild's
### Root Cause Analysis
**Location:**
**The problematic code:**
````csharp
private void WaitForProcessExit(Process proc)
{
    if (ChangeWaves.AreFeaturesEnabled(ChangeWaves.Wave18_6))
    {
        // Wait for process handle to be signaled
        proc.WaitForExit(int.MaxValue); // Does NOT drain pipes when timeout is provided
        // Wait for async pipe EOF with bounded 2-second timeout
        const int eofTimeoutSec = 2;
        WaitHandle[] eofEvents = [_standardOutputEOF, _standardErrorEOF];
        WaitHandle.WaitAll(eofEvents, TimeSpan.FromSeconds(eofTimeoutSec));
        // Return value discarded - no further drain on timeout!
    }
    else
    {
        proc.WaitForExit(); // Legacy: waits for EOF but can hang on grandchildren
    }
}
````
**The race condition mechanism:**
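One part of this mechanism can be shown in isolation. The snippet above discards the result of `WaitHandle.WaitAll`, and the following standalone sketch (not MSBuild code) shows that `WaitAll` returns `false` when the timeout elapses before every handle is signaled. The names `EofWaitSketch` and `WaitForBothEof` are hypothetical, and the two events only stand in for the stdout and stderr EOF events.

```csharp
using System;
using System.Threading;

public static class EofWaitSketch
{
    // Hypothetical helper: waits for both EOF events and reports whether
    // both were signaled before the timeout elapsed.
    public static bool WaitForBothEof(WaitHandle stdoutEof, WaitHandle stderrEof, TimeSpan timeout)
    {
        WaitHandle[] eofEvents = { stdoutEof, stderrEof };
        // WaitAll returns false if the timeout elapses before all handles are signaled.
        return WaitHandle.WaitAll(eofEvents, timeout);
    }

    public static void Main()
    {
        using var stdoutEof = new ManualResetEvent(true);   // stdout already reached EOF
        using var stderrEof = new ManualResetEvent(false);  // stderr still has pending output

        bool drained = WaitForBothEof(stdoutEof, stderrEof, TimeSpan.FromMilliseconds(100));
        Console.WriteLine(drained ? "both pipes reached EOF" : "timed out; output may be incomplete");

        stderrEof.Set(); // the late EOF arrives
        drained = WaitForBothEof(stdoutEof, stderrEof, TimeSpan.FromMilliseconds(100));
        Console.WriteLine(drained ? "both pipes reached EOF" : "timed out; output may be incomplete");
    }
}
```

A caller that inspects this boolean can at least tell that a timeout happened. What to do in that case (log, wait longer, or drain again) is a design decision this sketch does not take a position on.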
**Why CI hits this and local dev boxes don't:**
### Reproduction Details
**Test that fails:**
**The test:**
**Expected failure modes when race occurs:**
**Why direct reproduction is difficult:**
The issue author reports this happens intermittently in PR builds, which is consistent with CI runner characteristics (2 vCPUs, high parallelism).
### Source Code Evidence
Verified findings from the dotnet/msbuild repository:
### Suggested Next Steps
### Additional Context
Automated investigation by MSBuild Weekly Report workflow
🔍 Investigation: Issue #13718 - Fail fast on connect-to-node when launched child process has already exited
Upstream: dotnet/msbuild#13718
### Summary
This is a feature enhancement proposal from MSBuild team member
### Root Cause Analysis
**Problem Context:**
**Technical Root Cause:**
If the child process crashes before opening its pipe (missing dependencies, permission errors, apphost failures), the parent holds a valid
**Proposed Solution:**
The design specifically avoids async/await,
### Reproduction Details
**Reproduction Approach:** This is a performance enhancement to reduce failure detection time, not a bug fix for incorrect behavior. Reproducing requires:
**Status:** Direct reproduction is inconclusive without the ability to intentionally corrupt the MSBuild installation or inject controlled process failures. The issue includes a comprehensive implementation specification with test requirements covering the fast-fail scenario.
### Suggested Next Steps
**Priority:** Medium. This is a quality-of-life improvement for the development inner loop when builds fail. The immediate .NET 10.0.300 crash is being addressed in #13716, so this issue focuses on improving the failure path responsiveness once the server launch is stable.
Automated investigation by MSBuild Weekly Report workflow
🔍 Investigation: Issue #13674 - Invalid Project File Name Error
Upstream: dotnet/msbuild#13674
### Summary
This issue occurs when a .NET project file is named with only an extension (e.g.,
### Root Cause Analysis
The error originates in NuGet.targets (line 679 in the user's SDK version), specifically when calling
````xml
<GetRestoreProjectStyleTask
    HasPackageReferenceItems="$(_HasPackageReferenceItems)"
    MSBuildProjectDirectory="$(MSBuildProjectDirectory)"
    MSBuildProjectName="$(MSBuildProjectName)"
    RestoreProjectStyle="$(RestoreProjectStyle)">
  <Output TaskParameter="ProjectStyle" PropertyName="RestoreProjectStyle" />
  <Output TaskParameter="IsPackageReferenceCompatibleProjectStyle" PropertyName="PackageReferenceCompatibleProjectStyle" />
</GetRestoreProjectStyleTask>
````
**What's happening:**
1. When a project file is named `.csproj`, MSBuild sets `$(MSBuildProjectName)` to an empty string (since the filename without extension is empty)
2. The `GetRestoreProjectStyleTask` requires `MSBuildProjectName` as a mandatory parameter
3. MSBuild throws `MSB4044` because the required parameter has no value
4. The error message doesn't explain that the underlying issue is an invalid project filename
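Step 1 can be checked in isolation. This is a minimal sketch, assuming `$(MSBuildProjectName)` behaves like `Path.GetFileNameWithoutExtension` applied to the project file name. The class and method names (`ProjectNameSketch`, `ProjectNameOf`) are hypothetical and not part of MSBuild or NuGet.

```csharp
using System;
using System.IO;

public static class ProjectNameSketch
{
    // Approximates how a project name is derived from a project file path:
    // the file name with its extension removed (an assumption, see lead-in).
    public static string ProjectNameOf(string projectFile) =>
        Path.GetFileNameWithoutExtension(projectFile);

    public static void Main()
    {
        Console.WriteLine($"[{ProjectNameOf("TestApp.csproj")}]"); // prints "[TestApp]"
        Console.WriteLine($"[{ProjectNameOf(".csproj")}]");        // prints "[]"
    }
}
```

An empty string is what a required task parameter then receives, which matches the MSB4044 symptom described in steps 2 and 3.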
**Source Location:**
- NuGet.Client repository: `src/NuGet.Core/NuGet.Build.Tasks/NuGet.targets` (line ~679)
- Task definition: `NuGet.Build.Tasks.GetRestoreProjectStyleTask`
### Reproduction Details
**Environment:** .NET SDK 10.0.203, MSBuild 18.3.3, Windows 10
**Attempted reproduction steps:**
1. Create a console app: `dotnet new console -n TestApp`
2. Rename project file: `Rename-Item TestApp.csproj .csproj`
3. Build: `dotnet build`
**Expected error:**
````
error MSB4044: 未给任务"GetRestoreProjectStyleTask"的必需参数"MSBuildProjectName"赋值。
(The "GetRestoreProjectStyleTask" task was not given a value for the required parameter "MSBuildProjectName".)
````
**Note:** Full reproduction was not completed due to environment constraints, but the root cause is clearly identified in the source code and matches the user's detailed report.
### Suggested Next Steps
This is a legitimate usability issue: developers shouldn't have to debug cryptic parameter errors when the actual problem is a simple file naming mistake.
Automated investigation by MSBuild Weekly Report workflow
🔍 Investigation: Issue #13667 - Fix or disable flaky tests
Upstream: dotnet/msbuild#13667
### Summary
This tracking issue addresses the
### Root Cause Analysis
**Test Purpose:**
**Technical Implementation:**
**Root Cause: Race Conditions Under CI Load**
**Historical Attempts to Fix (All Failed):**
**Why All Fixes Failed:**
### Reproduction Details
**Attempted:** Local reproduction by creating a minimal test project simulating the test scenario.
**Result:** Reproduction is not feasible because:
**Observed from PR #13698 comment:**
This confirms the race condition is environment-specific and timing-dependent.
### Suggested Next Steps
**🎯 Option 1: Proper Fix - Mock External Process (RECOMMENDED)**
Refactor
````csharp
protected override int ExecuteTool(string pathToTool, string responseFileCommands,
    string commandLineCommands)
{
    int delay = RepeatCount < 2 ? InitialDelay : FollowupDelay;
    // Use Task.Delay instead of external process
    bool completed = Task.Delay(delay).Wait(Timeout > 0 ? Timeout : -1);
    return completed ? 0 : 1; // 0 = success, 1 = timeout
}
````
**Benefits:**
**⚙️ Option 2: Significantly Increase Timing Margins**
Change to much larger gaps:
**Trade-offs:**
**🚫 Option 3: Permanent Skip on CI (Current Trajectory)**
Accept the
**Implications:**
**🗑️ Option 4: Remove Test Entirely**
Evaluate whether timeout/retry functionality is adequately covered by other tests, and if so, remove this specific test.
**Recommendation:** Pursue Option 1 (mock external process). This addresses the root cause rather than symptoms, provides deterministic behavior, and maintains full CI coverage. If the MSBuild team lacks bandwidth for this refactor, Option 3 (permanent skip) is acceptable but should be explicitly documented as such in issue #13667.
Automated investigation by MSBuild Weekly Report workflow
🔍 Investigation: Issue #13648 - Restore fails due to missing package Microsoft.Extensions.Logging.Log4Net.AspNetCore.
Upstream: dotnet/msbuild#13648
### Summary
The issue reports a NuGet restore failure (NU1101) for package
### Root Cause Analysis
This is not an MSBuild bug but rather a package availability/configuration issue:
**Relevant MSBuild/NuGet Components:**
The error NU1101 is a standard NuGet error code for "package not found in any source", not an MSBuild-specific issue.
### Reproduction Details
**Attempted:** Local reproduction was attempted, but environment restrictions (permission denied for directory creation and dotnet commands) prevented full testing.
**Expected Behavior:** The package should either:
**Error Output:**
### Suggested Next Steps
**Root Issue:** This is a package availability and feed configuration problem, not an MSBuild defect. The MSBuild/NuGet restore mechanism is working correctly by reporting that the package cannot be found in the configured sources.
Automated investigation by MSBuild Weekly Report workflow
🔍 Investigation: Issue #13647 - /mt build fails for Blazor WASM and OC due to incorrect path separators on Unix
Upstream: dotnet/msbuild#13647
### Summary
This is a critical path handling bug in MSBuild's multithreaded (
### Root Cause Analysis
The issue occurs in the MSBuild static web assets task execution during multithreaded builds. The error originates from:
**Key diagnostic evidence:** The path shows
The issue is labeled "Area: Multithreaded" confirming it's specific to parallel build execution. The stack trace shows the failure occurs during file I/O operations (
### Reproduction Details
**Attempted reproduction:** Created minimal test scenario for Blazor WASM with
**Reproduction steps (for manual testing):**
````bash
dotnet new blazorwasm -n Net8BlazorWasmApp
cd Net8BlazorWasmApp
dotnet build /mt
````
On Linux/macOS, this should trigger the path separator issue during StaticWebAssets cache file creation.
### Suggested Next Steps
Automated investigation by MSBuild Weekly Report workflow
🔍 Investigation: Issue #13599 - Copy task should retry on FileNotFoundException (MSB3030) during parallel builds
Upstream: dotnet/msbuild#13599
### Summary
The MSBuild
### Root Cause Analysis
**Code Location:**
The issue is architectural:
**Referenced related issue:** #9462 discusses parallel build scheduling but focuses on separate top-level invocations rather than intra-build scheduling.
### Reproduction Details
Reproduction environment would require:
The race condition is a TOCTOU (Time-Of-Check-Time-Of-Use) gap that manifests under high parallelism. Build restrictions prevented full reproduction in this environment, but the code analysis confirms the described behavior.
**Key code flow:**
### Suggested Next Steps
**Short-term (symptom mitigation):**
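The retry idea named in the issue title can be sketched as follows. This is a hypothetical standalone illustration, not the MSBuild `Copy` task and not a proposed patch: the helper name `CopyWithRetry`, the attempt count, and the delay are all assumptions chosen for the example.

```csharp
using System;
using System.IO;
using System.Threading;

public static class CopyRetrySketch
{
    // Hypothetical helper: retries File.Copy when the source file is not there yet,
    // on the assumption that another build step may still be producing it.
    public static bool CopyWithRetry(string source, string destination, int maxAttempts, int delayMs)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                File.Copy(source, destination, overwrite: true);
                return true;
            }
            catch (FileNotFoundException) when (attempt < maxAttempts)
            {
                Thread.Sleep(delayMs); // give the producer time to write the file
            }
            catch (FileNotFoundException)
            {
                return false; // still missing after the last attempt
            }
        }
        return false; // only reached when maxAttempts is less than 1
    }

    public static void Main()
    {
        string dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        string source = Path.Combine(dir, "input.txt");
        string destination = Path.Combine(dir, "output.txt");

        Console.WriteLine(CopyWithRetry(source, destination, maxAttempts: 2, delayMs: 50)); // source missing
        File.WriteAllText(source, "payload");
        Console.WriteLine(CopyWithRetry(source, destination, maxAttempts: 2, delayMs: 50)); // source present

        Directory.Delete(dir, recursive: true);
    }
}
```

A retry of this kind only narrows the TOCTOU window. It does not remove the ordering problem between the step that produces the file and the step that copies it.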
**Long-term (root cause fix):**
**Priority:** High. 25 failures/month in ASP.NET Core with 30+ minute retry costs make this a significant productivity issue.
Automated investigation by MSBuild Weekly Report workflow
🔍 Investigation: Issue #13508 - ShutdownAllNodes / dotnet build-server shutdown broken when AppHost is used
Upstream: dotnet/msbuild#13508
### Summary
When the .NET SDK uses AppHost, MSBuild worker nodes launch as
### Root Cause Analysis
The issue is in
````csharp
bool isNativeHost = msbuildLocation != null && Path.GetFileName(msbuildLocation).Equals(Constants.MSBuildExecutableName, StringComparison.OrdinalIgnoreCase);
string expectedProcessName = Path.GetFileNameWithoutExtension(isNativeHost ? msbuildLocation : (CurrentHost.GetCurrentHost() ?? msbuildLocation));
Process[] processes;
try
{
    processes = Process.GetProcessesByName(expectedProcessName);
}
````
**The problem:**
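A minimal sketch of the name mismatch this investigation describes, using only the name derivation from the excerpt. The class and method names are hypothetical, and the two file names are illustrative inputs rather than values read from a real installation.

```csharp
using System;
using System.IO;

public static class ProcessNameSketch
{
    // Mirrors the name derivation in the excerpt: the process name to search
    // for is the host file name without its extension.
    public static string ExpectedProcessName(string hostFile) =>
        Path.GetFileNameWithoutExtension(hostFile);

    // A name-based process lookup only finds nodes when the two names agree.
    public static bool LookupWouldFindNode(string searchedHostFile, string actualNodeFile) =>
        string.Equals(
            ExpectedProcessName(searchedHostFile),
            ExpectedProcessName(actualNodeFile),
            StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        // Searching by the "dotnet" host name while the nodes run as "MSBuild.exe".
        Console.WriteLine(LookupWouldFindNode("dotnet.exe", "MSBuild.exe"));  // prints "False"
        // Searching by the name the nodes actually run under.
        Console.WriteLine(LookupWouldFindNode("MSBuild.exe", "MSBuild.exe")); // prints "True"
    }
}
```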
PR #13501's fix added a fallback for .NET Framework:
````csharp
#if NETFRAMEWORK
// Fall back to the standard executable name for most nodes
// on .NET Framework, to function in `ShutdownAllNodes()`
expectedProcessName ??= Constants.MSBuildAppName;
#endif
````
However, this only applies to
### Reproduction Details
**Attempted reproduction:**
Environment constraints prevented full hands-on reproduction, but the code analysis clearly shows the mismatch between expected process name ("dotnet") and actual process name ("MSBuild.exe").
### Suggested Next Steps
Automated investigation by MSBuild Weekly Report workflow
🔍 Investigation: Issue #12438 - Potential MSBuild regression "internal failure"
Upstream: dotnet/msbuild#12438
### Summary
This is an intermittent MSBuild internal error that occurs when building multi-project solutions with parallel execution enabled. The error "Invalid node id specified: X" indicates MSBuild is attempting to communicate with out-of-process build nodes that have terminated prematurely or whose IDs are no longer valid in the node context dictionary.
### Root Cause Analysis
The error originates from
````csharp
public void SendData(int nodeId, INodePacket packet)
{
    ErrorUtilities.VerifyThrow(_nodeContexts.ContainsKey(nodeId), $"Invalid node id specified: {nodeId}.");
    SendData(_nodeContexts[nodeId], packet);
}
````
**Key observations:**
1. **Race condition in node lifecycle**: The error occurs when MSBuild attempts to send data to a node ID that was previously valid but has been removed from `_nodeContexts` (a `ConcurrentDictionary<int, NodeContext>`). This suggests a timing issue where:
- A node terminates (calling `NodeContextTerminated()` which removes it from `_nodeContexts`)
- But the BuildManager still has pending work scheduled for that node
- When `SendData()` is called, the node ID no longer exists
2. **Affected environments**:
- Primarily occurs in CI/CD pipelines (Azure DevOps)
- Recently reported on Windows Server Core 2022 on KVM (potential virtualization timing issues)
- Does NOT reproduce locally, suggesting dependency on system load, CPU count, or scheduling behavior
3. **Suspect commit range**: Issue was first noticed after commits between `fdaf9502206681459aa4019a612dd8ea2ad0bedb...1f60126fb517e342dd563e50bb3cb52f8109f837`. Notably, commit `1f60126` **reverted** PR #12190 ("Drain packet queue reorder fix") due to ARM64 failures — this revert may have reintroduced the race condition.
4. **Workaround confirmed**: Multiple users report that disabling parallel builds (`-nr:False -m:1`) eliminates the issue, strongly supporting a concurrency/synchronization bug.
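The interleaving described in observation 1 can be replayed sequentially in a small sketch. Everything here is a hypothetical stand-in: `NodeLifecycleSketch`, `NodeStarted`, `NodeTerminated`, and `TrySendData` are illustrative names, the dictionary value is simplified to a string, and the real code throws an internal error where this sketch returns `false`.

```csharp
using System;
using System.Collections.Concurrent;

public static class NodeLifecycleSketch
{
    // Hypothetical stand-in for _nodeContexts; the value type is simplified to string.
    private static readonly ConcurrentDictionary<int, string> s_nodeContexts = new();

    public static void NodeStarted(int nodeId) => s_nodeContexts[nodeId] = $"context-{nodeId}";

    // Stand-in for node termination: removes the node from the dictionary.
    public static void NodeTerminated(int nodeId) => s_nodeContexts.TryRemove(nodeId, out _);

    // Stand-in for SendData: reports failure instead of throwing when the id is unknown.
    public static bool TrySendData(int nodeId, string packet)
    {
        if (!s_nodeContexts.ContainsKey(nodeId))
        {
            Console.WriteLine($"Invalid node id specified: {nodeId}.");
            return false;
        }
        Console.WriteLine($"sent '{packet}' to node {nodeId}");
        return true;
    }

    public static void Main()
    {
        NodeStarted(2);
        TrySendData(2, "build request"); // node is known, send succeeds
        NodeTerminated(2);               // node exits while work is still scheduled for it
        TrySendData(2, "build request"); // id is no longer in the dictionary
    }
}
```

In the real system the termination and the send would happen on different threads, so whether the second send fails depends on timing. The sketch fixes the order only to make the failing sequence visible.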
### Reproduction Details
**Attempted reproduction:** Cannot reliably reproduce in standard environments as the issue is race-condition dependent and appears to require specific conditions:
- High CPU count (8+ vCPUs)
- Concurrent multi-project builds
- Specific timing/load conditions
**Error pattern from issue:**
````
Microsoft.Build.Framework.InternalErrorException: MSB0001: Internal MSBuild Error: Invalid node id specified: 2.
   at Microsoft.Build.Shared.ErrorUtilities.ThrowInternalError(String message, Object[] args)
   at Microsoft.Build.Shared.ErrorUtilities.VerifyThrow(Boolean condition, String unformattedMessage, Object arg0)
   at Microsoft.Build.BackEnd.NodeProviderOutOfProc.SendData(Int32 nodeId, INodePacket packet)
   at Microsoft.Build.BackEnd.NodeManager.SendData(Int32 node, INodePacket packet)
   at Microsoft.Build.Execution.BuildManager.PerformSchedulingActions(IEnumerable`1 responses)
````
### Suggested Next Steps
Automated investigation by MSBuild Weekly Report workflow
🧪 Investigation Results Summary
**Summary:** Investigated 9 issue(s): 0 reproduced, 0 not reproduced, 9 inconclusive
All investigations completed successfully with root cause analysis, but full reproduction was inconclusive due to environment constraints (race conditions, CI load requirements, or sandboxed limitations). Each investigation provides detailed analysis of the underlying code and actionable recommendations.
### Investigation Notes
All 9 investigations identified root causes and provided detailed technical analysis despite reproduction challenges:
**Recommendations:**
Investigation summary generated automatically. Scroll up to see individual detailed investigation comments.
MSBuild Weekly Activity Report
Report Generated: 2026-05-12
Reporting Period: Past 14 days (2026-04-28 to 2026-05-12)
Repository: dotnet/msbuild
Quick Stats
Section 1: New Unassigned Issues (created in past 14 days)
`.csproj` produces confusing MSB4044 error, doesn't point to real cause
Section 2: Older Unassigned Issues with Recent Activity
`dotnet build` (except errors) for coding agents
Section 3: Open Pull Requests Triage
Note: Additional 24 PRs with activity not shown for brevity. Most are automated merges, bot PRs, or older PRs with minor updates.
🔍 Issues Flagged for Deeper Investigation
The following bugs and regressions from Sections 1 and 2 have been flagged for automated deeper investigation:
🧪 Investigation Results
Investigation results will be added here as they complete.