Update ICU to 78 - Part 2: Microsoft patch/code changes#179
Open
aoruganti-msft wants to merge 15 commits into
…ion_number
The Windows OS ICU build uses a versionless data filename (icudtl.dat) instead of the versioned one (icudt78l.dat) so the filename does not churn each upgrade. Guarded by ICU_DATA_DIR_WINDOWS; the public/SDK build is unaffected.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
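The naming rule described in this commit can be sketched roughly as below. This is an illustration only, not the patch: `dataFileName` and its `windowsOsBuild` parameter are hypothetical, and a runtime flag stands in for the compile-time ICU_DATA_DIR_WINDOWS guard so that both branches are visible side by side.

```cpp
#include <string>

// Hypothetical helper (not part of ICU): shows the two naming schemes.
// In the real build the choice is made at compile time under
// ICU_DATA_DIR_WINDOWS; the bool here is only a stand-in for that guard.
std::string dataFileName(bool windowsOsBuild, const std::string& majorVersion) {
    if (windowsOsBuild) {
        // Versionless name: identical across ICU upgrades.
        return "icudtl.dat";
    }
    // Versioned name: "icudt" + major version + "l" (little-endian) + ".dat".
    return "icudt" + majorVersion + "l.dat";
}
```

With `majorVersion` set to "78", the non-Windows-OS branch yields the versioned name mentioned in the commit, which changes on every upgrade, while the Windows OS branch stays fixed.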
Add IGNORE_WINDOWS_HEADERS_START/END markers around regions of putil.h and unistr.h that should be stripped from the Windows OS SDK header. putil.h: data-directory + timezone-files-directory + filesystem-separator constants are not user-mutable in Windows OS ICU. unistr.h: UStringCaseMapper internal callback typedef is meaningless to SDK consumers that don't expose C++ UnicodeString. No C/C++ semantics change; markers are pure comments interpreted by the Windows SDK header-stripping tool only. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add IGNORE_WINDOWS_HEADERS_START/END markers around regions of 4 public ICU headers so the Windows SDK header-stripping tool omits them.
Reworked from ICU 72 form (8/12 hunks landed at new offsets, 2 hand-ported, 2 dropped):
- uchar.h: wraps U_UNICODE_VERSION macro (runtime-variable; SDK consumers should use u_getUnicodeVersion() API instead).
- uconfig.h: wraps uconfig_local.h include hook and UCONFIG_USE_WINDOWS_LCID_MAPPING_API switch (compile-time settings irrelevant to SDK).
- utypes.h: wraps ICUDATA naming scheme constants (Windows OS uses a fixed single-data-file layout).
- uversion.h: wraps U_NAMESPACE_BEGIN/END and C++ namespace plumbing (Windows OS SDK exposes flat C APIs only).
Dropped umachine.h hunks: U_OVERRIDE and U_FINAL macros no longer exist in ICU 78 (upstream removed them in favor of using the C++11 keywords directly). The patch's intent for that file is resolved by upstream.
No C/C++ semantics change; markers are comments consumed only by the Windows SDK header-stripping tool.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
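To make the effect of the markers concrete, here is a minimal sketch of how a stripping pass over a header might behave. The actual Windows SDK header-stripping tool is not part of this PR, so `stripIgnoredRegions` is a hypothetical stand-in, and the assumption that each marker sits on its own comment line is mine.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the header-stripping tool (assumption: each
// marker appears on its own line). Drops every line from a line containing
// IGNORE_WINDOWS_HEADERS_START through the next line containing
// IGNORE_WINDOWS_HEADERS_END, marker lines included.
std::string stripIgnoredRegions(const std::string& header) {
    std::istringstream in(header);
    std::string line;
    std::string out;
    bool skipping = false;
    while (std::getline(in, line)) {
        if (line.find("IGNORE_WINDOWS_HEADERS_START") != std::string::npos) {
            skipping = true;
            continue;
        }
        if (line.find("IGNORE_WINDOWS_HEADERS_END") != std::string::npos) {
            skipping = false;
            continue;
        }
        if (!skipping) {
            out += line;
            out += '\n';
        }
    }
    return out;
}
```

Because the markers are ordinary comments, a normal compiler sees the header unchanged; only a pass like the one above treats them as region boundaries.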
Make u_cleanup() a no-op for the Windows OS ICU build to prevent race-condition crashes when multiple threads (Windows.Globalization, default OS sort, app code) are concurrently using ICU.
- Real implementation renamed to uprv_u_cleanup() (private; combined DLL can still call it; not exported from DEF).
- New public u_cleanup() is a no-op under ICU_DATA_DIR_WINDOWS; otherwise it delegates to uprv_u_cleanup() so public/NuGet consumers retain the original behavior.
Reworked: ICU 78 modernized the function signature from (void) to () and NULL to nullptr; the rework matches the new style.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
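The shape of this change can be sketched as below. It is a standalone approximation, not the ICU source: the functions live in a hypothetical `sketch` namespace, and the `cleanupCalls` counter replaces the real cleanup work so the delegation is observable.

```cpp
namespace sketch {

// Stand-in for the real cleanup work; counts calls so the effect is visible.
int cleanupCalls = 0;

// The original implementation, renamed and kept private to the library.
void uprv_u_cleanup() {
    ++cleanupCalls;
}

// Public entry point. Under ICU_DATA_DIR_WINDOWS it does nothing, so
// concurrent ICU users in the same process are never torn down underneath
// each other; in every other build it delegates to the real implementation.
void u_cleanup() {
#if defined(ICU_DATA_DIR_WINDOWS)
    // Windows OS build: intentionally a no-op.
#else
    uprv_u_cleanup();
#endif
}

}  // namespace sketch
```

Compiled without ICU_DATA_DIR_WINDOWS, a call to `sketch::u_cleanup()` reaches `uprv_u_cleanup()` as before; compiled with it, the call returns immediately.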
…unk 1 only) icu4c/source/data/build.xml: Change CLDR_TMP_DIR from cldr-aux to cldr-staging (MS-ICU pipeline uses cldr-staging as its CLDR temp directory rather than vanilla CLDR's cldr-aux default). Dropped hunk 2 (build-icu-data.xml): target file removed in ICU 73+; the cldr-to-icu data build toolchain is now driven by config.xml + Maven + Cldr2Icu.java rather than that Ant build script. The hunk's intent (forceDelete=true; mvn->mvn.cmd for Windows) does not have a direct landing zone in the new toolchain. If Maven-on-Windows breaks during Step 6 data build, fix at the new invocation point then. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Under ICU_DATA_DIR_WINDOWS, make extendICUData() return early (false). Windows OS ICU has only one data file (versionless icudtl.dat from patch 000) and never has extended data; running the normal extension path would try to load icudt78l.dat on top of the already-loaded common data, creating redundant work or load conflicts. Reworked: ICU 78 modernized FALSE -> false in this file; the new guard matches that style. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two Linux make-dist adjustments:
- DISTY_FILES strips DISTY_DOC_ZIP. MS-ICU does not build Doxygen docs in its release pipeline; the dist target would otherwise fail looking for a non-existent docs.zip.
- git archive path adapted for MS-ICU GitHub layout: cd ../.. (two levels up instead of one) and HEAD:icu/icu4c/ (extra icu/ prefix) because microsoft/icu has its icu4c tree at icu/icu4c/ rather than vanilla ICU's top-level icu4c/.
Reworked: ICU 78 fixed an upstream typo (we watn -> we want) and added testdata/ copy logic to dist.mk (lines 72-73); both are preserved.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…regions Remove test entries for blocked region codes (DG, EA, EH, IC, etc.) from ICU's region tests. MS-ICU strips these codes from data via GeoPol policy; the tests would otherwise fail looking up regions that no longer exist in MS-ICU's region data. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…namicLink_UCRT
Statically link VCRuntime + VCStartup + STL into icuuc.dll / icuin.dll, but keep UCRT (ucrtbase.dll) dynamic. This eliminates the VC Redist dependency for consumers: Windows 10+ ships UCRT, but VCRuntime and STL would otherwise require a manual VC Redist install.
Mechanism (applied to both common.vcxproj and i18n.vcxproj, Debug + Release blocks):
- RuntimeLibrary: MultiThreadedDebugDLL/MultiThreadedDLL -> MultiThreadedDebug/MultiThreaded (compiler switches to static C++ runtime).
- IgnoreSpecificDefaultLibraries=libucrtd.lib;libucrt.lib (linker drops the static UCRT pulled in by /MT[d]).
- /DEFAULTLIB:ucrt[d].lib via AdditionalOptions (force the dynamic UCRT).
Reworked: ICU 78 already uses $(IcuMajorVersion) macro for DLL names (unchanged by this patch). Verified no arch-specific overrides exist (only generic Debug/Release ItemDefinitionGroups), so the fix applies uniformly to x86, x64, and ARM64.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ICU major version number to PDB (debug symbol) filenames so they match the DLL filenames. Prevents PDB filename collision when two ICU versions are deployed side-by-side and lets debuggers correctly correlate symbols across versions. Files updated: common.vcxproj, i18n.vcxproj, stubdata.vcxproj. PDBs become icuuc78.pdb / icuuc78d.pdb, icuin78.pdb / icuin78d.pdb, icudt78.pdb (matching the existing icuuc78.dll / icuuc78d.dll / icuin78.dll / icuin78d.dll / icudt78.dll naming). Reworked: Used $(IcuMajorVersion) MSBuild macro rather than hardcoding 78. This is the same pattern ICU 78's <OutputFile> tags already use in these vcxproj files, so future ICU upgrades won't need to touch these strings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tra_locales Bump STRING_STORE_SIZE from 100000 to 120000 in package.h. The package tool uses this as a static buffer for item names when building the .dat data file; CLDR-MS adds extra locales that overflow the vanilla 100K buffer. Applied verbatim from the patch (120000). If CLDR 48 + MS-CLDR overlay overflows this in the Step 6 data build, bump further (see followup todo). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
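Why a fixed STRING_STORE_SIZE can overflow when more locales are added is easier to see in a small model. The sketch below is not the package tool's code: `StringStore` and `storeName` are hypothetical, and returning nullptr on exhaustion is my simplification rather than a description of what the real tool does.

```cpp
#include <cstddef>
#include <cstring>

// Value taken from the commit message above.
constexpr std::size_t STRING_STORE_SIZE = 120000;

// Hypothetical model of a fixed-capacity store for item names.
struct StringStore {
    char buffer[STRING_STORE_SIZE];
    std::size_t top = 0;  // number of bytes already used
};

// Copies name (including its terminating NUL) into the store and returns a
// pointer to the copy, or nullptr if the remaining capacity is too small.
const char* storeName(StringStore& store, const char* name) {
    const std::size_t length = std::strlen(name) + 1;
    if (length > STRING_STORE_SIZE - store.top) {
        return nullptr;  // store exhausted: too many or too long item names
    }
    char* dest = store.buffer + store.top;
    std::memcpy(dest, name, length);
    store.top += length;
    return dest;
}
```

Every extra locale contributes item names to the same buffer, so the capacity has to grow with the data set; the buffer is sized at compile time, which is why the fix is a constant bump rather than a runtime setting.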
…locale_as_full_BCP47_tag
Add MS-only uprefs library so uprv_getDefaultLocaleID() on Windows
returns a full BCP47 tag (e.g. en-US-u-ca-gregory-hc-h12-fw-mon-ms-metric)
that encodes the user's calendar, currency, hour cycle, first day of
week, sort method, and measurement system from Windows Globalization
APIs. Vanilla ICU only returns language+region.
Files:
- new uprefs.cpp/h (common library, gated on UCONFIG_USE_WINDOWS_PREFERENCES_LIBRARY
and U_PLATFORM_USES_ONLY_WIN32_API)
- new uprefstest.cpp/h
- putil.cpp wires uprefs_getBCP47Tag() into uprv_getDefaultLocaleID();
unifies buffer sizing (POSIX_LOCALE_CAPACITY -> length * 2)
- uconfig.h defines UCONFIG_USE_WINDOWS_PREFERENCES_LIBRARY = 1
- sources.txt lists uprefs.cpp
- common.vcxproj and common_uwp.vcxproj reference uprefs.cpp/h
- test/intltest/ Makefile.in + intltest.vcxproj wire uprefstest.{cpp,h};
itutil.cpp registers UPrefsTest class
Reworked from ICU 72 form:
- 8 hunks applied cleanly via git apply (with offsets only)
- 5 build-system list-insertion hunks reworked manually due to context drift
(ICU 78 added fixedstring.cpp, new test files between the patch's anchor
lines)
- New file uprefstest.cpp uses backup version from ICU 72.1.0.4 (commit
860c2ea by Rahul Pandey, "Add missing parameters to MockGetLocaleInfoEx"
Nov 2022) which contains style/whitespace cleanup not present in the
original 2021 patch file. Patch file in icu-patches/patches/ remains stale
and will be regenerated at end of upgrade.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
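How user preferences end up as a single tag can be sketched as follows. This is not the uprefs implementation: `buildBcp47Tag` is a hypothetical helper, the preference values are passed in directly instead of being read from Windows Globalization APIs, and keywords are appended in the order given rather than in any canonical order.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not uprefs_getBCP47Tag): appends a "-u-" Unicode
// extension to a language-region tag. Each pair is (keyword, value), for
// example ("ca", "gregory"). Pairs with an empty value are skipped, and the
// "-u" singleton is emitted only if at least one keyword is present.
std::string buildBcp47Tag(
        const std::string& languageRegion,
        const std::vector<std::pair<std::string, std::string>>& keywords) {
    std::string tag = languageRegion;
    bool extensionStarted = false;
    for (const auto& keyword : keywords) {
        if (keyword.second.empty()) {
            continue;  // preference not set: leave the keyword out
        }
        if (!extensionStarted) {
            tag += "-u";
            extensionStarted = true;
        }
        tag += "-" + keyword.first + "-" + keyword.second;
    }
    return tag;
}
```

Calling it with "en-US" and the pairs (ca, gregory), (hc, h12), (fw, mon), (ms, metric) reproduces the example tag from the commit message; with no preferences set, the result is the plain language-region tag that vanilla ICU would return.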
Patch 018 originally bumped STRING_STORE_SIZE from 100000 -> 120000 for the CLDR-MS extra-locales overflow at the package-tool stage of data build. CLDR 48 has substantially more locales than CLDR 44 (which 120000 was sized for); prior session evidence (now-deleted branch) suggests 120000 may still overflow at Step 6 data build. Pre-emptively bump to 200000 to avoid a Step 6 rerun on overflow. If the actual measurement at Step 6 shows 120000 was enough, we can revisit post-shipping. Bumping high now is harmless (static array sized at compile time in tool; trivial RAM increase only while makedata runs). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ry_validation The new ICU 78 Cldr2Icu Maven/Java toolchain replaces the deleted build-icu-data.xml Ant entry point. Its CLI-options constructor runs validateEnvironment() and System.exit(1)s unless ICU_DIR contains an icu4j/ subdirectory. The microsoft/icu fork is icu4c-only -- there is no icu4j source tree -- so the Step 6 data-build pipeline cannot run without bypassing this check. The runtime Java dependency on icu4j (used by TransformsMapper for Transliterator) is satisfied via the Maven artifact com.ibm.icu:icu4j in ~/.m2, which is unaffected by source-tree absence. This is an MS-only divergence. Both the patch file and the applied source change land here as a single commit; the patch file documents the divergence in icu-patches/patches/ for future upgrades. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use the Unicode U+20C1 SAUDI RIYAL SIGN for the ar-SA SAR currency symbol and add a C API regression test for the locale-specific override. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Part 2 of the ICU 78 update split.
This PR is stacked on top of Part 1 (#178) and contains the Microsoft patch/code changes bucket only. It applies the MS-specific ICU source and build patches on top of the pure upstream ICU 78 dump, so reviewers can inspect the Microsoft delta separately from the bulk upstream ingestion.
Included here:
Excluded from this PR:
Original source commits from the abandoned combined PR: e9bf197, 4996ac6, acd296c, 55e244b, 7a1ec2c, 94ef83d, 16a6fdd, 88bc9e2, d3fee5b, 466e759, 88d4592, 9ed9c47, f84323b, 8a8adb1, 3e973e8.