Update ICU to 78 - Part 2: Microsoft patch/code changes #179

Open
aoruganti-msft wants to merge 15 commits into user/aoruganti/icu78-upstream-dump from user/aoruganti/icu78-ms-patches

Conversation

@aoruganti-msft
Collaborator

Part 2 of the ICU 78 update split.

This PR is stacked on top of Part 1 (#178) and contains the Microsoft patch/code changes bucket only. It applies the MS-specific ICU source and build patches on top of the pure upstream ICU 78 dump, so reviewers can inspect the Microsoft delta separately from the bulk upstream ingestion.

Included here:

  • Windows datafile naming/header/build behavior patches
  • Windows/Linux build-related ICU patch applications
  • uprefs support for default locale as full BCP47 tag
  • Cldr2Icu directory validation adjustment
  • ar-SA Saudi Riyal symbol override

Excluded from this PR:

  • Regenerated CLDR/ICU data
  • Blocked-content data cleanup
  • Test expectation ports
  • Patch-record refreshes and dev report updates
  • CI/repo plumbing changes

Original source commits from the abandoned combined PR: e9bf197, 4996ac6, acd296c, 55e244b, 7a1ec2c, 94ef83d, 16a6fdd, 88bc9e2, d3fee5b, 466e759, 88d4592, 9ed9c47, f84323b, 8a8adb1, 3e973e8.

Arvind Oruganti and others added 15 commits May 22, 2026 21:46
…ion_number

The Windows OS ICU build uses a versionless data filename (icudtl.dat)
instead of the versioned one (icudt78l.dat), so the filename does not
churn with each upgrade. Guarded by ICU_DATA_DIR_WINDOWS; the public/SDK
build is unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
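
The naming switch described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: `kSketchIcuMajor`, `versionedDataFileName`, and `dataFileName` are hypothetical names, and only the `ICU_DATA_DIR_WINDOWS` guard and the two filenames come from the commit message.

```cpp
#include <string>

// Assumption: stand-in for ICU's major version number, not a real ICU macro.
constexpr int kSketchIcuMajor = 78;

// Versioned name as used by a vanilla build, e.g. icudt78l.dat.
std::string versionedDataFileName(int major) {
    return "icudt" + std::to_string(major) + "l.dat";
}

// Name the build would pick: fixed under ICU_DATA_DIR_WINDOWS,
// versioned otherwise (hypothetical helper).
std::string dataFileName() {
#if defined(ICU_DATA_DIR_WINDOWS)
    return "icudtl.dat";
#else
    return versionedDataFileName(kSketchIcuMajor);
#endif
}
```

With the guard undefined, `dataFileName()` follows the versioned scheme, which is why the public/SDK build keeps its existing filename.
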
Add IGNORE_WINDOWS_HEADERS_START/END markers around regions of putil.h
and unistr.h that should be stripped from the Windows OS SDK header.

putil.h: data-directory + timezone-files-directory + filesystem-separator
constants are not user-mutable in Windows OS ICU.

unistr.h: UStringCaseMapper internal callback typedef is meaningless to
SDK consumers that don't expose C++ UnicodeString.

No C/C++ semantics change; markers are pure comments interpreted by the
Windows SDK header-stripping tool only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add IGNORE_WINDOWS_HEADERS_START/END markers around regions of 4 public
ICU headers so the Windows SDK header-stripping tool omits them.

Reworked from ICU 72 form (8/12 hunks landed at new offsets, 2 hand-ported,
2 dropped):

- uchar.h: wraps U_UNICODE_VERSION macro (runtime-variable; SDK consumers
  should use u_getUnicodeVersion() API instead).
- uconfig.h: wraps uconfig_local.h include hook and
  UCONFIG_USE_WINDOWS_LCID_MAPPING_API switch (compile-time settings
  irrelevant to SDK).
- utypes.h: wraps ICUDATA naming scheme constants (Windows OS uses a
  fixed single-data-file layout).
- uversion.h: wraps U_NAMESPACE_BEGIN/END and C++ namespace plumbing
  (Windows OS SDK exposes flat C APIs only).

Dropped umachine.h hunks: U_OVERRIDE and U_FINAL macros no longer exist
in ICU 78 (upstream removed them in favor of using the C++11 keywords
directly). The patch's intent for that file is resolved by upstream.

No C/C++ semantics change; markers are comments consumed only by the
Windows SDK header-stripping tool.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
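
The Windows SDK header-stripping tool is not part of this PR, so the following is only a guess at how marker-based stripping could behave. It assumes each marker sits on its own comment line and that marker lines are dropped along with the region between them; `stripIgnoredRegions` is a hypothetical name.

```cpp
#include <sstream>
#include <string>

// Hypothetical stripper: drops every line from a line containing
// IGNORE_WINDOWS_HEADERS_START through the next line containing
// IGNORE_WINDOWS_HEADERS_END, marker lines included.
std::string stripIgnoredRegions(const std::string& header) {
    std::istringstream in(header);
    std::string line;
    std::string out;
    bool skipping = false;
    while (std::getline(in, line)) {
        if (line.find("IGNORE_WINDOWS_HEADERS_START") != std::string::npos) {
            skipping = true;
            continue;
        }
        if (line.find("IGNORE_WINDOWS_HEADERS_END") != std::string::npos) {
            skipping = false;
            continue;
        }
        if (!skipping) {
            out += line;
            out += '\n';
        }
    }
    return out;
}
```

Because the markers are comments, a compiler sees the unstripped header unchanged, which matches the "no C/C++ semantics change" note in both commits.
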
Make u_cleanup() a no-op for the Windows OS ICU build to prevent
race-condition crashes when multiple threads (Windows.Globalization,
default OS sort, app code) are concurrently using ICU.

- Real implementation renamed to uprv_u_cleanup() (private; combined DLL
  can still call it; not exported from the DEF file).
- New public u_cleanup() is a no-op under ICU_DATA_DIR_WINDOWS;
  otherwise it delegates to uprv_u_cleanup() so public/NuGet consumers
  retain the original behavior.

Reworked: ICU 78 modernized the function signature from (void) to ()
and NULL to nullptr; the rework matches the new style.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
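
The split between a private teardown and a public wrapper can be sketched as below. All `sketch_` names and the counter are hypothetical stand-ins; the actual patch guards the function body with the preprocessor directly, whereas this sketch folds the guard into a constant so both paths stay visible.

```cpp
// Assumption: mirrors the ICU_DATA_DIR_WINDOWS build switch as a constant.
#if defined(ICU_DATA_DIR_WINDOWS)
constexpr bool kWindowsOsBuild = true;
#else
constexpr bool kWindowsOsBuild = false;
#endif

// Counts how often the real teardown ran (illustration only).
int g_teardownCount = 0;

// Plays the role of uprv_u_cleanup(): the real teardown, kept private.
void sketch_uprv_u_cleanup() {
    ++g_teardownCount;
}

// Plays the role of the public u_cleanup(): a no-op in the OS build,
// a plain delegate everywhere else.
void sketch_u_cleanup() {
    if (!kWindowsOsBuild) {
        sketch_uprv_u_cleanup();
    }
}
```

Callers that still invoke the public entry point keep compiling and linking; in the OS build the call simply has no effect, so it cannot tear down state that other threads are using.
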
…unk 1 only)

icu4c/source/data/build.xml: Change CLDR_TMP_DIR from cldr-aux to
cldr-staging (MS-ICU pipeline uses cldr-staging as its CLDR temp
directory rather than vanilla CLDR's cldr-aux default).

Dropped hunk 2 (build-icu-data.xml): target file removed in ICU 73+;
the cldr-to-icu data build toolchain is now driven by config.xml +
Maven + Cldr2Icu.java rather than that Ant build script. The hunk's
intent (forceDelete=true; mvn->mvn.cmd for Windows) does not have a
direct landing zone in the new toolchain. If Maven-on-Windows breaks
during Step 6 data build, fix at the new invocation point then.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Under ICU_DATA_DIR_WINDOWS, make extendICUData() return early (false).
Windows OS ICU has only one data file (versionless icudtl.dat from
patch 000) and never has extended data; running the normal extension
path would try to load icudt78l.dat on top of the already-loaded common
data, creating redundant work or load conflicts.

Reworked: ICU 78 modernized FALSE -> false in this file; the new guard
matches that style.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two Linux make-dist adjustments:

- DISTY_FILES strips DISTY_DOC_ZIP. MS-ICU does not build Doxygen docs
  in its release pipeline; the dist target would otherwise fail looking
  for a non-existent docs.zip.

- git archive path adapted for MS-ICU GitHub layout: cd ../.. (two
  levels up instead of one) and HEAD:icu/icu4c/ (extra icu/ prefix)
  because microsoft/icu has its icu4c tree at icu/icu4c/ rather than
  vanilla ICU's top-level icu4c/.

Reworked: ICU 78 fixed an upstream typo (we watn -> we want) and added
testdata/ copy logic to dist.mk (lines 72-73); both are preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…regions

Remove test entries for blocked region codes (DG, EA, EH, IC, etc.)
from ICU's region tests. MS-ICU strips these codes from data via GeoPol
policy; the tests would otherwise fail looking up regions that no longer
exist in MS-ICU's region data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…namicLink_UCRT

Statically link VCRuntime + VCStartup + STL into icuuc.dll / icuin.dll,
but keep UCRT (ucrtbase.dll) dynamic. This eliminates the VC Redist
dependency for consumers — Windows 10+ ships UCRT, but VCRuntime and
STL would otherwise require manual VC Redist install.

Mechanism (applied to both common.vcxproj and i18n.vcxproj, Debug +
Release blocks):
- RuntimeLibrary: MultiThreadedDebugDLL/MultiThreadedDLL ->
  MultiThreadedDebug/MultiThreaded (compiler switches to static C++
  runtime).
- IgnoreSpecificDefaultLibraries=libucrtd.lib;libucrt.lib (linker drops
  the static UCRT pulled in by /MT[d]).
- /DEFAULTLIB:ucrt[d].lib via AdditionalOptions (force the dynamic UCRT).

Reworked: ICU 78 already uses $(IcuMajorVersion) macro for DLL names
(unchanged by this patch). Verified no arch-specific overrides exist —
only generic Debug/Release ItemDefinitionGroups — so the fix applies
uniformly to x86, x64, and ARM64.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ICU major version number to PDB (debug symbol) filenames so they
match the DLL filenames. Prevents PDB filename collision when two ICU
versions are deployed side-by-side and lets debuggers correctly
correlate symbols across versions.

Files updated: common.vcxproj, i18n.vcxproj, stubdata.vcxproj.
PDBs become icuuc78.pdb / icuuc78d.pdb, icuin78.pdb / icuin78d.pdb,
icudt78.pdb (matching the existing icuuc78.dll / icuuc78d.dll /
icuin78.dll / icuin78d.dll / icudt78.dll naming).

Reworked: Used $(IcuMajorVersion) MSBuild macro rather than hardcoding
78. This is the same pattern ICU 78's <OutputFile> tags already use
in these vcxproj files, so future ICU upgrades won't need to touch
these strings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tra_locales

Bump STRING_STORE_SIZE from 100000 to 120000 in package.h. The package
tool uses this as a static buffer for item names when building the .dat
data file; CLDR-MS adds extra locales that overflow the vanilla 100K
buffer.

Applied verbatim from the patch (120000). If CLDR 48 + MS-CLDR overlay
overflows this in the Step 6 data build, bump further (see followup
todo).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
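
To illustrate why a fixed-size name buffer overflows once enough locales are added, here is a minimal sketch of a bounded string store. `NameStore` is hypothetical and uses a fixed-capacity vector rather than the static array the commit describes; it is not the package tool's code.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical fixed-capacity store for NUL-terminated item names.
class NameStore {
public:
    explicit NameStore(std::size_t capacity) : buf_(capacity), top_(0) {}

    // Copies name plus its terminating NUL into the store.
    // Returns nullptr when the remaining capacity is too small.
    const char* add(const char* name) {
        std::size_t need = std::strlen(name) + 1;
        if (need > buf_.size() - top_) {
            return nullptr;
        }
        char* dest = buf_.data() + top_;
        std::memcpy(dest, name, need);
        top_ += need;
        return dest;
    }

    // Number of bytes consumed so far.
    std::size_t used() const { return top_; }

private:
    std::vector<char> buf_;  // never resized, so returned pointers stay valid
    std::size_t top_;
};
```

Every item name costs its length plus one byte, so the total grows linearly with the number of data items; raising the capacity only moves the point at which `add` would fail.
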
…locale_as_full_BCP47_tag

Add MS-only uprefs library so uprv_getDefaultLocaleID() on Windows
returns a full BCP47 tag (e.g. en-US-u-ca-gregory-hc-h12-fw-mon-ms-metric)
that encodes the user's calendar, currency, hour cycle, first day of
week, sort method, and measurement system from Windows Globalization
APIs. Vanilla ICU only returns language+region.

Files:
- new uprefs.cpp/h    (common library, gated on UCONFIG_USE_WINDOWS_PREFERENCES_LIBRARY
                        and U_PLATFORM_USES_ONLY_WIN32_API)
- new uprefstest.cpp/h
- putil.cpp           wires uprefs_getBCP47Tag() into uprv_getDefaultLocaleID();
                      unifies buffer sizing (POSIX_LOCALE_CAPACITY -> length * 2)
- uconfig.h           defines UCONFIG_USE_WINDOWS_PREFERENCES_LIBRARY = 1
- sources.txt         lists uprefs.cpp
- common.vcxproj
- common_uwp.vcxproj  reference uprefs.cpp/h
- test/intltest/      Makefile.in + intltest.vcxproj wire uprefstest.{cpp,h};
                      itutil.cpp registers UPrefsTest class

Reworked from ICU 72 form:
- 8 hunks applied cleanly via git apply (with offsets only)
- 5 build-system list-insertion hunks reworked manually due to context drift
  (ICU 78 added fixedstring.cpp, new test files between the patch's anchor
  lines)
- New file uprefstest.cpp uses backup version from ICU 72.1.0.4 (commit
  860c2ea by Rahul Pandey, "Add missing parameters to MockGetLocaleInfoEx"
  Nov 2022) which contains style/whitespace cleanup not present in the
  original 2021 patch file. Patch file in icu-patches/patches/ remains stale
  and will be regenerated at end of upgrade.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
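
The shape of the tag returned by the uprefs path can be illustrated with a small string-building sketch. `buildTag` is a hypothetical helper, not the uprefs API, and it performs no validation or canonicalization; it only shows how a base tag and `-u-` extension keywords combine into a string like the example in the commit message.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: joins a base language tag with Unicode extension
// keywords in the order given, skipping keywords whose value is empty.
std::string buildTag(
    const std::string& base,
    const std::vector<std::pair<std::string, std::string>>& keywords) {
    std::string ext;
    for (const auto& kw : keywords) {
        if (kw.second.empty()) {
            continue;  // preference not available, omit the keyword
        }
        ext += "-" + kw.first + "-" + kw.second;
    }
    // Without any keywords the result is just language+region.
    return ext.empty() ? base : base + "-u" + ext;
}
```

Passing `en-US` with the keywords `ca=gregory`, `hc=h12`, `fw=mon`, `ms=metric` in that order reproduces the example tag from the commit message, while an empty keyword list yields the plain language+region form.
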
Patch 018 originally bumped this from 100000 -> 120000 for the CLDR-MS
extra-locales overflow at the package-tool stage of data build. CLDR 48
has substantially more locales than CLDR 44 (which 120000 was sized for);
prior session evidence (now-deleted branch) suggests 120000 may still
overflow at Step 6 data build.

Pre-emptively bump to 200000 to avoid a Step 6 rerun on overflow. If the
actual measurement at Step 6 shows 120000 was enough, we can revisit
post-shipping. Bumping high now is harmless (static array sized at
compile time in tool; trivial RAM increase only while makedata runs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ry_validation

The new ICU 78 Cldr2Icu Maven/Java toolchain replaces the deleted
build-icu-data.xml Ant entry point. Its CLI-options constructor runs
validateEnvironment() and calls System.exit(1) unless ICU_DIR contains an
icu4j/ subdirectory. The microsoft/icu fork is icu4c-only -- there is
no icu4j source tree -- so the Step 6 data-build pipeline cannot run
without bypassing this check.

The runtime Java dependency on icu4j (used by TransformsMapper for
Transliterator) is satisfied via the Maven artifact com.ibm.icu:icu4j
in ~/.m2, which is unaffected by source-tree absence.

This is an MS-only divergence. Both the patch file and the applied
source change land here as a single commit; the patch file documents
the divergence in icu-patches/patches/ for future upgrades.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use the Unicode U+20C1 SAUDI RIYAL SIGN for the ar-SA SAR currency symbol and add a C API regression test for the locale-specific override.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
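
The idea of a locale-specific symbol override can be sketched as a lookup keyed by locale and currency code. The table and `currencySymbol` are hypothetical; in the PR the override lives in locale data, not in code, and the fallback to the ISO code here is a simplification rather than how inherited symbols are actually resolved.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical override table keyed by (locale, ISO 4217 code).
// U+20C1 is written as its UTF-8 bytes E2 83 81.
std::string currencySymbol(const std::string& locale,
                           const std::string& isoCode) {
    static const std::map<std::pair<std::string, std::string>, std::string>
        overrides = {
            {{"ar-SA", "SAR"}, "\xE2\x83\x81"},
        };
    auto it = overrides.find({locale, isoCode});
    // Simplification: fall back to the ISO code when no override exists.
    return it != overrides.end() ? it->second : isoCode;
}
```

A regression test along the lines the commit mentions would check that the ar-SA lookup yields U+20C1 and that other locales are unaffected by the override.
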