CharacterSet: Memory-related refinements #2026

Closed
chloe-yeo wants to merge 32 commits into swiftlang:release/6.4.x from chloe-yeo:pr/characterSet-changes

Conversation

@chloe-yeo

Contributor

This PR introduces a method that fills a buffer for a specified plane of a BuiltInUnicodeScalarSet. With it, mutating methods in CharacterSet such as union and intersection no longer instantiate an 8KB Data that is thrown away immediately after it has been used to mutate another buffer. The new method is functionally equivalent to the previous bitmap(forPlane:isInverted:) method.

Motivation:

This change reduces the memory usage of CharacterSet when filling the annex planes of built-in CharacterSets.

Modifications:

Added a bitmap(forPlane:isInverted:into:apply:) method and refactored bitmap(forPlane:isInverted:) to call into it, avoiding duplicate logic.
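
A minimal sketch of the fill-in-place pattern described above, not the PR's actual code. The diff is not shown here, so `fillPlaneBitmap`, `planeBitmap`, `planeBitmapSize`, the `contains` closure, and the shape of `apply` are all hypothetical; only the idea of combining a plane's bits directly into existing storage, and expressing the allocating form through the in-place one, comes from the description.

```swift
// Hypothetical sketch: names and signatures are illustrative, not Foundation API.
let planeBitmapSize = 8192  // 65,536 scalars per plane / 8 bits per byte = 8 KB

/// Combines one plane's membership bits into `destination` using `apply`,
/// without building an intermediate 8 KB bitmap first.
func fillPlaneBitmap(
    isInverted: Bool,
    into destination: UnsafeMutableBufferPointer<UInt8>,
    contains: (UInt16) -> Bool,
    apply: (UInt8, UInt8) -> UInt8
) {
    precondition(destination.count == planeBitmapSize)
    for byteIndex in 0..<planeBitmapSize {
        var byte: UInt8 = 0
        for bit in 0..<8 {
            let scalar = UInt16(byteIndex * 8 + bit)
            // A bit is set when membership differs from the inversion flag.
            if contains(scalar) != isInverted {
                byte |= UInt8(1) << bit
            }
        }
        destination[byteIndex] = apply(destination[byteIndex], byte)
    }
}

/// The allocating form, expressed through the in-place one to avoid duplicate logic.
func planeBitmap(isInverted: Bool, contains: (UInt16) -> Bool) -> [UInt8] {
    var result = [UInt8](repeating: 0, count: planeBitmapSize)
    result.withUnsafeMutableBufferPointer { buffer in
        fillPlaneBitmap(isInverted: isInverted, into: buffer, contains: contains) { _, new in new }
    }
    return result
}

// Union into an existing bitmap without allocating a temporary.
var accumulated = [UInt8](repeating: 0, count: planeBitmapSize)
accumulated[0] = 0b0000_0001  // scalar 0 is already a member
accumulated.withUnsafeMutableBufferPointer { buffer in
    fillPlaneBitmap(isInverted: false, into: buffer,
                    contains: { $0 == 1 }, apply: { $0 | $1 })
}
```

In this sketch an intersection would pass `{ $0 & $1 }` as `apply`; whether the real method takes the operation in this form is an assumption.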

Result:

There should be no behavioral change.

Testing:

All the existing unit tests should still pass.

jmschonfeld and others added 30 commits May 4, 2026 15:44
…iftlang#1944)

Co-authored-by: Thomas Krajacic <tkrajacic@users.noreply.github.com>
…-2026-05-06_09-47

Merge `release/6.3` into `main`
…-2026-05-11_10-14

Merge `release/6.4.x` into `main`
…-2026-05-13_09-54

Merge `release/6.4.x` into `main`
…-2026-05-14_09-47

Merge `release/6.4.x` into `main`
…-2026-05-15_09-54

Merge `release/6.4.x` into `main`
* Decimal: stop reading past endIndex when matching a multi-byte separator.

Prior to this patch, `stringViewContainsDecimalSeparator` walked `0..<decimalSeparator.count` and indexed into `utf8View` for each offset, but the caller only checked that the index of the first byte was inside the input.  With a separator of length > 1 and an input that happened to match the separator's first bytes but ended before the full separator, the loop walked past the input's `endIndex` and crashed:

```swift
Decimal._decimal(from: "1,".utf8,
                 decimalSeparator: ",,".utf8,
                 matchEntireString: false)
// Fatal error: String index is out of bounds
```

Now, it walks both views with `formIndex(after:)` and returns `false` as soon as the input is exhausted.
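
A minimal sketch of the lockstep walk described above, assuming generic byte collections. `matchesSeparator` is a hypothetical helper and not the body of `stringViewContainsDecimalSeparator`.

```swift
/// Hypothetical helper: true when `separator` occurs in `input` starting at `start`.
func matchesSeparator<Input: Collection, Separator: Collection>(
    _ separator: Separator,
    in input: Input,
    at start: Input.Index
) -> Bool where Input.Element == UInt8, Separator.Element == UInt8 {
    var inputIndex = start
    var separatorIndex = separator.startIndex
    while separatorIndex != separator.endIndex {
        // Stop as soon as the input is exhausted instead of indexing past endIndex.
        guard inputIndex != input.endIndex else { return false }
        guard input[inputIndex] == separator[separatorIndex] else { return false }
        input.formIndex(after: &inputIndex)
        separator.formIndex(after: &separatorIndex)
    }
    return true
}

let input = "1,".utf8
let commaIndex = input.index(after: input.startIndex)
// The input ends before the two-byte separator is complete: no match, no trap.
let truncatedMatch = matchesSeparator(",,".utf8, in: input, at: commaIndex)
// A one-byte separator fits entirely inside the input.
let completeMatch = matchesSeparator(",".utf8, in: input, at: commaIndex)
```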

`Decimal._decimal` is exposed via `@_spi(SwiftCorelibsFoundation)`, so this is reachable from outside the module.

* Removed comment from the new `decimalParseTruncatedMultiByteSeparator` test.
…lang#1956)

* Move URL.FormatStyle and URL.ParseStrategy to swift-foundation now that URLComponents is available

1. Move URLFormatStyle.swift, URLParseStrategy.swift, and URL+UnicodeLookalikeTable.swift into FoundationInternationalization.

2. Migrate XCTest-based tests to Swift Testing.

Credit: @iCharlesHu

* Fix cmake build
…-2026-05-20_10-07

Merge `release/6.4.x` into `main`
…-2026-05-21_10-15

Merge `release/6.4.x` into `main`
* Add workflow step to validate CMake file lists

* Update CMakeLists.txt file lists
…ftlang#1990) (swiftlang#1998)

Co-authored-by: Tina L <49205802+itingliu@users.noreply.github.com>
…lang#1996)

The KeyedDecodingContainer.decodeIfPresent(_:forKey:configuration:) overloads
only checked contains(key) before delegating to decode, so a present key with
an explicit null value was forwarded to the configuration's decoder instead of
being treated as nil. That threw a valueNotFound error rather than returning
nil, which is inconsistent with the standard library's decodeIfPresent, with
the UnkeyedDecodingContainer configuration overloads, and with the
@CodableConfiguration property wrapper, all of which treat null as nil.

Add the missing decodeNil(forKey:) check so a null value yields nil.
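
The check order can be sketched with a plain `Decodable` analogue. `decodeIfPresentTreatingNullAsNil` and `Payload` are hypothetical names used for illustration; the actual fix applies to the configuration overloads, which are not reproduced here.

```swift
import Foundation

extension KeyedDecodingContainer {
    /// Hypothetical helper mirroring the described order: contains, then decodeNil, then decode.
    func decodeIfPresentTreatingNullAsNil<T: Decodable>(
        _ type: T.Type,
        forKey key: Key
    ) throws -> T? {
        guard contains(key) else { return nil }               // key absent
        guard try !decodeNil(forKey: key) else { return nil } // key present, value is null
        return try decode(type, forKey: key)
    }
}

struct Payload: Decodable {
    let count: Int?

    enum CodingKeys: String, CodingKey { case count }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        count = try container.decodeIfPresentTreatingNullAsNil(Int.self, forKey: .count)
    }
}

// An explicit null decodes to nil instead of throwing valueNotFound.
let nullPayload = try JSONDecoder().decode(Payload.self, from: Data(#"{"count": null}"#.utf8))
let missingPayload = try JSONDecoder().decode(Payload.self, from: Data("{}".utf8))
let presentPayload = try JSONDecoder().decode(Payload.self, from: Data(#"{"count": 3}"#.utf8))
```

Dropping the `decodeNil(forKey:)` guard from this helper would forward the null to `decode`, which is the failure mode the commit message describes.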
* Create _FoundationInternationalizationData library

* Fix FOUNDATION_FRAMEWORK build failure

* Fix build failure

* Introduce copy of _CShimsMacros.h
The internal `Storage` enum's range-replacement subscript-setter for the
`.pair` case enumerated `(0, 0)`, `(0, 1)`, `(0, 2)`, `(1, 2)`, `(2, 2)` and trapped in the default branch, but it never handled `(1, 1)`.  Prior to this patch, inserting at the interior position of a two-element IndexPath via `path[1..<1] = newValue` crashed with `Fatal error: Range 1..<1 is out of bounds of count 2`, even though the analogous `(1, 1)` case is supported for `.single` and `.array`.

`Storage` is a private size-bucketed representation of one logical thing — an `Array<Int>`.  The `.single`/`.pair` cases are just compact spellings for arrays of length 1 and 2; the `.array` case is the general implementation.  Whatever the public `IndexPath` API exposes has to behave identically regardless of which bucket happens to hold the data.

Both `.single` and `.array` already handle the `(1, 1)` case; the latter does so because it handles every valid range, including interior insertions like `(1, 1)` on a 2-element path, via the generic `removeSubrange` + `insert` path.  That means a 2-element `IndexPath` stored as `.array([a, b])` already accepts `path[1..<1] = newValue`.

So this fix isn't changing semantics — it's making the `.pair` shortcut match the behaviour the `.array` fallback was already producing for the same input.
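
A toy model of that argument, assuming a two-element bucket represented by its two values. `replacingInPair` is a hypothetical stand-in for the `.pair` branch and not the real `IndexPath.Storage` code; it is checked against `Array.replaceSubrange` for every valid range on a count of 2.

```swift
/// Hypothetical stand-in for the `.pair` branch of a size-bucketed storage.
func replacingInPair(_ a: Int, _ b: Int, range: Range<Int>, with new: [Int]) -> [Int] {
    switch (range.lowerBound, range.upperBound) {
    case (0, 0): return new + [a, b]
    case (0, 1): return new + [b]
    case (0, 2): return new
    case (1, 1): return [a] + new + [b]  // interior insertion, the case described as missing
    case (1, 2): return [a] + new
    case (2, 2): return [a, b] + new
    default: fatalError("Range \(range) is out of bounds of count 2")
    }
}

// Every valid range on two elements should agree with the general array path.
let validRanges = [0..<0, 0..<1, 0..<2, 1..<1, 1..<2, 2..<2]
let pairMatchesArray = validRanges.allSatisfy { range in
    var reference = [10, 20]
    reference.replaceSubrange(range, with: [99])
    return replacingInPair(10, 20, range: range, with: [99]) == reference
}
```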
* Add pure-Swift _CalendarHebrew + parity suites

* Expand Hebrew calendar parity coverage; fix 7 bugs at edge cases

  Adds ~300 new probe dates across 13 edge-case topics (year-length regimes,
  Cheshvan/Kislev boundaries, full Metonic cycle, RH postponement, Adar I↔II,
  year/month boundaries, holidays, time-of-day, far past/future, week-of-year
  wrap, DST timezones, locale variations) plus a 64-case DST policy parity
  test against _CalendarGregorian.

  Fixes:
  1. ordinality(.weekday, in: .weekOfYear) ignored firstWeekday
  2. dateInterval(.yearForWeekOfYear).duration used Hebrew calendar year
  3. R&D post-hoc dehiyot disagreed with swift-foundation-icu's chained form
  4. dateComponents extraction policy (use secondsFromGMT for UTC→local)
  5. dateInterval(.hour/.minute/.second) re-construction policy
  6. dateComponents(_:from:to:) iteration: cumulative, not iterative
  7. utcDate must drop skippedTimePolicy at TZ-offset query (matches Gregorian)

  Hebcal regression skips a documented 385-day window at Hebrew year 5806
  where Hebcal (standard R&D) disagrees with swift-foundation-icu's chained
  dehiyot algorithm; PARITY mandates ICU as authoritative.

* Add bugs swiftlang#8-swiftlang#9 fixes, expand Suite B to ~300 dates, move DST policy tests

  - Bug swiftlang#8: split multi-field date(byAdding:) into sequential year-then-month
    operations (matches _CalendarGregorian's per-field decompose-adjust-clamp).
    Kislev 30 + 1 year landing in a deficient year now clamps to 29 before
    the month-add runs.
  - Bug swiftlang#9: nanosecond extraction simplified — single fractional subtraction
    matching _CalendarGregorian's truncation, replacing chained subtractions
    with FP rounding error.
  - Suite B expanded to ~300 dates across 11 topic-specific tests, mirroring
    Suite A's coverage through the public Calendar API.
  - Moved utcDate_allPolicyCombinations_matchGregorian and
    date_from_hebrewVsGregorian_atDSTBoundaries from
    FoundationEssentialsTests/HebrewCalendarTests.swift to
    FoundationInternationalizationTests/HebrewDSTPolicyParityTests.swift.
    IANA TimeZone identifiers require _TimeZoneICU (linked via dynamic
    replacement from FoundationInternationalization), so these tests
    silently failed in the Essentials target.
  - Added _CalendarHebrew.nextDate(after:matching:) as a proof-of-concept
    fast-path for {month, day} patterns. Not wired into _CalendarProtocol;
    Calendar.enumerateDates dispatches through Calendar_Enumerate.swift's
    generic framework which has no way to reach it.

* Add fast-path nextDate(after:matching:direction:) protocol method

  Adds an optional fast-path on _CalendarProtocol that allows calendar
  implementations to answer Calendar.nextDate / Calendar.enumerateDates
  directly when they can compute the target in O(1), bypassing the
  generic month-loop in Calendar_Enumerate.swift.

  The default protocol extension returns nil, so all existing calendars
  (_CalendarICU, _CalendarGregorian, _CalendarChinese, etc.) continue
  using the existing framework path unchanged. Only _CalendarHebrew
  opts in.

  Wiring (Calendar.swift):
  - Calendar.nextDate(after:matching:matchingPolicy:repeatedTimePolicy:direction:)
    consults _calendar.nextDate(...) when policies are at their defaults
    (.nextTime + .first), falling through to enumerateDates otherwise.
  - Calendar.enumerateDates(...) does the same: if the calendar can answer
    the first match, drive the block via repeated nextDate calls; else
    use the generic framework.

  Hebrew fast paths (Calendar_Hebrew.swift) — recognized patterns:
  1. {month, day, h?, m?, s?, ns?} — annual recurrence (e.g. Hanukkah,
     Passover; with optional time-of-day preserved).
  2. {month, h?, m?, s?, ns?} — month-only (treated as day=1).
  3. {day, h?, m?, s?, ns?} — month-walking (e.g. Rosh Chodesh, the 1st
     of every Hebrew month).
  4. {weekday, h?, m?, s?, ns?} — weekday RD-modular arithmetic. Same-
     weekday inputs always step a full ±7 days to match ICU's nextWeekend
     semantics; we don't try to be clever about same-day-with-later-time.

  Any other component combination (era, year, weekdayOrdinal, weekOf*,
  yearForWeekOfYear, dayOfYear, mixed weekday+month) returns nil and
  falls through.
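
  The dispatch shape and the weekday step can be sketched in miniature. This is a hypothetical model, not the code in Calendar.swift or Calendar_Hebrew.swift: days are plain integers, weekdays are `day mod 7`, only the forward direction is shown, and `CalendarBackend`, `fastNextDay`, and `nextDay` are invented names.

```swift
// Hypothetical miniature of an optional fast path with a generic fallback.
protocol CalendarBackend {
    func weekday(of day: Int) -> Int
    /// Returns an answer when the backend can compute it directly, or nil to defer.
    func fastNextDay(after day: Int, onWeekday target: Int) -> Int?
}

extension CalendarBackend {
    func weekday(of day: Int) -> Int { ((day % 7) + 7) % 7 }
    // Default: no fast path, so existing backends keep the generic route.
    func fastNextDay(after day: Int, onWeekday target: Int) -> Int? { nil }
}

struct GenericBackend: CalendarBackend {}

struct FastBackend: CalendarBackend {
    func fastNextDay(after day: Int, onWeekday target: Int) -> Int? {
        let delta = ((target - weekday(of: day)) % 7 + 7) % 7
        // A same-weekday input steps a full 7 days rather than returning the input day.
        return day + (delta == 0 ? 7 : delta)
    }
}

func nextDay<B: CalendarBackend>(
    after day: Int,
    onWeekday target: Int,
    using backend: B,
    policiesAreDefault: Bool = true
) -> Int {
    precondition((0..<7).contains(target))
    // Consult the fast path only under default policies; otherwise fall through.
    if policiesAreDefault, let answer = backend.fastNextDay(after: day, onWeekday: target) {
        return answer
    }
    var candidate = day + 1  // generic search, one day at a time
    while backend.weekday(of: candidate) != target { candidate += 1 }
    return candidate
}

let genericAnswer = nextDay(after: 10, onWeekday: 3, using: GenericBackend())
let fastAnswer = nextDay(after: 10, onWeekday: 3, using: FastBackend())
```

  Because the default implementation returns nil, `GenericBackend` needs no changes to keep working, which is the property the commit message relies on for the existing calendars.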

  Performance (Intel iMac, debug, GMT, _CalendarHebrew vs _CalendarICU
  through public Calendar.enumerateDates):

    {m,d}        Hanukkah        1,447 µs/match → 2 µs   (723×)
    {m,d,h,m,s}  Hanukkah 18:30  1,910 µs/match → 2 µs   (955×)
    {day:1}      Rosh Chodesh    1,570 µs/match → 2 µs   (785×)
    {month:1}    Tishri 1          186 µs/match → 2 µs    (93×)
    {weekday:7}  Saturdays         432 µs/match → 1 µs   (432×)

  Correctness verified against ICU's enumerateDates as ground truth:
  9 patterns × 50–100 matches each, 0 divergences. Full Hebrew suite
  (49 tests across 7 suites) passes; 1,386/1,386 full Foundation tests
  pass.

* Extend Hebrew fast-path to {month, weekday, weekdayOrdinal} + cache YearData

  - Recognize {m, wd, wdOrd} pattern in _CalendarHebrew.nextDate (e.g. "4th
    Thursday of November"). O(1) Hebrew arithmetic per candidate year, with
    year iteration to honor strict-after-input, leap-only Adar I, and
    out-of-range ordinals. Negative ordinals fall through to the generic
    framework to match ICU's enumerate contract.

  - Add single-slot YearData cache in HebrewArithmetic. Route call sites
    through it; skip caching inside hebrewFromFixed's year-approximation
    loop (would thrash a 1-slot cache).
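
  A single-slot cache and the thrashing concern can be sketched as follows. `YearInfo`, `SingleSlotCache`, and the placeholder year length are hypothetical and are not the `YearData` type from HebrewArithmetic; the sketch is also not thread-safe, whereas the real code is described elsewhere in this PR as using a `Mutex`.

```swift
// Hypothetical single-slot memo; the value stored per year is a placeholder.
struct YearInfo: Equatable {
    let year: Int
    let lengthInDays: Int
}

struct SingleSlotCache {
    private var slot: YearInfo?
    private(set) var computeCount = 0

    mutating func yearInfo(for year: Int, compute: (Int) -> YearInfo) -> YearInfo {
        // Hit: the one remembered year is the one being asked for.
        if let cached = slot, cached.year == year { return cached }
        // Miss: compute and overwrite the single slot.
        computeCount += 1
        let fresh = compute(year)
        slot = fresh
        return fresh
    }
}

var cache = SingleSlotCache()
let compute: (Int) -> YearInfo = { YearInfo(year: $0, lengthInDays: 354) }  // placeholder length

// Repeated queries for one year compute once.
for _ in 0..<3 { _ = cache.yearInfo(for: 5786, compute: compute) }
let computesAfterRepeats = cache.computeCount

// Alternating years evict each other on every call, which is why a loop
// that probes neighbouring years gains nothing from a one-slot cache.
for year in [5785, 5786, 5785, 5786] { _ = cache.yearInfo(for: year, compute: compute) }
let computesAfterAlternating = cache.computeCount
```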

* Update Calendar_Hebrew for Swift 6.4: LockedState → Mutex, fix warning

* Remove diagnostic probe files and icu4swift references

* Address PR swiftlang#1953 review feedback: remove unused imports, fix DST tests, add benchmarks

* Address PR swiftlang#1953 review feedback: trim verbose comments, add TODO for shared weekend logic

* Fix missing `floor` and add a feature flag

* Move ICU-dependent Hebrew calendar tests to FoundationInternationalizationTests

FoundationEssentialsTests does not link FoundationInternationalization, so
Calendar(identifier: .hebrew) resolves to _calendarICUClass() returning nil
there, causing a SIGSEGV. Move the three tests that use Calendar(identifier:
.hebrew) to a new HebrewCalendarICUTests.swift in
FoundationInternationalizationTests where the ICU backing is available.

---------

Co-authored-by: Tina Liu <tinaliu@apple.com>
* Add a CONTRIBUTION_GUIDELINE.md for coding style and testing practices that we use in this repo.

* Add mention of exit testing
* Reapply "Add Swift Hebrew calendar implementation  (swiftlang#1953)" (swiftlang#2015)

This reverts commit c15f44a.

* Add Hebrew Calendar file to cmakelist
…-2026-06-03_10-34

Merge `release/6.4.x` into `main`
…-2026-06-04_10-18

Merge `release/6.4.x` into `main`
chloe-yeo added 2 commits June 9, 2026 10:51
cleanup 2

clean up 3

clean up 3

cleanup 4

cleanup 4

cleanup

add comments + cleanup

cleanup SetAlgebra file

add comment about safety of immortal pointer

refactor to avoid logic duplication

add fatalError

add check for cachedBMP

explicitly copy Data for built-in sets Data that was initialized by bytesNoCopy

streamline bitmapAll & bitmapEmpty cases handling + callsites

cleanup

remove conditional checking

refactor bitmapBacked init

deduplication

address comments

cleanup

add test for copy

add unit test for union and intersection

add tests + address warnings

address comment

fix

cleanup

add comment + test

add release

only call makeBitmap() inside .bitmapFilled cases

avoid busy work + add union test for additional validation

preserve existing behavior

recover previous method
chloe-yeo requested a review from a team as a code owner June 9, 2026 17:52
@chloe-yeo

Contributor Author

@swift-ci please test

chloe-yeo changed the base branch from main to release/6.4.x June 9, 2026 18:48
@chloe-yeo

Contributor Author

@swift-ci please test

chloe-yeo closed this Jun 9, 2026
9 participants