Add FixedLengthVarULE and user in icu_plurals #7394

Draft
sffc wants to merge 18 commits into unicode-org:main from sffc:fixed-length-varule

Conversation

@sffc
Member

@sffc sffc commented Jan 9, 2026

See #7391
Depends on #7399

I was making a PR that got a bit bigger than I wanted, so I split this out into a standalone change.

We finally have VarZeroCow, a heap-or-reference container for a VarULE. Another reasonable container to want is what I'm proposing in this PR, a FixedLengthVarULE, which stores the VarULE on the stack with a fixed number of bytes.

This can be used to compose ULEs and VarULEs in const contexts, as I show with the PluralElementsPackedULE example.
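The idea of a fixed-size, stack-owned container can be sketched in plain Rust without zerovec. The block below is only an illustration under stated assumptions: `FixedStr`, `try_new`, `as_str`, and `HELLO` are hypothetical names, and `str` stands in for an arbitrary VarULE. It is not the API added by this PR.

```rust
/// Hypothetical stand-in for the proposed container: exactly `N` bytes,
/// stored inline, that are known to be valid UTF-8.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FixedStr<const N: usize> {
    bytes: [u8; N],
}

impl<const N: usize> FixedStr<N> {
    /// Copies `s` into an inline buffer.
    /// Returns `None` unless `s` is exactly `N` bytes long.
    pub const fn try_new(s: &str) -> Option<Self> {
        let src = s.as_bytes();
        if src.len() != N {
            return None;
        }
        let mut bytes = [0u8; N];
        let mut i = 0;
        // `for` loops are not available in const fn, so copy with `while`.
        while i < N {
            bytes[i] = src[i];
            i += 1;
        }
        Some(Self { bytes })
    }

    /// Borrows the inline bytes as a `&str`.
    pub fn as_str(&self) -> &str {
        // Re-validating keeps this sketch free of `unsafe`.
        core::str::from_utf8(&self.bytes).expect("bytes were copied from a &str")
    }
}

// Built entirely at compile time; a length mismatch is a compile error.
pub const HELLO: FixedStr<13> = match FixedStr::try_new("hello, world!") {
    Some(v) => v,
    None => panic!("length mismatch"),
};
```

Because the value owns its bytes, it can be returned from a const fn and embedded in a larger buffer, which a `&'static` reference alone does not allow.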

@sffc sffc requested review from robertbastian and removed request for zbraniecki January 9, 2026 21:13
Member

@robertbastian robertbastian left a comment

I don't see the problem this solves.

@sffc
Member Author

sffc commented Jan 9, 2026

This can be used to compose ULEs and VarULEs in const contexts, as I show with the PluralElementsPackedULE example.

It can also be used to create a string or zerovec on the stack, without any additional dependencies.

Member

@Manishearth Manishearth left a comment

I don't really understand the motivation: why is VarZeroCow<'static> insufficient? What is the benefit of having these values on the stack instead of static memory?

As far as I can tell, anything you ought to be able to do with this type in const should be doable in non-const as well.

/// use icu::locale::locale;
/// use zerovec::ule::FixedLengthVarULE;
///
/// let value = "hello, world!"; // 13 bytes long
Member

@Manishearth Manishearth Jan 9, 2026

issue: this isn't a motivating example: this doesn't work in const anyway (since there's try_from_encodeable)

(It's an example, but not a motivating one)

Member Author

It would work in const if I replaced try_from_encodeable with new_unchecked; I just didn't want unsafe code in the doc test.

Member

Yeah, that's fine, I was just saying that the example itself wasn't motivating the new zerovec API.

But I have a clearer idea of what's going on now.

@Manishearth
Member

Ah, I understand. This lets you copy the data into a new, stack-owned buffer, which you can't do with static data. Hm

@sffc sffc requested a review from Manishearth January 9, 2026 21:49
/// ```
///
/// [generic_const_exprs]: https://doc.rust-lang.org/beta/unstable-book/language-features/generic-const-exprs.html#generic_const_exprs
pub const fn new_singleton_mn<const M: usize, const N: usize>(
Member

I don't like these APIs where you have to guess the output.

Member Author

I don't fully understand your comment, but this is in an unstable module.

Member

you have to know M and N for this not to panic
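The "guess the output" concern can be seen in a small, self-contained sketch. It assumes a hypothetical helper `prepend_header` that is unrelated to the actual `new_singleton_mn` signature: the caller must supply both the input length `M` and the output length `N`, and the function can only assert that they agree.

```rust
/// Hypothetical helper: prepends one header byte to `M` payload bytes.
/// Without `generic_const_exprs` the return type cannot be written as
/// `[u8; M + 1]`, so the caller has to pass `N` and get it right.
pub const fn prepend_header<const M: usize, const N: usize>(
    header: u8,
    payload: [u8; M],
) -> [u8; N] {
    // Compile error in a const context, runtime panic otherwise.
    assert!(N == M + 1, "N must equal M + 1");
    let mut out = [0u8; N];
    out[0] = header;
    let mut i = 0;
    while i < M {
        out[i + 1] = payload[i];
        i += 1;
    }
    out
}

// Evaluated at compile time: a wrong `N` here fails the build.
pub const PACKED: [u8; 3] = prepend_header::<2, 3>(0, [7, 9]);
```

In a `const` item the mismatch is caught during compilation, but the same call made at runtime with a wrong `N` would panic, which is the hazard being pointed out.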

Member

@Manishearth Manishearth left a comment

I'd like to hear more about the use case for new_singleton_mn in a const context, since as far as I can tell that would need unsafe code anyway.

I think I see use cases for the new type: being able to create a self-contained stack VarULE is useful.

I think the type name is somewhat misleading. Perhaps ConstStackVarULE? or ConstOwnedVarULE?

I think the type is relatively niche and we should name it in a way that reflects that, and document it as such. Stuffing it in the ule module sounds good, otherwise. In the long term I'd like to organize the mess that is that module.

remainder[i] = input.as_bytes()[i];
i += 1;
}
*start = 0;
Member

issue: mention "first byte = 0 for a singleton" for redundancy

/// use zerovec::ule::FixedLengthVarULE;
///
/// const plural_ule: FixedLengthVarULE<1, PluralElementsPackedULE<str>> =
/// PluralElementsPackedULE::new_singleton_mn::<0, 1>(FixedLengthVarULE::EMPTY_STR);
Member

clever

@sffc
Member Author

sffc commented Jan 9, 2026

I added a const example with no unsafe.

In general this is useful for making default/empty VarULEs in const contexts that are composed of inner ULE types.

@Manishearth
Member

> In general this is useful for making default/empty VarULEs in const contexts that are composed of inner ULE types.

Right, but where are we hoping to do that? I'm still missing parts of the story here I think.

@sffc
Member Author

sffc commented Jan 9, 2026

> > In general this is useful for making default/empty VarULEs in const contexts that are composed of inner ULE types.
>
> Right, but where are we hoping to do that? I'm still missing parts of the story here I think.

I want to use this in call sites such as

    pub const PLURAL_PATTERN_0: &'static PluralElementsPackedULE<SinglePlaceholderPattern> =
        unsafe { PluralElementsPackedULE::from_bytes_unchecked(&[0, 1]) };

in order to remove the unsafe. @robertbastian may or may not agree with that particular call site but it serves as a useful illustration.

@Manishearth
Member

Hmm. I guess this gets more useful if we have consts on various VarULE types for common patterns.

I do worry that this is a fair amount of unsafe code to get rid of some easily-tested unsafe, though. Are we planning on using this const pattern a lot?

An alternative potential route might be to define a macro that can define VarULE consts from unchecked bytes, where it also generates a test. This macro becomes unsafe to use, BUT its safety invariant is mostly "if the test passes this macro usage is fine".

Not saying that's better, but making sure we have considered the design space.
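The macro alternative described above could look roughly like the following. This is a sketch under assumptions, not a proposed zerovec API: `str` stands in for a VarULE type, and the macro name `unchecked_str_const` and its arguments are hypothetical. The macro emits the unchecked constant together with a test that validates the same bytes.

```rust
/// Hypothetical macro: defines a `&'static str` constant from unchecked
/// bytes and generates a test that checks those bytes.
///
/// Safety: the bytes must be valid UTF-8. The generated test verifies
/// this, so "the test passes" is effectively the safety invariant.
macro_rules! unchecked_str_const {
    ($name:ident, $test_name:ident, $bytes:expr) => {
        pub const $name: &str =
            unsafe { core::str::from_utf8_unchecked($bytes) };

        #[cfg(test)]
        #[test]
        fn $test_name() {
            // The checked constructor must accept the same bytes.
            assert!(core::str::from_utf8($bytes).is_ok());
        }
    };
}

unchecked_str_const!(GREETING, greeting_is_valid_utf8, b"hello");
```

The `unsafe` block still exists, but it lives in one audited macro rather than at every call site, and each use carries its own validation test.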

@robertbastian
Member

I honestly think for that call site this would be a regression. The current code is unsafe, but it's a single expression, similar to hundreds that we have in baked data, and there's a test that it's correct. This PR adds a lot of weird APIs and unsafe code just to make that one expression safe.

@Manishearth
Member

Overall I've found the defaults story in VarULE to be unsatisfying, so "more consts" is definitely an interesting proposition. BUT I'm not fully convinced we should do it this way. Unsure.

@sffc
Member Author

sffc commented Jan 9, 2026

Note: the use case for this type of helper will only grow, because the composition stuff (VarTuple) is fairly new and we are using it more and more.

@sffc
Member Author

sffc commented Jan 9, 2026

> I honestly think for that call site this would be a regression. The current code is unsafe, but it's a single expression, similar to hundreds that we have in baked data, and there's a test that it's correct. This PR adds a lot of weird APIs and unsafe code just to make that one expression safe.

It's not just about that one expression. It's about that class of expressions, which I claim is a class that will get bigger over time.

@sffc
Member Author

sffc commented Jan 9, 2026

I also just want to point out:

    pub const PLURAL_PATTERN_0: &'static PluralElementsPackedULE<SinglePlaceholderPattern> =
        unsafe { PluralElementsPackedULE::from_bytes_unchecked(&[0, 1]) };

This code requires subject matter expertise in 3 different components: pattern (to know that 1 means "single placeholder pattern with a single placeholder"), plurals (to know that 0 means "singleton plural elements"), and compact (to know that this combination of pattern and plurals is suitable for PLURAL_PATTERN_0). Yes, the test helps a lot. But, you should be able to write a call site such as the following, and no extra test is required.

    pub const PLURAL_PATTERN_0: /* some type */ =
        PluralElementsPackedULE::new_singleton(SinglePlaceholderPattern::PASS_THROUGH_ULE);

(That line of code does two things that we don't have yet: new_singleton, which is gated on Rust feature generic_const_exprs, and PASS_THROUGH_ULE, because the ULE composition needs to use the correct types. This comment is not an opportunity to bikeshed either of those APIs.)

/// let plural_ule = PluralElementsPackedULE::new_singleton_mn::<13, 14>(inner_ule, metadata);
/// let rules = PluralRules::try_new(locale!("en").into(), Default::default()).unwrap();
///
/// assert_eq!(plural_ule.as_varule().get(0.into(), &rules).1, "hello, world!");
Member

nit: merge the assertions

Suggested change
- /// assert_eq!(plural_ule.as_varule().get(0.into(), &rules).1, "hello, world!");
+ /// assert_eq!(plural_ule.as_varule().get(0.into(), &rules), ("hello, world!", metadata));

Comment on lines 532 to 533
/// const plural_ule: SizedVarULEBytes<1, PluralElementsPackedULE<str>> =
/// PluralElementsPackedULE::new_singleton_mn::<0, 1>(SizedVarULEBytes::EMPTY_STR, FourBitMetadata::zero());
Member

or better just integrate into the example above, this is very repetitive

Suggested change
- /// const plural_ule: SizedVarULEBytes<1, PluralElementsPackedULE<str>> =
- /// PluralElementsPackedULE::new_singleton_mn::<0, 1>(SizedVarULEBytes::EMPTY_STR, FourBitMetadata::zero());
+ /// let plural_ule = const {
+ /// PluralElementsPackedULE::new_singleton_mn::<0, 1>(SizedVarULEBytes::EMPTY_STR, FourBitMetadata::zero()) };

/// ```
///
/// [generic_const_exprs]: https://doc.rust-lang.org/beta/unstable-book/language-features/generic-const-exprs.html#generic_const_exprs
pub const fn new_singleton_mn<const M: usize, const N: usize>(
Member

This also doesn't need to be qualified as _mn; it's an unstable type, so we can just change the generics if generic_const_exprs ever gets stabilised (I wouldn't hold my breath). So just new.

@Manishearth
Member

Manishearth commented Jan 10, 2026

> It's not just about that one expression. It's about that class of expressions, which I claim is a class that will get bigger over time.

I am not yet convinced that this is the case, but the use case seems sufficiently well outlined and I agree it is valid. I would be in favor of us having more consts like this. As I said before, the default construction problem in VarULE is something I've wanted to see improvement on.

Regarding the MN stuff, I'm overall okay with landing a suboptimal API that we know we can clean up in a future Rust. Though I don't know if we have a timeline for const expressions in generics?

@sffc
Member Author

sffc commented Jan 10, 2026

> Though I don't know if we have a timeline for const expressions in generics?

Well, it works in nightly, and rust-lang/rust#76560 has a lot of upvotes. It seems like there are concrete next steps but no one is currently actively funding it.

@sffc sffc requested review from a team and younies as code owners January 10, 2026 05:05
@sffc
Member Author

sffc commented Jan 10, 2026

OK I added an ergonomic to_sized_varule_bytes! macro for construction, which even works with type hinting, and then migrated the compactdecimal call site to:

    pub const PLURAL_PATTERN_0: SizedVarULEBytes<
        2,
        PluralElementsPackedULE<SinglePlaceholderPattern>,
    > = PluralElementsPackedULE::new_mn(
        FourBitMetadata::zero(),
        to_sized_varule_bytes!(SinglePlaceholderPattern::PASS_THROUGH),
    );

Const items always require an explicit type; otherwise the type here could be elided entirely, except for the number 2.

@Manishearth
Member

@sffc Thanks!

I think if we're open to using a macro we don't need the SizedVarULEBytes type at all. The macro can construct a const/static array, and then construct a second const/static that references it.

This is what I was referring to earlier:

> An alternative potential route might be to define a macro that can define VarULE consts from unchecked bytes, where it also generates a test. This macro becomes unsafe to use, BUT its safety invariant is mostly "if the test passes this macro usage is fine".

I can write more about this later.

@sffc
Member Author

sffc commented Jan 11, 2026

If we can't remove the unsafe, then a macro of that shape isn't worth doing in my opinion.

@Manishearth
Member

Manishearth commented Jan 11, 2026

Sorry, I should have been more explicit: I think using the design of your macro so far combined with my model, this can be done without callsite unsafe.
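One way to read "a macro that constructs a const array and a second const that references it, without call-site unsafe" is sketched below. It is an assumption about the shape, not the design either participant has in mind: `str` stands in for a VarULE, the macro name `validated_str_const` is hypothetical, and validation happens in const evaluation rather than in a generated test.

```rust
/// Hypothetical macro: stores the bytes in an inner const, then defines
/// a second const that references them after compile-time validation.
macro_rules! validated_str_const {
    ($name:ident, $bytes:expr) => {
        pub const $name: &str = {
            // First const: the raw bytes with a 'static lifetime.
            const BYTES: &[u8] = &$bytes;
            // Second step: a checked view over those bytes. Invalid
            // input fails the build instead of needing `unsafe`.
            match core::str::from_utf8(BYTES) {
                Ok(s) => s,
                Err(_) => panic!("bytes are not valid UTF-8"),
            }
        };
    };
}

validated_str_const!(HI, [b'h', b'i']);
```

This only works when the checked constructor is itself a const fn, as `core::str::from_utf8` is; a VarULE type whose validation is not const would need a different approach.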

@sffc sffc added the waiting-on-reviewer PRs waiting for action from the reviewer for >7 days label Jan 12, 2026
@robertbastian
Member

FYI the unsafe code you're trying to replace does not exist anymore

@robertbastian robertbastian added waiting-on-author PRs waiting for action from the author for >7 days and removed waiting-on-reviewer PRs waiting for action from the reviewer for >7 days waiting-on-author PRs waiting for action from the author for >7 days labels Jan 22, 2026
@robertbastian robertbastian marked this pull request as draft January 22, 2026 17:25
@Manishearth
Member

Manishearth commented Jan 26, 2026

I've been thinking about the core const problem here more.

The angle I am coming at this from is that I want people to be able to use regular Rust DST mechanisms as much as possible. So, passing VarULE types around as &V, and occasionally Box<V>, Rc<V>, etc. In general, this seems aligned with Rust goals around how they intend to evolve DSTs.

The issue here is that const can do all kinds of stuff on the stack, but a const fn cannot really make a static at runtime and reference it. So we cannot return &V in const. We can return [u8; N] and have an unsafe-usable invariant that it is a valid V, or we can use a macro to make the static, or we can do this kind of wrapper type.

To some extent, it makes sense why const fns can't internally make a static yet: const fn is optionally const: it must still be executable at runtime. Which means it still needs to follow the rule of runtime data not flowing into statics.
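The limitation can be shown with plain byte arrays. This is a minimal sketch with hypothetical names (`make_array`, `BYTES`): a const fn may return its bytes by value, and only the enclosing const item can turn them into a `'static` reference.

```rust
// A const fn cannot hand back a reference to data it builds locally.
// The following does not compile, because the array is a temporary:
//
// const fn make_ref(a: u8, b: u8) -> &'static [u8] { &[a, b] }

/// It can, however, return the bytes by value.
pub const fn make_array(a: u8, b: u8) -> [u8; 2] {
    [a, b]
}

// The const item, not the function, is what produces the 'static
// reference: the temporary holding the result lives for the whole
// program because it appears in a const initializer.
pub const BYTES: &[u8] = &make_array(0, 1);
```

Returning `[u8; N]` by value is the first of the three options listed above; the wrapper type in this PR attaches the "these bytes are a valid V" invariant to that array.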

I asked some Rust lang/compiler people about this use case and it seems like there are two in-progress features that are ideal for this. One is const alloc (#![feature(const_heap)]):

    #![feature(const_heap, const_trait_impl)]

    const fn make_bytes(byte1: u8, byte2: u8) -> &'static [u8] {
        let mut bytes = Vec::with_capacity(2);
        bytes.push(byte1);
        bytes.push(byte2);
        bytes.const_make_global()
    }

They also mentioned that this is expected to be supported by the plans for comptime (which is const-only, so it won't have the "can't make statics at runtime" problem), but I don't have an equivalent code snippet since that code hasn't landed yet. I think it would probably also use const_make_global, which does work in const but is internally const-only (it's a no-op when called at runtime).

Not something we can use yet, but it's good to know of the ways in which these gaps are going to be filled.


3 participants