Add FixedLengthVarULE and user in icu_plurals#7394
Add FixedLengthVarULE and user in icu_plurals#7394sffc wants to merge 18 commits intounicode-org:mainfrom
Conversation
robertbastian
left a comment
There was a problem hiding this comment.
I don't see the problem this solves.
It can also be used to create a string or zerovec on the stack, without any additional dependencies. |
Manishearth
left a comment
There was a problem hiding this comment.
I don't really understand the motivation: why is VarZeroCow<'static> insufficient? What is the benefit of having these values on the stack instead of static memory?
As far as I can tell, anything you ought to be able to do with this type in const should be doable in non-const as well.
| /// use icu::locale::locale; | ||
| /// use zerovec::ule::FixedLengthVarULE; | ||
| /// | ||
| /// let value = "hello, world!"; // 13 bytes long |
There was a problem hiding this comment.
issue: this isn't a motivating example: this doesn't work in const anyway (since there's try_from_encodeable)
(It's an example, but not a motivating one)
There was a problem hiding this comment.
It would work in const if I replaced try_from_encodeable with new_unchecked, I just wanted to not have unsafe code in the docs test
There was a problem hiding this comment.
Yeah, that's fine, I was just saying that the example itself wasn't motivating the new zerovec API.
But I have a clearer idea of what's going on now.
|
Ah, I understand. This lets you copy the data into a new, stack-owned buffer, which you can't do with static data. Hm |
components/plurals/src/provider.rs
Outdated
| /// ``` | ||
| /// | ||
| /// [generic_const_exprs]: https://doc.rust-lang.org/beta/unstable-book/language-features/generic-const-exprs.html#generic_const_exprs | ||
| pub const fn new_singleton_mn<const M: usize, const N: usize>( |
There was a problem hiding this comment.
I don't like these APIs where you have to guess the output.
There was a problem hiding this comment.
I don't fully understand your comment, but this is in an unstable module.
There was a problem hiding this comment.
you have to know M and N for this not to panic
Manishearth
left a comment
There was a problem hiding this comment.
I'd like to hear more about the use case for new_singleton_mn in a const context, since as far as I can tell that would need unsafe code anyway.
I think I see use cases for the new type: being able to create a self-contained stack VarULE is useful.
I think the type name is somewhat misleading. Perhaps ConstStackVarULE? or ConstOwnedVarULE?
I think the type is relatively niche and we should name it in a way that reflects that, and document it as such. Stuffing it in the ule module sounds good, otherwise. In the long term I'd like to organize the mess that is that module.
components/plurals/src/provider.rs
Outdated
| remainder[i] = input.as_bytes()[i]; | ||
| i += 1; | ||
| } | ||
| *start = 0; |
There was a problem hiding this comment.
issue: mention "first byte = 0 for a singleton" for redundancy
components/plurals/src/provider.rs
Outdated
| /// use zerovec::ule::FixedLengthVarULE; | ||
| /// | ||
| /// const plural_ule: FixedLengthVarULE<1, PluralElementsPackedULE<str>> = | ||
| /// PluralElementsPackedULE::new_singleton_mn::<0, 1>(FixedLengthVarULE::EMPTY_STR); |
|
I added a const example with no unsafe. In general this is useful for making default/empty VarULEs in const contexts that are composed of inner ULE types. |
Right, but where are we hoping to do that? I'm still missing parts of the story here I think. |
I want to use this in call sites such as pub const PLURAL_PATTERN_0: &'static PluralElementsPackedULE<SinglePlaceholderPattern> =
unsafe { PluralElementsPackedULE::from_bytes_unchecked(&[0, 1]) };in order to remove the unsafe. @robertbastian may or may not agree with that particular call site but it serves as a useful illustration. |
|
Hmm. I guess this gets more useful if we have consts on various VarULE types for common patterns. I do worry that this is a fair amount of unsafe code to get rid of some easily-tested unsafe, though. Are we planning on using this const pattern a lot? An alternative potential route might be to define a macro that can define VarULE consts from unchecked bytes, where it also generates a test. This macro becomes unsafe to use, BUT its safety invariant is mostly "if the test passes this macro usage is fine". Not saying that's better, but making sure we have considered the design space. |
|
I honestly think for that call site this would be a regression. The current code is unsafe, but it's a single expression, similar to hundreds that we have in baked data, and there's a test that it's correct. This PR adds a lot of weird APIs and unsafe code just to make that one expression safe. |
|
Overall I've found the defaults story in VarULE to be unsatisfying, so "more consts" is definitely an interesting proposition. BUT I'm not fully convinced we should do it this way. Unsure. |
|
Note: the use case for this type of helper will only grow, because the composition stuff (VarTuple) is fairly new and we are using it more and more. |
It's not just about that one expression. It's about that class of expressions, which I claim is a class that will get bigger over time. |
|
I also just want to point out: pub const PLURAL_PATTERN_0: &'static PluralElementsPackedULE<SinglePlaceholderPattern> =
unsafe { PluralElementsPackedULE::from_bytes_unchecked(&[0, 1]) };This code requires subject matter expertise in 3 different components: pattern (to know that 1 means "single placeholder pattern with a single placeholder"), plurals (to know that 0 means "singleton plural elements"), and compact (to know that this combination of pattern and plurals is suitable for pub const PLURAL_PATTERN_0: /* some type */ =
PluralElementsPackedULE::new_singleton(SinglePlaceholderPattern::PASS_THROUGH_ULE);(That line of code does two things that we don't have yet: |
components/plurals/src/provider.rs
Outdated
| /// let plural_ule = PluralElementsPackedULE::new_singleton_mn::<13, 14>(inner_ule, metadata); | ||
| /// let rules = PluralRules::try_new(locale!("en").into(), Default::default()).unwrap(); | ||
| /// | ||
| /// assert_eq!(plural_ule.as_varule().get(0.into(), &rules).1, "hello, world!"); |
There was a problem hiding this comment.
nit: merge the assertions
| /// assert_eq!(plural_ule.as_varule().get(0.into(), &rules).1, "hello, world!"); | |
| /// assert_eq!(plural_ule.as_varule().get(0.into(), &rules), ("hello, world!", metadata)); |
components/plurals/src/provider.rs
Outdated
| /// const plural_ule: SizedVarULEBytes<1, PluralElementsPackedULE<str>> = | ||
| /// PluralElementsPackedULE::new_singleton_mn::<0, 1>(SizedVarULEBytes::EMPTY_STR, FourBitMetadata::zero()); |
There was a problem hiding this comment.
or better just integrate into the example above, this is very repetitive
| /// const plural_ule: SizedVarULEBytes<1, PluralElementsPackedULE<str>> = | |
| /// PluralElementsPackedULE::new_singleton_mn::<0, 1>(SizedVarULEBytes::EMPTY_STR, FourBitMetadata::zero()); | |
| /// let plural_ule = const { | |
| /// PluralElementsPackedULE::new_singleton_mn::<0, 1>(SizedVarULEBytes::EMPTY_STR, FourBitMetadata::zero()) }; |
components/plurals/src/provider.rs
Outdated
| /// ``` | ||
| /// | ||
| /// [generic_const_exprs]: https://doc.rust-lang.org/beta/unstable-book/language-features/generic-const-exprs.html#generic_const_exprs | ||
| pub const fn new_singleton_mn<const M: usize, const N: usize>( |
There was a problem hiding this comment.
this also doesn't need to be qualified as _mn; it's an unstable type so we can just change the generics if generic_const_exprs ever gets stabilised (I wouldn't hold my breath). so just new
I am not yet convinced that this is the case, but the use case seems sufficiently well outlined and I agree it is valid.. I would be in favor of us having more consts like this. As I said before, the default construction problem in varule is something I've wanted to see improvement on. Regarding the MN stuff I'm overall okay with landing suboptimal API we know that we can clean up in future Rust. Though I don't know if we have a timeline for const expressions in generics? |
Well, it works in nightly, and rust-lang/rust#76560 has a lot of upvotes. It seems like there are concrete next steps but no one is currently actively funding it. |
|
OK I added an ergonomic pub const PLURAL_PATTERN_0: SizedVarULEBytes<
2,
PluralElementsPackedULE<SinglePlaceholderPattern>,
> = PluralElementsPackedULE::new_mn(
FourBitMetadata::zero(),
to_sized_varule_bytes!(SinglePlaceholderPattern::PASS_THROUGH),
);Const variables always require a type, or else the type could be elided, except for the number 2. |
|
@sffc Thanks! I think if we're open to using a macro we don't need the SizedVarULEBytes type at all. The macro can construct a const/static array, and then construct a second const/static that references it. This is what I was referring to earlier:
I can write more about this later. |
|
If we can't remove the unsafe, then a macro of that shape isn't worth doing in my opinion. |
|
Sorry, I should have been more explicit: I think using the design of your macro so far combined with my model, this can be done without callsite unsafe. |
|
FYI the unsafe code you're trying to replace does not exist anymore |
|
I've been thinking about the core The angle I am coming at this from is that I want people to be able to use regular Rust DST mechanisms as much as possible. So, passing VarULE types around as The issue here is that To some extent, it makes sense why I asked some Rust lang/compiler people about this use case and it seems like there are two in-progress features that are ideal for this. One is const alloc ( #![feature(const_heap, const_trait_impl)]
const fn make_bytes(byte1: u8, byte2: u8) -> &'static [u8] {
let mut bytes = Vec::with_capacity(2);
bytes.push(byte1);
bytes.push(byte2);
bytes.const_make_global()
}They also mentioned this as being supported by plans for Not something we can use yet, but it's good to know of the ways in which these gaps are going to be filled. |
See #7391
Depends on #7399
I was making a PR that got a bit bigger than I wanted, so I split this out into a standalone change.
We finally have
VarZeroCow, a heap-or-reference container for a VarULE. Another reasonable container to want is what I'm proposing in this PR, aFixedLengthVarULE, which stores the VarULE on the stack with a fixed number of bytes.This can be used to compose ULEs and VarULEs in const contexts, as I show with the PluralElementsPackedULE example.