Simplify pointer casts of pointer arithmetic#8836
Simplify pointer casts of pointer arithmetic#8836tautschnig wants to merge 1 commit intodiffblue:developfrom
Conversation
This permits simplifying, e.g., `(char*)(ptr + 1) + -1` to `(char*)ptr` when `ptr` is `unsigned char*`. This, in turn, avoids different formulae produced on AArch64 (where `char` is unsigned) and Apple Silicon/macOS (where `char` is signed, as is the case on x86).
There was a problem hiding this comment.
Pull request overview
This PR extends the expression simplifier to normalize pointer-to-pointer casts around pointer arithmetic, enabling cast-pushdown and offset scaling based on element-size ratios to avoid target-dependent formula differences (e.g., due to char signedness on different architectures).
Changes:
- Add a new
simplify_typecastrewrite that pushes pointer typecasts into pointer+expressions when element-size ratios are integral. - Scale integer offset operands by
sizeof(inner)/sizeof(outer)and simplify the resulting+expression.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| step.has_value() && *step != 0 && new_step.has_value() && | ||
| *new_step != 0 && *step >= *new_step && *step % *new_step == 0) | ||
| { | ||
| const typet size_t_type(size_type()); |
There was a problem hiding this comment.
size_t_type is declared but never used in this new pointer-typecast rewrite branch. This will trigger an unused-variable warning (and can fail builds when warnings are treated as errors). Remove it or use it as intended (e.g., if operands are meant to be cast to size_t, apply it explicitly).
| const typet size_t_type(size_type()); |
| // Push a pointer typecast into pointer arithmetic | ||
| // (T)(ptr + int) ---> (T)ptr + sizeof(inner-st)/sizeof(outer-st)*int | ||
| // when the inner subtype's size is a multiple of the outer subtype's size | ||
| if( | ||
| expr_type.id() == ID_pointer && op_type.id() == ID_pointer && | ||
| expr.op().id() == ID_plus) | ||
| { | ||
| const auto step = | ||
| pointer_offset_size(to_pointer_type(op_type).base_type(), ns); | ||
| const auto new_step = | ||
| pointer_offset_size(to_pointer_type(expr_type).base_type(), ns); | ||
|
|
||
| if( | ||
| step.has_value() && *step != 0 && new_step.has_value() && | ||
| *new_step != 0 && *step >= *new_step && *step % *new_step == 0) | ||
| { | ||
| const typet size_t_type(size_type()); | ||
| auto new_expr = expr.op(); | ||
| new_expr.type() = expr.type(); | ||
|
|
||
| for(auto &op : new_expr.operands()) | ||
| { | ||
| if(op.type().id() == ID_pointer) | ||
| { | ||
| exprt new_op = simplify_typecast(typecast_exprt{op, expr.type()}); | ||
| op = std::move(new_op); | ||
| } | ||
| else if(*step > *new_step) | ||
| { | ||
| exprt new_op = simplify_mult( | ||
| mult_exprt{from_integer(*step / *new_step, op.type()), op}); | ||
| op = std::move(new_op); | ||
| } | ||
| } | ||
|
|
||
| return changed(simplify_plus(to_plus_expr(new_expr))); | ||
| } |
There was a problem hiding this comment.
This introduces new simplification behaviour for pointer-to-pointer casts over pointer arithmetic, but there doesn’t appear to be a unit/regression test exercising it. Please add a test (e.g., covering the motivating case (char*)(ptr + 1) + -1 -> (char*)ptr when ptr is unsigned char*, and at least one non-trivial element-size ratio case) to prevent regressions across architectures/char signedness.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #8836 +/- ##
========================================
Coverage 80.00% 80.00%
========================================
Files 1700 1700
Lines 188252 188274 +22
Branches 73 73
========================================
+ Hits 150613 150631 +18
- Misses 37639 37643 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This permits simplifying, e.g.,
(char*)(ptr + 1) + -1to(char*)ptrwhenptrisunsigned char*. This, in turn, avoids different formulae produced on AArch64 (wherecharis unsigned) and Apple Silicon/macOS (wherecharis signed, as is the case on x86).