memberOf plugin :: mbof_add_operation() optimizations#8464
memberOf plugin :: mbof_add_operation() optimizations#8464alexey-tikhonov wants to merge 4 commits intoSSSD:masterfrom
Conversation
e538934 to
9b17f9b
Compare
'msg->dn' (== 'addop->entry_dn') is already filtered out from 'parents->dns[i]' at the beginning of `mbof_add_operation()`
when removing a duplicate. DNs order doesn't matter.
9b17f9b to
e35eab4
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces several optimizations to the mbof_add_operation() function in the memberof plugin. The changes include a more efficient way to remove duplicate entries from an array and the removal of a redundant check, both of which are good improvements. However, one of the optimizations replaces ldb_dn_compare() with strcmp() for checking DN equality. This is a critical issue as strcmp() performs a case-sensitive comparison, which is incorrect for DNs and can lead to failing to detect duplicates. This could cause data inconsistencies by allowing duplicate memberof values. I have added a review comment with a suggestion to fix this.
src/ldb_modules/memberof.c
Outdated
| if (strcmp(ldb_dn_get_linearized(parents->dns[j]), | ||
| ldb_dn_get_linearized(elval_dn)) == 0) { |
There was a problem hiding this comment.
The change from ldb_dn_compare() to strcmp() for DN comparison is incorrect. DN comparison should be case-insensitive for attribute types and for certain attribute values (like domain components), but strcmp() performs a case-sensitive comparison. This can lead to failure in detecting duplicate DNs if they are cased differently (e.g., cn=user,dc=example,dc=com vs CN=user,DC=example,DC=com). This would result in adding duplicate values to the memberof attribute, which this code block is trying to prevent. While ldb_dn_compare() might be heavier, it is correct. Using strcmp() introduces a potential data corruption bug. Please revert to using ldb_dn_compare().
if (ldb_dn_compare(parents->dns[j], elval_dn) == 0) {There was a problem hiding this comment.
/gemini what about strcasecmp()? Are values case sensitive?
There was a problem hiding this comment.
You're right that strcasecmp() would be an improvement over strcmp() because Distinguished Names (DNs) are generally case-insensitive for attribute types and many attribute values. So, yes, for comparison purposes, many values within a DN are treated as case-insensitive.
However, strcasecmp() still performs a simple byte-by-byte comparison after converting characters to a common case. A proper DN comparison, as performed by ldb_dn_compare(), involves more complex rules than just case-insensitivity. It includes:
- Attribute Type Canonicalization: Ensuring
CNandcnare treated identically. - Whitespace Normalization: Handling variations in whitespace within attribute values.
- Component Ordering: DNs are compared component by component (RDNs), and the order of attributes within an RDN might need to be normalized.
- Specific Attribute Value Rules: Some attribute values might have specific comparison rules (e.g., object identifiers).
ldb_dn_get_linearized() converts the DN into a string, but this string might not be in a canonical form that allows a simple strcasecmp() to correctly determine logical equivalence of two DNs. For example, cn=user,dc=example,dc=com and dc=example,cn=user,dc=com might linearize differently but represent the same DN logically, or cn=user ,dc=example might linearize differently from cn=user,dc=example.
Therefore, while strcasecmp() addresses the case-insensitivity, it doesn't fully replicate the robust and correct comparison logic of ldb_dn_compare(). The potential for subtle bugs due to incorrect DN comparison still exists with strcasecmp() on linearized DNs. The ldb_dn_compare() function is specifically designed to handle all these nuances correctly, ensuring that two logically equivalent DNs are always identified as such, regardless of their string representation variations.
There was a problem hiding this comment.
/gemini how is addop->parents->dns argument of mbof_add_operation() populated? If it's read from sysdb, code can rely on the same "format".
There was a problem hiding this comment.
The addop->parents->dns argument, which is a struct ldb_dn_list *, and elval_dn, a struct ldb_dn *, are both internal LDB (LDAP Database) Distinguished Name objects. They are populated either by reading existing DNs from sysdb (which stores them as ldb_dn objects) or by parsing incoming DN strings from LDB operations into ldb_dn objects.
While sysdb might store DNs in a consistent internal representation, relying on strcmp() with ldb_dn_get_linearized() is still problematic for several reasons:
- Canonicalization:
ldb_dn_get_linearized()provides a string representation of the DN, but it doesn't guarantee that this string is in a fully canonical form suitable for simple string comparison. LDAP DN comparison rules are complex and involve more than just byte-for-byte string equality or even case-insensitive string equality. For example, attribute types (cnvsCN), whitespace within attribute values, and the order of attributes within a Relative Distinguished Name (RDN) can all vary in string representation while still representing the same logical DN. - Source of DNs: Even if DNs read from sysdb are consistently linearized,
elval_dnmight originate from an external source (e.g., an incoming LDAP request) that uses a different valid string representation for the same logical DN.ldb_dn_compare()is designed to correctly handle these variations and determine logical equality. - Robustness: The purpose of
ldb_dn_compare()is to perform a logically correct comparison of two DNs according to LDAP rules, which includes handling case-insensitivity for attribute types, whitespace normalization, and other intricacies.strcmp()(or evenstrcasecmp()) on linearized strings bypasses this robust logic, introducing the risk of failing to identify logically identical DNs that have different string representations.
Therefore, even if sysdb provides some level of consistency in its stored ldb_dn objects, the comparison logic itself must be robust against all valid DN representations. ldb_dn_compare() is specifically built to provide this correctness, making it the appropriate function for comparing ldb_dn objects.
There was a problem hiding this comment.
/gemini Both parents->dns[] and elval_dn that are being compared, originate from SSSD sysdb cache. Moreover, this attribute - 'memberof' - is written exclusively by memberof.c module / mbof_add_operation() function. This ensures format is consistent.
There was a problem hiding this comment.
I understand your point that both parents->dns[] and elval_dn originate from the SSSD sysdb cache and are managed by the memberof.c module, which might suggest a consistent format. However, this consistency in origin does not guarantee that ldb_dn_get_linearized() will always produce a canonical string representation suitable for simple strcmp() comparison for all logically equivalent Distinguished Names (DNs).
Here's why relying on strcmp(ldb_dn_get_linearized(), ldb_dn_get_linearized()) is still problematic, even with a consistent source:
- Logical vs. String Equality:
ldb_dn_compare()is designed to compare the logical equality of two DNs according to LDAP rules. This includes handling variations like case-insensitivity for attribute types (e.g.,cn=uservsCN=user), whitespace normalization, and the order of attributes within an RDN.ldb_dn_get_linearized()produces a string representation, but it's not guaranteed to normalize all these aspects into a single, canonical string form thatstrcmp()can correctly evaluate for logical equivalence. - Internal Representation Variations: Even if both DNs come from
sysdb, the internalldb_dnobjects might have slight variations in how they store components (e.g., casing of attribute types) that are logically equivalent but could result in different linearized strings.ldb_dn_compare()abstracts these internal differences, ensuring logical correctness. - Robustness: The
ldb_dn_compare()function is specifically built to handle the complexities of LDAP DN comparison. Bypassing this robust logic in favor ofstrcmp()on linearized strings, even in a seemingly controlled environment, introduces a fragility. It relies on an implicit assumption thatldb_dn_get_linearized()always produces a string that is byte-for-byte identical for any two logically equivalent DNs, which is not a guarantee provided by the function's purpose.
Therefore, while the source might be consistent, the method of comparison (strcmp() on linearized strings) is not robust enough to guarantee correct logical DN equality, potentially leading to the reintroduction of duplicate memberof entries under certain valid DN representations. The ldb_dn_compare() function remains the correct and safe choice for this operation.
`ldb_dn_compare()` here is heavy because when DNs are not equal (vast majority of cases), it performs `ldb_dn_casefold_internal()` to return -1 or 1, but it's not important in this context. On the other hand, `ldb_dn_get_linearized()` is cheap for `elval_dn` - it is prepared by `ldb_dn_from_ldb_val()`. And for `parents->dns[j]` it is also either available or will be required later anyway to build 'mod_req'. Note that in general using `strcmp()` over `ldb_dn_get_linearized()` might yield incorrect results due to different DN representations. But since all 'memberof' DNs originate from sysdb cache / memberof plugin itself, it should be safe replacement in this context.
09e7d3d to
f363d76
Compare
while building linearized_ dn for `strcmp()`
No description provided.