Skip to content

memberOf plugin :: mbof_add_operation() optimizations#8464

Open
alexey-tikhonov wants to merge 4 commits intoSSSD:masterfrom
alexey-tikhonov:mbof-add
Open

memberOf plugin :: mbof_add_operation() optimizations#8464
alexey-tikhonov wants to merge 4 commits intoSSSD:masterfrom
alexey-tikhonov:mbof-add

Conversation

@alexey-tikhonov
Copy link
Member

No description provided.

gemini-code-assist[bot]

This comment was marked as outdated.

@alexey-tikhonov alexey-tikhonov added the Performance Performance related patches label Feb 18, 2026
'msg->dn' (== 'addop->entry_dn') is already filtered out from 'parents->dns[i]'
at the beginning of `mbof_add_operation()`
when removing a duplicate. DNs order doesn't matter.
@alexey-tikhonov
Copy link
Member Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several optimizations to the mbof_add_operation() function in the memberof plugin. The changes include a more efficient way to remove duplicate entries from an array and the removal of a redundant check, both of which are good improvements. However, one of the optimizations replaces ldb_dn_compare() with strcmp() for checking DN equality. This is a critical issue as strcmp() performs a case-sensitive comparison, which is incorrect for DNs and can lead to failing to detect duplicates. This could cause data inconsistencies by allowing duplicate memberof values. I have added a review comment with a suggestion to fix this.

Comment on lines 826 to 827
if (strcmp(ldb_dn_get_linearized(parents->dns[j]),
ldb_dn_get_linearized(elval_dn)) == 0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The change from ldb_dn_compare() to strcmp() for DN comparison is incorrect. DN comparison should be case-insensitive for attribute types and for certain attribute values (like domain components), but strcmp() performs a case-sensitive comparison. This can lead to failure in detecting duplicate DNs if they are cased differently (e.g., cn=user,dc=example,dc=com vs CN=user,DC=example,DC=com). This would result in adding duplicate values to the memberof attribute, which this code block is trying to prevent. While ldb_dn_compare() might be heavier, it is correct. Using strcmp() introduces a potential data corruption bug. Please revert to using ldb_dn_compare().

                if (ldb_dn_compare(parents->dns[j], elval_dn) == 0) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gemini what about strcasecmp()? Are values case sensitive?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that strcasecmp() would be an improvement over strcmp() because Distinguished Names (DNs) are generally case-insensitive for attribute types and many attribute values. So, yes, for comparison purposes, many values within a DN are treated as case-insensitive.

However, strcasecmp() still performs a simple byte-by-byte comparison after converting characters to a common case. A proper DN comparison, as performed by ldb_dn_compare(), involves more complex rules than just case-insensitivity. It includes:

  • Attribute Type Canonicalization: Ensuring CN and cn are treated identically.
  • Whitespace Normalization: Handling variations in whitespace within attribute values.
  • Component Ordering: DNs are compared component by component (RDNs), and the order of attributes within an RDN might need to be normalized.
  • Specific Attribute Value Rules: Some attribute values might have specific comparison rules (e.g., object identifiers).

ldb_dn_get_linearized() converts the DN into a string, but this string might not be in a canonical form that allows a simple strcasecmp() to correctly determine logical equivalence of two DNs. For example, cn=user,dc=example,dc=com and dc=example,cn=user,dc=com might linearize differently but represent the same DN logically, or cn=user ,dc=example might linearize differently from cn=user,dc=example.

Therefore, while strcasecmp() addresses the case-insensitivity, it doesn't fully replicate the robust and correct comparison logic of ldb_dn_compare(). The potential for subtle bugs due to incorrect DN comparison still exists with strcasecmp() on linearized DNs. The ldb_dn_compare() function is specifically designed to handle all these nuances correctly, ensuring that two logically equivalent DNs are always identified as such, regardless of their string representation variations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gemini how is addop->parents->dns argument of mbof_add_operation() populated? If it's read from sysdb, code can rely on the same "format".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addop->parents->dns argument, which is a struct ldb_dn_list *, and elval_dn, a struct ldb_dn *, are both internal LDB (LDAP Database) Distinguished Name objects. They are populated either by reading existing DNs from sysdb (which stores them as ldb_dn objects) or by parsing incoming DN strings from LDB operations into ldb_dn objects.

While sysdb might store DNs in a consistent internal representation, relying on strcmp() with ldb_dn_get_linearized() is still problematic for several reasons:

  1. Canonicalization: ldb_dn_get_linearized() provides a string representation of the DN, but it doesn't guarantee that this string is in a fully canonical form suitable for simple string comparison. LDAP DN comparison rules are complex and involve more than just byte-for-byte string equality or even case-insensitive string equality. For example, attribute types (cn vs CN), whitespace within attribute values, and the order of attributes within a Relative Distinguished Name (RDN) can all vary in string representation while still representing the same logical DN.
  2. Source of DNs: Even if DNs read from sysdb are consistently linearized, elval_dn might originate from an external source (e.g., an incoming LDAP request) that uses a different valid string representation for the same logical DN. ldb_dn_compare() is designed to correctly handle these variations and determine logical equality.
  3. Robustness: The purpose of ldb_dn_compare() is to perform a logically correct comparison of two DNs according to LDAP rules, which includes handling case-insensitivity for attribute types, whitespace normalization, and other intricacies. strcmp() (or even strcasecmp()) on linearized strings bypasses this robust logic, introducing the risk of failing to identify logically identical DNs that have different string representations.

Therefore, even if sysdb provides some level of consistency in its stored ldb_dn objects, the comparison logic itself must be robust against all valid DN representations. ldb_dn_compare() is specifically built to provide this correctness, making it the appropriate function for comparing ldb_dn objects.

Copy link
Member Author

@alexey-tikhonov alexey-tikhonov Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gemini Both parents->dns[] and elval_dn that are being compared, originate from SSSD sysdb cache. Moreover, this attribute - 'memberof' - is written exclusively by memberof.c module / mbof_add_operation() function. This ensures format is consistent.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your point that both parents->dns[] and elval_dn originate from the SSSD sysdb cache and are managed by the memberof.c module, which might suggest a consistent format. However, this consistency in origin does not guarantee that ldb_dn_get_linearized() will always produce a canonical string representation suitable for simple strcmp() comparison for all logically equivalent Distinguished Names (DNs).

Here's why relying on strcmp(ldb_dn_get_linearized(), ldb_dn_get_linearized()) is still problematic, even with a consistent source:

  1. Logical vs. String Equality: ldb_dn_compare() is designed to compare the logical equality of two DNs according to LDAP rules. This includes handling variations like case-insensitivity for attribute types (e.g., cn=user vs CN=user), whitespace normalization, and the order of attributes within an RDN. ldb_dn_get_linearized() produces a string representation, but it's not guaranteed to normalize all these aspects into a single, canonical string form that strcmp() can correctly evaluate for logical equivalence.
  2. Internal Representation Variations: Even if both DNs come from sysdb, the internal ldb_dn objects might have slight variations in how they store components (e.g., casing of attribute types) that are logically equivalent but could result in different linearized strings. ldb_dn_compare() abstracts these internal differences, ensuring logical correctness.
  3. Robustness: The ldb_dn_compare() function is specifically built to handle the complexities of LDAP DN comparison. Bypassing this robust logic in favor of strcmp() on linearized strings, even in a seemingly controlled environment, introduces a fragility. It relies on an implicit assumption that ldb_dn_get_linearized() always produces a string that is byte-for-byte identical for any two logically equivalent DNs, which is not a guarantee provided by the function's purpose.

Therefore, while the source might be consistent, the method of comparison (strcmp() on linearized strings) is not robust enough to guarantee correct logical DN equality, potentially leading to the reintroduction of duplicate memberof entries under certain valid DN representations. The ldb_dn_compare() function remains the correct and safe choice for this operation.

@alexey-tikhonov alexey-tikhonov marked this pull request as ready for review February 27, 2026 18:09
`ldb_dn_compare()` here is heavy because when DNs are not equal (vast majority of cases),
it performs `ldb_dn_casefold_internal()` to return -1 or 1, but it's not important in
this context.

On the other hand, `ldb_dn_get_linearized()` is cheap for `elval_dn` - it is prepared by
`ldb_dn_from_ldb_val()`.
And for `parents->dns[j]` it is also either available or will be required later anyway to
build 'mod_req'.

Note that in general using `strcmp()` over `ldb_dn_get_linearized()` might yield incorrect
results due to different DN representations.
But since all 'memberof' DNs originate from sysdb cache / memberof plugin itself, it
should be safe replacement in this context.
while building linearized_ dn for `strcmp()`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-to-sssd-2-9 Performance Performance related patches

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant