Conversation
|
""" WalkthroughThe changes refactor and enhance date validation in Changes
Assessment against linked issues
Poem
Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. Note ⚡️ Faster reviews with cachingCodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (2)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
Gedcom551/GedcomStructure.cs (1)
478-512: Pre-compile the frequently-used regexes to boost performance
new Regex(...)is executed every timeIsValidDatePeriodruns, which happens indirectly for every DATE payload.
On large GEDCOM files this results in thousands of allocations and the regex engine recompiling the same pattern over and over.Declare the patterns once:
private static readonly Regex ToPeriodRegex = new(@"^TO " + DateRegex + "$", RegexOptions.Compiled); private static readonly Regex FromToPeriodRegex = new(@"^FROM " + DateRegex + " TO " + DateRegex + "$", RegexOptions.Compiled); private static readonly Regex FromPeriodRegex = new(@"^FROM " + DateRegex + "$", RegexOptions.Compiled);and reuse them. Do the same in
IsValidDateRange,IsValidDateApproximated,IsValidDateValue.This is a low-risk change (no functional impact) that measurably reduces GC churn and CPU.
Tests/ValidationTests.cs (1)
1061-1068: Recommend adding a symmetric negative test for “/99” overflowYou already added several negative cases (
1699/01,1699/0, …).
Please also add something likeValidateInvalidDateValuePayload("1999/100")(or/99where the base year isn’t one less) to ensure the(year + 1) % 100 == altYearrule is enforced.This will protect against future regressions if the comparison logic is touched.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
Gedcom551/GedcomStructure.cs(3 hunks)Tests/ValidationTests.cs(3 hunks)
🔇 Additional comments (1)
Tests/ValidationTests.cs (1)
1053-1056: Great addition – positive coverage for dual-dating edge-casesThe three new assertions (
1699/00, with/without month/day) exercise the exact, tricky logic introduced inYearRegexandIsValidDateRegex. 👍
| private static bool IsValidDateValue(string value) | ||
| { | ||
| // Check for a valid date period. | ||
| if (IsValidDatePeriod(value)) | ||
| { | ||
| return true; | ||
| } | ||
|
|
||
| // Check for a valid dateRange. | ||
| if (IsValidDateRange(value)) | ||
| { | ||
| return true; | ||
| } | ||
|
|
||
| // Check for a valid dateApprox. | ||
| if (IsValidDateApproximated(value)) | ||
| { | ||
| return true; | ||
| } | ||
|
|
||
| // Check for a valid date. | ||
| // This must be done after the other checks so that we don't try to parse | ||
| // a keyword like "BEF" or "FROM" as a month. | ||
| regex = new Regex("^" + DateRegex + "$"); | ||
| match = regex.Match(value); | ||
| var regex = new Regex("^" + DateRegex + "$"); | ||
| Match match = regex.Match(value); | ||
| if (match.Success) | ||
| { | ||
| string calendar = match.Groups[2].Value; | ||
| uint day = match.Groups[5].Success ? uint.Parse(match.Groups[5].Value) : 0; | ||
| string month = match.Groups[6].Value; | ||
| uint year = uint.Parse(match.Groups[9].Value); | ||
| string epoch = match.Groups[11].Value; | ||
| return IsValidDate(calendar, day, month, year, epoch); | ||
| return IsValidDateRegex(match.Groups, 2); | ||
| } | ||
|
|
||
| return false; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Magic constants 2 / 16 leak internal regex structure
IsValidDateRegex(groups, 2) / (..., 16) rely on the fact that one DateRegex contributes exactly 14 capturing groups. If any inner group is changed (see first comment) all these call-sites become wrong and bugs slip in silently.
After switching to named groups this entire parameter can be removed; until then, at least replace the literals with const int CalendarGroupIndex = 2; and const int SecondCalendarGroupIndex = CalendarGroupIndex + GroupsPerDate to make the dependency obvious.
This safeguards maintainability without affecting behaviour.
🤖 Prompt for AI Agents
In Gedcom551/GedcomStructure.cs around lines 600 to 630, the method
IsValidDateValue uses magic constants 2 and 16 as group indices in calls to
IsValidDateRegex, which depend on the internal structure of the DateRegex
capturing groups. To fix this, define descriptive constant integers like
CalendarGroupIndex = 2 and SecondCalendarGroupIndex = CalendarGroupIndex +
GroupsPerDate, then replace the literal numbers with these constants in all
relevant calls. This will make the code more maintainable and prevent silent
bugs if the regex structure changes.
| private const string DayRegex = @"((\d{1,2}) )?"; | ||
| private const string YearRegex = @"(\d{1,4})(/(\d{2}))?"; | ||
| private const string DateRegex = @"(" + Calendar551Regex + @" )?(" + DayRegex + MonthRegex + @" )?" + YearRegex + @"( " + Epoch551Regex + @")?"; | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Replace needless capturing with non-capturing (?: … ) and/or named groups
DayRegex, YearRegex and the re-composed DateRegex introduce 14 positional capturing groups per date occurrence.
Most of these captures (e.g. the TagCharRegex inside the * / + repetitions) are never read – they only bloat the GroupCollection, force every caller to keep the brittle magic indices (offset+11, offset+13, …) and make any future tweak extremely error-prone (change the inner regex ⇒ all indices shift).
Consider:
- turning purely structural groups into non-capturing groups
(?: … ) - giving the few required groups explicit names and reading them via
match.Groups["year"],["altYear"], …
That will:
• shrink the number of groups,
• get rid of the hard-coded offsets (2, 16, 3 …),
• make the intent self-documenting and resilient to later changes.
🤖 Prompt for AI Agents
In Gedcom551/GedcomStructure.cs around lines 464 to 467, the regex patterns
DayRegex, YearRegex, and DateRegex use many unnecessary capturing groups that
increase the GroupCollection size and require brittle positional indices.
Refactor these regexes by replacing all structural capturing groups with
non-capturing groups (?: … ) and assign explicit names to the essential
capturing groups. Then update the code to access these groups by their names
instead of numeric indices to improve readability, maintainability, and reduce
errors from future regex changes.
| { | ||
| int offset = calendarIndex - 2; | ||
| string calendar = groups[offset + 2].Value; | ||
| uint day = groups[offset + 5].Success ? uint.Parse(groups[offset + 5].Value) : 0; | ||
| string month = groups[offset + 6].Value; | ||
| uint year = uint.Parse(groups[offset + 9].Value); | ||
| string epoch = groups[offset + 13].Value; | ||
| if (!IsValidDate(calendar, day, month, year, epoch)) | ||
| { | ||
| return false; | ||
| } | ||
| if (!string.IsNullOrEmpty(groups[offset + 11].Value)) | ||
| { | ||
| uint altyear = uint.Parse(groups[offset + 11].Value); | ||
| if ((year + 1) % 100 == altyear) | ||
| { | ||
| return IsValidDate(calendar, day, month, year + 1, epoch); | ||
| } | ||
| return false; | ||
| } | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Guard against 4-digit overflow and malformed alternate years
uint.Parse(groups[offset + 11].Value) assumes the substring always contains exactly two digits, yet the regex permits any \d{2} following the slash.
Before parsing, validate length and range explicitly to avoid surprises (/99 → 99 is fine, /000 would currently throw).
-uint altyear = uint.Parse(groups[offset + 11].Value);
+string altStr = groups[offset + 11].Value;
+if (altStr.Length != 2) return false; // defensive – spec says 2-digit
+if (!uint.TryParse(altStr, out uint altyear)) return false;Additionally, (year + 1) can reach 10000, which still fits uint but might violate GEDCOM constraints (1--9999). Consider bounding the result or reusing IsValidDate to check legitimacy after the increment.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| { | |
| int offset = calendarIndex - 2; | |
| string calendar = groups[offset + 2].Value; | |
| uint day = groups[offset + 5].Success ? uint.Parse(groups[offset + 5].Value) : 0; | |
| string month = groups[offset + 6].Value; | |
| uint year = uint.Parse(groups[offset + 9].Value); | |
| string epoch = groups[offset + 13].Value; | |
| if (!IsValidDate(calendar, day, month, year, epoch)) | |
| { | |
| return false; | |
| } | |
| if (!string.IsNullOrEmpty(groups[offset + 11].Value)) | |
| { | |
| uint altyear = uint.Parse(groups[offset + 11].Value); | |
| if ((year + 1) % 100 == altyear) | |
| { | |
| return IsValidDate(calendar, day, month, year + 1, epoch); | |
| } | |
| return false; | |
| } | |
| return true; | |
| } | |
| { | |
| int offset = calendarIndex - 2; | |
| string calendar = groups[offset + 2].Value; | |
| uint day = groups[offset + 5].Success ? uint.Parse(groups[offset + 5].Value) : 0; | |
| string month = groups[offset + 6].Value; | |
| uint year = uint.Parse(groups[offset + 9].Value); | |
| string epoch = groups[offset + 13].Value; | |
| if (!IsValidDate(calendar, day, month, year, epoch)) | |
| { | |
| return false; | |
| } | |
| if (!string.IsNullOrEmpty(groups[offset + 11].Value)) | |
| { | |
| - uint altyear = uint.Parse(groups[offset + 11].Value); | |
| + string altStr = groups[offset + 11].Value; | |
| + if (altStr.Length != 2) return false; // defensive – spec says 2-digit | |
| + if (!uint.TryParse(altStr, out uint altyear)) return false; | |
| if ((year + 1) % 100 == altyear) | |
| { | |
| return IsValidDate(calendar, day, month, year + 1, epoch); | |
| } | |
| return false; | |
| } | |
| return true; | |
| } |
🤖 Prompt for AI Agents
In Gedcom551/GedcomStructure.cs around lines 572 to 593, before parsing
groups[offset + 11].Value as uint, add explicit validation to ensure the string
is exactly two digits and within the valid range (0-99) to prevent parsing
errors from malformed input. Also, when calculating (year + 1), add a check to
ensure the incremented year does not exceed 9999, or use IsValidDate to verify
the incremented year is valid before proceeding. Adjust the logic to return
false if these validations fail.
Fixes #44 Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Fixes #44
gedcom7code/test-files#26 must be merged and this rebased to use it before the test will pass with this PR.
Summary by CodeRabbit
New Features
Bug Fixes
Tests