Support dual-date validation #58

Merged
dthaler merged 2 commits into main from dual-dating
May 16, 2025

@dthaler dthaler commented May 15, 2025

Fixes #44

gedcom7code/test-files#26 must be merged and this rebased to use it before the test will pass with this PR.

Summary by CodeRabbit

  • New Features

    • Added support for validating dates with an optional two-digit year suffix (e.g., "1699/00") across all date formats.
  • Bug Fixes

    • Improved accuracy and consistency of date value validation, including for date periods, ranges, and approximations.
  • Tests

    • Expanded test coverage to include new valid and invalid date formats, especially those with two-digit year suffixes.
    • Reactivated a previously disabled test to ensure comprehensive validation.
    • Updated existing tests to reflect refined validation rules and expected outcomes.

coderabbitai Bot commented May 15, 2025

Walkthrough

The changes refactor and enhance date validation in GedcomStructure.cs by modularizing regex components and introducing a helper for dual year suffix parsing. Test coverage is expanded in ValidationTests.cs to include dual dating formats and previously disabled tests are enabled. No public API signatures are altered.

Changes

| Files / Areas | Change Summary |
|---|---|
| Gedcom551/GedcomStructure.cs | Refactored date validation logic; split regex into day/year/date components; added helper for dual year suffixes; updated validation methods to use helper; improved modularity. |
| Tests/ValidationTests.cs | Added test cases for dual dating formats including "1699/00" variants; re-enabled previously disabled test; renamed and updated a test method for dual date validation. |
| external/test-files | Updated subproject commit reference; no code or logic changes. |

Assessment against linked issues

| Objective (Issue #) | Addressed | Explanation |
|---|---|---|
| DATE validation with calendars (#44) | | |
| DATE validating with dual dating (#44) | | Partial support added for dual dating format "1699/00" in validation, but full dual dating logic is not clearly implemented. |
| TIME validation (#44) | | |

Poem

A bunny hops through dates anew,
With slashes, years, and months in view.
Regexes split, the logic neat—
Dual years now a tested feat!
From "FEB 1699/00" to "TO" and "FROM,"
Our code now knows where dates are from.
🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 1530e4d and cc07d6a.

📒 Files selected for processing (1)
  • external/test-files (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • external/test-files
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build (Release)
  • GitHub Check: build (Debug)

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
Gedcom551/GedcomStructure.cs (1)

478-512: Pre-compile the frequently-used regexes to boost performance

new Regex(...) is executed every time IsValidDatePeriod runs, which happens indirectly for every DATE payload.
On large GEDCOM files this results in thousands of allocations and the regex engine recompiling the same pattern over and over.

Declare the patterns once:

private static readonly Regex ToPeriodRegex   = new(@"^TO "   + DateRegex + "$", RegexOptions.Compiled);
private static readonly Regex FromToPeriodRegex = new(@"^FROM " + DateRegex + " TO " + DateRegex + "$", RegexOptions.Compiled);
private static readonly Regex FromPeriodRegex = new(@"^FROM " + DateRegex + "$", RegexOptions.Compiled);

and reuse them. Do the same in IsValidDateRange, IsValidDateApproximated, IsValidDateValue.

This is a low-risk change (no functional impact) that measurably reduces GC churn and CPU.
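
As a rough illustration of the pre-compilation suggestion above, the sketch below builds the three period patterns once and reuses them. `SimpleDateRegex`, `DatePeriodPatterns`, and `LooksLikeDatePeriod` are hypothetical names, and the date pattern is a simplified stand-in (year with optional two-digit suffix), not the PR's actual `DateRegex`.

```csharp
using System.Text.RegularExpressions;

public static class DatePeriodPatterns
{
    // Hypothetical, simplified stand-in for the PR's DateRegex:
    // a 1-4 digit year with an optional two-digit dual-year suffix.
    private const string SimpleDateRegex = @"[0-9]{1,4}(?:/[0-9]{2})?";

    // Constructed once when the type is first used, then shared by every call.
    private static readonly Regex ToPeriod =
        new Regex("^TO " + SimpleDateRegex + "$", RegexOptions.Compiled);
    private static readonly Regex FromToPeriod =
        new Regex("^FROM " + SimpleDateRegex + " TO " + SimpleDateRegex + "$", RegexOptions.Compiled);
    private static readonly Regex FromPeriod =
        new Regex("^FROM " + SimpleDateRegex + "$", RegexOptions.Compiled);

    // Syntax-only check; a real validator would also verify calendar, month, and day.
    public static bool LooksLikeDatePeriod(string value) =>
        ToPeriod.IsMatch(value) || FromToPeriod.IsMatch(value) || FromPeriod.IsMatch(value);
}
```

Whether `RegexOptions.Compiled` pays off depends on how often the patterns run; the static fields alone already avoid re-parsing the pattern on each call.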

Tests/ValidationTests.cs (1)

1061-1068: Recommend adding a symmetric negative test for “/99” overflow

You already added several negative cases (1699/01, 1699/0, …).
Please also add something like ValidateInvalidDateValuePayload("1999/100") (or /99 where the base year isn’t one less) to ensure the (year + 1) % 100 == altYear rule is enforced.

This will protect against future regressions if the comparison logic is touched.
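
To make the rule concrete, here is a minimal sketch of the `(year + 1) % 100 == altYear` comparison applied to a bare year payload. `DualYearCheck` and `IsValidYear` are hypothetical names, and the pattern covers only a year with an optional suffix, so it is not the project's validator.

```csharp
using System.Text.RegularExpressions;

public static class DualYearCheck
{
    // Simplified stand-in: a year plus an optional "/NN" suffix and nothing else.
    // ASCII digit classes are used so that uint.Parse cannot see other digit scripts.
    private static readonly Regex YearOnly =
        new Regex(@"^(?<year>[0-9]{1,4})(?:/(?<alt>[0-9]{2}))?$", RegexOptions.Compiled);

    public static bool IsValidYear(string value)
    {
        Match m = YearOnly.Match(value);
        if (!m.Success) return false;              // e.g. "1999/100" or "1699/0"
        if (!m.Groups["alt"].Success) return true; // plain year, no suffix to check

        uint year = uint.Parse(m.Groups["year"].Value);
        uint alt = uint.Parse(m.Groups["alt"].Value);
        return (year + 1) % 100 == alt;            // suffix must be the next year's last two digits
    }
}
```

Under these assumptions "1699/00" and "1698/99" pass, while "1699/01", "1697/99", and "1999/100" are rejected.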

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 000f158 and 866be3f.

📒 Files selected for processing (2)
  • Gedcom551/GedcomStructure.cs (3 hunks)
  • Tests/ValidationTests.cs (3 hunks)
🔇 Additional comments (1)
Tests/ValidationTests.cs (1)

1053-1056: Great addition – positive coverage for dual-dating edge-cases

The three new assertions (1699/00, with/without month/day) exercise the exact, tricky logic introduced in YearRegex and IsValidDateRegex. 👍

Comment on lines +600 to 630
  private static bool IsValidDateValue(string value)
  {
      // Check for a valid date period.
      if (IsValidDatePeriod(value))
      {
          return true;
      }

      // Check for a valid dateRange.
      if (IsValidDateRange(value))
      {
          return true;
      }

      // Check for a valid dateApprox.
      if (IsValidDateApproximated(value))
      {
          return true;
      }

      // Check for a valid date.
      // This must be done after the other checks so that we don't try to parse
      // a keyword like "BEF" or "FROM" as a month.
-     regex = new Regex("^" + DateRegex + "$");
-     match = regex.Match(value);
+     var regex = new Regex("^" + DateRegex + "$");
+     Match match = regex.Match(value);
      if (match.Success)
      {
-         string calendar = match.Groups[2].Value;
-         uint day = match.Groups[5].Success ? uint.Parse(match.Groups[5].Value) : 0;
-         string month = match.Groups[6].Value;
-         uint year = uint.Parse(match.Groups[9].Value);
-         string epoch = match.Groups[11].Value;
-         return IsValidDate(calendar, day, month, year, epoch);
+         return IsValidDateRegex(match.Groups, 2);
      }

      return false;

🛠️ Refactor suggestion

Magic constants 2 / 16 leak internal regex structure

IsValidDateRegex(groups, 2) / (..., 16) rely on the fact that one DateRegex contributes exactly 14 capturing groups. If any inner group is changed (see first comment) all these call-sites become wrong and bugs slip in silently.

After switching to named groups this entire parameter can be removed; until then, at least replace the literals with const int CalendarGroupIndex = 2; and const int SecondCalendarGroupIndex = CalendarGroupIndex + GroupsPerDate to make the dependency obvious.

This safeguards maintainability without affecting behaviour.
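
One way to picture the dependency is to derive the per-date group count from the pattern rather than hard-coding it. The sketch below assumes a simplified three-group date pattern; `DateGroupLayout`, `SimpleDateRegex`, and `SecondYear` are illustrative names, and the real `DateRegex` contributes 14 groups rather than 3.

```csharp
using System.Text.RegularExpressions;

public static class DateGroupLayout
{
    // Simplified stand-in for DateRegex: group 1 = year, group 2 = "/NN", group 3 = NN.
    public const string SimpleDateRegex = @"([0-9]{1,4})(/([0-9]{2}))?";

    public const int YearGroupIndex = 1;
    public const int AltYearGroupIndex = 3;

    // Derived from the pattern itself; group 0 is the whole match, so subtract it.
    public static readonly int GroupsPerDate =
        new Regex(SimpleDateRegex).GetGroupNumbers().Length - 1;

    private static readonly Regex FromTo =
        new Regex("^FROM " + SimpleDateRegex + " TO " + SimpleDateRegex + "$");

    // Returns the year text of the second date in "FROM x TO y", or "" when there is no match.
    public static string SecondYear(string value)
    {
        Match m = FromTo.Match(value);
        return m.Success ? m.Groups[YearGroupIndex + GroupsPerDate].Value : "";
    }
}
```

If an inner group is later added to the date pattern, `GroupsPerDate` follows automatically, although the first-date indices would still need updating by hand.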

🤖 Prompt for AI Agents
In Gedcom551/GedcomStructure.cs around lines 600 to 630, the method
IsValidDateValue uses magic constants 2 and 16 as group indices in calls to
IsValidDateRegex, which depend on the internal structure of the DateRegex
capturing groups. To fix this, define descriptive constant integers like
CalendarGroupIndex = 2 and SecondCalendarGroupIndex = CalendarGroupIndex +
GroupsPerDate, then replace the literal numbers with these constants in all
relevant calls. This will make the code more maintainable and prevent silent
bugs if the regex structure changes.

Comment on lines +464 to 467
private const string DayRegex = @"((\d{1,2}) )?";
private const string YearRegex = @"(\d{1,4})(/(\d{2}))?";
private const string DateRegex = @"(" + Calendar551Regex + @" )?(" + DayRegex + MonthRegex + @" )?" + YearRegex + @"( " + Epoch551Regex + @")?";


🛠️ Refactor suggestion

Replace needless capturing with non-capturing (?: … ) and/or named groups

DayRegex, YearRegex and the re-composed DateRegex introduce 14 positional capturing groups per date occurrence.
Most of these captures (e.g. the TagCharRegex inside the * / + repetitions) are never read – they only bloat the GroupCollection, force every caller to keep the brittle magic indices (offset+11, offset+13, …) and make any future tweak extremely error-prone (change the inner regex ⇒ all indices shift).

Consider:

  • turning purely structural groups into non-capturing groups (?: … )
  • giving the few required groups explicit names and reading them via match.Groups["year"], ["altYear"], …

That will:
  • shrink the number of groups,
  • get rid of the hard-coded offsets (2, 16, 3 …),
  • make the intent self-documenting and resilient to later changes.
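
A minimal sketch of what the named-group approach could look like, assuming a simplified date of the form "[[day ]month ]year[/NN]" with no calendar or epoch. The month list, `NamedGroupDate`, and `TryParse` are hypothetical; the PR's `MonthRegex` is not shown and may differ.

```csharp
using System.Text.RegularExpressions;

public static class NamedGroupDate
{
    // Hypothetical month list; the real MonthRegex in the PR may differ.
    private const string Month = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC";

    // Structural groups are non-capturing (?: ... ); only the values that are read get names.
    private static readonly Regex Date = new Regex(
        @"^(?:(?:(?<day>[0-9]{1,2}) )?(?<month>" + Month + @") )?" +
        @"(?<year>[0-9]{1,4})(?:/(?<altYear>[0-9]{2}))?$",
        RegexOptions.Compiled);

    // Extracts the parts by name; day is 0 and month/altYear are "" when absent.
    public static bool TryParse(string value, out uint day, out string month,
                                out uint year, out string altYear)
    {
        day = 0; month = ""; year = 0; altYear = "";
        Match m = Date.Match(value);
        if (!m.Success) return false;

        if (m.Groups["day"].Success) day = uint.Parse(m.Groups["day"].Value);
        month = m.Groups["month"].Value;
        year = uint.Parse(m.Groups["year"].Value);
        altYear = m.Groups["altYear"].Value;
        return true;
    }
}
```

With names in place, composing two dates into one FROM/TO pattern would need distinct names per date (or two separate matches), which is a trade-off against the positional-offset scheme.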

🤖 Prompt for AI Agents
In Gedcom551/GedcomStructure.cs around lines 464 to 467, the regex patterns
DayRegex, YearRegex, and DateRegex use many unnecessary capturing groups that
increase the GroupCollection size and require brittle positional indices.
Refactor these regexes by replacing all structural capturing groups with
non-capturing groups (?: … ) and assign explicit names to the essential
capturing groups. Then update the code to access these groups by their names
instead of numeric indices to improve readability, maintainability, and reduce
errors from future regex changes.

Comment on lines +572 to +593
{
    int offset = calendarIndex - 2;
    string calendar = groups[offset + 2].Value;
    uint day = groups[offset + 5].Success ? uint.Parse(groups[offset + 5].Value) : 0;
    string month = groups[offset + 6].Value;
    uint year = uint.Parse(groups[offset + 9].Value);
    string epoch = groups[offset + 13].Value;
    if (!IsValidDate(calendar, day, month, year, epoch))
    {
        return false;
    }
    if (!string.IsNullOrEmpty(groups[offset + 11].Value))
    {
        uint altyear = uint.Parse(groups[offset + 11].Value);
        if ((year + 1) % 100 == altyear)
        {
            return IsValidDate(calendar, day, month, year + 1, epoch);
        }
        return false;
    }
    return true;
}

⚠️ Potential issue

Guard against 4-digit overflow and malformed alternate years

uint.Parse(groups[offset + 11].Value) assumes the substring always contains exactly two digits, yet the regex permits any \d{2} following the slash.
Before parsing, validate length and range explicitly to avoid surprises (/99 → 99 is fine, /000 would currently throw).

-uint altyear = uint.Parse(groups[offset + 11].Value);
+string altStr = groups[offset + 11].Value;
+if (altStr.Length != 2) return false;          // defensive – spec says 2-digit
+if (!uint.TryParse(altStr, out uint altyear)) return false;

Additionally, (year + 1) can reach 10000, which still fits in a uint but might violate GEDCOM constraints (1-9999). Consider bounding the result or reusing IsValidDate to check legitimacy after the increment.
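
The bounding idea could be sketched as follows. `DualYearBounds`, `IsConsistentAltYear`, and `MaxYear` are hypothetical, and the 9999 limit is taken from this comment rather than checked against the GEDCOM specification.

```csharp
public static class DualYearBounds
{
    // Upper bound as cited in the review comment; treat it as an assumption.
    public const uint MaxYear = 9999;

    // True when altText is exactly two ASCII digits equal to the last two digits
    // of year + 1, and year + 1 does not exceed MaxYear.
    public static bool IsConsistentAltYear(uint year, string altText)
    {
        if (altText.Length != 2) return false;
        foreach (char c in altText)
        {
            if (c < '0' || c > '9') return false; // rejects signs, spaces, non-ASCII digits
        }
        uint altYear = (uint)((altText[0] - '0') * 10 + (altText[1] - '0'));

        uint next = year + 1;
        if (next > MaxYear) return false;         // e.g. 9999/00 would imply year 10000
        return next % 100 == altYear;
    }
}
```

Parsing the digits by hand avoids `uint.TryParse` accepting a leading sign or whitespace, which the default number style allows.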

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
  {
      int offset = calendarIndex - 2;
      string calendar = groups[offset + 2].Value;
      uint day = groups[offset + 5].Success ? uint.Parse(groups[offset + 5].Value) : 0;
      string month = groups[offset + 6].Value;
      uint year = uint.Parse(groups[offset + 9].Value);
      string epoch = groups[offset + 13].Value;
      if (!IsValidDate(calendar, day, month, year, epoch))
      {
          return false;
      }
      if (!string.IsNullOrEmpty(groups[offset + 11].Value))
      {
-         uint altyear = uint.Parse(groups[offset + 11].Value);
+         string altStr = groups[offset + 11].Value;
+         if (altStr.Length != 2) return false; // defensive – spec says 2-digit
+         if (!uint.TryParse(altStr, out uint altyear)) return false;
          if ((year + 1) % 100 == altyear)
          {
              return IsValidDate(calendar, day, month, year + 1, epoch);
          }
          return false;
      }
      return true;
  }
🤖 Prompt for AI Agents
In Gedcom551/GedcomStructure.cs around lines 572 to 593, before parsing
groups[offset + 11].Value as uint, add explicit validation to ensure the string
is exactly two digits and within the valid range (0-99) to prevent parsing
errors from malformed input. Also, when calculating (year + 1), add a check to
ensure the incremented year does not exceed 9999, or use IsValidDate to verify
the incremented year is valid before proceeding. Adjust the logic to return
false if these validations fail.

Fixes #44

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
@dthaler dthaler merged commit e64f436 into main May 16, 2025
4 checks passed
@dthaler dthaler deleted the dual-dating branch May 16, 2025 18:24

Development

Successfully merging this pull request may close these issues.

support DATE validation (dual dating)
