Skip to content

Conversation

@epatey
Copy link

@epatey epatey commented Mar 18, 2018

…ing heads and tails before using LCS. For example, in the case of single insert, the algorithm changes from O(m*n) to O(m+n). When the arrays contain 1,000 entries, for example, this change reduces the number of comparisons ~1,000,000 to ~2,000 and the size of the table used by the algorithm from ~1,000,000 to 2.

…ing heads and tails before using LCS. For example, in the case of single insert, the algorithm changes from O(m*n) to O(m+n). When the arrays contain 1,000 entries, for example, this change reduces the number of comparisons ~1,000,000 to ~2,000 and the size of the table used by the algorithm from ~1,000,000 to 2.
@epatey
Copy link
Author

epatey commented Mar 19, 2018

I haven't had a chance to actually measure the perf difference, but you could replace

let matchingHeadCount = zip(lhs, rhs).prefix() { $0.0 == $0.1 }.map() { $0.0 }.count
let matchingTailCount = matchingHeadCount == minTotalCount
    ? 0 // if the matching head consumed all of either of the arrays, there's no tail
    : zip(lhs.reversed(), rhs.reversed()).prefix(minTotalCount - matchingHeadCount).prefix() { $0.0 == $0.1 }.reversed().map() { $0.0 }.count

with something like:

let (matchingHeadCount, matchingTailCount) = lhs.withUnsafeBufferPointer { unsafeLHS in
    return rhs.withUnsafeBufferPointer { unsafeRHS -> (Int, Int) in
        var mhc = minTotalCount
        for i in  0..<minTotalCount {
            if (unsafeLHS[i] != unsafeRHS[i]) {
                mhc = i
                break
            }
        }
        
        let maxPossibleTail = minTotalCount - mhc
        if (maxPossibleTail < 1) {
            return (mhc, 0)
        }
        
        var mtc = maxPossibleTail
        for i in  0..<maxPossibleTail {
            if (unsafeLHS[unsafeLHS.endIndex - i - 1] != unsafeRHS[unsafeRHS.endIndex - i - 1]) {
                mtc = i
                break
            }
        }
        
        return (mhc, mtc)
    }
}

if it makes a perf difference.

@epatey
Copy link
Author

epatey commented Mar 20, 2018

So I did the perf test. After I added the proper .lazy's to the sequences, the perf difference between the sequence oriented approach and the unsafeBufferPointer approach is pretty small, and the sequence approach is way more readable/concise.

I still don't have an instinct around what operations drop you out of the lazy domain.

Copy link
Owner

@jflinter jflinter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epatey - left a couple of comments. Overall this is really nice, thank you!


internal static func matchingEndsInfo<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> (Int, ArraySlice<Value>, ArraySlice<Value>) {
let minTotalCount = min(lhs.count, rhs.count)
let matchingHeadCount = zip(lhs, rhs).lazy.prefix() { $0.0 == $0.1 }.count()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines feel a bit too clever for their own good - it's hard for me to understand what they're doing at a quick glance. I think using a couple of plain old for loops would be preferable here.

/// Namespace for the `diff` and `apply` functions.
public enum Dwifft {

internal static func matchingEndsInfo<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> (Int, ArraySlice<Value>, ArraySlice<Value>) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though it's an internal function, can you please add a docstring to this method to help with future debugging etc?

/// - rhs: another, uh, array
/// - Returns: the series of transformations that, when applied to `lhs`, will yield `rhs`.
public static func diff<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> [DiffStep<Value>] {
let (matchingHeadCount, lhs, rhs) = matchingEndsInfo(lhs, rhs)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking, but can you not shadow the lhs and rhs variable names here? Made it harder for me to understand how this compiled at first glance...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants