Refactor inner join #53

@mchav

The inner join was written in one shot, so there is some repetition: the left and right sides build their key-to-index maps with near-identical code.

        leftIndicesToGroup = M.elems $ M.filterWithKey (\k _ -> k `elem` cs) (D.columnIndices left)
        leftRowRepresentations = VU.generate (fst (D.dimensions left)) (D.mkRowRep leftIndicesToGroup left)
        -- key -> [index0, index1]
        leftKeyCountsAndIndices   = VU.foldr (\(i, v) acc -> M.insertWith (++) v [i] acc) M.empty (VU.indexed leftRowRepresentations)
        -- key -> [index0, index1]
        rightIndicesToGroup = M.elems $ M.filterWithKey (\k _ -> k `elem` cs) (D.columnIndices right)
        rightRowRepresentations = VU.generate (fst (D.dimensions right)) (D.mkRowRep rightIndicesToGroup right)
        rightKeyCountsAndIndices  = VU.foldr (\(i, v) acc -> M.insertWith (++) v [i] acc) M.empty (VU.indexed rightRowRepresentations)
        -- key -> [(left_indexes0, right_indexes1)]
        mergedKeyCountsAndIndices = M.foldrWithKey (\k v m -> if k `M.member` rightKeyCountsAndIndices then M.insert k (VU.fromList v, VU.fromList (rightKeyCountsAndIndices M.! k)) m else m) M.empty leftKeyCountsAndIndices
        -- [(ints, ints)]
        leftAndRightIndicies = M.elems mergedKeyCountsAndIndices
        -- [(ints, ints)] (expanded to n * m)
        expandedIndices = map (\(l, r) -> (mconcat (replicate (VU.length r) l), mconcat (replicate (VU.length l) r))) leftAndRightIndicies
        expandedLeftIndicies = mconcat (map fst expandedIndices)
        expandedRightIndicies = mconcat (map snd expandedIndices)
        -- df
        expandedLeft = left { columns = VB.map (D.atIndicesStable expandedLeftIndicies) (D.columns left), dataframeDimensions = (VU.length expandedLeftIndicies, snd (D.dataframeDimensions left))}
        -- df 
        expandedRight = right { columns = VB.map (D.atIndicesStable expandedRightIndicies) (D.columns right), dataframeDimensions = (VU.length expandedRightIndicies, snd (D.dataframeDimensions right))}
        -- [string]
        leftColumns = D.columnNames left
        rightColumns = D.columnNames right

The comments are also not very informative.

This should be broken into functions and tested.
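
As a starting point, here is a rough sketch of how the index computation could be split into small, separately testable functions. It is not the library's code: it works on plain lists of keys and `Data.Map` rather than on `DataFrame` and unboxed vectors, and the names `groupIndicesByKey`, `matchKeys`, `expandPair` and `innerJoinIndices` are hypothetical.

```haskell
import qualified Data.Map.Strict as M

-- | Map each distinct key to the row indices where it occurs (ascending).
-- Hypothetical helper: one function replaces the duplicated left/right folds.
groupIndicesByKey :: Ord k => [k] -> M.Map k [Int]
groupIndicesByKey keys =
  foldr (\(i, k) acc -> M.insertWith (++) k [i] acc) M.empty (zip [0 ..] keys)

-- | Keep only the keys present on both sides, pairing their index lists.
matchKeys :: Ord k => M.Map k [Int] -> M.Map k [Int] -> M.Map k ([Int], [Int])
matchKeys = M.intersectionWith (,)

-- | Full cross product of the matched indices for a single key (n * m pairs).
expandPair :: ([Int], [Int]) -> [(Int, Int)]
expandPair (ls, rs) = [(l, r) | l <- ls, r <- rs]

-- | (left row, right row) index pairs for an inner join on the given keys.
innerJoinIndices :: Ord k => [k] -> [k] -> [(Int, Int)]
innerJoinIndices leftKeys rightKeys =
  concatMap expandPair . M.elems $
    matchKeys (groupIndicesByKey leftKeys) (groupIndicesByKey rightKeys)
```

In GHCi, `innerJoinIndices "abca" "acad"` evaluates to `[(0,0),(0,2),(3,0),(3,2),(2,1)]`. A key that repeats on both sides (like `'a'` here) is probably the most useful test case, since that is where the n * m expansion has to pair every left index with every right index.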
