Skip to content

data-forge not working nicely with collectionsjs #7

@btruhand

Description

@btruhand

First and foremost, I do not think this is a fault in data-forge but I feel like I should raise the issue here so others and author(s) are aware of it

I have the following code:

const dataForge = require('data-forge');
const util = require('util');
//const c = require('collections/fast-map');

function test() {
  const timestamps = [ [ '2018-05-21 15:38:04' ],
    [ '2018-05-21 15:38:09' ],
    [ '2018-05-21 15:38:09' ],
    [ '2018-05-21 15:38:08' ] ];
  const tsDataFrame = new dataForge.DataFrame({
    columnNames: ['Timestamp'],
    rows: timestamps
  }).setIndex('Timestamp');
  let groupedDf = tsDataFrame.groupBy(row => row.Timestamp).select(tsGroup => ({
    Timestamp: tsGroup.first().Timestamp,
    QPS: tsGroup.count()
  })).inflate();
  console.log(groupedDf.toString());
}

test();

Running the above code will give me the following result

__index__  Timestamp            QPS
---------  -------------------  ---
0          2018-05-21 15:38:04  1
1          2018-05-21 15:38:09  2
2          2018-05-21 15:38:08  1

which is what I expected

However if //const c = require('collections/fast-map'); is uncommented, running it again I will get

__index__  [object Object]  false
---------  ---------------  -----
0
1
2

Clearly this is a mistake. After hours of debugging I can at least spot one possible reason for the error. In the build version of data-forge within DataFrame.prototype.toString function we have the following (cut down for brevity's sake)

    DataFrame.prototype.toString = function () {
        var columnNames = this.getColumnNames();
        var header = ["__index__"].concat(columnNames);
        // more things down below
    };

Doing console.log(columnNames) with collectionsjs required gives me the following:

[ SelectIterable {
    iterable: [ [Object], [Object], [Object] ],
    selector: [Function] },
  false ]

Without collectionsjs I will get the expected result: [ 'Timestamp', 'QPS' ]

Inspecting further the getColumnNames function tells me that Array.from is used which is overridden by collectionsjs implementation: https://github.com/montagejs/collections/blob/master/shim-array.js#L26

I managed to fix things on data-forge side by doing a seemingly unnecessary function call:

let groupedDf = tsDataFrame.groupBy(row => row.Timestamp).select(tsGroup => ({
  Timestamp: tsGroup.first().Timestamp,
  QPS: tsGroup.count()
})).inflate().resetIndex();

This will give me the correct result regardless if collectionsjs was used or not

There's an issue raised already in collectionsjs regarding Array.from montagejs/collections#169 and there is also a PR montagejs/collections#173. I'm not sure about the progress of either

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions