First and foremost, I do not think this is a fault in data-forge, but I feel I should raise the issue here so that others and the author(s) are aware of it.
I have the following code:
const dataForge = require('data-forge');
const util = require('util');
//const c = require('collections/fast-map');

function test() {
    const timestamps = [
        [ '2018-05-21 15:38:04' ],
        [ '2018-05-21 15:38:09' ],
        [ '2018-05-21 15:38:09' ],
        [ '2018-05-21 15:38:08' ]
    ];
    const tsDataFrame = new dataForge.DataFrame({
        columnNames: ['Timestamp'],
        rows: timestamps
    }).setIndex('Timestamp');
    let groupedDf = tsDataFrame.groupBy(row => row.Timestamp).select(tsGroup => ({
        Timestamp: tsGroup.first().Timestamp,
        QPS: tsGroup.count()
    })).inflate();
    console.log(groupedDf.toString());
}

test();
Running the above code gives me the following result:
__index__ Timestamp QPS
--------- ------------------- ---
0 2018-05-21 15:38:04 1
1 2018-05-21 15:38:09 2
2 2018-05-21 15:38:08 1
which is what I expected.
However, if the line //const c = require('collections/fast-map'); is uncommented, running the code again gives:
__index__ [object Object] false
--------- --------------- -----
0
1
2
Clearly this is wrong. After hours of debugging I found at least one possible cause. In the built version of data-forge, the DataFrame.prototype.toString function contains the following (cut down for brevity's sake):
DataFrame.prototype.toString = function () {
    var columnNames = this.getColumnNames();
    var header = ["__index__"].concat(columnNames);
    // more things down below
};
Doing console.log(columnNames) with collectionsjs required gives me the following:
[ SelectIterable {
iterable: [ [Object], [Object], [Object] ],
selector: [Function] },
false ]
Without collectionsjs I get the expected result: [ 'Timestamp', 'QPS' ]
Inspecting getColumnNames further shows that it uses Array.from, which is overridden by the collectionsjs implementation: https://github.com/montagejs/collections/blob/master/shim-array.js#L26
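To illustrate how a replaced Array.from could produce output like the one above, here is a minimal sketch. The names LazyColumns and naiveFrom are hypothetical and made up for this example. naiveFrom is not the collectionsjs code (I have not verified exactly what the shim does); it is only an assumed stand-in for any replacement that copies own property values instead of following the iterator protocol.

```javascript
// Hypothetical lazy iterable: it exposes its data only through Symbol.iterator,
// and also carries a couple of own properties.
class LazyColumns {
    constructor(items) {
        this.iterable = items; // own enumerable property
        this.done = false;     // own enumerable property
    }

    *[Symbol.iterator]() {
        yield* this.iterable;
    }
}

// Hypothetical non-compliant replacement for Array.from (an assumption for
// illustration only): it ignores Symbol.iterator and collects the values of
// the object's own enumerable properties.
function naiveFrom(values) {
    return Object.keys(values).map(key => values[key]);
}

const columns = new LazyColumns(['Timestamp', 'QPS']);

// The built-in Array.from walks the iterator and yields the column names.
const viaIterator = Array.from(columns);

// The naive version returns the property values of the wrapper object instead.
const viaNaive = naiveFrom(columns);

console.log(viaIterator);
console.log(viaNaive);
```

With the built-in Array.from the result is the list of column names, while the naive version returns the internals of the iterable object, which is the same shape of symptom as the [ SelectIterable { ... }, false ] output shown earlier.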
I managed to work around this on the data-forge side by adding a seemingly unnecessary function call:
let groupedDf = tsDataFrame.groupBy(row => row.Timestamp).select(tsGroup => ({
    Timestamp: tsGroup.first().Timestamp,
    QPS: tsGroup.count()
})).inflate().resetIndex();
This gives me the correct result regardless of whether collectionsjs is loaded.
There is already an issue raised in collectionsjs regarding Array.from (montagejs/collections#169), and there is also a PR (montagejs/collections#173). I'm not sure about the progress of either.
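For anyone who wants to check whether something in their process has replaced Array.from, a rough sketch follows. looksNative and toArray are hypothetical helper names, not part of data-forge or collectionsjs. The check is only a heuristic (for example, bound functions also report native code), so treat it as a debugging aid rather than a guarantee.

```javascript
// Heuristic: built-in functions stringify with "[native code]" in their body.
// A user-land replacement normally shows its own source instead.
function looksNative(fn) {
    return typeof fn === 'function' &&
        Function.prototype.toString.call(fn).includes('[native code]');
}

// Spread syntax goes through the iterator protocol directly and does not
// call Array.from, so it is unaffected if Array.from has been replaced.
function toArray(iterable) {
    return [...iterable];
}

console.log('Array.from looks native:', looksNative(Array.from));
console.log(toArray(new Set(['Timestamp', 'QPS'])));
```

Running the first check before and after require('collections/fast-map') should show whether the shim is what changes Array.from in a given setup.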