Scoring code #7

@ankit-vaghela30

In Features.py, the Loader class has a read method that reads the data and transforms it into the required RDDs. I ran the snippet below:

import hydrus.features as hf

data, labels = hf.Loader(ctx).read(args.data_path, args.label_path)
testData, testLabel = hf.Loader(ctx).read(args.test_data_path, args.test_label_path)
print('training doc length: ', len(data.map(lambda x: (x[0][0], x[1])).reduceByKey(lambda x,y: x+y).collect()))
print('training label length: ', len(labels.collect()))

data = hf.TfIdfTransformer(ctx).fit(data).transform(data)
testData = hf.TfIdfTransformer(ctx).fit(testData).transform(testData)
print('testing doc length: ', len(testData.map(lambda x: (x[0][0], x[1])).reduceByKey(lambda x,y: x+y).collect()))
print('testing labels length: ', len(testLabel.collect()))

For the training data the two sizes match (77 and 77), but for the testing data they differ: I got 13 and 14.

@cbarrick, can you please take a look at this?
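
One way to narrow this down might be to compare the document IDs on each side instead of only the counts. The sketch below is plain Python over collected records. It assumes, as `x[0][0]` in the snippet suggests, that each data record looks like `((doc_id, term), value)`, and it assumes the labels can be mapped down to a list of document IDs. `find_unmatched_ids` is a hypothetical helper, not part of hydrus.

```python
def find_unmatched_ids(data_records, label_doc_ids):
    """Return (labels_without_docs, docs_without_labels) as sorted lists.

    data_records: iterable of ((doc_id, term), value) tuples, the shape
        implied by `x[0][0]` in the snippet (assumption).
    label_doc_ids: iterable of document IDs that have a label (assumption:
        the label RDD can be mapped down to its document IDs).
    """
    # Distinct document IDs that still have at least one record.
    doc_ids = {key[0] for key, _ in data_records}
    label_ids = set(label_doc_ids)
    return sorted(label_ids - doc_ids), sorted(doc_ids - label_ids)


# Toy data: document 2 has a label but no records.
records = [((0, "spark"), 1.0), ((0, "rdd"), 0.5), ((1, "tfidf"), 2.0)]
labels_only, docs_only = find_unmatched_ids(records, [0, 1, 2])
print(labels_only, docs_only)  # [2] []
```

With the real RDDs, the inputs could come from `testData.collect()` and the document IDs behind `testLabel`. Any ID in the first list would be a labelled document that has no records left after loading and transforming.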
