Hello, thank you for open-sourcing this. I am trying to understand your code, but the data preprocessing in data.py confuses me.
When building the vocabulary, the code prints:
print("Load corpus with train size %d, valid size %d, "
"test size %d raw vocab size %d vocab size %d at cut_off %d OOV rate %f"
% (len(self.train_corpus), len(self.valid_corpus), len(self.test_corpus),
raw_vocab_size, len(vocab_count), vocab_count[-1][1], float(discard_wc) / len(all_words)))
What do the train size, valid size, and test size mean here?
All three values are 2, because each corpus is a tuple of length 2, so len() does not reflect the amount of data.
Does this message mean that the vocabulary is built from the training, validation, and test data?
In the code, only the training data is used to build the vocabulary.
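To make the question concrete, here is a minimal sketch of what I think is happening. The tuple layout `(dialogues, metadata)`, the toy data, and the helper `build_vocab` are my own assumptions for illustration, not the actual code in data.py; only the names `vocab_count`, `raw_vocab_size`, `discard_wc`, and `all_words` are taken from the print statement.

```python
from collections import Counter

# Hypothetical stand-in for each corpus: a tuple of (dialogues, metadata).
# With this layout len() is always 2, no matter how much data there is.
train_corpus = ([["hello", "there"], ["hello", "world"]], [None, None])
valid_corpus = ([["unseen", "hello"]], [None])


def build_vocab(corpus, max_vocab_size):
    """Count words in the dialogues of one corpus and keep the most frequent."""
    dialogues = corpus[0]
    all_words = [w for utt in dialogues for w in utt]
    vocab_count = Counter(all_words).most_common()
    raw_vocab_size = len(vocab_count)
    # Occurrences of words that fall outside the kept vocabulary.
    discard_wc = sum(count for _, count in vocab_count[max_vocab_size:])
    vocab_count = vocab_count[:max_vocab_size]
    oov_rate = float(discard_wc) / len(all_words)
    return vocab_count, raw_vocab_size, oov_rate


# Only the training corpus is passed in, so valid/test words never enter the vocab.
vocab_count, raw_vocab_size, oov_rate = build_vocab(train_corpus, max_vocab_size=2)

print(len(train_corpus), len(valid_corpus))        # tuple lengths, not data sizes
print(len(train_corpus[0]), len(valid_corpus[0]))  # number of dialogues
print(raw_vocab_size, len(vocab_count), vocab_count[-1][1], oov_rate)
```

If this reading is right, the printed "train size", "valid size", and "test size" would be more informative as the number of dialogues in each split, and the OOV rate refers to the training data only.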
When formatting the dialogue:
Is it essential to add [<s>,<d>,</s>] at the start of each dialogue?
Can I leave it out?
Thank you.