Modeling Notebook #3
Open
alice39 wants to merge 20 commits into
…tebook

This should serve as an initial base on which we can iterate to improve the data preparation for the model. This notebook includes the following:
- Split the dataset into training and test sets with the sklearn library
- Extract features from columns, such as embeddings, word count, etc.
- Feature selection, which drops the comment text and unnecessary columns
- Sklearn pipelines and a column transformer

Acked-by: Atlls <alejandroaigner1999@hotmail.com>
Signed-off-by: A.L.I.C.E <a@alice0.com>
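The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the toy DataFrame and its columns (`comment_text`, `category`, `rating`, `label`) are hypothetical, embedding features are omitted, and word count stands in for the feature-extraction step. The raw text is dropped before the column transformer is fitted, and the transformer is fitted on the training split only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the real dataset lives in the data/ directory.
df = pd.DataFrame({
    "comment_text": ["great product", "bad", "works as expected, thanks", "meh",
                     "would buy again", "terrible support experience", "ok", "love it"],
    "category": ["a", "b", "a", "b", "a", "b", "a", "b"],
    "rating": [5.0, 1.0, 4.0, 2.0, 5.0, 1.0, 3.0, 5.0],
    "label": [1, 0, 1, 0, 1, 0, 0, 1],
})

# Feature extraction: derive a whitespace-based word count from the raw text.
df["word_count"] = df["comment_text"].str.split().str.len()

# Feature selection: drop the raw comment text (and the target) from X.
X = df.drop(columns=["comment_text", "label"])
Y = df["label"]

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=42, stratify=Y
)

# Column transformer: a small pipeline for numeric columns, one-hot for categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["rating", "word_count"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["category"]),
])

# Fit on the training split only, then reuse the fitted transform on the test split.
X_train_final = preprocess.fit_transform(X_train)
X_test_final = preprocess.transform(X_test)
```

Fitting only on `X_train` keeps test-set statistics (means, category sets) from leaking into the preprocessing.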
Phase 3 contains three elements, which are:
- final feature matrices: this outputs the transformed dataset from the
original data in the `data/` directory, along with the needed columns
to operate with.
- artifacts persistence: this dumps the column transformer into
preprocess_pipeline.joblib, and also dumps a tuple stored as
(X_train_final, X_test_final, Y_train, Y_test) into
split_data.joblib.
- output summary: a summary of the data processing notebook outputs
and some design choices.
Signed-off-by: A.L.I.C.E <a@alice0.com>
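The artifacts-persistence element described above can be sketched with `joblib.dump` and `joblib.load`. This is a minimal sketch assuming synthetic numeric data and a temporary output directory, both hypothetical stand-ins for the notebook's real inputs and output location; only the file names and the tuple order come from the description.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data in place of the real feature matrices.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.25, random_state=0)
preprocess = ColumnTransformer([("num", StandardScaler(), [0, 1, 2])])
X_train_final = preprocess.fit_transform(X_train)
X_test_final = preprocess.transform(X_test)

# Hypothetical output directory; the notebook's actual location may differ.
out_dir = tempfile.mkdtemp()
pipeline_path = os.path.join(out_dir, "preprocess_pipeline.joblib")
split_path = os.path.join(out_dir, "split_data.joblib")

# Persist the fitted column transformer and the split tuple.
joblib.dump(preprocess, pipeline_path)
joblib.dump((X_train_final, X_test_final, Y_train, Y_test), split_path)

# Downstream (e.g. in a modeling notebook): reload both artifacts.
preprocess_loaded = joblib.load(pipeline_path)
X_train_final2, X_test_final2, Y_train2, Y_test2 = joblib.load(split_path)
```

Persisting the fitted transformer, not only the transformed matrices, lets later notebooks apply the same preprocessing to new data without refitting it.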
Note: Optuna n_trials is set to 2; verify with 30.
Nothing different was added with respect to #1 and #2; just review
notebook/modeling_and_model_selection.ipynb.