
Modeling Notebook #3

Open
alice39 wants to merge 20 commits into errodd:main from Atlls:modeling



alice39 (Collaborator) commented Apr 17, 2026

Note: Optuna n_trials is set to 2; verify with 30.

Nothing new has been added relative to #1 and #2; just review notebook/modeling_and_model_selection.ipynb.
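Because the trial count has to change between a quick run (2) and the verification run (30), one option is to read it from the environment instead of editing the notebook. This is a minimal sketch under that assumption: the `N_TRIALS` variable name and the `resolve_n_trials` helper are hypothetical, and the `objective` function and `direction` in the commented Optuna call are placeholders, not taken from the notebook.

```python
import os

DEFAULT_N_TRIALS = 2  # the quick-run value mentioned in this PR


def resolve_n_trials(default: int = DEFAULT_N_TRIALS) -> int:
    """Return the Optuna trial budget from the (hypothetical) N_TRIALS env var."""
    raw = os.environ.get("N_TRIALS")
    if raw is None:
        return default
    n_trials = int(raw)
    if n_trials < 1:
        raise ValueError(f"N_TRIALS must be >= 1, got {n_trials}")
    return n_trials


# Inside the notebook the value would be passed to Optuna, for example
# (objective and direction are placeholders):
#     study = optuna.create_study(direction="maximize")
#     study.optimize(objective, n_trials=resolve_n_trials())
```

With this in place, a verification run could be launched with `N_TRIALS=30 jupyter nbconvert --to notebook --execute notebook/modeling_and_model_selection.ipynb`, since the kernel inherits the environment of the process that starts it.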

Atlls and others added 20 commits February 14, 2026 23:38
…tebook

This should serve as an initial base on which we can iterate to improve
the data preparation for the model.

This notebook includes the following:
  - Split the dataset into training and test sets using scikit-learn
  - Extract features such as embeddings and word counts from the columns
  - Select features, dropping the comment text and unnecessary columns
  - Build scikit-learn pipelines with a column transformer

Acked-by: Atlls <alejandroaigner1999@hotmail.com>
Signed-off-by: A.L.I.C.E <a@alice0.com>
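The four steps listed in this commit can be illustrated with a small scikit-learn sketch. It assumes a toy DataFrame with hypothetical `comment_text`, `rating` and `label` columns and a hypothetical `word_count` feature; embeddings are omitted, and `test_size=0.33` / `random_state=42` are arbitrary choices, not values from the notebook.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical toy data; the real dataset is loaded from data/ and its columns may differ.
df = pd.DataFrame({
    "comment_text": ["great product", "bad", "not what I expected at all",
                     "ok", "love it", "terrible service here"],
    "rating": [5, 1, 2, 3, 5, 1],
    "label": [1, 0, 0, 1, 1, 0],
})


def word_count(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature: number of whitespace-separated words in the first column."""
    return frame.iloc[:, 0].str.split().str.len().to_frame("word_count")


# Split first, so the transformer is fitted on training rows only.
X = df.drop(columns=["label"])
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

preprocess = ColumnTransformer(
    transformers=[
        # Derive a word count from the raw text, then scale it.
        ("text_stats", Pipeline([
            ("count", FunctionTransformer(word_count)),
            ("scale", StandardScaler()),
        ]), ["comment_text"]),
        ("numeric", StandardScaler(), ["rating"]),
    ],
    # Unlisted columns are dropped; the raw text itself never reaches the
    # output, only the word count derived from it.
    remainder="drop",
)

X_train_final = preprocess.fit_transform(X_train)
X_test_final = preprocess.transform(X_test)
print(X_train_final.shape, X_test_final.shape)
```

Splitting before fitting keeps test-set statistics (for example the scalers' means and variances) out of the fitted transformer.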

Phase 3 contains three elements:

    - final feature matrices: outputs the dataset transformed from the
      original data in the `data/` directory, along with the columns
      needed to work with it.

    - artifact persistence: dumps the column transformer into
      preprocess_pipeline.joblib, and also dumps the tuple
      (X_train_final, X_test_final, Y_train, Y_test) into
      split_data.joblib.

    - output summary: a summary of the data-processing notebook's outputs
      and some design choices.

Signed-off-by: A.L.I.C.E <a@alice0.com>
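The persistence step described in this commit can be sketched with `joblib.dump` and `joblib.load`. The arrays and the single-scaler `ColumnTransformer` below are hypothetical stand-ins for the notebook's real objects, and the temporary output directory is an assumption, since the commit does not say where the files are written; only the two file names and the tuple order come from the commit message.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the outputs of the earlier phases.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.5, 25.0]])
Y_train = np.array([0, 1, 1])
Y_test = np.array([1])

preprocess = ColumnTransformer([("scale", StandardScaler(), [0, 1])])
X_train_final = preprocess.fit_transform(X_train)
X_test_final = preprocess.transform(X_test)

# A temporary directory keeps the sketch self-contained; the notebook's real
# output directory is not stated in the PR.
out_dir = Path(tempfile.mkdtemp())
joblib.dump(preprocess, out_dir / "preprocess_pipeline.joblib")
joblib.dump((X_train_final, X_test_final, Y_train, Y_test),
            out_dir / "split_data.joblib")

# Later (e.g. in the modeling notebook): reload both artifacts.
loaded_preprocess = joblib.load(out_dir / "preprocess_pipeline.joblib")
X_tr, X_te, y_tr, y_te = joblib.load(out_dir / "split_data.joblib")
```

Because the tuple is unpacked positionally, the loading side must use the same (X_train_final, X_test_final, Y_train, Y_test) order.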
