Problem
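Both the training and testing scripts rebuild the train/opt/test split by shuffling the dataset with random_state=17. The partitions match across scripts only if both load exactly the same rows in the same order; otherwise test.py can end up evaluating on rows the model was trained on.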
Code:
x_train, x_opt, x_test = np.split(df.sample(frac=1, random_state=17), ...)
Appears in:
train_og.py:26-27
test.py:37-38
Concern
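If test.py is run on the combined dataset used during training and that dataset has changed since training (rows added, removed, or reordered), or if test.py evaluates on anything other than the reproduced x_test partition, it will test on data the model has already seen during training.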
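One way to check this concern is to compare row indices across the partitions each script would produce. The sketch below is illustrative only: the helper names, the 60/20/20 cut points, and the toy DataFrame are assumptions (the real split arguments are elided in the code above), and it slices with iloc rather than np.split.

```python
import numpy as np
import pandas as pd


def three_way_split(df, seed=17, cuts=(0.6, 0.8)):
    """Shuffle with a fixed seed, then cut into train/opt/test.

    The cut points are hypothetical; the original arguments are not shown.
    """
    shuffled = df.sample(frac=1, random_state=seed)
    first, second = (int(c * len(shuffled)) for c in cuts)
    return shuffled.iloc[:first], shuffled.iloc[first:second], shuffled.iloc[second:]


def overlap_count(train, test):
    """Number of row labels that appear in both partitions."""
    return len(train.index.intersection(test.index))


# Toy stand-in for the combined dataset.
df = pd.DataFrame({"feature": np.arange(100)})

# Same data, same seed: both scripts reproduce identical partitions.
train_a, opt_a, test_a = three_way_split(df)
train_b, opt_b, test_b = three_way_split(df)
same_data_overlap = overlap_count(train_a, test_b)

# Data changed after training (20 rows appended): the shuffle differs,
# so the new test partition can contain rows used for training.
extra = pd.DataFrame({"feature": np.arange(100, 120)})
df_more = pd.concat([df, extra], ignore_index=True)
_, _, test_c = three_way_split(df_more)
changed_data_overlap = overlap_count(train_a, test_c)

print("overlap, identical data:", same_data_overlap)
print("overlap, changed data:", changed_data_overlap)
```

A check like this could run at the start of test.py as a guard, provided the training row indices are saved alongside the model.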
Recommendation
Use proper train/test split methodology:
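- Create a separate hold-out test set once, save it, and have test.py load only that set
- Use a time-based split for network traffic (train on earlier records, test on later ones)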
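- If the shared seed is kept, confirm both scripts load identical data; a different seed applied to the same data would place training rows in the test partition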
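The first two options above can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the 20% test fraction, and the "timestamp" and "bytes" columns are assumptions, and make_holdout assumes the DataFrame index is unique.

```python
import numpy as np
import pandas as pd


def make_holdout(df, test_frac=0.2, seed=17):
    """Split once into (development rows, held-out test rows).

    Assumes df has a unique index so the test rows can be dropped by label.
    """
    test = df.sample(frac=test_frac, random_state=seed)
    dev = df.drop(index=test.index)
    return dev, test


def time_based_split(df, time_col, test_frac=0.2):
    """Train on the earliest rows and test on the most recent ones."""
    ordered = df.sort_values(time_col, kind="stable")
    cut = len(ordered) - int(len(ordered) * test_frac)
    return ordered.iloc[:cut], ordered.iloc[cut:]


# Toy traffic log; the column names are hypothetical.
df = pd.DataFrame(
    {
        "timestamp": pd.date_range("2024-01-01", periods=100, freq="min"),
        "bytes": np.arange(100),
    }
)

dev, holdout = make_holdout(df)
early, late = time_based_split(df, "timestamp")

print(len(dev), len(holdout))
print(early["timestamp"].max() < late["timestamp"].min())
```

With the hold-out approach, the two frames would be written to separate files once (for example with DataFrame.to_csv), so that the training script reads only the development file and test.py reads only the hold-out file, and neither script needs to re-create the split.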
Priority
MODERATE - Could affect validity of test results