Description
The problem
The problem is that the convergence warning can be resolved, but doing so affects the run time. The warning can be silenced by reducing the max_iter parameter or increasing the eps parameter (as the warning itself suggests). It makes sense that the run time is affected: the solver either performs fewer iterations or takes bigger steps, resulting in a lower execution time.
The question that I think must be answered is whether the warning should be ignored, or whether the parameters should be changed, leading to different run times.
The warning:
/opt/conda/lib/python3.8/site-packages/sklearn/linear_model/_least_angle.py:615: ConvergenceWarning: Regressors in active set degenerate. Dropping a regressor, after 96 iterations, i.e. alpha=3.095e-05, with an active set of 89 regressors, and the smallest cholesky pivot element being 7.300e-08. Reduce max_iter or increase eps parameters.
warnings.warn('Regressors in active set degenerate. '
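Until that decision is made, the warning can be handled explicitly rather than left to scroll by. Below is a minimal sketch (the helper name and its policy argument are my own, not part of scikit-learn): it runs a callable with ConvergenceWarning either escalated to an error, so affected runs are easy to detect, or silenced once the results have been judged acceptable.

```python
import warnings

from sklearn.exceptions import ConvergenceWarning


def run_with_warning_policy(fn, policy="error"):
    """Call fn() under a given policy for ConvergenceWarning.

    policy: "error" raises the warning as an exception,
            "ignore" suppresses it,
            "default" keeps the normal one-time report.
    """
    with warnings.catch_warnings():
        warnings.simplefilter(policy, category=ConvergenceWarning)
        return fn()
```

With policy="error", a run that triggers the degenerate-regressor warning fails immediately, so runs with changed max_iter/eps values can be compared on equal footing.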
More information on the topic
The following link points to the file where the warning is raised; searching for "cholesky pivot" takes you to the corresponding place in the code.
https://github.com/scikit-learn/scikit-learn/blob/0fb307bf39bbdacd6ed713c00724f8f871d60370/sklearn/linear_model/_least_angle.py
The code
The function eval_MCFS() is called from experiment_config.py. The parameters the function receives are as follows:
Parameters for the construct_W function:
- W_kwargs = {"metric": "euclidean", "neighborMode": "knn", "weightMode": "heatKernel", "k": 5, "t": 1}
Number of features to select:
- num_features = 100
The number of true class labels is given by num_cluster.
X is the data and y contains the label of each sample.
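Written out as Python, the configuration above looks as follows (the comments are my interpretation of the graph parameters, based on how skfeature's construct_W uses them):

```python
# Configuration passed to eval_MCFS, as given in the description above.
W_kwargs = {
    "metric": "euclidean",       # distance metric for finding neighbours
    "neighborMode": "knn",       # build the affinity graph from k nearest neighbours
    "weightMode": "heatKernel",  # edge weights computed with a heat kernel
    "k": 5,                      # neighbours per sample
    "t": 1,                      # heat-kernel bandwidth
}
num_features = 100               # number of features MCFS should select
```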
from time import time

from skfeature.function.sparse_learning_based import MCFS
from skfeature.utility import construct_W, unsupervised_evaluation


def eval_MCFS(X, y, num_cluster, num_features, W_kwargs):
    """
    Function that does unsupervised feature selection with the MCFS algorithm.
    PARAMS:
        X: data to analyze
        y: true labels
        num_cluster: number of clusters in the ground truth of the dataset
        num_features: number of features to select
        W_kwargs: parameters for the construct_W function
    ------------
    Returns:
        The nmi, acc and run time
    """
    start_time = time()
    # construct the affinity matrix
    W = construct_W.construct_W(X, **W_kwargs)
    # obtain the feature weight matrix; use the ground-truth cluster count
    # instead of a hard-coded value
    Weight = MCFS.mcfs(X, n_selected_features=num_features, W=W, n_clusters=num_cluster)
    # rank the features in descending order of their MCFS scores
    idx = MCFS.feature_ranking(Weight)
    # get the run time of the MCFS algorithm
    run_time = time() - start_time
    # obtain the dataset on the selected features
    selected_features = X[:, idx[0:num_features]]
    # perform kmeans clustering based on the selected features, repeated 20 times
    nmi_total = 0
    acc_total = 0
    for _ in range(20):
        nmi, acc = unsupervised_evaluation.evaluation(
            X_selected=selected_features, n_clusters=num_cluster, y=y)
        nmi_total += nmi
        acc_total += acc
    # get the averages
    nmi = float(nmi_total) / 20
    acc = float(acc_total) / 20
    # output nmi, acc and run time
    return nmi, acc, run_time
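The repeated k-means evaluation at the end of eval_MCFS can also be read in isolation. A minimal sketch, where `evaluate` is a hypothetical stand-in for the unsupervised_evaluation.evaluation call:

```python
def average_metrics(evaluate, n_repeats=20):
    """Run `evaluate` n_repeats times and average its (nmi, acc) results.

    `evaluate` is any zero-argument callable returning an (nmi, acc) pair;
    in eval_MCFS it would wrap unsupervised_evaluation.evaluation, whose
    k-means initialisation is random, hence the repeated runs.
    """
    nmi_total = 0.0
    acc_total = 0.0
    for _ in range(n_repeats):
        nmi, acc = evaluate()
        nmi_total += nmi
        acc_total += acc
    return nmi_total / n_repeats, acc_total / n_repeats
```

Averaging over repeats smooths out the randomness of k-means initialisation, but note that the reported run_time above deliberately excludes this evaluation phase and covers only the MCFS selection itself.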