
Convergence Warning with MCFS algorithm #12

@TeunvdWeij

The problem

The convergence warning can be resolved, but doing so affects the run time. As the warning itself suggests, it disappears when max_iter is reduced or eps is increased. It makes sense that the run time changes: with a lower iteration limit the algorithm simply stops earlier and finishes faster, and a larger eps increases the regularization that scikit-learn applies when computing the Cholesky diagonal factors, so the solver handles ill-conditioned steps differently.

The question I think must be answered is whether the warning should be ignored, or whether the parameters should be changed at the cost of different run times.
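One way to answer this empirically is to record, for every configuration, both the run time and how many warnings were raised, rather than only silencing them. A minimal sketch using only the standard library; timed_call is a hypothetical helper name, and for this issue category would be sklearn.exceptions.ConvergenceWarning.

```python
import time
import warnings


def timed_call(func, *args, category=Warning, **kwargs):
    """Hypothetical helper: run func(*args, **kwargs) and return
    (result, elapsed_seconds, n_warnings).

    Warnings of the given category are recorded instead of printed, so a
    configuration that triggers them can be compared with one that does not.
    """
    with warnings.catch_warnings(record=True) as caught:
        # "always" counts every occurrence, not only the first per code location
        warnings.simplefilter("always", category)
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
    n_warnings = sum(issubclass(w.category, category) for w in caught)
    return result, elapsed, n_warnings
```

For example, timed_call(MCFS.mcfs, X, n_selected_features=num_features, W=W, n_clusters=20, category=ConvergenceWarning) would report the MCFS time together with the number of degenerate-regressor warnings for that run.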

The warning:

/opt/conda/lib/python3.8/site-packages/sklearn/linear_model/_least_angle.py:615: ConvergenceWarning: Regressors in active set degenerate. Dropping a regressor, after 96 iterations, i.e. alpha=3.095e-05, with an active set of 89 regressors, and the smallest cholesky pivot element being 7.300e-08. Reduce max_iter or increase eps parameters.
warnings.warn('Regressors in active set degenerate. '
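Note that neither parameter named in the warning can be set from eval_MCFS directly. As far as I can tell from the skfeature source, MCFS.mcfs fits one scikit-learn Lars(n_nonzero_coefs=n_selected_features) model per cluster and does not pass eps through, and for Lars the iteration limit (the max_iter in the warning) is taken from n_nonzero_coefs, i.e. num_features here. Reducing max_iter therefore means selecting fewer features per regression. The sketch below is a hypothetical helper, lars_weights, that repeats only that regression step with eps exposed; it assumes the spectral embedding Y (one column per cluster) has already been computed and is not skfeature's own code.

```python
import numpy as np
from sklearn.linear_model import Lars


def lars_weights(X, Y, n_selected_features, eps=np.finfo(float).eps):
    """Hypothetical helper: fit one LARS model per embedding column.

    X: (n_samples, n_features) data matrix.
    Y: (n_samples, n_clusters) spectral embedding, assumed precomputed.
    n_selected_features: passed to Lars as n_nonzero_coefs, which Lars also
        uses as its iteration limit (the "max_iter" in the warning).
    eps: regularization for the Cholesky diagonal factors; scikit-learn's
        default is machine epsilon, and a larger value is the "increase eps"
        option from the warning.
    """
    n_features = X.shape[1]
    n_clusters = Y.shape[1]
    W = np.zeros((n_features, n_clusters))
    for i in range(n_clusters):
        clf = Lars(n_nonzero_coefs=n_selected_features, eps=eps)
        clf.fit(X, Y[:, i])
        W[:, i] = clf.coef_
    return W
```

Each column of the returned matrix has at most n_selected_features nonzero coefficients, so comparing runs with different eps values shows directly how much the selected features (and the run time) change.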

More information on the topic

To see where this warning is raised, open the scikit-learn source file below and search for "cholesky pivot":
https://github.com/scikit-learn/scikit-learn/blob/0fb307bf39bbdacd6ed713c00724f8f871d60370/sklearn/linear_model/_least_angle.py
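The pivot mentioned in the warning can be illustrated in isolation. When LARS adds a regressor to the active set it extends a Cholesky factorization of the active regressors' Gram matrix, and the new diagonal entry (the pivot) equals the norm of the part of the new regressor that the active regressors cannot already explain. The numpy sketch below is a simplified illustration, not scikit-learn's code (which also applies eps and other safeguards): the pivot is close to zero when the new column is (almost) a linear combination of active columns, which is the situation "Regressors in active set degenerate" describes, and scikit-learn drops the regressor when the pivot becomes too small.

```python
import numpy as np


def cholesky_pivot(X_active, x_new):
    """Diagonal entry added when x_new joins the active set (simplified sketch).

    With G = X_active.T @ X_active = L @ L.T, solve L w = X_active.T @ x_new;
    the new pivot is sqrt(x_new . x_new - w . w), i.e. the norm of the
    residual of x_new after projecting out the active columns.
    """
    L = np.linalg.cholesky(X_active.T @ X_active)
    w = np.linalg.solve(L, X_active.T @ x_new)
    # abs() guards against tiny negative values caused by round-off
    return np.sqrt(abs(x_new @ x_new - w @ w))
```

Because of the square root, round-off of order 1e-13 in the subtraction already yields pivots of order 1e-7, so nearly collinear features can trigger the warning even when they are not exactly collinear.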

The code

The function eval_MCFS() is called from experiment_config.py. It receives the following parameters:

Parameters for the construct_W function:

  • W_kwargs = {"metric": "euclidean", "neighborMode": "knn", "weightMode": "heatKernel", "k": 5, 't': 1}

Number of features to select:

  • num_features = 100

The number of true class labels is given by num_cluster.

X is the data and y contains the label of each sample.
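To make the W_kwargs concrete: with a knn neighbour mode and a heat-kernel weight mode, each sample is connected to its k nearest neighbours with weight exp(-d^2 / (2 t^2)), where d is the Euclidean distance. The numpy sketch below approximates that construction for illustration only; knn_heat_kernel_affinity is a hypothetical name, it is not skfeature's construct_W (which returns a sparse matrix and may differ in details such as self-loops), and symmetrizing with the element-wise maximum is an assumption. It may also be worth checking that the installed skfeature version reads exactly the key names used in W_kwargs.

```python
import numpy as np


def knn_heat_kernel_affinity(X, k=5, t=1.0):
    """Dense kNN heat-kernel affinity matrix (illustrative sketch).

    Each sample is linked to its k nearest neighbours (excluding itself) with
    weight exp(-d**2 / (2 * t**2)); the result is symmetrized with an
    element-wise maximum.
    """
    # pairwise squared Euclidean distances, shape (n, n)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(sq_dist[i])
        neighbours = order[order != i][:k]  # skip the sample itself
        W[i, neighbours] = np.exp(-sq_dist[i, neighbours] / (2 * t * t))
    return np.maximum(W, W.T)
```

With the settings above (k=5, t=1), the weights decay quickly with distance, so unscaled features with large ranges can make most weights close to zero.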

from time import time

from skfeature.function.sparse_learning_based import MCFS
from skfeature.utility import construct_W, unsupervised_evaluation

def eval_MCFS(X, y, num_cluster, num_features, W_kwargs):
    """
    Function that does unsupervised feature selection with MCFS algorithm.

    PARAMS:
    X: data to analyze
    y: true labels
    num_cluster: number of clusters in the ground truth of the dataset
    num_features: number of features to select
    W_kwargs: parameters for the construct_W function

    ------------
    Returns:
    The nmi, acc and run time

    """
    start_time = time()

    # construct affinity matrix
    W = construct_W.construct_W(X, **W_kwargs)

    # obtain the feature weight matrix
    # (note: n_clusters is hard-coded to 20 here; it is not num_cluster)
    Weight = MCFS.mcfs(X, n_selected_features=num_features, W=W, n_clusters=20)

    # rank the features by their MCFS scores (highest score first)
    idx = MCFS.feature_ranking(Weight)

    # get run time for MCFS algorithm
    run_time = time() - start_time

    # obtain the dataset on the selected features
    selected_features = X[:, idx[0:num_features]]

    # run k-means clustering on the selected features 20 times and accumulate nmi and acc
    nmi_total = 0
    acc_total = 0
    for _ in range(0, 20):
        nmi, acc = unsupervised_evaluation.evaluation(X_selected=selected_features, n_clusters=num_cluster, y=y)
        nmi_total += nmi
        acc_total += acc

    # get the averages
    nmi = float(nmi_total)/20
    acc = float(acc_total)/20

    # output nmi, acc and run time 
    return nmi, acc, run_time
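For context on the ranking step: in the MCFS paper (Cai, Zhang and He, KDD 2010) each feature's score is the largest absolute coefficient it receives across the per-cluster regressions, and features are ranked by that score in descending order, which is why idx[0:num_features] keeps the top features. A small numpy sketch of that scoring rule; mcfs_ranking is a hypothetical name, and skfeature's feature_ranking may compute the maximum slightly differently (for example without the absolute value).

```python
import numpy as np


def mcfs_ranking(W):
    """Rank features by MCFS score, highest first (sketch of the paper's rule).

    W: (n_features, n_clusters) coefficient matrix, one column per cluster
    regression. The score of feature j is max_k |W[j, k]|.
    """
    scores = np.abs(W).max(axis=1)
    # stable sort on negated scores gives descending order; ties keep index order
    return np.argsort(-scores, kind="stable")
```

Since features dropped as degenerate by LARS end up with a zero coefficient in that cluster's regression, the warning can change which features reach the top num_features positions, not only the run time.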
