Suggestion to alter the behaviour of FEATURE_WEIGHTS when the PERCENT_TRAIN = 100 #426

paulvpop · 2026-04-21T09:00:22Z

At the moment, force-train offers three preset options for FEATURE_WEIGHTS. This works well when PERCENT_TRAIN is 70, 80, or some other value below 100. But when PERCENT_TRAIN = 100, i.e. 100% of the samples are used for training instead of keeping some for validation, EQUALIZED, PROPORTIONAL, and ANTIPROPORTIONAL all use up all of the training data, i.e. they all behave the same. This behaviour is not desirable.

When 100% of the samples are used, I would expect the following:

- EQUALIZED: the same number of samples should be taken from each class, equal to the number of samples in the smallest class.
- PROPORTIONAL: the behaviour should remain as it is.
- ANTIPROPORTIONAL: most samples should be taken from the least numerous classes. This could be done by using the size of the least numerous class as the highest per-class sample count, and then assigning the sample counts of the other classes anti-proportionally.

I am not sure what logic is currently applied, but it can likely be adapted to meet these requirements when PERCENT_TRAIN = 100. I am requesting this change.
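To make the request concrete, here is a minimal Python sketch of one possible reading of the sampling rules proposed above. It is not FORCE code; the function name `proposed_counts` and the anti-proportional rule (smallest class kept whole, other classes scaled by `n_min / n`) are my own assumptions for illustration.

```python
from collections import Counter


def proposed_counts(labels, mode):
    """Per-class sample counts under one reading of the proposal.

    Hypothetical helper for illustration only, not part of FORCE.
    """
    counts = Counter(labels)
    n_min = min(counts.values())  # size of the least numerous class

    if mode == "EQUALIZED":
        # Every class is cut down to the size of the smallest class.
        return {c: n_min for c in counts}
    if mode == "PROPORTIONAL":
        # Unchanged: all samples of every class are used.
        return dict(counts)
    if mode == "ANTIPROPORTIONAL":
        # Assumed rule: the smallest class keeps all n_min samples,
        # larger classes get n_min * (n_min / n), with at least 1 sample.
        return {c: max(1, round(n_min * n_min / n)) for c, n in counts.items()}
    raise ValueError(f"unknown mode: {mode}")


# Toy response vector: three classes with 100, 50 and 20 samples.
labels = [1] * 100 + [2] * 50 + [3] * 20

for mode in ("EQUALIZED", "PROPORTIONAL", "ANTIPROPORTIONAL"):
    print(mode, proposed_counts(labels, mode))
```

With this toy input, the anti-proportional rule keeps all 20 samples of the smallest class and only a handful from the largest one, which is the behaviour the request describes.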

Note that in my case the validation is done separately, with another validation dataset built from points collected on the classified imagery, so validation can only be done after model prediction. That is why I am using 100% of the samples for training.

Here's an example parameter file for force-train:

%SET%: 00001 00002 00003

%WEIGHT%: EQUALIZED PROPORTIONAL ANTIPROPORTIONAL

++PARAM_TRAIN_START++

# INPUT
# ------------------------------------------------------------------------
# File that is holding the features for training (and probably validation).
# The file needs to be a table with features in columns, and samples in rows.
# Column delimiter is whitespace. The same number of features must be given
# for each sample. Do not include a header. The samples need to match the
# response file.
# Type: full file path
FILE_FEATURES = /data/samples_rfc_30/features.txt
# File that is holding the response for training (class labels or numeric
# values). The file needs to be a table with one column, and samples in rows.
# Do not include a header. The samples need to match the feature file.
# Type: full file path
FILE_RESPONSE = /data/samples_rfc_30/response.txt

# OUTPUT
# ------------------------------------------------------------------------
# File for storing the Machine Learning model in xml format. This file
# will be overwritten if it exists.
# Type: full file path
FILE_MODEL = /data/third/train_svc_31_32_33/MODEL_CLASS_{%SET%}_{%WEIGHT%}.xml
# File for storing the logfile. This file will be overwritten if it exists.
# Type: full file path
FILE_LOG = /data/third/train_svc_31_32_33/FILE_LOG_{%SET%}_{%WEIGHT%}.log

# TRAINING
# ------------------------------------------------------------------------
# Response variable for training the model. This number refers to the column
# of the response file, in which the desired variable is stored (FILE_RESPONSE).
# Type: Integer. Valid range: [1,NUMBER_OF_VARIABLES]
RESPONSE_VARIABLE = 1
# This parameter specifies how many samples (in %) should be used for
# training the model. The other samples are left out, and used to vali-
# date the model.
# Type: Float. Valid range: ]0,100]
PERCENT_TRAIN = 100
# This parameter specifies whether the samples should be randomly drawn (TRUE)
# or if the first n samples (FALSE) should be used for training.
# Type: Logical. Valid values: {TRUE,FALSE}
RANDOM_SPLIT = TRUE
# Machine learning method. Currently implemented are Random Forest and
# Support Vector Machines, both in regression and classification flavors.
# Type: Character. Valid values: {SVR,SVC,RFR,RFC}
ML_METHOD = SVC
# Class weights. This parameter only applies for the classification flavor. 
# This parameter lets you define à priori class weights, which can be useful
# if the training data are inbalanced. This parameter can be set to a number
# of different values. EQUALIZED gives the same weight to all classes (default).
# PROPORTIONAL gives a weight proportional to the class frequency.
# ANTIPROPORTIONAL gives a weight, which is inversely proportional to the class
# frequency. Alternatively, you can use custom weights, i.e. a vector of weights
# for each class in your response file. The weights must sum to one, and must be
# given in ascending order.
# Type: Character / Float list. Valid values: {EQUALIZED,PROPORTIONAL,ANTIPROPORTIONAL} or ]0,1[
FEATURE_WEIGHTS = {%WEIGHT%}

paulvpop · 2026-04-21T12:04:08Z

paulvpop
Apr 21, 2026
Author

I went through the source code here and asked Deepseek what is happening behind the scenes. It turns out my understanding of this was wrong. The log files I checked had different class weights, and I hadn't noticed that.

EQUALIZED log:

class weights: NA

PROPORTIONAL log:

class weights: 0.120441 0.110687 0.069550 0.060221 0.068702 0.060645 0.091179 0.067006 0.061493 0.072519 0.082697 0.067006 0.067854

ANTIPROPORTIONAL log:

class weights: 0.046892 0.051024 0.081203 0.093784 0.082206 0.093128 0.061941 0.084287 0.091843 0.077879 0.068294 0.084287 0.083233

I assumed something was wrong because all the samples were shown as 1. But I guess the not-so-apparent work of applying FEATURE_WEIGHTS is done inside the OpenCV algorithms. My apologies. Please confirm whether my understanding is correct.
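As a quick sanity check on the logs above: the numbers look consistent with ANTIPROPORTIONAL being the inverse of the PROPORTIONAL weights, rescaled to sum to one. I have not confirmed this against the FORCE source, so the formula in this sketch is an assumption that merely reproduces the logged values; `invert_and_normalize` is a hypothetical helper name.

```python
# Class weights copied from the PROPORTIONAL and ANTIPROPORTIONAL logs above.
proportional = [
    0.120441, 0.110687, 0.069550, 0.060221, 0.068702, 0.060645, 0.091179,
    0.067006, 0.061493, 0.072519, 0.082697, 0.067006, 0.067854,
]
antiproportional_logged = [
    0.046892, 0.051024, 0.081203, 0.093784, 0.082206, 0.093128, 0.061941,
    0.084287, 0.091843, 0.077879, 0.068294, 0.084287, 0.083233,
]


def invert_and_normalize(weights):
    """Assumed relation: w_anti[i] = (1 / w[i]) / sum_j (1 / w[j])."""
    inverse = [1.0 / w for w in weights]
    total = sum(inverse)
    return [v / total for v in inverse]


recomputed = invert_and_normalize(proportional)

# Compare the recomputed weights with the logged ones, class by class.
max_diff = max(abs(a - b) for a, b in zip(recomputed, antiproportional_logged))
print(f"largest absolute difference: {max_diff:.2e}")
```

Any remaining difference should be down to the six-decimal rounding in the log files.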


davidfrantz · 2026-04-22T08:22:05Z

davidfrantz
Apr 22, 2026
Maintainer

Dear @paulvpop,

The weights are used internally in OpenCV's training functions to fine-tune the machine learning algorithm's behaviour.

The sub-sampling is independent of this: sub-sampling controls which portion of the data is presented to the learning algorithm.
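The separation of the two steps can be sketched as follows. This is an illustration in plain Python, not the FORCE implementation; `split_indices` is a hypothetical helper that mimics what PERCENT_TRAIN and RANDOM_SPLIT are documented to do in the parameter file, and the rounding rule is an assumption.

```python
import random


def split_indices(n_samples, percent_train, random_split=True, seed=0):
    """Return (train, validation) sample indices.

    Hypothetical helper: the class-weight setting plays no role here.
    """
    order = list(range(n_samples))
    if random_split:
        # RANDOM_SPLIT = TRUE: draw the training samples randomly.
        random.Random(seed).shuffle(order)
    # Assumed rounding; RANDOM_SPLIT = FALSE keeps the first n samples.
    n_train = round(n_samples * percent_train / 100)
    return sorted(order[:n_train]), sorted(order[n_train:])


# With PERCENT_TRAIN = 100, every sample goes to training, whichever
# weight preset is chosen; the weights only affect the learner afterwards.
train, valid = split_indices(n_samples=10, percent_train=100)
print(len(train), len(valid))
```

In other words, switching between EQUALIZED, PROPORTIONAL and ANTIPROPORTIONAL leaves the set of training samples unchanged; only the weights handed to the learning algorithm differ.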

Best,
David

