Replies: 2 comments
I went through the source code and asked DeepSeek what is happening behind the scenes. It turns out my understanding of this was wrong. The log files I checked had different class weights, and I hadn't noticed that.

EQUALIZED log: class weights: NA
PROPORTIONAL log: class weights: 0.120441 0.110687 0.069550 0.060221 0.068702 0.060645 0.091179 0.067006 0.061493 0.072519 0.082697 0.067006 0.067854
ANTIPROPORTIONAL log: class weights: 0.046892 0.051024 0.081203 0.093784 0.082206 0.093128 0.061941 0.084287 0.091843 0.077879 0.068294 0.084287 0.083233

I assumed something was wrong because all the samples were shown as 1. But I guess the less apparent work of applying FEATURE_WEIGHTS is done inside the OpenCV algorithms. My apologies. Please confirm whether my understanding is correct.
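As a quick sanity check on the logged numbers, the ANTIPROPORTIONAL weights appear to be the reciprocals of the PROPORTIONAL weights, rescaled to sum to 1. This is an observation from the values in the logs, not something taken from the FORCE source, and the helper name `invert_and_normalise` is made up for this sketch.

```python
# Class weights copied from the PROPORTIONAL and ANTIPROPORTIONAL logs above.
proportional = [
    0.120441, 0.110687, 0.069550, 0.060221, 0.068702, 0.060645, 0.091179,
    0.067006, 0.061493, 0.072519, 0.082697, 0.067006, 0.067854,
]
antiproportional_logged = [
    0.046892, 0.051024, 0.081203, 0.093784, 0.082206, 0.093128, 0.061941,
    0.084287, 0.091843, 0.077879, 0.068294, 0.084287, 0.083233,
]


def invert_and_normalise(weights):
    """Hypothetical helper: take 1/w per class, then rescale so the sum is 1."""
    inverse = [1.0 / w for w in weights]
    total = sum(inverse)
    return [value / total for value in inverse]


recomputed = invert_and_normalise(proportional)

# Largest absolute gap between the recomputed and the logged weights.
max_diff = max(abs(a - b) for a, b in zip(recomputed, antiproportional_logged))
print(f"largest difference to the logged values: {max_diff:.1e}")
```

If the difference stays at the level of rounding in the log (six decimals), the two weight sets are consistent with this relationship: frequent classes get a small weight and rare classes a large one.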
Dear @paulvpop, the weights are used internally in OpenCV's training functions to fine-tune the machine learning algorithm's behaviour. The sub-sampling is independent of this: it controls which portion of the data is presented to the learning algorithm. Best,
At the moment, there are three preset options for FEATURE_WEIGHTS in force-train. This works well when the PERCENT_TRAIN parameter is 70, 80 or some other value below 100. But when PERCENT_TRAIN = 100, i.e. 100 percent of the samples are used for training instead of keeping some for validation, EQUALIZED, PROPORTIONAL, and ANTIPROPORTIONAL all use all of the training data, i.e. they all behave the same. This behaviour is not desirable.

When 100% of the samples are used, EQUALIZED should take the same number of samples from each class, where that number equals the size of the smallest class. PROPORTIONAL should remain as it is. ANTIPROPORTIONAL should take most of its samples from the least numerous classes. This could be done by using the size of the least numerous class as the highest per-class sample count, and then assigning the sample counts of the other classes anti-proportionally.

I am not sure what logic is currently being applied, but it can likely be adapted to meet these requirements when PERCENT_TRAIN = 100. I am requesting this change.
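To make the request concrete, here is a minimal Python sketch of the per-class sample counts described above. It is not how force-train works; the function names `target_counts` and `subsample` are hypothetical, and the ANTIPROPORTIONAL rule `n_min * n_min / n` is only one possible reading of "anti-proportionally" (the smallest class keeps all of its samples, and a class k times larger keeps 1/k as many).

```python
import random
from collections import Counter


def target_counts(class_counts, mode):
    """Hypothetical rule: how many samples to keep per class for each mode."""
    n_min = min(class_counts.values())
    if mode == "EQUALIZED":
        # Every class is cut down to the size of the smallest class.
        return {c: n_min for c in class_counts}
    if mode == "PROPORTIONAL":
        # Keep everything, so class proportions stay as they are.
        return dict(class_counts)
    if mode == "ANTIPROPORTIONAL":
        # Assumed reading: the smallest class keeps n_min samples, larger
        # classes keep n_min * (n_min / n), with at least one sample each.
        return {c: max(1, (n_min * n_min) // n) for c, n in class_counts.items()}
    raise ValueError(f"unknown mode: {mode}")


def subsample(labels, mode, seed=0):
    """Return sorted indices of the samples kept under the given mode."""
    rng = random.Random(seed)
    targets = target_counts(Counter(labels), mode)
    keep = []
    for cls, k in targets.items():
        indices = [i for i, y in enumerate(labels) if y == cls]
        keep.extend(rng.sample(indices, k))
    return sorted(keep)


# Toy example: three classes with 100, 50 and 20 samples.
labels = [1] * 100 + [2] * 50 + [3] * 20
for mode in ("EQUALIZED", "PROPORTIONAL", "ANTIPROPORTIONAL"):
    kept = Counter(labels[i] for i in subsample(labels, mode))
    print(mode, dict(kept))
```

With this rule, the three modes would give different training sets even when all samples are available for training, which is the behaviour being asked for.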
Note that the validation procedure in my case is done separately with another validation dataset from points collected from the classified imagery; so, validation can only be done after model prediction. So, that's why I am using 100% of the samples for training.
Here's an example parameter file for force-train