
Origin/reweighting#185

Open
pittlerf wants to merge 92 commits into master from origin/reweighting

@pittlerf

Contributor

Hi,

I have started adding functionality to handle reweighting of correlation functions.
I use the 'cfrw_boot' label for correlation functions that have been reweighted.
These correlation functions should not be resampled, so I removed the 'cf_orig' label from them.
The next step is to make this consistent with the other functions in hadron.

Comment thread R/readutils.R Outdated
Comment thread R/readutils.R Outdated
Comment thread R/rw.R Outdated
Comment thread man/addStat.cf.Rd Outdated
Comment thread R/rw.R
Comment thread R/cf.R Outdated
Comment thread R/cf.R Outdated
Comment thread R/cf.R Outdated
Comment thread R/cf.R Outdated
Comment thread R/cf.R Outdated
Comment thread R/cf.R Outdated
@urbach

urbach commented Apr 9, 2021

Member

It's not totally clear to me how the reweighting is supposed to work here? I had thought that a function like bootstrap_and_rw.cf was sufficient, with a cf and reweighting factors as input. How does it work here?

Why is it not allowed to resample a reweighted cf?

@urbach

urbach commented Apr 9, 2021

Member

devtools::check output:

❯ checking examples ... ERROR
  Running examples in ‘hadron-Ex.R’ failed
  The error most likely occurred in:
  
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: is_empty.rw
  > ### Title: Checks whether the cf object contains no data
  > ### Aliases: is_empty.rw
  > 
  > ### ** Examples
  > 
  > # The empty rw object must be empty:
  > is_empty.rw(rw())
  Error in is_empty.rw(rw()) : could not find function "is_empty.rw"
  Execution halted

❯ checking examples with --run-donttest ... ERROR
  Running examples in ‘hadron-Ex.R’ failed
  The error most likely occurred in:
  
  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: is_empty.rw
  > ### Title: Checks whether the cf object contains no data
  > ### Aliases: is_empty.rw
  > 
  > ### ** Examples
  > 
  > # The empty rw object must be empty:
  > is_empty.rw(rw())
  Error in is_empty.rw(rw()) : could not find function "is_empty.rw"
  Execution halted

❯ checking for missing documentation entries ... WARNING
  Undocumented code objects:
    ‘rw_unit’ ‘samplerw’ ‘samplerw_inverse’
  Undocumented data sets:
    ‘samplerw’ ‘samplerw_inverse’
  All user-level objects in a package should have documentation entries.
  See chapter ‘Writing R documentation files’ in the ‘Writing R
  Extensions’ manual.

❯ checking Rd \usage sections ... WARNING
  Undocumented arguments in documentation object 'read.rw'
    ‘monomial_id’
  
  Undocumented arguments in documentation object 'rw_orig'
    ‘rw’

  Functions with \usage entries need to have the appropriate \alias
  entries, and all their arguments documented.
  The \usage entries must correspond to syntactically valid R code.
  See chapter ‘Writing R documentation files’ in the ‘Writing R
  Extensions’ manual.

❯ checking package dependencies ... NOTE
  Package suggested but not available for checking: ‘rhdf5’

❯ checking DESCRIPTION meta-information ... NOTE
  Package listed in more than one of Depends, Imports, Suggests, Enhances:
    ‘dplyr’
  A package should be listed in only one of these fields.

❯ checking R code for possible problems ... NOTE
  *.rw: no visible binding for global variable ‘cf1’
  *.rw: no visible binding for global variable ‘cf2’
  read.rw: no visible binding for global variable ‘monomialid’
  Undefined global functions or variables:
    cf1 cf2 monomialid

2 errors ✖ | 2 warnings ✖ | 3 notes ✖

@urbach

urbach commented Apr 9, 2021

Member

fixed most of the check problems.

where can I find an example for this? I'm still not convinced all of this is needed...!?

@urbach

urbach commented Apr 9, 2021

Member

this is left:

   read.rw: no visible binding for global variable ‘monomialid’
   Undefined global functions or variables:
     monomialid

which I don't understand yet.
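One possible reading, not confirmed anywhere in this thread: the earlier check output lists the argument of read.rw as `monomial_id`, while the NOTE complains about `monomialid`. A body that uses a differently spelled name than the argument would produce exactly this kind of NOTE. The sketch below reproduces the pattern with a hypothetical function `read_rw_demo` (it is not the code from R/rw.R) and inspects it with `codetools::findGlobals`.

```r
library(codetools)

# Hypothetical reduction of the suspected pattern: the argument is called
# monomial_id, but the body refers to monomialid (no underscore).
read_rw_demo <- function(file, monomial_id) {
  paste0(file, ".", monomialid)
}

# Variables the function body looks up outside its own scope.
globals <- findGlobals(read_rw_demo, merge = FALSE)$variables
print(globals)
```

If this is indeed the cause, using the argument name consistently in the body would make the NOTE disappear.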

@urbach

urbach commented Apr 9, 2021

Member

also, the data object will mean we can no longer install for R < 3.5.0

     NB: this package now depends on R (>= 3.5.0)
     WARNING: Added dependency on R >= 3.5.0 because serialized objects in  serialize/load version 3 cannot be read in older versions of R.  File(s) containing such objects: ‘hadron/data/samplerw.RData’  ‘hadron/data/samplerw_inverse.RData’
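A possible way around the added dependency, sketched under the assumption that the data files are created with `save()`: writing them with `version = 2` uses the older serialization format, which R versions before 3.5.0 can read. The object `samplerw_demo` and the output path below are hypothetical stand-ins for the real data sets.

```r
# Hypothetical stand-in for one of the packaged data objects.
samplerw_demo <- list(conf.index = 1:20, rw = runif(20))

out <- file.path(tempdir(), "samplerw_demo.RData")

# version = 2 selects the older serialization format; the default of
# save() has been version 3 since R 3.6.0.
save(samplerw_demo, file = out, version = 2)

# Round trip into a clean environment to confirm nothing was lost.
env <- new.env()
load(out, envir = env)
roundtrip_ok <- identical(env$samplerw_demo, samplerw_demo)
```

Whether keeping R < 3.5.0 installable is worth the extra step is a separate decision for the maintainers.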

@pittlerf

pittlerf commented Apr 9, 2021

Contributor Author

> fixed most of the check problems.
>
> where can I find an example for this? I'm still not convinced all of this is needed...!?

Yes, the reading is actually quite format dependent. In the beta12 project I analysed only the tmLQCD output for the reweighting factors, which looked like the following:

00 00000 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9715302949e+01
00 00001 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9523762274e+01
00 00002 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9776317102e+01
00 00003 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9501797443e+01
00 00004 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9453382954e+01

In the PLNG project I got the reweighting factors from Marco, in an entirely different format.
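As an illustration only, the five lines quoted above can be parsed with base R as follows. The column meanings are an assumption: the thread does not document them, so treating column 2 as the configuration index and column 7 as the quantity of interest is a guess, and whether that last column is the reweighting factor itself or, for example, its logarithm is not stated here.

```r
# The five lines quoted in the comment above, verbatim.
txt <- "00 00000 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9715302949e+01
00 00001 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9523762274e+01
00 00002 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9776317102e+01
00 00003 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9501797443e+01
00 00004 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9453382954e+01"

# Whitespace-separated columns without a header line.
raw <- read.table(text = txt, header = FALSE)

# ASSUMPTION: V2 is the configuration index, V7 the value of interest.
rw_raw <- data.frame(conf = raw$V2, value = raw$V7)
print(rw_raw)
```

A reader for a different project format would only need to produce the same two columns.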

@urbach

urbach commented Apr 9, 2021 via email

Member

@kostrzewa

Member

> It's not totally clear to me how the reweighting is supposed to work here? I had thought that a function like bootstrap_and_rw.cf was sufficient, with a cf and reweighting factors as input. How does it work here?
>
> Why is it not allowed to resample a reweighted cf?

I understood this to originate from the fact that the normalisation needs to be recomputed (the average of the weights). In other words, the data and the reweighting factors both need to be resampled consistently and separately, such that for each bootstrap resample, the normalisation and the corresponding reweighted data can be generated.

There are of course ways to handle this: reweighted data could be stored unnormalised:

d^{rw}_i = d_i * w_i 

which can be resampled any way one wants. However, when the reweighted data (and resampling thereof) is used, the corresponding normalisations need to be available and correctly applied to the central value and bootstrap samples. In other words, w_i need to be resampled too, giving boot.R values for the normalisation factor. The normalisation factor for the central value is of course just sum_i w_i.

Does the above sound reasonable and describe correctly, why one can't "blindly" resample the reweighted data?
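The procedure described above can be sketched in a few lines of base R. This is a minimal illustration on synthetic data, not the implementation in this pull request: all names (`d`, `w`, `boot_R`, `idx`) are hypothetical, and a plain bootstrap without blocking is assumed.

```r
set.seed(1)

n_conf <- 200   # number of configurations (hypothetical)
boot_R <- 500   # number of bootstrap samples, boot.R in the text above

# Synthetic stand-ins: d holds one observable per configuration,
# w the corresponding reweighting factors.
d <- rnorm(n_conf, mean = 1.0, sd = 0.1)
w <- exp(rnorm(n_conf, mean = 0.0, sd = 0.2))

# Store the reweighted data unnormalised: d^{rw}_i = d_i * w_i
d_rw <- d * w

# Central value: sum_i d_i w_i / sum_i w_i
central <- sum(d_rw) / sum(w)

# One index matrix drives both resamplings, so numerator and
# normalisation always refer to the same configurations.
idx <- matrix(sample.int(n_conf, n_conf * boot_R, replace = TRUE),
              nrow = boot_R, ncol = n_conf)

num_boot  <- apply(idx, 1, function(i) sum(d_rw[i]))
norm_boot <- apply(idx, 1, function(i) sum(w[i]))

# Each bootstrap sample carries its own normalisation.
rw_boot  <- num_boot / norm_boot
boot_err <- sd(rw_boot)
```

Resampling already normalised data instead would keep the central-value normalisation fixed in every sample, which is the inconsistency the comment warns about.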

@kostrzewa

Member

> Does the above sound reasonable and describe correctly, why one can't "blindly" resample the reweighted data?

Let me add another qualifying remark: we also don't deal with just a single reweighting factor, but sequences of factors which move us along in parameter space. For this, some sort of solution was required (such as supporting the multiplication of two sets of reweighting factors to form a third).
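Combining two sets of reweighting factors into a third could look roughly like the sketch below. It assumes that each set is a table of per-configuration weights and that the combined weight is the product on configurations present in both sets; the data frames `rw_a` and `rw_b` and their values are made up for illustration and do not reflect the `rw` class in this pull request.

```r
# Hypothetical per-configuration weights for two successive reweighting steps.
rw_a <- data.frame(conf = c(0, 1, 2, 3), w = c(1.10, 0.90, 1.05, 0.95))
rw_b <- data.frame(conf = c(1, 2, 3, 4), w = c(0.98, 1.02, 1.01, 0.99))

# Match on the configuration index. Configurations missing from either
# set are dropped, because the combined weight is undefined for them.
rw_ab <- merge(rw_a, rw_b, by = "conf", suffixes = c("_a", "_b"))

# The combined factor is the product of the two steps.
rw_ab$w <- rw_ab$w_a * rw_ab$w_b
print(rw_ab[, c("conf", "w")])
```

Matching on the configuration index rather than on row position guards against the two sets covering different configuration ranges.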

@urbach

urbach commented Apr 9, 2021 via email

Member

@urbach

urbach commented Apr 9, 2021 via email

Member

@urbach

urbach commented Apr 11, 2021

Member

> > fixed most of the check problems.
> >
> > where can I find an example for this? I'm still not convinced all of this is needed...!?
>
> Yes, the reading is actually quite format dependent. In the beta12 project I analysed just the output of tmLQCD for the reweighting factors: (that looked like the following):
>
> 00 00000 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9715302949e+01
> 00 00001 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9523762274e+01
> 00 00002 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9776317102e+01
> 00 00003 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9501797443e+01
> 00 00004 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9453382954e+01
>
> In the PLNG project I got the reweighting factor from Marco, entirely different format.

I had in mind an example of the whole thing working?

@pittlerf In other words, is there a rmarkdown file which explains how to use this? Are there some tests?

@pittlerf

Contributor Author

> > > fixed most of the check problems.
> > >
> > > where can I find an example for this? I'm still not convinced all of this is needed...!?
> >
> > Yes, the reading is actually quite format dependent. In the beta12 project I analysed just the output of tmLQCD for the reweighting factors: (that looked like the following):
> >
> > 00 00000 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9715302949e+01
> > 00 00001 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9523762274e+01
> > 00 00002 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9776317102e+01
> > 00 00003 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9501797443e+01
> > 00 00004 0.163251250000 0.163265000000 0.000000000000 0.000000000000 6.9453382954e+01
> >
> > In the PLNG project I got the reweighting factor from Marco, entirely different format.
>
> I had in mind an example of the whole thing working?
>
> @pittlerf In other words, is there a rmarkdown file which explains how to use this? Are there some tests?

Hi @urbach, I uploaded a how-to cheat sheet in rmarkdown.

@urbach

urbach commented Apr 13, 2021

Member

thanks.
There are still changes requested...

@urbach

urbach commented Apr 27, 2021

Member

check(cran=TRUE) gives

❯ checking package dependencies ... NOTE
  Package suggested but not available for checking: ‘rhdf5’

❯ checking R code for possible problems ... NOTE
  read.rw: no visible binding for global variable ‘monomialid’
  Undefined global functions or variables:
    monomialid

❯ checking Rd line widths ... NOTE
  Rd file 'rw_orig.Rd':
    \examples lines wider than 100 characters:
       rw_factor <- rw_orig( rw=rw_data, conf.index=seq(1,20), max_value= max(rw_data),stochastic_error=rep(0,20))
  
  These lines will be truncated in the PDF manual.

thanks!

@urbach

urbach commented Apr 27, 2021

Member

The comment on rhdf5 is on my side...

@urbach

urbach commented May 26, 2021

Member

hmm?

@pittlerf

Contributor Author

> hmm?

ah, sorry I will do it now.


4 participants