I played a bit with the different options for computing the power spectrum in powerbox and tuesday and I have some comments/questions. I should say that I didn't go over the tutorials, so maybe some of my questions are answered there, but since not all users go over the tutorials, maybe it's actually a good thing that I'm reporting my thoughts without reading the tutorials :)
Before writing my comments below, I'd like to stress that I very much appreciate the work that the developers of powerbox and tuesday spent on these packages (@steven-murray, @DanielaBreitman), it is much easier to write comments on an existing code, rather than writing the code from scratch :)
Since I have many comments, I figured it would be better to make a single issue collecting all of them. We could then decide what to keep as "real" issues that we want to address, and what can be "closed".
Comments on powerbox
- The argument
bins in get_power can have several meanings. While the doc-string explains clearly the meaning of this argument in the case it is int (number of k-bins) or an array (the k-bins edges), I still find this naming and logic a bit confusing. Why not to have explicitly k_bins_edges and n_k_bins and raise an error if both of them are not None?
- The logic in the code is that
log_bins is ignored if bins is an array. While this logic makes perfect sense, it is not mentioned in the doc-string, which added a bit more to my confusion. I would mention this logic explicitly in the doc-string.
- If a user does
output = get_power(), then, according to the doc-string, output[1] corresponds to meank, which is “the bin-centres for the p_k array (in k). This is the mean k-value for cells in that bin”. However, this description is only correct if bin_ave=False! If bin_ave=True, then output[1] is actually k_bins_edges, but this is not mentioned at all in the doc-string. Besides of correcting the doc-string, I would let get_power to have both k_bins_edges and k_centers as (optional) outputs. That way, the user can understand clearly what were the k-bins and their centers in the calculation, without the need to "look under the hood”.
n_modes is another possible output of return_sumweights, but it does not mentioned in the doc-string!
- I was a bit surprised to see that
powerbox does not have an option to mimic the same k-binning logic that was in version 2 of 21cmFAST (yes, I’m old!). For example, in my scripts I use get_k_bins_edges (see below) and then set bins=get_k_bins_edges(box_len=BOX_LEN, dim=HII_DIM). I think this is equivalent to setting log_bins=True and then set bins to be the length of the output of my get_k_bins_edges (minus 1!), but I’m not sure. I would love to have a clarification on that.
- I really like the
SphericalPS and CylindricalPS classes of tuesday. I think that if powerbox had these classes, the output of get_power would be less confusing. While I understand that using these classes in powerbox breaks API, I think it’s worthwhile to consider having them in powerbox, rather than in tuesday.
def get_k_bins_edges(box_len, dim, k_factor=1.5):
k_min = 2 * np.pi / box_len
k_max = np.pi / box_len * dim # Nyquist frequency
# num_bins = how many k_factor steps fit between min and max
num_bins = int(np.floor(np.log(k_max / k_min) / np.log(k_factor)))
k_bins_edges = np.logspace(np.log10(k_min), np.log10(k_max), num_bins + 1)
return k_bins_edges
Comments on tuesday
- My biggest issue with how things are currently implemented in
tuesday is the lengthy doc-string and list of explicit optional arguments. The way I see it, calculate_ps_lc and calculate_ps_coeval should be compact and user-friendly wrappers to powerbox.get_power, such that if an inexperienced user needs to compute the power spectrum given a box/lightcone, they could do it very easily, without the need to understand each and every argument that get_power receives. Having said that, it is perfectly fine (and desireable!) to allow the users the flexibility to set manually all of their favorite get_power settings. However, in my opinion, there is no need to explicitly list all the parameters that get_power receives. Instead, we can use **kwargs and say in the doc-string that kwargs receives any optional keyword argument that goes to power_box.get_power (then, if the users are curious, they can read the doc-string of get_power by themselves).
- It seems to me that all what
calculate_ps_coeval does is essentially to call calculate_ps, except for the mu_min block at the beginning. In my opinion, the mu_min block should be found in power_box.get_power (as a new feature), not in tuesday. Then, I believe that we could only have calculate_ps_box (which will replace both calculate_ps and calculate_ps_coeval) and calculate_ps_lc, just to do the chunking. In my opinion, the call to powerbox.get_power should be done in only one function in tuesday (and preferably, only once in that function).
- I personally don’t like the
calc_1d and calc_2d arguments and their logic. I would rather call tuesday twice if I’m interested in computing both the spherical and cylindrical power spectra, instead of calling it once and specify whether I'n interested in computing both of them. If this design is agreed to be preferable, we could have a special keyword argument for tuesday called kind and let it have the options ”spherical” and ”cylindrical”. I think this is somewhat already controlled by the keyword argument res_ndim in powerbox.get_power, but this is not super user-friendly (which is okay, because powerbox.get_power is not aiming to be super user-friendly, unlike tuesday :).
- It seems that
tuesday forces its users to explicitly provide the units of the input box and box_length. Why? Some users could be lazy and not necessarily want to use astropy.units. I think we should support the option that box and box_length are not accompanied by astropy.units. In that context, it seems that box must be of units of temperature, which is really limiting the usage: some users would like to compute power spectra of fields like over-density, SFRD, and other fields that don’t have units of temperature.
- There is a mismatch between the names of the arguments in
powerbox and tuesday (e.g. deltax vs. box, boxlength vs. box_length). Preferably, I think we should aim for these arguments to have the same name.
- While the doc-string of
SphericalPS clearly says that SphericalPS.k could be interpreted as either the bin centers or the bin edges, I still find it confusing. I think it would be better to have k_bins_edges and k_centers as fields for SphericalPS, instead of just k, which is less descriptive (see also my above 3rd comment for powerbox).
- The default
k_weights in tuesday is ignore_zero_ki, and not ignore_zero_absk. What is the reason for that? According to Gemini, it actually seems like ignore_zero_absk would be a better choice for the default (I paste below its response).
I played a bit with the different options for computing the power spectrum in
powerboxandtuesdayand I have some comments/questions. I should say that I didn't go over the tutorials, so maybe some of my questions are answered there, but since not all users go over the tutorials, maybe it's actually a good thing that I'm reporting my thoughts without reading the tutorials :)Before writing my comments below, I'd like to stress that I very much appreciate the work that the developers of
powerboxandtuesdayspent on these packages (@steven-murray, @DanielaBreitman), it is much easier to write comments on an existing code, rather than writing the code from scratch :)Since I have many comments, I figured it would be better to make a single issue collecting all of them. We could then decide what to keep as "real" issues that we want to address, and what can be "closed".
Comments on
powerboxbinsinget_powercan have several meanings. While the doc-string explains clearly the meaning of this argument in the case it isint(number of k-bins) or an array (the k-bins edges), I still find this naming and logic a bit confusing. Why not to have explicitlyk_bins_edgesandn_k_binsand raise an error if both of them are notNone?log_binsis ignored ifbinsis an array. While this logic makes perfect sense, it is not mentioned in the doc-string, which added a bit more to my confusion. I would mention this logic explicitly in the doc-string.output = get_power(), then, according to the doc-string,output[1]corresponds tomeank, which is “the bin-centres for the p_k array (in k). This is the mean k-value for cells in that bin”. However, this description is only correct ifbin_ave=False! Ifbin_ave=True, thenoutput[1]is actuallyk_bins_edges, but this is not mentioned at all in the doc-string. Besides of correcting the doc-string, I would letget_powerto have bothk_bins_edgesandk_centersas (optional) outputs. That way, the user can understand clearly what were the k-bins and their centers in the calculation, without the need to "look under the hood”.n_modesis another possible output ofreturn_sumweights, but it does not mentioned in the doc-string!powerboxdoes not have an option to mimic the same k-binning logic that was in version 2 of21cmFAST(yes, I’m old!). For example, in my scripts I useget_k_bins_edges(see below) and then setbins=get_k_bins_edges(box_len=BOX_LEN, dim=HII_DIM). I think this is equivalent to settinglog_bins=Trueand then setbinsto be the length of the output of myget_k_bins_edges(minus 1!), but I’m not sure. I would love to have a clarification on that.SphericalPSandCylindricalPSclasses oftuesday. I think that ifpowerboxhad these classes, the output ofget_powerwould be less confusing. While I understand that using these classes inpowerboxbreaks API, I think it’s worthwhile to consider having them inpowerbox, rather than intuesday.Comments on
tuesdaytuesdayis the lengthy doc-string and list of explicit optional arguments. The way I see it,calculate_ps_lcandcalculate_ps_coevalshould be compact and user-friendly wrappers topowerbox.get_power, such that if an inexperienced user needs to compute the power spectrum given a box/lightcone, they could do it very easily, without the need to understand each and every argument thatget_powerreceives. Having said that, it is perfectly fine (and desireable!) to allow the users the flexibility to set manually all of their favoriteget_powersettings. However, in my opinion, there is no need to explicitly list all the parameters thatget_powerreceives. Instead, we can use**kwargsand say in the doc-string thatkwargsreceives any optional keyword argument that goes topower_box.get_power(then, if the users are curious, they can read the doc-string ofget_powerby themselves).calculate_ps_coevaldoes is essentially to callcalculate_ps, except for themu_minblock at the beginning. In my opinion, themu_minblock should be found inpower_box.get_power(as a new feature), not intuesday. Then, I believe that we could only havecalculate_ps_box(which will replace bothcalculate_psandcalculate_ps_coeval) andcalculate_ps_lc, just to do the chunking. In my opinion, the call topowerbox.get_powershould be done in only one function intuesday(and preferably, only once in that function).calc_1dandcalc_2darguments and their logic. I would rather calltuesdaytwice if I’m interested in computing both the spherical and cylindrical power spectra, instead of calling it once and specify whether I'n interested in computing both of them. If this design is agreed to be preferable, we could have a special keyword argument fortuesdaycalledkindand let it have the options”spherical”and”cylindrical”. I think this is somewhat already controlled by the keyword argumentres_ndiminpowerbox.get_power, but this is not super user-friendly (which is okay, becausepowerbox.get_poweris not aiming to be super user-friendly, unliketuesday:).tuesdayforces its users to explicitly provide the units of the inputboxandbox_length. Why? Some users could be lazy and not necessarily want to useastropy.units. I think we should support the option thatboxandbox_lengthare not accompanied byastropy.units. In that context, it seems thatboxmust be of units of temperature, which is really limiting the usage: some users would like to compute power spectra of fields like over-density, SFRD, and other fields that don’t have units of temperature.powerboxandtuesday(e.g.deltaxvs.box,boxlengthvs.box_length). Preferably, I think we should aim for these arguments to have the same name.SphericalPSclearly says thatSphericalPS.kcould be interpreted as either the bin centers or the bin edges, I still find it confusing. I think it would be better to havek_bins_edgesandk_centersas fields forSphericalPS, instead of justk, which is less descriptive (see also my above 3rd comment forpowerbox).k_weightsintuesdayisignore_zero_ki, and notignore_zero_absk. What is the reason for that? According to Gemini, it actually seems likeignore_zero_abskwould be a better choice for the default (I paste below its response).