Some thoughts on power spectrum calculations in `powerbox` and `tuesday`

I played a bit with the different options for computing the power spectrum in `powerbox` and `tuesday` and I have some comments/questions. I should say that I didn't go over the tutorials, so maybe some of my questions are answered there, but since not all users go over the tutorials, maybe it's actually a good thing that I'm reporting my thoughts without reading the tutorials :)

Before writing my comments below, I'd like to stress that I very much appreciate the work that the developers of `powerbox` and `tuesday` spent on these packages (@steven-murray, @DanielaBreitman), it is much easier to write comments on an existing code, rather than writing the code from scratch :)

Since I have many comments, I figured it would be better to make a single issue collecting all of them. We could then decide what to keep as "real" issues that we want to address, and what can be "closed".

## Comments on `powerbox`

1. The argument `bins` in `get_power` can have several meanings. While the doc-string explains clearly the meaning of this argument in the case it is `int` (number of k-bins) or an array (the k-bins edges), I still find this naming and logic a bit confusing. Why not to have explicitly `k_bins_edges` and `n_k_bins` and raise an error if both of them are not `None`? 
2. The logic in the code is that `log_bins` is ignored if `bins` is an array. While this logic makes perfect sense, it is not mentioned in the doc-string, which added a bit more to my confusion. I would mention this logic explicitly in the doc-string.
3. If a user does `output = get_power()`, then, according to the doc-string, `output[1]` corresponds to `meank`, which is “the bin-centres for the p_k array (in k). This is the mean k-value for cells in that bin”. However, this description is only correct if `bin_ave=False`! If `bin_ave=True`, then `output[1]` is actually `k_bins_edges`, but this is not mentioned at all in the doc-string. Besides of correcting the doc-string, I would let `get_power` to have both `k_bins_edges` and `k_centers` as (optional) outputs. That way, the user can understand clearly what were the k-bins and their centers in the calculation, without the need to "look under the hood”.
4. `n_modes` is another possible output of `return_sumweights`, but it does not mentioned in the doc-string!
5. I was a bit surprised to see that `powerbox` does not have an option to mimic the same k-binning logic that was in version 2 of `21cmFAST` (yes, I’m old!). For example, in my scripts I use `get_k_bins_edges` (see below) and then set `bins=get_k_bins_edges(box_len=BOX_LEN, dim=HII_DIM)`. I think this is equivalent to setting `log_bins=True` and then set `bins` to be the length of the output of my `get_k_bins_edges` (minus 1!), but I’m not sure. I would love to have a clarification on that.
6. I really like the `SphericalPS` and `CylindricalPS` classes of `tuesday`. I think that if `powerbox` had these classes, the output of `get_power` would be less confusing. While I understand that using these classes in `powerbox` breaks API, I think it’s worthwhile to consider having them in `powerbox`, rather than in `tuesday`.

```python
def get_k_bins_edges(box_len, dim, k_factor=1.5):
    k_min = 2 * np.pi / box_len
    k_max = np.pi / box_len * dim # Nyquist frequency
    # num_bins = how many k_factor steps fit between min and max
    num_bins = int(np.floor(np.log(k_max / k_min) / np.log(k_factor)))
    k_bins_edges = np.logspace(np.log10(k_min), np.log10(k_max), num_bins + 1)
    return k_bins_edges
```

## Comments on `tuesday`

1. My biggest issue with how things are currently implemented in `tuesday` is the lengthy doc-string and list of explicit optional arguments. The way I see it, `calculate_ps_lc` and `calculate_ps_coeval` should be **compact** and user-friendly wrappers to `powerbox.get_power`, such that if an inexperienced user needs to compute the power spectrum given a box/lightcone, they could do it very easily, without the need to understand each and every argument that `get_power` receives. Having said that, it is perfectly fine (and desireable!) to allow the users the flexibility to set manually all of their favorite `get_power` settings. However, in my opinion, there is no need to explicitly list all the parameters that `get_power` receives. Instead, we can use `**kwargs` and say in the doc-string that `kwargs` receives any optional keyword argument that goes to `power_box.get_power` (then, if the users are curious, they can read the doc-string of `get_power` by themselves).
2. It seems to me that all what `calculate_ps_coeval` does is essentially to call `calculate_ps`, except for the `mu_min` block at the beginning. In my opinion, the `mu_min` block should be found in `power_box.get_power` (as a new feature), not in `tuesday`. Then, I believe that we could only have `calculate_ps_box` (which will replace both `calculate_ps` and `calculate_ps_coeval`) and `calculate_ps_lc`, just to do the chunking. In my opinion, the call to `powerbox.get_power` should be done in only one function in `tuesday` (and preferably, only once in that function).
3. I personally don’t like the `calc_1d` and `calc_2d` arguments and their logic. I would rather call `tuesday` twice if I’m interested in computing both the spherical and cylindrical power spectra, instead of calling it once and specify whether I'n interested in computing both of them. If this design is agreed to be preferable, we could have a special keyword argument for `tuesday` called `kind` and let it have the options `”spherical”` and `”cylindrical”`. I think this is somewhat already controlled by the keyword argument `res_ndim` in `powerbox.get_power`, but this is not super user-friendly (which is okay, because `powerbox.get_power` is not aiming to be super user-friendly, unlike `tuesday` :). 
4. It seems that `tuesday` forces its users to explicitly provide the units of the input `box` and `box_length`. Why? Some users could be lazy and not necessarily want to use `astropy.units`. I think we should support the option that `box` and `box_length` are not accompanied by `astropy.units`. In that context, it seems that `box` must be of units of temperature, which is really limiting the usage: some users would like to compute power spectra of fields like over-density, SFRD, and other fields that don’t have units of temperature.
5. There is a mismatch between the names of the arguments in `powerbox` and `tuesday` (e.g. `deltax` vs. `box`, `boxlength` vs. `box_length`). Preferably, I think we should aim for these arguments to have the same name.
6. While the doc-string of `SphericalPS` clearly says that `SphericalPS.k` could be interpreted as either the bin centers or the bin edges, I still find it confusing. I think it would be better to have `k_bins_edges` and `k_centers` as fields for `SphericalPS`, instead of just `k`, which is less descriptive (see also my above 3rd comment for `powerbox`).
7. The default `k_weights` in `tuesday` is `ignore_zero_ki`, and not `ignore_zero_absk`. What is the reason for that? According to Gemini, it actually seems like `ignore_zero_absk` would be a better choice for the default (I paste below its response).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some thoughts on power spectrum calculations in `powerbox` and `tuesday` #78

Comments on `powerbox`

Comments on `tuesday`

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Some thoughts on power spectrum calculations in powerbox and tuesday #78

Description

Comments on powerbox

Comments on tuesday

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Some thoughts on power spectrum calculations in `powerbox` and `tuesday` #78

Comments on `powerbox`

Comments on `tuesday`