We started with equally spaced length bins spanning the full range of observed lengths across fishery and survey data for each species. This used the data inefficiently: the smallest and largest bins were often missing data.
Options discussed in January 2023 include:
- Start bin 1 from the minimum observed length rather than 0
- Combine bins 1-2? Finer structure thereafter? Bigger last bin? Plus group
A new bin definition algorithm implemented in hydradata first calculates quantiles for each species based on all input lengths aggregated over time. The current implementation takes the smaller of the survey and fishery 10th percentiles as the minimum size for bin width definition, and the larger of the survey and fishery 90th percentiles as the maximum size. Equal bin widths are calculated within this range, and then the first bin is extended down to 0 and the last bin is extended up to the maximum observed length.
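The steps described above can be sketched roughly as follows. This is an illustrative Python sketch, not the hydradata code: the function names (`quantile`, `define_length_bins`), the linear-interpolation quantile rule, and the example lengths are all assumptions made for illustration, and the actual implementation may differ in detail (for example in quantile type or rounding).

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence (assumed rule)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one length")
    pos = q * (len(xs) - 1)          # fractional index into the sorted data
    lo = int(pos)                    # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def define_length_bins(survey, fishery, n_bins=5, lower_q=0.10, upper_q=0.90):
    """Return n_bins + 1 bin edges for one species (hypothetical helper).

    survey, fishery: observed lengths aggregated over time.
    """
    # Smaller of the two lower quantiles, larger of the two upper quantiles.
    lo = min(quantile(survey, lower_q), quantile(fishery, lower_q))
    hi = max(quantile(survey, upper_q), quantile(fishery, upper_q))

    # Equal-width bins between the two quantile limits.
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]

    # Extend the first bin down to 0 and the last bin up to the max observed.
    edges[0] = 0.0
    edges[-1] = float(max(max(survey), max(fishery)))
    return edges


if __name__ == "__main__":
    # Made-up lengths, for illustration only.
    survey_lengths = list(range(0, 101))
    fishery_lengths = list(range(20, 121))
    print(define_length_bins(survey_lengths, fishery_lengths, n_bins=5))
```

Under this reading, the interior bins share one width while the first and last bins are wider, because they absorb the tails below the lower quantile and above the upper quantile.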
A visualization of bin definitions (black vertical lines) for each species and aggregate dataset is below, based on the current (January 2023) mskeyrun Georges Bank dataset and 5 length bins:


Thoughts?