@martinholmer Now that the areas optimization seems to be working extremely well (thank you), should we test out this method as a replacement for the current stochastic gradient descent method?
Testing this out could have several benefits:
- Assuming it works well (I suspect it will), we will be a lot closer to having machine-independent, 100% replicable results. (We aren't quite there with areas yet: it looks like the minimization on my machine is performing more iterations and reaching a smaller objective function value than on your machine. That's worth exploring.)
- Again, assuming it works well, it appears to converge more completely to good solutions, and in much less time, than SGD.
- It has a regularization term, which I think is very important to having plausible results for the nation (perhaps even more important for the nation than for subnational areas).
- It should give us insights into how well the areas approach will work with real-world problems. The national optimization problem, with hundreds of targets, is far larger than the test problems we've been solving for faux areas, and is likely to be considerably larger than real area problems. If something's going to go wrong, it would be nice to know early: if we encounter problems applying the method to the nation, that's an indication we might run into problems for subnational areas. (Note, however, that national problems will probably be easier than area problems in other ways. We use national reweighting to calibrate to published IRS data and, in some instances, to move from one year to the next. Differences between national sample-based totals and national published totals, and between national totals in one year and the next, are not likely to be as large as differences between national averages and averages for subnational areas. Still, I expect testing the method for the nation will yield insights.)