new PUF files #2925

Merged
jdebacker merged 11 commits into PSLmodels:master from bodiyang:puf_files
Sep 8, 2025

Conversation

@bodiyang

@bodiyang bodiyang commented Jul 3, 2025

Contributor

This PR updates puf_weights.csv.gz and puf_ratios.csv files.

The puf.csv file produced by the taxdata repository needs to be matched with the corresponding puf weights and ratios files.

@jdebacker

@bodiyang bodiyang marked this pull request as draft July 3, 2025 15:51
@codecov

codecov Bot commented Jul 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (8b89bb4) to head (c066a83).
⚠️ Report is 12 commits behind head on master.

@@            Coverage Diff            @@
##            master     #2925   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           13        13           
  Lines         2662      2662           
=========================================
  Hits          2662      2662           
Flag Coverage Δ
unittests 100.00% <ø> (ø)

@martinholmer

Collaborator

@bodiyang said in draft PR #2925:

This PR updates puf_weights.csv.gz and puf_ratios.csv files.

WHY? Please provide a rationale for the changes you are proposing.

@bodiyang

bodiyang commented Jul 7, 2025

Contributor Author

@bodiyang said in draft PR #2925:

This PR updates puf_weights.csv.gz and puf_ratios.csv files.

WHY? Please provide a rationale for the changes you are proposing.

I recently noticed that Tax-Data is producing a puf.csv file that differs from the original one.
This was only noticed recently because there was no logic change in the script that produces puf.csv, and puf.csv was not regenerated regularly. According to the commit history, there has been only one PR, in 2023, which sorted the column order of the puf file. The reason why puf.csv is produced differently is not clear yet (maybe because of the PR that reshuffles the order of tax units, or changes in the dependency packages, or something else).

Therefore, the corresponding weights and ratios files need to be updated to match the new puf.csv. I have run the new puf.csv along with the new puf_weights.csv.gz and puf_ratios.csv in the PR in Tax-Calc, and the new files should produce correct results on tax liability.

I have shared this with @jdebacker, who confirmed this on his local machine.

I am still investigating this issue, to see whether it exists only on our two machines or whether we should make an update in the model.

@martinholmer If you want to check this on your machine:
First, (1) in the Tax-Data folder, activate taxdata-dev; (2) run 'python createpuf.py' to produce puf.csv; (3) copy puf.csv into Tax-Calc and run some calculation checks on tax liability. --> This should produce an incorrect tax liability calculation.

Second, (1) run 'make all' in Tax-Data to produce new weights and ratios files; (2) copy these files into Tax-Calc and run the tax liability calculation. --> This should produce a correct result.
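
Before either run, a quick pre-check is to confirm that puf.csv and the weights file contain the same number of records. The sketch below is only an illustration: the function names are hypothetical (they are not part of Tax-Data or Tax-Calculator), and it assumes the weights file has a header row and one row per tax unit. Matching row counts do not prove that the row order matches, so this catches only the grossest kind of mismatch.

```python
import csv
import gzip


def count_data_rows(path):
    """Count the rows after the header in a CSV file (gzip-compressed if *.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def same_record_count(data_path, weights_path):
    """Return True if both files contain the same number of data rows."""
    return count_data_rows(data_path) == count_data_rows(weights_path)
```

For example, `same_record_count("puf.csv", "puf_weights.csv.gz")` would be run from the folder holding both files.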

@bodiyang

bodiyang commented Jul 25, 2025

Contributor Author

Note: I will proceed with this PR after OBBBA is implemented in the model.

This makes it clear that 5.1 and 5.2 reflect OBBBA, and 5.3 will then reflect this PR's update to the PUF files.

@martinholmer

Collaborator

@bodiyang, Yes, I agree that it is best to wait on PR #2925 until the OBBBA work is finished.
Thanks for clarifying this.

@martinholmer

Collaborator

@bodiyang, Can you or Jason make available the new puf.csv file that is generating the new test results?

@bodiyang

bodiyang commented Aug 18, 2025

Contributor Author

@martinholmer Do you have the raw 2011 PUF? If so, you can run taxdata to produce puf.csv; if not, I can share the file with you via Dropbox. What's your email address?

@PSLmodels PSLmodels deleted a comment from bodiyang Aug 18, 2025
@martinholmer

Collaborator

@bodiyang, Thanks for sharing the new puf.csv file. Using it on my computer with PR #2925, I was able to pass all the tests executed by the make pytest-all command.

@martinholmer

Collaborator

@bodiyang, Are you planning any other changes in PR #2925?
If not, is there a reason for it to be marked as a draft?

@bodiyang

bodiyang commented Aug 25, 2025

Contributor Author

@martinholmer I was doing some additional checks locally to reconfirm that the old and new PUF files give the same tax liability calculations.

I will merge the new updates into this PR. Then it will be ready for review @jdebacker

In the release notes for the next version, 5.3.0, I would recommend pointing out that PUF users are required to produce a new puf.csv file from taxdata.

Correct usage:
Taxcalc <= 5.2.0 + old puf.csv (produced by TaxData versions before Sep 2025)
Taxcalc >= 5.3.0 + new puf.csv (produced by TaxData versions after Sep 2025)
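
The pairing rule above could be written down as a small helper, sketched here under stated assumptions: the function names are hypothetical, version strings are assumed to be purely numeric and dot-separated, and any 5.2.x patch release below 5.3.0 is treated as needing the old file (the rule above only states the <= 5.2.0 and >= 5.3.0 cases).

```python
def parse_version(text):
    """Turn a version string such as '5.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def expected_puf_vintage(taxcalc_version):
    """Return 'new' for taxcalc >= 5.3.0 and 'old' for earlier releases."""
    return "new" if parse_version(taxcalc_version) >= (5, 3, 0) else "old"
```

Comparing tuples of integers rather than raw strings keeps a version like 5.10.0 ordered after 5.3.0.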

@bodiyang bodiyang marked this pull request as ready for review August 25, 2025 14:04
@bodiyang

Contributor Author

all tests pass

@martinholmer martinholmer requested a review from jdebacker August 27, 2025 18:35
@jdebacker

Member

Changes seem reasonable to me. I was unable to use files generated from TaxData PR #452 and pass the test here. But it could be something I did wrong in the process of creating the new weights and ratios files.

@martinholmer it sounds like you can confirm the results here. If so, please feel free to merge this PR.

@martinholmer

martinholmer commented Aug 28, 2025

Collaborator

@jdebacker said in his review of PR #2925:

Changes seem reasonable to me. I was unable to use files generated from TaxData PSLmodels/taxdata#452 and pass the test here. But it could be something I did wrong in the process of creating the new weights and ratios files.

@martinholmer it sounds like you can confirm the results here. If so, please feel free to merge this PR.

Let's start with the final statement. No, I cannot "confirm the results here" because, as the above discussion shows, I did not check that I could produce the same results as @bodiyang gets in TaxData PR PSLmodels/taxdata#452. All I did is get the new puf.csv file he generated from him, install it in Tax-Calculator PR #2925 on my computer, and confirm that all the Tax-Calculator tests pass (using make pytest-all).

Until at least two people can independently generate on their computers the results in TaxData PR PSLmodels/taxdata#452, then I think this Tax-Calculator PR is premature and should be closed.

@jdebacker, I don't agree with your statement that these "Changes seem reasonable to me" because the reason for TaxData PR PSLmodels/taxdata#452 is a mystery (as the discussion in 452 makes clear). You and @bodiyang getting different TaxData results on your computers suggests that something is amiss with your local TaxData installations, or that you used different taxcalc packages, or something else.

@bodiyang

bodiyang commented Aug 28, 2025

Contributor Author

@martinholmer I had meetings with @jdebacker and both of us can confirm the necessity of taxdata PR 452. Reason is as documented in this conversation. Tax-Data needs to be updated in order to produce correct PUF files. Then Tax-Calc will be updated to reflect such changes.

Will Zoom meet with @jdebacker at a later time to discuss the issue met when producing PUF files.
Will reopen the PR after both of us can confirm that we generate correct files from taxdata PR 452.

@martinholmer

Collaborator

@bodiyang said in closed PR #2925:

I had meetings with @jdebacker and both of us can confirm the necessity of PSLmodels/taxdata#452. Reason is as documented in #2925 (comment)

But in that discussion you say:

The reason why puf.csv is produced differently is not clear yet.

My point is that it is not good practice to undertake changes without knowing why those changes are necessary.

@bodiyang

bodiyang commented Aug 28, 2025

Contributor Author

@bodiyang said in closed PR #2925:

I had meetings with @jdebacker and both of us can confirm the necessity of PSLmodels/taxdata#452. Reason is as documented in #2925 (comment)

But in that discussion you say:

The reason why puf.csv is produced differently is not clear yet.

My point is that it is not good practice to undertake changes without knowing why those changes are necessary.

I see, I will expand a bit more on the last comment.

In brief, it is confirmed that this problem exists and that it needs to be fixed (agreed by @jdebacker), while the reason for the problem is not certain (I have some speculations).

(1) The current version of Tax-Data does not correctly produce the PUF files (puf.csv, puf_weights.csv, puf_ratios.csv), so it needs to be fixed.

If we use the puf.csv produced by the current version of Tax-Data, we see an out-of-range tax liability calculation from Tax-Calc (iitax is $50,000+ billion, while the correct amount should be around $2,500 billion).

This mistake needs to be fixed by taxdata PR 452 and taxcalc PR 2925.

(2) The reason for this problem is hard to check, because the code that produces puf.csv was not actively used in past years.

My speculation is that there have been some changes in the software packages Tax-Data relies on (possibly the packages related to optimization functions). The order of tax units in puf.csv gets reshuffled and no longer matches the order of tax units in the weights and ratios files. So each tax unit is matched with the wrong weight, and that might be why the tax liability calculation goes wrong.
This issue might be (indirectly) related.
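
To illustrate the speculation (not to establish it as the actual cause), here is a toy sketch of how pairing weights by row position can inflate a weighted total once the data rows are reordered. All numbers are made up and are not PUF data, and `weighted_total` is a hypothetical helper written for this example, not the Tax-Calculator method.

```python
def weighted_total(amounts, weights):
    """Sum amount * weight over records paired by row position."""
    if len(amounts) != len(weights):
        raise ValueError("amounts and weights differ in length")
    return sum(a * w for a, w in zip(amounts, weights))


# Made-up records: a few high-liability units with small weights and
# many low-liability units with large weights.
iitax = [500_000.0, 20_000.0, 1_000.0]
weight = [100.0, 5_000.0, 90_000.0]

aligned = weighted_total(iitax, weight)
# Reversing the data rows while leaving the weights in place mimics a
# reshuffled puf.csv paired with weights built for the old row order.
misaligned = weighted_total(list(reversed(iitax)), weight)
```

In this toy case the aligned total is 240 million while the misaligned total is about 45 billion, an inflation of well over a hundredfold that comes purely from the row order.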

@martinholmer

Collaborator

@bodiyang said in PR #2925:

Will Zoom meet with @jdebacker at a later time to discuss the issue met when producing PUF files [that were different on @bodiyang and @jdebacker computers].

I suggest that the two of you should discuss another issue as well.

Open PR #2538 wants to eliminate the presence of all PUF-related files from the Tax-Calculator repository. When the TMD data first became available, @jdebacker pointed to PR #2538 saying I should not include any TMD-related files in the Tax-Calculator repository. So that is what I did: there is a small amount of TMD-related code in Tax-Calculator that makes it easier for users who want to use TMD data, but there are no TMD data files in the Tax-Calculator repository.

So, I think the two of you should discuss whether or not the same approach should be taken with PUF-related data files (that is, remove all the PUF-related data files from the Tax-Calculator repository). In other words, quit working on PR #2925 and finish PR #2538.

My personal opinion is that there is real value in including the publicly-available CPS data files in the Tax-Calculator repository.

@jdebacker jdebacker reopened this Sep 8, 2025
@jdebacker

Member

@bodiyang I am not able to replicate your results here.

I appreciate @martinholmer's comments regarding the relation between this PR and PR #2538. I am in agreement with him that it would be useful to keep the publicly available CPS, but remove the PUF files from Tax-Calculator.

But my suggestion is we do that in a future PR. Let's get these files updated so users get correct results. Then we can refactor the TaxData package to output the necessary files in a single location (like the TMD package). At that point, let's come back to Tax-Calculator and remove PUF-related files.

@jdebacker jdebacker merged commit b441c7f into PSLmodels:master Sep 8, 2025
16 checks passed
@bodiyang bodiyang deleted the puf_files branch September 9, 2025 15:45