To be released as 0.7.0
We have overhauled `augment()` for general consistency improvements (hopefully, pending getting `safepredict()` going)
- If you pass a dataset to `augment()` via the `data` or `newdata` arguments, you are now guaranteed that the augmented dataset will have exactly the same number of rows as the original dataset. This differs from previous behavior primarily when there are missing values. Previously `augment()` would drop rows containing `NA`. This should no longer be the case.
- `augment()` no longer accepts an `na.action` argument
- We no longer cram everything through `augment.lm()`, and it has subsequently lost a lot of arguments that were needed when it was a Frankenstein do-everything function
- `augment()` tries to give an informative error when `data` isn't the original training data
Previously the `df` column in `glance` reported the rank of the design matrix. Now it reports the numerator degrees of freedom for the overall F-statistic. This is always equal to the rank of the model matrix minus one, so the new `df` will always be the old `df` minus one.
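To make the relationship concrete, here is a minimal base R sketch (not broom code) that compares the two quantities for an ordinary `lm()` fit with an intercept. The model formula and the variable names `design_rank` and `numerator_df` are illustrative assumptions.

```r
# Illustrative model on a built-in dataset; assumes a fit with an intercept
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Rank of the design matrix: intercept + wt + hp
design_rank <- fit$rank

# Numerator degrees of freedom of the overall F-statistic
numerator_df <- unname(summary(fit)$fstatistic["numdf"])

# The two quantities differ by exactly one for this model
design_rank - numerator_df
#> [1] 1
```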
TODO: sort out what happens to glance.aov()
- We now use `rlang::arg_match()` when possible instead of `match.arg()` to give more informative errors on argument mismatches.
- Moved core tests to the `modeltests` package
- Added new vignette detailing use of `modelgenerics` and `modeltests` packages
- Added `data` argument to `augment()` generic (did this happen?)
- All `conf.int` arguments now default to `FALSE`. This primarily affects `tidy.survreg()`, which previously always returned confidence intervals.
- All `conf.level` arguments now default to `0.95`.
- `tidy.lsmobj()` gained a `conf.int` argument
- Added `tidy.regsubsets()` for best subsets linear regression from the `leaps` package
- Added method `tidy.lm.beta()` to tidy `lm.beta` class models (#545 by @mattle24)
- `tidy.kmeans()` now uses the names of the input variables in the output by default. Set `col.names = NULL` to recover the old behavior.
- `tidy_optim()` now provides the standard error if the Hessian is present. (#529 by @billdenney) (TODO: think about this)
- `glance.biglm()` now returns a `df.residual` column
- `tidy.htest()` column names are now run through `make.names()` to ensure syntactic correctness (#549 by @karissawhiting) (TODO: use tidyverse name repair?)
- Many `glance()` methods now return the number of observations in a `nobs` column, which is typically the rightmost column.
- `tidy.lmodel2()` now returns a `p.value` column (#570)
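As a rough illustration of what running names through `make.names()` does, the base R sketch below applies it to two labels containing spaces. The labels are hypothetical examples, not necessarily names that `tidy.htest()` produces.

```r
# Hypothetical labels containing spaces, which are not syntactic R names
raw_names <- c("num df", "denom df")

# make.names() replaces invalid characters with "."
clean_names <- make.names(raw_names)
clean_names
#> [1] "num.df"   "denom.df"
```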
- `augment.htest()`:
  - `.residuals` -> `.resid`
  - `.stdres` -> `.std.resid`
  - These changes only affect chi-squared tests
- `tidy.ridgelm()` will now always return a `GCV` column and never returns an `xm` column (#532)
- Bug fix to allow `augment.Mclust()` to work on univariate data (#490)
- Bug fix to allow `tidy.htest()` to support equal variances (#608)
- Bug fix for `tidy.mlm()` when passed `quick = TRUE` (#539 by @MatthieuStigler)
- Bug fix for `tidy.polr()` when passed `conf.int = TRUE` (#498)
- Bug fix in `glance.lavaan()` (#577)
Planned
- Data frame, rowwise data frame, vector and matrix tidiers have been removed
- `bootstrap()` has been removed
Unplanned
The following tidiers have been removed from broom but were not soft deprecated in the previous release:
- `glance.summary.lm()`
- `augment.glmRob()`
We regret that we were unable to provide any warning for these changes.
The robust package does not provide the functionality necessary to implement an augment method. We are looking into supporting the robustbase package in the future.
The following have all been deprecated in favor of broom.mixed:
- `tidy.brmsfit()`
- `tidy.merMod()`, `glance.merMod()`, `augment.merMod()`
- `tidyMCMC()`, `tidy.rjags()`, `tidy.stanfit()`
- `tidy.lme()`, `glance.lme()`, `augment.lme()`
- `tidy.stanreg()`, `glance.stanreg()`

- `tidy.table()` and `tidy.ftable()` have been deprecated in favor of `tibble::as_tibble()`
- `tidy.summaryDefault()` has been deprecated in favor of `skimr::skim()`

`tidy()`, `glance()` and `augment()` are now re-exported from the generics package.
Tidiers now return tibble::tibble()s. This release also includes several new tidiers, new vignettes and a large number of bugfixes. We've also begun to more rigorously define tidier specifications: we've laid part of the groundwork for stricter and more consistent tidying, but the new tidier specifications are not yet complete. These will appear in the next release.
Additionally, users should note that we are in the process of migrating tidying methods for mixed models and Bayesian models to broom.mixed. broom.mixed is not on CRAN yet, but all mixed model and Bayesian tidiers will be deprecated once broom.mixed is on CRAN. No further development of mixed model tidiers will take place in broom.
Almost all tidiers should now return tibbles rather than data.frames. Deprecated tidying methods, Bayesian and mixed model tidiers still return data.frames.
Users are most likely to experience issues when using `augment` in situations where tibbles are stricter than data frames. For example, specifying model covariates as a matrix object will now error:
```r
library(broom)
library(quantreg)

fit <- rq(stack.loss ~ stack.x, tau = .5)
broom::augment(fit)
#> Error: Column `stack.x` must be a 1d atomic vector or a list
```

This is because the default `data` argument `data = model.frame(fit)` cannot be coerced to `tibble`.
Another consequence of this is that augment.survreg and augment.coxph from the survival package now require that the user explicitly passes data to either the data or newdata arguments.
These restrictions will be relaxed in an upcoming release of broom pending support for matrix-columns in tibbles.
Developers are likely to experience issues:
- subsetting tibbles with `[`, which returns a tibble rather than a vector.
- setting rownames on tibbles, which is deprecated.
- using matrix and vector tidiers, now deprecated.
- handling the additional tibble classes `tbl_df` and `tbl` beyond the `data.frame` class
- linking to defunct documentation files -- broom recently moved all tidiers to a `roxygen2` template based documentation system.
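The first item in the list above tends to be the most common surprise. The sketch below contrasts single-column subsetting on a data frame and on a tibble. The objects `df` and `tb` are made up for illustration.

```r
library(tibble)

# Illustrative data; `df` and `tb` hold the same columns
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb <- as_tibble(df)

# A data frame drops a single selected column to a bare vector
is.data.frame(df[, "x"])
#> [1] FALSE

# A tibble keeps its class, so code expecting a vector may break
is_tibble(tb[, "x"])
#> [1] TRUE

# `[[` extracts the column as a vector from either structure
identical(tb[["x"]], df[, "x"])
#> [1] TRUE
```

Using `[[` (or `$`) when a vector is needed is one way to write code that behaves the same for both classes.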
This version of broom includes several new vignettes:
- `vignette("available-methods", package = "broom")` contains a table detailing which tidying methods are available
- `vignette("adding-tidiers", package = "broom")` is an in-progress guide for contributors on how to add new tidiers to broom
- `vignette("glossary", package = "broom")` contains tables describing acceptable argument names and column names for the in-progress new specification.
Several old vignettes have also been updated:
- `vignette("bootstrapping", package = "broom")` now relies on the `rsample` package and a `tidyr::nest`-`purrr::map`-`tidyr::unnest` workflow. This is now the recommended workflow for working with multiple models, as opposed to the old `dplyr::rowwise`-`dplyr::do` based workflow.
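A minimal sketch of the nest-map-unnest pattern is shown below. It is not taken from the vignette: the grouping by `cyl`, the model formula, and the column names `fit` and `coefs` are assumptions chosen for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

tidied <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%      # one row per cylinder group, data in a list-column
  ungroup() %>%
  mutate(
    fit = map(data, ~ lm(mpg ~ wt, data = .x)),  # fit one model per group
    coefs = map(fit, tidy)                       # tidy each fitted model
  ) %>%
  select(cyl, coefs) %>%
  unnest(coefs)   # one row per group and model term

tidied
```

Keeping the fitted models in a list-column means the same pipeline could map `glance` or `augment` over `fit` instead of `tidy`.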
- Matrix and vector tidiers have been deprecated in favor of `tibble::as_tibble` and `tibble::enframe`
- Dataframe tidiers and rowwise dataframe tidiers have been deprecated
- `bootstrap()` has been deprecated in favor of the `rsample` package
- `inflate` has been removed from `broom`
- The `alpha` argument has been removed from `quantreg` tidy methods
- The `separate.levels` argument has been removed from `tidy.TukeyHSD`. To obtain the effect of `separate.levels = TRUE`, users may `tidyr::separate` after tidying. This is consistent with the `multcomp` tidier behavior.
- The `fe.error` argument was removed from `tidy.felm`. When fixed effects are tidied, their standard errors are now always included.
- The `diag` argument in `tidy.dist` has been renamed `diagonal`
- Advice to help beginners make PRs (#397 by @karldw)
- `glance` support for `arima` objects fit with `method = "CSS"` (#396 by @josue-rodriguez)
- A bug fix to re-enable tidying `glmnet` objects with `family = multinomial` (#395 by @erleholgersen)
- A bug fix to allow tidying `quantreg` intercept only models (#378 by @erleholgersen)
- A bug fix for `aovlist` objects (#377 by @mvevans89)
- Support for `glmnetUtils` objects (#352 by @Hong-Revo)
- A bug fix to allow `tidy_emmeans` to handle column names with dashes (#351 by @bmannakee)
- `augment.felm` no longer returns `.fe_` and `.comp` columns
- Support saved formulas in `augment.felm` (#347 by @ShreyasSingh)
- `confint_tidy` now drops rows of all `NA` (#345 by @atyre2)
- A new tidier for `caret::confusionMatrix` objects (#344 by @mkuehn10)
- Tidiers for `Kendall::Kendall` objects (#343 by @cimentadaj)
- A new tidying method for `car::durbinWatsonTest` objects (#341 by @mkuehn10)
- `glance` throws an informative error for `quantreg::rq` models fit with multiple `tau` values (#338 by @bfgray3)
- `tidy.glmnet` gains the ability to retain zero-valued coefficients with a `return_zeros` argument that defaults to `FALSE` (#337 by @bfgray3)
- `tidy.manova` now retains a `Residuals` row (#334 by @jarvisc1)
- Tidiers for `ordinal::clm`, `ordinal::clmm`, `survey::svyolr` and `MASS::polr` ordinal model objects (#332 by @larmarange)
- Support for `anova` objects from `car::Anova` (#325 by @mariusbarth)
- Tidiers for `tseries::garch` models (#323 by @wilsonfreitas)
- Removed dependency on `psych` package (#313 by @nutterb)
- Improved error messages (#303 by @michaelweylandt)
- Compatibility with new `rstanarm` and `loo` packages (#298 by @jgabry)
- Support for tidying lists returned by `irlba::irlba`
- A truly huge increase in unit tests (#267 by @dchiu911)
- Bug fix for `tidy.prcomp` when missing labels (#265 by @corybrunson)
- Added a `pkgdown` site at https://broom.tidyverse.org/ (#260 by @jayhesselberth)
- Added tidiers for `AER::ivreg` models (#247 by @hughjonesd)
- Added tidiers for the `lavaan` package (#233 by @puterleat)
- Added `conf.int` argument to `tidy.coxph` (#220 by @larmarange)
- Added `augment` method for chi-squared tests (#138 by @larmarange)
- Changed default `se.type` for `tidy.rq` to match that of `quantreg::summary.rq()` (#404 by @ethchr)
- Added argument `quick` for `tidy.plm` and `tidy.felm` (#502 and #509 by @MatthieuStigler)
- Many small improvements throughout
Many many thanks to all the following for their thoughtful comments on design, bug reports and PRs! The community of broom contributors has been kind, supportive and insightful and I look forward to working with you all again!
@atyre2, @batpigandme, @bfgray3, @bmannakee, @briatte, @cawoodjm, @cimentadaj, @dan87134, @dgrtwo, @dmenne, @ekatko1, @ellessenne, @erleholgersen, @ethchr, @Hong-Revo, @huftis, @IndrajeetPatil, @jacob-long, @jarvisc1, @jenzopr, @jgabry, @jimhester, @josue-rodriguez, @karldw, @kfeilich, @larmarange, @lboller, @mariusbarth, @michaelweylandt, @mine-cetinkaya-rundel, @mkuehn10, @mvevans89, @nutterb, @ShreyasSingh, @stephlocke, @strengejacke, @topepo, @willbowditch, @WillemSleegers, @wilsonfreitas, and @MatthieuStigler
- Fixed gam tidiers to work with "Gam" objects, due to an update in gam 1.15. This fixes failing CRAN tests
- Improved test coverage (thanks to #267 from Derek Chiu)
- Changed the deprecated `dplyr::failwith` to `purrr::possibly`
- `augment` and `glance` on NULLs now return an empty data frame
- Deprecated the `inflate()` function in favor of `tidyr::crossing`
- Fixed confidence intervals in the gmm tidier (thanks to #242 from David Hugh-Jones)
- Fixed a bug in bootstrap tidiers (thanks to #167 from Jeremy Biesanz)
- Fixed tidy.lm with `quick = TRUE` to return terms as character rather than factor (thanks to #191 from Matteo Sostero)
- Added tidiers for `ivreg` objects from the AER package (thanks to #245 from David Hugh-Jones)
- Added tidiers for `survdiff` objects from the survival package (thanks to #147 from Michał Bojanowski)
- Added tidiers for `emmeans` from the emmeans package (thanks to #252 from Matthew Kay)
- Added tidiers for `speedlm` and `speedglm` from the speedglm package (thanks to #248 from David Hugh-Jones)
- Added tidiers for `muhaz` objects from the muhaz package (thanks to #251 from Andreas Bender)
- Added tidiers for `decompose` and `stl` objects from stats (thanks to #165 from Aaron Jacobs)
- Added tidiers for `lsmobj` and `ref.grid` objects from the lsmeans package
- Added tidiers for `betareg` objects from the betareg package
- Added tidiers for `lmRob` and `glmRob` objects from the robust package
- Added tidiers for `brms` objects from the brms package (thanks to #149 from Paul Buerkner)
- Fixed tidiers for orcutt 2.0
- Changed `tidy.glmnet` to filter out rows where estimate == 0.
- Updates to `rstanarm` tidiers (thanks to #177 from Jonah Gabry)
- Fixed issue with survival package 2.40-1 (thanks to #180 from Marcus Walz)
- Added AppVeyor, codecov.io, and code of conduct
- Changed name of "NA's" column in summaryDefault output to "na"
- Fixed `tidy.TukeyHSD` to include `term` column. Also added `separate.levels` argument, with option to separate `comparison` into `level1` and `level2`
- Fixed `tidy.manova` to use correct column name for test (previously, always `pillai`)
- Added `kde_tidiers` to tidy kernel density estimates
- Added `orcutt_tidiers` to tidy the results of `cochrane.orcutt` from the orcutt package
- Added `tidy.dist` to tidy the distance matrix output of `dist` from the stats package
- Added `tidy` and `glance` for `lmodel2` objects from the lmodel2 package
- Added tidiers for `poLCA` objects from the poLCA package
- Added tidiers for sparse matrices from the Matrix package
- Added tidiers for `prcomp` objects
- Added tidiers for `Mclust` objects from the mclust package
- Added tidiers for `acf` objects
- Fixed to be compatible with dplyr 0.5, which is being submitted to CRAN
- Added tidiers for geeglm, nlrq, roc, boot, bgterm, kappa, binWidth, binDesign, rcorr, stanfit, rjags, gamlss, and mle2 objects.
- Added `tidy` methods for lists, including u, d, v lists from `svd`, and x, y, z lists used by `image` and `persp`
- Added `quick` argument to `tidy.lm`, `tidy.nls`, and `tidy.biglm`, to create a smaller and faster version of the output.
- Changed `rowwise_df_tidiers` to allow the original data to be saved as a list column, then provided as a column name to `augment`. This required removing `data` from the `augment` S3 signature. Also added `tests-rowwise.R`
- Fixed various issues in ANOVA output
- Fixed various issues in lme4 output
- Fixed issues in tests caused by dev version of ggplot2
- Added tidiers for "plm" (panel linear model) objects from the plm package.
- Added `tidy.coeftest` for coeftest objects from the lmtest package.
- Set up `tidy.lm` to work with "mlm" (multiple linear model) objects (those with multiple response columns).
- Added `tidy` and `glance` for "biglm" and "bigglm" objects from the biglm package.
- Fixed bug in `tidy.coxph` when one-row matrices are returned
- Added `tidy.power.htest`
- Added `tidy` and `glance` for `summaryDefault` objects
- Added tidiers for "lme" (linear mixed effects models) from the nlme package
- Added `tidy` and `glance` for `multinom` objects from the nnet package.
- Fixed bug in `tidy.pairwise.htest`, which now can handle cases where the grouping variable is numeric.
- Added `tidy.aovlist` method. This added `stringr` package to IMPORTS to trim whitespace from the beginning and end of the `term` and `stratum` columns. This also required adjusting `tidy.aov` so that it could handle strata that are missing p-values.
- Set up `glance.lm` to work with `aov` objects along with `lm` objects.
- Added `tidy` and `glance` for matrix objects, with `tidy.matrix` converting a matrix to a data frame with rownames included, and `glance.matrix` returning the same result as `glance.data.frame`.
- Changed DESCRIPTION Authors@R to new format
- Fixed small bug in `felm` where the `.fitted` and `.resid` columns were matrices rather than vectors.
- Added tidiers for `rlm` (robust linear model) and `gam` (generalized additive model) objects, including adjustments to "lm" tidiers in order to handle them. See `?rlm_tidiers` and `?gam_tidiers` for more.
- Removed rownames from `tidy.cv.glmnet` output
- The behavior of `augment`, particularly with regard to missing data and the `na.exclude` argument, has been made consistent across the following models through the use of the `augment_columns` function:
  - `lm`
  - `glm`
  - `nls`
  - `merMod` (`lme4`)
  - `survreg` (`survival`)
  - `coxph` (`survival`)

  Unit tests in `tests/testthat/test-augment.R` were added to ensure consistency across these models.

- `tidy`, `augment` and `glance` methods were added for `rowwise_df` objects, and are set up to apply across their rows. This allows for simple patterns such as:

  ```r
  regressions <- mtcars %>% group_by(cyl) %>% do(mod = lm(mpg ~ wt, .))
  regressions %>% tidy(mod)
  regressions %>% augment(mod)
  ```

  See `?rowwise_df_tidiers` for more.

- Added `tidy` and `glance` methods for `Arima` objects, and `tidy` for `pairwise.htest` objects.

- Fixes for CRAN: change package description to title case, removed NOTES, mostly by adding `globals.R` to declare global variables.

- This is the original version published on CRAN.
- Tidiers have been added for S3 objects from the following packages:
  - `lme4`
  - `glmnet`
  - `survival`
  - `zoo`
  - `felm`
  - `MASS` (`ridgelm` objects)
- `tidy` and `glance` methods for data.frames have also been added, and `augment.data.frame` produces an error (rather than returning the same data.frame).
- `stderror` has been changed to `std.error` (affects many functions) to be consistent with broom's naming conventions for columns.
- A function `bootstrap` has been added based on this example, to perform the common use case of bootstrapping models.
- Added "augment" S3 generic and various implementations. "augment" does something different from tidy: it adds columns to the original dataset, including predictions, residuals, or cluster assignments. This was originally described as "fortify" in ggplot2.
- Added "glance" S3 generic and various implementations. "glance" produces a one-row data frame summary, which is necessary for tidy outputs with values like R^2 or F-statistics.
- Re-wrote intro broom vignette/README to introduce all three methods.
- Wrote a new kmeans vignette.
- Added tidying methods for multcomp, sp, and map objects (from fortify-multcomp, fortify-sp, and fortify-map from ggplot2).
- Because this integrates substantial amounts of ggplot2 code (with permission), added Hadley Wickham as an author in DESCRIPTION.