DESCRIPTION (2 changes: 1 addition & 1 deletion)
@@ -11,7 +11,7 @@ Depends: R (>= 3.1.0), qdapDictionaries (>= 1.0.2), qdapRegex (>= 0.1.2), qdapTo
Imports: chron, dplyr (>= 0.3), gdata, gender (>= 0.5.1), ggplot2 (>= 2.1.0), grid, gridExtra,
igraph, methods, NLP, openNLP (>= 0.2-1), parallel, plotrix, RCurl, reports,
reshape2, scales, stringdist, tidyr, tm (>= 0.7.2), tools, venneuler, wordcloud,
-xlsx, XML
+xlsx, XML, mgsub
Suggests: koRpus, knitr, lda, proxy, stringi, SnowballC, testthat
LazyData: TRUE
VignetteBuilder: knitr
R/multigsub.R (15 changes: 12 additions & 3 deletions)
@@ -19,6 +19,10 @@
#' \code{pattern} string is sorted by number of characters to prevent substrings
#' replacing meta strings (e.g., \code{pattern = c("the", "then")} resorts to
#' search for "then" first).
#' @param simultaneous logical. If \code{TRUE}, a slower method is used that
#' matches all patterns against the original text at once, guaranteeing no
#' conflicts between substitutions (e.g., \code{pattern = c("hey", "ho")},
#' \code{replacement = c("ho", "hey")}, \code{text.var = "hey ho, let's go!"}
#' returns \code{"ho hey, let's go!"}).
#' @param \dots Additional arguments passed to \code{\link[base]{gsub}}.
#' @rdname multigsub
#' @return \code{multigsub} - Returns a vector with the pattern replaced.
@@ -33,6 +37,7 @@
#' multigsub(c("it's", "I'm"), c("it is", "I am"), DATA$state)
#' mgsub(c("it's", "I'm"), c("it is", "I am"), DATA$state)
#' mgsub("[[:punct:]]", "PUNC", DATA$state, fixed = FALSE)
#' mgsub(c("hey", "ho"), c("ho", "hey"), "hey ho, let's go!", simultaneous = TRUE)
#'
#' ## ======================
#' ## `sub_holder` Function
@@ -54,7 +59,7 @@
multigsub <-
function (pattern, replacement, text.var, leadspace = FALSE,
trailspace = FALSE, fixed = TRUE, trim = TRUE, order.pattern = fixed,
-...) {
+simultaneous = FALSE, ...) {

if (leadspace | trailspace) replacement <- spaste(replacement, trailing = trailspace, leading = leadspace)

@@ -64,9 +69,13 @@ function (pattern, replacement, text.var, leadspace = FALSE,
if (length(replacement) != 1) replacement <- replacement[ord]
}
if (length(replacement) == 1) replacement <- rep(replacement, length(pattern))

-for (i in seq_along(pattern)){

if (simultaneous) {
text.var <- mgsub::mgsub(text.var, pattern, replacement, fixed = fixed, ...)
} else {
for (i in seq_along(pattern)){
text.var <- gsub(pattern[i], replacement[i], text.var, fixed = fixed, ...)
}
}

if (trim) text.var <- gsub("\\s+", " ", gsub("^\\s+|\\s+$", "", text.var, perl=TRUE), perl=TRUE)
inst/Rmd_vignette/qdap_vignette.Rmd (5 changes: 4 additions & 1 deletion)
@@ -1045,7 +1045,9 @@ trans_cloud(text, c("greg", "bob"), target.words=list(obs), caps.list=obs,

<h4 id="mgsub">Multiple gsub</h4>

-The researcher may have the need to make multiple substitutions in a text. An example of when this is needed is when a transcript is marked up with transcription coding convention specific to a particular transcription method. These codes, while useful in some contexts, may lead to inaccurate word statistics. The base R function `r HR2("http://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html", "gsub")` makes a single replacement of these types of coding conventions. The `r FUN("multigsub")` (alias `r FUN("mgsub")`) takes a vector of patterns to search for as well as a vector of replacements. Note that the replacements occur sequentially rather than all at once. This means a previous (first in pattern string) sub could alter or be altered by a later sub. `r FUN("mgsub")` is useful throughout multiple stages of the research process.
+The researcher may need to make multiple substitutions in a text. For example, a transcript may be marked up with coding conventions specific to a particular transcription method; these codes, while useful in some contexts, can distort word statistics. The base R function `r HR2("http://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html", "gsub")` handles only one pattern per call. The `r FUN("multigsub")` function (alias `r FUN("mgsub")`) takes a vector of patterns to search for as well as a vector of replacements. By default the replacements occur sequentially rather than all at once (unless `simultaneous = TRUE`), so an earlier substitution (one that comes first in `pattern`) can alter, or be altered by, a later one. `r FUN("mgsub")` is useful throughout multiple stages of the research process.

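When `simultaneous = TRUE`, all patterns are matched against the original text at once, so no replacement is ever re-matched by another pattern. This is slower, but it guarantees the intended result: swapping "hey" and "ho" in "hey ho, let's go!" gives "ho hey, let's go!" rather than the sequential result "hey hey, let's go!".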

`r FT(orange, 5, text="&diams;")` **Multiple Substitutions**`r FT(orange, 5, text="&diams;")`
```{r}
@@ -1056,6 +1058,7 @@ mgsub(c("it's", "I'm"), "SINGLE REPLACEMENT", DATA$state)
mgsub("[[:punct:]]", "PUNC", DATA$state, fixed = FALSE)
## Iterative: "I'm" converts to "I am", which then converts to "ITERATIVE"
mgsub(c("it's", "I'm", "I am"), c("it is", "I am", "ITERATIVE"), DATA$state)
mgsub(c("hey", "ho"), c("ho", "hey"), "hey ho, let's go!", simultaneous = TRUE)
```


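The difference between the default sequential mode and `simultaneous = TRUE` can be reproduced in a few lines of base R. The two helpers below, `sequential_sub()` and `simultaneous_sub()`, are hypothetical and written only for illustration: they assume fixed (non-regex), non-empty patterns and are not the code used by qdap or by the mgsub package. The simultaneous version uses a split-and-rejoin technique, so replacement text is never searched again; when patterns overlap, the one listed first wins.

```r
# Hypothetical helper: sequential substitution, as in the default
# multigsub() loop. Each gsub() call sees the output of the previous one.
sequential_sub <- function(x, pattern, replacement) {
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, fixed = TRUE)
  }
  x
}

# Hypothetical helper: simultaneous substitution by split-and-rejoin.
# Split on the first pattern, substitute the remaining patterns inside the
# unmatched pieces only, then rejoin the pieces with the first replacement.
# Inserted replacement text is therefore never matched again.
simultaneous_sub <- function(x, pattern, replacement) {
  if (length(pattern) == 0L) return(x)
  pieces <- regmatches(x, gregexpr(pattern[1], x, fixed = TRUE),
                       invert = TRUE)[[1]]
  pieces <- vapply(pieces, simultaneous_sub, character(1),
                   pattern = pattern[-1], replacement = replacement[-1],
                   USE.NAMES = FALSE)
  paste(pieces, collapse = replacement[1])
}

txt <- "hey ho, let's go!"
sequential_sub(txt, c("hey", "ho"), c("ho", "hey"))
#> [1] "hey hey, let's go!"
simultaneous_sub(txt, c("hey", "ho"), c("ho", "hey"))
#> [1] "ho hey, let's go!"
```

In the sequential run, the "ho" produced by the first swap is rewritten by the second, which is exactly the conflict the `simultaneous` argument is meant to prevent.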