A user is learning dplyr and wants to group by multiple columns, passing the column names as a character vector. Below is their code. How can it be fixed?
# make data with weird column names that can't be hard coded
data = data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# get the columns we want to group by when averaging
columns = names(data)[-3]
# plyr - works
ddply(data, columns, summarize, value=mean(value))
# dplyr - raises an error
data %.% group_by(columns) %.% summarise(Value = mean(value))
#> Error in eval(expr, envir, enclos) : index out of bounds
Here the plyr version works, but the dplyr version raises an error, because group_by() expects bare column names rather than a character vector of names.
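The difference can be seen in a minimal sketch. The data frame `df`, its columns `g1` and `g2`, and the vector `cols` are hypothetical names made up for illustration. The sketch assumes a reasonably recent dplyr (1.0.0 or later, for the `.groups` argument).

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
df <- data.frame(
  g1 = c("A", "A", "B"),
  g2 = c("x", "y", "y"),
  value = c(1, 2, 3)
)
cols <- c("g1", "g2")

# Bare column names: group_by() looks them up inside df
by_name <- df %>%
  group_by(g1, g2) %>%
  summarise(Value = mean(value), .groups = "drop")

# A character vector is not read as a list of column names.
# group_by() tries to use it as a new grouping column, and its
# length (2) does not match the number of rows (3), so this fails.
res <- try(df %>% group_by(cols), silent = TRUE)
inherits(res, "try-error")
```

The exact error message depends on the dplyr version, so it is not reproduced here.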
Newer versions of dplyr provide scoped variants of its verbs, such as group_by_at(), which accept a column selection in the same way select() does. Let us use one here:
data = data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# get the columns we want to group by when averaging
columns = names(data)[-3]
library(dplyr)
dataframe1 <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))
# compare with plyr for reference
dataframe2 <- plyr::ddply(data, columns, plyr::summarize, value=mean(value))
# count how many cells of the two results agree
table(dataframe1 == dataframe2, useNA = 'ifany')
Now the above code runs without error. The table should contain only TRUE entries, meaning the dplyr result matches the plyr one.
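In dplyr 1.0.0 and later, the scoped verbs such as group_by_at() are superseded by across(). The following is a minimal sketch of the same grouping written that way, not the only possible form. The data frame `df`, its columns `g1` and `g2`, and the vector `cols` are hypothetical names, and the seed is an assumption added only to make the sketch reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only for reproducibility

# Hypothetical data with the same shape as in the question
df <- data.frame(
  g1 = sample(LETTERS[1:3], 100, replace = TRUE),
  g2 = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)
cols <- names(df)[-3]  # grouping columns held as a character vector

# across() + all_of() selects the grouping columns from the vector
modern <- df %>%
  group_by(across(all_of(cols))) %>%
  summarise(Value = mean(value), .groups = "drop")

# The scoped variant used above, for comparison
scoped <- df %>%
  group_by_at(vars(one_of(cols))) %>%
  summarise(Value = mean(value), .groups = "drop")

# Check that both forms give the same summary
all.equal(modern, scoped)
```

all_of() raises an error if a name in `cols` is missing from the data, which tends to surface typos earlier than one_of() does, since one_of() only warns.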