A user is learning dplyr and wants to group by multiple columns whose names are supplied as a character vector. Below is their code. How can it be fixed?

Asked by Sunitapandey in Data Science on Nov 5, 2019

# make data with weird column names that can't be hard coded

data = data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)

# get those columns which we want the average

columns = names(data)[-3]

# plyr - works

ddply(data, columns, summarize, value=mean(value))

# dplyr - raises an error

data %.% group_by(columns) %.% summarise(Value = mean(value))

#> Error in eval(expr, envir, enclos) : index out of bounds

Here the plyr call works, but the dplyr call raises an error.
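One way to see why the dplyr call fails: group_by() evaluates its arguments as expressions inside the data frame, so the bare name columns is treated as a new grouping variable, not as a list of column names. The sketch below assumes a recent dplyr (1.0 or later, where the pipe is %>%); the toy data frame df and the flag failed are made up for illustration and are not part of the question.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
df <- data.frame(
  g1 = c("A", "A", "B", "B", "C", "C"),
  g2 = c("x", "y", "x", "y", "x", "y"),
  value = 1:6
)
columns <- c("g1", "g2")

# group_by() tries to use the 2-element character vector itself as a
# grouping column for a 6-row data frame, which cannot be recycled,
# so the call raises an error that we catch here
failed <- tryCatch({
  df %>% group_by(columns) %>% summarise(Value = mean(value))
  FALSE
}, error = function(e) TRUE)

print(failed)
```

The exact error message depends on the dplyr version, so the sketch only records that an error occurred.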

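Newer versions of dplyr provide scoped variants of its verbs, such as group_by_at(), which accept the same column-selection helpers as select(). This lets us pass the column names as a character vector. Let us do that here: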

data = data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)

# get the names of the columns to group by when averaging
columns = names(data)[-3]

library(dplyr)

dataframe1 <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))

# compare with plyr for reference
dataframe2 <- plyr::ddply(data, columns, plyr::summarize, value=mean(value))

table(dataframe1 == dataframe2, useNA = 'ifany')

Now the above code gives the expected output.
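In dplyr 1.0 and later, the scoped verbs and one_of() are superseded, and the same grouping can be written with across() and all_of(). The sketch below assumes dplyr >= 1.0; the data frame example_data, its short column names, and the seed are hypothetical stand-ins for the data above.

```r
library(dplyr)

set.seed(1)  # hypothetical seed, only to make the sketch reproducible
example_data <- data.frame(
  g1 = sample(LETTERS[1:3], 100, replace = TRUE),
  g2 = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)

# names of the grouping columns, held in a character vector
columns <- names(example_data)[-3]

# across() + all_of() selects the grouping columns from the character vector
result_across <- example_data %>%
  group_by(across(all_of(columns))) %>%
  summarise(Value = mean(value), .groups = "drop")

# the scoped-verb form used in the answer, for comparison
result_at <- example_data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarise(Value = mean(value)) %>%
  ungroup()

print(all.equal(as.data.frame(result_across), as.data.frame(result_at)))
```

all_of() raises an error if a name in columns is missing from the data; any_of() could be used instead if missing names should be silently skipped.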
