How can I group by multiple columns in dplyr when the column names are supplied as a character vector? The code below works with plyr but fails with dplyr. How can it be fixed?

Asked by Sunitapandey in Data Science, asked on Nov 5, 2019
Answered by Nitin Solanki

# make data with weird column names that can't be hard coded

data = data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)

# get the names of the columns we want to group by when averaging

columns = names(data)[-3]

# plyr - works

 ddply(data, columns, summarize, value=mean(value))

# dplyr - raises an error

data %.% group_by(columns) %.% summarise(Value = mean(value))

#> Error in eval(expr, envir, enclos) : index out of bounds

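Here the plyr version works, but the dplyr version raises an error. This is because group_by() expects unquoted column names, so it does not treat the character vector columns as a set of column names.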

Newer versions of dplyr provide scoped variants of its verbs, such as group_by_at(), which accept a column selection in the same way select() does. Let us use that here:

dataset = data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)

# get the names of the columns we want to group by when averaging

columns = names(dataset)[-3]

library(dplyr)

dataframe1 <- dataset %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))

# compare with plyr for reference

dataframe2 <- plyr::ddply(dataset, columns, plyr::summarize, value=mean(value))

table(dataframe1 == dataframe2, useNA = 'ifany')

Now the above code gives the output as expected.
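
As a side note, dplyr 1.0.0 and later mark the scoped verbs such as group_by_at() as superseded in favour of across(). Below is a minimal sketch of the same grouping in that style, assuming dplyr 1.0.0 or newer is installed. The seed and the name result are illustrative and not part of the original answer.

```r
library(dplyr)

set.seed(42)  # illustrative seed so the sample data is reproducible

dataset <- data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace = TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)

# character vector holding the names of the grouping columns
columns <- names(dataset)[-3]

# all_of() selects the columns named in the character vector;
# across() makes that selection usable inside group_by()
result <- dataset %>%
  group_by(across(all_of(columns))) %>%
  summarise(Value = mean(value), .groups = "drop")

print(result)
```

Using all_of() rather than one_of() makes the call fail loudly if any name in columns is missing from the data, which is usually what you want when the names come from user input.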



