How can we create log files to track data merges?

Asked by ColinPayne in Data Science on Nov 9, 2019

To make sure that a merge does not introduce errors (e.g., an inflated data set caused by non-unique identifiers), we can maintain a merge tracker that records the row count before and after each join, like this:

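library(plyr)  # provides join()

# Label for this merge step and the row count before the join
merge <- "dat1+dat2=dat1"
count <- nrow(dat1)
check_t1 <- data.frame(merge, count)

# Left join on id1; duplicate id1 values in dat2 would add rows to dat1
dat1 <- join(dat1, dat2, by = "id1", type = "left")

# Row count after the join
count <- nrow(dat1)
check_t2 <- data.frame(merge, count)

# Append both records to the running tracker (created on first use)
if (!exists("checkmerge")) checkmerge <- data.frame()
checkmerge <- rbind(checkmerge, check_t1, check_t2)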
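The tracker above lives only in the R session. To turn it into an actual log file, one option is to append each record to a CSV on disk. The following is a minimal sketch, not a definitive implementation: log_merge, its stage and time columns, the toy data, and the temporary file path are all assumptions made for illustration, and base R's merge() stands in for plyr's join() so that the example runs without extra packages.

```r
# Hypothetical helper: append one record per merge step to a CSV log file.
log_merge <- function(step, stage, count, path) {
  entry <- data.frame(
    time  = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    merge = step,
    stage = stage,
    count = count
  )
  # Write the header only when the file is created, then append rows
  new_file <- !file.exists(path)
  write.table(entry, file = path, sep = ",", row.names = FALSE,
              col.names = new_file, append = !new_file)
}

# Toy data (made up for illustration): id1 = 2 appears twice in dat2
dat1 <- data.frame(id1 = 1:3, x = c("a", "b", "c"))
dat2 <- data.frame(id1 = c(1, 2, 2), y = c(10, 20, 21))

# Assumed log location; in practice this would be a fixed project path
log_path <- tempfile(fileext = ".csv")

log_merge("dat1+dat2=dat1", "before", nrow(dat1), log_path)
dat1 <- merge(dat1, dat2, by = "id1", all.x = TRUE)  # base R left join
log_merge("dat1+dat2=dat1", "after", nrow(dat1), log_path)

# Inspect the log: a higher "after" count signals an inflated join
merge_log <- read.csv(log_path)
print(merge_log)
```

Because each record is appended as soon as it is produced, the log survives even if a later step in the script fails.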

We can also wrap the join in a function that contains a stopifnot() condition. It throws an error if the join changes the number of rows in our data.frame, for example by inflating it.

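library(plyr)  # provides join()

# Join df1 and df2, then stop if the result does not have the same
# number of rows as df1
myfun <- function(df1, df2, id, jtype, msg) {
  print(msg)  # label the merge in the console output
  M <- join(df1, df2, by = id, type = jtype)
  stopifnot(nrow(df1) == nrow(M))
  return(M)
}

# Joining mtcars to itself on the non-unique key "cyl" duplicates rows,
# so this call stops with an error
myfun(mtcars, mtcars, "cyl", "left", "mtcars, mtcars")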
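The row-count check only fires after the join has already run. Since the inflation comes from non-unique identifiers, it can also help to inspect the key column before merging. The sketch below uses base R only; duplicated_keys is a hypothetical helper name, not a plyr function.

```r
# Hypothetical helper: return the key values that occur more than once
# in column `id` of data frame `df`
duplicated_keys <- function(df, id) {
  keys <- df[[id]]
  unique(keys[duplicated(keys)])
}

# "cyl" is not a unique identifier in mtcars, so every value is reported
dup_cyl <- duplicated_keys(mtcars, "cyl")
print(dup_cyl)

# A column of unique ids yields an empty result
dup_none <- duplicated_keys(data.frame(id1 = 1:5), "id1")
print(length(dup_none))
```

An empty result for the right-hand table means a left join on that key cannot add rows, so the pre-check and the stopifnot() guard complement each other.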
