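In a data.frame (or data.table), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using a vector (instead of a data.frame), is the following: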
> y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
I would like a function fill.NAs() that lets me construct yy such that:
> yy
[1] NA  2  2  2  2  3  3  4  4  4
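I need to repeat this operation for many small data.frames (about 30-50 MB each, about 1 TB in total), where a row counts as NA if all of its entries are NA. What is a good way to approach the problem?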
The ugly solution I cooked up uses this function:
last <- function (x){
    x[length(x)]
}

fill.NAs <- function(isNA){
    if (isNA[1] == 1) {
        isNA[1:max({which(isNA==0)[1]-1},1)] <- 0 # first is NAs
                                                  # can't be forward filled
    }
    isNA.neg <- isNA.pos <- isNA.diff <- diff(isNA)
    isNA.pos[isNA.diff < 0] <- 0
    isNA.neg[isNA.diff > 0] <- 0
    which.isNA.neg <- which(as.logical(isNA.neg))
    if (length(which.isNA.neg)==0) return(NULL) # generates warnings later, but works
    which.isNA.pos <- which(as.logical(isNA.pos))
    which.isNA <- which(as.logical(isNA))
    if (length(which.isNA.neg)==length(which.isNA.pos)){
        replacement <- rep(which.isNA.pos[2:length(which.isNA.neg)],
                           which.isNA.neg[2:max(length(which.isNA.neg)-1,2)] -
                           which.isNA.pos[1:max(length(which.isNA.neg)-1,1)])
        replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
    } else {
        replacement <- rep(which.isNA.pos[1:length(which.isNA.neg)],
                           which.isNA.neg - which.isNA.pos[1:length(which.isNA.neg)])
        replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
    }
    replacement
}
The function fill.NAs is used as follows:
y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
isNA <- as.numeric(is.na(y))
replacement <- fill.NAs(isNA)
if (length(replacement)){
    which.isNA <- which(as.logical(isNA))
    to.replace <- which.isNA[which(isNA==0)[1]:length(which.isNA)]
    y[to.replace] <- y[replacement]
}
Output:
> y
[1] NA  2  2  2  2  3  3  4  4  4
... which seems to work. But, man, is it ugly! Any suggestions?
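For comparison, the same forward fill can be sketched in a few lines of base R. The idea is to count the non-NA values seen so far with cumsum() and use that count as an index into the non-NA values. The helper name fill_forward is hypothetical (it does not come from the post), and the sketch assumes a plain atomic vector such as a numeric or character vector, not a factor.

```r
# Hypothetical helper (name not from the original post): forward-fill NAs
# in an atomic vector using only base R.
fill_forward <- function(x) {
  not_na <- !is.na(x)
  # For each position, the number of non-NA values seen so far;
  # 0 means we are still inside a leading run of NAs.
  idx <- cumsum(not_na)
  # Prepend NA so that a count of 0 (shifted to index 1) maps to NA.
  c(NA, x[not_na])[idx + 1]
}

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
yy <- fill_forward(y)
print(yy)
```

A leading NA has no earlier value to copy, so it stays NA and the result keeps the length of the input.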
You probably want to use the na.locf() function from the zoo package to carry the last observation forward to replace your NA values.
Here is the beginning of its usage example from the help page:
library(zoo)

az <- zoo(1:6)

bz <- zoo(c(2,NA,1,4,5,2))

na.locf(bz)
1 2 3 4 5 6
2 2 1 4 5 2

na.locf(bz, fromLast = TRUE)
1 2 3 4 5 6
2 1 1 4 5 2

cz <- zoo(c(NA,9,3,2,3,2))

na.locf(cz)
2 3 4 5 6
9 3 2 3 2
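Note that in the last call the leading NA is dropped, because na.locf() defaults to na.rm = TRUE. Applied to the vector from the question, a minimal sketch might look like the following. It assumes the zoo package is installed; the data.frame df and its columns are made up purely for illustration.

```r
library(zoo)

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)

# na.rm = FALSE keeps the leading NA instead of dropping it,
# so the result has the same length as the input.
yy <- na.locf(y, na.rm = FALSE)
print(yy)

# Hypothetical data.frame, filled column by column.
df <- data.frame(a = c(NA, 1, NA, 3),
                 b = c("x", NA, NA, "y"),
                 stringsAsFactors = FALSE)
df[] <- lapply(df, na.locf, na.rm = FALSE)
print(df)
```

Keeping na.rm = FALSE matters for the data.frame case: every column must keep its original length, otherwise the columns no longer line up.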