Matrix: compare each row with the others, assign scores, and total them in R
I am new to R and have referred to Stack Overflow a lot. I would like to compare each row with the rest of the rows and calculate a modified similarity matrix.
    mat <- matrix("", 10, 12)
    mat[c(1, 4, 6), ] <- sample(c("aa", "ab", "bb"), 18, TRUE)
    mat[c(2, 3, 10), ] <- sample(c("aa", "bb", "ab"), 18, TRUE)
    mat[c(5, 8), ] <- sample(c("bb", "ab", "bb"), 12, TRUE)
    mat[c(7, 9), ] <- sample(c("aa", "aa", "bb"), 12, TRUE)
    mat[3, 4] = 'na'
    mat[2, 5] = 'na'

This provides:
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
     [1,] "aa" "aa" "ab" "aa" "aa" "aa" "aa" "aa" "ab" "aa"  "aa"  "aa"
     [2,] "ab" "aa" "bb" "bb" "na" "ab" "ab" "aa" "bb" "bb"  "bb"  "ab"
     [3,] "bb" "aa" "ab" "na" "aa" "aa" "bb" "aa" "ab" "aa"  "aa"  "aa"
     [4,] "aa" "aa" "bb" "ab" "aa" "ab" "aa" "aa" "bb" "ab"  "aa"  "ab"
     [5,] "ab" "ab" "bb" "bb" "ab" "ab" "ab" "ab" "bb" "bb"  "ab"  "ab"
     [6,] "aa" "aa" "ab" "aa" "ab" "aa" "aa" "aa" "ab" "aa"  "ab"  "aa"
     [7,] "bb" "aa" "aa" "bb" "aa" "aa" "bb" "aa" "aa" "bb"  "aa"  "aa"
     [8,] "ab" "bb" "bb" "bb" "ab" "bb" "ab" "bb" "bb" "bb"  "ab"  "bb"
     [9,] "aa" "aa" "bb" "bb" "aa" "aa" "aa" "aa" "bb" "bb"  "aa"  "aa"
    [10,] "bb" "ab" "aa" "bb" "bb" "bb" "bb" "ab" "aa" "bb"  "bb"  "bb"

I want to compare each row with the rest of the rows and calculate a modified similarity matrix.
Step 1: Assign values when comparing 2 rows:
aa vs aa = 1; aa vs ab = 0.5; aa vs na = 0.0; na vs na = 0.0; ab vs aa = 0.5; aa vs bb = 0.0; ab vs ab = 0.5
Step 2: Total the scores (example: row 1 versus row 2 = 7.0).
Step 3: Count the total number of positions, excluding instances where one or both values are 'na' (example: row 1 versus row 2 = 11.0).
Step 4: Divide the total score by the count (example: row 1 versus row 2 gives 7/11 = 0.636363).
Step 5: Do this for each pair of rows, with the result in a matrix populated on both sides of the diagonal (example: 10 x 10).
Thanks in advance!
I will change your matrix definition a bit to make the "na" strings actual missing values (NA), which have a special meaning in R that is close to the behavior you want.
    mat <- matrix("", 10, 12)
    mat[c(1, 4, 6), ] <- sample(c("aa", "ab", "bb"), 18, TRUE)
    mat[c(2, 3, 10), ] <- sample(c("aa", "bb", "ab"), 18, TRUE)
    mat[c(5, 8), ] <- sample(c("bb", "ab", "bb"), 12, TRUE)
    mat[c(7, 9), ] <- sample(c("aa", "aa", "bb"), 12, TRUE)
    mat[3, 4] <- NA
    mat[2, 5] <- NA

You have not provided the values for all possible matches, so I am going to make some assumptions. These values can be changed without breaking the code.
For step 1, I am going to make a named vector that can be indexed using the pair names bunched together. For example, aa vs ab becomes 'aaab'.
    pair <- c('aaaa', 'aaab', 'aabb', 'abab', 'abbb', 'bbbb')
    value <- c(1, 0.5, 0, 0.5, 0.5, 1)
    # Add the reverse pairings (I am assuming symmetry)
    pair <- c(pair, paste0(substr(pair, 3, 4), substr(pair, 1, 2)))
    value <- c(value, value)
    names(value) <- pair

Check how the vector value looks at this point to make sure it is what you want. Next, I define a function that uses this globally defined vector and returns what you want at the end of step 4. You may want to include the vector definition in the function body, but I feel that would not be efficient.
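To make the lookup idea concrete, here is a small sketch of how the named vector behaves when indexed with pasted pair names. The two short rows `row1` and `row2` are made up for illustration and are not taken from `mat`. The scores are the same assumed values as above.

```r
# Rebuild the lookup vector (the scores are the assumptions made above).
pair  <- c("aaaa", "aaab", "aabb", "abab", "abbb", "bbbb")
value <- c(1, 0.5, 0, 0.5, 0.5, 1)
# Add the reversed pairs so that the order of the two rows does not matter.
pair  <- c(pair, paste0(substr(pair, 3, 4), substr(pair, 1, 2)))
value <- c(value, value)
names(value) <- pair

# Look up a single pair in either order.
fwd <- value[["aaab"]]
rev <- value[["abaa"]]

# Look up several pairs at once; paste0() builds the keys element-wise.
row1 <- c("aa", "ab", "bb", NA)     # illustrative rows, not from mat
row2 <- c("aa", "aa", "ab", "bb")
keys <- paste0(row1, row2)          # the NA is pasted as the text "NA"
scores <- unname(value[keys])       # a key with no matching name yields NA
print(scores)
```

Because a pair involving a missing value produces a key that is not in `names(value)`, its score comes back as NA, which is what lets `sum(..., na.rm = TRUE)` skip those positions later.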
    compare <- function(row1, row2){
      # Total value of the match between 2 vectors
      # Number of complete cases (positions where neither row has an NA)
      good.cases <- sum(complete.cases(cbind(row1, row2)))
      na.cases <- length(row1) - good.cases
      total.value <- sum(value[paste0(row1, row2)], na.rm=TRUE) + 0.5*na.cases
      total.value/good.cases
    }

At this point I get a total.value of 6.5 when comparing the first 2 rows, due to a wrong assumption in value.
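If relying on a global vector is a concern, one possible alternative is to pass the score table in as an argument. This is only a sketch: `compare_with`, `score.table`, and the `na.credit` parameter are hypothetical names introduced here, and the two short rows are made up so the result can be checked by hand.

```r
# Hypothetical variant of compare(): the score table is an argument, not a global.
compare_with <- function(row1, row2, scores, na.credit = 0.5) {
  # Positions where both rows have a non-missing value.
  ok <- !is.na(row1) & !is.na(row2)
  n.good <- sum(ok)
  n.na <- length(row1) - n.good
  # Sum the pair scores over the complete positions, then add the NA credit.
  total <- sum(scores[paste0(row1[ok], row2[ok])]) + na.credit * n.na
  # Note: if no position is complete, n.good is 0 and the result is not finite.
  total / n.good
}

# Score table built the same way as above (the values are assumptions).
pair <- c("aaaa", "aaab", "aabb", "abab", "abbb", "bbbb")
score.table <- setNames(
  rep(c(1, 0.5, 0, 0.5, 0.5, 1), 2),
  c(pair, paste0(substr(pair, 3, 4), substr(pair, 1, 2)))
)

row1 <- c("aa", "ab", "bb", NA)     # illustrative rows, not from mat
row2 <- c("aa", "aa", "bb", "ab")

# Three complete positions score 1 + 0.5 + 1; the one NA position adds 0.5.
with.credit <- compare_with(row1, row2, score.table)
# With na.credit = 0 the NA position contributes nothing: (1 + 0.5 + 1) / 3.
no.credit <- compare_with(row1, row2, score.table, na.credit = 0)
```

Setting `na.credit = 0` corresponds to the scoring in the question, where any pair involving a missing value counts as 0.0 and is left out of the count.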
For step 5, I use a double loop:

    # Start with an empty matrix to hold the match values
    n <- nrow(mat)
    matches <- matrix(rep(NA, n*n), nrow=n)
    for (i in 1:n){
      for (j in i:n){ ## if symmetric, half the matrix is enough
        matches[i, j] <- compare(mat[i, ], mat[j, ])
      }
    }

I hope this helps.
Edit: I changed compare() to assign a value to the NA cases after a request in the comments.
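The double loop for step 5 fills only the diagonal and the upper triangle, while the question asks for a matrix populated on both sides of the diagonal. Assuming the scores really are symmetric, one way to fill in the other half is to mirror the upper triangle. The sketch below uses a small made-up 3 x 3 matrix as a stand-in for `matches`.

```r
# Small stand-in for `matches` after the double loop:
# diagonal and upper triangle filled, lower triangle still NA.
matches <- matrix(NA_real_, nrow = 3, ncol = 3)
matches[upper.tri(matches, diag = TRUE)] <- c(1, 0.5, 1, 0.25, 0.75, 1)

# Copy each upper-triangle value to its mirror position below the diagonal.
matches[lower.tri(matches)] <- t(matches)[lower.tri(matches)]
print(matches)
```

The same line should work unchanged on the 10 x 10 `matches` matrix, since `lower.tri()` adapts to the dimensions of its argument.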