matrix - compare each row, assign a number, and total it in R


I am new to R and have referred to Stack Overflow a lot. I would like to compare each row with the rest of the rows to calculate a modified similarity matrix.

mat <- matrix("", 10, 12)
mat[c(1, 4, 6),] <- sample(c("aa", "ab", "bb"), 18, TRUE)
mat[c(2, 3, 10),] <- sample(c("aa", "bb", "ab"), 18, TRUE)
mat[c(5, 8),] <- sample(c("bb", "ab", "bb"), 12, TRUE)
mat[c(7, 9),] <- sample(c("aa", "aa", "bb"), 12, TRUE)
mat[3,4] = 'na'
mat[2,5] = 'na'

This provides:

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,] "aa" "aa" "ab" "aa" "aa" "aa" "aa" "aa" "ab" "aa"  "aa"  "aa"
 [2,] "ab" "aa" "bb" "bb" "na" "ab" "ab" "aa" "bb" "bb"  "bb"  "ab"
 [3,] "bb" "aa" "ab" "na" "aa" "aa" "bb" "aa" "ab" "aa"  "aa"  "aa"
 [4,] "aa" "aa" "bb" "ab" "aa" "ab" "aa" "aa" "bb" "ab"  "aa"  "ab"
 [5,] "ab" "ab" "bb" "bb" "ab" "ab" "ab" "ab" "bb" "bb"  "ab"  "ab"
 [6,] "aa" "aa" "ab" "aa" "ab" "aa" "aa" "aa" "ab" "aa"  "ab"  "aa"
 [7,] "bb" "aa" "aa" "bb" "aa" "aa" "bb" "aa" "aa" "bb"  "aa"  "aa"
 [8,] "ab" "bb" "bb" "bb" "ab" "bb" "ab" "bb" "bb" "bb"  "ab"  "bb"
 [9,] "aa" "aa" "bb" "bb" "aa" "aa" "aa" "aa" "bb" "bb"  "aa"  "aa"
[10,] "bb" "ab" "aa" "bb" "bb" "bb" "bb" "ab" "aa" "bb"  "bb"  "bb"

I would like to compare each row with the rest of the rows to calculate a modified similarity matrix.

Step 1: assign values when comparing 2 rows

aa vs aa = 1; aa vs ab = 0.5; aa vs na = 0.0; na vs na = 0.0; ab vs aa = 0.5; aa vs bb = 0.0; ab vs ab = 0.5 

Step 2: total the scores (example: row 1 versus row 2 = 7.0)

Step 3: count the total number of comparisons, excluding instances where there are 1 or 2 'na' values (example: row 1 versus row 2 = 11.0)

Step 4: divide the total score by the count (example: row 1 versus row 2, 7/11 = 0.636363)

Step 5: do this for each pair of rows, with the result in a matrix populated on both sides of the diagonal (example: 10 x 10)
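To make steps 1 to 4 concrete, here is a small worked sketch. The two short rows are hypothetical (they are not taken from the matrix above), and the per-position scores are written out by hand using only the rules listed in step 1.

```r
# Hypothetical pair of short rows; "na" marks a missing entry, as in the question
r1 <- c("aa", "aa", "aa", "ab", "aa")
r2 <- c("aa", "ab", "bb", "ab", "na")

# Step 1: per-position scores from the rules above
# aa/aa = 1, aa/ab = 0.5, aa/bb = 0, ab/ab = 0.5, aa/na = 0
scores <- c(1, 0.5, 0, 0.5, 0)

# Step 2: total score
total <- sum(scores)

# Step 3: count the positions where neither entry is "na"
count <- sum(r1 != "na" & r2 != "na")

# Step 4: similarity for this pair of rows
similarity <- total / count
similarity  # 0.5
```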

Thanks in advance!

I changed the matrix definition a bit to make the "na" characters actual missing values (NA), which have a special meaning in R that is close to the behavior you want.

mat <- matrix("", 10, 12)
mat[c(1, 4, 6),] <- sample(c("aa", "ab", "bb"), 18, TRUE)
mat[c(2, 3, 10),] <- sample(c("aa", "bb", "ab"), 18, TRUE)
mat[c(5, 8),] <- sample(c("bb", "ab", "bb"), 12, TRUE)
mat[c(7, 9),] <- sample(c("aa", "aa", "bb"), 12, TRUE)
mat[3,4] <- NA
mat[2,5] <- NA

You have not provided the values for all possible matches, so I am going to make some assumptions. These values can be changed without breaking the code.

For step 1, I am going to make a named vector that can be indexed using the pair names bunched together. For example, aa vs ab becomes 'aaab'.

pair <- c('aaaa', 'aaab', 'aabb', 'abab', 'abbb', 'bbbb')
value <- c(1, 0.5, 0, 0.5, 0.5, 1)
# add the reverse pairings (I am assuming symmetry)
pair <- c(pair, paste0(substr(pair, 3, 4), substr(pair, 1, 2)))
value <- c(value, value)
names(value) <- pair
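As a quick illustration of how the name-based lookup behaves, here is a small sketch. The rows r1 and r2 are made up for this example, and the 'abbb' and 'bbbb' scores are the assumed values from the definition above.

```r
# Score table as defined above ('abbb' and 'bbbb' values are assumptions)
pair  <- c('aaaa', 'aaab', 'aabb', 'abab', 'abbb', 'bbbb')
value <- c(1, 0.5, 0, 0.5, 0.5, 1)
pair  <- c(pair, paste0(substr(pair, 3, 4), substr(pair, 1, 2)))
value <- c(value, value)
names(value) <- pair

# Two short hypothetical rows, one containing a missing value
r1 <- c("aa", "ab", "bb", NA)
r2 <- c("ab", "ab", "aa", "aa")

# Position-wise keys: "aaab" "abab" "bbaa" "NAaa"
keys <- paste0(r1, r2)

# Keys that are not in the table (here "NAaa") come back as NA
scores <- unname(value[keys])
scores                       # 0.5 0.5 0.0 NA
sum(scores, na.rm = TRUE)    # 1
```

Because a pair involving NA has no entry in the table, it drops out of the sum when na.rm = TRUE is used.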

Check how the vector value looks at this point to make sure it's what you want. Next I define a function that uses this globally defined vector and returns what you want at the end of step 4. You may want to include the vector definition in the function body, but I feel that is not efficient.

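compare <- function(row1, row2){
  # Returns the total value of the match between 2 vectors,
  # divided by the number of complete cases
  # number of complete cases (positions not having NAs)
  good.cases <- sum(complete.cases(cbind(row1, row2)))
  # number of positions where at least one value is NA
  na.cases <- length(row1) - good.cases
  # pairs involving NA are dropped from the sum; each NA case is then assigned 0.5
  total.value <- sum(value[paste0(row1, row2)], na.rm=TRUE) + 0.5*na.cases
  total.value/good.cases
}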
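If relying on a global vector feels fragile, one alternative is to pass the score table in as an argument. The sketch below is only an illustration: compare_with, the small scores table, and the rows r1 and r2 are hypothetical names and data, not part of the code above.

```r
# Hypothetical variant of compare() that receives the score table as an argument
compare_with <- function(row1, row2, scores) {
  # number of positions where neither value is NA
  good.cases <- sum(complete.cases(cbind(row1, row2)))
  na.cases <- length(row1) - good.cases
  # pairs involving NA are dropped from the sum; each NA case then adds 0.5
  total.value <- sum(scores[paste0(row1, row2)], na.rm = TRUE) + 0.5 * na.cases
  total.value / good.cases
}

# Small illustrative score table with both orderings listed explicitly
scores <- c(aaaa = 1, aaab = 0.5, abaa = 0.5, aabb = 0, bbaa = 0, abab = 0.5)

r1 <- c("aa", "ab", "bb", NA)
r2 <- c("ab", "ab", "aa", "aa")

# (0.5 + 0.5 + 0 + 0.5 for the NA case) / 3 complete cases
compare_with(r1, r2, scores)  # 0.5
```

The table is still built only once, outside the function, so this keeps the efficiency of the global approach while making the dependency explicit.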

At this point I get a total.value of 6.5 when comparing the first 2 rows, due to a wrong assumption in value.

For step 5, use a double loop:

# start with an empty matrix for the match values
n <- nrow(mat)
matches <- matrix(rep(NA, n*n), nrow=n)
for (i in 1:n){
  for (j in i:n){  ## if symmetric, half of the matrix is enough
    matches[i, j] <- compare(mat[i, ], mat[j, ])
  }
}
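The question asks for both sides of the diagonal to be populated, while the loop above fills only the upper triangle. One way to mirror it is sketched below, using a small hypothetical matrix m standing in for matches.

```r
# A small hypothetical upper-triangular result standing in for `matches`
m <- matrix(NA_real_, 3, 3)
m[upper.tri(m, diag = TRUE)] <- c(1, 0.5, 1, 0.2, 0.7, 1)

# Copy the upper triangle into the lower triangle
m[lower.tri(m)] <- t(m)[lower.tri(m)]

isSymmetric(m)  # TRUE
```

The same line applied to matches after the double loop would fill its lower half, assuming the comparison really is symmetric.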

I hope this helps.

Edit: changed compare() to assign a value to the NA cases after a request in the comments.

