Linux uniq: identical lines not matching


I have a file, asdf2, with 4 identical lines, but I am not getting consistent results from the Linux uniq command. There are no carriage returns in the file, only line feeds.

The file has 4 lines:

$ cat asdf2 | wc -l
4

uniq treats only the first 2 lines as identical:

$ cat asdf2 | uniq -c | wc -l
3

Removing the special character makes all the lines identical:

$ cat asdf2 | sed 's/\xfe//g' | uniq -c | wc -l
1

The character \x1c does not appear in the file, so this prints nothing:

$ cat asdf2 | sed 's/\x1c/@/g' | tr -dc '@'

Replacing the special character with a different one also makes the lines identical:

$ cat asdf2 | sed 's/\xfe/\x1c/g' | uniq -c | wc -l
1

How can this happen?

For concreteness, here is the hex dump:

0000000: 506c 616e 6e65 6420 436f 7374 fe41 6374  Planned Cost.Act
0000010: 6976 6974 79fe 4163 7469 7669 7479 2047  ivity.Activity G
0000020: 726f 7570 fe41 6374 6976 6974 7920 4772  roup.Activity Gr
0000030: 6f75 7020 4944 fe41 6374 6976 6974 7920  oup ID.Activity 
0000040: 4944 fe41 64fe 4164 2049 44fe 4164 2053  ID.Ad.Ad ID.Ad S
0000050: 7461 7475 73fe 4164 2054 7970 65fe 4164  tatus.Ad Type.Ad
0000060: 7665 7274 6973 6572 fe41 6476 6572 7469  vertiser.Adverti
0000070: 7365 7220 4772 6f75 70fe 4164 7665 7274  ser Group.Advert
0000080: 6973 0a50 6c61 6e6e 6564 2043 6f73 74fe  is.Planned Cost.
0000090: 4163 7469 7669 7479 fe41 6374 6976 6974  Activity.Activit
00000a0: 7920 4772 6f75 70fe 4163 7469 7669 7479  y Group.Activity
00000b0: 2047 726f 7570 2049 44fe 4163 7469 7669   Group ID.Activi
00000c0: 7479 2049 44fe 4164 fe41 6420 4944 fe41  ty ID.Ad.Ad ID.A
00000d0: 6420 5374 6174 7573 fe41 6420 5479 7065  d Status.Ad Type
00000e0: fe41 6476 6572 7469 7365 72fe 4164 7665  .Advertiser.Adve
00000f0: 7274 6973 6572 2047 726f 7570 fe41 6476  rtiser Group.Adv
0000100: 6572 7469 730a 506c 616e 6e65 6420 436f  ertis.Planned Co
0000110: 7374 fe41 6374 6976 6974 79fe 4163 7469  st.Activity.Acti
0000120: 7669 7479 2047 726f 7570 fe41 6374 6976  vity Group.Activ
0000130: 6974 7920 4772 6f75 7020 4944 fe41 6374  ity Group ID.Act
0000140: 6976 6974 7920 4944 fe41 64fe 4164 2049  ivity ID.Ad.Ad I
0000150: 44fe 4164 2053 7461 7475 73fe 4164 2054  D.Ad Status.Ad T
0000160: 7970 65fe 4164 7665 7274 6973 6572 fe41  ype.Advertiser.A
0000170: 6476 6572 7469 7365 7220 4772 6f75 70fe  dvertiser Group.
0000180: 4164 7665 7274 6973 0a50 6c61 6e6e 6564  Advertis.Planned
0000190: 2043 6f73 74fe 4163 7469 7669 7479 fe41   Cost.Activity.A
00001a0: 6374 6976 6974 7920 4772 6f75 70fe 4163  ctivity Group.Ac
00001b0: 7469 7669 7479 2047 726f 7570 2049 44fe  tivity Group ID.
00001c0: 4163 7469 7669 7479 2049 44fe 4164 fe41  Activity ID.Ad.A
00001d0: 6420 4944 fe41 6420 5374 6174 7573 fe41  d ID.Ad Status.A
00001e0: 6420 5479 7065 fe41 6476 6572 7469 7365  d Type.Advertise
00001f0: 72fe 4164 7665 7274 6973 6572 2047 726f  r.Advertiser Gro
0000200: 7570 fe41 6476 6572 7469 730a            up.Advertis.

In fact, pasting the lines directly into the command line seems to work well:

$ echo 'Planned CostþActivityþActivity GroupþActivity Group IDþActivity IDþAdþAd IDþAd StatusþAd TypeþAdvertiserþAdvertiser GroupþAdvertis
Planned CostþActivityþActivity GroupþActivity Group IDþActivity IDþAdþAd IDþAd StatusþAd TypeþAdvertiserþAdvertiser GroupþAdvertis
Planned CostþActivityþActivity GroupþActivity Group IDþActivity IDþAdþAd IDþAd StatusþAd TypeþAdvertiserþAdvertiser GroupþAdvertis
Planned CostþActivityþActivity GroupþActivity Group IDþActivity IDþAdþAd IDþAd StatusþAd TypeþAdvertiserþAdvertiser GroupþAdvertis' | uniq -c | wc -l
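One way to probe this kind of discrepancy is to rebuild a small file that uses the same raw 0xfe byte as a separator and count the lines uniq keeps, once under the current locale and once with byte-wise comparison. This is only a sketch under an assumption: that the mismatch is related to how the active locale handles the raw byte, which nothing in this post confirms. The field names and the temporary file are made up for illustration.

```shell
#!/bin/sh
# Hypothetical test data: 4 identical lines with byte 0xfe (octal 376) as separator.
tmp=$(mktemp)
for i in 1 2 3 4; do
    printf 'Planned Cost\376Activity\376Ad\n'
done > "$tmp"

# Lines kept by uniq under whatever locale is currently active.
current=$(uniq "$tmp" | wc -l | tr -d ' ')

# Lines kept when the C locale forces plain byte-wise comparison.
bytewise=$(LC_ALL=C uniq "$tmp" | wc -l | tr -d ' ')

echo "current locale: $current"
echo "C locale:       $bytewise"

rm -f "$tmp"
```

If the two counts differ, the locale is a likely suspect; if both are the same, the cause lies elsewhere.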

Assuming you're using GNU uniq:

Let's say a file a contains:

a
a
a
a

uniq a

a 

uniq -u a

# no output 

Running uniq a prints a single line because, without options, uniq merges adjacent matching lines into their first occurrence. When you specify -u, however, uniq prints only the lines that are not repeated, so here it prints nothing.
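The difference between the default mode and -u is easier to see when the input also contains a line that is not repeated. A minimal sketch, using a made-up temporary file with the lines a, a, b; the -d option (print only repeated lines) is included for contrast:

```shell
#!/bin/sh
# Hypothetical input: "a" twice in a row, then a single "b".
tmp=$(mktemp)
printf 'a\na\nb\n' > "$tmp"

plain=$(uniq "$tmp")            # merges the two adjacent "a" lines: a, b
unique_only=$(uniq -u "$tmp")   # only lines that are not repeated: b
repeated_only=$(uniq -d "$tmp") # only lines that are repeated: a

printf 'uniq:\n%s\n' "$plain"
printf 'uniq -u:\n%s\n' "$unique_only"
printf 'uniq -d:\n%s\n' "$repeated_only"

rm -f "$tmp"
```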

Read the fine manpage to learn more.

Note that uniq does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use sort -u without uniq.
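The adjacency rule can be checked with a small file whose duplicates are interleaved. This is a sketch with made-up input (b, a, b, a) and a temporary file, not the asker's data:

```shell
#!/bin/sh
# Hypothetical input: duplicates exist, but never on adjacent lines.
tmp=$(mktemp)
printf 'b\na\nb\na\n' > "$tmp"

# uniq alone finds no adjacent duplicates, so all 4 lines remain.
unsorted=$(uniq "$tmp" | wc -l | tr -d ' ')

# Sorting first makes equal lines adjacent, leaving 2 lines.
sorted=$(sort "$tmp" | uniq | wc -l | tr -d ' ')

# sort -u deduplicates in a single step, also leaving 2 lines.
sort_u=$(sort -u "$tmp" | wc -l | tr -d ' ')

echo "uniq: $unsorted, sort | uniq: $sorted, sort -u: $sort_u"

rm -f "$tmp"
```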

