encoding - Perl Handling Malformed Characters


I'd like some advice about Perl.

I have text files that I want to process with Perl. The text files are encoded in cp932, but for some reason they may contain malformed characters.

My program looks like this:

    #! /usr/bin/perl -w
    use strict;
    use encoding 'utf-8';

    # 'workfile.txt' is supposed to be encoded in cp932
    open my $in, "<:encoding(cp932)", "./workfile.txt";

    while ( my $line = <$in> ) {
      # my processing comes here
      print $line;
    }

If workfile.txt includes malformed characters, Perl complains:

    cp932 "\x81" does not map to Unicode at ./my_program.pl line 8, <$in> line 1234.

So Perl knows when its input contains malformed characters. I want to rewrite the program so that it checks whether each input line is good or bad and acts accordingly: print the good lines (lines that do not contain malformed characters) to output filehandle A, and print the lines that do contain malformed characters to output filehandle B.

    #! /usr/bin/perl -w
    use strict;
    use encoding 'utf-8';
    use English;

    # 'workfile.txt' is supposed to be encoded in cp932
    open my $in, "<:encoding(cp932)", "./workfile.txt";

    open my $output_good, ">:encoding(utf8)", "good.txt";
    open my $output_bad,  ">:encoding(utf8)", "bad.txt";

    select $output_good;   # in most cases workfile.txt lines are good

    while ( my $line = <$in> ) {
      if ( $line contains malformed characters ) {
        select $output_bad;
      }
      print "$INPUT_LINE_NUMBER: $line";
      select $output_good;
    }

My question is: how can I write the "if ($line contains malformed characters)" part? In other words, how can I check whether the input is good or bad?

Thanks in advance.

    #! /usr/bin/perl -w
    use strict;

    use utf8;                             # Source is encoded using UTF-8.
    use open ':std', ':encoding(UTF-8)';  # STD* are UTF-8;
                                          #   UTF-8 is the default encoding for open.
    use Encode qw( decode );

    # Read raw bytes so that decoding can be checked line by line.
    open my $fh_in,   "<:raw", "workfile.txt"
       or die $!;
    open my $fh_good, ">",     "good.txt"
       or die $!;
    open my $fh_bad,  ">:raw", "bad.txt"
       or die $!;

    while ( my $line = <$fh_in> ) {
       # FB_CROAK makes decode() die on malformed input; eval catches that.
       # LEAVE_SRC keeps $line intact so it can still be written to bad.txt.
       my $decoded_line =
          eval { decode('cp932', $line, Encode::FB_CROAK|Encode::LEAVE_SRC) };
       if (defined($decoded_line)) {
          print($fh_good "$. $decoded_line");
       } else {
          print($fh_bad  "$. $line");
       }
    }
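To map this back onto the "if ($line contains malformed characters)" placeholder in the question, the check can be pulled out into a small predicate. The following is only a minimal sketch: the name is_valid_cp932 and the two sample strings are made up for illustration and are not part of the code above. It assumes the line was read as raw bytes, as with the "<:raw" layer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper (not from the original post): returns true when the
# raw bytes in $bytes decode cleanly as cp932, and false otherwise.
sub is_valid_cp932 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input, and eval traps that.
    # LEAVE_SRC tells decode() not to modify $bytes while decoding.
    my $decoded = eval {
        decode('cp932', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return defined $decoded;
}

# Illustrative samples: "\x82\xA0" is a two-byte cp932 hiragana character;
# in the second string "\x81" is a lead byte followed by a space, which is
# not a valid trail byte.
for my $sample ("\x82\xA0\n", "abc\x81 def\n") {
    print is_valid_cp932($sample) ? "good\n" : "bad\n";
}
```

With a helper like this, the loop condition becomes "if ( !is_valid_cp932($line) )". The trade-off is that a good line gets decoded twice if you also need the decoded text, so keeping the decoded value around, as the code above does, is cheaper for large files.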
