encoding - Perl: Handling Malformed Characters
I'd like some advice about Perl.
I have text files that I want to process with Perl. The text files are encoded in cp932, but for some reason they may contain malformed characters.
My program looks like this:
    #! /usr/bin/perl -w

    use strict;
    use encoding 'utf-8';

    # 'workfile.txt' is supposed to be encoded in cp932
    open my $in, "<:encoding(cp932)", "./workfile.txt";

    while ( my $line = <$in> ) {
      # my processing comes here
      print $line;
    }
If workfile.txt includes malformed characters, Perl complains:

    cp932 "\x81" does not map to Unicode at ./my_program.pl line 8, <$in> line 1234.
So Perl knows whether its input contains malformed characters. I want to rewrite the program to check whether each input line is good or bad and act accordingly: print the good lines (lines that do not contain malformed characters) to output filehandle A, and print the lines that do contain malformed characters to output filehandle B.
    #! /usr/bin/perl -w

    use strict;
    use encoding 'utf-8';
    use English;

    # 'workfile.txt' is supposed to be encoded in cp932
    open my $in, "<:encoding(cp932)", "./workfile.txt";
    open my $output_good, ">:encoding(utf8)", "good.txt";
    open my $output_bad,  ">:encoding(utf8)", "bad.txt";

    select $output_good;   # in most cases workfile.txt lines are good

    while ( my $line = <$in> ) {
      # pseudocode: this condition is the part I don't know how to write
      if ( $line contains malformed characters ) {
        select $output_bad;
      }
      print "$INPUT_LINE_NUMBER: $line";
      select $output_good;
    }
My question is: how can I write the "if ($line contains malformed characters)" part? In other words, how can I check whether a line of input is good or bad?
Thanks in advance.
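Read the file as raw bytes and decode each line yourself, so that a decoding failure can be caught per line:

    #! /usr/bin/perl -w

    use strict;

    use utf8;                             # Source is encoded using UTF-8
    use open ':std', ':encoding(UTF-8)';  # STD* is UTF-8;
                                          # UTF-8 is the default encoding for open.
    use Encode qw( decode );

    open my $fh_in,   "<:raw", "workfile.txt"
       or die $!;
    open my $fh_good, ">",     "good.txt"
       or die $!;
    open my $fh_bad,  ">:raw", "bad.txt"
       or die $!;

    while ( my $line = <$fh_in> ) {
       # decode() dies on malformed input because of FB_CROAK;
       # eval turns that into undef. LEAVE_SRC keeps $line unmodified.
       my $decoded_line =
          eval { decode('cp932', $line, Encode::FB_CROAK|Encode::LEAVE_SRC) };
       if (defined($decoded_line)) {
          print($fh_good "$. $decoded_line");   # decoded text, written as UTF-8
       } else {
          print($fh_bad  "$. $line");           # original bytes, written as-is
       }
    }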
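To try the check without any files, the decode-inside-eval step can be pulled into a small helper and run on in-memory byte strings. This is only a sketch: the helper name `decode_cp932_or_undef` and the two sample strings are made up for illustration and are not part of the original post. It assumes the `Encode` module with its cp932 support is installed, which is the case for a standard Perl.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper (name invented for this sketch): returns the decoded
# string if $bytes is well-formed cp932, or undef if decoding fails.
sub decode_cp932_or_undef {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; eval catches that.
    # LEAVE_SRC stops decode() from modifying $bytes in place.
    return eval { decode('cp932', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC) };
}

# Illustrative samples (assumed, not taken from the asker's file):
my $good = "\x82\xa0\n";   # the two-byte cp932 sequence for HIRAGANA LETTER A
my $bad  = "\x81\n";       # a lead byte followed by an invalid trail byte

for my $sample ($good, $bad) {
    my $decoded = decode_cp932_or_undef($sample);
    print defined $decoded ? "good\n" : "bad\n";
}
```

Testing with `defined` rather than truthiness matters here: an empty line decodes to an empty string, which is false in Perl but is still a successful decode.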