perltw - Traditional Chinese Perl Guide
Welcome to the world of Perl!
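Starting with version 5.8.0, Perl has comprehensive Unicode support, and with it support for many encodings beyond those of the Latin-script languages; CJK (Chinese, Japanese, Korean) is one part of that. Unicode is an international standard that aims to cover all the characters of the world: the Western world, the Eastern world, and everything in between (Greek, Syriac, Arabic, Hebrew, Indic, Native American, and so on). It also spans multiple operating systems and platforms (such as PC and Macintosh).
Perl itself operates on Unicode. This means that string data inside Perl can be represented as Unicode, and that Perl's functions and operators (for example, regular expression matching) can operate on Unicode as well. For input and output, to handle data stored in encodings that predate Unicode, Perl provides the Encode module, which lets you easily read and write data in legacy encodings.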
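As a minimal sketch of what reading and writing legacy data with Encode can look like, the following decodes a Big5 byte string into Perl characters and re-encodes it as UTF-8. The variable names are illustrative only, and the bytes are written as escapes so the script does not depend on the encoding of its own source file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Big5 bytes for a two-character Chinese word (hex escapes, so this
# source file itself stays plain ASCII)
my $big5_bytes = "\xC0\x64\xBE\x6D";

# decode() turns bytes in a legacy encoding into a Perl character string
my $text = decode('big5', $big5_bytes);

# encode() turns the character string back into bytes in a target encoding
my $utf8_bytes = encode('UTF-8', $text);

printf "%d characters, %d Big5 bytes, %d UTF-8 bytes\n",
    length($text), length($big5_bytes), length($utf8_bytes);
```

Run as is, this should report 2 characters, 4 Big5 bytes, and 6 UTF-8 bytes.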
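The Encode extension module supports the following Traditional Chinese encodings ('big5' means 'big5-eten'):
    big5-eten   Big5 encoding (with ETen extensions)
    big5-hkscs  Big5 + Hong Kong Supplementary Character Set, 2001 edition
    cp950       Code Page 950 (Big5 + characters added by Microsoft)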
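One way to check which of these names a given Perl installation actually knows, and what 'big5' resolves to, is to ask Encode directly. This is only a sketch; the %canonical hash is a name made up for the example.

```perl
use strict;
use warnings;
use Encode ();

# Map each requested name to the canonical encoding name Encode reports,
# or undef if this installation does not provide it
my %canonical;
for my $name (qw(big5 big5-eten big5-hkscs cp950)) {
    my $enc = Encode::find_encoding($name);
    $canonical{$name} = $enc ? $enc->name : undef;
    printf "%-10s => %s\n", $name, $canonical{$name} // '(not available)';
}
```

On a standard installation, the 'big5' row should show 'big5-eten', matching the note above.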
For example, to convert a Big5-encoded file to Unicode (as UTF-8), just type the following command:
    perl -Mencoding=big5,STDOUT,utf8 -pe1 < file.big5 > file.utf8
Perl also ships with "piconv", a character-encoding conversion utility written entirely in Perl. It is used as follows:
    piconv -f big5 -t utf8 < file.big5 > file.utf8
    piconv -f utf8 -t big5 < file.utf8 > file.big5
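The same conversion can also be written as a short script using PerlIO encoding layers, which may be preferable on newer Perl releases where the encoding pragma is deprecated. This is a sketch under stated assumptions: convert_file is a hypothetical helper, not part of Perl or piconv, and the demo writes its own small Big5 file into a temporary directory so it can run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: copy $src to $dst, decoding from the $from encoding
# on input and encoding to the $to encoding on output
sub convert_file {
    my ($src, $from, $dst, $to) = @_;
    open my $in,  "<:encoding($from)", $src or die "Cannot read $src: $!";
    open my $out, ">:encoding($to)",   $dst or die "Cannot write $dst: $!";
    print {$out} $_ while <$in>;
    close $out or die "Cannot close $dst: $!";
    close $in;
    return;
}

# Demo: create a small Big5 file in a temporary directory, then convert it
my $dir = tempdir(CLEANUP => 1);
my ($big5_file, $utf8_file) = ("$dir/file.big5", "$dir/file.utf8");

open my $fh, '>:raw', $big5_file or die "Cannot write $big5_file: $!";
print {$fh} "\xC0\x64\xBE\x6D\n";    # two Big5 characters and a newline
close $fh or die "Cannot close $big5_file: $!";

convert_file($big5_file, 'big5', $utf8_file, 'UTF-8');

my $size = -s $utf8_file;
print "converted file is $size bytes\n";
```

Swapping the two encoding arguments gives the reverse direction, as in the second piconv command.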
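In addition, with the encoding module you can easily write code that works in units of characters, as shown below:
    #!/usr/bin/env perl
    # Enable Big5 string parsing; standard input and output are set to Big5
    use encoding 'big5', STDIN => 'big5', STDOUT => 'big5';
    print length("駱駝");            #  2 (double quotes mean characters)
    print length('駱駝');            #  4 (single quotes mean bytes)
    print index("諄諄教誨", "彖帢"); # -1 (does not contain this substring)
    print index('諄諄教誨', '彖帢'); #  1 (match starts at the second byte)
In the last line of the example, the second byte of the first "諄" combines with the first byte of the second "諄" to form the Big5 character "彖"; the second byte of the second "諄" in turn combines with the first byte of "教" to form "帢". Working in characters, as in the double-quoted form, avoids this kind of false match, which used to be a common problem when matching Big5 text.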
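The difference between byte-based and character-based matching can also be reproduced without the encoding pragma by decoding explicitly with Encode. The sketch below uses the same Big5 byte sequences as the example above, written as hex escapes; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same strings as in the example above, as raw Big5 bytes
my $haystack_bytes = "\xBD\xCE\xBD\xCE\xB1\xD0\xBB\xA3";    # four characters
my $needle_bytes   = "\xCE\xBD\xCE\xB1";                    # two characters

# Byte-wise search: the needle's bytes straddle character boundaries
my $byte_pos = index($haystack_bytes, $needle_bytes);

# Character-wise search: decode both strings first, then search
my $char_pos = index(
    decode('big5', $haystack_bytes),
    decode('big5', $needle_bytes),
);

print "byte search:      $byte_pos\n";
print "character search: $char_pos\n";
```

The byte search reports position 1 (a false match), while the character search reports -1.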
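If you need more Chinese encodings, you can download the Encode::HanExtra module from CPAN (http://www.cpan.org/). It currently provides the following encodings:
    cccii     Chinese Character Code for Information Interchange (Council for Cultural Affairs, 1980)
    euc-tw    Extended Unix Code character set, covering CNS11643 planes 1-7
    big5plus  Big5+ from the Chinese Foundation for Digitization Technology
    big5ext   Big5e from the Chinese Foundation for Digitization Technology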
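In addition, the Encode::HanConvert module provides two encodings for converting between Simplified and Traditional Chinese:
    big5-simp  converts between Big5 Traditional Chinese and Unicode Simplified Chinese
    gbk-trad   converts between GBK Simplified Chinese and Unicode Traditional Chinese
To convert between GBK and Big5, see the b2g.pl and g2b.pl programs bundled with that module, or use the following in your program:
    use Encode::HanConvert;
    $euc_cn = big5_to_gb($big5); # convert from Big5 to GBK
    $big5 = gb_to_big5($euc_cn); # convert from GBK to Big5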
Please refer to the extensive documentation that ships with Perl (unfortunately all written in English) to learn more about Perl and about using Unicode. That said, external resources are also plentiful.
Encode, Encode::TW, encoding, perluniintro, perlunicode
Jarkko Hietaniemi <jhi@iki.fi>
Autrijus Tang (唐宗漢) <autrijus@autrijus.org>