I enclose an excerpt from
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT.
It says that some Traditional Chinese (TC) characters cannot be converted into UCS
without losing round-trip compatibility: for such a character X, the round-trip
conversion X --> UCS(X) --> X' may yield X' != X.
Consequently, legacy-encoded IRIs (IDNs) containing such an X may fail to compare
as equal once they have been converted to and from Unicode.
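To make the failure concrete, here is a minimal sketch using a tiny hypothetical table modeled on the "duplicates A1C4" entry in the excerpt below: two BIG5 codes map to one Unicode code point, so the reverse table can keep only one of them. The code points (U+FF3F for the underscore pair, U+5341 for 0xA451) and the helper names are illustrative assumptions, not the official mapping.

```python
# Hypothetical BIG5 -> Unicode table (illustrative values, not the official
# mapping). 0xA15A and 0xA1C4 both map to U+FF3F, mirroring "duplicates A1C4".
BIG5_TO_UCS = {
    0xA1C4: 0xFF3F,
    0xA15A: 0xFF3F,  # duplicate: collapses onto the same code point
    0xA451: 0x5341,  # an ordinary one-to-one entry
}

# The reverse table can hold only one BIG5 code per Unicode code point.
# setdefault keeps the first entry in table order, so 0xA15A is lost.
UCS_TO_BIG5 = {}
for big5, ucs in BIG5_TO_UCS.items():
    UCS_TO_BIG5.setdefault(ucs, big5)


def to_ucs(codes):
    """Convert a sequence of BIG5 codes to Unicode code points."""
    return [BIG5_TO_UCS[c] for c in codes]


def from_ucs(code_points):
    """Convert Unicode code points back to BIG5 codes."""
    return [UCS_TO_BIG5[cp] for cp in code_points]


def broken_round_trips():
    """Return the BIG5 codes X for which X --> UCS(X) --> X' gives X' != X."""
    return [x for x in BIG5_TO_UCS if from_ucs(to_ucs([x])) != [x]]


if __name__ == "__main__":
    label = [0xA451, 0xA15A]  # hypothetical legacy-encoded label
    print([hex(c) for c in from_ucs(to_ucs(label))])  # ['0xa451', '0xa1c4']
    print([hex(c) for c in broken_round_trips()])     # ['0xa15a']
```

Two IRIs that differ only in 0xA15A versus 0xA1C4 thus become identical after conversion, while a label containing 0xA15A no longer matches its own round-tripped form.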
I would appreciate it if anyone could describe the history of the BIG5 versions
and their round-trip compatibility problems in more detail.
Soobok Lee
--------------------------------------------------------------------------------
# Name: BIG5 to Unicode table (complete)
# Unicode version: 1.1
# Table version: 0.0d3
# Table format: Format A
# Date: 11 February 1994
#
# Copyright (c) 1991-1994 Unicode, Inc. All Rights reserved.
(snip)
# If you have carefully considered the fact that the mappings in
# this table are only one possible set of mappings between BIG5 and
# Unicode and have no normative status, but still feel that you
# have located an error in the table that requires fixing, you may
# report any such error to errata@unicode.org.
#
# WARNING! It is currently impossible to provide round-trip compatibility
# between BIG5 and Unicode.
#
# A number of characters are not currently mapped because
# of conflicts with other mappings. They are as follows:
#
# BIG5    Description                   Comments
#
# 0xA15A  SPACING UNDERSCORE            duplicates A1C4
# 0xA1C3  SPACING HEAVY OVERSCORE       not in Unicode
# 0xA1C5  SPACING HEAVY UNDERSCORE      not in Unicode
# 0xA1FE  LT DIAG UP RIGHT TO LOW LEFT  duplicates A2AC
# 0xA240  LT DIAG UP LEFT TO LOW RIGHT  duplicates A2AD
# 0xA2CC  HANGZHOU NUMERAL TEN          conflicts with A451 mapping
# 0xA2CE  HANGZHOU NUMERAL THIRTY       conflicts with A4CA mapping
#
# We currently map all of these characters to U+FFFD REPLACEMENT CHARACTER.
# It is also possible to map these characters to their duplicates, or to
# the user zone.
#
# Notes:
#
# 1. In addition to the above, there is some uncertainty about the
# mappings in the range C6A1 - C8FE, and F9DD - F9FE. The ETEN
# version of BIG5 organizes the former range differently, and adds
# additional characters in the latter range. The correct mappings for
# these ranges need to be determined.
#
# 2. There is an uncertainty in the mapping of the Big Five character
# 0xA3BC. This character occurs within the Big Five block of tone marks
# for bopomofo and is intended to be the tone mark for the first tone in
# Mandarin Chinese. We have selected the mapping U+02C9 MODIFIER LETTER
# MACRON (Mandarin Chinese first tone) to reflect this semantic.
# However, because bopomofo uses the absence of a tone mark to indicate
# the first Mandarin tone, most implementations of Big Five represent
# this character with a blank space, and so a mapping such as U+2003 EM
# SPACE might be preferred.
#
# Format: Three tab-separated columns
# Column #1 is the BIG5 code (in hex as 0xXXXX)
# Column #2 is the Unicode (in hex as 0xXXXX)
# Column #3 is the Unicode name (follows a comment sign, '#')
# The official names for Unicode characters U+4E00
# to U+9FA5, inclusive, are "CJK UNIFIED IDEOGRAPH-XXXX",
# where XXXX is the code point. Including all these
# names in this file increases its size substantially
# and needlessly. The token "<CJK>" is used for the
# name of these characters. If necessary, it can be
# expanded algorithmically by a parser or editor.
#
# The entries are in BIG5 order
#
#
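The format description in the header is regular enough to process mechanically. The following is a minimal sketch, assuming Python 3.9+ and the three-column "Format A" layout exactly as described. It reads table lines, expands the ideograph placeholder token (`<CJK>` in the published file) into the algorithmic name, and lists the BIG5 codes that cannot survive a BIG5 --> UCS --> BIG5 round trip, namely those mapped to U+FFFD and those sharing a Unicode target. It also illustrates the "user zone" alternative mentioned in the header by giving each U+FFFD entry its own Private Use code point starting at U+E000. The sample lines, the helper names, and the sequential PUA assignment are hypothetical and not part of the Unicode table.

```python
from collections import defaultdict

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER
PUA_START = 0xE000    # first code point of the BMP Private Use Area


def parse_format_a(lines):
    """Yield (big5, ucs, name) from lines like '0xXXXX<TAB>0xXXXX<TAB>#name'."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip header comments and blank lines
        cols = line.split("\t")
        big5, ucs = int(cols[0], 16), int(cols[1], 16)  # int() accepts '0x'
        name = " ".join(cols[2:]).lstrip("#").strip()
        if name == "<CJK>" and 0x4E00 <= ucs <= 0x9FA5:
            name = f"CJK UNIFIED IDEOGRAPH-{ucs:04X}"  # algorithmic expansion
        yield big5, ucs, name


def non_round_trip_codes(entries):
    """Return BIG5 codes mapped to U+FFFD or colliding on one Unicode target."""
    by_ucs = defaultdict(list)
    for big5, ucs, _ in entries:
        by_ucs[ucs].append(big5)
    bad = set()
    for ucs, codes in by_ucs.items():
        if ucs == REPLACEMENT or len(codes) > 1:
            bad.update(codes)
    return sorted(bad)


def remap_to_pua(entries):
    """Hypothetical policy: give each U+FFFD entry a unique Private Use code point."""
    next_pua = PUA_START
    out = []
    for big5, ucs, name in entries:
        if ucs == REPLACEMENT:
            ucs, name = next_pua, "PRIVATE USE (hypothetical assignment)"
            next_pua += 1
        out.append((big5, ucs, name))
    return out


# Illustrative sample lines in the described format (not copied from the table).
SAMPLE = [
    "# Format: Three tab-separated columns",
    "0xA15A\t0xFFFD\t#REPLACEMENT CHARACTER",
    "0xA1C3\t0xFFFD\t#REPLACEMENT CHARACTER",
    "0xA440\t0x4E00\t#<CJK>",
]

if __name__ == "__main__":
    entries = list(parse_format_a(SAMPLE))
    print([hex(c) for c in non_round_trip_codes(entries)])  # ['0xa15a', '0xa1c3']
    print(non_round_trip_codes(remap_to_pua(entries)))       # []
```

Mapping each unmapped code to its listed duplicate instead would not help here: the duplicate and the original would again share one Unicode target, which `non_round_trip_codes` still reports as a collision.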