



 File format and Character codes
All data are provided as plain text files, in both Shift-JIS and Unicode. To facilitate searches, we have not attempted to reproduce the older traditional Chinese characters as they appear in the original texts, but have instead changed them to their more modern forms. The Unicode data were produced through a simple conversion of the traditional Chinese (Big5) characters on the ZenBase CD1. This should be kept in mind when searching the material.
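As a rough illustration of working with the two encodings, the sketch below reads a text file with an explicitly named encoding. The function name `load_lines` and the file names in the usage comment are hypothetical, and the exact Unicode encoding of the distributed files (UTF-8, UTF-16, or another) is not stated here, so the encoding name passed in is an assumption to verify against the actual files.

```python
from pathlib import Path


def load_lines(path, encoding):
    """Read one text file and return its lines without line terminators.

    `encoding` is passed straight to Python's codec machinery, e.g.
    "shift_jis" for the Shift-JIS files. Which codec matches the
    Unicode files is an assumption the caller must check.
    """
    return Path(path).read_text(encoding=encoding).splitlines()


# Hypothetical usage (file names are placeholders, not real paths):
#   sjis_lines = load_lines("text_sjis.txt", "shift_jis")
#   uni_lines = load_lines("text_unicode.txt", "utf-8")
```

Naming the encoding explicitly, rather than guessing it from the bytes, avoids silent mis-decoding: a file in one encoding can sometimes be decoded without error under another and yield garbage.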

 Text Format
The text data are provided principally in APP format. The APP format, proposed by Urs App, former Assistant Director of the IRIZ, is designed for fast line-based searching: the original lines are rebroken so that each ends with a punctuation mark, which prevents Chinese compounds from being split across lines.
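The idea behind punctuation-terminated lines can be sketched as follows. This is only an illustration of the principle, not the procedure actually used to prepare the data: the function name `rebreak` and the punctuation set are assumptions, and the real APP format may follow additional rules.

```python
# Assumed set of line-ending punctuation marks (illustrative only).
PUNCTUATION = "。、，；：？！"


def rebreak(lines, punctuation=PUNCTUATION):
    """Join the original lines and split again after each punctuation mark."""
    text = "".join(line.strip() for line in lines)
    out, current = [], []
    for ch in text:
        current.append(ch)
        if ch in punctuation:
            out.append("".join(current))
            current = []
    if current:  # keep any trailing text that lacks final punctuation
        out.append("".join(current))
    return out


# Two original lines in which the compound 僧問 straddles the line break.
original = ["趙州和尚因僧", "問狗子還有佛性也無。州云無。"]
rebroken = rebreak(original)

# A line-by-line search misses the compound in the original layout
# but finds it once lines end at punctuation.
found_before = any("僧問" in line for line in original)
found_after = any("僧問" in line for line in rebroken)
```

Because a compound never spans a punctuation mark, a simple line-oriented tool can find it without having to join neighbouring lines first.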

 Gaiji (nonstandard) character notation
Gaiji (nonstandard) characters are numbered in two formats:
  1. The original KanjiBase format, found on the ZenBase CD1 (for Shift-JIS).
  2. The Mojikyō format (for Unicode).


  Last Update: 2003/02/21