Text encodings

CASIO uses specific character tables on their calculators.

Note

These tables or encodings are sometimes named FONTCHARACTER by the community, since that was the name of the type that could contain a character code in the fx-9860G SDK published by CASIO in 2004.

In these tables, every code represents one of the following:

  • Control characters, e.g. 0x000A (newline);

  • Graph characters, e.g. 0x0023 (#);

  • Operation codes, or “opcodes” for short, e.g. 0xF710 (Locate).

Note

All of these types will be named “characters” in this section.

CASIO has had two separate character tables that follow similar logic:

  • The legacy character table, applied on calculators that predate the fx-9860G (the fx-9860G itself is excluded).

  • The fx-9860G character table, applied on all compatible calculators post-2004, including fx-CG and fx-CP calculators.

Both use the same multi-byte leader logic: every character has a “lead” byte that identifies a set, followed by a code relative to that set. The sets for the above character tables are the following:

  • For legacy: 0x00, 0x7F, 0xF7.

  • For fx-9860G: 0x00, 0x7F, 0xE5, 0xE6, 0xE7, 0xF7, 0xF9.

It is important to distinguish between the two: while many characters are common to both tables, some have been removed or replaced from one to the other, and the legacy table uses some of the fx-9860G table’s multi-byte leaders as characters.

The following sections will present the character encodings and associated tables used within and surrounding CASIO calculators.

Variable-width encoding

This encoding can be used with either the legacy or fx-9860G character table. Every character is represented with either one or two bytes, depending on the first byte of the sequence:

  • If the first byte is a multi-byte leader for the character table, the sequence is two bytes long;

  • Otherwise, the sequence is one byte long.

For example, take the encoded sequence \x12\xE5\xAB:

  • With the legacy character table, since none of the bytes is a multi-byte leader, the sequence represents three characters: 0x0012, 0x00E5 and 0x00AB.

  • With the fx-9860G character table, \xE5 is a multi-byte leader, which means that the sequence represents two characters 0x0012 and 0xE5AB.
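
The decoding rule above can be sketched in Python as follows. This is only an illustration, not Cahute's implementation: the function and constant names are hypothetical, and the leader sets are taken from the non-zero sets listed earlier, on the assumption that set 0x00 denotes single-byte characters.

```python
# Multi-byte leaders per table, taken from the non-zero sets listed earlier.
# Treating set 0x00 as "single-byte character" is an assumption of this sketch.
LEGACY_LEADERS = frozenset({0x7F, 0xF7})
FX9860G_LEADERS = frozenset({0x7F, 0xE5, 0xE6, 0xE7, 0xF7, 0xF9})


def decode_variable_width(data, leaders):
    """Decode variable-width data into a list of 16-bit character codes."""
    codes = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in leaders:
            # A leader announces a two-byte sequence: leader, then code.
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte sequence")
            codes.append((byte << 8) | data[i + 1])
            i += 2
        else:
            # Any other byte is a complete one-byte character.
            codes.append(byte)
            i += 1
    return codes


legacy = decode_variable_width(b"\x12\xE5\xAB", LEGACY_LEADERS)
fx9860g = decode_variable_width(b"\x12\xE5\xAB", FX9860G_LEADERS)
print([f"0x{c:04X}" for c in legacy])   # ['0x0012', '0x00E5', '0x00AB']
print([f"0x{c:04X}" for c in fx9860g])  # ['0x0012', '0xE5AB']
```

The same bytes decode differently depending on the table, which is why the character table must be known before decoding.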

Fixed-width encoding

This encoding can be used with either the legacy or fx-9860G character table. Every character is represented using two bytes, in either big-endian or little-endian byte order.

For example, take the sequence of characters 0x0012 and 0xE5AB:

  • If using big endian, the encoded sequence will be \x00\x12\xE5\xAB;

  • If using little endian, the encoded sequence will be \x12\x00\xAB\xE5.
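
As a minimal sketch of the fixed-width rule, the following Python helpers convert between character codes and bytes. The helper names are hypothetical and not part of Cahute; the `byteorder` argument takes `"big"` or `"little"`, as accepted by Python's `int.to_bytes`.

```python
def encode_fixed_width(codes, byteorder):
    """Encode 16-bit character codes using two bytes per character."""
    return b"".join(code.to_bytes(2, byteorder) for code in codes)


def decode_fixed_width(data, byteorder):
    """Decode two-bytes-per-character data into 16-bit character codes."""
    if len(data) % 2 != 0:
        raise ValueError("fixed-width data must have an even length")
    return [
        int.from_bytes(data[i:i + 2], byteorder)
        for i in range(0, len(data), 2)
    ]


codes = [0x0012, 0xE5AB]
print(encode_fixed_width(codes, "big"))     # b'\x00\x12\xe5\xab'
print(encode_fixed_width(codes, "little"))  # b'\x12\x00\xab\xe5'
```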

CAT data encoding

This encoding can be used with either the legacy or fx-9860G character table, and represents every supported character with an ASCII-compatible character sequence.

Some example sequences are the following:

  • The legacy or fx-9860G character 0x0040 (-) is represented in CAT data encoding using the ASCII sequence -;

  • The legacy or fx-9860G character 0xF718 is represented in CAT data encoding using the ASCII sequence \ClrText;

  • The legacy character 0x00E6 is represented in CAT data encoding using the ASCII sequence CL.

CTF data encoding

Todo

Write this. ASCII-based.

UTF-32 encoding

Cahute supports the UTF-32 fixed-length encoding without Byte-Order Mark (BOM), in both big-endian and little-endian byte orders.
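
To show what BOM-less UTF-32 data looks like in each byte order, here is a short sketch using Python's standard codecs. It only illustrates the byte layout and does not use Cahute's API.

```python
text = "A\u00e9"  # "A" followed by "é" (U+00E9)

# The explicit "-be" and "-le" codecs never emit a BOM.
big = text.encode("utf-32-be")
little = text.encode("utf-32-le")

print(big)     # b'\x00\x00\x00A\x00\x00\x00\xe9'
print(little)  # b'A\x00\x00\x00\xe9\x00\x00\x00'

# By contrast, Python's generic "utf-32" codec prepends a 4-byte BOM.
with_bom = text.encode("utf-32")
print(len(with_bom) - len(big))  # 4
```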

UTF-8 encoding

Cahute supports the UTF-8 variable-length encoding without Byte-Order Mark (BOM).
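
For illustration, the difference between UTF-8 with and without a BOM can be seen with Python's standard codecs. This sketch does not use Cahute's API.

```python
text = "\u00e9"  # "é" (U+00E9)

# Plain UTF-8: two bytes for this character, no BOM.
plain = text.encode("utf-8")
print(plain)  # b'\xc3\xa9'

# Python's "utf-8-sig" codec prepends the 3-byte BOM EF BB BF.
with_bom = text.encode("utf-8-sig")
print(with_bom)  # b'\xef\xbb\xbf\xc3\xa9'
```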