Mitchell Hashimoto

Grapheme Clusters and Terminal Emulators

October 2, 2023

Copy and paste "๐Ÿง‘โ€๐ŸŒพ" in your terminal emulator. How many cells forward did your cursor move? Depending on your terminal emulator, it may have moved 2, 4, 5, or 6 cells1. Yikes. This blog post describes why this happens and how terminal emulator and program authors can achieve consistent spacing for all characters.


Character Grids, Historically

Terminals operate on a grid of fixed size cells. This is fundamental to how terminal emulators operate because many control sequences programs can send to the terminal operate on cells.

For example, there is a control sequence to move the cursor left CSI n D, right CSI n C, up CSI n A, or down CSI n B. The "n" in each of these is the number of cells to move. There is a sequence to request the cursor position CSI 6 n. The terminal responds to the program in the form CSI y ; x R where both "y" and "x" are cell coordinates.

Traditionally, terminals simply read an input byte stream and mapped each individual byte to a cell in the grid. For example, the stream "1234" is 4 bytes, and programmers can very easily read a byte at a time and place it into the next cell, move the cursor right one, repeat.

Eventually, "wide characters" came along. Common wide characters are Asian characters such as ๆฉ‹ or Emoji such as ๐Ÿ˜ƒ. A function wcwidth was added to libc to return the width of a wide character in cells. Wide characters were given a width of "2" (usually). Therefore, if you type ๆฉ‹ in a terminal emulator, the character will take up two grid cells and your cursor should jump forward by two cells.

And this is how most terminal emulators and terminal programs (shells, TUIs, etc.) are implemented today: they process input characters via wcwidth and move the cursor accordingly. And for a short period of time, this worked completely fine. But today, this is no longer adequate and results in many errors.


Grapheme Clustering

It turns out that a single 32-bit value is not adequate to represent every user-perceived character in the world. A "user-perceived character" is how the Unicode Standard defines a grapheme.

Let's consider the emoji "๐Ÿง‘โ€๐ŸŒพ". The emoji should look something like this in case your computer doesn't support it. I think every human would agree this is a single "user-perceived character" or grapheme. The Unicode Standard itself defines this as a single grapheme so regardless of your personal opinion, international standards say this is one grapheme.

For computers, its not so obvious. "๐Ÿง‘โ€๐ŸŒพ" is three codepoints (U+1F9D1 ๐Ÿง‘, U+200D, and U+1F33E ๐ŸŒพ), three 32-bit values when UTF-32 encoded, or 11 bytes2 when UTF-8 encoded.

If we only consider the 32-bit values and send each one individually to wcwidth as we historically have done, we'll get the results of "2", then "0", then "2", respectively for each codepoint. Therefore, the cursor moves forward by 4 cells. This is why with most terminals (at the time of writing this), you'll see your cursor move forward by 4 cells

What's with the zero-width character? The codepoint U+200D is known as a Zero-Width Joiner (ZWJ) and has a standards-defined width of zero. The ZWJ tells text processing systems to treat the codepoints around it as joined into a single character. That's why you can type both "๐Ÿง‘โ€๐ŸŒพ" and "๐Ÿง‘๐ŸŒพ"; the only difference between these two quoted values is the farmer on the left has a zero-width joiner between the two emoji.

Enter grapheme clustering. Grapheme clustering is the process of determing a single grapheme from a stream of codepoints. Grapheme clustering is the process that lets a program see three 32-bit values as a single user-perceived character. The algorithm for grapheme clustering is defined in UAX #29, "Unicode Text Segmentation". I won't go into detail how the algorithm works, just grab any modern Unicode library for your programming language and it should be able to perform grapheme clustering.

A particular challenge with grapheme clustering is that it is stateful. You must have access to the previous codepoint, the current codepoint, and an integer state value to robustly determine the break points for graphemes. Therefore, it isn't super easy (but its not hard) to drop into an existing program.

With grapheme clustering, a terminal would see "๐Ÿง‘โ€๐ŸŒพ" as a single wide-character grapheme and move the cursor forward only two cells instead of four.

This isn't just for emoji. I'm using emoji as an example but grapheme clustering is also extremely important for correctly handling the languages of the world. For example, Arabic characters are often made up of multiple codepoints.

Aside: font shaping. Grapheme clustering only solves the problem of determining grapheme boundaries in a stream of codepoints. It does not solve the problem of rendering a stream of codepoints. For that, you need a font shaper such as Harfbuzz. Harfbuzz sees a stream of codepoints, detects the graphemes, and is able to then map those graphemes to individual glyphs in a font.

This blog post won't cover font shaping, but if you paste "๐Ÿง‘โ€๐ŸŒพ" in a terminal and see two emoji instead of one, its either due to the terminal stripping the zero-width joiner or more likely because the terminal doesn't support font shaping.


Grapheme Clustering in Terminals

Most terminals today do not support grapheme clustering. A major reason is because advanced terminal applications such as shells and text editors often need to constantly know exactly where the cursor is and remain in sync with the terminal grid state.

Because historically terminals used wcwidth, shells, editors, and other TUI apps also use wcwidth, and continue to do so today. Even though it produces the wrong values for multi-codepoint graphemes, at least the wrong values are often consistently wrong across terminal emulators.

When I first implemented text processing for my terminal, I implemented full grapheme cluster support because I assumed that would be a good thing. However, I was immediately disappointed to find that when I did proper grapheme clustering, fish shell would redraw my prompt in the wrong place when I moved my cursor. ๐Ÿ˜ž My shell assumed my terminal used wcwidth, and my terminal assumed programs didn't care. We both were wrong, so I disabled grapheme clustering...

Enter mode 2027. Mode 2027 is a proposal for grapheme support in terminals. This proposal is from the author of the Contour terminal. The idea is that a program running in a terminal can notify the terminal that it wishes to operate with full support for grapheme clustering, and this feature can be turned on and off. The running program can also query the terminal to see if it supports this feature.

I recently implemented support for mode 2027 in my terminal and you can see it working below. With mode 2027 off, the cursor moves to column 5 after the farmer emoji (width of 4). With mode 2027 on, the cursor moves to column 3 (width of 2).


Terminal Comparison

The table below shows the reported width of "๐Ÿง‘โ€๐ŸŒพ" by various terminals. For each terminal, I used the latest version I could find as of October 2, 2023. I put my own terminal first in the list but otherwise listed them alphabetically.

TerminalWidthMode 2027Notes
Ghostty2โœ…Falls back to wcwidth if mode 2027 is disabled
Alacritty4โŒDoesn't support shaping, displays as two separate emoji
Contour2โœ…Invented Mode 2027, always performs grapheme clustering
Foot2โœ…Falls back to wcwidth if mode 2027 is disabled
Gnome4โŒDoesn't support shaping, displays as two separate emoji
iTerm2โŒAlways performs grapheme clustering
Kitty4โŒ
Tmux4โŒParticularly fun when this doesn't match the terminal emulator
Terminal.app6โŒ๐Ÿคก Living in its own cursed little world
Warp4โŒDoesn't support shaping, displays as two separate emoji
Wezterm2โœ…Always performs grapheme clustering
Windows Terminal5โŒ๐Ÿง Considers ZWJ one cell, displays as two separate emoji
Xfce4โŒDoesn't support shaping, displays as two separate emoji
xterm4โŒDoesn't support shaping, displays as two separate emoji

Mode 2027 support is still rare and relatively unsupported. Its interesting to see however the variance in reported width across all terminals. "2" and "4" are both understandable values. The terminals reporting "5" and "6" are living a weird reality.

One special challenge is terminal multiplexers such as tmux or zellij. Notice above that tmux uses wcwidth and thus moves the cursor forward by four. But tmux itself lives in a terminal emulator which also sees the value when printed. If tmux and the terminal don't match, the cursors become out of sync and the resulting bugs can be comical3.


What Can Program Authors Do Today?

If you're the author of a program that runs in a terminal (text editor, CLI, TUI, etc.), there are things you can do today to handle grapheme clusters more appropriately.

One, you can just not care. If your program doesn't need to keep track of where the cursor is, doesn't need to manually line-break, etc. then you can simply not care and let the terminal do whatever it does.

Two, you can and should query for mode 2027 support and try to use mode 2027 if it works. You can query for mode 2027 support using the standard sequence DECRQM (request mode): CSI ? 2027 $ p.

Three, you can query the cursor position after outputing a series of text using CSI 6 n. The terminal will then report where the cursor is and you can use this to calculate the width of your text.

What you should not do is assume wcwidth behavior or grapheme clustering behavior. You can see from the table in the previous section that either assumption would be wrong across multiple terminal emulators, even if you only consider the popular or mainstream ones.


Hopefully in the future more terminals and terminal programs will support mode 2027 and proper grapheme clustering. As noted earlier, this doesn't just impact "cute" features such as emoji but also affects various world languages such as Arabic.

If you're not a terminal or terminal program author, hopefully you found this blog post insightful. Implementing a terminal emulator was certainly eye-opening for me to see the complexities in text processing and the burden of legacy behaviors.

I also want to give a quick shout out to Christian Parpart for proposing mode 2027. Christian is the first person I saw very publicly talk about the issues with grapheme clusters and terminals and also decided to try to do somthing about it. Thank you!

Footnotes

  1. I haven't found a terminal emulator that moves 3, thankfully. โ†ฉ

  2. We're going to assume 8-bits is a byte, which is a fairly safe assumption nowadays. โ†ฉ

  3. It is luckily very difficult to trigger this because tmux very actively controls the cursor position manually. โ†ฉ