Telegu Character

In recent years, Apple has been heavily criticized for the security implications of their market centralization and policy of irreversible operating system updates. Mobile device users are strongly pressured to install packaged iOS upgrades that cannot be rolled back. While this practice greatly increases security for most users, there is an inherent danger to this centralization.

Every flaw or weakness leaves over 100 million Apple device users vulnerable to exploitation for illegal purposes. The most recent iPhone bug (CVE-2018-4124) has been wreaking havoc due to unexpected behavior when the operating system attempts to display a particular Indian character from the Telugu language.

Any application that attempts to display the character on an iOS device crashes instantly, and cannot recover until the offending character is removed. The entire device crashes and restarts if the character triggers the bug in a component of the operating system, such as SpringBoard. This can result in an endless bootloop that can only be halted by a Device Firmware Update (DFU) restore, which causes the loss of all data.

Unfortunately, this frustrating bug extends beyond the iPhone; users have reported the same flaw in other devices such as iPads, the Apple Watch, and Mac computers. It is remarkable that a company with Apple’s resources overlooked such a widespread bug that affects their entire product line.

Introduction to Unicode characters

The Unicode standard allows a wide range of languages to be consistently displayed by assigning a unique number to each character. Consequently, text from any alphabet can be faithfully reproduced independent of language or computing platform. This universal encoding eliminates many of the misunderstandings that arise with the use of limited character sets, such as ASCII. Unicode characters are referenced in text by writing “U+” followed by the character code represented in hexadecimal format (for example, a capital “A” can be indicated by U+0041).

Unicode is organized into different groups, whose characters/letters correspond to various alphabets and their writing styles. For example, this article is written with characters that belong to the “Basic Latin” group.

In addition to letters, numbers, and symbols, Unicode also contains invisible “special characters” that modify how visible characters are displayed (for example: U+0020, the “space” that places a gap between two words). A complete list of Unicode characters can be found here.

By default, Unicode character engines write text from left to right. However, the standard must also accomodate languages such as Arabic and Hebrew that are written from right to left. This is accomplished by including an invisible “right-to-left mark” character that encodes the appropriate alignment. The modifier signals to the device to align the text like this:

بالعالم

The Telugu Language

Telugu is a Dravidian language spoken by 5% of the Indian population (approximately 75 million individuals). In Western languages, each word is typically represented by several characters grouped horizontally.

However, words in Telugu can be written by combining several letters into a compound symbol, which must be included in Unicode standard. If these characters were omitted, then it would be difficult to translate devices and programs into certain languages without introducing compatibility issues. Let’s take a look at the Telugu symbol “jñā” that causes Apple software to crash. Don’t worry, the characters shown below are displayed safely and will not harm your device in any way.

జ్ఞా - jñā.

The above compound symbol is created by combining five different characters:

  • జ, the consonant letter “ja” (U+0C1C)
  • ్, a “virama” mark placed above consonants to indicate that the following vowel should be muted (U+0C4D)
  • ఞ, the consonant letter “ña” (U+0C1E)
  • The “zero-width” non-joiner, abbreviated ZWNJ (U+200C)
  • ాా, the long vowel “aa” (U+0C3E)

The Zero-Width Non-Joiner: A Very Special Character

The ZWNJ separator used in the jñā symbol is an invisible (“non-printing”) character that slightly changes the appearance of the letters on either side. Some two-letter combinations are connected by a “ligature” between the characters.

If a ZWNJ is placed between the letters, the characters will be printed separately. This subtle visual modification typically does not change the meaning of the letters or word.

The following image shows a phrase featuring two frequently-connected letter combinations (“Th” and “fi”) displayed with a ligature (top) and separated by a ZWNJ (bottom).

Non joiner Character

The issue

When symbols are constructed from several characters, RAM memory must be allocated for each component. In this case, to display జ్ఞ the device must load: జ, ్, ఞ, ZWNJ, and ా.

An incorrect handling of the ZWNJ separator while combining the characters seems to be the cause of the Telugu bug. The symbol buffer in iOS returns a null pointer to the application, instead of a pointer to correctly allocated memory. When the application tries to access a memory location that doesn’t exist, iOS detects this and revokes that application’s RAM read/write permissions to avoid further memory corruption.

Thus, the attempt to combine the separate Telugu characters into a single symbol results in the unrecoverable error:

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x0000000000000000
Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [0]
Triggered by Thread: 0

The offending process is stopped abruptly to protect the core of the operating system from total corruption that would potentially “brick” the device, rendering it useless and unable to restore even from DFU mode.

This emergency halt occurs every time the system tries to display జ్ఞా with the ZWNJ. This does not mean that any ZWNJ character causes a potential stoppage of the anomalous application. Theoretically, if the Telugu symbol resides in an application like SpringBoard, the system will have no problem closing it, since it gives precedence to the Core of the iOS.

Apple Response

Apple quickly released a patch for this issue with iOS update 11.2.6 for iPhones and iPads. Notably, this is not the first time that Apple products have been afflicted with glitches caused by peculiar Unicode symbols. This is somewhat ironic, since Apple was one of the early leaders for implementation and standardization through Unicode.

Many Apple device operating systems (iOS, MacOS, WatchOS) share the same Core, which includes the kernel and key components of the operating system, such as the graphics engine for displaying fonts and icons. Consequently, vulnerabilities such as the Telugu bug are shared across the whole family of products.

Another misstep for Apple? Is this umpteenth mistake another consequence of the company’s prioritization of resources toward marketing instead of engineering and development?