Although most early computer development was done by speakers of languages written in the Latin script, the need to support additional scripts was realized fairly early. Unfortunately, many independent and incompatible attempts were made at accommodating differing scripts and directionality. This led to frustration and confusion, not to mention data loss, when storing or transferring text in non-Latin scripts or in scripts not written from left to right. Fortunately, the Unicode encoding standard was developed, which addresses and provides solutions for nearly all of the issues encountered when using non-Latin scripts. This article discusses the issues specific to Right-To-Left scripts, and provides working examples so that the reader may avoid the common pitfalls. Special thanks to Stan Goodman for invaluable editorial advice.
The term directionality refers to the direction in which text flows from character to character. For purposes of this article, there exist three distinct directionalities: LTR, RTL, and neutral. Directionality is a property of characters, paragraphs, and documents. Characters' directionality is an intrinsic property, whereas paragraphs and documents have their directionality assigned to them. Note that I use the terms "paragraphs" and "documents" here as they pertain to word processor files. In other applications the terminology will be different, but the concepts remain the same. In HTML the terms are "paragraphs/divs/spans" and "pages", whereas in plain text environments such as text editors and email there are no "paragraphs" and the "document" is always LTR.
The directionality of a character is intrinsic to that character. A character may possess LTR (for example, a Latin character), RTL (for example, a Hebrew character), or neutral directionality (for example, a punctuation mark). When two LTR characters are placed next to one another, the second character appears to the right of the first character. Likewise, when two RTL characters are placed next to one another, the second character appears to the left of the first character. Blocks of neutral characters generally take the directionality of the non-neutral characters surrounding them, but when surrounded by characters of differing directionality the block generally takes the directionality of the paragraph. Additionally, special nonprinting control characters can be used to set directionality. These are discussed below.
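The intrinsic directionality described above can be inspected programmatically. The following is a minimal sketch in Python, assuming the standard `unicodedata` module; the function name `directionality` and the three-way grouping are my own illustration of this article's simplified model. Unicode itself distinguishes more classes (for instance, digits and some punctuation are "weak" rather than neutral), which are lumped together as "neutral" here.

```python
import unicodedata


def directionality(ch: str) -> str:
    """Map a character's Unicode bidi class onto the article's three categories.

    This grouping is a simplification chosen for illustration only.
    """
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "LTR"
    if bidi_class in ("R", "AL"):  # Hebrew letters are R, Arabic letters are AL
        return "RTL"
    # Everything else (spaces, punctuation, digits, ...) is treated as neutral.
    return "neutral"


# Print the raw Unicode bidi class next to the simplified category.
for ch in ("H", "ש", ".", " ", "!"):
    print(repr(ch), unicodedata.bidirectional(ch), directionality(ch))
```

The raw class printed in the middle column shows how much detail the three-way grouping hides.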
A character block is a continuous run of characters of like directionality, be that LTR, RTL, or neutral. LTR and RTL blocks have their directionality as an intrinsic property, whereas neutral blocks are assigned the directionality of their surrounding blocks. Should a neutral block be surrounded by blocks of differing directionality, it is assigned the directionality of the paragraph. A character block can be zero printed characters long in some cases, such as when using nonprinting control characters. An LTR block is a span of characters that contains only LTR characters, such as "Hello". Likewise, an RTL block is a span of characters that contains only RTL characters, such as "שלום". As a block cannot contain characters of differing directionality, the phrase "Hello, world!" is actually composed of four blocks: one LTR block containing the characters H-e-l-l-o, then a neutral block containing a comma and a space, then another LTR block containing the characters w-o-r-l-d, and finally another neutral block containing an exclamation point. In an LTR environment the blocks would be laid out from left to right and the text would appear as expected. However, this is not the case in an RTL environment! In an RTL environment the second block (consisting of a comma and a space) would be assigned LTR as it is surrounded by LTR blocks, but the last neutral block (consisting of the exclamation point) would be assigned RTL as there is no LTR block following it. Thus, the comma-space block would be absorbed into the surrounding LTR blocks (because it is itself LTR, and thus part of the continuous LTR run) and we are left with two blocks: the first consisting of the LTR run "Hello, world" and the second consisting of the single character "!". The blocks would be laid out from right to left, and thus the exclamation point would appear to the left of the "Hello, world" block!
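The "Hello, world!" walk-through can be reproduced with a short Python sketch. It implements only the simplified block rules described in this article, not the full Unicode Bidirectional Algorithm, and the names `split_blocks` and `resolve_blocks` are hypothetical helpers invented for this illustration.

```python
import unicodedata
from itertools import groupby


def directionality(ch):
    """Simplified three-way classification (an assumption of this sketch)."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "LTR"
    if bidi_class in ("R", "AL"):
        return "RTL"
    return "neutral"


def split_blocks(text):
    """Split text into runs of characters that share one directionality."""
    return [(d, "".join(run)) for d, run in groupby(text, key=directionality)]


def resolve_blocks(text, paragraph_dir):
    """Assign a direction to neutral blocks, then merge adjacent like blocks."""
    blocks = split_blocks(text)
    resolved = []
    for i, (d, chunk) in enumerate(blocks):
        if d == "neutral":
            before = blocks[i - 1][0] if i > 0 else None
            after = blocks[i + 1][0] if i + 1 < len(blocks) else None
            # Same direction on both sides: adopt it. Otherwise (differing
            # neighbours, or start/end of text) use the paragraph direction.
            if before is not None and before == after:
                d = before
            else:
                d = paragraph_dir
        if resolved and resolved[-1][0] == d:
            resolved[-1] = (d, resolved[-1][1] + chunk)  # absorb into previous
        else:
            resolved.append((d, chunk))
    return resolved


print(split_blocks("Hello, world!"))
print(resolve_blocks("Hello, world!", "LTR"))
print(resolve_blocks("Hello, world!", "RTL"))
```

In the RTL case the exclamation point ends up in its own RTL block, which is why it is drawn to the left of "Hello, world".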
Paragraphs, unlike characters, do not possess intrinsic directionality but rather have their directionality assigned to them. A paragraph can be either LTR or RTL, but not neutral. This setting affects the order in which LTR, RTL, and neutral character blocks are displayed on the screen. The blocks of LTR and RTL characters will be arranged in the directionality of the paragraph. So if we have three blocks, "first", "שני", and "third", they will be ordered like so in LTR and RTL paragraphs:
|first שני third||first שני third|
Note that the order of the blocks follows the directionality of the paragraph. Neutral characters are assigned the directionality of the characters surrounding them:
|My 1st page הדף מספר 2 שלי My 3rd page||My 1st page הדף מספר 2 שלי My 3rd page|
However, neutral character blocks that are adjacent to LTR characters on one side and RTL characters on the other, or that appear at the beginning or end of the paragraph, are assigned the directionality of the paragraph itself. Some common examples:
|This is a nice page.||This is a nice page.|
|This year is 2011.||This year is 2011.|
|The English was typed first והעברית הוקלד שני||The English was typed first והעברית הוקלד שני|
|העברית הוקלד ראשון and the English was typed second||העברית הוקלד ראשון and the English was typed second|
|זה דף נחמד.||זה דף נחמד.|
|השנה היא 2011.||השנה היא 2011.|
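How already-resolved blocks end up on screen can be modelled in a few lines of Python. This is only a sketch of the layout rule as described in this article: `visual_order` is a hypothetical helper, the input blocks are assumed to have had their directions assigned already, and real rendering engines also handle digits, mirrored brackets, and combining marks, which this model ignores.

```python
def visual_order(blocks, paragraph_dir):
    """Return the characters in left-to-right screen order.

    blocks: list of (direction, text) pairs in typing (logical) order,
            where direction is "LTR" or "RTL".
    """
    # Inside an RTL block the first typed character sits rightmost,
    # so its characters are reversed to get left-to-right screen order.
    drawn = [text[::-1] if d == "RTL" else text for d, text in blocks]
    # In an RTL paragraph the first block sits rightmost as well.
    if paragraph_dir == "RTL":
        drawn.reverse()
    return "".join(drawn)


# "This is a nice page." typed into an RTL paragraph: the trailing period
# has become its own RTL block and is drawn on the far left.
blocks = [("LTR", "This is a nice page"), ("RTL", ".")]
print(visual_order(blocks, "RTL"))
```

Note that printing the returned string in a bidi-aware terminal may reorder it again; the return value is meant for inspection, for example with `repr()`.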
Like paragraphs, documents have their directionality assigned to them. The directionality of a document determines the order in which elements appear in the document. For example, an RTL spreadsheet will have the A column on the right side, with the B column to its left and so forth. A two-column RTL word processor document will have the first column on the right and the text will flow out of it into the left column.
Nonprinting characters are characters that do not appear as glyphs on the screen, but rather affect how the printing characters are displayed. A familiar nonprinting character is the Carriage Return, which does not produce a glyph on the screen but rather moves the text position to the beginning of the next line. Another familiar nonprinting character is the Tab character, which does not produce a glyph on the screen but rather moves the text position to the beginning of the next column.
The LRM (Left to Right Mark) and RLM (Right to Left Mark) characters are used to simulate the presence of a character of a specific directionality where no printed character is desired. A common use case of the LRM and RLM characters is to properly format a small span or line of text with directionality differing from that of its parent paragraph. For instance, notice the position of the period at the end of a Hebrew sentence in an LTR paragraph:
|זאת השפה העברית.||זאת השפה העברית.|
The letters are RTL characters, but the period is a neutral character. A neutral character is assigned the directionality of its surrounding characters, in this case RTL beforehand and nothing after (the period is the last character of the text). Thus, with no RTL character following it, the period is assigned the directionality of the paragraph, which as stated is LTR. We therefore have two blocks: an RTL block with the characters "זאת השפה העברית" and an LTR block with the single period character. Being an LTR paragraph, the blocks are laid out from left to right. Thus, the period appears to the right of the sentence. However, if we add an RTL character after the period, the period will itself be assigned RTL directionality. This example makes use of the א character:
|זאת השפה העברית.א||זאת השפה העברית.א|
It can be seen that the period is now in the correct place, but we have the unwanted א character in there. The א character can be replaced with an RTL nonprinting character, the RLM (Right to Left Mark), in order to give the period its wanted directionality without also introducing a superfluous character. This is the same LTR paragraph with the א character replaced by an RLM:
|זאת השפה העברית.||זאת השפה העברית.|
It can be seen that the period now appears in the expected place to the left of the text in the LTR environment. Likewise, here is the same exercise containing an English sentence in an RTL paragraph. Pay attention to the location of the punctuation:
|This is English.||This is English.|
And now adding an "a" character to the end in order to demonstrate the assignment of directionality to the period:
|This is English.a||This is English.a|
And now replacing the "a" character with an LRM (Left to Right Mark):
|This is English.||This is English.|
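The same fix can be applied in code. The sketch below, in Python, appends an LRM or RLM after trailing punctuation so that the punctuation stays attached to the sentence regardless of the paragraph direction. The helper name `with_trailing_mark` is hypothetical, and the rule it applies (match the direction of the last directional character) is my own simplification of the manual procedure shown above.

```python
import unicodedata

LRM = "\u200e"  # Left-to-Right Mark
RLM = "\u200f"  # Right-to-Left Mark


def with_trailing_mark(text):
    """Append LRM or RLM if the text ends with neutral characters.

    The mark matches the last character that has an intrinsic direction.
    Text that already ends with such a character is returned unchanged.
    """
    for offset, ch in enumerate(reversed(text)):
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            mark = LRM
        elif bidi_class in ("R", "AL"):
            mark = RLM
        else:
            continue  # neutral character: keep scanning backwards
        # offset 0 means the text already ends with a directional character.
        return text + mark if offset > 0 else text
    return text  # no directional characters at all


english = with_trailing_mark("This is English.")
hebrew = with_trailing_mark("זאת השפה העברית.")
print(repr(english[-1]), repr(hebrew[-1]))
```

Because LRM and RLM themselves carry a direction, calling the function twice on the same text adds only one mark.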
The LRE (Left to Right Embedding) and RLE (Right to Left Embedding) characters are used to simulate the presence of an LTR or RTL paragraph inside a paragraph of differing or unknown directionality. The PDF (Pop Directional Formatting) character marks the end of an embedded section. As their names imply, the LRE and RLE characters change the directionality applied to the text that follows them until a PDF character is encountered or the paragraph ends. A further LRE or RLE character inside an embedded section opens a nested embedding, which is in turn closed by its own PDF. The usage of these characters is usually limited to environments in which paragraph or document directionality cannot be configured, such as plain text files, email, and legacy software.
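Wrapping a span in an embedding is mechanical, so it is easy to automate. The following Python sketch shows one way to do it; the function name `embed` and its `direction` argument are hypothetical, chosen only for this illustration.

```python
LRE = "\u202a"  # Left-to-Right Embedding
RLE = "\u202b"  # Right-to-Left Embedding
PDF = "\u202c"  # Pop Directional Formatting


def embed(text, direction):
    """Wrap text in an LRE...PDF or RLE...PDF pair."""
    if direction == "LTR":
        opener = LRE
    elif direction == "RTL":
        opener = RLE
    else:
        raise ValueError("direction must be 'LTR' or 'RTL'")
    # Always close with PDF so the embedding cannot leak into later text.
    return opener + text + PDF


# A Hebrew sentence prepared for a plain-text context that is assumed LTR.
line = embed("זאת השפה העברית.", "RTL")
print([f"U+{ord(ch):04X}" for ch in (line[0], line[-1])])
```

Closing every embedding explicitly, rather than relying on the end of the paragraph, keeps the effect contained if the text is later pasted into a longer line.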
By far the easiest way to enter nonprinting characters is to simply type them. However, not all characters are available on common keyboard variants. Other solutions include typing the characters as Unicode symbols or using a software character tool to add the characters directly to the text or system clipboard.
The Hebrew keyboard has a Lyx variant which places the RLM on Shift-ט and the LRM on Shift-א. On Linux and Mac OS X the Lyx variant can be selected from the keyboard layout options. Windows users will have to find and install the Lyx variant themselves as it is not included in the OS. Note that there exists an unrelated document processor called LyX (note the capitalization), which is not to be confused with the Lyx keyboard layout. Although useful for typing the LRM, RLM, and other Hebrew characters such as diacritic marks and the Shekel currency sign, the Lyx keyboard variant does not include keys for the LRE, RLE, or PDF characters.
In most common desktop environments Unicode symbols can be typed by holding the Alt key and pressing the numeric keypad plus sign, then the Unicode value in hex (see table below). A more detailed explanation specific to Windows can be found on the FileFormat page: How to enter Unicode characters in Microsoft Windows. Note that some Windows installations need a registry hack before they will accept Unicode symbols. KDE users cannot use this method as KDE delegates responsibility for implementing this feature to Xorg, and Xorg delegates to Qt, and Qt delegates back to Xorg.
The simplest way to insert characters not available on a keyboard layout is to use a software character tool. Word processors such as OpenOffice usually have such a tool built in. In OpenOffice Writer, for example, use the menu item Insert → Formatting Mark for the LRM and RLM, or Insert → Special Character for all the nonprinting characters. Standalone software such as KCharSelect can be used to insert the characters into any other application. Gnome users have Gucharmap, and MS Windows ships with the Character Map application.
In OpenOffice Writer, pages can be configured as RTL in Format → Page → Page → Text Direction. Paragraphs can be configured to RTL in Format → Paragraph → Alignment → Text Direction. General RTL compatibility and associated toolbar buttons can be added from Tools → Options → Language Settings → Languages → Enhanced language support → Enabled for complex text layout (CTL).
In OpenOffice Calc, sheets can be configured as RTL in Format → Sheet → Right To Left. Cells can be configured to RTL in Format → Cells → Alignment → Text Direction. General RTL compatibility and associated toolbar buttons can be added from Tools → Options → Language Settings → Languages → Enhanced language support → Enabled for complex text layout. This setting is shared with Writer and other OpenOffice components.
HTML is fairly straightforward to configure for RTL. As with word processor documents, entire pages or individual paragraphs can be set to RTL. One common caveat here is to always use UTF-8 encoding for the text, and to declare it both in the HTTP headers and in the meta tag.
<html dir="rtl" xmlns="http://www.w3.org/1999/xhtml">
<?php header("Content-Type: text/html; charset=utf-8"); ?>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
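For readers who generate pages from a script rather than PHP, the same three settings can be produced in Python. This is only a sketch: `rtl_page` is a hypothetical helper, and how the returned header and bytes are handed to a web server is left out because it depends on the framework in use.

```python
from html import escape


def rtl_page(title, body_text):
    """Build an RTL page and the matching Content-Type header.

    Returns (headers, body_bytes). The encoding is declared twice, once in
    the HTTP header and once in the meta tag, as recommended above.
    """
    markup = (
        '<html dir="rtl" xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head>\n"
        '<meta http-equiv="content-type" content="text/html; charset=utf-8" />\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        f"<body><p>{escape(body_text)}</p></body>\n"
        "</html>\n"
    )
    headers = {"Content-Type": "text/html; charset=utf-8"}
    # Encode explicitly so the bytes match the declared charset.
    return headers, markup.encode("utf-8")


headers, body = rtl_page("דף נחמד", "זה דף נחמד.")
print(headers["Content-Type"], len(body))
```

An individual paragraph can be set the same way by putting the `dir` attribute on a `p`, `div`, or `span` element instead of on `html`.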
The way to set the directionality of input fields varies from browser to browser and from platform to platform. Often the selection can be made from the input field's context menu (right-click). If not, two popular keyboard shortcuts for changing directionality are Ctrl-leftShift or Ctrl-rightShift (Google Chrome) and Ctrl-Shift-X (Mozilla Firefox).
Anki has very configurable RTL controls. Each individual field can have its default directionality set to RTL in Settings → Deck Properties → Basic → Models → [highlight the desired model] → Edit → [highlight the desired card template] → Card Layout → Fields → Reverse text direction (rtl). Additionally, when inputting new data the user can change the current apparent directionality of the field or insert any of the nonprinting characters discussed above from the context menu. To change the current directionality, press Ctrl-leftShift or Ctrl-rightShift. To insert a nonprinting character, open the context menu (right-click) and select Insert Unicode control character.
Zim does not let the user control the directionality of the text. The directionality of the text is determined by the directionality of the first non-neutral character in the text on a per-line basis.
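The per-line rule described here, where the first non-neutral character decides, is simple to express in code. The Python sketch below models that rule only; it is not taken from Zim's source, and the names `line_direction` and `directions_per_line` as well as the LTR fallback for lines with no directional characters are assumptions of this example.

```python
import unicodedata


def line_direction(line, default="LTR"):
    """Return the direction of the first non-neutral character in the line."""
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return default  # only neutral characters: fall back to the assumed default


def directions_per_line(text):
    """Apply the rule to every line of a multi-line text."""
    return [line_direction(line) for line in text.splitlines()]


print(directions_per_line("Hello שלום\nשלום Hello\n123 שלום"))
```

Since an LRM or RLM counts as a directional character, placing one at the very start of a line is a way to force the outcome in applications that use this rule.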
In plain text and Email there are no paragraphs to set the directionality for, and the document directionality is usually LTR. Some text editors and Email editors allow the user to change the apparent directionality on a per-document or a per-line basis. It is important to note that this apparent directionality is not a property of the document, and it will be lost when the file is closed or the Email is sent. It is up to the user who opens the document or the email (or his software) to set the proper apparent directionality.
The apparent directionality for the entire document can be set with Ctrl-leftShift or Ctrl-rightShift.
Kate does not let the user control the apparent directionality of the text. The apparent directionality is determined on a per-line basis by the directionality of the first non-neutral character in the line.
For writing Email, the apparent directionality for the entire document can be set with Ctrl-Shift-X. For reading mail, I recommend the BiDi Mail UI, which gets it right so often that I don't even know how to change the directionality because I've never needed it.
Though this should be obvious, the directionality of the text editor is determined by your web browser. See the section on configuring input field direction for web browsers.
|Character Name||Unicode Symbol||HTML Entity||Character|
|LRM - Left-to-Right Mark||U+200E||&lrm;||‎|
|RLM - Right-to-Left Mark||U+200F||&rlm;||‏|
|LRE - Left-to-Right Embedding||U+202A||&#x202A;||‪|
|RLE - Right-to-Left Embedding||U+202B||&#x202B;||‫|
|PDF - Pop Directional Formatting||U+202C||&#x202C;||‬|
For additional information about each character, see the linked FileFormat page.
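When working in code rather than at the keyboard, the characters in the table can be written as escape sequences. The Python sketch below defines them as constants and uses the standard `unicodedata` module to confirm their official names. The dictionary name `CONTROL_CHARACTERS` is my own choice for this example.

```python
import unicodedata

# Abbreviation -> character, written as escapes so they stay visible in source.
CONTROL_CHARACTERS = {
    "LRM": "\u200e",
    "RLM": "\u200f",
    "LRE": "\u202a",
    "RLE": "\u202b",
    "PDF": "\u202c",
}

# Print the code point, a numeric HTML reference, and the official name.
for abbreviation, ch in CONTROL_CHARACTERS.items():
    code_point = ord(ch)
    print(f"{abbreviation}  U+{code_point:04X}  &#x{code_point:04X};  {unicodedata.name(ch)}")
```

Using escapes instead of pasting the raw characters avoids invisible characters hiding in source files.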
If you are having problems with punctuation appearing on the wrong side of statements, then try setting the directionality of your paragraphs and documents. If that cannot be done, then try using LRM and RLM characters for short bits of text or LRE, RLE, and PDF characters for longer bits of text.
Date Published: 2011-05-29
Date Revised: 2011-07-28 Updated with invaluable editorial advice by Stan Goodman
Date Revised: 2011-08-14 Updated Typing nonprinting characters on the keyboard with tip from Alan Yaniger.