RSS and displaying non-Latinate languages

BBC’s Richard Sambrook has a post in the comments at Rebecca McKinnon’s blog that caught my attention big-time. He describes talking to an engineer at the BBC who points out that RSS is currently better suited to displaying text in Latinate languages, viz.:
“The issue is RSS does not have a way to display right to left languages correctly and is not very compatible with non Latin languages. I believe it just was not thought about deeply by the people and development effort behind RSS.

This slows down the growth of non Latin RSS adoption. We need to develop multiple language RSS and hopefully redefine standards and approaches.”

What do others have to say about this?
Is it an issue? A perceived issue?
I’d like to learn more.

Latest Comments

  1. Xslf says:

    Yes, it is a real issue.
    As I wrote in an email to Ian Forrester (in response to a question he asked me about his post: ) –
    The problem I am facing is simple:
    If I use valid RSS with no dir=rtl, then 99% of the RSS readers will display the text block as LTR, with punctuation, digits and English in the wrong locations, making the whole thing unreadable.
    When adding dir=rtl, at least I can get about 50% of the RSS readers to display the post body properly (titles are still a mess).
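To make the dir=rtl workaround concrete, here is a minimal Python sketch, assuming a reader that renders `<description>` as HTML. The helper `make_rtl_item` and the sample Hebrew strings are hypothetical. Because RSS 2.0 defines no direction attribute of its own, the hint has to travel inside the escaped HTML of the description, and the plain-text `<title>` gets none.

```python
import xml.etree.ElementTree as ET

def make_rtl_item(title: str, body_html: str) -> ET.Element:
    """Build an RSS <item> whose HTML body is wrapped in dir="rtl".

    Hypothetical helper illustrating the HTML workaround: RSS itself has
    no direction attribute, so the hint lives inside the escaped HTML.
    """
    item = ET.Element("item")
    # <title> is plain text in most readers, so there is nowhere to put
    # a direction hint; each reader falls back to its own default.
    ET.SubElement(item, "title").text = title
    # <description> carries HTML as text; ElementTree escapes <, > and &
    # automatically when the item is serialized.
    desc = ET.SubElement(item, "description")
    desc.text = f'<div dir="rtl">{body_html}</div>'
    return item

item = make_rtl_item("שלום עולם", "<p>פוסט חדש על RSS 2.0</p>")
print(ET.tostring(item, encoding="unicode"))
```

Readers that honour `dir` in the description HTML can lay out the body right to left, while the title still depends on each reader's defaults, which matches the "titles are still a mess" observation above.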
    I don’t use Unicode control characters for a few reasons:
    * They are a real pain to input: it is like entering the control characters for CR/LF or tags manually (but worse), and there are just too many places to enter them.
    * Most keyboard layouts do not have a direct way to enter them.
    * They make a mess of the text: they are only needed for the RSS, are unneeded for editing or for the HTML display, and can produce unexpected results when entered into the text.
    * There are many clients that incorrectly display them as visible characters in the text.
    * They make the text much more difficult to edit: if you change the text, you need to go back and change them as well. And since they are invisible, you get an awful lot of trial and error.
    * They force me to use explicit directionality, which complicates things and makes the text less portable.
    * My web app that creates the RSS from my HTML does not know how to add them automatically.
    * Since they are rarely used in other contexts, I can’t focus on the content when writing, and have to start thinking more closely about the presentation.
    * Moving from me to other users: most Hebrew/Arabic users don’t know about them, and don’t want to know. Try explaining to your mother that when she is writing in her weblog, she can’t write in her usual manner, but has to enter these strange codes in a foreign language, which have complicated rules (I have seen many pros get confused by these characters; I don’t expect laypeople to understand them).
    * It doesn’t scale: think about an Israeli blog hosting service that wants to offer RSS feeds for all its blogs, with minimum work for the users. Relying on Unicode control characters just doesn’t do it.
    * Since they are complex, it is difficult to create a GUI for entering them (unlike general RTL/LTR controls, which are available everywhere).
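The invisible characters in question are the Unicode bidi formatting controls. A rough Python sketch, assuming the explicit embedding pair RLE (U+202B) and PDF (U+202C) is what gets inserted, shows why they are awkward: they add characters that never render, and one careless edit can leave an embedding unterminated. The helpers `embed_rtl`, `hidden_controls` and `is_balanced` are hypothetical, and `is_balanced` is a simplified check of my own, not part of the Unicode Bidirectional Algorithm (which silently ignores stray PDFs).

```python
import unicodedata

# Explicit bidi formatting characters (general category "Cf", invisible).
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO

def embed_rtl(text: str) -> str:
    """Wrap text in an explicit right-to-left embedding (RLE ... PDF)."""
    return RLE + text + PDF

def hidden_controls(text: str) -> list:
    """Names of the invisible format characters hidden in text."""
    return [unicodedata.name(ch) for ch in text if unicodedata.category(ch) == "Cf"]

def is_balanced(text: str) -> bool:
    """Simplified check that every embedding/override is closed by a PDF."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # stray PDF (the full algorithm would ignore it)
            depth -= 1
    return depth == 0

title = embed_rtl("RSS 2.0 בעברית")
print(hidden_controls(title))  # ['RIGHT-TO-LEFT EMBEDDING', 'POP DIRECTIONAL FORMATTING']
print(is_balanced(title))      # True

# An edit that deletes the closing PDF is easy to miss, because the PDF
# is invisible, yet it leaves the embedding open.
edited = title.replace(PDF, "")
print(is_balanced(edited))     # False
```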
    Not having the dir attribute in RSS gets rid of some markup, but in favor of lower-level, much more complex control characters. A bad deal, IMO, and one that is a major cause of the problems when dealing with Hebrew/Arabic RSS.
    I think that the root of the problem is that bidi is part presentation and part structure. And since even in the best of cases (for example, the automatic bidi control in recent Qt or GTK applications on Linux) there are still many, many cases that can *not* be covered reliably by the software’s display algorithms, I tend to think that for practical purposes, bidi is more structure than presentation.
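The point that display algorithms cannot always guess direction can be illustrated with the paragraph-level rule of the Unicode Bidirectional Algorithm (rules P2 and P3), which takes the direction of the first strong character. The sketch below is a simplification (it does not skip text inside directional isolates), it is not meant to describe exactly what Qt or GTK do, and `first_strong_direction` plus the sample strings are hypothetical.

```python
import unicodedata
from typing import Optional

def first_strong_direction(text: str) -> Optional[str]:
    """Return "ltr" or "rtl" from the first strong character, else None.

    Simplified form of UBA rules P2/P3. When None is returned, the full
    algorithm would default the paragraph to LTR.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):  # Hebrew letters are R, Arabic letters AL
            return "rtl"
    return None  # only weak/neutral characters such as digits and punctuation

samples = [
    "שלום, זה פוסט חדש",   # Hebrew first: detected as RTL
    "Firefox הוא דפדפן",   # Hebrew sentence opening with a Latin name: detected as LTR
    "2.0!",                # digits and punctuation only: no strong character
]
for s in samples:
    print(repr(s), first_strong_direction(s))
```

A Hebrew sentence that opens with a Latin product name is classified as LTR, and a title made only of digits and punctuation gives no answer at all; these are the kinds of cases where an explicit "this element is RTL" hint would help.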
    I sure wish there was a way in RSS to tell the client “this element is RTL” or “this area is LTR” without resorting to HTML hacks. But at the moment, those hacks are the only practical tool I have to get at least *some* of the readers out there to display the text properly (more like “mostly properly”).

    Shoshannah Forbes
