Supporting International Characters

While many pieces of web software already support Unicode and other character-encoding standards in theory, they often fail to interoperate correctly when using them.
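A minimal sketch of how this kind of breakage typically shows up, assuming one component writes text as UTF-8 and another reads the same bytes as Latin-1. The sample string is made up for illustration; any non-ASCII text behaves the same way.

```python
# Hypothetical sample text; not taken from any real site.
text = "café"

# One piece of software stores the text as UTF-8 bytes...
utf8_bytes = text.encode("utf-8")

# ...and another decodes those bytes assuming Latin-1.
misread = utf8_bytes.decode("latin-1")

print(misread)                      # cafÃ©
print(utf8_bytes.decode("utf-8"))   # café
```

Both programs "support" their encoding correctly on their own. The garbled output only appears because they disagree about which one is in use.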

Freedesktop.org runs Project UTF-8, a resource for advocating Unicode support in open source software. Maybe OSCOM should also do something in this space.

Joel Spolsky has published a short how-to on what every programmer should know about Unicode. It is a very good introduction to working with different character sets, as is Sam Ruby's i18n survival guide.

The Midgard CMS project has supported UTF-8 since late 1999, but still ships with Latin-1 as the default. While this will change with the 1.6.0 release, the common habit of passing document names through rawurlencode() can still lead to funny-looking URLs, as can be seen on the www.silja.ru site. MidCOM fixes this problem by requiring a stricter URL policy.
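To illustrate why percent-encoded document names look odd, here is a rough sketch in Python, where urllib.parse.quote plays a role similar to PHP's rawurlencode(). The document name and the is_url_safe_name helper are hypothetical. The helper is only one guess at what a stricter URL policy could check, not MidCOM's actual rule.

```python
import re
from urllib.parse import quote, unquote

# Hypothetical document name containing Cyrillic characters.
name = "Привет"

# Percent-encode the UTF-8 bytes of the name, leaving no characters unescaped.
encoded = quote(name, safe="")
print(encoded)  # %D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82

# The encoding is reversible, but the URL is unreadable for humans.
assert unquote(encoded) == name


def is_url_safe_name(candidate: str) -> bool:
    """Hypothetical policy: allow only ASCII letters, digits, '.', '_' and '-'."""
    return re.fullmatch(r"[A-Za-z0-9._-]+", candidate) is not None


print(is_url_safe_name(name))      # False
print(is_url_safe_name("privet"))  # True
```

Six Cyrillic letters turn into 36 characters of escapes, since each letter takes two bytes in UTF-8 and each byte becomes three characters. Restricting names to a plain ASCII subset avoids the escaping entirely, at the cost of requiring a transliterated name for the URL.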
