
Correction by Zubok, 30.06.17 20:08 (current version):

That's exactly what I don't get: how ISO8859-* or KOI8-R are involved in drawing Unicode strings.

Well, it looks like that's exactly what happens. Someone here writes that they dug through the part of Xlib that converts strings from UTF-8. They claim that, for some reason, Xlib tries to convert the string to a charset other than iso10646-1, even though that's the one you selected.

Here's the link: https://blog.summercat.com/x-fonts-and-rpbar.html See the section titled "Xlib and iso10646-1 charset fonts". An excerpt:

I figured this out through examining the Xlib code.
The reason is similar to the problem I describe with the problem characters: Xlib's conversion code does not care what charsets are available in the fontset you are using. It converts to various charsets, in order, based on those listed in the X Locale Database. If you then try to display text with your fontset, then it's just too bad if it converted to a charset not available in your fontset.
Take the character a. If we load only an ISO10646-1 charset font into the fontset, and then try to display it, Xlib takes the input as UTF-8 and converts it internally to ISO8859-1, and then tries to display it using this charset in your fontset. But your fontset does not have a charset listed for ISO8859-1 as that is one of the charsets the fontset is missing.
Even though Xlib would be able to convert the a to ISO10646-1, it accepts the match to ISO8859-1 first and does not try any others.
We might think that a is the same in both of these encodings (if we take ISO10646-1 to be UTF-8), but in Xlib this is not the case. I suspect it may be because internally ISO10646-1 is actually UCS-2, at least for fonts. There are comments to that effect in lcUTF8.c anyway.
You can see this is the problem by playing with lcUTF8.c's create_tofontcs_conv() where we set up the encodings to try.
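To make the quoted explanation concrete, here is a toy Python model. It is not Xlib's code. The charset order in `LOCALE_DB_ORDER` and the codec mapping are assumptions standing in for the X Locale Database and lcUTF8.c, and the function names are hypothetical. The model contrasts "the first charset that can represent the character wins" with a fontset-aware choice. It also shows why `a` is not "the same" in UTF-8 and in a UCS-2 view of ISO10646-1.

```python
from typing import Optional, Set

# Toy model of the charset selection described in the quote; NOT Xlib's code.
# Assumption: this order stands in for the charsets listed in the X Locale Database.
LOCALE_DB_ORDER = ["ISO8859-1", "ISO8859-5", "KOI8-R", "ISO10646-1"]

# Assumption: X charset name -> Python codec used to test convertibility.
# ISO10646-1 is modelled as big-endian UTF-16, which equals UCS-2 for BMP characters.
CODECS = {
    "ISO8859-1": "iso8859_1",
    "ISO8859-5": "iso8859_5",
    "KOI8-R": "koi8_r",
    "ISO10646-1": "utf_16_be",
}


def encodable(ch: str, charset: str) -> bool:
    """True if ch can be converted to the given charset."""
    try:
        ch.encode(CODECS[charset])
    except UnicodeEncodeError:
        return False
    return True


def xlib_like_pick(ch: str) -> Optional[str]:
    """Behaviour described in the quote: take the first charset in locale order
    that can represent ch, ignoring which charsets the fontset provides."""
    for cs in LOCALE_DB_ORDER:
        if encodable(ch, cs):
            return cs
    return None


def fontset_aware_pick(ch: str, fontset: Set[str]) -> Optional[str]:
    """What one might expect instead: consider only charsets the fontset has."""
    for cs in LOCALE_DB_ORDER:
        if cs in fontset and encodable(ch, cs):
            return cs
    return None


def can_draw(ch: str, fontset: Set[str]) -> bool:
    """Under the described behaviour, ch is drawn only if the picked charset is in the fontset."""
    return xlib_like_pick(ch) in fontset


if __name__ == "__main__":
    only_unicode = {"ISO10646-1"}  # fontset with just an iso10646-1 font
    for ch in ("a", "ж"):
        print(ch, xlib_like_pick(ch), fontset_aware_pick(ch, only_unicode),
              can_draw(ch, only_unicode))
    # Same character, different bytes: UTF-8 vs a UCS-2 (big-endian) view.
    print("a".encode("utf-8"), "a".encode("utf_16_be"))
```

With a fontset that contains only ISO10646-1, `xlib_like_pick("a")` returns `ISO8859-1`, so `can_draw` is `False`. A Cyrillic `ж` skips ISO8859-1 and lands on ISO8859-5, which is how ISO8859-* (or KOI8-R, if it came earlier in the list) could end up involved in drawing a UTF-8 string.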
