Ticket #1627 (closed defect: fixed)
Viewer doesn't support CJK
| Reported by: | egmont | Owned by: | slavazanko |
|---|---|---|---|
| Priority: | major | Milestone: | 4.7.0-pre4 |
| Component: | mcview | Version: | 4.7.0-pre2 |
| Keywords: | | Cc: | galtgendo@… |
| Blocked By: | | Blocking: | |
| Branch state: | | Votes for changeset: | committed-master |
Description
In a fully UTF-8 environment, the built-in viewer doesn't properly display double-width (CJK) characters. (The editor does display them, though.) The locale and mc's display width are both set to UTF-8; the file charset (T) is set to UTF-8 or No translation.
Open any text file that has UTF-8 CJK characters in one of its lines. Every such character except the last is replaced by a single (not double) space, and the last one is shown correctly but positioned incorrectly.
The terminal is the one from Mac OS X 10.5.8; I don't think it matters. mc-4.7-pre2 with slang was run on the Mac as well as on Linux over ssh.
Attachments
Change History
comment:2 Changed 15 years ago by andrew_b
Currently, MC supports 7-bit, 8-bit and UTF-8 locales. Multibyte non-UTF-8 locales are not supported. Patches are welcome!
comment:3 follow-up: ↓ 4 Changed 15 years ago by egmont
I *am* talking about UTF-8.
CJK is a well-known abbreviation for Chinese, Japanese and Korean characters. They require special treatment in terminals because they take up 2 character cells. It has nothing to do with encoding.
CJK characters in UTF-8 are supported by mc's two-panel mode. It seems to me that they are handled correctly in UI strings (mc's Japanese translation, for example), in filenames, on the command line, in dialog boxes, etc. They are also handled correctly (minor bugs aside) by mcedit. mcview seems to be the only component that doesn't support them at all. This is a regression from the 4.6+utf8 patches, which did display CJK correctly in the viewer.
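To illustrate why cell width is a property of the character rather than of the encoding, here is a minimal sketch in Python. It is not mc's code (mc is written in C); it approximates terminal width with the standard `unicodedata.east_asian_width` property, and `cell_width` and `display_width` are hypothetical helper names.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate number of terminal cells taken by one character.

    Assumption: characters whose East Asian Width is Wide (W) or
    Fullwidth (F) take 2 cells, everything else takes 1. Zero-width
    and control characters are ignored in this sketch.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(text: str) -> int:
    """Sum of the approximate cell widths of all characters in text."""
    return sum(cell_width(ch) for ch in text)
```

The same string gives the same width whether it was stored as UTF-8 or in another encoding, because the width is computed on decoded characters, not on bytes.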
comment:4 in reply to: ↑ 3 Changed 15 years ago by angel_il
Replying to egmont:
CJK is a well-known abbreviation for Chinese, Japanese and Korean characters. They require special treatment in terminals because they take up 2 character cells. It has nothing to do with encoding.
You're right, but it seems to me that I have already repaired this in the current "master" version, or haven't I?
comment:6 Changed 15 years ago by mnk
I really hate such regressions.
I ranted about this problem several months ago - it got fixed then.
But whoever redid the viewer broke it again.
I'm in the process of re-figuring out midnight (as I've been out of touch lately),
but the outlook looks good.
Perhaps I'll have a correct patch soon.
comment:7 Changed 15 years ago by mnk
v.1.0 is mostly working, with 2 issues:
- floating wrapping: if you scroll through long CJK text
(meaning nearly no spaces, almost entirely double-width),
you'll see that the line break of the first line moves as you scroll
(bad description, see for yourself); IIRC, it was that way even before the regression
- 4096 bytes break: every 4096 bytes there's a chance that the break falls in the middle
of a UTF-8 char (I'm not sure if this can happen with non-double-width chars, and
I don't know yet where this number comes from), leading to valid char(s)
being treated as unprintable
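The second issue can be reproduced outside mc with a small Python sketch. It assumes, for illustration only, that the data is decoded one 4096-byte block at a time with no state kept between blocks; `decode_chunks_naively` is a hypothetical name, not an mc function.

```python
CHUNK = 4096  # block size mentioned in the ticket


def decode_chunks_naively(data: bytes, chunk_size: int = CHUNK) -> str:
    """Decode each block on its own, with no state carried across blocks.

    A multibyte character that straddles a block boundary is seen as two
    invalid fragments, which are replaced by U+FFFD here.
    """
    pieces = []
    for start in range(0, len(data), chunk_size):
        block = data[start:start + chunk_size]
        pieces.append(block.decode("utf-8", errors="replace"))
    return "".join(pieces)


# 1 + 3 * 1365 == 4096, so the boundary falls between two characters.
aligned = "a" + "語" * 2000
# 2 + 3 * 1364 == 4094, so the next 3-byte character straddles byte 4096.
misaligned = "ab" + "語" * 2000
# 1 + 2 * 2047 == 4095, so a 2-byte, single-width character is split too.
narrow = "a" + "é" * 3000
```

Whether a character is split depends on its byte length and its offset in the file, not on its display width, so in this sketch single-width multibyte characters such as `é` are affected as well.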
comment:8 Changed 15 years ago by mnk
Well, the second issue seems to come from mcview_file_load_data,
but short of moving the distinction between UTF-8 and 1-byte from
the display-only stage to the data-loading stage, I can't see how to fix it.
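One way to handle the boundary at the loading stage, sketched here in Python under the assumption that blocks arrive in order, is to hold back an unfinished trailing sequence and prepend it to the next block. This is not the fix that was committed for this ticket; `incomplete_tail_len` and `decode_chunks_with_carry` are hypothetical names.

```python
def incomplete_tail_len(chunk: bytes) -> int:
    """Return how many trailing bytes form an unfinished UTF-8 sequence."""
    # A UTF-8 sequence is at most 4 bytes, so look back at most 4 bytes.
    for back in range(1, min(4, len(chunk)) + 1):
        b = chunk[-back]
        if b & 0xC0 == 0x80:
            continue  # continuation byte, keep looking for the lead byte
        if b >= 0xF0:
            need = 4
        elif b >= 0xE0:
            need = 3
        elif b >= 0xC0:
            need = 2
        else:
            need = 1  # ASCII
        return back if back < need else 0
    return 0  # only continuation bytes found; leave them to the decoder


def decode_chunks_with_carry(data: bytes, chunk_size: int = 4096) -> str:
    """Decode block by block, carrying unfinished sequences forward."""
    pieces = []
    carry = b""
    for start in range(0, len(data), chunk_size):
        chunk = carry + data[start:start + chunk_size]
        tail = incomplete_tail_len(chunk)
        if tail:
            complete, carry = chunk[:-tail], chunk[-tail:]
        else:
            complete, carry = chunk, b""
        pieces.append(complete.decode("utf-8", errors="replace"))
    if carry:
        # Input ended in the middle of a sequence; decode what is left.
        pieces.append(carry.decode("utf-8", errors="replace"))
    return "".join(pieces)
```

In Python itself, `codecs.getincrementaldecoder("utf-8")` provides the same buffering; the manual version is shown only to make the carry-over step visible.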
comment:10 Changed 15 years ago by mnk
v.1.0 is only for src/viewer/plain.c, but
src/viewer/nroff.c may need that too.
comment:11 Changed 15 years ago by angel_il
branch: 1627_widechar_in_viewer
- 4096 bytes break: every 4096 bytes there's a chance it happens in the middle...
I need to think about this...
comment:12 Changed 15 years ago by angel_il
changeset: 62a8f92ef661d10c6ed4cb79ab765f7d50404e17
comment:13 Changed 15 years ago by angel_il
- Status changed from new to accepted
- Owner set to angel_il
comment:14 Changed 15 years ago by mnk
A good example of floating wrapping can be seen
when viewing ftp://ftp.monash.edu.au/pub/nihongo/radkfile.gz
(after uncompressing it locally).
You'll see that, as you scroll down, the break in the first line moves.
You'll see the 4096 bug there too.
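The ticket does not say what causes the moving break. One possible reading, and it is only an assumption, is that wrap points are computed from whatever offset is at the top of the screen rather than from the start of the long line. The Python sketch below wraps by display cells and shows how the break positions change when the starting offset changes; `cell_width` and `wrap_offsets` are hypothetical names.

```python
import unicodedata
from typing import List


def cell_width(ch: str) -> int:
    """Approximate cell width: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def wrap_offsets(line: str, cols: int) -> List[int]:
    """Return the character index at which each screen row of line starts.

    A double-width character that does not fit in the remaining cells
    of a row is moved to the next row instead of being split.
    """
    starts = [0]
    used = 0
    for i, ch in enumerate(line):
        w = cell_width(ch)
        if used + w > cols:
            starts.append(i)
            used = 0
        used += w
    return starts


line = "語" * 10
from_line_start = wrap_offsets(line, 7)
# Same line, but wrapping restarted one character in (as after a scroll).
from_scrolled_top = [s + 1 for s in wrap_offsets(line[1:], 7)]
```

Here `from_line_start` and `from_scrolled_top` break at different characters, which is the kind of drift described above. Computing the breaks from the start of the line and only then choosing which rows to show keeps them stable.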
comment:15 Changed 15 years ago by mnk
And as for your fix, are you sure zero-width chars
won't be a problem? (OK, I'm not sure which ones those are
or whether they're printable; I'm hoping you do.)
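As a rough guide to which characters are zero-width, here is a Python sketch based on Unicode general categories. It is an approximation for illustration: it treats nonspacing marks (Mn), enclosing marks (Me) and format characters (Cf) as zero-width and makes an exception for the soft hyphen. The rule used by any particular library may differ, and `is_zero_width` and `count_spacing_chars` are hypothetical names.

```python
import unicodedata

ZERO_WIDTH_CATEGORIES = ("Mn", "Me", "Cf")


def is_zero_width(ch: str) -> bool:
    """Guess whether a character takes no cell of its own on screen."""
    if ch == "\u00ad":
        # Soft hyphen is a format character but may be rendered visibly.
        return False
    return unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES


def count_spacing_chars(text: str) -> int:
    """Count the characters that are expected to advance the cursor."""
    return sum(1 for ch in text if not is_zero_width(ch))
```

For example, `e` followed by U+0301 (combining acute accent) is two code points but should advance the cursor by one cell, so a viewer that counts one cell per code point would misalign the rest of the line.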
comment:18 Changed 15 years ago by angel_il
nroff: b0c06ef13fbb559a16218241d3327490d08c2a4d
other known troubles should be fixed in #1730
comment:20 Changed 15 years ago by angel_il
first: 30c6de773e4c26f871b898b4c61efdd2040fb803
g_unichar_iszerowidth: 7a2f89cc7168d936986fde87065c984e9243af27
zerowidth: 14e91ecff3bc5904fd0ebe0e46d17a057c5c80a8
comment:21 Changed 15 years ago by angel_il
first: 9a4b71a0cb198e378d0698fe959ccfccb8dbc7dd
g_unichar_iszerowidth: 393932a06133f45a96fd1da38a6b12f5baf7b2f8
zerowidth: f6b8dc12b5c6d3fba21263872cbe110e8e128fc2
comment:22 Changed 15 years ago by slavazanko
- Votes for changeset changed from andrew_b to andrew_b slavazanko
- severity changed from on review to approved
comment:23 Changed 15 years ago by angel_il
- Status changed from accepted to testing
- Votes for changeset changed from andrew_b slavazanko to commited-master
- Resolution set to fixed
- severity changed from approved to merged
comment:25 Changed 15 years ago by slavazanko
- Status changed from closed to reopened
- Votes for changeset commited-master deleted
- Resolution fixed deleted
- severity changed from merged to no branch
src/glibcompat.c has incorrect code.
comment:26 Changed 15 years ago by slavazanko
- Status changed from reopened to accepted
- Owner changed from angel_il to slavazanko
- severity changed from no branch to on review
Created branch 1627_glib_macros_fix
Initial changeset:7f056d01edf85b0790ed0cfe748d24d0ca904e18
Review, please.
comment:27 Changed 15 years ago by slavazanko
Branch rebased: 35f90097d284b9a733764f09992f8442af9baa95
comment:29 Changed 15 years ago by andrew_b
- Votes for changeset changed from angel_il to angel_il andrew_b
- severity changed from on review to approved
comment:30 Changed 15 years ago by slavazanko
- Status changed from accepted to testing
- Votes for changeset changed from angel_il andrew_b to commited-master
- Resolution set to fixed
- severity changed from approved to merged