Ticket #1611 (accepted defect) — at Version 4
--enable-charset by default
Reported by: | egmont | Owned by: | iNode |
---|---|---|---|
Priority: | major | Milestone: | 4.7.0-pre3 |
Component: | mc-core | Version: | master |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Branch state: | Votes for changeset: |
Description (last modified by iNode)
viewer/editor not doing utf-8
I'm running mc-4.7.0-pre2 with UTF-8 everywhere. The locale is set to UTF-8, and mc's "Display bits" setting is UTF-8 too. The main screen is fine.
mc's builtin viewer and editor, though, still use some 8-bit character set, so accents don't appear correctly.
This is a serious regression from 4.6.x+utf8 patches where the viewer and editor had reasonably good UTF-8 support.
Change History
comment:1 in reply to: ↑ description Changed 15 years ago by andrew_b
comment:2 Changed 15 years ago by egmont
Sorry, I wasn't clear. They do assume UTF-8 when communicating with the terminal, but they assume ISO-8859-whatever encoding for the content of the file. So it appears in "double utf8" encoding, every accented letter replaced by two symbols.
E.g. The file contains (in UTF-8 encoding): áéõûőű
What I see in mcview/mcedit: áéõûÅ<ű
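To make the "double utf8" symptom concrete, here is a minimal Python sketch of what happens when UTF-8 bytes are misread as an 8-bit charset. It assumes ISO-8859-1 (Latin-1) as the misapplied encoding; the ticket only says "ISO-8859-whatever", and this is not mc's actual code path.

```python
# Sketch of the "double UTF-8" effect, assuming Latin-1 is the charset
# wrongly applied to the file content (an assumption for illustration).
original = "áéõûőű"

# The file on disk: each of these six letters takes two bytes in UTF-8.
raw = original.encode("utf-8")
assert len(raw) == 12

# The viewer treats every byte as one Latin-1 character, so each accented
# letter turns into two symbols before being sent to the UTF-8 terminal.
misread = raw.decode("latin-1")
assert len(misread) == 2 * len(original)

# The damage is reversible as long as no byte was dropped along the way.
recovered = misread.encode("latin-1").decode("utf-8")
assert recovered == original

print(repr(misread))
```

Note that the second byte of ő (0x91) maps to a non-printable control character in Latin-1, so what actually shows up on screen for that letter depends on how the viewer renders such characters.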
Is it supposed to work correctly? I'd be more than happy to hear that it's already implemented and that it's just something unusual in my environment. Perhaps I forgot an option to ./configure, or need to change something in the settings? Any idea? I'm eager to figure it out.
Thanks!
comment:3 Changed 15 years ago by egmont
Hah, --enable-charset lets you choose the charset of the file, including "No translation" and UTF-8. Then it works fine.
Without charset support, the default is "no translation" for filenames, but "latin1" for file content. This doesn't sound logical to me. I think the behavior should be "no translation" for file contents too.
Or, alternatively, charset support should be turned on by default.
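The difference between the two defaults can be sketched roughly as follows. This is only a toy model in Python, not how mc implements it: `to_display_bytes` is a hypothetical helper, `None` stands in for "no translation", and Latin-1 stands in for the "latin1" default.

```python
from typing import Optional


def to_display_bytes(raw: bytes, source_charset: Optional[str]) -> bytes:
    """Return the bytes a UTF-8 terminal would receive for file content `raw`.

    Hypothetical helper: `source_charset=None` models "no translation",
    any other value models translating from that charset to UTF-8.
    """
    if source_charset is None:
        # No translation: bytes go to the terminal untouched.
        return raw
    # Translation: interpret the content in the given charset, emit UTF-8.
    return raw.decode(source_charset).encode("utf-8")


utf8_file = "áé".encode("utf-8")
latin1_file = "áé".encode("latin-1")

# On a fully UTF-8 system, "no translation" shows UTF-8 content correctly.
assert to_display_bytes(utf8_file, None) == utf8_file

# A "latin1" default mangles the same content.
assert to_display_bytes(utf8_file, "latin-1") != utf8_file

# The "latin1" default only helps when the file really is Latin-1.
assert to_display_bytes(latin1_file, "latin-1") == "áé".encode("utf-8")
```

Under this model, "no translation" is the only default that leaves a UTF-8 file intact without the user picking a charset, which is the case the comment argues for.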
Nowadays more and more distributions use UTF-8 by default and it's the recommended encoding for everything: filenames, file content etc. Imagine thousands of users downloading and installing mc-4.7 just as I did and figuring out that file contents are not displayed correctly. Imagine tons of stupid bug reports just as this one :) You don't want that, users don't want that either. The default behavior (simplest way of compiling and running mc) should provide proper support for fully UTF-8 systems.
I've got some other similar philosophical corner cases; I'll file separate reports for them.
Overall, however, mc's forthcoming official UTF-8 support looks super great, HUGE THANKS to everyone involved!!!
Replying to egmont:
What do you mean?