Ticket #1611 (accepted defect) — at Version 4

Opened 15 years ago

Last modified 15 years ago

--enable-charset by default

Reported by: egmont
Owned by: iNode
Priority: major
Milestone: 4.7.0-pre3
Component: mc-core
Version: master
Keywords:
Cc:
Blocked By:
Blocking:
Branch state:
Votes for changeset:

Description (last modified by iNode)

viewer/editor not doing utf-8

I'm running mc-4.7.0-pre2 with UTF-8 everywhere: the locale is set to UTF-8, and mc's Display bits setting is UTF-8 too. The main screen is fine.

mc's builtin viewer and editor, though, still use some 8-bit character set, so accents don't appear correctly.

This is a serious regression from 4.6.x+utf8 patches where the viewer and editor had reasonably good UTF-8 support.

Change History

comment:1 in reply to: ↑ description Changed 15 years ago by andrew_b

Replying to egmont:

mc's builtin viewer and editor, though, still use some 8-bit character set, so accents don't appear correctly.

What do you mean?

comment:2 Changed 15 years ago by egmont

Sorry, I wasn't clear. They do assume UTF-8 when communicating with the terminal, but they assume an ISO-8859-whatever encoding for the content of the file. So the text appears double-encoded ("double UTF-8"): every accented letter is replaced by two symbols.

For example, the file contains (in UTF-8 encoding): áéõûőű
What I see in mcview/mcedit: áéõûÅ<ű
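The "two symbols per letter" effect can be reproduced outside mc. The following Python sketch is not mc code; it assumes, as the later comments suggest, that the 8-bit charset applied to file content is Latin-1 (ISO-8859-1). Each two-byte UTF-8 sequence is read back as two Latin-1 characters. Note that the second byte of ő (0x91) is a C1 control character, so how it appears on screen depends on the viewer and terminal.

```python
# Sample text from the report, as stored on disk in UTF-8.
original = "áéõûőű"
raw = original.encode("utf-8")

# Assumption: each byte is treated as one Latin-1 character,
# and the result is then shown on a UTF-8 terminal.
misread = raw.decode("latin-1")

print(len(original), len(raw), len(misread))  # 6 12 12

# Show how each accented letter turns into two characters.
for ch in original:
    print(ch, "->", repr(ch.encode("utf-8").decode("latin-1")))

# Decoding the same bytes with the correct charset restores the text.
assert raw.decode("utf-8") == original
```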

Is it supposed to work correctly? I'd be more than happy to hear that it's already implemented and that it's just something unusual in my environment. Maybe I forgot an option to ./configure, or forgot to change some setting? Any idea? I'm eager to figure it out.

Thanks!

comment:3 Changed 15 years ago by egmont

Hah, building with --enable-charset lets you choose the charset of the file, including "No translation" and UTF-8. With that, it works fine.

Without charset support, the default is "no translation" for filenames, but "latin1" for file content. This doesn't sound logical to me. I think the behavior should be "no translation" for file contents too.
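The argument can be illustrated with a small model. The Python sketch below is hypothetical and does not reflect mc's internals: `to_display` is an invented helper that converts file bytes from a source charset to the terminal's display charset, and a source of `None` stands for "No translation" (bytes passed through unchanged). With a UTF-8 display, "No translation" shows a UTF-8 file correctly, while a Latin-1 source produces the double-encoded output described above.

```python
from typing import Optional


def to_display(content: bytes, source: Optional[str], display: str = "utf-8") -> bytes:
    """Hypothetical model of file-content translation for display.

    source=None models "No translation": the bytes go to the terminal as-is.
    Otherwise the bytes are decoded with `source` and re-encoded for the
    terminal's `display` charset.
    """
    if source is None:
        return content
    # errors="replace": not every character exists in every display charset.
    return content.decode(source).encode(display, errors="replace")


data = "áéõûőű".encode("utf-8")

# "No translation": the terminal receives valid UTF-8 and shows the text correctly.
print(to_display(data, None).decode("utf-8"))

# Latin-1 source: every UTF-8 byte becomes its own character and is encoded again.
doubled = to_display(data, "latin-1")
print(len(data), len(doubled))  # 12 24
```

Passing bytes through untouched is only correct when the file's encoding matches the terminal's, which is the fully UTF-8 case this report is concerned with.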

Or, alternatively, charset support should be turned on by default.

Nowadays more and more distributions use UTF-8 by default, and it's the recommended encoding for everything: filenames, file content, etc. Imagine thousands of users downloading and installing mc-4.7 just as I did and finding that file contents are not displayed correctly. Imagine tons of stupid bug reports just like this one :) You don't want that, and users don't want that either. The default behavior (the simplest way of compiling and running mc) should provide proper support for fully UTF-8 systems.

I've got some other, similar philosophical issues; I'll file separate reports for them.

Overall, however, mc's forthcoming official UTF-8 support looks super great, HUGE THANKS to everyone involved!!!

comment:4 Changed 15 years ago by iNode

  • Owner set to iNode
  • Status changed from new to accepted
  • Description modified
  • Summary changed from viewer/editor not doing utf-8 to --enable-charset by default

Yes, egmont, you are right.
I also propose enabling --enable-charset by default.
