Ticket #2386 (closed defect: fixed)

Opened 8 years ago

Last modified 6 years ago

Interpretation of LANG variable needs to be case insensitive.

Reported by: urkle
Owned by: andrew_b
Priority: major
Milestone: 4.8.3
Component: mc-core
Version: 4.7.4
Keywords:
Cc:
Blocked By:
Blocking:
Branch state: merged
Votes for changeset: committed-master committed-stable

Description

Related bug in iTerm 2

http://code.google.com/p/iterm2/issues/detail?id=204

When the LANG variable is set to en_US.utf-8, mcedit in particular does not accept input correctly (every key press is interpreted as a '.'). However, when LANG is set to en_US.UTF-8, mcedit works correctly.

Work on the iTerm 2 bug revealed that the problem actually lies in Midnight Commander, which does not handle the LANG and LC_* environment variables correctly.

From the IANA document on character sets:

The character set names may be up to 40 characters taken from the
printable characters of US-ASCII. However, no distinction is made
between use of upper and lower case letters.

http://www.iana.org/assignments/character-sets
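The IANA rule quoted above means that two charset names should be compared without regard to letter case. A minimal C sketch, assuming names are plain ASCII as IANA requires; `charset_names_equal` is a hypothetical helper for illustration, not an MC function:

```c
#include <stdbool.h>

/* Lower-case one ASCII letter without consulting the current locale
 * (IANA charset names are restricted to printable US-ASCII). */
static char ascii_tolower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
}

/* Hypothetical helper: true when two IANA charset names are equal
 * ignoring ASCII case, e.g. "utf-8" and "UTF-8". */
bool charset_names_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (ascii_tolower(*a) != ascii_tolower(*b))
            return false;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

The C library's `strcasecmp` could be used instead, but its result is only fully specified in the POSIX locale; GLib's `g_ascii_strcasecmp` is the locale-independent equivalent of the explicit ASCII folding shown here.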

Attachments

utf8.cc (2.1 KB) - added by urkle 8 years ago.
UTF test script

Change History

comment:1 Changed 8 years ago by andrew_b

MC doesn't directly interpret the LC_* and LANG variables. It detects the encoding using nl_langinfo (CODESET).

I cannot reproduce this bug on Linux. With LANG set to either ru_RU.UTF-8 or ru_RU.utf-8, the locale is recognized as UTF-8 and MC works fine for me.

I can't find the relevant MC details at http://code.google.com/p/iterm2/issues/detail?id=204: MC version, GLib version, and which screen library MC is built with (S-Lang or NCurses).
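The detection mechanism described in this comment can be sketched in C: a program first adopts the locale from LANG / LC_* with `setlocale`, after which `nl_langinfo(CODESET)` reports the codeset name. This is a minimal illustration, not MC's actual code; `current_codeset` is a hypothetical name, and nothing here guarantees the letter case of the returned name.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: initialise the locale from the environment and
 * return the codeset name reported by the C library. The case of the
 * result is platform-dependent. */
const char *current_codeset(void)
{
    /* Without this call the program stays in the "C" locale and
     * nl_langinfo(CODESET) ignores LANG / LC_* entirely. */
    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "warning: locale from environment not supported\n");

    return nl_langinfo(CODESET);
}
```

The returned string belongs to the C library and may be overwritten by later `nl_langinfo` or `setlocale` calls, so copy it if it must persist.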

comment:2 Changed 8 years ago by urkle

The MC version has been 4.7.x (first noticed with 4.7.0.3; currently using 4.7.4).

glib2 version is 2.22.4
MC is currently built with S-Lang (the issue also occurred when built with NCurses).

This is on Mac OS X 10.6.4.

The issue is NOT specific to iTerm either: the standard Mac OS X Terminal exhibits the same behavior if LANG is set with a lower-case utf-8 (its default is upper case, though).

BTW, I can't recreate it on my Linux box either, only on the Mac system.

Changed 8 years ago by urkle

UTF test script

comment:3 Changed 8 years ago by urkle

I attached a test C++ program that I originally wrote for a different purpose, but it does show some oddities in how Mac OS X and Linux report information about the character set.

Specifically, the nl_langinfo(CODESET) call.

On Linux it ALWAYS returns upper-case UTF-8, whether LANG is set to utf-8 or UTF-8.
On Mac OS X, it returns the same case as the LANG input.
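Given that difference, any code that tested the codeset with an exact, case-sensitive match against "UTF-8" would succeed on Linux but fail on Mac OS X when LANG uses lower case. A minimal C sketch of a case-insensitive check follows; `is_utf8_codeset` and its list of accepted spellings are hypothetical and are not taken from the changeset that fixed this ticket.

```c
#include <stdbool.h>
#include <stddef.h>

/* Spellings treated as UTF-8 here (an assumption for illustration;
 * the list MC itself uses may differ). Matching ignores ASCII case. */
static const char *const utf8_names[] = { "utf-8", "utf8", NULL };

/* Compare two strings ignoring ASCII case, independent of the locale. */
static bool ascii_caseeq(const char *a, const char *b)
{
    for (;; a++, b++) {
        char ca = (*a >= 'A' && *a <= 'Z') ? (char) (*a - 'A' + 'a') : *a;
        char cb = (*b >= 'A' && *b <= 'Z') ? (char) (*b - 'A' + 'a') : *b;
        if (ca != cb)
            return false;
        if (ca == '\0')
            return true; /* both strings ended together */
    }
}

/* True for "UTF-8" (as glibc reports it) and for "utf-8" (as Mac OS X
 * echoes back from LANG=en_US.utf-8). */
bool is_utf8_codeset(const char *codeset)
{
    for (size_t i = 0; utf8_names[i] != NULL; i++)
        if (ascii_caseeq(codeset, utf8_names[i]))
            return true;
    return false;
}
```

Matching against a short list rather than a single string also tolerates the hyphen-less spelling "utf8", which appears in locale names such as en_US.utf8 on many Linux systems.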

comment:4 Changed 7 years ago by andrew_b

  • Branch state set to no branch
  • Milestone changed from 4.7 to Future Releases

comment:5 Changed 6 years ago by andrew_b

  • Owner set to andrew_b
  • Status changed from new to accepted
  • Component changed from mcedit to mc-core
  • Branch state changed from no branch to on review
  • Milestone changed from Future Releases to 4.8.3

Branch: 2386_LANG_case_insensitive (parent: master).
changeset:c45e5a67123f6c483a4032a7130042295a273254

urkle, please test this fix.

comment:6 Changed 6 years ago by slavazanko

  • Votes for changeset set to slavazanko

comment:7 Changed 6 years ago by angel_il

  • Votes for changeset changed from slavazanko to slavazanko angel_il
  • Branch state changed from on review to approved

comment:8 Changed 6 years ago by andrew_b

  • Status changed from accepted to testing
  • Keywords stable-candidate added
  • Votes for changeset changed from slavazanko angel_il to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged

comment:9 Changed 6 years ago by andrew_b

  • Keywords stable-candidate removed
  • Status changed from testing to closed
  • Votes for changeset changed from committed-master to committed-master committed-stable