Ticket #1978 (new defect)

Opened 14 years ago

Last modified 7 months ago

case insensitive sort puts entries in panel in unexpected order

Reported by: yury_t Owned by:
Priority: major Milestone: Future Releases
Component: mc-core Version: 4.7.0.1
Keywords: Cc: gotar@…
Blocked By: Blocking:
Branch state: no branch Votes for changeset:

Description

If "Case sensitive" is unchecked in the "Sort options" dialog, entries are sorted not only without regard to case, but also as if the leading non-alphabetic characters were discarded.

Reproducing:

  1. In some dir, make dir +++b
  2. In the same dir, make dir a

3.1. With "case sensitive" ON (and "reverse" OFF), the entries are listed in this order: +++b, then a. That is the correct order.
3.2. With "case sensitive" OFF, the entries are listed in this order: a, then +++b. So the procedure seems to ignore the leading non-alphabetic characters, which is INCORRECT.

Change History

comment:1 Changed 14 years ago by andrew_b

What is your locale?

comment:2 Changed 14 years ago by yury_t

The locale is be_BY.UTF-8. However, I don't think the locale is to blame.

comment:3 follow-up: ↓ 4 Changed 14 years ago by andrew_b

I see the same in ru_RU.KOI8-R.

The sort procedure doesn't ignore anything. In general, MC uses strcmp(3) for case sensitive sort (filenames are converted to lower case before comparison).

In case insensitive sort, strcoll(3) is used, and filenames are used as is, without any modifications. Therefore the sort result depends on the strcoll() return value.

comment:4 in reply to: ↑ 3 Changed 14 years ago by andrew_b

Replying to andrew_b:

In general, MC uses strcmp(3) for case sensitive sort (filenames are converted to lower case before comparison).

In case insensitive sort, strcoll(3) is used, and filenames are used as is, without any modifications. Therefore the sort result depends on the strcoll() return value.

Sorry, I was wrong.

In case sensitive sort, strcmp(3) is used, and native filenames are used.

In case insensitive sort, strcoll(3) is used, and filenames are converted to lower case before comparison.
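
The two modes described above can be sketched roughly as follows. This is only an illustration of the described behaviour, not MC's actual code: the function names lower_copy, cmp_case_sensitive and cmp_case_insensitive are hypothetical, and the lower-casing here is byte-wise, so it is only meaningful for single-byte characters.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: newly allocated, byte-wise lower-cased copy of s.
   The caller frees the result. Returns NULL if allocation fails. */
char *lower_copy(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (char) tolower((unsigned char) s[i]);
    out[n] = '\0';
    return out;
}

/* Case sensitive mode: plain byte comparison of the native names. */
int cmp_case_sensitive(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Case insensitive mode: lower-case both names, then let strcoll()
   decide. The result therefore depends on the current LC_COLLATE. */
int cmp_case_insensitive(const char *a, const char *b)
{
    char *la = lower_copy(a);
    char *lb = lower_copy(b);
    /* Fall back to strcmp() on the originals if allocation failed. */
    int r = (la != NULL && lb != NULL) ? strcoll(la, lb) : strcmp(a, b);
    free(la);
    free(lb);
    return r;
}
```

Because only the second function goes through strcoll(), the two modes can disagree on more than letter case: whatever the active collation table does with characters such as "+" only affects the case insensitive mode.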

comment:5 Changed 14 years ago by yury_t

So, what's the final word on this? Will MC sort the entries correctly with case insensitivity on?

comment:6 Changed 14 years ago by yury_t

I suppose this incorrect behaviour of strcoll(3) may be attributed to the collation tables in UTF-8 locales in newer distributions of glibc (e.g., as packaged with Slackware 13.0 but not with Slackware 12.2).

comment:7 Changed 14 years ago by yury_t

Ultimately, yes, this is indeed caused by the incorrect collation tables in the UTF-8 locales. Running MC with LC_COLLATE=C produces the correct sort order, at least for ASCII names.

Could the case insensitive sort be made to use strcmp with tolower or something?
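
A minimal sketch of what such a comparison could look like, assuming only ASCII letters need case folding. The names ascii_lower, ascii_casecmp, name_cmp and sort_names are hypothetical and are not taken from MC. The folding is done by hand rather than with tolower(3) so that the result does not depend on any locale category.

```c
#include <stdlib.h>
#include <string.h>

/* Fold ASCII upper-case letters to lower case; leave all other bytes alone. */
static int ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Byte-wise comparison after ASCII case folding. LC_COLLATE is never
   consulted, so "+" (0x2B) always sorts before "a" (0x61). */
int ascii_casecmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *) a;
    const unsigned char *pb = (const unsigned char *) b;

    while (*pa != '\0' && ascii_lower(*pa) == ascii_lower(*pb)) {
        pa++;
        pb++;
    }
    return ascii_lower(*pa) - ascii_lower(*pb);
}

/* qsort() adapter for an array of C strings. Names that differ only in
   case are ordered by strcmp() so the result is deterministic. */
int name_cmp(const void *x, const void *y)
{
    const char *a = *(const char *const *) x;
    const char *b = *(const char *const *) y;
    int r = ascii_casecmp(a, b);
    return r != 0 ? r : strcmp(a, b);
}

/* Sort an array of n names in place. */
void sort_names(const char **names, size_t n)
{
    qsort(names, n, sizeof names[0], name_cmp);
}
```

With the two directories from the description, sort_names() puts +++b before a regardless of the locale. The obvious trade-off is that non-ASCII letters are compared by raw byte value, so names in Cyrillic or other scripts would not be case folded or ordered alphabetically.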

comment:8 follow-up: ↓ 9 Changed 14 years ago by wayfarer

In version 2.8 of GLib new function "g_utf8_collate_key_for_filename" was introduced, exactly for this purpose - to have correct sorting of filenames:

http://library.gnome.org/devel/glib/unstable/glib-Unicode-Manipulation.html#g-utf8-collate-key-for-filename

Unfortunately, it still doesn't consider all special chars, but at least dots are handled properly - and it's a big part of a problem. So maybe it is worth using?

It's really the collation table that should be blamed: as far as I understand, in all locales special symbols are just ignored during collation. In the C locale it works right, because there the collation order is just the same as the order of standard ASCII codes.

OTOH, ignoring special chars is probably the right solution for general-purpose sorting. But sorting file names has its own peculiarities. So IMHO using the above-mentioned function and asking the GLib guys to extend it to deal with other chars would be a better solution than trying to fix the collation tables.
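
The point about the C locale can be checked with a small helper like the one below. It is a sketch only: collate_sign_in is a hypothetical name, and which locale names are installed on a given system is an assumption the caller has to verify (the helper returns 2 when setlocale(3) rejects the name).

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: compare a and b with strcoll() under the given
   LC_COLLATE locale. Returns -1, 0 or 1 for the sign of the result,
   or 2 if the locale is not available on this system. */
int collate_sign_in(const char *locale_name, const char *a, const char *b)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 2;

    int r = strcoll(a, b);

    /* Reset LC_COLLATE to "C" (not to whatever was active before). */
    setlocale(LC_COLLATE, "C");

    return (r > 0) - (r < 0);
}
```

Calling collate_sign_in("C", "+++b", "a") gives -1, since in the C locale strcoll() orders by byte value. Calling it with one of the UTF-8 locales mentioned in this ticket may give a different sign, depending on the collation tables shipped with the C library.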

comment:9 in reply to: ↑ 8 ; follow-up: ↓ 10 Changed 14 years ago by andrew_b

Replying to wayfarer:

In version 2.8 of GLib new function "g_utf8_collate_key_for_filename" was introduced, exactly for this purpose - to have correct sorting of filenames:
http://library.gnome.org/devel/glib/unstable/glib-Unicode-Manipulation.html#g-utf8-collate-key-for-filename

Unfortunately, it still doesn't consider all special chars, but at least dots are handled properly - and it's a big part of a problem. So maybe it is worth using?

Unfortunately, g_utf8_collate_key_for_filename() doesn't provide the correct sorting order. Look at the #1536. That's why we don't use this function.

comment:10 in reply to: ↑ 9 Changed 14 years ago by wayfarer

Replying to andrew_b:

Unfortunately, g_utf8_collate_key_for_filename() doesn't provide the correct sorting order. Look at the #1536. That's why we don't use this function.

Hm, but they did that trick with numbers intentionally. I think different opinions about what the "correct sorting order" is are possible here.

But the root of all these problems, I suppose, is that "case sensitive" and "case insensitive" sorting in MC are inherently inconsistent, since different methods are used. Is it possible to use the same method, whether string comparison or collation, in both "sensitive" and "insensitive" modes? Ideally, the exact behavior could be selected using MC settings/config params.

Because the solution provided in #1894 seems like just a workaround for one specific case.

comment:11 Changed 14 years ago by gotar

  • Cc gotar@… added

comment:12 Changed 13 years ago by andrew_b

  • Milestone changed from 4.7.1 to 4.8

comment:13 Changed 9 years ago by andrew_b

  • Branch state set to no branch
  • Milestone changed from 4.8 to Future Releases

comment:14 Changed 15 months ago by flocsy

13 years later...

I use mc 4.8.28 on macOS (13.1 Ventura) installed from Homebrew, and the case-insensitive sort is still "random", even for ASCII and even for English letters like "a" and "c".

For me, even setting LC_COLLATE=C didn't help.

comment:15 Changed 7 months ago by mjosifek

I use the brew package (4.8.30) on macOS 13.5.2 and I can confirm that the sort order is "random" if case-insensitive is enabled. E.g., filenames starting with c or C are always first. LANG=en_US.UTF-8

Last edited 7 months ago by mjosifek