Ticket #1978 (new defect)
case insensitive sort puts entries in panel in unexpected order
Reported by: | yury_t | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | Future Releases |
Component: | mc-core | Version: | 4.7.0.1 |
Keywords: | | Cc: | gotar@… |
Blocked By: | | Blocking: | |
Branch state: | no branch | Votes for changeset: | |
Description
If "Case sensitive" is unchecked in "Sort options", entries are sorted not only without regard to case, but also with the leading non-alphabetic characters ignored.
Reproducing:
- In some directory, create a directory named +++b
- In the same directory, create a directory named a
- With "case sensitive" ON (and "reverse" OFF), the entries are listed in this order: +++b, then a. That is the correct order.
- With "case sensitive" OFF, the entries are listed in this order: a, then +++b. So the procedure seems to ignore the leading non-alphabetic characters, which is INCORRECT.
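The steps above can be approximated outside MC with a small sketch. It assumes nothing about MC's own code: `sort_under_locale` and `by_collation` are hypothetical names, and the sketch only sorts a list of names with strcoll(3) under a chosen LC_COLLATE value so that the two orderings can be compared.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort callback: compare two names with the current LC_COLLATE rules. */
static int by_collation(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return strcoll(a, b);
}

/* Hypothetical helper: sort names in place under the given collation
   locale. Returns 0 on success, -1 if the locale is not available. */
int sort_under_locale(const char **names, size_t count, const char *locale)
{
    if (setlocale(LC_COLLATE, locale) == NULL)
        return -1;
    qsort(names, count, sizeof names[0], by_collation);
    return 0;
}
```

Called with "C" as the locale, the comparison is byte-wise and +++b comes first. Passing a UTF-8 locale name instead (if it is installed) shows whatever order that locale's collation table produces.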
Change History
comment:2 Changed 15 years ago by yury_t
The locale is be_BY.UTF-8. However, I don't think the locale is to blame.
comment:3 follow-up: ↓ 4 Changed 15 years ago by andrew_b
I see the same in ru_RU.KOI8-R.
The sort procedure doesn't ignore anything. In general, MC uses strcmp(3) for case sensitive sort (filenames are converted to lower case before comparison).
In case insensitive sort, strcoll(3) is used, and filenames are used as is without any modifications. Therefore the sort result depends on the strcoll() return value.
comment:4 in reply to: ↑ 3 Changed 15 years ago by andrew_b
Replying to andrew_b:
In general, MC uses strcmp(3) for case sensitive sort (filenames are converted to lower case before comparison).
In case insensitive sort, strcoll(3) is used, and filenames are used as is without any modifications. Therefore the sort result depends on the strcoll() return value.
Sorry, I was wrong.
In case sensitive sort, strcmp(3) is used, and the native filenames are used.
In case insensitive sort, strcoll(3) is used, and filenames are converted to lower case before comparison.
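The two modes described in this correction can be sketched roughly as follows. This is not MC's actual code: the function names are hypothetical, and the lower-casing here is a plain single-byte tolower(3) loop, which is an assumption made to keep the example short (it does not handle multibyte UTF-8 characters).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-sensitive mode: compare the names exactly as stored. */
int cmp_case_sensitive(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Return a newly allocated lower-cased copy (single-byte tolower only). */
static char *lower_copy(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i <= n; i++)   /* i == n copies the terminator */
        out[i] = (char)tolower((unsigned char)s[i]);
    return out;
}

/* Case-insensitive mode: lower-case both names, then collate them.
   Falls back to strcmp on the originals if allocation fails. */
int cmp_case_insensitive(const char *a, const char *b)
{
    char *la = lower_copy(a);
    char *lb = lower_copy(b);
    int r = (la != NULL && lb != NULL) ? strcoll(la, lb) : strcmp(a, b);
    free(la);
    free(lb);
    return r;
}
```

The point of the sketch is that the two modes differ in more than case folding: only the case-insensitive path goes through strcoll(3), so only that path is affected by the collation table of the current locale.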
comment:5 Changed 15 years ago by yury_t
So, what's the final word on this? Will MC sort the entries correctly with case insensitivity on?
comment:6 Changed 15 years ago by yury_t
I suppose this incorrect behavior of strcoll(3) may be attributed to the collation tables in UTF-8 locales in newer distributions of glibc (e.g., as packaged with Slackware 13.0 but not with Slackware 12.2).
comment:7 Changed 15 years ago by yury_t
Ultimately, yes, this is indeed caused by the incorrect collation tables in the UTF-8 locales. Running MC with LC_COLLATE=C produces the correct sort order, at least for ASCII names.
Could the case insensitive sort be made to use strcmp with tolower or something?
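A minimal sketch of what such a comparison might look like, assuming only ASCII letters need folding. `ascii_casecmp` is a hypothetical name, not an MC or libc function; it folds A-Z by hand so that the result does not depend on LC_COLLATE or LC_CTYPE at all.

```c
/* Fold an ASCII upper-case letter to lower case; leave other bytes alone. */
static int ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* Hypothetical locale-independent, case-insensitive comparison.
   Returns <0, 0 or >0, like strcmp(3). Non-ASCII bytes are compared
   by value, so names outside ASCII are not folded. */
int ascii_casecmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = ascii_fold((unsigned char)*a);
        int cb = ascii_fold((unsigned char)*b);
        if (ca != cb)
            return ca - cb;
        if (ca == 0)
            return 0;
    }
}
```

With this comparison, +++b sorts before a because '+' has a lower byte value than any letter. The obvious trade-off is that non-ASCII names get no case folding and no locale-aware ordering.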
comment:8 follow-up: ↓ 9 Changed 14 years ago by wayfarer
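In version 2.8 of GLib, a new function "g_utf8_collate_key_for_filename" was introduced, exactly for this purpose - to have correct sorting of filenames:
http://library.gnome.org/devel/glib/unstable/glib-Unicode-Manipulation.html#g-utf8-collate-key-for-filename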
Unfortunately, it still doesn't consider all special chars, but at least dots are handled properly - and that's a big part of the problem. So maybe it is worth using?
It's really the collation table that should be blamed: as far as I understand, in all locales special symbols are just ignored during collation. In the C locale it works right, because there the collation order is just the same as the order of the standard ASCII codes.
OTOH, ignoring special chars is probably the right solution for general-purpose sorting. But sorting file names has its own peculiarities. So IMHO using the above-mentioned function and asking the GLib guys to extend it to deal with other chars would be a better solution than trying to fix the collation tables.
comment:9 in reply to: ↑ 8 ; follow-up: ↓ 10 Changed 14 years ago by andrew_b
Replying to wayfarer:
In version 2.8 of GLib, a new function "g_utf8_collate_key_for_filename" was introduced, exactly for this purpose - to have correct sorting of filenames:
http://library.gnome.org/devel/glib/unstable/glib-Unicode-Manipulation.html#g-utf8-collate-key-for-filename
Unfortunately, it still doesn't consider all special chars, but at least dots are handled properly - and that's a big part of the problem. So maybe it is worth using?
Unfortunately, g_utf8_collate_key_for_filename() doesn't provide the correct sorting order. See #1536. That's why we don't use this function.
comment:10 in reply to: ↑ 9 Changed 14 years ago by wayfarer
Replying to andrew_b:
Unfortunately, g_utf8_collate_key_for_filename() doesn't provide the correct sorting order. See #1536. That's why we don't use this function.
Hm, but they did that trick with numbers intentionally. I think different opinions about what the "correct sorting order" is are possible here.
But the root of all these problems, I suppose, is that "case sensitive" and "case insensitive" sorting in MC are inherently inconsistent, since different methods are used. Is it possible to use the same method, whether string comparison or collation, in both "sensitive" and "insensitive" modes? Ideally, the exact behavior could be selected using MC settings/config params.
Because the solution provided in #1894 seems just like a workaround for one specific case.
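One way to picture this suggestion is the sketch below. Everything in it is hypothetical: `sort_config`, `cmp_method` and `compare_names` are made-up names, not MC settings or functions, and case folding is limited to ASCII for brevity. It only illustrates keeping the comparison method and the case handling as two independent choices.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum { METHOD_BYTES, METHOD_COLLATE } cmp_method;

typedef struct {
    cmp_method method;    /* hypothetical setting: strcmp or strcoll */
    bool case_sensitive;  /* hypothetical setting: fold case first or not */
} sort_config;

/* Duplicate s, folding ASCII A-Z to a-z when fold is true. */
static char *prepare(const char *s, bool fold)
{
    size_t n = strlen(s) + 1;
    char *out = malloc(n);
    if (out == NULL)
        return NULL;
    memcpy(out, s, n);
    for (char *p = out; fold && *p != '\0'; p++)
        if (*p >= 'A' && *p <= 'Z')
            *p = (char)(*p - 'A' + 'a');
    return out;
}

/* Compare two names: case handling and method are chosen independently. */
int compare_names(const char *a, const char *b, const sort_config *cfg)
{
    char *ka = prepare(a, !cfg->case_sensitive);
    char *kb = prepare(b, !cfg->case_sensitive);
    int r;
    if (ka == NULL || kb == NULL)
        r = strcmp(a, b);               /* out of memory: fall back */
    else if (cfg->method == METHOD_COLLATE)
        r = strcoll(ka, kb);
    else
        r = strcmp(ka, kb);
    free(ka);
    free(kb);
    return r;
}
```

With METHOD_BYTES, toggling case sensitivity changes only how letters compare with each other, so +++b stays ahead of a in both modes.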
comment:13 Changed 9 years ago by andrew_b
- Branch state set to no branch
- Milestone changed from 4.8 to Future Releases
comment:14 Changed 22 months ago by flocsy
13 years later...
I use mc 4.8.28 on macOS (13.1 Ventura) installed from Homebrew, and the case-insensitive sort is still "random", even for plain ASCII English letters like "a" and "c".
For me, even setting LC_COLLATE=C didn't help.
comment:15 Changed 14 months ago by mjosifek
I use the brew package (4.8.30) on macOS 13.5.2 and I can confirm that the sort order is "random" if case-insensitive is enabled. E.g., filenames starting with c or C are always first. LANG=en_US.UTF-8
comment:16 Changed 5 months ago by vfxsup
I use version 4.8.31 on MacOS 14.4.1 and I'm experiencing the same problem as the previous commenter. "Creative Cloud Files" comes before "Applications" (followed by "Desktop") when case-sensitive sort is disabled.
This is a pretty major defect in my opinion.
comment:17 Changed 5 months ago by zaytsev
So, Andrew, maybe we could indeed lowercase stuff or something like that? What do you think?
What is your locale?