Ticket #3883 (closed defect: fixed)
size_trunc_sep() breaks on non-ASCII locales
Reported by: | slyfox | Owned by: | andrew_b |
---|---|---|---|
Priority: | major | Milestone: | 4.8.21 |
Component: | mc-core | Version: | 4.8.20 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Branch state: | merged | Votes for changeset: | committed-master |
Description
When mc is ran on ru_RU.UTF-8 locale it renders incorrectly
summary of file sizes for selected files:
2,478,873,80 ,К�,�Б в 1 файле
That happens because https://midnight-commander.org/browser/lib/util.c#L386
does not account for non-ASCII symbols and prepaturely stops skipping non-numeric leftover:
386 while (p >= y && (isalpha ((unsigned char) *p) || (unsigned char) *p == ' ')) 387 *d-- = *p--;
The following workaround makes rendering work again:
diff --git a/lib/util.c b/lib/util.c index 0326f65e8..8cf6ab6a9 100644 --- a/lib/util.c +++ b/lib/util.c @@ -385,3 +385,3 @@ size_trunc_sep (uintmax_t size, gboolean use_si) *d-- = '\0'; - while (p >= y && (isalpha ((unsigned char) *p) || (unsigned char) *p == ' ')) + while (p >= y && (isalpha ((unsigned char) *p) || (unsigned char) *p == ' ') || (((unsigned char) *p & 0x80) == 0x80)) *d-- = *p--;
Change History
comment:2 follow-up: ↓ 6 Changed 7 years ago by zaytsev
So, if I understand the code correctly, this checks if the character could be the 2+ byte of the UTF-8 sequence. Clearly, this is a hack, but maybe an acceptable one? Otherwise, we can of course check if the display locale is UTF-8 and use glib functions to check for valid characters, but it will make the code only more complicated and slow.
I think a passable solution could be (display_encoding == UTF-8 && 0x80), so proposed patch + extra condition that the display locale is UTF-8, plus a comment in the source code. Andrew, what do you think?
comment:3 Changed 7 years ago by slyfox
Perhaps nicer fix would be to change size_trunc() to return length of localised suffix as well.
That way size_trunc_sep() would know to copy it as-is without mangling it with commas.
comment:5 Changed 7 years ago by andrew_b
The format of file size if "size unit", i.e. "[digits][space][letters]". I propose copy all characters after digits:
- while (p >= y && (isalpha ((unsigned char) *p) || (unsigned char) *p == ' ')) + while (p >= y && !isdigit ((unsigned char) *p))
comment:6 in reply to: ↑ 2 Changed 7 years ago by andrew_b
Replying to zaytsev:
we can of course check if the display locale is UTF-8 and use glib functions to check for valid characters, but it will make the code only more complicated and slow.
We already have some functions in lib/strutil (str_prev_char_safe & Co) that make the code locale independent.
comment:7 Changed 7 years ago by andrew_b
- Status changed from new to accepted
- Owner set to andrew_b
- Version changed from master to 4.8.19
comment:8 Changed 7 years ago by andrew_b
[a0c6dd7bb6da0fcf75b4fda358dd7709527fc177] is the first bad commit.
comment:9 follow-up: ↓ 10 Changed 7 years ago by andrew_b
- Branch state changed from no branch to on review
Branch: 3883_size_trunc_sep
changeset:069306eb2e7174f431bd3eecbc8f489f9652b782
comment:10 in reply to: ↑ 9 Changed 7 years ago by slyfox
Replying to andrew_b:
Branch: 3883_size_trunc_sep
changeset:069306eb2e7174f431bd3eecbc8f489f9652b782
That fixes issue for me. Thanks!
comment:11 Changed 7 years ago by zaytsev
- Votes for changeset set to slyfox zaytsev
- Version changed from 4.8.19 to 4.8.20
- Branch state changed from on review to approved
comment:12 Changed 7 years ago by andrew_b
- Status changed from accepted to testing
- Votes for changeset changed from slyfox zaytsev to committed-master
- Resolution set to fixed
- Branch state changed from approved to merged
Merged to master: [37dcdf65b3ad0cf968771cad0d59a44717d99f13].