Ticket #3883 (closed defect: fixed)

Opened 7 years ago

Last modified 7 years ago

size_trunc_sep() breaks on non-ASCII locales

Reported by: slyfox Owned by: andrew_b
Priority: major Milestone: 4.8.21
Component: mc-core Version: 4.8.20
Keywords: Cc:
Blocked By: Blocking:
Branch state: merged Votes for changeset: committed-master

Description

When mc is ran on ru_RU.UTF-8 locale it renders incorrectly
summary of file sizes for selected files:

2,478,873,80 ,К�,�Б в 1 файле

That happens because https://midnight-commander.org/browser/lib/util.c#L386
does not account for non-ASCII symbols and prepaturely stops skipping non-numeric leftover:

386	    while (p >= y && (isalpha ((unsigned char) *p) || (unsigned char) *p == ' '))
387	        *d-- = *p--;

The following workaround makes rendering work again:

diff --git a/lib/util.c b/lib/util.c
index 0326f65e8..8cf6ab6a9 100644
--- a/lib/util.c
+++ b/lib/util.c
@@ -385,3 +385,3 @@ size_trunc_sep (uintmax_t size, gboolean use_si)
     *d-- = '\0';
-    while (p >= y && (isalpha ((unsigned char) *p) || (unsigned char) *p == ' '))
+    while (p >= y && (isalpha ((unsigned char) *p) || (unsigned char) *p == ' ') || (((unsigned char) *p & 0x80) == 0x80))
         *d-- = *p--;

Change History

comment:1 Changed 7 years ago by zaytsev

  • Milestone changed from Future Releases to 4.8.21

comment:2 follow-up: ↓ 6 Changed 7 years ago by zaytsev

So, if I understand the code correctly, this checks if the character could be the 2+ byte of the UTF-8 sequence. Clearly, this is a hack, but maybe an acceptable one? Otherwise, we can of course check if the display locale is UTF-8 and use glib functions to check for valid characters, but it will make the code only more complicated and slow.

I think a passable solution could be (display_encoding == UTF-8 && 0x80), so proposed patch + extra condition that the display locale is UTF-8, plus a comment in the source code. Andrew, what do you think?

comment:3 Changed 7 years ago by slyfox

Perhaps nicer fix would be to change size_trunc() to return length of localised suffix as well.
That way size_trunc_sep() would know to copy it as-is without mangling it with commas.

comment:4 Changed 7 years ago by zaytsev

Would you be so kind as to devise a patch?

comment:5 Changed 7 years ago by andrew_b

The format of file size if "size unit", i.e. "[digits][space][letters]". I propose copy all characters after digits:

-    while (p >= y && (isalpha ((unsigned char) *p) || (unsigned char) *p == ' '))
+    while (p >= y && !isdigit ((unsigned char) *p))

comment:6 in reply to: ↑ 2 Changed 7 years ago by andrew_b

Replying to zaytsev:

we can of course check if the display locale is UTF-8 and use glib functions to check for valid characters, but it will make the code only more complicated and slow.

We already have some functions in lib/strutil (str_prev_char_safe & Co) that make the code locale independent.

comment:7 Changed 7 years ago by andrew_b

  • Status changed from new to accepted
  • Owner set to andrew_b
  • Version changed from master to 4.8.19

comment:8 Changed 7 years ago by andrew_b

Version 0, edited 7 years ago by andrew_b (next)

comment:9 follow-up: ↓ 10 Changed 7 years ago by andrew_b

  • Branch state changed from no branch to on review

comment:10 in reply to: ↑ 9 Changed 7 years ago by slyfox

Replying to andrew_b:

Branch: 3883_size_trunc_sep
changeset:069306eb2e7174f431bd3eecbc8f489f9652b782

That fixes issue for me. Thanks!

comment:11 Changed 7 years ago by zaytsev

  • Votes for changeset set to slyfox zaytsev
  • Version changed from 4.8.19 to 4.8.20
  • Branch state changed from on review to approved

comment:12 Changed 7 years ago by andrew_b

  • Status changed from accepted to testing
  • Votes for changeset changed from slyfox zaytsev to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged

comment:13 Changed 7 years ago by andrew_b

  • Status changed from testing to closed
Note: See TracTickets for help on using tickets.