Ticket #3255 (new defect)

Opened 10 years ago

Inconsistent Unicode normalization

Reported by: egmont
Owned by:
Priority: minor
Milestone: Future Releases
Component: mc-core
Version: master
Keywords:
Cc:
Blocked By:
Blocking:
Branch state: no branch
Votes for changeset:

Description

tty_print_string(s) calls str_term_form(s), which, among other things, normalizes the string (i.e. converts to NFC).

tty_printf(fmt, ...), on the other hand, does not normalize its output.

This leads to the unexpected situation that tty_print_string(s) and tty_printf("%s", s) are not equivalent. Callers probably do not choose between them consistently based on whether formatting and normalization are required at a given position. At the very least, the function names could reflect the difference, to help pick the right one.

Normalization might have advantages if the file to be viewed is in NFD but the terminal (e.g. the Linux console) does not support combining characters. Then you at least get to see some (probably most) of the accents. But it has disadvantages too. For example, it changes the bytes of the text (e.g. of a filename) that the user gets when copy-pasting it from a graphical terminal.
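To make the byte-level difference concrete, here is a minimal sketch in C. It is not mc code: toy_compose_e_acute() is a hypothetical stand-in that composes only the single sequence 'e' + U+0301, purely to show that the normalized string is no longer byte-identical to the original.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "é" in NFD: 'e' (0x65) followed by U+0301 COMBINING ACUTE ACCENT (0xCC 0x81). */
static const char nfd_e_acute[] = "e\xCC\x81";
/* "é" in NFC: the single code point U+00E9 (0xC3 0xA9). */
static const char nfc_e_acute[] = "\xC3\xA9";

/*
 * Toy stand-in for real normalization (hypothetical helper, not mc code).
 * Composes only 'e' + U+0301 into U+00E9 and copies every other byte
 * unchanged. dst must hold at least strlen (src) + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
static size_t
toy_compose_e_acute (const char *src, char *dst)
{
    size_t out = 0;

    assert (src != NULL && dst != NULL);

    while (*src != '\0')
    {
        if (src[0] == 'e'
            && (unsigned char) src[1] == 0xCC
            && (unsigned char) src[2] == 0x81)
        {
            /* Replace the three-byte NFD sequence with the two-byte NFC form. */
            dst[out++] = (char) 0xC3;
            dst[out++] = (char) 0xA9;
            src += 3;
        }
        else
            dst[out++] = *src++;
    }

    dst[out] = '\0';
    return out;
}
```

With this sketch, a filename stored on disk as "cafe\xCC\x81.txt" (10 bytes) comes out as 9 bytes after composition, so a name copied from the screen would not match the on-disk name byte for byte. A real implementation would use a full normalizer, for example GLib's g_utf8_normalize() with G_NORMALIZE_NFC; what str_term_form() calls internally is not stated in this ticket.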

Given that most text is in NFC and most terminals are capable of displaying NFD, I vote for not mangling the text.

Even if we agree on normalization and make tty_print_string() and tty_printf() consistent in this regard, they will still differ in the other things that str_term_form() does. It should be double-checked whether that is okay.
