id,summary,reporter,owner,description,type,status,priority,milestone,component,version,resolution,keywords,cc,blockedby,blocking,branch_state,votes
3843,"mcedit, 8-bit terminal, encoding=UTF-8: characters between 0x80 and 0xff are broken",lzsiga,zaytsev,"Hi,

this problem is present in mc-4.8.19 (reproduced on Debian/Linux and AIX); it affects the editor.

Problem description: I use an 8-bit terminal emulator with LC_CTYPE=hu_HU.ISO-8859-2 (for the Hungarian language), and it works properly. In mcedit's 'Choose encoding' dialog I select 'UTF-8' (as I want to create a file in UTF-8). Then I type some accented letters in the editor, such as á é ő ű. All four are in ISO-8859-2 (codes e1, e9, f5, fb), but only the first two are in ISO-8859-1; their Unicode code points are U+00E1, U+00E9, U+0151, U+0171.

The problem is that only the latter two characters are properly displayed and stored in the file. For the former two, the editor displays dots instead, and on saving it stores single bytes (the ISO-8859-2 codes e1, e9) instead of the UTF-8 sequences (c3a1, c3a9).

I think I found the source of the problem in src/editor/edit.c, line 3559:

{{{
if (char_for_insertion > 255 && !mc_global.utf8_display)
}}}

This condition does not cover characters between 128 and 255, even when 'UTF-8' is selected (mc_global.source_codepage==12 in my case).

The change I suggest is this:

{{{
if ((char_for_insertion > 255 || (char_for_insertion > 127 && str_isutf8 (get_codepage_id (mc_global.source_codepage)))) && !mc_global.utf8_display) {
}}}

I tested it on Linux and AIX in different contexts (8-bit emulator vs. Unicode emulator; 8-bit file encoding vs. Unicode), and it seemed to work in all cases. (I admit the method of checking whether mc_global.source_codepage is UTF-8 is a bit clumsy, but I couldn't find a simpler one.)",defect,closed,major,4.8.20,mcedit,master,fixed,mcedit utf8,egmont,,,merged,committed-master