Ticket #1628 (new defect) — at Initial Version
Editor can cut CJK in half
Reported by: | egmont | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 4.7.0 |
Component: | mcedit | Version: | 4.7.0-pre4 |
Keywords: | | Cc: | egmont, |
Blocked By: | | Blocking: | |
Branch state: | | Votes for changeset: | |
Description
In a fully UTF-8 environment, the built-in editor can easily cut a double-width (CJK) character in half.
Take a simple text file whose first line contains English text and whose second line contains CJK characters. Move the cursor in the first line so that it stands above the second half of a CJK character, then press the down arrow.
Sometimes (unfortunately I cannot reproduce it consistently) the display immediately becomes corrupted: the CJK character gets replaced by three inverse dots.
The character code and the file offset displayed in the header line are incorrect.
Press any letter and it gets inserted in the middle of the UTF-8 sequence of the CJK character, resulting in two inverse dots (single bytes of invalid UTF-8) before the new character, and one after.
Backspace and Delete also behave strangely, and the modification they make to the file content is not always reflected correctly on the screen right away (e.g. Backspace seems to remove the character that the cursor cut in half, but actually removes the preceding one).
Left arrow moves the cursor two columns to the left, so it doesn't synchronize to a character boundary and stays at an invalid position. Right arrow does synchronize, though.
The editor shouldn't be able to break a valid UTF-8 file and make it invalid. The header line should always show the properties of one of the Unicode characters in the file.
Any operation that moves the cursor within the line, or changes the line's content, should first make sure the cursor is at a character boundary. The header line should also reflect the whole CJK character.
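To illustrate what "make sure the cursor is at a character boundary" could mean at the byte level, here is a minimal sketch. It is not taken from the mcedit sources; `sync_to_char_start` and `is_utf8_continuation` are hypothetical names, and the buffer is assumed to be a plain byte array.

```c
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: move a byte offset back to the first byte of the
 * character it points into. At most 3 steps are taken, because a UTF-8
 * sequence has at most 3 continuation bytes; this keeps invalid input
 * from dragging the cursor arbitrarily far to the left. */
static size_t sync_to_char_start(const unsigned char *buf, size_t len, size_t off)
{
    size_t steps = 0;

    if (off >= len)
        return len;             /* end of buffer is always a boundary */
    while (off > 0 && steps < 3 && is_utf8_continuation(buf[off])) {
        off--;
        steps++;
    }
    return off;
}
```

For the bytes `61 E4 B8 AD 62` (an ASCII letter, one three-byte CJK character, another ASCII letter), offsets 2 and 3 would both be pulled back to offset 1, while offsets 0, 1 and 4 are left alone.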
You might want to look at the behavior of the joe text editor, which does a reasonably good job. I have no information on the behavior of other text editors. One interesting property of joe: inserting a character synchronizes the cursor first and then inserts that character. Backspace and Delete, however, do not delete anything if the cursor had to be synchronized; they only synchronize it. I quite like this behavior, but obviously it's not the only possible good solution.
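The joe-like policy described above could be sketched roughly as follows. This is only an illustration of the idea, not joe's or mcedit's actual code; `struct line_buf`, `sync_cursor`, `insert_bytes` and `backspace` are hypothetical, and the line is assumed to live in a small fixed-size byte array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size line buffer with a byte-offset cursor. */
struct line_buf {
    unsigned char data[256];
    size_t len;
    size_t cursor;      /* byte offset; may point inside a multi-byte character */
};

/* Step back over UTF-8 continuation bytes (10xxxxxx), at most 3 of them.
 * Returns 1 if the cursor had to be moved, 0 if it was already aligned. */
static int sync_cursor(struct line_buf *b)
{
    size_t start = b->cursor, steps = 0;

    while (b->cursor > 0 && b->cursor < b->len && steps < 3
           && (b->data[b->cursor] & 0xC0) == 0x80) {
        b->cursor--;
        steps++;
    }
    return b->cursor != start;
}

/* Insert: synchronize first, then insert the bytes at the boundary. */
static int insert_bytes(struct line_buf *b, const unsigned char *s, size_t n)
{
    if (b->len + n > sizeof b->data)
        return -1;
    sync_cursor(b);
    memmove(b->data + b->cursor + n, b->data + b->cursor, b->len - b->cursor);
    memcpy(b->data + b->cursor, s, n);
    b->len += n;
    b->cursor += n;
    return 0;
}

/* Backspace: if the cursor was misaligned, only synchronize it and delete
 * nothing; otherwise delete the whole preceding character. */
static void backspace(struct line_buf *b)
{
    size_t end, start;

    if (sync_cursor(b) || b->cursor == 0)
        return;
    end = b->cursor;
    start = end - 1;
    while (start > 0 && end - start < 4 && (b->data[start] & 0xC0) == 0x80)
        start--;
    memmove(b->data + start, b->data + end, b->len - end);
    b->len -= end - start;
    b->cursor = start;
}
```

With this policy, a keypress made while the cursor sits inside a CJK character lands in front of that character instead of splitting it, and the first Backspace in that situation only realigns the cursor.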
One non-trivial caveat: make sure that moving the cursor up or down by many lines does not cumulatively shift it to the left or to the right. That is, logically moving the cursor to a character boundary should not happen immediately, but only when the line is about to be modified or the cursor is moved horizontally. The cursor may or may not be visually adjusted immediately, but if it is, the logical column should be remembered and restored as soon as possible (unless the line was edited or an explicit horizontal move was made).
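One way to get this "remembered column" behavior is to keep two columns per cursor: the one the user asked for and the one actually shown. The sketch below is an assumption-laden illustration, not mcedit code. Lines are modeled as arrays of already-decoded code points to keep UTF-8 decoding out of the picture, `cell_width` is a rough stand-in for a real width lookup such as `wcwidth()`, and all names are hypothetical.

```c
#include <stddef.h>

/* Rough stand-in for a real width lookup: treats a few common East Asian
 * blocks as double width. This is not a complete or exact table. */
static int cell_width(unsigned long cp)
{
    if ((cp >= 0x1100 && cp <= 0x115F) ||   /* Hangul Jamo */
        (cp >= 0x2E80 && cp <= 0xA4CF) ||   /* CJK radicals through Yi */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* Hangul syllables */
        (cp >= 0xF900 && cp <= 0xFAFF) ||   /* CJK compatibility ideographs */
        (cp >= 0xFF00 && cp <= 0xFF60))     /* fullwidth forms */
        return 2;
    return 1;
}

/* Return the column at which the character covering `want` starts.
 * If `want` is past the end of the line, return the line's total width. */
static int snap_column(const unsigned long *line, size_t n, int want)
{
    int col = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int w = cell_width(line[i]);
        if (want < col + w)
            return col;
        col += w;
    }
    return col;
}

struct cursor {
    size_t row;
    int want_col;       /* logical column the user last chose */
    int shown_col;      /* column actually displayed, always on a boundary */
};

/* Vertical move: only shown_col is recomputed; want_col is left alone. */
static void move_to_row(struct cursor *c, const unsigned long *line,
                        size_t n, size_t row)
{
    c->row = row;
    c->shown_col = snap_column(line, n, c->want_col);
}

/* Horizontal move or edit: the snapped column becomes the logical one. */
static void commit_column(struct cursor *c)
{
    c->want_col = c->shown_col;
}
```

With a cursor at column 3 of an English line, moving down onto a line of double-width characters would show it at column 2, and moving back up would return it to column 3, since `want_col` was never overwritten. Only after `commit_column` (an edit or a horizontal move) would the adjusted column stick.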