Ticket #1628 (reopened defect) — at Version 22
Editor can cut CJK in half
Reported by: | egmont | Owned by: | angel_il |
---|---|---|---|
Priority: | major | Milestone: | 4.7.0 |
Component: | mcedit | Version: | 4.7.0-pre4 |
Keywords: | Cc: | egmont, | |
Blocked By: | Blocking: | ||
Branch state: | Votes for changeset: | andrew_b |
Description (last modified by andrew_b) (diff)
Fully UTF-8 environment. The built-in editor can easily cut a double width (CJK) character in half.
Take a simple text file: first line contains English text, second line contains CJK characters. Move the cursor in the 1st line so that it stands above the second half of a CJK. Then press the down arrow.
Sometimes (unfortunately I cannot consistently reproduce it) the display immediately becomes corrupt: the CJK character gets replaced by three inverse dots.
The character code and the file offset displayed in the header line are incorrect.
Press any letter and it gets inserted in the middle of the UTF-8 sequence of the CJK character, resulting in two inverse dots (single bytes of invalid UTF-8) before the new character, and one after.
Backspace and Delete also behave strange, and the modification they make to the file content is not always immediately correctly reflected on the screen (e.g. backspace seems to remove the character the cursor cut in half, but actually removes the preceding one).
Left arrow moves the cursor two columns to the left, so it doesn't synchronize to character boundary, it stays at an invalid position. Right arrow does synchronize, though.
The editor shouldn't be able to break valid UTF-8 file and make it invalid. The header line should always show properties of one of the Unicode characters of the file.
Any operation that moves the cursor within the line, or changes the line's content, should first make sure the cursor is at character boundaries. Also the header line should reflect the whole CJK character.
You might want to look at the behavior of joe text editor, it does a reasonably good job. I have no info on the behavior of other text editors. One interesting property of joe: inserting a character synchronizes the cursor first and then inserts that character - backspace and delete, however, do not delete anything, only synchronize the cursor if that was necessary. I quite like this behavior, but obviously it's not the only possible good solution.
One non-trivial caveat: You should make sure that moving the cursor up or down by many lines does not cumulatively move the cursor to the left or to the right. That is, logically moving the cursor to a character boundary should not happen immediately, only when attempting to modify that line or moving the cursor horizontally. The cursor may or may not be visually adjusted immediately, but then the logical column should be remembered and restored as soon as possible (unless the line was edited or explicit horizontal move was made).
Change History
comment:3 Changed 15 years ago by angel_il
- Status changed from new to accepted
- Owner set to angel_il
comment:4 Changed 15 years ago by angel_il
- severity changed from no branch to on review
branch: 1628_CJK_fix (parent: master)
changeset: 6a1131827145df464c836e59bc95eb2632adebc5
comment:5 Changed 15 years ago by angel_il
test case:
1) create file test.txt with:
123<cursor here> 椅子
2) save this
3) exit MC
4) start MC
5) open test.txt (F4 under test.txt)
6) press down
comment:6 Changed 15 years ago by slavazanko
- Votes for changeset set to slavazanko
No need to remove some empty lines... therefore to cleanup changeset:6a1131827145df464c836e59bc95eb2632adebc5 before merge
comment:7 Changed 15 years ago by andrew_b
- Votes for changeset changed from slavazanko to slavazanko andrew_b
- severity changed from on review to approved
comment:8 Changed 15 years ago by angel_il
- Votes for changeset slavazanko andrew_b deleted
- severity changed from approved to on rework
comment:9 Changed 15 years ago by angel_il
- severity changed from on rework to on review
d8937b4e6b06cd7c242f22191424326ecb06cc86 (forced update)
please review
comment:12 Changed 15 years ago by slavazanko
- Blocked By 1755 added
comment:14 Changed 15 years ago by andrew_b
- Votes for changeset changed from slavazanko to slavazanko andrew_b
- severity changed from on review to approved
comment:15 Changed 15 years ago by angel_il
- Status changed from accepted to testing
- Votes for changeset changed from slavazanko andrew_b to commited-master
- Resolution set to fixed
- severity changed from approved to merged
comment:18 Changed 15 years ago by egmont
- Status changed from closed to reopened
- Version changed from 4.7.0-pre2 to 4.7.0-pre4
- Resolution fixed deleted
- Milestone changed from 4.7.0-pre4 to 4.7.0
Reopening, still not fixed in 4.7.0-pre4 or git master.
My current file contains two lines, in UTF-8:
asdfghjkl
ノリポ
I open it in mcedit, press right arrow (cursor above "s"), and then press down arrow. The first Japanese character quite often (but not always) turns into three white-on-black dots. See screenshot.
Even though the cursor seems to stand above the second Japanese character, if I type a letter it gets inserted between the second and third dot.
comment:19 Changed 15 years ago by angel_il
branch: 1628_editor_CJK_fix
changeset: 09ea60a6250c34696d3ccb293bdd1796ad2cf352
2egmont: try this branch, please.
solution of the problem: we try to get the next a UTF character and then get the previous UTF character.
comment:20 Changed 15 years ago by angel_il
- Votes for changeset commited-master deleted
- severity changed from merged to on review
comment:21 Changed 15 years ago by slavazanko
- 09af0c62307bc3d7e4a356a389f1f8cd14551652: Merged code of two functions edit_move_up() and edit_move_down() into one function with additional parameter.
comment:22 Changed 15 years ago by andrew_b
- Votes for changeset set to andrew_b
- Description modified (diff)