Ticket #1628 (closed defect: fixed)

Opened 15 years ago

Last modified 14 years ago

Editor can cut CJK in half

Reported by: egmont Owned by: angel_il
Priority: major Milestone: 4.7.0
Component: mcedit Version: 4.7.0-pre4
Keywords: Cc: egmont,
Blocked By: Blocking:
Branch state: Votes for changeset: committed-master

Description (last modified by andrew_b) (diff)

Fully UTF-8 environment. The built-in editor can easily cut a double width (CJK) character in half.
Take a simple text file: first line contains English text, second line contains CJK characters. Move the cursor in the 1st line so that it stands above the second half of a CJK. Then press the down arrow.
Sometimes (unfortunately I cannot consistently reproduce it) the display immediately becomes corrupt: the CJK character gets replaced by three inverse dots.
The character code and the file offset displayed in the header line are incorrect.
Press any letter and it gets inserted in the middle of the UTF-8 sequence of the CJK character, resulting in two inverse dots (single bytes of invalid UTF-8) before the new character, and one after.
Backspace and Delete also behave strange, and the modification they make to the file content is not always immediately correctly reflected on the screen (e.g. backspace seems to remove the character the cursor cut in half, but actually removes the preceding one).
Left arrow moves the cursor two columns to the left, so it doesn't synchronize to character boundary, it stays at an invalid position. Right arrow does synchronize, though.
The editor shouldn't be able to break valid UTF-8 file and make it invalid. The header line should always show properties of one of the Unicode characters of the file.
Any operation that moves the cursor within the line, or changes the line's content, should first make sure the cursor is at character boundaries. Also the header line should reflect the whole CJK character.
You might want to look at the behavior of joe text editor, it does a reasonably good job. I have no info on the behavior of other text editors. One interesting property of joe: inserting a character synchronizes the cursor first and then inserts that character - backspace and delete, however, do not delete anything, only synchronize the cursor if that was necessary. I quite like this behavior, but obviously it's not the only possible good solution.
One non-trivial caveat: You should make sure that moving the cursor up or down by many lines does not cumulatively move the cursor to the left or to the right. That is, logically moving the cursor to a character boundary should not happen immediately, only when attempting to modify that line or moving the cursor horizontally. The cursor may or may not be visually adjusted immediately, but then the logical column should be remembered and restored as soon as possible (unless the line was edited or explicit horizontal move was made).

Attachments

mc-cjk-cut-half.png (14.7 KB) - added by egmont 14 years ago.
screenshot

Change History

comment:1 Changed 15 years ago by andrew_b

  • Component changed from mc-core to mcedit

comment:2 Changed 15 years ago by angel_il

  • Milestone changed from 4.7 to 4.7.0-pre4

ok, need fix it

comment:3 Changed 14 years ago by angel_il

  • Status changed from new to accepted
  • Owner set to angel_il

comment:4 Changed 14 years ago by angel_il

  • severity changed from no branch to on review

branch: 1628_CJK_fix (parent: master)
changeset: 6a1131827145df464c836e59bc95eb2632adebc5

comment:5 Changed 14 years ago by angel_il

test case:
1) create file test.txt with:

123<cursor here>
椅子

2) save this
3) exit MC
4) start MC
5) open test.txt (F4 under test.txt)
6) press down

comment:6 Changed 14 years ago by slavazanko

  • Votes for changeset set to slavazanko

No need to remove some empty lines... therefore to cleanup changeset:6a1131827145df464c836e59bc95eb2632adebc5 before merge

comment:7 Changed 14 years ago by andrew_b

  • Votes for changeset changed from slavazanko to slavazanko andrew_b
  • severity changed from on review to approved

comment:8 Changed 14 years ago by angel_il

  • Votes for changeset slavazanko andrew_b deleted
  • severity changed from approved to on rework

comment:9 Changed 14 years ago by angel_il

  • severity changed from on rework to on review

d8937b4e6b06cd7c242f22191424326ecb06cc86 (forced update)

please review

comment:10 Changed 14 years ago by angel_il

  • Cc egmont added

egmont, please test the branch

comment:11 Changed 14 years ago by slavazanko

  • Votes for changeset set to slavazanko

comment:12 Changed 14 years ago by slavazanko

  • Blocked By 1755 added

(In #1755) I think, branch ready to review. But need also check for compile after #1628.

comment:13 Changed 14 years ago by slavazanko

  • Blocking 1755 added
  • Blocked By 1755 removed

comment:14 Changed 14 years ago by andrew_b

  • Votes for changeset changed from slavazanko to slavazanko andrew_b
  • severity changed from on review to approved

comment:15 Changed 14 years ago by angel_il

  • Status changed from accepted to testing
  • Votes for changeset changed from slavazanko andrew_b to commited-master
  • Resolution set to fixed
  • severity changed from approved to merged

comment:16 Changed 14 years ago by angel_il

  • Status changed from testing to closed

comment:17 Changed 14 years ago by slavazanko

  • Blocking 1755 removed

(In #1755) Ready to review

comment:18 Changed 14 years ago by egmont

  • Status changed from closed to reopened
  • Version changed from 4.7.0-pre2 to 4.7.0-pre4
  • Resolution fixed deleted
  • Milestone changed from 4.7.0-pre4 to 4.7.0

Reopening, still not fixed in 4.7.0-pre4 or git master.

My current file contains two lines, in UTF-8:
asdfghjkl
ノリポ

I open it in mcedit, press right arrow (cursor above "s"), and then press down arrow. The first Japanese character quite often (but not always) turns into three white-on-black dots. See screenshot.

Even though the cursor seems to stand above the second Japanese character, if I type a letter it gets inserted between the second and third dot.

Changed 14 years ago by egmont

screenshot

comment:19 Changed 14 years ago by angel_il

branch: 1628_editor_CJK_fix
changeset: 09ea60a6250c34696d3ccb293bdd1796ad2cf352

2egmont: try this branch, please.

solution of the problem: we try to get the next a UTF character and then get the previous UTF character.

comment:20 Changed 14 years ago by angel_il

  • Votes for changeset commited-master deleted
  • severity changed from merged to on review

comment:21 Changed 14 years ago by slavazanko

comment:22 Changed 14 years ago by andrew_b

  • Votes for changeset set to andrew_b
  • Description modified (diff)

comment:23 follow-up: ↓ 24 Changed 14 years ago by egmont

Sorry guys, I'm extremely overloaded nowadays, I don't think I'll have time to test the patch in the near future. Please test it thoroughly yourselves. Close the bugreport if you believe the fix is correct - I'll reopen if I'll figure out one day that it's still not okay. Thanks!

comment:24 in reply to: ↑ 23 Changed 14 years ago by angel_il

Replying to egmont:

Sorry guys, I'm extremely overloaded nowadays, I don't think I'll have time to test the patch in the near future.

ok, no problem. and thank you.

comment:25 Changed 14 years ago by angel_il

  • Votes for changeset changed from andrew_b to andrew_b angel_il
  • severity changed from on review to approved

comment:26 Changed 14 years ago by angel_il

  • Status changed from reopened to closed
  • Votes for changeset changed from andrew_b angel_il to committed-master
  • Resolution set to fixed
  • severity changed from approved to merged

comment:27 Changed 14 years ago by egmont

Finally tried it out, and couldn't fool it in mc-4.7.0.1. Thanks guys for the fix!

Note: See TracTickets for help on using tickets.