Ticket #2081 (closed defect: fixed)
[Patch] mcedit overwrite mode buggy in UTF-8
Reported by: | egmont | Owned by: | angel_il |
---|---|---|---|
Priority: | major | Milestone: | 4.7.2 |
Component: | mcedit | Version: | master |
Keywords: | Cc: | eg | |
Blocked By: | Blocking: | ||
Branch state: | Votes for changeset: | committed-master |
Description
Start mcedit 4.7.0.3 in a fully UTF-8 environment, turn on overwrite mode (F9 -> Edit -> Toggle ins/overw).
Type a couple of English letters, move the cursor back, and then type an accented letter.
Notice that typing the accented letter does not only overwrite the letter under the cursor, it also swallows the next character (or next two for 3-byte UTF-8 sequences, etc).
Interestingly though, in the mean time overwriting an accented letter with a plain English one works as expected, and overwriting an accented one with another accented one (of same byte length) is buggy.
(Seems to me that first as many _characters_ are removed as the number of _bytes_ in the letter to be inserted, and then the new letter is inserted correctly.)
Overwrite should always replace one letter with another letter, increasing/decreasing file size as necessary.
Attachments
Change History
comment:2 Changed 14 years ago by egmont
- Summary changed from mcedit overwrite mode buggy in UTF-8 to [Patch] mcedit overwrite mode buggy in UTF-8
Friendly ping...
A very simple patch is waiting to get applied - I believe it shouldn't take more than 5 minutes to verify it.
comment:3 Changed 14 years ago by angel_il
Friendly pong...
1 packets transmitted, 1 received, 0% packet loss, time 4 days
you are right, patch is really small, but i dont sure that is good solution... i try search another way...
comment:5 Changed 14 years ago by angel_il
- Status changed from new to accepted
- severity changed from no branch to on review
- Cc eg added
- Version changed from 4.7.0.3 to master
- Milestone changed from 4.7 to 4.7.2
- Owner set to angel_il
branch: 2081_edit_overwrite_fix
changeset: a528eaa90f74fb618c97ebabaee7e566b03afb22
comment:6 Changed 14 years ago by egmont
Oh yep, I didn't know about edit->charpoint but clearly this is the right way to go. Thanks!
comment:7 Changed 14 years ago by slavazanko
- Votes for changeset set to slavazanko
Vote here.
Please note: apply changeset:b05422bf07e7a3e340f9d8a1c7cd88a53f8f4604 after rebase on master branch (to avoid conflicts) and before merge into 'master'.
comment:8 Changed 14 years ago by andrew_b
- Votes for changeset changed from slavazanko to slavazanko andrew_b
- Component changed from mc-core to mcedit
- severity changed from on review to approved
comment:9 Changed 14 years ago by angel_il
- Status changed from accepted to testing
- Votes for changeset changed from slavazanko andrew_b to committed-master
- Resolution set to fixed
- severity changed from approved to merged
comment:11 Changed 14 years ago by egmont
Thanks guys!
Have you considered cherrypicking it to the 4.7.0/4.7.1 series? I think it would be nice to have the fix there too.
comment:12 Changed 14 years ago by angel_il
- Status changed from testing to closed
- Keywords stable-candidate removed
Cherry-picked to 4.7.0-stable. 9b4714b2f419ca9c2c11053bcc28eb2a4fe71483
My assumption (in parentheses above) was right.
src/editor/edit.c has this function:
void
edit_execute_cmd (WEdit *edit, unsigned long command, int char_for_insertion)
and this function calls this in case of overwrite mode:
edit_delete (edit, 0);
whereas the boolean 0 means to remove a whole character, not just a byte (see the comment and implementation inside edit_delete).
The problem is: even though I'd expect char_for_insertion to hold the Unicode character to be inserted (based on the function's signature), this is not true. char_for_insertion is a single byte, and edit_execute_cmd is called individually for each byte of the UTF-8 sequence to be inserted.
I attach a quick and dirty patch which only calls edit_delete() on the first character of an UTF-8 sequence.
The real proper fix would probably require a major rewrite whereas the input handler composes full Unicode characters before actually modifying the file. Until this is done by someone, I believe my patch is perfectly reasonable, and should work perfectly if the input is really valid UTF-8 (which it should be).