Ticket #3280 (new defect)

Opened 10 years ago

Last modified 9 years ago

[patch] mcview: search off-by-one between plain and hex modes

Reported by: egmont Owned by:
Priority: minor Milestone: Future Releases
Component: mcview Version: master
Keywords: Cc:
Blocked By: Blocking:
Branch state: no branch Votes for changeset:

Description

echo abcdefghijklmnopqrstuvwxyz > abc
mcview abc

Search for efg, it is highlighted.

Switch to Hex mode, now fgh is highlighted as the match.

The same (with opposite direction) the other way around: search for klm in hex mode, it is highlighted, switch back to plain mode, jkl is highlighted.

Attachments

mc-4.8.13-search_selected-offbyone.patch (1.7 KB) - added by and 9 years ago.
patch for search off-by-one between plain and hex mode
mc-3280-search-off-by-one-v2.patch (1.1 KB) - added by egmont 9 years ago.
My fix

Change History

Changed 9 years ago by and

patch for search off-by-one between plain and hex mode

comment:1 Changed 9 years ago by and

  • Summary changed from mcview: search off-by-one between plain and hex modes to [patch] mcview: search off-by-one between plain and hex modes

comment:2 Changed 9 years ago by egmont

Hi Andreas,

I'm glad we have a new contributor, please keep up sending patches/fixes :)

You should work against git master instead of the latest tarball - especially the viewer has received modifications that make your patch no longer apply.

Unfortunately, I don't like what you're doing in the patch. The off-by-one presumably comes from sometimes using the right semantics for the start and end offsets and sometimes a wrong one. While your patch might actually fix the bug, it does it by switching to the wrong semantics everywhere, making it prone to breaking the next time someone touches it.

The standard and right way to handle intervals is the start being inclusive and the end being exclusive. Another, probably better way to think about it, which is also clearly demonstrated in Python's string slices (e.g. http://stackoverflow.com/a/509297/4457671) is that whenever possible you should think about pointers as pointing between objects rather than to one of them. With this way of thinking, both start and end should have the value 0 for the beginning of the file, 1 for the boundary between the first letter and the rest, and so on, and exactly the file's size (without ±1) if pointing to the end of the file. This is pretty logical: the start and end of the match are indeed boundaries between letters rather than concrete letters. If something needs to point to an actual object (e.g. a character), you should probably still imagine the pointer pointing to in between two characters just as start and end do, and take the character on its right.

Using this standard approach, the first two hunks of your patch modify the code from the right semantics to the wrong one, and should be reverted. The third hunk is obviously correct, off-by-one corrections should never be necessary.

Could you please locate where start and end are being used incorrectly according to these semantics, and fix those instead?

Changed 9 years ago by egmont

My fix

comment:3 Changed 9 years ago by egmont

Please see my fix.

The start and end values were correct. Removal of an off-by-one adjustment is indeed required as and's patch does.

The problem was that state->offset has already advanced by the character we're about to print, although we'd need its previous value instead.

While the first patch (or a port thereof to git head) might be technically perfect (although I'm not totally certain with combining accents etc. in the game), I think my proposed version has cleaner semantics.

Thanks for the original work!

comment:4 Changed 9 years ago by egmont

Hang on... both of the patches break the "search next" feature :(

comment:5 Changed 9 years ago by egmont

First we seem to hit actions_cmd.c:mcview_search() that does some adjustment for the hex mode (but not for the plain text mode).

Then it calls mcview_do_search which starts with some adjustments, along the lines of decrementing by 2 if searching backward but not altering if searching forwarad, which totally relies on finding a match incrementing the start value by 1 (potentially pointing in the middle of a UTF-8 or nroff sequence).

I guess the adjustment should be carefully modified here to be +1 for forward, -1 for backward search.

We'll need to double check the initial value, probably we'll have to begin with -1 instead of 0, so that a first search attempt that starts by incrementing it actually starts at the beginning of the file.

Note: See TracTickets for help on using tickets.