Ticket #400 (new defect)

Opened 9 years ago

Last modified 14 months ago

Support for multiline search, and for searching for 0x0A bytes.

Reported by: Hubbitus Owned by:
Priority: major Milestone: Future Releases
Component: mc-search Version: master
Keywords: Cc:
Blocked By: #3694 Blocking:
Branch state: no branch Votes for changeset:

Description

So, as I known, now search made line by line, and no chance match/replace pattern like "Some text\n+Another text".

This possibilities is highly appreciated.

Change History

comment:1 Changed 9 years ago by slavazanko

  • Milestone changed from 4.7 to future releases

comment:2 in reply to: ↑ description Changed 8 years ago by sxmboer

  • severity set to no branch

Replying to Hubbitus:

So, as I known, now search made line by line, and no chance match/replace pattern like "Some text\n+Another text".

This possibilities is highly appreciated.

This is already possible by typing CTRL-Q, Enter for the newline
This works for search and search and replace.

comment:3 Changed 8 years ago by Hubbitus

By CTRL+Q pressing "Insert Literal" dialog appeared.
mc-4.6.99.3-0.9.52.gd40065d

comment:4 Changed 3 years ago by mooffie

  • Branch state set to no branch

No, one can't search over multiple lines in mcedit.

The search library searches line by line. See mc_search__run_regex() in lib/search/regex.c. There's a loop there that accumulates a line:

   while (TRUE) {
      ...
      if (current_chr == '\n' || ...)
        break;
   }

(You can find the last \n on a line, but this has no real benefit. BTW, you don't have to use CTRL-Q: if you choose "Regular expression" you can simply type the two characters "\n".)

mc2 comes with a snippet that lets one search over multiple lines.

comment:5 Changed 17 months ago by mooffie

We can solve this by adding to the search dialog a "[x] Search over lines" checkbox what would turn off the "current_chr == '\n'" check I quoted in comment 4. This would make the text accumulate in one giant string instead of line-by-line (but we should set some limit in case one searches in a 20 terabyte file...).

If we detect a newline in the regexp, we'll search-over-lines automatically. But having a checkbox is good for the case one wants to search for things like "struct \{.*?\}", where it's a dot that stands for a newline.

Last edited 17 months ago by mooffie (previous) (diff)

comment:6 Changed 17 months ago by andrew_b

  • Component changed from mcedit to mc-search

comment:7 Changed 17 months ago by andrew_b

Ticket #3454 has been marked as a duplicate of this ticket.

comment:8 Changed 17 months ago by andrew_b

  • Blocked By 3694 added

comment:9 Changed 15 months ago by mooffie

  • Type changed from enhancement to defect
  • Summary changed from Please add multiline search in mcedit to Support for multiline search, and for searching for 0x0A bytes.

Since we can't find 0x0A bytes (in hex search), I'm changing this to 'defect'.

Last edited 15 months ago by mooffie (previous) (diff)

comment:10 Changed 15 months ago by mooffie

Ok, here's a series of three small patches to fix this ticket:

What's in it?

(1) "introduce slurp mode"

This mode disables the "\n" chunking, which, as comment comment:4 explains, is the cause of all the problems. There's very little actual code here.

(2) "hex search: make it possible to search for 0x0A bytes."

This patch uses slurp mode to fix the hex search bug.

(3) "mcedit: make it possible to search across lines."

This patch uses slurp mode to implement a new feature: letting users search across lines. It adds an "Across lines" checkbox to the search and search-replace dialogs:

┌─────────────────────── Search ───────────────────────┐
│ Enter search string:                                 │
│ \{.*?\}                                          [^] │
├──────────────────────────────────────────────────────┤
│ ( ) Normal                [ ] Case sensitive         │
│ (*) Regular expression    [ ] Backwards              │
│ ( ) Hexadecimal           [ ] In selection           │
│ ( ) Wildcard search       [ ] Whole words            │
│                           [x] Across lines           │
│                           [ ] All charsets           │
├──────────────────────────────────────────────────────┤
│           [< OK >] [ Find all ] [ Cancel ]           │
└──────────────────────────────────────────────────────┘

(In comment:5 I also suggested enabling this feature automatically if the pattern has newlines in it, but I no longer think this "extra wisdom" it a good idea; it will just add confusion.)

What's next?

Andrew, if you're fine with this solution, let me know and I'll work on some refinements:

  • Work on the 'todo' mentioned inside patch (1).
  • Write help text for the "Across lines" checkbox. As #1561 points out, there's no help page for the search box, so this will be done in a separate ticket.
  • Add the "Across lines" checkbox to the viewer's search dialog as well.
  • Automatically turn off the "Across lines" checkbox when the user clicks radio other than "Regular expression" (just so the user doesn't forget to turn it off). We may also disable it.

Performance

Initially I thought backwards-search, which is not implemented quite efficiently, would put the kibosh on slurp mode. Surprisingly, however, I didn't notice a substantial difference in performance between slurp and non-slurp mode when doing backwards-search. Neither was the memory strained.

Maintainability

There's very little code added, and the patches essentially modify only two lines --in regex.c. To me this looks like a safe solution.

Changed 15 months ago by mooffie

comment:11 follow-up: ↓ 12 Changed 15 months ago by zaytsev

"Across lines" sounds a bit weird to me, do you know how it is done/called in other editors? I quickly checked gedit and kate, and they don't seem to have such a checkbox, but if the pattern contains \n, they capture multiple lines by default... Is there a downside to this behavior?

comment:12 in reply to: ↑ 11 Changed 15 months ago by mooffie

Replying to zaytsev:

"Across lines" sounds a bit weird to me


I had to pick something that matches the grammatical setting of the surrounding checkboxes: "[search] backwards", "[search] in selection", "[search] whole words", "[search] all charsets", and therefore "[search] across lines". I don't mind changing it once somebody comes up with an alternative.

I quickly checked gedit and kate, and they don't seem to have such a checkbox, but if the pattern contains \n, they capture multiple lines by default...


(These editors evidently aren't searching the text line-by-line like we do, so they don't face the challenge of having to detect when to turn off this behavior.)

if the pattern contains \n, they capture multiple lines by default... Is there a downside to this behavior?


First, let's see if we agree that we have to have a checkbox at all. Consider "struct \{.*?\}". We can't know that the user intends for the "." to match any char. So we have to let her tell us. A checkbox seems a straightforward way. (BTW, the regex world has a somewhat similar virtual checkbox: the //s flag (I say "somewhat" because MC unconditionally turns G_REGEX_DOTALL on).)

Once we agree that a checkbox is necessary, the bonus issue of automatically detecting \n can be blissfully handled independently, in a separate ticket.

if the pattern contains \n, they capture multiple lines by default... Is there a downside to this behavior?


I'm not against the idea. But then there's the question of how to correctly detect \n, and whether we'll surprise the user (i.e., "Begin.*\n.*End" won't match just two lines). We'll have discussions that have the potential to hold up the ticket for months and years.

So I think there are two possibilities:

(A) We can delegate the \n detection to a separate ticket.
(B) We can split this ticket into two: patches (1) and (2) will go back into ticket #3454, and hopefully get committed. Patch (3) will stay here and be discussed further.

comment:13 Changed 14 months ago by ossi

"spanning lines" seems like a reasonable label.

defaulting to this behavior also seems reasonable. the user just needs to use "[^\n]" instead of ".".

Last edited 14 months ago by zaytsev (previous) (diff)
Note: See TracTickets for help on using tickets.