Ticket #2396 (closed defect: fixed)

Opened 8 years ago

Last modified 3 years ago

Find File "Whole words" search bug

Reported by: x906 Owned by: slavazanko
Priority: critical Milestone: 4.7.5
Component: mc-search Version: master
Keywords: Cc: gotar@…, zaytsev, shirsch
Blocked By: Blocking:
Branch state: no branch Votes for changeset: committed-master committed-stable

Description

When searching in files for a non-English word with "Whole words" set to "on", nothing is found.
Try searching for the words "время" and "time" in the attached file.
mc ver: 4.7.4-90-g1e265ea

Attachments

f1 (40 bytes) - added by x906 8 years ago.
f2 (114 bytes) - added by x905 8 years ago.

Change History

Changed 8 years ago by x906

comment:1 follow-up: ↓ 3 Changed 8 years ago by gotar

  • Cc gotar@… added

Works fine with Polish diacritics and an ISO-8859-2 locale (LC_CTYPE, to be exact). Are you using UTF-8? Try with KOI8-R.

comment:2 Changed 8 years ago by gotar

However it doesn't work for me in mcedit and mcview.

comment:3 in reply to: ↑ 1 Changed 8 years ago by x905

Replying to gotar:

Are you using UTF-8? Try with KOI8-R.

yes, UTF-8
KOI8-R is not acceptable for me

comment:4 follow-ups: ↓ 5 ↓ 6 Changed 8 years ago by andrew_b

  • Priority changed from major to critical
  • Version changed from 4.7.4 to master

It seems we have a global bug in the search engine. For me, whole-word search for non-ASCII words (Cyrillic, for example) doesn't work ar all: neither in files, nor in the editor, nor in the viewer.

comment:5 in reply to: ↑ 4 Changed 8 years ago by andrew_b

Replying to andrew_b:

doesn't work ar all

doesn't work at all

comment:6 in reply to: ↑ 4 ; follow-up: ↓ 7 Changed 8 years ago by gotar

x905 - I suspected it may be UTF-8 related, but apparently it's not
andrew_b - what is weird is that it works for me (latin2 characters) in Find file (and only there)

comment:7 in reply to: ↑ 6 Changed 8 years ago by andrew_b

Replying to gotar:

andrew_b - what is weird is that it works for me (latin2 characters) in Find file (and only there)

It doesn't work with Russian Cyrillic words (in either KOI8-R or UTF-8), for example "время" in the ticket text. Moreover, searching with the regular expression "\bвремя\b" ("Regular expression" on, "Whole words" off) also finds nothing.

comment:8 follow-ups: ↓ 9 ↓ 17 Changed 8 years ago by slavazanko

As I found, non-Latin characters are not treated as word characters: http://www.regular-expressions.info/wordboundaries.html

"In all flavors, the characters [a-zA-Z0-9_] are word characters"

I don't know how to fix this, sorry :(
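The word-character rule quoted above can be illustrated with a small sketch. It uses Python's `re` module rather than the engine mc uses, and the sample text is made up, so it only shows the general effect: when only [a-zA-Z0-9_] count as word characters, there is no word boundary around a Cyrillic word and "\bвремя\b" cannot match.

```python
import re

# Made-up sample text; not the contents of the attached file.
text = "1. время\n2. time\n"

# Unicode-aware matching (Python's default for str patterns):
# Cyrillic letters count as word characters, so \b sits around "время".
unicode_hit = re.search(r"\bвремя\b", text)

# ASCII-only matching: only [a-zA-Z0-9_] are word characters, so there is
# no word/non-word transition around "время" and \b never matches there.
ascii_hit = re.search(r"\bвремя\b", text, flags=re.ASCII)

# The ASCII word "time" is still found in ASCII-only mode.
ascii_time = re.search(r"\btime\b", text, flags=re.ASCII)

print(unicode_hit is not None)
print(ascii_hit is not None)
print(ascii_time is not None)
```

This matches the symptom in the report: "time" is found, "время" is not.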

gotar: does 'Search whole words' work with the letters 'ą,ć,ę,ł,ń,ó,ś,ź,ż' (and their uppercase counterparts)?

comment:9 in reply to: ↑ 8 ; follow-up: ↓ 10 Changed 8 years ago by x905

Replying to slavazanko:

I don't know how to fix this, sorry :(

Maybe look at the source of grep?
This works: grep -iw "время" ./f1

comment:10 in reply to: ↑ 9 Changed 8 years ago by andrew_b

Replying to x905:

Maybe look at the source of grep?

Yes :). We found a solution in grep:

      static char const word_beg[] = "(^|[^[:alnum:]_])(";
      static char const word_end[] = ")([^[:alnum:]_]|$)";

This works for me in KOI8-R.
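A rough sketch of how the two grep strings above can wrap a search word. It is written in Python only for illustration. Python's `re` has no POSIX [[:alnum:]] class, so \W is used as a stand-in for [^[:alnum:]_]; that substitution and the helper name `whole_word_pattern` are assumptions of this sketch, not code from grep or mc.

```python
import re

# Approximation of grep's word_beg / word_end strings.
# \W stands in for [^[:alnum:]_] (an assumption of this sketch).
WORD_BEG = r"(^|\W)("
WORD_END = r")(\W|$)"

def whole_word_pattern(word: str) -> "re.Pattern[str]":
    """Hypothetical helper: wrap a literal word in the grep-style delimiters."""
    return re.compile(WORD_BEG + re.escape(word) + WORD_END, re.MULTILINE)

pattern = whole_word_pattern("время")

m = pattern.search("сейчас время обеда")
# Group 2 is the word itself; group 0 also includes the delimiter characters.
print(repr(m.group(0)), repr(m.group(2)))

# The word embedded in a longer word is not matched.
print(pattern.search("невремя"))
```

Note that the delimiters are part of the overall match, so a caller that needs the position of the word itself has to read group 2 rather than the whole match.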

comment:11 Changed 8 years ago by slavazanko

  • Owner set to slavazanko
  • Status changed from new to accepted
  • severity changed from no branch to on review
  • Keywords stable-candidate added
  • Milestone changed from 4.7 to 4.7.5

Created branch 2396_find_whole_words (parent: master)

changeset:c859e906d0bd6e91b1e52a8002c98de043dcb817

All other changesets are typo fixes and code refactoring.

Review, please.

x905: thanks for the tip :)

comment:12 follow-up: ↓ 15 Changed 8 years ago by x905

Doesn't work: in the newly attached file (f2), mc matches all of lines 1-6.
But grep also fails on line 6 :(

mc 4.7.4-103-g4e2ffca

Also found another bug in this version: when I press F1, an error window appears: "Cannot open file /usr/local/share/mc/help/mc.hlp"

Changed 8 years ago by x905

comment:13 Changed 8 years ago by slavazanko

See new start changeset:f46302b2651bee6246f9f3349cdbbb67144fb284
Bug should be fixed :)

Review branch again, please.

comment:14 Changed 8 years ago by x905

Better, but not complete: the line "6. невремя" is still in the search results
(4.7.4-102-g02acc44)
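The ticket does not say why "невремя" still matched at this point. Purely as an illustration (Python, made-up sample lines, not the contents of f2 and not the code in the branch), the sketch below shows one way such a result can arise: a pattern that checks the delimiter on only one side of the word lets "невремя" through, while checking both sides excludes it.

```python
import re

# Made-up sample lines; the real contents of attachment f2 are not shown.
lines = ["время", "время.", "(время)", "невремя", "времянка"]

# Checks only what follows the word (illustrative, deliberately incomplete).
trailing_only = re.compile(r"время(\W|$)")

# Checks both sides of the word.
both_sides = re.compile(r"(^|\W)время(\W|$)")

trailing_hits = [s for s in lines if trailing_only.search(s)]
both_hits = [s for s in lines if both_sides.search(s)]

print(trailing_hits)
print(both_hits)
```

A small list of lines like this, with the expected hits written down, is also a convenient regression check for each new revision of the pattern.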

comment:15 in reply to: ↑ 12 Changed 8 years ago by andrew_b

Replying to x905:

Also found another bug in this version: when I press F1, an error window appears: "Cannot open file /usr/local/share/mc/help/mc.hlp"

Did you run the new mc binary with the old mc environment, i.e. without installing it? If so, this is not a bug. Some files changed their locations (#1424):

Install help files into /usr/share/mc/help instead of /usr/share/mc.
Install hint files into /usr/share/mc/hints instead of /usr/share/mc.

comment:16 Changed 8 years ago by x905

Yes, the help issue was my fault: I ran sudo make install, but had other instances of mc running.

comment:17 in reply to: ↑ 8 Changed 8 years ago by gotar

Replying to slavazanko:

As I found, non-Latin characters are not treated as word characters: http://www.regular-expressions.info/wordboundaries.html

"In all flavors, the characters [a-zA-Z0-9_] are word characters"

gotar: does 'Search whole words' work with the letters 'ą,ć,ę,ł,ń,ó,ś,ź,ż' (and their uppercase counterparts)?

Yes and no:
yes - it does find my string (in 'find file', not mcedit or mcview)
no - it treats every letter as a separate word (i.e. despite 'Search whole words' being enabled, any substring of the example 'ąćęśłżźńó' is found, which means that all the characters are treated as word boundaries).

comment:18 Changed 8 years ago by slavazanko

Okay, check the branch now. I have changed the regexp to emulate '\b' behaviour (changeset:d5aa913edffc824075c72bcdd6411657df91f347). Hope this helps...

Review again, please.

comment:19 Changed 8 years ago by andrew_b

  • Votes for changeset set to andrew_b

Fine. This works for me.

comment:20 Changed 8 years ago by x905

works

comment:21 Changed 8 years ago by angel_il

  • Votes for changeset changed from andrew_b to andrew_b angel_il

comment:22 Changed 8 years ago by angel_il

  • severity changed from on review to approved

comment:23 Changed 8 years ago by andrew_b

Don't forget to update the po files after merging to master.

comment:24 Changed 8 years ago by slavazanko

  • Status changed from accepted to testing
  • Votes for changeset changed from andrew_b angel_il to commited-master
  • Resolution set to fixed
  • severity changed from approved to merged

comment:25 Changed 8 years ago by zaytsev

  • Cc zaytsev added

comment:26 Changed 8 years ago by shirsch

  • Status changed from testing to reopened
  • Resolution fixed deleted

Running 4.7.0.9 on Ubuntu Lucid x86_64. String search is "intermittent". It works on some files and not others. When not working, it can find nothing.

comment:27 follow-up: ↓ 29 Changed 8 years ago by slavazanko

  • Cc shirsch added
  • Status changed from reopened to closed
  • Resolution set to invalid

Running 4.7.0.9 on Ubuntu Lucid x86_64. String search is "intermittent". It works on some files and not others. When not working, it can find nothing.

Stop reopen! :)

The bug was fixed in our 'master' branch (in the repository) and the fix will be included in an upcoming release (4.7.0.10 in your case). Just wait for the new version. ;)

comment:29 in reply to: ↑ 27 ; follow-up: ↓ 31 Changed 8 years ago by gotar

Replying to slavazanko:

Stop reopen! :)

So shouldn't you change the resolution to fixed again? ;)

comment:30 Changed 8 years ago by andrew_b

  • Status changed from closed to reopened
  • Keywords stable-candidate removed
  • Votes for changeset changed from commited-master to committed-master committed-stable
  • Resolution invalid deleted

comment:31 in reply to: ↑ 29 Changed 8 years ago by andrew_b

  • Status changed from reopened to closed
  • Resolution set to fixed

Replying to gotar:

So shouldn't you change the resolution to fixed again? ;)

Done.

comment:32 Changed 3 years ago by egmont

  • Branch state set to no branch

Commenting on this 5-year-old ticket:

The fix caused a regression; in fact, it further broke a feature that was already somewhat broken: regex search with Whole words enabled. Before this patch it just ignored Whole words; now it doesn't even highlight the match.

The regex could be made simpler by using lookahead/lookbehind. If using that, coincidentally, the further regression doesn't occur.
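A minimal sketch of the lookahead/lookbehind idea, again in Python with a made-up sample string rather than mc's actual code. It compares a pattern that consumes the delimiter characters with one that only asserts them. Whether the consumed delimiters are what breaks highlighting in mc is not stated here, so treat that connection as an assumption.

```python
import re

text = "сейчас время обеда"

# Delimiters are matched (consumed) as part of the pattern.
consuming = re.compile(r"(^|\W)(время)(\W|$)")

# Delimiters are only asserted via lookbehind/lookahead, not consumed.
lookaround = re.compile(r"(?<!\w)время(?!\w)")

m1 = consuming.search(text)
m2 = lookaround.search(text)

print(m1.span())   # whole match includes the spaces around the word
print(m1.span(2))  # the word itself is only available as group 2
print(m2.span())   # whole match is exactly the word
```

With lookarounds the whole match coincides with the word, so code that highlights the matched range needs no group bookkeeping.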

At this point, for me totally reverting the patch back to using \b doesn't break anything: the originally reported bug is not reproducible, it works as expected. Could be a fix in glib in the last 5 years, or anything else.

Details in #3524.

Please let me know if you know how to reproduce the bug with the change being reverted.
