Ticket #2396 (closed defect: fixed)
Find File "Whole words" search bug
Reported by: | x905 | Owned by: | slavazanko |
---|---|---|---|
Priority: | critical | Milestone: | 4.7.5 |
Component: | mc-search | Version: | master |
Keywords: | | Cc: | gotar@…, zaytsev, shirsch |
Blocked By: | | Blocking: | |
Branch state: | no branch | Votes for changeset: | committed-master committed-stable |
Description
When searching in files for a non-English word with "Whole words" switched on, nothing is found.
Try searching for the words "время" and "time" in the attached file.
mc ver: 4.7.4-90-g1e265ea
Attachments
Change History
comment:1 follow-up: ↓ 3 Changed 14 years ago by gotar
- Cc gotar@… added
Works fine with Polish diacritics and an ISO-8859-2 locale (LC_CTYPE, to be exact). Are you using UTF-8? Try with KOI8-R.
comment:4 follow-ups: ↓ 5 ↓ 6 Changed 14 years ago by andrew_b
- Priority changed from major to critical
- Version changed from 4.7.4 to master
It seems we have a global bug in the search engine. For me, searching for whole non-ASCII words (Cyrillic, for example) doesn't work at all: neither in files, nor in the editor, nor in the viewer.
comment:6 in reply to: ↑ 4 ; follow-up: ↓ 7 Changed 14 years ago by gotar
x905 - I suspected it may be UTF-8 related, but apparently it's not
andrew_b - what is weird is that it works for me (latin2 characters) in Find file (and only there)
comment:7 in reply to: ↑ 6 Changed 14 years ago by andrew_b
Replying to gotar:
andrew_b - what is weird is that it works for me (latin2 characters) in Find file (and only there)
It doesn't work with Russian Cyrillic words, in either KOI8-R or UTF-8 (for example, "время" in the ticket text). Moreover, searching with the regular expression "\bвремя\b" ("Regular expression" on, "Whole words" off) doesn't find anything either.
comment:8 follow-ups: ↓ 9 ↓ 17 Changed 14 years ago by slavazanko
As I found out, non-Latin characters are not counted as word characters: http://www.regular-expressions.info/wordboundaries.html
"In all flavors, the characters [a-zA-Z0-9_] are word characters"
I don't know how to fix this, sorry :(
gotar: does 'Search whole words' work with the letters 'ą,ć,ę,ł,ń,ó,ś,ź,ż' (and their uppercase counterparts)?
comment:9 in reply to: ↑ 8 ; follow-up: ↓ 10 Changed 14 years ago by x905
Replying to slavazanko:
I don't know how to fix this, sorry :(
Maybe look at the source of grep?
This works: grep -iw "время" ./f1
comment:10 in reply to: ↑ 9 Changed 14 years ago by andrew_b
Replying to x905:
Maybe look at the source of grep?
Yes :). We found a solution in grep:
static char const word_beg[] = "(^|[^[:alnum:]_])(";
static char const word_end[] = ")([^[:alnum:]_]|$)";
This works for me in KOI8-R.
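The grep approach quoted above can be sketched with POSIX regcomp()/regexec(). This is only an illustration, not the code from the mc branch: the helper name contains_whole_word and the fixed 256-byte pattern buffer are assumptions made for the example, and the word is pasted into the pattern unescaped.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Boundary wrappers quoted from grep in the comment above. */
static char const word_beg[] = "(^|[^[:alnum:]_])(";
static char const word_end[] = ")([^[:alnum:]_]|$)";

/* Hypothetical helper (not part of mc or grep): returns true when `word`
 * occurs in `text` as a whole word.  `word` is inserted verbatim, so it is
 * interpreted as a POSIX extended regular expression fragment. */
static bool
contains_whole_word (const char *text, const char *word)
{
    char pattern[256];          /* assumed upper bound for this sketch */
    regex_t re;
    int n;
    bool found;

    assert (text != NULL && word != NULL);

    n = snprintf (pattern, sizeof (pattern), "%s%s%s", word_beg, word, word_end);
    if (n < 0 || (size_t) n >= sizeof (pattern))
        return false;           /* pattern would not fit in the buffer */

    if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;           /* word is not a valid regex fragment */

    found = regexec (&re, text, 0, NULL, 0) == 0;
    regfree (&re);
    return found;
}
```

[[:alnum:]] is locale-dependent, so for non-ASCII text such as "время" a program would presumably need to call setlocale (LC_ALL, "") first and run under a matching locale (KOI8-R or UTF-8). Note also that a match of the full pattern includes the boundary character on either side; a caller that needs the position of the word itself would have to drop REG_NOSUB and read the second subexpression.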
comment:11 Changed 14 years ago by slavazanko
- Owner set to slavazanko
- Status changed from new to accepted
- severity changed from no branch to on review
- Keywords stable-candidate added
- Milestone changed from 4.7 to 4.7.5
Created branch 2396_find_whole_words (parent: master)
changeset:c859e906d0bd6e91b1e52a8002c98de043dcb817
All other changesets are typo fixes and code refactoring:
- 466b34b8fb1ebf30e19dc30ccd75c85623f85aef: Code cleanup for avoid compiler warnings
- 4af1277e4fc7e093de1b72ac2d5fa02bd4c30413: Fixed bit operations in mc_search_regexprocess_append_str()
- 2793a312492079fe99ad56bfeb81802886a1a5d6: Removed mc_search_cond_t->len (used mc_search_cond_t->str->len instead).
- 100df42d9578af79ce4b80d5a6572d9a81fbb683: Avoid extra-allocation of string while prepare to regexp-search.
Review, please.
x905: thanks for tip :)
comment:12 follow-up: ↓ 15 Changed 14 years ago by x905
Doesn't work: in the newly attached file (f2), mc matches all of lines 1-6.
But grep also fails on line 6 :(
mc 4.7.4-103-g4e2ffca
Also found another bug in this version: when I press F1, an error window appears: "Cannot open file /usr/local/share/mc/help/mc.hlp"
comment:13 Changed 14 years ago by slavazanko
See new start changeset:f46302b2651bee6246f9f3349cdbbb67144fb284
Bug should be fixed :)
Review branch again, please.
comment:14 Changed 14 years ago by x905
Better, but not complete: the line "6. невремя" is still in the search results
(4.7.4-102-g02acc44)
comment:15 in reply to: ↑ 12 Changed 14 years ago by andrew_b
Replying to x905:
Also found another bug in this version: when I press F1, an error window appears: "Cannot open file /usr/local/share/mc/help/mc.hlp"
Did you run the new mc binary with the old mc environment, i.e. without installation? If yes, this is not a bug. Some files changed their locations (#1424):
Install help files into /usr/share/mc/help instead of /usr/share/mc. Install hint files into /usr/share/mc/hints instead of /usr/share/mc.
comment:16 Changed 14 years ago by x905
Yes, the help problem is my fault: I ran sudo make install but had other instances of mc running.
comment:17 in reply to: ↑ 8 Changed 14 years ago by gotar
Replying to slavazanko:
As I found out, non-Latin characters are not counted as word characters: http://www.regular-expressions.info/wordboundaries.html
"In all flavors, the characters [a-zA-Z0-9_] are word characters"
gotar: does 'Search whole words' work with the letters 'ą,ć,ę,ł,ń,ó,ś,ź,ż' (and their uppercase counterparts)?
Yes and no:
yes - it does find my string (in 'find file', not mcedit or mcview)
no - it treats every letter as a separate word (i.e. despite 'Search whole words', any substring of, for example, 'ąćęśłżźńó' is found, which means that all the characters are treated as word boundaries).
comment:18 Changed 14 years ago by slavazanko
Okay, check branch now. I have changed regexp for emulating '\b' behaviour (changeset:d5aa913edffc824075c72bcdd6411657df91f347). Hope this helps...
Review again, please.
comment:19 Changed 14 years ago by andrew_b
- Votes for changeset set to andrew_b
Fine. This works for me.
comment:20 Changed 14 years ago by x905
works
comment:21 Changed 14 years ago by angel_il
- Votes for changeset changed from andrew_b to andrew_b angel_il
comment:23 Changed 14 years ago by andrew_b
Don't forget to update the po files after the merge to master.
comment:24 Changed 14 years ago by slavazanko
- Status changed from accepted to testing
- Votes for changeset changed from andrew_b angel_il to commited-master
- Resolution set to fixed
- severity changed from approved to merged
Merged to master: b60f00df0d8d1d52840ad81ed6529672957d555c
Updated *.po files: 5bf5dd170e00a4dc8e3fbf862dfbc23c1774a79e
comment:26 Changed 14 years ago by shirsch
- Status changed from testing to reopened
- Resolution fixed deleted
Running 4.7.0.9 on Ubuntu Lucid x86_64. String search is "intermittent". It works on some files and not others. When it is not working, it finds nothing.
comment:27 follow-up: ↓ 29 Changed 14 years ago by slavazanko
- Status changed from reopened to closed
- Cc shirsch added
- Resolution set to invalid
Running 4.7.0.9 on Ubuntu Lucid x86_64. String search is "intermittent". It works on some files and not others. When it is not working, it finds nothing.
Stop reopening! :)
The bug was fixed in our 'master' branch (in the repository) and the fix will be included in an upcoming release (4.7.0.10 in your case). Just wait for the new version. ;)
comment:28 Changed 14 years ago by slavazanko
Cherry-picked in stable branch:
comment:29 in reply to: ↑ 27 ; follow-up: ↓ 31 Changed 14 years ago by gotar
comment:30 Changed 14 years ago by andrew_b
- Status changed from closed to reopened
- Keywords stable-candidate removed
- Votes for changeset changed from commited-master to committed-master committed-stable
- Resolution invalid deleted
comment:31 in reply to: ↑ 29 Changed 14 years ago by andrew_b
- Status changed from reopened to closed
- Resolution set to fixed
comment:32 Changed 9 years ago by egmont
- Branch state set to no branch
Commenting on this 5-year-old ticket:
The fix caused a regression; in fact, it further broke a feature that was already somewhat broken: regex search with Whole words enabled. Before this patch it just ignored Whole words; now it doesn't even highlight the match.
The regex could be made simpler by using lookahead/lookbehind. If using that, coincidentally, the further regression doesn't occur.
At this point, for me totally reverting the patch back to using \b doesn't break anything: the originally reported bug is not reproducible, it works as expected. Could be a fix in glib in the last 5 years, or anything else.
Details in #3524.
Please let me know if you know how to reproduce the bug with the change being reverted.