Ticket #3885 (new enhancement)

Opened 6 years ago

Last modified 6 years ago

Ability to exclude files from search results when using shell patterns

Reported by: esauloff Owned by:
Priority: major Milestone: Future Releases
Component: mc-search Version: master
Keywords: search exclude Cc:
Blocked By: Blocking:
Branch state: no branch Votes for changeset:

Description

It looks like MC does not support file patterns exclusion when searching by shell patterns.
No other ticket or enhancement found to address it as well.

For example if user wants to search for all files but *.a files, search criteria might look like:
*|*.a

It is achievable partly with regex patterns but it is tricky to address it if files extensions are used both to include and exclude files, for example if user wants to search for *.h* files but exclude *.hxx files at the same time:
*.h*|*.hxx

On Windows platform, FAR Manager supports such functionality using '|' symbol as delimiter in search field.

Attachments

3885-search-exclude.patch (48.5 KB) - added by esauloff 6 years ago.
Patch for 3885_search_exclude branch
partial.00.3885-search-exclude.patch (15.7 KB) - added by esauloff 6 years ago.
partial.01.3885-search-exclude.patch (32.8 KB) - added by esauloff 6 years ago.

Change History

comment:1 follow-up: ↓ 3 Changed 6 years ago by esauloff

I implemented this functionality in forked repository on my github:
https://github.com/esauloff/mc/tree/feature/exclude-files

I ended up with creating additional containers to store possible exclusion patterns.
It works only for MC_SEARCH_T_GLOB search type.
Delimiter symbol is hardcoded (do we need to configure it at all??) as '|' (pipe).

/* prepared conditions */
GPtrArray *conditions;
/* prepared conditions: to exclude patterns for MC_SEARCH_T_GLOB type */
GPtrArray *conditions_exclude;

/* original search string */
gchar *original;
gsize original_len;

/* original search string: to exclude patterns for MC_SEARCH_T_GLOB type */
gchar *original_exclude;
gsize original_exclude_len;

Please let me know if I can proceed with pull request to send changes to main MC repository.
Need to rename my branch though to reflect this newly created task number.

Thanks.

Version 0, edited 6 years ago by esauloff (next)

comment:2 Changed 6 years ago by esauloff

Need to update documentation to reflect this change if it is go-forward case.
I can do that in separate branch if needed.

Last edited 6 years ago by esauloff (previous) (diff)

comment:3 in reply to: ↑ 1 Changed 6 years ago by andrew_b

Replying to esauloff:

Please let me know if I can proceed with pull request to send changes to main MC repository.

We don't use pull requests. You can read detailed info in .github/PULL_REQUEST_TEMPLATE.md in the MC source tree.

comment:4 Changed 6 years ago by esauloff

  • Keywords search added; search; removed

comment:5 Changed 6 years ago by esauloff

Added patch for suggested 3885_search_exclude branch.
Branch is based on Ticket #3884: mceditor: syntax: add rust. from 12/07/2017, hash 3a5da75181e48a34281e846e9c86237f13a4e581.

Cannot push branch itself to main MC repository because of permissions.

Please let me know what else is needed to proceed with this change / or I can help you with.

Thanks.

comment:6 Changed 6 years ago by andrew_b

Thanks!
It would be great if you change your function descriptions to doxygen format and provide a description of this new feature in the mc man file.

comment:7 Changed 6 years ago by egmont

Hi guys,

Thanks for working on this, sounds like a cool feature!

I have a few questions/concerns/nits (but I have to admit I haven't tried or taken a closer look at the patch):

  • Would it work everywhere throughout mc where a shell pattern is expected, or only in Find Files?
  • If only in Find Files (or a few special designated places), I'd consider a separate text entry for exclusion. That's more user-friendly and discoverable than a special symbol mentioned somewhere in the docs.
  • What if someone wishes to search for the | symbol? Would e.g. [|] work, is | within [...] parsed specially not to start an exclusion pattern? Or \|?
  • What if multiple | signs are present, how is that parsed? Does it add multiple exclusion criteria? If so, it's a bit silly ifs I can exclude multiple patterns at once, but cannot include multiple ones at once.
  • FAR might use the | symbol, but it's a weird choice to me, I've never seen this symbol meaning negation/exclusion, and its frequent meaning of "or"-ing makes it even more misleading (upon seeing something like *.h*|*.hxx I believe it's the inclusion of both patterns that would first occur to people). Do we really want to stick to it? I mean, we could perhaps go for multiple space-separated patterns, one starting with -, or maybe ~ or ^ meaning exclusion.
  • I might consider renaming "Using shell patterns" to let's say "Using simple patterns" if we're diverging from shells, to emphasize that it's not the same, and to suggest to users to look up the docs. It might make it more confusing, though.

comment:8 follow-up: ↓ 10 Changed 6 years ago by esauloff

Thanks for your feedback.

  • Yes, it should work in all places where files can be found/selected with shell patterns. Find Files dialog and Select Group of Files are main targets here.
  • While I agree that just a sign in search field is not discoverable, I would disagree just personally to have additional search field. New field could overload user interface with something not really popular which most users do not use at all. But it is just me - let's create new field if we agreed on it.
  • Unless there are more than one '|' sign in search criteria (please see next point for more details) - it does not work. It is not possible to search for such files and escaping does not work too. E.g. search criteria 1\|2.txt is parsed to include 1\ and exclude 2.txt patterns currently. I did not try it beforehand and running couple tests right now - but wow, it works pretty unexpected originally in MC. Let's say, we have a directory with 8 files:
      1.txt
      1|2.txt
      1|2|3.txt
      2.txt
      22.txt
      222.txt
      2222.txt
      3.txt
    
    Search criteria 1|2.txt finds all but 3.txt file - 7 files totally. So criteria 1\|2.txt works to find only one 1|2.txt file. For this change - it is a miss from my side and I will cover the case to escape '|' sign with escape character '\'.
  • If multiple '|' signs are present, search criteria is not parsed into "include" and "exclude" parts and search is done like usually. So, no multiple search patterns via multiple '|' signs - but it is still possible to exclude multiple patterns via regular shell expressions, e.g. {*.a,*.o} excludes all *.a and *.o files. Single '|' sign serves as a delimiter point in this case. Search with multiple patterns in both parts works fine:
      {*}|{*.a,*.o}
      {*.h*,*.c*}|{*.hxx,*.cpp}
    
  • Sure, let's discuss what symbol to use. Or maybe symbol itself can be configured via options / config files. I see advantage in '|' sign as it separates search criteria in two parts visually. But I agree, it is not really expected exclusion sign programmatically. Personally, do not mind to use any other symbol - just really used to '|' sign as a FAR user for 15+ years.
  • As it is targeted mainly on shell patterns for file search - do you still want to rename it?

I will add the case to escape '|' sign - hopefully in next 2-3 days - and send the patch. Will postpone any doxygen comments until we have a solid vision for this feature. And obviously no changes to man files too.

comment:9 Changed 6 years ago by zaytsev

Speaking of which, how about tests?! This touches pretty much core functionality of mc which everyone is using most of the time, and as far as I know test coverage is pretty much not there.

If we break something here, I would expect at least torch lit protest marches, or worse even peasants with pitchforks coming after us immediately as a new release comes out.

comment:10 in reply to: ↑ 8 Changed 6 years ago by egmont

Replying to esauloff:

  • While I agree that just a sign in search field is not discoverable, I would disagree just personally to have additional search field.

If it'll work everywhere then I agree that a simple field is better than two fields _everywhere_.

  • If multiple '|' signs are present, search criteria is not parsed into "include" and "exclude" parts

This implies that [||] is aleady a valid "escaped" shell pattern for a literal | to be searched for. _If_ we're sticking with FAR's model, I believe we could totally stick with that and just mention this "escaping" in the docs.

Note, however, that there'll be no way of searching for a pattern containing a | _and_ using exclusion at the same time. As pipe is a rarely used character in filenames, I practically don't mind this too much (although technically it's not a nice design).

How does FAR handle escaping by the way, does it also treat '\|' as a literal backslash followed by a pipe (which is either a literal pipe or the separator)?

  • Sure, let's discuss what symbol to use. Or maybe symbol itself can be configured via options / config files. I see advantage in '|' sign as it separates search criteria in two parts visually. But I agree, it is not really expected exclusion sign programmatically. Personally, do not mind to use any other symbol - just really used to '|' sign as a FAR user for 15+ years.

What I had in mind wasn't just a different single character for separator, but rather a quite different (and sure more complicated) means of parsing. As you showed in your examples, it's probably not necessary.

  • As it is targeted mainly on shell patterns for file search - do you still want to rename it?

What worries me a little bit (but really just a little bit) is the deviation from shell patterns that might confuse people. But probably a different name would confuse them even more :-)

I will add the case to escape '|' sign

Maybe it's not necessary (as per [||]) - and adding special handling for \| probably requires a real proper parser that also handles \\| as a literal backslash followed by an unescaped pipe, etc. Maybe not worth the hassle.

Thanks for your answers! Actually, with adding [||] to the docs, I'm now okay with the current approach :)

Changed 6 years ago by esauloff

Patch for 3885_search_exclude branch

Changed 6 years ago by esauloff

Changed 6 years ago by esauloff

comment:11 Changed 6 years ago by esauloff

Replying to zaytsev:

Speaking of which, how about tests?! This touches pretty much core functionality of mc which everyone is using most of the time, and as far as I know test coverage is pretty much not there.

That's true, I did not find many tests for search functionality and did not add new for now, but this change affects MC_SEARCH_T_GLOB only. And _GLOB is called much rarely than _REGEX.

Replying to egmont:

This implies that [||] is aleady a valid "escaped" shell pattern for a literal | to be searched for. _If_ we're sticking with FAR's model, I believe we could totally stick with that and just mention this "escaping" in the docs.

Yes, [||] is an "escaped" sequence for shell but it was not able to make it work in MC even originally. If you want to search for filename 1|2.txt literally, 1\|2.txt criteria works in MC. Because all search types are converted to regular expressions in the end, looks like escaping with backslash \ is needed.
I added basic method to check if | is escaped in mc_search_is_character_escaped() to cover this case. With this is_escaped check, my changed code works the same as original MC "escaped" search.

Note, however, that there'll be no way of searching for a pattern containing a | _and_ using exclusion at the same time. As pipe is a rarely used character in filenames, I practically don't mind this too much (although technically it's not a nice design).

Totally agree. I hope that exclusion logic combined with filenames with | are rare beast.

How does FAR handle escaping by the way, does it also treat '\|' as a literal backslash followed by a pipe (which is either a literal pipe or the separator)?

FAR does something very similar. They have lists of inclusion and exclusion patterns and check if file matches for inclusion and does not match for exclusion.

In ./far/filemasks.hpp:

class filemasks
    ...
    std::list<masks> Include, Exclude;

In ./far/filemasks.cpp:

bool filemasks::Compare(const string_view& FileName) const
    return contains(Include, FileName) && !contains(Exclude, FileName);

But logic is easier for Windows platform as | is reserved character and cannot be used in filenames:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx#naming_conventions

In ./far/filemasks.cpp again:

static const wchar_t ExcludeMaskSeparator = L'|';

auto ptr = ExpMasks.cbegin();
...
while (ptr != End)
...
   if (ptr != End && *ptr == ExcludeMaskSeparator)

What worries me a little bit (but really just a little bit) is the deviation from shell patterns that might confuse people. But probably a different name would confuse them even more :-)

Ok, so we are stick with current name I believe.

Actually, with adding [||] to the docs, I'm now okay with the current approach :)

Added description of this change to English and Russian hlp/man files. I mentioned that escaped sequence \| works to search for | literally.

So, I am done with my changes until further notice :)
Everything is still based on Ticket #3884: mceditor: syntax: add rust. from 12/07/2017, hash 3a5da75181e48a34281e846e9c86237f13a4e581.

I renamed original attachment 3885-search-exclude.patch​ I sent couple days ago to new name partial.00.3885-search-exclude.patch​. Then I added second patch partial.01.3885-search-exclude.patch​ which includes the rest of my commits.
And added full patch file 3885-search-exclude.patch​ with all changes I've done for this task. Just not sure what's the best way for you when you review it.

Note: See TracTickets for help on using tickets.