Ticket #55 (closed defect: fixed)
savannah: tab completion vs. spaces and escaping
Reported by: | egmont | Owned by: | slavazanko |
---|---|---|---|
Priority: | major | Milestone: | 4.8.8 |
Component: | mc-core | Version: | master |
Keywords: | Cc: | ossi, egmont@… | |
Blocked By: | Blocking: | ||
Branch state: | merged | Votes for changeset: | committed-master |
Description (last modified by ossi) (diff)
Original: http://savannah.gnu.org/bugs/?18695
Submitted by: | Egmont Koblinger <egmont> | Submitted on: | Fri 05 Jan 2007 01:07:06 PM UTC |
Category: | None | Severity: | 3 - Normal |
Status: | None | Privacy: | Public |
Assigned to: | None | Open/Closed: | Open |
Release: | current (CVS or snapshot) | Operating System: | GNU/Linux |
Original submission:
Tab completion (Alt+Tab or Escape+Tab, I'll refer to these simply as Tab) works exactly the same way in the command line and in dialog boxes (such as copy/move file). The function complete() in src/complete.c only takes a widget as its input and performs the completion. This approach is wrong, since the dialog boxes and the command line have different word splitting and escaping rules, they cannot be treated the same way. Currently they both have bugs. (I found these while trying to tidy up my mp3s, where lots of file and directory names contain spaces.) In the file copy and similar dialogs, filenames are typed as pure strings without any kind of escaping, and a text entry only takes one filename. However, splitting at spaces is still performed. Suppose I have a directory named "/tmp/a bcde". I press F5/F6 on a file, and type "/tmp/a" and then Tab, it completes to "/tmp/a bcde". This is okay. But if I type "/tmp/a bc" and press Tab now, it completes to the files in the current directory whose name begins with "bc", due to the splitting at the space. Similarly, if I already have "/tmp/a bcde/" in the edit field, it won't complete to subdirectories or files of this directory. In this case the whole content of the field should be treated as one, not being split at spaces. In the command line, splitting at spaces is okay, however, we would need some escaping and unescaping too, which doesn't happen. (Note: Esc followed by Enter correctly escapes the filename when it puts it into the cmdline.) Example: type "ls /tmp/a" and then Tab, you'll have "ls /tmp/a bcdefgh/" in the command line, which clearly won't list the contents of that directory due to the unescaped space character. Another example: type "ls /tmp/a\ b" and hit Tab, it won't complete to "ls /tmp/a\ bcde/", but it should. I understand that it is beyond the scope of mc to perform a full shell syntax parsing (single and double quotes, nesting them etc.). This doesn't even happen when Esc Enter is pressed: backslashes are inserted even between quotes, but I'm fine with that. However, mc should produce proper escaping when inserting spaces or other special chars as a result of completion, and should be able to parse the backslash escaping it created (or the user typed) when doing further completion. Obviously the current design of the complete() function to take only a widget is not okay (unless there's some room in the widget to store splitting and escaping rules). There are two main possible solutions I can see, but probably they'd eventually lead to basically the same code. 1) modify complete() so that it receives information on word splitting and escaping, so that it becomes more complex and probably more buggy and harder to maintain, but solves everything we need. 2) remove the word splitting code from complete(), so that it becomes simpler and perfect for the file copy/move widgets. Slightly modify its interface to take and return string and cursor positions instead of widget. Then make the command line editor split at unescaped spaces, unescape the resulted word, call complete() on this temporary string, escape the result of completion and put back in the command line. Finally, we should check how the internal "cd" command handles its arguments. Currently it seems to do a lot of heuristics. For example, if I type "cd a\\\\b" (four backslashes) then it tries to change to the directory called "a\\\\b" (four backslashes) but if there's no such directory, it changes to "a\\b" (two backslashes). This seems to be terrible, and leads to bugs, e.g. if the cursor is on "a\\b" and I type "cd " followed by a Tab, I get the command "cd a\\\\b" and hence it'll change to "a\\\\b" which is not the desired directory. I'd remove all the heuristics and perform the usual command line unescaping, just as if it was an external command.
Comment 1 by Pavel Tsekov <ptsekov> at Fri 05 Jan 2007 01:53:04 PM UTC:
Sounds familiar... http://savannah.gnu.org/bugs/?16176
Change History
comment:4 Changed 14 years ago by andrew_b
- Version set to master
- severity set to no branch
- Blocked By 157 removed
comment:7 Changed 13 years ago by slavazanko
- Owner set to slavazanko
- Status changed from new to accepted
- Branch state set to no branch
comment:8 Changed 13 years ago by slavazanko
- Cc egmont added
- Branch state changed from no branch to on review
Created branch 55_filename_complete (parent: master)
Initial 55_filename_complete
Review, please.
comment:11 Changed 13 years ago by slavazanko
- Branch state changed from on rework to on review
- Milestone changed from 4.8 to 4.8.3
comment:12 Changed 13 years ago by ossi
in lib/widget/input_complete.c, the tab char should also be treated like that. you can insert a tab with ctrl-q tab easily, even if this is not really useful in the normal case.
INPUT_COMPLETE_FILES_ESC sounds like a misnomer to me. at the level of the input api it doesn't relate to files at all (or at least it shouldn't - i can't see from the patch alone whether that's actually the case). INPUT_COMPLETE_SPACE_ESC sounds like a better name.
comment:13 Changed 13 years ago by slavazanko
comment:14 Changed 13 years ago by ossi
actually, the logic in lib/widget/input_complete.c is broken. i think it should be more or less like this:
// no space, no tab here. // XXX this list seems pretty arbitrary to me. if shell, missing at least: ()& if (strchr (";|<>", *s) != NULL) break; if ((*s == 32 || *s == 9) && (!(in->completion_flags & INPUT_COMPLETE_SPACE_ESC) || !strutils_is_char_escaped (in->buffer, s))) break;
comment:16 Changed 13 years ago by slavazanko
- Branch state changed from on rework to on review
- Milestone changed from 4.8.3 to 4.8.4
rebased to latest master and have respected ossi's thoughts.
Review, please.
comment:17 Changed 13 years ago by ossi
a recommendation for the commit message: use the past tense for the broken things that your commit fixes. it makes no sense to describe *some* state in the present tense, because it is not clear whether that's the new or the old state.
your re-definition of INPUT_COMPLETE_DEFAULT makes it a misnomer - it's more like INPUT_COMPLETE_NONE now.
anyway, the more i look into this code, the less i understand. what exactly is the above fragment supposed to do? the "stop chars" in the strchr are applied irrespective of mode, which makes no sense to me at all. what is INPUT_COMPLETE_SPACE_ESC meant to actually do? as far as i can see, there are only two modes: plain (regular line inputs) and shell-like (command prompt. fwiw, the built-in cd should just pretend to be a regular cd command (i.e., follow the normal shell splitting and quoting rules), then there would be no ambiguity). i can't find the code which is supposed to quote completions for the command prompt - am i or the code missing something?
try_complete() contains a lot of fuzzy logic, in particular with backslash escaping being handled rather arbitrarily or not at all. strutils_is_char_escaped() should be consistently applied, with regard to the mode.
INPUT_COMPLETE_DIRNAMES needs to be split out from INPUT_COMPLETE_CD, otherwise the "cd" detection magic is activated outside command prompt context as well.
generally, i'd suggest that you *precisely* describe the intended semantics of the INPUT_COMPLETE_* enum values next to their declarations, and then match these descriptions against both the implementation and the usages.
wow. two hours gone.
comment:18 Changed 13 years ago by slavazanko
- Branch state changed from on review to on rework
a recommendation for the commit message: use the past tense for the broken things that your commit fixes. it makes no sense to describe *some* state in the present tense, because it is not clear whether that's the new or the old state.
Thank for explanation, my English isn't good, and I accept all lessons and criticism.
your re-definition of INPUT_COMPLETE_DEFAULT makes it a misnomer - it's more like INPUT_COMPLETE_NONE now.
Yep, you right, INPUT_COMPLETE_NONE have more sense now. I changed the name of constant.
All your other notes I have to take a closer look in the evening with a relaxing cup of tea. So, ticket going to 'rework' stage.
wow. two hours gone.
But I hope, you got a fun in this two hours, because a fun it an engine of OpenSource (in most cases) :)
comment:21 Changed 12 years ago by slavazanko
- Branch state changed from on rework to on review
- Milestone changed from Future Releases to 4.8.8
Branch was recreated. See 55_filename_complete for details. Review & vote, please.
comment:23 Changed 12 years ago by angel_il
- Votes for changeset changed from andrew_b to andrew_b angel_il
- Branch state changed from on review to approved
comment:24 Changed 12 years ago by slavazanko
- Status changed from accepted to testing
- Votes for changeset changed from andrew_b angel_il to committed-master
- Resolution set to fixed
- Blocking 2450, 2626 removed
- Branch state changed from approved to merged
merged to master:
git log --pretty=oneline 74d71e7..0608af2
comment:26 Changed 11 years ago by ossi
- Cc ossi added; ossi@…, egmont removed
- Description modified (diff)
- Reporter changed from slavazanko to egmont
Related ticket: #41