Ticket #2361 (closed task: fixed)
VFS URI reimplementation
Reported by: | andrew_b | Owned by: | slavazanko |
---|---|---|---|
Priority: | major | Milestone: | 4.8.0-pre1 |
Component: | mc-vfs | Version: | master |
Keywords: | ipv6 URL | Cc: | zaytsev |
Blocked By: | Blocking: | ||
Branch state: | Votes for changeset: | commited-master |
Description
VFS used in mc has some leaks in design. For instance, the VFS URI contains a VFS name: testcase.tar#utar. In such case VFS name looks like a part of file name. Such desision a) limits a usage of # and @ symbols in filenames and passwords and b) is a root of many problems in URI parsing (see tickets in Blocking: field).
The new VFS URI is proposed: create a special path element /#vfs:vfsname/ that contains only vfs name with #vfs: prefix (like path encoding /#enc:encodingname/). This scheme simplifies the VFS URI parsing and removes the limitation of @ and # symbols usage in file names and passwords.
Examples:
/some/path/#vfs:patchfs/foo.diff
/#vfs:ftp/user:password@host/path/file
Change History
comment:2 in reply to: ↑ description Changed 14 years ago by andrew_b
Replying to andrew_b:
Examples:
/some/path/#vfs:patchfs/foo.diff
/#vfs:ftp/user:password@host/path/file
Well, this is not good solution due to an extra token in path. That extra token makes the path parsing more complicated.
The better solutions are
/some/path/#vfs:patchfs:foo.diff
or
/some/path/#vfs:patchfs@foo.diff
or
/some/path/#vfs:patchfs$foo.diff
In any case, some defined symbol is used to split vfs name and file name. Vfs prefix, vfs name and file name are joined into sungle token.
comment:4 follow-up: ↓ 5 Changed 14 years ago by ossi
oioi, deep waters ahead ... :D
first of all, let's get the terminology clear. mc's vfs paths are *not* URIs in an RFC sense - and it's good that way, as it's optimized for local file names which are by far the most common use case.
however, this intentionally creates an incompatibility with modern desktop environments which usually don't have much interest in user-editable complex paths, but like using (real) URIs for everything. or something completely bizarre like gio does. this topic needs to be re-visited at some point, but i think automatic transformation between mc's user-friendly format and something desktop-compatible would be feasible without much effort.
i don't think we actually need the "vfs:" prefix - it just adds verbosity. i think /# followed by the vfs name (and a colon if parameters follow) is a sufficiently distinct magic. for the rare case where this would cause a false hit, a /#literal:<original> pseudo-vfs could be introduced.
to determine appropriate quoting/escaping rules, try constructing a path which represents a file inside a bzip-compressed tar inside an encrypted rar which resides in a subdirectory of an ftp-server which needs authentication. afterwards you try something really weird. :D
comment:5 in reply to: ↑ 4 ; follow-up: ↓ 6 Changed 14 years ago by andrew_b
Replying to ossi:
first of all, let's get the terminology clear. mc's vfs paths are *not* URIs in an RFC sense
Yes, if you mean RFC2396. But it is Uniform Resource Identifier within MC VFS.
however, this intentionally creates an incompatibility with modern desktop environments
avfs created that incompatibility for years ago and I don't care about that now.
i don't think we actually need the "vfs:" prefix - it just adds verbosity. i think /# followed by the vfs name (and a colon if parameters follow) is a sufficiently distinct magic. for the rare case where this would cause a false hit, a /#literal:<original> pseudo-vfs could be introduced.
#2230? #a, #rpm can be a real file name. #rpms can be a real directoriy name. I want to decrease potential false hits and some verbosity like usage of #vfs: prefix will help us to do that.
comment:6 in reply to: ↑ 5 Changed 14 years ago by ossi
Replying to andrew_b:
the first problem is that the #a# is parsed as a vfs entry even though it clearly does not match the regexp /#(vfs1|vfs2|...)(:|/|$) (as implied by my first comment).
second, the real bug in mc is that it does not escape the path it produces (which may be impossible given the current situation - that's why we have this ticket, after all).
and lastly, this is an utter corner case. let's optimize for the legibility of the more common case.
comment:7 Changed 14 years ago by ossi
fwiw, we should probably use a less verbose escape - simply doubling the hash would already work: /#a/ would become /##a/. whether the the escape should be applied to uris which don't strictly need it (because they don't match the above regexp, like the /#a# example) is a matter of taste, but _must_ be determined upfront - i think i'd enforce it for uniformity (and it makes the parser simpler, because it doesn't have to do look-ahead).
comment:9 Changed 14 years ago by andrew_b
- Status changed from new to accepted
- Owner set to andrew_b
- severity changed from no branch to on review
- Milestone changed from 4.7 to 4.8.0-pre1
Created 2361_vfs_uri branch. Parent branch is master.
Initial changeset:c7105e59899019bb823bc7d571e07f097585f29c
comment:10 follow-up: ↓ 11 Changed 14 years ago by slavazanko
Create test environment:
$ tar cf arch.tar /etc/hosts $ mkdir -p '#vfs:utar:arch.tar/test'
Testing in mc:
1) go to '#vfs:utar:arch.tar' directory and then go up again. Cursor will moved to arch.tar file
2) try to copy/rename/move/delete '#vfs:utar:arch.tar' directory. All these operations will applied to file 'arch.tar', not to directory
comment:11 in reply to: ↑ 10 Changed 14 years ago by andrew_b
Replying to slavazanko:
All these operations will applied to file 'arch.tar', not to directory
Fixed.
comment:12 Changed 14 years ago by ossi
why did you completely disregard my comments about proper escaping and optimizing the common case?
i actually use cd /#sh:somehost rather often, and i certainly won't like having to type some silly and completely unnecessary vfs: prefix now.
comment:13 Changed 14 years ago by andrew_b
You can use sh://somehost.
comment:14 Changed 14 years ago by ossi
as far as i'm concerned, that doesn't exactly emphasize the U in URI.
also, the vfs: syntax is simply a gratuitous incompatible change of long-standing behavior. i've yet to see a compelling justification for it.
comment:15 Changed 14 years ago by slavazanko
- Votes for changeset set to slavazanko
ossi, can you help with writing new URI VFS parser as you like?
comment:16 Changed 14 years ago by ossi
you mean, like, more than reviewing? maybe when i find a long enough free timeslice while my brain is not toasted from work yet. ;)
reviewing the existing branch is quite a nightmare due to a lot of mixed in coding style cleanups ... you should take some inspiration from http://qt.gitorious.org/qt/pages/CommitPolicy point 8.
anyway, on the theory of the necessary changes:
if you want to start from the existing branch, first you need to get rid of the vfx prefix changes, leaving only the refactoring in. i'd probably throw away the commits and extract interesting parts from the cumulative diff with git gui.
you need a single structure to represent a parsed URI, as a list of segments - plain path segments and vfs root segments. then you need a function which parses URI strings (using callbacks to registered vfs handlers) and unescapes literal hashmarks, and a function which assembles the parts back to an URI (again using callbacks), including escaping hashmarks. a single entry point per direction ensures consistent handling.
i wonder what to do about the file#utar syntax. i'd personally get rid of it and require file/#utar (which, btw, kinda works until you try to leave the archive), but this cannot be done without a deprecation phase of the old syntax. but this is non-critical anyway - when hashmarks are properly escaped, a single hashmark could actually still be used as hierarchy delimiter, only that it looks weird (imagine #ftp:foo:b##r@host/yo/my#utar/thefile#urar:passwd/holla/die/wald##fee.gz#gz vs. /#ftp:foo:b##r@host/yo/my/#utar/thefile/#urar:passwd/holla/die/wald##fee.gz/#gz - i find the latter way more readable).
comment:19 Changed 13 years ago by slavazanko
- Owner changed from andrew_b to slavazanko
- Priority changed from major to critical
- Votes for changeset slavazanko deleted
- severity changed from on rework to on review
Branch changeset:2361_vfs_uri have been totally rewritten.
New initial changeset:a89c7d252dd34af26473459e5ccf8d0bd2870821
Review, please.
comment:20 Changed 13 years ago by slavazanko
Note: VFS paths now handled as
vfsprefix1://vfsdata/vfsprefix2://vfsdata
Also, 'bindings' user file was renamed to 'mc.ext', so you need search in this file all
Open=file.ext#vfsprefix
and replace them to
Open=file.ext/vfsprefix://
After this you should rename your 'bindings' file to 'mc.ext'
Old-style paths are handled too, but you couldn't mix URL-like and old style path elements in one path string.
Support of old-style paths will be removed in next major release (probably in 4.9, who knows...)
Be aware.
comment:21 Changed 13 years ago by ossi
i'll do a proper review in the next days.
why did you introduce the new syntax? a) it's noisier (see my previous arguments) b) it pretends to be something which it isn't ("proper" URLs) c) it is an apparently gratuitous change (i didn't see the commit message explain an actual need to do it)
comment:23 Changed 13 years ago by slavazanko
why did you introduce the new syntax?
The new syntax more reliable to parse path string and easily for reading by humans :)
comment:24 Changed 13 years ago by ossi
care to back up that claim with some examples and counter-examples?
comment:25 Changed 13 years ago by angel_il
- Votes for changeset changed from andrew_b to andrew_b angel_il
- severity changed from on review to approved
comment:26 Changed 13 years ago by slavazanko
- Status changed from accepted to testing
- Votes for changeset changed from andrew_b angel_il to commited-master
- Resolution set to fixed
- severity changed from approved to merged
- Blocking 33, 81, 1605, 1687, 2016, 2073, 2220, 2251, 2360 removed
Merge changeset:f2ebbd2eb4bd9e196963ecaf1f79b31986ad64ac
For getting list of commits in branch type:
git log --pretty=oneline fcfa76b..ef676d3
comment:28 Changed 13 years ago by ossi
$ cd
$ mkdir '#utar'
enter that directory
leave that directory. you are in /home now.
$ cd
$ mkdir 'foo#utar'
enter that directory
make directory "bar". you get "function not implemented".
comment:29 follow-up: ↓ 30 Changed 13 years ago by slavazanko
well, is we need to drop old-style path parser?
comment:30 in reply to: ↑ 29 Changed 13 years ago by andrew_b
comment:31 Changed 13 years ago by slavazanko
I think, need to leave old-style pathes just in one place: in "Directory hotlist".
But new Directory hotlist entries should be saved in URL-like format.
comment:32 Changed 13 years ago by slavazanko
- Status changed from closed to reopened
- Priority changed from critical to major
- Resolution fixed deleted
- severity changed from merged to on rework
comment:33 Changed 13 years ago by slavazanko
- Status changed from reopened to accepted
- Votes for changeset commited-master deleted
comment:34 Changed 13 years ago by andrew_b
Support of relative symlinks was broken in new vfs.
comment:35 Changed 13 years ago by slavazanko
Created branch 2361_url_path
Initial changeset:e6c25891e7b5cd8a0b8cae0d566e5c1c68a52f7a
Review, please.
comment:38 Changed 13 years ago by angel_il
- Votes for changeset changed from andrew_b to andrew_b angel_il
- severity changed from on review to approved
comment:39 Changed 13 years ago by slavazanko
- Status changed from accepted to testing
- Keywords ipv6 URL added
- Votes for changeset changed from andrew_b angel_il to commited-master
- Resolution set to fixed
- severity changed from approved to merged
Merge changeset:906f6887426279ca179ef6e247d9c3a780b613b2
For getting list of commits type:
git log --pretty=oneline 92d2119..0f249bd