Ticket #2361 (closed task: fixed)

Opened 14 years ago

Last modified 13 years ago

VFS URI reimplementation

Reported by: andrew_b Owned by: slavazanko
Priority: major Milestone: 4.8.0-pre1
Component: mc-vfs Version: master
Keywords: ipv6 URL Cc: zaytsev
Blocked By: Blocking:
Branch state: Votes for changeset: commited-master

Description

VFS used in mc has some leaks in design. For instance, the VFS URI contains a VFS name: testcase.tar#utar. In such case VFS name looks like a part of file name. Such desision a) limits a usage of # and @ symbols in filenames and passwords and b) is a root of many problems in URI parsing (see tickets in Blocking: field).

The new VFS URI is proposed: create a special path element /#vfs:vfsname/ that contains only vfs name with #vfs: prefix (like path encoding /#enc:encodingname/). This scheme simplifies the VFS URI parsing and removes the limitation of @ and # symbols usage in file names and passwords.

Examples:

/some/path/#vfs:patchfs/foo.diff
/#vfs:ftp/user:password@host/path/file

Change History

comment:1 Changed 14 years ago by zaytsev

  • Cc zaytsev added

comment:2 in reply to: ↑ description Changed 14 years ago by andrew_b

Replying to andrew_b:

Examples:

/some/path/#vfs:patchfs/foo.diff
/#vfs:ftp/user:password@host/path/file

Well, this is not good solution due to an extra token in path. That extra token makes the path parsing more complicated.

The better solutions are
/some/path/#vfs:patchfs:foo.diff
or
/some/path/#vfs:patchfs@foo.diff
or
/some/path/#vfs:patchfs$foo.diff
In any case, some defined symbol is used to split vfs name and file name. Vfs prefix, vfs name and file name are joined into sungle token.

comment:3 Changed 14 years ago by andrew_b

  • Blocked By 2137 removed

comment:4 follow-up: ↓ 5 Changed 14 years ago by ossi

oioi, deep waters ahead ... :D

first of all, let's get the terminology clear. mc's vfs paths are *not* URIs in an RFC sense - and it's good that way, as it's optimized for local file names which are by far the most common use case.
however, this intentionally creates an incompatibility with modern desktop environments which usually don't have much interest in user-editable complex paths, but like using (real) URIs for everything. or something completely bizarre like gio does. this topic needs to be re-visited at some point, but i think automatic transformation between mc's user-friendly format and something desktop-compatible would be feasible without much effort.

i don't think we actually need the "vfs:" prefix - it just adds verbosity. i think /# followed by the vfs name (and a colon if parameters follow) is a sufficiently distinct magic. for the rare case where this would cause a false hit, a /#literal:<original> pseudo-vfs could be introduced.

to determine appropriate quoting/escaping rules, try constructing a path which represents a file inside a bzip-compressed tar inside an encrypted rar which resides in a subdirectory of an ftp-server which needs authentication. afterwards you try something really weird. :D

comment:5 in reply to: ↑ 4 ; follow-up: ↓ 6 Changed 14 years ago by andrew_b

Replying to ossi:

first of all, let's get the terminology clear. mc's vfs paths are *not* URIs in an RFC sense

Yes, if you mean RFC2396. But it is Uniform Resource Identifier within MC VFS.

however, this intentionally creates an incompatibility with modern desktop environments

avfs created that incompatibility for years ago and I don't care about that now.

i don't think we actually need the "vfs:" prefix - it just adds verbosity. i think /# followed by the vfs name (and a colon if parameters follow) is a sufficiently distinct magic. for the rare case where this would cause a false hit, a /#literal:<original> pseudo-vfs could be introduced.

#2230? #a, #rpm can be a real file name. #rpms can be a real directoriy name. I want to decrease potential false hits and some verbosity like usage of #vfs: prefix will help us to do that.

comment:6 in reply to: ↑ 5 Changed 14 years ago by ossi

Replying to andrew_b:

#2230?

the first problem is that the #a# is parsed as a vfs entry even though it clearly does not match the regexp /#(vfs1|vfs2|...)(:|/|$) (as implied by my first comment).
second, the real bug in mc is that it does not escape the path it produces (which may be impossible given the current situation - that's why we have this ticket, after all).
and lastly, this is an utter corner case. let's optimize for the legibility of the more common case.

comment:7 Changed 14 years ago by ossi

fwiw, we should probably use a less verbose escape - simply doubling the hash would already work: /#a/ would become /##a/. whether the the escape should be applied to uris which don't strictly need it (because they don't match the above regexp, like the /#a# example) is a matter of taste, but _must_ be determined upfront - i think i'd enforce it for uniformity (and it makes the parser simpler, because it doesn't have to do look-ahead).

comment:8 Changed 14 years ago by andrew_b

  • Blocking 2230 removed
  • Blocked By 1512 removed

comment:9 Changed 14 years ago by andrew_b

  • Status changed from new to accepted
  • Owner set to andrew_b
  • severity changed from no branch to on review
  • Milestone changed from 4.7 to 4.8.0-pre1

Created 2361_vfs_uri branch. Parent branch is master.
Initial changeset:c7105e59899019bb823bc7d571e07f097585f29c

comment:10 follow-up: ↓ 11 Changed 14 years ago by slavazanko

Create test environment:

$ tar cf arch.tar /etc/hosts
$ mkdir -p '#vfs:utar:arch.tar/test'

Testing in mc:
1) go to '#vfs:utar:arch.tar' directory and then go up again. Cursor will moved to arch.tar file
2) try to copy/rename/move/delete '#vfs:utar:arch.tar' directory. All these operations will applied to file 'arch.tar', not to directory

comment:11 in reply to: ↑ 10 Changed 14 years ago by andrew_b

Replying to slavazanko:

All these operations will applied to file 'arch.tar', not to directory

Fixed.

comment:12 Changed 14 years ago by ossi

why did you completely disregard my comments about proper escaping and optimizing the common case?
i actually use cd /#sh:somehost rather often, and i certainly won't like having to type some silly and completely unnecessary vfs: prefix now.

comment:13 Changed 14 years ago by andrew_b

You can use sh://somehost.

comment:14 Changed 14 years ago by ossi

as far as i'm concerned, that doesn't exactly emphasize the U in URI.

also, the vfs: syntax is simply a gratuitous incompatible change of long-standing behavior. i've yet to see a compelling justification for it.

comment:15 Changed 14 years ago by slavazanko

  • Votes for changeset set to slavazanko

ossi, can you help with writing new URI VFS parser as you like?

comment:16 Changed 14 years ago by ossi

you mean, like, more than reviewing? maybe when i find a long enough free timeslice while my brain is not toasted from work yet. ;)

reviewing the existing branch is quite a nightmare due to a lot of mixed in coding style cleanups ... you should take some inspiration from http://qt.gitorious.org/qt/pages/CommitPolicy point 8.

anyway, on the theory of the necessary changes:

if you want to start from the existing branch, first you need to get rid of the vfx prefix changes, leaving only the refactoring in. i'd probably throw away the commits and extract interesting parts from the cumulative diff with git gui.

you need a single structure to represent a parsed URI, as a list of segments - plain path segments and vfs root segments. then you need a function which parses URI strings (using callbacks to registered vfs handlers) and unescapes literal hashmarks, and a function which assembles the parts back to an URI (again using callbacks), including escaping hashmarks. a single entry point per direction ensures consistent handling.

i wonder what to do about the file#utar syntax. i'd personally get rid of it and require file/#utar (which, btw, kinda works until you try to leave the archive), but this cannot be done without a deprecation phase of the old syntax. but this is non-critical anyway - when hashmarks are properly escaped, a single hashmark could actually still be used as hierarchy delimiter, only that it looks weird (imagine #ftp:foo:b##r@host/yo/my#utar/thefile#urar:passwd/holla/die/wald##fee.gz#gz vs. /#ftp:foo:b##r@host/yo/my/#utar/thefile/#urar:passwd/holla/die/wald##fee.gz/#gz - i find the latter way more readable).

comment:17 Changed 14 years ago by andrew_b

  • Blocking 1687 added

comment:18 Changed 14 years ago by slavazanko

  • severity changed from on review to on rework

comment:19 Changed 13 years ago by slavazanko

  • Owner changed from andrew_b to slavazanko
  • Priority changed from major to critical
  • Votes for changeset slavazanko deleted
  • severity changed from on rework to on review

Branch changeset:2361_vfs_uri have been totally rewritten.

New initial changeset:a89c7d252dd34af26473459e5ccf8d0bd2870821

Review, please.

comment:20 Changed 13 years ago by slavazanko

Note: VFS pathes now handled as

vfsprefix1://vfsdata/vfsprefix2://vfsdata

Also, user 'bindings' file renamed to 'mc.ext', so you need search in this file all

 Open=file.ext#vfsprefix

and replace them to

 Open=file.ext/vfsprefix://

After this you should rename your 'bindings' file to 'mc.ext'

Old-style paths processed too, but you couldn't mix URL-like and old style path elements in one path string.

Support of old-style paths will be removed in next major release (probably in 4.9, who know...)

Be aware.

Version 0, edited 13 years ago by slavazanko (next)

comment:21 Changed 13 years ago by ossi

i'll do a proper review in the next days.

why did you introduce the new syntax? a) it's noisier (see my previous arguments) b) it pretends to be something which it isn't ("proper" URLs) c) it is an apparently gratuitous change (i didn't see the commit message explain an actual need to do it)

comment:22 Changed 13 years ago by andrew_b

  • Votes for changeset set to andrew_b

comment:23 Changed 13 years ago by slavazanko

why did you introduce the new syntax?

The new syntax more reliable to parse path string and easily for reading by humans :)

comment:24 Changed 13 years ago by ossi

care to back up that claim with some examples and counter-examples?

comment:25 Changed 13 years ago by angel_il

  • Votes for changeset changed from andrew_b to andrew_b angel_il
  • severity changed from on review to approved

comment:26 Changed 13 years ago by slavazanko

  • Status changed from accepted to testing
  • Votes for changeset changed from andrew_b angel_il to commited-master
  • Resolution set to fixed
  • severity changed from approved to merged
  • Blocking 33, 81, 1605, 1687, 2016, 2073, 2220, 2251, 2360 removed

Merge changeset:f2ebbd2eb4bd9e196963ecaf1f79b31986ad64ac

For getting list of commits in branch type:

git log --pretty=oneline fcfa76b..ef676d3

comment:27 Changed 13 years ago by slavazanko

  • Status changed from testing to closed

comment:28 Changed 13 years ago by ossi

$ cd
$ mkdir '#utar'
enter that directory
leave that directory. you are in /home now.

$ cd
$ mkdir 'foo#utar'
enter that directory
make directory "bar". you get "function not implemented".

comment:29 follow-up: ↓ 30 Changed 13 years ago by slavazanko

well, do we need to drop old-style path parser?

Last edited 13 years ago by andrew_b (previous) (diff)

comment:30 in reply to: ↑ 29 Changed 13 years ago by andrew_b

Replying to slavazanko:

well, do we need to drop old-style path parser?

I think, yes.

comment:31 Changed 13 years ago by slavazanko

I think, need to leave old-style pathes just in one place: in "Directory hotlist".
But new Directory hotlist entries should be saved in URL-like format.

comment:32 Changed 13 years ago by slavazanko

  • Status changed from closed to reopened
  • Priority changed from critical to major
  • Resolution fixed deleted
  • severity changed from merged to on rework

comment:33 Changed 13 years ago by slavazanko

  • Status changed from reopened to accepted
  • Votes for changeset commited-master deleted

comment:34 Changed 13 years ago by andrew_b

Support of relative symlinks was broken in new vfs.

comment:35 Changed 13 years ago by slavazanko

Created branch 2361_url_path

Initial changeset:e6c25891e7b5cd8a0b8cae0d566e5c1c68a52f7a

Review, please.

comment:36 Changed 13 years ago by slavazanko

  • severity changed from on rework to on review

comment:37 Changed 13 years ago by andrew_b

  • Votes for changeset set to andrew_b

comment:38 Changed 13 years ago by angel_il

  • Votes for changeset changed from andrew_b to andrew_b angel_il
  • severity changed from on review to approved

comment:39 Changed 13 years ago by slavazanko

  • Status changed from accepted to testing
  • Keywords ipv6 URL added
  • Votes for changeset changed from andrew_b angel_il to commited-master
  • Resolution set to fixed
  • severity changed from approved to merged

Merge changeset:906f6887426279ca179ef6e247d9c3a780b613b2

For getting list of commits type:

git log --pretty=oneline 92d2119..0f249bd

comment:40 Changed 13 years ago by slavazanko

  • Status changed from testing to closed
Note: See TracTickets for help on using tickets.