Ticket #3073 (closed defect: fixed)
[urar helper] fix filenames with spaces handling with unrar v5
Reported by: | post-factum | Owned by: | andrew_b |
---|---|---|---|
Priority: | trivial | Milestone: | 4.8.12 |
Component: | mc-vfs | Version: | master |
Keywords: | urar unrar | Cc: | dnh@… |
Blocked By: | Blocking: | ||
Branch state: | merged | Votes for changeset: |
Description
unrar v5 helper script doesn't print filenames with spaces. The following pull request fixes that.
Attachments
Change History
comment:1 Changed 11 years ago by andrew_b
- Owner set to andrew_b
- Status changed from new to accepted
- Votes for changeset set to andrew_b
- Branch state changed from no branch to on review
- Milestone changed from Future Releases to 4.8.11
comment:2 Changed 11 years ago by slavazanko
- Votes for changeset changed from andrew_b to andrew_b slavazanko
- Branch state changed from on review to approved
comment:3 Changed 11 years ago by andrew_b
- Status changed from accepted to testing
- Votes for changeset changed from andrew_b slavazanko to committed-master
- Resolution set to fixed
- Branch state changed from approved to merged
Merged to master: [c4a546ac01911e10bb79e92f3757cebd8ef6454f].
comment:5 follow-up: ↓ 6 Changed 11 years ago by Hamer13
- Status changed from closed to reopened
- Resolution fixed deleted
I think this is not so good solution.
The filenames can have more than one whitespaces in one chunk. For example: "File_name__with_spaces". "_" means whitespace. Awk will squeeze this double (treble and so on) whitespaces into one whitespace.
I propose this patch:
$ diff -ub /usr/lib/mc/extfs.d/urar.orig /usr/lib/mc/extfs.d/urar --- /usr/lib/mc/extfs.d/urar.orig 2013-09-17 12:19:11.000000000 +0300 +++ /usr/lib/mc/extfs.d/urar 2013-10-26 15:22:22.000000000 +0300 @@ -43,13 +43,14 @@ BEGIN { flag=0 } /^-----------/ { flag++; if (flag > 1) exit 0; next } flag==1 { + str=substr($0, 65) split($5, a, "-") if (index($1, "D") != 0) $1="drwxr-xr-x" else if (index($1, ".") != 0) $1="-rw-r--r--" - printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, $8 + printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, str }' }
comment:6 in reply to: ↑ 5 ; follow-up: ↓ 7 Changed 11 years ago by andrew_b
Replying to Hamer13:
I think this is not so good solution.
The filenames can have more than one whitespaces in one chunk.
Agree.
+ str=substr($0, 65)
Why 65?
comment:7 in reply to: ↑ 6 Changed 11 years ago by andrew_b
Replying to andrew_b:
Why 65?
Ah, sorry... No question.
I'm not sure that width of 'Size' and 'Packet' fields is constant always. Can it be wider for huge files?
comment:8 Changed 11 years ago by andrew_b
- Votes for changeset changed from committed-master to andrew_b
- Branch state changed from merged to on review
Branch: 3073_urar5_spaces
changeset:af4d458981f499d27d8b6d3c5b0c8ce2406d385e
comment:9 Changed 11 years ago by andrew_b
How about this patch?
nf = split($0, fields, " ", seps);
It's not portable. seps a) is a gawk extension and b) gawk >= 4.0.0. is required.
comment:10 Changed 11 years ago by dnh
- Cc dnh@… added
*ARGH* *darn* Too good to be portable ... Thanks a bundle andrew_b. Didn't
notice that I need gawk >= 4.0.0! And no wonder I did not know that feature
before, but it came in just too handy ;) Maybe "keep in in mind" though ;)
I still dislike indexing by the CRC field though, just on general principle.
I'll ponder programming around that and submit if I come up with anything
more useful, but for the time being, I guess that indexing by the CRC-field
should work. For openSUSE I'll just require gawk >= 4.0.0 and keep (adjust)
my patch, I think it's just somewhat more robust than the indexing ...
PS: I'm one of the maintainers of the mc package for openSUSE
comment:11 Changed 11 years ago by slavazanko
Dnh, we have to maintain lot of the things related to backward compatibility (such as: support of OpenWrt-distros, BusyBox-toolset and so on). Would be great to find a way to split few first parameters and to stay 'filename' parameter 'as is' with spaces. Your patch looks pretty good, but as Andrew say, your way will have restricted use cases.
Probably, we should check a version of gawk in configure.ac script and next we may set up some flag to switch between algorithms in the VFS-helper...
comment:12 Changed 11 years ago by dnh
Once you think about the change it was easy to replace the four-arg split. The only problem was getting the seps[i+1] right.
comment:13 Changed 11 years ago by andrew_b
I've attached the test archive with the following content
testfile1 <space>testfile2 testfile3<space> test<space>file<3 spaces>4
<spase> above is the mark of space in the file name.
This patch has some trouble with "testfile3 " file (name is finished with space). Trailing space is lost. Other names are processed correctly.
comment:14 follow-up: ↓ 15 Changed 11 years ago by dnh
Yes, some oSuSE user (not sure if he wants to be mentioned here) already
noticed and notified me, even though he suggested a slightly different
solution (saving n-seps, checking that for greater nameparts, and add
'seps[n-seps]' if true ;). I thought about it quite a bit before getting up
last night ;) I'll ask him if and how he wants to be credited here for the
patch if "you" deem it better than the indexing by the CRC field version
currently used and integrate it into coming versions (He found the problem
independently of this existing bug (ticket) here). He suggested solutions, I
rewrote/improved those (significantly), so it's quite a shared credit if any!
I have a pending mail with him on the topic and will ask him about it (and
he's to go first in the mention if...)) (and will divulge his mail address to
the maintainers so they can ask themselves, just not as publicly as here).
I'll update the attached patch. My solution is to add a 'name=name seps[i]'
after the name-concatenating loop. If it's "filled", trailing seps get added,
if there are none, awk autovivifies the array-variable to the empty string, so
no extra checking is needed. "Blindly" concatenating seps[i] (and 'i' was
in-loop incremented as required, the 'i++' is executed "post-loop" (I checked
albeit only with gawk this time ;)) 'seps[i]' is just the last part of any
trailing seps characters.
This still get's torpedoed for names less than 13 chars length by the
following pruning of trailing spaces (which I have not checked in unrar's
sources if unrar5 still adds those flaky padding spaces sometimes. I just took
that "as true" from the existing urar helper/patch. IIRC. If that's wrong I'm
the first to say to leave that out ;) Maybe it's better to just "pass on"
that IMO bug of unrar to our users and document it. I've no idea. I just found
it in the existing "logic" and I tried not to change that (it's not in 4.8.10, but in the patch I started from). So, if in doubt, just prune those lines
+ ### remove padding blanks from short names
+ if (length(name)<13) {
+ sub(" *$", "", name);
+ }
from the patch.
BTW: shouldn't we gather some "testing" .rar (and other archive) files in the
"tests" directory? I've got a couple more ;) You rarely need more than
"touched" files, so archive size is basically archive header size + length of
included filenames, a couple of KB maybe ... Maybe create one really "sicko"
archive and pack that differently (of course, not all archivers may handle all
filenames, e.g. how does .zip handle '::' ?). We could even distribute that "sicko" archive as it's useful when your developing e.g. shell scripts :)
Sorry this got so long ...
Changed 11 years ago by dnh
- Attachment mc-extfs-helpers-urar.patch added
fix to add trailing seps[*] stuff
comment:15 in reply to: ↑ 14 ; follow-up: ↓ 16 Changed 11 years ago by andrew_b
Indeed, in output if unrar v or unrar l there are padding spaces after file name.
mprintf(L"%-12ls",Name);
They cannot be removed correctly. If file name finished with space(s) we get corrupted file name at the output.
But output of unrar vt contains correct file names w/o any padding:
$ unrar vt test.rar UNRAR 5.00 freeware Copyright (c) 1993-2013 Alexander Roshal Archive: test.rar Details: RAR 4 Name: testfile1 Type: File Size: 0 Packed size: 8 Ratio: 0% mtime: 2013-11-25 13:04,000 Attributes: -rw-r--r-- CRC32: 00000000 Host OS: Unix Compression: RAR 3.0(v29) -m3 -md=128K Name: testfile2 Type: File Size: 0 Packed size: 8 Ratio: 0% mtime: 2013-11-25 13:04,000 Attributes: -rw-r--r-- CRC32: 00000000 Host OS: Unix Compression: RAR 3.0(v29) -m3 -md=128K Name: testfile3 Type: File Size: 0 Packed size: 8 Ratio: 0% mtime: 2013-11-25 13:04,000 Attributes: -rw-r--r-- CRC32: 00000000 Host OS: Unix Compression: RAR 3.0(v29) -m3 -md=128K Name: test file 4 Type: File Size: 0 Packed size: 8 Ratio: 0% mtime: 2013-11-25 13:04,000 Attributes: -rw-r--r-- CRC32: 00000000 Host OS: Unix Compression: RAR 3.0(v29) -m3 -md=128K
I think, we can try to parse this output. We need only following three lines:
Name: testfile1 Size: 0 mtime: 2013-11-25 13:04,000 Attributes: -rw-r--r--
comment:16 in reply to: ↑ 15 Changed 11 years ago by dnh
Replying to andrew_b:
Indeed, in output if unrar v or unrar l there are padding spaces after file name.
mprintf(L"%-12ls",Name);They cannot be removed correctly. If file name finished with space(s) we get corrupted file name at the output.
Exactly. And short filenames with trailing spaces are probably more seldom than
short ones without, so stripping spaces is/was the correct approach IMO. Upstream
bug. And perl's Archive::Rar is also no solution as it uses the external 'rar'.
But output of unrar vt contains correct file names w/o any padding:
I think, we can try to parse this output. We need only following three lines:
Name: testfile1 Size: 0 mtime: 2013-11-25 13:04,000 Attributes: -rw-r--r--
Got a seemingly working first draft already. At least working with "my" test archive containing cases mentionend in this ticket. Even though I wrote it pretty tired I'll attach the new patch for you all to test on whatever "weird" archives you have and for portability issues.
Changed 11 years ago by dnh
- Attachment mc-extfs-helpers-urar.2.patch added
Reimplementation of mcrar5fs_list in vfs/extfs/helpers/urar using 'unrar vt'
comment:17 Changed 11 years ago by andrew_b
Great!
Branch: 3073_urar5_spaces (rebased)
changeset:6d9f09b81e2a2f9c04b929ed8075ac7d8e6c7063
comment:18 Changed 11 years ago by slavazanko
- Votes for changeset changed from andrew_b to andrew_b slavazanko
- Branch state changed from on review to approved
comment:19 Changed 11 years ago by andrew_b
- Status changed from reopened to closed
- Resolution set to fixed
Merged to master: [9d190f66ebb927d59e57dae41cfa415a4f17b7a2].
comment:20 Changed 11 years ago by andrew_b
- Votes for changeset changed from andrew_b slavazanko to committed-master
- Branch state changed from approved to merged
Changed 11 years ago by dnh
- Attachment urar.in.patch added
*argh* missed a '' in a regex ... Should not be critical, but please add ...
comment:21 Changed 11 years ago by dnh
- Status changed from closed to reopened
- Resolution fixed deleted
comment:22 Changed 11 years ago by andrew_b
- Votes for changeset committed-master deleted
- Branch state changed from merged to no branch
- Blocked By 3113 added
- Milestone changed from 4.8.11 to 4.8.12
comment:23 follow-up: ↓ 24 Changed 11 years ago by slodki
- Priority changed from minor to trivial
Simplest solution for this error (just print the rest of rar output without interpreting filenames by awk):
--- libexec/mc/extfs.d/urar.org 2013-10-12 02:25:29.000000000 +0200 +++ libexec/mc/extfs.d/urar.my 2013-12-31 15:50:54.000000000 +0100 @@ -49,7 +49,7 @@ else if (index($1, ".") != 0) $1="-rw-r--r--" - printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, $8 + printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, substr($0,index($0,$8)) }' }
Testes with mc v4.8.10 and rar v5.01.
comment:24 in reply to: ↑ 23 Changed 11 years ago by dnh
Replying to slodki:
Simplest solution for this error (just print the rest of rar output without interpreting filenames by awk):
The problem is, that 'unrar [vl]' is padding short filenames with spaces at the end! So 'unrar vt' is used as of 4.8.11, see the changeset above in this ticket and the rest of this ticket.
comment:26 Changed 11 years ago by andrew_b
- Status changed from reopened to closed
- Resolution set to fixed
- Branch state changed from no branch to merged
Branch: 3073_urar5_spaces
changeset:c4a546ac01911e10bb79e92f3757cebd8ef6454f