Ticket #3073 (closed defect: fixed)

Opened 11 years ago

Last modified 11 years ago

[urar helper] fix filenames with spaces handling with unrar v5

Reported by: post-factum Owned by: andrew_b
Priority: trivial Milestone: 4.8.12
Component: mc-vfs Version: master
Keywords: urar unrar Cc: dnh@…
Blocked By: Blocking:
Branch state: merged Votes for changeset:

Description

unrar v5 helper script doesn't print filenames with spaces. The following pull request fixes that.

https://github.com/MidnightCommander/mc/pull/28

Attachments

test.rar (237 bytes) - added by andrew_b 11 years ago.
mc-extfs-helpers-urar.patch (1.6 KB) - added by dnh 11 years ago.
fix to add trailing seps[*] stuff
mc-extfs-helpers-urar.2.patch (2.3 KB) - added by dnh 11 years ago.
Reimplementation of mcrar5fs_list in vfs/extfs/helpers/urar using 'unrar vt'
urar.in.patch (350 bytes) - added by dnh 11 years ago.
*argh* missed a '' in a regex ... Should not be critical, but please add …

Change History

comment:1 Changed 11 years ago by andrew_b

  • Owner set to andrew_b
  • Status changed from new to accepted
  • Votes for changeset set to andrew_b
  • Branch state changed from no branch to on review
  • Milestone changed from Future Releases to 4.8.11

comment:2 Changed 11 years ago by slavazanko

  • Votes for changeset changed from andrew_b to andrew_b slavazanko
  • Branch state changed from on review to approved

comment:3 Changed 11 years ago by andrew_b

  • Status changed from accepted to testing
  • Votes for changeset changed from andrew_b slavazanko to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged

comment:4 Changed 11 years ago by andrew_b

  • Status changed from testing to closed

comment:5 follow-up: ↓ 6 Changed 11 years ago by Hamer13

  • Status changed from closed to reopened
  • Resolution fixed deleted

I think this is not so good solution.
The filenames can have more than one whitespaces in one chunk. For example: "File_name__with_spaces". "_" means whitespace. Awk will squeeze this double (treble and so on) whitespaces into one whitespace.
I propose this patch:

$ diff -ub /usr/lib/mc/extfs.d/urar.orig /usr/lib/mc/extfs.d/urar
--- /usr/lib/mc/extfs.d/urar.orig       2013-09-17 12:19:11.000000000 +0300
+++ /usr/lib/mc/extfs.d/urar    2013-10-26 15:22:22.000000000 +0300
@@ -43,13 +43,14 @@
 BEGIN { flag=0 }
 /^-----------/ { flag++; if (flag > 1) exit 0; next }
 flag==1 {
+    str=substr($0, 65)
     split($5, a, "-")
     if (index($1, "D") != 0)
        $1="drwxr-xr-x"
     else
        if (index($1, ".") != 0)
            $1="-rw-r--r--"
-    printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, $8
+    printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, str
 }'
 }

comment:6 in reply to: ↑ 5 ; follow-up: ↓ 7 Changed 11 years ago by andrew_b

Replying to Hamer13:

I think this is not so good solution.
The filenames can have more than one whitespaces in one chunk.

Agree.

+    str=substr($0, 65)

Why 65?

comment:7 in reply to: ↑ 6 Changed 11 years ago by andrew_b

Replying to andrew_b:

Why 65?

Ah, sorry... No question.

I'm not sure that width of 'Size' and 'Packet' fields is constant always. Can it be wider for huge files?

comment:8 Changed 11 years ago by andrew_b

  • Votes for changeset changed from committed-master to andrew_b
  • Branch state changed from merged to on review

comment:9 Changed 11 years ago by andrew_b

How about this patch?

nf = split($0, fields, " ", seps);

It's not portable. seps a) is a gawk extension and b) gawk >= 4.0.0. is required.

Last edited 11 years ago by andrew_b (previous) (diff)

comment:10 Changed 11 years ago by dnh

  • Cc dnh@… added

*ARGH* *darn* Too good to be portable ... Thanks a bundle andrew_b. Didn't
notice that I need gawk >= 4.0.0! And no wonder I did not know that feature
before, but it came in just too handy ;) Maybe "keep in in mind" though ;)

I still dislike indexing by the CRC field though, just on general principle.

I'll ponder programming around that and submit if I come up with anything
more useful, but for the time being, I guess that indexing by the CRC-field
should work. For openSUSE I'll just require gawk >= 4.0.0 and keep (adjust)
my patch, I think it's just somewhat more robust than the indexing ...

PS: I'm one of the maintainers of the mc package for openSUSE

comment:11 Changed 11 years ago by slavazanko

Dnh, we have to maintain lot of the things related to backward compatibility (such as: support of OpenWrt-distros, BusyBox-toolset and so on). Would be great to find a way to split few first parameters and to stay 'filename' parameter 'as is' with spaces. Your patch looks pretty good, but as Andrew say, your way will have a restricted use cases.

Probably, we should check a version of gawk in configure.ac script and next we may set up some flag to switch between algorithms in the VFS-helper...

Version 2, edited 11 years ago by slavazanko (previous) (next) (diff)

comment:12 Changed 11 years ago by dnh

Once you think about the change it was easy to replace the four-arg split. The only problem was getting the seps[i+1] right.

Changed 11 years ago by andrew_b

comment:13 Changed 11 years ago by andrew_b

I've attached the test archive with the following content

testfile1
<space>testfile2
testfile3<space>
test<space>file<3 spaces>4

<spase> above is the mark of space in the file name.

This patch has some trouble with "testfile3 " file (name is finished with space). Trailing space is lost. Other names are processed correctly.

comment:14 follow-up: ↓ 15 Changed 11 years ago by dnh

Yes, some oSuSE user (not sure if he wants to be mentioned here) already
noticed and notified me, even though he suggested a slightly different
solution (saving n-seps, checking that for greater nameparts, and add
'seps[n-seps]' if true ;). I thought about it quite a bit before getting up
last night ;) I'll ask him if and how he wants to be credited here for the
patch if "you" deem it better than the indexing by the CRC field version
currently used and integrate it into coming versions (He found the problem
independently of this existing bug (ticket) here). He suggested solutions, I
rewrote/improved those (significantly), so it's quite a shared credit if any!
I have a pending mail with him on the topic and will ask him about it (and
he's to go first in the mention if...)) (and will divulge his mail address to
the maintainers so they can ask themselves, just not as publicly as here).

I'll update the attached patch. My solution is to add a 'name=name seps[i]'
after the name-concatenating loop. If it's "filled", trailing seps get added,
if there are none, awk autovivifies the array-variable to the empty string, so
no extra checking is needed. "Blindly" concatenating seps[i] (and 'i' was
in-loop incremented as required, the 'i++' is executed "post-loop" (I checked
albeit only with gawk this time ;)) 'seps[i]' is just the last part of any
trailing seps characters.

This still get's torpedoed for names less than 13 chars length by the
following pruning of trailing spaces (which I have not checked in unrar's
sources if unrar5 still adds those flaky padding spaces sometimes. I just took
that "as true" from the existing urar helper/patch. IIRC. If that's wrong I'm
the first to say to leave that out ;) Maybe it's better to just "pass on"
that IMO bug of unrar to our users and document it. I've no idea. I just found
it in the existing "logic" and I tried not to change that (it's not in 4.8.10, but in the patch I started from). So, if in doubt, just prune those lines

+ ### remove padding blanks from short names
+ if (length(name)<13) {
+ sub(" *$", "", name);
+ }

from the patch.

BTW: shouldn't we gather some "testing" .rar (and other archive) files in the
"tests" directory? I've got a couple more ;) You rarely need more than
"touched" files, so archive size is basically archive header size + length of
included filenames, a couple of KB maybe ... Maybe create one really "sicko"
archive and pack that differently (of course, not all archivers may handle all
filenames, e.g. how does .zip handle '::' ?). We could even distribute that "sicko" archive as it's useful when your developing e.g. shell scripts :)

Sorry this got so long ...

Changed 11 years ago by dnh

fix to add trailing seps[*] stuff

comment:15 in reply to: ↑ 14 ; follow-up: ↓ 16 Changed 11 years ago by andrew_b

Indeed, in output if unrar v or unrar l there are padding spaces after file name.

mprintf(L"%-12ls",Name);

They cannot be removed correctly. If file name finished with space(s) we get corrupted file name at the output.

But output of unrar vt contains correct file names w/o any padding:

$ unrar vt test.rar              

UNRAR 5.00 freeware      Copyright (c) 1993-2013 Alexander Roshal

Archive: test.rar
Details: RAR 4

        Name: testfile1
        Type: File
        Size: 0
 Packed size: 8
       Ratio: 0%
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--
       CRC32: 00000000
     Host OS: Unix
 Compression: RAR 3.0(v29) -m3 -md=128K

        Name:  testfile2
        Type: File
        Size: 0
 Packed size: 8
       Ratio: 0%
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--
       CRC32: 00000000
     Host OS: Unix
 Compression: RAR 3.0(v29) -m3 -md=128K

        Name: testfile3 
        Type: File
        Size: 0
 Packed size: 8
       Ratio: 0%
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--
       CRC32: 00000000
     Host OS: Unix
 Compression: RAR 3.0(v29) -m3 -md=128K

        Name: test file   4
        Type: File
        Size: 0
 Packed size: 8
       Ratio: 0%
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--
       CRC32: 00000000
     Host OS: Unix
 Compression: RAR 3.0(v29) -m3 -md=128K

I think, we can try to parse this output. We need only following three lines:

        Name: testfile1
        Size: 0
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--

comment:16 in reply to: ↑ 15 Changed 11 years ago by dnh

Replying to andrew_b:

Indeed, in output if unrar v or unrar l there are padding spaces after file name.

mprintf(L"%-12ls",Name);

They cannot be removed correctly. If file name finished with space(s) we get corrupted file name at the output.

Exactly. And short filenames with trailing spaces are probably more seldom than
short ones without, so stripping spaces is/was the correct approach IMO. Upstream
bug. And perl's Archive::Rar is also no solution as it uses the external 'rar'.

But output of unrar vt contains correct file names w/o any padding:

..

I think, we can try to parse this output. We need only following three lines:

        Name: testfile1
        Size: 0
       mtime: 2013-11-25 13:04,000
  Attributes: -rw-r--r--

Got a seemingly working first draft already. At least working with "my" test archive containing cases mentionend in this ticket. Even though I wrote it pretty tired I'll attach the new patch for you all to test on whatever "weird" archives you have and for portability issues.

Changed 11 years ago by dnh

Reimplementation of mcrar5fs_list in vfs/extfs/helpers/urar using 'unrar vt'

comment:17 Changed 11 years ago by andrew_b

Great!

Branch: 3073_urar5_spaces (rebased)
changeset:6d9f09b81e2a2f9c04b929ed8075ac7d8e6c7063

comment:18 Changed 11 years ago by slavazanko

  • Votes for changeset changed from andrew_b to andrew_b slavazanko
  • Branch state changed from on review to approved

comment:19 Changed 11 years ago by andrew_b

  • Status changed from reopened to closed
  • Resolution set to fixed

comment:20 Changed 11 years ago by andrew_b

  • Votes for changeset changed from andrew_b slavazanko to committed-master
  • Branch state changed from approved to merged

Changed 11 years ago by dnh

*argh* missed a '' in a regex ... Should not be critical, but please add ...

comment:21 Changed 11 years ago by dnh

  • Status changed from closed to reopened
  • Resolution fixed deleted

comment:22 Changed 11 years ago by andrew_b

  • Votes for changeset committed-master deleted
  • Branch state changed from merged to no branch
  • Blocked By 3113 added
  • Milestone changed from 4.8.11 to 4.8.12

comment:23 follow-up: ↓ 24 Changed 11 years ago by slodki

  • Priority changed from minor to trivial

Simplest solution for this error (just print the rest of rar output without interpreting filenames by awk):

--- libexec/mc/extfs.d/urar.org 2013-10-12 02:25:29.000000000 +0200
+++ libexec/mc/extfs.d/urar.my  2013-12-31 15:50:54.000000000 +0100
@@ -49,7 +49,7 @@
     else
        if (index($1, ".") != 0)
            $1="-rw-r--r--"
-    printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, $8
+    printf "%s 1 %s %s %d %02d/%02d/%02d %s ./%s\n", $1, uid, gid, $2, a[2], a[1], a[3], $6, substr($0,index($0,$8))
 }'
 }

Testes with mc v4.8.10 and rar v5.01.

comment:24 in reply to: ↑ 23 Changed 11 years ago by dnh

Replying to slodki:

Simplest solution for this error (just print the rest of rar output without interpreting filenames by awk):

The problem is, that 'unrar [vl]' is padding short filenames with spaces at the end! So 'unrar vt' is used as of 4.8.11, see the changeset above in this ticket and the rest of this ticket.

comment:25 Changed 11 years ago by andrew_b

  • Blocked By 3113 removed

comment:26 Changed 11 years ago by andrew_b

  • Status changed from reopened to closed
  • Resolution set to fixed
  • Branch state changed from no branch to merged
Note: See TracTickets for help on using tickets.