Ticket #1952 (accepted defect)

Opened 11 years ago

Last modified 4 weeks ago

mc cd foo.tar#utar does not handle POSIX ustar archives, only GNU tar vendor-specific/legacy ones

Reported by: mirabilos Owned by: andrew_b
Priority: major Milestone: Future Releases
Component: mc-vfs Version: master
Keywords: Cc: miros-discuss@…, zaytsev, mrmazda@…, nerijus@…
Blocked By: #3 Blocking: #2201
Branch state: on hold Votes for changeset:

Description (last modified by andrew_b)

Hi,
please see http://www.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06
for the specification of the POSIX ustar interchange format.
GNU cpio (-Hustar), paxtar, and GNU tar --format=ustar
all create archives of this format; bsdtar probably does
as well. However, I cannot cd#utar or “Enter” them in
either mc-4.6.1-16 (MirPorts) or mc_3:4.7.0-1 (Debian sid).
After looking at tar.c I think you only support the legacy
or vendor-specific/proprietary GNU tar archive format.
The new boot floppies of MirBSD as of today are ustar
archives, with the bootsector squeezed into a ustar
header and closely following the standard. Introspection
would be nice.
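
For illustration, a minimal sketch (not code from mc) of how the two header flavours can be told apart. A POSIX ustar header carries the magic "ustar\0" followed by the version "00" at byte offset 257, while the old GNU format writes "ustar  \0" in the same place. The function names here are hypothetical, and Python's tarfile module is used only to generate small sample archives in memory.

```python
import io
import tarfile

POSIX_MAGIC = b"ustar\x0000"  # magic "ustar\0" + version "00"
GNU_MAGIC = b"ustar  \x00"    # old GNU tar: "ustar" + two spaces + NUL


def classify_header(block: bytes) -> str:
    """Classify a 512-byte tar header block by its magic/version field."""
    if len(block) < 512:
        raise ValueError("a tar header block is 512 bytes long")
    magic = block[257:265]  # 6 bytes magic + 2 bytes version
    if magic == POSIX_MAGIC:
        return "posix-ustar"
    if magic == GNU_MAGIC:
        return "gnu"
    return "v7-or-unknown"


def make_archive(fmt: int) -> bytes:
    """Build a one-member archive in memory (helper for the demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    for fmt in (tarfile.USTAR_FORMAT, tarfile.GNU_FORMAT):
        print(classify_header(make_archive(fmt)[:512]))
```

Note that pax archives also use the POSIX magic in their headers, so this check alone does not distinguish plain ustar from pax.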

Attachments

untar.txt (5.7 KB) - added by zaytsev 4 weeks ago.

Change History

comment:1 Changed 10 years ago by andrew_b

  • Status changed from new to accepted
  • Owner set to andrew_b
  • severity changed from no branch to on review
  • Milestone changed from 4.7 to 4.7.3

Created 1952_branch. Parent branch is master.
changeset:ff37dc26d46f652538c34475fec3f2b9bc9aa536

In this branch, MC uses an external tar program instead of parsing TAR archives itself. This branch also fixes #2201.

comment:2 Changed 10 years ago by andrew_b

  • severity changed from on review to on rework

comment:3 Changed 10 years ago by andrew_b

  • severity changed from on rework to on review

Fixed extraction of files from TAR archives.

comment:4 Changed 10 years ago by andrew_b

  • severity changed from on review to on rework

There are problems with devices.

comment:5 Changed 10 years ago by andrew_b

  • severity changed from on rework to on review

comment:6 Changed 10 years ago by mirabilos

I’m reading the unidiff… this now looks better, but the various tar
utilities’ output formats *also* differ:

GNU tar

tg@frozenfish:~ $ tar tzvf mksh_39.3.orig.tar.gz
-rw-r--r-- root/wheel 296033 2010-02-25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- root/wheel 11840 2010-01-28 16:22 mksh-39.3.orig/printf.c.1.14

paxtar (OpenBSD, MirBSD, maybe others; I have a Debian package):

tg@blau:~ $ tar tzvf mksh_39.3.orig.tar.gz
-rw-r--r-- 1 root wheel 296033 Feb 25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- 1 root wheel 11840 Jan 28 16:21 mksh-39.3.orig/printf.c.1.14

bsdtar (libarchive-based; native on FreeBSD, MidnightBSD and others):

mirabilos@stargazer:~ $ tar tzvf mksh_39.3.orig.tar.gz
-rw-r--r-- 0 root wheel 296033 Feb 25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- 0 root wheel 11840 Jan 28 16:22 mksh-39.3.orig/printf.c.1.14

There may very well be others, but these three are the most often
used – although, on FreeWRT, we have busybox tar (because one of the
libc functions paxtar uses seems to be broken with µClibc):

root@wlan1:~ # tar tvf mksh_39.3.orig.tar
-rw-r--r-- 0/0 296033 2010-02-25 23:03:39 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- 0/0 11840 2010-01-28 17:22:12 mksh-39.3.orig/printf.c.1.14

And yes, I’m also the maintainer of mc on FreeWRT ;-)

comment:7 Changed 10 years ago by andrew_b

OK, I see.

What can we do?

  • We can parse the output of tar --version and call the corresponding function for each TAR utility (GNU, paxtar, bsdtar, ...) in the new utar script.
  • We can support all tar formats in the binary (as MC currently does), but that will increase the size of the main MC binary (for reference, the GNU tar binary is more than 200 kB).
  • We can use some 3rd-party library or framework that supports tar archives.
  • Something else
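
A rough sketch of the first option, written in Python only for illustration (the real utar script would be a shell or Perl script). The function name and the substrings matched are assumptions: they rely on the version banner mentioning "GNU tar", "bsdtar"/"libarchive" or "busybox", which should be verified against the actual tools. Implementations that do not understand --version at all end up as "unknown".

```python
def classify_tar_banner(banner: str) -> str:
    """Guess the tar implementation from its version banner.

    The substrings below are assumptions about what each tool prints;
    anything unrecognised (including an empty banner from a tar that
    rejects --version) is reported as "unknown".
    """
    text = banner.lower()
    if "gnu tar" in text:
        return "gnu"
    if "bsdtar" in text or "libarchive" in text:
        return "bsdtar"
    if "busybox" in text:
        return "busybox"
    return "unknown"


if __name__ == "__main__":
    import subprocess

    # Run whatever "tar" is first in PATH; failure is treated as unknown.
    try:
        result = subprocess.run(
            ["tar", "--version"], capture_output=True, text=True, check=False
        )
        banner = result.stdout + result.stderr
    except OSError:
        banner = ""
    print(classify_tar_banner(banner))
```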

comment:8 follow-up: ↓ 9 Changed 10 years ago by mirabilos

Only GNU tar supports --long-options.

I see two ways:

Either support all formats (I see two "major" differences with two subtle
subformats each) in the vfs script, or detect which tar /bin/tar is at
configure time (e.g. by checking a minimal tar file, I can produce one
which is 2K in size) and patch the vfs script (using the .in mechanism
would be fine) and hardcode /bin/tar as $TAR.

We could whitelist the four supported output formats (also consider that
gid 0 can be root, wheel, or something else…) and reject unknowns, thus
getting people to send in the actual output THEY get. Locale settings may
be an issue with GNU software (and some other) too.

This would break cross compilation though.

Or we could just try to apply guesswork (for instance, uid/gid or
uid<whitespace>gid, and it doesn’t matter whether uid and gid are
numeric or not… just the time/date format is annoying – the ls(1)-like
format is something I loathe to parse, but you can relatively easily
check for it). FWIW:

tg@blau:~ $ tar tzvf /MirOS/dist/mir/mksh/mksh-R24.cpio.gz | head -1
-rw-r--r-- 1 root wheel 125442 Jul 6 2005 mksh/mksh.1

This is the format I see with “old” files.

So I’d be all for the first way – support all of them in the vfs script.
If you want I could have a look at hacking this too; I have access to
Solaris, possibly HP-UX and AIX (if they get the lpar to boot/work again),
so I could test it on relatively many systems. I’d need to be pointed to
a specification of what exact arguments, input and output the vfs scripts
receive and are supposed to output though.
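
The guesswork approach could look roughly like the following sketch. It is written in Python purely for illustration (it is not the vfs script), the names are hypothetical, and the pattern is derived only from the sample lines quoted in this ticket: owner and group are accepted either as uid/gid or as a link count followed by uid<whitespace>gid, and the date either in the ISO-like or in the ls(1)-like form. Device nodes and symlink targets are not handled.

```python
import re

LINE_RE = re.compile(
    r"""
    ^(?P<mode>\S{10})\s+
    (?:
        (?P<u1>[^\s/]+)/(?P<g1>[^\s/]+)              # GNU tar, busybox: uid/gid
      | (?P<nlink>\d+)\s+(?P<u2>\S+)\s+(?P<g2>\S+)   # paxtar, bsdtar: ls-like
    )\s+
    (?P<size>\d+)\s+
    (?:
        (?P<iso>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}(?::\d{2})?)          # 2010-02-25 22:03[:39]
      | (?P<ls>[A-Z][a-z]{2}\s+\d{1,2}\s+(?:\d{1,2}:\d{2}|\d{4}))    # Feb 25 22:03 / Jul 6 2005
    )\s+
    (?P<name>.+)$
    """,
    re.VERBOSE,
)


def parse_listing_line(line: str):
    """Parse one line of `tar tvf` output; return None for unknown formats."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # unknown format: reject instead of guessing further
    return {
        "mode": m["mode"],
        "owner": m["u1"] or m["u2"],
        "group": m["g1"] or m["g2"],
        "size": int(m["size"]),
        "date": m["iso"] or m["ls"],
        "date_style": "iso" if m["iso"] else "ls",
        "name": m["name"],
    }


if __name__ == "__main__":
    sample = "-rw-r--r-- 1 root wheel 125442 Jul 6 2005 mksh/mksh.1"
    print(parse_listing_line(sample))
```

Returning None for anything that does not match corresponds to the whitelist idea: unknown output is rejected rather than misparsed.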

comment:9 in reply to: ↑ 8 Changed 10 years ago by andrew_b

  • Description modified

Replying to mirabilos:

Only GNU tar supports --long-options.

Long options are not used in the recent version of the vfs script.

I see two ways:
[skip]
This would break cross compilation though.

Cross compilation wouldn't be broken.

I’d need to be pointed to
a specification of what exact arguments, input and output the vfs scripts
receive and are supposed to output though.

You can find that in the MC source tree (lib/vfs/mc-vfs/extfs/README) or in the MC installed on your system (/usr/libexec/mc/extfs.d/README or /usr/lib/mc/extfs.d/README).

Thanks!

comment:10 Changed 10 years ago by andrew_b

  • severity changed from on review to on rework

comment:11 Changed 10 years ago by angel_il

  • Milestone changed from 4.7.3 to 4.7.4

comment:12 Changed 10 years ago by andrew_b

  • Blocking 2297 added

comment:13 Changed 10 years ago by andrew_b

  • Blocking 2201 added

comment:14 Changed 10 years ago by zaytsev

  • Cc zaytsev added

There is a re-implementation of the tar script in the Debian bug tracker:

http://bugs.debian.org/500693

Maybe you can steal something from there.

comment:15 Changed 10 years ago by andrew_b

  • Keywords ustar tar vfs removed
  • Version changed from 4.6.1 to master
  • severity changed from on rework to on review
  • Milestone changed from 4.7.4 to 4.7.5

Branch 1952_tar. Parent: master.
changeset:cae7459699f6a22d63272e66dcfa4eedc017a765

comment:16 Changed 10 years ago by andrew_b

Recent master contains modified VFS layer. Branch 1952_tar has been rebased.
Initial changeset:dbf60df91916ca167270aa06d2cd1c88c0ac3cc7

comment:17 Changed 10 years ago by slavazanko

  • severity changed from on review to on hold
  • Blocked By 3 added

Ticket frozen while ticket:3 remains unfixed.

comment:18 Changed 9 years ago by slavazanko

  • severity changed from on hold to no branch
  • Branch state set to on hold

comment:19 Changed 9 years ago by andrew_b

  • Milestone changed from 4.7.5 to 4.8

comment:20 Changed 9 years ago by andrew_b

  • Blocking 2201 removed

comment:21 Changed 5 years ago by andrew_b

  • Milestone changed from 4.8 to Future Releases

comment:22 Changed 4 years ago by andrew_b

  • Blocking 2297 removed

comment:23 Changed 7 months ago by mrmazda

I've been extracting mozilla.org's Linux archives for two decades on various GNU/Linux distributions using MC exclusively, in virtually all cases with the MC version packaged by the distro.

http://archive.mozilla.org/pub/firefox/releases/68.5.0esr/linux-x86_64/en-US/firefox-68.5.0esr.tar.bz2 2020-02-10 is the last version I was able to do this with successfully.

As of http://archive.mozilla.org/pub/firefox/releases/68.6.0esr/linux-x86_64/en-US/firefox-68.6.0esr.tar.bz2 (2020-03-09), the extracted files have corrupted timestamps: 1970-01-01 for ordinary files and the current date/time for directories. This is with 4.8.24 on Fedora 32, Debian Testing/Bullseye and openSUSE Tumbleweed.

Same problem with http://archive.mozilla.org/pub/firefox/releases/68.7.0esr/linux-x86_64/en-US/firefox-68.7.0esr.tar.bz2 2020-04-06.

comment:24 Changed 7 months ago by mrmazda

  • Cc mrmazda@… added

comment:25 Changed 4 months ago by nerijus

  • Cc nerijus@… added

comment:26 Changed 4 weeks ago by zaytsev

  • Blocking 2201 added

Changed 4 weeks ago by zaytsev: attachment untar.txt added

comment:27 Changed 4 weeks ago by zaytsev

So, SUSE people updated the script in mid-2018 and apparently it has been working well for quite some time. Andrew, what's your opinion? Is there a good reason (performance? availability on embedded w/o tar executable?) why we should keep our tar code?

If it makes more sense to keep our code, I wonder if we could steal a modern and clean implementation covering all the tar subformats floating around from somewhere, instead of having our own old, unmaintained implementation, which was probably branched from whatever at some point...

comment:28 follow-up: ↓ 29 Changed 4 weeks ago by andrew_b

File extraction would be too slow (as with uzip). tar doesn't contain a list of files. To extract a file, you have to walk through the archive to find it. To extract the next file, you have to walk through the archive again. Again and again.

In MC's tar implementation, the positions of all files are stored while the archive is read and are then used for file reading/extraction.

I'm working on an update of tar: I'm trying to sync its code with GNU tar's. Unfortunately, I don't have enough time for that. It's not a trivial task, because MC's tar is GNU tar as of approx. 25 years ago.
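
To illustrate the difference (a simplified sketch in Python, not MC's actual C code; the function names are hypothetical): one sequential pass over the headers records where each member's data starts, and later reads simply seek to the stored position. The sketch assumes plain ustar headers with short names and octal size fields. It ignores the ustar prefix field, GNU long names, pax extended headers and base-256 sizes.

```python
import io
import tarfile

BLOCK = 512


def build_index(f):
    """One pass over the headers: map member name -> (data offset, size)."""
    index = {}
    offset = 0
    while True:
        f.seek(offset)
        header = f.read(BLOCK)
        if len(header) < BLOCK or header == bytes(BLOCK):
            break  # short read or all-zero block: end of archive
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        size = int(header[124:136].rstrip(b" \0") or b"0", 8)  # octal field
        index[name] = (offset + BLOCK, size)
        # Member data is padded to a multiple of the block size.
        offset += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK
    return index


def read_member(f, index, name):
    """Random access: no rescan of the archive, just seek and read."""
    data_offset, size = index[name]
    f.seek(data_offset)
    return f.read(size)


def make_sample():
    """Build a small two-member ustar archive in memory (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in (("a.txt", b"first\n"), ("dir/b.txt", b"x" * 1000)):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    archive = make_sample()
    idx = build_index(archive)
    print(idx)
    print(read_member(archive, idx, "a.txt"))
```

An external tar program cannot offer this: each extraction request starts a new process that has to scan the archive from the beginning.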

comment:29 in reply to: ↑ 28 Changed 4 weeks ago by andrew_b

Replying to andrew_b:

tar doesn't contain a list of files.

It couldn't help in any case in the current VFS implementation (see #3).

comment:30 Changed 4 weeks ago by zaytsev

Oh wow, thank you very much for the explanation. Yes, if you think that it's possible to sync up the code with GNU tar, this would be perfect. Hopefully if done right, later syncs will be much easier. One could also try to steal code from libarchive. No idea if it's any easier and/or better...
