Ticket #1952 (closed defect: fixed)

Opened 15 years ago

Last modified 20 months ago

mc cd foo.tar#utar does not handle POSIX ustar archives, only GNU tar vendor-specific/legacy ones

Reported by: mirabilos Owned by: andrew_b
Priority: major Milestone: 4.8.30
Component: mc-vfs Version: master
Keywords: Cc: miros-discuss@…, zaytsev, mrmazda@…, nerijus@…, szotsaki@…
Blocked By: Blocking: #2201, #4467
Branch state: merged Votes for changeset: committed-master

Description (last modified by andrew_b)

Hi,
please see http://www.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06
for the specification of the POSIX ustar interchange format.
GNU cpio (-Hustar), paxtar, and GNU tar --format=ustar
all create archives of this format; bsdtar probably does
as well. However, I cannot cd#utar or “Enter” them in
either mc-4.6.1-16 (MirPorts) or mc_3:4.7.0-1 (Debian sid).
After looking at tar.c I think you only support the legacy
or vendor-specific/proprietary GNU tar archive format.
The new boot floppies of MirBSD as of today are ustar
archives, with the bootsector squeezed into a ustar
header and closely following the standard. Introspection
would be nice.
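
For illustration, the two header flavours can be told apart by the magic and version fields at byte 257 of the first 512-byte header block. The following is only a minimal sketch, not MC's tar.c logic: `header_flavor` and `make_archive` are hypothetical names, and Python's `tarfile` module is used solely to produce sample archives.

```python
import io
import tarfile

MAGIC_OFFSET = 257  # the magic field starts at byte 257 of the 512-byte header


def header_flavor(header: bytes) -> str:
    """Classify a 512-byte tar header block by its magic/version fields."""
    field = header[MAGIC_OFFSET:MAGIC_OFFSET + 8]
    if field == b"ustar\x0000":
        return "posix-ustar"   # "ustar" NUL, then version "00"
    if field == b"ustar  \x00":
        return "gnu"           # "ustar" followed by two spaces and a NUL
    return "v7-or-unknown"     # pre-POSIX headers carry no magic


def make_archive(fmt: int) -> bytes:
    """Build a one-member archive in memory (helper for the demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


ustar_kind = header_flavor(make_archive(tarfile.USTAR_FORMAT)[:512])
gnu_kind = header_flavor(make_archive(tarfile.GNU_FORMAT)[:512])
print(ustar_kind, gnu_kind)
```

A reader that accepts only the second magic value would reject archives written in the first format, which matches the behaviour described above.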

Attachments

untar.txt (5.7 KB) - added by zaytsev 4 years ago.

Change History

comment:1 Changed 15 years ago by andrew_b

  • Status changed from new to accepted
  • Owner set to andrew_b
  • severity changed from no branch to on review
  • Milestone changed from 4.7 to 4.7.3

Created 1952_branch. Parent branch is master.
changeset:ff37dc26d46f652538c34475fec3f2b9bc9aa536

In this branch, MC uses an external TAR program instead of parsing TAR archives itself. This branch also fixes #2201.

comment:2 Changed 15 years ago by andrew_b

  • severity changed from on review to on rework

comment:3 Changed 15 years ago by andrew_b

  • severity changed from on rework to on review

Fixed extraction of files from TAR archives.

comment:4 Changed 15 years ago by andrew_b

  • severity changed from on review to on rework

There are problems with devices.

comment:5 Changed 15 years ago by andrew_b

  • severity changed from on rework to on review

comment:6 Changed 15 years ago by mirabilos

I’m reading the unidiff… this now looks better, but the various tar
utilities’ output formats *also* differ:

GNU tar

tg@frozenfish:~ $ tar tzvf mksh_39.3.orig.tar.gz
-rw-r--r-- root/wheel 296033 2010-02-25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- root/wheel 11840 2010-01-28 16:22 mksh-39.3.orig/printf.c.1.14

paxtar (OpenBSD, MirBSD, maybe others; I have a Debian package):

tg@blau:~ $ tar tzvf mksh_39.3.orig.tar.gz
-rw-r--r-- 1 root wheel 296033 Feb 25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- 1 root wheel 11840 Jan 28 16:21 mksh-39.3.orig/printf.c.1.14

bsdtar (libarchive-based; native on FreeBSD, MidnightBSD and others):

mirabilos@stargazer:~ $ tar tzvf mksh_39.3.orig.tar.gz
-rw-r--r-- 0 root wheel 296033 Feb 25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- 0 root wheel 11840 Jan 28 16:22 mksh-39.3.orig/printf.c.1.14

There may very well be others, but these three are the most often
used – although, on FreeWRT, we have busybox tar (because one of the
libc functions paxtar uses seems to be broken with µClibc):

root@wlan1:~ # tar tvf mksh_39.3.orig.tar
-rw-r--r-- 0/0 296033 2010-02-25 23:03:39 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- 0/0 11840 2010-01-28 17:22:12 mksh-39.3.orig/printf.c.1.14

And yes, I’m also the maintainer of mc on FreeWRT ;-)
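
The four listings above fall into two shapes: `owner/group` followed by an ISO-style date (GNU tar, busybox), and a link count followed by separate owner and group columns with an ls(1)-style date (paxtar, bsdtar). A rough sketch of splitting such lines into fields could look like the following. The pattern and function names are hypothetical, the sample lines are copied from this comment, and device entries (which print major,minor instead of a size) are not handled.

```python
import re

# "mode owner/group size YYYY-MM-DD HH:MM[:SS] name" (GNU tar, busybox tar)
SLASH_STYLE = re.compile(
    r"^(?P<mode>\S{10})\s+(?P<owner>[^/\s]+)/(?P<group>\S+)\s+(?P<size>\d+)\s+"
    r"(?P<when>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}(?::\d{2})?)\s+(?P<name>.+)$"
)
# "mode nlink owner group size Mon DD HH:MM|YYYY name" (paxtar, bsdtar)
LS_STYLE = re.compile(
    r"^(?P<mode>\S{10})\s+\d+\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+(?P<size>\d+)\s+"
    r"(?P<when>[A-Z][a-z]{2}\s+\d{1,2}\s+(?:\d{2}:\d{2}|\d{4}))\s+(?P<name>.+)$"
)


def parse_listing_line(line):
    """Return a dict of fields, or None if the line matches neither shape."""
    for style, pattern in (("slash", SLASH_STYLE), ("ls", LS_STYLE)):
        match = pattern.match(line)
        if match:
            fields = match.groupdict()
            fields["size"] = int(fields["size"])
            fields["style"] = style
            return fields
    return None  # unknown format: reject rather than guess


SAMPLES = [
    "-rw-r--r-- root/wheel 296033 2010-02-25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz",
    "-rw-r--r-- 1 root wheel 296033 Feb 25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz",
    "-rw-r--r-- 0 root wheel 296033 Feb 25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz",
    "-rw-r--r-- 0/0 296033 2010-02-25 23:03:39 mksh-39.3.orig/mksh-R39c.cpio.gz",
]
rows = [parse_listing_line(sample) for sample in SAMPLES]
for row in rows:
    print(row["style"], row["owner"], row["group"], row["size"], row["when"])
```

Returning None for anything unrecognised keeps unknown output formats visible instead of silently misparsing them.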

comment:7 Changed 15 years ago by andrew_b

OK, I see.

What can we do?

  • We can parse the output of tar --version and call the corresponding function for each TAR utility (GNU, paxtar, bsdtar, ...) in the new utar script.
  • We can support all tar formats in the binary (as MC currently does), but that will enlarge the main MC binary (for reference, the GNU tar binary is more than 200 kB).
  • We can use some 3rd-party library or framework that supports tar archives:
  • Something else
Last edited 8 years ago by andrew_b

comment:8 follow-up: ↓ 9 Changed 15 years ago by mirabilos

Only GNU tar supports --long-options.

I see two ways:

Either support all formats (I see two "major" differences with two subtle
subformats each) in the vfs script, or detect which tar /bin/tar is at
configure time (e.g. by checking a minimal tar file, I can produce one
which is 2K in size) and patch the vfs script (using the .in mechanism
would be fine) and hardcode /bin/tar as $TAR.

We could whitelist the four supported output formats (also consider that
gid 0 can be root, wheel, or something else…) and reject unknowns, thus
getting people to send in the actual output THEY get. Locale settings may
be an issue with GNU software (and some other) too.

This would break cross compilation though.

Or we could just try to apply guesswork (for instance, uid/gid or
uid<whitespace>gid, and it doesn’t matter whether uid and gid are
numeric or not… just the time/date format is annoying – the ls(1)-like
format is something I loathe to parse, but you can relatively easily
check for it). FWIW:

tg@blau:~ $ tar tzvf /MirOS/dist/mir/mksh/mksh-R24.cpio.gz | head -1
-rw-r--r-- 1 root wheel 125442 Jul 6 2005 mksh/mksh.1

This is the format I see with “old” files.

So I’d all be for the first way – support all of them in the vfs script.
If you want I could have a look at hacking this too; I have access to
Solaris, possibly HP-UX and AIX (if they get the lpar to boot/work again),
so I could test it on relatively many systems. I’d need to be pointed to
a specification of what exact arguments, input and output the vfs scripts
receive and are supposed to output though.
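
As an illustration of the date handling mentioned above, here is a small sketch that normalises both timestamp styles. It assumes the ls(1) convention that the year is dropped only for recent files; `parse_when` and its `now` parameter are hypothetical names, the month table is hard-coded English (so other locales are not covered), and Feb 29 combined with a non-leap reference year is not handled.

```python
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_when(text: str, now: datetime) -> datetime:
    """Turn a listing timestamp into a datetime, using `now` as reference."""
    parts = text.split()
    if "-" in parts[0]:
        # ISO-like style: "2010-02-25 22:03" or "2010-02-25 23:03:39"
        year, month, day = (int(p) for p in parts[0].split("-"))
        clock = [int(p) for p in parts[1].split(":")]
        return datetime(year, month, day, *clock)
    # ls(1)-like style: "Feb 25 22:03" (recent) or "Jul 6 2005" (old)
    month, day = MONTHS[parts[0]], int(parts[1])
    if ":" in parts[2]:
        hour, minute = (int(p) for p in parts[2].split(":"))
        stamp = datetime(now.year, month, day, hour, minute)
        # No year is printed, so a result in the future relative to
        # `now` is taken to belong to the previous year.
        if stamp > now:
            stamp = stamp.replace(year=now.year - 1)
        return stamp
    return datetime(int(parts[2]), month, day)


reference = datetime(2010, 3, 1, 12, 0)
for sample in ("2010-02-25 22:03", "2010-02-25 23:03:39",
               "Feb 25 22:03", "Jul 6 2005", "Dec 24 08:00"):
    print(sample, "->", parse_when(sample, reference))
```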

comment:9 in reply to: ↑ 8 Changed 15 years ago by andrew_b

  • Description modified

Replying to mirabilos:

Only GNU tar supports --long-options.

Long options are not used in the recent version of the vfs script.

I see two ways:
[skip]
This would break cross compilation though.

Cross compilation wouldn't be broken.

I’d need to be pointed to
a specification of what exact arguments, input and output the vfs scripts
receive and are supposed to output though.

You can find that in the MC source tree (lib/vfs/mc-vfs/extfs/README) or in the installed MC on your system (/usr/libexec/mc/extfs.d/README or /usr/lib/mc/extfs.d/README).

Thanks!

comment:10 Changed 15 years ago by andrew_b

  • severity changed from on review to on rework

comment:11 Changed 15 years ago by angel_il

  • Milestone changed from 4.7.3 to 4.7.4

comment:12 Changed 15 years ago by andrew_b

  • Blocking 2297 added

comment:13 Changed 14 years ago by andrew_b

  • Blocking 2201 added

comment:14 Changed 14 years ago by zaytsev

  • Cc zaytsev added

There is a re-implementation of the tar script in the Debian bug tracker:

http://bugs.debian.org/500693

Maybe you can steal something from there.

comment:15 Changed 14 years ago by andrew_b

  • Keywords ustar tar vfs removed
  • Version changed from 4.6.1 to master
  • severity changed from on rework to on review
  • Milestone changed from 4.7.4 to 4.7.5

Branch 1952_tar. Parent: master.
changeset:cae7459699f6a22d63272e66dcfa4eedc017a765

comment:16 Changed 14 years ago by andrew_b

Recent master contains modified VFS layer. Branch 1952_tar has been rebased.
Initial changeset:dbf60df91916ca167270aa06d2cd1c88c0ac3cc7

comment:17 Changed 14 years ago by slavazanko

  • severity changed from on review to on hold
  • Blocked By 3 added

Ticket frozen until ticket:3 is fixed.

comment:18 Changed 14 years ago by slavazanko

  • severity changed from on hold to no branch
  • Branch state set to on hold

comment:19 Changed 14 years ago by andrew_b

  • Milestone changed from 4.7.5 to 4.8

comment:20 Changed 13 years ago by andrew_b

  • Blocking 2201 removed

comment:21 Changed 10 years ago by andrew_b

  • Milestone changed from 4.8 to Future Releases

comment:22 Changed 8 years ago by andrew_b

  • Blocking 2297 removed

comment:23 Changed 5 years ago by mrmazda

I've been extracting mozilla.org's Linux archives for two decades on various GNU/Linux distributions using MC exclusively, in virtually all cases the MC version packaged by the distro.

http://archive.mozilla.org/pub/firefox/releases/68.5.0esr/linux-x86_64/en-US/firefox-68.5.0esr.tar.bz2 2020-02-10 is the last version I was able to do this with successfully.

As of http://archive.mozilla.org/pub/firefox/releases/68.6.0esr/linux-x86_64/en-US/firefox-68.6.0esr.tar.bz2 2020-03-09 the destination has corrupted timestamps, 1970-01-01 for ordinary files, current date/time for directories, using 4.8.24 on Fedora 32, Debian Testing/Bullseye and openSUSE Tumbleweed.

Same problem with http://archive.mozilla.org/pub/firefox/releases/68.7.0esr/linux-x86_64/en-US/firefox-68.7.0esr.tar.bz2 2020-04-06.

comment:24 Changed 5 years ago by mrmazda

  • Cc mrmazda@… added

comment:25 Changed 5 years ago by nerijus

  • Cc nerijus@… added

comment:26 Changed 4 years ago by zaytsev

  • Blocking 2201 added

Attachment untar.txt added 4 years ago by zaytsev

comment:27 Changed 4 years ago by zaytsev

So, Suse people updated the script in mid-2018 and apparently it has been working well for quite some time. Andrew, what's your opinion? Is there a good reason (performance? availability on embedded w/o tar executable?) why we should keep our tar code?

If it makes more sense to keep our code, I wonder if we could steal from somewhere a modern and clean implementation that covers all the tar subformats floating around, instead of having our own old, unmaintained implementation, which was probably branched from whatever at some point...

comment:28 follow-up: ↓ 29 Changed 4 years ago by andrew_b

File extraction would be too slow (as with uzip). A tar archive doesn't contain a list of files. To extract a file, you have to walk through the archive to find it. To extract the next file, you have to walk through the archive again. Again and again.

In MC's tar implementation, the positions of all files are stored while the archive is read and are then used for file reading/extraction.
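
The idea of recording positions once and seeking later can be sketched as follows. This is not MC's C code: `build_index` and `read_member` are hypothetical names, Python's `tarfile` does the header parsing, and the sketch covers only regular (non-sparse) members of an uncompressed archive.

```python
import io
import tarfile


def build_index(fileobj) -> dict:
    """One sequential pass: map member name -> (data offset, size)."""
    index = {}
    fileobj.seek(0)
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(fileobj, index: dict, name: str) -> bytes:
    """Random access: seek straight to the stored offset, no rescan."""
    offset, size = index[name]
    fileobj.seek(offset)
    return fileobj.read(size)


# Demo data: a small in-memory ustar archive with three members.
files = {
    "a.txt": b"alpha\n",
    "dir/b.bin": bytes(range(256)) * 5,
    "c.txt": b"gamma\n",
}
archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

index = build_index(archive)
print(read_member(archive, index, "c.txt"))
```

With the index in place, each extraction costs one seek and one read, whereas an external tar process has to scan from the start of the archive for every requested file.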

I'm working on an update of tar -- I'm trying to sync its code with that of GNU tar. But, unfortunately, I haven't had enough time for that. It's not a trivial task because MC's tar is GNU tar as of approx. 25 years ago.

comment:29 in reply to: ↑ 28 Changed 4 years ago by andrew_b

Replying to andrew_b:

tar doesn't contain a list of files.

It couldn't help in any case in the current VFS implementation (see #3).

comment:30 Changed 4 years ago by zaytsev

Oh wow, thank you very much for the explanation. Yes, if you think that it's possible to sync up the code with GNU tar, this would be perfect. Hopefully if done right, later syncs will be much easier. One could also try to steal code from libarchive. No idea if it's any easier and/or better...

comment:31 Changed 4 years ago by szotsaki

  • Cc szotsaki@… added

comment:32 Changed 21 months ago by andrew_b

  • Blocked By 3 removed

comment:33 Changed 21 months ago by andrew_b

  • Branch state changed from on hold to on review
  • Milestone changed from Future Releases to 4.8.30

MC's tar now supports various extended headers (including long file names and sparse files). The implementation is taken from GNU tar. Please test.

Branch: 1952_tar
Initial changeset:78a25f78009b9cdc0ff8842a9e9899a70fda1323

comment:34 Changed 21 months ago by zaytsev

This is awesome work! I wonder if the code can be organised somehow such that updates from GNU tar will be easier in the future by checking the diff and just stealing the code...

comment:35 Changed 20 months ago by andrew_b

  • Votes for changeset set to andrew_b
  • Branch state changed from on review to approved

comment:36 Changed 20 months ago by andrew_b

  • Status changed from accepted to testing
  • Votes for changeset changed from andrew_b to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged

Merged to master: [e5911c1ef5499acadfed3cbc6ea0913d46ce8ae9].

git log --pretty=oneline 86a9e0be2..e5911c1ef

comment:37 Changed 20 months ago by andrew_b

  • Status changed from testing to closed

comment:38 Changed 20 months ago by ukr

  • Blocking 4467 added

comment:39 Changed 20 months ago by andrew_b

  • Status changed from closed to reopened
  • Resolution fixed deleted

comment:40 Changed 20 months ago by andrew_b

  • Votes for changeset committed-master deleted
  • Branch state changed from merged to on review

Timestamps in tar archive are shown as "Jan 1, 1970".
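
For context, "Jan 1, 1970" is what a modification time of 0 seconds since the epoch looks like. The sketch below only shows how the mtime field of a ustar header (12 bytes of octal ASCII at offset 136) decodes; it makes no claim about the actual cause of this regression. `header_mtime` and `one_member_header` are hypothetical names, the GNU base-256 encoding for large values is not handled, and a pax extended header may carry its own mtime record that takes precedence over this field.

```python
import tarfile
from datetime import datetime, timezone

MTIME_OFFSET, MTIME_LEN = 136, 12  # position of the mtime field in the header


def header_mtime(header: bytes) -> datetime:
    """Decode the octal mtime field of a 512-byte header block (UTC)."""
    field = header[MTIME_OFFSET:MTIME_OFFSET + MTIME_LEN]
    digits = field.strip(b" \x00").decode("ascii")
    seconds = int(digits, 8) if digits else 0  # an empty field decodes as 0
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def one_member_header(mtime: int) -> bytes:
    """Produce a sample ustar header with the given mtime (demo helper)."""
    info = tarfile.TarInfo("f")
    info.mtime = mtime
    return info.tobuf(format=tarfile.USTAR_FORMAT)[:512]


zero_time = header_mtime(one_member_header(0))
real_time = header_mtime(one_member_header(1267135380))
print(zero_time.date(), real_time.date())
```

Any code path that leaves the decoded value at 0, for example by not applying a timestamp found elsewhere, would display the epoch date.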

Branch: 195_tar_timestamp
changeset:c9169c0aa8c162ce6b5fd15636753865f9c3f844

comment:41 Changed 20 months ago by andrew_b

  • Votes for changeset set to andrew_b
  • Branch state changed from on review to approved

comment:42 Changed 20 months ago by andrew_b

  • Status changed from reopened to closed
  • Votes for changeset changed from andrew_b to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged