Ticket #1952 (closed defect: fixed)
mc cd foo.tar#utar does not handle POSIX ustar archives, only GNU tar vendor-specific/legacy ones
Reported by: | mirabilos | Owned by: | andrew_b |
---|---|---|---|
Priority: | major | Milestone: | 4.8.30 |
Component: | mc-vfs | Version: | master |
Keywords: | | Cc: | miros-discuss@…, zaytsev, mrmazda@…, nerijus@…, szotsaki@… |
Blocked By: | | Blocking: | #2201, #4467 |
Branch state: | merged | Votes for changeset: | committed-master |
Description (last modified by andrew_b)
Hi,
please see http://www.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06
for the specification of the POSIX ustar interchange format.
GNU cpio (-Hustar), paxtar, and GNU tar --format=ustar
all create archives of this format; bsdtar probably does
as well. However, I cannot cd into them via #utar or “Enter”
them in either mc-4.6.1-16 (MirPorts) or mc_3:4.7.0-1 (Debian sid).
After looking at tar.c, I think you only support the legacy
or vendor-specific/proprietary GNU tar archive format.
As of today, the new boot floppies of MirBSD are ustar
archives, with the bootsector squeezed into a ustar
header and closely following the standard. Being able to
introspect them would be nice.
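For illustration, the difference between the two formats is visible in the 512-byte header itself: POSIX ustar stores the magic "ustar\0" followed by the version "00" at offset 257, while legacy GNU tar writes "ustar  \0" there, and only ustar uses the 155-byte prefix field at offset 345 to hold the leading part of long path names. The following is a minimal Python sketch under those assumptions, not MC's code; the function names are hypothetical, and pax/GNU extended (long-name) headers are ignored.

```python
# Sketch only: classify one 512-byte tar header block and rebuild the
# member name. Offsets follow the POSIX ustar header layout.

def header_format(block: bytes) -> str:
    """Return 'ustar', 'gnu' or 'v7' for a single 512-byte header block."""
    magic, version = block[257:263], block[263:265]
    if magic == b"ustar\0" and version == b"00":
        return "ustar"  # POSIX ustar: "ustar\0" + "00"
    if magic == b"ustar " and version == b" \0":
        return "gnu"    # legacy GNU tar: "ustar  \0"
    return "v7"         # no magic at all: pre-POSIX header


def cstr(field: bytes) -> str:
    """Decode a NUL-terminated (or NUL-padded) header field."""
    return field.split(b"\0", 1)[0].decode("utf-8", "surrogateescape")


def member_name(block: bytes) -> str:
    """Join ustar's prefix (offset 345, 155 bytes) and name (offset 0, 100 bytes)."""
    name = cstr(block[0:100])
    if header_format(block) == "ustar":
        prefix = cstr(block[345:500])
        if prefix:
            return prefix + "/" + name
    # Legacy GNU headers use bytes 345+ for other fields, so no prefix here.
    return name
```

A reader that only checks for the GNU magic, or that never consults the prefix field, will reject or misname members of a standard ustar archive.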
Change History
comment:1 Changed 14 years ago by andrew_b
- Status changed from new to accepted
- Owner set to andrew_b
- severity changed from no branch to on review
- Milestone changed from 4.7 to 4.7.3
comment:3 Changed 14 years ago by andrew_b
- severity changed from on rework to on review
Fixed extraction of files from TAR archives.
comment:4 Changed 14 years ago by andrew_b
- severity changed from on review to on rework
There are problems with devices.
comment:5 Changed 14 years ago by andrew_b
- severity changed from on rework to on review
I hope that's all. :)
comment:6 Changed 14 years ago by mirabilos
I’m reading the unidiff… this now looks better, but the various tar
utilities’ output formats *also* differ:
GNU tar:
tg@frozenfish:~ $ tar tzvf mksh_39.3.orig.tar.gz
-rw-r--r-- root/wheel 296033 2010-02-25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- root/wheel 11840 2010-01-28 16:22 mksh-39.3.orig/printf.c.1.14
paxtar (OpenBSD, MirBSD, maybe others; I have a Debian package):
tg@blau:~ $ tar tzvf mksh_39.3.orig.tar.gz
-rw-r--r-- 1 root wheel 296033 Feb 25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- 1 root wheel 11840 Jan 28 16:21 mksh-39.3.orig/printf.c.1.14
bsdtar (libarchive-based; native on FreeBSD, MidnightBSD and others):
mirabilos@stargazer:~ $ tar tzvf mksh_39.3.orig.tar.gz
-rw-r--r-- 0 root wheel 296033 Feb 25 22:03 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- 0 root wheel 11840 Jan 28 16:22 mksh-39.3.orig/printf.c.1.14
There may very well be others, but these three are the most often
used – although, on FreeWRT, we have busybox tar (because one of the
libc functions paxtar uses seems to be broken with µClibc):
root@wlan1:~ # tar tvf mksh_39.3.orig.tar
-rw-r--r-- 0/0 296033 2010-02-25 23:03:39 mksh-39.3.orig/mksh-R39c.cpio.gz
-rw-r--r-- 0/0 11840 2010-01-28 17:22:12 mksh-39.3.orig/printf.c.1.14
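The four listings above fall into two shapes: an "ISO" style (GNU tar and busybox: owner/group, YYYY-MM-DD date) and an ls(1) style (paxtar and bsdtar: link count, separate owner and group, "Mon DD HH:MM" or "Mon DD YYYY"). A vfs script could whitelist exactly these shapes and reject anything else. This is a minimal Python sketch, assuming regular-file lines only (device, symlink and hard-link entries print differently) and a C locale; the function name and the returned field names are hypothetical.

```python
import re

# GNU tar ("root/wheel ... 2010-02-25 22:03") and
# busybox tar ("0/0 ... 2010-02-25 23:03:39").
ISO_STYLE = re.compile(
    r"^(?P<mode>\S+)\s+(?P<owner>[^/\s]+)/(?P<group>\S+)\s+(?P<size>\d+)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}(?::\d{2})?) (?P<name>.+)$")

# paxtar and bsdtar ("1 root wheel ... Feb 25 22:03" or "... Jul 6 2005").
LS_STYLE = re.compile(
    r"^(?P<mode>\S+)\s+(?P<links>\d+)\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+"
    r"(?P<size>\d+)\s+(?P<date>[A-Z][a-z]{2}\s+\d{1,2}\s+(?:\d{2}:\d{2}|\d{4})) "
    r"(?P<name>.+)$")


def parse_listing_line(line):
    """Return a dict of fields, or None if the line matches no known format."""
    for style, rx in (("iso", ISO_STYLE), ("ls", LS_STYLE)):
        m = rx.match(line.rstrip("\n"))
        if m:
            fields = m.groupdict()
            fields["style"] = style
            fields["size"] = int(fields["size"])
            return fields
    return None  # unknown format: reject and ask the user for a sample
```

Returning None for unknown output, rather than guessing, is what makes a whitelist approach useful: users with an unsupported tar get a clear failure and can report their actual output.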
And yes, I’m also the maintainer of mc on FreeWRT ;-)
comment:7 Changed 14 years ago by andrew_b
OK, I see.
What can we do?
- We can parse the output of tar --version and call the corresponding function for each tar utility (GNU, paxtar, bsdtar, ...) in a new utar script.
- We can support all tar formats in the binary (as MC currently does), but this will increase the size of the main MC executable (for reference, the GNU tar binary is more than 200 kB).
- We can use some 3rd-party library or framework that supports tar archives:
- Something else
comment:8 follow-up: ↓ 9 Changed 14 years ago by mirabilos
Only GNU tar supports --long-options.
I see two ways:
Either support all formats (I see two "major" differences with two subtle
subformats each) in the vfs script, or detect which tar /bin/tar is at
configure time (e.g. by checking a minimal tar file, I can produce one
which is 2K in size) and patch the vfs script (using the .in mechanism
would be fine) and hardcode /bin/tar as $TAR.
We could whitelist the four supported output formats (also consider that
gid 0 can be root, wheel, or something else…) and reject unknowns, thus
getting people to send in the actual output THEY get. Locale settings may
be an issue with GNU software (and some others) too.
This would break cross compilation though.
Or we could just try to apply guesswork (for instance, uid/gid or
uid<whitespace>gid, and it doesn’t matter whether uid and gid are
numeric or not… just the time/date format is annoying – the ls(1)-like
format is something I loathe to parse, but you can relatively easily
check for it). FWIW:
tg@blau:~ $ tar tzvf /MirOS/dist/mir/mksh/mksh-R24.cpio.gz | head -1
-rw-r--r-- 1 root wheel 125442 Jul 6 2005 mksh/mksh.1
This is the format I see with “old” files.
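The ls(1)-style date ends either in HH:MM (recent files, year omitted) or in a year (older files, as in the line above). One way to normalise it is sketched below in Python, assuming English month abbreviations (C locale) and inferring an omitted year as the most recent one that does not put the timestamp in the future. That heuristic and the function name are assumptions for illustration, not behaviour documented by the tar programs.

```python
from datetime import datetime

MONTHS = {m: i for i, m in enumerate(
    ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), start=1)}


def parse_ls_date(text, now):
    """Convert 'Mon DD HH:MM' or 'Mon DD YYYY' to a datetime."""
    mon, day, last = text.split()
    month, day = MONTHS[mon], int(day)
    if ":" not in last:
        return datetime(int(last), month, day)  # year given explicitly
    hour, minute = map(int, last.split(":"))
    # Year omitted: assume the latest candidate that is not after `now`.
    for year in (now.year, now.year - 1):
        try:
            stamp = datetime(year, month, day, hour, minute)
        except ValueError:  # e.g. Feb 29 in a non-leap year
            continue
        if stamp <= now:
            return stamp
    raise ValueError(f"cannot infer year for {text!r}")
```

Passing `now` in explicitly keeps the guess reproducible; a vfs script would pass the current time.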
So I’m all for the first way: support all of them in the vfs script.
If you want, I could have a look at hacking on this too; I have access to
Solaris, possibly HP-UX and AIX (if they get the lpar to boot/work again),
so I could test it on relatively many systems. I’d need to be pointed to
a specification of what exact arguments, input and output the vfs scripts
receive and are supposed to output though.
comment:9 in reply to: ↑ 8 Changed 14 years ago by andrew_b
- Description modified
Replying to mirabilos:
Only GNU tar supports --long-options.
Long options are not used in the recent version of the vfs script.
I see two ways:
[skip]
This would break cross compilation though.
Cross compilation wouldn't be broken.
I’d need to be pointed to
a specification of what exact arguments, input and output the vfs scripts
receive and are supposed to output though.
You can find that in the MC source tree (lib/vfs/mc-vfs/extfs/README) or in the installed MC on your system (/usr/libexec/mc/extfs.d/README or /usr/lib/mc/extfs.d/README).
Thanks!
comment:14 Changed 14 years ago by zaytsev
- Cc zaytsev added
There is a re-implementation of tar script in Debian bugzilla:
http://bugs.debian.org/500693
Maybe you can steal something from there.
comment:15 Changed 14 years ago by andrew_b
- Keywords ustar tar vfs removed
- Version changed from 4.6.1 to master
- severity changed from on rework to on review
- Milestone changed from 4.7.4 to 4.7.5
Branch 1952_tar. Parent: master.
changeset:cae7459699f6a22d63272e66dcfa4eedc017a765
comment:16 Changed 14 years ago by andrew_b
Recent master contains a modified VFS layer. Branch 1952_tar has been rebased.
Initial changeset:dbf60df91916ca167270aa06d2cd1c88c0ac3cc7
comment:17 Changed 14 years ago by slavazanko
- severity changed from on review to on hold
- Blocked By 3 added
Ticket frozen while ticket:3 remains unfixed.
comment:18 Changed 13 years ago by slavazanko
- severity changed from on hold to no branch
- Branch state set to on hold
comment:23 Changed 5 years ago by mrmazda
I've been extracting mozilla.org's Linux archives for two decades on various GNU/Linux distributions using MC exclusively; in virtually all cases, that was the MC version packaged by the distro.
http://archive.mozilla.org/pub/firefox/releases/68.5.0esr/linux-x86_64/en-US/firefox-68.5.0esr.tar.bz2 (2020-02-10) is the last version I was able to do this with successfully.
As of http://archive.mozilla.org/pub/firefox/releases/68.6.0esr/linux-x86_64/en-US/firefox-68.6.0esr.tar.bz2 (2020-03-09), the extracted files have corrupted timestamps: 1970-01-01 for ordinary files and the current date/time for directories. This happens with MC 4.8.24 on Fedora 32, Debian Testing/Bullseye and openSUSE Tumbleweed.
Same problem with http://archive.mozilla.org/pub/firefox/releases/68.7.0esr/linux-x86_64/en-US/firefox-68.7.0esr.tar.bz2 (2020-04-06).
comment:27 Changed 4 years ago by zaytsev
So, the SUSE people updated the script in mid-2018, and apparently it has been working well for quite some time. Andrew, what's your opinion? Is there a good reason (performance? availability on embedded systems without a tar executable?) why we should keep our tar code?
If it makes more sense to keep our code, I wonder if we could steal a modern and clean implementation of all the tar subformats floating around from somewhere, instead of keeping our own old, unmaintained implementation, which was probably forked from something at some point...
comment:28 follow-up: ↓ 29 Changed 4 years ago by andrew_b
File extraction would be too slow (like with uzip). A tar archive doesn't contain a list of files. To extract a file, you have to walk through the archive to find it. To extract the next file, you have to walk through the archive again. Again and again.
In MC's tar implementation, the positions of all files are stored while reading the archive and then used when reading/extracting files.
I'm working on an update of tar: I'm trying to sync its code with GNU tar's. Unfortunately, I don't have enough time for that. It's not a trivial task, because MC's tar is GNU tar from approx. 25 years ago.
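The indexing approach described above can be sketched as follows: one sequential pass over the headers records where each member's data starts, and later reads seek straight there instead of rescanning the archive. This is a minimal Python sketch, not MC's C implementation; the names are hypothetical, and it ignores the ustar prefix field, pax/GNU extended headers, sparse files and base-256 sizes, all of which a real reader must handle.

```python
BLOCK = 512  # tar headers and data are laid out in 512-byte blocks


def _octal(field: bytes) -> int:
    """Parse a NUL/space-terminated octal header field."""
    digits = field.split(b"\0", 1)[0].strip()
    return int(digits, 8) if digits else 0


def build_index(f):
    """Scan the archive once; map member name -> (data offset, size)."""
    index = {}
    offset = 0
    while True:
        f.seek(offset)
        header = f.read(BLOCK)
        if len(header) < BLOCK or header == bytes(BLOCK):
            break  # zero block marks the end of the archive (or file is truncated)
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "surrogateescape")
        size = _octal(header[124:136])  # size field: offset 124, 12 bytes
        data_offset = offset + BLOCK
        index[name] = (data_offset, size)
        # Member data is padded up to a multiple of 512 bytes.
        offset = data_offset + (size + BLOCK - 1) // BLOCK * BLOCK
    return index


def read_member(f, index, name):
    """Random access: seek directly to the member's data, no rescan."""
    data_offset, size = index[name]
    f.seek(data_offset)
    return f.read(size)
```

The scan costs one pass over the headers; every later extraction is a single seek plus a read, which is what makes per-file access cheap compared with running an external tar once per file.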
comment:29 in reply to: ↑ 28 Changed 4 years ago by andrew_b
comment:30 Changed 4 years ago by zaytsev
Oh wow, thank you very much for the explanation. Yes, if you think that it's possible to sync up the code with GNU tar, this would be perfect. Hopefully if done right, later syncs will be much easier. One could also try to steal code from libarchive. No idea if it's any easier and/or better...
comment:33 Changed 17 months ago by andrew_b
- Branch state changed from on hold to on review
- Milestone changed from Future Releases to 4.8.30
MC's tar now supports various extended headers (including long file names and sparse files). The implementation is taken from GNU tar. Please test.
Branch: 1952_tar
Initial changeset:78a25f78009b9cdc0ff8842a9e9899a70fda1323
comment:34 Changed 17 months ago by zaytsev
This is awesome work! I wonder if the code can be organised somehow such that updates from GNU tar will be easier in the future by checking the diff and just stealing the code...
comment:35 Changed 17 months ago by andrew_b
- Votes for changeset set to andrew_b
- Branch state changed from on review to approved
comment:36 Changed 17 months ago by andrew_b
- Status changed from accepted to testing
- Votes for changeset changed from andrew_b to committed-master
- Resolution set to fixed
- Branch state changed from approved to merged
Merged to master: [e5911c1ef5499acadfed3cbc6ea0913d46ce8ae9].
git log --pretty=oneline 86a9e0be2..e5911c1ef
comment:39 Changed 17 months ago by andrew_b
- Status changed from closed to reopened
- Resolution fixed deleted
comment:40 Changed 17 months ago by andrew_b
- Votes for changeset committed-master deleted
- Branch state changed from merged to on review
Timestamps of files in tar archives are shown as "Jan 1, 1970".
Branch: 195_tar_timestamp
changeset:c9169c0aa8c162ce6b5fd15636753865f9c3f844
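For context on where a "Jan 1, 1970" display can come from: tar stores mtime as a 12-byte numeric field at offset 136, normally as octal digits, but GNU tar switches to a base-256 binary encoding (first byte 0x80 for positive values, 0xff for negative ones) when a value does not fit, and pax archives may carry the time in an "mtime" extended-header record instead. Any time field that decodes to 0 is shown as the Unix epoch. This is a minimal Python sketch of decoding the numeric field, with hypothetical function names; it does not claim to reproduce the actual cause of this bug.

```python
from datetime import datetime, timezone


def parse_numeric(field: bytes) -> int:
    """Decode a tar numeric header field such as mtime (offset 136, 12 bytes)."""
    if field[:1] == b"\x80":
        # GNU base-256, positive: the remaining bytes are a big-endian integer.
        return int.from_bytes(field[1:], "big")
    if field[:1] == b"\xff":
        # GNU base-256, negative: the whole field is two's complement.
        return int.from_bytes(field, "big", signed=True)
    # POSIX: octal digits terminated by NUL and/or space.
    digits = field.split(b"\0", 1)[0].strip(b" ")
    return int(digits, 8) if digits else 0


def header_mtime(header: bytes) -> datetime:
    """Modification time from a 512-byte header; a zero field yields the epoch."""
    return datetime.fromtimestamp(parse_numeric(header[136:148]), tz=timezone.utc)
```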
comment:41 Changed 17 months ago by andrew_b
- Votes for changeset set to andrew_b
- Branch state changed from on review to approved
comment:42 Changed 17 months ago by andrew_b
- Status changed from reopened to closed
- Votes for changeset changed from andrew_b to committed-master
- Resolution set to fixed
- Branch state changed from approved to merged
Merged to master: [5ac1e86e18e93837cd36528b2eae352091fe4f5b].
Created 1952_branch. Parent branch is master.
changeset:ff37dc26d46f652538c34475fec3f2b9bc9aa536
In this branch, MC uses an external tar program instead of parsing TAR archives itself. This branch also fixes #2201.