Ticket #4128 (closed enhancement: fixed)
improve handling of compressed content in mc.ext
Reported by: | ossi | Owned by: | andrew_b |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | mc-core | Version: | master |
Keywords: | Cc: | netch@… | |
Blocked By: | Blocking: | #2117 | |
Branch state: | no branch | Votes for changeset: |
Description
this may be controversial - there is a related thread at https://mail.gnome.org/archives/mc/2005-June/thread.html#00022, though one would expect it to be outdated by now. i've been using this code for a very long time.
Attachments
Change History
Changed 4 years ago by ossi
- Attachment 0001-improve-handling-of-compressed-content-in-mc.ext.patch added
comment:1 Changed 4 years ago by andrew_b
- Status changed from new to accepted
- Owner set to andrew_b
- Milestone changed from Future Releases to 4.8.26
comment:2 Changed 4 years ago by andrew_b
- Votes for changeset set to andrew_b
- Branch state changed from no branch to approved
Branch: 4128_mc.ext_compressed_content
changeset:8857423e4ebb770b6f0ea3103abf5d35c85fcbe8
comment:3 Changed 4 years ago by andrew_b
- Status changed from accepted to testing
- Votes for changeset changed from andrew_b to committed-master
- Resolution set to fixed
- Branch state changed from approved to merged
Thanks!
Merged to master: [eeb14d3f482c7a37e4a2391b951a84b451b3d6a7].
comment:5 Changed 4 years ago by zaytsev
It would be awesome if someone did something to ts - right now it is always regarded as a video file. This might have been reasonable before the advent of TypeScript, but now most of TS-files are actually code.
Unfortunately file is not perfect:
zaytsev@parallels:~/src/app/src$ file main.ts main.ts: Java source, ASCII text zaytsev@parallels:~/src/app/src$ file typings.d.ts typings.d.ts: ASCII text zaytsev@parallels:~/src/app/src$ file test.ts test.ts: Java source, UTF-8 Unicode text
Maybe match on text before video?
comment:6 Changed 4 years ago by ossi
that's one hell of an overloaded extension:
> file qt/translations/qt_ru.ts qt/translations/qt_ru.ts: XML 1.0 document, UTF-8 Unicode text, with very long lines
(this extension predates typescript by over a decade.)
we can keep the order, but need to switch to content detection:
... # does anyone have a sample file to verify that? type/MPEG Include=video ... type/XML Open=linguist %f ... include/editor ...
comment:7 Changed 4 years ago by zaytsev
zaytsev@parallels:~/src$ file sample_640x360.ts sample_640x360.ts: data zaytsev@parallels:~/src$ file ed24p_10.ts ed24p_10.ts: data
comment:8 Changed 4 years ago by ossi
heh, FAIL.
then we need some coding, to enable compound conditions like this:
shell/i/.ts type/^data$ Open=...
(so basically AND conditions by stacking.)
i'd suggest to create a new ticket (and possibly delete the kinda off-topic comments here).
comment:11 Changed 10 days ago by zaytsev
Feedback from Ubuntu user:
https://bugs.launchpad.net/ubuntu/+source/mc/+bug/2097560
I have a zip file where the first enclosed file is PDF. For a reason (I strictly prefer okular over evince and was lazy to find another place to set the priority), I changed the mc extension file (mc.ext) entry to the following:
type/^PDF Open=okular %f & disown
Unexpectedly, this caused misopening of some zip files. Checking what is happening by strace showed that mc calls file with arguments to detect the enclosed contents:
1368162 19:40:43.017932 execve("/usr/bin/file", ["file", "-z", "-S", "-L", "/home/netch/Downloads/sql_12.zip"], 0x6017f1d3d398 /* 68 vars */ <unfinished ...>
and this file helper returns:
1368162 19:40:43.040724 write(1, "PDF document, version 1.3, 90 pages (Zip archive data, at least v2.0 to extract, compression method=deflate)\n", 109) = 109
The rule in the extension file detects, according to "^PDF", this is PDF and runs okular instead of ark for this file. Okular deliberately rejects opening zip.
With the default extension file contents:
Open=/usr/lib/mc/ext.d/doc.sh open pdf
the bug still has place but is masked because passing through doc.sh and then xdg-open rechecks the file type using file (technically, with kioslave5) without -z that allows proper detection.
So using simple file/ rules in mc extension file is incompatible with using file -z for deeper analysis (if this deeper analysis has sense in general).
Iʼm uncertain for security impact but it could have taken place in a more complicated scenario.
comment:12 Changed 10 days ago by zaytsev
I'm not sure what the best way is to solve this problem universally. I think our Type= rules could/should exclude compressed? ossi, can you do something about this?
comment:13 Changed 10 days ago by zaytsev
- Status changed from closed to reopened
- Resolution fixed deleted
comment:14 Changed 10 days ago by zaytsev
- Votes for changeset committed-master deleted
- Branch state changed from merged to no branch
- Milestone changed from 4.8.26 to Future Releases
comment:15 Changed 10 days ago by ossi
dunno, depends on how stupid file really is.
hmm, apparently very - file -z on an .odt file bugs out a gzip error. this is doubly bogus, as these are zip and not gzip files.
i was attempting to figure out what it would do for higher-level file formats that are inherently archives. so much for that.
that it aggressively peeks contents of multi-file archives is also extremely questionable. it doesn't do that for tar files.
maybe that release of file is just really botched? this is file-5.45 from recent debian unstable.
comment:16 Changed 10 days ago by zaytsev
Some experiments on my machine (macOS):
% file --version file-5.41 magic file from /usr/share/file/magic % file -z -b -L -S test.pdf.zip ERROR:[gzip: Unknown compression format] (Zip archive data, at least v2.0 to extract, compression method=deflate) % file -b -L -S test.pdf.zip Zip archive data, at least v2.0 to extract, compression method=deflate % file -z -b -L -S test.pdf.gz PDF document, version 1.3 (zip deflate encoded) (gzip compressed data, was "test.pdf", last modified: Mon Jul 8 06:39:30 2024, from Unix) % file -b -L -S test.pdf.gz gzip compressed data, was "test.pdf", last modified: Mon Jul 8 06:39:30 2024, from Unix, original size modulo 2^32 1264794 % file -z -b -L -S test.pdf.bz2 PDF document, version 1.3 (zip deflate encoded) (bzip2 compressed data, block size = 900k) % file -b -L -S test.pdf.bz2 bzip2 compressed data, block size = 900k
It would really help if you could please explain what did you want to achieve and how it was supposed to work? To me it seems that -z, which is documented as "Try to look inside compressed files" does exactly what it says.
My guess is that you wanted to simplify / make the detection of man stuff (or, generally, compressed data for which we have special handling) more reliable.
In this case, maybe the solution is to make sure that we put the archives first, unless we have special support for compressed data. If we do, then these entries should come first. Can you look into this and make a patch?
comment:17 Changed 9 days ago by ossi
ok, i had a closer look.
In this case, maybe the solution is to make sure that we put the archives first, unless we have special support for compressed data.
yes, this seems like what we should do.
but then, there is the mess of c3848a689 and a4cccdff0.
so this clearly needs some rather careful examination.
i recommend to re-close this issue and create a followup.
comment:18 Changed 9 days ago by zaytsev
- Status changed from reopened to closed
- Resolution set to fixed
- Milestone Future Releases deleted
Created new ticket as suggested - #4649. I would appreciate any help - editing the ticket itself, and of course working on the rules.
I'm now trying to focus my time on researching a proper GitHub migration, as we don't want an improper one, and a proper one requires tons of work for which there are no other takers :(