Ticket #4128 (closed enhancement: fixed)

Opened 4 years ago

Last modified 9 days ago

improve handling of compressed content in mc.ext

Reported by: ossi Owned by: andrew_b
Priority: major Milestone:
Component: mc-core Version: master
Keywords: Cc: netch@…
Blocked By: Blocking: #2117
Branch state: no branch Votes for changeset:

Description

this may be controversial - there is a related thread at https://mail.gnome.org/archives/mc/2005-June/thread.html#00022, though one would expect it to be outdated by now. i've been using this code for a very long time.

Change History

comment:1 Changed 4 years ago by andrew_b

  • Status changed from new to accepted
  • Owner set to andrew_b
  • Milestone changed from Future Releases to 4.8.26

comment:2 Changed 4 years ago by andrew_b

  • Votes for changeset set to andrew_b
  • Branch state changed from no branch to approved

Branch: 4128_mc.ext_compressed_content
changeset:8857423e4ebb770b6f0ea3103abf5d35c85fcbe8

comment:3 Changed 4 years ago by andrew_b

  • Status changed from accepted to testing
  • Votes for changeset changed from andrew_b to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged

Thanks!

Merged to master: [eeb14d3f482c7a37e4a2391b951a84b451b3d6a7].

comment:4 Changed 4 years ago by andrew_b

  • Status changed from testing to closed

comment:5 Changed 4 years ago by zaytsev

It would be awesome if someone did something to ts - right now it is always regarded as a video file. This might have been reasonable before the advent of TypeScript, but now most of TS-files are actually code.

Unfortunately file is not perfect:

zaytsev@parallels:~/src/app/src$ file main.ts 
main.ts: Java source, ASCII text

zaytsev@parallels:~/src/app/src$ file typings.d.ts 
typings.d.ts: ASCII text

zaytsev@parallels:~/src/app/src$ file test.ts 
test.ts: Java source, UTF-8 Unicode text

Maybe match on text before video?

comment:6 Changed 4 years ago by ossi

that's one hell of an overloaded extension:

> file qt/translations/qt_ru.ts
qt/translations/qt_ru.ts: XML 1.0 document, UTF-8 Unicode text, with very long lines

(this extension predates typescript by over a decade.)

we can keep the order, but need to switch to content detection:

...
# does anyone have a sample file to verify that?
type/MPEG
    Include=video
...
type/XML
    Open=linguist %f
...
include/editor
...

comment:7 Changed 4 years ago by zaytsev

zaytsev@parallels:~/src$ file sample_640x360.ts 
sample_640x360.ts: data

zaytsev@parallels:~/src$ file ed24p_10.ts 
ed24p_10.ts: data

comment:8 Changed 4 years ago by ossi

heh, FAIL.
then we need some coding, to enable compound conditions like this:

shell/i/.ts
type/^data$
    Open=...

(so basically AND conditions by stacking.)

i'd suggest to create a new ticket (and possibly delete the kinda off-topic comments here).

comment:9 Changed 4 years ago by zaytsev

Done in #4141.

comment:10 Changed 4 years ago by zaytsev

  • Blocking 2117 added

comment:11 Changed 10 days ago by zaytsev

Feedback from Ubuntu user:

https://bugs.launchpad.net/ubuntu/+source/mc/+bug/2097560

I have a zip file where the first enclosed file is PDF. For a reason (I strictly prefer okular over evince and was lazy to find another place to set the priority), I changed the mc extension file (mc.ext) entry to the following:

type/^PDF
        Open=okular %f & disown

Unexpectedly, this caused misopening of some zip files. Checking what is happening by strace showed that mc calls file with arguments to detect the enclosed contents:

1368162 19:40:43.017932 execve("/usr/bin/file", ["file", "-z", "-S", "-L", "/home/netch/Downloads/sql_12.zip"], 0x6017f1d3d398 /* 68 vars */ <unfinished ...>

and this file helper returns:

1368162 19:40:43.040724 write(1, "PDF document, version 1.3, 90 pages (Zip archive data, at least v2.0 to extract, compression method=deflate)\n", 109) = 109

The rule in the extension file detects, according to "^PDF", this is PDF and runs okular instead of ark for this file. Okular deliberately rejects opening zip.

With the default extension file contents:

    Open=/usr/lib/mc/ext.d/doc.sh open pdf

the bug still has place but is masked because passing through doc.sh and then xdg-open rechecks the file type using file (technically, with kioslave5) without -z that allows proper detection.

So using simple file/ rules in mc extension file is incompatible with using file -z for deeper analysis (if this deeper analysis has sense in general).

Iʼm uncertain for security impact but it could have taken place in a more complicated scenario.

Last edited 10 days ago by zaytsev (previous) (diff)

comment:12 Changed 10 days ago by zaytsev

I'm not sure what the best way is to solve this problem universally. I think our Type= rules could/should exclude compressed? ossi, can you do something about this?

comment:13 Changed 10 days ago by zaytsev

  • Status changed from closed to reopened
  • Resolution fixed deleted

comment:14 Changed 10 days ago by zaytsev

  • Votes for changeset committed-master deleted
  • Branch state changed from merged to no branch
  • Milestone changed from 4.8.26 to Future Releases

comment:15 Changed 10 days ago by ossi

dunno, depends on how stupid file really is.

hmm, apparently very - file -z on an .odt file bugs out a gzip error. this is doubly bogus, as these are zip and not gzip files.
i was attempting to figure out what it would do for higher-level file formats that are inherently archives. so much for that.

that it aggressively peeks contents of multi-file archives is also extremely questionable. it doesn't do that for tar files.

maybe that release of file is just really botched? this is file-5.45 from recent debian unstable.

comment:16 Changed 10 days ago by zaytsev

Some experiments on my machine (macOS):

% file --version
file-5.41
magic file from /usr/share/file/magic

% file -z -b -L -S test.pdf.zip
ERROR:[gzip: Unknown compression format] (Zip archive data, at least v2.0 to extract, compression method=deflate)

% file -b -L -S test.pdf.zip       
Zip archive data, at least v2.0 to extract, compression method=deflate

% file -z -b -L -S test.pdf.gz                                    
PDF document, version 1.3 (zip deflate encoded) (gzip compressed data, was "test.pdf", last modified: Mon Jul  8 06:39:30 2024, from Unix)

% file -b -L -S test.pdf.gz                                        
gzip compressed data, was "test.pdf", last modified: Mon Jul  8 06:39:30 2024, from Unix, original size modulo 2^32 1264794

% file -z -b -L -S test.pdf.bz2                                    
PDF document, version 1.3 (zip deflate encoded) (bzip2 compressed data, block size = 900k)

% file -b -L -S test.pdf.bz2                                       
bzip2 compressed data, block size = 900k

It would really help if you could please explain what did you want to achieve and how it was supposed to work? To me it seems that -z, which is documented as "Try to look inside compressed files" does exactly what it says.

My guess is that you wanted to simplify / make the detection of man stuff (or, generally, compressed data for which we have special handling) more reliable.

In this case, maybe the solution is to make sure that we put the archives first, unless we have special support for compressed data. If we do, then these entries should come first. Can you look into this and make a patch?

comment:17 Changed 9 days ago by ossi

ok, i had a closer look.

In this case, maybe the solution is to make sure that we put the archives first, unless we have special support for compressed data.

yes, this seems like what we should do.
but then, there is the mess of c3848a689 and a4cccdff0.
so this clearly needs some rather careful examination.
i recommend to re-close this issue and create a followup.

comment:18 Changed 9 days ago by zaytsev

  • Status changed from reopened to closed
  • Resolution set to fixed
  • Milestone Future Releases deleted

Created new ticket as suggested - #4649. I would appreciate any help - editing the ticket itself, and of course working on the rules.

I'm now trying to focus my time on researching a proper GitHub migration, as we don't want an improper one, and a proper one requires tons of work for which there are no other takers :(

comment:19 Changed 9 days ago by netch

  • Cc netch@… added
Note: See TracTickets for help on using tickets.