Ticket #2707 (closed defect: fixed)

Opened 5 years ago

Last modified 5 months ago

u7z fails to parse a particular archive

Reported by: birdie Owned by: zaytsev-work
Priority: major Milestone: 4.8.19
Component: mc-vfs Version: master
Keywords: Cc:
Blocked By: Blocking: #3744
Branch state: merged Votes for changeset: committed-master

Description

Subject.

Attachments

list.txt (3.1 KB) - added by birdie 5 years ago.
7z l of this archive
mc-2707-Extfs-helper-u7z-fix-parsing-7z-archive-no-datetime.patch (5.0 KB) - added by and 15 months ago.
2707-0001-extfs-u7z-make-the-code-more-readable.patch (2.2 KB) - added by mooffie 5 months ago.
2707-0004-extfs-u7z-print-cleanup.patch (1.1 KB) - added by mooffie 5 months ago.
listing2.txt (14.4 KB) - added by birdie 5 months ago.
2707-0002-extfs-u7z-handle-missing-size.patch (2.1 KB) - added by mooffie 5 months ago.
2707-0003-extfs-u7z-sed-portability.patch (1.5 KB) - added by mooffie 5 months ago.

Change History

Changed 5 years ago by birdie

7z l of this archive

comment:1 Changed 5 years ago by andrew_b

  • Component changed from mc-core to mc-vfs

comment:2 Changed 15 months ago by and

Confirmed,
7z archive can hold entries without datetime info
which hide/skip u7z helper.

Patch attached.

Last edited 15 months ago by and (previous) (diff)

comment:3 Changed 15 months ago by zaytsev

@and, in the future, please submit whitespace only patches as separate commits. If you change the whitespace in the same commit, then it's not immediately clear from the diff whether the additional changes also concern the logic, or it's only whitespace, unless special options are used (ignore whitespace or word diff). This makes reviewing the history much more tedious than otherwise.

comment:4 Changed 15 months ago by zaytsev

  • Milestone changed from Future Releases to 4.8.17

I have just created a 4.8.17 milestone. I suggest to use this for all tickets with fresh patches to be reviewed.

We need to make a 4.8.16 at some point this month, so just assigning all newly processed tickets to 4.8.16 is not helpful, as we need to stop at some point. But if they are left for future releases, then one would need to go through them again and see which ones already have good patches on review. Assigning them to 4.8.17 sounds like a good solution to me.

comment:5 Changed 13 months ago by zaytsev

  • Milestone changed from 4.8.17 to 4.8.18

comment:6 Changed 8 months ago by zaytsev

  • Milestone changed from 4.8.18 to 4.8.19

comment:7 Changed 7 months ago by zaytsev-work

  • Status changed from new to accepted
  • Owner set to zaytsev-work

comment:8 Changed 5 months ago by zaytsev

  • Branch state changed from no branch to on review

Branch: 2707_extfs_u7z
Initial changeset: [7f6a700d9e2feea02cde4097d64a7ba522eea7b9]

comment:9 Changed 5 months ago by zaytsev

See also #3730 regarding making a possible test out of this.

comment:10 Changed 5 months ago by mooffie

Please hold on the merge for one day, ok? I'll have a quick look into the testability of this today.

comment:11 Changed 5 months ago by zaytsev

No worries, it has been waiting for 5 years already, so I have absolutely no problems with waiting a couple of days more.

comment:12 Changed 5 months ago by mooffie

There's another problem in the archive @birdie posted: the size field is missing for some files.

  • I'm attaching a patch to fix this.
  • The first patch, though, will be to make the code more readable (otherwise you won't understand my fix).
  • Then two other small patches will follow (portability, cleanup).

Don't worry: you won't have to wait 5 more years to commit this. I've already prepared tests (I'll start a new ticket shortly) so you can commit this outright.

Changed 5 months ago by mooffie

Changed 5 months ago by mooffie

comment:13 Changed 5 months ago by mooffie

  • Blocking 3744 added

comment:14 follow-up: ↓ 16 Changed 5 months ago by zaytsev-work

Could you please give me an example of sed that has problems with \s ? Did you mean macOS?

zaytsevy:~ zaytsev$ echo "abc   def" | sed -e 's/\s*//g'
abc   def
zaytsevy:~ zaytsev$ echo "abc   def" | sed -e 's/[[:space:]]*//g' 
abcdef

I'm not sure that replacing \s with is the best approach, because \s at very least also includes tab, but maybe in this case it doesn't matter and if the input has some other whitespace we have a problem anyways.

comment:15 follow-up: ↓ 17 Changed 5 months ago by birdie

I can imagine 7z archives which lack both compressed _and_ uncompressed sizes simultaneously. Please consider this.

comment:16 in reply to: ↑ 14 Changed 5 months ago by mooffie

Replying to zaytsev-work:

I'm not sure that replacing \s with ' ' is the best approach, because \s at very least also includes tab [...] and if the input has some other whitespace we [...]


I don't see a problem with replacing \s with just ' '. On the contrary. If the input has whitespace zoo, our whole sed recipe would fail: we're very precise in the rest of the pattern(s).

Could you please give me an example of sed that has problems with \s ?


I'm not familiar with such sed personally. \s is just not standard.

Now, speaking of portability:

There was something that bothered me and I now verified it: the | (bar) I used in \($date_re\|$empty_date_re\) isn't portable either. sed is supposed to support "basic regular expressions" and | is of "extended regular expressions". So it's my mistake.

I'll soon re-upload patch #2 (after I get more info from @birdie).

comment:17 in reply to: ↑ 15 Changed 5 months ago by mooffie

Replying to birdie:

I can imagine 7z archives which lack both compressed _and_ uncompressed sizes simultaneously. Please consider this.


Could you please provide more info? I.e., why would they be missing?

(I'm asking this because sed is a bit limited, as you see, and I don't want to bloat the recipe needlessly.)

Last edited 5 months ago by mooffie (previous) (diff)

Changed 5 months ago by birdie

comment:18 Changed 5 months ago by birdie

Replying to mooffie:

Could you please provide more info? I.e., why would they be missing?

See listing2.txt - its last file has no sizes at all.

Changed 5 months ago by mooffie

Changed 5 months ago by mooffie

comment:19 Changed 5 months ago by mooffie

Ok, I updated patch 0002 (so I had to update 0003 too). Remember to commit them in order: 0001, 0002, 0003, 0004.

What's new in patch 0002?

  • It no longer uses | in the regex.
  • It handles the new case @birdie noted where both the uncompressed and compressed sizes are missing (see 'listing2.txt').

Don't worry about tricky bugs: the tests at #3744 were updated. 'u7z.missing-size-and-date.input' tests all the possibilities that patch 0002 needs to handle.

(As for counting 19 spaces in patch 0003: I felt ' *' would be too loose. Whatever. Let's not be fussy about this.)

comment:20 follow-up: ↓ 21 Changed 5 months ago by zaytsev

See listing2.txt - its last file has no sizes at all.

So, the answer is, because it's not a valid 7z archive, but rather 7z is used to explore other ~archive files that it supports, like NSIS installer packages in this case. But okay, fair enough.

I don't see a problem with replacing \s with just ' '. On the contrary. If the input has whitespace zoo, our whole sed recipe would fail: we're very precise in the rest of the pattern(s).

Alright, I was just saying that it's not an equivalent replacement, but you have a valid point.

Don't worry about tricky bugs: the tests at #3744 were updated.

Oh joys of test driven development ;-) I've just pushed out the new patches and Travis build was successful. Unless Andrew has anything to say against it, I'll try to merge it tomorrow.

Thank you mooffie for your help!

comment:21 in reply to: ↑ 20 Changed 5 months ago by andrew_b

Replying to zaytsev:

Unless Andrew has anything to say against it

No, I haven't. Everything is fine.

comment:22 Changed 5 months ago by zaytsev

  • Votes for changeset set to zaytsev andrew_b
  • Branch state changed from on review to approved

comment:23 Changed 5 months ago by zaytsev

  • Status changed from accepted to testing
  • Votes for changeset changed from zaytsev andrew_b to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged

comment:24 Changed 5 months ago by zaytsev

  • Status changed from testing to closed
Note: See TracTickets for help on using tickets.