Ticket #2707 (closed defect: fixed)
u7z fails to parse a particular archive
Reported by: | birdie | Owned by: | zaytsev-work |
---|---|---|---|
Priority: | major | Milestone: | 4.8.19 |
Component: | mc-vfs | Version: | master |
Keywords: | Cc: | ||
Blocked By: | Blocking: | #3744 | |
Branch state: | merged | Votes for changeset: | committed-master |
Description
Subject.
Attachments
Change History
comment:2 Changed 9 years ago by and
Confirmed,
7z archive can hold entries without datetime info
which hide/skip u7z helper.
Patch attached.
Changed 9 years ago by and
- Attachment mc-2707-Extfs-helper-u7z-fix-parsing-7z-archive-no-datetime.patch added
comment:3 Changed 9 years ago by zaytsev
@and, in the future, please submit whitespace only patches as separate commits. If you change the whitespace in the same commit, then it's not immediately clear from the diff whether the additional changes also concern the logic, or it's only whitespace, unless special options are used (ignore whitespace or word diff). This makes reviewing the history much more tedious than otherwise.
comment:4 Changed 9 years ago by zaytsev
- Milestone changed from Future Releases to 4.8.17
I have just created a 4.8.17 milestone. I suggest to use this for all tickets with fresh patches to be reviewed.
We need to make a 4.8.16 at some point this month, so just assigning all newly processed tickets to 4.8.16 is not helpful, as we need to stop at some point. But if they are left for future releases, then one would need to go through them again and see which ones already have good patches on review. Assigning them to 4.8.17 sounds like a good solution to me.
comment:7 Changed 8 years ago by zaytsev-work
- Status changed from new to accepted
- Owner set to zaytsev-work
comment:8 Changed 8 years ago by zaytsev
- Branch state changed from no branch to on review
Branch: 2707_extfs_u7z
Initial changeset: [7f6a700d9e2feea02cde4097d64a7ba522eea7b9]
comment:9 Changed 8 years ago by zaytsev
See also #3730 regarding making a possible test out of this.
comment:10 Changed 8 years ago by mooffie
Please hold on the merge for one day, ok? I'll have a quick look into the testability of this today.
comment:11 Changed 8 years ago by zaytsev
No worries, it has been waiting for 5 years already, so I have absolutely no problems with waiting a couple of days more.
comment:12 Changed 8 years ago by mooffie
There's another problem in the archive @birdie posted: the size field is missing for some files.
- I'm attaching a patch to fix this.
- The first patch, though, will be to make the code more readable (otherwise you won't understand my fix).
- Then two other small patches will follow (portability, cleanup).
Don't worry: you won't have to wait 5 more years to commit this. I've already prepared tests (I'll start a new ticket shortly) so you can commit this outright.
Changed 8 years ago by mooffie
- Attachment 2707-0001-extfs-u7z-make-the-code-more-readable.patch added
comment:14 follow-up: ↓ 16 Changed 8 years ago by zaytsev-work
Could you please give me an example of sed that has problems with \s ? Did you mean macOS?
zaytsevy:~ zaytsev$ echo "abc def" | sed -e 's/\s*//g' abc def zaytsevy:~ zaytsev$ echo "abc def" | sed -e 's/[[:space:]]*//g' abcdef
I'm not sure that replacing \s with is the best approach, because \s at very least also includes tab, but maybe in this case it doesn't matter and if the input has some other whitespace we have a problem anyways.
comment:15 follow-up: ↓ 17 Changed 8 years ago by birdie
I can imagine 7z archives which lack both compressed _and_ uncompressed sizes simultaneously. Please consider this.
comment:16 in reply to: ↑ 14 Changed 8 years ago by mooffie
Replying to zaytsev-work:
I'm not sure that replacing \s with ' ' is the best approach, because \s at very least also includes tab [...] and if the input has some other whitespace we [...]
I don't see a problem with replacing \s with just ' '. On the contrary. If the input has whitespace zoo, our whole sed recipe would fail: we're very precise in the rest of the pattern(s).
Could you please give me an example of sed that has problems with \s ?
I'm not familiar with such sed personally. \s is just not standard.
Now, speaking of portability:
There was something that bothered me and I now verified it: the | (bar) I used in \($date_re\|$empty_date_re\) isn't portable either. sed is supposed to support "basic regular expressions" and | is of "extended regular expressions". So it's my mistake.
I'll soon re-upload patch #2 (after I get more info from @birdie).
comment:17 in reply to: ↑ 15 Changed 8 years ago by mooffie
Replying to birdie:
I can imagine 7z archives which lack both compressed _and_ uncompressed sizes simultaneously. Please consider this.
Could you please provide more info? I.e., why would they be missing?
(I'm asking this because sed is a bit limited, as you see, and I don't want to bloat the recipe needlessly.)
comment:18 Changed 8 years ago by birdie
Replying to mooffie:
Could you please provide more info? I.e., why would they be missing?
See listing2.txt - its last file has no sizes at all.
comment:19 Changed 8 years ago by mooffie
Ok, I updated patch 0002 (so I had to update 0003 too). Remember to commit them in order: 0001, 0002, 0003, 0004.
What's new in patch 0002?
- It no longer uses | in the regex.
- It handles the new case @birdie noted where both the uncompressed and compressed sizes are missing (see 'listing2.txt').
Don't worry about tricky bugs: the tests at #3744 were updated. 'u7z.missing-size-and-date.input' tests all the possibilities that patch 0002 needs to handle.
(As for counting 19 spaces in patch 0003: I felt ' *' would be too loose. Whatever. Let's not be fussy about this.)
comment:20 follow-up: ↓ 21 Changed 8 years ago by zaytsev
See listing2.txt - its last file has no sizes at all.
So, the answer is, because it's not a valid 7z archive, but rather 7z is used to explore other ~archive files that it supports, like NSIS installer packages in this case. But okay, fair enough.
I don't see a problem with replacing \s with just ' '. On the contrary. If the input has whitespace zoo, our whole sed recipe would fail: we're very precise in the rest of the pattern(s).
Alright, I was just saying that it's not an equivalent replacement, but you have a valid point.
Don't worry about tricky bugs: the tests at #3744 were updated.
Oh joys of test driven development ;-) I've just pushed out the new patches and Travis build was successful. Unless Andrew has anything to say against it, I'll try to merge it tomorrow.
Thank you mooffie for your help!
comment:21 in reply to: ↑ 20 Changed 8 years ago by andrew_b
Replying to zaytsev:
Unless Andrew has anything to say against it
No, I haven't. Everything is fine.
comment:22 Changed 8 years ago by zaytsev
- Votes for changeset set to zaytsev andrew_b
- Branch state changed from on review to approved
comment:23 Changed 8 years ago by zaytsev
- Status changed from accepted to testing
- Votes for changeset changed from zaytsev andrew_b to committed-master
- Resolution set to fixed
- Branch state changed from approved to merged
Merged to master: [12dbd957c009c443d1ab8e7f33e183cbb94d4e81].
7z l of this archive