Ticket #2310 (new enhancement)

Opened 14 years ago

Last modified 13 years ago

When moving, mc queues delete operations and performs them only if the copy completed successfully

Reported by: zaytsev
Owned by:
Priority: minor
Milestone: Future Releases
Component: mc-core
Version: 4.7.0.8
Keywords:
Cc: kilobyte@…, zaytsev
Blocked By:
Blocking:
Branch state: no branch
Votes for changeset:

Description

Forwarded from Debian: http://bugs.debian.org/592941

From: Adam Borowski <kilobyte@angband.pl>
Subject: mc: interrupted move causes duplicated files
Date: Sat, 14 Aug 2010 12:43:32 +0200

When moving a directory to a different filesystem (i.e., copy+delete), MC
postpones deletion until after all copying is done.  However, if there is
some interruption, most likely due to target fs running out of space, MC
will fail to do pending deletions of already completed files.

This causes duplicated files, and is especially painful when information
about what has been moved and what not is important.  All that MC offers is
restarting the move from the very start, which is not a good idea for large
transfers over slow links.

Also, while such postponed deletion does improve performance for small and
medium moves, it seriously degrades it when the tree is big enough to not
fit in the VFS cache.


Thus, a change I'd suggest is:
* don't postpone more than, say, 1000 deletions
* on an interruption, delete the pending completed files before aborting
(the former may seem tangential, but in a tree that takes 10 minutes to
delete, you don't want to block for long before displaying the error message)
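The suggested behaviour can be sketched as follows. This is a minimal illustration in Python, not mc's actual code (mc is written in C); the names move_tree, flush, and the MAX_PENDING bound are hypothetical:

```python
import os
import shutil

MAX_PENDING = 1000  # the "say, 1000" bound suggested above


def flush(pending):
    """Delete source files whose copy has already completed in full."""
    for path in pending:
        os.remove(path)
    pending.clear()


def move_tree(src_root, dst_root):
    """Cross-filesystem move: copy files, deleting sources in bounded batches."""
    pending = []  # source files already copied successfully
    try:
        for dirpath, _dirnames, filenames in os.walk(src_root):
            rel = os.path.relpath(dirpath, src_root)
            dst_dir = os.path.join(dst_root, rel)
            os.makedirs(dst_dir, exist_ok=True)
            for name in filenames:
                src = os.path.join(dirpath, name)
                shutil.copy2(src, os.path.join(dst_dir, name))
                pending.append(src)
                if len(pending) >= MAX_PENDING:
                    flush(pending)  # never postpone more than MAX_PENDING deletions
    except OSError:
        # On interruption (e.g. the target running out of space), delete the
        # already completed files before reporting the error, instead of
        # leaving duplicates behind.
        flush(pending)
        raise
    flush(pending)
    # (Removing the now-empty source directories is omitted for brevity.)
```

With the bound in place, at most MAX_PENDING source files would ever remain duplicated after an interruption, and the error message can be shown without first waiting on a long deletion pass.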


Change History

comment:1 follow-up: ↓ 2 Changed 14 years ago by zaytsev

From: "Yury V. Zaytsev" <yury@shurup.com>
To: Adam Borowski <kilobyte@angband.pl>, 592941@bugs.debian.org
Date: Sun, 15 Aug 2010 01:10:54 +0200

Hi!

I don't think it's a bug. This behavior has been there for ages, and I
think for a good reason. Let me clarify below:

On Sat, 2010-08-14 at 12:43 +0200, Adam Borowski wrote:
> 
> This causes duplicated files, and is especially painful when information
> about what has been moved and what not is important. All that MC offers is
> restarting the move from the very start, which is not a good idea for large
> transfers over slow links.

Not true. When you start the move again, mc will stop and warn you at the
first already existing file. There are a number of options available:

Overwrite all targets? [ All ] [ Update ] [ None ] [ If size differs ] 

"Update" will only overwrite files the date of which is older than those
that are being moved or copied, "If size differs" will overwrite all the
target files that have different size than the source files.

The latter is what you need to continue the process properly by leaving
alone the files that already have been moved and moving only those that
haven't been moved already.
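To make the difference between the options concrete, here is an illustrative Python sketch of the decision each choice implies when resuming a move (not mc's actual code; should_overwrite is a hypothetical name):

```python
import os


def should_overwrite(src, dst, choice):
    """Decide whether to overwrite an existing target, per dialog choice."""
    if choice == "All":
        return True
    if choice == "None":
        return False
    if choice == "Update":
        # overwrite only targets older than the source file
        return os.path.getmtime(src) > os.path.getmtime(dst)
    if choice == "If size differs":
        # overwrite only targets whose size differs from the source
        return os.path.getsize(src) != os.path.getsize(dst)
    raise ValueError("unknown choice: %s" % choice)
```

With "If size differs", files copied in full on the first attempt are skipped, and only partially transferred or missing files are copied again.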

If you need the information on what has already been moved and what hasn't,
you need the Compare directories command to work properly (recursively).
There is a ticket for this in the upstream Bugzilla.

> Also, while such postponed deletion does improve performance for small and
> medium moves, it seriously degrades it when the tree is big enough to not
> fit in the VFS cache.

How bad is this performance decrease?

> Thus, a change I'd suggest is:
> * don't postpone more than, say, 1000 deletions
> * on an interruption, delete the pending completed files before aborting
> (the former may seem tangential, but in a tree that takes 10 minutes to
> delete, you don't want to block for long before displaying the error message)

I'd suggest not messing with deletions if the action that the user
requested has not been successfully completed in full. Maybe it will
give you some (arguable?) performance gain in your very specific case,
but it will break other reasonable usage scenarios. I think you just need
to use the "smart" resume strategy (overwrite if size differs) and Compare
directories once it is fixed.

As for usage scenarios that would be broken by your suggested change:

1) Imagine that I want to move a big tree to an FTP server. In the middle
of the operation the remote server runs out of space and the transfer
fails. The successfully transferred files have already been deleted.

Now there's not much I can do with this remote FTP server. Suppose I find
another location to upload the files to: I have to download them first,
then re-upload the whole package to the second server. I'm screwed.

2) I started moving a huge tree to another disk and half-way realized
that it's the wrong volume. So I abort the move, but screw it, half of
the collection has already been deleted from the source drive.

Now I have to move it back and only then move to the correct drive.

---

I suggest closing this as wontfix, because I really hate to make mc do
any destructive action before the requested command has been completed
in full, the way the user wanted, and without problems.

I hope you can understand my point.
 
-- 
Sincerely yours,
Yury V. Zaytsev

comment:2 in reply to: ↑ 1 Changed 14 years ago by andrew_b

Replying to zaytsev:

I suggest closing this as wontfix, because I really hate to make mc do
any destructive action before the requested command has been completed
in full, the way the user wanted, and without problems.

I agree with that.

comment:3 Changed 14 years ago by zaytsev

From: Adam Borowski <kilobyte@angband.pl>
Date: Sun, 15 Aug 2010 02:03:41 +0200

On Sun, Aug 15, 2010 at 01:10:54AM +0200, Yury V. Zaytsev wrote:
> Hi!
> 
> > This causes duplicated files, and is especially painful when information
> > about what has been moved and what not is important. All that MC offers is
> > restarting the move from the very start, which is not a good idea for large
> > transfers over slow links.
> 
> Not true. When you start the move again, on the first already existing
> file mc will stop and warn you. There is a number of options available:
> 
> Overwrite all targets? [ All ] [ Update ] [ None ] [ If size differs ] 

The only choice that will complete the move is "All".  The other ones will
leave the duplicates in the source dir.

> The latter is what you need to continue the process properly by leaving
> alone the files that already have been moved and moving only those that
> haven't been moved already.

For a "move" to be done, the source would need to be deleted.  None of the
files at that point have been fully moved, they at most completed the first
step -- copy.

> If you need the information on what has been moved already and what not,
> you need the Compare directories to function properly (recursively).
> There is a ticket for this in upstream bugzilla.

So they'd have to perform a complex action just to get what other file
management programs do by default.  I've looked at Nautilus and WinXP's
Explorer, and they don't cause duplicated files.

> > Also, while such postponed deletion does improve performance for small and
> > medium moves, it seriously degrades it when the tree is big enough to not
> > fit in the VFS cache.
> 
> How bad is this performance decrease?

In the case of lots of very small files that fit into their inodes/
directory-tree/metadata blocks, and where the admin has foolishly left
atime/relatime on, about a factor of two.

The copy has to read everything, and (if atime is on), write to every inode.
The deletion needs to again read everything, and then write.

> I'd suggest not messing with deletions if the action that the user
> requested has not been successfully completed in full.

The move consists of thousands of actions the user requested, and it's
better to complete _some_ of them instead of none at all.

> Maybe it will give you some (arguable?) performance gain in your very
> specific case, but will break other reasonable usage scenarios.

That optimization is indeed quite rarely useful, but I can't see any use
case where it breaks something.  I proposed it only as an addition to
fixing the pending deletions not being performed; it is in no way
necessary -- it merely would let us avoid a long pause between an error
occurring and returning control to the user.

It would have a nice side effect, however, by allowing the freed space to be
reused much sooner.

> I think you just need to use "smart" resume strategy (overwrite if size
> differs) and Compare directories when it will be fixed.

Well, as for the current behaviour -- could you name any use case where it
would actually be useful?  If for some strange reason the user wanted to
ensure the whole directory is at all times complete on one of the systems,
it's a matter of just F5 Enter, F8 Enter.

> In what concerns a usage scenario that will be broken by your suggested
> change: 
> 
> 1) Imagine that I want to move a big tree to an FTP. In the middle of
> the operation remote server ran out of space and the transfer failed.
> Successfully transferred files have been deleted.

Just as you requested.  In fact, if you selected a bunch of large files, MC
would do exactly that -- it's only when moving the contents of a directory
that this behaviour appears.

Some consistency would be good.

> 2) I started moving a huge tree to another disk and half-way realized
> that it's the wrong volume. So I abort the move, but screw it, half of
> the collection has already been deleted from the source drive.

How often do you do that?  It strikes me as a pretty rare occurrence.

And for scenarios broken by the current behaviour, let's start with:

1. You need to free some space.  You did move some big files, but misjudged
the space, and now need to stuff the rest someplace else.  You definitely
don't want to have the data occupy much more than it needs to.

2. Once some stuff has been dealt with, you move it to another place.  You
want to have everything either in the "done" dir or the "todo" one, but not
both.

3. You have a bunch of versions bozos you work with didn't keep under
version control, and want to sort them out.  Naturally, there will be lots
of duplicates, but you need to know which duplicates were there before and
which were not.

> I suggest closing this as wontfix, because I really hate to make mc do
> any destructive action before the requested command has been completed
> in full, the way the user wanted, and without problems.

Any given file needs to be completely transferred before being deleted.
It can't be done in a fully atomic way, but when the failure occurs while
transferring a particular file, MC does ask the user a question: "Incomplete
file was retrieved. Keep it? [ Delete ] [ Keep ]".
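The per-file guarantee described here can be sketched as follows, assuming a local filesystem (move_file is a hypothetical helper, not mc's code): the source is removed only once the copy of that one file has completed in full.

```python
import os
import shutil


def move_file(src, dst):
    """Move a single file: delete the source only after a complete copy."""
    tmp = dst + ".part"  # copy under a temporary name first
    try:
        shutil.copy2(src, tmp)
    except OSError:
        # The copy is incomplete; mc would ask "Incomplete file was
        # retrieved. Keep it?" -- here we simply discard it.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    os.replace(tmp, dst)  # atomically put the complete copy in place
    os.remove(src)        # only now is the source deleted
```

An interruption can thus leave at most one partial target file behind, and never deletes a source whose copy did not finish.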

If you insist the current behaviour has a purpose, it would be better to at
least give the user an option to clean up one way or the other.

I still think that would be redundant with F5 F8, but indeed, in the rare
case you pointed out, the user might not have anticipated that the FTP
server would run out of space.

> I hope you can understand my point.

Kind of, but I'm surprised by this inconsistency between MC and other
programs, including MC's own way of moving multiple files.

comment:4 Changed 14 years ago by andrew_b

  • Type changed from defect to enhancement

comment:5 Changed 13 years ago by andrew_b

  • Branch state set to no branch

Related to #20.

comment:6 Changed 13 years ago by andrew_b

  • Milestone changed from 4.7 to Future Releases