Ticket #4318 (new enhancement)

Opened 2 years ago

Last modified 2 years ago

[feature request] Show hash of file in "File exists" window of "Copy/Move" dialog

Reported by: Bogdan107
Owned by:
Priority: major
Milestone: Future Releases
Component: mc-core
Version: master
Keywords:
Cc:
Blocked By:
Blocking:
Branch state: no branch
Votes for changeset:

Description

Steps:
1) start MC;
2) cd /tmp; (in left panel)
3) cd ~; (in right panel)
4) select single file;
5) F5 + Enter; (file copied from "~/" to "/tmp")
6) F5 + Enter;
7) MC shows the "File exists" dialog.

This dialog contains:
1) full file path;
2) file size;
3) last modification time of file.

What I want:
I want to see hashes of source and destination files, like CRC, MD5, SHA-1 or any other.

Why I want this:
Hashes of the two files (source and destination) allow a quick check of whether the source and destination files are identical or different.
If I only copy a single file once in a long while, it is no problem to press "C-x C-d" or run "diff" from the console. But if I need to copy/move many files, I need a quicker way to tell identical files from different ones with the same name, right inside the "File exists" window of the "Copy/Move" dialog.

P.S. If the source and destination files are identical (they have identical hashes under two different algorithms, for example CRC and MD5, or MD5 and SHA-1), then MC could resolve the file name conflict automatically, without the "File exists" dialog.
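
A minimal sketch of the requested comparison, written in Python purely for illustration rather than in MC's own code. The names `file_digest` and `same_content` are hypothetical and not part of MC, and SHA-256 merely stands in for whichever hash would actually be chosen:

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(src, dst, algorithm="sha256"):
    """True when both files produce the same digest (hypothetical helper)."""
    return file_digest(src, algorithm) == file_digest(dst, algorithm)
```

The two hex strings returned by `file_digest` are what a "File exists" dialog could display next to the size and modification time.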

Change History

comment:1 Changed 2 years ago by andrew_b

Related to #2089.

comment:2 Changed 2 years ago by Bogdan107

The behaviour of "automatic file collision resolving in Copy and Move operations" could be controlled by a checkbox in the options dialog.

For example:
Checkbox "Quietly overwrite simple collisions" in Copy and Move dialogs.

This checkbox would cover the following behaviours:
1) "Quietly overwrite identical files with different modification time":
If the SOURCE and DESTINATION files are identical (same name, same size, same hashes) and the last modification time of SOURCE is later than the last modification time of DESTINATION, then DESTINATION is overwritten by SOURCE automatically and quietly, without the "File exists" dialog.
2) Other variants of file collisions.
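
Rule 1) can be sketched as a small predicate. This is only an illustration in Python; `FileInfo` and `quietly_overwrite` are hypothetical names, not MC structures or functions:

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical summary of one side of a collision."""
    name: str      # file name without directory
    size: int      # size in bytes
    mtime: float   # last modification time, seconds since the epoch
    digest: str    # hex digest of the content


def quietly_overwrite(src: FileInfo, dst: FileInfo) -> bool:
    """True when the collision may be resolved without the dialog."""
    identical = (
        src.name == dst.name
        and src.size == dst.size
        and src.digest == dst.digest
    )
    # Only the "SOURCE is newer" case is covered by rule 1).
    return identical and src.mtime > dst.mtime
```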

comment:3 Changed 2 years ago by andrew_b

Patches are welcome!

comment:4 Changed 2 years ago by ossi

i don't see a point in overwriting identical files; you'd want to update the timestamp instead, and in case of a move just remove the source file.

a few points regarding hashes:

  • given that files need to be read in their entirety to calculate their hashes, and it is a somewhat cpu-intensive operation, there is no point in using hashes for comparison, unless the hash is actually calculated remotely for remote files (which would be possible only for fishfs)
  • there is no point in showing the hash to the user, as the only thing that matters is the boolean result of the comparison
  • there is no point in using multiple hashes. if you don't trust the bit width, just use a wider version of the hash, say sha2. but this application isn't security-sensitive, so for all practical purposes even md5 is sufficiently collision-safe.
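
To illustrate the first two points for local files: a direct comparison reads both files once, needs no hash at all, and yields only a boolean. The following Python sketch uses a hypothetical function name and is not MC code:

```python
def files_identical(src, dst, chunk_size=1 << 20):
    """Compare two files chunk by chunk, stopping at the first difference."""
    with open(src, "rb") as a, open(dst, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                # Covers both differing bytes and one file ending early.
                return False
            if not chunk_a:
                # Both files reached end of file at the same time.
                return True
```

Unlike hashing, this can stop early when the files differ near the beginning.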

comment:5 Changed 2 years ago by Bogdan107

  1. About "cpu-intensive operation":

Hash calculation could be enabled by a separate "Calculate hashes" checkbox in the Copy/Move dialogs, next to "Follow links", "Preserve attributes" and the others.
Hash calculation can be skipped (saving CPU time) if the SRC and DST files differ in size.
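
A rough Python sketch of these two shortcuts. The function name `digests_if_needed` and the `calculate_hashes` flag are hypothetical stand-ins for the proposed checkbox, and SHA-256 is an arbitrary choice:

```python
import hashlib
import os


def digests_if_needed(src, dst, calculate_hashes=True):
    """Return (src_digest, dst_digest), or None when hashing is skipped."""
    if not calculate_hashes:
        return None  # the hypothetical "Calculate hashes" checkbox is off
    if os.path.getsize(src) != os.path.getsize(dst):
        return None  # different sizes: contents cannot be identical

    def digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    return digest(src), digest(dst)
```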

  2. About "not showing the hash to the user ..."

In workflows where directories need to be merged:

  • I need an option to save time with identical files (identical by content too, not just by name and size):

1) in Copy operations: just update the access/modification time of the DST file from the SRC file;
2) in Move operations: quickly (without a separate dialog window) overwrite files in MC's standard way (whether the source is just deleted or the destination is fully overwritten is a different task and does not matter in my case), updating the access/modification time of the DST file from the SRC file.

  • I need an option to save time with files, which have the same name and same size but different content:

If only a few bits of a file have changed, the file size stays the same, but the hash reveals that the contents differ.
In this case, the showing of SRC and DST hashes - is a good idea.

Last edited 2 years ago by Bogdan107

comment:6 Changed 2 years ago by ossi

> In this case, the showing of SRC and DST hashes - is a good idea.

and why would you do manually (well, optically) something the computer is much better at?

> I need an option to save time with files

this whole idea relies on the assumption that reading each target prior to overwriting it (and reading the source twice, if it's big) is actually faster than simply overwriting it. this will be the case only if a significant portion of the target files is identical, with "significant" heavily depending on the circumstances. the ratio will be relatively low if the bottleneck is a network and one can actually calculate a hash remotely (or the link is strongly asymmetrical in the "right" direction), while for a local file system it will probably approach 50%.
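
The break-even argument can be made concrete with a very simple cost model. Everything in it is an assumption for illustration, not a measurement and not a calculation from this ticket: a per-byte read cost, a per-byte write cost, and an optional second read of the source when it is too big to stay cached. Plain overwriting costs `read + write`; comparing first costs `2 * read`, plus `write` (and possibly another `read`) for every file that turns out to differ.

```python
def break_even_identical_fraction(read_cost, write_cost, reread_source=False):
    """Fraction of identical files above which comparing first is cheaper.

    Derived from equating the two assumed costs:
        read + write == 2 * read + (1 - p) * (write + extra)
    where extra is a second source read if the source is not cached.
    A result of 1.0 or more means comparing first never pays off.
    """
    extra = read_cost if reread_source else 0.0
    return (read_cost + extra) / (write_cost + extra)
```

For example, under the assumption that writes cost twice as much as reads and the source stays cached, `break_even_identical_fraction(1, 2)` gives 0.5; with equal read and write costs the result is 1.0, so the outcome depends heavily on the assumed ratio.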

rather than speed, extending ssd life seems like a plausible motivation for avoiding writes.

for same-filesystem copies on modern file systems, replicating cp --reflink=auto would be a worthwhile addition (that deserves a separate ticket if there is none yet).
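
As a rough illustration of what "reflink if possible, otherwise copy" means on Linux, here is a Python sketch built on the `FICLONE` ioctl. The function name `copy_reflink_auto` is hypothetical, the request number is the value used on common architectures such as x86 and ARM, and none of this is MC code:

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int) on x86/ARM Linux.
# Assumption: other architectures may encode the request differently.
FICLONE = 0x40049409


def copy_reflink_auto(src, dst):
    """Clone src to dst if the file system supports it, else copy.

    Returns True when the data was cloned, False when the regular-copy
    fallback was used.
    """
    try:
        with open(src, "rb") as s, open(dst, "wb") as d:
            # Ask the kernel to make dst share src's data blocks.
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
        return True
    except OSError:
        # Not supported here (or different file systems): plain copy.
        shutil.copyfile(src, dst)
        return False
```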
