Ticket #1652 (accepted enhancement)

Opened 8 years ago

Last modified 3 years ago

Hide ^M in editor.

Reported by: slavazanko Owned by: angel_il
Priority: major Milestone: Future Releases
Component: mcedit Version: master
Keywords: Cc: zaytsev, gotar@…, pmiscml@…
Blocked By: Blocking:
Branch state: no branch Votes for changeset:

Description (last modified by angel_il) (diff)

Need to hide ^M symbols at end of lines in text files with '\r\n' line ends.

Better way: create keybinding for toggle ^M symbols.

Attachments

detect-linebreaks.patch (9.3 KB) - added by pfalcon 7 years ago.
0001-Ticket-1652.patch (10.1 KB) - added by angel_il 7 years ago.
0002-added-options-editor_autodetect_linebreak.patch (4.4 KB) - added by angel_il 7 years ago.
mc-4.8.1-ticket-1652-linebreaks.patch (12.3 KB) - added by huksley 5 years ago.
Patch against 4.8.1 for this feature

Change History

comment:1 Changed 8 years ago by slavazanko

  • Description modified (diff)

comment:2 follow-up: ↓ 3 Changed 8 years ago by birdie

Or even better:

suggest or "guess" line endings upon saving a file (WIN/LIN/MACOS).

comment:3 in reply to: ↑ 2 Changed 8 years ago by angel_il

  • Description modified (diff)

Replying to birdie:

Or even better:

suggest or "guess" line endings upon saving a file (WIN/LIN/MACOS).

comment:4 Changed 8 years ago by angel_il

  • Description modified (diff)

ups...

You mean #1571?

comment:5 Changed 8 years ago by SzG

If you implement an option, \r SHOULD BE SHOWN by default.

Mcedit is used as part of Cygwin by poor people who are forced to use Windows in the office (like myself). On Windows not all editors are preserving \n linefeeds like Far-manager, other editors may insert \r. The best way to check it is mcedit.

comment:6 Changed 8 years ago by angel_il

  • Owner set to angel_il
  • Status changed from new to accepted
  • Version changed from 4.7.0-pre3 to master
  • Milestone changed from 4.7 to 4.7.1

comment:7 Changed 8 years ago by angel_il

  • Milestone changed from 4.7.1 to Future Releases

comment:8 Changed 7 years ago by sorin

Currently this is one of the biggest issues with mc. The inability to work with different line ending give problems in lots of cases. It is something really common for see CR/LF even on linux from files coming from windows share, ftp, source control.

Like other editor mc should be able to detect the line ending and keep it. It could should the line ending in the status bar.

comment:9 Changed 7 years ago by zaytsev

Oh really? How did you conclude that mc is *unable* to work with different line endings?! It currently keeps the endings of edited files, however, \r endings are not shown by default, whereas ^M are. So it's perfectly safe to edit any kinds of files. Moreover, you can save files using any scheme that you want.

All it takes is to add the ability to optionally display alien endings and add options to select default endings scheme / and heuristics on keeping auto-detected endings or auto-converting them to the default scheme.

Last edited 6 years ago by slavazanko (previous) (diff)

comment:10 Changed 7 years ago by zaytsev

  • Cc zaytsev added

comment:11 Changed 7 years ago by gotar

  • Cc gotar@… added

comment:12 follow-up: ↓ 13 Changed 7 years ago by pfalcon

  • Cc pmiscml@… added

This is truly needed feature, and I lost my patience and went to implement it myself. At first I thought that just making mcedit to not display ^M if followed by \n would be enough, but turned out that I'd need to teach cursor to skip them too in various places, and that would be lot of effort to learn mcedit's internals and do it right in all places.

So I indeed went for the following solution:

  1. On opening file, detect line-endings used by sampling some initial content.
  2. If it happen to be CR or CRLF, skip fast load path, and in edit_insert_file() convert such line endings to \n.
  3. Save detected line ending type for editor.

That's basicly all - alien text files are converted to unix on load, then converted vice-versa on save. Only extra thing needed on this stage was to make Save As to treat "don't change" as indeed don't change (to not reset auto-detected line ending type).

Patch is attached (note - debug output). I'd appreciate review for the approach. Then, further work might be:

  1. Add config setting for auto-detect behavior.
  2. Add UI option for this setting.
  3. Optionally, on loading file with autodected line breaks, verify that all breaks indeed conform detected type, and otherwise warn user that one may have non-reversible data change.

comment:13 in reply to: ↑ 12 ; follow-up: ↓ 14 Changed 7 years ago by pfalcon

Replying to pfalcon:

  1. Add config setting for auto-detect behavior.
  2. Add UI option for this setting.
  3. Optionally, on loading file with autodected line breaks, verify that all breaks indeed conform detected type, and otherwise warn user that one may have non-reversible data change.
  1. Display line endings used in status line.

comment:14 in reply to: ↑ 13 ; follow-up: ↓ 17 Changed 7 years ago by angel_il

pfalcon:
ok, me like your patch.
1) can you create repo on github.com ?
2) i think no need add new param in edit_insert_file, just use edit->lb for this

comment:15 Changed 7 years ago by angel_il

comment:16 Changed 7 years ago by angel_il

  • severity changed from no branch to on review
  • Milestone changed from Future Releases to 4.8.0-pre1

comment:17 in reply to: ↑ 14 Changed 7 years ago by pfalcon

Replying to angel_il:

pfalcon:
ok, me like your patch.
1) can you create repo on github.com ?

Well, so far I do minimal hacking, so git diff/format-patch is less effort. If I'll do more, I'll push to github (btw, is there some repo to fork there? I cloned from midnight-commander.org)

2) i think no need add new param in edit_insert_file, just use edit->lb for this

Well, that would mean loss of operation precision. What edit_insert_file() does is inserts file filename with encoding lb into editor object edit with encoding edit->lb. In general case, lb != edit->lb. Yes, for what I implemented so far it doesn't matter, but "Insert file..." menu can be made to take advantage of lb detection too. And if "Insert file..." instead presume that inserted file's lb's are the same as currently being edited, that sooner or later will lead to unexpected behavior for some users, which then will complain that mc is buggy. So, worth doing it right from the first time ;-).

comment:18 Changed 7 years ago by pfalcon

Updated patch to show current (non-LB_ASIS) encoding in status line. Here's sample, still fits in 80 chars:

MainFrm.cpp        ----   0    87/87   1746/1746 <EOF>      ASCII CRLF      100%

Changed 7 years ago by pfalcon

comment:19 Changed 7 years ago by pfalcon

Oops, I actually didn't check into 1652_autodetect_lb, where status line feature was already added. Nevermind. I still think it's better to pass LineBreaks? to edit_insert_file explicitly.

And with changes to edit_get_save_file_as() in 3de1e2f26299017afa887f849199995c893de3e1, the whole option N_("&Do not change"), should be removed, as user's choosing it will lead to file being converted to LF, which is rather unexpected. But you cannot just remove that choice from dialog, as it is tied to LB_ASIS being 0. So, either LB_ASIS should be removed at all, or handled properly - not changing edit->lb, like my original patch did.

comment:20 Changed 7 years ago by pfalcon

Also, 8d28ef0b4756b3af0177a59433bd1be347d3c274 shows LB only for simple_statusbar case, probably worth adding for both types.

comment:21 Changed 7 years ago by slavazanko

  • severity changed from on review to on rework

mceditor should be stay as binary-safe editor, therefore need to implement configuration option for switching 'crlf autodetection' behaviour.

Changed 7 years ago by angel_il

Changed 7 years ago by angel_il

comment:22 Changed 7 years ago by angel_il

please review

comment:23 Changed 7 years ago by angel_il

  • severity changed from on rework to on review

comment:24 Changed 7 years ago by angel_il

please review

comment:25 follow-up: ↓ 27 Changed 7 years ago by andrew_b

If autodect is off, the editor has new behaviour that looks like a bug.

  1. Status line shows wrong line break.

How to reproduce:
1a. Open a text file with DOS line breaks.
Result: Status line shows LF.

  1. Editor is not binary-safe now.

2a. Open a new text file pressing shift-f4.
2b. Create several lines

a
b

then save file with DOS line breaks (shift-f2 -> Windows/DOS format (CR LF)) and close the editor.
2c. Open this file in editor. You can see ^M at line ends. That is ok.
2d. Move cursor to the first line, copy it pressing f3, down, f5.
2e. Save file pressing f2 (you still can see ^M at line ends) and close editor.
2f. Open this file again.
Result:DOS line breaks are lost.

Last edited 7 years ago by andrew_b (previous) (diff)

comment:26 Changed 7 years ago by andrew_b

  • severity changed from on review to on rework

comment:27 in reply to: ↑ 25 ; follow-up: ↓ 28 Changed 7 years ago by andrew_b

Replying to andrew_b:

  1. Editor is not binary-safe now.

Simplest testcase:

2a. Open a test file with DOS line breaks. You can see ^M at line ends.
2b. Save file pressing f2 (you still can see ^M at line ends) and close editor.
2c. Open this file again.
Result:DOS line breaks are lost.

Last edited 7 years ago by andrew_b (previous) (diff)

comment:28 in reply to: ↑ 27 Changed 7 years ago by angel_il

Replying to andrew_b:

Replying to andrew_b:

  1. Editor is not binary-safe now.

Simplest testcase:

2a. Open a test file with DOS line breaks. You can see ^M at line ends.
2b. Save file pressing f2 (you still can see ^M at line ends) and close editor.
2c. Open this file again.
Result:DOS line breaks are lost.

fix: 8c84722307e873a37aa88c1a6f5caaeae97421a9

comment:29 Changed 7 years ago by slavazanko

Have some new tests and have changed code in branch. Also, need to describe new feature and editor option in man pages and WIKI.

Last edited 7 years ago by slavazanko (previous) (diff)

comment:30 Changed 6 years ago by volo78

  • Branch state set to no branch

Is this ticket stalled? I'm missing this feature. Can this be integrated to main trunk?

comment:31 Changed 6 years ago by pfalcon

Well, I'm, as a contributor of the initial patch, found it perplexing that my patch is rewritten for no apparent reason, and while that being done, new bugs are added to it - exactly those which I could predict to happen and deliberately avoided in my original patch. So, it's a strange situation - I cannot continue with updated patch because I know it's buggy, and my original patch was essentially thrown away. And nobody else seem to continue with this either.

I guess, I should just swallow that and try again, because I like mc and use it daily, and each time I see ^M there (and that happens often!) I think to myself: "damn, I did patch to fix that, how on earth it happens I still see that issue?"

Last edited 6 years ago by andrew_b (previous) (diff)

comment:32 Changed 6 years ago by slavazanko

Thanks for notice.

Branch rebased to current master.

Changesets:
c98892acfb105ab64bdb95a4302c14539d4f2c2b Added unit tests for checking "detect type of line breaks" functionality
a3cf368eab04029ffc1cbe036a5454500513ebd9 added configure option editor_autodetect_linebreak
58aaddd431a3d2e9df4781913beffbe4bf76fd94 Ticket #1652: autodetect line-endings

Review, please.

pfalcon, as your original patch was changed, review and vote too, please.

comment:33 Changed 6 years ago by andrew_b

  • Milestone changed from 4.8.0-pre1 to Future Releases

comment:34 Changed 5 years ago by huksley

My patch includes both patches

0001-Ticket-1652.patch​ (10.1 KB) - added by angel_il 2 years ago.
0002-added-options-editor_autodetect_linebreak.patch​

with minor modifications for updated codebase.

I hope it will be included in recent release.

Last edited 5 years ago by huksley (previous) (diff)

comment:35 Changed 5 years ago by huksley

Fixed bug with edit->stat1 not updated properly if linebreaks are not ASIS on save (F2).
This bug were causing "The file has been modified in the meantime." prompt.

comment:36 Changed 5 years ago by pfalcon

Oh my god, isn't this fixed already. I nowadays run distro package again and suffering thru this issue almost everyday, like, on checking out each new github repo I'm shooting out bugreports "you have CRLF in repo, get git linebreak settings right", though of course that's issue on mc side. So I was hoping to switching to run from git master to get this fix and it's still not there ;-((.

huksley, thanks for bringing this up. Will test your patch.

comment:37 Changed 5 years ago by andrew_b

This patch doesn't hide ^M. This patch modifies file. With this patch, when you load file and then immediately save it, you get modified file. This is wrong way. If we really want hide ^M, we should do that without file modification.

Last edited 5 years ago by andrew_b (previous) (diff)

Changed 5 years ago by huksley

Patch against 4.8.1 for this feature

comment:38 Changed 5 years ago by huksley

Upon investigation, I found out this patch modifies file only in case if there is different types of line breaks throughout the file. I.e. if it contains \r\n and later on contains just \n then it will write all linebreaks as \r\n.

I`ve introduced modification to behaviour so it will not touch linebreaks (forces LB_ASIS) if option [ ] Autodetect linebreaks is not checked.

comment:39 Changed 5 years ago by pfalcon

I`ve introduced modification to behaviour so it will not touch linebreaks (forces LB_ASIS) if option [ ] Autodetect linebreaks is not checked.

I would have thought that's expected behavior right from the start - in my original patch there was provision for that (but no UI to force skipping any linebreaks munging). I refrained from answering to andrew_b's comment before I tested the latest patch version (still in my backlog). I'm glad that bug with not following that behavior was found and fixed.

So, idea how it should work is:

  1. By default any linebreak munging is disabled, so users will get the same level of binary safety as before.
  2. Users who need to work with varying linebreaks will need to enable it explicitly and are expected to stay in the full knowledge of that fact.
  3. There's indeed possibility that auto-detection may mis-detect binary file (I treat a file which have mix of linebreak types as such). Price of magic. Users who think that *text editor* does and will treat *binary* files in binary-safe in all cases are: a) very optimistic, like coming from another planet, where computers work differently; b) should not enable linebreak autodetection in the first place. We can make probability of error even lower by: a) improving autodetection algo to check for conflicting linebreak types in the sample buffer (my original patch din't do that); b) increase sample buffer size; c) as extreme case, scan entire file (but that's overkill IMHO, I already hate when I mis-hit F4 on a big file and mc hangs for few minutes).
  4. Otherwise, ability convert among linebreak styles is important feature, which is also in demand and expected by users migrating from other file managers (like FAR).
  5. As comments above say, docs/help need to be updated with give users enough familiarity with features and issues in pp.1-4.

As for making editor not showing ^M, while having all of them in the buffer, I commented 2 years (comment 12) that it was my original idea, but turned out that mc doesn't have enough abstraction to deal with buffer handling, and checks for ^M would need to be sprinkled in a lot of places (doubled by checks for not skipping them if not enabled), or such level of abstraction would need to be introduced. In either case, that would be substantial change to editor, which would take a lot of time to implement, and even more to actually make bug-free. So, that's risky and low-return way to do it, with no really good benefits achieved (and only missing nice features like linebreaks conversion). The fact that over 4 years nobody went for it is a good indication of that.

comment:40 Changed 5 years ago by mike.dld

Just a thought. Couldn't we use file utility (as in e.g. file --brief --mime-type <filename> or similar) to detect if the file we're about to edit it a text one? At least if it's installed.

As for files with varying linebreaks, as well as binary files, doing nothing (disabling any modification operations implied by autodetection) seems like a good choice to me.

comment:41 Changed 5 years ago by pfalcon

Couldn't we use file utility (as in e.g. file --brief --mime-type <filename> or similar) to detect if the file we're about to edit it a text one?

What external utility does what mc code couldn't do without relying on external utility availability and incurring overhead to call to it? I'm using mc on ~100MHz embedded routers, and would hope for mc to become more efficient in the future, not less. And of course, file utility can be gullible too - it's all heuristics after all.

comment:42 Changed 5 years ago by pfalcon

huksley: you patch doesn't apply to the current HEAD at all. git currently is at 4.8.9-pre, so patch against 4.8.1 is out of date from the start. Can you please fork https://github.com/MidnightCommander/mc on github and forward-port the patch to it? Or let me know if you'd rather have me go for it.

comment:43 Changed 3 years ago by pfalcon

I took the latest authoritative source for this feature, git branch "1652_autodetect_lb" and rebased it to latest master (rebase changes were considerable). It is available in "1652_autodetect_lb-revive" branch of my fork on github: https://github.com/pfalcon/mc/tree/1652_autodetect_lb-revive . Pull request submitted: https://github.com/MidnightCommander/mc/pull/49 . Maintainers notified: https://mail.gnome.org/archives/mc-devel/2014-December/msg00009.html

comment:44 Changed 3 years ago by pfalcon

For reference, contents of the mail which describe changes in "1652_autodetect_lb-revive" branch in detail:

We recently had a nice optimistic thread about mc's 20th anniversary,
and I'd like to draw everyone's attention to another anniversary - 5
years of "provide better editor support for files with CRLF encoding"
bugreport: http://www.midnight-commander.org/ticket/1652 . What's more
interesting is that it had a patch for 4 years too, and yet it's not
fixed in the mainline.

Few people contributed to the most authoritative patchset
(https://github.com/MidnightCommander/mc/commits/1652_autodetect_lb),
2 of which are mc maintainers, so the patch had good chances being
merged long ago. As it wasn't, apparently the party to blame is the
original author of patchset, which is me. 

So, trying to resolve this situation, I rebased "1652_autodetect_lb"
branch onto the latest master, and made auto-detection much more
conservative to address concerns of editor binary safety (note that
auto-detection is of course off by default anyway). I tested it to
behave as expected, unittests were provided by Slava Zanko and updated
to the latest changes too.

The pull request is submitted on github, to make it one button click
away from being merged - assuming it is ok. I'm also ready to address
any concerns raised.

https://github.com/MidnightCommander/mc/pull/49


Thanks!

comment:45 Changed 3 years ago by pfalcon

huksley: Unfortunately, I wasn't able to decipher what changes you did on top of changes done to 3 other contributors in your cumulative patch. The patchset as presented above works well for me (and backed by unit tests). I would suggest that you suggest your changes as a separate commit on top of the branch above. Thanks.

comment:46 Changed 3 years ago by andrew_b

What we get as ticket topic? Hide ^M.
What proposed as solution for that? Remove ^M.
I can't agree that items are same.

We already have the way to remove ^M. This way is "Save is...".

comment:47 Changed 3 years ago by pfalcon

Thanks for the reply. As often happens, after consideration of an issue as reported, it turned out that scalable solution doesn't exactly match (usually highly informal) issue description. In particular here, the problem turns out not in "^M", but in the fact that mcedit doesn't handle files with alternative line-endings. And that's exactly the issue being fixed, and https://github.com/MidnightCommander/mc/pull/49, the actual patch, has the correct title. This issue is being updated to keep in loop all people who were interested in it over the years.

comment:48 Changed 3 years ago by steevithak

I use mcedit to handle a lot of text files every day and the single most important feature to me is that mc not silently change the files. Some files have unix style line-endings, some have windows style, but many have been sent back and forth between windows users and unix web servers and have a mix of line-endings. I don't mind if mcedit hides the ^M visually, but it's very important to me that mc isn't altering files behind the scenes unless it alerts me that it's going to and offers me a choice. If I load a file and then save it without making an edit, I expect the file to be unchanged.

Note: See TracTickets for help on using tickets.