Ticket #4592 (accepted enhancement)

Opened 3 months ago

Last modified 3 months ago

Migrate formatter from indent to clang-format

Reported by: zaytsev Owned by: zaytsev
Priority: major Milestone: 4.8.33
Component: compilation Version: master
Keywords: Cc:
Blocked By: Blocking:
Branch state: no branch Votes for changeset:

Description

Currently, we use GNU Indent as our formatter. Unfortunately, it has several problems that make it almost unusable:

  1. It requires tons of /* *INDENT-OFF* */ directives to avoid breaking complex formatting.
  2. It randomly adds /* at the beginning and NUL at the end of source files.
  3. It fails to parse complex source files (see #4524):
indent: lib/util.c:522: Error:Stmt nesting error.
...

As I started trying to fix these errors, it turned out that half of our source tree is unformatted, because the formatting loop breaks on a non-zero exit code, and all files after that are simply not processed.

This makes it impossible to add formatting checks to the CI and automatically maintain proper formatting.

I have investigated possible formatting options, and the only viable candidate in my opinion is clang-format:

  1. It uses the clang compiler frontend for parsing, so it is able to process any valid code.
  2. It is also actively developed and mostly stable.
  3. It comes with a preset for GNU style.

As with any formatter change, it is not possible to make the style match 100%, and there will be one big diff, but hopefully life will get better after that.

Change History

comment:1 Changed 3 months ago by zaytsev

  • Owner set to zaytsev
  • Status changed from new to accepted

Branch: 4592_clang_format
Initial changeset:d4c919faaf26daf46985f649cb3e2233003b91c5

comment:2 Changed 3 months ago by zaytsev

I added a style based on the GNU style, but customized to produce a minimal diff with our code base. You can see what the result looks like in the second commit.

Before I spend more time adding trailing commas and stuff like that, can we agree in principle that we can switch? Any suggestions for other options we can change?

If we agree, it would be great if we could merge all open branches first, and then I can work on manual tweaks in this branch before it can be merged.

My observations are as follows:

  1. Many of our arrays are reformatted
  2. Include comment alignment is broken
  3. Spaces are being inserted before gettext shortcuts, causing a lot of diff

Arrays

This happens because many of our arrays are missing trailing commas, which causes them to be reformatted instead of keeping the items one per line. This is easy to fix.

For creative manual formatting, you can add empty comments at the end of the line (//) to force the formatter to leave them as they are.

Comments

One could convert C-style comments to line comments and play with the following options:

SpacesBeforeTrailingComments: 1
AlignTrailingComments:
  Kind: Always
  OverEmptyLines: 42

I don't think it's worth it and I can live with the new formatting.

Gettext

This happens, because gettext shortcuts are considered function calls (which they are). I can live with that.

P.S. Many tools like CLion have a built-in integration with clang-format, so formatting will happen mostly automatically if this is merged.

comment:3 Changed 3 months ago by andrew_b

For me, in general. it worse than GNU indent output.

Is there a way to avoid following:

-static void
-G_GNUC_PRINTF (1, 0)
-mc_va_log (const char *fmt, va_list args)
+static void G_GNUC_PRINTF (1, 0) mc_va_log (const char *fmt, va_list args)
Last edited 3 months ago by andrew_b (previous) (diff)

comment:4 Changed 3 months ago by zaytsev

Is there a way to avoid following

Like in the last commit, or am I missing something? With this change, it's like all the other functions - first types on one line, then the rest.

For me, in general. it worse than GNU indent output.

If there are other things you don't like, please tell me, I can try to find solutions.

To be honest, I don't really care about the formatting rules, as long as they're fixed and it's done automatically for me in a consistent way. My biggest problem with indent is that, as I've discovered, it doesn't work with our code base, and when I start to fix it, more and more bugs like /*, NUL, etc. appear.

The things I like about clang-format are that it works at all, and that the output is surprisingly tolerable, given that this is fully automatic without any manual changes so far. The * stuff is finally done consistently and correctly, whatever the rules are.

comment:5 follow-up: ↓ 7 Changed 3 months ago by andrew_b

Can't find an option to avoid following:

-    mc_filter->search_condition->is_case_sensitive =
-        mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE);
+    mc_filter->search_condition->is_case_sensitive
+        = mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE);

I want have = at the same line as variable.

comment:6 Changed 3 months ago by andrew_b

static const mc_search_type_str_t mc_search__list_types[]
    = { { N_ ("No&rmal"), MC_SEARCH_T_NORMAL },
        { N_ ("Re&gular expression"), MC_SEARCH_T_REGEX },

I'like to have something like following:

static const mc_search_type_str_t mc_search__list_types[] =
{
    { N_ ("No&rmal"), MC_SEARCH_T_NORMAL },
    { N_ ("Re&gular expression"), MC_SEARCH_T_REGEX },

comment:7 in reply to: ↑ 5 Changed 3 months ago by mike.dld

Replying to andrew_b:

Can't find an option to avoid following:

-    mc_filter->search_condition->is_case_sensitive =
-        mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE);
+    mc_filter->search_condition->is_case_sensitive
+        = mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE);

I want have = at the same line as variable.

I believe it's BreakBeforeBinaryOperators.

comment:8 Changed 3 months ago by zaytsev

I want have = at the same line as variable.

I think this can be solved with PenaltyBreakAssignment, see my commits. Interestingly, it seems to solve many of the array breaking problems.

Please keep your bad examples coming. There may be more "simple" solutions to make things better. Thank you!

comment:9 Changed 3 months ago by andrew_b

-const char *STR_E_RPL_NOT_EQ_TO_FOUND
-    = N_ ("Num of replace tokens not equal to num of found tokens");
+const char *STR_E_RPL_NOT_EQ_TO_FOUND = N_ (
+    "Num of replace tokens not equal to num of found tokens");

    mc_skin_color->fg = (items_count > 0 && values[0][0])
                            ? mc_skin_color_look_up_alias (mc_skin,
                                                           g_strstrip (g_strdup (values[0])))
                        : (tmp != NULL) ? g_strdup (tmp->fg)
                                        : NULL;

? and ; are not aligned.


-static tty_color_pair_t tty_color_defaults
-    = { .fg = NULL, .bg = NULL, .attrs = NULL, .pair_index = 0 };
+static tty_color_pair_t tty_color_defaults = {
+    .fg = NULL, .bg = NULL, .attrs = NULL, .pair_index = 0
+};

Structure initialization is more readable if members are one-per-line:

static tty_color_pair_t tty_color_defaults = {
    .fg = NULL,
    .bg = NULL,
    .attrs = NULL,
    .pair_index = 0

INDENT-OFF/ON are still there.

comment:10 Changed 3 months ago by andrew_b

A new huge dependency is an argument against clang-format: tiny GNU indent vs. huge clang/llvm.

comment:11 Changed 3 months ago by zaytsev

A new huge dependency is an argument against clang-format: tiny GNU indent vs. huge clang/llvm.

I don't think that's a fair argument. It's just for development. The tool itself is only <1 MB. But of course it pulls libclang and libllvm if clang is not installed, and only gcc is used. But what is 100 MB these days?

The problem with the tiny GNU indent is that it hasn't seen any development since 2018, and it doesn't work properly, but it's also unlikely to ever be fixed, because to do proper formatting for C, you basically have to implement a compiler front end.

comment:12 Changed 3 months ago by zaytsev

INDENT-OFF/ON are still there.

Yes, I left them in on purpose for now. They are ignored by clang-format. It has it's own ignores:

// clang-format off
// clang-format on

If we switch to clang-format, they can all be removed by regex at the same time.

Last edited 3 months ago by zaytsev (previous) (diff)

comment:13 Changed 3 months ago by zaytsev

If you put a trailing comma, clang-format will format one per line. There was no way to make indent behave. This is what I meant by "manual tweaks". Maybe when removing indent-comments one can check if trailing commas are needed.

 /* *INDENT-OFF* */
 static tty_color_pair_t tty_color_defaults = {
-    .fg = NULL, .bg = NULL, .attrs = NULL, .pair_index = 0
+    .fg = NULL,
+    .bg = NULL,
+    .attrs = NULL,
+    .pair_index = 0,
 };
 /* *INDENT-ON* */

There is also an option AlignArrayOfStructures: Left, but I think the results are questionable. What do you think?

comment:14 follow-up: ↓ 17 Changed 3 months ago by zaytsev

+const char *STR_E_RPL_NOT_EQ_TO_FOUND = N_ (
+    "Num of replace tokens not equal to num of found tokens");

Here indent was turned off and clang-format breaks, because it's over 100 chars. How do you want to handle this?

  • If this line is important, ignores can be left in place (see manual tweaks)
  • We can increase line limit to e.g. 120 characters
  • Ignore this problem

I don't think there is a way to do it nicer.

comment:15 Changed 3 months ago by zaytsev

mc_skin_color->fg =

This is a problem with nested ternaries :( By adding parenthesis the problem is solved. Another manual tweak...

comment:16 Changed 3 months ago by zaytsev

mc_skin_color->fg =

With AlignOperands: false and BreakBeforeTernaryOperators: false it behaves more like indent.

In this particular case the important part is AlignOperands, it saves space and code looks better. BreakBeforeTernaryOperators can be also left on. See last commits for different possibilities.

What you like more?

comment:17 in reply to: ↑ 14 Changed 3 months ago by andrew_b

Replying to zaytsev:

  • We can increase line limit to e.g. 120 characters

Yes.


                ? 1
                : 0;

:( There is a lot of space to do

                ? 1 : 0;

comment:18 Changed 3 months ago by andrew_b

Let's take a pause here and merge a 4572_cleanup branch.

Note: See TracTickets for help on using tickets.