Ticket #4592 (accepted enhancement)
Migrate formatter from indent to clang-format
Reported by: | zaytsev | Owned by: | zaytsev |
---|---|---|---|
Priority: | major | Milestone: | 4.8.33 |
Component: | compilation | Version: | master |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Branch state: | no branch | Votes for changeset: |
Description
Currently, we use GNU Indent as our formatter. Unfortunately, it has several problems that make it almost unusable:
- It requires tons of /* *INDENT-OFF* */ directives to avoid breaking complex formatting.
- It randomly adds /* at the beginning and NUL at the end of source files.
- It fails to parse complex source files (see #4524):
indent: lib/util.c:522: Error:Stmt nesting error. ...
As I started trying to fix these errors, it turned out that half of our source tree is unformatted, because the formatting loop breaks on a non-zero exit code, and all files after that are simply not processed.
This makes it impossible to add formatting checks to the CI and automatically maintain proper formatting.
I have investigated possible formatting options, and the only viable candidate in my opinion is clang-format:
- It uses the clang compiler frontend for parsing, so it is able to process any valid code.
- It is also actively developed and mostly stable.
- It comes with a preset for GNU style.
As with any formatter change, it is not possible to make the style match 100%, and there will be one big diff, but hopefully life will get better after that.
Change History
comment:2 Changed 2 months ago by zaytsev
I added a style based on the GNU style, but customized to produce a minimal diff with our code base. You can see what the result looks like in the second commit.
Before I spend more time adding trailing commas and stuff like that, can we agree in principle that we can switch? Any suggestions for other options we can change?
If we agree, it would be great if we could merge all open branches first, and then I can work on manual tweaks in this branch before it can be merged.
My observations are as follows:
- Many of our arrays are reformatted
- Include comment alignment is broken
- Spaces are being inserted before gettext shortcuts, causing a lot of diff
Arrays
This happens because many of our arrays are missing trailing commas, which causes them to be reformatted instead of keeping the items one per line. This is easy to fix.
For creative manual formatting, you can add empty comments at the end of the line (//) to force the formatter to leave them as they are.
Comments
One could convert C-style comments to line comments and play with the following options:
SpacesBeforeTrailingComments: 1 AlignTrailingComments: Kind: Always OverEmptyLines: 42
I don't think it's worth it and I can live with the new formatting.
Gettext
This happens, because gettext shortcuts are considered function calls (which they are). I can live with that.
P.S. Many tools like CLion have a built-in integration with clang-format, so formatting will happen mostly automatically if this is merged.
comment:3 Changed 2 months ago by andrew_b
For me, in general. it worse than GNU indent output.
Is there a way to avoid following:
-static void -G_GNUC_PRINTF (1, 0) -mc_va_log (const char *fmt, va_list args) +static void G_GNUC_PRINTF (1, 0) mc_va_log (const char *fmt, va_list args)
comment:4 Changed 2 months ago by zaytsev
Is there a way to avoid following
Like in the last commit, or am I missing something? With this change, it's like all the other functions - first types on one line, then the rest.
For me, in general. it worse than GNU indent output.
If there are other things you don't like, please tell me, I can try to find solutions.
To be honest, I don't really care about the formatting rules, as long as they're fixed and it's done automatically for me in a consistent way. My biggest problem with indent is that, as I've discovered, it doesn't work with our code base, and when I start to fix it, more and more bugs like /*, NUL, etc. appear.
The things I like about clang-format are that it works at all, and that the output is surprisingly tolerable, given that this is fully automatic without any manual changes so far. The * stuff is finally done consistently and correctly, whatever the rules are.
comment:5 follow-up: ↓ 7 Changed 2 months ago by andrew_b
Can't find an option to avoid following:
- mc_filter->search_condition->is_case_sensitive = - mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE); + mc_filter->search_condition->is_case_sensitive + = mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE);
I want have = at the same line as variable.
comment:6 Changed 2 months ago by andrew_b
static const mc_search_type_str_t mc_search__list_types[] = { { N_ ("No&rmal"), MC_SEARCH_T_NORMAL }, { N_ ("Re&gular expression"), MC_SEARCH_T_REGEX },
I'like to have something like following:
static const mc_search_type_str_t mc_search__list_types[] = { { N_ ("No&rmal"), MC_SEARCH_T_NORMAL }, { N_ ("Re&gular expression"), MC_SEARCH_T_REGEX },
comment:7 in reply to: ↑ 5 Changed 2 months ago by mike.dld
Replying to andrew_b:
Can't find an option to avoid following:
- mc_filter->search_condition->is_case_sensitive = - mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE); + mc_filter->search_condition->is_case_sensitive + = mc_config_get_bool (fhl->config, group_name, "extensions_case", FALSE);I want have = at the same line as variable.
I believe it's BreakBeforeBinaryOperators.
comment:8 Changed 2 months ago by zaytsev
I want have = at the same line as variable.
I think this can be solved with PenaltyBreakAssignment, see my commits. Interestingly, it seems to solve many of the array breaking problems.
Please keep your bad examples coming. There may be more "simple" solutions to make things better. Thank you!
comment:9 Changed 2 months ago by andrew_b
-const char *STR_E_RPL_NOT_EQ_TO_FOUND - = N_ ("Num of replace tokens not equal to num of found tokens"); +const char *STR_E_RPL_NOT_EQ_TO_FOUND = N_ ( + "Num of replace tokens not equal to num of found tokens");
mc_skin_color->fg = (items_count > 0 && values[0][0]) ? mc_skin_color_look_up_alias (mc_skin, g_strstrip (g_strdup (values[0]))) : (tmp != NULL) ? g_strdup (tmp->fg) : NULL;
? and ; are not aligned.
-static tty_color_pair_t tty_color_defaults - = { .fg = NULL, .bg = NULL, .attrs = NULL, .pair_index = 0 }; +static tty_color_pair_t tty_color_defaults = { + .fg = NULL, .bg = NULL, .attrs = NULL, .pair_index = 0 +};
Structure initialization is more readable if members are one-per-line:
static tty_color_pair_t tty_color_defaults = { .fg = NULL, .bg = NULL, .attrs = NULL, .pair_index = 0
INDENT-OFF/ON are still there.
comment:10 Changed 2 months ago by andrew_b
A new huge dependency is an argument against clang-format: tiny GNU indent vs. huge clang/llvm.
comment:11 Changed 2 months ago by zaytsev
A new huge dependency is an argument against clang-format: tiny GNU indent vs. huge clang/llvm.
I don't think that's a fair argument. It's just for development. The tool itself is only <1 MB. But of course it pulls libclang and libllvm if clang is not installed, and only gcc is used. But what is 100 MB these days?
The problem with the tiny GNU indent is that it hasn't seen any development since 2018, and it doesn't work properly, but it's also unlikely to ever be fixed, because to do proper formatting for C, you basically have to implement a compiler front end.
comment:12 Changed 2 months ago by zaytsev
INDENT-OFF/ON are still there.
Yes, I left them in on purpose for now. They are ignored by clang-format. It has it's own ignores:
// clang-format off // clang-format on
If we switch to clang-format, they can all be removed by regex at the same time.
comment:13 Changed 2 months ago by zaytsev
If you put a trailing comma, clang-format will format one per line. There was no way to make indent behave. This is what I meant by "manual tweaks". Maybe when removing indent-comments one can check if trailing commas are needed.
/* *INDENT-OFF* */ static tty_color_pair_t tty_color_defaults = { - .fg = NULL, .bg = NULL, .attrs = NULL, .pair_index = 0 + .fg = NULL, + .bg = NULL, + .attrs = NULL, + .pair_index = 0, }; /* *INDENT-ON* */
There is also an option AlignArrayOfStructures: Left, but I think the results are questionable. What do you think?
comment:14 follow-up: ↓ 17 Changed 2 months ago by zaytsev
+const char *STR_E_RPL_NOT_EQ_TO_FOUND = N_ ( + "Num of replace tokens not equal to num of found tokens");
Here indent was turned off and clang-format breaks, because it's over 100 chars. How do you want to handle this?
- If this line is important, ignores can be left in place (see manual tweaks)
- We can increase line limit to e.g. 120 characters
- Ignore this problem
I don't think there is a way to do it nicer.
comment:15 Changed 2 months ago by zaytsev
mc_skin_color->fg =
This is a problem with nested ternaries :( By adding parenthesis the problem is solved. Another manual tweak...
comment:16 Changed 2 months ago by zaytsev
mc_skin_color->fg =
With AlignOperands: false and BreakBeforeTernaryOperators: false it behaves more like indent.
In this particular case the important part is AlignOperands, it saves space and code looks better. BreakBeforeTernaryOperators can be also left on. See last commits for different possibilities.
What you like more?
comment:17 in reply to: ↑ 14 Changed 2 months ago by andrew_b
Replying to zaytsev:
- We can increase line limit to e.g. 120 characters
Yes.
? 1 : 0;
:( There is a lot of space to do
? 1 : 0;
comment:18 Changed 2 months ago by andrew_b
Let's take a pause here and merge a 4572_cleanup branch.
Branch: 4592_clang_format
Initial changeset:d4c919faaf26daf46985f649cb3e2233003b91c5