Ticket #2290 (closed defect: fixed)

Opened 14 years ago

Last modified 11 years ago

"Find File" is very slow and consumes 100% of CPU

Reported by: birdie Owned by: andrew_b
Priority: major Milestone: 4.8.12
Component: mc-core Version: master
Keywords: Cc:
Blocked By: Blocking:
Branch state: merged Votes for changeset: committed-master

Description

For a mounted filesystem which contains roughly ten thousand files and a thousand directories, MC's "Find File" search with default parameters takes 5 (five) seconds with 100% CPU usage.

There's something terribly wrong, because:

time find . > /dev/null

real    0m0.022s
user    0m0.010s
sys     0m0.010s

Other console file managers find all files and directories instantly.

Attachments

mc_findfile2.png (2.2 KB) - added by birdie 14 years ago.
Search parameters
mc-4.8.9-optimize-regex.patch (2.7 KB) - added by sknaumov 11 years ago.
mc-4.8.9-optimize-status-update.patch (3.4 KB) - added by sknaumov 11 years ago.
mc-4.8.10-no-get-line-at.patch (4.6 KB) - added by sknaumov 11 years ago.
This patch gets rid of get_line_at function that was called only at one place and was tightly coupled with caller function. It allows to perform some minor optimizations, and as a result improve search time by 18-20%.
mc-4.8.10-refresh-size+time.patch (3.0 KB) - added by sknaumov 11 years ago.
New patch to optimize updates in status bar during search (based on examining file size and time of the last status update). This version is synchronous and is based on assumption that relatively small files are quickly processed even on slow hardware so if refresh timeout expires during their processing, the delay couldn't be noticed by human. Therefore only rather big files have to be reported unconditionally, and others - only if there is a match in this file or if timeout expires.
mc-4.8.10-refresh-size+time-gtod.patch (3.0 KB) - added by sknaumov 11 years ago.
The same patch as mc-4.8.10-refresh-size+time.patch, but uses gettimeofday() instead of clock_gettime().

Change History

Changed 14 years ago by birdie

Search parameters

comment:1 Changed 14 years ago by birdie

  • Version changed from 4.7.3 to master
  • Milestone changed from 4.7.4 to 4.7.5

comment:2 Changed 14 years ago by andrew_b

  • Milestone changed from 4.7.5 to 4.7

comment:3 Changed 14 years ago by birdie

  • Milestone changed from 4.7 to 4.8

comment:4 Changed 11 years ago by sknaumov

  • Branch state set to no branch

I've benchmarked mc find with linux-3.10.3 source code and USB_SPEED_LOW search pattern. On mc-4.8.9 it takes 30 seconds to complete the search.
First of all, couple of seconds could be cut down by commenting out mc_refresh() call in search_content function in find.c: then it takes 23,5 seconds to complete.
Commenting out also gsprintf( ... "Grepping in %s" ... ) and status_update calls in this function cuts time to 20,5 seconds.
But commenting all status line manipulations (internals of status_update and find_rotate_dash functions) get us only to about 19 seconds, so there is another bottleneck.

Last edited 11 years ago by sknaumov (previous) (diff)

comment:5 Changed 11 years ago by slyfox

Do you have chance to run mc under 'perf' tool on that box? (from linux/tools/perf)

Should be quite straightforward:

perf record mc
# perfrom action, exit from mc
perf report
# post 10-15 top lines from the output

You will need debugging symbols present in mc (CFLAGS+=-g)

comment:6 Changed 11 years ago by sknaumov

Samples: 154K of event 'cpu-clock', Event count (approx.): 36529
26,57%  [kernel]                            [k] read_tsc
19,77%  [kernel]                            [k] _raw_spin_unlock_irqrestore
8,63%  mc                                  [.] g_string_append_c_inline
6,25%  mc                                  [.] get_line_at
5,14%  mc                                  [.] mc_search__get_char
5,02%  [kernel]                            [k] finish_task_switch
4,75%  libc-2.15.so                        [.] 0x00077704
4,27%  mc                                  [.] mc_search__run_regex
2,58%  libpcre.so.3.13.1                   [.] pcre_exec
1,59%  libglib-2.0.so.0.3400.1             [.] 0x0004c4b0
0,80%  libpixman-1.so.0.26.0               [.] 0x00080b70
0,79%  libc-2.15.so                        [.] __libc_calloc
0,51%  vboxvideo_drv_113.so                [.] VBoxHGSMIBufferSubmit
0,49%  libslang.so.2.2.4                   [.] 0x0006a1c1
0,47%  libc-2.15.so                        [.] malloc
0,46%  Xorg                                [.] 0x00125273
0,44%  xterm                               [.] 0x000372ad
0,43%  libglib-2.0.so.0.3400.1             [.] g_slice_free1
0,40%  mc                                  [.] mc_search__regex_found_cond

These figures are from perf top just after search completion, not from perf report, because perf report don't show any significant mc activity.

Last edited 11 years ago by sknaumov (previous) (diff)

comment:7 Changed 11 years ago by slyfox

Yeah, looks like mc triggers kernel's subsystem churn quite a lot.
Can you describe your hardware setup a bit more?
Looks like it's a virtualbox thing. I guess it's framebuffer
is quite inefficient (memory copy kills all caches)

Namely:

  • what guest $CHOST is that? (the 'gcc -v' output for example)
  • do you use it on local machine or networking kicks-in? (like SSH's session or something)
  • what video driver on host machine is used? (i guess some xorg's one)
  • what video driver on guest machine is used? (linux dumb term, or some vbox fbdev)

comment:8 Changed 11 years ago by sknaumov

It was Xubuntu 12.10 (with VirtualBox? guest additions installed) on top of Windows 7. I will run the same test tomorrow at work with Xubuntu 12.10 installed on AMD Athlon X2. I thought that it could be only terminal redraw issue, but as I said before 2/3 of the search time is spent somewhere else.

comment:9 follow-up: ↓ 10 Changed 11 years ago by sknaumov

Yes, on native system difference is not so big: 13,5 seconds with status updates and dash rotation commented out (except notification that search is finished) and 15 seconds otherwise. But I can't see any reason why status bar should be updated so often - information about file being grepped is unreadable anyway. Also it is interesting to measure overhead over ssh. As to mc's search code - it seems to be too generalized - it scans input buffer char by char to get string, then scans resulted string again to get length, then it calls some generic functions and then it seems that in mc_search_run__regex some functions are called per char.

Last edited 11 years ago by sknaumov (previous) (diff)

comment:10 in reply to: ↑ 9 ; follow-up: ↓ 11 Changed 11 years ago by andrew_b

Replying to sknaumov:

But I can't see any reason why status bar should be updated so often - information about file being grepped is unreadable anyway.

Unreadable until you find in small files. Search is fast for small file but long for large one. If you decimate (2, 3 or more times) output of file name and meet a large file after a lot of small ones during seach, you can see some previous file name in status instead of current large file.

As to mc's search code - it seems to be too generalized - it scans input buffer char by char to get string, then scans resulted string again to get length

It can be fixed easy.

then it calls some generic functions and then it seems that in mc_search_run__regex some functions are called per char.

This is issue about general speed up of mc search engine.

So, some minor optimization is performed in

2290_find_file_speedup branch.
Initial changeset:1b2e6883436d3a9e57d37c59abf44a1ab9ea8e2f
I guess this doesn't give us a large profit, but...

Last edited 11 years ago by andrew_b (previous) (diff)

comment:11 in reply to: ↑ 10 Changed 11 years ago by sknaumov

Replying to andrew_b:

So, some minor optimization is performed in

2290_find_file_speedup branch.
Initial changeset:f66b1bac8e56a0cb8843ba9ac110d75124a35f0e

I guess this doesn't give us a large profit, but...

Actually this patch slows down things a lot. On my new Virtualbox setup (Xubuntu 13.04, VDI image is on SSD) this patch increases search time from 17,5 to 20,65 seconds. Maybe using glib functions like g_string_append_c is very inefficient approach, but...
Passing pointer to variable on stack to get_line_at function and setting resulting string length to it surprisingly increases search time to 19.3 seconds. Usage of global variable to pass length from get_line_at to mc_search_run decreases search time to 17.2 seconds. Sounds a bit unreal, but I launched tests three times in a row and results were repeatable.

Usage of SIGALARM and checking per line whether 1 second passed since previous status update reduces search time to 15.7 seconds. Now I should check whether this change improves or decreases performance in native case.

UPDATE: on Virtualbox results seem to change a bit from one "make install" to another, but not so much. Have to check everything tomorrow on native system at work.

Last edited 11 years ago by sknaumov (previous) (diff)

Changed 11 years ago by sknaumov

Changed 11 years ago by sknaumov

comment:12 Changed 11 years ago by sknaumov

These patches decreases search time by 33% on Virtualbox and by 16% on real hardware. Not so big improvement, but not so bad given that changes are rather small. Please review and benchmark.

comment:13 Changed 11 years ago by sknaumov

As to your last changeset:df952a, Andrew, please benchmark it. When I passed length by reference in get_line_at function search time increased approximately by 5-6%. When I used global variable it decreased by 3%.

Last edited 11 years ago by sknaumov (previous) (diff)

comment:14 follow-up: ↓ 15 Changed 11 years ago by sknaumov

Find without content pattern is VERY VERY slow partly due to usage of g_list_append function, because it uses g_list_last internally to traverse from the beginning to the end of a list, so forming a list of results has O(n2) complexity instead of O(n). I can't find g_list_insert_after in GLib Reference Manual, but is seems that this function exists. We should track the end of list to improve append time. Find time with find_add_match function commented is about 4 seconds, but in normal case it takes about 20-22 seconds to complete. Here is perf report output:

 20,29%          mc  libglib-2.0.so.0.3600.0  [.] g_list_last
 17,04%          mc  libglib-2.0.so.0.3600.0  [.] 0x00000000000405fd
 13,17%          mc  libc-2.17.so             [.] bsearch
 12,90%          mc  libslang.so.2.2.4        [.] SLsmg_write_chars
 12,26%          mc  mc                       [.] str_utf8_make_make_term_form
  7,12%          mc  libglib-2.0.so.0.3600.0  [.] g_unichar_isprint
  2,63%          mc  libglib-2.0.so.0.3600.0  [.] g_unichar_type
  2,38%          mc  libglib-2.0.so.0.3600.0  [.] g_unichar_to_utf8
  2,18%          mc  libglib-2.0.so.0.3600.0  [.] g_utf8_get_char_validated
  1,89%          mc  mc                       [.] str_unichar_iscombiningmark
  1,61%          mc  libglib-2.0.so.0.3600.0  [.] g_unichar_iswide
  0,95%          mc  libglib-2.0.so.0.3600.0  [.] g_strlcpy
  0,65%          mc  mc                       [.] utf8_tool_copy_chars_to_end
  0,59%          mc  [kernel.kallsyms]        [k] 0xffffffff81043e6a
  0,56%          mc  mc                       [.] g_unichar_to_utf8@plt
  0,53%          mc  libglib-2.0.so.0.3600.0  [.] bsearch@plt
  0,53%          mc  mc                       [.] g_utf8_get_char_validated@plt

I've tried to use g_list_prepend and it resulted in inverse of list elements in output and constant redrawing, but it took only 16 seconds to complete.

comment:15 in reply to: ↑ 14 ; follow-up: ↓ 16 Changed 11 years ago by andrew_b

Replying to sknaumov:

I can't find g_list_insert_after in GLib Reference Manual, but is seems that this function exists.

There is no such function in the GLib source tree.

We should track the end of list to improve append time.

There is the GQueue class with such functionality, but glib >= 2.14.0 is required. Another way is reimplementation of WListbox using GPtrArray.

Last edited 11 years ago by andrew_b (previous) (diff)

comment:16 in reply to: ↑ 15 ; follow-up: ↓ 17 Changed 11 years ago by sknaumov

Replying to andrew_b:

There is the GQueue class with such functionality, but glib >= 2.14.0 is required. Another way is reimplementation of WListbox using GPtrArray.

May be we can use g_list_prepend and traverse this list backward on show? Is it traversed only one/a few times?

Version 1, edited 11 years ago by sknaumov (previous) (next) (diff)

comment:17 in reply to: ↑ 16 ; follow-up: ↓ 18 Changed 11 years ago by andrew_b

Replying to sknaumov:

Replying to andrew_b:

There is the GQueue class with such functionality, but glib >= 2.14.0 is required. Another way is reimplementation of WListbox using GPtrArray.

May be we can use g_list_prepend and traverse this list backward on show? Is it traversed only one/a few times?

Store WListbox content in reverse order makes support of listbox_append_t flags more complicate.
Length of list is required in any case, so GArray is better than GList. From other hand, GAarray doesn't provide the sorted insertion, but this is cam be workarounded using some implementation of lock/unlock fuctionality:

listbox_lock_sort ();
... /* add a lot of items in some order */
listbox_unlock_sort (); /* and sort here */
... /* draw listbox */

Also we can insert dummy sentinel element, use g_list_insert_before function and do not traverse this element at show time.

There is no such function in GLib.

comment:18 in reply to: ↑ 17 Changed 11 years ago by andrew_b

Replying to andrew_b:

There is no such function in GLib.

Oops, sorry...

Changed 11 years ago by sknaumov

This patch gets rid of get_line_at function that was called only at one place and was tightly coupled with caller function. It allows to perform some minor optimizations, and as a result improve search time by 18-20%.

comment:19 Changed 11 years ago by andrew_b

  • Milestone changed from 4.8 to Future Releases

mc-4.8.9-optimize-regex.patch with some modifications is applied as [bb2f5d35f3a005effb72ab9dc030a5495e9b0a4a].

Changed 11 years ago by sknaumov

New patch to optimize updates in status bar during search (based on examining file size and time of the last status update). This version is synchronous and is based on assumption that relatively small files are quickly processed even on slow hardware so if refresh timeout expires during their processing, the delay couldn't be noticed by human. Therefore only rather big files have to be reported unconditionally, and others - only if there is a match in this file or if timeout expires.

Changed 11 years ago by sknaumov

The same patch as mc-4.8.10-refresh-size+time.patch, but uses gettimeofday() instead of clock_gettime().

comment:20 Changed 11 years ago by andrew_b

  • Status changed from new to accepted
  • Owner set to andrew_b
  • Branch state changed from no branch to on review
  • Milestone changed from Future Releases to 4.8.12

Rebased 2290_find_file_speedup branch.
Initial changeset:6c784ed7a35926230d377d856f447ed458da81bd

comment:21 Changed 11 years ago by slavazanko

  • Votes for changeset set to slavazanko

comment:22 Changed 11 years ago by slavazanko

  • Branch state changed from on review to approved

comment:23 Changed 11 years ago by andrew_b

  • Status changed from accepted to testing
  • Votes for changeset changed from slavazanko to committed-master
  • Resolution set to fixed
  • Branch state changed from approved to merged

Merged to master: [f949a29422e9961cb429553a643bf298a8e4ead9].

git log --pretty=oneline 7abd8a1..f949a29

comment:24 Changed 11 years ago by andrew_b

  • Status changed from testing to closed
Note: See TracTickets for help on using tickets.