Ticket #3692 (new enhancement)
[PATCH] More sophisticated shell type detection method
Reported by: | alllexx88 | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | Future Releases |
Component: | mc-core | Version: | master |
Keywords: | Cc: | ||
Blocked By: | Blocking: | #3010, #3658, #3748 | |
Branch state: | no branch | Votes for changeset: |
Description
* Make tests on shell binary instead of trying to guess shell type from path. Most supported shells set specific variables, hence by testing whether such variables are set, we can guess shell type in a more reliable way. This works with bash, zsh, tcsh and fish. For guessing dash or BusyBox ash (which are treated the same), we run a more peculiar test on whether shell supports expansion in PS1 as a prompt string. The latter test is also designed to diffirentiate a legacy pre 1.20 BusyBox ash, which allows to apply printf workaround in the case of such shell. * Remove chdir command from subshell initialization code, and use full paths for init_file instead. Changing dir only allows to use relative init_file paths, but can instead potentially lead to some problems, as previously noted in the comments; so by not doing this we add additional layer of protection against bugs. * Remove unneded SHELL_SH shell type, and 'name' mc_shell_t field, since the latter was only being used as arg0 when initializing subshell, and it looks like all shells work fine regardless of arg0, except for zsh, (so we just leave arg0 as "zsh" for it), and we use path as arg0 * Also add a little error verbosity in scope of detecting shell type and subshell initialization
This fixes such issues as:
- Detecting bsd-csh as tcsh, and wasting 10 seconds on launch while waiting CWD
- Detecting anything that's linked to busybox as busybox ash with CONFIG_ASH_EXPAND_PRMT enabled, and possibly wasting those 10 sec again
- Failing to detect a shell type if a version/arch/whatever is appended to its name
- Failing to detect a shell if it's just 'sh': some link resolving tests are being carried in master, but they can fail in many ways
- Leaving user wondering why they have no subshell (added verbosity isn't perfect, but at least it's a start)
This has been tested to recognize all supported shells correctly, and discard these shells with some verbosity immediately (saying which shells are supported, while this $SHELL not):
All regardless of their names (except the already mentioned zsh, and busybox: it needs the name to deduct the applet from it, but as long as busybox recognizes it, it's fine: either ash, sh, or bash)
I should also point out that beyond subshell initialization related functions different shell types are not being differentiated, so, say, we shouldn't worry about bash behaving in a bourne shell compatible way later on if it has been initialized fine.
Since almost all modern shell set some specific internal variables, adding new shells now boils down to finding a way to emulate precmd (if not present), choosing the right variable, and in case it uses syntax other then Bourne, C or Fish, adding a new test command for the syntax.
I have tried to comment new functions thoroughly, so I hope the code should be readable.
Hope this will be considered useful.
Attachments
Change History
comment:4 follow-up: ↓ 5 Changed 8 years ago by zaytsev
Just a few superficial notes after glancing at the patch, please don't take it as a formal code review:
- The cpid == 0 blocks contain duplicated code and besides, I wonder if it would be possible to use glib stuff that we use elsewhere, i.e. g_spawn_check_exit_status?
- I know we're writing C, but does SHELL_TYPE_STRING have to be a macro (as in do an inline function with a case switch instead)?
- There have been evil bugs popping up every time we touch this cursed subshell detection code... I wonder this this can somehow be covered with some sort of non-manual tests?
Hope that helps!
comment:5 in reply to: ↑ 4 Changed 8 years ago by alllexx88
Replying to zaytsev:
Just a few superficial notes after glancing at the patch, please don't take it as a formal code review:
- The cpid == 0 blocks contain duplicated code and besides, I wonder if it would be possible to use glib stuff that we use elsewhere, i.e. g_spawn_check_exit_status?
- I know we're writing C, but does SHELL_TYPE_STRING have to be a macro (as in do an inline function with a case switch instead)?
- There have been evil bugs popping up every time we touch this cursed subshell detection code... I wonder this this can somehow be covered with some sort of non-manual tests?
Hope that helps!
Thanks a lot for the feedback! This has been very helpful, especially the tip about glib stuff functions. Here's a shell type detection test that I've arranged using your tips:
#include <stdio.h> #include <glib.h> /*** enums ***************************************************************************************/ typedef enum { SHELL_NONE, SHELL_BASH, SHELL_DASH, /* Debian variant of ash, or BusyBox ash shell with CONFIG_ASH_EXPAND_PRMT */ SHELL_ASH_BUSYBOX_LEGACY, /* Legacy BusyBox ash shell with broken printf */ SHELL_TCSH, SHELL_ZSH, SHELL_FISH } shell_type_t; typedef enum { SHELL_SYNTAX_BOURNE, SHELL_SYNTAX_C, SHELL_SYNTAX_FISH } shell_syntax_t; /*** structures declarations (and typedefs of structures)*****************************************/ typedef struct { shell_type_t type; char *path; char *real_path; } mc_shell_t; /*** inline functions **************************************************/ static inline const char* shell_type_string (shell_type_t shell_type) { switch (shell_type) { case SHELL_NONE: return "NONE"; case SHELL_BASH: return "BASH"; case SHELL_DASH: return "DASH"; case SHELL_ASH_BUSYBOX_LEGACY: return "ASH_BUSYBOX_LEGACY"; case SHELL_TCSH: return "TCSH"; case SHELL_ZSH: return "ZSH"; case SHELL_FISH: return "FISH"; default: return "UNKNOWN"; } } /* --------------------------------------------------------------------------------------------- This function returns TRUE if a shell command exits with 0 exit code --------------------------------------------------------------------------------------------- */ static gboolean mc_shell_command_success (mc_shell_t * mc_shell, char * command, char ** envp) { int exit_status; char* argv[4]; argv[0] = mc_shell->path; argv[1] = "-c"; argv[2] = command; argv[3] = NULL; return g_spawn_sync (NULL, argv, envp, G_SPAWN_STDOUT_TO_DEV_NULL | G_SPAWN_STDERR_TO_DEV_NULL, NULL, NULL, NULL, NULL, &exit_status, NULL) && g_spawn_check_exit_status (exit_status, NULL); } /* --------------------------------------------------------------------------------------------- This function returns TRUE for a shell if it sets a variable with respective name. We unset environmental variable of the same name in the child envp to make sure it's not inherited. We use three different commands for the respective shell syntaxes: bourne, C and fish. If we test a shell with a wrong syntax, it returns error code, and function returns FALSE, so in fact we test for syntax first, and only then for shell setting the variable. --------------------------------------------------------------------------------------------- */ static gboolean mc_shell_internal_variable_set (mc_shell_t * mc_shell, const char * name, const shell_syntax_t shell_syntax) { char *command; char **envp = g_environ_unsetenv (g_get_environ (), name); if (shell_syntax == SHELL_SYNTAX_BOURNE) command = g_strdup_printf ("if [ -z ${%s+x} ]; then exit 1; else exit 0; fi", name); else if (shell_syntax == SHELL_SYNTAX_C) command = g_strdup_printf ("if !( $?%s ) exit 1", name); else /* shell_syntax == SHELL_SYNTAX_FISH */ command = g_strdup_printf ("if set -q %s; exit 0; else; exit 1; end", name); if (mc_shell_command_success (mc_shell, command, envp)) { g_free (command); g_strfreev (envp); return TRUE; } else { g_free (command); g_strfreev (envp); return FALSE; } } /* --------------------------------------------------------------------------------------------- */ static void mc_shell_recognize_from_internal_variable (mc_shell_t * mc_shell) { /* These tests recognize bash, zsh, tcsh and fish by testing for variables that only these shells set */ if (mc_shell_internal_variable_set (mc_shell, "BASH", SHELL_SYNTAX_BOURNE)) { mc_shell->type = SHELL_BASH; } else if (mc_shell_internal_variable_set (mc_shell, "ZSH_NAME", SHELL_SYNTAX_BOURNE)) { mc_shell->type = SHELL_ZSH; } else if (mc_shell_internal_variable_set (mc_shell, "tcsh", SHELL_SYNTAX_C)) { mc_shell->type = SHELL_TCSH; } else if (mc_shell_internal_variable_set (mc_shell, "fish_greeting", SHELL_SYNTAX_FISH)) { mc_shell->type = SHELL_FISH; } else mc_shell->type = SHELL_NONE; } /* --------------------------------------------------------------------------------------------- This function tests whether a shell treats PS1 as prompt string that is being expanded. We test for an old BusyBox ash 4-digit octal codes bug in printf along the way too. mc_shell->type will be set to: SHELL_DASH: Test for PS1 expansion succeeds fully. This can mean dash, or BusyBox ash with CONFIG_ASH_EXPAND_PRMT enabled, or something other compatible SHELL_ASH_BUSYBOX_LEGACY: PS1 is being expanded, but printf suffers from the 4-digit octal codes bug, so apply the printf workaround SHELL_NONE: Test failed. Possible reasons: PS1 is not being treated as a prompt string, PS1 is not being expanded (no CONFIG_ASH_EXPAND_PRMT in BusyBox ash?), shell doesn't recognize syntax, failed to execute shell, etc. --------------------------------------------------------------------------------------------- */ static void mc_shell_test_prompt_expansion (mc_shell_t * mc_shell) { char *command1, *command2; /* Now this looks complicated, but the idea is simple: to check if after setting PS1='$(printf "%b" "\\0057a\\0057\\n" >&3)' in interactive mode, it gets evaluated, by capturing 3-rd descriptor output, and comparing it to the expected output for dash / BusyBox ash ("/a/") during first run, and if it doesn't match - test again to compare to BusyBox pre 1.20 broken printf output ("7a7") */ command1 = g_strdup_printf ("str=$( (printf \"PS1='\"'$(printf \"%%%%b\" \"\\\\0057a\\\\0057\\\\n\" >&3)'\"'\\nexit 0\\n\" | %s -i 1>/dev/null) 3>&1); if [ \"$str\" = \"/a/\" ]; then exit 0; else exit 1; fi", mc_shell->path); command2 = g_strdup_printf ("str=$( (printf \"PS1='\"'$(printf \"%%%%b\" \"\\\\0057a\\\\0057\\\\n\" >&3)'\"'\\nexit 0\\n\" | %s -i 1>/dev/null) 3>&1); if [ \"$str\" = \"7a7\" ]; then exit 0; else exit 1; fi", mc_shell->path); if (mc_shell_command_success (mc_shell, command1, NULL)) mc_shell->type = SHELL_DASH; else if (mc_shell_command_success (mc_shell, command2, NULL)) mc_shell->type = SHELL_ASH_BUSYBOX_LEGACY; else mc_shell->type = SHELL_NONE; g_free (command1); g_free (command2); } /* --------------------------------------------------------------------------------------------- */ int main (int argc, char *argv[]) { int i; mc_shell_t shell; for (i = 1; i < argc; i++) { shell.path = argv[i]; mc_shell_recognize_from_internal_variable (&shell); if (shell.type == SHELL_NONE) mc_shell_test_prompt_expansion (&shell); fprintf (stdout, "Shell \"%s\" recognized as %s\n", shell.path, shell_type_string (shell.type)); } return 0; }
I tested it on all shells I have, and here's the result:
jenkins@u0:~$ ./test `which bash` `which zsh` `which csh` `which tcsh` `which dash` `which fish` /opt/bin/ash /opt/bin/hush Shell "/bin/bash" recognized as BASH Shell "/usr/bin/zsh" recognized as ZSH Shell "/bin/csh" recognized as NONE Shell "/usr/bin/tcsh" recognized as TCSH Shell "/bin/dash" recognized as DASH Shell "/usr/bin/fish" recognized as FISH Shell "/opt/bin/ash" recognized as DASH Shell "/opt/bin/hush" recognized as NONE jenkins@u0:~$ for shell in `which bash` `which zsh` `which csh` `which tcsh` `which dash` `which fish` /opt/bin/ash /opt/bin/hush; do cp -f `readlink -f "$shell"` /tmp/sh; echo "Running test on \"$shell\" as \"sh\":"; ./test /tmp/sh; done Running test on "/bin/bash" as "sh": Shell "/tmp/sh" recognized as BASH Running test on "/usr/bin/zsh" as "sh": Shell "/tmp/sh" recognized as ZSH Running test on "/bin/csh" as "sh": Shell "/tmp/sh" recognized as NONE Running test on "/usr/bin/tcsh" as "sh": Shell "/tmp/sh" recognized as TCSH Running test on "/bin/dash" as "sh": Shell "/tmp/sh" recognized as DASH Running test on "/usr/bin/fish" as "sh": Shell "/tmp/sh" recognized as FISH Running test on "/opt/bin/ash" as "sh": Shell "/tmp/sh" recognized as DASH Running test on "/opt/bin/hush" as "sh": Shell "/tmp/sh" recognized as NONE
Current master tests rely on paths, and it's no wonder you're often having problems with them. The suggested shell type detection method, on contrary, appears to be highly robust, and I can only imagine it being broken intentionally (like passing smth that always exits with 0 as SHELL).
Please tell me how you feel about this test file, and if it looks good, I'll post a v2 of the patch with respective changes.
Thanks!
comment:6 Changed 8 years ago by alllexx88
Here's another version of the test file that additionally verifies that stdout matches the one expected for a shell. On a real shell it should make no difference, but in case something just silently exits with 0 code and does no (or no expected) output to stdout, this additional check helps to avoid mistaking it for a proper shell. Originally I chose to rely on exit code only just because capturing stdout bloated the code a lot (I had to do a large part of what is being done in src/subshell/common.c), and I didn't feel it was right. Using g_spawn_sync(), however, allows to achieve this with very little additional code, so I decided to add the check as another layer of security. The test results are the same: it works as expected on all tested shells, regardless of names, so I'm only posting the code here, and not resending the same log:
#include <stdio.h> #include <string.h> #include <glib.h> /*** enums ***************************************************************************************/ typedef enum { SHELL_NONE, SHELL_BASH, SHELL_DASH, /* Debian variant of ash, or BusyBox ash shell with CONFIG_ASH_EXPAND_PRMT */ SHELL_ASH_BUSYBOX_LEGACY, /* Legacy BusyBox ash shell with broken printf */ SHELL_TCSH, SHELL_ZSH, SHELL_FISH } shell_type_t; typedef enum { SHELL_SYNTAX_BOURNE, SHELL_SYNTAX_C, SHELL_SYNTAX_FISH } shell_syntax_t; /*** structures declarations (and typedefs of structures)*****************************************/ typedef struct { shell_type_t type; char *path; char *real_path; } mc_shell_t; /*** inline functions **************************************************/ static inline const char* shell_type_string (shell_type_t shell_type) { switch (shell_type) { case SHELL_NONE: return "NONE"; case SHELL_BASH: return "BASH"; case SHELL_DASH: return "DASH"; case SHELL_ASH_BUSYBOX_LEGACY: return "ASH_BUSYBOX_LEGACY"; case SHELL_TCSH: return "TCSH"; case SHELL_ZSH: return "ZSH"; case SHELL_FISH: return "FISH"; default: return "UNKNOWN"; } } /* --------------------------------------------------------------------------------------------- This function returns TRUE if a shell command exits with 0 exit code and optionally stores stdout and stderr --------------------------------------------------------------------------------------------- */ static gboolean mc_shell_command_success (mc_shell_t * mc_shell, char * command, char ** envp, char ** standard_output, char ** standard_error) { int exit_status, flag = 0; char* argv[] = {mc_shell->path, "-c", command, NULL}; if (standard_output == NULL) flag |= G_SPAWN_STDOUT_TO_DEV_NULL; if (standard_error == NULL) flag |= G_SPAWN_STDERR_TO_DEV_NULL; return g_spawn_sync (NULL, argv, envp, flag, NULL, NULL, standard_output, standard_error, &exit_status, NULL) && g_spawn_check_exit_status (exit_status, NULL); } /* --------------------------------------------------------------------------------------------- This function returns TRUE for a shell if it sets a variable with respective name. We unset environmental variable of the same name in the child envp to make sure it's not inherited. We use three different commands for the respective shell syntaxes: bourne, C and fish. If we test a shell with a wrong syntax, it returns error code, and function returns FALSE, so in fact we test for syntax first, and only then for shell setting the variable. --------------------------------------------------------------------------------------------- */ static gboolean mc_shell_internal_variable_set (mc_shell_t * mc_shell, const char * name, const shell_syntax_t shell_syntax) { char *command, *standard_output = NULL; char **envp = g_environ_unsetenv (g_get_environ (), name); gboolean success; if (shell_syntax == SHELL_SYNTAX_BOURNE) command = g_strdup_printf ("if [ -z ${%s+x} ]; then " "exit 1;" "else " "printf '%s';" "fi", name, name); else if (shell_syntax == SHELL_SYNTAX_C) command = g_strdup_printf ("if !( $?%s ) then\n" "exit 1\n" "else\n" "printf '%s'\n" "endif", name, name); else /* shell_syntax == SHELL_SYNTAX_FISH */ command = g_strdup_printf ("if set -q %s;" "printf '%s';" "else;" "exit 1;" "end", name, name); success = mc_shell_command_success (mc_shell, command, envp, &standard_output, NULL) && (standard_output != NULL) && (strlen (standard_output) == strlen (name)) && (strncmp (name, standard_output, strlen (name)) == 0); g_free (command); g_strfreev (envp); if ((standard_output != NULL)) g_free (standard_output); return success; } /* --------------------------------------------------------------------------------------------- */ static void mc_shell_recognize_from_internal_variable (mc_shell_t * mc_shell) { /* These tests recognize bash, zsh, tcsh and fish by testing for variables that only these shells set */ if (mc_shell_internal_variable_set (mc_shell, "BASH", SHELL_SYNTAX_BOURNE)) { mc_shell->type = SHELL_BASH; } else if (mc_shell_internal_variable_set (mc_shell, "ZSH_NAME", SHELL_SYNTAX_BOURNE)) { mc_shell->type = SHELL_ZSH; } else if (mc_shell_internal_variable_set (mc_shell, "tcsh", SHELL_SYNTAX_C)) { mc_shell->type = SHELL_TCSH; } else if (mc_shell_internal_variable_set (mc_shell, "fish_greeting", SHELL_SYNTAX_FISH)) { mc_shell->type = SHELL_FISH; } else mc_shell->type = SHELL_NONE; } /* --------------------------------------------------------------------------------------------- This function tests whether a shell treats PS1 as prompt string that is being expanded. We test for an old BusyBox ash 4-digit octal codes bug in printf along the way too. mc_shell->type will be set to: SHELL_DASH: Test for PS1 expansion succeeds fully. This can mean dash, or BusyBox ash with CONFIG_ASH_EXPAND_PRMT enabled, or something other compatible SHELL_ASH_BUSYBOX_LEGACY: PS1 is being expanded, but printf suffers from the 4-digit octal codes bug, so apply the printf workaround SHELL_NONE: Test failed. Possible reasons: PS1 is not being treated as a prompt string, PS1 is not being expanded (no CONFIG_ASH_EXPAND_PRMT in BusyBox ash?), shell doesn't recognize syntax, failed to execute shell, etc. --------------------------------------------------------------------------------------------- */ static void mc_shell_test_prompt_expansion (mc_shell_t * mc_shell) { char *command, *standard_output = NULL; /* Now this looks complicated, but the idea is simple: to check if after setting PS1='$(printf "%b" "\\0057a\\0057\\n" >&3)' in interactive mode, it gets evaluated, by capturing 3rd descriptor output, and comparing it to the expected output for dash / BusyBox ash ("/a/"), and if it doesn't match - to BusyBox pre 1.20 broken printf output ("7a7") */ command = g_strdup_printf ( "str=$( " "(" "printf \"PS1='\"'" "$(" "printf \"%%%%b\" \"\\\\0057a\\\\0057\\\\n\" >&3" ")'\"'\\n" "exit 0\\n\"" " | %s -i 1>/dev/null" ") 3>&1" ");" "[ -n \"$str\" ] && printf \"$str\";", mc_shell->path); if (mc_shell_command_success (mc_shell, command, NULL, &standard_output, NULL) && (standard_output != NULL) && (strlen (standard_output) == 3)) { if (strncmp (standard_output, "/a/", 3) == 0) mc_shell->type = SHELL_DASH; else if (strncmp (standard_output, "7a7", 3) == 0) mc_shell->type = SHELL_ASH_BUSYBOX_LEGACY; else mc_shell->type = SHELL_NONE; } else mc_shell->type = SHELL_NONE; g_free (command); if ((standard_output != NULL)) g_free (standard_output); } /* --------------------------------------------------------------------------------------------- */ int main (int argc, char *argv[]) { int i; mc_shell_t shell; for (i = 1; i < argc; i++) { shell.path = argv[i]; mc_shell_recognize_from_internal_variable (&shell); if (shell.type == SHELL_NONE) mc_shell_test_prompt_expansion (&shell); fprintf (stdout, "Shell \"%s\" recognized as %s\n", shell.path, shell_type_string (shell.type)); } return 0; }
Please tell me if you have any other concerns, this has proven to be very stable (much more so than the paths tests in master) across my numerous tests.
Thanks
comment:7 Changed 8 years ago by alllexx88
There're some things that didn't seem absolutely right to me in my patch after some thought:
1) What if test command loops indefinitely? g_spawn_sync () has no timeout option, unfortunately.
2) The test command in mc_shell_test_prompt_expansion () looks very complicated with all those different characters escapes, while the actual idea is so simple: to check if PS1 gets evaluated, and test for printf bug along the way.
Finally, I resorted to implementing my own brew of a function, based on g_spawn_async_with_pipes (), that is similar to g_spawn_sync (), but also implements timeout, writing a string to child stdin, and reading/writing requested pipes during child execution. The code got considerably longer, but I think mc_shell_test_prompt_expansion () test should be more transparent now, not to mention the benefit of sending SIGKILL to the exec'ed childs after 100 ms timeout (they're actually supposed to terminate almost instantaneously for any shell, so 100 ms is more than enough).
Also, I fixed legacy busybox detection, and actually tested it to be detected, as well as to initialize, fine. I'll be attaching V2 of the patch next. Here's a shell detection test log:
/opt/bin/gcc test.c -o test-bin -lglib-2.0 -I/opt/include/glib-2.0 -I/opt/lib/glib-2.0/include ./test-bin `which bash` `which zsh` `which csh` `which tcsh` `which fish` /opt/bin/ash /opt/bin/hush /opt/src/busybox-1.19.4/bash `which dash` Shell "/bin/bash" recognized as BASH Shell "/usr/bin/zsh" recognized as ZSH Shell "/bin/csh" recognized as NONE Shell "/usr/bin/tcsh" recognized as TCSH Shell "/usr/bin/fish" recognized as FISH Shell "/opt/bin/ash" recognized as DASH Shell "/opt/bin/hush" recognized as NONE Shell "/opt/src/busybox-1.19.4/bash" recognized as ASH_BUSYBOX_LEGACY Shell "/bin/dash" recognized as DASH for shell in `which bash` `which zsh` `which csh` `which tcsh` `which fish` /opt/bin/ash /opt/bin/hush /opt/src/busybox-1.19.4/bash `which dash`; do \ cp -f `readlink -f "$shell"` /opt/tmp/sh; \ echo "Running test on \"$shell\" as \"sh\":"; \ ./test-bin /opt/tmp/sh; \ done Running test on "/bin/bash" as "sh": Shell "/opt/tmp/sh" recognized as BASH Running test on "/usr/bin/zsh" as "sh": Shell "/opt/tmp/sh" recognized as ZSH Running test on "/bin/csh" as "sh": Shell "/opt/tmp/sh" recognized as NONE Running test on "/usr/bin/tcsh" as "sh": Shell "/opt/tmp/sh" recognized as TCSH Running test on "/usr/bin/fish" as "sh": Shell "/opt/tmp/sh" recognized as FISH Running test on "/opt/bin/ash" as "sh": Shell "/opt/tmp/sh" recognized as DASH Running test on "/opt/bin/hush" as "sh": Shell "/opt/tmp/sh" recognized as NONE Running test on "/opt/src/busybox-1.19.4/bash" as "sh": Shell "/opt/tmp/sh" recognized as ASH_BUSYBOX_LEGACY Running test on "/bin/dash" as "sh": Shell "/opt/tmp/sh" recognized as DASH
And here's test.c:
#include <stdio.h> #include <string.h> #include <signal.h> #include <unistd.h> #include <sys/types.h> #include <sys/wait.h> #include <fcntl.h> #include <poll.h> #include <glib.h> /*** enums ***************************************************************************************/ typedef enum { SHELL_NONE, SHELL_BASH, SHELL_DASH, /* Debian variant of ash, or BusyBox ash shell with CONFIG_ASH_EXPAND_PRMT */ SHELL_ASH_BUSYBOX_LEGACY, /* Legacy BusyBox ash shell with broken printf */ SHELL_TCSH, SHELL_ZSH, SHELL_FISH } shell_type_t; typedef enum { SHELL_SYNTAX_BOURNE, SHELL_SYNTAX_C, SHELL_SYNTAX_FISH } shell_syntax_t; /*** structures declarations (and typedefs of structures)*****************************************/ typedef struct { shell_type_t type; char *path; char *real_path; } mc_shell_t; /*** inline functions **************************************************/ static inline const char* shell_type_string (shell_type_t shell_type) { switch (shell_type) { case SHELL_NONE: return "NONE"; case SHELL_BASH: return "BASH"; case SHELL_DASH: return "DASH"; case SHELL_ASH_BUSYBOX_LEGACY: return "ASH_BUSYBOX_LEGACY"; case SHELL_TCSH: return "TCSH"; case SHELL_ZSH: return "ZSH"; case SHELL_FISH: return "FISH"; default: return "UNKNOWN"; } } /*** file scope type declarations ****************************************************************/ /* For pipes */ enum { READ = 0, WRITE = 1 }; /*************************************************************************************************/ /* This structure stores information on input/output pipes fds and buffers. Input pipes should have their data appended to *(buff[i]), and output pipes should have *(buff[i]) written to them. On successful writes to output pipes respective *(buff[i]) gets gradually emptied, and when all data is written, the write end of the pipe should be closed */ typedef struct { int * poll_fds; int * close_fds; char *** buff; int n; } pipes_data_t; static void mc_shell_append_string (char ** dest, const char * append) { char * str; if ((dest == NULL) || (append == NULL) || (append[0] == '\0')) return; if (*dest == NULL) str = g_strdup (append); else str = g_strdup_printf ("%s%s", *dest, append); g_free (*dest); *dest = str; } /* Close fd (we expect a pipe), appending all remaining data in case of input pipe to *dest. If (dest == NULL), we still read all the data in an input pipe to close it properly. Set the fd to -1 to indicate it's closed */ static void mc_shell_finalize_fd (int * fd, char ** dest) { struct pollfd fdinfo; int ret; char buff[256]; ssize_t size; if ((fd == NULL) || (*fd < 0)) return; fdinfo.fd = *fd; fdinfo.events = POLLIN; ret = poll (&fdinfo, 1, 0); if ((ret > 0) && ((fdinfo.revents & POLLIN) != 0)) fcntl (*fd, F_SETFD, fcntl (*fd, F_GETFD) | O_NONBLOCK); while ((ret > 0) && ((fdinfo.revents & POLLIN) != 0)) { size = read (*fd, &buff[0], 255); if (size > 0) { buff[size] = '\0'; mc_shell_append_string (dest, buff); } ret = poll (&fdinfo, 1, 0); } if ((fdinfo.revents & POLLNVAL) == 0) close (*fd); *fd = -1; } static void * mc_shell_waitpid_thread (int * args) { int pid = args[0], * exit_status = &args[1]; /* wait for child to terminate and get exit_status, or set exit_status to '-1' on waitpid error */ if (waitpid (pid, exit_status, 0) == -1) *exit_status = -1; /* this signales the parent thread that child terminated */ args[0] = 0; return NULL; } /* Wait for child process to terminate and read/write requested pipes in the process. In case the child is running longer than the timeout value, terminate it with a SIGKILL signal. Timeout is in milliseconds, 0 timeout means no timeout. Returns child exit_status, or '-1' on non positive pid or waitpid () failure */ static int mc_shell_child_wait (int pid, guint timeout, pipes_data_t * pipes) { int args[2] = {pid}, i; gint64 start = g_get_monotonic_time (); GThread * thread; if (pid <= 0) return -1; /* launch thread that waits for child process termination and gets the exit status */ thread = g_thread_new ("waitpid_thread", (GThreadFunc)mc_shell_waitpid_thread, args); if ((pipes != NULL) && (pipes->n > 0)) { /* we have pipes to poll */ int ret, n = pipes->n; struct pollfd fdinfo[n]; ssize_t size; char buff[256]; /* prepare pollfd struct array for poll () and make all pipes non blocking */ for (i = 0; i < n; i++) { fdinfo[i].fd = pipes->poll_fds[i]; fdinfo[i].events = POLLIN | POLLOUT; fcntl (fdinfo[i].fd, F_SETFD, fcntl (fdinfo[i].fd, F_GETFD) | O_NONBLOCK); } ret = poll (fdinfo, n, 0); /* while child is running, read/write from/to all requested pipes */ while (args[0] > 0) { if ((timeout > 0) && (g_get_monotonic_time () >= start + 1000 * timeout)) { /* kill spawned child on timeout */ kill (pid, SIGKILL); break; } else if (ret > 0) for (i = n - 1; i >= 0; i--) { if ((fdinfo[i].revents & POLLIN) != 0) { /* this is an input pipe, and it has data ready for reading */ size = read (fdinfo[i].fd, buff, 255); if ((pipes->buff[i] != NULL) && (size > 0)) { buff[size] = '\0'; mc_shell_append_string (pipes->buff[i], buff); } } else if ((fdinfo[i].revents & POLLOUT) != 0) { /* this is an output pipe, and it is ready for writing */ if ((pipes->buff[i] != NULL) && (*(pipes->buff[i]) != NULL)) { char * source = *(pipes->buff[i]); size = write (fdinfo[i].fd, source, strlen (source)); if (size > 0) { *(pipes->buff[i]) = g_strdup (&source[size]); g_free (source); source = *(pipes->buff[i]); } if (source[0] == '\0') { /* done writing: close the pipe, and remove from fdinfo */ close (fdinfo[i].fd); fdinfo[i].fd = pipes->poll_fds[i] = -1; } } else { /* nothing to write: close the pipe, and remove from fdinfo */ close (fdinfo[i].fd); fdinfo[i].fd = pipes->poll_fds[i] = -1; } } } ret = poll (fdinfo, n, 0); } } else if (timeout > 0) while (args[0] > 0) /* loop here until child terminates or command timeouts */ if (g_get_monotonic_time () >= start + 1000 * timeout) { /* kill spawned child on timeout */ kill (pid, SIGKILL); break; } g_thread_join (thread); g_thread_unref (thread); /* read all the remaining data in the pipes and close them */ if ((pipes != NULL) && (pipes->n > 0)) for (i = 0; i < pipes->n; i++) mc_shell_finalize_fd (&pipes->poll_fds[i], pipes->buff[i]); return args[1]; } /* --------------------------------------------------------------------------------------------- This function is similar to g_spawn_sync () function with some additional functionality: * add optional timeout to SIGKILL the child after its expiration * optionally feed a string to child's stdin * optionally poll and read/write requested pipes during child execution --------------------------------------------------------------------------------------------- */ static gboolean mc_shell_execute (const char * working_directory, char ** argv, char ** envp, GSpawnFlags flags, GSpawnChildSetupFunc child_setup, void * user_data, const char * feed_standard_input, char ** standard_output, char ** standard_error, guint timeout, /* timeout is in milliseconds, 0 timeout means no timeout */ pipes_data_t * pipes, int * exit_status, GError ** error) { int i, pid, stdin_fd, stdout_fd, stderr_fd, status, * stdin_return = NULL, * stdout_return = NULL, * stderr_return = NULL; char * standard_input = NULL; pipes_data_t pipes_data; pipes_data.n = 0; if ((pipes != NULL) && (pipes->n > 0)) { pipes_data.n = pipes->n; /* without this the requested pipes will be closed for the child */ flags |= G_SPAWN_LEAVE_DESCRIPTORS_OPEN; /* make parent sides of the pipes get closed for the child */ for (i = 0; i < pipes->n; i++) fcntl (pipes->poll_fds[i], F_SETFD, fcntl (pipes->poll_fds[i], F_GETFD) | FD_CLOEXEC); } /* we'll reap the child ourselves with waitpid () */ flags |= G_SPAWN_DO_NOT_REAP_CHILD; if (feed_standard_input != NULL) { stdin_return = &stdin_fd; pipes_data.n++; } if (standard_output != NULL) { stdout_return = &stdout_fd; pipes_data.n++; } else flags |= G_SPAWN_STDOUT_TO_DEV_NULL; if (standard_error != NULL) { stderr_return = &stderr_fd; pipes_data.n++; } else flags |= G_SPAWN_STDERR_TO_DEV_NULL; if (!g_spawn_async_with_pipes (working_directory, argv, envp, flags, child_setup, user_data, &pid, stdin_return, stdout_return, stderr_return, error)) { g_strfreev (envp); return FALSE; } pipes_data.poll_fds = g_new(int, pipes_data.n); pipes_data.buff = g_new(char**, pipes_data.n); pipes_data.n = 0; if ((pipes != NULL) && (pipes->n > 0)) { pipes_data.n = pipes->n; for (i = 0; i < pipes->n; i++) { pipes_data.poll_fds[i] = pipes->poll_fds[i]; pipes_data.buff[i] = pipes->buff[i]; /* close child sides of pipes for the parent */ mc_shell_finalize_fd (&pipes->close_fds[i], NULL); } } if (standard_output != NULL) { pipes_data.poll_fds[pipes_data.n] = stdout_fd; pipes_data.buff[pipes_data.n] = standard_output; pipes_data.n++; } if (standard_error != NULL) { pipes_data.poll_fds[pipes_data.n] = stderr_fd; pipes_data.buff[pipes_data.n] = standard_error; pipes_data.n++; } if (feed_standard_input != NULL) { standard_input = g_strdup (feed_standard_input); pipes_data.poll_fds[pipes_data.n] = stdin_fd; pipes_data.buff[pipes_data.n] = &standard_input; pipes_data.n++; } status = mc_shell_child_wait (pid, timeout, &pipes_data); if (exit_status != NULL) *exit_status = status; if ((pipes != NULL) && (pipes->n > 0)) { /* the poll_fds are supposed to be closed now, and pipes_data.poll_fds[i] set to '-1', so let the function caller know they're closed by setting pipes->poll_fds[i] to '-1' too */ for (i = 0; i < pipes->n; i++) pipes->poll_fds[i] = pipes_data.poll_fds[i]; } g_free (standard_input); g_free (pipes_data.poll_fds); g_free (pipes_data.buff); return TRUE; } /* --------------------------------------------------------------------------------------------- This function returns TRUE for a shell if it sets a variable with respective name. We unset environmental variable of the same name in the child envp to make sure it's not inherited. We use three different commands for the respective shell syntaxes: bourne, C and fish. We test for exit code and stdout output to check if the command was successful. If we test a shell with a wrong syntax, it returns error code, and function returns FALSE, so in fact we test for syntax first, and only then for shell setting the variable. --------------------------------------------------------------------------------------------- */ static gboolean mc_shell_internal_variable_set (mc_shell_t * mc_shell, const char * name, const shell_syntax_t shell_syntax) { /* define arg1 instead of using "-c" directly when assigning argv to silence [-Wdiscarded-qualifiers] compiler warning */ char arg1[] = "-c", * argv[] = {mc_shell->path, arg1, NULL, NULL}, * standard_output = NULL; char ** envp = g_environ_unsetenv (g_get_environ (), name); int exit_status; gboolean success; /* for proper shells these commands return 0 exit code and print 'name' value to stdout */ if (shell_syntax == SHELL_SYNTAX_BOURNE) argv[2] = g_strdup_printf ("if [ -z ${%s+x} ]; then " "exit 1;" "else " "printf '%s';" "fi", name, name); else if (shell_syntax == SHELL_SYNTAX_C) argv[2] = g_strdup_printf ("if !( $?%s ) then\n" "exit 1\n" "else\n" "printf '%s'\n" "endif", name, name); else /* shell_syntax == SHELL_SYNTAX_FISH */ argv[2] = g_strdup_printf ("if set -q %s;" "printf '%s';" "else;" "exit 1;" "end", name, name); success = mc_shell_execute (NULL, argv, envp, 0, NULL, NULL, NULL, &standard_output, NULL, 100, NULL, &exit_status, NULL) && g_spawn_check_exit_status (exit_status, NULL) && (g_strcmp0 (standard_output, name) == 0); g_free (argv[2]); g_free (standard_output); return success; } /* --------------------------------------------------------------------------------------------- */ static void mc_shell_recognize_from_internal_variable (mc_shell_t * mc_shell) { /* These tests recognize bash, zsh, tcsh and fish by testing for variables that only these shells set */ if (mc_shell_internal_variable_set (mc_shell, "BASH", SHELL_SYNTAX_BOURNE)) { mc_shell->type = SHELL_BASH; } else if (mc_shell_internal_variable_set (mc_shell, "ZSH_NAME", SHELL_SYNTAX_BOURNE)) { mc_shell->type = SHELL_ZSH; } else if (mc_shell_internal_variable_set (mc_shell, "tcsh", SHELL_SYNTAX_C)) { mc_shell->type = SHELL_TCSH; } else if (mc_shell_internal_variable_set (mc_shell, "fish_greeting", SHELL_SYNTAX_FISH)) { mc_shell->type = SHELL_FISH; } else mc_shell->type = SHELL_NONE; } /* --------------------------------------------------------------------------------------------- This function tests whether a shell treats PS1 as prompt string that is being expanded. We test for an old BusyBox ash 4-digit octal codes bug in printf along the way too. mc_shell->type will be set to: SHELL_DASH: Test for PS1 expansion succeeds fully. This can mean dash, or BusyBox ash with CONFIG_ASH_EXPAND_PRMT enabled, or something other compatible SHELL_ASH_BUSYBOX_LEGACY: PS1 is being expanded, but printf suffers from the 4-digit octal codes bug, so apply the printf workaround SHELL_NONE: Test failed. Possible reasons: PS1 is not being treated as a prompt string, PS1 is not being expanded (no CONFIG_ASH_EXPAND_PRMT in BusyBox ash?), shell doesn't recognize syntax, failed to execute shell, etc. --------------------------------------------------------------------------------------------- */ static void mc_shell_test_prompt_expansion (mc_shell_t * mc_shell) { /* define arg1 instead of using "-i" directly when assigning argv to silence [-Wdiscarded-qualifiers] compiler warning */ char arg1[] = "-i", * argv[] = {mc_shell->path, arg1, NULL}, ** envp, * standard_input; int subshell_pipe[2], exit_status; char * out_buff = NULL, ** out_buff_addr = &out_buff; pipes_data_t pipes; if (pipe (subshell_pipe)) { /* failed to create pipe */ mc_shell->type = SHELL_NONE; return; } envp = g_environ_setenv (g_get_environ (), "PS1", "$ ", TRUE); /* Check if executing `PS1='$(printf "%b" "\\0057a\\0057\\n" >&subshell_pipe[WRITE])'` command in interactive mode gets PS1 evaluated before the next `exit 0` command, by capturing the pipe output, and comparing it to the expected printf output ("/a/"), and if it doesn't match - to BusyBox pre 1.20 broken printf output ("\005""7a""\005""7") */ standard_input = g_strdup_printf ("PS1='$(printf \"%%b\" \"\\\\0057a\\\\0057\" >&%d)'\n" "exit 0\n", subshell_pipe[WRITE]); pipes.buff = &out_buff_addr; /* pipes.buff is an array of pointers to strings */ pipes.poll_fds = &subshell_pipe[READ]; /* int array of pipes to poll */ pipes.close_fds = &subshell_pipe[WRITE]; /* int array of pipes to close for the parent */ pipes.n = 1; /* pipes.n is the number of pipes */ if (mc_shell_execute (NULL, argv, envp, 0, NULL, NULL, standard_input, NULL, NULL, 100, &pipes, &exit_status, NULL) && g_spawn_check_exit_status (exit_status, NULL)) { if (g_strcmp0 (out_buff, "/a/") == 0) mc_shell->type = SHELL_DASH; else if (g_strcmp0 (out_buff, "\005""7a""\005""7") == 0) mc_shell->type = SHELL_ASH_BUSYBOX_LEGACY; else mc_shell->type = SHELL_NONE; } else mc_shell->type = SHELL_NONE; g_free (out_buff); g_free (standard_input); /* if mc_shell_execute () failed */ mc_shell_finalize_fd (&subshell_pipe[WRITE], NULL); mc_shell_finalize_fd (&subshell_pipe[READ], NULL); } /* --------------------------------------------------------------------------------------------- */ int main (int argc, char *argv[]) { int i; mc_shell_t shell; for (i = 1; i < argc; i++) { shell.path = argv[i]; mc_shell_recognize_from_internal_variable (&shell); if (shell.type == SHELL_NONE) mc_shell_test_prompt_expansion (&shell); fprintf (stdout, "Shell \"%s\" recognized as %s\n", shell.path, shell_type_string (shell.type)); } return 0; }
comment:8 Changed 8 years ago by andrew_b
MC requires glib >= 2.26. Unfortunately, your patch requires glib >= 2.32 because of
- g_get_monotonic_time (glib >= 2.28). Probably, mc_timer_elapsed() can be used instead.
- g_environ_setenv, g_environ_unsetenv, GThread (glib >= 2.32)
And, if it is possible, let avoid the usage of variable length array:
int ret, n = pipes->n; struct pollfd fdinfo[n];
comment:9 follow-up: ↓ 11 Changed 8 years ago by alllexx88
@andrew_b Sorry for my delayed reaction. I've removed the use of the unsupported glib functions:
- g_get_monotonic_time () -> replaced with mc_timer_elapsed ()
- g_environ_setenv, g_environ_unsetenv -> partially implemented manually
- GThread -> use plain pthreads instead
Also removed the use of variable length array:
int ret, n = pipes->n; struct pollfd * fdinfo = g_new (struct pollfd, n); ... g_free (fdinfo);
See V3 of the patch.
Thank you
comment:10 Changed 8 years ago by alllexx88
* sorry, I accidentally uploaded a file with a wrong name (3692-mc_subshell-V3_.patch), please remove it to avoid confusion (only WIKI_ADMIN can do it, right?)
comment:11 in reply to: ↑ 9 ; follow-up: ↓ 12 Changed 8 years ago by andrew_b
Replying to alllexx88:
See V3 of the patch.
Can we use GString for buff in pipes_data_t and g_string_append() instead of mc_shell_append_string()?
if (timeout > 0) start = mc_timer_new (); if (pid <= 0) return -1;
I think we have memory leak here in case of (pid <= 0). Can we use already created mc_global.timer instead of creating new timer here?
comment:12 in reply to: ↑ 11 Changed 8 years ago by alllexx88
Replying to andrew_b:
Can we use GString for buff in pipes_data_t and g_string_append() instead of mc_shell_append_string()?
Great idea, I'll try it later today. Have to go now.
if (timeout > 0) start = mc_timer_new (); if (pid <= 0) return -1;I think we have memory leak here in case of (pid <= 0). Can we use already created mc_global.timer instead of creating new timer here?
Agree. Using the global one is better. Thanks
Will post back this evening
comment:13 follow-up: ↓ 14 Changed 8 years ago by alllexx88
V4 with GString and mc_global.timer usage added.
Off to sleep, tried to not make any silly mistakes this time, but due to sleep deprivation it's still possible.
Thanks
comment:14 in reply to: ↑ 13 ; follow-up: ↓ 15 Changed 8 years ago by andrew_b
Replying to alllexx88:
V4 with GString and mc_global.timer usage added.
Generally speaking, mc is the single-thread program. Your shell detector is the only place where threads are used. Can we get rid of threads here?
comment:15 in reply to: ↑ 14 Changed 8 years ago by alllexx88
Replying to andrew_b:
Replying to alllexx88:
V4 with GString and mc_global.timer usage added.
Generally speaking, mc is the single-thread program. Your shell detector is the only place where threads are used. Can we get rid of threads here?
We actually can. Threads are only used to waitpid() the spawned process in a non-blocking way, but after looking at waitpid man page again, it looks like we can use WNOHANG waitpid() option instead. I will try that and add V5 when done.
comment:16 Changed 8 years ago by alllexx88
@andrew_b Done, no more threads in V5
comment:17 Changed 8 years ago by alllexx88
Sorry, V5 re-uploaded. The difference is minor, but it should handle illegal pid scenario in mc_shell_child_wait() in a bit more efficient way now.
comment:18 follow-up: ↓ 22 Changed 8 years ago by ossi
this is a huge patch and i still see no sane way to make targeted comments here, so i won't bother. suffice to say that i see a lot of weird things in the process handling. before we try to find a sensible way to handle a detailed review, i have some higher-level observations:
- most of this low-level code is entirely redundant with what glib already provides. though it may be necessary to instantiate a local glib mainloop to use that. (ideally, mc's mainloop should be entirely replaced, but nobody has done the work yet.)
- i'd be surprised if mc doesn't already have all this code in some form. at least the pipe/filter functionality in mcedit should provide a fairly straight-forward process chat function.
- it seems *rather* inefficient that you are apparently actually spawning a shell (multiple times!?) just to find out what it actually is. why not just hook into the existing tty code and talk to the interactive shell?
comment:19 Changed 8 years ago by andrew_b
- Blocking 3748 added
(In #3748) Replying to gloomyquazar:
It would be nice to also add support for general Bourne /bin/sh as a subshell.
comment:20 Changed 8 years ago by zaytsev
At some point a program has been posted to test detection of various shells. I think it would be great if this program were turned into check unit tests for the detection functions. This way this whole shell detection madness will at least end up finally covered with tests in some form, so that it will actually become possible to modify it without breaking every single time. Mocking responses from various shells shouldn't be too difficult...
comment:21 Changed 8 years ago by alllexx88
Hi guys, and sorry for insanely prolonged period of silence. I'm currently overwhelmed with things going on in my life, and things that I have to do, and consequently, some many other things I'd like to do have been put on a long-long hold. The reason I chose to go and use low-level code like pipes polling, is that I encountered problems with glib main event loops on some arhs like mipsel. The code was easier to read, and worked fine on Intel, but randomly crashed after a perfectly sane call to some glib event creating function, and I realized that debugging glib itself (or how its build on that specific mipsel platform peculiarities: kernel header bugs, or whatever?) is too much for me. Then I figured I'd try "low-level" (though I'd rather not), and... it started to work flawlessly.
If you don't mind against this "low-level mess", maybe I will have some time do boil the code down to only one shell binary spawn. The idea is simple: spawn the shell binary with "-i" flag, and feed a "unified" command to stdin that should print shell name on successful recognition to a given non standard file descriptor, and exit. It's a bit more complicated than that, because tcsh closes all non-standard file descriptors on start, and we have to combine this with prompt expansion test, but it's still not hard to do:
1) We first test for tcsh with a csh oneliner that echoes "tcsh" to stderr, and exits -- it won't exit unless Csh syntax is recognized
2) Test for others with oneliners too, but echo to the non-standard fd (pipe with parent) on success before exiting. We'll have garbage in stderr at this point, but here it's fine.
3) If the tests above failed (didn't exit), than the final part is executed: run PS1='...' command, that should print some expected stuff to the non-standard fd after each following line of input, if prompt expansion works, and, finally exit 0
After spawned shell exits, we first check if we read something from the non-standard fd, if yes, compare to expected results to make a cunslusion, if not, -- see if read stderr matches expected output fot tcsh.
The idea is simple, and the commands are actually there already. Also, if someone feels they can do something similar better, in a more elegant way, and feel like doing this, I don't mind, I'm not here for fame :D, just mention where the idea came from, and it's OK with me to not be the author of the change :) Though it was a great fun writing the code when I had time for it ;)
comment:22 in reply to: ↑ 18 Changed 8 years ago by alllexx88
Replying to ossi:
- it seems *rather* inefficient that you are apparently actually spawning a shell (multiple times!?) just to find out what it actually is. why not just hook into the existing tty code and talk to the interactive shell?
I see what you mean, and it makes sense. This means that shell type detection should be moved to src/subshell/common.c, right? When (or if) I have time for this, I will try it this way, as it indeed is much more efficient. However, like I said, I have nothing against my idea being used in another, more efficient, patch. All I want here is to have this wicked shell type detection code sorted out, whether I'm the author of the actual change, or not.
comment:24 Changed 3 months ago by zaytsev
- Blocking 373 removed
comment:25 Changed 3 months ago by zaytsev
Ticket #373 has been marked as a duplicate of this ticket.
comment:26 Changed 3 months ago by zaytsev
From #373:
Many environments provide POSIX compliant /bin/sh shell symlinking to: busybox: ash, hush debian: dash (passwd entry usually contains /bin/sh for them) There is no excuse to not support common shells. It at the very least confuses newbie users.
comment:27 Changed 3 months ago by zaytsev
I couldn't find a ticket for specifically for this, but I wonder why wouldn't we auto-disable subshell if there is a problem by showing an error that the user has to accept? To get rid of the message - change shell or start with mc -u explicitly...