Investigate process state and memory in Linux

Investigating the state of a JVM is generally easier, since Oracle ships the tools with the JDK (jstack, jstat, jconsole, …). But what can you do when a native process hangs for no apparent reason?

Linux provides strace, a great tool for following the system calls a process issues to the kernel. However, strace won't tell us what the process is actually doing at a higher level, for example:


# strace -s 128 -ffp 25617
Process 25617 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
poll([{fd=11, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI}], 1, 1000) = 0 (Timeout)
poll([{fd=11, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)

This trace tells us the process keeps calling poll() on file descriptor 11, waiting for it to become ready (each call times out, after 0 or 1000 ms), but we don't know what is actually running at a higher level (a curl request? an FTP stream?).
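Before reaching for a debugger, /proc can already narrow things down a little. The sketch below is an illustration, not part of the original workflow: the helper names proc_state and fd_target are hypothetical, and it assumes a Linux /proc and that you run it as root or as the owner of the process. It prints the scheduler state of the process and what a given descriptor (fd 11 in the trace above) points to.

```shell
#!/bin/sh
# Sketch: complement strace with two quick /proc lookups (Linux only).
# Hypothetical helpers; run as root or as the owner of the target process.

# proc_state PID: print the scheduler state, e.g. "S (sleeping)" or "D (disk sleep)"
proc_state() {
    awk '/^State:/ { sub(/^State:[ \t]*/, ""); print; exit }' "/proc/$1/status"
}

# fd_target PID FD: print what a descriptor refers to,
# e.g. "socket:[123456]", "pipe:[7890]" or a regular file path
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# Usage for the trace above (fd 11 of PID 25617):
#   proc_state 25617
#   fd_target 25617 11
```

If the descriptor turns out to be a socket, lsof -p on the same PID should show the remote endpoint, which hints at what the process is waiting for, though still not which code issued the call.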

To find out what is really happening, we need to grab the stack trace (backtrace) with GDB:

# gdb
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
(gdb) attach 20043
(gdb) bt

In this example 20043 is the PID of the process, and the output of bt shows the chain of function calls that led the process into poll():

#0 0x00007ff9063acb9f in poll () from /lib/libc.so.6
#1 0x00007ff8fe055296 in ?? () from /usr/lib/libcurl.so.4
#2 0x00007ff8fe04a935 in ?? () from /usr/lib/libcurl.so.4
#3 0x00007ff8fe26d289 in zif_curl_exec
(ht=252629312, return_value=0xf87890, return_value_ptr=0x3e8, this_ptr=0xffffffffffffffff, return_value_used=5754161)
at /root/php53/php-5.3.3/ext/curl/interface.c:2175
#4 0x00007ff9004634ee in zend_do_fcall_common_helper_SPEC
(execute_data=0x7ff8fa0f1150) at /root/php53/php-5.3.3/Zend/zend_vm_execute.h:316
#5 0x00007ff90043ebe1 in execute (op_array=0xf83628) at /root/php53/php-5.3.3/Zend/zend_vm_execute.h:107
#6 0x00007ff90040e9d1 in zend_execute_scripts
(type=0, retval=0x7fff0f0ed3b0, file_count=3) at /root/php53/php-5.3.3/Zend/zend.c:1266
#7 0x00007ff9003b6f05 in php_execute_script
(primary_file=Cannot access memory at address 0x80000f0ec2c0) at /root/php53/php-5.3.3/main/main.c:2289
#8 0x00007ff90049f8b9 in php_handler
(r=0x51505fc367380) at /root/php53/php-5.3.3/sapi/apache2handler/sapi_apache2.c:688
#9 0x0000000000439123 in ap_run_handler ()
#10 0x000000000043c6ef in ap_invoke_handler ()
#11 0x0000000000449740 in ap_internal_redirect ()
#12 0x00007ff8ff4d1b95 in ?? () from /usr/lib/apache2/modules/mod_rewrite.so
#13 0x0000000000439123 in ap_run_handler ()
#14 0x000000000043c6ef in ap_invoke_handler ()
#15 0x00000000004498de in ap_process_request ()
#16 0x0000000000446a08 in ?? ()
#17 0x0000000000440643 in ap_run_process_connection ()
#18 0x000000000044e580 in ?? ()
#19 0x000000000044e8d4 in ?? ()
#20 0x000000000044f516 in ap_mpm_run ()
#21 0x0000000000425be5 in main ()
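When you just want the trace (or need it for several processes), the same thing can be done without an interactive session. This is a sketch assuming a GDB recent enough to support -p, -batch and -ex, and that ptrace is permitted on the target (root, or the same user when kernel.yama.ptrace_scope allows it); backtrace_all is a hypothetical helper name. Keep in mind that the target is stopped for as long as GDB stays attached.

```shell
#!/bin/sh
# Sketch: non-interactive variant of the attach/bt session above.
# Assumes gdb is installed and ptrace is permitted for the target.
# The target is stopped only while gdb is attached; in -batch mode gdb
# detaches and exits after running the -ex command.

# backtrace_all PID: print a backtrace for every thread of PID (hypothetical helper)
backtrace_all() {
    gdb -p "$1" -batch -ex 'thread apply all bt'
}

# Usage:
#   backtrace_all 20043 > /tmp/bt-20043.txt
```

Using thread apply all bt instead of plain bt matters for threaded servers, where the thread that hangs is not necessarily the one GDB selects by default.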

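The backtrace answers our question: reading from frame #8 down to #3, Apache's mod_php handler (php_handler) is executing a PHP script that called curl_exec() (zif_curl_exec, interface.c:2175), and libcurl is now blocked in poll().

If we want to dig deeper into the process's configuration or data structures, we can of course debug it with GDB while it is running, but that requires very good knowledge of the code and its sources; otherwise the output is nearly impossible to understand.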

For configuration values and simple text-based data structures (such as environment settings), a good trick is to force a memory dump and read it directly.

To find the virtual address range of the heap, this command is enough:
# grep '\[heap\]' /proc/$PID/maps | cut -f1 -d' '
This prints the start and end addresses of the heap mapping in this form: 00d5b000-0480f000
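Since GDB wants the two addresses separately and in hex, a small filter can produce them directly. This is a sketch with a hypothetical helper name, heap_range; it reads maps data on standard input. One caveat worth stating: only the main brk heap is labelled [heap], while glibc may also serve allocations from anonymous mmap regions (large blocks, per-thread arenas), which this filter ignores.

```shell
#!/bin/sh
# Sketch: turn the [heap] line of /proc/PID/maps into the two 0x-prefixed
# addresses that GDB's "dump memory" expects. Reads maps data on stdin.
# Only the first [heap] mapping is reported (hypothetical helper).
heap_range() {
    awk '$NF == "[heap]" { split($1, a, "-"); print "0x" a[1], "0x" a[2]; exit }'
}

# Usage (as root or the process owner):
#   heap_range < /proc/$PID/maps
```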

Let's go back to GDB and dump that memory region; remember to pass the addresses in hex notation (with the 0x prefix):

(gdb) dump memory /tmp/mydump 0x00d5b000 0x0480f000

Now you can open /tmp/mydump with vim and browse the ASCII data; with some luck you will learn more about what's wrong with your process.
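Heap dumps are mostly binary, so it can be quicker to extract the readable text first and search it. A sketch assuming GNU binutils' strings(1) is available; search_dump is a hypothetical helper and the pattern in the usage line is only an example.

```shell
#!/bin/sh
# Sketch: search the dump for readable text instead of paging through it in vim.
# strings(1) extracts runs of printable characters; -a scans the whole file
# and -n 6 drops runs shorter than 6 characters to cut down the noise.

# search_dump FILE PATTERN: case-insensitive search of the printable text in FILE
search_dump() {
    strings -a -n 6 "$1" | grep -i -- "$2"
}

# Usage, on the dump file from the GDB command above (example pattern):
#   search_dump /tmp/mydump 'http://'
```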
Best of luck
