Development/Howto/Software Crash

From Mandriva Community Wiki

This page is intended to help you report useful information when you encounter a software crash, and perhaps fix it yourself.

Preliminaries

First, install the -debug packages of the application and of all the libraries that might be involved. This lets the debugging tools report more useful information.

To find the name of the debug package, here is an automated way:

$ rpm -qf --qf "%{SOURCERPM}\n" /usr/lib/libz.so.1.2.3 | sed 's/-[^-]*-[^-]*$/-debug/'
zlib-debug
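
The same rename can be applied to every library an application links against. The following is only a sketch: the function names debug_name_from_srpm and debug_packages_for are hypothetical, and it assumes that ldd and rpm are available and that the -debug naming shown above holds for every package.

```shell
# Turn a source RPM name into the matching -debug package name,
# using the same sed expression as the command above.
debug_name_from_srpm() {
    printf '%s\n' "$1" | sed 's/-[^-]*-[^-]*$/-debug/'
}

# List the -debug package names for a binary and for the shared
# libraries reported by ldd (hypothetical helper, not a wiki tool).
debug_packages_for() {
    { echo "$1"; ldd "$1" | awk '/=> \// { print $3 }'; } |
    while read -r file; do
        # Skip files that rpm cannot map to a package.
        srpm=$(rpm -qf --qf '%{SOURCERPM}\n' "$file") || continue
        debug_name_from_srpm "$srpm"
    done | sort -u
}
```

For example, debug_packages_for /bin/cat would print one candidate package name per line, which can then be passed to your package installer.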

If you forget some libraries, gdb shows the library filename when you request a backtrace; you can then install the matching debug package and restart gdb. In the following example, the -debug package for /lib64/libc.so.6 is missing:

(gdb) bt
#0  0x00002b526953c7ef in poll () from /lib64/libc.so.6
#1  0x00002b52691f412e in g_main_context_iterate (context=0x51f2d0, block=1, dispatch=1, self=Variable "self" is not available.
) at gc:2977
#2  0x00002b52691f45ea in IA__g_main_loop_run (loop=0x53ad10) at gc:2879

When this is done, you can start collecting useful information using the various tools listed below.

gdb

gdb is the GNU debugger, available in the gdb package. When a program crashes due to a segmentation fault or an abort, gdb lets you get a backtrace: the place in the program where the error occurred and the chain of calls that led there.

Running your application inside gdb

If you can reproduce the crash easily, then you can run your application inside gdb and get useful information at crash time.

  • Start gdb with the path to your application, then tell it to run, passing any parameters the application needs.
[pterjan@coin ~]$ gdb /bin/cat 
GNU gdb 6.3-7mdv2007.0 (Mandriva Linux release 2007.0)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-mandriva-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib64/libthread_db.so.1".

(gdb) run /proc
Starting program: /bin/cat /proc
/bin/cat: /proc: Is a directory

Program exited with code 01.
(gdb)
  • If you get messages about signal 33, tell gdb not to stop on them and to pass them to the application, then restart the program:
Program received signal SIG33, Real-time event 33.
[Switching to Thread 1182845264 (LWP 11543)]
0x00002b661d87d536 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
(gdb) handle SIG33 nostop noprint noignore pass
Signal        Stop      Print   Pass to program Description
SIG33         No        No      Yes             Real-time event 33
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) run
  • When the program crashes, you'll get a gdb prompt. Get the backtrace with the command bt full; this is the information needed in the bug report. If the previous messages mention threads, request the backtraces of all threads with thread apply all bt full.
(gdb) bt full
#0  0x00002b526953c7ef in poll () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002b52691f412e in g_main_context_iterate (context=0x51f2d0, block=1, dispatch=1, self=Variable "self" is not available.
) at gc:2977
        max_priority = 2147483647
        timeout = 30000
        some_ready = Variable "some_ready" is not available.
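
The interactive steps above can also be scripted, which is convenient when the backtrace has to be attached to a bug report. This is a minimal sketch, not an official procedure: the helper names write_crash_commands and crash_backtrace are hypothetical, and it relies only on gdb's -batch, -x and --args options.

```shell
# Write a gdb command file; the caller chooses the file name.
write_crash_commands() {
    cat > "$1" <<'EOF'
# Add "handle SIG33 nostop noprint noignore pass" here if gdb stops on signal 33.
run
thread apply all bt full
EOF
}

# Run a program under gdb in batch mode and save everything gdb prints.
# $1: output file; remaining arguments: the program and its parameters.
crash_backtrace() {
    out=$1; shift
    cmds=$(mktemp) || return 1
    write_crash_commands "$cmds"
    gdb -batch -x "$cmds" --args "$@" > "$out" 2>&1
    status=$?
    rm -f "$cmds"
    return $status
}
```

For example, crash_backtrace cat.txt /bin/cat /proc runs /bin/cat /proc under gdb and leaves the session output in cat.txt. If the program exits without crashing, gdb simply reports that there is no stack.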

Running gdb on a core file

If for some reason you can't easily reproduce the crash inside gdb but you have a core file, you can do a post-mortem analysis. Give gdb your application and the core file, e.g. gdb /bin/cat core.42, then ask for a backtrace with thread apply all bt full. If no core file is generated after a segfault, try running ulimit -c unlimited in the shell from which you start your application.
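
The post-mortem analysis can be done in one non-interactive command. The sketch below assumes a gdb version that accepts the -ex option; with older versions the same command can be placed in a file passed with -x. The function name core_backtrace is hypothetical.

```shell
# Extract the backtraces of all threads from a core file.
# $1: executable, $2: core file, $3: file that receives the backtrace.
core_backtrace() {
    gdb -batch -ex 'thread apply all bt full' "$1" "$2" > "$3" 2>&1
}
```

For example, core_backtrace /bin/cat core.42 cat-backtrace.txt writes the backtrace to cat-backtrace.txt, ready to be attached to a bug report.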

Attaching gdb to a running application

Debugging applications with complex startup, e.g. system services, can be tricky since gdb won't run a script and you may not know all of the parameters the script ends up using when it launches the actual binary executable.

In such cases (provided the crash doesn't occur during startup) you can attach gdb to a running instance of the application.

First, identify the process in which the crash will occur, e.g. with:
ps ax | grep appname
If the app is running multiple processes, note them all, cause the crash, and check syslog to see which of them actually crashed; when you restart the app, you can attach gdb to the process that occupies the same relative place in the list.

Next, attach gdb to the process using gdb executable-name process-id. gdb then attaches to that process and suspends it. To resume its execution, use gdb's c (continue) command.

Finally, cause the crash, which will result in gdb telling you about it and putting up a prompt, at which you can request a backtrace.
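
When a start-up script hides which binary is actually running, the executable name can be recovered from the process ID. This is a sketch under assumptions: it relies on the Linux /proc filesystem, you need permission to read the process's entry there (typically root for system services), and the function names exe_of_pid and attach_gdb are hypothetical.

```shell
# Print the path of the executable behind a process ID (Linux /proc).
exe_of_pid() {
    readlink "/proc/$1/exe"
}

# Attach gdb to a running process, looking up its executable first.
attach_gdb() {
    gdb "$(exe_of_pid "$1")" "$1"
}
```

For example, attach_gdb 11543 is equivalent to gdb executable-name 11543 with the executable name filled in automatically.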

If you have to run gdb from a tty, add >outputfile 2>&1 to the gdb command, which will redirect all output to that file. Of course, this means you won't see anything on the tty where gdb is running, so in another tty issue tail -f outputfile, which will let you see what's going on. Do realize, though, that any input you want to give gdb must be typed on the tty where gdb is running, and not on the tty where tail is running (where it will be ignored). When you're done, the output file will have a record of your gdb session.

strace

strace lists all the system calls made by the application (opening a file, reading from a network socket, and so on). This can help you find issues such as a missing file or a non-writable directory.

You can run it with strace -f -o outputfile command. For example, strace -f -o ls.strace ls /tmp will run ls /tmp and write all its system calls to ls.strace.
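
Trace files are often long, so it helps to filter them for the failures mentioned above. The sketch below is one possible filter, not part of strace itself: the function name missing_or_denied is hypothetical, and it assumes the usual strace line format in which a failed call ends with = -1 followed by the error name.

```shell
# Show the calls in an strace log that failed because a file was
# missing (ENOENT) or access was refused (EACCES).
# $1: the file written by strace -o.
missing_or_denied() {
    grep -E ' = -1 (ENOENT|EACCES) ' "$1"
}
```

For example, missing_or_denied ls.strace prints only the failing calls from the trace produced above. Many ENOENT lines are harmless, since programs commonly probe several paths before finding a file.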

ltrace

valgrind

When a program is run under Valgrind's supervision, all reads and writes of memory are checked, and calls to malloc/new/free/delete are intercepted. As a result, Valgrind can detect problems such as:

  • Use of uninitialised memory
  • Reading/writing memory after it has been free'd
  • Reading/writing off the end of malloc'd blocks
  • Reading/writing inappropriate areas on the stack
  • Memory leaks -- where pointers to malloc'd blocks are lost forever
  • Passing of uninitialised and/or unaddressable memory to system calls
  • Mismatched use of malloc/new/new [] vs free/delete/delete []
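
To collect such a report, run the program under valgrind and keep its output in a file. This is a minimal sketch, assuming the valgrind package is installed; the function name run_valgrind and the log file name are hypothetical, while --leak-check=full and --log-file are Valgrind options.

```shell
# Run a program under valgrind with full leak checking.
# $1: log file for valgrind's report; remaining arguments: the
# program and its parameters.
run_valgrind() {
    log=$1; shift
    valgrind --leak-check=full --log-file="$log" "$@"
}
```

For example, run_valgrind ls.valgrind ls /tmp runs ls /tmp and writes the report to a file based on that name. Expect the program to run much more slowly than usual, and install the -debug packages first so that the report includes source locations.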