Development/Howto/Software Crash

From Mandriva

Analyzing a software crash

This page aims to help you report useful information when you encounter a software crash, and perhaps fix it yourself.

Preliminaries

First, install the -debug packages for the application and for all the libraries that might be involved. They let the debugging tools report more useful information, such as function names, source lines and variable values.

To find the name of the debug package that matches a given file, query the name of its source RPM and replace the version and release with -debug:

$ rpm -qf --qf "%{SOURCERPM}\n" /usr/lib/libz.so.1.2.3 | sed 's/-[^-]*-[^-]*$/-debug/'
zlib-debug
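
To cover every library an application links against, the same query can be looped over the output of ldd. The following is a minimal sketch, not a Mandriva tool: the function names debug_pkg_name and debug_pkgs_for are made up for illustration, the source RPM file name on the last line is a hypothetical example, and the sketch assumes that every library belongs to an installed RPM package.

```shell
# Hypothetical helper: map a source RPM file name to its -debug package name,
# using the same sed expression as the command above.
debug_pkg_name() {
    printf '%s\n' "$1" | sed 's/-[^-]*-[^-]*$/-debug/'
}

# Hypothetical helper: list the -debug packages for a binary and for the
# shared libraries that ldd resolves for it. Needs rpm and ldd.
debug_pkgs_for() {
    {
        printf '%s\n' "$1"
        ldd "$1" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
    } | while read -r file; do
        debug_pkg_name "$(rpm -qf --qf '%{SOURCERPM}\n' "$file")"
    done | sort -u
}

# The version and release in this file name are made up for the example.
debug_pkg_name zlib-1.2.3-1mdv2007.0.src.rpm
```

On an RPM-based system, calling debug_pkgs_for /bin/cat should then print one -debug package name per line.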

If you miss some libraries, gdb shows the library filename in the backtrace (the "from ..." part of a frame); you can then install the matching debug package and restart gdb. In the following example, the -debug package for /lib64/libc.so.6 is missing:

(gdb) bt
#0  0x00002b526953c7ef in poll () from /lib64/libc.so.6
#1  0x00002b52691f412e in g_main_context_iterate (context=0x51f2d0, block=1, dispatch=1, self=Variable "self" is not available.
) at gc:2977
#2  0x00002b52691f45ea in IA__g_main_loop_run (loop=0x53ad10) at gc:2879
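
When a backtrace is long, the libraries that still lack debugging information can be picked out mechanically. The sketch below assumes that gdb prints "from <library>" only for frames without source information, as in frame #0 above; libs_without_debug is a hypothetical helper name, and the sample input is the backtrace from this example.

```shell
# Hypothetical helper: print each library named in a "from ..." frame once.
libs_without_debug() {
    sed -n 's|^#[0-9][0-9]* .* from \(/[^ ]*\)$|\1|p' | sort -u
}

# Feed it the backtrace from the example above.
libs_without_debug <<'EOF'
#0  0x00002b526953c7ef in poll () from /lib64/libc.so.6
#1  0x00002b52691f412e in g_main_context_iterate (context=0x51f2d0, block=1, dispatch=1, self=Variable "self" is not available.
) at gc:2977
#2  0x00002b52691f45ea in IA__g_main_loop_run (loop=0x53ad10) at gc:2879
EOF
```

For this input it prints /lib64/libc.so.6, whose debug package can then be looked up with the rpm -qf command shown earlier.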

When this is done, you can start collecting useful information using the various tools listed below.

gdb

gdb is the GNU debugger, available in the gdb package. When a program crashes because of a segmentation fault or an abort, gdb lets you get a backtrace: the place in the program where the error occurred and the chain of function calls that led there.

Running your application inside gdb

If you can reproduce the crash easily, then you can run your application inside gdb and get useful information at crash time.

  • Start gdb with the path to your application, then tell it to run the program, adding any parameters it needs.
[pterjan@coin ~]$ gdb /bin/cat 
GNU gdb 6.3-7mdv2007.0 (Mandriva Linux release 2007.0)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-mandriva-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib64/libthread_db.so.1".

(gdb) run /proc
Starting program: /bin/cat /proc
/bin/cat: /proc: Is a directory

Program exited with code 01.
(gdb)
  • If you get messages about signal 33, tell gdb not to stop on this signal and to pass it to the application, then restart the program:
Program received signal SIG33, Real-time event 33.
[Switching to Thread 1182845264 (LWP 11543)]
0x00002b661d87d536 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
(gdb) handle SIG33 nostop noprint noignore pass
Signal        Stop      Print   Pass to program Description
SIG33         No        No      Yes             Real-time event 33
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) run
  • When the program crashes, you get a gdb prompt. Get the backtrace with the command bt full; this is the information needed in the bug report. If the preceding messages mention threads, request the backtraces of all threads with thread apply all bt full.
(gdb) bt full
#0  0x00002b526953c7ef in poll () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002b52691f412e in g_main_context_iterate (context=0x51f2d0, block=1, dispatch=1, self=Variable "self" is not available.
) at gc:2977
        max_priority = 2147483647
        timeout = 30000
        some_ready = Variable "some_ready" is not available.
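
The interactive steps above can also be scripted, which is handy when the backtrace has to be attached to a bug report. This is only a sketch: it assumes a gdb that accepts the -batch, -x and --args options, the file names crash.gdb and crash-backtrace.txt are made up, and /bin/cat /proc merely stands in for the crashing program and its parameters.

```shell
# Hypothetical file names; adjust to taste.
cmds=${TMPDIR:-/tmp}/crash.gdb
log=${TMPDIR:-/tmp}/crash-backtrace.txt

# The same commands as in the interactive session above, one per line.
cat > "$cmds" <<'EOF'
handle SIG33 nostop noprint noignore pass
run
thread apply all bt full
EOF

# Run the program under gdb without a prompt and keep everything it prints.
if command -v gdb >/dev/null 2>&1; then
    gdb -batch -x "$cmds" --args /bin/cat /proc > "$log" 2>&1 </dev/null \
        || echo "gdb reported an error, see $log" >&2
    echo "gdb output saved to $log"
else
    echo "gdb is not installed" >&2
fi
```

If the program exits without crashing, the saved file simply contains no backtrace.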

Running gdb on a core file

If for some reason you can't easily reproduce the crash inside gdb but have a core file, you can do a post-mortem analysis. Give gdb your application and the core file, as in gdb /bin/cat core.42, then ask for a backtrace with thread apply all bt full. If no core file is generated after a segfault, run ulimit -c unlimited in the shell from which you start your application.
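
Put together, a post-mortem session might look like the sketch below. It reuses the example names /bin/cat and core.42 from the paragraph above, writes to a made-up file name backtrace.txt, and assumes a gdb build that accepts the -batch and -ex options.

```shell
# Raise the core size limit for this shell and the programs started from it.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core size limit" >&2
core_limit=$(ulimit -c)
echo "core file size limit: $core_limit"

# Example names taken from the text above; backtrace.txt is a made-up name.
program=/bin/cat
core=core.42

# Only call gdb when a core file is really there.
if [ -f "$core" ] && command -v gdb >/dev/null 2>&1; then
    gdb -batch -ex 'thread apply all bt full' "$program" "$core" > backtrace.txt 2>&1 \
        || echo "gdb reported an error, see backtrace.txt" >&2
fi
```

The limit only applies to programs started from this shell afterwards, so the application has to be launched from the same terminal.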

strace

strace lists all the system calls made by the application (opening a file, reading from a network socket, ...). This can help find issues such as a missing file or a non-writable directory.

You can run it as strace -f -o outputfile command. For example, strace -f -o ls.strace ls /tmp runs ls /tmp and writes all its system calls to ls.strace. The -f option also traces child processes.
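
Traces get long quickly, so it helps to filter for the calls that failed. The sketch below assumes the usual strace output format, where a failed call ends in "= -1" followed by an error name such as ENOENT (no such file or directory) or EACCES (permission denied); failed_calls is a hypothetical helper name.

```shell
# Hypothetical helper: show calls that failed on a missing file or on permissions.
failed_calls() {
    grep -E ' = -1 E(NOENT|ACCES) ' "$1"
}

out=ls.strace
if command -v strace >/dev/null 2>&1; then
    # Same command as in the example above; the output of ls itself is discarded.
    if strace -f -o "$out" ls /tmp >/dev/null 2>&1; then
        failed_calls "$out" || echo "no missing-file or permission errors in $out"
    fi
else
    echo "strace is not installed" >&2
fi
```

Many ENOENT lines are harmless, for example while the loader searches several directories for a library, so concentrate on paths that belong to the application itself.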

ltrace

valgrind
