Development/Docs/Kernel Deadlocks

From Mandriva Community Wiki

Analysis of kernel deadlocks in Mandrake 9.2

Introduction

I have been experiencing a problem that seems to be caused by a kernel deadlock in the devfs code. It first appeared after a kernel update in the middle of the Mandrake 9.1 cycle, and it was still present in cooker kernel 2.4.22-0.3mdk.

(AleksanderAdamowski 21 Aug 2003)
In kernel 2.4.22-0.6mdk it seems to be fixed, but that is not yet certain; longer testing is needed.
I'm putting the machine under heavy stress, and it has run stably for 4 days. So far so good.

Yes, it's been fixed sometime after 2.4.22-0.3mdk (vulnerable) but before 2.4.22-0.6mdk (not vulnerable).

I've closed Bug #4709.

-- AleksanderAdamowski - 30 Aug 2003

Details (kept for archival purposes)

It's hard to reproduce, but extremely annoying. I would appreciate help and suggestions on debugging methods that would better diagnose the problem and finally catch this bug. You're welcome to mail me.

The Bugzilla bug for this problem is Bug #4709.

Symptoms & reproducibility

VT problem

The symptom is an inability to launch a new shell (I suspect a deadlock in the kernel's devfs handling code). The typical scenario is that the system enters some state after which no new shells can be spawned: they hang at some stage during launch. In fact, any program that tries to access the /dev directory hangs, even ls /dev. Sometimes this happens after the system has been running for a couple of hours, and sometimes only a couple of minutes after booting.

The programs which cannot start include:

  • bash
  • csh
  • xterm
  • mc
  • ssh
  • mozilla
  • smbmount

I cannot start either local or remote shells, so the problem seems to be in allocating a virtual terminal.

The programs that can start include:

  • konqueror
  • galeon
  • gftp
  • links (if you have a shell open before the deadlock occurs)
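
The hang on any access to /dev described above can be checked from a process that is already running. The following is only a minimal sketch, assuming Python 3 is available (it is not something the original report used); the function name probe_directory and the 5-second timeout are hypothetical choices. It polls the child instead of waiting on it, because a process stuck inside the kernel may not react to being killed.

```python
import subprocess
import time


def probe_directory(path="/dev", timeout_s=5.0):
    """Return True if `ls path` finishes within timeout_s seconds.

    A False result only means that ls did not return in time, which is
    consistent with the hang described above but does not prove a deadlock.
    A True result means ls finished, whatever its exit status was.
    """
    child = subprocess.Popen(
        ["ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if child.poll() is not None:
            return True  # ls returned, so the directory is still reachable
        time.sleep(0.1)
    # Best effort only: a process blocked inside the kernel may ignore this.
    child.kill()
    return False


if __name__ == "__main__":
    print("/dev responds" if probe_directory("/dev") else "/dev access hangs")
```

Running such a probe periodically from a long-lived process would at least record roughly when the system enters the bad state.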

Switching between virtual consoles

When I try to switch from the X server to a virtual console and back, the system becomes completely unresponsive. It doesn't go back to X and doesn't show the "Welcome to..." text-mode line; the screen stays blank.

Rebooting

When I try to reboot, the WM session ends fine, but then the system cannot complete the shutdown procedure; it just hangs with a blank screen. To preserve the integrity of the filesystems, I use the magic SysRq key combinations: Alt-SysRq-S for an emergency sync, Alt-SysRq-U for an emergency read-only remount of the filesystems, and Alt-SysRq-B for an immediate emergency reboot.
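
The same S, U, B sequence can also be requested in software. The sketch below assumes Python 3, root privileges, and a kernel that exposes the /proc/sysrq-trigger interface with SysRq enabled; none of this is confirmed for the kernels discussed here. The function name emergency_reboot and the 2-second pause are hypothetical. It defaults to a dry run so that nothing is written unless explicitly requested.

```python
import time

# Assumption: this proc file exists on the running kernel.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"

# The keys named in the text, in the order they are pressed.
EMERGENCY_SEQUENCE = [
    ("s", "emergency sync"),
    ("u", "emergency read-only remount"),
    ("b", "immediate reboot"),
]


def emergency_reboot(dry_run=True, pause_s=2.0, trigger=SYSRQ_TRIGGER):
    """Send the S, U, B SysRq requests in order and return the keys handled.

    With dry_run=True (the default) nothing is written; the function only
    reports what it would do.
    """
    handled = []
    for key, description in EMERGENCY_SEQUENCE:
        if dry_run:
            print("would send SysRq-%s (%s)" % (key.upper(), description))
        else:
            with open(trigger, "w") as trigger_file:
                trigger_file.write(key)
            # Give sync and remount a moment before the next request.
            time.sleep(pause_s)
        handled.append(key)
    return handled


if __name__ == "__main__":
    emergency_reboot(dry_run=True)
```

Since new shells cannot be started once the problem occurs, this would only help if it is run from a process that was already running; the keyboard combinations remain the more reliable route.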

Reproducibility

I wasn't able to determine the exact factors that trigger this bug, but it may be related to certain activity (the bug seems to be more frequent when I perform intensive tasks, but maybe I'm making things up ;) ).

During everyday work, I usually see this happen twice a day, sometimes more frequently.

No related log messages can be seen in /var/log/kernel and /var/log/messages when the system enters this odd state.

Hardware

Since this bug seems to occur randomly and I cannot determine specific activities that would trigger it, there's some probability that it's a hardware bug. So the specs of my hardware are:

  • Motherboard: Abit KX7-333R, but for some period of time I exchanged it for an Intel 810 motherboard running a Pentium III
  • CPU: Athlon XP 1800, Pentium III (see above)
  • RAM: 384MB SDRAM
  • Video card: ATI Radeon 9000 PRO, but for some period of time I exchanged it for nVidia Vanta 16 MB

Pros (why this could be a hardware problem):

  • The problem occurs quite randomly

Cons (why this couldn't be a hardware problem):

  • I've exchanged various hardware parts and the problem didn't go away
  • When the problem occurs, all processes that are already running keep running fine. New processes can be launched as long as they don't depend on a shell. The system can continue running indefinitely, but it's crippled. So it seems that the kernel enters a specific deadlock state somewhere.
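
One way to see which processes are stuck is to look for tasks in uninterruptible sleep (state D) under /proc. This is a minimal sketch, assuming Python 3 and the usual layout of /proc/<pid>/stat (PID, command name in parentheses, then a one-letter state); the function names parse_stat and blocked_processes are hypothetical. A process in state D is not necessarily deadlocked, so the output is only a hint about where to look.

```python
import os


def parse_stat(stat_line):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat.

    The command name is enclosed in parentheses and may itself contain
    spaces or parentheses, so split on the last closing parenthesis.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    command = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, command, state


def blocked_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, command, state = parse_stat(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "D":
            blocked.append((pid, command))
    return blocked


if __name__ == "__main__":
    for pid, command in blocked_processes():
        print(pid, command)
```

If the hung shells and ls processes all show up in this list, that would support the deadlock theory over a purely userspace problem.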

Debugging info collected so far

  • Logs of system calls (generated with strace) of various commands that depend on a shell being launched: bash, csh, xterm, mc... All of those logs are attached to Bug #4709.
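
Logs of this kind can be collected with a small wrapper around strace. The sketch below assumes Python 3; the helper names, the log path, and the 30-second timeout are hypothetical, and it is not the procedure that produced the attached logs. It uses the strace options -f (follow child processes), -tt (timestamp each line) and -o (write the trace to a file), and it leaves a hung command running so that the last line of the log shows the system call that never returned.

```python
import subprocess


def strace_command(command, log_path):
    """Build the strace invocation for `command`, logging to `log_path`."""
    return ["strace", "-f", "-tt", "-o", log_path] + list(command)


def trace(command, log_path, timeout_s=30.0):
    """Run `command` under strace.

    Returns the exit status, or None if the command is still running after
    timeout_s seconds. The child is deliberately not killed in that case.
    """
    child = subprocess.Popen(strace_command(command, log_path))
    try:
        return child.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    status = trace(["ls", "/dev"], "/tmp/ls-dev.strace")
    print("hung" if status is None else "exit status %d" % status)
```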