Development/Docs/Kernel Deadlocks
From Mandriva Community Wiki
Introduction
I have been experiencing a problem that seems to be caused by a kernel deadlock in the devfs code. It first appeared after a kernel update in the middle of the Mandrake 9.1 cycle, and it was still present in cooker kernel 2.4.22-0.3mdk.
(AleksanderAdamowski 21 Aug 2003)
In kernel 2.4.22-0.6mdk it seems to be fixed, but that's not completely certain; longer testing is needed.
I've been putting the machine under heavy stress, and it has run stably for 4 days. So far so good.
Yes, it was fixed somewhere between 2.4.22-0.3mdk (vulnerable) and 2.4.22-0.6mdk (not vulnerable).
I've closed Bug #4709.
-- AleksanderAdamowski - 30 Aug 2003
Details (kept for archival purposes)
It's hard to reproduce, but extremely annoying. I would appreciate help and suggestions on debugging methods that would better diagnose the problem and finally catch this bug. You're welcome to [mail me].
The Bugzilla bug for this problem is Bug #4709.
Symptoms & reproducibility
VT problem
The symptom is the inability to launch a new shell (I suspect a deadlock in the kernel's devfs handling code). The typical scenario: the system enters some state, and after that no new shells can be spawned; they hang at some stage during launch. In fact, any program that tries to access the /dev directory hangs, even ls /dev. This sometimes happens after the system has been running for a couple of hours, but sometimes only a couple of minutes after booting.
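Since even ls /dev hangs in this state, a small probe can detect it without a human noticing first. The following is a minimal sketch, assuming Python 3 is available on the machine; the function name `command_completes` and the 5-second default timeout are my own illustrative choices, not part of any existing tool. It deliberately does not wait for the child after the timeout, because a process stuck inside the kernel may never react to a kill signal.

```python
import subprocess


def command_completes(argv, timeout=5.0):
    """Return True if argv exits within `timeout` seconds, False otherwise.

    Only completion is checked, not the exit status.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait for the child: if it is blocked
        # inside the kernel, waiting here would hang the probe as well.
        proc.kill()
        return False
```

Calling `command_completes(["ls", "/dev"])` periodically from a process started before the problem appears, and recording the time of the first `False`, would give a rough timestamp for when the system entered the hung state.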
The programs which cannot start include:
- bash
- csh
- xterm
- mc
- ssh
- mozilla
- smbmount
I cannot start either local or remote shells, so the problem seems to be in allocating a virtual terminal.
The programs that can start include:
- konqueror
- galeon
- gftp
- links (if you have a shell open before the deadlock occurs)
Switching between virtual consoles
When I try to switch from the X server to a virtual console and back, I get a completely unresponsive system. It doesn't go back to X and doesn't show the "Welcome to..." text mode line; the screen is blank.
Rebooting
When I try to reboot, the WM session ends fine, but then the system cannot perform the shutdown procedure properly; it just hangs with a blank screen. To preserve the integrity of the filesystems, I use the magic SysRq key combinations: Alt-SysRq-S to execute an emergency sync, Alt-SysRq-U to execute an emergency read-only remount of the filesystems, and Alt-SysRq-B to execute an immediate emergency reboot.
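The same sync, remount, reboot sequence can also be issued without the keyboard on kernels that expose /proc/sysrq-trigger, which is useful over a remote session that was opened before the hang. This is a hedged sketch: it assumes Python 3, root privileges, and that the running kernel provides /proc/sysrq-trigger (I have not checked this for the affected 2.4.22 kernels). The function name `emergency_reboot`, the 2-second pause, and the `dry_run` flag are illustrative assumptions; the default is a dry run that writes nothing.

```python
import time

# Assumption: the kernel exposes this file and we are running as root.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def emergency_reboot(trigger_path=SYSRQ_TRIGGER, pause=2.0, dry_run=True):
    """Issue SysRq s (sync), u (read-only remount), b (reboot) in that order.

    Returns the list of keys in the order they were (or would be) written.
    """
    keys = ["s", "u", "b"]
    for key in keys:
        if dry_run:
            print("would write %r to %s" % (key, trigger_path))
            continue
        with open(trigger_path, "w") as trigger:
            trigger.write(key)
        # Give the sync and the remount a moment before the next step.
        time.sleep(pause)
    return keys
```

Calling `emergency_reboot(dry_run=False)` reboots the machine immediately after the remount, so it should only be used in the same situations as the keyboard combinations above.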
Reproducibility
I wasn't able to determine the exact factors that trigger this bug, but it may be related to certain activity (the bug seems to be more frequent when I perform intensive tasks, but maybe I'm making things up ;) ).
During everyday work, I see this happen about twice a day, sometimes more often.
No related log messages can be seen in /var/log/kernel or /var/log/messages when the system enters this odd state.
Hardware
Since this bug seems to occur randomly and I cannot determine specific activities that trigger it, there is some chance that it's a hardware problem. My hardware specs are:
- Motherboard: Abit KX7-333R, but for some period of time I exchanged it for an Intel 810 motherboard running a Pentium III
- CPU: Athlon XP 1800, Pentium III (see above)
- RAM: 384MB SDRAM
- Video card: ATI Radeon 9000 PRO, but for some period of time I exchanged it for nVidia Vanta 16 MB
Pros (why this could be a hardware problem):
- The problem occurs quite randomly
Cons (why this is probably not a hardware problem):
- I've exchanged various hardware parts and the problem didn't go away
- When the problem occurs, all processes that are already running keep running fine. New processes can be launched as long as they don't depend on a shell. The system can continue running indefinitely, but it's crippled. So it seems that the kernel enters a specific deadlock state somewhere.
Debugging info collected so far
- Logs of system calls (generated with strace) of various commands that depend on a shell being launched: bash, csh, xterm, mc... All of those logs are attached to Bug #4709.
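For anyone who wants to collect comparable traces, the following sketch shows one way to run a command under strace with a time limit, so that a hanging command still leaves a log behind. It assumes Python 3 and an installed strace; the helper names `strace_argv` and `trace` and the 30-second timeout are illustrative assumptions, and this is not necessarily how the attached logs were produced. The strace options used are -f (follow child processes), -tt (timestamps) and -o (write the trace to a file).

```python
import subprocess


def strace_argv(command, log_path):
    """Build the strace command line that traces `command` into `log_path`."""
    return ["strace", "-f", "-tt", "-o", log_path] + list(command)


def trace(command, log_path, timeout=30.0):
    """Run `command` under strace; return True if it finished within `timeout`."""
    proc = subprocess.Popen(
        strace_argv(command, log_path),
        stdin=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Do not wait after the kill: the traced command may be stuck
        # inside the kernel and never exit.
        proc.kill()
        return False
```

For example, `trace(["bash", "-c", "true"], "bash.log")` should return `True` on a healthy system; if it returns `False`, the tail of bash.log is the place to look for the last system call the command reached.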

