I'm currently toying with the idea of creating a small frontend to SOS.DLL (CDB/WinDbg) and KD. This frontend should simplify some production debugging tasks based on crash dumps. (In my prototype, I'm already happily clicking myself through managed code threads, callstacks, parameters, locals and browsing objects' contents just like in Visual Studio, but solely based on crash dumps.)
A lot of debugging work is in general based on identifying a set of rather common issues. But even if you are tracking down a very common issue, it's still a tough call to work through 25 managed code callstacks to find out - for example - why a certain thread might be blocking a number of other threads. This was the rationale behind some of the features I'm thinking about implementing in the debugging frontend tool: to simplify and automate some manual tasks. But this tool is not (yet) ready for writing about, so lets look at the manual process for finding out whos blocking your thread.
For the analysis of thread blocks based on WaitHandles, I've found two kinds of possible solutions: one which is not 100% deterministic but works on crash dumps (but only for named synchronization objects) and another one which is deterministic but only works on the live system with a kernel debugger. I'll show you both approaches in this post.
First Approach - User Mode only, but works on Dumps
Note: This approach is not deterministic and not accurate but based on
the automated equivalent of trial and error in the form of !DumpStackObjects
. It's
essentially based on looking for WaitHandle references on a callstack and
similar means to get the identifier of a named synchronization object. The only
good news is that it works on dumps generated by ADPlus (and of course on live
systems as well).
Ok. Let's get startet. Suppose you have a process in which several threads end up blocking others for periods longer than you would expect and you basically want to know which threads block on resources held by which other threads. You've created a -hang dump with ADPlus and are ready to examine the dump in WinDbg or CDB.
First, you will need to find out which threads are actually waiting for something. To do this, you first load SOS.DLL and then look at the states of your managed threads:
0:013> .loadby sos mscorwks // load the extension sos.dll in directory of mscorwks 0:013> !threads // list all managed threads ThreadCount: 12 UnstartedThread: 0 BackgroundThread: 1 PendingThread: 0 DeadThread: 0 Hosted Runtime: no PreEmptive GC Alloc Lock ID OSID ThreadOBJ State GC Context Domain Count APT Exception 0 1 b70 00180fd0 200a020 Enabled 00000000:00000000 0014c100 0 MTA 2 2 1b54 00152480 b220 Enabled 00000000:00000000 0014c100 0 MTA (Finalizer) 3 3 1ba0 0019a1c0 200b020 Enabled 00000000:00000000 0014c100 0 MTA 4 4 1f9c 0019c138 200b020 Enabled 00000000:00000000 0014c100 0 MTA 5 5 210 0019cd18 200b020 Enabled 00000000:00000000 0014c100 0 MTA 6 6 13e0 0019d8d0 200b020 Enabled 00000000:00000000 0014c100 0 MTA 7 7 17e0 0019e488 200b020 Enabled 00000000:00000000 0014c100 0 MTA 8 8 1204 0019f040 200b020 Enabled 00000000:00000000 0014c100 0 MTA 9 9 1ec0 0019fbf8 b020 Disabled 00000000:00000000 0014c100 0 MTA 10 a 1a48 001a07b0 b020 Disabled 00000000:00000000 0014c100 0 MTA 11 b 1558 001a1368 200b020 Enabled 00000000:00000000 0014c100 0 MTA 12 c 1d88 001a1f20 200b020 Enabled 00000000:00000000 0014c100 0 MTA
The threads with state 200a020 are currently waiting. As this does not necessarily mean that they are waiting for a WaitHandle (they could also just have called Thread.Sleep()), you now also need to find out which of these blocked threads are actually blocking on a WaitHandle.
To do this, you will have to look at the top managed code stackframe in each of the blocked threads. (Please note: You have to specify the native thread ID for the next command. This ID is contained in the first column of !threads, and is not the CLR thread id contained in the second column!)
0:007> ~0e !clrstack // list the managed call stack of thread 0 (CLR thread #1) OS Thread Id: 0xb70 (0) ESP EIP 0012f420 7c90eb94 [HelperMethodFrame: 0012f420] System.Threading.Thread.SleepInternal(Int32) 0012f474 793d80f5 System.Threading.Thread.Sleep(Int32) 0012f478 00ca00fb ResourceLock.Program.Main(System.String[]) 0012f69c 79e88f63 [GCFrame: 0012f69c]
As you can see in this first example, the specified thread does not block on a WaitHandle, so let's continue our search. (Note: instead of manually using ~<THREAD_NUMBER>e !clrstack for each thread, you could also call ~*e !clrstack, do dump the complete managed code callstacks for all threads.)
0:007> ~7e !clrstack // list the managed call stack of thread 7 (also CLR thread #7) OS Thread Id: 0x17e0 (7) ESP EIP 011cf588 7c90eb94 [HelperMethodFrame_1OBJ: 011cf588] System.Threading.WaitHandle.WaitOneNative(Microsoft.Win32.SafeHandles.SafeWaitHandle, U Int32, Boolean, Boolean) 011cf634 793d424e System.Threading.WaitHandle.WaitOne(Int64, Boolean) 011cf64c 793d4193 System.Threading.WaitHandle.WaitOne(Int32, Boolean) 011cf65c 793d420e System.Threading.WaitHandle.WaitOne() 011cf660 00ca0250 ResourceLock.Program.ThreadProc() 011cf6b4 793d7a7b System.Threading.ThreadHelper.ThreadStart_Context(System.Object) 011cf6bc 793683dd System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) 011cf6d4 793d7b5c System.Threading.ThreadHelper.ThreadStart() 011cf8f8 79e88f63 [GCFrame: 011cf8f8]
Here, the top level call stack elements contain calls to WaitHandle.WaitOne() or WaitOneNative(). This thread is currently waiting for a synchronization object.
To retrieve details of the underlying sync object, you first have to look at the unmanaged callstack for the same thread. You will usually send a command like "kb 5" to the debugger after selecting the current thread with ~7 s.
0:007> ~7 s // select thread 7 if its not the current thread eax=00000000 ebx=011cf350 ecx=79690098 edx=79690048 esi=00000000 edi=7ffde000 eip=7c90eb94 esp=011cf328 ebp=011cf3c4 iopl=0 nv up ei pl nz na po nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202 ntdll!KiFastSystemCallRet: 7c90eb94 c3 ret 0:007> kb 5 // print five stack frames ChildEBP RetAddr Args to Child 011cf324 7c90e9ab 7c8094f2 00000001 011cf350 ntdll!KiFastSystemCallRet 011cf328 7c8094f2 00000001 011cf350 00000000 ntdll!ZwWaitForMultipleObjects+0xc 011cf3c4 79f8ead4 00000001 011cf614 00000001 KERNEL32!WaitForMultipleObjectsEx+0x12c 011cf42c 79f17522 00000001 011cf614 00000001 mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0x6f 011cf44c 79f17493 00000001 011cf614 00000001 mscorwks!Thread::DoAppropriateAptStateWait+0x3c
Please note: This output is what youll receive if you have debug symbols configured correctly. While its definitely not as nice as the one with symbols, you can still get the same information (minus the method names) even if you do not have debug symbols installed:
0:007> kb 5 // print five stack frames ChildEBP RetAddr Args to Child WARNING: Stack unwind information not available. Following frames may be wrong. 011cf3c4 79f8ead4 00000001 011cf614 00000001 ntdll!KiFastSystemCallRet 011cf42c 79f17522 00000001 011cf614 00000001 mscorwks!StrongNameFreeBuffer+0x57fc 011cf44c 79f17493 00000001 011cf614 00000001 mscorwks!CreateAssemblyCache+0x9474 011cf4d0 79f1732f 00000001 011cf614 00000001 mscorwks!CreateAssemblyCache+0x93e5 011cf520 7a07b49c 00000001 011cf614 00000001 mscorwks!CreateAssemblyCache+0x9281
Here, I'll ignore the first three lines in any case, no matter if we have symbols or not. The next line is usually a call to WaitForMultipleObjectEx or a comparable function. This function is defined like this:
DWORD WaitForMultipleObjectsEx(
DWORD nCount,
const HANDLE* lpHandles,
BOOL bWaitAll,
DWORD dwMilliseconds,
BOOL bAlertable
);
In the third, fourth and fifth field of the output of kb 5, we can get the first three arguments to the method. We therefore know that nCount = 1 and lpHandles = 011cf614. This means that the .NET Framework has passed an array of handles which contains 1 (nCount) entry and starts at the location 011cf614 (lpHandles) to WaitForMultipleObjectsEx.
You can then call "dd
<address> L<number_of_handles>"
(for example "dd 011cf614 L00000001"
) to
the debugger to generate a memory dump to retrieve the list of handles.
0:007> dd 011cf614 L1 // dump 1 dword at location 011cf614 011cf614 00000684
In this command, the first field in each line contains the memory address and the subsequent fields (only one in our case) contain the handle to the object you are waiting for.
For each handle, you can then then call "!handle
<handle> f"
, for example "handle 00000684 f"
to retrieve handle
information:
0:007> !handle 00000684 f // dump detailed information (flags = f) for handle 684
Handle 684
Type Mutant
Attributes 0
GrantedAccess 0x1f0001:
Delete,ReadControl,WriteDac,WriteOwner,Synch
QueryState
HandleCount 4
PointerCount 8
Name \BaseNamedObjects\{1CA3A6E4-68E7-4847-9602-5669BC40D01E}
Object Specific Information
Mutex is Owned
In this case, you now know that the thread is waiting for a Mutex (Mutant is the kernel-level name for a Mutex) which has been named "{1CA3A6E4-68E7-4847-9602-5669BC40D01E}".
The next step is to find out who currently owns this mutex. This would be relatively straightforward in a live debug as we could use a kernel debugger to identify the owning thread (which I'll explain later). In a crash dump however, we are somewhat limited as we have to hunt for this information in a trial-and-error way. We are also limited to owner threads in the same process if the named mutex is owned by a thread in another process, we won't be able to get there with our single-process memory dump.
This trial-and-error information retrieval comes in the form of !DumpStackObjects (which can be abbreviated to !dso), an extension command which walks the stack of the selected thread and checks if each value it finds on the stack could be a managed object.
0:007> ~0e !dso // dump stack objects for thread #0 OS Thread Id: 0xb70 (0) ESP/REG Object Name 0012f408 01273450 System.IO.StreamWriter 0012f428 01274768 System.Char[] 0012f478 01271b38 System.Object[] (System.String[]) 0012f534 01271b38 System.Object[] (System.String[]) 0012f6e0 01271b38 System.Object[] (System.String[]) 0012f708 01271b38 System.Object[] (System.String[])
Theres not a lot going on in this thread. Certainly no Mutex here, let's go on:
0:007> ~4e !dso // dump stack objects for thread #4 OS Thread Id: 0x1f9c (4) ESP/REG Object Name 00ecf748 01273b30 System.String Thread 4 sleeping 00ecf770 01273450 System.IO.StreamWriter 00ecf790 01273b64 System.Char[] 00ecf804 01273a40 System.Threading.ThreadStart 00ecf808 01273a40 System.Threading.ThreadStart 00ecf80c 01273a94 System.Threading.ThreadHelper 00ecf814 01271db8 System.String Thread 00ecf818 01273a94 System.Threading.ThreadHelper 00ecf81c 01271d98 System.Threading.ContextCallback 00ecf820 01273a94 System.Threading.ThreadHelper 00ecf82c 01273a94 System.Threading.ThreadHelper 00ecf834 01273ac8 System.Threading.ExecutionContext 00ecf844 01273ac8 System.Threading.ExecutionContext 00ecf848 01273a94 System.Threading.ThreadHelper 00ecf850 01273a94 System.Threading.ThreadHelper 00ecf914 01273aa8 System.Threading.ThreadStart 00ecfac0 01273aa8 System.Threading.ThreadStart 00ecfad0 01273aa8 System.Threading.ThreadStart
Thread #4 looks more interesting but contains no reference to any synchronization object, let's continue again:
0:007> ~3e !dso // dump stack objects for thread #3 OS Thread Id: 0x1ba0 (3) ESP/REG Object Name 00daf7c8 012739d8 System.String Thread 3 sleeping 00daf7f0 01273450 System.IO.StreamWriter 00daf810 01273a0c System.Char[] 00daf880 0127383c System.Threading.Mutex 00daf884 01271db8 System.String Thread 00daf888 01271db8 System.String Thread 00daf88c 01271db8 System.String Thread 00daf898 01271cb8 System.Threading.ThreadHelper 00daf89c 01271d98 System.Threading.ContextCallback 00daf8a0 01271cb8 System.Threading.ThreadHelper 00daf8ac 01271cb8 System.Threading.ThreadHelper 00daf8b4 01271d2c System.Threading.ExecutionContext 00daf8c4 01271d2c System.Threading.ExecutionContext 00daf8c8 01271cb8 System.Threading.ThreadHelper 00daf8d0 01271cb8 System.Threading.ThreadHelper 00daf994 01271ccc System.Threading.ThreadStart 00dafb40 01271ccc System.Threading.ThreadStart 00dafb50 01271ccc System.Threading.ThreadStart
At first glance, thread #3 looks innocent too. Only when examining it more closely, you'll find a Mutex as the fourth stack object. (Of course, this is now total guesswork as we don't know if the thread ever really acquired the mutex or if it is just looking at it. The thread might also have already released the mutex as well but I don't know if we can find an answer to these questions without looking at kernel memory.)
First, lets look if the method acquires a mutex at all. To do this, we first need to find out the method which owns this section of the stack. To do this, we take the value of ESP for our Mutex (which is the address on the stack which references the mutex) which is 00daf880 in our case and compare it to the managed code stack dump of the same thread:
0:007> ~3 s // select thread #3 eax=7a38cf40 ebx=00000000 ecx=79ed75c0 edx=fbffffff esi=00000000 edi=00daf700 eip=7c90eb94 esp=00daf6d0 ebp=00daf728 iopl=0 nv up ei pl nz na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206 ntdll!KiFastSystemCallRet: 7c90eb94 c3 ret 0:003> !clrstack // dump the managed code stack for this method OS Thread Id: 0x1ba0 (3) ESP EIP 00daf808 7c90eb94 [HelperMethodFrame: 00daf808] System.Threading.Thread.SleepInternal(Int32) 00daf85c 793d80f5 System.Threading.Thread.Sleep(Int32) 00daf860 00ca02c3 ResourceLock.Program.ThreadProc() 00daf8b4 793d7a7b System.Threading.ThreadHelper.ThreadStart_Context(System.Object) 00daf8bc 793683dd System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) 00daf8d4 793d7b5c System.Threading.ThreadHelper.ThreadStart() 00dafaf8 79e88f63 [GCFrame: 00dafaf8]
In this stack dump, you can see that the stack address of the reference to the mutex (00daf880) lies between the third and the fourth frame's ESP. You can deduce that the object is created in the third frame, in the method ResourceLock.Program.ThreadProc().
To verify this, you can have a look at the IL code of this method. First you need to convert the Instruction Pointer (IP) to a method description. The IP is 00ca02c3 according to the output of !clrstack (second field in the stack frame for ThreadProc()).
0:003> !IP2MD 00ca02c3 // convert an instruction pointer (IP) to a method description (MethodDesc) MethodDesc: 00912ff8 Method Name: ResourceLock.Program.ThreadProc() Class: 009112c4 MethodTable: 00913010 mdToken: 06000003 Module: 00912c14 IsJitted: yes m_CodeOrIL: 00ca0188
The resulting MethodDesc is 00912ff8. You can use this a parameter to !DumpIL to retrieve the IL code of this method:
0:003> !dumpil 00912ff8 // get the IL code for a method ilAddr = 004020b4 IL_0000: nop IL_0001: call System.Threading.Thread::get_CurrentThread ... IL_004a: newobj System.Threading.Mutex::.ctor IL_004f: stloc.2 IL_0050: ldloc.2 IL_0051: callvirt System.Threading.WaitHandle::WaitOne
IL_0044 to IL_51 roughly translate to the following C# fragment:
Mutex mtx = new Mutex (...);
mtx.WaitOne();
So, yes, the method definitely acquires at least one mutex at one point. But to find out if the the Mutex we've found on the stack really references the same named synchronization object the other thread is blocking on, we need to examine it more closely. First, let's have a look at the contents of the mutex by using !DumpObj (which we abbreviate to !do).
0:003> !do 0127383c // dump the mutex. Address is from ~3e !dso Name: System.Threading.Mutex MethodTable: 7910d018 EEClass: 7910cfa0 Size: 24(0x18) bytes (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll) Fields: MT Field Offset Type VT Attr Value Name 790f9c18 4000184 4 System.Object 0 instance 00000000 __identity 790fe160 40005a6 c System.IntPtr 0 instance 1752 waitHandle 7910cf3c 40005a7 8 ...es.SafeWaitHandle 0 instance 012738d8 safeWaitHandle 79104f64 40005a8 10 System.Boolean 0 instance 1 hasThreadAffinity 790fe160 40005a9 984 System.IntPtr 0 shared static InvalidHandle >> Domain:Value 0014c100:-1 << 79104f64 40005ee 990 System.Boolean 0 shared static dummyBool >> Domain:Value 0014c100:NotInit << 7910d018 40005ef 14c ...m.Threading.Mutex 0 shared static s_ReservedMutex >> Domain:Value 0014c100:NotInit <<
The second field is called waitHandle and contains the value 1752. This is the decimal handle to the underlying mutex. Lets convert this to hex (6d8) and have a look at the handle:
0:003> !handle 6d8 f // look at handle 6d8 (decimal 1752) Handle 6d8 Type Mutant Attributes 0 GrantedAccess 0x1f0001: Delete,ReadControl,WriteDac,WriteOwner,Synch QueryState HandleCount 4 PointerCount 8 Name \BaseNamedObjects\{1CA3A6E4-68E7-4847-9602-5669BC40D01E} Object Specific Information Mutex is Owned
As you can see, thread #3 acquired the mutex with the name {1CA3A6E4-68E7-4847-9602-5669BC40D01E} using wait handle 6d8, and this is the exact same named mutex which thread #7 is waiting for on wait handle 684. It seems that we got lucky this time and could find the mutex in our crash dump. Whats left as an exercise is to dump the !clrstack of thread #3 and examine the top stackframes using !IP2MD and !DumpIL to find out why the mutex has not been released in time.
The Deterministic Second Approach - or: Now give me my Kernel Debugger!
If someone would have told me just two years ago that I'd ever start a kernel debugger, I wouldn't have believed him: after all, I'm not writing device drivers and that's all you need KD.EXE for and it's quite difficult to setup kernel debugging in the first place, right? Well, turns out that neither is the case.
If you only want to examine kernel structures on your local machine, you can start a kernel debugger as easily as KD.EXE /KL (and it's part of the same Debugging Tools for Windows package which contains WinDbg and CDB). Theres no need to setup a second machine and connect the debugger to it or anything like this. The only really important part is that contrary to WinDbg and CDB you absolutely have to have debug symbols configured correctly for what we are doing now. Usually its enough to just create a directory like c:\symbols, set _NT_SYMBOL_PATH to the matching entry SRV*c:\symbols*http://msdl.microsoft.com/download/symbols and start KD. The necessary symbols will be downloaded automatically behind the scenes whenever they are needed for the first time. (Note that this also means that KD will block from time to time while it's downloading symbols on demand.)
C:\debug>set _NT_SYMBOL_PATH=SRV*C:\symbols*http://msdl.microsoft.com/download/symbols C:\debug>kd /KL Microsoft (R) Windows Debugger Version 6.6.0007.5 Copyright (c) Microsoft Corporation. All rights reserved. Connected to Windows XP 2600 x86 compatible target, ptr64 FALSE Symbol search path is: SRV*C:\symbols*http://msdl.microsoft.com/download/symbols Executable search path is: ******************************************************************************* WARNING: Local kernel debugging requires booting with /debug to work optimally. ******************************************************************************* Windows XP Kernel Version 2600 (Service Pack 2) UP Free x86 compatible Product: WinNt, suite: TerminalServer SingleUserTS Built by: 2600.xpsp_sp2_gdr.050301-1519 Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055a420 Debug session time: Sat Aug 5 11:19:11.543 2006 (GMT+2) System Uptime: 6 days 14:14:43.119 lkd>
Well, allright then. Let's start to hunt for the mutex. And don't forget: you can run WinDbg/CDB *and* KD at the same time to debug the user-mode and the kernel-mode parts side by side!
First, we need to get information about the process which we are debugging. The EXE in my example is called ResourceLock.exe, but you can use the !process command in KD to retrieve the necessary entry point for any process on your machine: (I'll use red color coding for KD commands and blue color coding for CDB commands)
lkd> !process 0 0 ResourceLock.exe // retrieve information for all processes running ResourceLock.exe PROCESS 8900e020 SessionId: 0 Cid: 1cd0 Peb: 7ffde000 ParentCid: 12a0 DirBase: 0cb83000 ObjectTable: e72125c8 HandleCount: 122. Image: ResourceLock.exe
The !process command can also be used to retrieve additional information. It accepts two numeric parameters: the process address and a list of flags. If you specify the process address we've retrieved above (8900e020) and the value 2 for flags, you'll get detailed information about the individual threads and their synchronization objects:
Important: If you are running WinDbg/CDB and KD side-by-side for the same process, you must switch WinDbg/CDB in to run mode (F5 or g) to retrieve this information.
lkd> !process 8900e020 2 // retrieve thread info for process PROCESS 8900e020 SessionId: 0 Cid: 1cd0 Peb: 7ffde000 ParentCid: 12a0 DirBase: 0cb83000 ObjectTable: e72125c8 HandleCount: 122. Image: ResourceLock.exe THREAD 89285020 Cid 1cd0.0b70 Teb: 7ffdd000 Win32Thread: e52fe420 WAIT: (DelayExecution) UserMode Alertable 89285110 NotificationTimer THREAD 8a02f280 Cid 1cd0.0268 Teb: 7ffdc000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Non-Alertable 891da5f8 SynchronizationEvent 8a13d6f8 SynchronizationEvent 89d50020 SynchronizationEvent THREAD 88f9fd58 Cid 1cd0.1b54 Teb: 7ffdb000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Non-Alertable 8a2fe310 NotificationEvent 8974b6f0 SynchronizationEvent THREAD 89d86da8 Cid 1cd0.1ba0 Teb: 7ffda000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable 89d86e98 NotificationTimer THREAD 8902e020 Cid 1cd0.1f9c Teb: 7ffd9000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable 8902e110 NotificationTimer THREAD 87b86020 Cid 1cd0.0210 Teb: 7ffd8000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable 87b86110 NotificationTimer THREAD 8971b828 Cid 1cd0.13e0 Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable 8971b918 NotificationTimer THREAD 87ba5020 Cid 1cd0.17e0 Teb: 7ffd6000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Alertable 89d48fc0 Mutant - owning thread 89d86da8 THREAD 88724da8 Cid 1cd0.1204 Teb: 7ffd5000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable 88724e98 NotificationTimer THREAD 8869e548 Cid 1cd0.1ec0 Teb: 7ffd4000 Win32Thread: 00000000 READY THREAD 88e28020 Cid 1cd0.1a48 Teb: 7ffaf000 Win32Thread: 00000000 READY THREAD 88ab5020 Cid 1cd0.1558 Teb: 7ffae000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable 88ab5110 NotificationTimer THREAD 88827360 Cid 1cd0.1d88 Teb: 7ffad000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Alertable 89d48fc0 Mutant - owning thread 89d86da8
Fantastic, isn't it? Everything we've been looking for with trial-and-error analysis with WinDbg/CDB is contained in this single output.
But let's just look at the three important threads for this list (the two which have a Mutant in their list of kernel objects, and the thread which is specified as the owner for this mutant):
THREAD 89d86da8 Cid 1cd0.1ba0 Teb: 7ffda000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable
89d86e98 NotificationTimer
THREAD 87ba5020 Cid 1cd0.17e0 Teb: 7ffd6000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Alertable
89d48fc0 Mutant - owning thread 89d86da8
THREAD 88827360 Cid 1cd0.1d88 Teb: 7ffad000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Alertable
89d48fc0 Mutant - owning thread 89d86da8
WAIT: (DelayExecution) in the first line means that the thread is sleeping. The next line indicates the kernel object which the thread is waiting for. It is a NotificationTimer and its a safe bet to assume that this is the timer which has caused the DelayExecution in the first place. By the way: If you've forgotten to switch your user mode debugger into run mode - just like it happens to me all the time - your threads will be marked as WAIT: (Suspended) KernelMode Non-Alertable. In this case, just hit F5 or g in your user mode debugger and re-try the !process command.
The second and third thread in this list are both in WAIT: (UserRequest). The synchronization object is in both cases the Mutant which is currently owned by thread 89d86da8. This owner thread is the first thread in our list. To match this information back to our user-mode information, we need to look at the thread's Cid, which is 1cd0.1ba0 in our case. Of this Cid, the first part (1cd0) is the Process ID and the second part (1ba0) is the Thread ID.
If you then switch back to the user mode debugger and break into the application (CTRL+BREAK), you can find the corresponding CLR thread by using !threads:
0:013> !threads // list all CLR threads ThreadCount: 12 UnstartedThread: 0 BackgroundThread: 1 PendingThread: 0 DeadThread: 0 Hosted Runtime: no PreEmptive GC Alloc Lock ID OSID ThreadOBJ State GC Context Domain Count APT Exception 0 1 b70 00180fd0 200a020 Enabled 00000000:00000000 0014c100 0 MTA 2 2 1b54 00152480 b220 Enabled 00000000:00000000 0014c100 0 MTA (Finalizer) 3 3 1ba0 0019a1c0 200b020 Enabled 00000000:00000000 0014c100 0 MTA 4 4 1f9c 0019c138 200b020 Enabled 00000000:00000000 0014c100 0 MTA 5 5 210 0019cd18 200b020 Enabled 00000000:00000000 0014c100 0 MTA 6 6 13e0 0019d8d0 200b020 Enabled 00000000:00000000 0014c100 0 MTA 7 7 17e0 0019e488 200b020 Enabled 00000000:00000000 0014c100 0 MTA 8 8 1204 0019f040 200b020 Enabled 00000000:00000000 0014c100 0 MTA 9 9 1ec0 0019fbf8 b020 Disabled 00000000:00000000 0014c100 0 MTA 10 a 1a48 001a07b0 b020 Disabled 00000000:00000000 0014c100 0 MTA 11 b 1558 001a1368 200b020 Enabled 00000000:00000000 0014c100 0 MTA 12 c 1d88 001a1f20 200b020 Enabled 00000000:00000000 0014c100 0 MTA
The third column in the thread list is called OSID and contains the operating system thread id. If you look for the thread ID 1ba0 which we retrieved from the kernel debugger's Cid, you will find that thread #3 is the one who blocks the Mutex. This is the same information which we got from the first approach, but now its a 100% deterministic and a lot simpler as well. I just needed to let go of my fear of kernel debugging before venturing in this direction ...
Hi, Just wanted to thank you for this article. We had a situation with the DBWinMutex (OutputDebugString) was being held and locking up our process during debugging. I used the kernel debugger commands from this post to find the owning thread of the mutex, and then used a !thread kernel-thread-id command to output the user-mode process.threadid value, which we then used to figure out which process was holding onto the mutex. Turned out to be a 3rd-party exe, and we were able to move past the issue. Thanks again for the information.
Posted by: Kevin Tambascio | 01/11/2008 at 07:21 PM