Home > Memory issues, Windows debugging tricks > How to troubleshot native memory leaks on Windows: GFlags and UMDH

How to troubleshot native memory leaks on Windows: GFlags and UMDH

Memory issues are amongst the worst one to solve because pointing precisely the source is often difficult and painful. Memory leaks are not an exception, especially with real-world application: most of the time, programmers start to worry about it when the application outputs some “out of memory” errors. At this moment, you have to find which one, among thousands of functions and many more allocated blocks, causes the application to leak and eventually to crash.

Let’s summarize what you really need when you have a memory leak (in addition to a way to reproduce the issue):

  • You want to find which object(s) are leaking
  • You want to know why they are leaking: is there some static reference to it, or maybe they are not freed?

The process described today deals with the first one, which is often the most difficult.

As full .NET application have their allocations tracked by the CLR (heard about the garbage collector?), a good profiler, such as the free CLRProfiler or the not-so-expensive .NET Memory Profiler or DotTrace for more complex cases, can help you point the issue. With this, and the ‘!GCRoot’ command of SOS in Windbg, you can always track down the issue.

Things get harder when native code is involved, and it’s often the case, even if you don’t know it. The CLR allocates native memory (I mean memory not handled by the garbage collector) for some operations, and if you are working on big software, chances are that some third-party use such memory. Track those allocations is always difficult and time-consuming, and you often ends up with some expensive native memory profiler that have a major impact on performances when you enable it. I will show you today a quite simple (and not well-known) method that I used at work to hunt memory allocations. It requires just the symbol files of your application (you should always have them, read this if you’re not sure why), a few tools to dig down the data generated, and have a performance penalty small enough to think about temporary use it in production processes.

Start the allocation stack trace collecting mode

The first thing we have to do is inform the heap service of Windows that we want to track down allocations for a specific process. Once again, it’s the magic tool GFlags that we have to use (available in the Debugging Tools for Windows package, that you can easily download here). Start it, and fill the fields as it is shown (of course replace ‘AllocateSomeCppData.exe’ by your application process name):

GFlags

By checking the “Create user mode stack trace database”, you notify the Windows’ heap service that it has to record the call stack for each allocation done in the heap. If we agree that a leak is something that is not freed by your program while it is running, the steps are the following:

  • Reproduce the issue, many times, in the same process, so you ends up with a process with lots of private bytes, among which most bytes come from the leak
  • Find which call stack(s) allocated most of the memory
  • Go to your code, and check why the data allocated at the line X of the file Y (you got this information with at least one frame of the call stack and of course with symbol files including source code information) is not released, may it be a missing ‘delete’ or a store in some static container.

The first step may be or not be easy, but every diagnostic process starts with it: reproduce the issue! The second step is handled by GFlags/UMDH, and the third one is all up to you. Now that you’re convinced (I hope so?) that having the call stacks that allocates most of the memory is mandatory, and sometimes sufficient, to track down memory leaks, here is the next part of the process: use UMDH.

Use UMDH to get the allocation data

UMDH (also available in the Debugging Tools for Windows package) can take a snapshot of the allocation data at a specific time, and also compare two snapshots. You first have to ask yourself: what is the best time to take my snapshots? My advice would be to follow those steps:

  • Start your program, and do what you need to let the leak appears, but stop just before the leak indeed happen
  • Take the first snapshot
  • Reproduce the leak, many times if you can, until a large part of the private bytes allocated by the process comes from the leak (use Process Explorer to watch your process’ memory profile)
  • Take the second snapshot
  • Compare the two snapshots
  • Then you can kill your process if you need, not before! (by the way, if you want some managed interpretation of the collected data, just don’t kill it)

Now that you got the idea, let’s do it: after the setup of GFlags, start you process. Once you want to take a snapshot, start a command line. Beware that for every use of UMDH, the environment variable _NT_SYMBOL_PATH must be set! This variable contains your symbol path, and it should look like this:

  • ‘srv*<some local cache folder>*http://msdl.microsoft.com/Download/Symbols’ if you don’t have a specific symbol server for your software
  • ‘srv*<some local cache folder>*<your symbol server path>;srv*<some local cache folder>*http://msdl.microsoft.com/Download/Symbols’ if you have a symbol server

Whether you define this environment variable at system level, or only in your command line, make sure it is set with the right information (to understand the symbol path syntax, you can peek some information here).

You’re ready to take your first snapshot. At the command line prompt, type:

C:\Debugging Tools for Windows>umdh –p:<pid of your process> -f:Snapshot0.txt

Play with your software, as explained previously, make the leak appears as many times as you can, and take a second snapshot:

C:\Debugging Tools for Windows>umdh –p:<pid of your process> -f:Snapshot1.txt

Make the comparison:

C:\Debugging Tools for Windows>umdh –d Snapshot0.txt Snapshot1.txt -f:Result.txt

The ‘-d’ option stands for decimal in output: although I use hexadecimal for anything related to memory address, I just prefer decimal when it comes to counting stuff. So, what do you have in the result.txt file? A bunch of symbol search log at the beginning, and the call stacks ordered by amount of bytes allocated (and not freed) looking like this:

+ 134217748 ( 134217748 –      0)      1 allocs    BackTraceB78AC
+       1 (      1 –      0)    BackTraceB78AC    allocations

ntdll!RtlAllocateHeap+00000274
MSVCR100D!_heap_alloc_base+00000053 (f:\dd\vctools\crt_bld\self_x86\crt\src\malloc.c, 55)
MSVCR100D!_heap_alloc_dbg_impl+000001FC (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c, 431)
MSVCR100D!_nh_malloc_dbg_impl+0000001F (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c, 239)
MSVCR100D!_nh_malloc_dbg+0000002C (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c, 302)
MSVCR100D!malloc+0000001B (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgmalloc.c, 56)
MSVCR100D!operator new+00000011 (f:\dd\vctools\crt_bld\self_x86\crt\src\new.cpp, 59)
AllocateSomeCppData!operator new[]+0000000E (f:\dd\vctools\crt_bld\self_x86\crt\src\newaop.cpp, 6)

AllocateSomeCppData!wmain+0000009A (c:\users\julien\bugslasher\myfriendumdh\allocatesomecppdata\allocatesomecppdata.cpp, 24)

Here I can see that the line 24 of my file ‘AllocateSomeCppData.cpp’ made 1 allocation of 134.217.748 bytes. But let’s describe exactly what you can obtain with some dumb little C++ program.

A step by step investigation with a real (but very small) C++ code

Here is the C++ code that I’m going to use for the demo:

#include "stdafx.h"
#include <string>
#include <iostream>

int _tmain(int argc, _TCHAR* argv[])
{
	std::cout << "Press a key to start allocating memory" << std::endl;
	std::cin.get();

	while(true)
	{
		new double[100000];
		new std::string[30000];

		std::string* p = new std::string[100000];
		delete [] p;

		std::cout << "Press a key to start again" << std::endl;
		std::cin.get();
	}

	return 0;
}

As you can see, it has a main infinite loop that allocates some memory each time the user press a key. You can also notice that an array of 1.000 double and an array of 300 std::string are allocated (which one do you thing is the biggest?) but not stored (hence not ‘freeable’), and another array of 10.000 of std::string is allocated and freed just after. Of course we expect UMDH to give us a leak at line 12 and line 13, with the biggest leak first, and we also expect no report coming from line 15, because the array is freed and should not appear as a leak.

Just to be sure that this code leaks, here is the Process Explorer memory profile of it:

ProcessExplorerMemoryProfile

Remember to configure GFlags before starting the process as explained in the beginning of the post, and check your _NT_SYMBOL_PATH environment variable!

Here are the command lines that I typed (those are explained before), the first one for the first snapshot, the second one for the second snapshot, and the third one at the end:

C:\Debugging Tools for Windows>umdh –p:<pid of your process> -f:Snapshot0.txt
C:\Debugging Tools for Windows>umdh –p:<pid of your process> -f:Snapshot1.txt
C:\Debugging Tools for Windows>umdh –d Snapshot0.txt Snapshot1.txt :f:Result.txt

Let’s have a look at the ‘result.txt’ file. The first lines are some information about the symbol resolving process:

// Debug library initialized …
DBGHELP: AllocateSomeCppData – private symbols & lines
C:\BugSlasher\MyFriendUMDH\AllocateSomeCppData\Debug\AllocateSomeCppData.pdb
DBGHELP: ntdll – public symbols
C:\Symbols\wntdll.pdb\DCCFF2D483FA4DEE81DC04552C73BB5E2\wntdll.pdb

You should check if everything is ok here: remember that if some symbol file is not resolved, you will not have the function name of all the call stack’s frame located in the matching dll, and maybe worst, the remaining of the call stack after a call of this dll will probably be wrong (symbol files are needed to properly walk a call stack, read this to learn about FPO). Just after is a reminder of the data format:

//
// Each log entry has the following syntax:
//
// + BYTES_DELTA (NEW_BYTES – OLD_BYTES) NEW_COUNT allocs BackTrace TRACEID
// + COUNT_DELTA (NEW_COUNT – OLD_COUNT) BackTrace TRACEID allocations
//     … stack trace …
//
// where:
//
//     BYTES_DELTA – increase in bytes between before and after log
//     NEW_BYTES – bytes in after log
//     OLD_BYTES – bytes in before log
//     COUNT_DELTA – increase in allocations between before and after log
//     NEW_COUNT – number of allocations in after log
//     OLD_COUNT – number of allocations in before log
//     TRACEID – decimal index of the stack trace in the trace database
//         (can be used to search for allocation instances in the original
//         UMDH logs).
//

With this you should be able to understand the remaining of the file (I put only the first two call stacks, the third one is about internals of the std::string object, and is not relevant here):

+ 5760144 ( 5760144 –      0)      6 allocs    BackTrace1178AC
+       6 (      6 –      0)    BackTrace1178AC    allocations

ntdll!RtlAllocateHeap+00000274
MSVCR100D!_heap_alloc_base+00000053 (f:\dd\vctools\crt_bld\self_x86\crt\src\malloc.c, 55)
MSVCR100D!_heap_alloc_dbg_impl+000001FC (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c, 431)
MSVCR100D!_nh_malloc_dbg_impl+0000001F (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c, 239)
MSVCR100D!_nh_malloc_dbg+0000002C (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c, 302)
MSVCR100D!malloc+0000001B (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgmalloc.c, 56)
MSVCR100D!operator new+00000011 (f:\dd\vctools\crt_bld\self_x86\crt\src\new.cpp, 59)
AllocateSomeCppData!operator new[]+0000000E (f:\dd\vctools\crt_bld\self_x86\crt\src\newaop.cpp, 6)
AllocateSomeCppData!wmain+000000A7 (c:\bugslasher\myfriendumdh\allocatesomecppdata\allocatesomecppdata.cpp, 13)
AllocateSomeCppData!__tmainCRTStartup+000001BF (f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c, 552)
AllocateSomeCppData!wmainCRTStartup+0000000F (f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c, 371)
kernel32!BaseThreadInitThunk+0000000E
ntdll!__RtlUserThreadStart+00000070
ntdll!_RtlUserThreadStart+0000001B

+ 4800120 ( 4800120 –      0)      6 allocs    BackTrace117818
+       6 (      6 –      0)    BackTrace117818    allocations

ntdll!RtlAllocateHeap+00000274
MSVCR100D!_heap_alloc_base+00000053 (f:\dd\vctools\crt_bld\self_x86\crt\src\malloc.c, 55)
MSVCR100D!_heap_alloc_dbg_impl+000001FC (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c, 431)
MSVCR100D!_nh_malloc_dbg_impl+0000001F (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c, 239)
MSVCR100D!_nh_malloc_dbg+0000002C (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgheap.c, 302)
MSVCR100D!malloc+0000001B (f:\dd\vctools\crt_bld\self_x86\crt\src\dbgmalloc.c, 56)
MSVCR100D!operator new+00000011 (f:\dd\vctools\crt_bld\self_x86\crt\src\new.cpp, 59)
AllocateSomeCppData!operator new[]+0000000E (f:\dd\vctools\crt_bld\self_x86\crt\src\newaop.cpp, 6)
AllocateSomeCppData!wmain+00000094 (c:\bugslasher\myfriendumdh\allocatesomecppdata\allocatesomecppdata.cpp, 12)
AllocateSomeCppData!__tmainCRTStartup+000001BF (f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c, 552)
AllocateSomeCppData!wmainCRTStartup+0000000F (f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c, 371)
kernel32!BaseThreadInitThunk+0000000E
ntdll!__RtlUserThreadStart+00000070
ntdll!_RtlUserThreadStart+0000001B

Of course the call stacks contain all the functions used, from the start of the process to the very call of the win32 heap service, but don’t be scared by the quantity of data that you can see, just focus on what is important: the amount of bytes allocated between each call, and the frame in the call stacks that belong to your code (I highlighted them).

A résumé of this result.txt file is:

  • The line 13 of AllocateSomeCppData.cpp (‘new std::string[30000];’) allocated most of the data: 5.760.144 bytes, in six calls. That is fully consistent with the fact that I press 6 times a key, if you know that I compiled in unicode char format and that a std::wstring allocate 16 wchar = 32 bytes in its own data structure. So 6 * 30000 * 32 = 5.760.000. The remaining 144 bytes is an overhead induced by the stack trace collecting mode.
  • The line 12 of AllocateSomeCppData.cpp (‘new double[100000];’) also allocated some blocks: 4.800.120 bytes, in six calls also. It is 6 * 100.000 * 8 (a double is 8 bytes) = 4.800.000, and 120 bytes of overhead.

You got your leak! In this particular case, it was quite easy to find just by looking at the code, but in real world it is rarely so easy. You may notice that, since the data allocated at line 15 (‘std::string* p = new std::string[100000];’) is deleted at line 16, it does not appear in the file. Those data are only about memory that is not freed, hence the validity of the method to find leaks.

I’m very happy with those call stacks, but I only see native frames, does it work with managed code?

Well, yes and no. This tool tracks native allocations, so any ‘new’ made in .NET will NOT appears here (.NET do not use the heap service, it allocates big blocks of memory with a call to VirtualAlloc, and cut some chunks in it). But, if you call a native function in .NET (with P/Invoke, with some CLI code, of with a COM call), and if this function allocate some native data, you will see it. Let’s say that we have a call to the function System.Runtime.InteropServices.Marshal.AllocHGlobal, who is dedicated to allocate native memory in any .NET code, with 10.000.000 bytes as argument. Here is what you got with UMDH:

+ 10000000 ( 10000000 –      0)      1 allocs    BackTrace45EED8
+       1 (      1 –      0)    BackTrace45EED8    allocations

ntdll!RtlAllocateHeap+00000274
KERNELBASE!LocalAlloc+0000005F
mscorlib.ni!???+00000000 : 667B56D6
mscorlib.ni!???+00000000 : 667AF7F1
clr!CallDescrWorker+00000033

The 10.000.000 bytes allocation is indeed here, though a call to ntdll!RtlAllocateHeap from the function KERNELBASE!LocalAlloc. Those lines can be understood. But what the **** is the line ‘mscorlib.ni!???+00000000 : 667B56D6’ ?

Though it seems to be a bunch of crappy characters, this line has a sense: first it says that this frame is located in mscorlib.ni. This name is a placeholder created by the .NET framework to put the jitted code of mscorlib.dll, the very dll that defines our function AllocHGlobal. So we know that this frame is some function of mscorlib.dll. Plus, the number at the end (667B56D6) gives us the address inside the jitted code of this function where KERNELBASE!LocalAlloc is called. Can we extract a name from this? The answer is yes.

Just start WinDbg, attach to your running process (you have to use the same process for UMDH and WinDbg, so don’t kill it in between), make sure you have all the symbols loaded (or issue the command ‘.sympath srv*<some local cache folder>*http://msdl.microsoft.com/Download/Symbols’ to be sure), load SOS (‘.loadby sos clr’ for .NET 4.0 of ‘.loadby sos mscorwks’ for previous versions) and issue the following command:

!IP2MD 667B56D6

MethodDesc:   665f867c
Method Name:  System.Runtime.InteropServices.Marshal.AllocHGlobal(IntPtr)
Class:        665d79a4
MethodTable:  6689d4a4
mdToken:      06003cbc
Module:       66581000
IsJitted:     yes
CodeAddr:     667b5694
Transparency: Critical

Here is our method name! Can we do the same for the other one?

!IP2MD 667AF7F1

MethodDesc:   665f8688
Method Name:  System.Runtime.InteropServices.Marshal.AllocHGlobal(Int32)
Class:        665d79a4
MethodTable:  6689d4a4
mdToken:      06003cbd
Module:       66581000
IsJitted:     yes
CodeAddr:     667af7d8
Transparency: Critical

Well that makes perfect sense, because the function AllocHGlobal(Int32) indeed calls its overload AllocHGlobal(IntPtr), use ildasm or Reflector on mscorlib.dll to convince yourself. But you can see here the limits of this tip: we were lucky here to have 2 valid .NET frames, most of the time you will find only one, and sometimes 0. In this example, I can’t find my Main function where I wrote the call to AllocHGlobal. So you don’t have the full stack, but you have the first frame(s), and sometimes it’s enough!

Conclusion

The combination of GFlags and UMDH is quite powerful to find memory leaks, and even to profile an application for low memory use. But before using it, make sure that the leak is native, and not managed! I will soon write something about how to make the difference, in the meantime you can remember one ‘word’: perfmon.exe.

Happy debugging.

  1. Mike
    December 31, 2011 at 2:00 pm

    Thank you! Been searching forever to find a proper walkthru on tracing native leaks back to managed code!

  2. January 21, 2012 at 12:24 pm

    This is one of the best article for searching memory leak in native codes. I am using this approach from many years.

  3. Andrew
    November 11, 2013 at 11:16 am

    the string:
    C:\Debugging Tools for Windows>umdh –d Snapshot0.txt Snapshot1.txt :f:Result.txt
    need to fix to:
    C:\Debugging Tools for Windows>umdh –d Snapshot0.txt Snapshot1.txt -f:Result.txt

    • November 11, 2013 at 8:34 pm

      tks, typo fixed

      • Andrew
        November 14, 2013 at 9:47 am

        Thank you, your article is in my C++ bookmarks. I have used it many times.

  4. Amit Priyadarshi
    September 16, 2014 at 10:10 am

    Hi Julien,

    I am trying to hunt down reason for memory build up in my application (native code on windows)
    In umdh when i take a diff between logs i find call stacks arranged by decreasing size of allocations.

    in the topmost I have a stack that sought most allocation (call stack say callStack1). When i scroll to bottom i find the same call stack ie callStack1, with negative allocation (is it deallocation?)

    If it is deallocation then typically it should have similar magnitude as allocation?? Please correct me if I am wrong.
    I am stating this because you would use the same handle (pointer for deallocation) as the one you used while allocating memory.

    Please advise

    • September 30, 2014 at 11:40 pm

      Hello Amit
      I already saw some small discrepencies between allocation and deallocation on debug build, not sure why exactly, but it should be epsilon of the total quantity allocated.

  1. No trackbacks yet.

Leave a comment