
Memory management under Windows: what your mother didn’t tell you

To cope with the various needs of any kind of software, every operating system offers a general-purpose memory management system. As it works out of the box most of the time, developers usually don’t dig into it to find out how it behaves under the hood. But you may be surprised to learn that you cannot access all the memory of your process, even when some parts of it seem to be free. Beyond the amount of available memory and the fragmentation of user space, there is another barrier that can prevent you from getting the remaining free bits: the heap service and the minimum size of a virtual allocation call.

Imagine a program that, after a few hours of uptime, starts to throw out-of-memory exceptions. Your first thought is that, indeed, you don’t have enough memory left. Let’s say that 1 GB is free, and that you just asked for a small 30K block. If you’re familiar with memory issues, your second thought may be: “memory fragmentation! Check the largest contiguous block!”. And what if I told you that a 60K block is available? Well, if you are like I was before this issue happened at work, you’re stuck. But let’s go back to the beginning.

How our story begins

A few days ago, I was called in to investigate an issue: after 5 or 6 hours of running, an application stopped responding normally and eventually died. I went to the user’s office, attached a debugger (WinDbg in that case, so I just had to copy some files) and looked at what was happening.

I saw a whole bunch of C++ exceptions, so I told the debugger to break the next time one was thrown. A look at the call stack showed that the issue was pretty straightforward: the software pushed a new element into a C++ std::vector, there was no more room left in its buffer, so the vector tried to expand. As std::vector grows its capacity by a constant factor each time (2 in many implementations, 1.5 in Microsoft’s), the memory manager had to find enough contiguous space for a buffer substantially larger than the current one (something like 50K bytes in my case). The memory manager failed, and the C++ runtime translated this failure into a std::bad_alloc exception. Classic!
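To see those reallocations happen, here is a small portable sketch that records each capacity a std::vector goes through while elements are pushed into it. The function name capacityHistory is made up for illustration; each new entry in the history means the vector requested a fresh, larger, contiguous buffer, and the exact growth factor you observe depends on the standard library you compile with.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pushes n ints into a std::vector one at a time and records each distinct
// capacity it goes through. Every new entry in the history is a reallocation:
// the vector asks the allocator for a new, larger, contiguous buffer, moves
// the elements into it and releases the old one.
std::vector<std::size_t> capacityHistory(std::size_t n)
{
    std::vector<int> v;
    std::vector<std::size_t> history;
    for (std::size_t i = 0; i < n; ++i)
    {
        v.push_back(static_cast<int>(i));
        // Capacity never shrinks while elements are only being added.
        assert(history.empty() || v.capacity() >= history.back());
        if (history.empty() || v.capacity() != history.back())
            history.push_back(v.capacity());
    }
    return history;
}
```

Calling capacityHistory(1000) returns a strictly increasing list of capacities whose gaps widen as the vector grows, which is why a late reallocation asks for a much larger contiguous block than an early one.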

Just to be completely sure, I used VMMap to check that no large enough contiguous memory region was available, and I was rather surprised: even though my process was on the verge of death, with a largest free segment of only 60K, it should have been able to allocate those 50K bytes. Why did it fail?

What is this ‘contiguous memory’ thing?

This post is about virtual user memory, not physical memory. How the operating system manages to create this virtual user space from physical space is not relevant here.

All the Windows kernel functions that I’m aware of allocate virtual user memory contiguously: when you ask for 1GB, you get a single 1GB range, not two 512MB blocks separated by a gap. If your process address space is 2GB (like a 32-bit process on a non-tweaked 32-bit Windows OS), and some nasty code allocates just one byte right in the middle of those 2GB, the most you can get from a single call afterwards is roughly 1GB. This is called memory fragmentation, and you should be aware of it, because 99% of the memory exhaustion issues I have run into at work were related to it.
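A toy model makes that arithmetic concrete. The sketch below treats the address space as the range [0, spaceSize) and returns the largest free gap between busy ranges; the Range type and largestFreeGap are illustrative names, and the model deliberately ignores everything except busy ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

typedef std::pair<std::uint64_t, std::uint64_t> Range;  // busy range [first, second)

// Toy model of an address space [0, spaceSize): returns the size of the
// largest free contiguous range left between the busy ranges.
std::uint64_t largestFreeGap(std::uint64_t spaceSize, std::vector<Range> used)
{
    std::sort(used.begin(), used.end());
    std::uint64_t largest = 0;
    std::uint64_t cursor = 0;  // end of the busy area seen so far
    for (const Range& r : used)
    {
        assert(r.first <= r.second && r.second <= spaceSize);
        if (r.first > cursor)
            largest = std::max(largest, r.first - cursor);
        cursor = std::max(cursor, r.second);
    }
    if (spaceSize > cursor)
        largest = std::max(largest, spaceSize - cursor);
    return largest;
}
```

With GB = 2^30, largestFreeGap(2 * GB, {Range(GB, GB + 1)}) returns GB: the one-byte allocation in the middle leaves two free halves, and the larger one is exactly 1 GB.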

If you’re not familiar with memory management on Windows, I recommend watching Mark Russinovich’s brilliant talk at Microsoft PDC 2010, available here.

And how can I find the largest block of memory available in my process?

Fortunately, there are tools that can give you this information. I know of two: VMMap and WinDbg.

VMMap is a very useful tool made by Sysinternals (and guess what, the guy behind it is… Mark Russinovich) that you can download for free here. Its basic purpose is to give you a summarized view of each type of memory inside a process. By selecting the “Free” memory type, you can sort the regions by size and get an idea of your largest free region.

WinDbg is one of the best debuggers for Windows processes and can be downloaded here. The magic command to use is ‘!address -summary’; here is an example of its output (it may differ depending on your version of WinDbg):

0:000> !address -summary 
TEB 7efdd000 in range 7efdb000 7efde000  
ProcessParametrs 005f18f0 in range 005f0000 005f8000  
Environment 005f0810 in range 005f0000 005f8000

-------------------- Usage SUMMARY --------------------------    
 TotSize (      KB)   Pct(Tots) Pct(Busy)   Usage     
 9411000 (  151620) : 07.23%    95.42%    : RegionUsageIsVAD    
764c2000 ( 1938184) : 92.42%    00.00%    : RegionUsageFree      
  50c000 (    5168) : 00.25%    03.25%    : RegionUsageImage      
  100000 (    1024) : 00.05%    00.64%    : RegionUsageStack           
       0 (       0) : 00.00%    00.00%    : RegionUsageTeb      
  110000 (    1088) : 00.05%    00.68%    : RegionUsageHeap           
       0 (       0) : 00.00%    00.00%    : RegionUsagePageHeap       
    1000 (       4) : 00.00%    00.00%    : RegionUsagePeb           
       0 (       0) : 00.00%    00.00%    : RegionUsageProcessParametrs          
       0 (       0) : 00.00%    00.00%    : RegionUsageEnvironmentBlock        
Tot: 7fff0000 (2097088 KB) Busy: 09b2e000 (158904 KB)

-------------------- Type SUMMARY --------------------------    
 TotSize (      KB)   Pct(Tots)  Usage    
764c2000 ( 1938184) : 92.42%   : <free>     
  75b000 (    7532) : 00.36%   : MEM_IMAGE      
  1af000 (    1724) : 00.08%   : MEM_MAPPED     
 9224000 (  149648) : 07.14%   : MEM_PRIVATE

-------------------- State SUMMARY --------------------------    
 TotSize (      KB)   Pct(Tots)  Usage     
 85e7000 (  137116) : 06.54%   : MEM_COMMIT    
764c2000 ( 1938184) : 92.42%   : MEM_FREE     
 1547000 (   21788) : 01.04%   : MEM_RESERVE

Largest free region: Base 00000000 - Size 00010000 (64 KB)

As you can see, the largest free region of the process being debugged is 64 KB. You may think that allocating, say, 50 KB is therefore always possible. You’re wrong.
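The same figure can also be computed from inside the process with VirtualQuery, which fills a MEMORY_BASIC_INFORMATION structure for each region. Here is a minimal sketch, assuming you only need the largest free region: the search itself is plain C++, only the walk is Windows-specific, and Region, largestFreeRegion and snapshotAddressSpace are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified view of one address-space region, modeled on the BaseAddress,
// RegionSize and State fields of MEMORY_BASIC_INFORMATION.
struct Region {
    std::uintptr_t base;
    std::size_t size;
    bool isFree;  // State == MEM_FREE
};

// Returns the largest free region (size 0 if there is none). Regions are
// expected in ascending, non-overlapping order, as a VirtualQuery walk
// produces them.
Region largestFreeRegion(const std::vector<Region>& regions)
{
    Region best = { 0, 0, true };
    for (std::size_t i = 0; i < regions.size(); ++i)
    {
        assert(i == 0 || regions[i].base >= regions[i - 1].base + regions[i - 1].size);
        if (regions[i].isFree && regions[i].size > best.size)
            best = regions[i];
    }
    return best;
}

#ifdef _WIN32
#include <windows.h>

// Windows only: walks the user address space, starting at address 0 like
// "!address" does, until VirtualQuery fails past the last region.
std::vector<Region> snapshotAddressSpace()
{
    std::vector<Region> regions;
    MEMORY_BASIC_INFORMATION mbi;
    const char* address = NULL;
    while (::VirtualQuery(address, &mbi, sizeof(mbi)) == sizeof(mbi))
    {
        Region r = { reinterpret_cast<std::uintptr_t>(mbi.BaseAddress),
                     mbi.RegionSize,
                     mbi.State == MEM_FREE };
        regions.push_back(r);
        address = static_cast<const char*>(mbi.BaseAddress) + mbi.RegionSize;
    }
    return regions;
}
#endif
```

On Windows, largestFreeRegion(snapshotAddressSpace()) should give the equivalent of the “Largest free region” line; like the WinDbg output above, it counts the 64 KB region at address 0 as free.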

Heaps and memory

The native API function used to allocate virtual memory is ntdll!ZwAllocateVirtualMemory, at least under Windows XP. Most runtimes do not call this function directly (nor VirtualAlloc, its documented Win32 wrapper). Instead, they use the OS heap service through a simple set of functions like HeapCreate and HeapAlloc. All those functions ultimately call ntdll!ZwAllocateVirtualMemory, but they also bring you at least one essential level of abstraction: an allocation granularity finer than 64KB.

To illustrate that, just execute this tiny C++ program on your computer:

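#include <Windows.h>
#include <tchar.h>      // _tmain / _TCHAR (usually pulled in by a Visual Studio "stdafx.h")
#include <iostream>

int _tmain(int argc, _TCHAR* argv[])
{
    unsigned int allocated = 0;

    // Ask for 1 byte at a time, letting the system choose the address (NULL),
    // until VirtualAlloc fails. Each successful call is counted as 1 byte.
    while(::VirtualAlloc(NULL, 1, MEM_COMMIT, PAGE_READWRITE) != NULL)
        allocated += 1;

    std::cout << allocated << " bytes allocated by VirtualAlloc" << std::endl;
    return 0;
}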

The first argument of ‘VirtualAlloc’ is the memory address where you want the allocation, the second one is the requested size. On my computer, this program prints 32293 bytes. So I ran out of memory after just 32293 allocated bytes???

Well, not really: the trick is that VirtualAlloc has an allocation granularity of 64K on my version of Windows (Windows 7 x64), and a minimum allocation of 4K, equal to one page (if you’re curious about that, you can read this and this; to get those values at runtime, call GetSystemInfo). So when you call VirtualAlloc asking for one byte, it commits 4K (the minimum allocation), and as I left the first argument NULL, it gives me a brand new 64K block each time. With 2G of address space available, I can call it 2G/64K = 32768 times. I only get 32293 here because some space is already in use: the default heap, some DLLs, a stack for my thread, etc.
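Put as arithmetic, the two rounding rules look like this. It is a back-of-the-envelope model, not what VirtualAlloc literally computes: the helper names are made up, and the 64 KB / 4 KB defaults are the values quoted above for my machine (on a real system, read them from SYSTEM_INFO::dwAllocationGranularity and SYSTEM_INFO::dwPageSize).

```cpp
#include <cassert>
#include <cstdint>

// Rounds value up to the next multiple of granularity (a power of two).
std::uint64_t roundUp(std::uint64_t value, std::uint64_t granularity)
{
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    return (value + granularity - 1) & ~(granularity - 1);
}

// Cost of one VirtualAlloc(NULL, size, MEM_COMMIT, ...) call, assuming a
// 64 KB allocation granularity and 4 KB pages.
struct CallCost {
    std::uint64_t addressSpaceBytes;  // the region starts on a 64 KB boundary,
                                      // so the whole 64 KB block is lost to others
    std::uint64_t committedBytes;     // committed in whole pages
};

CallCost costOfVirtualAlloc(std::uint64_t size,
                            std::uint64_t granularity = 64 * 1024,
                            std::uint64_t pageSize = 4 * 1024)
{
    CallCost cost = { roundUp(size, granularity), roundUp(size, pageSize) };
    return cost;
}

// Upper bound on the number of such calls that fit in freeBytes of address
// space, assuming it is made of whole, aligned 64 KB blocks.
std::uint64_t maxCalls(std::uint64_t freeBytes, std::uint64_t size,
                       std::uint64_t granularity = 64 * 1024)
{
    return freeBytes / roundUp(size, granularity);
}
```

Applied to the program's result, those 32293 “bytes” actually stand for 32293 × 4 KB ≈ 126 MB of committed memory and 32293 × 64 KB ≈ 1.97 GB of address space.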

Well, that’s a little bit of wasted space, right? Heaps are here to avoid this, and provide a much smaller minimum allocation, 8 bytes most of the time. They obtain large chunks from the virtual memory manager (ultimately through ntdll!ZwAllocateVirtualMemory, the same function VirtualAllocEx relies on), and then carve them up to give you what you asked for.
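The principle can be sketched with a toy sub-allocator. This is only a model, assuming 64 KB chunks and an 8-byte granularity; ToyHeap is a hypothetical class, new[] stands in for the call to the virtual memory manager, and the real Windows heap adds block headers, free lists, locking and much more.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy heap: obtains memory in 64 KB chunks and carves blocks rounded up to
// 8 bytes out of the current chunk. No free(), no block headers, no locking.
class ToyHeap {
public:
    static constexpr std::size_t kChunkSize = 64 * 1024;
    static constexpr std::size_t kGranularity = 8;

    void* allocate(std::size_t size)
    {
        std::size_t rounded = (size + kGranularity - 1) / kGranularity * kGranularity;
        if (rounded == 0)
            rounded = kGranularity;
        if (rounded > kChunkSize)
            return nullptr;  // a real heap would serve big blocks separately
        if (chunks_.empty() || used_ + rounded > kChunkSize)
        {
            // Current chunk exhausted: "grow" the heap by one more chunk.
            chunks_.push_back(std::unique_ptr<char[]>(new char[kChunkSize]));
            used_ = 0;
        }
        char* block = chunks_.back().get() + used_;
        used_ += rounded;
        assert(used_ <= kChunkSize);
        return block;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::vector<std::unique_ptr<char[]>> chunks_;
    std::size_t used_ = 0;
};
```

Here, 4096 requests of 16 bytes fit in a single chunk; done with one VirtualAlloc call each, they would have consumed 4096 × 64 KB = 256 MB of address space.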

Heaps also offer various other features, such as serializing calls so that a heap can safely be used by several threads at once (this is the default; passing HEAP_NO_SERIALIZE when creating your heap turns it off), powerful algorithms to limit memory fragmentation such as the Low Fragmentation Heap (see Microsoft’s documentation, and this presentation for more), and many other things depending on your operating system. Except for the .NET Framework, which calls VirtualAllocEx directly because it manages its own memory chunks, all the runtimes that I know (C, C++, VB6) make intensive use of heaps to create and destroy objects.

How can I find all the heaps used in my process?

The Windows API offers a few functions to walk your heaps, such as GetProcessHeaps or HeapWalk, but the simplest way is, again, WinDbg. The magic command is ‘!heap -s’, which gives a summarized view of each heap:

0:000> !heap -s
LFH Key                   : 0x098e6b7e
Termination on corruption : DISABLED   
  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast       
                    (k)     (k)    (k)     (k) length      blocks cont. heap  
-----------------------------------------------------------------------------
004c0000 40000062    1024     20   1024      2     2     1    0      0       
00030000 40001062      64      4     64      2     1     1    0      0       
-----------------------------------------------------------------------------

We can see here that this process has two heaps, at addresses 0x004c0000 and 0x00030000.

This raises another question: how can you know which heap a specific runtime or language is using? If you have a live process, you can set a breakpoint on ntdll!RtlAllocateHeap (by typing ‘bu ntdll!RtlAllocateHeap’), wait for the debugger to stop on it and issue a ‘k b’ command:

0:000> k b
ChildEBP RetAddr  Args to Child
004ff81c 76f10787 001a0000 0000000a 00000234 ntdll!RtlAllocateHeap
004ff83c 76f106c1 0000007f 00000000 00000001 ntdll!RtlpAllocateListLookup+0x35
004ff85c 76f105d4 001a0000 66a4fb57 00000000 ntdll!RtlpInitializeUCRIndex+0x27
004ff924 76f9fcfd 50000062 00000000 00100000 ntdll!RtlCreateHeap+0x7f0
004ff96c 76f4d4a3 40000062 00000000 00100000 ntdll!RtlDebugCreateHeap+0x230
004ffa48 76f15e48 40000062 00000000 00100000 ntdll!RtlCreateHeap+0x294
004ffbd8 76f15947 004ffc4c 76ed0000 66a4fe5b ntdll!LdrpInitializeProcess+0x708
004ffc28 76f09cc9 004ffc4c 76ed0000 00000000 ntdll!_LdrpInitialize+0x78
004ffc38 00000000 004ffc4c 76ed0000 00000000 ntdll!LdrInitializeThunk+0x10

The heap is the first argument of the function, 0x001a0000 here.

If you only have a dump, you have to find the variable in which the runtime stores a pointer to its heap. In Microsoft C++ runtimes, it is named ‘_crtheap’, so you can find it with ‘x *!_crtheap*’:

0:000> x *!_crtheap*
69612650 MSVCR100D!_crtheap = 0x00030000

I can see here that the heap used by the C++ runtime version 10.0 is 0x00030000.

Once you have the heap address, you can list all the regions reserved or committed by this heap with the command ‘!heap -a <address of your heap>’. Beware: it can take a long time on big heaps.

Now I’m clear about heaps, but I still don’t know why I run out of memory while I have enough contiguous space

First, there is a tiny rule that you should be aware of: you cannot allocate the first 64K block of your virtual address space (from 0x00000000 to 0x0000FFFF), nor the last 64K block of the user-mode part of it (from 0x7FFF0000 to 0x7FFFFFFF on a standard 2 GB address space). The first one, I guess, acts as a placeholder for “you’re pointing to an invalid address”; the second one because “Microsoft reserves this partition because doing so makes implementing the operating system easier for Microsoft” (I’m quoting Jeffrey Richter in his book “Programming Applications for Microsoft Windows”, chapter 13).

Now that you know that, you can understand why you will always have at least one 64K free region in your process, even though you cannot use it. So let’s talk about the other regions, the ones you’re supposed to be able to write to. Can you always use them?

Imagine that your process has a highly fragmented address space, and that the heap you’re using is completely full. Like most heaps, it can grow by asking the operating system for a few new blocks through a call to VirtualAlloc. If all the 64K blocks have already been handed out, as in the little C++ program above, that allocation fails.

Let’s illustrate this with another version of the previous C++ program:

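#include <Windows.h>
#include <tchar.h>      // _tmain / _TCHAR (usually pulled in by a Visual Studio "stdafx.h")
#include <iostream>
#include <new>          // std::bad_alloc

int _tmain(int argc, _TCHAR* argv[])
{
    unsigned int allocatedByVirtualAlloc = 0;
    unsigned int allocatedByNew = 0;

    // Virtual memory allocation: grab every free 64K block, one call at a time
    // (each call is counted as 1 byte, as in the previous program)
    while(::VirtualAlloc(NULL, 1, MEM_COMMIT, PAGE_READWRITE) != NULL)
        allocatedByVirtualAlloc += 1;

    std::cout << allocatedByVirtualAlloc << " bytes allocated by VirtualAlloc" << std::endl;

    // Heap allocation: operator new throws std::bad_alloc instead of returning NULL.
    // Note: the counter adds 0x10 per array of 0x10 ints, i.e. it counts ints, not bytes.
    try
    {
        while(new int[0x10])
            allocatedByNew += 0x10;
    }
    catch(const std::bad_alloc& )
    {
        // I was waiting for it... the heap is full!
    }

    std::cout << allocatedByNew << " bytes allocated by new" << std::endl;
    return 0;
}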

First I make as many calls to ‘VirtualAlloc’ as possible, then I do the same with ‘new’. At the end of the program, on my computer, the two counters add up to 167 040 bytes. Then I looked at the process with VMMap.

It looks like I have a lot of 60K blocks of free memory (remember that the 64K one is not usable because it starts at 0x00000000). But any attempt to allocate something, even one byte, fails. The heap has reached its limit, as you can see with WinDbg:

0:000> !heap -s 0x00350000
Walking the heap 00350000 .
  0: Heap 00350000 
   Flags          40001062 - HEAP_GROWABLE HEAP_TAIL_CHECKING_ENABLED
   Reserved memory in segments              64 (k) 
   Commited memory in segments              64 (k) 
   Virtual bytes (correction for large UCR) 64 (k) 
   Free space                               0 (k) (8 blocks) 
   External fragmentation          0% (8 free blocks) 
   Virtual address fragmentation   0% (1 uncommited ranges)  
   Virtual blocks  0 - total 0 KBytes
   Lock contention 0
   Segments        1

                    Default heap   Front heap       Unused bytes    
   Range (bytes)     Busy  Free    Busy   Free     Total  Average  
------------------------------------------------------------------
      0 -   1024      449      8      0      0      12380     27
   1024 -   2048        2      0      0      0         29     14   
   2048 -   3072        2      0      0      0         56     28
------------------------------------------------------------------    
   Total              453      8      0      0      12465     27

Even though the heap is growable (HEAP_GROWABLE), growing means calling ‘ntdll!ZwAllocateVirtualMemory’, which fails because no free 64K blocks are left. End of story.

We can guess that none of the 60K blocks seen in VMMap can be obtained through the virtual memory allocation service, because none of them starts at an address that is a multiple of 64K: the beginning of each 64K block has already been taken by somebody.
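Here is that guess turned into a small model: a free region is only useful to a growing heap if a new reservation can start inside it on a 64 KB boundary, at or above 0x10000. A 60K tail that begins right after a committed 4K page never qualifies. FreeRegion, usableForNewReservation and canReserve are illustrative names, and the 64 KB / 0x10000 values are the ones from this article, not constants read from the system.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct FreeRegion {
    std::uint64_t base;
    std::uint64_t size;
};

// How many bytes a new reservation could use inside one free region, assuming
// reservations must start on a 64 KB boundary and never below 0x10000 (the
// first 64 KB block is off limits).
std::uint64_t usableForNewReservation(const FreeRegion& r,
                                      std::uint64_t granularity = 0x10000,
                                      std::uint64_t minAddress = 0x10000)
{
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    std::uint64_t start = (r.base + granularity - 1) & ~(granularity - 1);
    if (start < minAddress)
        start = minAddress;
    const std::uint64_t end = r.base + r.size;
    return start < end ? end - start : 0;
}

// Can a growing heap get `bytes` of new address space from any free region?
bool canReserve(const std::vector<FreeRegion>& regions, std::uint64_t bytes)
{
    for (const FreeRegion& r : regions)
        if (usableForNewReservation(r) >= bytes)
            return true;
    return false;
}
```

With a free list made only of 60K tails plus the block at address 0, canReserve returns false even for a single byte, which matches the failure observed above; a single unaligned 128K region, by contrast, still contains a usable 64K-aligned window.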

In conclusion

This strange behavior (being unable to allocate x bytes while a free region larger than x bytes is available) is due to three separate things:

  • The OS virtual address space is allocated in blocks of 64K
  • Something in the program being debugged indirectly calls ntdll!ZwAllocateVirtualMemory with a size that is not a multiple of 64K
  • The Windows XP heap can only get new 64K blocks when growing; it cannot commit memory inside blocks that were already handed out.

You may say that a program that has grabbed every 64K block is going to die anyway. You’re right, but I was annoyed to see it die earlier than I thought it would…

Happy debugging.

  1. Aniket B. Dumbare
    March 26, 2012 at 2:13 pm

    Hi,

    I had been tracking down memory leaks in my application using WinDbg and observed that the !heap extension probably only works for CRT allocations, i.e. allocations made through new, malloc, etc. It does NOT detect memory leaks for allocations made through VirtualAlloc().
    !address -summary shows the increased RegionUsageIsVAD value though.

    But is there any way to get the stack trace for the allocations made using VirtualAlloc()?

    regards,
    Aniket

  2. March 26, 2012 at 3:27 pm

    Hi
    You’re right, !heap only displays heap information, and all the tools that I know track allocations made through the heap, because that is where most memory leaks come from.
    There is no tool that I’m aware of that collects calls to VirtualAlloc and VirtualFree. But I have a quick-and-dirty idea: as I would not expect many calls to those routines, you can just tell WinDbg to display the call stack whenever either function is called, and “scrape” the output to reconstruct the alloc/free cycles.
    To do that, just put a breakpoint on kernel32!VirtualAlloc that displays the call stack, steps out of the function so you can display the return value (it should be stored in eax), and continues.
    It should be something like this (not tested):
    bu kernel32!VirtualAlloc "kb; gu; r eax; gc"
    bu kernel32!VirtualFree "kb; gc"
    kb => display the call stack with the arguments of each function
    gu => step out
    r eax => display the value of eax (the return value)
    gc => go and continue
    With that output you should be able to reconstruct everything you need. And if WinDbg’s output becomes too large, don’t forget to set up a log file (menu Edit->Open/Close Log File…).
    Hope it helps.
