A lil’ bit about NDIS, Windows Firewall and the undocumented Firewall-Hook Drivers Interface

haxorcize on Oct 11th 2008

Until recently, I wasn’t thoroughly familiar with the way Windows-based firewalls are implemented. During my endless endeavours in the mazy subject called Device Driver Development (the evil triple D’s), I came across the interesting subject of network device drivers, and along the way I learned some interesting “internal” facts about Windows Firewall (WF) that I’d like to share with you.

Despite this article’s title, I don’t intend to be exhaustive about WF’s internal features; I’ll mainly focus on the one implementation detail that interested me: the way WF filters network packets. Note also that this article concerns the Windows XP firewall. In Vista, things changed with the arrival of NDIS 6.x and the Windows Filtering Platform (WFP).

I’ll begin by describing, in a somewhat superficial but understandable way, a few of the kernel NDIS-related pieces of the humongous Windows networking architecture:

  1. If you are familiar with NDIS, you may skip the following definitions. If you aren’t, try to understand them, but don’t feel bad if you don’t; it takes some time to get acquainted with these terms.
  2. NDIS, aka the “Network Driver Interface Specification”, provides a common framework for developing network interface drivers and dates back to 1989. It allows network adapter vendors to develop NDIS drivers for their NICs easily and in a rather portable manner. It does so through a layered design, very much like the OSI reference model.
  3. NDIS Library Driver - %SYSTEMROOT%\System32\Drivers\Ndis.sys implements the NDIS library functionality that decouples the layers and makes everything tick from top to bottom and back to the top.
  4. NDIS Miniport Drivers - The ones that actually talk to the hardware itself (the NIC), with the assistance of the HAL, of course.
  5. NDIS Protocol Drivers - Drivers that implement a specific protocol and are logically separated from the miniport drivers by the NDIS library. An example of such a driver is the TCP/IP protocol driver, implemented in %SYSTEMROOT%\System32\Drivers\tcpip.sys.
  6. NDIS Intermediate Drivers - Windows NT 4.0 SP3 introduced a new type of driver called NDIS Intermediate (IM) drivers. NDIS IM drivers layer themselves (with the aid of the NDIS library) between the miniport drivers and the protocol drivers, which allows them to see all the traffic a host generates and receives.

I even created (as in “not copied” :)) a figure to illustrate the relations between the various NDIS components and their layers:

As mentioned above, the TCP/IP implementation in Windows is actually an NDIS protocol driver. As such, it communicates with the other NDIS layers through the NDIS library. The great Windows Internals book, along with the (lousy :)) WDK documentation, describes how the TCP/IP protocol driver exposes a few private interfaces that allow other drivers to extend its functionality by hooking packets going in and out. Two of them are known as “Filter-Hook Drivers” and “Firewall-Hook Drivers”.

An interesting fact about the Firewall-Hook Drivers Interface is that it’s completely undocumented in the WDK, while the Filter-Hook Drivers Interface is fully documented (that is, in the usual lacking WDK fashion). Suffice it to say that the Filter-Hook Drivers Interface isn’t as powerful as the Firewall-Hook Drivers Interface, since it doesn’t let you touch the data of outgoing packets, nor alter their contents. On top of that, the Filter-Hook Drivers Interface allows only a single Filter-Hook driver to exist in the system.
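To make the contrast concrete: a filter-hook is essentially one callback that is handed an IP header (plus payload) and answers with a verdict, with no way to rewrite the packet. The Python 3 sketch below only models that shape. The real callback is a C function registered with the IP filter driver; `filter_hook`, its "drop ICMP" policy and `make_ipv4_header` are hypothetical, and only the verdict names come from the WDK.

```python
import struct
from enum import Enum


class PfForwardAction(Enum):
    # Named after three of the WDK's PF_FORWARD_ACTION verdicts; the string
    # values are just labels for this model, not the real enum values
    PF_FORWARD = "forward"   # let the packet continue
    PF_DROP = "drop"         # discard it
    PF_PASS = "pass"         # leave the decision to the IP filter driver


IPPROTO_ICMP = 1


def filter_hook(ip_header: bytes) -> PfForwardAction:
    """Hypothetical policy: drop IPv4 ICMP, forward all other IPv4 traffic."""
    if len(ip_header) < 20 or ip_header[0] >> 4 != 4:
        return PfForwardAction.PF_PASS      # not an IPv4 header this policy understands
    if ip_header[9] == IPPROTO_ICMP:        # byte 9 of the IPv4 header is the protocol
        return PfForwardAction.PF_DROP
    return PfForwardAction.PF_FORWARD


def make_ipv4_header(protocol: int) -> bytes:
    # Minimal 20-byte IPv4 header (checksum left at 0) for trying the hook out
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, protocol, 0,
                       bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))


print(filter_hook(make_ipv4_header(1)))   # PfForwardAction.PF_DROP
print(filter_hook(make_ipv4_header(6)))   # PfForwardAction.PF_FORWARD
```

Note that the hook receives immutable bytes and can only return a verdict, which mirrors the "look but don't touch" limitation described above.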

The only reference I could find in the WDK on Firewall-Hook drivers is half a page that barely begins to describe anything, and contains, mostly, the following recommendations:

It is not recommended to implement a firewall-hook driver (or firewall driver) for Microsoft Windows XP and later versions of the operating system.

To provide firewall functionality on Windows XP and later, you should create an NDIS intermediate miniport driver to manage packets sent and received across a firewall. For information about creating an NDIS intermediate miniport driver, see NDIS Intermediate Drivers.

As the comments above make clear, using the Firewall-Hook interface is strongly discouraged, and Microsoft advises developing firewall-like products as IM drivers (which indeed makes a lot of sense given their comfortable position in the network stack). Yet, for some odd reason, Microsoft didn’t follow its own recommendation and implemented WF as a Firewall-Hook driver! WF is an integral part of the IpNat.sys driver, which, among other things, offers NAT and firewall services.

Here’s a revised figure that illustrates the TCP/IP Protocol Driver extensions (there are more than the two I’m showing):

The WDK Reference section for Firewall-Hook Drivers documents a few odd routines whose usage was (and probably still is) unclear to me at that stage. The WDK was kind enough, though, to point me in the right direction: the ipfirewall.h header file! Looking at ipfirewall.h, we find some interesting artifacts that only begin to describe the intricacies of this private interface. Among them are uncommon data structures used to send and receive data through the interface, function definitions, some constants, and a very interesting IOCTL named IOCTL_IP_SET_FIREWALL_HOOK. The name alone suggests it’s important.

Looking for definitive answers, I decided to look into IpNat.sys’s assembly, so I launched IDA and dropped the file into it. IpNat’s DriverEntry routine, like that of most drivers, performs a lot of initialization work (for example, it creates a device object named \Device\IPNAT). Among other things, it calls an interesting function called NatInitializeDriver.

Since there’s a lot of initialization going on, I took a different approach and searched for places where this IOCTL could be built into an IRP. A quick xref search on IoBuildDeviceIoControlRequest led me to the following interesting results:

The first entry points straight at an interesting function, NatSetFirewallHook. Right at its beginning we witness the creation of an IOCTL IRP:

Later on, this IRP is sent with IofCallDriver to the device object represented by \Device\IP. Interesting!

So what is this IOCTL precisely?

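Device Type (bits 31-16)  Access (bits 15-14)  Function (bits 13-2)  Transfer Type (bits 1-0)
0000000000010010          10                   000000001100          00
FILE_DEVICE_NETWORK       FILE_WRITE_ACCESS    12                    METHOD_BUFFERED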
#define FSCTL_IP_BASE       FILE_DEVICE_NETWORK
#define _IP_CTL_CODE(function, method, access) \
        CTL_CODE(FSCTL_IP_BASE, function, method, access)
#define IOCTL_IP_SET_FIREWALL_HOOK  \
        _IP_CTL_CODE(12, METHOD_BUFFERED, FILE_WRITE_ACCESS)
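The bit layout above can be double-checked by evaluating CTL_CODE by hand. A short Python 3 sketch follows; `ctl_code` and `decode_ctl_code` are illustrative helpers that mirror the DDK's CTL_CODE macro, and the constants are the standard DDK values. The resulting number, 0x128030, is the IoControlCode you should expect to see passed to IoBuildDeviceIoControlRequest in the disassembly.

```python
FILE_DEVICE_NETWORK = 0x00000012
METHOD_BUFFERED = 0
FILE_WRITE_ACCESS = 0x0002


def ctl_code(device_type: int, function: int, method: int, access: int) -> int:
    # Same bit layout as the CTL_CODE macro in the DDK headers
    return (device_type << 16) | (access << 14) | (function << 2) | method


def decode_ctl_code(code: int) -> dict:
    # Split an IOCTL value back into its four fields
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,
        "function": (code >> 2) & 0xFFF,
        "method": code & 0x3,
    }


IOCTL_IP_SET_FIREWALL_HOOK = ctl_code(FILE_DEVICE_NETWORK, 12,
                                      METHOD_BUFFERED, FILE_WRITE_ACCESS)
print(hex(IOCTL_IP_SET_FIREWALL_HOOK))   # 0x128030
```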

As suspected, WF uses the exact same IOCTL we found in ipfirewall.h to register itself as a Firewall-Hook driver. (Googling this IOCTL brings up some very interesting hits, especially this article, which actually describes the whole Firewall-Hook charade from a programmatic point of view.)

I guess most of you evil-minded fellas have had the following thought cross your minds at some point in this article (if not before): “Gee… Not only does the Windows Firewall completely ignore outgoing packets, it also binds itself directly to the TCP/IP protocol driver, leaving the rest of the NDIS network stack completely vulnerable to kernel malware??”. Yes, it seems that way. I guess this is why most people use commercial firewalls, and why Microsoft advises against the very approach its own firewall takes.

I’ll add that good firewalls usually take a more aggressive approach and hook the entire networking subsystem to ensure that not even the slightest bit of information slips out of or into the host computer. The not-so-good ones usually perform a poor man’s hook into some higher-level layer(s) inside NDIS. The worst kind stay away altogether and rely on some comfortable “Firewall-Hook” trick to enforce their policies. Then again, even the best commercial product has a somewhat better rootkit that bypasses it completely.

Anyway, this has been fun and I hope you guys have learnt a lil’ bit about NDIS, Windows Firewall and the undocumented Firewall-Hook Drivers Interface.

Filed in Dissected Programs, Misc.

A process within a process - Part II

haxorcize on Sep 20th 2008

In the previous post I presented the process of loading an executable image file into another process’ address space. I also illustrated some One-Of-Many-Problems that might occur during this non-trivial task. The last problem I illustrated was that my guest cmd.exe process was unable to display certain strings that appear normally in a regular cmd.exe.

To fight the problem, I immediately fired up WinDBG and IDA to track down the source of all evil. Conveniently enough, I had done some RE research on CMD.EXE prior to this expedition, so I was rather familiar with its internals.

I guessed that something was wrong with the resources, so I began by looking at the code that maps the resources into memory, which seemed just fine. Then I started putting breakpoints on resource- and message-related APIs until I hit the jackpot with kernel32!FormatMessageW, which, as it turned out, is called from cmd!FindMsg.

Looking at the assembly of cmd!FindMsg I was able to determine it is responsible for locating messages and returning their textual representation using the FormatMessage API, and in the case of an error, it returns a string in the form of “The (system/application) cannot find message text for message number 0xMESSAGE in the message file for Application.” which fits our symptoms exactly:

In the event of a failure, cmd!FindMsg jumps into the following code:

So why exactly is there an error in cmd!FindMsg? Why is kernel32!FormatMessage failing? It’s time to dive headfirst into the assembly. Here’s some uber-simplified pseudo code for cmd!FindMsg:

if (!FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_FROM_HMODULE,
                    NULL, messageId, ...)) {
   // copy error string into the given buffer
} 

The flags in the first parameter mean that the message is looked up in the message-table resources of the HMODULE given in the second parameter (falling back to the system message table if it isn’t found there). In our case that HMODULE is NULL, which designates the current process’ own message-table resources.

kernel32!FormatMessage immediately calls kernel32!BaseDllFormatMessage, which in turn calls kernel32!BasepMapModuleHandle with our NULL HMODULE. Here’s the disassembly of BasepMapModuleHandle:

This function is rather simple; here’s some pseudo code:

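HMODULE BasepMapModuleHandle(HMODULE hMod, /* ... */ x) {
    if (hMod == NULL) {
        // return the ImageBaseAddress stored in the PEB (result left in eax)
        __asm {
            mov eax, fs:[30h]   ; eax = TEB->ProcessEnvironmentBlock (PEB)
            mov eax, [eax+8]    ; eax = PEB->ImageBaseAddress
        }
    } else if (hMod == 1) {
        // ...
    } else {
        return hMod;
    }
}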

When this function is called with 0, just like in our case, it uses the PEB (Process Environment Block) to determine the ImageBase of the current process. But hold on a sec: that returns the host process’ ImageBase, not the ImageBase cmd expects!
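The `[eax+8]` in the pseudo code is the PEB’s ImageBaseAddress field. The Python 3 sketch below models the leading fields of the 32-bit PEB with ctypes to show where offset 8 comes from, and how a NULL-module lookup simply follows whatever is stored there. Field names follow WinDBG’s `dt ntdll!_PEB` on XP; `PEB32_PREFIX` and `map_module_handle` are hypothetical names, and the latter models only the NULL branch.

```python
import ctypes


class PEB32_PREFIX(ctypes.Structure):
    # Leading fields of the x86 (32-bit) PEB on Windows XP. Pointers and
    # handles are modelled as 32-bit ints so the offsets match a 32-bit
    # process even when this runs on a 64-bit Python.
    _fields_ = [
        ("InheritedAddressSpace", ctypes.c_uint8),     # +0x000
        ("ReadImageFileExecOptions", ctypes.c_uint8),  # +0x001
        ("BeingDebugged", ctypes.c_uint8),             # +0x002
        ("SpareBool", ctypes.c_uint8),                 # +0x003
        ("Mutant", ctypes.c_uint32),                   # +0x004 (HANDLE)
        ("ImageBaseAddress", ctypes.c_uint32),         # +0x008 (PVOID)
        ("Ldr", ctypes.c_uint32),                      # +0x00c (PPEB_LDR_DATA)
    ]


def map_module_handle(hmod: int, peb: PEB32_PREFIX) -> int:
    # Model of the NULL branch only: a NULL HMODULE becomes whatever
    # ImageBaseAddress currently holds
    return peb.ImageBaseAddress if hmod == 0 else hmod


print(hex(PEB32_PREFIX.ImageBaseAddress.offset))   # 0x8, i.e. [eax+8]

peb = PEB32_PREFIX(ImageBaseAddress=0x00400000)    # the host EXE's base
print(hex(map_module_handle(0, peb)))              # 0x400000: the host, not cmd.exe
peb.ImageBaseAddress = 0x4AD00000                  # what a PEB patch would change
print(hex(map_module_handle(0, peb)))              # 0x4ad00000
```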

Because of that, when kernel32!BaseDllFormatMessage calls ntdll!RtlFindMessage with the HMODULE of the host process (and not the injected process), ntdll!RtlFindMessage fails to find the message. Here’s the illustrated stack:

0:000> k
 # ChildEBP RetAddr
00 0012f8c0 7c834ab4 ntdll!RtlFindMessage
01 0012f94c 7c834bc8 kernel32!BaseDllFormatMessage+0xf3
02 0012f974 4ad0669d kernel32!FormatMessageW+0x21
WARNING: Frame IP not in any known module. Following frames may be wrong.
[cmd!FindMsg ^^]
03 0012fa1c 4ad0658f 0x4ad0669d
04 0012fb38 7c91003d 0x4ad0658f
05 0012fa9c 4ad0ef32 ntdll!RtlFreeHeap+0x647
06 0012fb38 7c91003d 0x4ad0ef32
07 0012fad8 4ad05f20 ntdll!RtlFreeHeap+0x647
08 0012fd74 7c90d89c 0x4ad05f20
09 0012fd78 00154228 ntdll!NtQueryPerformanceCounter+0xc
0a 0012fd84 00000000 0x154228

We can also look at the PEB with WinDBG’s !peb command:

0:000> !peb
PEB at 7ffdf000
    InheritedAddressSpace:    No
    ReadImageFileExecOptions: No
    BeingDebugged:            Yes
    ImageBaseAddress:         00400000 <<<<< Note this value <<<<<
    Ldr                       00251ea0
    Ldr.Initialized:          Yes
    Ldr.InInitializationOrderModuleList: 00251f58 . 002528f0
    Ldr.InLoadOrderModuleList:           00251ee0 . 002528e0
    Ldr.InMemoryOrderModuleList:         00251ee8 . 002528e8
....

Note that ImageBaseAddress is 0x00400000 and not 0x4AD00000, which is cmd.exe’s real ImageBase! That’s because our host process is located at 0x00400000. To make sure this is the culprit, I added the following lines to my POC:

mov eax, fs:[0x30]
add eax, 8
mov ebx, imageBase
mov [eax], ebx

And this is what I got:

One-Of-Many-Problems #3: As you can see, I have to alter the PEB so that the guest process feels comfortable, but what about the host process?

As you can see, this concept is very tricky, but it’s kinda cool. Making it work flawlessly requires a lot of knowledge of the EXE we’re loading. I’m sure there are still plenty of issues I haven’t even encountered yet (VEH handlers, LoaderLock, global CRT/ntdll variables, synchronization issues, etc…).

But it’s working, so I’m quite pleased :)

If you find more of those One-Of-Many-Problems, feel free to let us know.

Filed in Misc.

A process within a process - Part I

haxorcize on Sep 20th 2008

The idea of running a completely different process within the context of another process is indeed something cool (or at least I thought so). Note that I’m not talking about running two different processes in the system and using RPC to communicate between them, but rather about running one process in the memory context of another.

The general idea behind this concept is that we control one process’s address space and load another executable image into it entirely on our own. This procedure is tricky and involves a lot of problems. DLLs were designed to be loaded into an arbitrary process context and are supported by the OS architecture. EXEs, on the other hand, are by definition “the rulers of their own kingdom”, “the kings of their address space” and so on. Those definitions are the cause of most of our problems.

Our story begins with the PE (Portable Executable) file format. Since loading an EXE image into another process’ memory space isn’t a trivial/generic/common task, I figured I wouldn’t have comfy LoadExecutable / FreeExecutable API functions to do it, so I’d have to write them myself. The nice thing is that it’s possible to pull this stunt off entirely from user space, and I recalled seeing interesting concepts for loading DLLs from memory in projects like py2exe (with Joachim Bauch’s MemoryModule).

I chose the Win32 Command Processor (CMD.EXE) for my experiment due to its console nature (so that I wouldn’t have to hassle with annoying GUI implications, even though they are just as interesting). The first thing I did was to check cmd.exe’s required load address (ImageBase).

One-Of-Many-Problems #1: You can’t load just any EXE into another EXE. Most production EXEs have their relocation information (.reloc section) stripped, so if they can’t be loaded at their specified ImageBase (because it’s already used by another module, for instance), they can’t be loaded at all.
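Problem #1 can be checked up front by reading the preferred ImageBase and looking for relocation information. Here is a minimal Python 3 sketch, assuming a 32-bit (PE32) image; `inspect_pe` is a hypothetical helper, and treating either the IMAGE_FILE_RELOCS_STRIPPED flag or an empty base-relocation directory as "not relocatable" is a simplification.

```python
import struct

IMAGE_FILE_RELOCS_STRIPPED = 0x0001     # IMAGE_FILE_HEADER.Characteristics flag
IMAGE_DIRECTORY_ENTRY_BASERELOC = 5     # index into the data directory
PE32_MAGIC = 0x10B


def inspect_pe(data: bytes) -> dict:
    """Report the preferred ImageBase of a PE32 image and whether it could be rebased."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    file_header = e_lfanew + 4                        # IMAGE_FILE_HEADER (20 bytes)
    (characteristics,) = struct.unpack_from("<H", data, file_header + 18)
    opt = file_header + 20                            # IMAGE_OPTIONAL_HEADER32
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic != PE32_MAGIC:
        raise ValueError("only PE32 (32-bit) images are handled here")
    (image_base,) = struct.unpack_from("<I", data, opt + 28)
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)
    (num_dirs,) = struct.unpack_from("<I", data, opt + 92)    # NumberOfRvaAndSizes
    reloc_size = 0
    if num_dirs > IMAGE_DIRECTORY_ENTRY_BASERELOC:
        # each IMAGE_DATA_DIRECTORY is (VirtualAddress, Size); the array starts at +96
        _, reloc_size = struct.unpack_from(
            "<II", data, opt + 96 + 8 * IMAGE_DIRECTORY_ENTRY_BASERELOC)
    stripped = bool(characteristics & IMAGE_FILE_RELOCS_STRIPPED) or reloc_size == 0
    return {"image_base": image_base, "size_of_image": size_of_image,
            "relocatable": not stripped}

# Usage (on Windows):
# with open(r"C:\WINDOWS\system32\cmd.exe", "rb") as f:
#     print(inspect_pe(f.read()))
```

If `relocatable` is False, the only option is to get the range at `image_base` free in the host process, which is exactly the situation described above.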

Now that I know that I can safely load CMD.EXE into my process, I will carry on with writing the actual loader code. I am going to paste a code snippet followed by an outline of the code. The code is a simplified version without any error checking and should not be used in “production”-POC implementations :)

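#include <windows.h>
#include <string.h>

typedef unsigned char byte;
typedef unsigned int uint32;

/* Helpers for steps #4-#6 (bodies not shown in this post) */
void CopySections(byte* exeBuf, byte* imageBase, PIMAGE_NT_HEADERS ntHeaders);
void ProcessIAT(byte* imageBase, PIMAGE_NT_HEADERS ntHeaders);
void FinalizeSections(byte* imageBase, PIMAGE_NT_HEADERS ntHeaders);

byte* LoadExecutable(byte* exeBuf, uint32 bufLen) {

    // #1 locate the headers inside the raw file buffer
    PIMAGE_DOS_HEADER dosHeader = (PIMAGE_DOS_HEADER) exeBuf;
    PIMAGE_NT_HEADERS ntHeaders = (PIMAGE_NT_HEADERS)
        (exeBuf + dosHeader->e_lfanew);

    // #2 reserve + commit the whole image at its preferred ImageBase
    //    (VirtualAlloc returns NULL if that range is already taken)
    byte* imageBase = (byte*) VirtualAlloc(
        (LPVOID) ntHeaders->OptionalHeader.ImageBase,
        ntHeaders->OptionalHeader.SizeOfImage,
        MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    // #3 copy the headers, then re-point at the in-memory copy
    memcpy(imageBase, exeBuf, ntHeaders->OptionalHeader.SizeOfHeaders);
    dosHeader = (PIMAGE_DOS_HEADER) imageBase;
    ntHeaders = (PIMAGE_NT_HEADERS) (imageBase + dosHeader->e_lfanew);

    // #4 copy each section's raw data to imageBase + VirtualAddress
    CopySections(exeBuf, imageBase, ntHeaders);

    // #5 load dependencies and fill in the import address table
    ProcessIAT(imageBase, ntHeaders);

    // #6 apply per-section page protections
    FinalizeSections(imageBase, ntHeaders);

    return imageBase + ntHeaders->OptionalHeader.AddressOfEntryPoint;
}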
  1. Assign pointers to the IMAGE_DOS_HEADER and IMAGE_NT_HEADERS structures in the raw sequential executable buffer.
  2. Allocate SizeOfImage bytes at ImageBase in our process memory space.
  3. Copy the headers from the raw exe buf into our allocated memory space.
  4. Copy the PE sections from the raw exe buf to their locations in the memory space.
  5. Process the import table of the executable: load any necessary dependencies and bind import table entries to their real addresses.
  6. Iterate the sections once more, mark their memory pages according to their attributes (READ/WRITE/EXECUTE) and discard unneeded sections.
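Steps 4 and 6 are mostly bookkeeping over the section table. Here is a minimal Python 3 sketch of both, assuming a well-formed PE file held in memory. `read_sections`, `copy_sections` and `page_protection` are hypothetical helpers, not the POC’s CopySections/FinalizeSections, and the protection mapping ignores refinements such as PAGE_WRITECOPY and discardable sections.

```python
import struct
from typing import List, NamedTuple

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

# Win32 page-protection constants from winnt.h
PAGE_NOACCESS, PAGE_READONLY, PAGE_READWRITE = 0x01, 0x02, 0x04
PAGE_EXECUTE, PAGE_EXECUTE_READ, PAGE_EXECUTE_READWRITE = 0x10, 0x20, 0x40


class Section(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int      # RVA of the section once mapped
    size_of_raw_data: int
    pointer_to_raw_data: int  # file offset of the section's bytes
    characteristics: int


def read_sections(data: bytes) -> List[Section]:
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    file_header = e_lfanew + 4
    (count,) = struct.unpack_from("<H", data, file_header + 2)       # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", data, file_header + 16)   # SizeOfOptionalHeader
    table = file_header + 20 + opt_size      # section table follows the optional header
    sections = []
    for i in range(count):
        entry = table + 40 * i               # sizeof(IMAGE_SECTION_HEADER) == 40
        name, vsize, rva, raw_size, raw_ptr = struct.unpack_from("<8sIIII", data, entry)
        (chars,) = struct.unpack_from("<I", data, entry + 36)
        sections.append(Section(name.rstrip(b"\0").decode("ascii", "replace"),
                                vsize, rva, raw_size, raw_ptr, chars))
    return sections


def copy_sections(data: bytes, image: bytearray, sections: List[Section]) -> None:
    # Step #4: place each section's file bytes at image[RVA]; whatever the
    # file does not supply (VirtualSize > SizeOfRawData) stays zero-filled
    for s in sections:
        n = min(s.size_of_raw_data, s.virtual_size or s.size_of_raw_data)
        chunk = data[s.pointer_to_raw_data:s.pointer_to_raw_data + n]
        if s.virtual_address + len(chunk) > len(image):
            raise ValueError("section %s does not fit in SizeOfImage" % s.name)
        image[s.virtual_address:s.virtual_address + len(chunk)] = chunk


def page_protection(characteristics: int) -> int:
    # Step #6: pick a VirtualProtect constant from the section's R/W/X flags
    x = bool(characteristics & IMAGE_SCN_MEM_EXECUTE)
    r = bool(characteristics & IMAGE_SCN_MEM_READ)
    w = bool(characteristics & IMAGE_SCN_MEM_WRITE)
    if x:
        return PAGE_EXECUTE_READWRITE if w else (PAGE_EXECUTE_READ if r else PAGE_EXECUTE)
    if w:
        return PAGE_READWRITE
    return PAGE_READONLY if r else PAGE_NOACCESS
```

In the C loader, `image` corresponds to the VirtualAlloc’ed block and the value from `page_protection` would be handed to VirtualProtect for each section’s pages.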

The next step would be to write a main proc and do the following:

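int main(void) {
    // CMD_BUF / CMD_BUF_LEN: cmd.exe's raw file contents (not shown)
    byte* entryPoint = LoadExecutable(CMD_BUF, CMD_BUF_LEN);
    __asm {
        mov eax, entryPoint
        call eax        ; jump into cmd.exe's entry point
    }
    return 0;
}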

One-Of-Many-Problems #2: To exit the command prompt you must type “exit”, which eventually results in ExitProcess() being called. This function terminates the containing (host) process as well, along with all of its threads.

Pleased with the progress I’d made so far, I ran my little POC program and was greeted by the following screen:

The spawned shell seemed rather responsive, and I was even able to type commands like “dir” and “exit” and get results (or at least some of them). I also typed “bir”, a deliberate typo, and got some interesting errors. I then compared the same command sequence on a regular cmd.exe. Here are some screenshots:

So far we’ve seen that it’s possible to load PE images on our own, entirely with user-mode privileges. We’ve also seen some nasty problems that can occur, and that the solution is far from generic. If you want the full snippets for what I’ve demonstrated above, leave your email in a comment and I’ll get back to you.

In the next part I’ll discuss this weird bug that I’ve just demonstrated and how to solve it.

Filed in Misc.

IDAPython

haxorcize on Aug 3rd 2008

Very often we need to automate certain applications to make up for features they lack, and to better suit their interface to our usage patterns. Sometimes it’s just for ease of use, but many times automation is rather crucial, mainly because it can apply new logic to applications and help us get further in our work.

Lately, I’ve been working quite a lot on FabulaTech’s USB-over-Network product in order to examine its internals. During that work I’ve encountered numerous reversing difficulties that I didn’t find an easy answer for. You can probably already guess from the title of this post, and my foreword, that I am going to discuss IDAPython as a handy automation interface to IDA.

Use Case #1 - Device Drivers:

Windows device drivers are binary files that expose an entry point called DriverEntry, which is in charge of initializing the driver itself (once it’s loaded into kernel space). One of the most important tasks the initialization code performs is setting up a bunch of “callbacks” or “entry points” that the driver supports.

Those entry points are the driver’s main interaction points with the system, and they allow the driver to react to events that require its attention. DriverEntry itself has the following signature:

NTSTATUS  DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath);

Here’s an example of such initialization code:

Driver entry point filling

As you can see, a bunch of function pointers are being stored at specific DriverObject structure offsets. Those offsets tell us a lot about the functions and what they do, so it is imperative to understand what they mean. Looking at the structure itself, you can see that all of these offsets lie within a DRIVER_OBJECT member called MajorFunction:

PDRIVER_DISPATCH MajorFunction[IRP_MJ_MAXIMUM_FUNCTION+1];
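Where does the 0x38 used below come from? The Python 3 sketch that follows models the 32-bit DRIVER_OBJECT with ctypes (field order as in wdm.h, every pointer modelled as a 32-bit integer), so the MajorFunction offset is computed rather than memorized. `DRIVER_OBJECT32`, `UNICODE_STRING32` and `name_for_offset` are hypothetical names; the helper turns the offset of a store such as `mov [eax+70h], offset Foo` in DriverEntry into a readable field name. Offsets differ on 64-bit Windows.

```python
import ctypes

IRPS = ("IRP_MJ_CREATE", "IRP_MJ_CREATE_NAMED_PIPE", "IRP_MJ_CLOSE", "IRP_MJ_READ",
        "IRP_MJ_WRITE", "IRP_MJ_QUERY_INFORMATION", "IRP_MJ_SET_INFORMATION",
        "IRP_MJ_QUERY_EA", "IRP_MJ_SET_EA", "IRP_MJ_FLUSH_BUFFERS",
        "IRP_MJ_QUERY_VOLUME_INFORMATION", "IRP_MJ_SET_VOLUME_INFORMATION",
        "IRP_MJ_DIRECTORY_CONTROL", "IRP_MJ_FILE_SYSTEM_CONTROL", "IRP_MJ_DEVICE_CONTROL",
        "IRP_MJ_INTERNAL_DEVICE_CONTROL", "IRP_MJ_SHUTDOWN", "IRP_MJ_LOCK_CONTROL",
        "IRP_MJ_CLEANUP", "IRP_MJ_CREATE_MAILSLOT", "IRP_MJ_QUERY_SECURITY",
        "IRP_MJ_SET_SECURITY", "IRP_MJ_POWER", "IRP_MJ_SYSTEM_CONTROL",
        "IRP_MJ_DEVICE_CHANGE", "IRP_MJ_QUERY_QUOTA", "IRP_MJ_SET_QUOTA", "IRP_MJ_PNP")


class UNICODE_STRING32(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint16),
                ("MaximumLength", ctypes.c_uint16),
                ("Buffer", ctypes.c_uint32)]          # PWSTR on x86


class DRIVER_OBJECT32(ctypes.Structure):
    # x86 layout; every pointer is modelled as a 32-bit integer
    _fields_ = [("Type", ctypes.c_int16), ("Size", ctypes.c_int16),
                ("DeviceObject", ctypes.c_uint32), ("Flags", ctypes.c_uint32),
                ("DriverStart", ctypes.c_uint32), ("DriverSize", ctypes.c_uint32),
                ("DriverSection", ctypes.c_uint32), ("DriverExtension", ctypes.c_uint32),
                ("DriverName", UNICODE_STRING32), ("HardwareDatabase", ctypes.c_uint32),
                ("FastIoDispatch", ctypes.c_uint32), ("DriverInit", ctypes.c_uint32),
                ("DriverStartIo", ctypes.c_uint32), ("DriverUnload", ctypes.c_uint32),
                ("MajorFunction", ctypes.c_uint32 * len(IRPS))]


MAJOR_FUNCTION_OFFSET = DRIVER_OBJECT32.MajorFunction.offset   # 0x38


def name_for_offset(offset: int) -> str:
    """Name the DRIVER_OBJECT field stored at a given byte offset."""
    if offset >= MAJOR_FUNCTION_OFFSET:
        index, rem = divmod(offset - MAJOR_FUNCTION_OFFSET, 4)
        if rem == 0 and index < len(IRPS):
            return "MajorFunction[%s]" % IRPS[index]
    for field, _ in DRIVER_OBJECT32._fields_:
        if getattr(DRIVER_OBJECT32, field).offset == offset:
            return field
    return "unknown"


print(hex(MAJOR_FUNCTION_OFFSET))   # 0x38
print(name_for_offset(0x70))        # MajorFunction[IRP_MJ_DEVICE_CONTROL]
```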

This is where IDAPython came in handy. I wanted a way to extend IDA’s default DRIVER_OBJECT structure so that it includes the specific IRP_MJ_XXX values representing offsets into the MajorFunction member. I also wanted it to automatically rename the functions whose pointers are being stored in the structure, and basically ease my way into drivers, automatically. Here’s some basic skeleton code:

from idaapi import *
from idc import *

# IRP major function codes in IRP_MJ_XXX order (IRP_MJ_CREATE = 0x00 ... IRP_MJ_PNP = 0x1b)
IRPS = ("IRP_MJ_CREATE", "IRP_MJ_CREATE_NAMED_PIPE", "IRP_MJ_CLOSE",
        "IRP_MJ_READ", "IRP_MJ_WRITE", "IRP_MJ_QUERY_INFORMATION",
        "IRP_MJ_SET_INFORMATION", "IRP_MJ_QUERY_EA", "IRP_MJ_SET_EA",
        "IRP_MJ_FLUSH_BUFFERS", "IRP_MJ_QUERY_VOLUME_INFORMATION",
        "IRP_MJ_SET_VOLUME_INFORMATION", "IRP_MJ_DIRECTORY_CONTROL",
        "IRP_MJ_FILE_SYSTEM_CONTROL", "IRP_MJ_DEVICE_CONTROL",
        "IRP_MJ_INTERNAL_DEVICE_CONTROL", "IRP_MJ_SHUTDOWN",
        "IRP_MJ_LOCK_CONTROL", "IRP_MJ_CLEANUP",
        "IRP_MJ_CREATE_MAILSLOT", "IRP_MJ_QUERY_SECURITY",
        "IRP_MJ_SET_SECURITY", "IRP_MJ_POWER", "IRP_MJ_SYSTEM_CONTROL",
        "IRP_MJ_DEVICE_CHANGE", "IRP_MJ_QUERY_QUOTA",
        "IRP_MJ_SET_QUOTA", "IRP_MJ_PNP")

# Replace the MajorFunction[] array (offset 0x38 in the 32-bit DRIVER_OBJECT)
# with one named DWORD member per IRP_MJ_XXX slot
sid = GetStrucIdByName("DRIVER_OBJECT")
if sid == BADADDR:
    raise Exception("DRIVER_OBJECT structure not found in this database")
DelStrucMember(sid, 0x38)
for i, name in enumerate(IRPS):
    AddStrucMember(sid, name, 0x38 + i * 4, FF_DWRD | FF_DATA, -1, 4)

Note that this is just skeleton code. You can extend it to do a lot of other neat things :)

Look how clear it looks now:

Use Case #2 - Decrypting self-modifying code:

I’ve recently encountered some executables that used some annoying anti-debugging and anti-reversing tricks. One particularly annoying protection scheme consisted of functions that are autonomously in charge of their own protection.

That is, each time a function is called, its prologue decrypts the rest of it, altering the code a few bytes ahead, and its epilogue re-encrypts it, defeating methods like “memory dumping” and PE reconstruction.

Since I was too lazy to reverse the proprietary encryption algorithm, I figured I could probably make the program decrypt the code for me. At first I thought I could just disable the epilogue and let the program run for a while, but this method has two serious disadvantages: first, I probably couldn’t reach full function coverage (so some functions would be left encrypted); second, re-entrancy is going to be a pain. I don’t know how the code’s gonna look after decrypting it twice :)

Therefore, I thought of another way. Since I recognised the decryption pattern (same function is being called, key is stored within the code segment as data), I could easily spot all the functions that needed decryption.

Using IDAPython, I wrote a small script that generates a set of WinDBG commands that make WinDBG execute only the decryption stub of each function, with the right parameters, leaving all functions decrypted in memory. From there, the next logical step is to take a dump (a memory dump, that is ;)) and load it back into IDA.

from idc import *

fout = open(r"windbg.txt", "wb")
fout.write("bp 00414746; g\r\n") # Has to run till there to make
                                 # the decryption functions work

ea = 0xDEADBEEF # The decryption routine whose callers (code xrefs) we walk
print "\n*** Code references to ", ea
x = RfirstB(ea)
while x != BADADDR:
    # x is a call site; the -0x29 start offset and the 14 single-steps
    # are specific to this target's decryption stub
    fout.write("r EIP = %08x; %s ?eip; ?eax\r\n"  % (x - 0x29, "p;" * 14))
    x = RnextB(ea, x)
fout.write("r EIP = 00414792; p; p\r\n") # This command makes the program
                                         # jump into Sleep(INFINITE) so I
                                         # could detach and attach a
                                         # proper memory dumper :)
fout.close()
 
---------------------
Script output:
bp 00414746; g
...
r EIP = 0040df8a; p;p;p;p;p;p;p;p;p;p;p;p;p;p; ?eip; ?eax
r EIP = 0040e5b1; p;p;p;p;p;p;p;p;p;p;p;p;p;p; ?eip; ?eax
r EIP = 0040e816; p;p;p;p;p;p;p;p;p;p;p;p;p;p; ?eip; ?eax
...
r EIP = 00414792; p; p

This is yet another raw example showing how simple things can be.

Use Case #3 - Patching bytes:

Following use case #2, after taking the dump I wanted to load it into IDA for further investigation. The problem is that I had to apply patches at precise locations; I didn’t want to overwrite the whole .text segment with the dump.

This is where IDAPython came to my help again. My actual code for this one is too messy (not difficult at all, just messy cuz I left it that way), so I’ll provide a basic example instead:

from idc import *

PATCH_OFFSETS = (0xABCDABCD, 0xDEADBEEF, 0xFEEDBABE, 0xABBADEAD)

# input.bin holds one replacement byte per address, in PATCH_OFFSETS order
fin = open("input.bin", "rb")
for offset in PATCH_OFFSETS:
    PatchByte(offset, ord(fin.read(1)))
fin.close()

Again, keep in mind that this is an uber-simple example.
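A patch list like PATCH_OFFSETS has to come from somewhere. One option, sketched here in plain Python 3 as an assumption (not necessarily how the offsets above were produced), is to diff the dumped .text bytes against the same section from the original file and keep only the bytes that changed. `find_patches` and `base_ea` are hypothetical names; under the Python 2 IDAPython of the other snippets, iterating a str yields characters, so the values would need `ord()` before PatchByte.

```python
from typing import List, Tuple


def find_patches(original: bytes, dumped: bytes, base_ea: int) -> List[Tuple[int, int]]:
    """Return (address, new_byte) for every byte where the dump differs from the file."""
    if len(original) != len(dumped):
        raise ValueError("compare equally sized copies of the same section")
    return [(base_ea + i, new)
            for i, (old, new) in enumerate(zip(original, dumped))
            if old != new]


# In IDA, each pair could then feed a loop like the one above:
# for ea, value in find_patches(text_from_file, text_from_dump, 0x401000):
#     PatchByte(ea, value)
print(find_patches(b"\x90\x90\xcc", b"\x90\x55\xcc", 0x401000))   # [(4198401, 85)]
```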

Conclusion

IDAPython is a very cool IDC wrapper for IDA. It lets you enrich your world from a beloved Python interpreter and use every Python feature you love: the syntax, the ease of use, the massive runtime library, all of it combined with a good graphical disassembler.

I hope you find this article useful. I know it’s been a while since the last post; that’s due to massive pressure from real-life obligations: working hard, studying and, on top of it all, keeping my body in shape ;) I hope I’ll be able to finish my latest dissection project soon so I can finally post it. :)

Filed in Misc.

NetLimiter 1.3

haxorcize on Jun 6th 2008

Today we’ll figure out how NetLimiter (version 1.3) works!

Here’s a short description:

“NetLimiter is an ultimate internet traffic control and monitoring tool designed for Windows. You can use NetLimiter to set download/upload transfer rate limits for applications or even single connection and monitor their internet traffic.”

NetLimiter offers numerous useful features, but the one I was particularly interested in was Transfer Rate Limiting. With this feature, you can control the transfer rates of your currently running processes, giving one process the temporary benefit of the bandwidth it so desperately requires while its other time-slice-sharing buddies make do with the remains.

I was really interested in finding out which methods are available for implementing a “system-wide per-process bandwidth limiter” under Windows, and NetLimiter’s method in particular. At first, it didn’t seem far-fetched that NetLimiter might be using kernel-mode code to achieve its goals. Browsing the web for other methods, I stumbled upon some nice articles (1, 2) discussing the deprecated Microsoft Traffic Control APIs, which could very well be a solution on their own (perhaps not a per-process one, but close).

After gathering small bits of knowledge about several traffic-limiting methods, I began looking at the NetLimiter application. The version I chose to analyze was 1.3 (2.0 is out as well, but I’ve found no real reason to upgrade for a while now).

Starting at the installation root, you’ll notice the NetLimiter.exe program executable and two interesting DLLs: sporder.dll and nl_lsp.dll. After checking the main executable’s implicit dependencies (with Dependency Walker), I saw it uses nl_msgs.dll (residing in %windir%\system32) and sporder.dll (residing in the installation root).

NetLimiter.exe uses one of sporder.dll’s exported functions, WSCWriteProviderOrder. A quick Google check led me to MSDN, which explains that this function is meant to “reorder the available winsock 2 transport providers” and that it belongs to the “Winsock Transport SPI” (Transport Service Provider Interface).

Hold on a minute, I said. SPI? Winsock transport providers? What are all those? It sounded interesting due to its relevance to Winsock and network manipulation (after all, that’s what we’re looking for). I then recalled reading about Winsock extensions in the “Windows Networking APIs” chapter of the amazing Microsoft Windows Internals book, so I decided to take another look, and surprisingly, I was correct: the book does mention “Winsock Extensions” and discusses the Winsock SPI.

The SPI enables third parties to extend the Winsock API by providing additional layers on top of existing protocols that add all kinds of functionality (ehm… even more interesting). When such TSPs (transport service providers) are registered with Winsock, Winsock uses them to implement socket functions such as connect and accept, and to communicate with the corresponding transport driver in kernel mode. When a program creates a socket, Winsock searches its catalog for suitable TSPs and loads them according to the order currently defined in the system, which is the same order NetLimiter.exe manipulates through the WSCWriteProviderOrder function.
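Why does the order matter? Below is a toy Python 3 model of the catalog search, meant only to show that whichever matching entry comes first wins. The `CatalogEntry` type, `select_provider` and the layered entry’s name are hypothetical; the base entry names are written the way XP’s built-in entries usually appear. Real catalog entries carry a full WSAPROTOCOL_INFO and layered protocol chains are more involved.

```python
from typing import NamedTuple, Optional, Sequence

AF_INET = 2
SOCK_STREAM, SOCK_DGRAM = 1, 2
IPPROTO_TCP, IPPROTO_UDP = 6, 17


class CatalogEntry(NamedTuple):
    name: str
    address_family: int
    socket_type: int
    protocol: int


def select_provider(catalog: Sequence[CatalogEntry], af: int, stype: int,
                    proto: int = 0) -> Optional[CatalogEntry]:
    # Walk the catalog in its current order and take the first entry that
    # matches; protocol 0 is treated as "any protocol for this socket type"
    for entry in catalog:
        if (entry.address_family == af and entry.socket_type == stype
                and proto in (0, entry.protocol)):
            return entry
    return None


base = [CatalogEntry("MSAFD Tcpip [TCP/IP]", AF_INET, SOCK_STREAM, IPPROTO_TCP),
        CatalogEntry("MSAFD Tcpip [UDP/IP]", AF_INET, SOCK_DGRAM, IPPROTO_UDP)]
lsp = [CatalogEntry("NL over [UDP/IP]", AF_INET, SOCK_DGRAM, IPPROTO_UDP)]   # hypothetical

print(select_provider(base + lsp, AF_INET, SOCK_DGRAM).name)   # MSAFD Tcpip [UDP/IP]
print(select_provider(lsp + base, AF_INET, SOCK_DGRAM).name)   # NL over [UDP/IP]
```

The second call mirrors what reordering the catalog achieves: an ordinary `socket(AF_INET, SOCK_DGRAM)` ends up going through the layered entry first.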

The platform SDK includes a utility called sporder.exe that allows the user to change the order the TSPs are enumerated by Winsock. Running sporder.exe on my system showed the following interesting things:

It doesn’t take a genius to figure out that NL stands for NetLimiter and that these additional entries belong to the program itself. Pushing the “More” button showed me that the TSP in question is the nl_lsp.dll we found earlier in the installation root! Unfortunately, after looking at the other fields of the newly opened dialog, I realized I wasn’t familiar enough with the SPI internals, so I googled some more and came up with this article, which cleared a few things up.

Looking at NetLimiter.exe in IDA, you can pretty easily see that it installs the TSP nl_lsp.dll using the WSCInstallProvider function (for those interested, it takes the DLL path from the registry), and then arranges the order with WSCWriteProviderOrder. (A few more actions are performed, such as grouping the protocols in a protocol chain; you can read up on it in the article linked in the previous paragraph.)

Given that NetLimiter installs a TSP on our system, it is easy to speculate on the method used to manipulate the bandwidth. According to the documentation, the TSP is loaded into every process that uses the Winsock API, where it intercepts each API call on its way down the chain to the base protocol providers (TCP/UDP), which enables it to pace any recv/recvfrom and send/sendto calls.
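To make the speculation concrete, here is a minimal Python 3 sketch of how a layered provider could pace recv calls: pass the call down, then sleep until the cumulative byte count fits under the configured rate. This is an assumption about the general technique, not NetLimiter’s actual algorithm; `PacedReceiver` is a hypothetical name, and the `clock`/`sleep` parameters exist only so the logic can be exercised without real waiting.

```python
import time
from typing import Callable


class PacedReceiver:
    """Wrap a recv-like callable and sleep so the average rate stays under a limit."""

    def __init__(self, recv: Callable[[int], bytes], bytes_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self._recv, self._rate = recv, bytes_per_sec
        self._clock, self._sleep = clock, sleep
        self._start = clock()
        self._total = 0

    def recv(self, bufsize: int) -> bytes:
        data = self._recv(bufsize)            # pass the call down to the next provider
        self._total += len(data)
        # earliest moment at which this many bytes are "allowed" at the given rate
        earliest = self._start + self._total / self._rate
        delay = earliest - self._clock()
        if delay > 0:
            self._sleep(delay)                # variable-length sleep
        return data
```

Sleeping after the lower call rather than before is just one design choice; a limiter could equally delay before passing the call down, or cap `bufsize` to smooth out bursts.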

To verify this speculation, I created a random 100 MB test file on my web server, and I’m going to consume it with a small Python script that I’ve NetLimited to 1 KB/s. I start off by attaching a capable debugger, such as WinDBG, to my newly created Python process (kobyk showed me just how invaluable this debugger is during our work together):

[windbg]
0:001> lm m nl* // cannot find the nl_lsp module at first
start    end        module name

0:001> sxe ld nl_lsp // capture nl_lsp.dll load
0:001> g

>>> from socket import *
>>> s = socket(AF_INET, SOCK_DGRAM)
[windbg]
// The dll has just been loaded after calling socket()
ModLoad: 00e40000 00e55000   C:\Program Files\NetLimiter\nl_lsp.dll
eax=00000003 ebx=00000000 ecx=00e51004 edx=f0e40000 esi=00333150 edi=00000000
eip=7c90eb94 esp=0021edfc ebp=0021eef0 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

0:000> !uniqstack
…
0021f4f0 71ab74f1 kernel32!LoadLibraryA+0x94 // nl_lsp.dll being loaded
0021f874 71ab494d WS2_32!DPROVIDER::Initialize+0x112
0021f894 71ab49ac WS2_32!DCATALOG::LoadProvider+0x6d // load relevant TSPs
0021f8b0 71ab3a20 WS2_32!DCATALOG::GetCountedCatalogItemFromAttributes+0xf5
0021f908 71ab3be1 WS2_32!WSASocketW+0x89
0021f930 00aa3c50 WS2_32!socket+0x73
…
00000000 00000000 python25!PyEval_EvalCodeEx+0x62d

0:000> g
ModLoad: 00e60000 00e71000   C:\WINDOWS\system32\nl_msgc.dll (RPC client)
ModLoad: 71a50000 71a8f000   C:\WINDOWS\system32\mswsock.dll
ModLoad: 662b0000 66308000   C:\WINDOWS\system32\hnetcfg.dll
ModLoad: 71a90000 71a98000   C:\WINDOWS\System32\wshtcpip.dll

In the debugging session above, we can see that the TSP is loaded during socket creation. Next, I’ll let the Python script consume the big file:

>>> import urllib
>>> urllib.urlretrieve("http://www.haxorcize.com/test/big", r"c:\big")

@note: Some of the following IDA screen-caps show functions and symbols I renamed to ease the reversing process and keep things readable. Your own reversing output may not look as friendly.

At this point nl_lsp!WSPRecv intercepts every recv performed by the process (python in our case). Careful study of nl_lsp!WSPRecv shows that it checks for a certain context value associated with each SOCKET it handles:

Later on we meet some branching on whether the socket is overlapped or not, followed by a call to a big function that receives many of the main parameters. My assumption was that this function carries out the pacing logic, so I decided to analyze it as well.

I found that the big/interesting function calls mswsock!WSPRecv (which in turn calls the next TSP in the chain):

And also performs Sleeps in a loop with variable sleep durations which perfectly conforms to our assumption:

The following WinDBG breakpoint will enable you to examine the dynamic sleep duration algorithm and how it behaves in situations where the throttling is On/Off:

0:000> bp nl_lsp+3C8C "?edi; g" // this is the address where Sleep is being called
                                // edi holds the sleep duration

That’s all for now, see you next time.

(Half of this post has mysteriously disappeared when I came back home tonight, so I recovered it from memory at 04:31am. I hope I’ve done as good as the original post :))

Filed in Dissected Programs