Hooking Tutorial
Here you will learn about this ancient miracle cure called Hooking. Our ancestors used hooking to catch fish! No, just kidding, let's get to it. But before we dive in, I'll provide a little introduction to the Portable Executable (PE) format. Don't skip it; understanding it is crucial for getting to the meat and potatoes of this tutorial.
The Portable Executable (PE) format is a file format for executables, object code, DLLs, FON Font files, and others used in 32-bit and 64-bit versions of Windows operating systems. It is a data structure format that encapsulates information regarding dynamic library references for linking, API export and import tables, resource management data and thread-local storage (TLS) data.
The Extensible Firmware Interface (EFI) specification states that PE is the standard executable format in EFI environments. On the NT family of operating systems, the PE format is used for EXE, DLL, SYS (device driver), and other file types. There are two versions of the PE format: PE32, which is the 32-bit Portable Executable format, and PE32+, which is the 64-bit one.
Initially, the only programs that existed were COM files. The format of a COM file is… um, none. There is no format. A COM file is just a memory image. This 'format' was inherited from CP/M. To load a COM file, the program loader merely sucked the file into memory unchanged and then jumped to the first byte. No fixups, no checksum, nothing. Just load and go. The COM file format had many problems, among which was that programs could not be bigger than about 64KB. To address these limitations, the EXE file format was introduced. The header of an EXE file begins with the magic letters 'MZ' and continues with other information that the program loader uses to load the program into memory and prepare it for execution.
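To make the header layout a little more concrete, here is a minimal sketch in C++ that checks whether a byte buffer looks like a PE image and, if so, whether it is PE32 or PE32+. The names `PeKind`, `readU16`, `readU32` and `classifyImage` are made up for this illustration. The offsets it relies on (the 4-byte field at 0x3C that points to the `PE\0\0` signature, the 20-byte file header after it, and the magic values 0x10B and 0x20B at the start of the optional header) are part of the PE layout. It only inspects a buffer and does no further validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class PeKind { NotPe, Pe32, Pe32Plus };

// Read little-endian integers byte by byte, so the host's endianness does not matter.
std::uint16_t readU16(const std::vector<std::uint8_t>& buf, std::size_t off) {
    assert(off + 2 <= buf.size());
    return static_cast<std::uint16_t>(buf[off] | (buf[off + 1] << 8));
}

std::uint32_t readU32(const std::vector<std::uint8_t>& buf, std::size_t off) {
    assert(off + 4 <= buf.size());
    return static_cast<std::uint32_t>(buf[off])
         | static_cast<std::uint32_t>(buf[off + 1]) << 8
         | static_cast<std::uint32_t>(buf[off + 2]) << 16
         | static_cast<std::uint32_t>(buf[off + 3]) << 24;
}

PeKind classifyImage(const std::vector<std::uint8_t>& buf) {
    constexpr std::size_t kDosHeaderSize  = 0x40; // size of the legacy 'MZ' header
    constexpr std::size_t kLfanewOffset   = 0x3C; // field holding the offset of the PE signature
    constexpr std::size_t kFileHeaderSize = 20;   // file header that follows the signature

    if (buf.size() < kDosHeaderSize) return PeKind::NotPe;
    if (buf[0] != 'M' || buf[1] != 'Z') return PeKind::NotPe;

    const std::size_t peOffset = readU32(buf, kLfanewOffset);
    // We need the 4-byte signature, the file header and the 2-byte optional-header magic.
    const std::size_t needed = 4 + kFileHeaderSize + 2;
    if (peOffset > buf.size() || buf.size() - peOffset < needed) return PeKind::NotPe;
    if (std::memcmp(buf.data() + peOffset, "PE\0\0", 4) != 0) return PeKind::NotPe;

    const std::uint16_t magic = readU16(buf, peOffset + 4 + kFileHeaderSize);
    if (magic == 0x10B) return PeKind::Pe32;
    if (magic == 0x20B) return PeKind::Pe32Plus;
    return PeKind::NotPe;
}
```

To try it on a real file, read the whole file into a `std::vector<std::uint8_t>` and pass it to `classifyImage`.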
Addresses
- Physical Memory Address is what the CPU sees
- Virtual Address (VA) is an abstraction manageable by the OS
- Relative Virtual Address (RVA) is the offset from the VA at which the program is loaded (its base address).
- RVAs in DLL/EXE (PE format) files are usually relative to the 'loaded base address' in memory.
- The PE format contains a 'section' mapping structure that maps the file content on disk into memory. So an RVA is not a file offset; to get from one to the other you have to go through the section table.
- To calculate the RVA of some byte, you have to find its offset in its section and add the section base. Thus: AddressInMemory = BaseAddress + RVA. Example: assume you have a file (say, a DLL) that's loaded at address 1000h. In that file, you have a variable at RVA 200h. In that case, the VA of that variable (after the DLL is mapped to memory) is 1200h, i.e. the 1000h base address of the DLL plus the 200h RVA (offset) of the variable.
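The address arithmetic above can be sketched in a few lines of C++. This is only an illustration: the `Section` struct keeps three fields whose names mirror those of a PE section header, while `rvaToVa` and `rvaToFileOffset` are hypothetical helper names, not part of any Windows API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Subset of a PE section header: just what address translation needs.
struct Section {
    std::uint32_t virtualAddress;   // RVA of the section once it is mapped
    std::uint32_t sizeOfRawData;    // number of bytes of the section stored in the file
    std::uint32_t pointerToRawData; // file offset of the section's data
};

// AddressInMemory = BaseAddress + RVA
std::uint64_t rvaToVa(std::uint64_t imageBase, std::uint32_t rva) {
    assert(imageBase <= std::numeric_limits<std::uint64_t>::max() - rva); // no wrap-around
    return imageBase + rva;
}

// Finds the section that contains the RVA and converts it to a file offset.
// Returns std::nullopt if the RVA is not backed by file data in any section.
std::optional<std::uint32_t> rvaToFileOffset(const std::vector<Section>& sections,
                                             std::uint32_t rva) {
    for (const Section& s : sections) {
        if (rva >= s.virtualAddress && rva - s.virtualAddress < s.sizeOfRawData) {
            return s.pointerToRawData + (rva - s.virtualAddress);
        }
    }
    return std::nullopt;
}
```

With the numbers from the example, `rvaToVa(0x1000, 0x200)` gives 0x1200. For a section mapped at RVA 0x1000 whose data starts at file offset 0x400, `rvaToFileOffset` turns RVA 0x1200 into file offset 0x600.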
Section Headers
- .text , or .code : Code section. Contains the program's instructions and constant data; it is read-only.
- .data : Generally used for writable data with some initialized non-zero content, variables, arrays. Thus, the data section contains information that could be changed during application execution and this section must be copied for every instance.
- .bss : (Block Started by Symbol) Used for writable (static) data initialized to zero, or uninitialized memory (including empty arrays).
- .reloc : Contains the relocation table. The relocation table is a list of pointers created by the compiler or assembler and stored in the object or executable file. Each entry in the table is a pointer to an absolute address in the object code that must be changed when the loader relocates the program, so that it will refer to the correct location. The entries are designed to support relocation of the program as a complete unit. In some cases, each entry in the table is itself relative to a base address of zero, so the entries themselves must be changed as the loader moves through the table.
In other words, the relocation table is a lookup table of all the places in the PE file that need patching because they depend on the absolute address of the loaded program. The patching occurs when the file is loaded at a non-default base address. (Addresses of imported APIs are resolved separately, through the import table.)
A PE which doesn't have a .reloc section “wants” to be loaded at a specific address, regardless of the platform & environment in use (which can cause a plethora of issues). DLLs typically have a .reloc section, so that they can be relocated in case their preferred base address is not available.
Of course, nowadays, for security reasons, we have ASLR (Address Space Layout Randomization), where the loader will try to vary the base address of the executable. Not to be confused with DEP (Data Execution Prevention), a complementary security measure which prevents certain memory regions, e.g. the stack, from being executed. When those two are combined it becomes exceedingly difficult to exploit vulnerabilities in applications using shellcode or return-oriented programming (ROP) techniques.
You will find many variants of section names. I have documented many of those I have stumbled upon, and I list them here for reference:
- .rodata: read only data
- .rdata : Const/Read-only data of any kind are stored here. The loader will utilize these. This section must not have execute privileges.
- .rcdata : (resource data) Raw data whose format is not defined by Windows. Thus, it can be any kind of data, i.e. user-defined data, possibly binary data injected into the program.
- .pdata : contains an array of function table entries for exception handling, and is pointed to by the exception table entry in the image data directory
- .edata : Export directory, descriptors & handles. Also known as the Exports section. When a PE exports code or data, it makes them available for other PEs to use, particularly other .exes. These are all referred to as “symbols”. The symbol information contains the symbol name, which is typically the same as the function or variable name, and an address. Each exported symbol also has an associated ordinal number that is used to look it up and differentiate it from all other symbols.
- .idata : Import directory for handles & descriptors. Also known as the Imports section. It is used by executable files (EXEs, DLLs etc.) to designate the functions they import from other modules.
- *data : custom data sections
- .init : This section holds executable instructions that contribute to the process initialization code. That is, when a program starts to run the system arranges to execute the code in this section before the main program entry point (called main in C programs).
- .fini : This section holds executable instructions that contribute to the process termination code. That is, when a program exits normally, the system arranges to execute the code in this section.
- .rsrc : Section which holds information about various other resources needed by the executable, such as the icon that is shown when looking at the executable file in explorer
- .ctors : Section which preserves a list of constructors
- .dtors : Section that holds a list of destructors
- .tls : Refers to 'Thread Local Storage' and is related to the TlsAlloc family of Win32 functions. When dealing with a .tls section, the memory manager sets up the page tables such that whenever a process switches threads, a new set of physical memory pages is mapped to the .tls section's address space.
Defines:
#define IMAGE_FILE_RELOCS_STRIPPED 0x0001 // Relocation info stripped from file.
#define IMAGE_FILE_EXECUTABLE_IMAGE 0x0002 // File is executable (i.e. no unresolved external references).
#define IMAGE_FILE_LINE_NUMS_STRIPPED 0x0004 // Line numbers stripped from file.
#define IMAGE_FILE_LOCAL_SYMS_STRIPPED 0x0008 // Local symbols stripped from file.
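These values are single bits that get OR-ed together in the Characteristics field of the PE file header, so testing for one of them is a bitwise AND. A minimal sketch follows; the constant names and the `hasFlag` helper are hypothetical, chosen so the snippet compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Same values as the IMAGE_FILE_* defines, redeclared so this compiles anywhere.
constexpr std::uint16_t kRelocsStripped    = 0x0001;
constexpr std::uint16_t kExecutableImage   = 0x0002;
constexpr std::uint16_t kLineNumsStripped  = 0x0004;
constexpr std::uint16_t kLocalSymsStripped = 0x0008;

// True if the given single-bit flag is set in the Characteristics value.
bool hasFlag(std::uint16_t characteristics, std::uint16_t flag) {
    assert(flag != 0 && (flag & (flag - 1)) == 0); // the flag must be exactly one bit
    return (characteristics & flag) != 0;
}
```

For example, with a Characteristics value of 0x0003, `hasFlag(0x0003, kRelocsStripped)` is true: the image is executable and carries no relocation information, which ties in with the note about PE files that lack a .reloc section.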
- Each section has to start on its own page in memory, i.e. it must be aligned to a page boundary (4,096 B). This separation is necessary partly because sections require different access privileges according to their types.
- On disk, though, files may use a different alignment, such as 512 B (the sector size); it's configurable.
- Pages have to be allocated contiguously, otherwise RVAs won't work correctly. This is done by the loader.
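The alignment rule boils down to rounding a size or an offset up to the next multiple of the alignment. Here is a small sketch of that arithmetic; `alignUp` is a hypothetical helper name, and the trick it uses only works for power-of-two alignments such as the 4,096 and 512 mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Rounds value up to the next multiple of alignment (which must be a power of two).
std::uint32_t alignUp(std::uint32_t value, std::uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

A section holding 0x1234 bytes would therefore take `alignUp(0x1234, 0x1000)` = 0x2000 bytes of address space in memory, but only `alignUp(0x1234, 0x200)` = 0x1400 bytes in a file that uses 512-byte alignment.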
Hooking
We have reached the meat and potatoes of our subject. Hooking, also known as detouring, is an umbrella term for a range of techniques used to alter the behavior of a program. It is, in essence, a piece of code that intercepts function calls and handles them in a certain way to augment the functionality of a program.
Hooking can be used for debugging and for extending the functionality of applications. And of course for hacking, i.e. getting unauthorized access to programs or circumventing their intended use. But there are many legitimate uses, such as benchmarking programs, framerate timers in 3D games and more.
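Before touching machine code, the idea of intercepting a call can be illustrated in plain C++ with a function pointer. Note that this is a different and much tamer technique than the code patching used later in this tutorial: it only works because the call site already goes through a pointer. The names `sum` and `hookedSum` echo the ones used in the project; the signatures and everything else (`SumFn`, `g_sum`, `g_originalSum`, `g_callCount`, `installHook`, `removeHook`) are assumptions made for this sketch.

```cpp
#include <cassert>

using SumFn = int (*)(int, int);

int sum(int a, int b) { return a + b; }

SumFn g_sum = sum;             // the "call site": callers invoke g_sum(...), not sum(...)
SumFn g_originalSum = nullptr; // saved so the hook can forward the call and be removed
int g_callCount = 0;           // what the hook adds: it counts the intercepted calls

// The hook: does its extra work, then hands the call to the original function.
int hookedSum(int a, int b) {
    ++g_callCount;
    return g_originalSum(a, b);
}

void installHook() {
    assert(g_originalSum == nullptr); // not hooked yet
    g_originalSum = g_sum;
    g_sum = hookedSum;
}

void removeHook() {
    assert(g_originalSum != nullptr); // currently hooked
    g_sum = g_originalSum;
    g_originalSum = nullptr;
}
```

After `installHook()`, a call to `g_sum(5, 5)` still returns 10, but it passes through `hookedSum` first and `g_callCount` goes up by one. `removeHook()` restores the direct path.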
Hooks are also used to load plugins. At the base level, plugins are basically DLLs (libraries of code) which can be loaded by a program independently. The program can enumerate all its plugins at startup, or during runtime (if a new plugin is added). Plugins extend the functionality of the original program. You build your plugin separately, then load it at run-time, look up its symbols by name, and can then call them. To load plugins/DLLs:
- on POSIX (Linux), you use the dlopen() family of functions (dlsym() looks up a symbol by name).
- on Win32, there is LoadLibrary() (with GetProcAddress() for the symbol lookup).
Hooks implemented with the runtime modification approach have to be executed within the address space of the target program. The program that manipulates the target process's address space to make it load the hook code is called an injector. Injection is discussed in more detail in another article (go through this one first though, as that one is more advanced).
The simplest and easiest hook is a JMP instruction which is inserted in place of a function call, and this is what we do here as well. In this case we want to find the address in Program.exe where we call sum( 5, 5 ). In order to find this address you can use either Visual Studio itself (just put a breakpoint and step through the code in Assembly), Cheat Engine, or some other disassembler. As you can see in dllMain.cpp (which is thoroughly commented), the targetAddress / address I was looking for is 0x001B10E5 and I have hardcoded it in the program. You can change it easily for your own case. Remember that addresses change whenever you change and recompile your Program.exe.
After we find this address, we will overwrite the byte at that address, along with the next 4 bytes, with our hook code, which is none other than (in Assembly):
jmp 0x012E63DC
The opcode of this near jmp (E9) takes 1 byte and its operand takes 4 bytes on x86, which is why we need to replace 5 bytes. Note that the 4-byte operand stored in memory is not the absolute address itself but the displacement from the end of the jmp instruction to the target; an assembler computes it for you. Hook_dll.dll uses the WinAPI function VirtualProtect to gain execute + write access to this 5-byte memory region in Program.exe.
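If you build the 5 bytes by hand instead of letting an assembler do it, the encoding looks roughly like this. It is a sketch under assumptions: `JumpPatch`, `makeRelativeJump` and `jumpDestination` are hypothetical names, and the code only covers the 32-bit near jump (opcode E9 followed by a little-endian 32-bit displacement measured from the end of the instruction).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using JumpPatch = std::array<std::uint8_t, 5>;

// Encodes "jmp targetAddress" as it has to appear at patchAddress.
JumpPatch makeRelativeJump(std::uint32_t patchAddress, std::uint32_t targetAddress) {
    // The displacement counts from the first byte after the 5-byte instruction.
    // Unsigned arithmetic wraps modulo 2^32, which also covers backward jumps.
    const std::uint32_t rel = targetAddress - (patchAddress + 5u);
    return JumpPatch{{
        0xE9,
        static_cast<std::uint8_t>(rel & 0xFF),
        static_cast<std::uint8_t>((rel >> 8) & 0xFF),
        static_cast<std::uint8_t>((rel >> 16) & 0xFF),
        static_cast<std::uint8_t>((rel >> 24) & 0xFF),
    }};
}

// The inverse: where does a jump patch placed at patchAddress lead?
std::uint32_t jumpDestination(std::uint32_t patchAddress, const JumpPatch& patch) {
    assert(patch[0] == 0xE9); // only the near-jump encoding is handled here
    const std::uint32_t rel = static_cast<std::uint32_t>(patch[1])
                            | static_cast<std::uint32_t>(patch[2]) << 8
                            | static_cast<std::uint32_t>(patch[3]) << 16
                            | static_cast<std::uint32_t>(patch[4]) << 24;
    return patchAddress + 5u + rel;
}
```

Taking the two addresses from the text purely as sample numbers, `makeRelativeJump(0x001B10E5, 0x012E63DC)` yields the bytes E9 F2 52 13 01, since 0x012E63DC - (0x001B10E5 + 5) = 0x011352F2.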
Now how can we change the memory contents? Typically you'd use `memcpy` for this (or `memset` to fill a range with a single byte value). But I opted for a more down and dirty way, i.e. directly storing a value to a memory location, in C and C++.
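// C (the same cast also compiles as C++):
unsigned long long value = 124;
*(volatile unsigned long long*)(0x010B1CE2) = value;
// equal to the following in C++:
volatile unsigned long long* memLoc = reinterpret_cast<volatile unsigned long long*>( 0x010B1CE2 );
*memLoc = value;
// Careful: each store above writes sizeof(unsigned long long) bytes (8 with MSVC), not 5.
// The "JMP address" instruction needs exactly 5 bytes, 0x010B1CE2 up to and including 0x010B1CE6
// (0x010B1CE2 + 5 = 0x010B1CE7 is the first byte that must stay untouched).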
Similarly we can read from it.
Keep this in mind when you go over the dll program's code.
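For comparison, here is how the same kind of write looks with `memcpy`, touching exactly 5 bytes and keeping a copy of the original bytes so that the patch can be undone later. This is a sketch, not the code from the DLL: `Patch`, `kPatchSize` and `writePatch` are hypothetical names, and in the tests a plain local array stands in for the target memory. In the real DLL the location would be an address inside Program.exe whose protection has already been changed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kPatchSize = 5; // 1 opcode byte + 4 operand bytes
using Patch = std::array<std::uint8_t, kPatchSize>;

// Overwrites kPatchSize bytes at location and returns the bytes that were there before.
// The caller must make sure the memory is writable.
Patch writePatch(void* location, const Patch& patch) {
    assert(location != nullptr);
    Patch original{};
    std::memcpy(original.data(), location, kPatchSize); // save the old bytes first
    std::memcpy(location, patch.data(), kPatchSize);    // then write the new ones
    return original;
}
```

Calling `writePatch` again with the saved bytes restores the original code, which is the kind of thing you need if you want to hand execution back to the unmodified program.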
We've added various extras to the code. They are just decor and you can skip them completely on a first read. Once you have understood the core of this tutorial you can look into them as well; they're just more food for thought to test your understanding, in particular:
- an alternative hookedSum function written in Assembly
- a way to return execution back to Program.exe if the user presses F10 (after the dll has been hooked)
You can use any injector/dll-injector to hook the dll Hook_dll.dll into the executable Program.exe. I recommend the excellent free program ProcessHacker 2:
- download and launch ProcessHacker
- launch our Program
- find the Program.exe in ProcessHacker's process list
- right-click (RMB) on it -> Miscellaneous -> Inject dll..
- Browse and select the Hook_dll.dll
In a later tutorial we will learn how to create the injector program as well, so that we can do away with third-party programs.
I have used Visual Studio on Windows and C++17 to build the project.
Github
Github repository link.
Acknowledgements
x86 Assembly instruction - opcode reference
Inside Windows An In-Depth Look into the Win32 Portable Executable File Format, Part 2
Stack Overflow post.