PE injection explained
Advanced memory code injection technique

Code injection into another process's memory is generally limited to shellcode, either to hide the shellcode from antivirus software or to inject a DLL. The method described here is more powerful: it injects and runs a complete executable (PE format) inside the memory of another process.

Article published on 13 April 2014
last modification on 14 October 2019

by Emeric Nasi

WARNING: This post is from 2014. A more recent post covering Windows 10 64bits on the same topic is available in Code Injection Series Part 1.

============================================================

Note: This white paper requires some knowledge of Windows system programming and the Portable Executable format.
License : Copyright Emeric Nasi, some rights reserved
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


You can read the HTML version here or download the PDF version as uploaded on packetstorm:

PE_Injection_Explained.pdf

I Presentation.

I.1 PE injection

This is not another article on DLL injection or shellcode injection (plenty of material on those is already available online). The method described here injects the complete image of the running process module into the memory of another process. Basically, this means having two different programs running in the same process (or having one PE hidden inside another).
This technique is more powerful than classic code injection because it does not require any shellcoding knowledge: the program can be written in regular C++ and relies on the well-documented Windows system and runtime APIs. Compared to DLL injection, the main asset of PE injection is that you don’t need several files: the main exe injects itself into another process and calls itself in there.
I don’t know who invented this method (official researchers or the underground?). The thing is, the technique is not widely documented on the Internet, and the source code that can be found generally lacks explanation.
Here I provide a complete explanation of the technique, with implementation source code at the end of the article.

I.2 Method impact

I’ve run several tests around PE injection. From what I’ve tried, it is possible to inject pretty much any code into the target process. I’ve tested with success:

  • Socket creation and network access
  • Access to filesystem
  • Create threads
  • Access to system libraries
  • Access to common runtime libraries

In fact, other things like a remote control tool and a keylogger ran fine as well.

I’ve tested PE injection on Vista and Windows 7 without any problem. Concerning architecture, PE injection works with both 32-bit and 64-bit software. However, you can only inject a 32-bit PE image into a 32-bit process and a 64-bit PE image into a 64-bit process.

I have also monitored the targeted process with the Sysinternals ProcExp tool. Memory usage grows after the injection phase, and a new thread can be seen running inside the process. It is nevertheless very difficult to detect that a module image was injected into the target, because the injected module is not loaded by the system loader. For comparison, a DLL loaded with LoadLibrary is listed in ProcExp as one of the process modules, whereas PE injection just creates a bunch of data in the process virtual memory. It would be possible to scan memory for an unusual ’MZ’ signature or other parts of the PE headers, but the PE header can also be scrambled at injection time if stealth is required.
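From the monitoring side, the header check mentioned above can be sketched as follows. This is only an illustration operating on a memory snapshot that the caller has already obtained; the function name find_pe_image and the "return len when nothing is found" convention are my own assumptions, and a real scanner would work region by region rather than byte by byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian 32-bit read that does not depend on host alignment. */
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the offset of the first plausible PE image at or after `start`,
   or `len` when none is found. A hit requires the 'MZ' signature and an
   e_lfanew field (offset 0x3c) that points at "PE\0\0" inside the buffer. */
size_t find_pe_image(const uint8_t *mem, size_t len, size_t start)
{
    for (size_t i = start; i < len && len - i >= 0x40; i++) {
        if (mem[i] != 'M' || mem[i + 1] != 'Z')
            continue;
        uint32_t e_lfanew = rd32le(mem + i + 0x3c);
        if (e_lfanew <= len - i - 4 &&
            memcmp(mem + i + e_lfanew, "PE\0\0", 4) == 0)
            return i;
    }
    return len;
}
```

As noted above, this kind of check is defeated as soon as the header bytes are scrambled, so it is a first filter rather than a reliable detector.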

I.3 The tools

Obviously you need a compiler and a debugger. I use Microsoft Visual Studio Express 2010, which is free and provides the source code editor, the compiler, the linker, and the debugger (with a very practical machine code view). If you have the full Visual Studio and crash another process with this injection technique, you can investigate what happened by debugging the failed process in Visual Studio (this requires some practice with assembly code and memory layout...).
I’ve also used another debugger, WinDBG, which is simple but very practical.
The Sysinternals tool suite is a must for system monitoring. I’ve especially used the ProcExp tool.

II Principles

II.1 Writing code into distant process memory

Writing some code into another process’s memory is the easy part. Windows provides system APIs to read and write the memory of other processes.

First you need the PID of the process. You can enter this PID yourself or use a method that retrieves the PID from a given process name.

Next, open the process. This is easily done by calling the OpenProcess function provided by the Kernel32 library.

Note: Opening another process is subject to restrictions. Since Vista, a few protections exist along with Microsoft UAC. The main protection for process memory is Mandatory Integrity Control (MIC). MIC controls access to objects based on their "integrity level". There are 4 integrity levels:
  • Low level is for processes that are restricted from accessing most of the system (for example Internet Explorer).
  • Medium level is the default for any process started by unprivileged users, and also by administrator users if UAC is enabled.
  • High level is for processes running with administrator privileges.
  • System level is for processes run by the SYSTEM user, generally system services and processes requiring the highest protection.

For our purposes, this means the injector process can only inject into a process running at a lower or equal integrity level.
For example, if UAC is activated, a process runs at "Medium" integrity level even if the user account is an administrator (unless it is specifically run as administrator). The "explorer.exe" process is permanent and runs at medium integrity level, so it makes an ideal target in our case, even with UAC enabled.
Discussing Windows system protections is not the main subject of this article; you can find a lot of details in the MSDN documentation.
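The rule above boils down to a comparison of integrity levels. The sketch below uses the mandatory-label RID values defined in winnt.h; the enum and the function name can_open_for_write are hypothetical names chosen for this illustration. It is a deliberately simplified model that ignores the DACL on the process object, privileges, and protected processes.

```c
#include <stdint.h>

/* Mandatory-label RIDs, same values as SECURITY_MANDATORY_*_RID in winnt.h. */
enum integrity_rid {
    INTEGRITY_LOW    = 0x1000,
    INTEGRITY_MEDIUM = 0x2000,
    INTEGRITY_HIGH   = 0x3000,
    INTEGRITY_SYSTEM = 0x4000
};

/* Simplified MIC rule: a process may only write to a target whose
   integrity level is lower than or equal to its own. */
int can_open_for_write(uint32_t injector_rid, uint32_t target_rid)
{
    return injector_rid >= target_rid;
}
```

With these values, a medium-integrity injector passes the check against explorer.exe (medium) but not against a process started as administrator (high).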

After opening the process, we allocate some memory in the distant process so that we can insert the current process image. This is done using the VirtualAllocEx function. To calculate the amount of memory to allocate, we retrieve the size of the current process image by parsing the PE header (the SizeOfImage field of the optional header).
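As an illustration of the header parsing involved, here is a small portable sketch that reads SizeOfImage from a buffer holding a PE image. On Windows one would normally use the IMAGE_DOS_HEADER and IMAGE_NT_HEADERS structures from the SDK instead of raw offsets; the function name pe_size_of_image and the "0 means invalid" convention are assumptions of this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns OptionalHeader.SizeOfImage, or 0 if the buffer is not a PE image.
   Layout: DOS header with e_lfanew at 0x3c, then "PE\0\0" (4 bytes),
   the COFF file header (20 bytes), then the optional header, where
   SizeOfImage sits at offset 56 in both PE32 and PE32+. */
uint32_t pe_size_of_image(const uint8_t *img, size_t len)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return 0;
    uint32_t e_lfanew = rd32le(img + 0x3c);
    if (e_lfanew > len || len - e_lfanew < 4 + 20 + 60)
        return 0;
    if (memcmp(img + e_lfanew, "PE\0\0", 4) != 0)
        return 0;
    return rd32le(img + e_lfanew + 4 + 20 + 56);
}
```

The returned value is the size of the module once mapped in memory (headers plus all sections, rounded up to the section alignment), which is why it is the right quantity for sizing the copy.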

Writing into a process’s memory is done by calling the WriteProcessMemory function. This is pretty simple, as you can see in the source code section.

II.2 Handling binaries fixed addresses

The main issue with code injection is that the base address of the module changes. Generally, when a process starts, the main module is loaded at address 0x00400000. When we inject our code into another process, the new base address of our module will be somewhere unpredictable in the distant process virtual memory.
In an .exe file, after compilation and linking, all code and data addresses are fixed and built from the virtual memory base address.
For PE injection, we need to adjust every piece of data that is expressed as a full (absolute) address. For that, we are going to use the relocation section of the executable.

Relocation data is present by default in 64-bit executables and in 32-bit executables built without a fixed base address. The purpose of the relocation table (the .reloc section) is to support Address Space Layout Randomization and the loading of DLLs at arbitrary addresses. This is pretty handy, since it lets us find and modify every place where an absolute address needs to be updated.

When a file is loaded normally by the system and the preferred base address cannot be used, the operating system assigns a new base address to the module. The system loader then uses the relocation table to recalculate all absolute addresses in the code.
In the PE injection method we do the same thing as the system loader. We compute a delta, the difference between the new base address in the distant process and the current base address. Then, thanks to the relocation table, we visit every absolute address declared in the code and add the delta to it.
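The delta arithmetic can be written down in a few lines. The function names are mine; unsigned 64-bit arithmetic is used so that the same code works whether the new base is above or below the old one (the subtraction simply wraps around).

```c
#include <stdint.h>

/* Difference between the new and the old load address, modulo 2^64. */
uint64_t reloc_delta(uint64_t old_base, uint64_t new_base)
{
    return new_base - old_base;
}

/* Rebase one absolute address that was valid under the old base. */
uint64_t rebase_address(uint64_t addr, uint64_t delta)
{
    return addr + delta;
}
```

For instance, with an old base of 0x00400000 and a new base of 0x02A50000, the delta is 0x02650000 and a pointer to 0x00403010 becomes 0x02A53010.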

For the next step it is important to understand how relocation data is organized.
Relocation data is stored in a data directory. This directory can be accessed through the IMAGE_DIRECTORY_ENTRY_BASERELOC index.
The relocation data directory is an array of relocation blocks, each of which starts with an IMAGE_BASE_RELOCATION structure.
Here is the definition of that structure:
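A portable rendering of the winnt.h declaration follows, with <stdint.h> types standing in for DWORD and WORD. The type name reloc_block_header is my own choice for this sketch; the field names are those of the SDK.

```c
#include <stdint.h>

/* Portable equivalent of IMAGE_BASE_RELOCATION (DWORD -> uint32_t). */
typedef struct reloc_block_header {
    uint32_t VirtualAddress; /* RVA of the page covered by this block       */
    uint32_t SizeOfBlock;    /* bytes in this block, header included        */
    /* uint16_t TypeOffset[];   16-bit descriptors follow the header        */
} reloc_block_header;
```

The descriptor array is not a real member of the structure: its length varies from block to block and has to be derived from SizeOfBlock.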

The relocation blocks do not all have the same size: each block holds a variable number of 16-bit relocation descriptors. The SizeOfBlock attribute of the structure gives the total size of the relocation block, header included.
Here is a simple memory layout of a relocation data directory:
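The layout can be pictured as in the comment below, which is my own text rendering rather than an original figure. The helper (a hypothetical name) derives the number of descriptors in a block from SizeOfBlock, given that the header takes 8 bytes and each descriptor 2.

```c
#include <stdint.h>

/*
 * Relocation data directory in memory:
 *
 *   +----------------------------+  <- block 0
 *   | VirtualAddress   (32 bits) |
 *   | SizeOfBlock      (32 bits) |
 *   | descriptor 0     (16 bits) |
 *   | descriptor 1     (16 bits) |
 *   | ...                        |
 *   +----------------------------+  <- block 1 = block 0 + SizeOfBlock
 *   | VirtualAddress   (32 bits) |
 *   | SizeOfBlock      (32 bits) |
 *   | descriptor 0     (16 bits) |
 *   | ...                        |
 *   +----------------------------+
 */
#define RELOC_HEADER_SIZE 8u

/* Number of 16-bit descriptors stored in a block of the given size. */
uint32_t reloc_descriptor_count(uint32_t size_of_block)
{
    if (size_of_block < RELOC_HEADER_SIZE)
        return 0;
    return (size_of_block - RELOC_HEADER_SIZE) / 2u;
}
```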

The VirtualAddress attribute is the base address (relative to the module base) of all the places covered by the block that must be fixed in the code. Each 16-bit descriptor refers to one absolute address somewhere in the code that must be changed, together with the method the system loader should use to modify it. The PE format describes about 10 different transformations that can be used to fix an address reference. These transformations are encoded in the top 4 bits of each descriptor. The transformation methods are ignored in the PE injection technique. The bottom 12 bits give the offset from the VirtualAddress of the containing relocation block.
This means that "relocationBlock.VirtualAddress + bottom 12 bits of descriptor" points to the address we need to fix in the code. So basically, we must go through all relocation descriptors in all relocation blocks and, for each descriptor, adjust the address it points to so that it matches the new base address in the distant process.
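The walk over blocks and descriptors can be sketched on a plain byte buffer as shown below. This is an illustration under stated assumptions, not the implementation from the source code section: the function name and parameters are mine, and unlike the text, which ignores the type bits, the sketch skips type 0 entries (padding), patches only type 3 entries (IMAGE_REL_BASED_HIGHLOW, a 32-bit address) and reports anything else as unsupported.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16le(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void wr32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Adds `delta` to every 32-bit address listed in the relocation directory
   located at [reloc_rva, reloc_rva + reloc_size) inside `image`.
   Returns the number of patched addresses, or -1 on malformed or
   unsupported data. */
int apply_relocations32(uint8_t *image, size_t image_len,
                        uint32_t reloc_rva, uint32_t reloc_size,
                        uint32_t delta)
{
    uint64_t pos = reloc_rva;
    uint64_t end = (uint64_t)reloc_rva + reloc_size;
    int patched = 0;

    if (end > image_len)
        return -1;
    while (end - pos >= 8) {                       /* room for a block header */
        uint32_t page = rd32le(image + pos);       /* VirtualAddress */
        uint32_t size = rd32le(image + pos + 4);   /* SizeOfBlock    */
        if (size < 8 || size > end - pos)
            return -1;
        for (uint64_t d = pos + 8; d + 2 <= pos + size; d += 2) {
            uint16_t desc   = rd16le(image + d);
            unsigned type   = desc >> 12;          /* top 4 bits     */
            uint32_t offset = desc & 0x0FFFu;      /* bottom 12 bits */
            if (type == 0)                         /* padding entry  */
                continue;
            if (type != 3)                         /* not handled in this sketch */
                return -1;
            uint64_t target = (uint64_t)page + offset;
            if (target + 4 > image_len)
                return -1;
            wr32le(image + target, rd32le(image + target) + delta);
            patched++;
        }
        pos += size;
    }
    return patched;
}
```

A 64-bit image would use type 10 entries (IMAGE_REL_BASED_DIR64) and an 8-byte read-modify-write instead; the structure of the two loops stays the same.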

II.3 Calling our code in remote process

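Once the code is injected, we can attempt to call its functions.
The first issue we face is that we need to calculate the address of the function we want to call in the remote process. Since the image is copied as one block, the function keeps the same offset from the module base, so its remote address is its local address plus the delta between the remote and local base addresses.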

Calling the function itself can be done in several ways. These techniques are the same as those used for DLL injection.
Here are three techniques I tested successfully:

  • CreateRemoteThread -> Call the CreateRemoteThread function from the Kernel32 library. Very simple to use and fully documented.
  • NtCreateThreadEx -> Like CreateRemoteThread, it starts a thread in a distant process. The difference is that NtCreateThreadEx is declared in the ntdll.dll module and is not documented, so it is not as straightforward to use.
  • Suspend, Inject, Resume -> This method consists of suspending all threads inside the target process, changing the thread context so that the next instruction points to our injected code, and finally resuming all threads. The drawback of this method is that it doesn’t seem to work with all processes (for example I couldn’t make it work with explorer.exe).

To focus on the code injection itself, I will only provide the CreateRemoteThread method in the implementation code example section. Feel free to email me if you want more complete source code including all three techniques (you can also easily find them on the Internet).

III Implementation challenges

III.1 Heap and stack variables

The relocation table does the trick for all pointers linked into the executable code, but it is of no use for adapting data created on the stack or the heap after the process has started.
This is why the code must not rely on any dynamically allocated space or any local variables that were initialized before the PE image is injected.
Once the image is injected, there is no problem using the stack and the heap of the host process. Static variables, global variables, and constants are initialized in PE image segments, so they are not affected by this issue (see PE memory layout in the Code segment encryption article).

III.2 Cope with Windows Runtime Library issues

The Microsoft Runtime Library contains all C standard functions like malloc, strncpy, and printf, and is included by default in most C and C++ programs built for Windows. It is automatically linked by the Visual Studio toolchain, either as a static library or as a DLL loaded at runtime.
The problem is that if you want to rely on the Runtime Library, a lot of data is allocated even before the main() function is called. This is because in a Windows application, the default entry point of a program is not main but mainCRTStartup(). When this function is called, it sets up the environment so the application can run safely (enable multithread locks, allocate the local heap, parse parameters, etc.). All this data is set up using the process base address, and there is so much of it that it would be too painful to fix it all before injecting it into another process.
So you have basically two solutions here:

  • You don’t use the C Runtime Library.
  • You initialize the C Runtime Library after the code is injected.

In both cases you have to define a new entry point for the code. This can be done with a pragma (#pragma comment(linker, "/ENTRY:function")) or with the /ENTRY option in the Visual Studio linker settings.

If you don’t want to use the C Runtime Library (and you may not want it for a lot of reasons, like 300k of code...), you are going to face a few issues. You can do a lot using the system libraries, but you will miss basic functions like printf, malloc, or strncpy. I suggest you build your own tiny CRT and implement all the useful functions you will need in your code. I have personally gathered a lot of sources to build my own CRT. I would be glad to share it with others working on the same topic, especially if someone finds a nice way to implement operations on 64-bit integers.
Avoiding the runtime in Visual Studio can be done using the /NODEFAULTLIB linker option.
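To give an idea of what a tiny CRT looks like, here is a minimal sketch of three string and memory helpers with no dependency on the standard library. The tiny_ prefix is an arbitrary naming choice for this example, not the naming used in my own CRT.

```c
#include <stddef.h>   /* size_t only: this header pulls in no library code */

/* Length of a NUL-terminated string. */
size_t tiny_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Byte-by-byte copy; the regions must not overlap. */
void *tiny_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Same contract as strncpy: copies at most n characters and pads the
   remainder of dst with NUL bytes. */
char *tiny_strncpy(char *dst, const char *src, size_t n)
{
    size_t i = 0;
    for (; i < n && src[i] != '\0'; i++)
        dst[i] = src[i];
    for (; i < n; i++)
        dst[i] = '\0';
    return dst;
}
```

One caveat: depending on optimization settings, a compiler may recognize such loops and replace them with calls to memcpy or memset, which then have to exist at link time, so it is worth checking the generated code.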

The second method has a bigger footprint but lets you do anything you want (once the CRT is initialized). It is however a bit tricky to use. Why? Because in a regular Windows program, the first function called is not main but mainCRTStartup(). This function initializes the runtime library and then calls the main function in the code. Also, this function is declared only in the runtime library.

What do we need to do:

  1. First, you need a main() function. It will be called automatically by mainCRTStartup() and will be the entry point of what you want to run in the distant process.
  2. You also need to declare a function that calls mainCRTStartup() in the remote process. Let’s call it entryThread(). It will be started as a remote thread.
  3. Finally, you need a program entry point, used to call the code injection routines and start the remote thread function. Let’s call it entryPoint().

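Here is the call stack of what will happen:

  Injector process: entryPoint() -> code injection routines -> remote thread creation
  Target process:   entryThread() -> mainCRTStartup() -> main()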

This method is the one presented in the source code implementation section.

III.3 Weird breakpoint instructions in main function

During my tests on PE injection I’ve encountered a strange issue when attempting to call a "main()" function in the remote process. It seems that a breakpoint instruction is automatically added at the beginning of any main() or wmain() function. Instead of starting with its regular first instruction, my main function started with a breakpoint instruction.

I don’t know why this wild breakpoint is added. I had the same behavior on several OS versions, so this may be a Visual Studio trick (in release mode I still have the breakpoint...). Also, the breakpoint is not added to the entry point but to any function called "main()", and to no others.
If we plan to use the runtime library we need a main() function, so I just patch the first instruction of the main function before injecting the code.
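One possible way to do such a patch is sketched below, on a local copy of the image. It assumes x86/x64 code, where the one-byte breakpoint instruction int 3 is encoded as 0xCC and nop as 0x90; the function name and the idea of replacing the byte with a nop are assumptions of this example rather than a description of the exact patch used in the source code section.

```c
#include <stddef.h>
#include <stdint.h>

#define X86_INT3 0xCCu   /* one-byte breakpoint instruction */
#define X86_NOP  0x90u   /* one-byte no-operation           */

/* If the byte at `func_offset` in the image copy is an int 3, overwrite it
   with a nop. Returns 1 when a patch was applied, 0 otherwise. */
int patch_leading_int3(uint8_t *image, size_t image_len, size_t func_offset)
{
    if (func_offset >= image_len)
        return 0;
    if (image[func_offset] != X86_INT3)
        return 0;
    image[func_offset] = X86_NOP;
    return 1;
}
```

Checking for 0xCC before writing keeps the patch harmless on builds where the breakpoint is not present.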

I hope someone can explain to me what happens and suggest a nicer solution.

III.4 Compatibility layer

On some OS versions, when the exe is started from Visual Studio, the Microsoft compatibility layer redirects Kernel32.dll functions to AcLayers.dll. Because of that, calling GetProcAddress in the injected code fails: the call is linked to a stub function declared in AcLayers.dll, which is not loaded in the target process.
The problem is that you may want to call GetProcAddress in your injected code, and it is mandatory if you use the Microsoft Runtime Library. I had this behavior on Vista but not on 7 64-bit, so it may depend on the OS and the version of Visual Studio. You can find more about AcLayers.dll problems related to this topic here.

In any case, this problem only occurs when starting the injector program directly from the Visual Studio IDE (which itself loads AcLayers.dll).
So my recommendation is: do not run the injector executable from Visual Studio.

IV Simple implementation source code

To finish, here is an implementation of this technique with a lot of comments.