Detecting Process Injection

Evasion Techniques and Detection Strategies for Memory-Resident Malware

Jul 16, 2024

Overview

In today's mature enterprise environments, adversaries must choose a stealthy means of beacon execution. Advances in antivirus (AV) engines have pushed threat actors to move many heavily signatured implants from disk into memory, where they are scanned less often, if at all. Process Injection [T1055] is a common technique used to achieve this goal. In this article, we will explore the Windows logging mechanisms available for defenders to detect and prevent process injection, as well as the evasion techniques advanced threat actors use to circumvent detection. At a high level, the figure below shows the general steps adversaries must take to perform Process Injection or Reflective Code Loading [T1620], and the coverage and visibility that good endpoint detection and response (EDR) products have along the way.

TL;DR

Given the myriad publicly available shellcode loaders, defenders should favor broad detection mechanisms that catch as many variations as possible. Most process injection techniques can be abstracted into memory allocation, write, and execution primitives that together execute code dynamically. The more generic the abstractions we can create, the more individual procedures we can potentially identify.
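To make the abstraction concrete, here is a minimal Python sketch that maps a handful of API names onto the three primitives. The mapping table is an illustrative, non-exhaustive assumption rather than any product's actual ruleset; a real sensor would also normalize wrappers and syscall variants.

```python
from enum import Enum


class Primitive(Enum):
    ALLOCATE = "allocate"
    WRITE = "write"
    EXECUTE = "execute"


# Illustrative, non-exhaustive mapping of API names to injection primitives.
API_PRIMITIVES = {
    "VirtualAllocEx": Primitive.ALLOCATE,
    "NtAllocateVirtualMemory": Primitive.ALLOCATE,
    "NtMapViewOfSection": Primitive.ALLOCATE,
    "WriteProcessMemory": Primitive.WRITE,
    "NtWriteVirtualMemory": Primitive.WRITE,
    "CreateRemoteThread": Primitive.EXECUTE,
    "NtCreateThreadEx": Primitive.EXECUTE,
    "QueueUserAPC": Primitive.EXECUTE,
    "NtQueueApcThread": Primitive.EXECUTE,
    "SetThreadContext": Primitive.EXECUTE,
}


def classify(api_name):
    """Return the primitive an API call represents, or None if it is unmapped."""
    return API_PRIMITIVES.get(api_name)


def covers_full_chain(api_calls):
    """True if the observed calls include all three primitives (order ignored)."""
    seen = {classify(name) for name in api_calls} - {None}
    return seen == set(Primitive)
```

Because the check keys on the primitive rather than the API name, swapping WriteProcessMemory() for NtWriteVirtualMemory() does not change the outcome.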

Step 1: Load PE File From Disk

The first step that adversaries commonly take is loading a PE file from disk to establish command and control (C2) [TA0011] communications. The implant loaded from disk can then call back to an attacker-controlled server, giving the attacker control over the compromised system. However, stageless implants loaded directly like this are easily signatured by AV. While some static indicators can be encrypted to hide their presence, excessively high file entropy can itself be an indicator. Additionally, EDRs often perform automated dynamic analysis in a sandbox before allowing suspicious binaries to execute. This naturally led to the development of "stagers", which are small programs designed to load and execute position-independent shellcode. Stagers decouple functionality and retrieve the shellcode at runtime, so the payload never touches disk and the stager itself carries little for static AV scans to signature. While impractical in some environments, application whitelisting or code signing enforcement can restrict the use of unknown applications, which would prevent a stager from executing in the first place. While these defenses can potentially be bypassed via DLL Hijacking [T1574.001] and other techniques, for now we will assume that the attacker is able to execute a stager on the machine with the goal of injecting code into a remote process.
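The entropy indicator mentioned above is usually Shannon entropy measured in bits per byte: plain code and text score well below the maximum of 8.0, while encrypted or compressed data approaches it. A minimal Python sketch follows; the 7.2 threshold is a hypothetical value chosen for illustration, not a vendor default.

```python
import math
from collections import Counter

# Hypothetical threshold for this sketch; real products tune it per file type.
HIGH_ENTROPY_THRESHOLD = 7.2


def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def looks_encrypted_or_packed(data):
    """Flag data whose byte distribution is close to uniform."""
    return shannon_entropy(data) >= HIGH_ENTROPY_THRESHOLD
```

In practice, scanners tend to compute this per PE section rather than over the whole file, since a single encrypted resource can dominate the result.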

Relevant Security Controls:

Step 2: Identify Sacrificial Process

Next, the attacker needs to identify a process to inject code into. While code can be reflectively loaded into the stager itself (the local process), attackers often use remote process injection to better mask execution and masquerade as a legitimate process. For stability reasons, many opt to spawn a new instance of the desired process at runtime rather than risk crashing an existing live process. Attackers then (usually) need to acquire a handle to the target process to perform virtual memory operations in the context of its address space. So, what telemetry sources do defenders have for these actions?

Since most EDRs include a kernel-mode driver, they can register a set of kernel callback routines to be notified whenever certain actions take place. For example, EDRs can use the PsSetCreateProcessNotifyRoutine* family of functions to register a callback that is invoked whenever a process is created[1]. An EDR can use this notification as an opportunity to inject a hooking library into the new process, providing telemetry for certain sensitive API calls. Additionally, ObRegisterCallbacks() can be used to register a callback routine for thread, process, and desktop handle operations. On their own, these behaviors are usually not enough to trigger an alert. However, when combined with other indicators, the general process injection flow becomes clearer.
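A pre-operation callback registered with ObRegisterCallbacks() sees the access mask requested for a process handle. The Python sketch below, written for readability rather than as driver code, shows one way such a mask might be scored. The constant values mirror winnt.h; the scoring weights and the `handle_request_score` helper are hypothetical.

```python
# Process access rights as defined in winnt.h.
PROCESS_CREATE_THREAD = 0x0002
PROCESS_VM_OPERATION = 0x0008
PROCESS_VM_READ = 0x0010
PROCESS_VM_WRITE = 0x0020
PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
PROCESS_ALL_ACCESS = 0x1FFFFF  # Windows Vista and later

# Rights needed to allocate (VM_OPERATION) and write (VM_WRITE) remote memory.
INJECTION_RIGHTS = PROCESS_VM_OPERATION | PROCESS_VM_WRITE


def injection_capable(desired_access):
    """True if the requested mask is enough to allocate and write remote memory."""
    return (desired_access & INJECTION_RIGHTS) == INJECTION_RIGHTS


def handle_request_score(source_pid, target_pid, desired_access):
    """Hypothetical scoring of one process-handle request seen by a handle callback."""
    if source_pid == target_pid:
        return 0  # opening a handle to yourself is not remote injection
    score = 0
    if injection_capable(desired_access):
        score += 2
    if desired_access & PROCESS_CREATE_THREAD:
        score += 1
    return score
```

A non-zero score here would not be an alert by itself; it is one input to be combined with the allocation, write, and execution telemetry discussed in the following steps.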

Relevant Security Controls:

Windows Kernel Callback Functions

Step 3: Allocate Virtual Memory

Depending on the injection method, memory usually needs to be explicitly allocated before shellcode can be written and executed. Windows provides several different memory allocation methods, each with slightly different functionality.

To allocate memory on the heap, HeapAlloc() or one of its wrappers (GlobalAlloc() and LocalAlloc()) can be used. Today, these wrapper functions are functionally equivalent and remain as artifacts of 16-bit Windows. The C runtime (CRT) provides malloc() and new, which also call HeapAlloc() internally. For COM-aware allocations, CoTaskMemAlloc() or IMalloc::Alloc() with an OLE memory allocator can be used. VirtualAlloc() and VirtualAllocEx() operate on whole pages instead: they align reserved regions to the allocation granularity (usually 64KB) and round the length up to a multiple of the page size (typically 4KB).
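The rounding rules for the VirtualAlloc* family are simple power-of-two arithmetic. The Python sketch below assumes the common 4KB page size and 64KB allocation granularity; a real tool should query GetSystemInfo() rather than hard-code them.

```python
PAGE_SIZE = 0x1000                # typical page size (4KB), assumed
ALLOCATION_GRANULARITY = 0x10000  # typical allocation granularity (64KB), assumed


def align_down(value, alignment):
    """Round value down to a multiple of a power-of-two alignment."""
    return value & ~(alignment - 1)


def align_up(value, alignment):
    """Round value up to a multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


def committed_bytes(requested_size):
    """Bytes actually committed for a request of requested_size bytes."""
    return align_up(requested_size, PAGE_SIZE)


def reservation_base(requested_address):
    """Base address of a reservation requested at requested_address."""
    return align_down(requested_address, ALLOCATION_GRANULARITY)
```

For instance, a 300-byte shellcode buffer still commits a full 4KB page, a detail that matters again when buffer sizes come up in Step 4.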

Despite the multitude of different options, most are simply wrappers and end up calling the same low-level implementations (as shown in this example call stack):

kernel32!VirtualAlloc()
  ↳ kernel32!VirtualAllocEx()
    ↳ ntdll!NtAllocateVirtualMemory()

Another important distinction for the VirtualAlloc* family of functions is that, by default, they (along with the underlying NtAllocateVirtualMemory() syscall) treat executable pages as valid indirect call targets for Control Flow Guard (CFG). CFG, Microsoft's implementation of Control Flow Integrity (CFI), is an exploit mitigation feature that restricts arbitrary code execution by validating indirect call targets against a bitmap. Note that CFG is designed only to limit exploitation of memory corruption vulnerabilities, not to stop a program from deliberately running code: Microsoft even exposes the SetProcessValidCallTargets() API so that programs can manually mark call targets as valid or invalid.
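To illustrate what validating call targets against a bitmap means, here is a deliberately simplified Python model. It assumes one bit per 16-byte slot of a single region, which is not the real CFG bitmap layout; `mark_range_valid` stands in for the default behavior of the VirtualAlloc* family described above.

```python
class ToyCfgBitmap:
    """Toy CFG model: one validity bit per 16-byte slot of a single region.

    The real CFG bitmap uses a different, more involved encoding; this only
    illustrates checking indirect call targets against a bitmap.
    """

    SLOT = 16

    def __init__(self, base, size):
        self.base = base
        self.size = size
        slots = (size + self.SLOT - 1) // self.SLOT
        self.bits = bytearray((slots + 7) // 8)

    def _slot(self, addr):
        offset = addr - self.base
        if not 0 <= offset < self.size:
            return None
        return offset // self.SLOT

    def mark_valid(self, addr):
        slot = self._slot(addr)
        if slot is None:
            raise ValueError("address outside the modeled region")
        self.bits[slot // 8] |= 1 << (slot % 8)

    def mark_range_valid(self, start, length):
        """Mimic marking every slot of a new executable allocation as valid."""
        for addr in range(start, start + length, self.SLOT):
            self.mark_valid(addr)

    def is_valid_target(self, addr):
        """Toy rule: only 16-byte-aligned addresses whose bit is set are valid."""
        slot = self._slot(addr)
        if slot is None or addr % self.SLOT:
            return False
        return bool(self.bits[slot // 8] & (1 << (slot % 8)))
```

The model also shows why CFG does little against injection: memory allocated as executable through the normal APIs gets its call targets marked valid automatically.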

Additionally, with the widespread adoption of Data Execution Prevention (DEP), stack and heap allocations are often automatically marked as non-executable in their corresponding page table entry (PTE) control bits.

3.1 Hooking and Syscalls

One way EDRs obtain telemetry from allocation events is through usermode (ring 3) API hooks. Once an EDR is notified of a new process creation, it can inject its hooking library and detour functions by overwriting the start of their function bodies in memory. This strategy is known as an inline hook, but other methods exist as well, such as import address table (IAT) hooking or, historically, system service dispatch table (SSDT) hooking. AV and EDR vendors used to patch kernel memory more readily, but this led to system instability and insecure implementations. With the introduction of Kernel Patch Protection (KPP / PatchGuard) in the x64 editions of Windows XP and Windows Server 2003, security vendors were forced to move their hooks to usermode instead.
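A common way to spot an inline hook is to look for a jump at the start of an exported function whose destination lies outside the owning module. The Python sketch below only decodes the 5-byte relative JMP (opcode E9 followed by a signed 32-bit displacement); real hooks also use other encodings (for example FF 25, or a mov/jmp through a register), which it ignores. The module bounds are assumed inputs.

```python
import struct


def jmp_rel32_target(func_addr, prologue):
    """Destination of a leading 5-byte JMP rel32 (E9 xx xx xx xx), else None."""
    if len(prologue) >= 5 and prologue[0] == 0xE9:
        rel32 = struct.unpack_from("<i", prologue, 1)[0]
        return func_addr + 5 + rel32  # displacement is relative to the next instruction
    return None


def looks_hooked(func_addr, prologue, module_base, module_size):
    """True if the function starts by jumping outside its own module."""
    target = jmp_rel32_target(func_addr, prologue)
    if target is None:
        return False
    return not module_base <= target < module_base + module_size
```

The same idea works in reverse for attackers, which is why some loaders compare exported function prologues against a clean copy before deciding whether to unhook.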

Invoking native system service routines (syscalls) instead of their corresponding WinAPI wrappers can bypass some high-level hooks, but an EDR can ultimately scrutinize them just as easily. If an adversary directly calls a syscall stub (the exported function in ntdll.dll), the stub can be hooked just like any other function. Additionally, the EDR can analyze the call stack during the syscall and see that it returns directly to user code (instead of through the normal wrappers), which is a high-fidelity indicator of malicious activity. Some attackers embed custom syscall stubs inside an implant to bypass inline hooks. These stubs consist of a short Assembly prologue that sets up the required registers (mov r10, rcx; mov eax, <SSN>) before executing a system call. They carry the added overhead of manually recovering the system service number (SSN) of the desired syscall in order to set the EAX register correctly. If these stubs perform "direct" syscalls by executing the syscall instruction themselves, they can actually be caught by a simple static indicator. Legitimate syscalls are issued from system libraries such as ntdll.dll, so any other user binary with the syscall instruction (0x0F 0x05) in its .text section is likely to be malicious. Some attackers instead implement "indirect" syscalls, which jump to the address of a syscall instruction inside ntdll.dll. While these stubs may bypass static analysis, they can still be caught by call stack analysis at runtime. Using Windows' internal instrumentation engine, EDRs can register a callback for the transition from kernel to user mode (by setting the KPROCESS!InstrumentationCallback field). When the callback is invoked, the EDR can analyze the context of each syscall and check the return address to determine whether it is legitimate. Unless the call stack is artificially legitimized and spoofed (e.g., by using ROP gadgets or hardware breakpoints), the syscall will still return directly to user code and appear anomalous.
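Both checks described above can be sketched in a few lines of Python. `find_syscall_bytes` is the static indicator; it is naive because, without disassembly, operands or data that happen to contain 0F 05 also match. `classify_syscall_return` models the runtime check, assuming the EDR already has the return addresses and a module list. The module allowlists are hypothetical and would flag legitimate programs that call ntdll.dll exports directly, so the result is better treated as a scoring input than a verdict.

```python
SYSCALL_OPCODE = b"\x0f\x05"

# Hypothetical allowlists for this sketch.
SYSCALL_MODULES = {"ntdll.dll", "win32u.dll"}
EXPECTED_CALLERS = {"kernelbase.dll", "kernel32.dll", "user32.dll"}


def find_syscall_bytes(text_section):
    """Offsets of every 0F 05 byte pair (naive: no disassembly)."""
    offsets = []
    index = text_section.find(SYSCALL_OPCODE)
    while index != -1:
        offsets.append(index)
        index = text_section.find(SYSCALL_OPCODE, index + 1)
    return offsets


def owning_module(addr, modules):
    """modules: iterable of (name, base, size); returns the name, or None if unbacked."""
    for name, base, size in modules:
        if base <= addr < base + size:
            return name
    return None


def classify_syscall_return(frames, modules):
    """frames[0]: where the kernel returns to user mode; frames[1]: the stub's caller."""
    if owning_module(frames[0], modules) not in SYSCALL_MODULES:
        return "direct syscall"
    caller = owning_module(frames[1], modules) if len(frames) > 1 else None
    if caller not in EXPECTED_CALLERS:
        return "returns to user code"
    return "expected"
```

A direct syscall fails the first test because the instruction after `syscall` lives in the implant; an indirect syscall passes it but fails the second, since the ntdll.dll stub hands control straight back to implant code.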

3.2 ETW

An alternative approach taken by some malware authors is to launch the sacrificial process in a suspended state, in an effort to act before the EDR's hooks are fully initialized (as in Process Hollowing [T1055.012]). However, using Event Tracing for Windows (ETW), EDR vendors can consume the Microsoft-Windows-Threat-Intelligence (TI) provider to receive telemetry without relying on hooks.

This provider generates several relevant events:

THREATINT_ALLOCVM_LOCAL
THREATINT_ALLOCVM_REMOTE
THREATINT_FREEZE_PROCESS
THREATINT_MAPVIEW_LOCAL
THREATINT_MAPVIEW_REMOTE
THREATINT_RESUME_PROCESS
THREATINT_RESUME_THREAD
THREATINT_SUSPEND_PROCESS
THREATINT_SUSPEND_THREAD
THREATINT_THAW_PROCESS
…

While ETW largely operates at the kernel level, some events are emitted from userland via ntdll!EtwEventWrite(). As a result, implants may patch this function in memory to silence some ETW providers. However, ETW-TI events are generated in the kernel, so the tampering itself can be detected through THREATINT_PROTECTVM* and THREATINT_WRITEVM* events. Attempts to unhook or disable security controls can therefore backfire and increase the likelihood of detection; in social psychology, this is known as the boomerang effect.
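One way to spot such a patch is to compare the in-memory prologue of ntdll!EtwEventWrite() with the same bytes read from ntdll.dll on disk. The Python sketch below assumes both byte strings are already available and ignores relocations, which only works when the compared bytes contain none. The two patterns it names (a bare `ret`, and `xor eax, eax; ret`) are common patches, not an exhaustive list.

```python
RET = b"\xc3"
XOR_EAX_RET = b"\x33\xc0\xc3"  # xor eax, eax; ret


def describe_prologue(in_memory, on_disk, length=16):
    """Compare a function's in-memory prologue with a clean copy from disk."""
    if in_memory[:length] == on_disk[:length]:
        return "clean"
    if in_memory.startswith(XOR_EAX_RET):
        return "patched: return 0"
    if in_memory.startswith(RET):
        return "patched: early return"
    return "modified"
```

The same mismatch shows up as an unshared copy-on-write page, which ties this check to the working-set technique discussed in Step 4.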

Now, back to memory allocation. Since EDRs have plenty of data surrounding explicit allocation, one alternative is to perform actions that allocate memory as a side effect, like sending messages to a graphical window's message queue (Shatter Attacks) or stuffing shellcode into the environment strings of a child process. Other techniques, such as enumerating existing PAGE_EXECUTE_READWRITE memory pages (Mockingjay), overwriting linker padding in a PE (Code Cave), or abusing the shared Extra Window Memory region of Explorer's tray window (Extra Window Memory Injection [T1055.011]), take advantage of existing memory without explicitly allocating any. These techniques are much more difficult to detect atomically since they depart from the normal process injection paradigm. From a defensive perspective, allocation events alone are far too noisy to be a reliable indicator of process injection. However, they can help paint a full picture, especially when temporal correlation is used to observe and link the other injection steps.
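Temporal correlation can be sketched as follows: group remote events by (source, target) pair and alert when an allocation is followed by a write and then an execution within a time window. The `Event` record and the five-second window are hypothetical simplifications of real telemetry, and the sketch ignores local (same-process) injection.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since an arbitrary epoch
    source_pid: int
    target_pid: int
    primitive: str    # "alloc", "write", or "exec"


def correlated_injections(events, window=5.0):
    """Return (source_pid, target_pid) pairs where alloc, write, and exec
    occur in that order within `window` seconds of the allocation."""
    by_pair = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.source_pid != ev.target_pid:  # remote operations only
            by_pair.setdefault((ev.source_pid, ev.target_pid), []).append(ev)

    alerts = set()
    for pair, seq in by_pair.items():
        for i, first in enumerate(seq):
            if first.primitive != "alloc":
                continue
            expected = "write"
            for later in seq[i + 1:]:
                if later.timestamp - first.timestamp > window:
                    break  # events are sorted, so nothing later can match
                if later.primitive == expected:
                    if expected == "exec":
                        alerts.add(pair)
                        break
                    expected = "exec"
    return alerts
```

Each event on its own is noisy; only the ordered chain against the same target raises an alert.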

Relevant Security Controls:

API Hooking
- Inline Hooks
- IAT Hooks
Call Stack Analysis
- Process Instrumentation Callback
Event Tracing for Windows (ETW)
- Microsoft Threat Intelligence (ETW-TI)
  - THREATINT_ALLOCVM_LOCAL
  - THREATINT_ALLOCVM_REMOTE
  - THREATINT_MAPVIEW_LOCAL
  - THREATINT_MAPVIEW_REMOTE

Step 4: Write Virtual Memory

After suitable memory has been identified and/or allocated, the payload can be written. This is usually done with WriteProcessMemory() or its NT API equivalent, NtWriteVirtualMemory(). While most loaders simply use PAGE_EXECUTE_READWRITE memory, PAGE_READWRITE pages can also be used if their protection is changed to an executable one after the data is written (using either VirtualProtect() or NtProtectVirtualMemory()).
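The write-then-protect pattern leaves a recognizable trail of protection changes. The Python sketch below labels a single change and counts writable/executable flips over one region's history; the constant values mirror winnt.h, while the labels and the idea of feeding it a per-region history are assumptions for illustration.

```python
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40

# Protection bits as defined in winnt.h.
WRITABLE = 0x04 | 0x08 | 0x40 | 0x80    # RW, WRITECOPY, RWX, EXECUTE_WRITECOPY
EXECUTABLE = 0x10 | 0x20 | 0x40 | 0x80  # X, RX, RWX, EXECUTE_WRITECOPY


def classify_protect_change(old, new):
    """Label a single VirtualProtect-style transition."""
    if new & PAGE_EXECUTE_READWRITE:
        return "rwx"
    if (old & WRITABLE) and not (old & EXECUTABLE) and (new & EXECUTABLE):
        return "write-then-execute"
    return "other"


def fluctuation_count(history):
    """Count writable <-> executable flips in one region's protection history."""
    def kind(protect):
        if protect & EXECUTABLE:
            return "x"
        return "w" if protect & WRITABLE else "r"

    kinds = [kind(p) for p in history]
    return sum(1 for a, b in zip(kinds, kinds[1:]) if {a, b} == {"w", "x"})
```

A single RW to RX change is common in legitimate loaders; repeated flips on the same region (as sleep-masking implants do) are far rarer.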

Most of the telemetry sources from the previous section still apply. Using the ETW-TI provider, EDRs have visibility into memory writes as well as protection changes via THREATINT_WRITEVM* and THREATINT_PROTECTVM* events. But before discussing specific detection mechanisms, it’s important to understand the different types of memory.

On Windows systems, memory can be marked as any of the following:

MEM_FREE
- Virtual address space that is neither reserved nor committed
MEM_RESERVE
- Virtual memory that has been reserved for future use
MEM_COMMIT
- Virtual memory that has been committed and assigned physical storage
MEM_PRIVATE
- Private memory that is not shared between processes
MEM_MAPPED
- Shared memory that is mapped into the view of a section object
MEM_IMAGE
- Shared memory that is mapped into the view of an image section object

For now, we will ignore the first three, since we're more interested in the type of memory than its current state. Most dynamically allocated memory, such as that returned by the allocation functions mentioned earlier, is private memory. It is almost always protected as PAGE_READWRITE, which matches its use as stack and heap storage. Private memory is rarely marked as executable, with the exception of JIT compilers in web browsers and .NET Framework Common Language Runtime (CLR) allocations. Mapped memory, on the other hand, often originates from a file on disk via CreateFileMappingA() or NtCreateSection(). After using these functions to create a shared mapping (section) object, a process can map a view of the section using MapViewOfFile() or NtMapViewOfSection() to interact with its contents. If the mapping object is backed by an executable file and was created with the SEC_IMAGE flag, subsequent views are marked as MEM_IMAGE regions instead of MEM_MAPPED. Since MEM_IMAGE regions originate from an executable file on disk, page protections for these views are determined by the PE itself from the permissions listed in its section table. Naturally, MEM_IMAGE regions receive the least scrutiny, since their contents were likely already scanned by AV on disk.
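Given region information like that returned by VirtualQueryEx(), the "executable private memory" anomaly reduces to a filter. In the Python sketch below, `Region` is a hypothetical stand-in for a subset of MEMORY_BASIC_INFORMATION, the constant values mirror winnt.h, and `allow_jit` is a hypothetical switch for processes known to host a JIT.

```python
from dataclasses import dataclass

# Values as defined in winnt.h.
MEM_COMMIT = 0x1000
MEM_PRIVATE = 0x20000
MEM_MAPPED = 0x40000
MEM_IMAGE = 0x1000000
PAGE_READWRITE = 0x04
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
EXECUTABLE = 0x10 | 0x20 | 0x40 | 0x80  # any PAGE_EXECUTE* protection


@dataclass
class Region:
    """Subset of the fields VirtualQueryEx() reports in MEMORY_BASIC_INFORMATION."""
    base: int
    size: int
    state: int
    mem_type: int
    protect: int


def executable_private_regions(regions, allow_jit=False):
    """Committed private regions with an executable protection.

    allow_jit is a hypothetical switch for browsers or .NET processes,
    where executable private memory is expected.
    """
    if allow_jit:
        return []
    return [
        r for r in regions
        if r.state == MEM_COMMIT and r.mem_type == MEM_PRIVATE and r.protect & EXECUTABLE
    ]
```

Executable MEM_IMAGE regions pass untouched, which is exactly the blind spot the next paragraphs discuss.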

Since image section views are usually where executable code resides, what's stopping an attacker from overwriting them with shellcode? The answer lies in a memory management optimization called "copy-on-write". When a DLL is mapped into memory, subsequent loads of the same file are backed by the same shared physical pages, and a transparent private copy is made only if a process tries to modify a shared page. The original page remains unmodified, while Windows gives the writing process its own copy. This reduces overhead and avoids needless duplication of unmodified pages across processes. Because image pages are copy-on-write (an executable image page made writable is reported as PAGE_EXECUTE_WRITECOPY), once an attacker modifies the page, the Shared bit in that page's extended working set information is cleared[2]. This detection methodology (unshared MEM_MAPPED or MEM_IMAGE pages) can also be used to detect hooks and other memory patching, like disabling ETW.
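QueryWorkingSetEx() returns a flags value per page, and the check described above amounts to "resident but no longer shared". The Python sketch below decodes those flags; the bit layout in its docstring is an assumption based on PSAPI_WORKING_SET_EX_BLOCK and should be verified against psapi.h before relying on it.

```python
def decode_working_set_ex(flags):
    """Decode the low bits of a PSAPI_WORKING_SET_EX_BLOCK value.

    Assumed layout (verify against psapi.h): Valid = bit 0,
    ShareCount = bits 1-3, Win32Protection = bits 4-14, Shared = bit 15.
    """
    return {
        "valid": bool(flags & 0x1),
        "share_count": (flags >> 1) & 0x7,
        "win32_protection": (flags >> 4) & 0x7FF,
        "shared": bool((flags >> 15) & 0x1),
    }


def privately_modified(flags):
    """For a page inside a MEM_IMAGE or MEM_MAPPED region: resident but no
    longer shareable, i.e. a copy-on-write copy was probably made."""
    info = decode_working_set_ex(flags)
    return info["valid"] and not info["shared"]
```

The caller is assumed to apply this only to image or mapped regions; private pages are never shared, so they would all trip the check.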

One injection technique that works around these limitations is Process Doppelgänging [T1055.013] / Phantom DLL Hollowing. This technique abuses Transactional NTFS (TxF) by opening an isolated, transacted file handle to alter the .text section of a DLL. The changes are never flushed back to disk and are made before the view is mapped into memory, which makes detection much more difficult. However, due to the isolation provided by the TxF transaction, this technique has the unique side effect of causing GetMappedFileNameW() to fail when querying the name of the file associated with the image region. Additionally, an EDR can use the MmDoesFileHaveUserWritableReferences() function to determine whether there are any writable references to the file object backing a section (broken section coherency).

Other detection logic largely focuses on private memory. Using the data sources described earlier, EDRs have visibility into the parameters passed to memory management functions. This is sufficient to cover certain anomalous behaviors, like private RWX allocations or fluctuating memory protections (RW ⇄ RX). Contextual behaviors like these can raise a process' risk score and trigger further investigation. One such investigative tool is memory scanning. Since full memory scans are far too resource intensive, EDRs often rely on event-triggered scans. For example, an EDR could choose to scan the buffer being written if the destination pages are executable. These scans can detect PE Injection [T1055.002] by searching private memory for a PE header (the MZ signature), which indicates that an executable file was loaded in an abnormal way (not via LoadLibrary()). Another potential indicator is buffer size. As noted in the previous section, VirtualAlloc() rounds allocation sizes up to a multiple of the page size. Since most programs don't need to write large chunks of memory into a remote process, most legitimate remote memory operations fit within a single page. Shellcode, on the other hand, can be much larger, especially when using an off-the-shelf framework like Metasploit or Cobalt Strike.
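An event-triggered scan of a remote write might combine both indicators. The Python sketch below checks e_lfanew (offset 0x3C of the DOS header) in addition to the "MZ" magic, since a two-byte pattern alone matches by chance too often. The single-page threshold follows the text; the function names and finding labels are hypothetical.

```python
import struct

PAGE_SIZE = 0x1000


def has_pe_header(buf):
    """True if buf begins with a DOS header ("MZ") whose e_lfanew field
    (offset 0x3C) points at a "PE\\0\\0" signature inside the buffer."""
    if len(buf) < 0x40 or buf[:2] != b"MZ":
        return False
    e_lfanew = struct.unpack_from("<I", buf, 0x3C)[0]
    return buf[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def inspect_remote_write(buf, target_executable):
    """Event-triggered checks on a buffer written into another process."""
    findings = []
    if len(buf) > PAGE_SIZE:
        findings.append("multi-page write")
    if target_executable and has_pe_header(buf):
        findings.append("PE image written to executable memory")
    return findings
```

Attackers often strip or overwrite the DOS header after loading, so a missing header is weak evidence of innocence; the size check still applies in that case.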

Relevant Security Controls:

API Hooking
- Inline Hooks
- IAT Hooks
Call Stack Analysis
- Process Instrumentation Callback
Event Tracing for Windows (ETW)
- Microsoft Threat Intelligence (ETW-TI)
  - THREATINT_PROTECTVM_LOCAL
  - THREATINT_PROTECTVM_REMOTE
  - THREATINT_WRITEVM_LOCAL
  - THREATINT_WRITEVM_REMOTE

Step 5: Execute Payload

After staging the payload in memory, the final step is to trigger execution. There's a wide variety of execution primitives, with the most common being CreateRemoteThread() / RtlCreateUserThread() / NtCreateThreadEx(). These functions simply create and insert a new thread into the target process, with the specified starting address. In a "classic" DLL Injection [T1055.001], LoadLibraryA() is used as the starting address. Using the PsSetCreateThreadNotifyRoutine kernel callback, these techniques can be detected by determining if the starting address points to private memory or a suspicious trampoline function.
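A thread-creation callback typically runs in the context of the creating thread, so comparing the current process with the new thread's owner identifies remote threads. The Python sketch below then classifies the start address. Its inputs (image ranges and a map of loader exports such as LoadLibraryA) are hypothetical stand-ins for the EDR's own bookkeeping.

```python
def remote_thread_verdict(creator_pid, owner_pid, start_addr, image_ranges, loader_exports):
    """Classify a new thread's start address.

    image_ranges: iterable of (base, size) for MEM_IMAGE regions in the owner.
    loader_exports: {address: name} for exports such as LoadLibraryA.
    """
    if creator_pid == owner_pid:
        return "local thread"
    if start_addr in loader_exports:
        return "remote thread at " + loader_exports[start_addr]
    if any(base <= start_addr < base + size for base, size in image_ranges):
        return "remote thread in image-backed memory"
    return "remote thread in unbacked memory"
```

The last two outcomes differ mainly in severity: an image-backed start address may still be a trampoline, so it would typically be scored rather than cleared.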

Another method is to hijack the state of an existing thread using the SetThreadContext() API (Thread Execution Hijacking [T1055.003]). This function modifies the register state (context) of a suspended thread and can redirect execution flow by directly setting the RIP register. However, it is used primarily by debuggers, and its abuse can be caught using API hooks or THREATINT_SETTHREADCONTEXT_REMOTE ETW-TI events.

Existing threads can also be used to execute an asynchronous procedure call (APC) by using QueueUserAPC() or NtQueueApcThread() to insert a user-mode APC object into the thread’s APC queue (Asynchronous Procedure Call [T1055.004]). This will cause the thread to execute the specified APC the next time it enters an alertable state. ETW-TI also provides visibility into these events via THREATINT_QUEUEUSERAPC_REMOTE logs.

Message hook functions, like SetWindowsHookEx() or NtUserSetWindowsHookEx(), are yet another option. These functions install a custom hook procedure into a hook chain, which triggers execution whenever the specified WH_* event occurs[3]. Due to their widespread abuse by keyloggers, defensive tooling can often detect suspicious message hooks by enumerating gSharedInfo members, using API hooks, or keeping a stateful list to spot anomalies[4].

Lastly, a large class of injection methods involves overwriting a pointer to code (often a callback). These methods take advantage of the fact that many pointers to code are stored in writable memory, and overwriting them redirects execution to an arbitrary location when the callback is triggered. This includes attacks that abuse window subclassing (PROPagate), window message handlers (ConsoleWindowClass), PE entry points (AddressOfEntryPoint), thread local storage (TLS) callbacks (Thread Local Storage [T1055.005]), control signal handler callbacks (Ctrl-Inject), the KernelCallbackTable PEB member, and many more. While there is a nearly innumerable number of execution primitives, most can be detected using a combination of API hooking, validation of the target addresses of remote memory writes, and monitoring of new threads that originate from unbacked memory.
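Validating the target of a remote write can be sketched as: if the write lands on a known pointer slot and the new pointer leads outside every loaded image, flag it. In the Python sketch below, `sensitive_slots` (for example, the address of the PEB's KernelCallbackTable field) and `image_ranges` are hypothetical bookkeeping the EDR would have to maintain, and only exact pointer-sized writes are handled.

```python
POINTER_SIZE = 8  # x64


def pointer_overwrite_alert(write_addr, data, sensitive_slots, image_ranges):
    """Flag a remote write that replaces a known pointer slot with an
    address outside every loaded image.

    sensitive_slots: {address: label} of pointer fields worth watching.
    image_ranges: iterable of (base, size) for MEM_IMAGE regions.
    """
    if len(data) != POINTER_SIZE or write_addr not in sensitive_slots:
        return None
    new_target = int.from_bytes(data, "little")
    if any(base <= new_target < base + size for base, size in image_ranges):
        return None
    return f"{sensitive_slots[write_addr]} -> {new_target:#x}"
```

Real attacks may write the slot in several chunks or via an API such as SetPropW() rather than a raw memory write, so this check complements, rather than replaces, the API hooking and thread monitoring mentioned above.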

Relevant Security Controls:

API Hooking
- Inline Hooks
- IAT Hooks
Call Stack Analysis
- Process Instrumentation Callback
Event Tracing for Windows (ETW)
- Microsoft Threat Intelligence (ETW-TI)
  - THREATINT_PROTECTVM_LOCAL
  - THREATINT_PROTECTVM_REMOTE
  - THREATINT_QUEUEUSERAPC_REMOTE
  - THREATINT_READVM_REMOTE
  - THREATINT_SETTHREADCONTEXT_REMOTE
  - THREATINT_WRITEVM_LOCAL
  - THREATINT_WRITEVM_REMOTE
Exploit Protection [M1050]
- Control Flow Integrity (CFI)
  - Control Flow Guard (CFG)
- Data Execution Prevention (DEP)
Windows Kernel Callback Functions
- PsSetCreateThreadNotifyRoutine
- PsSetCreateThreadNotifyRoutineEx

[1] Technically, this callback isn't invoked until the first thread is created and inserted into the process; see NtCreateProcessEx().

[2] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualquery#remarks

[3] https://learn.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-setwindowshookexa#parameters

[4] https://github.com/rajiv2790/FalconEye

Black Lantern Security (BLSOPS)
