More Like This
-
Capability Overview
Cyber Resilience
-
Product / Service
Penetration Testing Services
This is Part I of II of the DUALITY blog post series. The first blog post introduces the concept of DUALITY, which is a methodology and pipeline for backdooring multiple DLLs on the fly so they are able to re-infect each other if infections are lost due to program updates. DUALITY’s custom logic can be modified by the operator to do other fun things as well. This blog post frames DUALITY in the context of red team persistence and uses relatively primitive techniques for process injection that will get detected by mid- to advanced-level AV / EDR.
Part II of the series will demonstrate using DUALITY for initial access, as well as bringing the tradecraft up-to-par to utilize indirect syscalls by packing a clean a NTDLL.dll into backdoored DLLs as a section, aided by the staging process (grabbing ntdll.dll from target hosts). We will also modify some items in the compilation pipeline such that DUALITY is more difficult to signature on disk. Note that the codebase that will be publicly released to accompany this blog post is only for Part I of the series. A competent malware developer can extend the code from Part I to perform Part II’s capabilities. We did this because researching and implementing novel tradecraft for modern red teaming is time consuming, and we also did not want to publish a fully weaponized, easy-to-use version of DUALITY to the public.
Five weeks ago, your initial foothold was provided by a slick colleague (alias Dark Marina) who was able to physically penetrate one of the target organizations’ HQ offices in a big downtown building. She entered the building with a carefully crafted outfit after performing reconnaissance against the target organization’s staff a few days beforehand.
She used a badge cloner hidden in her tote bag on the ground floor while standing two feet away from an office worker with a visibly exposed badge. She pleasantly talked her way out of the conversation and went to use the restroom. Five minutes later, armed with a fake badge, she walked through the ground floor waist-high turnstile gates. She then had access to the elevator and the company’s rented out office space on the 22nd floor.
She walked in confidently, and amidst the post-pandemic work-from-home ambiguity, she found an empty workstation. She worked her magic with a network port and some custom hardware.
She lobbed you a quiet form of initial access, and even more quietly exited the vicinity. She was never there.
Now it’s your turn.
Your target is the CEO’s work laptop, and your goal is to collect information over a long period of time as the CEO takes this laptop with him on trips around the world. Calendar events, emails, access to all files on the laptop, credentials (passwords, tokens), the ability to ride on session tokens / other credentialing mechanisms from the vantage point of the compromised machine.
The organization sports an EDR solution. They implement custom detections on services, scheduled tasks, auto-startup file locations, and other low-hanging areas for persistence. Sneakily loading signed vulnerable drivers might be an option, but let’s assume you’re not interested in burning a novel driver 0-day for this gig.
After some intense Active Directory Ashi Garami, while avoiding rustling the SOC’s feathers, you find yourself with an opportunity for a foothold on the CEO’s machine and let your client know. They tell you to proceed with long-term information gathering. So far, you’ve been operating from a few Unix-based boxes, including your own custom hardware that Dark Marina dropped off earlier.
Oddly, it looks like the CEO also has Python installed on their machine.
You explore your persistence options. LOLBAS aren’t really going to work here because this SOC knows what they’re doing, so spawning MSBuild.exe can be an operational catastrophe. You’d also rather not risk running processes like rundll32, even if short-lived – the environment itself doesn’t present too many options for indirect code execution. You still would like to execute from a native code perspective, preferably from the context of some application, as to avoid loading or finding the .NET CLR to execute C# code.
You realize that DLL proxying has been a thing as well as the backdoor factory, although neither have the capability to outlive program updates, as they are not operating at that scope. It may entirely be feasible to craft a beacon proxy DLL, though still, you need to think about long-term persistence, especially if this program happens to update frequently.
What if you had the ability to just backdoor two or more arbitrary program DLLs that provided you with 2-in-1 code execution and long-term persistence? What if this operational ability also allowed you high customizability over tradecraft like process injection techniques and the manner you interact with WinAPI in general from a native perspective?
What if all this technique did was swap out two (or more) dependency DLLs that relied on each other to keep each other (and all backdoored DLLs) infected through program updates for this long-term persistence requirement? And what if you could, with one command, remove all backdoored DLLs and restore the system to its pre-modified state?
Allow us to introduce DUALITY.
The root issue is that DLL signatures are not checked by most programs when they load dependencies. They are haphazardly loaded and unloaded multiple times during program execution. Some EDR solutions will use unsigned DLLs as a heuristic for increased suspicion, but rarely are they a primary heuristic in determining malicious activity. This is because companies very often use unsigned DLLs of their own tooling internally.
The following screenshot shows a signed version of a “ffmpeg.dll” that Microsoft Teams uses.
The next screenshot shows one that has been tampered with.
If you start Microsoft Teams with the tampered version, it will gladly load any file called “ffmpeg.dll” from that directory.
Before we dive in, we should give DLL proxying its due. You can theoretically implement similar DUALITY behavior using only DLL proxying, but you lose some of the control you would have had you went the shellcode route. Cobalt Strike’s website from the link above (here again for reference) contains several great diagrams that explain the process.
Ultimately our goal with DUALITY is to have high control over our payload (to the level of machine code obfuscation) and implant logic, which is why we opt for the pure shellcode / assembly route.
The Backdoor Factory (BDF) provides several capabilities for backdooring PEs in different architectures, as well as Mach-O files; however, the original intention of that tool was to repurpose OnionDuke, which was Tor-based malware from that time. There was no provided pipeline to use in terms of a C2 capability. That is not to say that it is not possible to repurpose BDF for use in a pipeline, but in this case, it made more sense to start fresh, as there are some capabilities in BDF we don’t need, and there are other capabilities we do need on top of regular backdooring.
With regards to BDF, DUALITY’s goal is to be more immediately focused on red team engagements, while implementing the special concept of DUALITY between DLLs.
Duality is a concept where backdoored DLLs keep an implant alive, while keeping each other infected through program updates / changes. Its particular implementation here is not necessarily the only way to go about this, although the implementation here will sufficiently demonstrate the idea.
In the proof-of-concept code / package, there are multiple items in-play. We present a C# program capable of backdooring DLLs with custom shellcode that it compiles, a template to use for creating custom shellcode containing DUALITY’s logic, as well as a Cobalt Strike aggressor / assistant VM pipeline that can immediately operationalize this capability on a red team engagement. On an engagement, all we do is swap around files for long-term persistence, even if the targeted programs are already running.
To demonstrate this proof-of-concept, we will backdoor DLLs with custom shellcode that performs the keep-others-alive check and executes custom process injection shellcode. More sophisticated implant logic can be implemented if desired by the astute capabilities' developer.
There are three variations of this capability:
The following flowchart may help illustrate a DLL’s logic when it is backdoored with custom DUALITY shellcode.
Let’s step through a sample execution workflow.
First, a backdoored DLL is loaded by the primary EXE. For example, Microsoft Teams or Slack loads “ffmpeg.dll” from a low-privilege location.
Since DLLs can be loaded and unloaded multiple times during program execution, we need to make sure that our check runs once per each DUALed program’s lifecycle. Otherwise, the program will hang and cause usability issues to the victim user, as there is some overhead associated with each implant logic execution, and a race condition may be triggered if the implant logic tries to execute against another instance of itself. To avoid this, we create a mutex object for our “keep-alive” check. Next time the DLL is loaded, we will check for this mutex object. If it’s there, we already checked if the other DUALs are alive during this program’s lifecycle.
If the “keep-alive” is not present, it means this DLL was loaded for the first time during the program’s lifecycle, so we create the mutex and proceed to check on the other DLLs. If they are not infected, we check to make sure that the same file is still present in the affected directory. This is because if a program has gone through a major architectural change and the DLL is no longer used, we won’t re-infect a non-existent DLL. In this case, the same file is not present, we operate in SINGULARITY mode. This means that only one DLL is now infected and there are no DUALS. If the same file is present and un-infected, however, we re-infect it.
If a checked file is infected, it indicates a state where we can back up the currently running / malicious DLL as well as all other infected DLL to some location. We do this for every DUAL, including the redundant process of backing ourselves up. For this proof-of-concept project, we will back up the backdoored DLLs to the user’s TEMP folder.
Following this, we check if the “implant mutex” is present. If it’s not, it means that between all the backdoored DUALs, none have yet executed the implant. In that case, we create the mutex and bring the implant to life. If the mutex does exist, then ideally the implant should be up unless the operator intentionally terminated it.
Note that in this case, there is a check mutex created for each backdoored DLL / program lifecycle, and there is one implant mutex created for ALL DUALS involved in keeping the implant alive.
To pull this off, we have several moving parts.
This is easy enough to generate. In Cobalt Strike, we’ll click on the “Payloads” menu > Windows Stageless Payload, and then generate the “Raw” version of this payload. This is demonstrated in the next two screenshots.
Let’s look at the custom shellcode portion. This file will be compiled to become position-independent code (PIC) that will be added as a section to a targeted DLL file. Note that the same custom shellcode will be injected into each targeted DLL. We will borrow here from hasherezade’s project, masm_shc and the relevant methodology. The idea here is that we can write carefully written C code that does not rely on the C runtime or on any other external libraries.
Then we compile this code in a special manner while avoiding certain compile-time modifications like buffer checks.
We then modify the assembly using “masm_shc” to inline strings if they need to be, and we convert all jumps to long jumps.
Finally, we link the custom shellcode into an executable with a custom-defined entry point that will align our 64-bit stack to be ready for WinAPI jiu-jitsu.
We will then extract the assembly from the .text segment of the resulting executable which will be our PIC shellcode.
In order to perform various operations within the SCC file, we will need to dynamically resolve WinAPI functions. The astute hacker here will wonder about the efficacy of the dynamic module and function resolution process, as this is exactly where some portion of EDR heuristics comes into play, due to function hooking. Rest assured however that the process is entirely customizable, and we will recommend methods to perform bypasses towards the end of the post. For now, we will rely on the functions provided in “masm_shc”, particularly the functions inside peb_lookup.h.
To dynamically resolve API functions, we perform the following calls. “get_func_by_name” is defined in peb_lookup.h.
An example of the mutex check we perform is shown below. We open a mutex and then use whether it is found or not as part of our logic. In this case, our “keep-alive” mutex has the variable name “cMutex”.
Another example of the mutex usage is when each backdoored DLL checks a mutex to see if any of the other programs has brought the implant up yet. Note that the mutex names can be changed as well. In this case we use “Local\\tm22s”, intentionally hardcoded. Keep this in mind for Part II of the blog post series.
The DUALITY logic that runs prior to process injection is depicted below.
Assuming we are not operating in SINGULARITY mode (we have at least one other DUAL), and the check mutex is not present, we go through each DUAL file and check to see if it’s infected. In this case, we are looking for a section called “.duality”. This section name can obviously be changed and randomized with each iteration, but the proof-of-concept does not implement this (nor sophisticated process injection) currently as Anti-Skid Protection (ASP).
Following this section of code, if the file is present but uninfected, we restore the infected file from the user’s TEMP directory, which should be backed up at this point from the initial execution.
Before we get to process injection, we need to grab the shellcode from a section in the DLL that we will be adding (and explaining in a later part of the post).
Now, the “FindSectionInAllModulesCurrentProc” function is implanted in the “helper.h” file that the SCC file includes. This function can be seen below.
Remember, we can’t use the C runtime or any other libraries. As an aside, ChatGPT wrote most of that function.
The process injection technique can be seen below.
We create a process, in this case Microsoft Edge, and write our shellcode as we decrypt it over the entry point. Then we resume the process.
There are several placeholder variables in the SCC file that get changed on every new compilation of DUALITY. The following screenshot depicts them.
Keep these variables in mind as we will discuss them when we go through the DUALITY C# portion of this pipeline.
In order to make use of this SCC file, the C# program must perform the following sequence of events:
If the second point seems a little confusing, let us try to explain now, although it will all make sense at the end.
When you ask DUALITY.cs to backdoor a DLL, it needs the actual path to the filename that it will backdoor on the victim’s machine, and it also needs the path that this DLL will exist in on the machine where DUALITY will run. Rather than specifying two variables to hold this information, we can provide this information in the DLL name itself.
For instance, we want to backdoor a DLL that is present on “C:\Users\hacker\Desktop\ffmpeg.dll”. This DLL was extracted from the victim’s machine from the path “C:\Users\victim\AppData\Local\Microsoft\Teams\current\ffmpeg.dll”.
The actual file path to the DLL on the hacker’s desktop would then be:
C:\Users\hacker\Desktop\06485796515437264859_____c-__--_-users-_-victim-_-AppData-_-Local-_-Microsoft-_-Teams-_-current-_-ffmpeg.dll
The 20-digit prefix is a unique identifier to each DUALITY.cs compilation event (which involves multiple DLLs). Then there are five underscores to split for the path. Then any special characters are replaced with a sequence of hyphens and underscores (“:” is replaced with “-__-“ and “\” is replaced with “-_-“) to make less room for confusion for any file upload / download functionality in the pipeline that will handle this file.
When testing locally, it’s important to note this format, as it’s the same one used when DUALITY is operationalized. For now, if the naming format is confusing, it too will become clearer when we get to the portion of operationalizing DUALITY in this post.
This portion of DUALITY is responsible for patching each DLL. This includes modifying the entry point (although that can also be customized to patch a specific function inside the DLL), adding sections, and tying everything together so execution is handed back to the DLL successfully.
Currently, DUALITY.cs handles a subset of all DLLs based on the entry point opcode sequence, which is a MOV instruction with two starting opcodes of (0x48 and 0x89). This handles a good percentage of the DLLs we are interested in targeting. If DUALITY can’t handle backdooring a certain DLL given current capabilities, it will simply return the DLL as is. There is a program called “LowDLL.exe” that will be released with this project to help you identify DLLs to backdoor. We will discuss this project later on in the post. The link to this project’s code can be found towards the end of this post.
Let’s go through the major steps in the DUALITY.cs program. First, we acquire the raw shellcode and correctly formatted DLL locations. Note that the check for correct formatting is weak here, as typically the Cobalt Strike aggressor script to go along with this project will format the filename. For testing / development purposes, copy the original DLL to a DLL that has the original’s path encoded in the name.
The variable names are long and hopefully descriptive. The list of “originalDLLsVictimMachineFilePaths” is the list of DLL path strings of where the DLLs originally resided on the victim’s machine. The “preBackupOriginalDLLsLocalFilePaths” is the list of DLL path strings of where the DLLs exist for DUALITY.cs to handle them. For our pipeline, this will be on a desktop on the operator’s “assistant VM”. We’ll get to what that is in the operationalization section.
Here's the next sequence of events.
Step number four is where all the magic happens – let’s take a look in detail.
In this project, we heavily rely on PeNet, which is a parser for Windows Portable Executable headers. It’s written completely in C# and does not rely on any native Windows APIs. This is a dependency / C# reference for DUALITY.
The BackdoorDLL method assumes you have Visual Studio 2022 installed with the necessary compilation and linking programs available. Moreover, it uses “masm_shc.exe” from hasherezade as a dependency. This can be seen in the following screenshot.
At this point, we have our encrypted shellcode and set up a few variables for the compilation process. Before we compile, we need to prepare our SCC file. Remember how earlier we mentioned we need to replace some variables in the SCC file for every new generation of DUALITY? Here is where it will happen. Note how several items are not changed and are intentionally left hardcoded for this version of DUALITY.
Now our SCC file is ready, and we are ready for compilation. The compilation process is depicted in the next image.
01 We begin the compilation process by specifying our commands. First, we set our environment variables, then we specify the “cl.exe” command string. Note that the command string specifies several key flags:
We then configure our “masm_shc.exe” command. We simply specify the input file and an output file path for the assembly listings, and we let it do its magic, particularly where it includes the “AlignRSP” function as well.
Finally, we specify our linking command. In this command, we instruct the linker that our entry point should be the “AlignRSP” function.
02 We create a list of our commands to be run sequentially, then we run them.
03 After “masm_shc” does its magic, we replace any SHORT jumps to longer jumps simply by omitting the keyword. Technically it should become a “jmp NEAR”, but it works without specifying “NEAR” as well, as the jumps are still relative rather than absolute. Then we remove references to “OFFSET FLAT:”. "OFFSET FLAT:" is typically seen in assembly listings for programs that are being compiled for a flat memory model. In a flat memory model, there is a single address space that is accessible to the program, and each memory location is accessed using a linear offset from the beginning of the address space.
The "OFFSET FLAT:" directive is typically used to specify the base address of the program's code or data segments in the flat memory model. This directive is usually located at the beginning of the assembly listing, and it tells the assembler to use a fixed offset value for all memory accesses. This is not going to be necessary for our position-independent shellcode.
04 We then perform the linking to finally create the executable where the “.text” section contains our PIC shellcode.
At this point we have to extract the PIC from the .text section.
This is relatively straightforward to understand from the code listing; however, note that we trim a bunch of null bytes at the end that are likely an artifact of the compilation process adhering to some size modulus. It’s a good space-saving measure to trim most of the zeroes and leave just a few in there.
Now that we have the PIC shellcode ready, we can begin the backdooring process by performing surgery on the target DLL. This is going to be a slightly more convoluted process, so we will go through smaller sections of code at a time.
To start off, we must first get the static offset in the DLL file to the “.text” section. As an aside, “static offset” means the offset to a location in the DLL when the file is dormant on disk. The “dynamic offset” refers to the offset to a certain location in the file when the file is live in-memory. The following screenshot depicts how we get the static address, as well as the actual usage of that function.
Following this, we’re going to make sure that the entry point contains a code sequence that we have the capability to backdoor.
Theoretically, you can backdoor most DLLs, but you’ll need to accommodate for varying types of entry point instructions (and argument sizes to the primary opcode sequence). For this red team focused use-case, we can pull this off without needing to write (or import) a disassembler. The vast majority of DLLs appear to start with a “0x48 0x89” opcode sequence which is a “mov qword ptr [rsp+x], <register>” type instruction. This results in a consistent length of 5 bytes that we can replace with a jump instruction of exactly the same length. If the first instruction was instead 3 bytes long, we would need to do a few other things to create more space or change entry point locations. For this proof-of-concept, we’re going to target what the vast majority of DLLs use, which is a “0x48 0x89” opcode at the entry point.
When we get to the operationalization section of this post, we will discuss a tool accompanying this project we wrote called “LowDLL”. Part of this tool’s responsibility is to figure out if these two bytes are present at the entry point as well before suggesting that they are susceptible to DUALITY. If we don’t satisfy the requirement of these two opcodes inside DUALITY.cs, we spit the file back out to the user. This is an important step, because when we operationalize DUALITY with multiple DLLs, we don’t want to break the compilation process in case the operator selects an incompatible or already backdoored DLL.
Next, we need to find places where we can stick our “pre-shellcode stub”, also referred to as the “prep stub” in the code. This is going to be the stub that stores our register states, calls our “.duality” section (we’ll get to that in a bit), restores register states, performs the instruction that we will overwrite at the entry point, and then jumps back to near the entry point to hand execution back to the DLL.
The above screenshot contains the “FindABunchOfSpace” function definition as well as its usage. We need to find space that is at least as big as our pre-shellcode stub. There are two categories of “space” – null bytes (“0x00”) and “0xcc” opcodes, which are compiler padding between sections of code and would trigger a breakpoint if you run into them from a debugging perspective (you’re not supposed to run into them in normal execution unless you build your code with this type of breakpoint built in).
A very common place to find space for this pre-shellcode stub, which is rather small (75 bytes), is what we refer to as a “code beach” as opposed to a code cave. Code caves typically imply that you’re surrounded by other code of similar permissions; code beaches are parts of memory at the end of a section in a PE in memory that are usually filled with “0xCC” opcodes or “0x00” by the compiler to stick to some size modulus during compilation. For example, if a section size on disk is 0x981, the PE loader in memory is not going to stick the next section at 0x0982. It’s going to be at an offset of something that ends with 0x00 or 0x0000, depending on what it’s laying out in memory. So the next section might be at 0x2000, just as an example to demonstrate the space between the two sections.
During runtime, the OS will also pad these with “0x00” for memory page size modulus purposes as well, although the runtime paddings are not necessarily usable. It is important to stick to the size of the “code beach” from the perspective of the static file. After the code beach, you may find either unallocated memory or the start of another section. The distinction here is if you start another code section like “.data” after a beach, you can’t guarantee the same memory permissions, whereas you can sometimes slither between code caves without worrying much about changing permissions (like going from an executable section to a read/write only section)
The following two screenshots demonstrate a code beach followed by a code cave.
Can we just use a bunch of caves and slither between them? Yes of course, but we don’t need to do that for now. As an aside, the backdoor factory already does code cave jumping.
Now that we have some room to stick our pre-shellcode stub, we can do a little bit of math to calculate the jump distance from the entry point to this code beach. We do a bit of byte jiu jitsu and then start patching the DLL.
Our entry point is now patched. We now add two sections to the DLL. The “.duality” section and the “.ensc” section, both intentionally hardcoded with these section names. The “.duality” section is what controls our shellcode, in terms of doing things with it, such as process injection. We already discussed the logic of this section when we talked about the “SCC” file above. The “.ensc” section is the encrypted implant shellcode section. Note that the section names are both intentionally hardcoded. There are several parts in this workflow with intentionally hardcoded opportunities for basic detection. If interested in using this tool, the astute operations architect will be wise to make modifications to the proof-of-concept in a few places, beyond just the function resolution and process injection techniques, which will be expanded upon in Part II of this blog series. These are two of them. The following screenshot depicts adding the sections.
The “AddAndWriteSection” is rather straightforward, so we won’t discuss it further. It heavily relies on PeNet.
We now need to make modifications to our “pre-stub” template. This is what the template looks like.
Note the two sets of opcodes. The first one sets RAX to the next opcode’s address, loads into RAX how much farther we have to jump to get to our “.duality” section, then calls it. So, we’ll need to do some math here. The second set of opcodes performs the instruction that we overwrote from the entry point, so we’ll need to replace it with the actual instruction, and finally we hand execution back to the DLL entry point by jumping back to the right place. In our linking process, we made sure that the start of our DUALITY code section aligned our stack for 64-bit interaction with the Windows API down the line.
The code that modifies the template follows.
Finally, we patch in our pre-shellcode stub to the targeted DLL.
And that’s it. The DLL is written to disk to either be manually used by the operator or to be used by the next section, which will be the operationalization pipeline.
In order to effectively use DUALITY on an engagement, we ideally want to be able to use it from a C2 perspective and from an internal network access perspective. For red team persistence purposes, we’ll focus on the former, and we’ll operationalize through Cobalt Strike specifically. Part II of the blog post will demonstrate usage for initial access.
There are several ways to implement an operationalization pipeline. harmj0y from Specter Ops gave a great talk about using DevOps for offensive operations called OffSecOps. The implementation we will discuss here relies on using an “assistant” Windows VM (although it can be a Windows server somewhere) that the operator has access to. This will enable the operator’s Cobalt Strike instance to communicate with a DUALITY implementation on a Windows machine through the use of an aggressor script, which is Cobalt Strike’s built-in scripting language.
Our current implementation in Sleep is not pretty, as we are aware that there are smarter ways to handle callbacks, but documentation and examples are scarce enough that even ChatGPT (using GPT-3.5) either simply says it has no idea what we are talking about, or it starts coding in an entirely fictional language when asked to code a basic string concatenation function. Comparatively, it would have had no problems had we asked it to code it in Python. Note that this blog post was written before more refined versions of ChatGPT came out.
If asked to do that in Python, ChatGPT tells me to just use the “+” operator and then politely provides a function anyway. That’s the difference in language popularity.
Moving on, this diagram shows how our operationalization should work in theory.
We first get the target DLLs from the victim machine by downloading them to the Cobalt Strike teamserver and then synchronizing those DLLs to the attacker’s local machine. Interactions between the teamserver and the “Assistant VM” are automated with the “DUALITY.cna” Cobalt Strike aggressor script. Using the aggressor script, we reach out to the “Assistant VM” and provide the DLLs to backdoor. Within the “Assistant VM”, the C# Duality program ingests the DLLs and backdoors them with the logic defined in the SCC file. This file contains the primary DUALITY logic as well as the process injection technique. The backdoored DLLs are then obtained by the aggressor script from the “Assistant VM” and uploaded to the victim machine using the DLL swapping technique to allow for swapping even if programs are running.
In practical step-by-step terms, which heavily mirror the theoretical diagram from above, the following takes place:
Additional things to keep in mind.
Before we jump into the pipeline, let’s address these points.
There is a trick in Windows where you can replace a DLL or EXE in use without deleting it. If you try to delete the file or copy over it, it will fail for a chain of reasons. However, you CAN rename a file while it is in use without terminating the program. So, we can rename the original DLL and slip in our backdoored DLL in its place. A handle will remain open to the currently open file by the running program. The program will still need to restart in order for the backdoored DLL to be consumed by the main program. This can be performed forcefully by the operator (or DUALITY logic) by terminating the process and restarting it, or it can happen naturally on the next program usage after exit.
That DLL we renamed is the original untouched version! So that satisfies needing a backup as well. Two birds one stone.
If we are really inclined, to get a minor advantage over DLL proxying in this specific context, we can send the original off to some other directory to make the current directory seem entirely untouched, especially when paired with a time stomp, but this functionality is not in the current proof-of-concept. It is easy to implement however (see my note above about astute operations architects).
Let’s go through the execution of the CNA (aggressor) script. For this example, we are operating from a foothold on a machine with regular Defender installed.
First, DUALITY.cna will ask you to specify which listener’s shellcode you want to use. In this case we have one listener so we will select that.
We then select from a list of DLLs that we can potentially backdoor.
This part is a little tricky. In Part II of the blog post, the stager will perform the part of low-privilege DLL reconnaissance without relying on a pre-set list. This is extremely useful for initial access. In this blog post, we will use a pre-set list.
For opsec, we won’t have the script actually go around getting handles on files to see if we can backdoor them. We want to know ahead of time what common programs from a low-privilege perspective can be backdoored. We already know most orgs have to use some common programs like Microsoft Teams, Slack, Zoom, Wire, etc… Before the operation, we want to install all these common programs on a machine and see what we can do with them. This is where “LowDLL” comes in. This program (really more of a wrapper) uses “Listdlls64.exe” from sysinternals as a dependency (although we could rewrite the program without the dependency, but “listdlls” does the core job well here). When you execute this program, it will show you which of the currently running programs’ DLLs are reachable by a low-privilege user.
So to get a listing, we will setup a machine with Slack, Teams, and other common communications programs and run all of them. Then, as a low-privilege user, we will run “LowDLL”. Here are a few screenshots of the output.
At first, we’re greeted with three interesting output types. If the highlighted DLL is red, it means we can backdoor it with DUALITY. If it’s yellow, it means it is 64-bit, but we don’t currently handle the entry point opcode sequence. If it’s blue, it means it’s a 32-bit DLL although it is reachable by the current low-privilege user.
The following screenshot shows some items we could backdoor.
Discord actually uses 32-bit DLLs in the low-privilege locations we are interested in.
The useful part of this as it pertains to our CNA file is at the end.
LowDLL will generate an array for us in Sleep syntax to throw in our CNA file with an arbitrary username “SOMEDUALITYUSER”. Then our CNA script will replace the actual current username of the victim we are operating under with “SOMEDUALITYUSER” and look for those same DLLs. There is the caveat that if the DLLs are stored in some other location, we’re going to miss them. This is usually not the case, however. For future refactoring of the CNA script, we can opt to manually paste it in the paths of the DLLs we would like to target.
One other note, we recommend shortening the list as much as possible, because it will overflow in the Cobalt Strike UI and you won’t be able to see what you are selecting. Always test the entire workflow before an operation. We don’t recommend backdooring DLLs that you have not tested locally.
Moving on, let’s select three DLLs to backdoor.
The next sequence of events is all automated. We’re going to go through the Cobalt Strike output as well as output from the assistant VM in chunks.
On the server side, we have a Python “http.server” with file upload capabilities. We receive the files as follows. These are followed by the period checks for the “done” file.
On the server, we also have a time-based Python program that looks for new files prefixed with 20 digits. When they are found, the server will wait until the downloads are complete and then begin the compilation process.
DUALITY.cs will then do its magic. The following screenshots depict the compilation process of a DLL.
Finally, DUALITY.cs writes the three backdoored DLLs to disk on the “assistant VM”.
At this point, the “done” file is generated by the server-side Python script, and the aggressor script has the signal to download the finished files. It does so as can be seen from the Python web server logs.
Back to the Cobalt Strike console, we see that we acquired the backdoored files from the assistant VM.
We change our directory in Cobalt Strike to where we need to upload each respective file. Some irrelevant debugging lines have been blurred in the following screenshot.
Those who used Cobalt Strike a while back may wonder about the 1 MB upload limit. That was actually addressed by Cobalt Strike in the version 4.6 release. It requires modification to the malleable profile prior to starting the team server. Be sure to include these changes.
Now our files are on the target. Let’s look at the “ffmpeg” directory for Microsoft Teams. Note the size of “ffmpeg.dll” versus “ffmpeg.dll.csbak”. The larger one is packed with our DUALITY logic and Cobalt Strike shellcode.
The backups are named conspicuously and are present in the backdoored program’s current directory for this proof-of-concept. The backup is simply the clean version of the DLL. Currently, that file is used by the aggressor script to restore the original “ffmpeg.dll” when we discuss the “UNDUALITY” function. If that file (“.csbak”) is missing, it will also mean that the program was updated, which means that the DUAL is probably gone as well, since updates usually rewrite the folder. This behavior can be changed to only keep the backdoored version on disk and to use different behavior for “UNDUALITY”. In terms of signaturing, this is not yet necessary.
Let’s look at the other two just for the sake of completion.
Now, let’s have some fun. We have to run at least one of these programs for all the backups to be established in %TEMP%. So, we’ll run Python.
Python runs and spawns a beacon. Now all the backups are established. The DUALITY-spawned beacon is running inside “msedge.exe”. The original foothold was through “iexplore.exe” (which is now sunset, by the way. So, it’s not a wise choice for process injection in the future.)
Let’s then simulate a Microsoft Teams update. Let’s remove the Teams “ffmpeg.dll” and replace it with the original.
Now, we can run any of the other two programs, Slack or Python, to reinfect Teams. Teams can still be running too while it’s getting re-backdoored. So, we’ll have it running.
Then we run Slack. Teams is now re-infected.
The timestamp is the same as the original backdoored one because the DUALITY logic grabbed it from %TEMP%, which is the DUALITY logic backdoors files to (this can also be changed)
Note how each backup has a unique prefix, because some DLLs have the same name across different programs. “ffmpeg.dll” is a great target (for programs that use a 64-bit version) and a good example of this case.
The video demonstrating this can be found here: https://youtube.com/watch?v=LKXOZsltBLI
Removing this capability is mostly automatic, but may require an operator checklist for manual changes. We will discuss this checklist towards the end of this section.
The operator can do this by simply typing “UNDUALITY”. Any folders from our Sleep program array where there exists a “.csbak” file such as “fmpeg.dll.csbak” will replace the current file named “ffmpeg.dll”, which is the backdoored version. Since program updates often replace the entire contents of a directory, if this backup file is not present, it usually means that the backdoored version is gone as well. As with any automated items, an in-flight checklist should be utilized by the operator to ensure that backdoored programs are restored to original condition. The following screenshot demonstrates UNDUALITY.
In the above screenshot, we perform the following steps for each backdoored program.
The next screenshot shows what happens after we ran UNDUALITY while Python was running.
The backdoored file still had a handle to it, but we successfully renamed it and restored the original “python310.dll”. The operator can opt to forcefully remove this by force-closing Python and removing the file, by waiting until Python is naturally closed by the victim, or by waiting until the next folder update. The next screenshot shows what happens after we ran UNDUALITY while Microsoft Teams was closed.
Since no handles to the backdoored file were open, it was possible to rename and remove it. Note that in both cases, the original dependency DLL was successfully restored.
By now, if anyone has read this post and is interested in this capability, there should be a horde of detection engineers (and potentially red teamers) jumping up and down ready to throw their pitchforks at me.
There have been a series of intentionally placed weaknesses (and probably an equal number of unintentional ones) in the public version of this capability.
But in general, let’s talk about some things you can (and really should) detect. For each point, we’re going to explain how you, as an operations architect, can modify some things to get around it. In the next few paragraphs, “Blue” means detection side (SOC, AV, EDR, etc) and “Red” means offensive operations.
First, the obvious “.duality” (and maybe “.ensc”) section name in any DLL should raise a huge red flag.
Blue: This is a trivial detection and if it’s not implemented, I’m going to be wondering if anyone reads these posts. Please signature this at least to make this post worth it.
Red: Change the “.duality” strings in the SCC file to a “#define” and replace it during the compilation process next to everything else, like the KEY length.
The “.duality” section is created with executable permissions on that section. The permissions are not changed on the victim side, it’s already that way when the DLL is dropped in.
Blue: Perhaps there’s a way to separate common binaries with multiple executable sections vs. Potentially malicious binaries with multiple executable sections. For Part II of this blog post, it will become obvious that signaturing on a section that contains “ntdll.dll” as a section in a binary is going to be a high-fidelity red flag.
Red: Technically, many binaries out in the wild already do have multiple sections, so focus on behavioral execution instead (direct syscalls, etc). If you are inclined to get rid of extra sections however, that will require resizing the .text section and sticking everything in there.
The “.ensc” section is the XOR encrypted shellcode section and is read-only.
Blue: This is going to be hard, but if you can find where the decryption key is stored relative to the section, you can try decrypting it and see if it matches known C2 signatures. Since the SCC file can change easily and offsets will change, this one might be rough.
Red: You’re good for this one most probably. However, note that advanced EDR will detect when unencrypted shellcode starts to run in a new thread in a new process via kernel callbacks, even if you performed the injection using syscalls (and didn’t have to worry about stack cleanliness if operating from a random heap address such as the case for an injected implant). Using a sleep mask variant as well as a stage 0 injection to then decrypt and place your implant somewhere else in memory (not at the start of the thread you just created, for example) will be useful.
The backdoored DLL is not signed.
Blue: This is a win for blue team, as it helps EDR zero in on malware with a solid heuristic.
Red: Sign your PEs. This maybe a feature implemented in the next iteration of DUALITY with improved tradecraft, but you will have to bring your own code signing certificate. It’s not difficult to add a signing step to the C# pipeline.
If we dive a bit deeper, the pre-shellcode stub itself could be signatured as well.
Blue: You’ll have to do a bit of shellcode tracing after the entry point. For native code, many basic AV engines do that so we imagine this isn’t a big deal. Detect on the proof-of-concept’s sequence of opcodes if you can shellcode trace past the entry point.
Red: Randomize how you back up the registers and restore them in mirrored fashion. To go further, break up the register backup sequence so you can avoid blue team detecting on a long sequence of opcodes of register backups. Or use various ROP chains *jazz hands* to do the same thing from regions marked as readonly.
When we jump to the actual DUALITY section, there’s going to be a few things that EDRs love to zero in on. Dynamic function resolution doesn’t free us from the hooks in the APIs we resolve. Assuming we even get past the dynamic function resolution stage, if the EDR is detecting intelligently on malicious series of API calls, you’re going to get caught.
Blue: Depending on your defense stack, as long as you’re not using home Defender, your work here is probably done for the proof-of-concept. Anything that’s doing half a decent job at API function hooking will detect the public process injection technique (I really hope).
Red: You have a few options here. You can load a clean NTDLL by spawning a suspended process and grabbing it before AV / EDR can hook it, like Peruns Fart. You can also do syscalls, but then you need to worry about where you’re making the syscalls from, because Blue will detect on a syscall out of Narnia in call stack, as opposed to something like NTDLL, potentially thanks to instrumentation callbacks in the PEB, although this is really more of an issue if you’re operating from memory rather than disk So, pack all of a clean, compatible NTDLL and whatever else you need into the targeted DLL, as a variation on “Bring Your Own Land”. Memory space is cheap and it’s unlikely that anyone is going to notice a 3 MB vs a 5 MB DLL. We might implement this in a future release, but it’ll require some work to automate regarding which NTDLL to throw in there based on the victim machine version. We’ll spoil it for you, we implemented this for Part II of the blog post and will be discussing it (although not releasing the code).
I’m not going to touch on finding the Cobalt Strike implant in memory, because that is a whole separate story. DUALITY is not specific to Cobalt Strike. Refer to Sleep Mask, stack spoofing, timers, etc.
The DUALITY implant mutex check name is hardcoded to “tm22s”.
Blue: If there’s a “tm22s” mutex object in the kernel, well, it could be DUALITY.
Red: Change the Mutex name to something else, or preferably, generate a new one every time, just like the “#define CHECKMUTEX” line in the SCC file.
There are 20-digit prefixed DLLs backed up in %temp%, as can be seen in the following screenshot.
Blue: We are curious how unique this indicator is – it might end up generating some false positives, but maybe it could be a secondary or tertiary heuristic.
Red: Change the format naming and maybe extensions to match something like the GUIDs right above it.
There are several areas in this project that are ripe for iteration, expansion, and/or polish. The most notable ones follow. Each point has associated research that needs to along with it, as there may be hurdles for each item.
Ultimately, we are hoping that this brings more attention to DLL loading security, especially when the signature mechanism is already implemented. Perhaps program developers can find a performant way to verify signatures of loaded dependencies especially as they load and unload the multiple times during a program’s lifecycle. Perhaps Windows can better enforce programs loading DLLs with valid signatures during runtime. We suspect both suggestions are much easier said than done.
Advanced threat actors are moving towards (or have been, in some cases) exploiting Windows kernel-related misconfigurations and vulnerabilities, such as vulnerable drivers. Recently the infosec community observed a UEFI bootkit called BlackLotus. DUALITY is not operating at the kernel level and definitely not at boot time. This is purely userland shenanigans.
Unfortunately, the idea behind DUALITY could be modified to be significantly stealthier and more potent by a threat actor with more time / resources. DUALITY requires no research time to find and exploit a vulnerable driver, and in its apex form, the only trace on the system could be the backdoored DLLs that use undetectable process injection (or perhaps reflective loading!) techniques and backup backdoored DLLs stored in alternate locations. That, in combination with undetected timestomping and valid signing certificates, could make for a dangerous combination.
The code can be found here: https://github.com/AonCyberLabs/DUALITY
The code for LowDLL can be found here: https://github.com/AonCyberLabs/LowDLL
The unofficial usage / proof-of-concept video can be found here: https://www.youtube.com/watch?v=LKXOZsltBLI
The public talk of DUALITY at BSides KC can be found here: https://www.youtube.com/watch?v=DM23uv4P0EI
We left this till the end because it may have confused the reader if brought up earlier in the post.
“Faisal, why didn’t you just include the other DLL and its DUALITY logic in each other DLL as added sections instead of backing up to some location on the same machine?”
On the surface, this is an interesting idea – get rid of the backups in %TEMP% and just include them in the DLLs themselves. Unfortunately, the Axiom of Regularity from Set Theory will set our record straight here. Let’s take a look at this diagram.
In this diagram, we would include all of “python310.dll” as a backup dual inside of “ffmpeg.dll” in the section “.dual1”. “python310.dll” would contain a backup of all “ffmpeg.dll” as well in its own section, “.dual1”.
In this thought experiment, if “ffmpeg.dll” gets deleted, “python310.dll” would simply read its own “.dual1” section and write a new file to disk called “ffmpeg.dll” with the section contents. Similarly, if “python310.dll” got deleted, “ffmpeg.dll” would read the contents from its own “.dual1” section and write “python310.dll” to disk. This effectively establishes a DUALITY between the two DLLs.
If we think a little bit harder however, this will mean that “python310.dll” would now indirectly include a backup of itself as well, since it is a part of “ffmpeg.dll”. This backup of itself also contains another backup of “ffmpeg.dll”… There would be an infinitely recursive nature to these two containers.
While we are not operating with sets necessarily, and we may be reaching for some formal justification, it can be handy to consider set theory in our thought experiment, where we could think of each file as a set. From this Wikipedia article on the Axiom of Regularity: “The axiom of regularity together with the axiom of pairing implies that no set is an element of itself, and that there is no infinite sequence (an) such that ai+1 is an element of ai for all “i”. With the axiom of dependent choice (which is a weakened form of the axiom of choice), this result can be reversed: if there are no such infinite sequences, then the axiom of regularity is true. Hence, in this context the axiom of regularity is equivalent to the sentence that there are no downward infinite membership chains.”
And so, one implication of regularity is that no set is an element of itself – “Let A be a set and apply the axiom of regularity to {A}, which is a set by the axiom of pairing. We see that there must be an element of {A} which is disjoint from {A}. Since the only element of {A} is A, it must be that A is disjoint from {A}. We cannot have A ∈ A (by the definition of disjoint).”
In practice, regarding our diagram, you could try to recursively include backups in each program, but then you would have a finite number of “restorations” from a backup, as each time you work your way up the backup chain, you lose one recursively placed element. Eventually, the backups would run out of each other’s containers, and then DUALITY would break. This is also without mentioning the gargantuan, exponentially/factorially increasing size of each DLL as you go deeper into recursion.
Ultimately, there has to be a reference to some objects outside of the DUALITY. In this implementation’s case, these are the backups placed in the user’s %TEMP% folder.
Capability Overview
Cyber Resilience
Product / Service
Penetration Testing Services
General Disclaimer
This material has been prepared for informational purposes only and should not be relied on for any other purpose. You should consult with your own professional advisors or IT specialists before implementing any recommendation, following any of the steps or guidance provided herein. Although we endeavor to provide accurate and timely information and use sources that we consider reliable, there can be no guarantee that such information is accurate as of the date it is received or that it will continue to be accurate in the future.
About Cyber Solutions
Cyber security services are offered by Stroz Friedberg Inc., its subsidiaries and affiliates. Stroz Friedberg is part of Aon’s Cyber Solutions which offers holistic cyber risk management, unsurpassed investigative skills, and proprietary technologies to help clients uncover and quantify cyber risks, protect critical assets, and recover from cyber incidents.
Terms of Use
The contents herein may not be reproduced, reused, reprinted or redistributed without the expressed written consent of Aon, unless otherwise authorized by Aon. To use information contained herein, please write to our team.
Our Better Being podcast series, hosted by Aon Chief Wellbeing Officer Rachel Fellowes, explores wellbeing strategies and resilience. This season we cover human sustainability, kindness in the workplace, how to measure wellbeing, managing grief and more.