0xkylm
Whoami
Student @ 2600
VR @ FuzzingLabs
Hypervisor, compilation and maldev enthusiast
Introduction
LLVM Deep Dive
IR level pass
Evasion
Machine level pass
Some YARA rules match exact bytes or strings. If the binary changes on every compile, those naive checks no longer match
It also increases the effort required for reverse engineering
LLVM Deep Dive
https://zhuanlan.zhihu.com/p/618817970
Simplest pass
#include "llvm/Transforms/MyObfsPass/Obf.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
PreservedAnalyses ObfsPass::run(Function &F, FunctionAnalysisManager &AM) {
    // Print the function name
    outs() << "Processing function: " << F.getName() << "\n";
    // Only main is transformed in this demo
    if (F.getName() != "main") {
        outs() << "Skipping function: " << F.getName() << "\n";
        return PreservedAnalyses::all();
    }
    IRBuilder<> Builder(F.getContext());
    bool Changed = false;
    // For the current function, walk every BasicBlock
    for (BasicBlock &BB : F) {
        // For each BasicBlock, walk every instruction
        for (Instruction &I : BB) {
            // Look at every operand of the instruction
            for (unsigned i = 0; i < I.getNumOperands(); i++) {
                Value *Op = I.getOperand(i);
                // If it's a ConstantInt, replace it with 42 (same type)
                if (ConstantInt *CI = dyn_cast<ConstantInt>(Op)) {
                    errs() << "Found constant: " << CI->getValue() << " in instruction: " << I << "\n";
                    I.setOperand(i, ConstantInt::get(CI->getType(), 42));
                    Changed = true;
                }
            }
        }
    }
    // Tell the pass manager whether the IR was modified
    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
}
[PassPluginLibraryInfo.... ]
Simplest pass
// add.c
#include <stdio.h>
int main(){
    int a = 10;
    int b = 12;
    printf("a + b = %d",a+b);
    return 1;
}
----------------------
Processing function: main
Found constant: 1 in instruction: %1 = alloca i32, align 4
Found constant: 1 in instruction: %2 = alloca i32, align 4
Found constant: 1 in instruction: %3 = alloca i32, align 4
Found constant: 0 in instruction: store i32 0, ptr %1, align 4
Found constant: 10 in instruction: store i32 10, ptr %2, align 4, !dbg !33
Found constant: 12 in instruction: store i32 12, ptr %3, align 4, !dbg !35
Found constant: 1 in instruction: ret i32 1, !dbg !37
Processing function: _vsprintf_l
Back2Hack : Encrypt Strings
Objectives :
Back2Hack : Encrypt Strings
We'll cheat a little bit
std::vector<StringUsage> collectStringUsages(Function &F) {
    std::vector<StringUsage> Usages;
    for (BasicBlock &BB : F) {
        for (Instruction &I : BB) {
            for (unsigned i = 0; i < I.getNumOperands(); ++i) {
                Value *Op = I.getOperand(i);
                auto *GV = dyn_cast<GlobalVariable>(Op);
                if (!GV)
                    continue;
                if (!GV->isConstant() || !GV->hasInitializer())
                    continue;
                auto *CA = dyn_cast<ConstantDataArray>(GV->getInitializer());
                if (!CA)
                    continue;
                if (!CA->isString())
                    continue;
                Usages.push_back({&I, GV, i});
[...]
Back2Hack : Encrypt Strings
Creating stub
BasicBlock *LoopCond = BasicBlock::Create(Ctx, "loop.cond", DeobfFunc);
BasicBlock *LoopBody = BasicBlock::Create(Ctx, "loop.body", DeobfFunc);
BasicBlock *LoopEnd = BasicBlock::Create(Ctx, "loop.end", DeobfFunc);
B.CreateBr(LoopCond);
// cond
B.SetInsertPoint(LoopCond);
PHINode *PtrPhi = B.CreatePHI(Type::getInt8Ty(Ctx)->getPointerTo(), 2, "ptr");
PtrPhi->addIncoming(StrPtrArg, EntryBB);
Value *Cur = B.CreateLoad(Type::getInt8Ty(Ctx), PtrPhi, "cur");
Value *IsNotNull = B.CreateICmpNE(Cur, B.getInt8(0));
B.CreateCondBr(IsNotNull, LoopBody, LoopEnd);
Back2Hack : Encrypt Strings
Enjoy
Decrypt the string, use it, then encrypt it again
Modularity is the key
LLVM works on IR modules: each compilation unit produces one IR module. Because of this, we can link in a module even if it is not part of the project, which lets an LLVM pass add bitcode to the program
We still have all the symbols (nothing is stripped yet), and we have already played with IR instruction creation in the string encryption pass
Loading bitcode
bool ObfsPass::loadByteCode(llvm::Module &M, const unsigned char Bc[], unsigned int BcLen) {
    llvm::LLVMContext &Context = M.getContext();
    // Wrap the embedded bytes without copying them (no null terminator required)
    auto MemBuffer = llvm::MemoryBuffer::getMemBuffer(
        llvm::StringRef(reinterpret_cast<const char*>(Bc), BcLen),
        "embedded_bitcode",
        false
    );
    auto Module = llvm::parseBitcodeFile(MemBuffer->getMemBufferRef(), Context);
    if (!Module) {
        // An llvm::Expected in the error state must be consumed, so take and print the error
        llvm::errs() << "Error parsing embedded bitcode :( " << llvm::toString(Module.takeError()) << "\n";
        return false;
    }
    std::unique_ptr<llvm::Module> ExternalMod = std::move(*Module);
    ExternalMod->setModuleIdentifier("embedded_module");
    // Merge the embedded module into M; linkInModule returns true on error
    llvm::Linker L(M);
    if (L.linkInModule(std::move(ExternalMod))) {
        llvm::errs() << "[-] Failed to link :(\n";
        return false;
    }
    return true;
}
Replace funcs
if (!loadByteCode(M, sub_bc, sub_bc_len)) {
    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
}
// Now let's find it (because we linked it into the current module)
Function *SubFunction = M.getFunction("sub");
Function *AddFunction = M.getFunction("add");
// Bail out if either function is missing
if (!SubFunction || !AddFunction) {
    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
}
// Replace every call to "add" with a call to "sub".
// setCalledFunction() edits AddFunction's use list, so iterate with
// make_early_inc_range to keep the iterator valid.
for (Use &U : llvm::make_early_inc_range(AddFunction->uses())) {
    auto *CI = dyn_cast<CallInst>(U.getUser());
    // Only rewrite uses where "add" is the callee, not an argument
    if (CI && CI->isCallee(&U)) {
        CI->setCalledFunction(SubFunction);
        Changed = true;
    }
}
CallstackSpoof
Optimizing machine code
During code generation, LLVM also runs optimizations on the machine code; these are machine-level passes. Let's play with them
Breaking Ghidra
REX is a prefix used in x86-64 to select a 64-bit operand size or the extended registers. But what if we use 2, 3 or 10 REX prefixes?
Not all instructions require a REX prefix. The prefix is necessary only if an instruction references one of the extended registers or uses a 64-bit operand. If a REX prefix is used when it has no meaning, it is ignored.
Breaking Ghidra
But Ghidra and other decompilers don't work like a CPU, and sometimes bugs can occur
Breaking classical code path
VT test
Okay, this is not insane because it's not a very famous sample, but
(only the machine-level pass)