Comparing Z80 Disassemblers: Features, Output Formats, and Plugins

Z80 Disassembler: A Beginner’s Guide to Reverse-Engineering Z80 Code

The Zilog Z80 is a classic 8-bit microprocessor that powered home computers, calculators, arcade machines, and embedded systems from the late 1970s through the 1990s, and it still appears in retro projects and legacy hardware today. Reverse-engineering Z80 binaries requires an understanding of its instruction set, addressing modes, and common binary formats, plus practical techniques for disassembly, analysis, and rebuilding readable assembly. This guide covers the fundamentals you need to disassemble Z80 code effectively: instruction basics, common pitfalls, file formats, tools, workflows, and simple hands-on examples.


Who this guide is for

  • Hobbyists and retrocomputing fans who want to inspect or modify classic Z80 programs.
  • Embedded engineers examining firmware for legacy devices.
  • Beginners in reverse engineering learning CPU-specific disassembly techniques.
  • Developers building or extending Z80 disassemblers or analysis tooling.

Quick Z80 overview

  • Architecture: 8-bit CPU with 16-bit address bus (64 KB addressable memory).
  • Registers: A (accumulator), F (flags), B, C, D, E, H, L (general-purpose 8-bit registers), plus register pairs BC, DE, HL which form 16-bit registers. Also alternate register set A’, F’, B’, C’, D’, E’, H’, L’.
  • Index registers: IX and IY (16-bit) for displacement-based addressing.
  • Stack pointer and program counter: SP (16-bit), PC (16-bit).
  • Interrupt and control: I (interrupt vector), R (memory refresh), and interrupt modes IM0–IM2.
  • Instruction set: Rich set including loads, arithmetic, bit operations, block moves, input/output, and several instruction prefixes (CB, ED, DD, FD) that extend functionality.
  • Undocumented and machine-specific quirks: Some implementations and assemblers use nonstandard opcodes or rely on timing/behavior quirks.

Disassembly fundamentals

Disassembly converts machine code bytes back into human-readable assembly instructions. Two major approaches:

  • Linear (raw) disassembly: decode sequentially from a start address until end; simple but can misinterpret data as code or miss alternate control-flow targets.
  • Recursive (flow-based) disassembly: follow control-flow (jumps, calls, returns) to find reachable code; better at avoiding mislabeling, but requires handling indirect jumps/calls and data embedded in code.

Key challenges with Z80:

  • Variable instruction lengths (1 to 4 bytes; the longest are prefixed forms such as LD IX,nn and the DD CB d op group).
  • Instruction prefixes (CB, ED, DD, FD) change decoding rules and may combine (e.g., DD CB). Each prefix selects a different opcode table, and the indexed forms add a displacement byte.
  • Data mixed with code (tables, strings): data must be distinguished manually or via heuristics.
  • Indirect jumps and computed addresses (e.g., JP (HL), or a RET that consumes an address pushed earlier) complicate flow analysis.
  • Bank switching or memory-mapped I/O in real hardware can change how addresses should be interpreted.

Common file formats and images

Before disassembling, identify how the binary was stored:

  • Raw binary (.bin): pure sequence of bytes, needs base load address to map to meaningful addresses.
  • ROM images (.rom, .bin): often map to 0x0000 or other hardware-specific addresses; may include header info.
  • Snapshot formats (e.g., .sna, .z80 for ZX Spectrum): include CPU state (registers and PC, although some formats store PC only indirectly, such as on the stack in 48K .sna files) and memory layout; useful when starting at the exact runtime PC.
  • File-system or cartridge formats (depends on platform): might include metadata, relocation tables, or compressed data.
  • Object files or relocatable modules (rare for vintage Z80 but possible in cross-compiled systems): need format-specific parsing to resolve symbols and relocations.

To choose a base address for raw binaries: consult platform docs, check for common vector tables, or use known strings/signatures to align addresses.
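
The mapping from file offsets to addresses can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's loader: the `Image` class and its method names are made up for this guide, and the base address is whatever the analyst decides it should be.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """A raw binary mapped into the Z80's 64 KB address space (hypothetical helper)."""
    data: bytes
    base: int  # load address of data[0]; supplied by the analyst, not read from the file

    def __post_init__(self):
        if self.base < 0 or self.base + len(self.data) > 0x10000:
            raise ValueError("image does not fit in the 64 KB address space")

    @property
    def end(self) -> int:
        """First address past the end of the image."""
        return self.base + len(self.data)

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.end

    def byte(self, addr: int) -> int:
        if not self.contains(addr):
            raise IndexError(f"address {addr:#06x} is outside the image")
        return self.data[addr - self.base]

    def word(self, addr: int) -> int:
        """Read a 16-bit little-endian value, the way the Z80 stores addresses."""
        return self.byte(addr) | (self.byte(addr + 1) << 8)


# Thirteen bytes loaded at 0x0100; the CALL operand sits at 0x0108.
img = Image(bytes.fromhex("3E 05 06 00 21 00 80 CD 10 01 C3 00 10"), base=0x0100)
print(hex(img.word(0x0108)))
```

Changing `base` changes every address the disassembler reports, which is why a wrong load address makes jump and call targets land in nonsense.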


Tools of the trade

Disassemblers and analysis tools simplify the work; choose based on the platform and your goals.

  • General-purpose disassemblers:
    • IDA Pro / Hex-Rays: Z80 processor module; strong interactive analysis and graphing.
    • Ghidra: built-in Z80 support, with community processor definitions for some variants; free, extensible.
    • Radare2 / Cutter: open-source, scriptable, with Z80 support.
    • Capstone: general-purpose disassembly engine with bindings for many languages, useful inside custom tools; check that your version includes a Z80 backend before relying on it.
  • Z80-specific tools:
    • z80dasm: lightweight Z80 disassembler for raw binaries.
    • sjasmplus / pasmo / zasm: assemblers that can also be used to test generated assembly.
    • nkdasm, disZ80 and other retro tools (varies by platform).
  • Emulators with debugging:
    • Fuse (ZX Spectrum), MAME, or specialized emulators that expose memory, breakpoints, and instruction tracing.
  • Hex editors and binary analysis:
    • HxD, wxHexEditor, Bless for manually inspecting bytes.
  • Scripting languages:
    • Python (with a custom decoder or a disassembly library that supports Z80) is commonly used to build automation and heuristics.

Building a basic Z80 disassembler (conceptual steps)

  1. Loader: read the binary image and map bytes to an address space (base address for raw bin).
  2. Decoder: implement opcode tables for single-byte opcodes and extended tables for CB, ED, DD/FD, and double-prefixed instructions (e.g., DD CB dd op).
  3. Symbol & label generation: assign labels for branch targets, calls, and entry points; convert addresses to labels in output.
  4. Control-flow analysis:
    • Start with known entry points (reset vector, interrupt vector, snapshot PC, or user-specified).
    • Recursively follow conditional and unconditional jumps, calls, and returns.
    • Mark fall-through addresses as code when appropriate.
  5. Data detection: use heuristics to detect tables (e.g., series of valid addresses), ASCII strings, and embedded constants. Allow manual overrides.
  6. Output formatting: produce readable assembly with labels, comments for discovered data, and alignment/pseudo-ops (DB, DW, DS).
  7. Interactive and iterative refinement: allow the analyst to mark regions as code or data, rename labels, and re-run analysis.
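
Step 4 is the part that benefits most from a concrete sketch. The Python below is a minimal worklist traversal under heavy assumptions: the `FLOW` table covers only ten unprefixed opcodes chosen for illustration, unknown opcodes simply end a path, and the names `FLOW` and `find_code` are invented for this guide.

```python
from collections import deque

# opcode -> (instruction length, flow kind); a deliberately tiny illustrative subset.
FLOW = {
    0x00: (1, "next"),   # NOP
    0x3E: (2, "next"),   # LD A,n
    0x06: (2, "next"),   # LD B,n
    0x21: (3, "next"),   # LD HL,nn
    0x18: (2, "jr"),     # JR e
    0x20: (2, "jr_cc"),  # JR NZ,e
    0xC3: (3, "jp"),     # JP nn
    0xC2: (3, "jp_cc"),  # JP NZ,nn
    0xCD: (3, "call"),   # CALL nn
    0xC9: (1, "ret"),    # RET
}


def find_code(code: bytes, base: int, entry_points):
    """Follow control flow; return (instruction start addresses, branch targets)."""
    end = base + len(code)
    starts, targets = set(), set()
    work = deque(entry_points)
    while work:
        pc = work.popleft()
        while base <= pc < end and pc not in starts:
            info = FLOW.get(code[pc - base])
            if info is None:
                break  # opcode not in the table: stop following this path
            length, kind = info
            if pc + length > end:
                break  # instruction would run past the end of the image
            starts.add(pc)
            nxt = pc + length
            target = None
            if kind in ("jp", "jp_cc", "call"):
                target = code[pc - base + 1] | (code[pc - base + 2] << 8)
            elif kind in ("jr", "jr_cc"):
                disp = code[pc - base + 1]
                if disp >= 0x80:
                    disp -= 0x100  # relative jumps use a signed 8-bit offset
                target = (nxt + disp) & 0xFFFF
            if target is not None:
                targets.add(target)
                work.append(target)
            if kind in ("jp", "jr", "ret"):
                break  # unconditional transfer: no fall-through
            pc = nxt
    return starts, targets


# Code at 0x0100, three data bytes at 0x010D, and a one-byte subroutine at 0x0110.
blob = bytes.fromhex("3E 05 06 00 21 00 80 CD 10 01 C3 00 10 41 42 43 C9")
starts, targets = find_code(blob, 0x0100, [0x0100])
print(sorted(hex(a) for a in starts))
```

The three bytes at 0x010D are never visited because nothing jumps to them, which is exactly how flow-based analysis avoids decoding data as code. The target 0x1000 is recorded but not followed because it lies outside the image.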

Important opcode groups & prefixes

  • No-prefix opcodes: the core instructions (LD, ADD, SUB, JP, JR, CALL, RET, INC, DEC, etc.). Typically 1–3 bytes.
  • CB prefix: bit manipulation and rotate/shift operations (RLC, RRC, RL, RR, SLA, SRA, SRL, BIT, SET, RES). The byte after CB selects both the operation and its operand, which is one of the 8-bit registers or (HL).
  • ED prefix: extended operations (16-bit ADC/SBC, block transfer, search, and I/O instructions such as LDIR, CPIR, and OTIR, plus NEG, RETI/RETN, IM, and loads involving I and R). Some ED opcodes are undocumented, and their behavior can differ between Z80 variants.
  • DD / FD prefixes: select IX or IY in place of HL for the following opcode, and can be combined with CB. Example: DD CB d op is a 4-byte sequence for a bit operation on (IX+d); note that the displacement d comes before the final opcode byte.
  • Repeated prefixes: in a run of DD and FD bytes, only the last prefix takes effect; the earlier ones behave like NOPs and can be skipped or flagged.

Handling prefixes correctly is essential: a naive decoder that treats CB/ED/DD/FD as standalone one-byte instructions will misdecode the bytes that follow.
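
One way to keep prefix handling in a single place is a small front-end that consumes the prefix bytes and tells the rest of the decoder which opcode table to use. The Python below is a sketch under stated assumptions: `read_prefixes` and its return tuple are invented for this guide, the treatment of DD/FD before ED as ignored follows commonly published notes on undocumented behavior, and truncated input simply raises IndexError.

```python
def read_prefixes(code: bytes, pos: int):
    """Return (table, index_reg, opcode, displacement, next_pos).

    table is "main", "CB" or "ED"; index_reg is "HL", "IX" or "IY".
    displacement is set only for DD CB d op / FD CB d op, where d sits
    before the final opcode byte. For other indexed instructions the
    displacement follows the opcode and is left to the main-table decoder.
    """
    index_reg = "HL"
    # In a run of DD/FD bytes only the last prefix takes effect.
    while code[pos] in (0xDD, 0xFD):
        index_reg = "IX" if code[pos] == 0xDD else "IY"
        pos += 1

    byte = code[pos]
    if byte == 0xED:
        # Assumption: a DD/FD before ED is ignored, so ED decodes as usual.
        return "ED", "HL", code[pos + 1], None, pos + 2
    if byte == 0xCB:
        if index_reg == "HL":
            return "CB", "HL", code[pos + 1], None, pos + 2
        disp = code[pos + 1]
        if disp >= 0x80:
            disp -= 0x100  # displacement is a signed 8-bit value
        return "CB", index_reg, code[pos + 2], disp, pos + 3
    return "main", index_reg, byte, None, pos + 1


# DD CB 05 46 is BIT 0,(IX+5): the opcode byte 0x46 comes after the displacement.
print(read_prefixes(bytes.fromhex("DD CB 05 46"), 0))
```

Returning `next_pos` lets the caller continue decoding operands from the right byte regardless of how many prefix bytes were consumed.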


Heuristics for separating code and data

  • Strings: long runs of printable ASCII likely represent text data; emit them as DB string literals.
  • Jump targets: addresses referenced by JP/JR/CALL are likely code.
  • Valid instruction density: sequences where most decoded bytes form valid instructions are probably code.
  • Alignment and structure: interrupt vectors, tables of addresses, and known patterns (e.g., CRT init sequences) hint at data layout.
  • Execution traces: run the program in an emulator with logging to see actual executed addresses; this resolves dynamic code issues.
  • Manual inspection: final sanity check — human analysts often spot patterns automated tools miss.
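
The string heuristic is the easiest of these to automate. The following Python sketch scans for runs of printable ASCII; the function name and the default minimum length of 4 are arbitrary choices for illustration, and the threshold matters because many Z80 opcodes are themselves printable characters (0x3E, the opcode for LD A,n, is ">").

```python
def find_ascii_runs(data: bytes, base: int = 0, min_len: int = 4):
    """Yield (address, text) for runs of printable ASCII at least min_len long."""
    start = None
    # The appended zero byte acts as a sentinel that closes a run at the end of data.
    for i, b in enumerate(data + b"\x00"):
        if 0x20 <= b <= 0x7E:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                yield base + start, data[start:i].decode("ascii")
            start = None


sample = b"\x3e\x05HELLO\x00\xc9OK"
for addr, text in find_ascii_runs(sample, base=0x8000):
    print(f"{addr:04X}: DB \"{text}\"")
```

Here the lone ">" from the opcode byte and the two-character "OK" fall below the threshold, so only "HELLO" is reported. Candidates found this way still deserve a manual look before being marked as data.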

Example: Disassemble a short byte sequence

Assume base address 0x0100 and bytes:

3E 05 06 00 21 00 80 CD 10 01 C3 00 10 

Stepwise decode:

  • 0x0100: 3E 05 -> LD A,0x05
  • 0x0102: 06 00 -> LD B,0x00
  • 0x0104: 21 00 80 -> LD HL,0x8000
  • 0x0107: CD 10 01 -> CALL 0x0110
  • 0x010A: C3 00 10 -> JP 0x1000

Labeling and comments make the output clearer:

  start:
      LD A,5
      LD B,0
      LD HL,0x8000
      CALL sub_0110
      JP 0x1000

This example shows variable lengths and how calls/jumps provide labels for further recursive decoding.
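
The same decode can be reproduced with a few lines of Python. This is a sketch only: the `OPCODES` table lists just the five opcodes that occur in the example, and `linear_disassemble` is a name made up for this guide. Bytes it does not recognize are emitted as DB so the output always accounts for every byte.

```python
# opcode -> (mnemonic template, number of operand bytes); example opcodes only.
OPCODES = {
    0x3E: ("LD A,0x{:02X}", 1),
    0x06: ("LD B,0x{:02X}", 1),
    0x21: ("LD HL,0x{:04X}", 2),
    0xCD: ("CALL 0x{:04X}", 2),
    0xC3: ("JP 0x{:04X}", 2),
}


def linear_disassemble(code: bytes, base: int):
    """Decode sequentially from base, yielding (address, raw bytes, text)."""
    pos = 0
    while pos < len(code):
        entry = OPCODES.get(code[pos])
        size = entry[1] if entry else 0
        if entry is None or pos + 1 + size > len(code):
            # Unknown opcode or truncated operand: fall back to a data byte.
            yield base + pos, code[pos:pos + 1], f"DB 0x{code[pos]:02X}"
            pos += 1
            continue
        raw = code[pos:pos + 1 + size]
        operand = int.from_bytes(raw[1:], "little")  # operands are little-endian
        yield base + pos, raw, entry[0].format(operand)
        pos += 1 + size


EXAMPLE = bytes.fromhex("3E 05 06 00 21 00 80 CD 10 01 C3 00 10")
for addr, raw, text in linear_disassemble(EXAMPLE, 0x0100):
    hex_bytes = " ".join(f"{b:02X}" for b in raw)
    print(f"0x{addr:04X}: {hex_bytes:<8} -> {text}")
```

The little-endian conversion is what turns the bytes 10 01 into the address 0x0110 and 00 10 into 0x1000.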


Dealing with tricky cases

  • Self-modifying code: common in some demos and copy-protection schemes. Use emulation and watch memory writes that modify code pages; static disassembly will be incomplete.
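  • Indirect jumps: e.g., JP (HL), JP (IX), or a RET used after pushing a computed address. The Z80 has no indirect CALL instruction, so code often simulates one by calling a small stub that executes JP (HL). These cases require runtime info or conservative assumptions (treat as a potential branch to many targets).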
  • Compressed/packed code: decompression stubs precede payloads; identify and emulate the decompress routine to reconstruct real code.
  • Bank-switched memory: map bank numbers to address ranges according to platform specifics; you may need hardware docs or snapshots to know bank state.
  • Undocumented opcodes: some Z80 variants have quirks — use authoritative opcode tables per CPU variant (Z80, Z180, etc.).
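
For bank-switched ROMs, the core arithmetic is translating a (bank, CPU address) pair into an offset in the ROM file. The Python sketch below assumes a purely hypothetical mapper: 16 KB banks, with bank 0 fixed at 0x0000 to 0x3FFF and the selected bank visible at 0x4000 to 0x7FFF. Real hardware varies, so the constants need to come from the platform's documentation.

```python
BANK_SIZE = 0x4000       # 16 KB banks (assumed layout, not a universal rule)
FIXED_END = 0x4000       # 0x0000-0x3FFF always shows bank 0
WINDOW_END = 0x8000      # 0x4000-0x7FFF shows the currently selected bank


def file_offset(addr: int, bank: int) -> int:
    """Translate a CPU address plus selected bank into a ROM file offset."""
    if 0 <= addr < FIXED_END:
        return addr  # fixed region: the bank selection does not matter
    if FIXED_END <= addr < WINDOW_END:
        return bank * BANK_SIZE + (addr - FIXED_END)
    raise ValueError(f"address {addr:#06x} is not in ROM under this mapping")


def cpu_address(offset: int):
    """Inverse mapping: ROM file offset -> (bank, CPU address)."""
    bank, within = divmod(offset, BANK_SIZE)
    if bank == 0:
        return 0, within
    return bank, FIXED_END + within


print(hex(file_offset(0x4123, bank=3)))
```

Labels in a banked disassembly usually need the bank number as part of their name, since the same CPU address refers to different code depending on which bank is selected.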

Practical workflow — step-by-step

  1. Gather: obtain ROM/binary and any platform docs (memory map, interrupt vectors, common entry points).
  2. Choose base address: for raw bin use platform knowledge or identify vectors/strings to align.
  3. Run an initial disassembly with a tool (Ghidra, IDA, z80dasm) producing an annotated listing.
  4. Run the binary in an emulator with breakpoints/logging to observe actual execution and confirm code paths.
  5. Mark data regions and correct misinterpreted code; refine labels and function boundaries.
  6. Identify subroutines, annotate calls, and collect higher-level constructs (loops, tables).
  7. Reassemble to validate changes if you plan to patch the binary. Use an assembler that targets the same conventions.
  8. Document findings: register conventions, I/O ports used, memory maps, and known hardware interactions.
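
Step 7 is easy to automate: after reassembling an unmodified listing, the output should match the original byte for byte. A minimal Python check might look like the following; `diff_images` is a hypothetical helper, and the inputs are assumed to be the original image and the assembler's output already read into memory.

```python
from itertools import zip_longest


def diff_images(original: bytes, rebuilt: bytes, base: int = 0):
    """Return (address, original byte, rebuilt byte) for every mismatch.

    If the lengths differ, the missing side is reported as None.
    """
    return [
        (base + i, a, b)
        for i, (a, b) in enumerate(zip_longest(original, rebuilt))
        if a != b
    ]


before = bytes.fromhex("3E 05 C9")
after = bytes.fromhex("3E 06 C9")
for addr, old, new in diff_images(before, after, base=0x0100):
    print(f"0x{addr:04X}: {old!r} -> {new!r}")
```

An empty result on an unpatched round trip is good evidence that the code/data split and the assembler's syntax conventions are right; after patching, the list should contain only the changes you intended.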

Example tools and commands

  • z80dasm (basic): z80dasm -a -l -g 0x0100 -o output.asm input.bin (here -g sets the load address, -a adds address comments, and -l generates labels)

  • Ghidra:

    • Create new project, import binary as “Raw Binary”, set load address, choose Z80 processor, run auto-analysis.
  • Using an emulator (Fuse for Spectrum):

    • Load snapshot/ROM, set breakpoints at suspected routine entry points, single-step and log PC.

(Commands vary by tool version; consult tool help for exact options.)


Tips and best practices

  • Keep a change log of manual annotations and decisions — these save time when revisiting a complex ROM.
  • Work iteratively: run, inspect, annotate, rerun. Disassembly accuracy improves with each pass.
  • Use multiple tools where helpful: one tool’s heuristics may outperform another’s on a given binary; combining results often yields the most accurate picture.
  • Learn common library/code idioms for the platform (e.g., ZX Spectrum BASIC ROM routines, CP/M BDOS calls) — these speed identification of purpose and boundaries.
  • Be conservative with assumptions: when in doubt, mark an ambiguous region and return after more evidence.
  • Preserve originals and work on copies when patching or modifying binaries.

Further learning resources

  • Z80 CPU user manual and official opcode tables (Zilog documentation).
  • Platform-specific technical references (e.g., ZX Spectrum Technical Guide, MSX documentation, Game Boy CPU notes for similar 8-bit CPUs).
  • Open-source disassemblers’ source code to learn how they implement prefix handling and heuristics.
  • Community forums and retrocomputing groups for platform-specific tips and undocumented quirks.

Conclusion

Disassembling Z80 code is a manageable and rewarding task once you understand the CPU’s instruction encoding, prefixes, and common platform conventions. Start with good tools, use recursive flow analysis, separate data from code with heuristics and emulator traces, and iterate. Over time you’ll build a mental library of common routines and patterns that make subsequent reverse-engineering faster and more accurate.
