Assemble, Verify and Execute a Program
Assembly program creation
The process of creating an Assembly program goes through the following steps:
- Writing one or more ASCII files (extension .s) containing the source program, using an ordinary text editor.
- Assembly of the source files, and generation of the object files (extension .o), using an assembler.
- Creation of the executable file, using a linker.
- Verification of operation and correction of any errors, via a debugger.
Assembler
The assembler translates each source file into a corresponding object file containing machine code. The GNU assembler (gas, invoked as as) will be used throughout the course.
To assemble a file, you must run the following command:
as -o myfile.o myfile.s
Consult the documentation (man as) for a list of available options.
Linker
The linker combines object modules and produces a single executable file. Specifically: it combines the object modules, resolving references to external symbols; it searches for library files containing the external procedures used by the various modules; and it produces a relocatable, executable module. Since linking creates the binary module to be loaded, the operation must be performed even if the program consists of only one object module.
In particular, the ld linker will be used during this course.
To create the executable from an object file, the following command must be executed:
ld -o myfile myfile.o
To learn more about the available options of ld, you can consult the manual.
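As a minimal sketch of what the as and ld commands above operate on, the following source file could be saved as myfile.s. It assumes a 32-bit x86 Linux target and the int $0x80 system-call interface, neither of which is stated in the text above; on a 64-bit host, the as option --32 and the ld option -m elf_i386 can be used to build it as 32-bit code.

```asm
# myfile.s: minimal program that exits immediately (illustrative sketch).
# Assumption: 32-bit x86 Linux, system calls made through int $0x80.

        .text
        .globl  _start          # make the entry point visible to the linker

_start:                         # ld uses _start as the default entry point
        movl    $1, %eax        # system call number 1 = exit
        movl    $0, %ebx        # exit status 0
        int     $0x80           # transfer control to the kernel
```

After assembling and linking, running ./myfile followed by echo $? in the shell should print the exit status 0.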
Debugger
A debugger is a software tool that lets you observe and control the execution of other programs. It is indispensable for finding errors (bugs, hence the name debugger) in complex programs. The main features of a debugger are:
- ability to execute the program “step by step”;
- possibility of conditionally stopping program execution by inserting breakpoints (i.e., points at which the execution flow is stopped and control passes to the debugger);
- ability to view and possibly modify the contents of registers and memory.
The most widely used debugger in the Linux environment is gdb. Gdb runs in text mode, so commands are typed at its prompt in a terminal. However, a number of graphical front ends have been developed to simplify its use; one of the most popular is ddd.
In order to use a debugger, programs must be assembled with debugging information and then linked, using the following command lines:
as --gstabs -o myfile.o myfile.s
ld -o myfile myfile.o
The --gstabs option places the information needed by the debugger in the object file, and thus in the executable. To start gdb, run the gdb command. To start ddd, run the ddd command.
The following is a table summarizing the most common gdb commands:
Command | Command Description |
---|---|
file name_executable | Loads the program to be debugged. |
break line_number | Sets a breakpoint at the specified line. |
run | Starts the program. Execution pauses when the first breakpoint is reached. If there are no breakpoints, the program runs normally, without interruption. |
step | Executes the current instruction while the program is suspended (for example, at a breakpoint). Repeat the step command to continue executing one instruction at a time. |
next | Like step, executes the current instruction, but a function call is executed as a single step, without stopping at the instructions inside the called function. |
continue | Continues program execution until the next breakpoint. |
finish | Continues program execution until the current function returns. |
info registers | Displays the contents of the registers. |
p/format $register | Prints the contents of the given register in the format selected by format: x for hexadecimal, o for octal, d for decimal, t for binary. For example, p/t $eax prints the contents of the eax register in binary. |
x/nw address | Displays n memory words (4 bytes each) starting at the given address. For example, if a memory area is labeled location, the command x/4w &location displays 4 words starting at the address associated with that label. |
help | Displays gdb's built-in help. |
The above commands can also be executed using ddd, a GUI for gdb. In that case, they are invoked by clicking the corresponding toolbar buttons or menu items.
Instruction format
An instruction has the following format:
label: operation operand1, operand2
The label is optional. The number of operands depends on the operation.
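The short program below illustrates the format with instructions that take two, one, and no operands, with and without a label. It is a sketch written in the AT&T syntax that gas expects; the label name done is hypothetical, and a 32-bit x86 Linux target is assumed.

```asm
# Assumption: 32-bit x86 Linux, system calls made through int $0x80.

        .text
        .globl  _start

_start:                         # a label may stand on its own line
        movl    $1, %eax        # two operands: eax = 1 (exit system call number)
        movl    $0, %ebx        # two operands: ebx = 0
done:   incl    %ebx            # label and instruction on one line; one operand
        nop                     # no operands
        int     $0x80           # one operand: call the kernel, which runs exit(ebx)
```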
AT&T syntax
There are two main syntaxes for x86 assembly language: the Intel syntax and the AT&T syntax. The gas assembler uses the latter. Comparing AT&T syntax with Intel syntax, the following differences can be seen:
- In AT&T syntax, register names have % as a prefix, so the registers are written %eax, %ebx, and so on, instead of just eax, ebx, etc. This makes it possible to include external C symbols directly in the assembly source without any risk of confusion with register names and without any need for prefixed underscores.
- In AT&T syntax, the order of the operands is the opposite of that in Intel syntax: source first, then destination. Thus, what in Intel syntax is mov eax,edx (copies the contents of the EDX register into the EAX register) becomes mov %edx,%eax in AT&T syntax.
- In AT&T syntax, the size of the operand is specified by a suffix on the instruction name. The suffix is b for byte (8 bits), w for word (16 bits), and l for long, i.e. double word (32 bits). For example, the fully specified form of the instruction just mentioned is movl %edx,%eax. However, since gas does not require strict AT&T syntax, the suffix is optional when the operand size can be derived from the registers used in the operation. Otherwise, it is set to 32 bits (with a warning).
- Immediate operands are denoted by the prefix $. For example, addl $5,%eax adds the long value 5 to the EAX register.
- The absence of a prefix in an operand indicates that it is a memory address. Therefore, the instruction movl $var_tmp,%eax puts the address of the variable var_tmp in the EAX register, while movl var_tmp,%eax puts the contents of the variable var_tmp in the EAX register.
- Indexing or indirection is achieved by enclosing the index register, or the register holding the address of the memory cell, in parentheses. For example, the instruction testb $0x80,17(%ebp) tests the highest bit of the byte at offset 17 from the address contained in EBP.
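The conventions listed above can be seen together in one small program. This is only a sketch: it assumes a 32-bit x86 Linux target (it must be built as 32-bit code, since it uses %esp and %ebp as addresses), and the initial value stored in var_tmp is chosen purely for illustration.

```asm
# Assumption: 32-bit x86 Linux, system calls made through int $0x80.

        .data
var_tmp:
        .long   7                   # 32-bit variable used below

        .text
        .globl  _start
_start:
        movl    $3, %edx            # $ marks an immediate operand
        movl    %edx, %eax          # source first, then destination (Intel: mov eax,edx)
        movw    %dx, %ax            # w suffix: 16-bit operands
        movb    %dl, %al            # b suffix: 8-bit operands
        addl    $5, %eax            # eax = eax + 5
        movl    $var_tmp, %ecx      # ecx = address of var_tmp
        movl    var_tmp, %ebx       # ebx = contents of var_tmp (7)
        movl    %esp, %ebp          # give ebp a valid address before using it
        testb   $0x80, 17(%ebp)     # test bit 7 of the byte at address ebp + 17
        movl    $1, %eax            # system call number 1 = exit
        int     $0x80               # exit status is the value in ebx (7)
```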
Declaring Static Data Regions
Static data regions (analogous to global variables) can be declared using special assembler directives for this purpose. Data declarations should be preceded by the .data directive. Following this directive, the directives .byte, .short, and .long can be used to declare one-, two-, and four-byte data locations, respectively.
To refer to the address of the data created, we can label it. Labels are very useful and versatile in assembly; they give names to memory locations whose addresses are resolved later by the assembler or linker. This is similar to declaring variables by name, but it obeys some lower-level rules. For example, locations declared sequentially will be found in memory next to each other.
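A minimal sketch of labeled declarations follows. The label names flag, count, and total are hypothetical, and a 32-bit x86 Linux target is assumed. Because gas inserts no padding between these directives, the three locations occupy consecutive addresses.

```asm
# Assumption: 32-bit x86 Linux, system calls made through int $0x80.

        .data
flag:
        .byte   1                   # 1 byte at address flag
count:
        .short  300                 # 2 bytes at address flag + 1
total:
        .long   100000              # 4 bytes at address flag + 3

        .text
        .globl  _start
_start:
        movl    $total, %ecx        # ecx = address of total
        movl    total, %edx         # edx = contents of total (100000)
        movl    $0, %ebx            # clear ebx before loading a single byte
        movb    flag, %bl           # bl = contents of flag (1)
        movl    $1, %eax            # system call number 1 = exit
        int     $0x80               # exit status is the value in ebx (1)
```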
Unlike high-level languages, where arrays can have many dimensions and are accessed by indexes, arrays in assembly language are simply a number of cells placed contiguously in memory. An array can be declared simply by listing values, as in the first example below. For the special case of a byte array, string literals can be used. When a large area of memory is to be filled with zeros, you can use the .zero directive.
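The three kinds of array declaration just described might look as follows. This is a sketch only: the label names values, message, and buffer are hypothetical, the .ascii directive is used here for the string literal, and a 32-bit x86 Linux target is assumed.

```asm
# Assumption: 32-bit x86 Linux, system calls made through int $0x80.

        .data
values:
        .long   10, 20, 30, 40      # array of four consecutive 32-bit values
message:
        .ascii  "hello\n"           # byte array of 6 characters, no terminating zero
buffer:
        .zero   64                  # 64 bytes, all initialized to zero

        .text
        .globl  _start
_start:
        movl    values + 8, %ebx    # third element, 8 bytes past values (30)
        movl    $1, %eax            # system call number 1 = exit
        int     $0x80               # exit status is the value in ebx (30)
```

Since each element of values is four bytes long, element i starts at address values + 4*i; the assembler computes the constant expression values + 8 when the file is assembled.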