Assemble, Verify and Execute a Program
Assembly program creation
The process of creating an Assembly program goes through the following steps:
- Writing one or more ASCII files (extension .s) containing the source program, using an ordinary text editor.
- Assembly of the source files, and generation of the object files (extension .o), using an assembler.
- Creation of the executable file, using a linker.
- Verification of operation and correction of any errors, via a debugger.
The assembler transforms each source file into a corresponding object file containing machine code. The GNU assembler (gas, invoked as as) will be used throughout the course.
To assemble a file, you must run the following command:
as -o myfile.o myfile.s
Consult the documentation (man as) for a list of available options.
The linker combines object modules and produces a single executable file. Specifically: it combines the object modules, resolving references to external symbols; it searches for library files containing the external procedures used by the various modules; and it produces a relocatable, executable module. Since linking creates the binary module to be loaded, the operation must be performed even if the program consists of only one object module.
In particular, the ld linker will be used during this course.
To create the executable from an object file, the following command must be executed:
ld -o myfile myfile.o
To learn more about the available options of ld, you can consult the manual.
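As an illustration of the assemble and link steps, here is a minimal sketch of a source file that the two commands above could process. The file name exit42.s and the exit status 42 are hypothetical, and the code assumes a 32-bit x86 Linux system, where system call number 1 is exit and is invoked through int $0x80. On a 64-bit host you would probably need to pass --32 to as and -m elf_i386 to ld.

```asm
# exit42.s (hypothetical name): does nothing except terminate with status 42
.text
.globl _start              # make _start visible to the linker
_start:                    # ld uses _start as the default entry point
    movl $1, %eax          # system call number 1 = exit (32-bit Linux)
    movl $42, %ebx         # first argument: the exit status
    int  $0x80             # hand control to the kernel
```

After running as -o exit42.o exit42.s and ld -o exit42 exit42.o, executing ./exit42 followed by echo $? should report the status 42.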
A debugger is a software tool that allows you to verify the execution of other programs. Its use is indispensable for finding errors (bugs, hence the name debugger) in programs of high complexity. The main features of a debugger are:
- ability to execute the program “step by step”;
- possibility of conditionally stopping program execution by inserting breakpoints (i.e., points at which the execution flow is stopped and control passes to the debugger);
- ability to view and possibly modify the contents of registers and memory.
The most widely used debugger in the Linux environment is gdb. Gdb runs in text mode, so commands are typed at its prompt in a terminal. However, a number of graphical front-ends have been developed to simplify its use; one of the most popular is ddd.
In order to use a debugger, programs must be assembled and linked appropriately using the following command lines:
as -gstabs -o myfile.o myfile.s
ld -o myfile myfile.o
The -gstabs option places the information needed by the debugger in the object file, and thus in the executable. To start gdb, run the gdb command. To start ddd, run the ddd command.
The following is a table summarizing the most common gdb commands:
|Loads the program to be debugged.|
|Sets a breakpoint at the specified line.|
|Executes the program. Program execution pauses when the first breakpoint is reached. In case there are no breakpoints, execution takes place normally, without interruption.|
|Executes the current instruction when program execution is suspended at a breakpoint. Repeat the step command to continue executing one instruction at a time.|
|Like the step command, executes the current instruction, but a function call is executed as a single step, without showing the instructions that make it up.|
|Continues program execution until the next breakpoint.|
|Continues program execution to the end.|
|Displays the contents of the registers.|
|Prints the contents of the |
|Displays the contents of n memory words from the location whose address is given. For example, if a memory area is labeled |
|Shows instructions for using the built-in help.|
The above commands can also be executed using ddd, a GUI for gdb. In that case, they are invoked by clicking the corresponding toolbar buttons or menu items.
An assembly instruction has the following general form:
label: operation operand1, operand2
The label is optional. The number of operands depends on the type of operation.
There are two main syntaxes for x86 assembly language: the Intel syntax and the AT&T syntax. The gas assembler uses the latter by default. Comparing AT&T syntax with Intel syntax, the following differences can be seen:
- In AT&T syntax, register names have % as a prefix, so that registers are written %eax, %ebx and so on instead of just eax, ebx, etc. This makes it possible to include external C symbols directly in the assembly source without any risk of confusion and without any need for prefixed underscores.
- In AT&T syntax, the order of operands is the opposite of that in Intel syntax, i.e. source first, then destination. Thus, what in Intel syntax is mov eax,edx (loads the contents of the EDX register into the EAX register) becomes mov %edx, %eax in AT&T syntax.
- In AT&T syntax, the length of the operand is specified by a suffix to the name of the instruction. The suffix is b for byte (8 bits), w for word (16 bits) and l for double word (32 bits). For example, the strictly correct form of the instruction mentioned just now is movl %edx,%eax. However, since gas does not require strict AT&T syntax, the suffix is optional when the length of the operand can be derived from the registers used in the operation. Otherwise, it is set to 32 bits (with a warning).
- Immediate operands are denoted by the prefix $. For example, addl $5,%eax adds the 32-bit value 5 to the EAX register.
- The absence of a prefix in an operand indicates that it is a memory address. Therefore, the instruction movl $var_tmp,%eax puts the address of the variable var_tmp in the EAX register, while movl var_tmp,%eax puts the contents of the variable var_tmp in the EAX register.
- Indexing or indirection is achieved by enclosing the index register, or the register holding the address of the memory cell, in parentheses. For example, the instruction testb $0x80,17(%ebp) tests the most significant bit of the byte located at offset 17 from the address held in EBP.
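The AT&T conventions listed above can be seen together in a short sketch. It reuses the name var_tmp from the text; the initial value 0x12345678, the offset 3 and the final exit sequence (32-bit Linux, system call 1 through int $0x80) are assumptions made only to obtain a complete program that can be assembled and stepped through in a debugger.

```asm
.data
var_tmp:
    .long 0x12345678          # a 32-bit variable (the value is arbitrary)

.text
.globl _start
_start:
    movl  %edx, %eax          # source first, destination second: EAX = EDX
    movw  %dx, %ax            # w suffix: 16-bit operands
    movb  %dl, %al            # b suffix: 8-bit operands
    addl  $5, %eax            # $ prefix: immediate value 5 added to EAX
    movl  $var_tmp, %eax      # with $: the address of var_tmp goes into EAX
    movl  var_tmp, %ebx       # without $: the contents of var_tmp go into EBX
    testb $0x80, 3(%eax)      # indirection: test bit 7 of the byte at EAX + 3

    movl  $1, %eax            # exit system call (assumed 32-bit Linux)
    movl  $0, %ebx            # exit status 0
    int   $0x80
```

The operand of testb uses EAX rather than EBP here only because EAX is known to hold a valid address at that point.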
Declaring Static Data Regions
Static data regions (analogous to global variables) can be declared using special assembler directives for this purpose. Data declarations should be preceded by the .data directive. Following this directive, the directives .byte, .short and .long can be used to declare one-, two- and four-byte data locations, respectively.
To refer to the address of the data created, we can label it. Labels are very useful and versatile in assembly; they give names to memory locations that will be identified later by the assembler or linker. This is similar to declaring variables by name, but it obeys some lower-level rules. For example, locations declared sequentially will be found in memory next to each other.
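A minimal sketch of labeled declarations, assuming the standard gas directives .byte, .short and .long for one-, two- and four-byte values on x86. The label names and values are hypothetical.

```asm
.data
flag:                      # label naming a one-byte location
    .byte  1
count:                     # two-byte location, placed right after flag
    .short 300
total:                     # four-byte location, placed right after count
    .long  100000
```

Because the three declarations are sequential, count lies one byte past flag and total lies two bytes past count. The assembler inserts no padding unless an alignment directive asks for it.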
Unlike in high-level languages, where arrays can have many dimensions and are accessed by indices, arrays in assembly language are simply a number of cells placed contiguously in memory. An array can be declared simply by listing values, as in the first example below. For the special case of a byte array, string literals can be used. When a large area of memory must be filled with zeros, you can use the .zero directive.