x86 ASM: Intel to AT&T

Computer Science 240 (assembly programming) at Cal State Fullerton uses MASM, but programming on Linux requires an open-source assembler, usually GAS (GNU Assembler). This is a quick guide for those who have taken CS 240 (or are otherwise familiar with MASM) to start coding with GAS.

Assembly syntax depends on the architecture, of course. But even within a single architecture, namely x86, there are different syntaxes.

The two major syntaxes are known as Intel syntax and AT&T syntax. MASM (Microsoft Macro Assembler) on Windows uses Intel syntax, while GAS (GNU Assembler) on Linux systems uses AT&T syntax by default (though it also supports Intel syntax through the .intel_syntax directive). There is another open-source assembler on Linux, NASM (Netwide Assembler), which uses Intel syntax, but NASM syntax differs from MASM syntax in some ways.

Note: This was originally written for Computer Science 254, on the assumption that actual assembly programming on an open-source system was required. However, this is not the case: An assembly file was provided (and it was written for NASM, not GAS). I’ll keep this up in the spirit of introducing people to open-source systems.

Let’s Get to It

To start, read this guide. It covers most of the main differences. The rest of this page will explain additional differences not covered by that guide.

Immediate Values

In Intel syntax, hexadecimal immediate values have an h at the end: 1234h. In AT&T syntax, they have 0x at the beginning: 0x1234. You do not need to add an extra 0 at the beginning if it starts with a letter: 0ABCDh in Intel syntax would simply be 0xABCD in AT&T syntax (although 0x0ABCD would work just fine).

In Intel syntax, binary immediate values have a b at the end: 10101111b. In AT&T syntax, they have 0b at the beginning: 0b10101111.
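As a quick illustration, here is a minimal sketch of both notations used as instruction operands. The register choices are arbitrary, and the $ prefix is what marks each number as an immediate operand in AT&T syntax.

```asm
        .text
        .globl main
        .type main, @function
main:
        mov $0x1234, %ax        # MASM: mov ax, 1234h
        mov $0xABCD, %dx        # MASM: mov dx, 0ABCDh (no leading 0 needed)
        mov $0b10101111, %cl    # MASM: mov cl, 10101111b
        mov $0, %eax            # return 0 from main
        ret
```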

Comments

In Intel syntax, comments are marked with ; (a semicolon). In AT&T syntax, comments are marked with # (a hash mark).

Section Directives

The .data directive is the same in MASM and GAS. However, the .code directive in MASM is .text in GAS. Don’t ask me why that is.
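Putting comments and section directives together, a bare-bones file might look like the sketch below. The variable name counter is made up for illustration, and the memory access assumes a 32-bit, non-PIE build (for example, gcc -m32 -no-pie).

```asm
# A comment in GAS starts with a hash mark (MASM: ;)
        .data                   # same directive as in MASM
counter: .int 0                 # hypothetical variable

        .text                   # MASM: .code
        .globl main
        .type main, @function
main:
        mov counter, %eax       # load the variable (0) as the return value
        ret
```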

Procedures

Procedures in MASM start with [name] PROC and end with [name] ENDP. But in GAS, they are simply labels and have the same syntax as ordinary in-procedure labels. However, you need to add two lines so that they are visible to other modules:

.globl [name]
.type [name], @function

(Yes, that’s “globl” with no A. You can add an A, actually; GAS accepts either.)

These lines can be just about anywhere in the file, but it’d be good to put them right before the label.
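Here is a small sketch of a complete procedure and a call to it. The name add_two is hypothetical; the MASM equivalents are noted in the comments.

```asm
        .text

        .globl add_two          # make the label visible to other modules
        .type add_two, @function
add_two:                        # MASM: add_two PROC
        add $2, %eax            # eax = eax + 2
        ret                     # MASM would follow this with add_two ENDP

        .globl main
        .type main, @function
main:
        mov $40, %eax
        call add_two            # eax is now 42
        ret                     # main returns 42
```

If you build this with gcc and run it, the value returned from main becomes the exit status, so echo $? in the shell should print 42.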

Variables

The syntax for variables is very different. In GAS, they are also just labels. What follows is a “data directive.”

  • A variable definition in MASM looks like this:

      myval DWORD 0
    

    In GAS, this would be:

      myval: .int 0
    

    Here, .int is a data directive. The main data directives in GAS are .byte (BYTE in MASM), .word (WORD), .int (DWORD), and .quad (QWORD). There are no separate signed and unsigned directives.

  • GAS does not support MASM’s ? for uninitialized variables. Instead, just initialize them to 0.

  • Just as in MASM, you can have multiple values to make an array:

      myarray: .int 1, 2, 3
    
  • The equivalent of DUP in MASM is another data directive, .fill [count], [size], [value]. So, to have 10 doublewords of value 1, you would write:

      ones: .fill 10, 4, 1
    

    You can leave off [value], in which case it will be set to zero.

  • Single and double quotes do matter to GAS. Single quotes indicate characters and double quotes indicate strings. But you should probably be doing that anyway.

  • Strings require the .ascii or .asciz data directives. The .ascii directive makes a byte array without a null terminator, while .asciz makes a byte array with a null terminator. You can use the same escapes (backslashes) as you would in C and C++. For example:

      str1: .ascii "test" #Four bytes
      str2: .asciz "test" #Five bytes
    
  • Because variable names are just labels, there is no need for MASM’s LABEL. Simply place two labels before the same data directive to achieve the desired effect.

  • Variable definitions don’t have actual type information, so you can access data however you want:

      .data
      mydata: .byte 0x78, 0x56, 0x34, 0x12
    
      .text
      .globl main
      .type main, @function
      main:
      	mov mydata, %al #Move a byte
      	mov mydata, %bx #Move a word
      	mov mydata, %ecx #Move a doubleword
    
  • The fact that variables are just labels means you don’t have TYPE, LENGTHOF, and SIZEOF (as far as I can tell), and you may accidentally access memory with the wrong size without any warning.
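To tie a few of these points together, here is a sketch that uses two labels on one location (standing in for MASM's LABEL) and explicit size suffixes to avoid accidental wrong-size accesses. It also shows one common workaround for the missing SIZEOF, which is my addition rather than something covered above: subtracting a label from the current location (written as a dot) gives a size in bytes. All names are hypothetical, and the code assumes a 32-bit, non-PIE build (for example, gcc -m32 -no-pie).

```asm
        .data
myarray:        .int 1, 2, 3
        .set myarray_size, . - myarray  # bytes from myarray to here: 12

mydword:                                # two labels on one location,
mybytes:        .int 0x12345678         # standing in for MASM's LABEL

        .text
        .globl main
        .type main, @function
main:
        movb mybytes, %al               # explicit byte access: al = 0x78
        movw mybytes, %cx               # explicit word access: cx = 0x5678
        movl mydword, %edx              # explicit doubleword: edx = 0x12345678
        mov $myarray_size, %eax         # eax = 12, returned from main
        ret
```

The byte and word values follow from x86 being little-endian: the lowest-order byte of 0x12345678 is stored first.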

Variable Addresses

MASM uses OFFSET to get the address of a variable, while GAS uses $ (a dollar sign). Because the dollar sign is also used for immediate values (constants), you can think of it as saying, “Treat this variable name as a constant, that is, an address.”
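The difference between the address and the contents is easiest to see side by side. This is a minimal sketch with a hypothetical variable myval, again assuming a 32-bit, non-PIE build.

```asm
        .data
myval:  .int 7                  # hypothetical variable

        .text
        .globl main
        .type main, @function
main:
        mov $myval, %edx        # edx = address of myval (MASM: mov edx, OFFSET myval)
        mov myval, %eax         # eax = contents of myval, that is, 7
        mov (%edx), %ecx        # ecx = the same 7, read through the address
        sub %ecx, %eax          # eax = 0, returned from main
        ret
```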

Suffixes: Special Cases

As mentioned in the guide I linked, AT&T syntax uses instruction suffixes to resolve ambiguities in operand sizes (MASM uses PTR). A case that was not mentioned is MOVSX and MOVZX.

In AT&T syntax, these would be movs and movz, followed by two suffixes. The first suffix is the size of the source operand, and the second is the size of the destination operand. So, for example, movswl is “move with sign extension from word to long (doubleword).”

Be careful with memory operands! Remember that variables don’t have actual data types, so you always need to specify the size when working with memory operands. Also, GAS will not warn you if you forget this and use movsx or movzx; it will assume the memory operand is a byte (at least on my computer with version 2.26).
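Here is a sketch of the two-suffix forms applied to memory operands, where spelling out the source size matters most. The variable names are hypothetical, and a 32-bit, non-PIE build is assumed.

```asm
        .data
small:  .byte 0xF0              # hypothetical byte (-16 if read as signed)
medium: .word 0xFFFE            # hypothetical word (-2 if read as signed)

        .text
        .globl main
        .type main, @function
main:
        movsbl small, %eax      # byte to long, sign-extended: eax = 0xFFFFFFF0
        movzbl small, %ecx      # byte to long, zero-extended: ecx = 0x000000F0
        movswl medium, %edx     # word to long, sign-extended: edx = 0xFFFFFFFE
        movzbw small, %ax       # byte to word, zero-extended: ax = 0x00F0
        mov $0, %eax            # return 0 from main
        ret
```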

For 64-bit programming, the suffix for quadwords is q.
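For example, a 64-bit version of the same ideas might look like the sketch below. Note that it reaches variables with RIP-relative addressing (the (%rip) part), which is the usual approach in 64-bit code but is not something this page covers, so treat that detail as an assumption about your build. The variable names are hypothetical.

```asm
        .data
counter: .quad 5                        # hypothetical 8-byte variable
small:   .int -1                        # hypothetical 4-byte variable

        .text
        .globl main
        .type main, @function
main:
        movq counter(%rip), %rax        # rax = 5
        addq $1, counter(%rip)          # the q suffix gives the operand size
        movslq small(%rip), %rcx        # long to quad, sign-extended: rcx = -1
        movzbq small(%rip), %rdx        # byte to quad, zero-extended: rdx = 0xFF
        mov $0, %eax                    # return 0 from main
        ret
```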

Philip Chung
Software Developer