pyga v255 (WIP)

(Python) Gameboy Advance Assembly Tool

http://labmaster.bios.net.nz/pyga/ - this file last updated Tue 10 Jul 2007 16:51:30 NZST
Discussion: RHDN
Download: pyga-wip-070710a, win32-runtimes


1. Introduction

pyga v0.1 (suggestions for a new name welcomed) was originally conceived as an ARM assembler for the Gameboy Advance written in Python. This documentation is currently provided online to supplement the program, which has not yet been officially 'released' for general use. Refer to the RHDN thread for further information, support, and discussion.

Usage/Setup instructions can be found in the Usage section. Please ensure you read the Features section thoroughly before use.

pyga is distributed under a BSD-style license. See LICENSE (in the main download package) for details.

2. Features

Compatibility with goldroad

pyga is not, and never will be, 100% compatible with goldroad. However, in most cases getting goldroad code to assemble correctly should be a largely trivial exercise.

At the moment, the largest barrier to compatibility is unimplemented features. Many directives have not yet been implemented (I'll be working through them in order of perceived usefulness), and ARM opcodes are not supported yet, so don't even bother with those.

The next biggest difference is that labels in pyga are case-sensitive, whereas in goldroad they are not. The case sensitivity arises from labels being implemented as actual Python variables (Python names are case-sensitive), and I do not plan to work around this, so you're just going to have to be consistent from now on.

Literal prefixes: In goldroad, a '#' can prefix any literal, and either '0x' or '$' may be used to denote hexadecimal literals. At the moment, pyga copes with these in a rather inelegant way, and this support may change in the future. It is therefore strongly recommended that you use '0x' (not '$') to prefix hexadecimal literals and that you do not prefix any literal with '#'. This is because in Python '0x' is the hexadecimal prefix and the '#' character comments out the rest of the line. Additionally, in Python 2 a leading '0' is the octal prefix, so the literal '010' actually evaluates to 8 and '099' generates an error. The moral of the story: do not begin decimal literals with a '0'. Lastly, string literals are not supported.
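The following is a hypothetical pre-processing helper, not part of pyga, that applies the recommendations above to a single literal: it drops a leading '#', rewrites a '$' prefix as '0x', and rejects decimal literals with a leading zero rather than letting Python 2 read them as octal. It is a sketch for illustration only and does not handle full expressions.

```python
import re

# Hypothetical helper (not part of pyga): normalise one goldroad-style literal
# into the Python-friendly form recommended above.
_HEX_DOLLAR = re.compile(r"^\$([0-9A-Fa-f]+)$")
_HEX_0X = re.compile(r"^0[xX][0-9A-Fa-f]+$")
_DECIMAL = re.compile(r"^[0-9]+$")


def normalise_literal(text):
    """Return a literal string that Python evaluates to the intended value."""
    s = text.strip()
    if s.startswith("#"):          # '#' would start a Python comment
        s = s[1:].strip()
    m = _HEX_DOLLAR.match(s)
    if m:                          # '$3FC' -> '0x3FC'
        return "0x" + m.group(1)
    if _HEX_0X.match(s):           # already in the recommended form
        return s
    if _DECIMAL.match(s):
        if len(s) > 1 and s[0] == "0":
            # Python 2 reads '010' as octal 8, and '099' is a syntax error
            raise ValueError("decimal literal %r has a leading zero" % text)
        return s
    raise ValueError("unsupported literal %r" % text)


print(normalise_literal("#$3FC"))   # 0x3FC
print(normalise_literal("#0x3FC"))  # 0x3FC
print(normalise_literal("99"))      # 99
```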

Finally, there are bound to be small differences in the syntax allowed. Because parsing is implemented quite differently in the two programs, goldroad will likely let you get away with little things that pyga won't, and vice versa.

3. Usage

Prerequisites

pyga is written in Python, an interpreted programming language. If you already have Python installed on your system, you're good to go: pyga should run out of the box. If you do not have Python installed, you will need to install it (visit www.python.org), unless you are a Windows user, in which case see below.

If you are using Windows, you may want to install the full interpreter anyway; it's a good excuse to check out Python. However, it is a ~10MB download, so if you don't want to download it, you can download the pyga win32 runtimes package I've provided, which is only ~2MB. This is a special set of files that lets you run pyga on your system as if you had Python installed. It's not a proper Python installation, though, and you won't be able to do anything else with it. If you have downloaded the runtimes package, simply extract the zip file into the same directory as your main pyga files (pyga.exe and pyga.py should now be in the same directory).

Running

pyga is a command-line program. The way in which you invoke pyga will depend on your system setup. Below are a few examples ('$' is the command prompt):

*nix (python interpreter installed to /usr/bin/python):
$ ./pyga.py <args>

windows (pyga win32 runtimes downloaded):
$ pyga <args>

windows (python installed):
$ python pyga.py <args>

windows (python installed, not on PATH - default is c:\Python25\python):
$ c:\Path\To\python pyga.py <args>

Running pyga with the '--help' option will give you usage instructions:

$ ./pyga.py --help
Usage: pyga.py [options] input [output]
Assembles pyga script.

Options:
  -h, --help            show this help message and exit
  -d                    Outputs intermediate file to <input>.py for debugging
                        purposes.
  --debug=FILE          Outputs intermediate file to FILE for debugging
                        purposes.
  -f FORMAT, --format=FORMAT
                        Specify output format.

If a FORMAT option is omitted, the script will attempt to guess the format
desired from the output file, if provided. Default is a binary file. FORMAT
may take the following values:

 bin    New binary file
 patch  Patch existing file
 cons8  Output 8-bit formatted hexadecimal data to console
 cons16 Output 16-bit formatted hexadecimal data to console
 cons32 Output 32-bit formatted hexadecimal data to console
 ips    IPS patch (subject to format restrictions)
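The console formats print the assembled bytes as hexadecimal. Judging from the sample output in the Notes section (for example `000A49CC: 4901 4708 ...`), cons16 appears to print a file offset followed by little-endian 16-bit values, eight per line. The `dump` function below is a sketch of that layout under those assumptions; its name and parameters are illustrative rather than pyga's own code, and unlike pyga's output it does not skip over gaps between assembled regions.

```python
import struct


def dump(data, base=0, width=16, per_line=8):
    """Format bytes as 'OFFSET: XXXX XXXX ...' using little-endian units.

    width is 8, 16 or 32 bits, mirroring cons8/cons16/cons32 (assumed).
    """
    size = width // 8
    fmt = {1: "<B", 2: "<H", 4: "<I"}[size]
    if len(data) % size:
        data = data + b"\x00" * (size - len(data) % size)  # pad a short tail
    values = [struct.unpack_from(fmt, data, i)[0] for i in range(0, len(data), size)]
    lines = []
    for i in range(0, len(values), per_line):
        chunk = values[i:i + per_line]
        hexes = " ".join("%0*X" % (size * 2, v) for v in chunk)
        lines.append("%08X: %s" % (base + i * size, hexes))
    return "\n".join(lines)


print(dump(b"\x63\x23\x04\xe0"))  # 00000000: 2363 E004
```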

4. Assembler Directives

pyga is designed to be similar to (but by no means compatible with) goldroad. As such, many assembler directives purport to behave in the same way as their goldroad equivalents. Directives are prefixed by a single '@' and those currently implemented are listed here.

@dc8, @dc16, @dc32, @dcb, @dcd, @dcw, @define, @endarea, @fsize, @incbin, @include, @ltorg, @org, @pool, @textarea

@dcb, @dcw, @dcd, @dc8, @dc16, @dc32

syntax:
@dcb BYTE, BYTE, BYTE...
@dc8 BYTE, BYTE, BYTE...
@dcw HALFWORD, HALFWORD, HALFWORD...
@dc16 HALFWORD, HALFWORD, HALFWORD...
@dcd WORD, WORD, WORD...
@dc32 WORD, WORD, WORD...

Assembles raw data to file. Note that, for our purposes, a halfword is a 16-bit value and a word is a 32-bit value; however, @dcw assembles halfwords and @dcd assembles words (in order to remain compatible with goldroad). It is recommended that you use @dc8, @dc16 and @dc32 to avoid confusion.

@dcw/@dc16 and @dcd/@dc32 will automatically align to the next halfword/word if required by inserting the necessary number of padding bytes.
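A minimal sketch of the alignment rule above, assuming zero-valued padding bytes (which matches the `0000` padding visible in the @org/@textarea example) and little-endian packing as used on the GBA. `emit_dc` is a hypothetical helper, not pyga's implementation.

```python
import struct

# Hypothetical sketch: emit @dc8/@dc16/@dc32 data at a given address,
# padding with zero bytes (assumed) up to the value size's alignment first.
_FORMATS = {8: "<B", 16: "<H", 32: "<I"}


def emit_dc(address, bits, values):
    """Return (padding + packed little-endian data, address of first value)."""
    size = bits // 8
    pad = (-address) % size            # bytes needed to reach alignment
    data = b"\x00" * pad
    for v in values:
        # values are truncated to the field width (an assumption)
        data += struct.pack(_FORMATS[bits], v & ((1 << bits) - 1))
    return data, address + pad


data, start = emit_dc(0x080000C2, 32, [0x3FC])
print(hex(start), data.hex())  # 0x80000c4 0000fc030000
```

Because the padding goes in before the first value, a label on the line before an unaligned @dc32 points at the padding rather than at the data.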

@define

syntax:
@define NAME EXPRESSION
All further occurrences of NAME will be replaced by EXPRESSION, where EXPRESSION can be any text.
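One way to picture @define is as a purely textual substitution applied to every later line. The sketch below assumes whole-word matching (so `maxoam2` is left alone) and no recursive expansion; pyga's actual matching rules may differ, and the `Defines` class is hypothetical.

```python
import re


class Defines(object):
    """Hypothetical sketch of @define as whole-word text substitution."""

    def __init__(self):
        self.table = {}

    def define(self, line):
        # '@define NAME EXPRESSION': NAME maps to the rest of the line
        _, name, expr = line.split(None, 2)
        self.table[name] = expr.strip()

    def expand(self, line):
        """Replace every defined NAME appearing as a whole word in line."""
        if not self.table:
            return line
        pattern = r"\b(%s)\b" % "|".join(re.escape(n) for n in self.table)
        return re.sub(pattern, lambda m: self.table[m.group(1)], line)


d = Defines()
d.define("@define maxoam 0x3FC")
print(d.expand("ldr r3,=maxoam"))  # ldr r3,=0x3FC
```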

@endarea

syntax:
@endarea
Denotes the end of an @textarea code block.

@fsize

syntax:
@fsize SIZE
Pads the object file to the specified size, where SIZE is in kilobytes.
  • If the output format is cons* or ips, this has no effect.
  • If the output format is bin and the generated file is larger than the size specified, a warning is generated (but the file is written anyway).
  • If the output format is patch and the target file is larger than the size specified, a warning is generated (but the file is written anyway).
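The three rules above can be summarised in a few lines. This sketch treats `image` as the complete output file (the new file for bin, the patched target for patch) and assumes that zero bytes are used for padding and that a kilobyte is 1024 bytes; neither detail is stated above, and `apply_fsize` is a hypothetical name.

```python
import warnings


def apply_fsize(image, size_kb, fmt, pad_byte=b"\x00"):
    """Apply '@fsize SIZE' to the final file image for output format fmt."""
    if fmt.startswith("cons") or fmt == "ips":
        return image                      # no effect on console or IPS output
    limit = size_kb * 1024                # assumes 1 KB = 1024 bytes
    if len(image) > limit:
        # too big: warn, but still write the file unchanged
        warnings.warn("file is %d bytes, larger than @fsize %d" % (len(image), size_kb))
        return image
    return image + pad_byte * (limit - len(image))   # pad byte is an assumption


print(len(apply_fsize(b"\x63\x23", 1, "bin")))  # 1024
```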

@incbin

syntax:
@incbin FILE
Imports raw data from FILE. This is the preferred method of including binary data, since blocks of @dcX carry the overhead of evaluating every value as an expression. FILE is interpreted as relative to the current source file.

@include

syntax:
@include FILE
Includes a source file for assembly. FILE is interpreted as relative to the current source file. pyga should detect if you accidentally create an infinite include loop (file1 includes file2, which includes file1, etc.).
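@include and @incbin both resolve FILE against the directory of the file containing the directive. The sketch below shows one way to do that and to catch an include loop, by keeping a stack of the files currently being expanded. It is illustrative only: the names `load_source` and `IncludeError` are made up, and comments after the file name are not handled.

```python
import os


class IncludeError(Exception):
    pass


def _read_file(path):
    with open(path) as f:
        return f.read()


def load_source(path, read=_read_file, _stack=None):
    """Return the lines of path with every '@include FILE' expanded in place."""
    stack = [] if _stack is None else _stack
    path = os.path.abspath(path)
    if path in stack:
        chain = " -> ".join(os.path.basename(p) for p in stack + [path])
        raise IncludeError("infinite include loop: " + chain)
    stack.append(path)
    lines = []
    for line in read(path).splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "@include":
            # FILE is relative to the file containing the directive
            target = os.path.join(os.path.dirname(path), parts[1].strip())
            lines.extend(load_source(target, read, stack))
        else:
            lines.append(line)
    stack.pop()   # the same file may still be included again elsewhere
    return lines
```

Passing a different `read` function (for example a dictionary lookup) makes it easy to try this without touching the disk.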

@org

syntax:
@org LOCATION
Moves the assembler to the address denoted by LOCATION. LOCATION can be any expression that evaluates to a valid address, and is interpreted relative to the current text area. See the section on @org/@textarea for more details and examples.

@pool/@ltorg

syntax:
@pool
@ltorg
Writes out the literal pool at the current location.

@textarea

syntax:
@textarea LOCATION
Assembles the following code as if it were located at LOCATION, while still writing it at the current position in the object file. LOCATION can be any expression that evaluates to a valid address. @textarea directives can be nested; each block of code should be ended by a corresponding @endarea directive. See the section on @org/@textarea for more details and examples.

5. Notes

@org and @textarea

There is some confusion as to the difference between these two directives.

Some examples and their output should illustrate this:

; ends up at offset 0 of the object file
@textarea 0x080A49CE

mov	r3, 99
b 0x080A49DC

;Output:
;00000000: 2363 E004

; ends up at offset 0xA49CE of the object file
@org 0x080A49CE

mov	r3, 99
b 0x080A49DC

;Output:
;000A49CE: 2363 E004
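Both examples produce identical bytes because the THUMB unconditional branch stores an offset relative to the branch's own address plus 4, and that address comes from @org or @textarea rather than from the position in the object file. The helper below, written for illustration (the name `thumb_b` is ours), encodes the THUMB `b label` form (top bits 11100, signed 11-bit halfword offset) and reproduces the `E004` above.

```python
def thumb_b(address, target):
    """Encode THUMB unconditional 'b target' for a branch placed at address."""
    offset = target - (address + 4)   # PC reads as the branch address + 4
    if offset % 2 or not -2048 <= offset <= 2046:
        raise ValueError("target misaligned or out of range")
    return 0xE000 | ((offset >> 1) & 0x7FF)   # signed 11-bit halfword offset


print("%04X" % thumb_b(0x080A49D0, 0x080A49DC))  # E004
```

The same function gives `E014` for the `b jump_over` at 0x080A49D0 in the next example.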

; Crazy complicated example time. Provided that the data in the @textarea
; block has been copied into EWRAM at 0x02000000 prior to this, execution
; goes from ROM to EWRAM, then back to ROM, which then jumps over the
; section that had been copied into EWRAM.

; ends up at offset 0xA49CC of the object file
@org 0x080A49CC

ldr	r1,=in_wram|1	;or 1 to land in THUMB mode
bx r1
back:

b jump_over

@pool

@textarea 0x02000000
; this block of data would have been transferred to WRAM earlier on

in_wram:
b 0x02000004

nop ; skipped by branch

mov	r3, 1
ldr	r1,=back|1
bx r1

@org 0x02000020
@pool

@endarea

jump_over:
mov r3, 2


;----ends----

Outputs:
000A49CC: 4901 4708 E014 0000 0001 0200 E000 46C0
000A49DC: 2301 46C0 4905 4708

000A49F8: 49D1 080A 2302


Annotated disassembly of resulting code:

080a49cc 4901 ldr r1, [$080a49d4] (=$02000001)
080a49ce 4708 bx r1
080a49d0 e014 b $080a49fc
080a49d2 0000 (padding)
080a49d4 02000001 (literal)
080a49d8 e000 b $080a49dc
080a49da 46c0 nop (skipped by branch)
080a49dc 2301 mov r3, #0x1
080a49de 46c0 nop (padding)
080a49e0 4905 ldr r1, [$080a49f8] (=$080a49d1)
080a49e2 4708 bx r1
...
080a49f8 080a49d1 (literal)
080a49fc 2302 mov r3, #0x2

and in EWRAM:

02000000 e000 b $02000004
02000002 46c0 nop (skipped by branch)
02000004 2301 mov r3, #0x1
02000006 46c0 nop (padding)
02000008 4905 ldr r1, [$02000020] (=$080a49d1)
0200000a 4708 bx r1
...
02000020 080a49d1 (literal)
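The offsets in this example follow a simple rule: outside any text area, an address maps to the file offset `address - 0x08000000`; @textarea changes the assembly address without moving the file offset; @org inside a text area is relative to that area; and @endarea resumes the outer area just past the data written inside it. The `Location` class below models that behaviour as inferred from the examples. It is not pyga's code, and the 0x08000000 base in particular is an inference from the offsets shown.

```python
ROM_BASE = 0x08000000  # inferred: "@org 0x080A49CE" lands at file offset 0xA49CE


class Location(object):
    """Track the file offset and the address code is assembled for."""

    def __init__(self):
        self.offset = 0              # where the next byte goes in the object file
        self.area_base = ROM_BASE    # address of the current area's first byte...
        self.area_offset = 0         # ...and the file offset it was written at
        self.saved = []              # outer areas, restored by @endarea

    @property
    def address(self):
        return self.area_base + (self.offset - self.area_offset)

    def org(self, address):
        # @org is relative to the current text area (the ROM at top level)
        self.offset = self.area_offset + (address - self.area_base)

    def textarea(self, address):
        # the file offset is unchanged; only the assembly address moves
        self.saved.append((self.area_base, self.area_offset))
        self.area_base, self.area_offset = address, self.offset

    def endarea(self):
        self.area_base, self.area_offset = self.saved.pop()

    def emit(self, nbytes):
        self.offset += nbytes


# Replay the directives of the example above.
loc = Location()
loc.org(0x080A49CC)
loc.emit(12)                 # ldr, bx, b, padding, literal
loc.textarea(0x02000000)
loc.emit(12)                 # b, nop, mov, nop, ldr, bx
loc.org(0x02000020)
loc.emit(4)                  # literal pool
loc.endarea()
print(hex(loc.offset), hex(loc.address))  # 0xa49fc 0x80a49fc
```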


ldr Rd,[EXPRESSION]

This instruction assembles to a valid opcode (as it does in goldroad) provided that EXPRESSION evaluates to a word-aligned address within 1KB forwards of the current instruction - the assembler converts it into an ldr Rd,[pc, #offset].
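For reference, the THUMB PC-relative load stores its offset in words, counted from the instruction's address plus 4 rounded down to a word boundary, which is why the target must be word-aligned and at most 1020 bytes ahead. The encoder below is an illustrative sketch (the name `thumb_ldr_pc` is ours) that reproduces the `4901` and `4905` opcodes from the example above.

```python
def thumb_ldr_pc(rd, address, target):
    """Encode THUMB 'ldr rd, [pc, #offset]' at address, loading the word at target."""
    if not 0 <= rd <= 7:
        raise ValueError("only r0-r7 can be used")
    base = (address + 4) & ~3      # PC reads as address + 4, rounded down to a word
    offset = target - base
    if offset % 4 or not 0 <= offset <= 1020:
        raise ValueError("target must be word-aligned and within 1020 bytes ahead")
    return 0x4800 | (rd << 8) | (offset >> 2)


print("%04X" % thumb_ldr_pc(1, 0x080A49CC, 0x080A49D4))  # 4901
print("%04X" % thumb_ldr_pc(1, 0x02000008, 0x02000020))  # 4905
```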

I mention this because I have seen a few people use it with goldroad to load literals in their code, for example:

ldr r3, [maxoam] 		; Load 0x3FC into r3

...

maxoam
@dcd #0x000003FC			; This is the maximum tile value and the last tile OAM

and

ldr r3,[widthtbla+2]	;get widthtable
	
...

widthtbla
@dcd widthtable
widthtable

Note that in the second example they had to add 2 to the offset because the label widthtbla, when assembled, wasn't on a word boundary, so the word emitted by @dcd actually began two bytes after the label. This is not the recommended way of loading literals; it is advised that you use the following and let the assembler take care of the offsets:

@define maxoam 0x3FC

...

ldr r3,=maxoam

...

@pool

and

ldr r3,=widthtable

...

@pool

widthtable:
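Roughly speaking, `ldr Rd,=value` leaves a hole that is filled when the next @pool is reached: the value is placed, word-aligned, in the pool and the load is encoded to point at it. The `LiteralPool` class below is a hypothetical sketch of that bookkeeping (zero padding and de-duplication of repeated values are assumptions). Flushing at 0x080A49D2 reproduces the `0000 0001 0200` bytes from the @org/@textarea example.

```python
import struct


class LiteralPool(object):
    """Collect 'ldr Rd,=value' requests and lay them out at the next @pool."""

    def __init__(self):
        self.pending = []  # (instruction address, register, value)

    def request(self, address, rd, value):
        self.pending.append((address, rd, value))

    def flush(self, address):
        """Place the pool at address; return (bytes, {insn address: opcode})."""
        pad = (-address) % 4               # literals must sit on a word boundary
        start = address + pad
        slots = []                         # each distinct value stored once
        for _, _, value in self.pending:
            if value not in slots:
                slots.append(value)
        data = b"\x00" * pad + b"".join(struct.pack("<I", v) for v in slots)
        opcodes = {}
        for insn, rd, value in self.pending:
            offset = start + 4 * slots.index(value) - ((insn + 4) & ~3)
            if not 0 <= offset <= 1020:
                raise ValueError("literal out of range of the ldr at 0x%X" % insn)
            opcodes[insn] = 0x4800 | (rd << 8) | (offset >> 2)
        self.pending = []
        return data, opcodes


pool = LiteralPool()
pool.request(0x080A49CC, 1, 0x02000001)   # ldr r1,=in_wram|1
data, opcodes = pool.flush(0x080A49D2)    # @pool after ldr, bx and b
print(data.hex(), "%04X" % opcodes[0x080A49CC])  # 000001000002 4901
```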

6. Assembler Mechanics

Assembly is split into three stages. In the first stage, the Parser analyzes the source file and translates it into corresponding Python code. This can be viewed by passing the assembler the "-d" option on the command line, and is useful for debugging errors in the assembler.

_pyga_org("0x080A49CC")
_pyga_t_ldr_rpci_frompool(1, "in_wram|1")
_pyga_t_bx_r(1)
_pyga_label("back")
_pyga_t_b("jump_over")
_pyga_pool_clear()
_pyga_textarea("0x02000000")
_pyga_label("in_wram")
_pyga_t_b("0x02000004")
_pyga_t_nop()
_pyga_t_mov_ri(3," 1")
_pyga_t_ldr_rpci_frompool(1, "back|1")
_pyga_t_bx_r(1)
_pyga_org("0x02000020")
_pyga_pool_clear()
_pyga_endarea()
_pyga_label("jump_over")
_pyga_t_mov_ri(3," 2")
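As a rough illustration of stage one, the function below translates a few of the source forms used above into calls shaped like this listing. It is a hypothetical sketch covering only labels, @org, @textarea, @pool/@ltorg, @endarea, `nop`, `b` and `bx`; the real Parser handles far more, and its output formatting may differ.

```python
import re

_LABEL = re.compile(r"([A-Za-z_]\w*):$")
_REGISTER = re.compile(r"r(\d+)$")


def translate(line):
    """Translate one source line into a generated call, or None if blank."""
    code = line.split(";", 1)[0].strip()          # ';' starts a comment
    if not code:
        return None
    m = _LABEL.match(code)
    if m:
        return '_pyga_label("%s")' % m.group(1)
    parts = code.split(None, 1)                    # mnemonic/directive, operands
    op = parts[0]
    arg = parts[1].strip() if len(parts) > 1 else ""
    if op in ("@pool", "@ltorg"):
        return "_pyga_pool_clear()"
    if op == "@endarea":
        return "_pyga_endarea()"
    if op in ("@org", "@textarea"):
        # the expression is passed through as a string, as in the listing
        return '_pyga_%s("%s")' % (op[1:], arg)
    if op == "nop":
        return "_pyga_t_nop()"
    if op == "b":
        return '_pyga_t_b("%s")' % arg
    m = _REGISTER.match(arg)
    if op == "bx" and m:
        return "_pyga_t_bx_r(%s)" % m.group(1)
    raise ValueError("not handled by this sketch: %r" % code)


for src in ["@org 0x080A49CC", "bx r1", "back:", "b jump_over", "@pool"]:
    print(translate(src))
```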

In the second stage, the Evaluator executes the generated Python code line by line in a prepared environment. (This is a little more complicated than simply executing the file, but it lets us generate meaningful error messages.) In this environment, all of those _pyga_ names represent either functions or classes; these are defined in util.py and opcodes.py and are injected into the environment before the generated code runs. The result is a list of 'command objects'.
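The line-by-line execution could look something like the sketch below: each generated line is compiled and run in a shared namespace, and any failure is re-raised with the offending line attached. The `evaluate` function and the toy environment are hypothetical stand-ins for the functions pyga injects from util.py and opcodes.py.

```python
def evaluate(lines, env):
    """Run generated code one line at a time in env (a dict namespace)."""
    for number, source in enumerate(lines, 1):
        try:
            exec(compile(source, "<generated>", "exec"), env)
        except Exception as exc:
            # report the generated line, not just a bare traceback
            raise RuntimeError("line %d: %s: %s" % (number, source.strip(), exc))


# Toy environment: each injected function just records a command tuple.
commands = []
env = {
    "_pyga_label": lambda name: commands.append(("label", name)),
    "_pyga_t_nop": lambda: commands.append(("nop",)),
}
evaluate(['_pyga_label("back")', "_pyga_t_nop()"], env)
print(commands)  # [('label', 'back'), ('nop',)]
```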

In the third and final stage, an object called the 'Receiver' interrogates each command object, generating the assembled data. This is then output by the Receiver in the format specified.
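For the bin and patch formats, the Receiver's job can be pictured as writing each command object's bytes at its file offset. The sketch below assumes that command objects expose `offset` and `data` attributes and that gaps in a new binary are zero-filled; both are assumptions, and none of these names are taken from pyga.

```python
class Command(object):
    """Hypothetical command object: bytes destined for a file offset."""

    def __init__(self, offset, data):
        self.offset = offset
        self.data = data


def receive_bin(commands):
    """'bin': build a new file, zero-filling any gaps (assumed)."""
    size = max([c.offset + len(c.data) for c in commands] or [0])
    image = bytearray(size)
    for c in commands:
        image[c.offset:c.offset + len(c.data)] = c.data
    return bytes(image)


def receive_patch(commands, original):
    """'patch': overwrite bytes inside an existing file image."""
    image = bytearray(original)
    for c in commands:
        end = c.offset + len(c.data)
        if end > len(image):
            image.extend(bytearray(end - len(image)))  # grow if needed (assumed)
        image[c.offset:end] = c.data
    return bytes(image)


print(receive_bin([Command(2, b"\x63\x23"), Command(4, b"\x04\xe0")]).hex())
# 0000632304e0
```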