

Archimedes Assembly Language

A Dabhand Guide

Mike Ginns


Archimedes Assembly Language: A Dabhand Guide

© Mike Ginns 1988-90

ISBN 1-870336-20-8.

First edition May 1988,

Second Edition, third impression October 1991

Editors: Shona McIsaac and Bruce Smith

Typesetting: Bruce Smith

Cover: Paul Holmes and Atherton Clare Designs

Illustrations: David Price and David Atherton

All trademarks and registered trademarks mentioned in this book are hereby acknowledged to be the property of their respective owners.

Within this book the letters BBC refer to the British Broadcasting Corporation. The terms BBC micro, Master 128, Master Compact and Archimedes refer to the computers manufactured by Acorn Computers Ltd under licence from the BBC.

All rights reserved. No part of this book (except brief passages quoted for critical purposes) or any of the computer programs to which it relates may be reproduced or translated in any form, by any means, mechanical, electronic or otherwise, without the prior written consent of the copyright holder.

Disclaimer: Because neither Dabs Press nor the author have any control over the way the material in this book and optional programs disc is used, no warranty is given or should be implied as to the suitability of the advice or programs for any given application. No liability can be accepted for any consequential loss or damage, however caused, arising as a result of using the programs or advice printed in this book/programs disc.

Published by Dabs Press, 22 Warwick Street, Prestwich, Manchester, M25 7HN. Tel. 061-773 8632. Fax 061-773 8290.

All correspondence to the author and the publisher should be sent to this address.

Typeset in 10 on 11pt Palatino by Dabs Press using an Apple Macintosh desktop publishing system.

Printed and bound in the UK by BPCC Wheaton, Exeter EX2 8RP.

Contents

1 : Introduction
The Archimedes and the ARM
RISC Design
Notation
Acknowledgments

2 : An Overview of the ARM
A Typical Computer System Model
Input/Output
Memory
Communication Buses
The Data Bus
Words
The Address Bus
Byte and Word Accessed Memory
Word-aligned Addresses
Virtual Memory
Executing Machine Code Instructions
Pipelining

3 : Internal Architecture
The Arithmetic Logic Unit
The Barrel Shifter
Processor Registers
Registers on the ARM
Uncommitted Registers
Special Purpose Registers
R14 : The Link Register
R15 : Program Counter and Status Register
The Program Counter
The Status Flags
Setting the Flags
Mode Flags
User Mode
Supervisor Mode
Interrupt Modes
Registers Different Processor Modes
ARM Instructions
The RISC Concept
RISC versus CISC
Instruction Length
Conditional Execution
Data Shifts

4 : The BASIC Assembler
General Format of ARM Instructions
The Assembler
Entering the Assembler
The Assembler Location Counter P%
Reserving Memory
Assembler Listings
Executing Machine Code Programs
Returning to BASIC
Comments in Assembly Language
Assembler Labels
The ADR Directive
BASIC from the Assembler
Passing Data
Returning Values

The ARM Instruction Set
Conditional Execution
Condition Codes and the Assembler
Conditional Execution After Comparisons
Controlling the Status Flags
Mixing Conditional and S Suffixes
Instruction Groups
Data Processing Format
Opcode Mnemonic
Operand One
Operand Two
Operand Two : A Simple Register
Operand Two : An Immediate Constant
Range of Immediate Constants
Operand Two : A Shifted Register Operand

7 : Shift Instructions
Data Processing Instructions
Logical Shift Left : LSL
Logical Shift Right : LSR
Arithmetic Shift Right : ASR
Rotate Right : ROR
Rotate Right With Extend (One Bit Only) : RRX

8 : Processing Instructions
ADD : Addition
ADC : Add with Carry
SUB : Subtract
SBC : Subtract with Carry
RSB : Reverse subtract
RSC : Reverse subtract with Carry
MOV : Move data
MVN : Move Inverted Data
CMP : Compare
CMN : Compare negative
AND : Logical AND
ORR : Logical OR
EOR : Logical Exclusive-OR
BIC : Bit Clear
TST : Test Bits
TEQ : Test Equivalence
MUL : Multiplication
MLA : Multiplication with Accumulator

9 : Register R15
Register with Data Processing List
Register R15 as Operand One
Register R15 as Operand Two
The Program Counter and Pipelining
Register 15 as the Destination Register

10 : Data Transfer
Between Memory and Registers
Accessing Memory
Addressing Modes
Indirect Addressing
Pre-indexed Addressing
Simple Register
An Immediate Constant
Shifted Register
Using Write Back
Post-indexed Addressing
PC Relative Addressing
Byte and Word Addressing
Multiple Register Transfers
Direction of Storage
Pre or Post-address Modification
Write Back
Applications of STM, LDM

11 : Branches and SWI
Simple Branch (B)
Conditional Branches
Branches and Conditional Instructions
Branch with Link : BL
Preserving the Link Register
Software Interrupt : SWI

12 : Stacks and LDM/STM
Computer Stacks
Types of Stack
Implementing Stacks using LDM and STM
Stack Applications

13 : The BASIC Assembler 2
OPT Settings
Error Control
Offset Assembly
Storing Data in Assembly Programs
The ALIGN Directive
CALL Parameters
The Operating System from BASIC

14 : Techniques & Debugging
Macro Assembly
Conditional Assembly
Mixing Macros and Conditional Assembly
Debugging Machine Code Programs
The Debugger
Using the Debugger
Memory Commands

15 : Interrupts and Events
Interrupts on the Archimedes
Disabling Interrupts
Interrupt Processing
Returning From Interrupts
Writing Interrupt Routines
Events
SWI OS_ClaimDeviceVector
SWI OS_ReleaseDeviceVector

16 : Vectors
ARM Hardware Vectors
Software Vectors
Intercepting Vectors
Claiming Vectors
Releasing Vectors
Writing Vector Intercept Routines
The Operating System Vectors
Main Line System Vectors

17 : OS SWI Routines
Input/Output Facilities
Character Input/Output
String Input/Output
Conversion Routines
Other Conversion Routines



System Calls
Interrupt Driven Routines

18 : The WIMP Environment
Controlling the WIMP Environment
Accessing the Mouse
Initialising the WIMP
WIMP Windows
Creating Windows
Defining Icons
Opening Windows
Polling the WIMP
Simple Window Program Example
RISC OS Specific

19 : Managing Fonts
Font Manager * Commands
The Character Fonts
Initialising a Font
Painting Text in Different Fonts
Anti-aliasing
Setting Up the Anti-aliasing Colour Palette
The Anti-aliasing Transfer Function
Changing the Painting Colour
Losing Fonts

20 : Templates and Input/Output
Input/Output
String Manipulation
Miscellaneous Statements
Control Constructs
Template Format
Register Use

21 : Manipulating Strings
Representing Strings
String Manipulation Routines
String Assignment
String Concatenation
String Comparison

22 : Functions, Operators...
SGN
ABS
DIV and MOD
Logical Operators : AND, OR, EOR
Logical Operators : NOT
Arrays
Dimensioning Arrays
Array Access
SOUND

23 : Control Statements and Loops
IF...THEN...ELSE...ENDIF
Multi-condition IF...THEN...ELSE Statements
Non-numeric Comparisons
REPEAT...UNTIL
WHILE...ENDWHILE
FOR...NEXT
CASE Statement
Procedures
Local Variables
Parameter Passing
Example of Recursive Procedures

24 : Graphics Templates
VDU, PLOT
MOVE, POINT
DRAW, BY, LINE
CIRCLE
Filled Circles
RECTANGLE
Outline Rectangle
FILL, ORIGIN, MODE
CLS, CLG, COLOUR
GCOL
POINT(

25 : RISC OS Specific
Mouse Pointer SWI's
Mouse User Confirm
Co-operative Multi-tasking
WIMP Co-operative Multi-tasking Programs
Stopping WIMP Tasks
A WIMP Based Co-operative Multi-tasking Program

OS SWI Routines
Instruction Set Format
OSBYTE Routines
Plot Routines
Programs Disc
Dabhand Guides Guide



Program Listings

2.1 Words, bytes and word-aligned addresses
4.1 Entering the assembler from BASIC
4.2 Simple moving character
4.3 Fully commented version of listing 4.3
4.4 A simple loop using labelled addresses
4.5 Passing data to and from machine code
5.1 Letter print
8.1 Simple two-word addition
8.2 Simple two-word subtraction
8.3 A demo of comparison and condition codes
8.4 Case conversion using the ORR instruction
8.5 Toggling data using the EOR instruction
8.6 Printing binary
8.7 Multiplying two numbers together
9.1 The effect of pipelining the program counter
9.2 Skipping instructions
10.1 Demo of post-indexed indirect addressing
10.2 Accessing tables with indirect addressing
11.1 Branches and loops
11.2 Subroutines using branch with Link
12.1 Example of machine code stacks
13.1 Forward references
13.2 Forward references using two-pass assembly
13.3 Using the EQU directives
13.4 The ALIGN directive
13.5 Passing strings to machine code
14.1 The 'Beep' macro
14.2 Conditional assembly - a demonstration
15.1 Example of an Event driven program
17.1 Converting a number to a hexadecimal string
17.2 Converting numbers to binary
17.3 Manipulating characters using OSWORD 10
17.4 Using OSCLI to catalogue a disc
18.1 A simple sketch pad using the mouse
18.2 Example of creating windows
19.1 Painting text in the 'Trinity' font
19.2 Demonstration of anti-aliasing shading
19.3 Painting text in different colours
20.1 INPUT template
20.2 Demo of INKEY template from machine code
20.3 SPC(n) template
20.4 TAB(n) template
20.5 TAB(x,y) template
21.1 String assignment
21.2 String concatenation
21.3 String comparison
21.4 String length (LEN)
21.5 LEFT$ template
21.6 RIGHT$ template
21.7 MID$ template
21.8 INSTR template
21.9 STRING$ template
21.10 VAL template
21.11 STR$ template
22.1 Template to perform DIV and MOD
22.2 Array access in machine code
22.3 Simple sound effects
23.1 Example of using the FOR...NEXT template
23.2 A FOR...NEXT loop in assembly code
23.3 Example of the CASE template
23.4 An example of recursive procedures
24.1 Example use of OS_PLOT
24.2 Example of a LINE template
24.3 Example of a CIRCLE template
24.4 Printing coloured stars
25.1 Using Hourglass SWI
25.2 Using SWI OS_Confirm
25.3 Co-operative Multi-tasking
25.4 A simple multi-tasking printer spooler
25.5 A WIMP based multi-tasking program

This Book and You!

This book was written using the InterWord wordprocessor on a Master 128 micro. The author's files were edited after transferring them to VIEW. The finished manuscript was transferred serially to an Apple Macintosh where it was typeset in MacAuthor. Final camera-ready copy was produced on an Apple LaserWriter IINT from which the book was printed by A. Wheaton & Co.

Correspondence with Dabs Press, or the author, should be sent to the address on page 2 or via electronic mail on Telecom Gold (72:MAG11596) or Prestel (942876210). An answer to your letter or mailbox cannot be guaranteed, but we will try our best.

All correspondents will be advised of future publications, unless we receive a request otherwise. Personal details held will be provided on request, in accordance with the Data Protection Act. Catalogues detailing the full range of Dabs Press books and software are available on request.

Second Edition

This Second Edition has been fully revised to take account of RISC OS. Its contents are still relevant to the Arthur Operating System and any differences between the two are clearly marked at the appropriate point. I am grateful to the many readers of the First Edition who drew attention to some inaccuracies in the original text. I would especially like to thank Mr Graham Jones of Sutherland and Alan Glover of PRES for raising some interesting points which have been incorporated into this Edition.


1 : Introduction

The Archimedes and the ARM

The Archimedes is the revolutionary new micro from Acorn Computers. It follows a long line of famous predecessors including the successful BBC micro, the BBC B+ and the current Master series. The Archimedes, however, is unlike anything which has gone before: it is a totally new machine. While remaining as compatible as possible with earlier models, it represents a major new departure for Acorn and an exciting leap forward for Acorn enthusiasts.

The Archimedes is unique in many ways: it has stunning multi-coloured graphics, a stereo sound system and supports a window and mouse environment, to mention a few. However, perhaps the most startling innovation is the totally new microprocessor used in the system.

Acorn has moved away from the familiar 6502 used in earlier machines. For the Archimedes, Acorn developed its own processor using the most up-to-date ideas and technology.

Called the Acorn RISC Machine - or simply the ARM - it is the power of this remarkable chip which provides the advanced facilities of the machine. The ARM out-performs not only the 6502, but also most comparable processors, including the much-used MC68000.

RISC Design

The ARM represents a totally new philosophy in microprocessor design. It is an example of a Reduced Instruction Set Computer (RISC). It is called this because the designers have dispensed with many of the unnecessary and inefficient instructions found on many processors. The RISC chip is equipped with relatively few instructions, but these few are flexible, powerful and optimised so that they can be processed exceedingly fast. This gives the ARM unprecedented power which, until now, was only available on larger and more expensive machines.



To be able to program the ARM processor directly, we must be able to communicate with it in its own language: ARM machine code. This is very different from the high-level languages, such as BBC BASIC, with which most people are familiar. Machine code programs are simple sequences of numbers and data, held in the computer's memory, which have some significance to the ARM processor.

Faced with the task of writing machine code programs in this numeric form most of us would probably give up! However, to help us in our task, the Archimedes provides a superb ARM machine code assembler. This allows us to write machine code programs in a more understandable form, using assembler statements which are then translated into actual machine code data. It is not difficult to program in assembly language; it just requires the use of certain special techniques.

This book aims to provide a complete tutorial course in writing ARM assembly code programs on the Archimedes computer. It explains the special elements which make up the ARM processor, and how these elements are used to execute machine code programs.

For the complete beginner, there is a section containing a comprehensive guide on fundamental topics such as number bases, binary and machine arithmetic, and logic. This enables readers with only a general knowledge of BASIC programming, to learn the concepts and ideas used in the book in a step-by-step way.

The book describes each of the machine code instructions provided by the ARM, together with explanations and examples of how they are used to construct machine code programs. Various assembly programming techniques are covered, such as memory allocation, access, data structures, control constructs and so on.

The powerful Arthur operating system used in the Archimedes is covered, with details of how to access its many facilities from the machine code level. The book also covers how graphics, sound, windows, the font painter and the mouse are used from within machine code programs.

To make the transition from BASIC to machine code as painless as possible, the book contains a section on implementing BASIC statements in machine code. All of the most useful BASIC statements are covered and for each an assembler 'template' routine is developed which will mimic the statement's function in machine code.



Throughout the book, you will be encouraged to put theory into practice by trying out example programs on your own machine. To save typing these into the computer, the accompanying programs disc contains all the programs used in the text. The disc also includes some extra utilities not covered in the book such as a complete memory editor, disassembler and other utilities. Full details can be found in Appendix J.


Notation

A standard notation has been adopted throughout the book. The symbols below have the following special meanings:

The & symbol: This signifies that the following number is in hexadecimal. For example, &12CA

The % symbol: This signifies that the following number is in binary. For example, %10100111010101010100011100110100

>> and <<: These are BASIC shift operations and are described in the Archimedes BASIC manual.

<> brackets: Brackets are used extensively when giving the syntax of various instructions and commands. The two angled brackets mean that the word between should not be taken literally. It simply refers to what sort of object should be used with the command. For example, <Register> means that a register name should be used in place of the brackets.

{} brackets: Unless otherwise stated, the curly brackets mean that the object contained in them is optional and can be omitted. For example, ADD{S} means that the 'S' argument is optional to the instruction.
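The & and % prefixes can be checked on any machine. The following is a minimal sketch in Python rather than BASIC; the helper name parse_bbc_literal is invented for this illustration and is not part of BBC BASIC or of the programs disc.

```python
def parse_bbc_literal(text):
    """Convert a BBC BASIC style literal to an integer.

    '&' introduces hexadecimal, '%' introduces binary,
    anything else is treated as plain decimal.
    """
    if text.startswith("&"):
        return int(text[1:], 16)
    if text.startswith("%"):
        return int(text[1:], 2)
    return int(text)


print(parse_bbc_literal("&12CA"))        # 4810
print(parse_bbc_literal("%1010"))        # 10
# Shifting right by four bits drops the last hexadecimal digit.
print(parse_bbc_literal("&12CA") >> 4 == parse_bbc_literal("&12C"))  # True
```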


Acknowledgments

Thanks are due to Siobhan FitzPatrick for reading my manuscript and pointing out, in as tactful a way as possible, my many mistakes! Thank you to Tony Goodhew for all the support, ideas and practical advice given on this and other projects. Thanks are also due to Charlie, Mike, Andy and Robert for putting up with me while I wrote the book. James Knight gave invaluable help with the demonstration programs. Thank you to Mark Gould-Coates for his understanding, help and general comments on all aspects of the book. Special thanks are also due to Jeff Fidler for his encouragement and support, and for providing some very welcome distractions during the writing of difficult parts of the book! Finally, a big thank-you to Bruce and David for publishing the book - and coming up with the idea in the first place!


This book is dedicated to my parents and family for all the help, support and encouragement they have given me over the years.


2: An Overview of the ARM

Before embarking on a detailed examination of the ARM chip and how it is programmed in assembly language, it's important to understand some fundamental computer principles.

In this chapter, we shall consider a general model of a computer and see how the ARM chip fits into this. We shall also examine how the ARM communicates with other parts of the computer and with the outside world. Finally, the way in which the ARM executes machine code instructions will be investigated.

A Typical Computer System Model

Figure 2.1. A model of a typical computer system.

At the simplest level, most computers can be represented by the model in figure 2.1. Data is obtained from the input to the system. It is then worked on by the central processing unit (CPU) and the results produced are sent to the output. The main memory of the computer is used during the processing as workspace. It holds the program being executed, the data being processed and intermediate results produced. Any computer system must, therefore, resolve the problems of how to connect these separate elements so that data can pass efficiently between them.


Input/Output

In the majority of computer systems, including the Archimedes, the input/output is handled in the same way as the memory itself. This is known as device memory mapping. Physical input/output devices, e.g. the disc controller, keyboard interface, video chip and so on, are made to appear to the processor as normal memory locations in the memory map. When the processor accesses these locations, it in fact accesses the hardware registers of the corresponding device. Data can thus be passed to and from devices simply by reading and writing the associated memory locations. This scheme provides a uniform way for the CPU to communicate with the outside world. The remaining problem is connecting the CPU and memory so that any arbitrary location can be accessed.
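The idea of device memory mapping can be sketched as a small model. This is only an illustration in Python, not a description of the real Archimedes hardware: the class MemoryMap, the address PRINTER_DATA and the 'printer' device are all invented for the example.

```python
class MemoryMap:
    """A toy memory map in which some addresses belong to devices."""

    def __init__(self, ram_size):
        self.ram = bytearray(ram_size)
        self.devices = {}  # address -> (read function, write function)

    def attach(self, address, read_fn=None, write_fn=None):
        """Make a device register appear at the given address."""
        self.devices[address] = (read_fn, write_fn)

    def read(self, address):
        if address in self.devices:
            read_fn, _ = self.devices[address]
            return read_fn() if read_fn else 0
        return self.ram[address]

    def write(self, address, value):
        if address in self.devices:
            _, write_fn = self.devices[address]
            if write_fn:
                write_fn(value & 0xFF)
            return
        self.ram[address] = value & 0xFF


PRINTER_DATA = 0x3000  # hypothetical device register address

printed = []           # stands in for the output device
mem = MemoryMap(ram_size=0x1000)
mem.attach(PRINTER_DATA, write_fn=printed.append)

mem.write(0x0010, 65)         # an ordinary store to RAM
mem.write(PRINTER_DATA, 66)   # the same operation, but it reaches the device
print(mem.read(0x0010), printed)
```

The program doing the writing uses exactly the same operation in both cases; only the address decides whether RAM or a device responds.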


Memory

Memory on the Archimedes can be viewed as a long sequential series of bytes (there are eight bits in a byte). Each byte is given an identifying number starting at '0'. So the first byte of memory is called location '0', the next location '1' and so on. Thus we can talk about the processor accessing the data in location 'n', which means the byte of memory numbered 'n'.

To access memory, therefore, the CPU requires some way of specifying the number of the memory location to be used. It also needs a method of transferring data to and from the memory. This is done by using the address and data buses.

Communication Buses

A bus is simply a series of electrical signal lines connecting the CPU to the other elements in the computer system. Each physical line in the bus can represent a single binary bit, i.e.:

+5 volts = Logic 1

0 volts = Logic 0



By placing combinations of +5 and 0 volts on the separate lines in the bus, binary numbers can be represented and transferred around the computer system. The number of signal lines, or bits, in a bus is called the bus width. Thus, we can talk about buses which are eight, 16 or 32 bits wide.
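As a rough illustration of this, the following Python sketch converts a number into the voltage level on each line of a bus and back again. The function names bus_levels and bus_value are invented for the example, and the lines are listed from the most significant bit to the least significant.

```python
def bus_levels(value, width):
    """Return the voltage on each bus line, most significant line first."""
    if not 0 <= value < 2 ** width:
        raise ValueError("value does not fit on a bus of this width")
    return [5 if (value >> bit) & 1 else 0 for bit in range(width - 1, -1, -1)]


def bus_value(levels):
    """Read the binary number represented by a list of line voltages."""
    value = 0
    for volts in levels:
        value = (value << 1) | (1 if volts == 5 else 0)
    return value


levels = bus_levels(0xA5, 8)   # &A5 = %10100101 on an 8-bit bus
print(levels)                  # [5, 0, 5, 0, 0, 5, 0, 5]
print(bus_value(levels))       # 165
```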

The Data Bus

The data bus, as the name suggests, is used by the CPU to pass data to and from the computer's memory. It is called a bi-directional bus because data can flow in either direction. In data storage operations, the processor puts the data on to the data bus and the memory reads it. In load operations, the processor requests the memory to put data on to the data bus, which it then reads.

The ARM processor is a 32-bit machine. This means that its data bus is 32 bits wide. This has far-reaching consequences on the performance of the Archimedes and explains, at least in part, why the computer is so powerful. The provision of a 32-bit data bus means that larger pieces of data can be processed in single operations.

An example should illustrate the point. Suppose we wanted to add together the contents of two integer variables, each of which is 32 bits long. The 6502 processor on the BBC micro has an 8-bit data bus and would therefore have to process the numbers in four single-byte chunks. It would have to perform four load operations, four additions and four stores. On the ARM processor the two numbers could be loaded in their entirety and added together in a single operation. This gives a huge speed advantage over 8-bit machines as the number of memory accesses and processor operations is drastically reduced.
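The difference can be sketched in Python. The sketch below models only the arithmetic, not the timing of either processor; the function names add32_bytewise and add32_word are invented for the illustration, and the step count is simply the number of additions performed.

```python
def add32_bytewise(a, b):
    """Add two 32-bit numbers one byte at a time, as an 8-bit machine must.

    Returns the 32-bit result and the number of additions needed.
    """
    result = 0
    carry = 0
    steps = 0
    for i in range(4):                      # lowest byte first
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        total = byte_a + byte_b + carry     # add with carry from the last byte
        result |= (total & 0xFF) << (8 * i)
        carry = total >> 8
        steps += 1
    return result, steps


def add32_word(a, b):
    """Add two 32-bit numbers in one operation, as a 32-bit machine can."""
    return (a + b) & 0xFFFFFFFF, 1


print(add32_bytewise(0x0001FFFF, 1))   # (131072, 4), i.e. &00020000 in 4 steps
print(add32_word(0x0001FFFF, 1))       # (131072, 1)
```

Both routes give the same answer; the byte-wise version simply needs four additions, with the carry passed from each byte to the next.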


Words

A very useful, if slightly vague, concept often quoted when referring to memory is the 'word'. A 'word' of memory is a logical unit defined as the number of bits manipulated in parallel by the processor in single operations.

Unfortunately, the definition of a memory word is not universally accepted and tends to vary from computer to computer. For example, the BBC micro's 6502 was an 8-bit machine and clearly manipulated eight bits of data at a time. It should therefore be talked about as having an 8-bit word length. However, in most applications, 16-bit quantities were more often needed. It was very common, therefore, to talk about words when actually meaning these 16-bit quantities.

This is strictly incorrect, as the 6502 cannot handle 16 bits of data at a time. Sixteen-bit quantities actually had to be accessed by the 6502 in two separate chunks of eight bits. Nevertheless, the terminology persisted and this can be confusing.

The ARM manipulates data of 32 bits in length. Words on the Archimedes are therefore defined as being 32-bit quantities. It is important to appreciate and understand this difference between words on the BBC micro and on the Archimedes.

To avoid any confusion, in this book, we shall always refer to words as being 32 bits of memory unless otherwise stated.

The Address Bus

Obviously, the data bus does not provide a complete memory access system. An address bus is also needed so that the CPU can specify which location in memory is to be accessed. The CPU places the address of the required location on the address bus in binary. The memory decoder then reads this and sends control signals to the memory. These cause the relevant memory locations to respond and take part in the transferral of data over the data bus.

The width of the address bus specifies the size of the memory which can be accessed by the CPU. For example, on the BBC micro machines, the 6502 CPU had a 16-bit address bus. This means that 2^16 different numbers can be represented on it and thus 2^16 different memory cells can be addressed. The maximum amount of memory available on these machines, ignoring paging techniques such as sideways memory, is therefore 2^16 bytes = 65536 bytes = 64 kilobytes.

On the ARM processor, the address bus is 26-bits wide. This allows the Archimedes to have up to 67108864 bytes of memory (64 megabytes). On production machines, 0.5 megabytes, one megabyte, or four megabytes of writable memory are actually provided. This is still very large and will seem massive to anyone who is used to managing with the 32k provided on the standard BBC B computer.
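The relationship between bus width and address space is simply a power of two, which the following short Python sketch works through for the two processors mentioned. The function name address_space_bytes is invented for the example.

```python
def address_space_bytes(bus_width):
    """Number of distinct byte addresses on an address bus of this width."""
    return 2 ** bus_width


for name, width in (("6502", 16), ("ARM", 26)):
    size = address_space_bytes(width)
    print(name, width, "bits:", size, "bytes =", size // 1024, "kilobytes")

# 6502 16 bits: 65536 bytes = 64 kilobytes
# ARM 26 bits: 67108864 bytes = 65536 kilobytes
```

Dividing the ARM figure by 1024 once more gives the 64 megabytes quoted above.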


In general, the size of memory which can be accessed via the address bus is called the address space. Thus the Archimedes has an address space of 64 megabytes even though, in practice, not all of this memory is provided.

Byte and Word Accessed Memory

We have already noted that the memory on the Archimedes is byte-organised. That is, each byte of memory has its own unique address. However, we have also seen that the ARM processor has a 32-bit data bus and accesses memory in 32-bit chunks (four bytes). This apparent discrepancy occurs because the ARM can access memory in two ways.

In most cases it will be convenient to use the full power of the 32-bit data bus and access memory as complete 32-bit words. However, in some cases, for example when manipulating 8-bit quantities, it will be more convenient to access bytes individually from anywhere within the memory map. The ARM processor supports both methods, and it is to allow for byte access that each byte of memory has a unique address.

When accessing complete words of data (32 bits in length), the memory can be regarded as being split into separate chunks of four bytes (32 bits) in length. This is illustrated in figure 2.2.

              Bit 31 ........................... Bit 0
Location 0  | Byte 3  | Byte 2  | Byte 1  | Byte 0  | (Word 0)
Location 4  | Byte 7  | Byte 6  | Byte 5  | Byte 4  | (Word 1)
Location 8  | Byte 11 | Byte 10 | Byte 9  | Byte 8  | (Word 2)
Location 12 | Byte 15 | Byte 14 | Byte 13 | Byte 12 | (Word 3)

Figure 2.2. Byte and word-organised memory.


Word '0' starts at location 0 and includes bytes 0, 1, 2 and 3, word '1' starts at location 4 and includes bytes 4, 5, 6 and 7 and so on. Any complete word can be accessed by the ARM in a single operation.

When specifying which word we want to access in memory, we give the address of the location at which it starts. So the address of the first word is 0, that of the second word is 4, the third is 8 and so on.

Word-aligned Addresses

A memory address which corresponds to the start of a word is called a word boundary and is said to be word-aligned, i.e. it is divisible by four. The following addresses are all word-aligned:

&00000000 &00000004 &00000008 &0000000C &00000010

Word-aligned addresses are especially significant to the ARM. When accessing a word of memory, the address given must be word-aligned. For example, we could not access a word consisting of bytes 2, 3, 4 and 5, by specifying location &00000002 as the word address. This is because &00000002 is not a word-aligned address and so the required bytes are in fact split over two separate words of memory.
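The layout in figure 2.2 can be modelled in a few lines of Python. This is a sketch under stated assumptions: memory is a 16-byte block in which byte n simply holds the value n, the names load_word and byte_in_word are invented, and the model rejects a non-aligned word address with an error, which is a choice made for this illustration rather than a statement of what the hardware does.

```python
import struct

memory = bytes(range(16))   # byte n of this toy memory holds the value n


def load_word(mem, address):
    """Fetch the 32-bit word starting at a word-aligned address.

    The lowest-addressed byte ends up in bits 0 to 7, as in figure 2.2.
    """
    if address % 4 != 0:
        raise ValueError("address is not word-aligned")
    return struct.unpack_from("<I", mem, address)[0]


def byte_in_word(word, n):
    """Extract byte n (0 to 3) of a word, where byte 0 is bits 0 to 7."""
    return (word >> (8 * n)) & 0xFF


word1 = load_word(memory, 4)
print(hex(word1))                # 0x7060504
print(byte_in_word(word1, 0))    # 4
print(byte_in_word(word1, 3))    # 7
```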

The program in listing 2.1 gives a demonstration of word-aligned addresses. It repeatedly asks for the address of a memory location. It then prints out which memory word contains the address, and the byte number which the address represents within the given word. The program also tells you whether the entered address is word-aligned or not.

Listing 2.1. Words, bytes and word-aligned address.

10 REM Word-aligned Addresses
20 REM (c) Michael Ginns 1987
30 REM Dabs Press : Archimedes Assembly Language
40 REM
50 REPEAT : REM start of the loop closed by UNTIL at line 160
60 MODE 3
70 INPUT "Enter the address: " address
90 IF address MOD 4 = 0 THEN
100 PRINT "Word-aligned"
110 ELSE PRINT "Not word-aligned"
120 ENDIF
130 PRINT "Word containing this address is: " address DIV 4
140 PRINT "Within this word, address is byte number: "; address MOD 4
150 PRINT ' "Enter another address ? (y/n) : ";
160 UNTIL GET$ = "n"

The significance of word-aligned addresses will crop up again when we consider ARM machine code instructions, as each of these must start on a word boundary. They are described in detail in a later chapter.

Virtual Memory

Before leaving the subject of how the ARM processor organises its memory, it is useful to look at how the physical memory is spread over the available address space.

We have seen that the Archimedes address bus supports a maximum memory size of 64Mb. Currently, however, a maximum of only 4Mb of writeable memory is provided. How then is this physical memory distributed over the much larger 64Mb address space?

The simplest scheme, assuming a 4Mb system, would be to make addresses 0-4Mb correspond to the available memory, and to make addresses higher than this invalid. However, things are not as straightforward as this! The allocation of physical memory is in fact controlled by a highly-sophisticated memory management chip called MEMC. This chip can be programmed to make blocks of real memory appear at any address in the system. Thus, the 4Mb of memory would not appear as one contiguous area, but would be split into blocks which could exist anywhere in the memory map.

The next question is: what happens if we try to access a memory location at which no 'real' memory exists? The answer is that the MEMC chip complains and sends an abort signal to the ARM processor. This normally causes an error message to appear on screen. However, it is possible to trap this event and use it to implement what is called virtual memory.

In a virtual memory system, the computer's main memory is supplemented by some form of secondary or backing store - usually a hard disc. The secondary memory is typically much larger than the main memory, but will have a slower access time.


The program running in the machine assumes that memory is provided over the whole address space (64Mb in the Archimedes). In reality, however, this memory is actually held on the hard disc.

As long as the program accesses locations at which real memory exists, then everything operates normally. However, if an area of non-existent memory is accessed, then the abort error occurs. This is trapped and a special software routine is called. This routine determines which area of memory the user was attempting to access. It then loads the corresponding block from the hard disc into main memory, replacing a previously loaded memory block. The user routine can then access the required data as if it had been present all the time! The only difference is that there is a slight time delay introduced by disc activity. In this way the computer's main memory is used as a 'buffer' into which chunks of the larger hard disc memory are loaded as they are needed.
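The scheme just described can be sketched as a small Python model. Everything here is an assumption made for illustration: the class name VirtualMemory, the tiny block size, the use of a bytearray to stand in for the hard disc, and in particular the replacement rule, which simply throws out the block that was loaded longest ago. The text does not say which block a real system would replace.

```python
from collections import OrderedDict

BLOCK_SIZE = 4   # bytes per block; kept tiny so the behaviour is easy to follow


class VirtualMemory:
    """A toy model of main memory used as a buffer for a larger backing store."""

    def __init__(self, backing_store, resident_blocks):
        self.disc = backing_store        # bytearray holding the whole address space
        self.capacity = resident_blocks  # how many blocks fit in 'real' memory
        self.resident = OrderedDict()    # block number -> data, oldest first
        self.aborts = 0                  # how many times the abort was trapped

    def _block(self, address):
        number = address // BLOCK_SIZE
        if number not in self.resident:
            self.aborts += 1             # no real memory here: trap the abort
            if len(self.resident) >= self.capacity:
                old, data = self.resident.popitem(last=False)
                # Save the replaced block back to the disc first.
                self.disc[old * BLOCK_SIZE:(old + 1) * BLOCK_SIZE] = data
            start = number * BLOCK_SIZE
            self.resident[number] = bytearray(self.disc[start:start + BLOCK_SIZE])
        return self.resident[number]

    def read(self, address):
        return self._block(address)[address % BLOCK_SIZE]

    def write(self, address, value):
        self._block(address)[address % BLOCK_SIZE] = value & 0xFF


disc = bytearray(range(32))              # a 32-byte 'address space' on disc
vm = VirtualMemory(disc, resident_blocks=2)
print(vm.read(0), vm.read(1), vm.aborts)   # 0 1 1  (one block load serves both)
print(vm.read(9), vm.aborts)               # 9 2
```

The program using read and write never sees the loading and replacing; it only experiences the extra delay each time an abort is trapped.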

Virtual memory is not currently implemented on the Archimedes, but the hardware to support it does exist. It could, therefore, be added as an expansion in the future.

Executing Machine Code Instructions

To complete our overview of the ARM system, we will look at how machine code instructions are obeyed by the ARM.

Machine code instructions are binary numbers which have some significance to the processor. Typically, a group of bits in the instruction will define the operation which the processor is to perform. Another group will then tell the processor where to get the data. Further bits may control the use of special options to the instruction and so on. For example, the following 32-bit binary pattern is the ARM machine code instruction to add two numbers together:
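As an illustration, the pattern 1110 0000 1000 0001 0000 0000 0000 0010 (&E0810002) encodes ADD R0, R1, R2 under the standard ARM data processing layout. The sketch below, in Python rather than assembler, picks that word apart into its groups of bits. The helper `field` and the dictionary keys are names invented for this example.

```python
INSTRUCTION = 0xE0810002    # ADD R0, R1, R2


def field(word, high, low):
    """Return bits high..low (inclusive) of a 32-bit word."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


fields = {
    "condition": field(INSTRUCTION, 31, 28),  # 0b1110: always execute
    "immediate": field(INSTRUCTION, 25, 25),  # 0: second operand is a register
    "opcode":    field(INSTRUCTION, 24, 21),  # 0b0100: ADD
    "set_flags": field(INSTRUCTION, 20, 20),  # 0: leave the status flags alone
    "rn":        field(INSTRUCTION, 19, 16),  # first operand register (R1)
    "rd":        field(INSTRUCTION, 15, 12),  # destination register (R0)
    "operand2":  field(INSTRUCTION, 11, 0),   # second operand, here register R2
}

pattern = f"{INSTRUCTION:032b}"
print(pattern)
for name, value in fields.items():
    print(f"{name:10} = {value}")
```

One group of bits selects the operation, others name the registers holding the data, and the remainder control options such as conditional execution.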


Instructions can therefore be held in memory, like any other piece of data, and moved into the processor using the address and data buses. The ARM processor works by continually repeating a simple sequence of operations. This is commonly known as the fetch-execute cycle and consists of three main parts as follows:



1) Fetch instruction
2) Decode instruction
3) Execute instruction

In the first part, the address of the instruction to be obeyed is placed on the address bus. The complete instruction, which is always 32 bits long, is then fetched from memory, over the data bus, to the ARM.

In the second part, the previously fetched instruction is decoded. This involves looking at the bit pattern making up the instruction, and deciding which of the possible operations in the ARM's instruction set it represents.

In the final part, the previously decoded instruction is executed. That is, the operation which the instruction specifies is carried out by the hardware elements of the CPU.
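The three parts can be modelled as a simple loop. The following is a toy sketch in Python, not a description of the ARM's internal hardware: it understands only MOV with a small immediate value and ADD between registers, ignores conditions, shifts and rotations, and performs the three parts strictly one after another. The names `memory`, `registers`, `fetch`, `decode` and `execute` are invented for the example.

```python
MOV, ADD = 0b1101, 0b0100   # opcode values for the two instructions handled

# A three-instruction program held in memory like any other data.
memory = {
    0: 0xE3A01005,   # MOV R1, #5
    4: 0xE3A02007,   # MOV R2, #7
    8: 0xE0810002,   # ADD R0, R1, R2
}
registers = [0] * 16


def fetch(address):
    """Part 1: bring the 32-bit instruction at this address into the processor."""
    return memory[address]


def decode(word):
    """Part 2: split the bit pattern into the groups that matter here."""
    return {
        "immediate": (word >> 25) & 1,
        "opcode": (word >> 21) & 0xF,
        "rn": (word >> 16) & 0xF,
        "rd": (word >> 12) & 0xF,
        "operand2": word & 0xFFF,
    }


def execute(op):
    """Part 3: carry out the operation the instruction specifies."""
    if op["immediate"]:
        value = op["operand2"] & 0xFF               # rotation ignored in this sketch
    else:
        value = registers[op["operand2"] & 0xF]     # shifts ignored in this sketch
    if op["opcode"] == MOV:
        registers[op["rd"]] = value
    elif op["opcode"] == ADD:
        registers[op["rd"]] = (registers[op["rn"]] + value) & 0xFFFFFFFF
    else:
        raise NotImplementedError("opcode not handled by this sketch")


address = 0
while address in memory:
    execute(decode(fetch(address)))
    address += 4            # every instruction occupies four bytes

print("R0 =", registers[0])
```

After the loop finishes, R0 holds the sum of the two values loaded into R1 and R2.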


A special feature of the ARM processor is that the three parts or phases just mentioned are independent, and are performed by separate sections of the processor. They can, therefore, be overlapped. Obviously we can't overlap the fetching, decoding and executing of the same instruction! However, when an instruction has been fetched, there is no reason why the ARM cannot begin fetching the next one while the first is being decoded. Similarly, while the first instruction is being executed, the second can move on to be decoded and a third instruction can be fetched and so on. This obviously makes the machine very fast!

The ARM exploits this idea by overlapping all three phases of the cycle. Thus, at a given time, the ARM could hold three different instructions: the first being executed, the second in the process of being decoded, and the third having just been fetched.

Internally, the ARM holds the three instructions in a hardware element called the ‘instruction pipeline’. Instructions move along the pipeline through each of the three phases in turn. New instructions are fetched in at one end of the pipeline and the completed, executed instructions appear at the other. This scheme is illustrated in figure 2.3.




           Fetched          Decoded          Executed
Cycle 1    Instruction 1    <Empty>          <Empty>
Cycle 2    Instruction 2    Instruction 1    <Empty>
Cycle 3    Instruction 3    Instruction 2    Instruction 1
Cycle 4    Instruction 4    Instruction 3    Instruction 2

Figure 2.3. Pipelined execution of instruction.

As you can see from figure 2.3, it takes three cycles to fill the pipeline in the first instance. However, from this point onwards the overlap of the phases means that the ARM is, in effect, executing one instruction for every cycle. (There are circumstances when the pipeline has to be 'flushed' and we have to start again from the empty state.)
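The filling of the pipeline can be reproduced with a short simulation. This Python sketch assumes that every phase takes exactly one cycle and that the pipeline is never flushed; `pipeline_trace` is a name invented for the example.

```python
def pipeline_trace(instructions):
    """Return one (fetched, decoded, executed) tuple for each cycle."""
    fetched = decoded = executed = None
    pending = list(instructions)
    trace = []
    while pending or fetched is not None or decoded is not None:
        # Each instruction moves one phase along the pipeline per cycle.
        executed, decoded = decoded, fetched
        fetched = pending.pop(0) if pending else None
        trace.append((fetched, decoded, executed))
    return trace


trace = pipeline_trace(["I1", "I2", "I3", "I4"])
for cycle, (fetched, decoded, executed) in enumerate(trace, start=1):
    print(f"Cycle {cycle}: fetched={fetched} decoded={decoded} executed={executed}")

# Under these assumptions n instructions need n + 2 cycles,
# against 3 * n if the three phases were never overlapped.
pipelined_cycles = len(pipeline_trace(range(1, 101)))
print(pipelined_cycles)
```

The first four cycles of the trace match the rows of figure 2.3, and the last two cycles show the pipeline draining once there is nothing left to fetch.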

The pipeline system allows the ARM to perform at least a degree of parallel processing of instructions. It attempts to ensure that all parts of the processor are fully utilised at all times. This highly efficient way of operating helps to explain some of the amazing speed of the ARM processor.


3: Internal Architecture

We have looked at how the ARM communicates with the outside world, at how it organises memory access and, in general terms, how it processes instructions. Now we can probe a little deeper and examine the hardware elements which perform the operations specified in the processor's instruction