THE HISTORY OF ASSEMBLY LANGUAGE PROGRAMMING, Part 1

Early computer systems were literally programmed by hand.
Front panel switches were used to enter instructions and data.
These switches represented the address, data and control lines of the computer system. To enter data into memory, the address switches were toggled to the correct address, the data switches were toggled next, and finally the WRITE switch was toggled. This wrote the binary value on the front panel data switches to the address specified. Once all the data and instructions were entered, the RUN switch was toggled to run the program.

The programmer also needed to know the instruction set of the processor. Each instruction needed to be manually converted into bit patterns by the programmer so the front panel switches could be set correctly. This led to errors in translation as the programmer could easily misread 8 as the value B. It became obvious that such methods were slow and error prone.

With the advent of better hardware which could address larger memory, and the increase in memory size (due to better production techniques and lower cost), programs were written to perform some of this manual entry. Small monitor programs became popular, which allowed entry of instructions and data via hex keypads or terminals. Paper tape and punched cards also came into use as storage media for programs.

Programs were still hand-coded, in that the conversion from mnemonics to instructions was still performed manually. To increase programmer productivity, the idea of writing a program to translate another was a major breakthrough. Such a program would be run by the computer and would translate the mnemonics into actual machine instructions. The benefits of such a program would be

  • reduced errors
  • faster translation times
  • changes could be made more easily and quickly

As programmers were writing the source code in mnemonics anyway, this seemed the logical next step. The source file was fed as input into the program, which translated the mnemonics into instructions, then wrote the output to the desired place (paper tape, etc.). This sequence is now commonplace.

Since then, the main advance has been the increasing use of high-level languages to increase programmer productivity.

Assembly language programming is writing machine instructions in mnemonic form, using an assembler to convert these mnemonics into actual processor instructions and associated data.

The disadvantages of assembly language programming are

  • the programmer requires knowledge of the processor architecture and instruction set
  • many instructions are required to achieve small tasks
  • source programs tend to be large and difficult to follow
  • programs are machine dependent, requiring complete rewrites if the hardware is changed

THE PROGRAM TRANSLATION SEQUENCE

In developing a software program to accomplish a particular task, the implementor chooses an appropriate language, develops the algorithm (a sequence of steps which, when carried out in the order prescribed, achieves the desired result), implements this algorithm in the chosen language (coding), then tests and debugs the final result.

There is usually also an associated maintenance phase. The program written in the chosen language will undoubtedly need to be converted into the appropriate binary bit-patterns which make sense to the target processor (the processor on which the software will be run). This process of conversion is called translation.

The following diagram illustrates the translation sequence necessary to generate machine code from specific languages.

 

ASSEMBLY LANGUAGE PROGRAMMING

Assemblers are programs which generate machine code instructions from a source code program written in assembly language. The features provided by an assembler are,

  • the programmer can use mnemonics when writing source code programs
  • variables are represented by symbolic names, not as memory locations
  • symbolic code is easier to read and follow
  • error checking is provided
  • changes can be quickly and easily incorporated with a re-assembly
  • programming aids are included for relocation and expression evaluation

In writing assembly language programs for microcomputers, it is essential that a standardized format be followed. Most manufacturers provide assemblers, which are programs used to generate machine code instructions for the actual processor to execute.

The assembler converts the assembly language source program into a format which runs on the processor. In the source program, each machine code instruction (the binary or hex value) is replaced by a mnemonic. A mnemonic is an abbreviation which represents the actual instruction.

	+----------+-----+----------+----------------------------------+
	| Binary   | Hex | Mnemonic | Description                      |
	+----------+-----+----------+----------------------------------+
	| 01001111 | 4F  | CLRA     | Clears accumulator A             |
	+----------+-----+----------+----------------------------------+
	| 00110110 | 36  | PSHA     | Saves accumulator A on the stack |
	+----------+-----+----------+----------------------------------+
	| 01001101 | 4D  | TSTA     | Tests accumulator A for zero     |
	+----------+-----+----------+----------------------------------+
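At its simplest, replacing a mnemonic with its instruction is a table lookup. The Python sketch below is illustrative only. It holds just the three opcodes from the table, which are Motorola 6800-family opcodes. `OPCODES` and `encode` are hypothetical names and are not part of any real assembler.

```python
# Hypothetical opcode table holding only the three entries shown above.
OPCODES = {
    "CLRA": 0x4F,  # clear accumulator A
    "PSHA": 0x36,  # push (save) accumulator A onto the stack
    "TSTA": 0x4D,  # test accumulator A for zero
}


def encode(mnemonic: str) -> int:
    """Return the opcode byte for a mnemonic, ignoring case."""
    try:
        return OPCODES[mnemonic.upper()]
    except KeyError:
        # Reporting an unknown mnemonic is one form of assembler error checking.
        raise ValueError(f"unknown mnemonic: {mnemonic!r}") from None


for m in ("CLRA", "PSHA", "TSTA"):
    op = encode(m)
    print(f"{m:<5} {op:02X} {op:08b}")  # mnemonic, hex value, binary value
```

Each printed line reproduces one row of the table, for example `CLRA  4F 01001111`.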

Mnemonics are used because they

  • are more meaningful than hex or binary values
  • reduce the chances of making an error
  • are easier to remember than bit values

Assemblers also accept certain characters as representing number bases and addressing modes.

	$ prefix or h suffix for hexadecimal
	 $24 or 24h 

	D suffix (or no suffix) for decimal numbers
	 24D or 67

	B for binary numbers
	 0101111B

	O or Q for octal numbers
	 377O 232Q 

	# for immediate addressing
	 LDAA #$34 

	,X for indexed addressing
	 LDAA 01,X
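The following minimal Python sketch shows one way an assembler might recognize these conventions. `parse_number` and `classify_operand` are hypothetical names. The sketch makes two assumptions. First, a literal with no suffix is decimal, as the example 67 suggests. Second, every operand here is numeric. A real assembler would also have to tell labels apart from numbers, which is why hex values such as 0FH start with a digit.

```python
def parse_number(token: str) -> int:
    """Convert a literal such as $24, 24h, 0101111B or 377O to an int.

    Assumed rules: '$' prefix or 'H' suffix = hex, 'B' suffix = binary,
    'O' or 'Q' suffix = octal, 'D' suffix or no suffix = decimal.
    """
    t = token.strip().upper()
    if t.startswith("$"):
        return int(t[1:], 16)
    bases = {"H": 16, "B": 2, "O": 8, "Q": 8, "D": 10}
    if t[-1] in bases:
        return int(t[:-1], bases[t[-1]])
    return int(t, 10)  # no suffix: assume decimal


def classify_operand(operand: str) -> tuple[str, int]:
    """Return (addressing mode, value) for a numeric operand."""
    op = operand.strip().upper()
    if op.startswith("#"):
        return ("immediate", parse_number(op[1:]))  # e.g. LDAA #$34
    if op.endswith(",X"):
        return ("indexed", parse_number(op[:-2]))   # e.g. LDAA 01,X
    return ("address", parse_number(op))            # e.g. LDAA 0100H


for text in ("$24", "24h", "24D", "67", "0101111B", "377O", "232Q"):
    print(text, "=", parse_number(text))
print(classify_operand("#$34"), classify_operand("01,X"))
```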

Assembly language statements are written one per line. An assembly language program thus consists of a sequence of assembly language statements, where each statement contains a mnemonic. Each line of an assembly language program is split into four fields, as shown below

	LABEL	OPCODE	OPERAND		COMMENTS
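The following minimal Python sketch splits one source line into these fields. `SourceLine` and `split_fields` are hypothetical names. The sketch assumes the conventions described in this section: a declared label ends with a colon, fields are separated by spaces or tabs, and a comment begins with a semicolon.

```python
from typing import NamedTuple, Optional


class SourceLine(NamedTuple):
    label: Optional[str]
    opcode: Optional[str]
    operand: Optional[str]
    comment: Optional[str]


def split_fields(line: str) -> SourceLine:
    """Split one assembly source line into label, opcode, operand, comment."""
    # Assumption: the first ';' starts the comment (no ';' inside operands).
    code, sep, comment = line.partition(";")
    tokens = code.split()                   # fields separated by spaces or tabs
    label = None
    if tokens and tokens[0].endswith(":"):
        label = tokens.pop(0)[:-1]          # a declared label ends with a colon
    opcode = tokens[0] if tokens else None
    operand = " ".join(tokens[1:]) or None  # keep multi-part operands together
    return SourceLine(label, opcode, operand, comment.strip() if sep else None)


print(split_fields("START: LDAA #24H"))
print(split_fields("\tJMP START\t;jump back"))
```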

The label field is optional. A label is an identifier (or text string symbol). Labels are used extensively in programs to reduce reliance upon programmers remembering where data or code is located. A label can be used to refer to

  • a memory location
  • the value of a piece of data
  • the address of a program, subroutine, code portion, etc.

The maximum length of a label differs between assemblers. Some accept labels up to 32 characters long, others only four characters. A label, when declared, is suffixed by a colon and begins with a letter (A..Z). Consider the following example.

	 START: LDAA #24H

Here, the label START is equal to the address of the instruction LDAA #24H. The label is used in the program as a reference, eg,

	 JMP START

This would result in the processor jumping to the location (address) associated with the label START, thus executing the instruction LDAA #24H immediately after the JMP instruction. When a label is referenced later on in the program, it is done so without the colon suffix.

An advantage of using labels is that inserting or re-arranging code statements does not necessitate re-working actual machine instructions. A simple re-assembly is all that is required. In hand-coding, such changes can take hours to perform.

Each instruction consists of an opcode and possibly one or more operands. In the above instruction

	 JMP START

the opcode is JMP and the operand is the address of the label START.

The opcode field contains a mnemonic. Opcode stands for operation code, ie, a machine code instruction. The opcode may also require additional information (operands). This additional information is separated from the opcode by using a space (or tab stop).

The operand field consists of additional information or data that the opcode requires. In certain types of addressing modes, the operand is used to specify

  • constants or labels
  • immediate data
  • data contained in another accumulator or register
  • an address

Examples of operands are

	 TAB ; operand specified by opcode
	 LDAA 0100H ; two byte operand
	 LDAA START ; label operand
	 LDAA #0FH ; immediate operand

The comment field is optional, and is used by the programmer to explain how the coded program works. Comments are preceded by a semi-colon. The assembler, when generating instructions from the source file, ignores all comments. Consider the following examples,

			;H means hexadecimal values
	ORG	0100H	;This program starts at address 0100 hex
STATUS:	DFB	23H	;This byte is identified as STATUS, and is
			;initialized to a value of 23 hex
CODE:	LDAA	STATUS	;The label called CODE is identified as a
			;machine code instruction which loads the
			;A accumulator with the contents of the
			;memory location associated with the label
			;STATUS, ie, the value 23
	JMP	CODE	;Jump to the address associated with CODE

Note that the programmer does not need to worry about bit patterns, hex values, and the addresses of STATUS or CODE. The assembler, when fed the above program, will generate the correct code. The code output from the assembler will be,

	Memory location		Byte value
	 0100			 23
	 0101			 B6
	 0102			 01
	 0103			 00
	 0104			 7E
	 0105			 01
	 0106			 01

	Location 0100 holds the value associated with the label STATUS
	Locations 0101 to 0103 perform the LDAA STATUS instruction
	Locations 0104 to 0106 perform the JMP CODE instruction

The statement ORG 0100H in the above program is not a machine code instruction. It is an instruction to the assembler, which instructs the assembler to generate the code to run at the designated origin address. Instructions to assemblers are called pseudo-ops. These are used for

  • reserving memory for data variables, arrays and structures
  • determining the start address of the program
  • determining the entry address of the program
  • initializing variable values
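The example above is small enough to assemble in a single pass, because STATUS and CODE are both defined before they are used. The Python sketch below is a hypothetical illustration, not a real 6800 assembler. It handles only the pseudo-ops ORG and DFB and the two extended-address opcodes that appear in the byte listing (B6 for LDAA, 7E for JMP). It also takes pre-split (label, opcode, operand) tuples rather than source text.

```python
# Extended-mode (16-bit address) opcodes, taken from the byte listing above.
EXTENDED_OPCODES = {"LDAA": 0xB6, "JMP": 0x7E}


def assemble(program):
    """Single-pass sketch over (label, opcode, operand) tuples.

    Returns (address, byte) pairs. ORG and DFB are pseudo-ops:
    ORG only sets the location counter, DFB emits one data byte.
    """
    symbols = {}  # label -> address
    output = []   # (address, byte) pairs
    loc = 0       # location counter
    for label, opcode, operand in program:
        if opcode == "ORG":
            loc = operand           # set the origin; no code is generated
            continue
        if label:
            symbols[label] = loc    # a label takes the current location
        if opcode == "DFB":
            output.append((loc, operand))
            loc += 1
        else:
            # KeyError here means a forward reference: this single pass only
            # works because every label is defined before it is used.
            address = symbols[operand]
            for byte in (EXTENDED_OPCODES[opcode], address >> 8, address & 0xFF):
                output.append((loc, byte))
                loc += 1
    return output


program = [
    (None, "ORG", 0x0100),
    ("STATUS", "DFB", 0x23),
    ("CODE", "LDAA", "STATUS"),
    (None, "JMP", "CODE"),
]
for address, byte in assemble(program):
    print(f"{address:04X}  {byte:02X}")
```

The printed pairs match the memory listing above, from `0100  23` to `0106  01`. If a label were used before its definition, `symbols[operand]` would fail. That failure is the forward reference problem.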

The assembler does not generate any machine code instructions for pseudo-ops or comments. Assemblers scan the source program, generating machine instructions. Sometimes, the assembler reaches a reference to a variable which has not yet been defined. This is referred to as a forward reference problem. The assembler can tackle this problem in a number of ways. A two-pass assembler resolves it as follows.

On the first pass, the assembler simply reads the source file, counting up the number of locations that each instruction will take, and builds a symbol table in memory which lists all the defined variables cross-referenced to their associated memory address. On the second pass, the assembler substitutes opcodes for the mnemonics, and variable names are replaced by the memory locations obtained from the symbol table.


OPERATION OF A TWO-PASS ASSEMBLER

Consider the following source code program for a hypothetical computer. The program computes the so-called Fibonacci numbers, printing all such numbers up to that specified by LIMIT.

Line	Label	Operation	Operand 1	Operand 2
1		COPY		ZERO		OLDER
2		COPY		ONE		OLD
3		READ		LIMIT
4		WRITE		OLD
5	FRONT:	LOAD		OLDER
6		ADD		OLD
7		STORE		NEW
8		SUB		LIMIT
9		BRPOS		FINAL
10		WRITE		NEW
11		COPY		OLD		OLDER
12		COPY		NEW		OLD
13		BR		FRONT
14	FINAL:	WRITE		LIMIT
15		STOP
16	ZERO:	CONST		0
17	ONE:	CONST		1
18	OLDER:	SPACE
19	OLD:	SPACE
20	NEW:	SPACE
21	LIMIT:	SPACE

The instruction set of the computer is as follows,

Operation Code				Number of
Symbolic	Machine		Length	Operands	Action
ADD		02		2	1		ACC <- ACC + OPD1
BR		00		2	1		Branch to OPD1
BRPOS		01		2	1		Branch to OPD1 if ACC> 0
COPY		13		3	2		OPD2 <- OPD1
LOAD		03		2	1		ACC <- OPD1
READ		12		2	1		OPD1 <- input stream
STOP		11		1	0		Halt execution
STORE		07		2	1		OPD1 <- ACC
SUB		06		2	1		ACC <- (ACC - OPD1)
WRITE		08		2	1		output stream <- OPD1
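To see what the Fibonacci program computes, its source can be run directly against the actions listed in the table. The following Python sketch is a hypothetical interpreter that works on symbolic names rather than machine code. `PROGRAM`, `CONSTANTS` and `run` are illustrative names. The input stream is assumed to supply a single value for LIMIT.

```python
# The source program above as (label, operation, operands) tuples.
PROGRAM = [
    (None, "COPY", ("ZERO", "OLDER")),
    (None, "COPY", ("ONE", "OLD")),
    (None, "READ", ("LIMIT",)),
    (None, "WRITE", ("OLD",)),
    ("FRONT", "LOAD", ("OLDER",)),
    (None, "ADD", ("OLD",)),
    (None, "STORE", ("NEW",)),
    (None, "SUB", ("LIMIT",)),
    (None, "BRPOS", ("FINAL",)),
    (None, "WRITE", ("NEW",)),
    (None, "COPY", ("OLD", "OLDER")),
    (None, "COPY", ("NEW", "OLD")),
    (None, "BR", ("FRONT",)),
    ("FINAL", "WRITE", ("LIMIT",)),
    (None, "STOP", ()),
]
CONSTANTS = {"ZERO": 0, "ONE": 1}  # the CONST lines; SPACE variables start unset


def run(limit):
    """Interpret PROGRAM; return the values written to the output stream."""
    mem = dict(CONSTANTS)  # variable name -> value
    labels = {lab: i for i, (lab, _, _) in enumerate(PROGRAM) if lab}
    stream_in = iter([limit])  # the input stream supplies LIMIT
    stream_out = []
    acc, pc = 0, 0
    while True:
        _, op, opds = PROGRAM[pc]
        pc += 1
        if op == "ADD":
            acc += mem[opds[0]]           # ACC <- ACC + OPD1
        elif op == "SUB":
            acc -= mem[opds[0]]           # ACC <- ACC - OPD1
        elif op == "LOAD":
            acc = mem[opds[0]]            # ACC <- OPD1
        elif op == "STORE":
            mem[opds[0]] = acc            # OPD1 <- ACC
        elif op == "COPY":
            mem[opds[1]] = mem[opds[0]]   # OPD2 <- OPD1
        elif op == "READ":
            mem[opds[0]] = next(stream_in)
        elif op == "WRITE":
            stream_out.append(mem[opds[0]])
        elif op == "BR":
            pc = labels[opds[0]]
        elif op == "BRPOS":
            if acc > 0:                   # branch only if ACC > 0
                pc = labels[opds[0]]
        elif op == "STOP":
            return stream_out
        else:
            raise ValueError(f"unknown operation {op}")


print(run(20))
```

With LIMIT set to 20, `run(20)` returns `[1, 1, 2, 3, 5, 8, 13, 20]`. These are the Fibonacci numbers up to 13, followed by LIMIT itself. Line 14 writes LIMIT once the next number (21) exceeds it.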

The functions that the assembler will perform in translating the program are,

  1. replace symbolic addresses by numeric addresses
  2. replace symbolic operation codes by machine operation codes
  3. reserve storage for instructions and data
  4. translate constants into machine representation

IMPLEMENTATION

The assembler uses two counters to keep track of the machine language program. One counter, called the location counter, keeps track of the physical address location being used, and will initially be set to zero for this program (or the value designated by the ORG directive).

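The other counter is the line counter, which keeps track of the line number being processed. After each source line has been examined on the first pass, the line counter is incremented by one, and the location counter is advanced by the number of locations that the line occupies (the instruction length from the table above, or one location for each CONST or SPACE).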

When the assembler processes line 1 of the source, it cannot replace the symbols ZERO and OLDER by their addresses because those symbols have not yet been defined. This is called a forward reference problem.

The assembler will enter both symbols into the symbol table without addresses. It then advances the location counter by the length of the COPY instruction, so that it now holds 3, and proceeds to process the next source line. After processing line 3 of the source, the current state will be,

	Line	Address	Label	Operation	OPD1	OPD2
	 1	0		COPY			ZERO	OLDER
	 2	3		COPY			ONE	OLD
	 3	6		READ			LIMIT

and the contents of the symbol table will be

	Symbol		Address
	ZERO		---
	OLDER		---
	ONE		---
	OLD		---
	LIMIT		---
	Location Counter: 8
	Line Counter: 4

The symbol table currently holds five symbols, none of which yet has an address. During processing of line 4, the assembler picks up the symbol OLD. It establishes that it is already in the symbol table, so does not enter it again.

While processing line 5, the assembler encounters the label FRONT and enters it into the symbol table. Because FRONT is defined on this line, its address (10, the current value of the location counter) is already known and is recorded as well. After processing line 9 of the program, the current state is,

	Line	Address	Label	Operation	OPD1	OPD2
	1	0			COPY		ZERO	OLDER
	2	3			COPY		ONE	OLD
	3	6			READ		LIMIT
	4	8			WRITE		OLD
	5	10	FRONT		LOAD		OLDER
	6	12			ADD		OLD
	7	14			STORE		NEW
	8	16			SUB		LIMIT
	9	18			BRPOS		FINAL

and the contents of the symbol table will be

	Symbol		Address
	ZERO		---
	OLDER		---
	ONE		---
	OLD		---
	LIMIT		---
	FRONT		10
	NEW		---
	FINAL		---
	Location Counter: 20
	Line Counter: 10

The first pass continues, building up the symbol table. When the assembler determines the addresses of the remaining symbols (FINAL in line 14, and the data labels in lines 16 to 21), these are entered into the table. At the end of pass 1, the symbol table should list all declared symbols together with their addresses.

The state at the end of the first pass is,

	Line	Address	Label	Operation	OPD1	OPD2
	1	0		COPY		ZERO	OLDER
	2	3		COPY		ONE	OLD
	3	6		READ		LIMIT
	4	8		WRITE		OLD
	5	10	FRONT	LOAD		OLDER
	6	12		ADD		OLD
	7	14		STORE		NEW
	8	16		SUB		LIMIT
	9	18		BRPOS		FINAL
	10	20		WRITE		NEW
	11	22		COPY		OLD	OLDER
	12	25		COPY		NEW	OLD
	13	28		BR		FRONT
	14	30	FINAL	WRITE		LIMIT
	15	32		STOP
	16	33	ZERO	CONST		0
	17	34	ONE	CONST		1
	18	35	OLDER	SPACE
	19	36	OLD	SPACE
	20	37	NEW	SPACE
	21	38	LIMIT	SPACE

and the contents of the symbol table will be

	Symbol		Address
	ZERO		33
	OLDER		35
	ONE		34
	OLD		36
	LIMIT		38
	FRONT		10
	NEW		37
	FINAL		30
	Location Counter: 39
	Line Counter: 22
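The first pass can be sketched in Python as follows. This is a hypothetical illustration. It assumes the source is already split into (label, operation, operands) tuples, and that CONST and SPACE each occupy one location, as the listing shows. A symbol is entered with no address (`None`) when it is first referenced. It receives its address when its defining label is reached.

```python
# Instruction lengths from the instruction set table; CONST and SPACE are
# assumed to take one location each, matching the listing above.
LENGTHS = {"ADD": 2, "BR": 2, "BRPOS": 2, "COPY": 3, "LOAD": 2, "READ": 2,
           "STOP": 1, "STORE": 2, "SUB": 2, "WRITE": 2, "CONST": 1, "SPACE": 1}

SOURCE = [
    (None, "COPY", ["ZERO", "OLDER"]), (None, "COPY", ["ONE", "OLD"]),
    (None, "READ", ["LIMIT"]), (None, "WRITE", ["OLD"]),
    ("FRONT", "LOAD", ["OLDER"]), (None, "ADD", ["OLD"]),
    (None, "STORE", ["NEW"]), (None, "SUB", ["LIMIT"]),
    (None, "BRPOS", ["FINAL"]), (None, "WRITE", ["NEW"]),
    (None, "COPY", ["OLD", "OLDER"]), (None, "COPY", ["NEW", "OLD"]),
    (None, "BR", ["FRONT"]), ("FINAL", "WRITE", ["LIMIT"]),
    (None, "STOP", []), ("ZERO", "CONST", ["0"]), ("ONE", "CONST", ["1"]),
    ("OLDER", "SPACE", []), ("OLD", "SPACE", []),
    ("NEW", "SPACE", []), ("LIMIT", "SPACE", []),
]


def pass_one(source):
    """Return (symbol table, address of each line, final location counter)."""
    symbols = {}    # symbol -> address, or None while still undefined
    addresses = []  # address assigned to each source line
    location = 0    # location counter
    for label, op, operands in source:
        addresses.append(location)
        if label:
            symbols[label] = location           # definition: address now known
        if op != "CONST":                       # CONST operands are values, not symbols
            for name in operands:
                symbols.setdefault(name, None)  # forward reference: address unknown
        location += LENGTHS[op]
    undefined = [name for name, addr in symbols.items() if addr is None]
    if undefined:
        raise ValueError(f"undefined symbols: {undefined}")
    return symbols, addresses, location


symbols, addresses, location = pass_one(SOURCE)
print(symbols)
print("Location Counter:", location)
```

The resulting dictionary lists the symbols in the same order and with the same addresses as the table above. The final location counter is 39.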

Code generation is performed on the second pass. Before starting, the line and location counters will be reset to 1 and 0 respectively. The assembler now generates one line of object code for each source line. Line one is translated to

	Address	Length	Opcode	OPD1	OPD2
	 00	3	 13	 33	 35

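Successive lines are translated in the same manner. On encountering the label FRONT in line 5, the assembler ignores it, since its address was already recorded during the first pass. For lines 16 and 17, the constants 0 and 1 are translated into their machine representation. For lines 18 to 21, where space is reserved for variables, the assembler may leave these locations undefined (shown as xx) or initialize them to zero. The object code generated by the second pass will be,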

	Address	Length	Opcode	OPD1	OPD2
	 00	 3	 13	 33	 35
	 03	 3	 13	 34	 36
	 06	 2	 12	 38
	 08	 2	 08	 36
	 10	 2	 03	 35
	 12	 2	 02	 36
	 14	 2	 07	 37
	 16	 2	 06	 38
	 18	 2	 01	 30
	 20	 2	 08	 37
	 22	 3	 13	 36	 35
	 25	 3	 13	 37	 36
	 28	 2	 00	 10
	 30	 2	 08	 38
	 32	 1	 11
	 33	 1	 00
	 34	 1	 01
	 35	 1	 xx
	 36	 1	 xx
	 37	 1	 xx
	 38	 1	 xx
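A matching sketch of the second pass, again hypothetical and in Python, takes the symbol table from the end of pass 1 as given. It emits one row per source line. It assumes SPACE locations are left undefined (printed as xx), one of the two options mentioned above. It treats the machine opcodes as plain two-digit numbers, because the table does not state their base.

```python
# Machine opcodes from the instruction set table, kept as the two-digit
# numbers shown there (assumption: their base does not matter here).
MACHINE_CODES = {"ADD": 2, "BR": 0, "BRPOS": 1, "COPY": 13, "LOAD": 3,
                 "READ": 12, "STOP": 11, "STORE": 7, "SUB": 6, "WRITE": 8}

# Symbol table produced by the first pass (copied from above).
SYMBOLS = {"ZERO": 33, "OLDER": 35, "ONE": 34, "OLD": 36,
           "LIMIT": 38, "FRONT": 10, "NEW": 37, "FINAL": 30}

SOURCE = [
    (None, "COPY", ["ZERO", "OLDER"]), (None, "COPY", ["ONE", "OLD"]),
    (None, "READ", ["LIMIT"]), (None, "WRITE", ["OLD"]),
    ("FRONT", "LOAD", ["OLDER"]), (None, "ADD", ["OLD"]),
    (None, "STORE", ["NEW"]), (None, "SUB", ["LIMIT"]),
    (None, "BRPOS", ["FINAL"]), (None, "WRITE", ["NEW"]),
    (None, "COPY", ["OLD", "OLDER"]), (None, "COPY", ["NEW", "OLD"]),
    (None, "BR", ["FRONT"]), ("FINAL", "WRITE", ["LIMIT"]),
    (None, "STOP", []), ("ZERO", "CONST", ["0"]), ("ONE", "CONST", ["1"]),
    ("OLDER", "SPACE", []), ("OLD", "SPACE", []),
    ("NEW", "SPACE", []), ("LIMIT", "SPACE", []),
]


def pass_two(source, symbols):
    """Return (address, length, words) for each source line."""
    object_code = []
    location = 0                              # location counter reset for pass 2
    for _label, op, operands in source:       # labels were handled in pass 1
        if op == "CONST":
            words = [int(operands[0])]        # constant in machine representation
        elif op == "SPACE":
            words = [None]                    # reserved but left undefined
        else:
            words = [MACHINE_CODES[op]] + [symbols[name] for name in operands]
        object_code.append((location, len(words), words))
        location += len(words)
    return object_code


for address, length, words in pass_two(SOURCE, SYMBOLS):
    cells = ["xx" if w is None else f"{w:02d}" for w in words]
    print(f"{address:02d}  {length}  " + "  ".join(cells))
```

Each instruction's length here is one word for the opcode plus one word per operand. This agrees with the Length column of the instruction set table, so the sketch derives the length from the words it emits instead of looking it up.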

 
