
How DJPASM Works

This page is intended to explain how DJPASM works, hoping that it will be useful for those willing to code their own assemblers. Please note that there are many ways to implement an assembler. This is the one I came up with.

When the "Assemble" button is pressed, the function named "main" is called. Assembly is done step by step:

1. Read all source lines into an array, ignoring comments
2. Extract source code words from each line, making each word an element of an array for the line it belongs to
3. See if the "macro" directive is used in the source
4. If "macro" directive is present read macros and expand them
5. Scan source for lines with EQU, DT, RADIX, __CONFIG directives
6. Look for opcodes, labels, ORG directives on each line
7. Assemble the code
8. Generate list and hex output in INHX8M format
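
Step 8 is not described further on this page, so here is a rough illustration of what a single INHX8M (Intel HEX, 8-bit) data record looks like. This is not DJPASM's code: the helper names toHex and hexRecord are made up for the example. The sketch relies on the usual INHX8M conventions for PIC parts: record addresses count bytes (twice the word address), each word is stored low byte first, and the checksum is the two's complement of the sum of all record bytes.

```javascript
// Hypothetical helpers, not taken from DJPASM.
function toHex(value, digits) {
  return value.toString(16).toUpperCase().padStart(digits, "0");
}

// Build one data record for program words starting at a word address.
function hexRecord(wordAddr, words) {
  const byteAddr = wordAddr * 2;             // INHX8M addresses bytes, not words
  const bytes = [];
  for (const w of words) {
    bytes.push(w & 0xFF, (w >> 8) & 0xFF);   // low byte first, then high byte
  }
  // Record layout: byte count, address high, address low, record type 00, data
  const fields = [bytes.length, (byteAddr >> 8) & 0xFF, byteAddr & 0xFF, 0x00, ...bytes];
  const sum = fields.reduce((a, b) => a + b, 0);
  const checksum = (-sum) & 0xFF;            // two's complement of the byte sum
  return ":" + fields.map(b => toHex(b, 2)).join("") + toHex(checksum, 2);
}

const END_OF_FILE_RECORD = ":00000001FF";    // standard Intel HEX end-of-file record

console.log(hexRecord(0, [0x1683]));         // ":02000000831665"
console.log(END_OF_FILE_RECORD);
```

A real hex writer would also split long runs of words into records of a fixed maximum length; that part is left out here.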

1. Read all source lines into an array, ignoring comments
2. Extract source code words from each line

These two steps go together (the stripLns() function in t5dual.js). DJPASM uses a 4-dimensional global array to store source code, line numbers, memory address, assembler output and more. For example:
lnsArry[lnCntr][xpnsn][lmnts][0]
Here lnsArry is the array, lnCntr indexes the source lines, xpnsn indexes the lines produced when a macro or a directive expands to multiple lines (for example, dt 1,3,9 expands to retlw 1, retlw 3, retlw 9), and lmnts is an array that stores the words of a source line.
When these two steps are completed, a source code line like
start: bsf status,rp0 ;switch to bank 1
is placed into its lmnts array as the separate words start, bsf, status and rp0.
As you can see, there is no white space, colons, commas or comments.
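
The word extraction described above can be sketched in a few lines. This is a simplified stand-in, not the real stripLns() function: splitSourceLine is a hypothetical name, and the sketch assumes that a semicolon always starts a comment (so it would mishandle a semicolon inside a character literal).

```javascript
// Hypothetical stand-in for the word-extraction part of stripLns().
function splitSourceLine(line) {
  const code = line.split(";")[0];           // drop the comment, if any
  // Split on runs of white space, colons and commas, then drop empty pieces.
  return code.split(/[\s:,]+/).filter(word => word.length > 0);
}

console.log(splitSourceLine("start: bsf status,rp0 ;switch to bank 1"));
// [ 'start', 'bsf', 'status', 'rp0' ]
```

A line that holds only a comment yields an empty array, which makes it easy to skip.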

3. See if the "macro" directive is used in the source
4. If "macro" directive is present read macros and expand them

DJPASM macros are simple, because they cannot be nested (a macro that uses another macro) and there is no support for recursion (a macro using itself). If the source contains the MACRO directive, two macro passes are done. The first one scans the source lines for the MACRO directive. If the directive is found, the macro is added to a special array for storing macros, and the macro definition is removed from the source array (as it is not needed anymore). The second pass finds and expands any macros used. fishForMcr() and fndAndXpndMcr() are the functions that deal with macros.
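
The two macro passes might look roughly like the sketch below. It is not the code of fishForMcr() or fndAndXpndMcr(): the function names collectMacros and expandMacros are hypothetical, the MPASM-style syntax "name macro p1,p2 ... endm" is an assumption, and each line is assumed to be already split into an array of words.

```javascript
// Pass 1 (hypothetical): move macro definitions out of the source into a Map.
function collectMacros(lines) {
  const macros = new Map();
  const remaining = [];
  let current = null;                        // the macro being read, if any
  for (const words of lines) {
    const lower = words.map(w => w.toLowerCase());
    if (current === null && lower[1] === "macro") {
      current = { params: lower.slice(2), body: [] };
      macros.set(lower[0], current);
    } else if (current !== null && lower[0] === "endm") {
      current = null;                        // end of the definition
    } else if (current !== null) {
      current.body.push(lower);              // a line inside the definition
    } else {
      remaining.push(words);                 // ordinary source line
    }
  }
  return { macros, remaining };
}

// Pass 2 (hypothetical): replace each macro call with its body.
// Body lines are not expanded again, so nesting and recursion are not supported.
function expandMacros(lines, macros) {
  const expanded = [];
  for (const words of lines) {
    const macro = words.length > 0 ? macros.get(words[0].toLowerCase()) : undefined;
    if (!macro) {
      expanded.push(words);
      continue;
    }
    const args = words.slice(1);
    for (const bodyLine of macro.body) {
      expanded.push(bodyLine.map(w => {
        const i = macro.params.indexOf(w);   // substitute parameters by position
        return i >= 0 && i < args.length ? args[i] : w;
      }));
    }
  }
  return expanded;
}

const source = [
  ["bank1", "macro"], ["bsf", "status", "rp0"], ["endm"],
  ["movto", "macro", "reg", "val"], ["movlw", "val"], ["movwf", "reg"], ["endm"],
  ["bank1"],
  ["movto", "trisb", "0"],
];
const pass1 = collectMacros(source);
console.log(expandMacros(pass1.remaining, pass1.macros));
```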

5. Scan source for lines with EQU, DT, RADIX, __CONFIG directives

This pass (the fishForEqu() function in t5dual.js) looks for EQU, DT, RADIX and __CONFIG directives in the source lines and calls the function that deals with each of them whenever one is encountered. For example, if a line with an EQU directive is found, the symbol name and value are added to an array of labels (the equArry defined in table.js). These lines are also removed from the source, since we are done with them and they are unnecessary on the next pass.
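
A minimal sketch of such a directive pass is given below, handling only EQU and DT. It is not the fishForEqu() function: collectDirectives and parseNumber are hypothetical names, a Map stands in for equArry, and the number parser accepts only plain decimal, 0x-prefixed hex and h-suffixed hex (RADIX is ignored). Unlike DJPASM, which keeps expansions under the xpnsn index of the same line, the sketch simply emits the expanded DT lines as separate lines.

```javascript
// Hypothetical number parser covering a small subset of notations.
function parseNumber(text) {
  const t = text.toLowerCase();
  if (/^0x[0-9a-f]+$/.test(t)) return parseInt(t.slice(2), 16);
  if (/^[0-9a-f]+h$/.test(t)) return parseInt(t.slice(0, -1), 16);
  if (/^[0-9]+$/.test(t)) return parseInt(t, 10);
  return NaN;
}

// Hypothetical directive pass over lines that are already split into words.
function collectDirectives(lines) {
  const symbols = new Map();                 // stands in for the labels array
  const remaining = [];
  for (const words of lines) {
    const lower = words.map(w => w.toLowerCase());
    if (lower.length === 3 && lower[1] === "equ") {
      symbols.set(lower[0], parseNumber(lower[2]));   // line is consumed
    } else if (lower[0] === "dt") {
      for (const value of words.slice(1)) {
        remaining.push(["retlw", value]);    // dt 1,3,9 becomes three retlw lines
      }
    } else {
      remaining.push(words);
    }
  }
  return { symbols, remaining };
}

const directivePass = collectDirectives([
  ["status", "equ", "03h"],
  ["rp0", "equ", "5"],
  ["dt", "1", "3", "9"],
  ["bsf", "status", "rp0"],
]);
console.log(directivePass.symbols.get("status"), directivePass.symbols.get("rp0")); // 3 5
console.log(directivePass.remaining);
```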

6. Look for opcodes, labels, ORG directives on each line

This pass is quite important. It looks for opcodes, labels and ORG directives on each line, and the addresses are assigned here, too. Each line is checked to see if it matches one of the possible forms of a PIC assembly source line:

line with opcode: something like 'addlw 45h' OR 'nop'
line with a label only: something like 'myadr:'
line with a label and opcode: something like 'myadr: movlw 15h' OR ' myadr: clrw '
line with label and ORG directive: something like 'myadr: org 0200h'
line with ORG directive: something like 'org 0200h'

To find out if a word is an opcode, it is looked up in the PIC opcode table (the instrTab array defined in table.js). If a word is found to be an opcode, its index in the PIC opcode table is stored in the lines array ( lnsArry[lnCntr][xpnsn][opcdNdxVal] ) for the assembly step.
An ORG directive causes the address counter to change to the value specified. ORG directives are also removed from the source, so that they do not get in the way during the assembly step.
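
The following sketch shows one way such a pass could classify lines and assign addresses. It is not DJPASM's code: assignAddresses and OPCODES are hypothetical names, the opcode list is a tiny subset, ORG values are assumed to be h-suffixed hex, and a label on an ORG line is assumed to take the new address (the page does not say which address DJPASM uses in that case).

```javascript
// Tiny illustrative subset of the opcode table (names only).
const OPCODES = ["addlw", "clrw", "goto", "movlw", "nop"];

// Hypothetical pass: lines are arrays of words, as produced by the earlier steps.
function assignAddresses(lines) {
  const labels = new Map();
  const placed = [];
  let address = 0;                           // the address counter
  for (const words of lines) {
    let rest = words.map(w => w.toLowerCase());
    let label = null;
    // A first word that is neither ORG nor an opcode is treated as a label.
    if (rest.length > 0 && rest[0] !== "org" && !OPCODES.includes(rest[0])) {
      label = rest[0];
      rest = rest.slice(1);
    }
    if (rest[0] === "org") {
      address = parseInt(rest[1].replace(/h$/, ""), 16);  // assumed hex value
      rest = [];                             // the ORG line itself is dropped
    }
    if (label !== null) labels.set(label, address);
    if (rest.length > 0) {
      const opcodeIndex = OPCODES.indexOf(rest[0]);
      if (opcodeIndex < 0) throw new Error("Unknown opcode: " + rest[0]);
      placed.push({ address, opcodeIndex, args: rest.slice(1) });
      address += 1;                          // each instruction takes one word
    }
  }
  return { labels, placed };
}

const addressPass = assignAddresses([
  ["org", "0200h"],
  ["myadr"],
  ["movlw", "15h"],
  ["loop", "clrw"],
  ["nop"],
  ["far", "org", "0300h"],
  ["goto", "myadr"],
]);
console.log(addressPass.labels.get("myadr").toString(16)); // "200"
console.log(addressPass.placed.map(p => p.address.toString(16)));
```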

7. Assemble the code

Thanks to the previous steps the source code now has only PIC instructions, and the indexes of instructions in the opcode table are known for each line. All that has to be done is to assemble them. Let's take a closer look at the instruction table:

var instrTab = Array(); //Opcode look up table
instrTab[0] = Array("addwf", "*,*", 0x0700, "dualAWDst",0xFF00, 0x7F);
instrTab[1] = Array("andwf", "*,*", 0x0500, "dualAWDst",0xFF00, 0x7F);
instrTab[2] = Array("clrf", "*", 0x0180, "dualAWDst",0xFF80, 0x7F); // check this one ! it was oneArg
instrTab[3] = Array("clrw", "", 0x0100, "asIs" ,0xFF80 );
instrTab[4] = Array("comf", "*,*", 0x0900, "dualAWDst",0xFF00, 0x7F);
instrTab[5] = Array("decf", "*,*", 0x0300, "dualAWDst",0xFF00, 0x7F);
instrTab[6] = Array("decfsz", "*,*", 0x0B00, "dualAWDst",0xFF00, 0x7F);
instrTab[7] = Array("incf", "*,*", 0x0A00, "dualAWDst",0xFF00, 0x7F);
instrTab[8] = Array("incfsz", "*,*", 0x0F00, "dualAWDst",0xFF00, 0x7F);
instrTab[9] = Array("iorwf", "*,*", 0x0400, "dualAWDst",0xFF00, 0x7F);
instrTab[10] = Array("movf", "*,*", 0x0800, "dualAWDst",0xFF00, 0x7F);
instrTab[11] = Array("movwf", "*", 0x0080, "dualAWDst",0xFF80, 0x7F); // check this one ! it was oneArg
instrTab[12] = Array("nop", "", 0x0000, "asIs" ,0xFF9F );
instrTab[13] = Array("rlf", "*,*", 0x0D00, "dualAWDst",0xFF00, 0x7F);
instrTab[14] = Array("rrf", "*,*", 0x0C00, "dualAWDst",0xFF00, 0x7F);
instrTab[15] = Array("subwf", "*,*", 0x0200, "dualAWDst",0xFF00, 0x7F);
instrTab[16] = Array("swapf", "*,*", 0x0E00, "dualAWDst",0xFF00, 0x7F);
instrTab[17] = Array("xorwf", "*,*", 0x0600, "dualAWDst",0xFF00, 0x7F);
instrTab[18] = Array("bcf", "*,*", 0x1000, "bit", 0xFC00, 0x7F);
instrTab[19] = Array("bsf", "*,*", 0x1400, "bit", 0xFC00, 0x7F);
instrTab[20] = Array("btfsc", "*,*", 0x1800, "bit", 0xFC00, 0x7F);
instrTab[21] = Array("btfss", "*,*", 0x1C00, "bit", 0xFC00, 0x7F);
instrTab[22] = Array("addlw", "*", 0x3E00, "oneArg", 0xFE00, 0x00FF);
instrTab[23] = Array("andlw", "*", 0x3900, "oneArg", 0xFF00, 0x00FF);
instrTab[24] = Array("call", "*", 0x2000, "oneArg", 0xF800, 0x07FF);
instrTab[25] = Array("clrwdt", "", 0x0064, "asIs" ,0xFFFF );
instrTab[26] = Array("goto", "*", 0x2800, "oneArg", 0xF800, 0x07FF);
instrTab[27] = Array("iorlw", "*", 0x3800, "oneArg", 0xFF00, 0x00FF);
instrTab[28] = Array("movlw", "*", 0x3000, "oneArg", 0xFC00, 0x00FF);
instrTab[29] = Array("retfie", "", 0x0009, "asIs" ,0xFFFF );
instrTab[30] = Array("retlw", "*", 0x3400, "oneArg", 0xFC00, 0x00FF);
instrTab[31] = Array("return", "", 0x0008, "asIs" ,0xFFFF );
instrTab[32] = Array("sleep", "", 0x0063, "asIs" ,0xFFFF );
instrTab[33] = Array("sublw", "*", 0x3C00, "oneArg", 0xFE00, 0x00FF );
instrTab[34] = Array("xorlw", "*", 0x3A00, "oneArg", 0xFF00, 0x00FF );
instrTab[35] = Array("tris", "*", 0x0060, "oneArg", 0xFFF8, 0x0007);

Notice that it is a two-dimensional array. The first dimension is the opcode number, and the second one selects a field of the entry:

// Second dimension of the instrTab array
var instruction = 0;
var args = 1; // obsolete... no longer used
var opcode = 2; // the opcode in hex
var rule = 3; // see below for this one
var disasmMask = 4; // used for disassembly
var maxVal = 5; // maximum possible value of argument
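
The page does not show how disasmMask is used, but a plausible reading is that a program word belongs to an instruction when the word ANDed with the mask equals the opcode. The sketch below illustrates that assumption with a few entries copied from the table above; findInstruction is a hypothetical name, and this is not DJPASM's disassembler.

```javascript
// Field indexes, as defined for the second dimension of instrTab.
const instruction = 0;
const opcode = 2;
const disasmMask = 4;

// A few entries copied from the opcode table.
const instrTab = [
  ["addwf",  "*,*", 0x0700, "dualAWDst", 0xFF00, 0x7F],
  ["bsf",    "*,*", 0x1400, "bit",       0xFC00, 0x7F],
  ["goto",   "*",   0x2800, "oneArg",    0xF800, 0x07FF],
  ["movlw",  "*",   0x3000, "oneArg",    0xFC00, 0x00FF],
  ["return", "",    0x0008, "asIs",      0xFFFF],
];

// Hypothetical lookup: assumes a word matches when (word & mask) equals the opcode.
function findInstruction(word) {
  for (const entry of instrTab) {
    if ((word & entry[disasmMask]) === entry[opcode]) {
      return entry[instruction];
    }
  }
  return null;                               // no entry matched
}

console.log(findInstruction(0x1683));        // "bsf"
console.log(findInstruction(0x3015));        // "movlw"
```

With the full table the order of the entries can matter, because some masks overlap (for example, the word 0x0060 matches both the nop and the tris entry).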

The rule is the type of PIC command. They fall into four groups:

1. instruction with two arguments, one of them being destination
2. instruction with one argument
3. bit instructions
4. instruction without arguments

Here is how they are assembled:

1. hexWord = instrTab[opcdPntr][opcode] | (arg1 & 0x7F) | (arg2 * 128);
2. hexWord = instrTab[opcdPntr][opcode] | (instrTab[opcdPntr][maxVal] & arg1);
3. hexWord = instrTab[opcdPntr][opcode] | (arg1 & 0x7F) | (arg2 * 128);
4. hexWord = instrTab[opcdPntr][opcode];

Note that 1 and 3 are the same, but I have different functions for each, because I use different error checking for each. I also put CLRF and MOVWF into group 1, because of error checking issues. Masking the arguments with 0x7F is not really necessary if the argument is checked for its size.
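
To make the four formulas concrete, here is a small worked sketch that applies them to a few entries copied from the table. The function assembleWord is a hypothetical wrapper, not one of DJPASM's functions, and it does no error checking; it also merges groups 1 and 3 into one case, since their formula is the same.

```javascript
// Field indexes of the second dimension of instrTab.
const opcode = 2;
const rule = 3;
const maxVal = 5;

// A few entries copied from the opcode table.
const instrTab = [
  ["addwf",  "*,*", 0x0700, "dualAWDst", 0xFF00, 0x7F],
  ["bsf",    "*,*", 0x1400, "bit",       0xFC00, 0x7F],
  ["movlw",  "*",   0x3000, "oneArg",    0xFC00, 0x00FF],
  ["goto",   "*",   0x2800, "oneArg",    0xF800, 0x07FF],
  ["return", "",    0x0008, "asIs",      0xFFFF],
];

// Hypothetical wrapper around the four formulas; arguments are already numbers.
function assembleWord(opcdPntr, arg1, arg2) {
  const entry = instrTab[opcdPntr];
  switch (entry[rule]) {
    case "dualAWDst":                        // group 1: register and destination
    case "bit":                              // group 3: register and bit number
      return entry[opcode] | (arg1 & 0x7F) | (arg2 * 128);
    case "oneArg":                           // group 2: one argument
      return entry[opcode] | (entry[maxVal] & arg1);
    case "asIs":                             // group 4: no arguments
      return entry[opcode];
    default:
      throw new Error("Unknown rule: " + entry[rule]);
  }
}

console.log(assembleWord(0, 0x0C, 1).toString(16));  // addwf 0Ch,1   -> "78c"
console.log(assembleWord(1, 0x03, 5).toString(16));  // bsf status,rp0 -> "1683"
console.log(assembleWord(2, 0x15).toString(16));     // movlw 15h     -> "3015"
console.log(assembleWord(3, 0x200).toString(16));    // goto 0200h    -> "2a00"
```

In the bsf example, status is register 3 and rp0 is bit 5, so the word is 0x1400 | 0x03 | (5 * 128) = 0x1683.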

During this step, a memory map is built to check for overlapping code and give an error in such a case. Also values of symbols and labels used are retrieved from the labels array to generate argument values. If a label or symbol cannot be found in the labels array, an error is generated.
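
The two checks mentioned above, overlapping code and undefined labels, can be sketched as follows. This is only an illustration under assumptions: buildMemoryMap and resolveArgument are hypothetical names, Map objects stand in for DJPASM's arrays, and only decimal literals are recognised as numbers.

```javascript
// Hypothetical memory map: one entry per program address.
function buildMemoryMap(placedWords) {
  const memory = new Map();
  for (const item of placedWords) {
    if (memory.has(item.address)) {
      throw new Error("Overlapping code at address 0x" + item.address.toString(16));
    }
    memory.set(item.address, item.word);
  }
  return memory;
}

// Hypothetical argument lookup: a decimal number, or a name from the labels table.
function resolveArgument(text, labels) {
  if (/^[0-9]+$/.test(text)) return parseInt(text, 10);
  const name = text.toLowerCase();
  if (labels.has(name)) return labels.get(name);
  throw new Error("Undefined label or symbol: " + text);
}

const labels = new Map([["status", 3], ["start", 0x200]]);
console.log(resolveArgument("status", labels));      // 3
console.log(resolveArgument("12", labels));          // 12

const memory = buildMemoryMap([
  { address: 0x200, word: 0x1683 },
  { address: 0x201, word: 0x3015 },
]);
console.log(memory.size);                            // 2
```

Placing a second word at an address that is already occupied (for example after an ORG that jumps back into existing code) makes buildMemoryMap throw, and an unknown name makes resolveArgument throw.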

