30 August 2009

4. Addition

We can now have an expression that consists of a single number. If you don't remember, our definition of an expression was

expression ::= <digit>

The next step is to add a + operator. We may write a BNF like this:
digit  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 
expression ::= <digit> + <digit>

Unfortunately, this means that an expression can't be just one digit any more. It must always be a digit plus another digit. So let's use this BNF instead, it allows a digit and optionally, a plus sign and another digit. The [] symbol means whatever is inside the brackets is optional.
digit  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 
expression ::= <digit> [+ <digit>]


Let's make a copy of the file from the previous chapter (name it add.pb) and extend it to accept addition. To do that, all we need to change is the procedure Expression(), since we only changed that rule in our grammar.
Procedure Expression()
; Here's the BNF to look at while writing the procedure:
; expression ::= <digit> [+ <digit>]
Emit("mov eax, " + GetDigit())

If Look = '+' ; remember, the + is optional
GetLook() ; Throw away the + and get the next character
Emit("add eax, " + GetDigit())
EndIf

If Look <> 0 ; The grammar does not allow anything after the digit
Expected("End of line") ; so if there is anything, put an error
EndIf
EndProcedure

Try it. This program should accept inputs simiar to this:
1
7+3
And refuse all other inputs, like this:
7+3+5
Var+4

You probably don't want to be limited to expression with just two terms, so let's find a way to extend our parser to accept any number of additions in the same expression.
expression ::= <digit> {+ <digit>}
The { and } means whatever is inside is optional and could be repeated any number of times. Thus, legal expressions will now be:
1
1+2
1+2+3+4+5+6+7+8+1+2+6+7
This is exactly what we want.

Before we go overboard and implement this, I'd like to make a few changes to make the program more modular:
value  ::= <digit>
addexp ::= <value> {+ <value>}
expression ::= <addexp>
Later on, when we need other values than digits, like variables, we don't need to change addexp or expression, only value.

Here is the program:
; add2.pb
IncludeFile "..\base.pb"
Init()

Procedure IsDigit(C.c)
If C >= '0' And C <= '9'
ProcedureReturn 1
EndIf
EndProcedure

Procedure.s GetDigit()
If IsDigit(Look)
Temp.s = Chr(Look)
GetLook()
ProcedureReturn Temp
Else
Expected("Digit")
EndIf
EndProcedure

Procedure Value()
; value ::= <digit>
Emit("mov eax, " + GetDigit())
EndProcedure

Procedure AddExp()
; addexp ::= <value> {+ <value>}
Value()
While Look = '+'
GetLook()
Emit("push eax")
Value()
Emit("pop ecx")
Emit("add eax, ecx")
Wend
EndProcedure


Procedure Expression()
; expression ::= <addexp>
AddExp()

If Look <> 0
Expected("End of line")
EndIf
EndProcedure

; Parse an expression
Expression()
Input()

As you can see, Value() always moves the value into eax. Then AddExpr() needs to save that value on the stack so that it doesn't get overwritten by the next call to Value(). So our generated code for the expression "1+2" just got worse by a huge amount. Previously just two instructions, now it's five!
You just have to get used to this. It is possible to generate better code, but it is harder. For beginners, this will do. Remember, it's probably way faster than a normal interpreter would be.
Try the program with a few expressions and see if it generates valid code. That is the most important of all. If the generated code doesn’t work correctly, it doesn't matter it it's really, really fast.

To check that a generated code produces the correct result, you can copy and paste it into purebasic, enable inline assembly, and run it with some code to display the result. Here is an example:

; Show the correct result
; (hopefully PB gets it right!)
Debug 1+2+3+4

; Generated code:
MOV eax, 1
PUSH eax
MOV eax, 2
POP ecx
ADD eax, ecx
PUSH eax
MOV eax, 3
POP ecx
ADD eax, ecx
PUSH eax
MOV eax, 4
POP ecx
ADD eax, ecx

; Get result
Show.l
MOV [v_Show], eax
Debug Show


If you have in front of you a program that produces valid x86 code from the entered expressions: congratulations! If not, then be ashamed, you brickhead! (Or rather, I didn't explain everything properly. Please post a comment to get help.)

No comments: