11 September 2009

12. Multiline and If statements

I decided to implement if statements onto boolrel.pb. This way we have boolean operators available. However, in this file we did not yet write support for variables or parentheses. It will be your job to fit all the parts together. Even so, I recommend that you do this exactly as I do to learn it first.

If statements only make sense when there is two different code paths to choose from. This means we must implement support for compiling a whole program consisting of multiple lines. It could be represented by this BNF grammar:
assignment ::= <name> = <expression>
statement ::= <assignment> <newline>
block ::= { <statement> }
program ::= <block>


You can see that we have implemented support for multiple statements (each on a single line) in the "rule" (procedure) block. A statement can be an assignment statement, an if statement, a while statement, a variable declaration, and so on. In the grammar above, the only allowed kind of statement is an assignment.

Let's add another kind of statement, the if statement:

assignment ::= <name> = <expression>
if ::= if <expression> <block> endif
statement ::= ( <ifst> | <assignment> ) <newline>
block ::= { <statement> }
program ::= <block>


An if statement starts with the literal keyword if (this is not the same as the rule <if>). Then comes an expression, which is evaluated to tell us whether to jump around the if or execute the block of code inside it. This is followed by a block (which is simply a list of statements) and ended with the keyword endif. For now, we don't have an else clause.

I want you to pay close attention to the recursive nature of the definition. A block contains a list of statements, some of can be if statements. Each if statement contains a block, which can also consists of if statements. *This means we already have support for nesting ifs!*

Of course, provided we write the code.

If you payed enough attention to the grammar, you would have noticed that it is a bit reduntant. A program is simply defined as a single block. In theory, we could simply delete the rule program and rename the rule block into program.
In practice, this is not so. Because we must handle the so-called "follow tokens" (endif, wend, while, endprocedure) somewhere. I thought block was a good place for this. But at the end of a program there is no follow token, so while block must return on a follow token, program must give an error message ("endif without if"). Thus it is more elegant to use different procedures.
Also, it is just nonsense to say that an if statement consists of the keyword if, followed by a _program_, and then endif. It is more correct to say the if keyword is followed by a block.

First of all we must remove the obnoxious check for end of input from Expression(). This is what it should look like:

Procedure Expression()
; expression ::= <BoolExp>
BoolExp()
EndProcedure


In the assignment statement we fake support for variables, in a real compiler use a procedure similar to Assignment() from the previous chapter. A difference is that we get the name as a parameter. This is because Block() and Statement() need to examine the name first, to see if they need to handle it in a different way. For instance, if the name was "if" or "endif", then this would not be an assignment statement, and Assignment() would never be called.

Procedure Assignment(Name.s)
; assignment ::= <name> = <expression>
MatchWhite('=')
Expression()
; I simply implemented support for variables here
; Normally, you would call VarValue() as shown in the
; previous tutorials, but we don't have VarValue() in
; this program (to not clutter it too much)
Emit("mov [v_" + Name + "], eax")
EndProcedure


For now, we'll just a stub for the DoIf() procedure:

Procedure DoIf()
Emit("; do if")
EndProcedure


Don't be tempted to name this procedure If(). Bonus points if you guess why.

Then the procedure Statement().
Procedure Statement(Name.s)
; statement ::= ( <ifst> | <assignment> ) <newline>
; alternatively,
; statement ::= <ifst> <newline> | <assignment> <newline>
Select Name
Case "if": DoIf()
Default: Assignment(Name)
EndSelect
Match(#LF)
EndProcedure


Statement is simple: Either it's an if statement, or it's an assignment statement. It then calls the relevant procedure to take care of the details.
As you can see, we require a linefeed at the end of each statement. This is a common rule in BASIC-like languages. In C-like languages, a semicolon is used for this purpose. I purposely don't match #CR, because it's too much trouble. Instead we'll make sure that #CR is simply dropped from the input, leaving us with only #LF.

Procedure.s Block()
; block ::= { <statement> }
Protected Name.s
EatWhiteNewlines()
While Look <> 0
Name = GetName()
Select Name
Case "endif"
ProcedureReturn Name
Default
Statement(Name)
EndSelect
EatWhiteNewlines()
Wend
EndProcedure


Block() gets a name and checks if it is a follow token. If it is, then Block() returns the follow token. In this way, a follow token marks the end of a block, which is correct.
If the name is not recognized, Block() passes the name on to Statement(). Block() keeps doing this until the end of the input stream is reached (but remember, it may return before, if it encounters an "endif").
EatWhiteNewlines() is a new procedure that eats all whitespace and newlines. This is to allow for space between the statements in our program.

The rule program should be easy to implement as a procedure:
Procedure Program()
; program ::= <block>
Protected Follow.s = Block()
If Follow <> ""
Expected("Dangling " + Follow)
EndIf
EndProcedure


It is simply a block. Do you remember that Block() was supposed to return a follow token? Program() checks for this, and if Block() returns "endif", "endprocedure" or anything like that, we give an error message.
Since Block() simply returns an empty string if it reaches the end of the file, Program() shows no error in this case. Because after all, the end of the program is when we reach the end of the file, and we shouldn't give an error for that, since it's correct.

There is a huge problem with our program as a whole. Which is why we are still using a dummy for DoIf(). What's the problem? The compiler expects an input of several lines. We can only input one line!

Normally, one would read from a file. You can do that if you want to, just change Init() and GetLook(), and be sure to skip #CR characters. I chose to add support for entering several lines in the console instead, for easier experimentation. Simply add these procedures to base2.pb, and call InitMulti() instead of Init().

Procedure IsNewline(C.c)
If Look = #LF
ProcedureReturn 1
EndIf
EndProcedure

Procedure EatWhiteNewlines()
While IsWhite(Look) Or IsNewline(Look)
GetLook()
Wend
EndProcedure

Procedure InitMulti()
OpenConsole()
Protected TempIn.s
Repeat
TempIn = Input()
Instream + TempIn + #LF$
Until TempIn = ""
PrintN("--------------------------")
GetLook()
EndProcedure


Now you should test that multi-line input works. Call InitMulti() and Program() instead of Init() and Expression():
InitMulti()
Program()
Input()


Your console window should look like this:
a = 5
g = 7
f = 5

--------------------------
mov eax, 5
mov [v_a], eax
mov eax, 7
mov [v_g], eax
mov eax, 5
mov [v_f], eax

As you can see, we terminate the input with an empty line.

Now let's get to the if statement.
if         ::= if <expression> <block> endif


What we need to do is to jump around the block if the expression is false. To jump in assembly, we need to have a label to jump to. Which means we need a way to generate unique labels. I made a separate file for this, labels.pb, which I included at the top of the program.

; This file generates unique labels for use in the program

Procedure.s NewLabel()
Static LabelCounter
LabelCounter + 1
; The cl_ prefix stands for compiler label
; to avoid a name clash with labels used in the language
; (which we can give the prefix l_)
ProcedureReturn "cl_" + Str(LabelCounter)
EndProcedure


Very simple, it just increases a counter and generates labels like cl_1, cl_2, cl_3, cl_10593 and so on.

Here's DoIf(), finally:
Declare.s Block()
Procedure DoIf()
Protected Label.s
Protected FollowToken.s
Label = NewLabel() ; Used when jumping around the block

Expression() ; The conditional expression (do we jump or not?)
Emit("cmp eax, 0") ; Check if the expression is true or false
Emit("jz " + Label) ; if it's false (zero) jump to label
FollowToken = Block() ; else fall through to this block (which is always compiled)

Emit(Label + ":")

If FollowToken <> "endif"
Expected("endif, got " + FollowToken)
EndIf
EndProcedure


We create a new label with our NewLabel() procedure. Then we get an ordinary expression, and compares it to 0 to check if it's true or false. If it's zero (false), we jump to the label.
In either case, we compile a block. The block returns the name that caused it to return, or nothing if we are at the end of the file. If this is an empty string or anything else than endif, the program has an error and compilation must be stopped.
At last, we emit the label, our jump target.

Notice again that DoIf() calls Block(), which may call DoIf() again if required to have nested ifs without any extra work on our part. This is the result of a well thought-out grammar and following the grammar exactly when implementing.

The compiler should now accept inputs like this:
if 6 and 2 < 8
a = 1
if 1
b = 2
endif
endif


Be sure to test if the generated code gives the expected result. When using labels from asm in PureBasic you need to add a ! before the label definition, or add l_ before all references to the label (because PB puts l_ in front of all BASIC labels). Note that you can change the numbers in the generated code to see if works for all inputs, rather than re-entering the whole thing with different numbers.

Now we'll implement else. First we must set Block() to recognize this as a follow token. Simply add ", "else"" (without the quotes) to the case line like this:
    Case "endif", "else"


Then we must, in DoIf(), check if the follow token returned from Block() is "else". If it is, we call a new procedure, DoElse(), and also don't output the label.
Procedure DoIf()
Protected Label.s
Protected FollowToken.s
Label = NewLabel() ; Used when jumping around the block

Expression() ; The conditional expression (do we jump or not?)
Emit("cmp eax, 0") ; Check if the expression is true or false
Emit("jz " + Label) ; if it's false (zero) jump to label
FollowToken = Block() ; else fall through to this block (which is always compiled)

If FollowToken = "else" ; New code
FollowToken = DoElse(Label)
Else
Emit(Label + ":")
EndIf

If FollowToken <> "endif"
Expected("endif, got " + FollowToken)
EndIf
EndProcedure


DoElse() is going to call Block() also, and will return the follow token, which should still be endif.

Procedure DoElse(Label.s)
Protected Label2.s = NewLabel()
Protected Follow.s

Emit("jmp " + Label2)
Emit(Label + ":")
Follow = Block()
Emit(Label2 + ":")

ProcedureReturn Follow
EndProcedure


We need a new label, so that we can jump around the else branch (from the if branch). The first thing we do is to emit this a jump to this label. This jump is actually not in the else branch, but at the end of the if branch. Then we emit the label that is jumped to when the if expression is false (else branch).
After this label, we place the code from a Block(). Finally we emit label2, so that the code from the if branch can jump around the else branch.
And then we return Follow so that it can be checked in DoIf(), because having the check both here and in DoIf() would be unelegant.

Done. With a lot support code and a tiny bit of actual if code, we now have if .. endif and if .. else .. endif. If you merge this with support for variables and parentheses you're on your a way to a complete programming language!


; if3.pb
IncludeFile "..\base2.pb"
IncludeFile "..\labels.pb"

Procedure ToBool(reg.s)
Emit("cmp " + reg + ", 0")
Emit("mov ecx, 0")
Emit("setnz cl")
Emit("mov " + reg + ", ecx")
EndProcedure

Procedure RelOp(Op.s)
Emit("cmp ebp, eax")
Emit("mov eax, 0")
Select Op
Case "=": Emit("sete al") ; set if equal
Case "<": Emit("setl al") ; set if less
Case "<=": Emit("setle al") ; less or equal
Case ">": Emit("setg al") ; set if greater
Case ">=": Emit("setge al")
Case "<>": Emit("setne al") ; set if not equal
Default
Expected("Relational operator, got " + Op)
EndSelect
EndProcedure

Procedure Value()
; value ::= <integer>
Emit("mov eax, " + GetInteger())
EndProcedure

Procedure MulExp()
; mulexp ::= <value> {<mulop> <value>}
Protected Operation.c
Value()
While Look = '*' Or Look = '/' Or Look = '%'
Operation = Look
GetLookWhite()
Emit("push eax")
Value()
Emit("pop ebp")
Select Operation
Case '*': Emit("imul eax, ebp")
Case '/', '%'
Emit("xchg eax, ebp")
Emit("cdq") ; expands eax to edx:eax
Emit("idiv ebp")
EndSelect
If Operation = '%'
Emit("mov eax, edx")
EndIf
Wend
EndProcedure

Procedure AddExp()
; addexp ::= <mulexp> {<addop> <mulexp> }
Protected Operation.c
MulExp()
While Look = '+' Or Look = '-'
Operation = Look
GetLookWhite()
Emit("push eax")
MulExp()
Emit("pop ebp")
Select Operation
Case '+': Emit("add eax, ebp")
Case '-': Emit("sub eax, ebp")
Emit("neg eax")
EndSelect
Wend
EndProcedure

Procedure RelExp()
; relexp ::= <addexp> { <boolop> <addexp> }
Protected Operation.s
AddExp()
While Look = '<' Or Look = '>' Or Look = '='
Operation = Chr(Look)
GetLook()
Select Look
Case '<', '>', '='
Operation + Chr(Look)
GetLook()
EndSelect
EatWhite()
Emit("push eax")
AddExp()
Emit("pop ebp")
RelOp(Operation) ; Generate code
Wend
EndProcedure

Procedure BoolExp()
; boolexp ::= <relexp> { <boolop> <relexp> }
Protected Operation.s
RelExp()
While Look = 'a' Or Look = 'o' Or Look = 'x'
Operation = GetName()
Emit("push eax")
RelExp()
Emit("pop ebp")
ToBool("eax")
ToBool("ebp")
Select Operation
Case "and": Emit("test eax, ebp")
Case "or": Emit("or eax, ebp")
Case "xor": Emit("xor eax, ebp")
Default
Expected("Operator, got " + Operation)
EndSelect
Emit("mov eax, 0")
Emit("setnz al")
Wend
EndProcedure

Procedure Expression()
; expression ::= <BoolExp>
BoolExp()
EndProcedure

Procedure Assignment(Name.s)
; assignment ::= <name> = <expression>
MatchWhite('=')
Expression()
; I simply implemented support for variables here
; Normally, you would call VarValue() as shown in the
; previous tutorials, but we don't have VarValue() in
; this program (to not clutter it too much)
Emit("mov [v_" + Name + "], eax")
EndProcedure

Declare.s Block()

Procedure.s DoElse(Label.s)
Protected Label2.s = NewLabel()
Protected Follow.s

Emit("jmp " + Label2)
Emit(Label + ":")
Follow = Block()
Emit(Label2 + ":")

ProcedureReturn Follow
EndProcedure

Procedure DoIf()
Protected Label.s
Protected FollowToken.s
Label = NewLabel() ; Used when jumping around the block

Expression() ; The conditional expression (do we jump or not?)
Emit("cmp eax, 0") ; Check if the expression is true or false
Emit("jz " + Label) ; if it's false (zero) jump to label
FollowToken = Block() ; else fall through to this block (which is always compiled)

If FollowToken = "else"
FollowToken = DoElse(Label)
Else
Emit(Label + ":")
EndIf

If FollowToken <> "endif"
Expected("endif, got " + FollowToken)
EndIf
EndProcedure

Procedure Statement(Name.s)
; statement ::= ( <ifst> | <assignment> ) <newline>
; alternatively,
; statement ::= <ifst> <newline> | <assignment> <newline>
Select Name
Case "if": DoIf()
Default: Assignment(Name)
EndSelect
Match(#LF)
EndProcedure

Procedure.s Block()
; block ::= { <statement> }
Protected Name.s
EatWhiteNewlines()
While Look <> 0
Name = GetName()
Select Name
Case "endif", "else"
ProcedureReturn Name
Default
Statement(Name)
EndSelect
EatWhiteNewlines()
Wend
EndProcedure

Procedure Program()
; program ::= <block>
EatWhiteNewlines() ; Allow for whitespace before program start
Protected Follow.s = Block()
If Follow <> ""
Expected("Dangling " + Follow)
EndIf
EndProcedure

InitMulti()
Program()
Input()

No comments: