11 September 2009

13. Loops

The tutorial on if's was quite long, so i'll try to make this a short one. What we'll do is to make some loops. First the while loop.

The relevant parts of the grammar is as following:
...
while ::= while <expression> <block> wend
statement ::= ( <if> | <while> | <assignment> ) <newline>
...


Make a copy of the file with the if statements and call it while.pb.

The first thing to do, is to add the follow token "wend" to the list recognized in Block():

...
Case "endif", "else", "wend"
...


Then modify Statement() to recognize a while statement:

Procedure Statement(Name.s)
; statement ::= ( <if> | <while> | <assignment> ) <newline>
Select Name
Case "if": DoIf()
Case "while": DoWhile()
Default: Assignment(Name)
EndSelect
Match(#LF)
EndProcedure


A while loop is just like an if statement, with a goto start at the end:
while expression
...
wend
equals
ifstart:
if expression
...
jmp ifstart
endif


Thus, DoWhile() will be pretty similar to the version of DoIf() that didn't support else.

Procedure DoWhile()
; while ::= while <expression> <block> wend
Protected L1.s = NewLabel()
Protected L2.s = NewLabel()
Protected Follow.s

Emit(L1 + ":")
Expression()
Emit("cmp eax, 0")
Emit("jz " + L2)
Follow = Block()
Emit("jmp " + L1)
Emit(L2 + ":")

If Follow <> "wend"
Expected("wend, got " + Follow)
EndIf
EndProcedure


While we're at it, let's implement a for loop. Our for loop will have the following syntax:
for ::= for <variable> = <expression> to <expression> <block> next

A for loop works like this:

First the result of expression 1 is assigned to the variable.
Then we do this:
LblStart:
if expression 2 >= variable
... code within for loop goes here
variable = variable + 1
jmp LblStart
endif
LblStop:


The loop part (after the initial assignment) can also be written as a while loop:
while expression 2 >= variable
variable = variable + 1
wend


You need to add "next" to the list of follow tokens in block, and add "for" to statement() to call DoFor(). Then write DoFor():
Procedure DoFor()
; for ::= for <variable> = <expression>(1) to <expression>(2) <block> next
Protected Variable.s
Protected Temp.s
Protected Follow.s
Protected LblStart.s = NewLabel()
Protected LblStop.s = NewLabel()

Variable = GetName()
MatchWhite('=')
Expression() ; (1)
Emit2("mov ", VarValue(Variable), "eax")

Emit(LblStart + ":")

Temp = GetName()
If Temp <> "to"
Expected("To, got " + Temp)
EndIf

Expression() ; (2)
Emit2("cmp ", "eax", VarValue(Variable))
Emit( "jl " + LblStop)

Follow = Block()
If Follow <> "next"
Expected("Next, got " + Follow)
EndIf

Emit("inc " + VarValue(Variable))
Emit("jmp " + LblStart)

Emit(LblStop + ":")
EndProcedure


Read the code thorougly until you understand it. Step through it with the debugger if necessary.

As you can see, we must go through Block() (and Statement()) to open a for loop. Since the for loop calls Block() for its body, and it uses no global or static variables, we already have support for nested for loops, and if's within for's and for's within if's.

Exercise:
1. Write the grammar and PureBasic code to compile a loop like PureBasic's Repeat ... Until .
2. Write the grammar and PureBasic code to compile a loop that looks like a while loop, but the expression should only be evaluated once, and it should be the number of times that the loop loops.
Example of such a loop:
I = 10 (assignment statement is not part of the loop)
Times I
Print "Hello " + Str(I) (this is some dummy code for the Block())
Loop

The loop would run 10 times, each time printing "Hello 10". (I should not be changed during the loop.) Of course, you only implement the loop, not the print function.
Remember to allow nesting of loops. Hint: The loop counter may be stored on the stack.

Here is the listing (I added variables and parentheses as described in another chapter):


; for.pb
IncludeFile "..\base2.pb"
IncludeFile "..\labels.pb"

Procedure ToBool(reg.s)
Emit("cmp " + reg + ", 0")
Emit("mov ecx, 0")
Emit("setnz cl")
Emit("mov " + reg + ", ecx")
EndProcedure

Procedure RelOp(Op.s)
Emit("cmp ebp, eax")
Emit("mov eax, 0")
Select Op
Case "=": Emit("sete al") ; set if equal
Case "<": Emit("setl al") ; set if less
Case "<=": Emit("setle al") ; less or equal
Case ">": Emit("setg al") ; set if greater
Case ">=": Emit("setge al")
Case "<>": Emit("setne al") ; set if not equal
Default
Expected("Relational operator, got " + Op)
EndSelect
EndProcedure

Procedure.s VarValue(Var.s)
ProcedureReturn "[v_" + Var + "]"
EndProcedure

Declare Expression()
Procedure Value()
; value ::= <integer> | <variable> | '(' <expression> ')'
If IsDigit(Look)
Emit("mov eax, " + GetInteger())
ElseIf IsAlpha(Look)
Emit("mov eax, " + VarValue(GetName()))
ElseIf Look = '('
MatchWhite('(')
Expression()
MatchWhite(')')
Else
Expected("value")
EndIf
EndProcedure

Procedure MulExp()
; mulexp ::= <value> {<mulop> <value>}
Protected Operation.c
Value()
While Look = '*' Or Look = '/' Or Look = '%'
Operation = Look
GetLookWhite()
Emit("push eax")
Value()
Emit("pop ebp")
Select Operation
Case '*': Emit("imul eax, ebp")
Case '/', '%'
Emit("xchg eax, ebp")
Emit("cdq") ; expands eax to edx:eax
Emit("idiv ebp")
EndSelect
If Operation = '%'
Emit("mov eax, edx")
EndIf
Wend
EndProcedure

Procedure AddExp()
; addexp ::= <mulexp> {<addop> <mulexp> }
Protected Operation.c
MulExp()
While Look = '+' Or Look = '-'
Operation = Look
GetLookWhite()
Emit("push eax")
MulExp()
Emit("pop ebp")
Select Operation
Case '+': Emit("add eax, ebp")
Case '-': Emit("sub eax, ebp")
Emit("neg eax")
EndSelect
Wend
EndProcedure

Procedure RelExp()
; relexp ::= <addexp> { <boolop> <addexp> }
Protected Operation.s
AddExp()
While Look = '<' Or Look = '>' Or Look = '='
Operation = Chr(Look)
GetLook()
Select Look
Case '<', '>', '='
Operation + Chr(Look)
GetLook()
EndSelect
EatWhite()
Emit("push eax")
AddExp()
Emit("pop ebp")
RelOp(Operation) ; Generate code
Wend
EndProcedure

Procedure BoolExp()
; boolexp ::= <relexp> { <boolop> <relexp> }
Protected Operation.s
RelExp()
While Look = 'a' Or Look = 'o' Or Look = 'x'
Operation = GetName()
Emit("push eax")
RelExp()
Emit("pop ebp")
ToBool("eax")
ToBool("ebp")
Select Operation
Case "and": Emit("test eax, ebp")
Case "or": Emit("or eax, ebp")
Case "xor": Emit("xor eax, ebp")
Default
Expected("Operator, got " + Operation)
EndSelect
Emit("mov eax, 0")
Emit("setnz al")
Wend
EndProcedure

Procedure Expression()
; expression ::= <BoolExp>
BoolExp()
EndProcedure

Procedure Assignment(Name.s)
; assignment ::= <name> = <expression>
MatchWhite('=')
Expression()
Emit2("mov ", VarValue(Name), "eax")
EndProcedure

Declare.s Block()

Procedure.s DoElse(Label.s)
Protected Label2.s = NewLabel()
Protected Follow.s

Emit("jmp " + Label2)
Emit(Label + ":")
Follow = Block()
Emit(Label2 + ":")

ProcedureReturn Follow
EndProcedure

Procedure DoIf()
Protected Label.s
Protected FollowToken.s
Label = NewLabel() ; Used when jumping around the block

Expression() ; The conditional expression (do we jump or not?)
Emit("cmp eax, 0") ; Check if the expression is true or false
Emit("jz " + Label) ; if it's false (zero) jump to label
FollowToken = Block() ; else fall through to this block (which is always compiled)

If FollowToken = "else"
FollowToken = DoElse(Label)
Else
Emit(Label + ":")
EndIf

If FollowToken <> "endif"
Expected("endif, got " + FollowToken)
EndIf
EndProcedure

Procedure DoWhile()
; while ::= while <expression> <block> wend
Protected L1.s = NewLabel()
Protected L2.s = NewLabel()
Protected Follow.s

Emit(L1 + ":")
Expression()
Emit("cmp eax, 0")
Emit("jz " + L2)
Follow = Block()
Emit("jmp " + L1)
Emit(L2 + ":")

If Follow <> "wend"
Expected("wend, got " + Follow)
EndIf
EndProcedure

Procedure DoFor()
; for ::= for <variable> = <expression>(1) to <expression>(2) <block> next
Protected Variable.s
Protected Temp.s
Protected Follow.s
Protected LblStart.s = NewLabel()
Protected LblStop.s = NewLabel()

Variable = GetName()
MatchWhite('=')
Expression() ; (1)
Emit2("mov ", VarValue(Variable), "eax")

Emit(LblStart + ":")

Temp = GetName()
If Temp <> "to"
Expected("To, got " + Temp)
EndIf

Expression() ; (2)
Emit2("cmp ", "eax", VarValue(Variable))
Emit( "jl " + LblStop)

Follow = Block()
If Follow <> "next"
Expected("Next, got " + Follow)
EndIf

Emit("inc " + VarValue(Variable))
Emit("jmp " + LblStart)

Emit(LblStop + ":")
EndProcedure

Procedure Statement(Name.s)
; statement ::= ( <if> | <while> | <assignment> ) <newline>
Select Name
Case "if": DoIf()
Case "while": DoWhile()
Case "for": DoFor()
Default: Assignment(Name)
EndSelect
Match(#LF)
EndProcedure

Procedure.s Block()
; block ::= { <statement> }
Protected Name.s
EatWhiteNewlines()
While Look <> 0
Name = GetName()
Select Name
Case "endif", "else", "wend", "next"
ProcedureReturn Name
Default
Statement(Name)
EndSelect
EatWhiteNewlines()
Wend
EndProcedure

Procedure Program()
; program ::= <block>
EatWhiteNewlines() ; Allow for whitespace before program start
Protected Follow.s = Block()
If Follow <> ""
Expected("Dangling " + Follow)
EndIf
EndProcedure

InitMulti()
Program()
Input()

12. Multiline and If statements

I decided to implement if statements onto boolrel.pb. This way we have boolean operators available. However, in this file we did not yet write support for variables or parentheses. It will be your job to fit all the parts together. Even so, I recommend that you do this exactly as I do to learn it first.

If statements only make sense when there is two different code paths to choose from. This means we must implement support for compiling a whole program consisting of multiple lines. It could be represented by this BNF grammar:
assignment ::= <name> = <expression>
statement ::= <assignment> <newline>
block ::= { <statement> }
program ::= <block>


You can see that we have implemented support for multiple statements (each on a single line) in the "rule" (procedure) block. A statement can be an assignment statement, an if statement, a while statement, a variable declaration, and so on. In the grammar above, the only allowed kind of statement is an assignment.

Let's add another kind of statement, the if statement:

assignment ::= <name> = <expression>
if ::= if <expression> <block> endif
statement ::= ( <ifst> | <assignment> ) <newline>
block ::= { <statement> }
program ::= <block>


An if statement starts with the literal keyword if (this is not the same as the rule <if>). Then comes an expression, which is evaluated to tell us whether to jump around the if or execute the block of code inside it. This is followed by a block (which is simply a list of statements) and ended with the keyword endif. For now, we don't have an else clause.

I want you to pay close attention to the recursive nature of the definition. A block contains a list of statements, some of can be if statements. Each if statement contains a block, which can also consists of if statements. *This means we already have support for nesting ifs!*

Of course, provided we write the code.

If you payed enough attention to the grammar, you would have noticed that it is a bit reduntant. A program is simply defined as a single block. In theory, we could simply delete the rule program and rename the rule block into program.
In practice, this is not so. Because we must handle the so-called "follow tokens" (endif, wend, while, endprocedure) somewhere. I thought block was a good place for this. But at the end of a program there is no follow token, so while block must return on a follow token, program must give an error message ("endif without if"). Thus it is more elegant to use different procedures.
Also, it is just nonsense to say that an if statement consists of the keyword if, followed by a _program_, and then endif. It is more correct to say the if keyword is followed by a block.

First of all we must remove the obnoxious check for end of input from Expression(). This is what it should look like:

Procedure Expression()
; expression ::= <BoolExp>
BoolExp()
EndProcedure


In the assignment statement we fake support for variables, in a real compiler use a procedure similar to Assignment() from the previous chapter. A difference is that we get the name as a parameter. This is because Block() and Statement() need to examine the name first, to see if they need to handle it in a different way. For instance, if the name was "if" or "endif", then this would not be an assignment statement, and Assignment() would never be called.

Procedure Assignment(Name.s)
; assignment ::= <name> = <expression>
MatchWhite('=')
Expression()
; I simply implemented support for variables here
; Normally, you would call VarValue() as shown in the
; previous tutorials, but we don't have VarValue() in
; this program (to not clutter it too much)
Emit("mov [v_" + Name + "], eax")
EndProcedure


For now, we'll just a stub for the DoIf() procedure:

Procedure DoIf()
Emit("; do if")
EndProcedure


Don't be tempted to name this procedure If(). Bonus points if you guess why.

Then the procedure Statement().
Procedure Statement(Name.s)
; statement ::= ( <ifst> | <assignment> ) <newline>
; alternatively,
; statement ::= <ifst> <newline> | <assignment> <newline>
Select Name
Case "if": DoIf()
Default: Assignment(Name)
EndSelect
Match(#LF)
EndProcedure


Statement is simple: Either it's an if statement, or it's an assignment statement. It then calls the relevant procedure to take care of the details.
As you can see, we require a linefeed at the end of each statement. This is a common rule in BASIC-like languages. In C-like languages, a semicolon is used for this purpose. I purposely don't match #CR, because it's too much trouble. Instead we'll make sure that #CR is simply dropped from the input, leaving us with only #LF.

Procedure.s Block()
; block ::= { <statement> }
Protected Name.s
EatWhiteNewlines()
While Look <> 0
Name = GetName()
Select Name
Case "endif"
ProcedureReturn Name
Default
Statement(Name)
EndSelect
EatWhiteNewlines()
Wend
EndProcedure


Block() gets a name and checks if it is a follow token. If it is, then Block() returns the follow token. In this way, a follow token marks the end of a block, which is correct.
If the name is not recognized, Block() passes the name on to Statement(). Block() keeps doing this until the end of the input stream is reached (but remember, it may return before, if it encounters an "endif").
EatWhiteNewlines() is a new procedure that eats all whitespace and newlines. This is to allow for space between the statements in our program.

The rule program should be easy to implement as a procedure:
Procedure Program()
; program ::= <block>
Protected Follow.s = Block()
If Follow <> ""
Expected("Dangling " + Follow)
EndIf
EndProcedure


It is simply a block. Do you remember that Block() was supposed to return a follow token? Program() checks for this, and if Block() returns "endif", "endprocedure" or anything like that, we give an error message.
Since Block() simply returns an empty string if it reaches the end of the file, Program() shows no error in this case. Because after all, the end of the program is when we reach the end of the file, and we shouldn't give an error for that, since it's correct.

There is a huge problem with our program as a whole. Which is why we are still using a dummy for DoIf(). What's the problem? The compiler expects an input of several lines. We can only input one line!

Normally, one would read from a file. You can do that if you want to, just change Init() and GetLook(), and be sure to skip #CR characters. I chose to add support for entering several lines in the console instead, for easier experimentation. Simply add these procedures to base2.pb, and call InitMulti() instead of Init().

Procedure IsNewline(C.c)
If Look = #LF
ProcedureReturn 1
EndIf
EndProcedure

Procedure EatWhiteNewlines()
While IsWhite(Look) Or IsNewline(Look)
GetLook()
Wend
EndProcedure

Procedure InitMulti()
OpenConsole()
Protected TempIn.s
Repeat
TempIn = Input()
Instream + TempIn + #LF$
Until TempIn = ""
PrintN("--------------------------")
GetLook()
EndProcedure


Now you should test that multi-line input works. Call InitMulti() and Program() instead of Init() and Expression():
InitMulti()
Program()
Input()


Your console window should look like this:
a = 5
g = 7
f = 5

--------------------------
mov eax, 5
mov [v_a], eax
mov eax, 7
mov [v_g], eax
mov eax, 5
mov [v_f], eax

As you can see, we terminate the input with an empty line.

Now let's get to the if statement.
if         ::= if <expression> <block> endif


What we need to do is to jump around the block if the expression is false. To jump in assembly, we need to have a label to jump to. Which means we need a way to generate unique labels. I made a separate file for this, labels.pb, which I included at the top of the program.

; This file generates unique labels for use in the program

Procedure.s NewLabel()
Static LabelCounter
LabelCounter + 1
; The cl_ prefix stands for compiler label
; to avoid a name clash with labels used in the language
; (which we can give the prefix l_)
ProcedureReturn "cl_" + Str(LabelCounter)
EndProcedure


Very simple, it just increases a counter and generates labels like cl_1, cl_2, cl_3, cl_10593 and so on.

Here's DoIf(), finally:
Declare.s Block()
Procedure DoIf()
Protected Label.s
Protected FollowToken.s
Label = NewLabel() ; Used when jumping around the block

Expression() ; The conditional expression (do we jump or not?)
Emit("cmp eax, 0") ; Check if the expression is true or false
Emit("jz " + Label) ; if it's false (zero) jump to label
FollowToken = Block() ; else fall through to this block (which is always compiled)

Emit(Label + ":")

If FollowToken <> "endif"
Expected("endif, got " + FollowToken)
EndIf
EndProcedure


We create a new label with our NewLabel() procedure. Then we get an ordinary expression, and compares it to 0 to check if it's true or false. If it's zero (false), we jump to the label.
In either case, we compile a block. The block returns the name that caused it to return, or nothing if we are at the end of the file. If this is an empty string or anything else than endif, the program has an error and compilation must be stopped.
At last, we emit the label, our jump target.

Notice again that DoIf() calls Block(), which may call DoIf() again if required to have nested ifs without any extra work on our part. This is the result of a well thought-out grammar and following the grammar exactly when implementing.

The compiler should now accept inputs like this:
if 6 and 2 < 8
a = 1
if 1
b = 2
endif
endif


Be sure to test if the generated code gives the expected result. When using labels from asm in PureBasic you need to add a ! before the label definition, or add l_ before all references to the label (because PB puts l_ in front of all BASIC labels). Note that you can change the numbers in the generated code to see if works for all inputs, rather than re-entering the whole thing with different numbers.

Now we'll implement else. First we must set Block() to recognize this as a follow token. Simply add ", "else"" (without the quotes) to the case line like this:
    Case "endif", "else"


Then we must, in DoIf(), check if the follow token returned from Block() is "else". If it is, we call a new procedure, DoElse(), and also don't output the label.
Procedure DoIf()
Protected Label.s
Protected FollowToken.s
Label = NewLabel() ; Used when jumping around the block

Expression() ; The conditional expression (do we jump or not?)
Emit("cmp eax, 0") ; Check if the expression is true or false
Emit("jz " + Label) ; if it's false (zero) jump to label
FollowToken = Block() ; else fall through to this block (which is always compiled)

If FollowToken = "else" ; New code
FollowToken = DoElse(Label)
Else
Emit(Label + ":")
EndIf

If FollowToken <> "endif"
Expected("endif, got " + FollowToken)
EndIf
EndProcedure


DoElse() is going to call Block() also, and will return the follow token, which should still be endif.

Procedure DoElse(Label.s)
Protected Label2.s = NewLabel()
Protected Follow.s

Emit("jmp " + Label2)
Emit(Label + ":")
Follow = Block()
Emit(Label2 + ":")

ProcedureReturn Follow
EndProcedure


We need a new label, so that we can jump around the else branch (from the if branch). The first thing we do is to emit this a jump to this label. This jump is actually not in the else branch, but at the end of the if branch. Then we emit the label that is jumped to when the if expression is false (else branch).
After this label, we place the code from a Block(). Finally we emit label2, so that the code from the if branch can jump around the else branch.
And then we return Follow so that it can be checked in DoIf(), because having the check both here and in DoIf() would be unelegant.

Done. With a lot support code and a tiny bit of actual if code, we now have if .. endif and if .. else .. endif. If you merge this with support for variables and parentheses you're on your a way to a complete programming language!


; if3.pb
IncludeFile "..\base2.pb"
IncludeFile "..\labels.pb"

Procedure ToBool(reg.s)
Emit("cmp " + reg + ", 0")
Emit("mov ecx, 0")
Emit("setnz cl")
Emit("mov " + reg + ", ecx")
EndProcedure

Procedure RelOp(Op.s)
Emit("cmp ebp, eax")
Emit("mov eax, 0")
Select Op
Case "=": Emit("sete al") ; set if equal
Case "<": Emit("setl al") ; set if less
Case "<=": Emit("setle al") ; less or equal
Case ">": Emit("setg al") ; set if greater
Case ">=": Emit("setge al")
Case "<>": Emit("setne al") ; set if not equal
Default
Expected("Relational operator, got " + Op)
EndSelect
EndProcedure

Procedure Value()
; value ::= <integer>
Emit("mov eax, " + GetInteger())
EndProcedure

Procedure MulExp()
; mulexp ::= <value> {<mulop> <value>}
Protected Operation.c
Value()
While Look = '*' Or Look = '/' Or Look = '%'
Operation = Look
GetLookWhite()
Emit("push eax")
Value()
Emit("pop ebp")
Select Operation
Case '*': Emit("imul eax, ebp")
Case '/', '%'
Emit("xchg eax, ebp")
Emit("cdq") ; expands eax to edx:eax
Emit("idiv ebp")
EndSelect
If Operation = '%'
Emit("mov eax, edx")
EndIf
Wend
EndProcedure

Procedure AddExp()
; addexp ::= <mulexp> {<addop> <mulexp> }
Protected Operation.c
MulExp()
While Look = '+' Or Look = '-'
Operation = Look
GetLookWhite()
Emit("push eax")
MulExp()
Emit("pop ebp")
Select Operation
Case '+': Emit("add eax, ebp")
Case '-': Emit("sub eax, ebp")
Emit("neg eax")
EndSelect
Wend
EndProcedure

Procedure RelExp()
; relexp ::= <addexp> { <boolop> <addexp> }
Protected Operation.s
AddExp()
While Look = '<' Or Look = '>' Or Look = '='
Operation = Chr(Look)
GetLook()
Select Look
Case '<', '>', '='
Operation + Chr(Look)
GetLook()
EndSelect
EatWhite()
Emit("push eax")
AddExp()
Emit("pop ebp")
RelOp(Operation) ; Generate code
Wend
EndProcedure

Procedure BoolExp()
; boolexp ::= <relexp> { <boolop> <relexp> }
Protected Operation.s
RelExp()
While Look = 'a' Or Look = 'o' Or Look = 'x'
Operation = GetName()
Emit("push eax")
RelExp()
Emit("pop ebp")
ToBool("eax")
ToBool("ebp")
Select Operation
Case "and": Emit("test eax, ebp")
Case "or": Emit("or eax, ebp")
Case "xor": Emit("xor eax, ebp")
Default
Expected("Operator, got " + Operation)
EndSelect
Emit("mov eax, 0")
Emit("setnz al")
Wend
EndProcedure

Procedure Expression()
; expression ::= <BoolExp>
BoolExp()
EndProcedure

Procedure Assignment(Name.s)
; assignment ::= <name> = <expression>
MatchWhite('=')
Expression()
; I simply implemented support for variables here
; Normally, you would call VarValue() as shown in the
; previous tutorials, but we don't have VarValue() in
; this program (to not clutter it too much)
Emit("mov [v_" + Name + "], eax")
EndProcedure

Declare.s Block()

Procedure.s DoElse(Label.s)
Protected Label2.s = NewLabel()
Protected Follow.s

Emit("jmp " + Label2)
Emit(Label + ":")
Follow = Block()
Emit(Label2 + ":")

ProcedureReturn Follow
EndProcedure

Procedure DoIf()
Protected Label.s
Protected FollowToken.s
Label = NewLabel() ; Used when jumping around the block

Expression() ; The conditional expression (do we jump or not?)
Emit("cmp eax, 0") ; Check if the expression is true or false
Emit("jz " + Label) ; if it's false (zero) jump to label
FollowToken = Block() ; else fall through to this block (which is always compiled)

If FollowToken = "else"
FollowToken = DoElse(Label)
Else
Emit(Label + ":")
EndIf

If FollowToken <> "endif"
Expected("endif, got " + FollowToken)
EndIf
EndProcedure

Procedure Statement(Name.s)
; statement ::= ( <ifst> | <assignment> ) <newline>
; alternatively,
; statement ::= <ifst> <newline> | <assignment> <newline>
Select Name
Case "if": DoIf()
Default: Assignment(Name)
EndSelect
Match(#LF)
EndProcedure

Procedure.s Block()
; block ::= { <statement> }
Protected Name.s
EatWhiteNewlines()
While Look <> 0
Name = GetName()
Select Name
Case "endif", "else"
ProcedureReturn Name
Default
Statement(Name)
EndSelect
EatWhiteNewlines()
Wend
EndProcedure

Procedure Program()
; program ::= <block>
EatWhiteNewlines() ; Allow for whitespace before program start
Protected Follow.s = Block()
If Follow <> ""
Expected("Dangling " + Follow)
EndIf
EndProcedure

InitMulti()
Program()
Input()

06 September 2009

11. Optimization: Register stack (and assignments)

Today we will do the most fun part of all programming: premature optimization. Remember that the current code generation is pretty lousy and riddled with tons of pushes and pops? We'll get rid of them now.

Instead of using the stack we will use more registers. When we would normally push a value onto the stack to get space for a new value, we will now simply take a new register for the new value.

I made a new register manager (registers.pb) that allows us to easily get unused registers and dispose them when done with them. It's really simple.

#RegMax = 7 ; Highest valid register ID

Structure SRegister
Name.s
Used.i
EndStructure

Global Dim Registers.SRegister(#RegMax)
Registers(0)\Name = "Error: Null register"
Registers(1)\Name = "eax"
Registers(2)\Name = "edx"
Registers(3)\Name = "ecx"
Registers(4)\Name = "ebx"
Registers(5)\Name = "edi"
Registers(6)\Name = "ebp"
Registers(7)\Name = "esi"

Procedure RegGet()
Protected I
For I = 1 To #RegMax
If Registers(I)\Used = 0
Registers(I)\Used = 1
ProcedureReturn @Registers(I)
EndIf
Next
Error("Too complex expression (out of cpu registers)")
EndProcedure

Procedure.s RegStr(*Reg.SRegister)
ProcedureReturn *Reg\Name
EndProcedure

Procedure RegDispose(*Reg.SRegister)
*Reg\Used = 0
EndProcedure

Procedure.s RegUse(*Reg.SRegister)
RegDispose(*Reg)
ProcedureReturn RegStr(*Reg)
EndProcedure

This code is rather simple. RegGet() finds an unused register and returns a pointer to the corresponding item in the array. RegStr() converts the pointer into a string. RegDispose() sets the state of the register to unused again. RegUse() returns the corresponding string and sets the register as unused.

We will now implement this on var.pb. Copy var.pb into regstack.pb. But before we go any further, we need a macro to simplify code generation a bit. Put this into base2.pb:

Macro Emit2(Instruction, Op1, Op2)
Emit(Instruction + Op1 +", " + Op2)
EndMacro

It works just like Emit(), except it takes separate parameters.

Now on to the real deal. AddExp() should now be able to use two arbitrary registers instead of eax and ebp. But how can it know _which_ registers? A good idea here is to let each procedure return which registers it left its result in. So if the result of MulExp() is in eax, it should return a pointer that, when passed to RegStr(), gives the string eax.

Here is the new AddExp():
Procedure AddExp()
; addexp ::= {+ }
Protected *R1, *R2
*R1 = MulExp()
While Look = '+'
GetLookWhite()
*R2 = MulExp()
Emit2("add ", RegStr(*R1), RegUse(*R2))
Wend
ProcedureReturn *R1
EndProcedure

As you can see, the first register is the result of the first call to MulExp(). If Look is not +, then that register also contains the result of AddExp(). If Look IS +, then we take the other operand from MulExp(), perform an add operation (where we discard the register *R2, since we have the result in *R1), and the result is still in *R1. So we return *R1 at the end of the procedure.

AddExp() never allocates any registers, because it gets them from MulExp(). Which is nice and easy on AddExp(), but where does MulExp() get the registers from? Look here:
Procedure MulExp()
; mulexp ::= {* }
Protected *R1, *R2
*R1 = Value()
While Look = '*'
GetLookWhite()
*R2 = Value()
Emit2("imul ", RegStr(*R1), RegUse(*R2))
Wend
ProcedureReturn *R1
EndProcedure

MulExp() gets its registers from Value(). Which is the procedure that needs the biggest changes:
Procedure Value()
; value ::= | | '(' ')'
Protected *R1
If IsDigit(Look)
*R1 = RegGet()
Emit2("mov ", RegStr(*R1), GetInteger())
ElseIf IsAlpha(Look)
*R1 = RegGet()
Emit2("mov ", RegStr(*R1), VarValue(GetName()))
ElseIf Look = '('
MatchWhite('(')
*R1 = Expression()
MatchWhite(')')
EndIf
ProcedureReturn *R1
EndProcedure

As you can see, Value() allocates a new register to put its value into, except when the value is the result of an expression, which must of course already be in an allocated register. Speaking of which, Expression() must be changed to return which register the result is in:

Procedure Expression()
; expression ::= <AddExp>
ProcedureReturn AddExp()
EndProcedure


This may be a bit confusing. Expression() gets its register from AddExp(), which gets its registers from MulExp(), which gets its register from Value(), which, in turn, gets its register from Expression()! This seems like it may run for ever. But Value() will only get its result from Expression() when it sees an opening parenthesis, so unless the source contains an unlimited amount of opening parenthesis (or we forget to throw them away with MatchWhite()) then our program will not run for ever.

Since we are on a roll now, why not implement an assignment statement to round it off nicely? Here is the syntax:

; assignment  ::= <name> = <expression>


That should be simple:
Procedure Assignment()
; assignment ::= <name> = <expression>
Protected *R
Protected Name.s
Name = GetName()
MatchWhite('=')
*R = Expression()
Emit2("mov ", VarValue(Name), RegUse(*R))
EndProcedure


I want to point out that for an expression like (543+54)*(28+48), we have reduced the number of instructions from 13 to 7. Additionally, using only registers are faster than using the stack, because the stack means memory accesses (slow).

Here is the listing (note that it accepts assignment statements only):
; regstack.pb
IncludeFile "..\base2.pb"
IncludeFile "..\registers.pb"

Procedure.s VarValue(Var.s)
ProcedureReturn "[v_" + Var + "]"
EndProcedure

Declare Expression()
Procedure Value()
; value ::= | | '(' ')'
Protected *R1
If IsDigit(Look)
*R1 = RegGet()
Emit2("mov ", RegStr(*R1), GetInteger())
ElseIf IsAlpha(Look)
*R1 = RegGet()
Emit2("mov ", RegStr(*R1), VarValue(GetName()))
ElseIf Look = '('
MatchWhite('(')
*R1 = Expression()
MatchWhite(')')
EndIf
ProcedureReturn *R1
EndProcedure

Procedure MulExp()
; mulexp ::= {* }
Protected *R1, *R2
*R1 = Value()
While Look = '*'
GetLookWhite()
*R2 = Value()
Emit2("imul ", RegStr(*R1), RegUse(*R2))
Wend
ProcedureReturn *R1
EndProcedure

Procedure AddExp()
; addexp ::= {+ }
Protected *R1, *R2
*R1 = MulExp()
While Look = '+'
GetLookWhite()
*R2 = MulExp()
Emit2("add ", RegStr(*R1), RegUse(*R2))
Wend
ProcedureReturn *R1
EndProcedure

Procedure Expression()
; expression ::=
ProcedureReturn AddExp()
EndProcedure

Procedure Assignment()
; assignment ::= =
Protected *R
Protected Name.s
Name = GetName()
MatchWhite('=')
*R = Expression()
Emit2("mov ", VarValue(Name), RegUse(*R))
EndProcedure

Init()
Assignment()
If Look <> 0
Expected("End of line")
EndIf
Input()

04 September 2009

10. Variables

Copy paren.pb to var.pb. We will now implement support for variables in the expression parser. Here is the old grammar:
digit      ::= 0..9
integer    ::= <digit> {<digit>}
value      ::= <integer> | '(' <expression> ')'
mulop      ::= * | /
addop      ::= + | -
mulexp     ::= <value>  { <mulop> <value> }
addexp     ::= <mulexp> { <addop> <mulexp> }
expression ::= <addexp>
A variable can represent a value, right? In fact, we don't actually want the variable itself, we want the value stored in that variable.

So we can add a bit to our grammar:
name       ::= <alpha> { <alpha> | <digit> }
variable   ::= <name>
value      ::= <integer> | <variable> | '(' <expression> ')'

Adding support for this in Value() should be easy:
Procedure Value()
  ; value  ::= <integer> | <variable> | '(' <expression> ')'
  If IsDigit(Look)
    Emit("mov  eax, " + GetInteger())
  ElseIf IsAlpha(Look)
    Emit("mov  eax, " + VarValue(GetName()))
  ElseIf Look = '('
    MatchWhite('(')
    Expression()
    MatchWhite(')')
  EndIf
EndProcedure

We can see an extra procedure here: VarValue(). What this does is to generate asm code to get the value of a variable by using a memory reference (the square brackets around the name). I have chosen to prefix all global variable names with v_ to avoid name collisions with internal labels (which we need when generating if's and so on).

Procedure.s VarValue(Var.s)
  ProcedureReturn "[v_" + Var + "]"
EndProcedure

Done. This isn't getting any harder, is it? Of course we need to register all variables in a symbol table and actually allocate space for them, but that's easy anyways. Maybe I'll show it later.

02 September 2009

9. Parentheses

From now on I will not build on the same source file every time. Because it becomes so long and tedious to manage. Instead I will show you the concepts on a small scale, and you can then implement it yourself in a bigger project if you want to.

In this chapter, we will build on mul.pb, so make a copy of it (name it "parens.pb").
To have the same source as I do, search and replace ecx into ebp.

Here is the old grammar:
digit      ::= 0..9
integer ::= <digit> {<digit>}
value ::= <integer>
mulop ::= * | /
addop ::= + | -
mulexp ::= <value> { <mulop> <value> }
addexp ::= <mulexp> { <addop> <mulexp> }
expression ::= <addexp>

What does value() do? It puts a value in eax.
What does expression() do? It puts a value in eax.
What is inside a parenthesis? Is it not just another ordinary expression? Yes.

So, why not do it like this?
value      ::= <integer> | '(' <expression> ')'

That's what we are going to do. Change Value() into this:

Declare Expression()
Procedure Value()
; value ::= <integer> | '(' <expression> ')'
If IsDigit(Look)
Emit("mov eax, " + GetInteger())
ElseIf Look = '('
MatchWhite('(')
Expression()
MatchWhite(')')
EndIf
EndProcedure


We had to forward declare Expression() to avoid an error message (since it's defined later in the source code). Also you can see two calls to a new function here: MatchWhite(). What it does is to verify that Look is a certain character, discard Look, and then eat any following whitespace. Here's the implementation, put it in base2.pb:
Procedure Match(C.c)
If Look <> C
Expected(Chr(C))
EndIf
GetLook()
EndProcedure

Procedure MatchWhite(C.c)
Match(C)
EatWhite()
EndProcedure


Now that was simple, right? Unfortunately it's not quite right just yet. Currently Expression() gives an error message if it does not reach the end of the line. We must change that. Here is the entire code, with the end of line check moved out of Expression():


; paren.pb
IncludeFile "..\base2.pb"

; Procedure.s VarRef(Var.s)
; ProcedureReturn "[v_" + Var + "]"
; EndProcedure

Declare Expression()
Procedure Value()
; value ::= <integer> | '(' <expression> ')'
If IsDigit(Look)
Emit("mov eax, " + GetInteger())
ElseIf Look = '('
MatchWhite('(')
Expression()
MatchWhite(')')
EndIf
EndProcedure

Procedure MulExp()
; mulexp ::= <value> {* <value>}
Value()
While Look = '*'
GetLookWhite()
Emit("push eax")
Value()
Emit("pop ebp")
Emit("imul eax, ebp")
Wend
EndProcedure

Procedure AddExp()
; addexp ::= <mulexp> {+ <mulexp> }
MulExp()
While Look = '+'
GetLookWhite()
Emit("push eax")
MulExp()
Emit("pop ebp")
Emit("add eax, ebp")
Wend
EndProcedure

Procedure Expression()
; expression ::= <AddExp>
AddExp()
EndProcedure

Init()
Expression()
If Look <> 0
Expected("End of line")
EndIf
Input()

01 September 2009

8. Boolean and Relational operators

Boolean and relational operators can be treated in a variety of ways. We will treat them in a way that is simple, easy and correct, but inefficient. Each operator will return the integer 1 for true and 0 for false. The language will consider 0 to be false and any other value (not just 1) to be true.

This is the absolutely easiest way of doing it, and also allows us to use these operators in freely in any expression, even for assignments: bigger = a > 0.

The first thing we need to do when adding new operators is to decide their precedence. I have decided to give boolean operators the lowest precedence (evaluated last), followed by relational operators, followed by mulops, and then addops. Because then we don't have to use parentheses when writing an expression like this:
If   a+1  >  6*b   And  5-c
If ((a+1) > (6*b)) And (5-c)

As you can see, the operators with the lowest precedence are evaluated last, because they are in the outermost implicit parentheses.
Speaking of which, we have forgotten to implement parentheses! I'll save that for the next installment.

Which operators do we need? We need And, Or, Xor, <, >, <=, >= and <>. Also, you may be thinking of Not. Not is a unary operator (takes only one argument), so it must be treated separately.

We will start with And. What it needs to do, is to look at both operands, and if both are true (different from 0), it should place 1 in eax. Else it should place 0 in eax.

It turns out ebp wasn't well suited, because we don't have separate access to the low 8-bit part of the register. I'll continue with ebp, but the code to convert a value to to a boolean can be made slightly faster if you use ecx instead. Since we do this for learning, we should not care about efficiency.

Here is the new grammar:
digit      ::= 0..9
integer ::= <digit> { <digit> }
value ::= <integer>
mulop ::= * | /
addop ::= + | -
relop ::= < | > | <= | >= | <>
boolop ::= And | Or | Xor
mulexp ::= <value> {<mulop> <value>}
addexp ::= <mulexp> {<addop> <mulexp> }
relexp ::= <addexp> { <relop> <addexp> }
boolexp ::= <relexp> {<boolop> <relexp> }
expression ::= <BoolExp>

We will use the following code in several places, so I placed it in a separate procedure:
Procedure ToBool(reg.s)
Emit("cmp " + reg + ", 0")
Emit("mov ecx, 0")
Emit("setnz cl")
Emit("mov " + reg + ", ecx")
EndProcedure
It converts an integer into another integer, which may only be 0 and 1. Setnz sets cl (the lower part of ecx) to 1 if reg-0 was NOT zero. The result is that ecx becomes 0 when reg was 0, else ecx becomes 1.

BoolExp() looks just like the other procedures:
Procedure BoolExp()
; boolexp ::= <relexp> {<boolop> <relexp> }
Protected Operation.s
RelExp()
While Look = 'a' ; start of and
Operation = GetName()
Emit("push eax")
RelExp()
Emit("pop ebp")
Select Operation
Case "and": ToBool("eax") ; Convert to 0 or 1 depending on
ToBool("ebp") ; truth value of input
Emit("and eax, ebp")
Emit("mov eax, 0")
Emit("setnz al")
Default
Expected("Operator, got " + Operation)
EndSelect
Wend
EndProcedure
First, both operands are converted from their initial value into 0 and 1 (depending on their truth value). Then we perform a bitwise and operation, which works as a boolean and because of the limited input range (values are either 0 or 1).

You should now rewrite Expression() to call BoolExp() instead of AddExp(). Then write a dummy procedure, RelExp() that only calls AddExp(). You compiler should now have support for the And operator! Try it!

Note, that the compiler is currently case sensitive. Don't try to remedy this in BoolExp() - it should be done globally in the base file. If you want to fix it now, simply put an LCase() in GetLook(), and be sure to only expect lowercase letters in Look.

When you implement Or and Xor, you should use the or and xor cpu instructions instead of the and instruction. Since all these operators share a lot of code, we can make the code generation shared, too:
Procedure BoolExp()
; boolexp ::= <relexp> { <boolop> <relexp> }
Protected Operation.s
RelExp()
While Look = 'a' Or Look = 'o' Or Look = 'x'
Operation = GetName()
Emit("push eax")
RelExp()
Emit("pop ebp")
ToBool("eax")
ToBool("ebp")
Select Operation
Case "and": Emit("and eax, ebp")
Case "or": Emit("or eax, ebp")
Case "xor": Emit("xor eax, ebp")
Default
Expected("Operator, got " + Operation)
EndSelect
Emit("mov eax, 0")
Emit("setnz al")
Wend
EndProcedure

Cool. We added two new operators simply by moving four lines and adding two lines. (Ok, so I cheated by modifying the line with While...)

To avoid confusion I will must precise that these operators associate to the left, that is:
1 and 2 or 3
is the same as
(1 and 2) or 3
or verbally, the output of (1 and 2) becomes the input of or 3. Remember that the output of (1 and 2) is a number that is either 0 (false) or 1 (true).

This is the same for all operators we have implemented, however, this order is overridden by the precedence rules. So:
1+2+3 equals
(1+2)+3, however
1+2*3 equals
1+(2*3)

All normal operators are left associative, except one: ^ (raise to the power of). The power operator is right associative. Often this is implemented a normal function instead of an operator, this is because a lot of code is required to compute the result.

We will use the same idea for creating the relational operators as we used for above: the comparison will return either 0 or 1, depending on the input. To keep the code clean, I want to put the code generation for these operators in a separate procedure, RelOp(). This is because the code for recognizing the multi-character relops can get a bit long-winded. Here is RelExp():
Procedure RelExp()
; relexp ::= <addexp> { <boolop> <addexp> }
Protected Operation.s
AddExp()
While Look = '<' Or Look = '>' Or Look = '='
Operation = Chr(Look)
GetLook()
Select Look
Case '<', '>', '='
Operation + Chr(Look)
GetLook()
EndSelect
EatWhite()
Emit("push eax")
AddExp()
Emit("pop ebp")
RelOp(Operation) ; Generate code
Wend
EndProcedure


Here is the code for RelOp(), it's a standard Select statement to generate the correct conditional:
Procedure RelOp(Op.s)
Emit("cmp ebp, eax")
Emit("mov eax, 0")
Select Op
Case "=": Emit("sete al") ; set if equal
Case "<": Emit("setl al") ; set if less
Case "<=": Emit("setle al") ; less or equal
Case ">": Emit("setg al") ; set if greater
Case ">=": Emit("setge al")
Case "<>": Emit("setne al") ; set if not equal
Default
Expected("Relational operator, got " + Op)
EndSelect
EndProcedure

Notice that the algorithm is quite similar to ToBool(), except we now compare with another number instead of 0. Setne and setnz are aliases for the same instructions, by the way.

I now realize that there is a mistake in base2.pb. Simply replace the procedure Expected() with this one:
Procedure Expected(S.s)
Error("Expected " + S)
EndProcedure
Else the error message will be nonsensical.

A minor problem with our implementation of the boolean operators is that they are so called "strict". Normally boolean operators are "short-circuit". You should look up this on wikipedia or somewhere else if you don't know what it means.

The last thing you have to do before you're finished with this part of the tutorial is to put your new code through some testing. Remember to also test invalid inputs. They should give a proper error message.

Here is the complete listing:

; boolrel.pb
IncludeFile "..\base2.pb"

Procedure ToBool(reg.s)
Emit("cmp " + reg + ", 0")
Emit("mov ecx, 0")
Emit("setnz cl")
Emit("mov " + reg + ", ecx")
EndProcedure

Procedure RelOp(Op.s)
Emit("cmp ebp, eax")
Emit("mov eax, 0")
Select Op
Case "=": Emit("sete al") ; set if equal
Case "<": Emit("setl al") ; set if less
Case "<=": Emit("setle al") ; less or equal
Case ">": Emit("setg al") ; set if greater
Case ">=": Emit("setge al")
Case "<>": Emit("setne al") ; set if not equal
Default
Expected("Relational operator, got " + Op)
EndSelect
EndProcedure

Procedure Value()
; value ::= <integer>
Emit("mov eax, " + GetInteger())
EndProcedure

Procedure MulExp()
; mulexp ::= <value> {<mulop> <value>}
Protected Operation.c
Value()
While Look = '*' Or Look = '/' Or Look = '%'
Operation = Look
GetLookWhite()
Emit("push eax")
Value()
Emit("pop ebp")
Select Operation
Case '*': Emit("imul eax, ebp")
Case '/', '%'
Emit("xchg eax, ebp")
Emit("cdq") ; expands eax to edx:eax
Emit("idiv ebp")
EndSelect
If Operation = '%'
Emit("mov eax, edx")
EndIf
Wend
EndProcedure

Procedure AddExp()
; addexp ::= <mulexp> {<addop> <mulexp> }
Protected Operation.c
MulExp()
While Look = '+' Or Look = '-'
Operation = Look
GetLookWhite()
Emit("push eax")
MulExp()
Emit("pop ebp")
Select Operation
Case '+': Emit("add eax, ebp")
Case '-': Emit("sub eax, ebp")
Emit("neg eax")
EndSelect
Wend
EndProcedure

Procedure RelExp()
; relexp ::= <addexp> { <boolop> <addexp> }
Protected Operation.s
AddExp()
While Look = '<' Or Look = '>' Or Look = '='
Operation = Chr(Look)
GetLook()
Select Look
Case '<', '>', '='
Operation + Chr(Look)
GetLook()
EndSelect
EatWhite()
Emit("push eax")
AddExp()
Emit("pop ebp")
RelOp(Operation) ; Generate code
Wend
EndProcedure

Procedure BoolExp()
; boolexp ::= <relexp> { <boolop> <relexp> }
Protected Operation.s
RelExp()
While Look = 'a' Or Look = 'o' Or Look = 'x'
Operation = GetName()
Emit("push eax")
RelExp()
Emit("pop ebp")
ToBool("eax")
ToBool("ebp")
Select Operation
Case "and": Emit("and eax, ebp")
Case "or": Emit("or eax, ebp")
Case "xor": Emit("xor eax, ebp")
Default
Expected("Operator, got " + Operation)
EndSelect
Emit("mov eax, 0")
Emit("setnz al")
Wend
EndProcedure

Procedure Expression()
; expression ::= <BoolExp>
BoolExp()

If Look <> 0
Expected("End of line")
EndIf
EndProcedure

Init()
Expression()
Input()