RosettaCodeData/Task/Compiler-code-generator/00-TASK.txt

266 lines
6.2 KiB
Plaintext

{{task heading|Code Generator}}
A code generator translates the output of the syntax analyzer and/or semantic analyzer
into lower level code, either assembly, object, or virtual.
;Task
Take the output of the Syntax analyzer [[Compiler/syntax_analyzer|task]] - which is a [[Flatten_a_list|flattened]] Abstract Syntax Tree (AST) - and convert it to virtual machine code, that can be run by the
[[Compiler/virtual_machine_interpreter|Virtual machine interpreter]]. The output is in text format, and represents virtual assembly code.
The program should read input from a file and/or stdin, and write output to a file and/or
stdout.
;Example - given the simple program (below), stored in a file called while.t, create the list of tokens, using one of the Lexical analyzer [[Compiler/lexical_analyzer|solutions]]
lex < while.t > while.lex
;Run one of the Syntax analyzer [[Compiler/syntax_analyzer|solutions]]:
parse < while.lex > while.ast
;while.ast can be input into the code generator.
;The following table shows the input to lex, lex output, the AST produced by the parser, and the generated virtual assembly code.
Run as: lex < while.t | parse | gen
{| class="wikitable"
|-
! Input to lex
! Output from lex, input to parse
! Output from parse
! Output from gen, input to VM
|-
| style="vertical-align:top" |
<syntaxhighlight lang="c">count = 1;
while (count < 10) {
print("count is: ", count, "\n");
count = count + 1;
}</syntaxhighlight>
| style="vertical-align:top" |
<b><pre>
1 1 Identifier count
1 7 Op_assign
1 9 Integer 1
1 10 Semicolon
2 1 Keyword_while
2 7 LeftParen
2 8 Identifier count
2 14 Op_less
2 16 Integer 10
2 18 RightParen
2 20 LeftBrace
3 5 Keyword_print
3 10 LeftParen
3 11 String "count is: "
3 23 Comma
3 25 Identifier count
3 30 Comma
3 32 String "\n"
3 36 RightParen
3 37 Semicolon
4 5 Identifier count
4 11 Op_assign
4 13 Identifier count
4 19 Op_add
4 21 Integer 1
4 22 Semicolon
5 1 RightBrace
6 1 End_of_input</pre></b>
| style="vertical-align:top" |
<b><pre>Sequence
Sequence
;
Assign
Identifier count
Integer 1
While
Less
Identifier count
Integer 10
Sequence
Sequence
;
Sequence
Sequence
Sequence
;
Prts
String "count is: "
;
Prti
Identifier count
;
Prts
String "\n"
;
Assign
Identifier count
Add
Identifier count
Integer 1</pre></b>
| style="vertical-align:top" |
<b><pre>Datasize: 1 Strings: 2
"count is: "
"\n"
0 push 1
5 store [0]
10 fetch [0]
15 push 10
20 lt
21 jz (43) 65
26 push 0
31 prts
32 fetch [0]
37 prti
38 push 1
43 prts
44 fetch [0]
49 push 1
54 add
55 store [0]
60 jmp (-51) 10
65 halt</pre></b>
|}
; Input format:
As shown in the table, above, the output from the [[Compiler/syntax_analyzer|syntax analyzer]] is a flattened AST.
In the AST, Identifier, Integer, and String, are terminal nodes, e.g, they do not have child nodes.
Loading this data into an internal parse tree should be as simple as:
<syntaxhighlight lang="python">
def load_ast()
line = readline()
# Each line has at least one token
line_list = tokenize the line, respecting double quotes
text = line_list[0] # first token is always the node type
if text == ";"
return None
node_type = text # could convert to internal form if desired
# A line with two tokens is a leaf node
# Leaf nodes are: Identifier, Integer String
# The 2nd token is the value
if len(line_list) > 1
return make_leaf(node_type, line_list[1])
left = load_ast()
right = load_ast()
return make_node(node_type, left, right)
</syntaxhighlight>
; Output format - refer to the table above
* The first line is the header: Size of data, and number of constant strings.
** size of data is the number of 32-bit unique variables used. In this example, one variable, '''count'''
** number of constant strings is just that - how many there are
* After that, the constant strings
* Finally, the assembly code
;Registers:
* sp: the stack pointer - points to the next top of stack. The stack is a 32-bit integer array.
* pc: the program counter - points to the current instruction to be performed. The code is an array of bytes.
;Data:
32-bit integers and strings
;Instructions:
Each instruction is one byte. The following instructions also have a 32-bit integer operand:
fetch [index]
where index is an index into the data array.
store [index]
where index is an index into the data array.
push n
where value is a 32-bit integer that will be pushed onto the stack.
jmp (n) addr
where (n) is a 32-bit integer specifying the distance between the current location and the
desired location. addr is an unsigned value of the actual code address.
jz (n) addr
where (n) is a 32-bit integer specifying the distance between the current location and the
desired location. addr is an unsigned value of the actual code address.
The following instructions do not have an operand. They perform their operation directly
against the stack:
For the following instructions, the operation is performed against the top two entries in
the stack:
add
sub
mul
div
mod
lt
gt
le
ge
eq
ne
and
or
For the following instructions, the operation is performed against the top entry in the
stack:
neg
not
prtc
Print the word at stack top as a character.
prti
Print the word at stack top as an integer.
prts
Stack top points to an index into the string pool. Print that entry.
halt
Unconditional stop.
; Additional examples
Your solution should pass all the test cases above and the additional tests found '''[[Compiler/Sample_programs|Here]]'''.
{{task heading|Reference}}
The C and Python versions can be considered reference implementations.
;Related Tasks
* [[Compiler/lexical_analyzer|Lexical Analyzer task]]
* [[Compiler/syntax_analyzer|Syntax Analyzer task]]
* [[Compiler/virtual_machine_interpreter|Virtual Machine Interpreter task]]
* [[Compiler/AST_interpreter|AST Interpreter task]]
<hr>
__TOC__