Syntax Analyzer
A Syntax analyzer transforms a token stream (from the [[Compiler/lexical_analyzer|Lexical analyzer]])
into a Syntax tree, based on a grammar.
{{task heading}}
Take the output from the Lexical analyzer [[Compiler/lexical_analyzer|task]],
and convert it to an [https://en.wikipedia.org/wiki/Abstract_syntax_tree Abstract Syntax Tree (AST)],
based on the grammar below. The output should be in a [[Flatten_a_list|flattened format.]]
The program should read input from a file and/or stdin, and write output to a file and/or
stdout. If the language being used has a parser module/library/class, it would be great
if two versions of the solution are provided: One without the parser module, and one
with.
{{task heading|Grammar}}
The simple programming language to be analyzed is more or less a (very tiny) subset of
[[C]]. The formal grammar in
[https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form Extended Backus-Naur Form (EBNF)]:
1 1 Identifier count
1 7 Op_assign
1 9 Integer 1
1 10 Semicolon
2 1 Keyword_while
2 7 LeftParen
2 8 Identifier count
2 14 Op_less
2 16 Integer 10
2 18 RightParen
2 20 LeftBrace
3 5 Keyword_print
3 10 LeftParen
3 11 String "count is: "
3 23 Comma
3 25 Identifier count
3 30 Comma
3 32 String "\n"
3 36 RightParen
3 37 Semicolon
4 5 Identifier count
4 11 Op_assign
4 13 Identifier count
4 19 Op_add
4 21 Integer 1
4 22 Semicolon
5 1 RightBrace
6 1 End_of_input
| style="vertical-align:top" |
Sequence
Sequence
;
Assign
Identifier count
Integer 1
While
Less
Identifier count
Integer 10
Sequence
Sequence
;
Sequence
Sequence
Sequence
;
Prts
String "count is: "
;
Prti
Identifier count
;
Prts
String "\n"
;
Assign
Identifier count
Add
Identifier count
Integer 1
|}
;Specifications
;List of node type names:
Identifier String Integer Sequence If Prtc Prts Prti While Assign Negate Not Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And OrIn the text below, Null/Empty nodes are represented by ";". ;Non-terminal (internal) nodes: For Operators, the following nodes should be created: Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or For each of the above nodes, the left and right sub-nodes are the operands of the respective operation. In pseudo S-Expression format: (Operator expression expression) Negate, Not For these node types, the left node is the operand, and the right node is null. (Operator expression ;) Sequence - sub-nodes are either statements or Sequences. If - left node is the expression, the right node is If node, with it's left node being the if-true statement part, and the right node being the if-false (else) statement part. (If expression (If statement else-statement)) If there is not an else, the tree becomes: (If expression (If statement ;)) Prtc (Prtc (expression) ;) Prts (Prts (String "the string") ;) Prti (Prti (Integer 12345) ;) While - left node is the expression, the right node is the statement. (While expression statement) Assign - left node is the left-hand side of the assignment, the right node is the right-hand side of the assignment. (Assign Identifier expression) Terminal (leaf) nodes: Identifier: (Identifier ident_name) Integer: (Integer 12345) String: (String "Hello World!") ";": Empty node ;Some simple examples Sequences denote a list node; they are used to represent a list. semicolon's represent a null node, e.g., the end of this path. This simple program: a=11; Produces the following AST, encoded as a binary tree: Under each non-leaf node are two '|' lines. The first represents the left sub-node, the second represents the right sub-node: (1) Sequence (2) |-- ; (3) |-- Assign (4) |-- Identifier: a (5) |-- Integer: 11 In flattened form: (1) Sequence (2) ; (3) Assign (4) Identifier a (5) Integer 11 This program: a=11; b=22; c=33; Produces the following AST: ( 1) Sequence ( 2) |-- Sequence ( 3) | |-- Sequence ( 4) | | |-- ; ( 5) | | |-- Assign ( 6) | | |-- Identifier: a ( 7) | | |-- Integer: 11 ( 8) | |-- Assign ( 9) | |-- Identifier: b (10) | |-- Integer: 22 (11) |-- Assign (12) |-- Identifier: c (13) |-- Integer: 33 In flattened form: ( 1) Sequence ( 2) Sequence ( 3) Sequence ( 4) ; ( 5) Assign ( 6) Identifier a ( 7) Integer 11 ( 8) Assign ( 9) Identifier b (10) Integer 22 (11) Assign (12) Identifier c (13) Integer 33 ;Pseudo-code for the parser. Uses [https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm Precedence Climbing] for expression parsing, and [https://en.wikipedia.org/wiki/Recursive_descent_parser Recursive Descent] for statement parsing. The AST is also built:
4 1 Identifier count
4 7 Op_assign
4 9 Integer 1
4 10 Semicolon
5 1 Identifier n
5 3 Op_assign
5 5 Integer 1
5 6 Semicolon
6 1 Identifier limit
6 7 Op_assign
6 9 Integer 100
6 12 Semicolon
7 1 Keyword_while
7 7 LeftParen
7 8 Identifier n
7 10 Op_less
7 12 Identifier limit
7 17 RightParen
7 19 LeftBrace
8 5 Identifier k
8 6 Op_assign
8 7 Integer 3
8 8 Semicolon
9 5 Identifier p
9 6 Op_assign
9 7 Integer 1
9 8 Semicolon
10 5 Identifier n
10 6 Op_assign
10 7 Identifier n
10 8 Op_add
10 9 Integer 2
10 10 Semicolon
11 5 Keyword_while
11 11 LeftParen
11 12 LeftParen
11 13 Identifier k
11 14 Op_multiply
11 15 Identifier k
11 16 Op_lessequal
11 18 Identifier n
11 19 RightParen
11 21 Op_and
11 24 LeftParen
11 25 Identifier p
11 26 RightParen
11 27 RightParen
11 29 LeftBrace
12 9 Identifier p
12 10 Op_assign
12 11 Identifier n
12 12 Op_divide
12 13 Identifier k
12 14 Op_multiply
12 15 Identifier k
12 16 Op_notequal
12 18 Identifier n
12 19 Semicolon
13 9 Identifier k
13 10 Op_assign
13 11 Identifier k
13 12 Op_add
13 13 Integer 2
13 14 Semicolon
14 5 RightBrace
15 5 Keyword_if
15 8 LeftParen
15 9 Identifier p
15 10 RightParen
15 12 LeftBrace
16 9 Keyword_print
16 14 LeftParen
16 15 Identifier n
16 16 Comma
16 18 String " is prime\n"
16 31 RightParen
16 32 Semicolon
17 9 Identifier count
17 15 Op_assign
17 17 Identifier count
17 23 Op_add
17 25 Integer 1
17 26 Semicolon
18 5 RightBrace
19 1 RightBrace
20 1 Keyword_print
20 6 LeftParen
20 7 String "Total primes found: "
20 29 Comma
20 31 Identifier count
20 36 Comma
20 38 String "\n"
20 42 RightParen
20 43 Semicolon
21 1 End_of_input
| style="vertical-align:top" |
Sequence Sequence Sequence Sequence Sequence ; Assign Identifier count Integer 1 Assign Identifier n Integer 1 Assign Identifier limit Integer 100 While Less Identifier n Identifier limit Sequence Sequence Sequence Sequence Sequence ; Assign Identifier k Integer 3 Assign Identifier p Integer 1 Assign Identifier n Add Identifier n Integer 2 While And LessEqual Multiply Identifier k Identifier k Identifier n Identifier p Sequence Sequence ; Assign Identifier p NotEqual Multiply Divide Identifier n Identifier k Identifier k Identifier n Assign Identifier k Add Identifier k Integer 2 If Identifier p If Sequence Sequence ; Sequence Sequence ; Prti Identifier n ; Prts String " is prime\n" ; Assign Identifier count Add Identifier count Integer 1 ; Sequence Sequence Sequence ; Prts String "Total primes found: " ; Prti Identifier count ; Prts String "\n" ;|} ; Additional examples Your solution should pass all the test cases above and the additional tests found '''[[Compiler/Sample_programs|Here]]'''. {{task heading|Reference}} The C and Python versions can be considered reference implementations. ;Related Tasks * [[Compiler/lexical_analyzer|Lexical Analyzer task]] * [[Compiler/code_generator|Code Generator task]] * [[Compiler/virtual_machine_interpreter|Virtual Machine Interpreter task]] * [[Compiler/AST_interpreter|AST Interpreter task]]