Pseudocode Parser¶
The vc2_pseudocode_parser.parser
module contains a parser and
associated Abstract Syntax Tree (AST) representation for the pseudocode
language used within the VC-2 specifications [VC2].
A quick-start example illustrating much of the pseudocode syntax and basic usage of this module is given below:
>>> from vc2_pseudocode_parser.parser import parse
>>> source = '''
... some_function(arg1, arg2, arg3):
... # Assignments
... hex_foo = 0xF00
... sum = arg1 + arg2 + arg3
...
... # If statements
... if (sum > 0):
... sign = 1
... else if (sum < 0):
... sign = -1
... else:
... sign = 0
...
... # For-each: loop over fixed set of values
... sum2 = 0
... for each value in arg1, arg2, arg3:
... sum2 += value
...
... # For: loop over range of integers
... sum_1_to_100 = 0
... for n = 1 to 100:
... sum_1_to_100 += n
...
... # While loop
... count = 0
... while (sum > 0):
... sum //= 2
... count += 1
...
... # Maps (like Python's dictionaries
... map = {}
...
... # Maps subscripted with labels
... map[foo] = 123
... map[bar] = 321
...
... # Labels are first-class values (and are defined by their first use)
... label = baz
... map[label] = 999
...
... # Function calls
... foo(map)
...
... # Return from functions
... return count
... '''
>>> ast = parse(source)
>>> ast.functions[0].name
'some_function'
>>> assignment = ast.functions[0].body[0]
>>> assignment.variable
Variable(offset=68, name='hex_foo')
>>> assignment.value
NumberExpr(offset=78, offset_end=83, value=3840, display_base=16, display_digits=3)
Parser¶
A pseudocode snippet may be parsed into an Abstract Syntax Tree (AST) using the
parse()
function:
-
parse
(string)¶ Parse a pseudocode listing into an abstract syntax tree (
Listing
).May raise a
ParseError
orASTConstructionError
exception on failure.
Parsing failures will result of one of the exceptions below being raised. In
all cases, the str
representation of these errors produces a
user-friendly description of the problem.
For example:
>>> from vc2_pseudocode_parser.parser import ParseError, ASTConstructionError
>>> try:
... ast = parse("foo(): return (a + 3")
... except (ParseError, ASTConstructionError) as e:
... print(str(e))
At line 1 column 21:
foo(): return (a + 3
^
Expected ')' or <operator>
-
exception
ParseError
¶ Re-exported from
peggie.ParseError
.
-
exception
ASTConstructionError
(line, column, snippet)¶ Exceptions thrown during construction of an AST.
-
exception
LabelUsedAsVariableNameError
(line, column, snippet, variable_name)¶ Bases:
vc2_pseudocode_parser.parser.ast.ASTConstructionError
Thrown when a name previously used as a label is assigned to like a variable.
-
exception
CannotSubscriptLabelError
(line, column, snippet, label_name)¶ Bases:
vc2_pseudocode_parser.parser.ast.ASTConstructionError
Thrown when name which has previously used as a label is subscripted like a variable.
Abstract Syntax Tree (AST)¶
The parser generates a fairly detailed AST containing both semantic and some non-semantic features of the source such as explicit uses of parentheses, comments and vertical whitespace. Nodes in the AST also include character indices of the corresponding source code enabling the construction of helpful error messages.
Every node in the AST is a subclass of the ASTNode
base class:
AST Root (Listing
)¶
The root element of a complete AST is Listing
:
-
class
Listing
(functions, leading_empty_lines=<factory>)¶ The root of a pseudocode AST.
-
functions
: List[vc2_pseudocode_parser.parser.ast.Function]¶ List of
Function
trees for each function defined in the tree.
-
leading_empty_lines
: List[vc2_pseudocode_parser.parser.ast.EmptyLine]¶ List of
EmptyLine
trees relating to empty (or comment-only) lines at the start of the source listing.
-
This in turn is principally made up of a list of Function
nodes:
-
class
Function
(offset, name, arguments, body, eol=None)¶ A function definition:
<name>(<arguments[0]>, <arguments[1]>, <...>): <eol> <body>
-
arguments
: List[vc2_pseudocode_parser.parser.ast.Variable]¶ The
Variable
objects corresponding with the arguments to this function.
-
body
: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmt
which make up this function.
-
eol
: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOL
at the end of the function heading.
-
Statements¶
AST nodes representing statements in the AST are subclasses of
Stmt
.
-
class
Stmt
(offset, offset_end)¶ Base-class for all statement AST nodes.
If-else-if-else statements are defined as follows:
-
class
IfElseStmt
(if_branches, else_branch=None)¶ An
if
statement with an arbitrary number ofelse if
clauses and optionalelse
clause.A
if
statement is broken as illustrated by the following example:if (condition0): # \ if_branches[0] body0 # / else if (condition1): # \ if_branches[1] body1 # / else if (condition2): # \ if_branches[2] body2 # / else: # \ else_branch body3 # /
-
if_branches
: List[vc2_pseudocode_parser.parser.ast.IfBranch]¶ The opening
if
clause followed by anyelse if
clauses are represented as a series ofIfBranch
.
-
else_branch
: Optional[vc2_pseudocode_parser.parser.ast.ElseBranch] = None¶ If an
else
clause is present, aElseBranch
giving its contents.
-
-
class
IfBranch
(offset, condition, body, eol=None)¶ An
if
orelse if
clause in an if-else-if-else statement:if (<condition>): <eol> <body>
Or:
else if (<condition>): <eol> <body>
-
condition
: vc2_pseudocode_parser.parser.ast.Expr¶ The
Expr
representing the condition for the branch condition.
-
body
: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmt
to execute when the condition is True.
-
eol
: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOL
following theif
orelse if
clause heading.
-
-
class
ElseBranch
(offset, body, eol=None)¶ An
else
clause in an if-else-if-else statement:else: <eol> <body>
-
body
: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmt
to execute when the else branch is reached.
-
eol
: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOL
following theelse
clause heading.
-
For-each loops are defined as follows:
-
class
ForEachStmt
(offset, variable, values, body, eol=None)¶ A
for each
loop:for each <variable> in <values[0]>, <values[1]>, <...>: <eol> <body>
-
variable
: vc2_pseudocode_parser.parser.ast.Variable¶ The loop
Variable
.
-
values
: List[vc2_pseudocode_parser.parser.ast.Expr]¶ The
Expr
giving the set of values the loop will iterate over.
-
body
: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmt
to execute in each iteration.
-
eol
: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOL
following thefor each
heading.
-
For loops are defined as follows:
-
class
ForStmt
(offset, variable, start, end, body, eol=None)¶ A
for
loop:for <variable> = <start> to <end>: <eol> <body>
-
variable
: vc2_pseudocode_parser.parser.ast.Variable¶ The loop
Variable
.
-
start
: vc2_pseudocode_parser.parser.ast.Expr¶ The
Expr
giving the loop starting value.
-
end
: vc2_pseudocode_parser.parser.ast.Expr¶ The
Expr
giving the (inclusive) loop ending value.
-
body
: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmt
to execute in each iteration.
-
eol
: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOL
following thefor
heading.
-
While loops are defined as follows:
-
class
WhileStmt
(offset, condition, body, eol=None)¶ A
while
loop:while (<condition>): <eol> <body>
-
condition
: vc2_pseudocode_parser.parser.ast.Expr¶ The
Expr
representing the loop condition.
-
body
: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmt
to execute in each iteration.
-
eol
: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOL
following thewhile
heading.
-
Function call statements are defined as follows:
-
class
FunctionCallStmt
(call, eol)¶ A statement which represents a call to a function.
-
call
: vc2_pseudocode_parser.parser.ast.FunctionCallExpr¶ The
FunctionCallExpr
which defines the function call.
-
eol
: vc2_pseudocode_parser.parser.ast.EOL¶ EOL
following the function call.
-
Return statements are defined as follows:
-
class
ReturnStmt
(offset, value, eol)¶ A return statement:
return <value> <eol>
-
value
: vc2_pseudocode_parser.parser.ast.Expr¶ A
Expr
giving the value to be returned.
-
eol
: vc2_pseudocode_parser.parser.ast.EOL¶ EOL
following the return statement.
-
Assignment statements are defined as follows:
-
class
AssignmentStmt
(variable, op, value, eol)¶ A simple or compound assignment statement:
<variable> <op> <value> <eol> # e.g. x += 1 + 2
-
variable
: Union[vc2_pseudocode_parser.parser.ast.Variable, vc2_pseudocode_parser.parser.ast.Subscript]¶
-
op
: vc2_pseudocode_parser.parser.operators.AssignmentOp¶ The type of assignment being performed (
AssignmentOp
).
-
value
: vc2_pseudocode_parser.parser.ast.Expr¶ The
Expr
giving the value being assigned.
-
eol
: vc2_pseudocode_parser.parser.ast.EOL¶ EOL
following the assignment statement.
-
The following assignment operators are defined:
Expressions¶
AST nodes representing expressions in the AST are subclasses of
Expr
.
Unary expressions are defined as follows:
-
class
UnaryExpr
(offset, op, value)¶ A unary expression, e.g.
-foo
.-
op
: vc2_pseudocode_parser.parser.operators.UnaryOp¶ The operator (
UnaryOp
)
-
value
: vc2_pseudocode_parser.parser.ast.Expr¶ The
Expr
the operator applies to.
-
With unary operators enumerated as:
-
class
UnaryOp
(value)¶ An enumeration.
-
plus
= '+'¶
-
minus
= '-'¶
-
bitwise_not
= '~'¶
-
logical_not
= 'not'¶
-
Binary expressions are defined as follows:
-
class
BinaryExpr
(lhs, op, rhs)¶ A binary expression, e.g.
a + 1
.-
lhs
: vc2_pseudocode_parser.parser.ast.Expr¶ The
Expr
on the left-hand side of the expression.
-
op
: vc2_pseudocode_parser.parser.operators.BinaryOp¶ The operator (
BinaryOp
)
-
rhs
: vc2_pseudocode_parser.parser.ast.Expr¶ The
Expr
on the right-hand side of the expression.
-
With binary operators enumerated as:
-
class
BinaryOp
(value)¶ An enumeration.
-
logical_or
= 'or'¶
-
logical_and
= 'and'¶
-
eq
= '=='¶
-
ne
= '!='¶
-
lt
= '<'¶
-
le
= '<='¶
-
gt
= '>'¶
-
ge
= '>='¶
-
bitwise_or_
= '|'¶
-
bitwise_xor
= '^'¶
-
bitwise_and_
= '&'¶
-
lsh
= '<<'¶
-
rsh
= '>>'¶
-
add
= '+'¶
-
sub
= '-'¶
-
mul
= '*'¶
-
idiv
= '//'¶
-
mod
= '%'¶
-
pow
= '**'¶
-
Calls to functions are defined as follows:
-
class
FunctionCallExpr
(offset, offset_end, name, arguments)¶ A call to a function:
<name>(<arguments[0]>, <arguments[1]>, <...>) # e.g. foo(1, 2, 3)
-
arguments
: List[vc2_pseudocode_parser.parser.ast.Expr]¶ The list of
Expr
giving the arguments to the call.
-
Uses of variables and subscripted variables are defined as follows:
-
class
VariableExpr
(variable)¶ A use of a variable or subscripted variable.
Value literals are defined as follows:
-
class
BooleanExpr
(offset, value)¶ A boolean literal, i.e.
True
andFalse
.
-
class
NumberExpr
(offset, offset_end, value, display_base=10, display_digits=1)¶ A numerical literal integer, e.g.
123
or0xF00
.
-
class
EmptyMapExpr
(offset, offset_end)¶ An empty map literal (e.g.
{}
).
-
class
LabelExpr
(label)¶ A label literal.
-
label
: vc2_pseudocode_parser.parser.ast.Label¶ The
Label
used.
-
For parenthesised expressions, e.g. (1 + 2)
, the presence of the
parentheses is explicitly marked in the AST. While this is not semantically
important (since evaluation order is explicit in an AST) it may be helpful in
retaining parentheses added for human legibility when translating the
pseudocode into other forms.
-
class
ParenExpr
(offset, offset_end, value)¶ A parenthesised expression.
-
value
: vc2_pseudocode_parser.parser.ast.Expr¶ The parenthesised
Expr
-
Variables and subscripts¶
A Variable
is defined as follows:
Variables may be subscripted (multiple times) and this is represented by a
nesting of Subscript
objects:
-
class
Subscript
(offset_end, variable, subscript)¶ A subscripted variable (e.g.
x[1]
).-
variable
: Union[vc2_pseudocode_parser.parser.ast.Subscript, vc2_pseudocode_parser.parser.ast.Variable]¶ The
Variable
being subscripted (orSubscript
in the case of a multiply subscripted variable.
-
subscript
: vc2_pseudocode_parser.parser.ast.Expr¶ The
Expr
giving the subscript value.
-
Labels¶
Labels are defined as follows:
Note
Labels and variables are syntactically ambiguous in the pseudocode language but are disambiguated in the AST. Names in the pseudocode are deemed to be variables if they correspond with function arguments, loop variables or assignment targets within a function’s namespace. All other names are deemed to be labels.
Comments and vertical whitespace¶
All comments and vertical whitespace (i.e. blank lines) are captured by the AST. This enables these non-semantic components to be retained in language translations.
-
class
Comment
(offset, string)¶ An end-of-line comment.
-
class
EmptyLine
(offset, offset_end, comment=None)¶ Represents an empty (or comment-only) line in the source.
-
comment
: Optional[vc2_pseudocode_parser.parser.ast.Comment] = None¶ The
Comment
on this line, if present.
-
Simple statements are syntactically terminated by an optional comment followed
by a newline and then a number of empty or comment-only lines. These details
are captured by the EOL
node.
-
class
EOL
(offset, offset_end, comment=None, empty_lines=<factory>)¶ The end-of-line terminator following a statement.
-
comment
: Optional[vc2_pseudocode_parser.parser.ast.Comment] = None¶ A
Comment
on the same line as the statement, if present.
-
empty_lines
: List[vc2_pseudocode_parser.parser.ast.EmptyLine]¶ Any trailing
EmptyLine
after this statement.
-
For example, given a snippet as follows:
foo() # Comment 0
bar() # Comment 1
# Comment 2
# Comment 3
baz() # Comment 4
Here the function call bar()
would be captured by a
FunctionCallStmt
. The FunctionCallStmt.eol
would
contain an EOL
with EOL.comment
set to a
Comment
containing # Comment 1
and EOL.empty_lines
the following four lines (and their comments).
Function declarations, if
, else if
, else
, for each
, for
and
while
headings are separated from their bodies by a :
and optional
EOL
. When the :
is followed by a newline, an EOL
object will be given, but for in-line definitions, the EOL
will be
omitted. For example:
func1():
foo()
func2(): foo()
Here the Function
for func1
will have Function.eol
set to an EOL
node for func2
it will be None
.
Grammar¶
The pseudocode language grammar can be described by its
peggie
grammar:
# Grammar for the VC-2 specification pseudocode language
start <- any_ws @=function+ eof
# Function definition
function <- identifier ws function_arguments ws stmt_block
function_arguments <- "(" ws (identifier ws ("," ws identifier ws)* ","?)? ws ")"
# A series of statements
stmt_block <- ":" ws single_line_stmt
/ ":" eol @>((@=stmt)+)
# Statements (all end with an eol)
stmt <- if_else_stmt
/ for_each_stmt
/ for_stmt
/ while_stmt
/ function_call_stmt
/ return_stmt
/ assignment_stmt
single_line_stmt <- function_call_stmt
/ return_stmt
/ assignment_stmt
function_call_stmt <- function_call eol
if_else_stmt <- @=("if" ws condition ws stmt_block)
@=(@=("else" ws_ "if" ws condition ws stmt_block)*)
@=(("else" ws stmt_block)?)
for_each_stmt <- "for" ws_ "each" ws_ identifier ws_ "in" ws_ for_each_list ws stmt_block
for_stmt <- "for" ws_ identifier ws "=" ws expr ws_ "to" ws_ expr ws stmt_block
while_stmt <- "while" ws condition ws stmt_block
assignment_stmt <- variable ws assignment_op ws expr eol
return_stmt <- "return" ws_ expr eol
condition <- "(" ws expr ws ")"
for_each_list <- expr (ws "," ws expr)*
assignment_op <- r"(\+|-|\*|//|\*\*|&|\^|\||<<|>>)?="
# Expressions (defined below in ascending order of precidence)
expr <- maybe_log_or_expr
maybe_log_or_expr <- maybe_log_and_expr (ws_ "or" ws_ maybe_log_and_expr)*
maybe_log_and_expr <- maybe_log_not_expr (ws_ "and" ws_ maybe_log_not_expr)*
maybe_log_not_expr <- "not" ws_ maybe_log_not_expr / maybe_cmp_expr
maybe_cmp_expr <- maybe_or_expr (ws r"==|!=|<=|>=|<|>" ws maybe_or_expr)*
maybe_or_expr <- maybe_xor_expr (ws "|" ws maybe_xor_expr)*
maybe_xor_expr <- maybe_and_expr (ws "^" ws maybe_and_expr)*
maybe_and_expr <- maybe_shift_expr (ws "&" ws maybe_shift_expr)*
maybe_shift_expr <- maybe_arith_expr (ws r"<<|>>" ws maybe_arith_expr)*
maybe_arith_expr <- maybe_prod_expr (ws r"\+|-" ws maybe_prod_expr)*
maybe_prod_expr <- maybe_unary_expr (ws r"\*|//|%" ws maybe_unary_expr)*
maybe_unary_expr <- r"\+|-|~" ws maybe_unary_expr / maybe_pow_expr
maybe_pow_expr <- maybe_paren_expr (ws "**" ws maybe_unary_expr)*
maybe_paren_expr <- "(" ws expr ws ")" / atom
# Atoms
atom <- function_call
/ variable
/ empty_map
/ boolean
/ number
variable <- identifier (ws subscript)*
subscript <- "[" ws expr ws "]"
function_call <- identifier ws function_call_arguments
function_call_arguments <- "(" ws (expr ws ("," ws expr ws)* ","?)? ws ")"
# Literals
identifier <- !reserved_word r"[a-zA-Z_][a-zA-Z0-9_]*"
reserved_word <- r"(if|else|elif|elseif|for|each|foreach|in|to|while|return|[Tt]rue|[Ff]alse|and|or|not)(?![a-zA-Z0-9_])"
boolean <- ("True" / "False")
number <- r"(0[bB][01]+)|(0[xX][0-9a-fA-F]+)|([0-9]+)"
empty_map <- "{" ws "}"
# Whitespace and comments
comment <- r"#((?![\n\r]).)*(\n|\r\n|\r|(?!.))"
ws <- h_space?
ws_ <- h_space
any_ws <- (comment / h_space / v_space)*
eol <- h_space? (comment / v_space / eof) any_ws
h_space <- r"[ \t]+"
v_space <- "\n" / "\r\n" / "\r"
eof <- !.
Operator precedence and associativity tables¶
A table of operator precedence and associativities is also provided which may be useful for, for example, producing pretty-printed outputs (with excess parentheses removed).
-
OPERATOR_PRECEDENCE_TABLE
= {operator: int, ...}¶ BinaryOp
andUnaryOp
operator precedence scores. Higher scores mean higher precedence.
-
OPERATOR_ASSOCIATIVITY_TABLE
= {operator: associativity, ...}¶