Pseudocode Parser¶
The vc2_pseudocode_parser.parser module contains a parser and
associated Abstract Syntax Tree (AST) representation for the pseudocode
language used within the VC-2 specifications [VC2].
A quick-start example illustrating much of the pseudocode syntax and basic usage of this module is given below:
>>> from vc2_pseudocode_parser.parser import parse
>>> source = '''
... some_function(arg1, arg2, arg3):
... # Assignments
... hex_foo = 0xF00
... sum = arg1 + arg2 + arg3
...
... # If statements
... if (sum > 0):
... sign = 1
... else if (sum < 0):
... sign = -1
... else:
... sign = 0
...
... # For-each: loop over fixed set of values
... sum2 = 0
... for each value in arg1, arg2, arg3:
... sum2 += value
...
... # For: loop over range of integers
... sum_1_to_100 = 0
... for n = 1 to 100:
... sum_1_to_100 += n
...
... # While loop
... count = 0
... while (sum > 0):
... sum //= 2
... count += 1
...
... # Maps (like Python's dictionaries
... map = {}
...
... # Maps subscripted with labels
... map[foo] = 123
... map[bar] = 321
...
... # Labels are first-class values (and are defined by their first use)
... label = baz
... map[label] = 999
...
... # Function calls
... foo(map)
...
... # Return from functions
... return count
... '''
>>> ast = parse(source)
>>> ast.functions[0].name
'some_function'
>>> assignment = ast.functions[0].body[0]
>>> assignment.variable
Variable(offset=68, name='hex_foo')
>>> assignment.value
NumberExpr(offset=78, offset_end=83, value=3840, display_base=16, display_digits=3)
Parser¶
A pseudocode snippet may be parsed into an Abstract Syntax Tree (AST) using the
parse() function:
-
parse(string)¶ Parse a pseudocode listing into an abstract syntax tree (
Listing).May raise a
ParseErrororASTConstructionErrorexception on failure.
Parsing failures will result of one of the exceptions below being raised. In
all cases, the str representation of these errors produces a
user-friendly description of the problem.
For example:
>>> from vc2_pseudocode_parser.parser import ParseError, ASTConstructionError
>>> try:
... ast = parse("foo(): return (a + 3")
... except (ParseError, ASTConstructionError) as e:
... print(str(e))
At line 1 column 21:
foo(): return (a + 3
^
Expected ')' or <operator>
-
exception
ParseError¶ Re-exported from
peggie.ParseError.
-
exception
ASTConstructionError(line, column, snippet)¶ Exceptions thrown during construction of an AST.
-
exception
LabelUsedAsVariableNameError(line, column, snippet, variable_name)¶ Bases:
vc2_pseudocode_parser.parser.ast.ASTConstructionErrorThrown when a name previously used as a label is assigned to like a variable.
-
exception
CannotSubscriptLabelError(line, column, snippet, label_name)¶ Bases:
vc2_pseudocode_parser.parser.ast.ASTConstructionErrorThrown when name which has previously used as a label is subscripted like a variable.
Abstract Syntax Tree (AST)¶
The parser generates a fairly detailed AST containing both semantic and some non-semantic features of the source such as explicit uses of parentheses, comments and vertical whitespace. Nodes in the AST also include character indices of the corresponding source code enabling the construction of helpful error messages.
Every node in the AST is a subclass of the ASTNode base class:
AST Root (Listing)¶
The root element of a complete AST is Listing:
-
class
Listing(functions, leading_empty_lines=<factory>)¶ The root of a pseudocode AST.
-
functions: List[vc2_pseudocode_parser.parser.ast.Function]¶ List of
Functiontrees for each function defined in the tree.
-
leading_empty_lines: List[vc2_pseudocode_parser.parser.ast.EmptyLine]¶ List of
EmptyLinetrees relating to empty (or comment-only) lines at the start of the source listing.
-
This in turn is principally made up of a list of Function nodes:
-
class
Function(offset, name, arguments, body, eol=None)¶ A function definition:
<name>(<arguments[0]>, <arguments[1]>, <...>): <eol> <body>
-
arguments: List[vc2_pseudocode_parser.parser.ast.Variable]¶ The
Variableobjects corresponding with the arguments to this function.
-
body: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmtwhich make up this function.
-
eol: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOLat the end of the function heading.
-
Statements¶
AST nodes representing statements in the AST are subclasses of
Stmt.
-
class
Stmt(offset, offset_end)¶ Base-class for all statement AST nodes.
If-else-if-else statements are defined as follows:
-
class
IfElseStmt(if_branches, else_branch=None)¶ An
ifstatement with an arbitrary number ofelse ifclauses and optionalelseclause.A
ifstatement is broken as illustrated by the following example:if (condition0): # \ if_branches[0] body0 # / else if (condition1): # \ if_branches[1] body1 # / else if (condition2): # \ if_branches[2] body2 # / else: # \ else_branch body3 # /
-
if_branches: List[vc2_pseudocode_parser.parser.ast.IfBranch]¶ The opening
ifclause followed by anyelse ifclauses are represented as a series ofIfBranch.
-
else_branch: Optional[vc2_pseudocode_parser.parser.ast.ElseBranch] = None¶ If an
elseclause is present, aElseBranchgiving its contents.
-
-
class
IfBranch(offset, condition, body, eol=None)¶ An
iforelse ifclause in an if-else-if-else statement:if (<condition>): <eol> <body>
Or:
else if (<condition>): <eol> <body>
-
condition: vc2_pseudocode_parser.parser.ast.Expr¶ The
Exprrepresenting the condition for the branch condition.
-
body: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmtto execute when the condition is True.
-
eol: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOLfollowing theiforelse ifclause heading.
-
-
class
ElseBranch(offset, body, eol=None)¶ An
elseclause in an if-else-if-else statement:else: <eol> <body>
-
body: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmtto execute when the else branch is reached.
-
eol: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOLfollowing theelseclause heading.
-
For-each loops are defined as follows:
-
class
ForEachStmt(offset, variable, values, body, eol=None)¶ A
for eachloop:for each <variable> in <values[0]>, <values[1]>, <...>: <eol> <body>
-
variable: vc2_pseudocode_parser.parser.ast.Variable¶ The loop
Variable.
-
values: List[vc2_pseudocode_parser.parser.ast.Expr]¶ The
Exprgiving the set of values the loop will iterate over.
-
body: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmtto execute in each iteration.
-
eol: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOLfollowing thefor eachheading.
-
For loops are defined as follows:
-
class
ForStmt(offset, variable, start, end, body, eol=None)¶ A
forloop:for <variable> = <start> to <end>: <eol> <body>
-
variable: vc2_pseudocode_parser.parser.ast.Variable¶ The loop
Variable.
-
start: vc2_pseudocode_parser.parser.ast.Expr¶ The
Exprgiving the loop starting value.
-
end: vc2_pseudocode_parser.parser.ast.Expr¶ The
Exprgiving the (inclusive) loop ending value.
-
body: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmtto execute in each iteration.
-
eol: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOLfollowing theforheading.
-
While loops are defined as follows:
-
class
WhileStmt(offset, condition, body, eol=None)¶ A
whileloop:while (<condition>): <eol> <body>
-
condition: vc2_pseudocode_parser.parser.ast.Expr¶ The
Exprrepresenting the loop condition.
-
body: List[vc2_pseudocode_parser.parser.ast.Stmt]¶ The list of
Stmtto execute in each iteration.
-
eol: Optional[vc2_pseudocode_parser.parser.ast.EOL] = None¶ EOLfollowing thewhileheading.
-
Function call statements are defined as follows:
-
class
FunctionCallStmt(call, eol)¶ A statement which represents a call to a function.
-
call: vc2_pseudocode_parser.parser.ast.FunctionCallExpr¶ The
FunctionCallExprwhich defines the function call.
-
eol: vc2_pseudocode_parser.parser.ast.EOL¶ EOLfollowing the function call.
-
Return statements are defined as follows:
-
class
ReturnStmt(offset, value, eol)¶ A return statement:
return <value> <eol>
-
value: vc2_pseudocode_parser.parser.ast.Expr¶ A
Exprgiving the value to be returned.
-
eol: vc2_pseudocode_parser.parser.ast.EOL¶ EOLfollowing the return statement.
-
Assignment statements are defined as follows:
-
class
AssignmentStmt(variable, op, value, eol)¶ A simple or compound assignment statement:
<variable> <op> <value> <eol> # e.g. x += 1 + 2
-
variable: Union[vc2_pseudocode_parser.parser.ast.Variable, vc2_pseudocode_parser.parser.ast.Subscript]¶
-
op: vc2_pseudocode_parser.parser.operators.AssignmentOp¶ The type of assignment being performed (
AssignmentOp).
-
value: vc2_pseudocode_parser.parser.ast.Expr¶ The
Exprgiving the value being assigned.
-
eol: vc2_pseudocode_parser.parser.ast.EOL¶ EOLfollowing the assignment statement.
-
The following assignment operators are defined:
Expressions¶
AST nodes representing expressions in the AST are subclasses of
Expr.
Unary expressions are defined as follows:
-
class
UnaryExpr(offset, op, value)¶ A unary expression, e.g.
-foo.-
op: vc2_pseudocode_parser.parser.operators.UnaryOp¶ The operator (
UnaryOp)
-
value: vc2_pseudocode_parser.parser.ast.Expr¶ The
Exprthe operator applies to.
-
With unary operators enumerated as:
-
class
UnaryOp(value)¶ An enumeration.
-
plus= '+'¶
-
minus= '-'¶
-
bitwise_not= '~'¶
-
logical_not= 'not'¶
-
Binary expressions are defined as follows:
-
class
BinaryExpr(lhs, op, rhs)¶ A binary expression, e.g.
a + 1.-
lhs: vc2_pseudocode_parser.parser.ast.Expr¶ The
Expron the left-hand side of the expression.
-
op: vc2_pseudocode_parser.parser.operators.BinaryOp¶ The operator (
BinaryOp)
-
rhs: vc2_pseudocode_parser.parser.ast.Expr¶ The
Expron the right-hand side of the expression.
-
With binary operators enumerated as:
-
class
BinaryOp(value)¶ An enumeration.
-
logical_or= 'or'¶
-
logical_and= 'and'¶
-
eq= '=='¶
-
ne= '!='¶
-
lt= '<'¶
-
le= '<='¶
-
gt= '>'¶
-
ge= '>='¶
-
bitwise_or_= '|'¶
-
bitwise_xor= '^'¶
-
bitwise_and_= '&'¶
-
lsh= '<<'¶
-
rsh= '>>'¶
-
add= '+'¶
-
sub= '-'¶
-
mul= '*'¶
-
idiv= '//'¶
-
mod= '%'¶
-
pow= '**'¶
-
Calls to functions are defined as follows:
-
class
FunctionCallExpr(offset, offset_end, name, arguments)¶ A call to a function:
<name>(<arguments[0]>, <arguments[1]>, <...>) # e.g. foo(1, 2, 3)
-
arguments: List[vc2_pseudocode_parser.parser.ast.Expr]¶ The list of
Exprgiving the arguments to the call.
-
Uses of variables and subscripted variables are defined as follows:
-
class
VariableExpr(variable)¶ A use of a variable or subscripted variable.
Value literals are defined as follows:
-
class
BooleanExpr(offset, value)¶ A boolean literal, i.e.
TrueandFalse.
-
class
NumberExpr(offset, offset_end, value, display_base=10, display_digits=1)¶ A numerical literal integer, e.g.
123or0xF00.
-
class
EmptyMapExpr(offset, offset_end)¶ An empty map literal (e.g.
{}).
-
class
LabelExpr(label)¶ A label literal.
-
label: vc2_pseudocode_parser.parser.ast.Label¶ The
Labelused.
-
For parenthesised expressions, e.g. (1 + 2), the presence of the
parentheses is explicitly marked in the AST. While this is not semantically
important (since evaluation order is explicit in an AST) it may be helpful in
retaining parentheses added for human legibility when translating the
pseudocode into other forms.
-
class
ParenExpr(offset, offset_end, value)¶ A parenthesised expression.
-
value: vc2_pseudocode_parser.parser.ast.Expr¶ The parenthesised
Expr
-
Variables and subscripts¶
A Variable is defined as follows:
Variables may be subscripted (multiple times) and this is represented by a
nesting of Subscript objects:
-
class
Subscript(offset_end, variable, subscript)¶ A subscripted variable (e.g.
x[1]).-
variable: Union[vc2_pseudocode_parser.parser.ast.Subscript, vc2_pseudocode_parser.parser.ast.Variable]¶ The
Variablebeing subscripted (orSubscriptin the case of a multiply subscripted variable.
-
subscript: vc2_pseudocode_parser.parser.ast.Expr¶ The
Exprgiving the subscript value.
-
Labels¶
Labels are defined as follows:
Note
Labels and variables are syntactically ambiguous in the pseudocode language but are disambiguated in the AST. Names in the pseudocode are deemed to be variables if they correspond with function arguments, loop variables or assignment targets within a function’s namespace. All other names are deemed to be labels.
Comments and vertical whitespace¶
All comments and vertical whitespace (i.e. blank lines) are captured by the AST. This enables these non-semantic components to be retained in language translations.
-
class
Comment(offset, string)¶ An end-of-line comment.
-
class
EmptyLine(offset, offset_end, comment=None)¶ Represents an empty (or comment-only) line in the source.
-
comment: Optional[vc2_pseudocode_parser.parser.ast.Comment] = None¶ The
Commenton this line, if present.
-
Simple statements are syntactically terminated by an optional comment followed
by a newline and then a number of empty or comment-only lines. These details
are captured by the EOL node.
-
class
EOL(offset, offset_end, comment=None, empty_lines=<factory>)¶ The end-of-line terminator following a statement.
-
comment: Optional[vc2_pseudocode_parser.parser.ast.Comment] = None¶ A
Commenton the same line as the statement, if present.
-
empty_lines: List[vc2_pseudocode_parser.parser.ast.EmptyLine]¶ Any trailing
EmptyLineafter this statement.
-
For example, given a snippet as follows:
foo() # Comment 0
bar() # Comment 1
# Comment 2
# Comment 3
baz() # Comment 4
Here the function call bar() would be captured by a
FunctionCallStmt. The FunctionCallStmt.eol would
contain an EOL with EOL.comment set to a
Comment containing # Comment 1 and EOL.empty_lines
the following four lines (and their comments).
Function declarations, if, else if, else, for each, for and
while headings are separated from their bodies by a : and optional
EOL. When the : is followed by a newline, an EOL
object will be given, but for in-line definitions, the EOL will be
omitted. For example:
func1():
foo()
func2(): foo()
Here the Function for func1 will have Function.eol
set to an EOL node for func2 it will be None.
Grammar¶
The pseudocode language grammar can be described by its
peggie grammar:
# Grammar for the VC-2 specification pseudocode language
start <- any_ws @=function+ eof
# Function definition
function <- identifier ws function_arguments ws stmt_block
function_arguments <- "(" ws (identifier ws ("," ws identifier ws)* ","?)? ws ")"
# A series of statements
stmt_block <- ":" ws single_line_stmt
/ ":" eol @>((@=stmt)+)
# Statements (all end with an eol)
stmt <- if_else_stmt
/ for_each_stmt
/ for_stmt
/ while_stmt
/ function_call_stmt
/ return_stmt
/ assignment_stmt
single_line_stmt <- function_call_stmt
/ return_stmt
/ assignment_stmt
function_call_stmt <- function_call eol
if_else_stmt <- @=("if" ws condition ws stmt_block)
@=(@=("else" ws_ "if" ws condition ws stmt_block)*)
@=(("else" ws stmt_block)?)
for_each_stmt <- "for" ws_ "each" ws_ identifier ws_ "in" ws_ for_each_list ws stmt_block
for_stmt <- "for" ws_ identifier ws "=" ws expr ws_ "to" ws_ expr ws stmt_block
while_stmt <- "while" ws condition ws stmt_block
assignment_stmt <- variable ws assignment_op ws expr eol
return_stmt <- "return" ws_ expr eol
condition <- "(" ws expr ws ")"
for_each_list <- expr (ws "," ws expr)*
assignment_op <- r"(\+|-|\*|//|\*\*|&|\^|\||<<|>>)?="
# Expressions (defined below in ascending order of precidence)
expr <- maybe_log_or_expr
maybe_log_or_expr <- maybe_log_and_expr (ws_ "or" ws_ maybe_log_and_expr)*
maybe_log_and_expr <- maybe_log_not_expr (ws_ "and" ws_ maybe_log_not_expr)*
maybe_log_not_expr <- "not" ws_ maybe_log_not_expr / maybe_cmp_expr
maybe_cmp_expr <- maybe_or_expr (ws r"==|!=|<=|>=|<|>" ws maybe_or_expr)*
maybe_or_expr <- maybe_xor_expr (ws "|" ws maybe_xor_expr)*
maybe_xor_expr <- maybe_and_expr (ws "^" ws maybe_and_expr)*
maybe_and_expr <- maybe_shift_expr (ws "&" ws maybe_shift_expr)*
maybe_shift_expr <- maybe_arith_expr (ws r"<<|>>" ws maybe_arith_expr)*
maybe_arith_expr <- maybe_prod_expr (ws r"\+|-" ws maybe_prod_expr)*
maybe_prod_expr <- maybe_unary_expr (ws r"\*|//|%" ws maybe_unary_expr)*
maybe_unary_expr <- r"\+|-|~" ws maybe_unary_expr / maybe_pow_expr
maybe_pow_expr <- maybe_paren_expr (ws "**" ws maybe_unary_expr)*
maybe_paren_expr <- "(" ws expr ws ")" / atom
# Atoms
atom <- function_call
/ variable
/ empty_map
/ boolean
/ number
variable <- identifier (ws subscript)*
subscript <- "[" ws expr ws "]"
function_call <- identifier ws function_call_arguments
function_call_arguments <- "(" ws (expr ws ("," ws expr ws)* ","?)? ws ")"
# Literals
identifier <- !reserved_word r"[a-zA-Z_][a-zA-Z0-9_]*"
reserved_word <- r"(if|else|elif|elseif|for|each|foreach|in|to|while|return|[Tt]rue|[Ff]alse|and|or|not)(?![a-zA-Z0-9_])"
boolean <- ("True" / "False")
number <- r"(0[bB][01]+)|(0[xX][0-9a-fA-F]+)|([0-9]+)"
empty_map <- "{" ws "}"
# Whitespace and comments
comment <- r"#((?![\n\r]).)*(\n|\r\n|\r|(?!.))"
ws <- h_space?
ws_ <- h_space
any_ws <- (comment / h_space / v_space)*
eol <- h_space? (comment / v_space / eof) any_ws
h_space <- r"[ \t]+"
v_space <- "\n" / "\r\n" / "\r"
eof <- !.
Operator precedence and associativity tables¶
A table of operator precedence and associativities is also provided which may be useful for, for example, producing pretty-printed outputs (with excess parentheses removed).
-
OPERATOR_PRECEDENCE_TABLE= {operator: int, ...}¶ BinaryOpandUnaryOpoperator precedence scores. Higher scores mean higher precedence.
-
OPERATOR_ASSOCIATIVITY_TABLE= {operator: associativity, ...}¶