MC911 – Semantic Analysis

Project Guidelines

In this phase, you will extend your parser with semantic analysis, like type checking.

Third Step – Decorating AST

First, you will need to define a symbol table that keeps track of previously declared identifiers. The symbol table will be consulted whenever the compiler needs to lookup information about new types, variables and constant declarations. For instance:

class SymbolTable(dict):
    """
    Class representing a symbol table. It should
    provide functionality for adding and looking
    up nodes associated with identifiers.
    """
    def __init__(self, decl=None):
        super().__init__()
        self.decl = decl
    def add(self, name, value):
        self[name] = value
    def lookup(self, name):
        return self.get(name, None)
    def return_type(self):
        if self.decl:
            return self.decl.mode
        return None

Data Types

Next, you will need to define objects that represent the different builtin datatypes and record information about their capabilities.

You will define classes representing types in Lya programming language. First, define a general class used to represent all types. Each type is then a singleton instance of the type class.

class ExprType(object):
      pass

int_type = ExprType("int",...)
bool_type = ExprType("bool",...)
char_type = ExprType("char",...)
string_type = ExprType("string", ...)

The contents of the type class is entirely up to you. However, you will minimally need to encode some information about what operators are supported (+, -, *, etc.), default values, etc.

Once you have defined the built-in types, you will need to make sure they get registered with any symbol tables or code that checks for type names. Example:

class Environment(object):
    def __init__(self):
        self.stack = []
        self.root = SymbolTable()
        self.stack.append(self.root)
        self.root.update({
            "int": int_type,
            "char": char_type,
            "string": string_type,
            "bool": bool_type
        })
    def push(self, enclosure):
        self.stack.append(SymbolTable(decl=enclosure))
    def pop(self):
        self.stack.pop()
    def peek(self):
        return self.stack[-1]
    def scope_level(self):
        return len(self.stack)
    def add_local(self, name, value):
        self.peek().add(name, value)
    def add_root(self, name, value):
        self.root.add(name, value)
    def lookup(self, name):
        for scope in reversed(self.stack):
            hit = scope.lookup(name)
            if hit is not None:
                return hit
        return None
    def find(self, name):
        if name in self.stack[-1]:
            return True
        else:
            return False

Fourth Step – Semantic Rules

Finally, you’ll need to write code that walks the AST and enforces a set of semantic rules. Here is a list example of things you’ll need to check:

1. Names and symbols:

All identifiers must be defined before they are used. This includes variables, constants, and typenames. For example, this kind of code generates an error:

        a = 3;   // Error. ’a’ is not defined.
        dcl a int;

Note: mode names such as ”int”, ”bool”, and ”char” are built-in names that should be defined at the start of the program.

2. Types of literals:

All literal symbols must be assigned a type of ”int”, ”bool”, or ”string”. For example:

        syn a = 42; // "int" 
        syn b = ’b’; // "char"
        syn c = "hello"; // "string" 
        syn d = true; // "bool"

To do this assignment, check the Python type of the literal value and attach a type name as appropriate.

3. Binary operator type checking

Binary operators only operate on operands of the same type and pro- duce a result of the same type. Otherwise, you get a type error. For example:

         dcl a int = 2; // Ok
         dcl b char = 3; // Error. 3 is not char
         dcl c int = a+3; // Ok
         dcl d int = a+b; // Error. int + char
         dcl e bool = true; // Ok

4. Unary operator type checking

Unary operators return a result that’s the same type as the operand.

5. Supported operators

Here are the operators supported by each type:

       int:    binary { +, -, *, /, %, ==, !=, >, >=, <, , >=, <, <= },
               unary { }
       string: binary { +, ==, != },
               unary { }
       bool:   binary { ==, != },
               unary { ! }

Attempts to use unsupported operators should result in an error. For example:

       dcl a chars[10] = "Hello" + "World"; // Ok
       dcl b chars[10] = "Hello" * "World"; // Error (unsupported op *)

6. Assignment

The left and right hand sides of an assignment operation must be declared as the same type. Values can only be assigned to variable declarations, not to constants.

For walking the AST, use the NodeVisitor class defined in lya ast.py. A shell of the code is provided below.

class Visitor(NodeVisitor):
    """
    Program Visitor class. This class uses the visitor pattern as
    described in lya_ast.py.   It’s define methods of the form
    visit_NodeName() for each kind of AST node that we want to process.
    Note: You will need to adjust the names of the AST nodes if you
    picked different names.
    """
    def __init__(self):
        self.environment = Environment()
        self.typemap = {
            "int": IntType,
            "char": CharType,
            "string": StringType,
            "bool": BoolType
        }
    def raw_type_unary(self, node, op, val):
        if hasattr(val, "check_type"):
            if op not in val.check_type.unary_ops:
                error(node.lineno,
                      "Unary operator {} not supported".format(op))
            return val.check_type
    def raw_type_binary(self, node, op, left, right):
        if hasattr(left, "check_type") and hasattr(right, "check_type"):
            if left.check_type != right.check_type:
                error(node.lineno,
                "Binary operator {} does not have matching types".format(op))
                return left.check_type
            errside = None
            if op not in left.check_type.binary_ops:
                errside = "LHS"
            if op not in right.check_type.binary_ops:
                errside = "RHS"
            if errside is not None:
                error(node.lineno,
                      "Binary operator {} not supported on {} of
                      expression".format(op, errside))
        return left.check_type

def visit_Program(self,node):
    self.environment.push(node)
    node.environment = self.environment
    node.symtab = self.environment.peek()
    # Visit all of the statements
    for stmts in node.stmts: self.visit(stmts)
def visit_SynStmt(self, node):
    # Visit all of the synonyms
    for syn in node.syns:
        self.visit(syn)
def visit_UnaryExpr(self,node):
    self.visit(node.expr)
    # Make sure that the operation is supported by the type
    raw_type = self.raw_type_unary(node, node.op, node.expr)
    # Set the result type to the same as the operand
    node.raw_type = raw_type
def visit_BinaryExpr(self,node):
    # Make sure left and right operands have the same type
    # Make sure the operation is supported
    self.visit(node.left)
    self.visit(node.right)
    raw_type = self.raw_type_binary(node, node.op, node.left, node.right)
    # Assign the result type
    node.raw_type = raw_type