The Clash of the Scopes and Symbols

In my ArabicBASIC interpreter I started out with a single symbol table which actually performed several roles: storing symbols and their associated values. The first role is indeed the symbol table’s role, but the second really belongs to the concept of scope. Scope not only keeps track of a symbol’s value, it also tracks where that symbol can be accessed: globally or only within a block or from a parent block and so on.

So naturally, I treated the symbol table as a scope table. By the time I got around to architecting how ArabicBASIC would handle sub-routines and functions, I discovered that there really could be a need for both as separate entities. Reasoning about forward references helped me to see the sense in having separate symbol and scope tables. In ArabicBASIC, I would like a coder to have the option of calling a user-defined function or subroutine before it’s defined; this requires forward references. Enabling a code style of grouping function definitions at the end of the file instead of cluttering the head improves code readability, I think, so it’s worth it.

However, in my current implementation, a symbol is only added to the table when it is encountered by the parse tree visitor, i.e. in the order it was written in the source script. Thus, forward references are not possible like this. Two visits to the parse tree are necessary, and at first I felt that this is better handled by two independent data structures. A first pass at the parse tree would push each symbol into the Symbol table without any care for its value from an assignment nor the subroutine body. The second pass would then execute assignments and expression evaluations.

So, I might need a separate symbol table to store all the symbols so that ArabicBASIC can support forward references. But, wait maybe I shouldn’t have two separate data structures because what’s the purpose of the symbol table then if two data structures basically mirror each other? And, why can’t I just populate the global scope with symbols with null values on the first pass?

But, now I’m confused and I stand at a fork in the road: two tables or one table with two, possibly tightly coupled roles? What shape it should all take is really a software engineer’s choice: this is where the art comes in and the science of formal theory recedes…just a little.

What do they say about premature optimization? Right, it is evil (and time consuming). So let’s get back to, er, BASICs shall we :-) and consider the following while I sleep on the original question of one or two data structures:

  • BASIC has only a global scope. There are no function scopes nor local blocks, for example
  • Per ECMA-116, the 1986 specification for BASIC, variables must be declared and assigned before being referenced –> No forward references for variables. I like this.

I also need a scope table to track the current state of symbols only representing Variables. Scope is often implemented as a stack, but it can also be a hash map (dictionary) for faster resolving (i.e. accessing a value)

After all, there’s no reason that functions have to be global = they should also be in the Scope data structure. In fact, there’s good reasons for functions not to be global!


Leave a comment