Sometime last year, I was reading the documentation for Raku, formerly known as Perl 6. I'm a programming language aficionado and I used to write Perl, so I was eager to catch up on how the language had evolved and what decisions were made about its design. While reading through the guide on a cozy Saturday night, as I'm known to do, I saw this interesting tidbit: Raku can do operations on numeric characters from various human languages and scripts, such as… Arabic. That's great: I happen to know Arabic as a second language. I then wondered, "If it can operate on Arabic numbers, maybe it can do other things with Arabic text, too?" The answer is yes, but why stop there?
You see, Raku has a built-in Grammar feature (with an EBNF-like syntax) for writing parsers, which is perfect for writing Domain Specific Languages (DSLs). Suddenly, the idea hit me: why aren't there more programming languages with keywords and literals in other scripts? Maybe it can be done in our era, given Unicode and the superior multilingual capabilities of modern operating systems. Maybe Raku could make it possible for me to write one. And indeed, it did.
But one thing really bothered me about using Raku as the implementation language. No, it's not the irony of implementing one interpreted language with another interpreted language. It's that Raku doesn't yet have a way to generate native binaries. That means anyone wanting to use my ARABASIC interpreter would have to install Raku first and then run my interpreter. That limits its audience to experienced computer users, possibly even just programmers. I wanted it to be more widely available.
Time to retool
I finally settled on a parser generator, ANTLR4, to avoid getting bogged down in hand-writing a lexer and parser (however much fun that could be, I do have a deliverable here). ANTLR4 can generate a parser in a variety of languages, including Python, C++ and Java. Now I have to decide which one to use.
Python is great because it's rather easy to code in and has some third-party libraries for making native executables. But all of the usable ones bundle the Python runtime, which makes the binaries too large for my liking.
Java might work out better, since we use it at work and I had studied for Oracle's Java certification (though I dropped it after starting at OSU because I was too busy). Plenty of very performant parsers, XML parsers for example, are written in Java. And Java applications can be "nativized" by compiling them ahead of time with GraalVM's Native Image.
And then there's C++. Most programming language interpreters are written in C or C++. Since I want to enter this subfield of computer science, it makes sense to generate the lexer and parser in C++. The catch is that I only have experience with C and none at all with its object-oriented cousin, C++. What a dilemma: reuse what I know from work, or start green with C++?
I must keep in mind that the focus of this project is to deliver an Arabic BASIC interpreter as a stand-alone executable. The goal is not also to learn C++ (although I suspect I'd get fairly good at C++ by June if I chose this path). I do think giving a project dual purposes simply lengthens the timeline. I know from experience that my projects took considerably longer when I had to learn a fundamental technology, such as React, from scratch. But the allure of writing an interpreter in the same language as the Big Ones is strong.
I’ll tell you next week which one :-)