Scanning for ntlang

Due Wed Jan 31st by 11:59pm in your Lab01 GitHub repo

Links

Tests: https://github.com/USF-CS631-S24/tests

Autograder: https://github.com/phpeterson-usf/autograder

Overview

Our goal for Project01 will be to implement an interpreter for ntlang, short for Number Tool Language, that can compute expressions on numbers in different bases and output the computed values in different bases. In this lab we are going to work on the first part of the ntlang implementation, which is the scanner. You will extend a given C program to scan tokens from input text. Scanning is one of the first steps in processing program source code (for interpretation or compilation). You should do your development in a RISC-V environment.

Program Requirements

For scanners and parsers, it is common to describe syntax using EBNF (Extended Backus-Naur Form):

https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form

https://ics.uci.edu/~pattis/misc/ebnf2.pdf

For scanning, also called lexing (short for lexical analysis), we convert input text into a sequence of tokens. The specification of the accepted tokens is called the “microsyntax” of a programming language.

Here is the EBNF for the microsyntax of ntlang:

tokens     ::= (token)*
token      ::= intlit | binlit | symbol
symbol     ::= '+' | '-' | '*' | '/' | '>>' | '<<' | '~' | '&' | '|' | '^' | '>-' | '(' | ')'
intlit     ::= digit (digit)*
binlit     ::= '0b' ('0' | '1') ('0' | '1')*
digit      ::= '0' | ... | '9'

# Ignore
whitespace ::= (' ' | '\t') (' ' | '\t')*

See the Wikipedia
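
To make the microsyntax concrete, here is a minimal sketch of how the grammar rules might map onto C code. It is not the starter code and not a required design: the names sketch_kind, sketch_skip_ws, and sketch_scan_one are hypothetical, and the sketch only classifies a single token and reports its length. Parentheses are accepted because the required token enum lists TK_LPAREN and TK_RPAREN.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch only (not the lab's enum). */
enum sketch_kind { SK_INTLIT, SK_BINLIT, SK_SYMBOL, SK_ERROR, SK_END };

/* whitespace rule: skip any run of spaces and tabs. */
static const char *sketch_skip_ws(const char *p) {
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/*
 * Classify the single token that starts at p (whitespace already skipped).
 * On return, *len holds the number of characters the token occupies.
 */
static enum sketch_kind sketch_scan_one(const char *p, size_t *len) {
    const char *start = p;

    if (*p == '\0') {
        *len = 0;
        return SK_END;
    }

    /* binlit: '0b' followed by one or more binary digits.
       This must be tried before intlit, since both start with a digit. */
    if (p[0] == '0' && p[1] == 'b' && (p[2] == '0' || p[2] == '1')) {
        p += 2;
        while (*p == '0' || *p == '1')
            p++;
        *len = (size_t)(p - start);
        return SK_BINLIT;
    }

    /* intlit: one or more decimal digits. */
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        *len = (size_t)(p - start);
        return SK_INTLIT;
    }

    /* Two-character symbols are checked before one-character symbols. */
    if (strncmp(p, ">>", 2) == 0 || strncmp(p, "<<", 2) == 0 ||
        strncmp(p, ">-", 2) == 0) {
        *len = 2;
        return SK_SYMBOL;
    }

    /* One-character symbols; '(' and ')' come from the token enum. */
    if (strchr("+-*/~&|^()", *p) != NULL) {
        *len = 1;
        return SK_SYMBOL;
    }

    /* Anything else is not part of the microsyntax. */
    *len = 1;
    return SK_ERROR;
}
```

A full scanner would call sketch_skip_ws, then sketch_scan_one, advance by the returned length, and repeat until the end of the text. Trying the longer alternatives first (binlit before intlit, '>>' before a single character) is what keeps input such as 0b1010 from being split into several tokens.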

You should use the following enum and strings array. You need to use these as is to get autograding to work properly:

enum scan_token_enum {
    TK_INTLIT, /* 1, 22, 403 */
    TK_BINLIT, /* 0b1010, 0b11110000 */
    TK_PLUS,   /* + */
    TK_MINUS,  /* - */
    TK_MULT,   /* * */
    TK_DIV,    /* / */
    TK_LSR,    /* >> */
    TK_ASR,    /* >- */
    TK_LSL,    /* << */
    TK_NOT,    /* ~ */
    TK_AND,    /* & */
    TK_OR,     /* | */
    TK_XOR,    /* ^ */
    TK_LPAREN, /* ( */
    TK_RPAREN, /* ) */
    TK_EOT     /* end of text */
};

char *scan_token_strings[] = {
    "TK_INTLIT",
    "TK_BINLIT",
    "TK_PLUS",
    "TK_MINUS",
    "TK_MULT",
    "TK_DIV",
    "TK_LSR",
    "TK_ASR",
    "TK_LSL",
    "TK_NOT",
    "TK_AND",
    "TK_OR",
    "TK_XOR",
    "TK_LPAREN",
    "TK_RPAREN",
    "TK_EOT"
};
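
Because the enum values and the strings array are listed in the same order, a token's enum value can be used directly as an index to get its printable name. The sketch below illustrates that idea under stated assumptions: token_name and format_token are hypothetical helpers, and the NAME("text") output shape is only an illustration, not necessarily the format the autograder expects. The enum and array are repeated in compact form solely so the sketch compiles on its own.

```c
#include <stdio.h>

/* Repeated from above, compactly, only so this sketch is self-contained. */
enum scan_token_enum {
    TK_INTLIT, TK_BINLIT, TK_PLUS, TK_MINUS, TK_MULT, TK_DIV, TK_LSR, TK_ASR,
    TK_LSL, TK_NOT, TK_AND, TK_OR, TK_XOR, TK_LPAREN, TK_RPAREN, TK_EOT
};

char *scan_token_strings[] = {
    "TK_INTLIT", "TK_BINLIT", "TK_PLUS", "TK_MINUS", "TK_MULT", "TK_DIV",
    "TK_LSR", "TK_ASR", "TK_LSL", "TK_NOT", "TK_AND", "TK_OR", "TK_XOR",
    "TK_LPAREN", "TK_RPAREN", "TK_EOT"
};

/* Compile-time check that the enum and the array have the same length. */
_Static_assert(sizeof(scan_token_strings) / sizeof(scan_token_strings[0])
                   == TK_EOT + 1,
               "scan_token_strings must have one entry per enum value");

/* Hypothetical helper: map a token id to its name, guarding the index. */
const char *token_name(enum scan_token_enum id) {
    int i = (int)id;
    if (i < 0 || i > TK_EOT)
        return "TK_UNKNOWN";
    return scan_token_strings[i];
}

/* Hypothetical helper: write NAME("text") into buf (assumed format). */
int format_token(char *buf, size_t size, enum scan_token_enum id,
                 const char *text) {
    return snprintf(buf, size, "%s(\"%s\")", token_name(id), text);
}
```

For example, format_token(buf, sizeof buf, TK_BINLIT, "0b1010") fills buf with TK_BINLIT("0b1010"). The static assertion is a cheap way to catch an entry added to one list but not the other.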

Your Lab01 repo should have the following files:

Makefile
README.md
lab01.c
scan1.c
scan2.c

Note that the Makefile, README.md, scan1.c, and scan2.c are given in the starter code (the initial lab01 repo). You will need to add lab01.c to the Makefile. When you execute the make command, it should generate a lab01 executable:

$ ls
Makefile  README.md  lab01.c  scan1.c  scan2.c
$ make
gcc -g -o scan1 scan1.c
gcc -g -o scan2 scan2.c
gcc -g -o lab01 lab01.c

For all labs and projects this semester the name of the main executable will be the name of the lab or project. For example, this is Lab01, so the name of the main executable will be lab01.

We will use the Unix make program to compile and link our programs. By default the make command looks for build rules in a file called Makefile. Here is the Makefile provided in the lab01 starter code:

# PROGS defines a list of each program to be generated
PROGS = scan1 scan2

# For each PROG we list all the required object files
SCAN1_OBJS = scan1.o
SCAN2_OBJS = scan2.o

# Pattern rule to assemble .s files into .o files
%.o: %.s
	as -g -o $@ $<

# Pattern rule to compile .c files into .o files
%.o: %.c
	gcc -c -g -o $@ $<

# First real rule that will initiate the build of all PROGS
all : $(PROGS)

scan1 : $(SCAN1_OBJS)
	gcc -g -o $@ $^

scan2 : $(SCAN2_OBJS)
	gcc -g -o $@ $^

clean :
	rm -f $(PROGS) $(SCAN1_OBJS) $(SCAN2_OBJS)

.PHONY: clean

While this looks a little complicated, this form will be useful when we start to use Assembly Language and build larger programs. The PROGS variable holds a list of the final executables that will be generated by the Makefile. For each executable we define an OBJS variable (SCAN1_OBJS and SCAN2_OBJS here) that lists the object files that make up that executable; these could be built from C or from Assembly Language sources. In the given Makefile, we provide rules to build two executables: scan1 and scan2. The clean rule allows us to type make clean on the command line to remove any generated files. You will need to follow the structure provided to add lab01 as a generated executable built from lab01.c.

Autograder

To run the Autograder tests for Lab01, make sure you have cloned the tests repo for the class and configured the Autograder to point to the location of your tests repo in your home directory. Once you have the Autograder installed and configured, you should be able to run the Lab01 tests like this:

$ grade test -p lab01
. 01(10/10) 02(10/10) 03(20/20) 04(10/10) 05(20/20) 06(10/10) 07(20/20) 100/100

Note that the grade program can detect the lab or project being autograded by looking at the current directory. So, if you are in your lab01-<gitid> repo, you can just type grade test:

$ grade test
. 01(10/10) 02(10/10) 03(20/20) 04(10/10) 05(20/20) 06(10/10) 07(20/20) 100/100

Code Submission

You will submit your code in your Lab01 GitHub repo. You will be provided a link to create your repo. Please only submit your Makefile, lab01.c, and optionally a README.md file. You should not include any binary files such as executables or object files. In general, you should not include any files that will be generated as a result of building your program with make.

Rubric

100% Lab01 autograder tests.