Princeton University
COS 333: Advanced Programming Techniques

Assignment 1: A Registrar Application: Baseline Version

Purpose

The purpose of this assignment is to help you learn or review the Python programming language and database programming in Python. If done properly, the assignment also will give you practice at composing modular code.

Rules

Make sure you study the COS 333 Policies web page before doing this assignment or any of the course's assignments.

Your Task

Any university registrar's office must maintain data about courses and classes offered during the upcoming semester. Those data typically are kept in a database. The registrar's office must provide an interface that allows students and other interested parties to query the database.

Pretend that you're working for Princeton's Registrar's Office. You're given a database containing data about courses and classes offered during an upcoming Princeton semester. (Actually the data are for the Spring 2010 semester.) Your task is to compose Python programs that allow Princeton students and other interested parties to query the database.

Subsequent assignments will be similar to this one. So it will be to your advantage to modularize your code so you easily can reuse code from this assignment in subsequent ones.

The Given Files

Browse to the TigerFile page for this assignment. Download the reg1.zip file. Then unzip that file to create files named reg.sqlite, ref_regoverviews.pyc, ref_regdetails.pyc, testregoverviewsgiven.py, testregdetailsgiven.py, and replace.py. Subsequent sections of this document describe those files.

The Database

The database is a SQLite database that is stored in a given file named reg.sqlite. The database consists of these five tables:

classes
Fields: classid, courseid, days, starttime, endtime, bldg, roomnum
courses
Fields: courseid, area, title, descrip, prereqs
crosslistings
Fields: courseid, dept, coursenum
coursesprofs
Fields: courseid, profid
profs
Fields: profid, profname

These are the relationships among the tables:

courses and classes:

For each row of the courses table with courseid x, there are 0 or more rows of the classes table with courseid x. (Some courses don't have a corresponding class; that is, a particular course might not be offered this semester. Some courses have 1 corresponding class. Some courses have more than 1 corresponding class.)

For each row of the classes table with courseid x, there is exactly 1 row of the courses table with courseid x. (Each class has exactly one corresponding course.)

courses and crosslistings:
For each row of the courses table with courseid x, there are 1 or more rows of the crosslistings table with courseid x. (Some courses have 1 department and course number. Some courses have more than 1 department and course number.)

For each row of the crosslistings table with courseid x, there is exactly one row of the courses table with courseid x.
courses and coursesprofs:
For each row of the courses table with courseid x, there are 0 or more rows of the coursesprofs table with courseid x. (Some courses don't have an assigned professor. Some courses have 1 professor. Some courses have more than one professor. Incidentally, I find it strange that Princeton maps professors to courses instead of mapping professors to classes.)

For each row of the coursesprofs table with courseid x, there is exactly one row of the courses table with courseid x.
coursesprofs and profs:
For each row of the coursesprofs table with profid x, there is exactly 1 row of the profs table with profid x. (Each professor has exactly one name!)

For each row of the profs table with profid x, there are 0 or more rows in the coursesprofs table with profid x. (Some professors are teaching 0 courses. Some are teaching 1 course. Some are teaching more than one course.)

Your first step must be to familiarize yourself with the database.

The `regoverviews.py` Program

You must compose two programs. The first program must be named regoverviews.py.

The given file named ref_regoverviews.pyc contains a reference program in bytecode form. Your regoverviews.py must have the same behavior as ref_regoverviews.pyc. That is, when given the same command-line arguments and database, your regoverviews.py and ref_regoverviews.pyc must write exactly the same (character-for-character) output to stdout and stderr, and must have the same exit status.

Run ref_regoverviews.pyc to determine the desired behavior of your regoverviews.py. Here are some highlights of that behavior...

When executed via a command that contains -h as a command-line argument, your regoverviews.py must display a help message:

$ python regoverviews.py -h
usage: regoverviews.py [-h] [-d dept] [-n num] [-a area] [-t title]

Registrar application: show overviews of classes

options:
  -h, --help  show this help message and exit
  -d dept     show only those classes whose department contains dept
  -n num      show only those classes whose course number contains num
  -a area     show only those classes whose distrib area contains area
  -t title    show only those classes whose course title contains title

Hint: Use the standard Python argparse module.

These example commands describe how your regoverviews.py must behave when given proper command-line arguments. Give them a try using ref_regoverviews.pyc:

$ python regoverviews.py
Display data for all classes.
$ python regoverviews.py -d COS
Display data for all classes whose department, when converted to lowercase letters, contains "cos".
$ python regoverviews.py -n 333
Display data for all classes whose course number, when converted to lowercase letters, contains "333".
$ python regoverviews.py -n b
Display data for all classes whose course number, when converted to lowercase letters, contains "b".
$ python regoverviews.py -a Qr
Display data for all classes whose distribution area, when converted to lowercase letters, contains "qr".
$ python regoverviews.py -t intro
Display data for all classes whose title, when converted to lowercase letters, contains "intro".
$ python regoverviews.py -t science
Display data for all classes whose title, when converted to lowercase letters, contains "science".
$ python regoverviews.py -t C_S
Display data for all classes whose title, when converted to lowercase letters, contains "c_s". The point is that your program, unlike SQL, must not interpret the underscore character as a wildcard character.
$ python regoverviews.py -t c%S
Display data for all classes whose title, when converted to lowercase letters, contains "c%s". The point is that your program, unlike SQL, must not interpret the percent character as a wildcard character.
$ python regoverviews.py -d cos -n 3
Display data for all classes whose department when converted to lowercase letters contains "cos" and whose course number when converted to lowercase letters contains "3".
$ python regoverviews.py -d cos -a qr -n 2 -t intro
Display data for all classes whose department when converted to lowercase letters contains "cos" and whose area when converted to lowercase letters contains "qr" and whose course number when converted to lowercase letters contains "2" and whose title when converted to lowercase letters contains "intro".
$ python regoverviews.py -t "Independent Study"
Display data for all classes whose title, when converted to lowercase letters, contains "independent study".
$ python regoverviews.py -t "Independent Study "
Display data for all classes whose title, when converted to lowercase letters, contains "independent study ".
$ python regoverviews.py -t "Independent Study "
Display data for all classes whose title, when converted to lowercase letters, contains "independent study ".
$ python regoverviews.py -t " Independent Study"
Display data for all classes whose title, when converted to lowercase letters, contains " independent study".
$ python regoverviews.py -t " Independent Study"
Display data for all classes whose title, when converted to lowercase letters, contains " independent study".
$ python regoverviews.py -t=-c
Display data for all classes whose title, when converted to lowercase letters, contains "-c".

In short, your regoverviews.py must display the classid, dept, coursenum, area, and title of each class that matches the criteria specified by the user via command-line arguments.

Your regoverviews.py must use SQL prepared statements to protect the database against SQL injection attacks.

Your regoverviews.py must write its output in a human-readable format. These are some salient characteristics of the output:

The output must have have the appearance of a table.
Each column must have a title.
Each title must be underlined with hyphens.
The rows must sorted. The primary sort must be by dept in ascending order, the secondary sort must be by coursenum in ascending order, and tertiary sort must be by classid in ascending order.
Within each row, each line must consist of no more than 72 characters, not including the newline character.
Within each row, each line must end after a word, not within a word. That is, no newline characters may appear within words.

Normally your regoverviews.py must terminate with exit status 0. If it detects a database-related error, then it must terminate with exit status 1. If it detects erroneous command-line arguments, then it must terminate with exit status 2 — as is the default behavior of argparse.

Hint: Use the Python percent (%) operator and C-style conversion specifications to format each row of output. An expression of this form works:

row = '%5s %4s %6s %4s %s' % (classid, dept, coursenum, area, title)

(There are other reasonable approaches; but given your knowledge from COS 217 of the C language and its printf() conversion specifications, that approach is likely to be the most straightforward for you.)

Then, having generated each row, use the standard Python textwrap module to wrap it to lines. Within each row, lines after the first must be indented using 5+1+4+1+6+1+4+1=23 spaces.

Hint: Consider the fact that a SQL statement of this form:

SELECT somefields FROM sometable WHERE somefield LIKE '%'

has the same semantics as a SQL statement of this form:

SELECT somefields FROM sometable

That fact could help you to express the logic in your regoverviews.py more succinctly.

The `regdetails.py` Program

Your second program must be named regdetails.py.

The given file named ref_regdetails.pyc contains a reference program in bytecode form. Your regdetails.py must have the same behavior as ref_regdetails.pyc. That is, when given the same command-line arguments and database, your regdetails.py and ref_regdetails.pyc must write exactly the same (character-for-character) output to stdout and stderr, and must have the same exit status.

Run ref_regdetails.pyc to determine the desired behavior of your regdetails.py. Here are some highlights of that behavior...

When executed via a command that contains -h as a command-line argument, your regdetails.py must display a help message:

$ python regdetails.py -h
usage: regdetails.py [-h] classid

Registrar application: show details about a class

positional arguments:
  classid     the id of the class whose details should be shown

options:
  -h, --help  show this help message and exit

Hint: Use the the standard Python argparse module.

When executed via a command that contains a valid classid, your regdetails.py must write to stdout the classid, courseid, days, starttime, endtime, bldg, roomnum, dept(s), coursenum(s), area, title, descrip, prereqs, and profname(s) for the class with the given classid. These commands illustrate. Give them a try using ref_regdetails.pyc:

$ python regdetails.py 8321
$ python regdetails.py 9032
$ python regdetails.py 8293
$ python regdetails.py 9977
$ python regdetails.py 9012
$ python regdetails.py 10188

Your regdetails.py must use SQL prepared statements to protect the database against SQL injection attacks.

Your regdetails.py must write its data in a human-readable format. These are some salient characteristics of that format:

Titles and blank lines must appear where appropriate.
The data in the "depts and coursenums" rows must be sorted primarily by dept in ascending order and secondarily by coursenum in ascending order.
The data in the "profnames" rows must be sorted by profname in ascending order.
Within each row, each line must consist of no more than 72 characters, not including the newline character.
Within each row, each line must end after a word, not within a word. That is, no newline characters may appear within words.

Normally your regdetails.py must terminate with exit status 0. If it detects a database-related error, or if no class has the specified classid, then it must terminate with exit status 1. If it detects erroneous command-line arguments, then it must terminate with exit status 2 — as is the default behavior of argparse.

Hint: Use the standard Python textwrap module to wrap each row to lines.

Hint: Consider whether it's reasonable (or even possible!) to retrieve all required data from the database by executing a single SQL SELECT statement. Consider whether it's more reasonable to retrieve the required data by executing multiple SQL SELECT statements.

Error Handling

Your regoverviews.py and regdetails.py must be robust.

Your regoverviews.py must handle erroneous command-line arguments gracefully. These commands illustrate. Give them a try using ref_regoverviews.pyc:

$ python regoverviews.py a qr
$ python regoverviews.py -A qr
$ python regoverviews.py "-a " qr
$ python regoverviews.py -a qr st
$ python regoverviews.py -a
$ python regoverviews.py -a qr -d
$ python regoverviews.py -a -d cos
$ python regoverviews.py -x

Your regdetails.py must handle erroneous command-line arguments gracefully. These commands illustrate. Give them a try using ref_regdetails.pyc:

$ python regdetails.py
$ python regdetails.py 8321 9032
$ python regdetails.py abc123
$ python regdetails.py 9034

Your regoverviews.py and regdetails.py must handle "database cannot be opened" errors. If the database cannot be opened, then your programs must write a descriptive error message — the one contained within the thrown Exception object — to their stderr.

Your regoverviews.py and regdetails.py also must handle "corrupted database" errors. In the context of this course, a corrupted database is one that causes the database driver, upon interaction with the database, to throw an exception. In the case of SQLite, a corrupted database might consist of an invalid SQLite file — for example, a file that's empty or contains simple text. More generally, a corrupted database might be a database that is missing a table that the interaction requires, or is missing a column/field that the interaction requires. If the database is corrupted such that the SQLite driver's execution of a SELECT statement throws an exception, then your programs must write a descriptive error message — the one contained within the thrown Exception object — to their stderr.

A Common Question and its Answer

Suppose the user runs regdetails.py for a particular classid, the program queries the classes table to find the corresponding courseid, and the program then queries the database to fetch the row in the courses table with that courseid. Furthermore, suppose the database is corrupted such that no row with that courseid exists in the courses table. Must our regdetails.py handle that particular database corruption error?

Answer

No. Your grader won't test your regdetails.py program on that error. Generalizing...

Database management systems can enforce foreign key integrity constraints on the databases that they manage. So any DBMS would not allow its databases to contain that kind of violation. And so client programs would not need to check for such violations.

That's a very good thing. With a richer database, it would be very difficult for a client program systematically to check for foreign key integrity constraint violations. And it would be absurdly redundant for every client program to check for such violations.

So it's more realistic to compose your regoverviews.py and regdetails.py such that they don't check for foreign key integrity constraint violations. And so it's fine to compose your programs such that they check for only the kinds of database-related errors that are described in this specification: file-level errors and schema-level errors. The first easily could happen if the reg.sqlite file is missing from the working directory. The second could happen if, for example, the reg.sqlite file is present but is missing some table.

Each error message written by your regoverviews.py must be prefixed with sys.argv[0] (which is normally but not necessarily "regoverviews.py") followed by a colon and a space. Similarly, each error message written by your regdetails.py must be prefixed with sys.argv[0] (which is normally but not necessarily "regdetails.py") followed by a colon and a space.

Testing

We'll cover software testing techniques in lectures later in the semester. In the meantime, to test your programs it will be sufficient to rely upon (1) your knowledge of testing from the COS 217 course, and (2) this A Software Testing Taxonomy document.

Test your regoverviews.py and regdetails.py by (1) reviewing this assignment specification thoroughly, making sure that your programs conform to every aspect of it, and (2) comparing the behavior of your programs with the behavior of the reference programs.

Perform boundary (alias corner case) testing. Of course, make sure that your programs handle normal data. But also make sure that your programs handle unusual data: courses that have multiple cross-referenced departments/numbers, long titles, long descriptions, multiple professors, no professors, and so forth. Also make sure that your programs handle errors: erroneous command-line arguments, database cannot be opened, corrupted database, and so forth.

Perform statement (alias coverage) testing. Your tests should cause every statement of your regoverviews.py and regdetails.py to be executed.

Make a copy of the testregoverviewsgiven.py file; name the copy testregoverviews.py. Also make a copy of the testregdetailsgiven.py file; name the copy testregdetails.py. Automate your testing using the testregoverviews.py, testregdetails.py, and replace.py programs. First study those programs. Then issue these commands to automate your testing of your regoverviews.py:

python testregoverviews.py regoverviews.py > out1 2>&1
python testregoverviews.py ref_regoverviews.pyc > out2 2>&1
python replace.py out2 ref_regoverviews.pyc regoverviews.py

The first command runs testregoverviews.py on your regoverviews.py, and stores the output in a file named out1. The second command runs testregoverviews.py on the given ref_regoverviews.pyc, and stores the output in a file named out2. The third command replaces, within out2, each occurrence of "ref_regoverviews.pyc" (such as might appear at the beginning of error messages) with "regoverviews.py".

Then compare the contents of the out1 and out2 files: on a Mac or Linux computer issue the command diff out1 out2; on a MS Windows computer issue the command fc out1 out2. The command should indicate that the contents of out1 and out2 are identical.

Thereafter, edit your testregoverviews.py to add additional tests, and repeat that sequence

Similarly, issue these commands to automate testing of your regdetails.py:

python testregdetails.py regdetails.py > out3 2>&1
python testregdetails.py ref_regdetails.pyc > out4 2>&1
python replace.py out4 ref_regdetails.pyc regdetails.py

Then compare the contents of the out3 and out4 files: on a Mac or Linux computer issue the command diff out3 out4; on a MS Windows computer issue the command fc out3 out4. The command must indicate that the contents of out3 and out4 are identical.

Thereafter, edit your testregdetails.py to add additional tests, and repeat that sequence.

Hint: If you have trouble understanding the output of the diff out1 out2 command, then widen your terminal window and issue the command diff -y out1 out2. The output will be a side-by-side display of the two files, with markers denoting lines that differ. Same for diff out3 out4.

Hint: Instead of using diff or fc to compare files, you could use a file comparing website such as Diffchecker.

Program Style

Your programs must be well styled. Generally, by consensus of the Python community, good Python style is defined by the PEP 8 -- Style Guide for Python Code website.

The Python community has developed a static code analysis tool -- a linter tool -- named pylint. The pylint tool is similar to the splint tool that you used in COS 217: just as splint (configured with a .splintrc file) generates a report critiquing the style of given C code, so pylint (configured with a .pylintrc file) generates a report critiquing the style of given Python code. The pylint tool enforces many of the PEP 8 guidelines, and some additional guidelines too. In COS 333 "good style" is defined by the pylint tool, as configured with the given .pylintrc file. The lecture example programs (with a few intentional or unavoidable exceptions) cause pylint to generate perfect reports, and your assignment programs should do the same.

Using pylint is easy. You can critique one file comprising your programs at a time:

python -m pylint regoverviews.py
python -m pylint regdetails.py
...

However, a better approach is to critique all files comprising your programs at the same time:

python -m pylint regoverviews.py regdetails.py ...

When given multiple files, pylint performs some cross-file critiquing. So the "all files at the same time" approach may generate warnings in addition to those generated by the "one file at a time" approach. Make sure you use the "all files at the same time" approach, as your grader will.

Advice

The command-line format of regoverviews.py is incomplete in the logical sense. Most notably the format doesn't allow the user to express OR relationships in queries. For example, the format doesn't allow the user to express queries of the form "Display data for all classes whose area is 'qr' OR 'st'." or "Display data for all classes whose dept is 'cos' OR whose area is 'qr'." Don't be concerned about that. Remember that the purpose of the assignment is not to develop a great registrar's office application per se. Instead its purpose is to give you experience with database programming — experience that, we hope, will help you when developing your project, and beyond.
Modularize your code, at both the file level and the function level, as much as you can. Good modularity will help you to reuse your code from this assignment in the next assignment.
Make sure that the output of your regoverviews.py has the same format as does the output of ref_regoverviews.pyc. If it doesn't, then (no doubt) your testing programs will detect many differences between the output of the two programs, and it will be difficult for you to determine how well your regoverviews.py works. Moreover, if it doesn't, then your regoverviews.py will fail many of our automated checks, it will be difficult for your grader to determine if how well your regoverviews.py works, and your grade will suffer. Same for your regdetails.py.

Submission

Compose a readme file. Your readme file must contain:

Your full name and netid and your teammate's full name and netid.
A paragraph describing your contribution, and another paragraph describing your teammate's contribution. Please be thorough; we're looking for two substantial paragraphs, not a sentence or two.
A description of whatever help (if any) you received from other people while doing the assignment, as prescribed by the course Policies web page.
A description of the sources of information that you used while doing the assignment.
An indication of how much time you spent doing the assignment.
Your assessment of the assignment: Did it help you to learn? What did it help you to learn? Do you have any suggestions for improvement? Etc.
(Optionally) Any information that will help us to grade your work in the most favorable light. In particular you must describe all known bugs.

Your readme file must be a plain text file. Don't create your readme file using Microsoft Word or any other word processor.

Submit your assignment files using the TigerFile page for this assignment. Make sure you submit regoverviews.py, regdetails.py, testregoverviews.py, testregdetails.py, any .py files used by those programs, and readme.

Grading

Assume that your grader already has activated the cos333 virtual environment before he/she runs your programs. The document from the first lecture entitled A COS 333 Computing Environment describes the cos333 virtual environment.

Your grade will be based upon:

Correctness: the correctness of your programs as specified by this document and the given reference programs. If your code fails the tests on some particular functionality, then your grader won't be able to inspect your code manually to try to assign partial credit for that functionality. So please make sure your code behaves properly.
Style: the quality of your program style as reported by the pylint tool, when using the given .pylintrc file, and when executed via the command python -m pylint *.py. Ten points (that is, ten percent) of your grade will be based upon the quality of your program style as reported by pylint. Your grader will start with the 10-point score reported by pylint. Your grader then will "round down" that score to the 0.5 level to compute your program style grade. For example, if your pylint score is 9.8, then your program style grade will be 9.5; if your pylint score is 7.4, then your program style grade will be 7.0. Your grader will not run pylint on your testregoverviews.py or testregdetails.py programs.
Testing: the quality of your testregoverviews.py and testregdetails.py programs. Two points (that is, two percent) of your grade will be based upon your testregoverviews.py program. You'll receive two points if your testregoverviews.py program works and is reasonably thorough, one point if your testregoverviews.py program works and is minimal, and zero points if your testregoverviews.py program doesn't work or you didn't submit it. Similarly, two points of your grade will be based upon your testregdetails.py program.

Optional: Automated Statement Testing

To support your statement testing, you're encouraged (but not required) to use the Python coverage tool to generate a coverage report showing which lines of your application have and have not been executed by your tests. These are the steps:

Repeatedly issue commands of the form python -m coverage run -p regoverviews.py arguments. Each of those commands runs your regoverviews.py with the specified arguments, and generates a coverage report in a file named .coverageX (for some X). The report indicates which lines of your regoverviews.py were executed and which were not.
Repeatedly issue commands of the form python -m coverage run -p regdetails.py argument. Each of those commands runs your regdetails.py with the specified argument, and generates a coverage report in a file named .coverageX (for some X). The report indicates which lines of your regdetails.py were executed and which were not.
Issue the command python -m coverage combine to combine the coverage reports generated by steps 1 and 2 into one large coverage report in a file named .coverage.
Issue the command python -m coverage html to use the .coverage file to generate a human-readable report as a set of HTML documents in a directory named htmlcov.
Browse to htmlcov/index.html to check the report.
Ideally, the files in your htmlcov directory should show that 100% of your programs' lines were executed. If the report shows less than 100% coverage, then revise your testing plan accordingly, delete the .coverage* files and the htmlcov directory, and repeat steps 1 through 5.

Princeton University COS 333: Advanced Programming Techniques