The purpose of this assignment is to help you learn (1) arrays and pointers in the C programming language, (2) how to create and use stateless modules in C, (3) the design by contract style of programming, and (4) how to use the Linux operating system and the GNU programming tools, especially bash
, emacs
, gcc217
, and gdb
.
Students from past semesters reported taking, on average, 10 hours to complete this assignment.
This assignment is an individual assignment, not a partnered assignment.
Composing the two Str_search
functions (as described below) is the challenge part of this assignment. While doing the challenge part of the assignment, you are bound to observe the course policies regarding assignment conduct as given in the course Policies web page, plus one additional constraint: you may not use any "human" sources of information. That is, you may not consult with the course's staff members, the Intro Lab TAs, other current students via Ed, or any other people while working on the challenge part of an assignment, except for clarification of requirements.
The challenge part is worth 16 percent of this assignment. So if you don't do any of the challenge part and all other parts of your assignment solution are perfect, then your grade for the assignment will be 84 percent.
As you know, the C programming environment contains a standard library. The facilities provided in the standard library are declared in interface (alias header, alias .h
) files. One of those interface files is string.h
; it contains the declarations of string handling functions, that is, functions that perform operations on character strings. Chapter 13 and Appendix D of the King textbook and the Linux man
pages describe the string handling functions.
Your task is to use C to compose a Str
module that contains versions of the most commonly used standard string handling functions. Specifically, design your Str
module so it contains these functions, each of which behaves the same as a corresponding standard C function:
Str Function
|
Standard C Function |
Str_getLength
|
strlen
|
Str_copy
|
strcpy
|
Str_concat
|
strcat
|
Str_compare
|
strcmp
|
Str_search
|
strstr
|
Design each function definition so it calls the assert
macro to validate the function's parameters. Specifically, design each function definition to assert that each parameter is not NULL
.
Don't add any other calls of assert
to your code. But consider whether it would be possible to do so. In particular, provide an answer to this question in your readme
file:
Is it possible forStr_copy
orStr_concat
to callassert
to verify that the destination memory area specified by the caller is large enough? Explain.
Create two implementations of your Str
module. Place the first implementation in a file named stra.c
. Design the function definitions in stra.c
such that they use array notation as much as possible, and traverse each given string using an index relative to the beginning of the string. For example, in stra.c
you might define the Str_getLength
function like this:
size_t Str_getLength(const char pcSrc[]) { size_t uLength = 0; assert(pcSrc != NULL); while (pcSrc[uLength] != '\0') uLength++; return uLength; }
In your assignment solution you may use any of the definitions of theStr_getLength
function given in this assignment specification (other than the one marked "unacceptable").
Note that the type of uLength
is size_t
. The type size_t
is defined in the standard interface file stddef.h
. It is a system-dependent unsigned integer type that is large enough to hold the length of any string. Typically it is defined to be an alias for either unsigned int
or unsigned long
. On armlab, it is an alias for unsigned long
. Several of the standard string handling functions use type size_t
, and so several of your functions must use it too.
Place the second implementation in a file named strp.c
. Design the function definitions in strp.c
such that they use pointer notation as much as possible, and traverse each given string using an incremented pointer — not using an index relative to the beginning of the string. For example, in strp.c
you might define the Str_getLength
function like this:
size_t Str_getLength(const char *pcSrc) { const char *pcEnd; assert(pcSrc != NULL); pcEnd = pcSrc; while (*pcEnd != '\0') pcEnd++; return (size_t)(pcEnd - pcSrc); }
It is unacceptable to create the strp.c
implementation from the stra.c
implementation simply by replacing each occurrence of a[i]
with *(a+i)
. For example, in strp.c
this definition of Str_getLength
would be unacceptable:
size_t Str_getLength(const char *pcSrc) /* unacceptable */ { /* unacceptable */ size_t uLength = 0; /* unacceptable */ assert(pcSrc != NULL); /* unacceptable */ while (*(pcSrc + uLength) != '\0') /* unacceptable */ uLength++; /* unacceptable */ return uLength; /* unacceptable */ } /* unacceptable */
In strp.c
that function definition would be unacceptable because (1) it traverses the given string by incrementing the integer uLength
rather than by incrementing a pointer, and (2) effectively it uses uLength
as an index relative to the beginning of the string pcStr
.
Place the Str
module's interface in a file named str.h
. You may use either array or pointer notation in the interface. The two notations are equivalent to the compiler, so both implementations match the one interface. Use the #ifndef...#define...#endif
construct to protect your str.h
file against accidental multiple inclusion into the same .c
file.
This assignment does not focus on efficiency. Nevertheless, your functions must not be grossly inefficient. For example, a function is grossly inefficient if it traverses a (potentially long) string multiple times when a single traversal would suffice.
Design your Str
functions so they do not call any of the standard string handling functions. In the context of this assignment, pretend that the standard string handling functions do not exist. However your functions may call each other, and you may define additional (static
, non-interface) functions.
Beware of type mismatches. In particular, beware of the difference between type size_t
and type int
: a variable of type size_t
can store larger integers than a variable of type int
can. Your functions must (in principle) be able to handle strings whose lengths exceed the capacity of type int
. Also beware of type mismatches related to the use of the const
keyword. Using the splint
tool, as described below, will help you detect type mismatches.
The C preprocessor performs three jobs:
Merge physical lines of source code into logical lines.
Remove comments from ("de-comment") the source code.
Handle preprocessor directives (#define
, #include
, etc.) that reside in the source code.
For Assignment 1 you composed a program that does the second of those jobs. For this assignment you must compose a program that mimics (a small subset of) the third of those jobs. More specifically:
One of the preprocessor directives that the preprocessor handles is #define
. In its simplest form, a #define
preprocessor directive maps a macro to a replacement string. The preprocessor replaces each instance of the macro with the replacement string in all code that follows the #define
directive.
For example, this #define
directive:
#define MAX 1000
maps the macro MAX
to the replacement string 1000
. When the preprocessor subsequently encounters an occurrence of the macro MAX
, the preprocessor replaces MAX
with 1000
. (Our King book describes the #define
directive in more detail.)
Your job in this part of the assignment is to compose a C program named replace
that mimics the preprocessor's handling of simple #define
directives. More specifically, you are given a partial program named replace.c
. Your job is to edit the file to make it complete.
Doing so, in part, involves completing the definition of the replaceAndWrite
function. A comment in replace.c
describes what that function must do. Define that function so it calls the assert
macro to make sure that each parameter is not NULL
. Don't be concerned about the context of the replacements; for example, define your function such that it performs replacements even if the text to be replaced occurs within quotes, comments, etc.
replace.c
must be a client of your Str
module. That is, replace.c
must call functions declared in str.h
and defined in either stra.c
or strp.c
in meaningful ways, whenever reasonably possible, to perform its task. It's fine for replace.c
to do low-level string manipulation, of the kind found within the definitions of your Str
functions, in places where it would be awkward to call your Str
functions. In such low-level string-manipulation code you may use either array notation or pointer notation.
Create your program in the environment of your choice, staging the files to the armlab cluster to build, test, debug, and submit.
Get a working copy of your own version of the COS 217 assignment repository for this assignment by completing the Setup Step 5 from the Git and GitHub Primer document from the first precept. The COS 217 assignment repository for this assigment is: https://github.com/COS217/Strings
The repository contains files that you will find useful, which are described in subsequent steps.
stra.c
Compose stra.c
, as described in previous sections of this specification.
stra.c
The given teststr.c
is a test client. As its name implies, teststr.c
thoroughly tests your Str
implementations.
Use teststr.c
and stra.c
to build a program named teststra
. Then run teststra
. Confirm that the output is correct.
Of course you must use thegcc217
command to preprocess, compile, assemble, and link your programs. Ultimately you must perform each of those four steps individually, and examine the intermediate results to the extent possible. But it would be fine to use "shortcut"gcc217
commands during development, and perform the four steps individually only once, immediately before you submit your work.
You might find it helpful to iterate between Steps 2 and 3. For example, you might editteststr.c
, temporarily commenting-out calls of allStr
functions exceptStr_getLength
. Then compose the definition ofStr_getLength
instra.c
. Then build and test. Then editteststr.c
, "commenting-in" calls ofStr_copy
. Then compose the definition ofStr_copy
instra.c
. Then build and test. And so forth, for eachStr
function.
stra.c
Use splint
and critTer
to check for stylistic errors in stra.c
.
The splint
tool is a high-powered code analysis tool developed by researchers at the University of Virginia. It writes a list of stylistic flaws that it finds in your source code. Some of its warnings may be a little cryptic; feel free to ask about them on Ed as the need arises.
Assuming that you've copied the .splintrc
file from the /u/cos217
directory to your HOME
directory on armlab (as described in the document entitled A Minimal COS 217 Computing Environment from the first precept), using splint
is easy. At the armlab bash
prompt, issue this command:
splint somefile1.c somefile2.c ...
where somefile1.c
, somefile2.c
, ... are all of the .c
files that comprise the program.
For example, issue this command to use splint
to analyze your array-based version:
Thesplint teststr.c stra.c
critTer
(critique from the terminal) tool is a more COS 217-specific code analysis tool. The first version of critTer
was developed by Erin (Rosenbaum) Sardo '11; and the current version was developed by Alisa Kroutikova '15. (Erin and Alisa are COS 217 alumnae. Erin is a Senior Staff Software Engineer at Slack. Alisa is a Director at Elysium Digital.)
Using critTer
is even easier than using splint
, as it takes one file at a time. At the armlab bash
prompt, issue this command:
critTer somefile.c
For example, issue this command to use critTer
to critique stra.c
:
critTer stra.c
You might find it interesting to usecritTer
to critiqueteststr.c
. When giventeststr.c
,critTer
generates warnings about the use of "magic numbers", the length of some functions, and the length of the file. Don't be concerned about those warnings. You need not address them in yourreadme
file. We judged that using magic numbers and defining long functions inteststr.c
was clearer than the alternative, and that splittingteststr.c
into multiple files would have complicated the build process unnecessarily.
Think of splint
(with the given .splintrc
file) and critTer
as defining coding standards for the COS 217 course. Critique your code using those tools, and edit your code to eliminate all splint
and critTer
warnings before you submit.
The splint
and critTer
tools are aggressive; some of the warnings that they generate might not indicate true stylistic errors in your code. If you disagree with a splint
or critTer
warning, then copy the warning to your readme
file, and explain your disagreement with it in your readme
file. Also feel free to discuss the matter with your preceptor before you submit.
If you would like to learn more about splint
, see www.splint.org.
strp.c
Compose strp.c
, as described in previous sections of this specification.
strp.c
Use teststr.c
and strp.c
to build a program named teststrp
. Then run teststrp
. Confirm that the output is correct.
You might find it helpful to iterate between Steps 5 and 6.
strp.c
Use splint
to critique strp.c
. This is the appropriate command:
splint teststr.c strp.c
Use critTer
to critique strp.c
. This is the appropriate command:
critTer strp.c
Edit your code to eliminate all splint
and critTer
warnings. If you disagree with a splint
or critTer
warning, then copy the warning to your readme
file, and explain your disagreement with it in your readme
file. Also feel free to discuss the matter with your preceptor before you submit.
replace.c
Edit the given replace.c
, thus composing your replace.c
, as described in previous sections of this specification.
Hint: consider the following case for replaceAndWrite
. If you find yourself becoming concerned about impossible string lengths or memory allocation, you might be on the wrong track.
Argument | Contents |
---|---|
pcLine |
A null-terminated string with 1 million 'a' characters. |
pcFrom |
"a" |
pcTo |
A null-terminated string with 1 million 'b' characters. |
If and only if yourStr
functions are not working, then designreplace.c
so it calls the standard C string handling functions declared instring.h
instead of yourStr
functions. In that case don't forget to change#include "str.h"
to#include <string.h>
near the beginning ofreplace.c
.
replace.c
Use replace.c
and either stra.c
or strp.c
(your choice) to build a program named replace
.
If and only if yourStr
functions are not working, then you may build thereplace
program using the standard C string handling functions instead ofstra.c
orstrp.c
.
The given samplereplace
is an executable binary file. Your replace
program must have the same behavior as samplereplace
. That is, when given the same input as samplereplace
, replace
must write exactly the same data to stdout
and stderr
as samplereplace
does.
Test replace
thoroughly by running it on several input files with a variety of "from string" and "to string" command-line arguments. You will find it convenient to use the given 01empty.txt
, 02data.txt
, and 03allones.txt
files as input to replace
. You will find it convenient to use the given testreplace
and testreplacediff
scripts to automate the testing process. Comments in those scripts describe how to use them.
replace.c
Use splint
to critique replace.c
. Either of these commands is appropriate:
splint replace.c stra.c splint replace.c strp.c
Of course, if and only if yourStr
functions are not working and you have designedreplace.c
to call standard C string handling functions, then you may critiquereplace.c
alone, without simultaneously critiquingstra.c
orstrp.c
.
Use critTer
to critique replace.c
. This is the appropriate command:
critTer replace.c
Edit your code to eliminate all splint
and critTer
warnings. If you disagree with a splint
or critTer
warning, then copy the warning to your readme
file, and explain your disagreement with it in your readme
file. Also feel free to discuss the matter with your preceptor before you submit.
readme
FileEdit your copy of the given readme
file by answering each question that is expressed therein, including the question about assert
described in Part 1 of these specifications.
One of the sections of the readme
file requires you to list the authorized sources of information that you used to complete the assignment. Another section requires you to list the unauthorized sources of information that you used to complete the assignment. Your grader will not grade your submission unless you have completed those sections. To complete the "authorized sources" section of your readme
file, copy the list of authorized sources given in the "Policies" web page to that section, and edit it as appropriate.
Provide the instructors with your feedback on the assignment. To do that, issue this command:
FeedbackCOS217.py 2
and answer the questions that it asks. That command stores its questions and your answers in a file named feedback
in your working directory.
Submit all source code files, the files that gcc217
generated from them, and your readme
and feedback
files electronically on armlab using these commands:
submit 2 readme feedback submit 2 str.h submit 2 stra.c stra.i stra.s stra.o submit 2 strp.c strp.i strp.s strp.o submit 2 teststr.i teststr.s teststr.o submit 2 teststra teststrp submit 2 replace.c replace.i replace.s replace.o submit 2 replace
where:
teststra
is an executable file built using teststr.o
and stra.o
.teststrp
is an executable file built using teststr.o
and strp.o
.replace
is an executable binary file built using replace.o
and either stra.o
or strp.o
.It is considered good practice to leave each working copy of your repository with a clean working tree up to date with the master branch. This helps reduce any potential confusion about the state of the codebase, and also helps you ensure that the final version you have submitted is indeed the final version.
In part, good program style is defined by the splint
and critTer
tools as noted above, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document.
The more course specific style rules listed in the previous assignment specification also apply, as do these:
Modularity: Divide your program into logically coherent modules. Each module must have an interface (.h
) file and an implementation (.c
) file.
Modularity: A module's interface must be minimal; it must reveal only those aspects of the module that clients must see. In particular, A module's interface must declare only those functions that clients must see. Other functions must be defined as static
in the implementation.
Parameter Validation: Validate function parameters via assert
s whenever possible.
Global Variables: If you must define a global variable in a multi-file program, then make it static
if you can and const
if you can.
Function Comments:
static
) function must have a declaration in its module's interface and a definition in its module's implementation. The function declaration in the interface must have a comment. The comment must describe what the function does (and not how the function works). To describe how the function works, feel free to add local comments, that is, comments within the function definition.static
function has a definition in its module's implementation, but must not have a corresponding declaration in its module's interface. So each static
function definition in the implementation must have a comment. The comment must describe what the function does. To describe how the function works, add local comments (that is, comments within the function definition) as necessary.main
function definition also does not have a corresponding declaration in an interface. So the main
function definition in the implementation must have a comment. The comment must describe what the function does. To describe how the function works, add local comments (that is, comments within the function definition) as necessary. The main
function comment is the most important comment in the program.For example, this is an appropriate way to comment the Str_getLength
function:
In file str.h
:
... /* Return the length of string pcSrc, not including the trailing '\0'. */ size_t Str_getLength(const char pcSrc[]); ...
In file strp.c
:
... size_t Str_getLength(const char *pcSrc) { const char *pcEnd; assert(pcSrc != NULL); pcEnd = pcSrc; while (*pcEnd != '\0') pcEnd++; return (size_t)(pcEnd - pcSrc); } ...
Note that the comment appears in the interface but not in the implementation, and describes what the function does but not how it works. Also note that, as per the style rules described in the Assignment 1 specification, the comment explicitly states what the function returns, and explicitly refers to the function's parameter (pcSrc
).
Minimal requirement to receive credit for stra.c
and strp.c
:
stra.c
and strp.c
must build with the teststr.c
client. (This means you must submit str.h
along with stra.c
and strp.c
!
Minimal requirement to receive credit for replace.c
:
The replace.c
client must build.
We will grade your work on two kinds of quality:
Quality from the user's point of view. From the user's point of view, your code has quality if it behaves as it must. The correct behavior of the Str
module is defined by the previous sections of this assignment specification, and by the C90 specifications of the corresponding string.h
functions as presented in the King textbook and the man
pages. The correct behavior of the replace
program is defined by the previous sections of this assignment specification, and by the given samplereplace
program.
Quality from the programmer's point of view. From the programmer's point of view, your code has quality if it is well styled and thereby easy to maintain. Good program style is defined by the previous section of this assignment specification.
To encourage good coding practices, we will deduct points if gcc217
generates warning messages.
Sometimes C programmers use idioms that rely on the fact that the null byte character ('\0'
), the NULL
address, the integer 0, and the logical concept FALSE have the same representation. You may use those idioms. For example, you may define Str_getLength
like this:
size_t Str_getLength(const char pcSrc[]) { size_t uLength = 0; assert((long)pcSrc); /* NULL address, 0, and FALSE are identical. */ while (pcSrc[uLength]) /* null byte character and FALSE are identical. */ uLength++; return uLength; }
or like this:
size_t Str_getLength(const char *pcSrc) { const char *pcEnd; assert((long)pcSrc); /* NULL address, 0, and FALSE are identical. */ pcEnd = pcSrc; while (*pcEnd) /* null byte character and FALSE are identical. */ pcEnd++; return (size_t)(pcEnd - pcSrc); }
(Note the use of the cast operator within the calls of assert
to suppress splint
warnings.)
But you are not required to use those idioms. In fact, we recommend that you avoid using idioms that adversely affect clarity.
const
Keyword and Str_search
The use of the const
keyword within Str_search
is tricky, as this question-and-answer sequence indicates.
According to the man
pages, the parameters of strstr
are of type const char*
. That implies that the parameters of Str_search
also must be of type const char*
. Why are they not of type char*
?
Suppose you were to define Str_search
like this:
char *Str_search(char *pcHaystack, char *pcNeedle) { ... }
Further suppose the client then calls Str_search
like this:
const char *pcBig = "hello"; const char *pcSmall = "lo"; ... ... Str_search(pcBig, pcSmall) ... ...
(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pcBig
is of type const char*
and that pcHaystack
is of type char*
, would generate a warning on the function call. Thus pcHaystack
(and pcNeedle
) must be of type const char*
.
According to the man
pages, the return type of strstr
is char*
. That implies that the return type of Str_search
also must be of type char*
. Why is the return type not const char*
?
Suppose you were to define Str_search
like this:
const char *Str_search(const char *pcHaystack, const char *pcNeedle) { ... }
Further suppose the client then calls Str_search
like this:
char *pcBig = "hello"; char *pcSmall = "lo"; char *pc; ... pc = Str_search(pcBig, pcSmall); ...
(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pc
is of type char*
and that the value returned by Str_search
is of type const char*
, would generate a warning on the assignment statement. Thus the return type of Str_search
must be char*
and must not be const char*
.
Within the definition of Str_search
, I decided to define a local variable named pc
that "points into" the first of the two given strings, as indicated by this code:
char *Str_search(const char *pcHaystack, const char *pcNeedle) { ... pc = pcHaystack; ... /* Increment pc so it points to the appropriate character. */ ... return pc; }
If I define pc
to be of type char*
, then the assignment statement generates a warning. If I define pc
to be of type const char*
, then the return
statement generates a warning. How can I resolve that problem?
Unfortunately, C provides no elegant solution. We recommend that you define pc
to be of type const char*
so the assignment statement is warningless. Then use a cast operator in the return
statement:
return (char*)pc;
to explicitly inform the compiler to not generate a warning. C programmers refer to that solution as "casting away the constness" of the variable. Sadly, often that inelegant technique is unavoidable.
The assignment requires that you use pointer notation as much as possible in your strp.c
file. So, for example, you must define Str_copy
like this:
char *Str_copy(char *pcDest, const char *pcSrc) { ... }
The assignment requires that you use array notation as much as possible in your stra.c
file. So it would be logical to try to define your Str_copy
function something like this:
char[] Str_copy(char pcDest[], const char pcSrc[]) { ... }
Sadly, C disallows char[]
as a function return type. So in your stra.c
file, you must define Str_copy
in this hybrid way:
char *Str_copy(char pcDest[], const char pcSrc[]) { ... }
That same point applies to the definitions of other functions as well.
This assignment was written
by Robert M. Dondero, Jr.
with input from other faculty members