Tokens and Statements
Source code can be divided into tokens, the smallest meaningful units in a program. Statements are executable units built from tokens.
The Smallest Units of Code
During lexical analysis, a compiler divides source text into tokens. C tokens are traditionally grouped into six categories:
- identifiers
- keywords
- constants
- string literals
- operators
- punctuators
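Keywords such as int and return are reserved by the language and cannot be used as names. Identifiers name functions, variables, and other program elements. Constants such as 0 and string literals such as "Hello\n" represent fixed values. Operators such as +, *, and = specify operations on their operands. Punctuators such as parentheses, braces, commas, and semicolons give the code its structure. Since C99, the standard lists operators among the punctuators, so it names five token categories instead of six.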
#include <stdio.h>
int main() {
    printf("Kitty on your lap\n");
    return 0;
}
Ignoring the preprocessor directive, this program contains 14 tokens: the keywords int and return; the identifiers main and printf; the constant 0; the string literal "Kitty on your lap\n"; and the punctuators (, ), {, (, ), ;, ;, and }.
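Whitespace is required only between tokens that would otherwise run together, such as adjacent keywords, identifiers, and constants; next to punctuators it is optional. Spaces, horizontal tabs, vertical tabs, newlines, and form feeds count as whitespace, and so do comments.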
Code 1
int
main (
void
){
return
0;
}
This code is difficult to read but valid. The following one-line version is also valid:
int main(){return 0;}
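Whitespace still matters wherever it marks a token boundary. Deleting it between tokens, or inserting it inside a token, changes the boundaries, and both of the following lines fail to compile: intmain and return0 are read as single identifiers, while mai n and retu rn are each split into two identifiers.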
intmain(){return0;}
int mai n(){ retu rn 0; }
Executable Units
A statement is an executable unit composed of tokens. Expression statements, such as a function call, and jump statements, such as return, end with a semicolon:
printf("Stand by Ready!!\n");
return 0;
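Newlines improve readability, but the compiler treats them as ordinary whitespace, so they do not end statements. Preprocessor directives such as #include work differently: a directive ends at the end of its line, so it needs a line of its own even when the rest of the program fits on one line: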
#include <stdio.h>
int main() { printf("Stand by Ready!!\n"); return 0; }
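C defines six categories of statements: labeled statements (including case and default labels), expression statements, compound statements, selection statements (if and switch), iteration statements (while, do, and for), and jump statements (goto, continue, break, and return). A compound statement, or block, is a sequence of declarations and statements enclosed in {}; it needs no semicolon after the closing brace.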