Unit 1: Arrays, Stacks, and Recursion
1. Arrays
An array is a linear data structure that stores a collection of elements of the same data type in contiguous memory locations. Each element is identified by at least one array index or key.
Single-dimensional Arrays
A single-dimensional (1D) array is the simplest form, represented as a single row or column of elements.
- Declaration: int arr[10]; (This declares an integer array named arr that can hold 10 elements).
- Indexing: Elements are accessed using an index, which starts at 0 and goes up to size-1. The first element is arr[0], and the last is arr[9].
- Memory: If the base address (the address of arr[0]) is B and the size of the data type (e.g., int) is W bytes, the address of any element arr[i] can be calculated as:
Address(arr[i]) = B + i * W
Multi-dimensional Arrays
A multi-dimensional array has more than one dimension. The most common is the two-dimensional (2D) array, which can be visualized as a table or grid with rows and columns.
- Declaration: int matrix[3][4]; (This declares a 2D array with 3 rows and 4 columns).
- Accessing: Elements are accessed using two indices: matrix[row][col]. For example, matrix[1][2].
- Memory Representation: Although we visualize it as a grid, memory is linear. 2D arrays are flattened into 1D in two ways:
- Row-Major Order: (Default in C/C++/Java) All elements of the first row are stored contiguously, followed by all elements of the second row, and so on.
Address Calculation (Row-Major) for arr[i][j] in an M x N array (M rows, N cols):
Address(arr[i][j]) = B + (i * N + j) * W
- Column-Major Order: (Default in FORTRAN) All elements of the first column are stored contiguously, followed by all elements of the second column, and so on.
Address Calculation (Column-Major) for arr[i][j] in an M x N array:
Address(arr[i][j]) = B + (j * M + i) * W
Sparse Matrices
A sparse matrix is a matrix in which most of the elements are zero. It is inefficient to store this matrix in a standard 2D array because a huge amount of memory is wasted on storing zeros.
Array Representation (Triplet Representation)
This method stores only the non-zero elements. A 2D array is created with three columns:
- Column 1: Row index of the non-zero element.
- Column 2: Column index of the non-zero element.
- Column 3: The non-zero value itself.
The first row of this representation is often used to store the metadata: (Total Rows, Total Columns, Total Non-Zero Elements).
Example:
Original Matrix (4x4):
[ 0, 0, 3, 0 ]
[ 5, 0, 0, 0 ]
[ 0, 1, 0, 0 ]
[ 0, 0, 0, 2 ]
Array (Triplet) Representation (5x3):
[ 4, 4, 4 ]   (metadata: total rows, total columns, total non-zero elements)
[ 0, 2, 3 ]
[ 1, 0, 5 ]
[ 2, 1, 1 ]
[ 3, 3, 2 ]
Linked Representation
This method uses a linked list to store the non-zero elements. Each node in the linked list contains:
- Row index
- Column index
- Value
- Pointer to the next node
A header node can store the matrix dimensions. This is more flexible than the array representation as it doesn't require pre-allocating space for the non-zero elements.
2. Stacks
A Stack is a linear data structure that follows the LIFO (Last-In, First-Out) principle. This means the element added last is the first one to be removed. Think of it as a stack of plates.
The primary operations are:
- push(element): Adds an element to the top of the stack.
- pop(): Removes and returns the element from the top of the stack.
- peek() or top(): Returns the top element without removing it.
- isEmpty(): Checks if the stack is empty.
- isFull(): Checks if the stack is full (relevant for array implementation).
Implementing Stacks in an Array
A stack can be easily implemented using a 1D array and a variable top that keeps track of the index of the last element added.
- Initialization: Create an array stack[MAX_SIZE] and initialize top = -1 (indicating the stack is empty).
- push(value):
- Check for Stack Overflow (if top == MAX_SIZE - 1).
- Increment top (top = top + 1).
- Set stack[top] = value.
- pop():
- Check for Stack Underflow (if top == -1).
- Get the value: value = stack[top].
- Decrement top (top = top - 1).
- Return value.
Common Mistake: Forgetting to check for Overflow (before push) and Underflow (before pop). This can lead to undefined behavior or program crashes.
Implementing Multiple Stacks in an Array
It is possible to implement two stacks in a single array efficiently. One stack starts from index 0 (grows right), and the second stack starts from index MAX_SIZE - 1 (grows left).
- Stack 1: top1 starts at -1 and increments on push.
- Stack 2: top2 starts at MAX_SIZE and decrements on push.
- Overflow Condition: The stacks are full only when top1 + 1 == top2. This setup utilizes the array space much more efficiently.
Prefix, Infix, and Postfix Expressions
These are different notations for writing arithmetic expressions.
- Infix: The standard notation where the operator is between the operands (e.g., A + B).
- Problem: Requires parentheses and rules of precedence (BODMAS) to define the order of evaluation.
- Prefix (Polish Notation): The operator comes before the operands (e.g., + A B).
- Postfix (Reverse Polish Notation): The operator comes after the operands (e.g., A B +).
Utility: Prefix and Postfix notations do not require parentheses or precedence rules. They are ideal for computer evaluation, which is why stacks are used.
Utility and Conversion of Expressions
Algorithm: Infix to Postfix Conversion
This is a classic stack application. We use a stack to hold operators and parentheses.
- Create an empty stack (for operators) and an empty string (for the postfix result).
- Scan the infix expression from left to right.
- If the character is an operand (e.g., A, B, 1, 2): Append it directly to the postfix string.
- If the character is an open parenthesis '(': Push it onto the stack.
- If the character is a close parenthesis ')': Pop operators from the stack and append them to the postfix string until an open parenthesis '(' is encountered. Pop and discard the '('.
- If the character is an operator:
- While the stack is not empty AND the operator at the top of the stack has precedence higher than or equal to that of the current operator:
- Pop the operator from the stack and append it to the postfix string.
- After the loop, push the current operator onto the stack.
- After scanning the entire infix expression, pop any remaining operators from the stack and append them to the postfix string.
Exam Tip: Be sure to manage precedence. (e.g., * and / have higher precedence than + and -). Associativity (left-to-right) is handled by the "equal precedence" part of the rule.
Algorithm: Evaluation of Postfix Expression
This also uses a stack, but this time to hold operands (numbers).
- Create an empty stack (for operands).
- Scan the postfix expression from left to right.
- If the character is an operand: Push its value onto the stack.
- If the character is an operator:
- Pop the top two operands from the stack (operand2 = pop(), then operand1 = pop()).
- Perform the operation: result = operand1 operator operand2.
- Push the result back onto the stack.
- After scanning the entire expression, the final result will be the only value left in the stack. Pop and return it.
Example: Evaluate 2 3 + 5 *
- Scan '2': Push 2. Stack: [2]
- Scan '3': Push 3. Stack: [2, 3]
- Scan '+': Pop 3, Pop 2. Calculate 2 + 3 = 5. Push 5. Stack: [5]
- Scan '5': Push 5. Stack: [5, 5]
- Scan '*': Pop 5, Pop 5. Calculate 5 * 5 = 25. Push 25. Stack: [25]
- End of expression. Pop 25. Result is 25.
Applications of Stack
- Expression Evaluation: Infix to Postfix/Prefix conversion and evaluation.
- Function Call Management: The "Call Stack" stores activation records (local variables, return addresses) for function calls. This is how recursion is managed.
- Backtracking: Used in algorithms like maze-solving or finding paths in a graph (Depth First Search).
- Undo/Redo Operations: In text editors or software, a stack can store previous states.
- Parenthesis Checking: Validating if an expression has balanced parentheses.
Limitations of Array Representation of Stack
- Fixed Size: The array size is declared at compile time (static). This leads to two problems:
- Stack Overflow: If the stack size is too small, pushing a new element will fail.
- Wasted Memory: If the stack size is too large, but only a few elements are ever used, the remaining space is wasted.
- Cost of Resizing: While dynamic arrays can be used, resizing (creating a new, larger array and copying all elements) is an O(n) operation, which can be slow.
(These limitations are solved by using a Linked List to implement a stack, as seen in Unit 2).
3. Recursion
Recursion is a programming technique where a function calls itself, either directly or indirectly. A recursive function solves a problem by breaking it down into smaller, simpler versions of the same problem.
Recursive Definitions and Implementation
A recursive definition (and its implementation) must have two parts:
- Base Case (or Terminating Condition): This is the simplest version of the problem, which can be solved directly without further recursion. It stops the function from calling itself infinitely.
- Recursive Step: This part breaks the problem into a smaller subproblem and calls the function itself to solve that subproblem. It must make progress toward the base case.
Common Mistake: Forgetting the base case. This leads to infinite recursion and eventually a Stack Overflow error, as the call stack runs out of memory.
Example: Factorial
Problem: Find the factorial of a non-negative integer n (n!).
- Recursive Definition:
- n! = 1, if n = 0 (Base Case)
- n! = n * (n-1)!, if n > 0 (Recursive Step)
- Implementation (Pseudocode):
Example: Fibonacci Series
Problem: Find the n-th Fibonacci number (0, 1, 1, 2, 3, 5, 8...)
- Recursive Definition:
- Fib(n) = 0, if n = 0 (Base Case 1)
- Fib(n) = 1, if n = 1 (Base Case 2)
- Fib(n) = Fib(n-1) + Fib(n-2), if n > 1 (Recursive Step)
- Implementation (Pseudocode):
Advantages and Limitations of Recursion
Advantages:
- Elegance and Simplicity: Recursive solutions can be very clean and easy to read for problems that are naturally recursive (e.g., tree traversals, Tower of Hanoi).
- Problem Solving: It's a powerful tool for solving complex problems by breaking them down (Divide and Conquer strategy).
Limitations:
- Performance Overhead: Every function call adds an activation record to the call stack. This consumes memory and time. An iterative solution (using loops) is almost always faster and uses less memory.
- Stack Overflow: If the recursion is too deep (e.g., factorial(100000)), the call stack can run out of memory, crashing the program.
- Redundant Computations: The Fibonacci example is a classic case of inefficiency. fibonacci(5) calls fibonacci(4) and fibonacci(3), but fibonacci(4) also calls fibonacci(3), so the same subproblem is solved multiple times. (This can be fixed with memoization).
Exam Tip: Any problem that can be solved recursively can also be solved iteratively (using loops). Often, an iterative solution is preferred for performance, but a recursive solution is easier to design and understand first.