High-Performance Computing with Numba (Fast Computation)

Fast computation (hızlı hesaplama) in the context of Python usually means optimizing code for speed, especially in numerical and scientific tasks where Python's interpreted nature can become a bottleneck. Libraries like NumPy provide vectorized operations backed by highly optimized C/Fortran implementations, but not every computational problem can be expressed in vectorized form, and custom loops are sometimes unavoidable.
Numba is an open-source JIT (Just-In-Time) compiler that translates a subset of Python and NumPy code into fast machine code. It's designed to make Python code run as fast as C or Fortran without requiring you to rewrite your code in those languages. Numba works by reading the Python bytecode for your decorated function, inferring the types of the arguments, and then generating highly optimized machine code specific to the types it inferred. This compilation happens 'just in time' when the function is first called, leading to a slight overhead on the first execution, but significant speedups on subsequent calls.
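A minimal sketch of this compile-on-first-call behaviour is shown below; the function and data are invented for illustration and are not part of the benchmark later in this section. The dispatcher object that `@njit` returns records the signatures it has compiled, which makes the type inference visible:

import numpy as np
from numba import njit

@njit
def scaled_sum(arr, factor):
    # Plain Python loop; Numba compiles it to machine code on the first call
    total = 0.0
    for x in arr:
        total += factor * x
    return total

data = np.linspace(0.0, 1.0, 1_000_000)
scaled_sum(data, 2.0)         # first call: type inference + compilation + execution
scaled_sum(data, 2.0)         # later calls reuse the compiled machine code
print(scaled_sum.signatures)  # e.g. [(array(float64, 1d, C), float64)]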
Key features and benefits of Numba:
- Easy Integration: You typically only need to add a decorator like `@jit` or `@njit` (no-Python-object mode, generally preferred for maximum performance) to your Python function.
- Significant Speedups: For CPU-bound numerical tasks, Numba can provide speedups ranging from 10x to 1000x or more, especially when dealing with explicit loops over large arrays.
- NumPy Compatibility: Numba understands NumPy arrays and operations, allowing you to use familiar NumPy syntax within your Numba-compiled functions.
- Automatic Parallelization: With `parallel=True` in the `@njit` decorator, Numba can automatically parallelize certain loops (e.g., using `numba.prange`) across multiple CPU cores; see the first sketch after this list.
- GPU Support: Numba also provides tools to compile Python code for NVIDIA CUDA GPUs, opening up possibilities for massive parallel computation; see the second sketch after this list.
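The following is a sketch of the automatic parallelization option. The function is a hypothetical example (not the benchmark function used later), and the actual speedup depends on core count and array size:

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def parallel_mean_of_squares(arr):
    # prange marks the loop iterations as independent, so Numba can split
    # them across CPU cores; the += on a scalar is recognized as a reduction.
    total = 0.0
    for i in prange(arr.shape[0]):
        total += arr[i] * arr[i]
    return total / arr.shape[0]

data = np.random.rand(10_000_000)
print(parallel_mean_of_squares(data))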
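For the GPU item, the sketch below uses Numba's CUDA target (`numba.cuda`). It is illustrative only: it assumes an NVIDIA GPU with a working CUDA setup, and the kernel name is invented for the example:

import numpy as np
from numba import cuda

@cuda.jit
def square_kernel(arr, out):
    # Each GPU thread handles one element of the array
    i = cuda.grid(1)
    if i < arr.size:
        out[i] = arr[i] * arr[i]

data = np.arange(1_000_000, dtype=np.float64)
out = np.zeros_like(data)
threads_per_block = 256
blocks = (data.size + threads_per_block - 1) // threads_per_block
# Passing NumPy arrays triggers implicit host<->device transfers
square_kernel[blocks, threads_per_block](data, out)
print(out[:3])  # [0. 1. 4.]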
Limitations and Considerations:
- Type Inference: Numba relies on type inference. If it cannot infer types or encounters unsupported Python features (e.g., complex object manipulations, certain dictionary operations), it may fall back to object mode (which is slower) or fail to compile; see the first sketch after this list.
- Initial Overhead: The first call to a Numba-decorated function involves compilation time. For functions called only once with small inputs, this overhead might negate the speed benefits; see the second sketch after this list.
- Best for Numerical Code: Numba is most effective for functions that primarily deal with numbers, NumPy arrays, and basic Python data structures.
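The first sketch illustrates the type-inference limitation, assuming a recent Numba version: passing a plain Python dict to an `@njit` function fails to compile, while `numba.typed.Dict` offers an explicitly typed alternative (the function name here is invented for the example):

from numba import njit, types
from numba.typed import Dict

@njit
def lookup(table, key):
    return table[key]

# A plain Python dict cannot be typed in nopython mode, so the first call fails
try:
    lookup({"a": 1.0}, "a")
except Exception as exc:  # Numba raises a TypingError here
    print("Compilation failed:", type(exc).__name__)

# The supported alternative: a dictionary with explicit key/value types
typed_table = Dict.empty(key_type=types.unicode_type, value_type=types.float64)
typed_table["a"] = 1.0
print(lookup(typed_table, "a"))  # 1.0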
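The second sketch addresses the compilation overhead: `cache=True` stores compiled machine code on disk so later interpreter sessions skip recompilation, and an explicit signature triggers eager compilation when the decorator runs instead of on the first call. The function names are illustrative:

import numpy as np
from numba import njit, float64

# cache=True: compiled code is written to disk and reused across sessions
@njit(cache=True)
def dot_loop(a, b):
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total

# Explicit signature: compile eagerly at decoration time
@njit(float64(float64[:], float64[:]))
def dot_loop_eager(a, b):
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total

x = np.random.rand(1_000)
y = np.random.rand(1_000)
print(dot_loop(x, y), dot_loop_eager(x, y))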
Example Code
import numpy as np
import time
from numba import njit
# A computationally intensive function without Numba
def sum_squares_python(array):
    total = 0.0
    for x in array:
        total += x * x
    return total
# The same function with Numba's @njit decorator
@njit
def sum_squares_numba(array):
    total = 0.0
    for x in array:
        total += x * x
    return total
# Create a large NumPy array for demonstration
SIZE = 10**7
data = np.arange(SIZE, dtype=np.float64)

print(f"Benchmarking sum of squares for an array of {SIZE} elements...\n")
# --- Benchmark Pure Python version ---
start_time = time.perf_counter()
result_python = sum_squares_python(data)
end_time = time.perf_counter()
python_time = end_time - start_time
print(f"Pure Python execution time: {python_time:.4f} seconds")
print(f"Result (Python): {result_python:.2f}\n")
# --- Benchmark Numba version (first run includes compilation) ---
# The first call to a @njit function compiles it.
start_time = time.perf_counter()
result_numba_first_run = sum_squares_numba(data)
end_time = time.perf_counter()
numba_first_run_time = end_time - start_time
print(f"Numba (first run - compilation + execution): {numba_first_run_time:.4f} seconds")
print(f"Result (Numba first run): {result_numba_first_run:.2f}\n")
# --- Benchmark Numba version (subsequent runs are optimized) ---
start_time = time.perf_counter()
result_numba_optimized = sum_squares_numba(data)
end_time = time.perf_counter()
numba_optimized_time = end_time - start_time
print(f"Numba (subsequent run - optimized execution): {numba_optimized_time:.4f} seconds")
print(f"Result (Numba optimized): {result_numba_optimized:.2f}\n")
# --- Benchmark NumPy's vectorized version (for comparison) ---
start_time = time.perf_counter()
result_numpy = np.sum(data * data)  # Or: np.sum(np.square(data))
end_time = time.perf_counter()
numpy_time = end_time - start_time
print(f"NumPy vectorized execution time: {numpy_time:.4f} seconds")
print(f"Result (NumPy): {result_numpy:.2f}\n")
print("--- Summary ---")
print(f"Pure Python vs Numba (optimized): {python_time / numba_optimized_time:.2f}x speedup")
print(f"Pure Python vs NumPy: {python_time / numpy_time:.2f}x speedup")
# Assert that results are consistent
assert np.isclose(result_python, result_numba_first_run)
assert np.isclose(result_python, result_numba_optimized)
assert np.isclose(result_python, result_numpy)
print("All results are consistent!")