# NumPy 101

In [None]:
# Assoc. Prof. Dr. Piyabute Fuangkhon
# Department of Digital Business Management
# Martin de Tours School of Management and Economics
# Assumption University
# Update: 22/05/2024

## Introduction to NumPy for Data Analytics

NumPy is a fundamental package for scientific computing in Python. It provides support for arrays, matrices, and many mathematical functions. Let's look at the basics of NumPy and how it can be used for data analytics. This tutorial will guide you through various operations, from basic array creation to complex linear algebra and random number generation.

## Importing the NumPy Library

We need to import the NumPy library. It's a common practice to import it with the alias 'np'.

In [None]:
# Import the NumPy library with the alias 'np'.
import numpy as np
print(np.__version__)

## Array Creation

NumPy arrays are similar to lists in Python, but they allow for more efficient operations. We'll start with creating simple arrays and then explore different ways to initialize arrays.

Let's create arrays that could represent sales data, inventory levels, or any other sequential data that you might encounter in business.

In [None]:
# Step 1: Create NumPy array using np.array
array_from_list = np.array([1, 2, 3, 4, 5]) # A 1D array from a list
print("1D array from list =", array_from_list)

# Step 2: Create NumPy array using np.arange
array_arange = np.arange(10) # A 1D array with values from 0 to 9
print("1D array using arange =", array_arange)

# Step 3: Create NumPy array using np.linspace
array_linspace = np.linspace(0, 1, 5) # A 1D array with 5 values evenly spaced between 0 and 1
print("1D array using linspace =", array_linspace)
print()

# Step 4: Create NumPy 2D array using np.array
array_from_list_2d = np.array([[1, 2, 3], [4, 5, 6]]) # Use np.array to create a two-dimensional array with elements [[1, 2, 3], [4, 5, 6]]
print("2D array from list =\n", array_from_list_2d)

# Step 5: Create NumPy array using np.zeros
array_of_zeros = np.zeros((3, 4)) # A 2D array of zeros with shape (3, 4)
print("2D array of zeros =\n", array_of_zeros)

# Step 6: Create NumPy array using np.ones
array_of_ones = np.ones((2, 3)) # A 2D array of ones with shape (2, 3)
print("2D array of ones =\n", array_of_ones)

# Step 7: Create NumPy array using np.arange and np.reshape
array_of_sequence = np.arange(1, 2 * 3 + 1).reshape(2, 3) # A 2D array of sequence numbers with shape (2, 3)
print("2D array of sequence numbers =\n", array_of_sequence)
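
To make the difference from Python lists concrete, the following is a minimal sketch. The names and values (`prices_list`, `prices_array`) are illustrative assumptions, not data from this tutorial. It shows that `*` repeats a list but multiplies a NumPy array element by element.

```python
import numpy as np

# Hypothetical unit prices, used only for illustration
prices_list = [10, 20, 30]
prices_array = np.array(prices_list)

# Multiplying a list by 2 repeats the list; it does not double the values
repeated_list = prices_list * 2

# Doubling the values of a list needs an explicit loop or comprehension
doubled_list = [price * 2 for price in prices_list]

# Multiplying an array by 2 doubles every element in one vectorized step
doubled_array = prices_array * 2

print("List * 2 =", repeated_list)
print("List comprehension =", doubled_list)
print("Array * 2 =", doubled_array)
```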

## Array Attributes

Understanding array attributes is crucial for manipulating data effectively. We'll explore the shape, size, dimensions, and data type of arrays.

Consider an array representing monthly sales data for a product. Understanding its shape and size helps in reshaping and performing aggregate functions.

In [None]:
# Step 1: Create arrays
array_from_list = np.array([1, 2, 3, 4, 5])
array_of_zeros = np.zeros((3, 4))
array_of_ones = np.ones((2, 3))

# Step 2: Print arrays
print("Variables")
print("============================================================================")
print("Array_from_list =>\n", array_from_list)
print("Array_of_zeros =>\n", array_of_zeros)
print("Array_of_ones =>\n", array_of_ones)

# Step 3: Shape of arrays
print("\nShape of arrays (array.shape)")
print("============================================================================")
print("Array_from_list =", array_from_list.shape) # The shape attribute returns the dimensions of the array
print("Array_of_zeros =", array_of_zeros.shape) # The shape attribute returns the dimensions of the array
print("Array_of_ones =", array_of_ones.shape) # The shape attribute returns the dimensions of the array

# Step 4: Size of arrays
print("\nSize of arrays (array.size)")
print("============================================================================")
print("Array_from_list =", array_from_list.size) # The size attribute returns the total number of elements in the array
print("Array_of_zeros =", array_of_zeros.size) # The size attribute returns the total number of elements in the array
print("Array_of_ones =", array_of_ones.size) # The size attribute returns the total number of elements in the array

# Step 5: Number of dimensions of arrays
print("\nNumber of dimensions of arrays (array.ndim)")
print("============================================================================")
print("Array_from_list =", array_from_list.ndim) # The ndim attribute returns the number of dimensions of the array
print("Array_of_zeros =", array_of_zeros.ndim) # The ndim attribute returns the number of dimensions of the array
print("Array_of_ones =", array_of_ones.ndim) # The ndim attribute returns the number of dimensions of the array

# Step 6: Data type of arrays
print("\nData type of arrays (array.dtype)")
print("============================================================================")
print("Array_from_list =", array_from_list.dtype) # The dtype attribute returns the data type of the elements in the array
print("Array_of_zeros =", array_of_zeros.dtype) # The dtype attribute returns the data type of the elements in the array
print("Array_of_ones =", array_of_ones.dtype) # The dtype attribute returns the data type of the elements in the array
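
As a sketch of the monthly sales scenario mentioned above, the example below checks the attributes of a 12-element array before reshaping it into quarters. The sales figures and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical monthly sales figures for one product (12 months)
monthly_sales = np.array([120, 135, 150, 110, 95, 160, 170, 180, 140, 130, 155, 200])
print("Shape =", monthly_sales.shape, "| Size =", monthly_sales.size, "| Dimensions =", monthly_sales.ndim)

# Reshape into 4 quarters x 3 months; the total size (12) must stay the same
sales_by_quarter = monthly_sales.reshape(4, 3)
print("Shape after reshape =", sales_by_quarter.shape)

# Sum across each row (axis=1) to get one total per quarter
quarterly_totals = sales_by_quarter.sum(axis=1)
print("Quarterly totals =", quarterly_totals)
```

Knowing that the size is 12 tells us in advance that shapes such as (4, 3) or (2, 6) are valid, while (5, 3) would raise an error.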

## Array Indexing and Slicing

Indexing and slicing are the techniques used to access, modify, and extract specific elements, subarrays, or ranges within an array based on their positions. They are particularly useful when working with subsets of data, such as specific months in sales data or particular products in inventory data.

In [None]:
# Recreate the arrays from the previous steps so that this cell can run on its own
array_from_list = np.array([1, 2, 3, 4, 5])
array_of_sequence = np.arange(1, 2 * 3 + 1).reshape(2, 3)

# Step 1: Accessing elements in 1D array and 2D array
print("1D - array (array_from_list) =>\n", array_from_list)
print("2D - array (array_of_sequence) =>\n", array_of_sequence)

print("\nAccessing elements")
print("============================================================================")
print("1D - Element at index 2 in array_from_list =", array_from_list[2]) # Accessing the third element (index 2) in the 1D array
print("2D - Element at row 1, column 2 in array_of_sequence =", array_of_sequence[1, 2]) # Accessing the element at row 1, column 2 in the 2D array

# Step 2: Slicing 1D array and 2D array
print("\nSlicing")
print("============================================================================")
print("1D - Elements from index 1 to 3 in array_from_list =\n", array_from_list[1:4]) # Slicing elements from index 1 to 3 in the 1D array (note that the end index is exclusive)
print("2D - First two rows, first two columns of array_of_sequence =\n", array_of_sequence[:2, :2]) # Slicing the first two rows and first two columns in the 2D array

# Step 3: Boolean indexing 1D array and 2D array
print("\nBoolean indexing")
print("============================================================================")
bool_idx = array_from_list > 3 # Creating a boolean index array for elements greater than 3 in the 1D array
print("1D - Elements greater than 3 in array_from_list =", array_from_list[bool_idx])
bool_idx_2d = array_of_sequence > 3 # Creating a boolean index array for elements greater than 3 in the 2D array
print("2D - Elements greater than 3 in array_of_sequence =", array_of_sequence[bool_idx_2d])

# Step 4: Fancy indexing 1D array and 2D array
print("\nFancy indexing")
print("============================================================================")
fancy_idx = [0, 2, 4] # Fancy indexing for specific indices in the 1D array
print("1D - Elements at indices 0, 2, and 4 in array_from_list =\n", array_from_list[fancy_idx])
fancy_idx_rows = [0, 1] # Fancy indexing for specific row indices in the 2D array
fancy_idx_cols = [1, 2] # Fancy indexing for specific column indices in the 2D array
print("2D - Elements at row indices [0, 1] and column indices [1, 2] in array_of_sequence =\n", array_of_sequence[fancy_idx_rows, :][:, fancy_idx_cols])
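
Two details of indexing are easy to miss, so here is a small sketch of them. First, passing a row list and a column list together pairs them up instead of selecting a block; `np.ix_` selects the block in one step. Second, a basic slice is a view of the original array, so writing to it changes the original.

```python
import numpy as np

array_of_sequence = np.arange(1, 7).reshape(2, 3)  # [[1, 2, 3], [4, 5, 6]]
rows = [0, 1]
cols = [1, 2]

# Passing both lists together pairs them up: elements (0, 1) and (1, 2)
paired = array_of_sequence[rows, cols]
print("Paired elements =", paired)

# np.ix_ selects every combination of the given rows and columns
block = array_of_sequence[np.ix_(rows, cols)]
print("Block =\n", block)

# A basic slice is a view: writing through it changes the original array
first_row = array_of_sequence[0, :]
first_row[0] = 99
print("Original after modifying the slice =\n", array_of_sequence)
```

Call `.copy()` on a slice when the original array must stay unchanged.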


## Array Operations

Actions performed on arrays, such as arithmetic calculations, element-wise operations, and transformations, to manipulate and analyze data stored in arrays. These operations can help calculate profit margins, growth rates, and other business metrics.

In [None]:
# Step 1: Define two 1D arrays for element-wise operations
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([10, 20, 30, 40, 50])

# Print the original arrays
print("array1 =>", array1)
print("array2 =>", array2)
print()

# Step 2: Perform and print element-wise addition
print("array1 + array2 =", array1 + array2) # Perform element-wise addition

# Step 3: Perform and print element-wise subtraction
print("array1 - array2 =", array1 - array2) # Perform element-wise subtraction

# Step 4: Perform and print element-wise multiplication
print("array1 * array2 =", array1 * array2) # Perform element-wise multiplication

# Step 5: Perform and print element-wise division
print("array1 / array2 =", array1 / array2) # Perform element-wise division

In [None]:
# Step 1: Define two arrays for broadcasting operations
array3 = np.array([[1, 2, 3], [4, 5, 6]])
array4 = np.array([10, 20, 30])

# Print the original arrays
print("array3 =>\n", array3)
print("array4 =>\n", array4)
print()

# Step 2: Perform and print broadcasting operations
print("array3 + array4 =\n", array3 + array4) # Calculate and print broadcasting addition
print("array3 - array4 =\n", array3 - array4) # Calculate and print broadcasting subtraction
print("array3 * array4 =\n", array3 * array4) # Calculate and print broadcasting multiplication
print("array3 / array4 =\n", array3 / array4) # Calculate and print broadcasting division
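
Broadcasting only works when the shapes are compatible. NumPy compares shapes starting from the last axis, and each pair of dimensions must be equal or one of them must be 1. The sketch below uses a made-up `row_offsets` array to show a shape mismatch and one way to resolve it.

```python
import numpy as np

array3 = np.array([[1, 2, 3], [4, 5, 6]])  # Shape (2, 3)
row_offsets = np.array([100, 200])         # Shape (2,), one hypothetical value per row

# The last axes are 3 and 2, which are incompatible, so this raises ValueError
try:
    array3 + row_offsets
    broadcast_failed = False
except ValueError as error:
    broadcast_failed = True
    print("Broadcasting error:", error)

# Adding a trailing axis gives shape (2, 1), which stretches across the 3 columns
column_offsets = row_offsets[:, np.newaxis]
result = array3 + column_offsets
print("Shape of column_offsets =", column_offsets.shape)
print("array3 + column_offsets =\n", result)
```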

## Aggregate Functions

Operations that process multiple elements of an array to return a single value, such as sum, mean, minimum, maximum, and standard deviation.

In [None]:
# Step 1: Define a 1D array for aggregate functions
array1 = np.array([1, 2, 3, 4, 5])

# Print the original array
print("array1 =>", array1)
print()

# Step 2: Perform and print aggregate functions
print("Sum =", np.sum(array1)) # Calculate and print the sum of the array elements
print("Mean =", np.mean(array1)) # Calculate and print the mean (average) of the array elements
print("Standard Deviation =", np.std(array1)) # Calculate and print the population standard deviation (ddof=0, the default) of the array elements
print("Minimum =", np.min(array1)) # Calculate and print the minimum value in the array
print("Maximum =", np.max(array1)) # Calculate and print the maximum value in the array
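
Aggregate functions can also work along one axis of a 2D array, and `np.std` accepts a `ddof` argument to switch between the population and sample formulas. The sketch below illustrates both; the `sales` table is a made-up example.

```python
import numpy as np

array1 = np.array([1, 2, 3, 4, 5])

# ddof=0 (default) divides by N; ddof=1 divides by N - 1 (sample standard deviation)
population_std = np.std(array1)
sample_std = np.std(array1, ddof=1)
print("Population std =", population_std)
print("Sample std =", sample_std)

# Hypothetical sales table: 2 products (rows) x 3 months (columns)
sales = np.array([[10, 20, 30], [40, 50, 60]])

# axis=0 aggregates down the rows, giving one value per column (month)
total_per_month = sales.sum(axis=0)

# axis=1 aggregates across the columns, giving one value per row (product)
mean_per_product = sales.mean(axis=1)

print("Total per month =", total_per_month)
print("Mean per product =", mean_per_product)
```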

## Mathematical Functions

Functions that perform various mathematical computations on array elements, including operations like sine, cosine, exponential, and logarithm.

In [None]:
# Step 1: Define arrays for angles (in radians) and values
angles = np.array([0, np.pi/2, np.pi, 3*np.pi/2, 2*np.pi])
values = np.array([1, 2, 3, 4, 5])

# Print the original arrays
print("angles =>", angles)
print("values =>", values)
print()

# Step 2: Perform and print mathematical functions
print("sin(angles) =", np.sin(angles)) # Calculate and print the sine of the angles
print("cos(angles) =", np.cos(angles)) # Calculate and print the cosine of the angles
print("exp(values) =", np.exp(values)) # Calculate and print the exponential of the values
print("log(values) =", np.log(values)) # Calculate and print the natural logarithm of the values
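
Results such as sin(π) are printed as very small numbers instead of exactly 0, because π cannot be represented exactly in floating point. The following sketch shows one way to handle this, using `np.isclose` for comparisons and `np.round` for display.

```python
import numpy as np

sin_pi = np.sin(np.pi)
print("sin(pi) =", sin_pi)  # A tiny non-zero value, not exactly 0

# Compare floating-point results with a tolerance instead of ==
is_exactly_zero = bool(sin_pi == 0.0)
is_close_to_zero = bool(np.isclose(sin_pi, 0.0))
print("Exactly zero?", is_exactly_zero, "| Close to zero?", is_close_to_zero)

# Rounding makes the printed output easier to read
angles = np.array([0, np.pi/2, np.pi, 3*np.pi/2, 2*np.pi])
rounded_sines = np.round(np.sin(angles), 10)
print("Rounded sin(angles) =", rounded_sines)

# exp and log are inverse functions, up to floating-point error
values = np.array([1, 2, 3, 4, 5])
round_trip = np.log(np.exp(values))
print("log(exp(values)) =", round_trip)
```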

## Reshaping and Resizing Arrays

Reshaping changes the dimensions or structure of an array without altering its data, while resizing adjusts the number of elements. The examples below cover reshaping, flattening with ravel and flatten, and transposing.

In [None]:
# Step 1: Reshape
array_original = np.arange(1, 13) # Creating a 1D array with values from 1 to 12
print("Original array (1D) =>", array_original)

array_reshaped = array_original.reshape(3, 4) # Reshaping the original 1D array to a 3x4 2D array
print("\nReshaped array (3x4) =\n", array_reshaped)

# Step 2: Ravel
array_raveled = array_reshaped.ravel() # Flattening the 2D array back to a 1D array using ravel
print("\nRaveled array (1D) =\n", array_raveled)

# Step 3: Flatten
array_flattened = array_reshaped.flatten() # Flattening the 2D array back to a 1D array using flatten
print("\nFlattened array (1D) =\n", array_flattened)

# Step 4: Transpose
array_transposed = array_reshaped.transpose() # Transposing the 3x4 2D array to a 4x3 2D array
print("\nTransposed array (3x4 to 4x3) =\n", array_transposed)
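
`ravel` and `flatten` give the same printed result but differ in memory behavior. `flatten` always returns a copy, while `ravel` returns a view when it can, as it does for the contiguous array used here. The sketch below checks this with `np.shares_memory`, then shows `reshape` with `-1` and `np.resize`.

```python
import numpy as np

array_reshaped = np.arange(1, 13).reshape(3, 4)

array_raveled = array_reshaped.ravel()      # A view here, because the data is contiguous
array_flattened = array_reshaped.flatten()  # Always an independent copy

ravel_shares_memory = np.shares_memory(array_reshaped, array_raveled)
flatten_shares_memory = np.shares_memory(array_reshaped, array_flattened)
print("ravel shares memory?", ravel_shares_memory)
print("flatten shares memory?", flatten_shares_memory)

# Changing the copy leaves the 2D array untouched; changing the view does not
array_flattened[0] = -1
array_raveled[0] = 100
print("array_reshaped[0, 0] after both changes =", array_reshaped[0, 0])

# -1 asks NumPy to infer that dimension from the total size (12 / 2 = 6)
inferred = np.arange(1, 13).reshape(2, -1)
print("Inferred shape =", inferred.shape)

# np.resize returns a new array and repeats the data when more elements are needed
resized = np.resize(np.array([1, 2, 3]), (2, 4))
print("Resized array =\n", resized)
```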


## Stacking and Splitting Arrays

Operations that combine multiple arrays into a single array along a specified axis (stacking) or divide an array into multiple sub-arrays along a specified axis (splitting).

In [None]:
# Step 1: Horizontal stacking (hstack)
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
array_hstack = np.hstack((array1, array2)) # Horizontal stacking array1 and array2
print("array1 =>", array1)
print("array2 =>", array2)
print()
print("Horizontal stacking array1 and array2 =", array_hstack)

# Step 2: Vertical stacking (vstack)
array3 = np.array([[1, 2, 3], [4, 5, 6]])
array4 = np.array([[7, 8, 9], [10, 11, 12]])
print()
print("array3 =>\n", array3)
print("array4 =>\n", array4)
array_vstack = np.vstack((array3, array4)) # Vertical stacking array3 and array4
print()
print("Vertical stacking array3 and array4 =\n", array_vstack)

# Step 3: Depth stacking (dstack)
array5 = np.array([[1, 2, 3], [4, 5, 6]])
array6 = np.array([[7, 8, 9], [10, 11, 12]])
array_dstack = np.dstack((array5, array6)) # Depth stacking array5 and array6
print()
print("array5 =>\n", array5)
print("array6 =>\n", array6)
print()
print("Depth stacking array5 and array6 =\n", array_dstack)

# Step 4: Horizontal split (hsplit)
array7 = np.arange(1, 13).reshape(3, 4)
array_hsplit = np.hsplit(array7, 2) # Split array7 into 2 equal parts along the columns
print()
print("array7 =>\n", array7)
print()
print("Horizontal splitting array7 =")
for i, arr in enumerate(array_hsplit):
    print(f"Part {i}:\n{arr}")

# Step 5: Vertical split (vsplit)
array_vsplit = np.vsplit(array7, 3) # Vertical split (vsplit)
print("\nVertical splitting array7 =")
for i, arr in enumerate(array_vsplit):
    print(f"Part {i}:\n{arr}")

# Step 6: Depth split (dsplit)
array8 = np.dstack((array5, array6, array5))
print()
print("array8 =>\n", array8)
print()
array_dsplit = np.dsplit(array8, 3) # Depth split (dsplit)
print("\nDepth splitting array8 =")
for i, arr in enumerate(array_dsplit):
    print(f"Part {i}:\n{arr}")
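
The stacking helpers are shortcuts for `np.concatenate` with a specific axis, and the split helpers require an equal division. The sketch below shows the `axis` argument directly and uses `np.array_split` for a case where the columns cannot be divided evenly.

```python
import numpy as np

array3 = np.array([[1, 2, 3], [4, 5, 6]])
array4 = np.array([[7, 8, 9], [10, 11, 12]])

# For 2D arrays, axis=0 matches vstack and axis=1 matches hstack
stacked_rows = np.concatenate((array3, array4), axis=0)
stacked_cols = np.concatenate((array3, array4), axis=1)
print("Shape with axis=0 =", stacked_rows.shape)
print("Shape with axis=1 =", stacked_cols.shape)

array7 = np.arange(1, 13).reshape(3, 4)

# hsplit needs an equal division: 4 columns cannot be split into 3 equal parts
try:
    np.hsplit(array7, 3)
    split_failed = False
except ValueError as error:
    split_failed = True
    print("hsplit error:", error)

# array_split accepts unequal parts; the earlier parts get the extra columns
parts = np.array_split(array7, 3, axis=1)
part_shapes = [part.shape for part in parts]
print("Part shapes from array_split =", part_shapes)
```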

## Linear Algebra

Computational procedures involving matrices and vectors, such as matrix multiplication, calculating determinants, finding eigenvalues and eigenvectors, and solving systems of linear equations. These techniques are useful in optimization problems, financial modeling, and various analytical tasks.

In [None]:
# Step 1: Dot product
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])
print("vector1 =>", vector1)
print("vector2 =>", vector2)
dot_product = np.dot(vector1, vector2) # Calculate the dot product of vector1 and vector2
print("\nDot product of vector1 and vector2 =", dot_product)

# Step 2: Matrix multiplication
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
print()
print("matrix1 =>\n", matrix1)
print("matrix2 =>\n", matrix2)
matrix_multiplication = np.matmul(matrix1, matrix2) # Calculate the matrix multiplication of matrix1 and matrix2
print("\nMatrix multiplication of matrix1 and matrix2 =\n", matrix_multiplication)

# Step 3: Determinant
matrix3 = np.array([[1, 2], [3, 4]])
print()
print("matrix3 =>\n", matrix3)
determinant = np.linalg.det(matrix3) # Calculate the determinant of matrix3
print("\nDeterminant of matrix3 =", determinant)

# Step 4: Eigenvalues and eigenvectors
matrix4 = np.array([[1, 2], [2, 1]])
print()
print("matrix4 =>\n", matrix4)
eigenvalues, eigenvectors = np.linalg.eig(matrix4) # Calculate the eigenvalues and eigenvectors of matrix4
print("\nEigenvalues of matrix4 =", eigenvalues)
print("Eigenvectors of matrix4 =\n", eigenvectors)

# Step 5: Solving linear equations
A = np.array([[2, 1], [1, 3]])
b = np.array([1, 2])
print()
print("A =>\n", A)
print("b =>\n", b)
solution = np.linalg.solve(A, b) # Solve the system of linear equations Ax = b
print("\nSolution of the system of linear equations Ax = b =", solution)
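
It is good practice to check linear algebra results by substituting them back into the original equations. The following sketch does this for the solution and the eigenvectors computed above, using the `@` operator (equivalent to `np.matmul` for these arrays) and `np.allclose` to allow for floating-point rounding.

```python
import numpy as np

# Check the solution of Ax = b by substituting it back
A = np.array([[2, 1], [1, 3]])
b = np.array([1, 2])
solution = np.linalg.solve(A, b)
solution_is_valid = bool(np.allclose(A @ solution, b))
print("Solution =", solution, "| A @ solution equals b?", solution_is_valid)

# Each column of 'eigenvectors' pairs with the eigenvalue at the same position,
# so matrix4 @ v should equal eigenvalue * v for every column v
matrix4 = np.array([[1, 2], [2, 1]])
eigenvalues, eigenvectors = np.linalg.eig(matrix4)
eigen_is_valid = bool(np.allclose(matrix4 @ eigenvectors, eigenvectors * eigenvalues))
print("Eigen decomposition verified?", eigen_is_valid)

# Determinants are computed numerically, so compare with a tolerance
determinant = np.linalg.det(np.array([[1, 2], [3, 4]]))
determinant_is_minus_two = bool(np.isclose(determinant, -2.0))
print("Determinant =", determinant, "| Close to -2?", determinant_is_minus_two)
```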

## Random Number Generation

The process of using Python's NumPy library to create sequences of random numbers for various applications like simulations and data analysis.

In [None]:
# Step 1: Generating random numbers
random_numbers = np.random.rand(3) # Generate a 1D array of 3 random numbers between 0 and 1
print("Generating random numbers (0 to 1) =", random_numbers)

# Step 2: Setting random seed
np.random.seed(100) # Set the seed for reproducibility
random_numbers_seeded = np.random.rand(3) # Generate a 1D array of 3 random numbers with seed 100
print("Generating random numbers with seed 100 =", random_numbers_seeded)

# Step 3: Random sampling
random_integers = np.random.randint(10, 50, 3) # Generate a 1D array of 3 random integers from 10 to 49 (the upper bound, 50, is exclusive)
print("Random sampling (integers from 10 to 49) =", random_integers)

# Step 4: Random distributions
random_normal = np.random.randn(3) # Generate a 1D array of 3 random numbers from a normal distribution (mean=0, std=1)
print("Random numbers from a normal distribution (mean=0, std=1) =", random_normal)

# Step 5: Uniform distribution
random_uniform = np.random.uniform(0, 10, 3) # Generate a 1D array of 3 random numbers from a uniform distribution between 0 and 10
print("Random numbers from a uniform distribution (between 0 and 10) =", random_uniform)
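
The functions above use NumPy's legacy global random state. NumPy also provides a newer `Generator` interface, created with `np.random.default_rng`, which keeps the seed local to one object. The sketch below repeats the same kinds of draws with that interface; the seed value 100 is reused only for consistency with the cell above.

```python
import numpy as np

# Create a generator with its own seed instead of seeding the global state
rng = np.random.default_rng(seed=100)

random_floats = rng.random(3)                           # Uniform floats in [0, 1)
random_integers = rng.integers(10, 50, size=3)          # Integers from 10 to 49
random_normal = rng.normal(loc=0.0, scale=1.0, size=3)  # Normal distribution (mean=0, std=1)
random_uniform = rng.uniform(0, 10, size=3)             # Uniform floats in [0, 10)

print("Floats =", random_floats)
print("Integers =", random_integers)
print("Normal =", random_normal)
print("Uniform =", random_uniform)

# A second generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(seed=100)
repeated_floats = rng_again.random(3)
print("Same floats with the same seed?", np.array_equal(random_floats, repeated_floats))
```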

## File Input and Output Operations

Actions that facilitate reading data from external sources (input) and writing data to external destinations (output), enabling interaction with files.

In [None]:
# Step 1: Saving and loading 1D arrays
array_to_save = np.array([1, 2, 3, 4, 5]) # Create a sample array
print("array_to_save =>", array_to_save)

np.savetxt('array.txt', array_to_save) # Save the array to a text file
print("\nArray saved to 'array.txt'")

loaded_array_txt = np.loadtxt('array.txt') # Load the array from the text file (values are loaded as floats by default)
print("\nArray loaded from 'array.txt' =", loaded_array_txt)

np.save('array.npy', array_to_save) # Save the array to a binary file
print("\nArray saved to 'array.npy'")

loaded_array_npy = np.load('array.npy') # Load the array from the binary file
print("\nArray loaded from 'array.npy' =", loaded_array_npy)

# Step 2: Working with 2D arrays and text files
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # Create a 2D array

np.savetxt('array_2d.txt', array_2d, delimiter=',') # Save the 2D array to a text file with a custom delimiter
print("\n2D array saved to 'array_2d.txt' with comma delimiter")

loaded_array_2d_txt = np.loadtxt('array_2d.txt', delimiter=',') # Load the 2D array from the text file
print("\n2D array loaded from 'array_2d.txt' =\n", loaded_array_2d_txt)

# Step 3: Working with 2D arrays and binary files
np.save('array_2d.npy', array_2d) # Save the 2D array to a binary file
print("\n2D array saved to 'array_2d.npy'")

loaded_array_2d_npy = np.load('array_2d.npy') # Load the 2D array from the binary file
print("\n2D array loaded from 'array_2d.npy' =\n", loaded_array_2d_npy)
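
Text files do not store the data type, so integers saved with `np.savetxt` come back as floats unless told otherwise, while `.npy` files preserve the data type. The sketch below shows one way to keep integers in a text file using the `fmt` and `dtype` arguments. It writes to a temporary directory, and the file names are arbitrary.

```python
import os
import tempfile
import numpy as np

array_to_save = np.array([1, 2, 3, 4, 5])

with tempfile.TemporaryDirectory() as temp_dir:
    txt_path = os.path.join(temp_dir, "array_int.txt")
    npy_path = os.path.join(temp_dir, "array_int.npy")

    # Default settings: values are written in scientific notation and read back as floats
    np.savetxt(txt_path, array_to_save)
    loaded_default = np.loadtxt(txt_path)

    # fmt='%d' writes plain integers; dtype=int reads them back as integers
    np.savetxt(txt_path, array_to_save, fmt="%d")
    loaded_as_int = np.loadtxt(txt_path, dtype=int)

    # The binary .npy format stores the data type together with the values
    np.save(npy_path, array_to_save)
    loaded_npy = np.load(npy_path)

print("Default text round trip =", loaded_default, loaded_default.dtype)
print("Integer text round trip =", loaded_as_int, loaded_as_int.dtype)
print("Binary round trip =", loaded_npy, loaded_npy.dtype)
```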

# Practice

The code block below reads sales data from a URL and stores it in an array. The first row of the dataset defines the attribute names. Your task is to find and display useful information (aggregated data) from this data using the NumPy library.

In [None]:
# Step 1: Import the 'urllib.request' module and the NumPy library
import urllib.request
import numpy as np

# Step 2: Specify the URL of the file to be opened
url = "https://piyabute.s3.ap-southeast-1.amazonaws.com/notebook/sales_data_1000.csv"

# Step 3: Open the URL and read the content
data = []

with urllib.request.urlopen(url) as response:
    lines = response.read().decode('utf-8').split('\n')

    # Step 4: Split the header line
    headers = lines[0].strip().split(',')

    # Step 5: Split each subsequent line and collect data
    for line in lines[1:]:
        if line.strip(): # Skip any empty lines
            row = line.strip().split(',')
            data.append(row)

# Step 6: Convert data to a NumPy array for easier manipulation (all values are stored as strings)
data = np.array(data)

# Step 7: Print the headers and the first 5 data rows to verify the data
print("First 5 rows of data:")
print(headers)
print(data[:5])
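
Because every value in the loaded array is a string, numeric columns must be converted before they can be aggregated. The sketch below shows one possible approach on a small made-up table. The column names (`product`, `quantity`, `unit_price`) and the rows are assumptions for illustration and may not match the header of the real file, so substitute the names printed by the cell above.

```python
import numpy as np

# Hypothetical headers and rows, standing in for the downloaded data
headers = ["product", "quantity", "unit_price"]
data = np.array([
    ["A", "2", "10.0"],
    ["B", "1", "25.5"],
    ["A", "3", "10.0"],
])

# Look up each column by its header name, then convert the strings to floats
products = data[:, headers.index("product")]
quantity = data[:, headers.index("quantity")].astype(float)
unit_price = data[:, headers.index("unit_price")].astype(float)

# Element-wise multiplication gives the revenue of each row
revenue = quantity * unit_price
total_revenue = revenue.sum()
print("Total revenue =", total_revenue)

# Boolean indexing selects the rows of one product at a time
revenue_by_product = {}
for name in np.unique(products):
    revenue_by_product[str(name)] = float(revenue[products == name].sum())
print("Revenue by product =", revenue_by_product)
```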