NumPy Playground
NumPy (Numerical Python) is a package for computation on homogeneous n-dimensional arrays.
Uses:
- perform operations on all the elements of two arrays directly (element-wise), without writing explicit loops.
A. Imports
#array handling
import numpy as np
#random sampling from distributions
from numpy.random import randn, normal, standard_normal
#plotting
from matplotlib import pyplot as plt
import seaborn as sns
B. Preliminaries
1. row vector, column vector, and matrix
All are of type numpy.ndarray
row_v = np.array ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) # row vector (1-D)
# column vector (2-D)
col_v = np.array ([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
mat_2d = np.array ([[1, 'a'], [2, 'b'], [3, 'c']]) # 2-D matrix
mat_3d = np.array ([[[1, 'a'], [2, 'b'], [3, 'c']], \
[[4, 'd'], [5, 'e'], [6, 'f']], \
[[7, 'g'], [8, 'h'], [9, 'i']]]) # 3-D matrix
print ('row_v shape:', row_v.shape, '\n', row_v, '\n',\
'element (1) of 1D array:', row_v[1], '\n')
print ('col_v shape:', col_v.shape, '\n', col_v, '\n',\
'element (2,0) of 2D array:', col_v[2][0], '\n')
print ('mat_2d shape:', mat_2d.shape, '\n', mat_2d, '\n',\
'element (2,1) of 2D array:', mat_2d[2][1], '\n')
print ('mat_3d shape:', mat_3d.shape, '\n', mat_3d, '\n',\
'element (2,1,0) of 3D array:', mat_3d[2][1][0],\
'element (2,2,1) of 3D array:', mat_3d[2][2][1],'\n')
row_v shape: (10,)
[0 1 2 3 4 5 6 7 8 9]
element (1) of 1D array: 1
col_v shape: (10, 1)
[[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]]
element (2,0) of 2D array: 2
mat_2d shape: (3, 2)
[['1' 'a']
['2' 'b']
['3' 'c']]
element (2,1) of 2D array: c
mat_3d shape: (3, 3, 2)
[[['1' 'a']
['2' 'b']
['3' 'c']]
[['4' 'd']
['5' 'e']
['6' 'f']]
[['7' 'g']
['8' 'h']
['9' 'i']]]
element (2,1,0) of 3D array: 8 element (2,2,1) of 3D array: i
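The quotes around the digits in the output above follow from the homogeneity requirement: a list that mixes integers and strings is coerced to a single string dtype. The following is a minimal sketch of that behaviour; the names `mixed` and `as_objects` are illustrative, and `dtype=object` is shown only as one possible way to keep the original Python types.

```python
import numpy as np

# A list mixing ints and strings is coerced to one common dtype (Unicode strings).
mixed = np.array([[1, 'a'], [2, 'b'], [3, 'c']])
print(mixed.dtype.kind)        # 'U' means Unicode string
print(mixed[0, 0] == '1')      # the integer 1 is now stored as the string '1'

# Assumption for illustration: dtype=object keeps the original Python objects,
# at the cost of losing fast vectorized arithmetic.
as_objects = np.array([[1, 'a'], [2, 'b'], [3, 'c']], dtype=object)
print(type(as_objects[0, 0]))  # the element is still a Python int
```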
2. np.zeros, np.ones, and np.full
numerical arrays
np.zeros (shape, dtype = int/float)
np.ones (shape, dtype = int/float)
print (type (np.zeros ((1, 2), dtype = int))) #type
print ('integer 1D ndarray:\n',\
np.zeros ((3), dtype = int), '\n') #integer 1D ndarray
print ('integer 2D ndarray:\n',\
np.ones ((1, 2), dtype = int), '\n') #integer 2D ndarray
print ('float 2D ndarray:\n',\
np.zeros ((1, 2), dtype = float), '\n') #float 2D ndarray
print ('float 3D ndarray:\n',\
np.ones ((2, 3, 2), dtype = float), '\n') #float 3D ndarray
<class 'numpy.ndarray'>
integer 1D ndarray:
[0 0 0]
integer 2D ndarray:
[[1 1]]
float 2D ndarray:
[[0. 0.]]
float 3D ndarray:
[[[1. 1.]
[1. 1.]
[1. 1.]]
[[1. 1.]
[1. 1.]
[1. 1.]]]
boolean arrays
np.zeros (shape, dtype = bool)
np.ones (shape, dtype = bool)
print ('boolean 2D ndarray:\n',\
np.ones ((1, 2), dtype = bool), '\n')
print ('boolean 3D ndarray:\n',\
np.ones ((2, 3, 2), dtype = bool), '\n')
boolean 2D ndarray:
[[ True True]]
boolean 3D ndarray:
[[[ True True]
[ True True]
[ True True]]
[[ True True]
[ True True]
[ True True]]]
any-type arrays
np.full (shape, value)
The dtype is deduced from the value
2D boolean array
np.full ((2,3), False)
array([[False, False, False],
[False, False, False]])
2D integer array
np.full ((2,3), 7)
array([[7, 7, 7],
[7, 7, 7]])
3. np.arange, and np.linspace
The essential difference between NumPy linspace and NumPy arange is that linspace enables you to control the precise end value, whereas arange gives you more direct control over the increments between values in the sequence.
np.arange (start = , stop = , step = )
- only ‘stop’ is mandatory
print ('arange (10): stop at 10', np.arange (10))
print ('arange (-1, 10, 2): start at -1,\n \
stop at 10, step size = 2:', np.arange (-1, 10, 2))
arange (10): stop at 10 [0 1 2 3 4 5 6 7 8 9]
arange (-1, 10, 2): start at -1,
stop at 10, step size = 2: [-1 1 3 5 7 9]
np.linspace (start = , stop = , num = )
- creates sequences of evenly spaced values within a defined interval
- num is the number of samples, including both endpoints
np.linspace (0, 100, 5)
array([ 0., 25., 50., 75., 100.])
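To make the contrast between the two functions concrete, here is a small sketch. The keyword arguments `retstep` and `endpoint` are standard `np.linspace` parameters; the variable names are illustrative only.

```python
import numpy as np

# linspace: fix the number of points; the stop value is included by default.
points, step = np.linspace(0, 100, 5, retstep=True)
print(points, step)   # step is the spacing that linspace computed

# endpoint=False drops the stop value, so the spacing becomes (stop - start) / num.
half_open = np.linspace(0, 100, 5, endpoint=False)
print(half_open)

# arange: fix the step; the stop value itself is never included.
stepped = np.arange(0, 100, 25)
print(stepped)
```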
4. Structured arrays and Field Access
Structured arrays are ndarrays whose datatype is a composition of simpler datatypes organized as a sequence of named fields.
If the ndarray object is a structured array the fields of the array can be accessed by indexing the array with strings, dictionary-like.
Indexing with a field name returns a view of the array, not a copy.
x = np.array ([('Bishop', 1, 44.99), ('Bengio', 2, 39.99),\
('Sutton', 2, 24.99)],\
dtype = [('Author', 'U10'), ('Edition', 'i4'),\
('Price', 'f4')])
print ('x:\n', x)
print ('\nx.shape:\n', x.shape)
print ('\nx [2]:\n', x [2])
print ('\nx ["Author"]:\n', x ['Author'])
print ('\nx ["Author"].shape: same as x.shape:\n', x ['Author'].shape)
print ('\nx ["Price"] = 19.99')
x ['Price'] = 19.99
print ('\nx ["Price"]:\n', x ['Price'])
x:
[('Bishop', 1, 44.99) ('Bengio', 2, 39.99) ('Sutton', 2, 24.99)]
x.shape:
(3,)
x [2]:
('Sutton', 2, 24.99)
x ["Author"]:
['Bishop' 'Bengio' 'Sutton']
x ["Author"].shape: same as x.shape:
(3,)
x ["Price"] = 19.99
x ["Price"]:
[19.99 19.99 19.99]
Structured datatypes are designed to be able to mimic ‘structs’ in the C language, and share a similar memory layout. They are meant for interfacing with C code and for low-level manipulation of structured buffers, for example for interpreting binary blobs. For these purposes they support specialized features such as subarrays, nested datatypes, and unions, and allow control over the memory layout of the structure.
Users looking to manipulate tabular data, such as stored in csv files, may find other pydata projects more suitable, such as xarray, pandas, or DataArray. These provide a high-level interface for tabular data analysis and are better optimized for that use. For instance, the C-struct-like memory layout of structured arrays in numpy can lead to poor cache behavior in comparison.
x = np.zeros((2,2), dtype=[('a', np.int32), ('b', np.float64, (3,3))])
print ('x:\n', x)
print ('\nx.shape:\n', x.shape)
print ('Indexing x["field-name"] returns a new view to the array,\
which is of the same shape as x (except when the field is a sub-array) ')
print ('\nx ["a"]:\n', x ['a'])
print ('\nx ["a"].shape: same as x.shape:\n', x ['a'].shape)
print ('\nx ["b"]:\n', x ['b'])
print ('\nx ["b"].shape: NOT same as x.shape:\n', x ['b'].shape)
x:
[[(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])
(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])]
[(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])
(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])]]
x.shape:
(2, 2)
Indexing x["field-name"] returns a new view to the array,which is of the same shape as x (except when the field is a sub-array)
x ["a"]:
[[0 0]
[0 0]]
x ["a"].shape: same as x.shape:
(2, 2)
x ["b"]:
[[[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]]
[[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]]]
x ["b"].shape: NOT same as x.shape:
(2, 2, 3, 3)
C. Array Properties
shape
ndarray.shape
print ('Shape of 2D matrix: ', mat_2d.shape)
print ('Shape of 3D matrix: ', mat_3d.shape)
Shape of 2D matrix: (3, 2)
Shape of 3D matrix: (3, 3, 2)
size
ndarray.size
print ('Size of 2D matrix: ', mat_2d.size)
print ('Size of 3D matrix: ', mat_3d.size)
number of dimensions
ndarray.ndim
print ('Dimensions of 2D matrix: ', mat_2d.ndim)
print ('Dimensions of 3D matrix: ', mat_3d.ndim)
TODO: ndarray.flags, dtype, itemsize, strides
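As a possible starting point for the remaining properties, here is a small sketch. The array `a` and its int32 dtype are chosen only for illustration.

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.int32)

print(a.dtype)      # element type: int32
print(a.itemsize)   # bytes per element: 4
print(a.strides)    # bytes to step along each axis: (16, 4)
print(a.flags['C_CONTIGUOUS'])  # True: rows are stored one after another

# The transpose is a view on the same buffer with the strides swapped.
t = a.T
print(t.strides)    # (4, 16)
print(t.flags['C_CONTIGUOUS'], t.flags['OWNDATA'])  # False False
```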
D. Array Broadcasting
Operations between differently sized arrays are called broadcasting
Operations between same-sized arrays are called vectorization
1. Introduction
NumPy provides a mechanism for performing mathematical operations on arrays of unequal shapes.
an example of a (3, 4) * (4, ) multiplication
x_2d = np.array ([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
y_1d = np.array ([1, 2, 3, 4])
print ('x.shape:', x_2d.shape)
print ('y.shape:', y_1d.shape)
print ('(x * y).shape:\n', (x_2d * y_1d).shape)
print ('x * y:\n', x_2d * y_1d)
Rules of Broadcasting
To determine if two arrays are broadcast-compatible, align the entries of their shapes such that their trailing dimensions are aligned, and then check that each pair of aligned dimensions satisfies one of the following conditions:
- the aligned dimensions have the same size
- one of the dimensions has a size of 1
The two arrays are broadcast-compatible if one of these conditions is satisfied for each pair of aligned dimensions. A dimension with no partner in the shorter shape is treated as having a size of 1.
Broadcasting is not reserved for operations between 1-D and 2-D arrays, and furthermore both arrays in an operation may undergo broadcasting. That being said, not all pairs of arrays are broadcast-compatible.
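The rules above can be sketched as a small pure-Python check. The helper name `broadcast_shape` is hypothetical (it is not a NumPy function); the last line cross-checks it against NumPy's own `np.broadcast`.

```python
from itertools import zip_longest
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or None if the shapes are incompatible."""
    result = []
    # Walk both shapes from the trailing dimension; a missing dimension counts as 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            return None
    return tuple(reversed(result))

print(broadcast_shape((3, 4), (4,)))       # (3, 4)
print(broadcast_shape((3, 1, 2), (3, 1)))  # (3, 3, 2)
print(broadcast_shape((3,), (4,)))         # None

# Cross-check against NumPy itself.
print(np.broadcast(np.empty((3, 1, 2)), np.empty((3, 1))).shape)  # (3, 3, 2)
```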
Intuition
- perform broadcasting of an array to a higher dimension
- use this intuition of broadcasting to understand how it happens when performing mathematical operations on multiple arrays
- np.broadcast_to (array, shape)
y = np.array([[ 0],
[ 1],
[-1]])
print (y.shape)
np.broadcast_to (y, (3, 3, 2))
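an example of a (3, 1, 2) * (3, 1) multiplication
Align the shapes by their trailing dimensions:
(3, 1, 2)
   (3, 1)
---------
(3, 3, 2)
The second rule of broadcasting (one of the dimensions has a size of 1) applies to both aligned pairs; the unmatched leading dimension of 3 is carried over unchanged.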
x = np.array([[[0, 1]],
[[2, 3]],
[[4, 5]]])
print ('x.shape:', x.shape)
print ('y.shape:', y.shape)
print ('(x * y).shape:', (x * y).shape)
print ('x * y:\n', x * y)
2. Inserting Size-1 Dimensions into An Array
To tailor the shape for broadcasting
using reshape
print ('row_v:', row_v)
print ('row_v.shape:', row_v.shape)
print ('reshape.shape:', (row_v.reshape (1, row_v.shape[0], 1, 1)).shape)
print ('reshaped:\n', row_v.reshape (1, row_v.shape[0], 1, 1))
TODO: np.ravel - np.ravel (M) flattens an array M of shape (R, 1) to shape (R,); note that M [:, 0] already has shape (R,)
using np.newaxis
ndarray [np.newaxis, :, np.newaxis, np.newaxis]
print ('row_v:', row_v)
print ('row_v.shape:', row_v.shape)
print ('newaxis.shape:', row_v [np.newaxis, :,\
np.newaxis, np.newaxis].shape)
print ('reshaped:\n', row_v [np.newaxis, :,\
np.newaxis, np.newaxis])
3. Using size-1 dimension to make arrays broadcasting compatible
x_1d = np.array([1, 2, 3])
print ('x_1d.shape:', x_1d.shape)
print ('x_1d:\n', x_1d)
x_2d = x_1d.reshape (x_1d.shape[0], 1)
print ('\nx_2d.shape:', x_2d.shape)
print ('x_2d:\n', x_2d)
y_1d = np.array ([4, 5, 6, 7])
print ('\ny_1d.shape:', y_1d.shape)
print ('y_1d:\n', y_1d)
#ValueError: operands could not be broadcast together
# with shapes (3,) (4,)
#print (x_1d * y_1d)
print ('\n(x_2d * y_1d).shape:', (x_2d * y_1d).shape)
print ('x_2d * y_1d:\n', x_2d * y_1d)
print ('\ny_1d * x_2d:\n', y_1d * x_2d)
An interesting application, pairwise distances, can be found at https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/Broadcasting.html#Size-1-Axes-&-The-newaxis-Object
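A minimal sketch of the pairwise-distance idea, using size-1 axes so that every point in one set is paired with every point in the other. The data and the names `p`, `q`, `diffs` and `dists` are made up for illustration; this is not necessarily how the linked page implements it.

```python
import numpy as np

# Hypothetical data: 3 points and 2 points in 2-D space.
p = np.array([[0., 0.], [3., 4.], [6., 8.]])   # shape (3, 2)
q = np.array([[0., 0.], [3., 0.]])             # shape (2, 2)

# (3, 1, 2) - (1, 2, 2) broadcasts to (3, 2, 2): every p paired with every q.
diffs = p[:, np.newaxis, :] - q[np.newaxis, :, :]
dists = np.sqrt(np.sum(diffs ** 2, axis=-1))   # Euclidean distances, shape (3, 2)
print(dists.shape)
print(dists)
```

Note that the intermediate array `diffs` holds M * N * D values, so this approach trades memory for the absence of explicit loops.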
E. Indexing
Note:
- The best way to think about NumPy arrays is that they consist of two parts, a data buffer which is just a block of raw elements, and a view which describes how to interpret the data buffer.
- https://stackoverflow.com/questions/22053050/difference-between-numpy-array-shape-r-1-and-r
1. Selection Object
ndarray [selection_object]
selection_object can be
- integers
- a selection tuple
- a slice object
- an Ellipsis object
- a numpy.newaxis object
- a non-tuple sequence object
2. Kinds of indexing
There are three kinds of indexing available:
- basic slicing
  - integers
  - slice object
- advanced indexing
- field access
Which one occurs depends on the selection object.
Note:
- the rules of basic indexing specifically call for a tuple of indices. Supplying a list of indices triggers advanced indexing rather than basic indexing!
print ('2D Array:\n', mat_2d)
print ('Using Basic Indexing [(1, -1)]:\n',\
mat_2d [(1, -1)])
print ('Using Advanced Indexing [[1, -1]]:\n',\
mat_2d [[1, -1]])
print ('Using Advanced Indexing [[1, -1], [-1]]:\n',\
mat_2d [[1, -1], [-1]])
2.1 basic slicing
All arrays generated by basic slicing are always views of the original array.
Note:
NumPy slicing creates a view instead of a copy as in the case of builtin Python sequences such as string, tuple and list. Care must be taken when extracting a small portion from a large array which becomes useless after the extraction, because the small portion extracted contains a reference to the large original array whose memory will not be released until all arrays derived from it are garbage-collected. In such cases an explicit copy() is recommended.
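The view behaviour described above can be observed directly. This is a small sketch with illustrative names; `np.shares_memory` and the `base` attribute are used only to inspect whether two arrays share a data buffer.

```python
import numpy as np

big = np.arange(10)

view = big[2:5]            # basic slicing: a view on big's data
print(np.shares_memory(big, view))   # True
view[0] = 99               # writing through the view changes big as well
print(big[2])              # 99

small = big[2:5].copy()    # explicit copy: independent data
print(np.shares_memory(big, small))  # False
print(small.base is None, view.base is big)  # True True
```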
2.1.1. selection object is integers or tuple
Note:
- In Python, x[(exp1, exp2, …, expN)] is equivalent to x[exp1, exp2, …, expN]; the latter is just syntactic sugar for the former.
- (exp1, exp2, …, expN) is a selection tuple.
print (mat_2d [(2, 1)])
print (mat_2d [2, 1]) #syntactic sugar
print (mat_2d [2][1]) #not sugar: two indexing steps, mat_2d [2] (a view of row 2) and then [1]
print (mat_3d [(0,1,1)])
2.1.2. selection object is a slice object
ndarray [slice object]
- ndarray [start : stop : step]
- ‘start : stop : step’ is a slice object
- class slice (start, stop[, step])
print ('row vector:', row_v)
start = 2
stop = 8
step = 2
print ('Elements of row vector starting at ',\
start, 'with step size of ', step,\
'\n ending at', stop, ': ',\
row_v [start : stop : step])
slicing syntax forms a slice object behind the scenes
print (row_v [start : stop : step])
print (row_v [slice (start, stop, step)])
negative start / stop / step
Note:
- Negative ‘start’ and ‘stop’ are interpreted as n + start and n + stop where n is the number of elements in the corresponding dimension.
- Negative ‘step’ makes stepping go towards smaller indices.
print (row_v)
print ('neg_start pos_stop', row_v [-5 : 8])
print ('neg_start neg_stop', row_v [-5 : -2])
print ('neg_start neg_stop neg_step', row_v [-2 : -5: -2])
default indices
Note:
- In a slice i:j:k, i is the start, j is the stop and k is the step. Assume n is the number of elements in the dimension being sliced. Then, if i is not given it defaults to 0 for k > 0 and n - 1 for k < 0. If j is not given it defaults to n for k > 0 and -n-1 for k < 0. If k is not given it defaults to 1.
- :: is the same as : and means select all indices along this axis.
# if i is not given it defaults to 0 for k > 0
print ('if i is not given it defaults to 0 for k > 0:',\
row_v [:8:1])
# if i is not given it defaults to n - 1 for k < 0
print ('if i is not given it defaults to n - 1 for k < 0:',\
row_v [:8:-1])
# If j is not given it defaults to n for k > 0
print ('If j is not given it defaults to n for k > 0:',\
row_v [0::1])
# If j is not given it defaults to -n-1 for k < 0, i.e. the position just
# before index 0, so element 0 is included (a literal -1 would mean the last element)
print ('If j is not given it defaults to -n-1 for k < 0:',\
row_v [5::-1])
# If k is not given it defaults to 1
print ('If k is not given it defaults to 1:',\
row_v[2:7])
print ('ndarray [::]:', row_v [::])
print ('ndarray [:]:', row_v [:])
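One way to check the default rules is Python's own `slice.indices (n)`, which resolves a slice against a sequence of length n. The sketch below assumes n = 10, matching row_v. Python reports a missing stop for a negative step as -1 in `range` terms, meaning the position just before index 0; written inside a slice, that same position is -n-1, because a literal -1 there would refer to the last element.

```python
n = 10  # number of elements in the dimension being sliced

# slice.indices(n) returns the concrete (start, stop, step) for a length-n sequence.
print(slice(None, 8, 1).indices(n))        # (0, 8, 1)
print(slice(None, 8, -1).indices(n))       # (9, 8, -1)
print(slice(5, None, -1).indices(n))       # (5, -1, -1)
print(slice(None, None, None).indices(n))  # (0, 10, 1)

# The indices actually visited by [5::-1]:
visited = list(range(*slice(5, None, -1).indices(n)))
print(visited)                             # [5, 4, 3, 2, 1, 0]
```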
2.1.3. selection object is a tuple of integers and slice objects
print ('mat_2d:\n', mat_2d)
print ('mat_2d.shape:\n', mat_2d.shape)
print ('mat_2d [0:3:2, 1]:', mat_2d [0:3:2, 1])
An integer, i, returns the same values as i:i+1 except the dimensionality of the returned object is reduced by 1.
print ('row_v:', row_v)
print ('row_v.shape:', row_v.shape)
print ('row_v [2]:', row_v [2])
print ('row_v [2].shape:', row_v [2].shape)
print ('row_v [2:3]:', row_v [2:3])
print ('row_v [2:3].shape:', row_v [2:3].shape)
2.1.4. selection object is a tuple of Ellipsis and slice objects
print ('mat_2d:\n', mat_2d)
print ('mat_2d.shape:\n', mat_2d.shape)
print ('all rows, col_0: mat_2d [..., 0:1:1]:\n', mat_2d [..., 0:1:1])
print ('shape:', mat_2d [..., 0:1:1].shape)
2.1.5. selection object is a tuple of Ellipsis and integers
Ellipsis expands to the number of : objects needed for the selection tuple to index all dimensions. In most cases, this means that length of the expanded selection tuple is x.ndim. There may only be a single ellipsis present.
print ('mat_2d:\n', mat_2d)
print ('mat_2d.shape:\n', mat_2d.shape)
print ('all rows, col_1: mat_2d [..., 1]:\n', mat_2d [..., 1])
print ('all rows, col_1: mat_2d [:, 1]:\n', mat_2d [:, 1])
print ('shape:', mat_2d [..., 1].shape)
print ('Note that the last dimension unfolded.')
print ('Here the shape (3,) means the array is indexed \
by a single index which runs from 0 to 2')
print ('\n\nmat_3d:\n', mat_3d)
print ('mat_3d.shape:\n', mat_3d.shape)
print ('all rows of rows, col_1: mat_3d [..., 1]:\n', mat_3d [..., 1])
print ('all rows of rows, col_1: mat_3d [:, :, 1]:\n', mat_3d [:, :, 1])
print ('shape:', mat_3d [..., 1].shape)
print ('Note that the last dimension unfolded.')
print ('\n\nmat_3d:\n', mat_3d)
print ('mat_3d.shape:\n', mat_3d.shape)
print ('all rows, col_0 of dim2, col_1 of dim3: mat_3d [..., 0, 1]:\n',\
mat_3d [..., 0, 1])
print ('all rows, col_0 of dim2, col_1 of dim3: mat_3d [:, 0, 1]:\n',\
mat_3d [:, 0, 1])
print ('shape:', mat_3d [..., 0, 1].shape)
print ('Note that both the second-last and last dimensions unfolded.')
2.1.6. selection object is the newaxis object
np.newaxis
Each newaxis object in the selection tuple serves to expand the dimensions of the resulting selection by one unit-length dimension. The added dimension is the position of the newaxis object in the selection tuple.
print ('mat_2d:\n', mat_2d)
print ('mat_2d.shape:\n', mat_2d.shape)
print ('\nmat_2d [1]:\n', mat_2d [1],\
'shape:', mat_2d [1].shape)
print ('Expanded dimension: mat_2d [np.newaxis, 1]:\n',\
mat_2d [np.newaxis, 1],\
'shape:', mat_2d [np.newaxis, 1].shape)
print ('\nmat_2d [:, 1]:\n', mat_2d [:, 1],\
'shape:', mat_2d [:, 1].shape)
print ('Expanded dimension: mat_2d [:, 1, np.newaxis]:\n',\
mat_2d [:, 1, np.newaxis],\
'shape:', mat_2d [:, 1, np.newaxis].shape)
np.newaxis can be used to add a dimension to an extracted row or column, to avoid calling ndarray.reshape(), when making matrices size compatible for various operations.
- https://stackoverflow.com/questions/22053050/difference-between-numpy-array-shape-r-1-and-r
print ('mat_2d:\n', mat_2d)
print ('mat_2d.shape:\n', mat_2d.shape)
print ('\nfetch a row: mat_2d [1]:\n', mat_2d [1])
print ('Note the trailing "," : mat_2d [1].shape:', mat_2d [1].shape)
print ('fetch a row: mat_2d [np.newaxis, 1]:\n', mat_2d [np.newaxis, 1],\
'shape:', mat_2d [np.newaxis, 1].shape)
print ('\nfetch a column: mat_2d [:, 1]:\n', mat_2d [:, 1])
print ('Note the trailing "," : mat_2d [:, 1].shape:',\
mat_2d [:, 1].shape)
print ('fetch a column: mat_2d [:, 1, np.newaxis]:\n',\
mat_2d [:, 1, np.newaxis],\
'shape:', mat_2d [:, 1, np.newaxis].shape)
2.1.7 Basic slicing extends Python’s basic concept of slicing to N dimensions.
print ('2D array:\n', mat_2d)
print ('first two elements of col_1:\n', mat_2d [0:2:1, 1:2:1])
# 0:2:1 - selects rows [0,1]
# 1:2:1 - selects cols [1]
print ('all three elements of col_1:\n', mat_2d [:, 1:2:1])
2.2 advanced indexing
2.2.1 introduction
Advanced indexing is triggered when the selection object is:
- a non-tuple sequence object
- an ndarray (of data type integer or bool)
- a tuple with at least one sequence object or ndarray (of data type integer or bool).
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view)
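The copy-versus-view contrast can be checked with a short sketch. The names are illustrative, and the same three elements are selected once with a slice and once with an integer list.

```python
import numpy as np

a = np.arange(6)

sliced = a[1:4]          # basic slicing
picked = a[[1, 2, 3]]    # advanced (integer array) indexing, same elements

print(np.shares_memory(a, sliced))  # True: a view
print(np.shares_memory(a, picked))  # False: a copy

picked[0] = -1           # modifies the copy only
print(a[1])              # 1
sliced[0] = -1           # modifies a through the view
print(a[1])              # -1
```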
Note:
- the rules of basic indexing specifically call for a tuple of indices. Supplying a list of indices triggers advanced indexing rather than basic indexing!
print ('2D Array:\n', mat_2d)
print ('Using Basic Indexing [(1, -1)]:\n',\
mat_2d [(1, -1)])
print ('Using Advanced Indexing [[1, -1]]:\n',\
mat_2d [[1, -1]])
print ('Using Advanced Indexing [[1, -1], [-1]]:\n',\
mat_2d [[1, -1], [-1]])
The definition of advanced indexing means that x[(1,2,3),] is fundamentally different from x[(1,2,3)]. The latter is equivalent to x[1,2,3], which triggers basic selection, while the former triggers advanced indexing because its selection tuple contains a sequence, (1,2,3).
x = np.array ([[0, 1, 2], [3, 4, 5]])
print ('x:', x)
print ('\nTuple as a selection object: x[(0, 1)]:\n', x[(0, 1)])
print ('\nSequence as a selection object: x[(0, 1), ]:\n', x[(0, 1), ])
print ('\nSequence as a selection object: x[(0, 1), (2), ]:\n', x[(0, 1), (2), ])
2.2.2 integer array indexing
example
print ('mat_3d:\n', mat_3d)
print ('\nmat_3d [[0, 1], [1], [1]]:\n', mat_3d [[0, 1], [1], [1]])
example - get corner elements
x = np.array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
print (x, '\n')
rows = [[0, 0], [3, 3]]
cols = [[0, 2], [0, 2]]
print (x [rows, cols])
example - get corner elements - using broadcasting - using np.ix_
rows = np.array ([0, 3])
cols = np.array ([0, 2])
print ('rows:', rows)
print ('cols', cols)
print ('\nindex for broadcasting: rows [:, np.newaxis]:\n',\
rows [:, np.newaxis])
print ('rows [:, np.newaxis].shape: ', rows [:, np.newaxis].shape)
print ('cols.shape:', cols.shape)
print ('broadcasting index will be (2, 2)')
print ('\nx [rows [:, np.newaxis], cols]:\n',\
x [rows [:, np.newaxis], cols])
print ('\nx [np.ix_ (rows, cols)]:\n', x [np.ix_ (rows, cols)])
2.2.3 boolean array indexing
ndarray.nonzero ()
x = np.array ([1, 2, 0, 7, 4, 2, 0, 6, 8])
print ('x:\n', x)
print ('x.nonzero () returns a tuple:', x.nonzero ())
print ('get non-zero elements:', x[x.nonzero ()])
np.isnan (ndarray)
x = np.array([[1., 2.], [np.nan, 3.], [np.nan, np.nan]])
print ('x:\n', x)
print ('\nnp.isnan (x):\n', np.isnan (x))
#bool_array = np.isnan (x)
#print ('bool_array:\n', bool_array)
#idx = bool_array.nonzero ()
#print ('idx:\n', idx)
print ('\nx [~np.isnan (x)]:\n', x [~np.isnan (x)])
ndarray [condition]
x = np.array ([1, 2, 0, 7, 4, 2, 0, 6, 8])
print ('x:\n', x)
print ('\nx [x < 7]:\n', x [x < 7])
In general if an index includes a Boolean array, the result will be identical to inserting sel_obj.nonzero() into the same position and using the integer array indexing mechanism
x[ind_1, boolean_array, ind_2] is equivalent to x[(ind_1,) + boolean_array.nonzero() + (ind_2,)].
x = np.array ([1, 2, 0, 7, 4, 2, 0, 6, 8])
print ('x:\n', x)
print ('\nx < 5:\n', x < 5)
print ('\n(x < 5).nonzero ():\n', (x < 5).nonzero ())
print ('\nx [(x < 5).nonzero ()]:\n', x [(x < 5).nonzero ()])
select all rows which sum up to less than or equal to two
x = np.array([[0, 1], [1, 1], [2, 2]])
print ('x:\n', x)
print ('\nx.shape:\n', x.shape)
rowsum = x.sum (-1)
print ('\nrowsum: x.sum (-1): \n', rowsum)
print ('\nx [rowsum <= 2, :]:\n', x [rowsum <= 2, :])
if rowsum has two dimensions as well (keepdims = True)
x = np.array([[0, 1], [1, 1], [2, 2]])
print ('x:\n', x)
print ('\nx.shape:\n', x.shape)
rowsum = x.sum (-1, keepdims = True)
print ('\nrowsum: x.sum (-1, keepdims = True): \n', rowsum)
print ('\nrowsum.shape \n', rowsum.shape)
#print ('\nrowsum.nonzero() \n', rowsum.nonzero ())
#a (3, 1) boolean mask does not match x's shape (3, 2); recent NumPy
#versions raise IndexError for x [rowsum <= 2], so drop the size-1 axis first
print ('\nx [(rowsum <= 2) [:, 0]]:\n', x [(rowsum <= 2) [:, 0]])
x [condition] += n
Note:
- x [condition] is on the LHS, so the updated values are written back into the original x in place; the copy that advanced indexing creates is only a temporary used to compute them
x = np.array ([1, -2, 0, 7, -4, 2, 0, -6, 8])
print ('x:\n', x)
print ('\nx [x < 0] += 20')
x [x < 0] += 20
print ('\nx:\n', x)
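The difference between reading and assigning through a boolean index can be seen in a short sketch (array values chosen for illustration):

```python
import numpy as np

x = np.array([1, -2, 0, 7, -4])

# On the right-hand side, x[x < 0] is a copy; changing it leaves x alone.
negatives = x[x < 0]
negatives += 20
print(x)          # [ 1 -2  0  7 -4]

# On the left-hand side, the values are written into x itself.
x[x < 0] += 20
print(x)          # [ 1 18  0  7 16]
```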
2.3 field access
Refer ‘Structured arrays and Field Access’ in the preliminaries section.
3. flat iterator indexing
x.flat returns an iterator that will iterate over the entire array. This iterator object can also be indexed using basic slicing or advanced indexing as long as the selection object is not a tuple. This should be clear from the fact that x.flat is a 1-dimensional view.
https://www.geeksforgeeks.org/numpy-indexing/
x = np.array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
print ('x:\n', x)
print ('\nx [0:2]:\n', x [0:2])
print ('\nx.flat [0:2]:\n', x.flat [0:2])
F. Vectorized Operations
Operations between differently sized arrays are called broadcasting
Operations between same-sized arrays are called vectorization
a. Introduction
https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html
1. data
x = np.array([[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.]])
y = np.array([[-4. , -3.5, -3. ],
[-2.5, -2. , -1.5],
[-1. , -0.5, -0. ]])
The examples that follow are based on a taxonomy of:
- operations on elements of a single array
- operations on ‘corresponding’ elements of two arrays
- operations on elements of a single array using the same scalar
- summary operations on elements of a single array
Refer ‘Taxonomy of Operations’ further below for another taxonomy.
2. example of operations on entries of a single array
unary operations?
Operations on a single array are not necessarily unary. Example: the logical (binary) operation x < 6
Arithmetic operations with scalars are as you would expect, propagating the value to each element
square - this is a binary operation : x ** y
x ** 2
array([[ 0., 1., 4.],
[ 9., 16., 25.],
[36., 49., 64.]])
np.sqrt (ndarray)
np.sqrt (x)
array([[0. , 1. , 1.41421356],
[1.73205081, 2. , 2.23606798],
[2.44948974, 2.64575131, 2.82842712]])
operations on a slice of an array
print ('x:\n', x)
print ('\nAdd 0.5 to the second column')
print ('\n0.5 + x [:, 1]:\n', 0.5 + x [:, 1])
x:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
Add 0.5 to the second column
0.5 + x [:, 1]:
[1.5 4.5 7.5]
less than - this is a binary operation : x < y
print ('x < 5:\n', x < 5)
x < 5:
[[ True True True]
[ True True False]
[False False False]]
3. mathematical operations performed between two arrays are designed to act on the corresponding pairs of entries between the two arrays
binary operations?
+ operator
print ('x:\n', x)
print ('\ny:\n', y)
print ('\nx + y:\n', x + y)
x:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
y:
[[-4. -3.5 -3. ]
[-2.5 -2. -1.5]
[-1. -0.5 -0. ]]
x + y:
[[-4. -2.5 -1. ]
[ 0.5 2. 3.5]
[ 5. 6.5 8. ]]
* operator
print ('x:\n', x)
print ('\ny:\n', y)
print ('\nx * y:\n', x * y)
x:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
y:
[[-4. -3.5 -3. ]
[-2.5 -2. -1.5]
[-1. -0.5 -0. ]]
x * y:
[[-0. -3.5 -6. ]
[-7.5 -8. -7.5]
[-6. -3.5 -0. ]]
Note the -0. in (x * y) [0][0]: 0. * -4. gives negative zero in floating point.
np.dot and np.multiply
np.dot returns the dot product of two arrays. For 1-D arrays, it is the inner product of the vectors.
np.dot(np.array([1, -3, 4]), np.array([2, 0, 1]))
6
For 2-D arrays, it is equivalent to matrix multiplication (np.matmul).
print ('x:\n', x)
print ('\ny:\n', y)
print ('\nnp.dot (x, y):\n', np.dot (x, y))
print ('\nnp.matmul (x, y):\n', np.matmul (x, y))
x:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
y:
[[-4. -3.5 -3. ]
[-2.5 -2. -1.5]
[-1. -0.5 -0. ]]
np.dot (x, y):
[[ -4.5 -3. -1.5]
[-27. -21. -15. ]
[-49.5 -39. -28.5]]
np.matmul (x, y):
[[ -4.5 -3. -1.5]
[-27. -21. -15. ]
[-49.5 -39. -28.5]]
Is matmul a vectorized operation?
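One way to approach this question: np.matmul is not an element-wise operation, but like other NumPy routines it runs its loops in compiled code, and it broadcasts over leading "stack" dimensions. The sketch below uses illustrative shapes to show where np.matmul and np.dot differ for arrays with more than two dimensions.

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# matmul (and the @ operator) treats a and b as stacks of 2 matrices
# and multiplies them pairwise.
print(np.matmul(a, b).shape)   # (2, 3, 5)
print((a @ b).shape)           # (2, 3, 5)

# dot instead combines every matrix in a with every matrix in b.
print(np.dot(a, b).shape)      # (2, 3, 2, 5)

# For plain 2-D arrays the two agree.
m = np.arange(6.).reshape(2, 3)
n = np.arange(6.).reshape(3, 2)
print(np.allclose(np.dot(m, n), m @ n))  # True
```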
np.multiply
print ('x:\n', x)
print ('\ny:\n', y)
print ('\nnp.multiply (x, y):\n', np.multiply (x, y))
x:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
y:
[[-4. -3.5 -3. ]
[-2.5 -2. -1.5]
[-1. -0.5 -0. ]]
np.multiply (x, y):
[[-0. -3.5 -6. ]
[-7.5 -8. -7.5]
[-6. -3.5 -0. ]]
4. Operate on sequences of numbers
Sequential functions can act on an array’s entries as if they form a single sequence, or act on subsequences of the array’s entries, according to the array’s axes.
sum
print ('x:\n', x)
print ('\nsum (x):\n', sum (x))
print ('\nnp.sum (x):\n', np.sum (x))
print ('\nnp.sum (x, axis = 0):\n', np.sum (x, axis = 0))
print ('\nnp.sum (x, axis = 1):\n', np.sum (x, axis = 1))
x:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
sum (x):
[ 9. 12. 15.]
np.sum (x):
36.0
np.sum (x, axis = 0):
[ 9. 12. 15.]
np.sum (x, axis = 1):
[ 3. 12. 21.]
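Why does Python's built-in sum add up the columns and return an ndarray? It iterates over the first axis, so it adds the rows of x together element-wise, which gives the same result as np.sum (x, axis = 0).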
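The behaviour of the built-in sum on a 2-D array can be reproduced by hand. This is a small sketch using the same x as above; the names `rows` and `by_hand` are illustrative.

```python
import numpy as np

x = np.array([[0., 1., 2.],
              [3., 4., 5.],
              [6., 7., 8.]])

# Iterating over a 2-D array yields its rows, one 1-D array at a time.
rows = list(x)
print(rows[0])

# The built-in sum starts from 0 and adds the rows with +, which is element-wise.
by_hand = 0 + rows[0] + rows[1] + rows[2]
print(by_hand)                                    # [ 9. 12. 15.]
print(np.array_equal(sum(x), np.sum(x, axis=0)))  # True
```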
b. Taxonomy of Operations
i. Mathematical Operations
1. Unary Functions: f (x)
np.sqrt (ndarray)
np.sqrt (x)
array([[0. , 1. , 1.41421356],
[1.73205081, 2. , 2.23606798],
[2.44948974, 2.64575131, 2.82842712]])
np.log (ndarray) - ln (x)
np.log (x)
RuntimeWarning: divide by zero encountered in log
array([[ -inf, 0. , 0.69314718],
[1.09861229, 1.38629436, 1.60943791],
[1.79175947, 1.94591015, 2.07944154]])
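The warning above comes from log (0). Two possible ways of handling it are sketched below, on a small illustrative array: silencing the warning with the `np.errstate` context manager, or computing the log only where the input is positive via the ufunc `out` and `where` arguments. Which is appropriate depends on whether -inf is an acceptable value downstream.

```python
import numpy as np

x = np.array([0., 1., 2.])

# np.errstate is a context manager that changes floating-point error handling.
with np.errstate(divide='ignore'):
    logs = np.log(x)      # log(0) still evaluates to -inf, but silently
print(logs)

# Alternatively, compute the log only where it is defined.
safe = np.full_like(x, np.nan)
np.log(x, out=safe, where=x > 0)
print(safe)               # nan marks the entry that was skipped
```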
np.exp (ndarray) - e^x
np.exp (x)
array([[1.00000000e+00, 2.71828183e+00, 7.38905610e+00],
[2.00855369e+01, 5.45981500e+01, 1.48413159e+02],
[4.03428793e+02, 1.09663316e+03, 2.98095799e+03]])
2. Binary Functions: f (x, y)
There are two cases that we must consider when working with binary functions, in the context of NumPy arrays:
- When both operands of the function are arrays (of the same shape).
- When one operand of the function is a scalar (i.e. a single number) and the other is an array.
+ operator
print ('x:\n', x)
print ('\ny:\n', y)
print ('\nx + y:\n', x + y)
print ('\nScalar operand: x + 5:\n', x + 5)
x:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
y:
[[-4. -3.5 -3. ]
[-2.5 -2. -1.5]
[-1. -0.5 -0. ]]
x + y:
[[-4. -2.5 -1. ]
[ 0.5 2. 3.5]
[ 5. 6.5 8. ]]
Scalar operand: x + 5:
[[ 5. 6. 7.]
[ 8. 9. 10.]
[11. 12. 13.]]
square - this is a binary operation : x ** y and np.power - one operand is a scalar
print ('x:\n', x)
print ('\nx ** 2:\n', x ** 2)
print ('\nnp.power (x, 3):\n', np.power (x, 3))
x:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
x ** 2:
[[ 0. 1. 4.]
[ 9. 16. 25.]
[36. 49. 64.]]
np.power (x, 3):
[[ 0. 1. 8.]
[ 27. 64. 125.]
[216. 343. 512.]]
np.maximum
print ('x:\n', x)
print ('\ny:\n', y)
print ('\nnp.maximum (x, y):\n', np.maximum (x, y))
x:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
y:
[[-4. -3.5 -3. ]
[-2.5 -2. -1.5]
[-1. -0.5 -0. ]]
np.maximum (x, y):
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
3. functions that operate on sequence of numbers: f ({x_i} i = 0..n-1)
np.mean
np.median
np.var - variance
np.std - standard deviation
np.max - see binary function ‘np.maximum’ for array of maximum elements among corresponding elements of the two arrays
np.min - minimum element of the array
np.argmax - index (sequential) of the maximum element of the array
np.argmin
np.sum
print ('x:\n', x)
print ('\nnp.mean (x):\n', np.mean (x))
print ('\nnp.median (x):\n', np.median (x))
print ('\nnp.var (x):\n', np.var (x))
print ('\nnp.std (x):\n', np.std (x))
print ('\nnp.max (x):\n', np.max (x))
print ('\nnp.min (x):\n', np.min (x))
print ('\nnp.argmax (x):\n', np.argmax (x))
print ('\nnp.argmin (x):\n', np.argmin (x))
print ('\nnp.sum (x):\n', np.sum (x))
x:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
np.mean (x):
4.0
np.median (x):
4.0
np.var (x):
6.666666666666667
np.std (x):
2.581988897471611
np.max (x):
8.0
np.min (x):
0.0
np.argmax (x):
8
np.argmin (x):
0
np.sum (x):
36.0
3.1 np.func (ndarray, axis = )
print ('x:\n', x)
print ('\nnp.sum (x, axis = 0):\n', np.sum (x, axis = 0))
print ('\nnp.sum (x, axis = 1):\n', np.sum (x, axis = 1))
x:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
np.sum (x, axis = 0):
[ 9. 12. 15.]
np.sum (x, axis = 1):
[ 3. 12. 21.]
another example
x = np.arange(24).reshape(4,2,3)
print ('x:\n', x)
print ('\nx.shape:\n', x.shape)
print ('\nnp.sum (x, axis = 0):\n', np.sum (x, axis = 0))
print ('\nnp.sum (x, axis = 1):\n', np.sum (x, axis = 1))
print ('\nnp.sum (x, axis = 2):\n', np.sum (x, axis = 2))
print ('\nnp.sum (x, axis = (0, 1)):\n', np.sum (x, axis = (0,1)))
print ('2+5+8+11+14+17+20+23 = :', 2+5+8+11+14+17+20+23)
x:
[[[ 0 1 2]
[ 3 4 5]]
[[ 6 7 8]
[ 9 10 11]]
[[12 13 14]
[15 16 17]]
[[18 19 20]
[21 22 23]]]
x.shape:
(4, 2, 3)
np.sum (x, axis = 0):
[[36 40 44]
[48 52 56]]
np.sum (x, axis = 1):
[[ 3 5 7]
[15 17 19]
[27 29 31]
[39 41 43]]
np.sum (x, axis = 2):
[[ 3 12]
[21 30]
[39 48]
[57 66]]
np.sum (x, axis = (0, 1)):
[ 84 92 100]
2+5+8+11+14+17+20+23 = : 100
ii. Logical Operations
Binary operations
data
x = np.array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
example
print ('x:\n', x)
print ('\nx.shape:\n', x.shape)
print ('\nx < 6:\n', x < 6)
x:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
x.shape:
(4, 4)
x < 6:
[[ True True True True]
[ True True False False]
[False False False False]
[False False False False]]
Sequence operations
np.allclose
You should never rely on two floating point numbers being exactly equal. Rather, you should require that they are sufficiently “close” in value. In this same vein, you ought not check that the entries of two float-type arrays are precisely equal. Towards this end, the function allclose can be used to verify that all corresponding pairs of entries between two arrays are approximately equal in value
x = np.array([0.1, 0.2, 0.3])
y = np.array([1., 2., 3.]) / 10
print ('x:\n', x)
print ('\ny:\n', y)
print ('\nnp.allclose(x, y):\n', np.allclose(x, y))
x:
[0.1 0.2 0.3]
y:
[0.1 0.2 0.3]
np.allclose(x, y):
True
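A short sketch of why exact comparison is fragile, and of how the tolerances enter. `np.isclose` is the element-wise counterpart of `np.allclose`; both use the test |a - b| <= atol + rtol * |b|, with defaults rtol = 1e-05 and atol = 1e-08. The arrays here are illustrative.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

print(a == b)             # [False  True]: 0.1 + 0.2 is not exactly 0.3
print(np.isclose(a, b))   # [ True  True]: element-wise closeness test
print(np.allclose(a, b))  # True: all elements are close

# The tolerances can be tightened until the rounding error shows up again.
print(np.allclose(a, b, rtol=0, atol=1e-20))  # False
```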
c. Performance of Vectorized Operations
measure time using timeit
import timeit
element_count = 10000000 #how many elements in the array
thread_invoc_count = 10 #how many times timeit calls each function
#data
x = np.random.standard_normal (element_count)
#functions to time (timeit calls them one after another; no threads are involved)
def python_sum ():
    return sum (x)
def python_loop_sum ():
    total = 0 #named total to avoid shadowing the built-in sum
    for i in x:
        total += i
    return total
def numpy_sum ():
    return np.sum (x)
#time each function
t1 = timeit.timeit (python_sum, number = thread_invoc_count)
t2 = timeit.timeit (python_loop_sum, number = thread_invoc_count)
t3 = timeit.timeit (numpy_sum, number = thread_invoc_count)
#report timings
print ('python_sum thread took', t1, 'seconds.')
print ('python_loop_sum thread took', t2, 'seconds.')
print ('numpy_sum thread took', t3, 'seconds.')
python_sum thread took 21.519882301000052 seconds.
python_loop_sum thread took 29.515229176000048 seconds.
numpy_sum thread took 0.06145643499985454 seconds.
with: element_count = 10000000 thread_invoc_count = 10
- python_sum thread took 21.52 seconds.
- python_loop_sum thread took 29.51 seconds.
- numpy_sum thread took 0.06 seconds.
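A likely reason for the gap is that the built-in sum and the for loop both iterate over the array one element at a time in the Python interpreter, while np.sum runs a compiled loop. The sketch below repeats the comparison at a smaller scale so it finishes quickly. It assumes 100,000 elements and uses timeit.repeat, keeping the fastest run to reduce noise. Timings depend on the machine, so no expected numbers are shown.

```python
import timeit
import numpy as np

data = np.random.standard_normal(100_000)  # smaller than the original 10,000,000

def python_loop_sum():
    total = 0.0
    for value in data:   # one interpreter step per element
        total += value
    return total

def numpy_sum():
    return np.sum(data)  # single call into compiled code

# run each measurement 3 times (5 calls per run) and keep the fastest run
t_loop = min(timeit.repeat(python_loop_sum, repeat=3, number=5))
t_numpy = min(timeit.repeat(numpy_sum, repeat=3, number=5))

print('python_loop_sum best of 3:', t_loop, 'seconds')
print('numpy_sum best of 3:', t_numpy, 'seconds')
```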
G. Linear Algebra
https://becominghuman.ai/an-essential-guide-to-numpy-for-machine-learning-in-python-5615e1758301
multiple linear regression - http://www2.lawrence.edu/fast/GREGGJ/Python/numpy/numpyLA.html
np.linalg
data
#Create a Matrix
matrix = np.array([[1,2,3],[4,5,6],[7,8,9]])
print('\nmatrix:\n', matrix)
matrix:
[[1 2 3]
[4 5 6]
[7 8 9]]
ndarray.T - transpose - np.transpose
print('\nmatrix:\n', matrix)
print('\nmatrix.T:\n', matrix.T)
print('\nnp.transpose:\n', np.transpose (matrix))
matrix:
[[1 2 3]
[4 5 6]
[7 8 9]]
matrix.T:
[[1 4 7]
[2 5 8]
[3 6 9]]
np.transpose:
[[1 4 7]
[2 5 8]
[3 6 9]]
np.linalg.det (ndarray) and np.linalg.matrix_rank
def is_det_zero (det):
    return np.allclose ([0], [det])
print('\nmatrix:\n', matrix)
det = np.linalg.det(matrix)
print('\nnp.linalg.det:\n', det)
print ('\nis determinant zero?:\n', is_det_zero (det))
rank = np.linalg.matrix_rank(matrix)
print('\nnp.linalg.matrix_rank:\n', rank)
matrix:
[[1 2 3]
[4 5 6]
[7 8 9]]
np.linalg.det:
-9.51619735392994e-16
is determinant zero?:
True
np.linalg.matrix_rank:
2
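Because the floating-point determinant of a singular matrix often comes out as a tiny nonzero number, comparing the rank with the matrix size is another way to detect singularity. A minimal sketch on the same matrix; the names n, is_singular and rows_dependent are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

n = matrix.shape[0]
rank = np.linalg.matrix_rank(matrix)

# a square matrix is singular exactly when its rank is below its size
is_singular = bool(rank < n)

# the rank is 2 because the rows are linearly dependent:
# row 2 equals 2 * row 1 - row 0
rows_dependent = np.array_equal(matrix[2], 2 * matrix[1] - matrix[0])

print('rank:', rank, 'of', n)
print('singular:', is_singular)
print('row 2 == 2 * row 1 - row 0:', rows_dependent)
```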
ndarray.diagonal
print('\nmatrix:\n', matrix)
print('\nThe Principal diagonal:\n', matrix.diagonal())
print('\nThe diagonal at offset 1:\n', matrix.diagonal(offset=1))
print('\nThe diagonal at offset -1:\n', matrix.diagonal(offset=-1))
print('\nThe diagonal at offset -2:\n', matrix.diagonal(offset=-2))
matrix:
[[1 2 3]
[4 5 6]
[7 8 9]]
The Principal diagonal:
[1 5 9]
The diagonal at offset 1:
[2 6]
The diagonal at offset -1:
[4 8]
The diagonal at offset -2:
[7]
ndarray.trace
The trace of a matrix is the sum of its diagonal components.
print('\nmatrix:\n', matrix)
trace = matrix.trace ()
print ('\ntrace:\n', trace)
matrix:
[[1 2 3]
[4 5 6]
[7 8 9]]
trace:
15
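The definition can be checked directly by summing the principal diagonal. The offset argument of trace, which sums an off-diagonal instead, is an addition not shown in the original cell.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

trace = matrix.trace()
diagonal_sum = matrix.diagonal().sum()  # 1 + 5 + 9

# offset=1 sums the diagonal just above the principal one: 2 + 6
upper_trace = matrix.trace(offset=1)

print('trace:', trace)
print('sum of principal diagonal:', diagonal_sum)
print('trace at offset 1:', upper_trace)
```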
np.linalg.eig - eigenvalues and eigenvectors
print('\nmatrix:\n', matrix)
evalues, evectors = np.linalg.eig (matrix)
print ('\nEigenvalues:\n', evalues)
print ('\nEigenvectors:\n', evectors)
matrix:
[[1 2 3]
[4 5 6]
[7 8 9]]
Eigenvalues:
[ 1.61168440e+01 -1.11684397e+00 -9.75918483e-16]
Eigenvectors:
[[-0.23197069 -0.78583024 0.40824829]
[-0.52532209 -0.08675134 -0.81649658]
[-0.8186735 0.61232756 0.40824829]]
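In the result of np.linalg.eig, column i of the eigenvector array belongs to eigenvalue i. A short sketch that verifies the defining relation A v = λ v for all pairs at once, and also checks that the eigenvalues sum to the trace.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
evalues, evectors = np.linalg.eig(matrix)

# matrix @ evectors applies the matrix to every eigenvector (one per column);
# evectors * evalues scales column i by evalues[i] through broadcasting
lhs = matrix @ evectors
rhs = evectors * evalues
relation_holds = bool(np.allclose(lhs, rhs))

# the eigenvalues of a square matrix sum to its trace
sum_matches_trace = bool(np.isclose(evalues.sum(), matrix.trace()))

print('A v = lambda v for every pair:', relation_holds)
print('sum of eigenvalues equals trace:', sum_matches_trace)
```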
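np.linalg.inv - inverse of a matrix
Note: this matrix is singular (determinant ≈ 0 and rank 2, as computed above), so it has no inverse. The huge values printed below are floating-point artifacts, not a usable result.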
print('\nmatrix:\n', matrix)
inverse = np.linalg.inv (matrix)
print ('\ninverse:\n', inverse)
matrix:
[[1 2 3]
[4 5 6]
[7 8 9]]
inverse:
[[ 3.15251974e+15 -6.30503948e+15 3.15251974e+15]
[-6.30503948e+15 1.26100790e+16 -6.30503948e+15]
[ 3.15251974e+15 -6.30503948e+15 3.15251974e+15]]
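A sanity check for any computed inverse is that A @ inv (A) is close to the identity. The sketch below uses the invertible matrix A from the Ax = b example. np.linalg.cond is an addition here, used only to show how badly conditioned the 3 x 3 matrix above is.

```python
import numpy as np

# invertible matrix (its determinant is -4)
A = np.array([[2, 1, -2], [3, 0, 1], [1, 1, -1]])
A_inv = np.linalg.inv(A)

# a valid inverse multiplies back to the identity, up to rounding
identity_ok = bool(np.allclose(A @ A_inv, np.eye(3)))

# the condition number is moderate for A and enormous for the singular matrix
singular = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
cond_A = np.linalg.cond(A)
cond_singular = np.linalg.cond(singular)

print('A @ inv(A) is the identity:', identity_ok)
print('cond(A):', cond_A)
print('cond(singular matrix):', cond_singular)
```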
solving systems of linear equations - Ax = b : solve for x
A = np.array([[2,1,-2],[3,0,1],[1,1,-1]])
b = np.transpose(np.array([[-3,5,-2]]))
print('\nA:\n', A)
print('\nb:\n', b)
x = np.linalg.solve (A, b)
print ('\nAx = b. x =:\n', x)
A:
[[ 2 1 -2]
[ 3 0 1]
[ 1 1 -1]]
b:
[[-3]
[ 5]
[-2]]
Ax = b. x =:
[[ 1.]
[-1.]
[ 2.]]
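The solution can be verified by substituting it back into Ax = b. The sketch also computes x through the explicit inverse for comparison. Calling np.linalg.solve directly is generally the preferred route, since it avoids forming the inverse.

```python
import numpy as np

A = np.array([[2, 1, -2], [3, 0, 1], [1, 1, -1]])
b = np.array([[-3], [5], [-2]])

x = np.linalg.solve(A, b)

# substitute back: A @ x should reproduce b up to rounding
residual_ok = bool(np.allclose(A @ x, b))

# same answer via the explicit inverse, shown only for comparison
x_via_inv = np.linalg.inv(A) @ b
same_answer = bool(np.allclose(x, x_via_inv))

print('x:\n', x)
print('A @ x == b:', residual_ok)
print('solve and inv agree:', same_answer)
```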
H. Random Sampling from Distributions
TODO Note:
- randn generates samples from the standard normal distribution, while numpy.random.rand samples from the uniform distribution on [0, 1).
- np.cumsum (np.random.randn(10,1)) - plot
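A minimal sketch of the two notes above, without the plot. The seed value 0 is an arbitrary choice made only so the run is repeatable. Note that np.cumsum flattens its input when no axis is given, so the (10, 1) array becomes a 1-D running sum of length 10.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for repeatability

u = np.random.rand(1000)   # uniform samples on [0, 1)
z = np.random.randn(1000)  # standard normal samples (mean 0, variance 1)

u_in_range = bool((u >= 0).all() and (u < 1).all())
z_has_negatives = bool((z < 0).any())

# running sum of normal steps (a simple random walk);
# with no axis argument the (10, 1) input is flattened to shape (10,)
steps = np.random.randn(10, 1)
walk = np.cumsum(steps)

print('rand stays in [0, 1):', u_in_range)
print('randn produces negative values:', z_has_negatives)
print('walk shape:', walk.shape)
```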
i. np.random.randn and np.random.standard_normal
Specific Normal Distribution
- mean = 0
- variance = 1
randn
backward compatibility with MATLAB
- takes dimensions as individual parameters
randn (n) returns a 1D ndarray
randn (3)
array([-0.69781929, 0.76250905, -0.00100293])
randn (m, n) returns a 2D (m x n) ndarray
randn (3, 2)
array([[ 0.01257397, -1.09267948],
[ 0.43461778, -0.9010176 ],
[ 0.94876503, -0.06544939]])
randn (i, j, k) returns a 3D (i x j x k) ndarray
randn (2, 3, 4)
array([[[-0.18420962, -2.80150569, -1.94776301, 0.58616938],
[-0.16765517, 0.0843139 , 0.88771571, 0.05693744],
[ 0.88764414, -0.92584994, 0.96424221, 2.3480603 ]],
[[-0.14195735, -0.03706071, 0.19416724, -1.05178575],
[-0.66259882, -1.4020511 , 0.87980418, -0.7594163 ],
[-0.07895493, 0.68616642, -1.58868401, 1.62971673]]])
plot
sns.kdeplot (randn (500))
plt.show ()
standard_normal
NumPy-centric
- takes dimensions as a tuple
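- This matches the convention of functions such as np.zeros, where passing the shape as a single argument leaves room for other parameters like dtype and order; np.random.standard_normal itself accepts only size.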
standard_normal ((m, n)) returns a 2D (m x n) ndarray
standard_normal ((2, 4))
array([[-1.86646271, -0.19924265, 1.13467334, -0.17763385],
[-0.21805904, -0.47804114, 0.5908614 , 1.49768637]])
plot
sns.kdeplot (standard_normal (500))
plt.show ()
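Apart from how the dimensions are passed, the two functions draw from the same generator. A small sketch that resets the global seed before each call and compares the results; the seed value 42 is arbitrary.

```python
import numpy as np
from numpy.random import randn, standard_normal

np.random.seed(42)            # arbitrary seed
a = randn(2, 4)               # dimensions as separate arguments

np.random.seed(42)            # reset to the same state
b = standard_normal((2, 4))   # dimensions as one tuple

same_values = np.array_equal(a, b)

print('shapes:', a.shape, b.shape)
print('identical samples under the same seed:', same_values)
```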
ii. np.random.normal
Generic Normal Distribution
- mean: loc =
- standard deviation: scale = (the variance is scale squared)
NumPy-centric
- takes dimensions as a tuple
- This allows other parameters like loc and scale to be passed to the function as well.
normal (loc = , scale = , size = (m, n)) returns a 2D (m x n) ndarray with values having ‘loc’ as mean, and ‘scale’ as standard deviation
normal (loc=15.0, scale=5.0, size=(2,3))
array([[ 7.8948123 , 4.39815973, 9.39660432],
[20.40527397, 19.64774283, 9.2247971 ]])
plot
sns.kdeplot (normal (loc = 15.0, scale = 5.0, size = 500))
plt.show ()
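A quick empirical check of what loc and scale control. In np.random.normal, scale is the standard deviation, so the sample variance comes out near scale squared. The sample size of 100,000 and the seed value are arbitrary choices for this sketch.

```python
import numpy as np
from numpy.random import normal

np.random.seed(0)  # arbitrary seed, only for repeatability

samples = normal(loc=15.0, scale=5.0, size=100_000)

sample_mean = samples.mean()  # close to loc = 15
sample_std = samples.std()    # close to scale = 5
sample_var = samples.var()    # close to scale ** 2 = 25

print('sample mean:', sample_mean)
print('sample standard deviation:', sample_std)
print('sample variance:', sample_var)
```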
iii. np.random.seed and np.random.RandomState
https://stackoverflow.com/questions/5836335/consistently-create-same-random-numpy-array/5837352#5837352