Introduction to NumPy¶

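NumPy is the fundamental package for scientific computing with Python. Among other things, it contains:

a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities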

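Besides its obvious scientific uses, NumPy can also be used as an efficient multidimensional container of generic data. Arbitrary data types can be defined, which allows NumPy to integrate seamlessly and speedily with a wide variety of databases.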
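As a rough illustration of "arbitrary data types", the sketch below defines a structured dtype. The field names (name, age, weight) and the two records are invented for the example and are not part of the original tutorial.

```python
import numpy as np

# Hypothetical record layout: a short string, a 32-bit integer, and a 64-bit float.
person = np.dtype([('name', 'U10'), ('age', np.int32), ('weight', np.float64)])
people = np.array([('Alice', 30, 55.5), ('Bob', 41, 72.0)], dtype=person)

print(people['name'])         # accessing a field returns an ordinary array for that column
print(people['age'].mean())   # fields support the usual array methods
```

Each element of people is one record, so the array has shape (2,) even though every record holds three values.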

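Installation¶

There are several ways to install NumPy, such as building from source or installing from a wheel file, but the most convenient and most common way is pip: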

pip install numpy

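Getting Started¶

Prerequisites¶

Before reading this tutorial, you should know a bit of Python. If you would like to refresh your memory, take a look at the Python tutorial.

If you wish to work through the examples in this tutorial, you must also have some software installed on your computer. For instructions, see http://scipy.org/install.html.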

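The Basics¶

NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.
For example, the coordinates of a point in 3D space, [1, 2, 1], have one axis. That axis has 3 elements, so we say it has a length of 3. In the example below, the array has 2 axes. The first axis has a length of 2 and the second axis has a length of 3.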

In [11]:

import numpy as np
a=np.array([
    [1,2,3],
    [4,5,6]
])
print('Length of the first axis: {}'.format(len(a)))
print('Length of the second axis: {}'.format(len(a[0])))

Length of the first axis: 2
Length of the second axis: 3

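NumPy's array class is called ndarray. It is also known by the alias array. Note that numpy.array is not the same as the standard Python library class array.array, which only handles one-dimensional arrays and offers less functionality. The more important attributes of an ndarray object are: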

ndarray.ndim¶

The number of axes (dimensions) of the array. In the Python world, the number of dimensions is referred to as rank.

ndarray.shape¶

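The dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.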

ndarray.size¶

The total number of elements of the array. This is equal to the product of the elements of shape.

ndarray.dtype¶

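An object describing the type of the elements in the array. One can create or specify a dtype using standard Python types. Additionally, NumPy provides types of its own, for example numpy.int32, numpy.int16, and numpy.float64.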

ndarray.itemsize¶

The size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.

ndarray.data¶

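The buffer containing the actual elements of the array. Normally, we won't need to use this attribute because we will access the elements in an array using indexing.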

The two examples below illustrate these attributes; compare how they change between the two arrays.

In [12]:

import numpy as np
a=np.array([
    [1,2,3],
    [4,5,6]
])
print('ndarray.ndim:',a.ndim)
print('ndarray.shape:',a.shape)
print('ndarray.size:',a.size)
print('ndarray.dtype:',a.dtype)
print('ndarray.itemsize:',a.itemsize)

ndarray.ndim: 2
ndarray.shape: (2, 3)
ndarray.size: 6
ndarray.dtype: int32
ndarray.itemsize: 4

In [13]:

import numpy as np
a=np.arange(32).reshape(4,4,2)
print(a)
print('ndarray.ndim:',a.ndim)
print('ndarray.shape:',a.shape)
print('ndarray.size:',a.size)
print('ndarray.dtype:',a.dtype)
print('ndarray.itemsize:',a.itemsize)

[[[ 0  1]
  [ 2  3]
  [ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]
  [12 13]
  [14 15]]

 [[16 17]
  [18 19]
  [20 21]
  [22 23]]

 [[24 25]
  [26 27]
  [28 29]
  [30 31]]]
ndarray.ndim: 3
ndarray.shape: (4, 4, 2)
ndarray.size: 32
ndarray.dtype: int32
ndarray.itemsize: 4

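Array Creation¶

There are several ways to create arrays. For example, you can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequence.

Creating from Python data¶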

In [14]:

import numpy as np
a=np.array([1,2,3,4,5])
print('a is :',a)
print('a\'s type is :',type(a))
print('a\'s dtype is :',a.dtype)

a is : [1 2 3 4 5]
a's type is : <class 'numpy.ndarray'>
a's dtype is : int32

A frequent error is calling array with multiple numeric arguments, rather than providing a single list of numbers as the argument.

a = np.array(1,2,3,4)    # WRONG
a = np.array([1,2,3,4])  # RIGHT
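Two related behaviours of the array function, sketched here with made-up values: nested sequences are turned into multidimensional arrays, and the element type can be requested explicitly with the dtype argument instead of being deduced.

```python
import numpy as np

# A sequence of sequences becomes a two-dimensional array;
# mixing floats and ints deduces a floating-point dtype.
b = np.array([(1.5, 2, 3), (4, 5, 6)])
print(b.shape, b.dtype)

# The dtype can also be specified at creation time.
c = np.array([[1, 2], [3, 4]], dtype=complex)
print(c.dtype)
```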

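Creating placeholder arrays with NumPy¶

Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the need to grow arrays, which is an expensive operation.

The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array whose initial content is uninitialized and depends on the state of the memory. By default, the dtype of the created array is float64.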

In [15]:

import numpy as np
a=np.zeros((4,2))
b=np.ones((4,2))
c=np.empty((4,2))
print('result of np.zeros((4,2)) is :\n',a,'\n')
print('result of np.ones((4,2)) is :\n',b,'\n')
print('result of np.empty((4,2)) is :\n',c,'\n')

result of np.zeros((4,2)) is :
 [[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]] 

result of np.ones((4,2)) is :
 [[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]] 

result of np.empty((4,2)) is :
 [[0.00000000e+000 0.00000000e+000]
 [0.00000000e+000 0.00000000e+000]
 [0.00000000e+000 6.18570189e-321]
 [3.11510340e-307 3.11521375e-307]]
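The float64 default can be overridden. The short sketch below passes dtype to zeros and uses ones_like (one of the functions listed under "See also") to copy the shape and dtype of an existing array; the variable names are only illustrative.

```python
import numpy as np

counts = np.zeros((2, 3), dtype=np.int16)   # integer zeros instead of float64
template = np.arange(6).reshape(2, 3)
same_shape = np.ones_like(template)         # takes shape and dtype from template

print(counts.dtype)
print(same_shape.shape, same_shape.dtype == template.dtype)
```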

Creating with the arange function¶

To create sequences of numbers, NumPy provides a function analogous to range that returns arrays instead of lists.

In [16]:

import numpy as np
a=np.arange(10,30,5)  # start at 10, stop before 30, step 5
print('a is :',a)
print('a\'s type is :',type(a))
print('a\'s dtype is :',a.dtype)

a is : [10 15 20 25]
a's type is : <class 'numpy.ndarray'>
a's dtype is : int32

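When arange is used with floating point arguments, it is generally not possible to predict the number of elements obtained, because floating point precision is finite. For this reason, it is usually better to use the function linspace, which receives as an argument the number of elements that we want, instead of the step: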

In [17]:

import numpy as np
from numpy import pi
a=np.linspace(0,2,9)  # 9 numbers from 0 to 2
b=np.linspace( 0, 2*pi, 10 )  # useful for evaluating a function at many points
c=np.sin(b)
print('a is :\n',a,'\n')
print('a\'s type is :',type(a),'\n')
print('a\'s dtype is :',a.dtype,'\n')

print('b is :\n',b,'\n')
print('b\'s type is :',type(b),'\n')
print('b\'s dtype is :',b.dtype,'\n')

print('c is :\n',c,'\n')
print('c\'s type is :',type(c),'\n')
print('c\'s dtype is :',c.dtype,'\n')

a is :
 [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ] 

a's type is : <class 'numpy.ndarray'> 

a's dtype is : float64 

b is :
 [0.         0.6981317  1.3962634  2.0943951  2.7925268  3.4906585
 4.1887902  4.88692191 5.58505361 6.28318531] 

b's type is : <class 'numpy.ndarray'> 

b's dtype is : float64 

c is :
 [ 0.00000000e+00  6.42787610e-01  9.84807753e-01  8.66025404e-01
  3.42020143e-01 -3.42020143e-01 -8.66025404e-01 -9.84807753e-01
 -6.42787610e-01 -2.44929360e-16] 

c's type is : <class 'numpy.ndarray'> 

c's dtype is : float64
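To make the arange caveat concrete, here is a small comparison with arbitrarily chosen bounds. The length of the arange result depends on how (stop - start)/step rounds in floating point, so it is printed rather than assumed, while linspace returns exactly the requested number of points.

```python
import numpy as np

start, stop, step = 0.1, 0.4, 0.1
by_step = np.arange(start, stop, step)   # element count depends on floating-point rounding
by_count = np.linspace(0.1, 0.3, 3)      # exactly 3 points, both endpoints included

print(len(by_step), by_step)
print(len(by_count), by_count)
```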

See also:¶

array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace, numpy.random.rand, numpy.random.randn, fromfunction, fromfile

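Printing Arrays¶

When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:

the last axis is printed from left to right,
the second-to-last is printed from top to bottom,
the rest are also printed from top to bottom, with each slice separated from the next by an empty line.

One-dimensional arrays are therefore printed as rows, two-dimensional arrays as matrices, and three-dimensional arrays as lists of matrices.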

In [18]:

a = np.arange(6)
print('1D array:')
print(a)
b = np.arange(12).reshape(4,3)
print('2D array:')
print(b)
c = np.arange(24).reshape(2,3,4)
print('3D array:')
print(c)

1D array:
[0 1 2 3 4 5]
2D array:
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
3D array:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

In [19]:

# If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners:
print(np.arange(10000))
print(np.arange(10000).reshape(100,100))

[   0    1    2 ... 9997 9998 9999]
[[   0    1    2 ...   97   98   99]
 [ 100  101  102 ...  197  198  199]
 [ 200  201  202 ...  297  298  299]
 ...
 [9700 9701 9702 ... 9797 9798 9799]
 [9800 9801 9802 ... 9897 9898 9899]
 [9900 9901 9902 ... 9997 9998 9999]]

In [20]:

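# To disable this behaviour and force NumPy to print the entire array, change the printing options with set_printoptions.
import sys
np.set_printoptions(threshold=sys.maxsize)  # recent NumPy versions reject threshold=np.nan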

Basic Operations¶

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

In [21]:

import numpy as np
a=np.array([1,2,3,4])
b=np.arange(4)
c=a-b
d=a**2
e=10*np.sin(a)
f=a<3
print('a is:\n',a,'\n')
print('b is:\n',b,'\n')
print('c(a-b) is:\n',c,'\n')
print('d(a**2) is:\n',d,'\n')
print('e(10*np.sin(a)) is:\n',e,'\n')
print('f(a<3) is:\n',f,'\n')

a is:
 [1 2 3 4] 

b is:
 [0 1 2 3] 

c(a-b) is:
 [1 1 1 1] 

d(a**2) is:
 [ 1  4  9 16] 

e(10*np.sin(a)) is:
 [ 8.41470985  9.09297427  1.41120008 -7.56802495] 

f(a<3) is:
 [ True  True False False]

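Unlike in many matrix languages, the product operator * operates elementwise on NumPy arrays; this is also known as the Hadamard product. The matrix product can be performed using the @ operator (in Python >= 3.5) or the dot function or method: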

In [22]:

a=np.array( [[1,1],[0,1]])
b=np.array( [[2,0],[3,4]])
c=a*b  # elementwise product
d=a@b  # matrix product
e=a.dot(b)  # another way to compute the matrix product
print('a is:\n',a,'\n')
print('b is:\n',b,'\n')
print('c(a*b) is:\n',c,'\n')
print('d(a@b) is:\n',d,'\n')
print('e(a.dot(b)) is:\n',e,'\n')

a is:
 [[1 1]
 [0 1]] 

b is:
 [[2 0]
 [3 4]] 

c(a*b) is:
 [[2 0]
 [0 4]] 

d(a@b) is:
 [[5 4]
 [3 4]] 

e(a.dot(b)) is:
 [[5 4]
 [3 4]]

Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.

In [26]:

a = np.ones((2,3), dtype=int)
print('a is :\n',a,'\n')
a *= 3
print('a *= 3 is :\n',a,'\n')
b = np.random.random((2,3))
print('b is :\n',b,'\n')
b +=a
print('b +=a is :\n',b,'\n')

a += b                  # raises: the float64 result cannot be cast to a's integer dtype

a is :
 [[1 1 1]
 [1 1 1]] 

a *= 3 is :
 [[3 3 3]
 [3 3 3]] 

b is :
 [[0.49859704 0.93300205 0.03490493]
 [0.04724951 0.45715042 0.38407476]] 

b +=a is :
 [[3.49859704 3.93300205 3.03490493]
 [3.04724951 3.45715042 3.38407476]]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-b1ab720e80e5> in <module>
      8 print('b +=a is :\n',b,'\n')
      9 
---> 10 a += b                  # raises: the float64 result cannot be cast to a's integer dtype

TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int32') with casting rule 'same_kind'
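If the integer result is really wanted, the cast has to be requested explicitly. The sketch below shows two possible ways, using fixed float values in place of the random ones from the cell above; which one is appropriate depends on whether a new array is acceptable.

```python
import numpy as np

a = np.full((2, 3), 3)             # integer array, like a after a *= 3
b = np.array([[0.5, 0.9, 0.1],
              [0.2, 0.4, 0.7]])    # fixed floats standing in for the random values

# Option 1: compute a new float array, then convert it explicitly.
# astype truncates the fractional part when converting to an integer dtype.
a_int = (a + b).astype(a.dtype)

# Option 2: stay in place, but opt in to the lossy cast.
np.add(a, b, out=a, casting='unsafe')

print(a_int)
print(a)
```

Both variants discard the fractional part, which is exactly the loss of information that the default 'same_kind' casting rule refuses to perform silently.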

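When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one (a behavior known as upcasting).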

In [30]:

a = np.ones(3, dtype=np.int32)
b = np.linspace(0,pi,3)
c = a+b
d = np.exp(c*1j)
print('a.dtype.name is ',a.dtype.name)
print('b.dtype.name is ',b.dtype.name)
print('c.dtype.name is ',c.dtype.name)
print('d.dtype.name is ',d.dtype.name)

a.dtype.name is  int32
b.dtype.name is  float64
c.dtype.name is  float64
d.dtype.name is  complex128
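The resulting type can also be queried without building any arrays. As a small aside, np.result_type reports the dtype NumPy would choose when combining the given types:

```python
import numpy as np

print(np.result_type(np.int32, np.float64))        # integer combined with float
print(np.result_type(np.float64, np.complex128))   # float combined with complex
print((np.ones(3, dtype=np.int32) + 1.5).dtype)    # int32 array plus a Python float
```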

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class.

In [34]:

a = np.random.random((2,3))
# Names chosen to avoid shadowing the built-ins sum, min, and max.
total=a.sum()
smallest=a.min()
largest=a.max()
print('a is:\n ',a,'\n')
print('sum(a.sum()) is:\n',total,'\n')
print('min(a.min()) is:\n',smallest,'\n')
print('max(a.max()) is:\n',largest,'\n')

a is:
  [[0.50477827 0.7117427  0.02840961]
 [0.96608057 0.02399329 0.67794353]] 

sum(a.sum()) is:
 2.9129479747742897 

min(a.min()) is:
 0.02399329487095969 

max(a.max()) is:
 0.9660805718319886

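By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array: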

In [37]:

a = np.arange(12).reshape(3,4)
col_sum=a.sum(axis=0)  # sum of each column
row_min=a.min(axis=1)  # min of each row
row_cumsum=a.cumsum(axis=1)  # cumulative sum along each row
print('a is:\n ',a,'\n')
print('a.sum(axis=0) is:\n',col_sum,'\n')
print('a.min(axis=1) is:\n',row_min,'\n')
print('a.cumsum(axis=1) is:\n',row_cumsum,'\n')

a is:
  [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]] 

a.sum(axis=0) is:
 [12 15 18 21] 

a.min(axis=1) is:
 [0 4 8] 

a.cumsum(axis=1) is:
 [[ 0  1  3  6]
 [ 4  9 15 22]
 [ 8 17 27 38]]

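Universal Functions¶

NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called "universal functions" (ufunc). They operate elementwise on an array, producing an array as output.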

In [40]:

a=np.arange(3)
b=np.exp(a)
c=np.sqrt(a)
print('a is:\n ',a,'\n')
print('b is:\n ',b,'\n')
print('c is:\n ',c,'\n')

a is:
  [0 1 2] 

b is:
  [1.         2.71828183 7.3890561 ] 

c is:
  [0.         1.         1.41421356]
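Ufuncs are not limited to one input. As a brief sketch with made-up values, np.add and np.multiply are the ufuncs behind the + and * operators, and they accept an optional out argument that writes the result into an existing array:

```python
import numpy as np

a = np.arange(3)
b = np.array([2., -1., 4.])

c = np.add(a, b)              # same operation as a + b
out = np.empty(3)
np.multiply(a, b, out=out)    # result is stored in the preallocated array

print(c)
print(out)
```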

See also: all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where

Indexing, Slicing and Iterating¶

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

In [48]:

a = np.arange(10)**3
print('a is:\n ',a,'\n')
print('a[2] is:',a[2],'\n')
print('a[2:5] is:',a[2:5],'\n')
a[:6:2] = -1000# equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
print('a is:\n ',a,'\n')
print('a[ : :-1] is:\n ',a[ : :-1],'\n')  # a reversed
for i in a:
    print(i**(1/3.))  # a negative base with a fractional exponent yields nan and a RuntimeWarning

a is:
  [  0   1   8  27  64 125 216 343 512 729] 

a[2] is: 8 

a[2:5] is: [ 8 27 64] 

a is:
  [-1000     1 -1000    27 -1000   125   216   343   512   729] 

a[ : :-1] is:
  [  729   512   343   216   125 -1000    27 -1000     1 -1000] 

nan
1.0
nan
3.0
nan
5.0
5.999999999999999
6.999999999999999
7.999999999999999
8.999999999999998

d:\testproj\test_python\zhyblog\venv\lib\site-packages\ipykernel_launcher.py:9: RuntimeWarning: invalid value encountered in power
  if __name__ == '__main__':
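The nan values above come from raising the negative entries (-1000) to a fractional power, not from the notebook environment. One way to get real cube roots for negative inputs is the np.cbrt ufunc, sketched here on the same array:

```python
import numpy as np

a = np.arange(10)**3
a[:6:2] = -1000          # same modification as in the cell above

roots = np.cbrt(a)       # real cube root, defined for negative inputs as well
print(roots)
```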

Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:

In [54]:

def f(x,y):
    return 10*x+y
a = np.fromfunction(f,(5,4),dtype=int)
print('a is:\n ',a,'\n')
print('a[2,3] is ',a[2,3])
print('a[0:5,1] is ',a[0:5,1])# each row in the second column of a
print('a[ : ,1] is ',a[ : ,1]) # equivalent to the previous example
print('a[1:3, : ] is \n',a[1:3, : ])# each column in the second and third row of a

a is:
  [[ 0  1  2  3]
 [10 11 12 13]
 [20 21 22 23]
 [30 31 32 33]
 [40 41 42 43]] 

a[2,3] is  23
a[0:5,1] is  [ 1 11 21 31 41]
a[ : ,1] is  [ 1 11 21 31 41]
a[1:3, : ] is 
 [[10 11 12 13]
 [20 21 22 23]]

When fewer indices are provided than the number of axes, the missing indices are treated as complete slices (:):

In [57]:

a[-1]

Out[57]:

array([40, 41, 42, 43])

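For an array b, the expression i within the brackets in b[i] is treated as an i followed by as many instances of : as needed to represent the remaining axes. NumPy also allows you to write this using three dots, as b[i,...].
The three dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an array of rank 5 (i.e., it has 5 axes), then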

x[1,2,...] is equivalent to x[1,2,:,:,:]
x[...,3] is equivalent to x[:,:,:,:,3]
x[4,...,5,:] is equivalent to x[4,:,:,5,:]

In [64]:

c = np.array( [[[  0,  1,  2],               # a 3D array (two stacked 2D arrays)
                 [ 10, 12, 13]],
                [[100,101,102],
                 [110,112,113]]])
print('c.shape is ',c.shape,'\n')
print('c[1,...] is :\n',c[1,...],'\n')                                   # same as c[1,:,:] or c[1]
print('c[...,2] is :\n',c[...,2],'\n')                                   # same as c[:,:,2]

c.shape is  (2, 2, 3) 

c[1,...] is :
 [[100 101 102]
 [110 112 113]] 

c[...,2] is :
 [[  2  13]
 [102 113]]
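The equivalences stated for a rank-5 array can be checked directly. The sketch below uses an arbitrary array with shape (2, 3, 4, 5, 6), so the indices are smaller than those in the text, but the pattern is the same.

```python
import numpy as np

x = np.arange(2*3*4*5*6).reshape(2, 3, 4, 5, 6)   # arbitrary array with 5 axes

print(np.array_equal(x[1, 2, ...], x[1, 2, :, :, :]))
print(np.array_equal(x[..., 3], x[:, :, :, :, 3]))
print(np.array_equal(x[1, ..., 4, :], x[1, :, :, 4, :]))
print(x[1, ..., 4, :].shape)   # the two integer indices remove axes 0 and 3
```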

Iterating over multidimensional arrays is done with respect to the first axis:

In [68]:

for row in c:
    print(row,'\n')

[[ 0  1  2]
 [10 12 13]] 

[[100 101 102]
 [110 112 113]]

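However, if you want to perform an operation on each element in the array, you can use the flat attribute, which is an iterator over all the elements of the array: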

In [69]:

for element in c.flat:
    print(element)
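Since the cell above shows no output, here is a self-contained sketch of what flat yields for the same array c: the elements come out in row-major order, and flat can also be indexed as if the array were one-dimensional.

```python
import numpy as np

c = np.array([[[0, 1, 2], [10, 12, 13]],
              [[100, 101, 102], [110, 112, 113]]])

elements = [int(e) for e in c.flat]   # visits every element, last axis varying fastest
print(elements)
print(c.flat[4])                      # one-dimensional index into the flattened order
```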

芒果python

NumPy Quick Start (Part 1)