This is the final part of the RUL series. In this section we predict the RUL values for the test set.

In [1]:

from rul_code import *
score = model.predict(X_test)
score[0:10]

Using TensorFlow backend.

D:\TestProj\test_python\zhyblog36\jupyter_data\c2_ml_series\RUL\rul_code.py:158: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.
  model.fit(X_train, Y_train, nb_epoch=20)

Epoch 1/20
20631/20631 [==============================] - 1s 56us/step - loss: 0.0493
Epoch 2/20
20631/20631 [==============================] - 1s 50us/step - loss: 0.0091
Epoch 3/20
20631/20631 [==============================] - 1s 57us/step - loss: 0.0083
Epoch 4/20
20631/20631 [==============================] - 1s 49us/step - loss: 0.0081
Epoch 5/20
20631/20631 [==============================] - 1s 50us/step - loss: 0.0080
Epoch 6/20
20631/20631 [==============================] - 1s 46us/step - loss: 0.0078
Epoch 7/20
20631/20631 [==============================] - 1s 47us/step - loss: 0.0077
Epoch 8/20
20631/20631 [==============================] - 1s 52us/step - loss: 0.0075
Epoch 9/20
20631/20631 [==============================] - 1s 49us/step - loss: 0.0070
Epoch 10/20
20631/20631 [==============================] - 1s 49us/step - loss: 0.0064
Epoch 11/20
20631/20631 [==============================] - 1s 51us/step - loss: 0.0053
Epoch 12/20
20631/20631 [==============================] - 1s 51us/step - loss: 0.0048
Epoch 13/20
20631/20631 [==============================] - 1s 50us/step - loss: 0.0047
Epoch 14/20
20631/20631 [==============================] - 1s 54us/step - loss: 0.0047
Epoch 15/20
20631/20631 [==============================] - 1s 51us/step - loss: 0.0047
Epoch 16/20
20631/20631 [==============================] - 1s 50us/step - loss: 0.0047
Epoch 17/20
20631/20631 [==============================] - 1s 54us/step - loss: 0.0047
Epoch 18/20
20631/20631 [==============================] - 1s 50us/step - loss: 0.0046
Epoch 19/20
20631/20631 [==============================] - 1s 51us/step - loss: 0.0047
Epoch 20/20
20631/20631 [==============================] - 1s 56us/step - loss: 0.0046

Out[1]:

array([[0.98152685],
       [0.99618113],
       [1.0054567 ],
       [0.96753395],
       [0.98038983],
       [0.95989645],
       [0.95633876],
       [0.9598727 ],
       [0.99037075],
       [0.94166225]], dtype=float32)

Almost all of the predicted values fall within the interval from 0 to 1, with a few exceptions.

In [2]:

print(score.min(), score.max())

0.00066646934 1.0319034

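We need to convert these scores into values that express the remaining useful life (RUL) of the equipment. First, we build a column that records the maximum observed cycle count of each unit in the test set.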

In [3]:

test = pd.merge(test, test.groupby('unit', as_index=False)['cycles'].max(), how='left', on='unit')
test.rename(columns={"cycles_x": "cycles", "cycles_y": "maxcycles"}, inplace=True)
test['score'] = score
test.head()

Out[3]:

	unit	cycles	op_setting1	op_setting2	s2	s3	s4	s6	s7	s8	...	s11	s12	s13	s14	s15	s17	s20	s21	maxcycles	score
0	1	1	0.0023	0.0003	643.02	1585.29	1398.21	21.61	553.90	2388.04	...	47.20	521.72	2388.03	8125.55	8.4052	392	38.86	23.3735	31	0.981527
1	1	2	-0.0027	-0.0003	641.71	1588.45	1395.42	21.61	554.85	2388.01	...	47.50	522.16	2388.06	8139.62	8.3803	393	39.02	23.3916	31	0.996181
2	1	3	0.0003	0.0001	642.46	1586.94	1401.34	21.61	554.11	2388.05	...	47.50	521.97	2388.03	8130.10	8.4441	393	39.08	23.4166	31	1.005457
3	1	4	0.0042	0.0000	642.44	1584.12	1406.42	21.61	554.07	2388.03	...	47.28	521.38	2388.05	8132.90	8.3917	391	39.00	23.3737	31	0.967534
4	1	5	0.0014	0.0000	642.51	1587.19	1401.92	21.61	554.16	2388.01	...	47.31	522.15	2388.03	8129.54	8.4031	390	38.99	23.4130	31	0.980390

5 rows × 21 columns
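As a side note, the same `maxcycles` column can probably be built without a merge and rename. The sketch below uses a small made-up DataFrame (the `toy` name and its values are assumptions for illustration, not the real test set) and `groupby(...).transform("max")`, which broadcasts each unit's maximum back onto its own rows.

```python
import pandas as pd

# Toy data (hypothetical): unit 1 observed for 3 cycles, unit 2 for 2 cycles.
toy = pd.DataFrame({
    "unit":   [1, 1, 1, 2, 2],
    "cycles": [1, 2, 3, 1, 2],
})

# transform("max") returns one value per row (the unit's maximum cycle),
# so no merge or column renaming is needed.
toy["maxcycles"] = toy.groupby("unit")["cycles"].transform("max")
print(toy)
```

Either approach should give the same column; the merge used above just makes the intermediate table explicit.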

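Note that the test set file contains only unfiltered, raw data, whereas modelling and prediction use the cleaned data. A second point: to predict RUL, we first need to predict the total number of life cycles for every unit in the test set, using the following formula:
$$\max(\text{predictedcycles}_i)=\frac{\text{cycles}_i}{1-\text{predictedfTTF}_i}$$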

In [5]:

def totcycles(data):
    return(data['cycles'] / (1-data['score']))
    
test['maxpredcycles'] = totcycles(test)
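To make the formula concrete, here is a small sketch that applies it to the first three rows of unit 1, with `cycles`, `maxcycles` and `score` copied from the table above. The `rows` DataFrame is only an illustration and is not part of the original pipeline.

```python
import pandas as pd

# First three rows of unit 1, values copied from the table above.
rows = pd.DataFrame({
    "cycles":    [1, 2, 3],
    "maxcycles": [31, 31, 31],
    "score":     [0.981527, 0.996181, 1.005457],
})

# Same formula as totcycles: cycles / (1 - predicted fTTF).
rows["maxpredcycles"] = rows["cycles"] / (1 - rows["score"])
print(rows)
```

The third row has a score slightly above 1, so the denominator is negative and the predicted total cycle count comes out negative as well.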

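Subtracting the unit's maximum observed cycle count in the test set from the predicted maximum cycle count gives the remaining useful life:
$$\text{RUL}_i=\max(\text{predictedcycles}_i)-\max(\text{cycles}_i)$$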

In [6]:

def RULfunction(data):
    return(data['maxpredcycles'] - data['maxcycles'])

test['RUL'] = RULfunction(test)
test['RUL'].head()

Out[6]:

0     23.132624
1    492.715187
2   -580.784244
3     92.205611
4    223.969788
Name: RUL, dtype: float64
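The negative value in row 2 comes from a score slightly above 1, which makes `1 - score` negative. One possible guard, which is not part of the original code, is to clip the score before dividing. The function name `rul_clipped` and the tolerance `eps` below are assumptions chosen purely for illustration.

```python
import numpy as np

def rul_clipped(cycles, maxcycles, score, eps=1e-3):
    """RUL from the two formulas above, with the score clipped to [0, 1 - eps].

    eps is a hypothetical tolerance; it caps how large the predicted
    total cycle count can become when the score approaches 1.
    """
    s = np.clip(np.asarray(score, dtype=float), 0.0, 1.0 - eps)
    maxpred = np.asarray(cycles, dtype=float) / (1.0 - s)
    return maxpred - np.asarray(maxcycles, dtype=float)

# Same three rows of unit 1 as in the table above.
rul = rul_clipped([1, 2, 3], [31, 31, 31], [0.981527, 0.996181, 1.005457])
print(rul)
```

With clipping the third value becomes large and positive instead of negative. The choice of `eps` strongly affects such rows, so it would need to be tuned rather than taken from this sketch.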

Next, we take the RUL computed at each unit's last observed cycle as that unit's predicted remaining useful life.

In [9]:

# Positional index of the 'RUL' column, needed for .iloc
t = test.columns == 'RUL'
ind = [i for i, x in enumerate(t) if x]

predictedRUL = []

# For each unit, pick the RUL on the row of its last observed cycle
for i in range(test.unit.min(), test.unit.max()+1):
    npredictedRUL=test[test.unit==i].iloc[test[test.unit==i].cycles.max()-1,ind]
    predictedRUL.append(npredictedRUL)
    
predictedRUL[0:10]

Out[9]:

[RUL    192.795839
 Name: 30, dtype: float64, RUL    168.883901
 Name: 79, dtype: float64, RUL    58.486248
 Name: 205, dtype: float64, RUL    80.724052
 Name: 311, dtype: float64, RUL    100.456993
 Name: 409, dtype: float64, RUL    112.270979
 Name: 514, dtype: float64, RUL    99.430293
 Name: 674, dtype: float64, RUL    65.31831
 Name: 840, dtype: float64, RUL    140.288526
 Name: 895, dtype: float64, RUL    89.190025
 Name: 1087, dtype: float64]
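The loop above returns a list of one-element Series. A more compact way to pick the last row per unit might look like the sketch below; the `toy` DataFrame and its values are made up for illustration and do not come from the real test set.

```python
import pandas as pd

# Toy data (hypothetical): RUL computed on every row of two units.
toy = pd.DataFrame({
    "unit":   [1, 1, 1, 2, 2],
    "cycles": [1, 2, 3, 1, 2],
    "RUL":    [50.0, 40.0, 30.0, 12.0, 8.0],
})

# Sort so that the last row in each group is the last observed cycle,
# then keep the last (non-null) RUL of every unit.
last_rul = toy.sort_values(["unit", "cycles"]).groupby("unit")["RUL"].last()
print(last_rul.tolist())
```

The result is a single Series indexed by unit, which is usually easier to plot and compare than a list of Series.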

In [8]:

len(predictedRUL)

Out[8]:

Next, we compare the predicted values with the true values graphically:

In [28]:

xtrueRUL = list(RUL.loc[:,0])
otrueRUL = []

for i in range(0,len(xtrueRUL)):
    otrueRUL = np.concatenate((otrueRUL, list(reversed(np.arange(xtrueRUL[i])))))

otrueRUL

Out[28]:

array([111., 110., 109., ...,   2.,   1.,   0.])

In [39]:

xpredictedRUL = list(round(x) for x in predictedRUL)
opredictedRUL = []

for i in range(0,len(xpredictedRUL)):
    opredictedRUL = np.concatenate((opredictedRUL, list(reversed(np.arange(xpredictedRUL[i]['RUL'])))))
opredictedRUL

Out[39]:

array([192., 191., 190., ...,   2.,   1.,   0.])
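Both cells above expand each RUL value r into a countdown r-1, r-2, ..., 0 and join the countdowns end to end. A minimal sketch of that step as one reusable function follows; the name `countdowns` is an assumption, and building all pieces before a single `np.concatenate` call avoids re-copying the growing array on every iteration.

```python
import numpy as np

def countdowns(ruls):
    """Concatenate, for each value r, the sequence r-1, r-2, ..., 0."""
    # np.arange(int(r)) gives 0..r-1; [::-1] reverses it into a countdown.
    parts = [np.arange(int(r))[::-1] for r in ruls]
    return np.concatenate(parts).astype(float) if parts else np.array([])

seq = countdowns([3, 2])
print(seq)
```

For the inputs 3 and 2 this produces the countdown 2, 1, 0 followed by 1, 0.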

In [40]:

mx = 1000

fig = plt.figure(figsize = (12, 8))
fig.add_subplot(1,2,1)
plt.plot(opredictedRUL[0:mx], color='blue')
plt.legend(['Predicted RUL'], bbox_to_anchor=(0., 1.02, 1., .102), loc=3, mode="expand", borderaxespad=0)
plt.ylim(0, opredictedRUL[0:mx].max()+10)
plt.ylabel('RUL (cycles)')

fig.add_subplot(1,2,2)
plt.plot(otrueRUL[0:mx], color='purple')
plt.legend(['True RUL'], bbox_to_anchor=(0., 1.02, 1., .102), loc=3, mode="expand", borderaxespad=0)
plt.ylabel('RUL (cycles)')
plt.ylim(0,otrueRUL[0:mx].max()+10)
plt.show()

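We can also put the predicted and true values in the same line chart. This is not entirely appropriate, because the x-axis is the unit index rather than time, but it still gives an intuitive sense of the difference between the two: the predicted values are usually larger than the actual values.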

In [42]:

plt.figure(figsize = (16, 8))
plt.plot(RUL)
plt.plot(predictedRUL)
plt.xlabel('# Unit', fontsize=16)
plt.xticks(fontsize=16)
plt.ylabel('RUL', fontsize=16)
plt.yticks(fontsize=16)
plt.legend(['True RUL','Predicted RUL'], bbox_to_anchor=(0., 1.02, 1., .102), loc=3, mode="expand", borderaxespad=0)
plt.show()
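The visual impression that predictions tend to exceed the true values can also be put into numbers. The sketch below uses made-up arrays (`true_rul` and `pred_rul` are hypothetical values, not the results of this notebook) to compute the mean error, the RMSE, and the share of units that are overestimated.

```python
import numpy as np

# Hypothetical values for three units, for illustration only.
true_rul = np.array([100.0, 80.0, 60.0])
pred_rul = np.array([110.0, 95.0, 55.0])

errors = pred_rul - true_rul             # positive means overestimate
mean_error = errors.mean()               # average bias
rmse = np.sqrt((errors ** 2).mean())     # root mean squared error
share_over = (errors > 0).mean()         # fraction of overestimated units

print(mean_error, rmse, share_over)
```

A positive mean error together with a share above 0.5 would support the observation made from the plot.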

In most areas of the mathematical sciences, a moderate degree of overestimation is considered acceptable.

芒果python

RUL, Final Part: Predicting Remaining Useful Life