RUL, Final Part: Predicting Remaining Useful Life

This is the final part of the RUL series. In this section we predict the RUL values for the test set.

In [1]:
from rul_code import *
score = model.predict(X_test)
score[0:10]
Using TensorFlow backend.
D:\TestProj\test_python\zhyblog36\jupyter_data\c2_ml_series\RUL\rul_code.py:158: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.
  model.fit(X_train, Y_train, nb_epoch=20)
Epoch 1/20
20631/20631 [==============================] - 1s 56us/step - loss: 0.0493
Epoch 2/20
20631/20631 [==============================] - 1s 50us/step - loss: 0.0091
Epoch 3/20
20631/20631 [==============================] - 1s 57us/step - loss: 0.0083
Epoch 4/20
20631/20631 [==============================] - 1s 49us/step - loss: 0.0081
Epoch 5/20
20631/20631 [==============================] - 1s 50us/step - loss: 0.0080
Epoch 6/20
20631/20631 [==============================] - 1s 46us/step - loss: 0.0078
Epoch 7/20
20631/20631 [==============================] - 1s 47us/step - loss: 0.0077
Epoch 8/20
20631/20631 [==============================] - 1s 52us/step - loss: 0.0075
Epoch 9/20
20631/20631 [==============================] - 1s 49us/step - loss: 0.0070
Epoch 10/20
20631/20631 [==============================] - 1s 49us/step - loss: 0.0064
Epoch 11/20
20631/20631 [==============================] - 1s 51us/step - loss: 0.0053
Epoch 12/20
20631/20631 [==============================] - 1s 51us/step - loss: 0.0048
Epoch 13/20
20631/20631 [==============================] - 1s 50us/step - loss: 0.0047
Epoch 14/20
20631/20631 [==============================] - 1s 54us/step - loss: 0.0047
Epoch 15/20
20631/20631 [==============================] - 1s 51us/step - loss: 0.0047
Epoch 16/20
20631/20631 [==============================] - 1s 50us/step - loss: 0.0047
Epoch 17/20
20631/20631 [==============================] - 1s 54us/step - loss: 0.0047
Epoch 18/20
20631/20631 [==============================] - 1s 50us/step - loss: 0.0046
Epoch 19/20
20631/20631 [==============================] - 1s 51us/step - loss: 0.0047
Epoch 20/20
20631/20631 [==============================] - 1s 56us/step - loss: 0.0046
Out[1]:
array([[0.98152685],
       [0.99618113],
       [1.0054567 ],
       [0.96753395],
       [0.98038983],
       [0.95989645],
       [0.95633876],
       [0.9598727 ],
       [0.99037075],
       [0.94166225]], dtype=float32)

Almost all of the values fall within the interval [0, 1], with a few exceptions.

In [2]:
print(score.min(), score.max())
0.00066646934 1.0319034
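
To see how many predictions fall outside [0, 1], one can count them directly. The sketch below uses only the ten scores printed in Out[1] as sample data; the names `sample_scores`, `outside` and `n_outside` are illustrative and not part of the original notebook.

```python
import numpy as np

# The first ten predictions shown in Out[1] (a sample only, not the full test set).
sample_scores = np.array([
    0.98152685, 0.99618113, 1.0054567, 0.96753395, 0.98038983,
    0.95989645, 0.95633876, 0.9598727, 0.99037075, 0.94166225,
], dtype=np.float32)

# Boolean mask of values outside the closed interval [0, 1].
outside = (sample_scores < 0) | (sample_scores > 1)

n_outside = int(outside.sum())
print(n_outside, "of", sample_scores.size, "sample scores lie outside [0, 1]")
```

Applied to the full `score` array, the same mask would show how many rows are affected by out-of-range predictions.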

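We need to convert these values into numbers that express the remaining useful life (RUL) of the equipment. First, we build a column that records the maximum observed cycle count of each unit in the test set.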

In [3]:
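# Attach each unit's last observed cycle count to every one of its rows.
# Both frames have a 'cycles' column, so merge suffixes them with _x and _y.
test = pd.merge(test, test.groupby('unit', as_index=False)['cycles'].max(), how='left', on='unit')
test.rename(columns={"cycles_x": "cycles", "cycles_y": "maxcycles"}, inplace=True)
# Store the model's prediction next to the row it was computed from.
test['score'] = score
test.head()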
Out[3]:
   unit  cycles  op_setting1  op_setting2      s2       s3       s4     s6      s7       s8  ...    s11     s12      s13      s14     s15  s17    s20      s21  maxcycles     score
0     1       1       0.0023       0.0003  643.02  1585.29  1398.21  21.61  553.90  2388.04  ...  47.20  521.72  2388.03  8125.55  8.4052  392  38.86  23.3735         31  0.981527
1     1       2      -0.0027      -0.0003  641.71  1588.45  1395.42  21.61  554.85  2388.01  ...  47.50  522.16  2388.06  8139.62  8.3803  393  39.02  23.3916         31  0.996181
2     1       3       0.0003       0.0001  642.46  1586.94  1401.34  21.61  554.11  2388.05  ...  47.50  521.97  2388.03  8130.10  8.4441  393  39.08  23.4166         31  1.005457
3     1       4       0.0042       0.0000  642.44  1584.12  1406.42  21.61  554.07  2388.03  ...  47.28  521.38  2388.05  8132.90  8.3917  391  39.00  23.3737         31  0.967534
4     1       5       0.0014       0.0000  642.51  1587.19  1401.92  21.61  554.16  2388.01  ...  47.31  522.15  2388.03  8129.54  8.4031  390  38.99  23.4130         31  0.980390

5 rows × 21 columns
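
The merge-and-rename step above can also be written with `groupby(...).transform('max')`, which returns one value per row and so avoids the suffixed column names. This is only an alternative sketch on a toy frame; the `toy` DataFrame and its values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the test set: two units observed for 3 and 2 cycles.
toy = pd.DataFrame({
    "unit":   [1, 1, 1, 2, 2],
    "cycles": [1, 2, 3, 1, 2],
})

# transform("max") broadcasts each group's maximum back onto its rows,
# aligned with the original index, so no merge or rename is needed.
toy["maxcycles"] = toy.groupby("unit")["cycles"].transform("max")
print(toy)
```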

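Note that the test set contains only unfiltered, raw data, whereas the model was built and its predictions were made on cleaned data. A second point: to predict RUL, we first need to predict the total number of cycles for every unit in the test set. We use the following formula, where $predictedfTTF_i$ is the model output stored in the `score` column: $$\max(predictedcycles_i)=\frac{cycles_i}{1-predictedfTTF_i}$$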
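
A short derivation may help explain where this formula comes from. It assumes that fTTF is (approximately) the fraction of life still remaining, equal to 1 at the first cycle and 0 at failure; that definition comes from earlier parts of the series and is not restated here, so treat it as an assumption. The symbol $T_i$ for the true total number of cycles is introduced only for this sketch.

```latex
\begin{aligned}
fTTF_i &\approx 1 - \frac{cycles_i}{T_i} \\
\Rightarrow\quad \frac{cycles_i}{T_i} &\approx 1 - fTTF_i \\
\Rightarrow\quad T_i &\approx \frac{cycles_i}{1 - fTTF_i}
\end{aligned}
```

Replacing $fTTF_i$ with the predicted value gives the predicted total $\max(predictedcycles_i)$. The denominator approaches zero as the predicted value approaches 1, so the estimate is very sensitive for rows early in a unit's life.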

In [5]:
def totcycles(data):
    # Predicted total cycles: cycles observed so far divided by (1 - predicted score).
    return data['cycles'] / (1 - data['score'])

test['maxpredcycles'] = totcycles(test)

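Subtracting the maximum observed cycle count in the test set from the predicted maximum cycle count gives the remaining useful life: $$RUL_i=\max(predictedcycles_i)-\max(cycles_i)$$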

In [6]:
def RULfunction(data):
    # Predicted total cycles minus the unit's last observed cycle count.
    return data['maxpredcycles'] - data['maxcycles']

test['RUL'] = RULfunction(test)
test['RUL'].head()
Out[6]:
0     23.132624
1    492.715187
2   -580.784244
3     92.205611
4    223.969788
Name: RUL, dtype: float64
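
As a sanity check, the first three RUL values can be reproduced by hand from the numbers already shown: unit 1 has `maxcycles = 31`, and its first three rows have cycles 1, 2, 3 with the scores printed in Out[1]. The variable names below are illustrative.

```python
# Values copied from the outputs above (unit 1, first three rows).
maxcycles = 31
rows = [
    (1, 0.98152685),
    (2, 0.99618113),
    (3, 1.0054567),   # score slightly above 1
]

ruls = []
for cycles, score in rows:
    maxpredcycles = cycles / (1 - score)   # predicted total cycles
    rul = maxpredcycles - maxcycles        # predicted remaining useful life
    ruls.append(rul)
    print(f"cycles={cycles} score={score:.6f} maxpredcycles={maxpredcycles:.2f} RUL={rul:.2f}")
```

The third row shows why a negative RUL appears: a score just above 1 makes the denominator negative. Scores just below 1 have the opposite effect and produce very large estimates.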

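Next, we use the predicted maximum cycle count to obtain one remaining-useful-life prediction per unit, taken at the unit's last observed cycle.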

In [9]:
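# Column position of 'RUL', kept as a list so iloc returns a one-element Series.
t = test.columns == 'RUL'
ind = [i for i, x in enumerate(t) if x]

predictedRUL = []

for i in range(test.unit.min(), test.unit.max()+1):
    # Take the RUL at the unit's last observed cycle.
    # Assumes each unit's rows run consecutively from cycle 1.
    unit_rows = test[test.unit==i]
    npredictedRUL = unit_rows.iloc[unit_rows.cycles.max()-1, ind]
    predictedRUL.append(npredictedRUL)

predictedRUL[0:10]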
Out[9]:
[RUL    192.795839
 Name: 30, dtype: float64, RUL    168.883901
 Name: 79, dtype: float64, RUL    58.486248
 Name: 205, dtype: float64, RUL    80.724052
 Name: 311, dtype: float64, RUL    100.456993
 Name: 409, dtype: float64, RUL    112.270979
 Name: 514, dtype: float64, RUL    99.430293
 Name: 674, dtype: float64, RUL    65.31831
 Name: 840, dtype: float64, RUL    140.288526
 Name: 895, dtype: float64, RUL    89.190025
 Name: 1087, dtype: float64]
In [8]:
len(predictedRUL)
Out[8]:
100
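
The per-unit loop can also be expressed without positional indexing by selecting, for each unit, the row with the largest cycle count. The sketch below uses a made-up `toy` frame with hypothetical RUL values. It returns a single Series indexed by unit rather than a list of one-element Series, so the later cells would need small adjustments to use it.

```python
import pandas as pd

# Toy stand-in for `test` with hypothetical RUL values.
toy = pd.DataFrame({
    "unit":   [1, 1, 1, 2, 2],
    "cycles": [1, 2, 3, 1, 2],
    "RUL":    [50.0, 40.0, 30.0, 12.0, 8.0],
})

# idxmax gives, per unit, the index label of the row with the largest cycle count.
last_rows = toy.loc[toy.groupby("unit")["cycles"].idxmax()]

# One predicted RUL per unit, indexed by unit number.
rul_per_unit = last_rows.set_index("unit")["RUL"]
print(rul_per_unit)
```

Unlike `iloc[cycles.max()-1]`, this does not rely on cycles being consecutive and starting at 1.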

Next, we compare the predicted values with the true values graphically:

In [28]:
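# True RUL values, one per unit.
xtrueRUL = list(RUL.loc[:,0])
otrueRUL = []

# For each unit, append a countdown n-1, ..., 1, 0 so the values form a sawtooth curve.
for i in range(0,len(xtrueRUL)):
    otrueRUL = np.concatenate((otrueRUL, list(reversed(np.arange(xtrueRUL[i])))))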

otrueRUL
Out[28]:
array([111., 110., 109., ...,   2.,   1.,   0.])
In [39]:
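# Round each unit's predicted RUL to a whole number of cycles.
xpredictedRUL = list(round(x) for x in predictedRUL)
opredictedRUL = []

# Build the same countdown curve for the predicted values.
for i in range(0,len(xpredictedRUL)):
    opredictedRUL = np.concatenate((opredictedRUL, list(reversed(np.arange(xpredictedRUL[i]['RUL'])))))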
opredictedRUL
Out[39]:
array([192., 191., 190., ...,   2.,   1.,   0.])
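
The two loops above build the same kind of countdown curve. A compact way to express that step is to build all the ramps first and concatenate once. The helper name `countdown_curve` is hypothetical, and the sketch assumes whole-number, non-empty input.

```python
import numpy as np

def countdown_curve(rul_values):
    """Concatenate descending ramps n-1, ..., 1, 0 for each n in rul_values."""
    ramps = [np.arange(int(n))[::-1] for n in rul_values]
    return np.concatenate(ramps).astype(float)

# Hypothetical RUL values for three units.
curve = countdown_curve([3, 1, 2])
print(curve)
```

Concatenating once avoids copying the growing array on every iteration, which matters little for 100 units but scales better.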
In [40]:
mx = 1000

fig = plt.figure(figsize = (12, 8))
fig.add_subplot(1,2,1)
plt.plot(opredictedRUL[0:mx], color='blue')
plt.legend(['Predicted RUL'], bbox_to_anchor=(0., 1.02, 1., .102), loc=3, mode="expand", borderaxespad=0)
plt.ylim(0, opredictedRUL[0:mx].max()+10)
plt.ylabel('RUL (cycles)')

fig.add_subplot(1,2,2)
plt.plot(otrueRUL[0:mx], color='purple')
plt.legend(['True RUL'], bbox_to_anchor=(0., 1.02, 1., .102), loc=3, mode="expand", borderaxespad=0)
plt.ylabel('RUL (cycles)')
plt.ylim(0,otrueRUL[0:mx].max()+10)
plt.show()

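Plotting the predicted and true values in the same line chart is not entirely appropriate, because the x-axis has no time dimension, but it still gives an intuitive sense of the difference between the two: the predicted values are usually larger than the actual values.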

In [42]:
plt.figure(figsize = (16, 8))
plt.plot(RUL)
plt.plot(predictedRUL)
plt.xlabel('# Unit', fontsize=16)
plt.xticks(fontsize=16)
plt.ylabel('RUL', fontsize=16)
plt.yticks(fontsize=16)
plt.legend(['True RUL','Predicted RUL'], bbox_to_anchor=(0., 1.02, 1., .102), loc=3, mode="expand", borderaxespad=0)
plt.show()
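
The visual impression that predictions tend to exceed the true values can be put into numbers with a signed error. The arrays below are hypothetical placeholders, not results from this notebook; with the real data they would be replaced by the per-unit true and predicted RUL.

```python
import numpy as np

# Hypothetical values for four units; not taken from the notebook's results.
true_rul = np.array([100.0, 80.0, 60.0, 40.0])
pred_rul = np.array([130.0, 90.0, 55.0, 70.0])

error = pred_rul - true_rul              # positive means overestimate
mean_bias = error.mean()                 # average signed error
rmse = np.sqrt(np.mean(error ** 2))      # root mean squared error
share_over = np.mean(error > 0)          # fraction of units overestimated

print(f"mean bias = {mean_bias:.2f}, RMSE = {rmse:.2f}, overestimated = {share_over:.0%}")
```

A positive mean bias together with a large overestimated share would support the reading of the chart; the RMSE summarizes the overall size of the error regardless of sign.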

In most mathematical sciences, a moderate overestimate is considered acceptable.