RUL - Data Smoothing

In the previous section, we analyzed the trend of individual sensors and also combined several sensors to look at the aggregate trend. Now it is time to get down to the real work.
First, we need to find the maximum life cycle of each engine unit in the training set. For this we will use a groupby, and then merge the per-unit maximum back into the original training set.

In [13]:
from rul_code import *
# Compute each unit's maximum cycle count and attach it to every row of that unit.
train = pd.merge(train, train.groupby('unit', as_index=False)['cycles'].max(), how='left', on='unit')
# The merge duplicates the 'cycles' column; restore its name and call the per-unit maximum 'maxcycles'.
train.rename(columns={"cycles_x": "cycles", "cycles_y": "maxcycles"}, inplace=True)
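As an aside, pandas can build the same column without the merge-and-rename dance: groupby(...).transform('max') broadcasts each unit's maximum straight back onto its rows. A minimal equivalent sketch:

train['maxcycles'] = train.groupby('unit')['cycles'].transform('max')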

Next, we derive how many cycles remain before failure, the time to failure (TTF). For each row it is the unit's maximum life cycle minus the current cycle number: $$ TTF_i = \max(\text{cycles}) - \text{cycles}_i $$

In [14]:
train['TTF'] = train['maxcycles'] - train['cycles']
train['TTF']
Out[14]:
0        191
1        190
2        189
3        188
4        187
        ... 
20626      4
20627      3
20628      2
20629      1
20630      0
Name: TTF, Length: 20631, dtype: int64
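
As a quick sanity check (a sketch, not part of the original notebook): by construction, the last recorded cycle of every unit should land exactly at failure, i.e. TTF == 0.

# The minimum TTF within each unit must be 0 (the failure cycle itself).
assert (train.groupby('unit')['TTF'].min() == 0).all()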

The next preparation step is scaling, for which we will use scikit-learn's MinMaxScaler. Before scaling, let us look at the raw data: the value ranges of the different sensors differ by orders of magnitude.
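
For reference, MinMaxScaler (with its default feature_range=(0, 1)) rescales every column independently to the unit interval: $$ x' = \frac{x - \min(x)}{\max(x) - \min(x)} $$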

In [15]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
train.describe().transpose()
Out[15]:
count mean std min 25% 50% 75% max
unit 20631.0 51.506568 29.227633 1.0000 26.0000 52.0000 77.0000 100.0000
cycles 20631.0 108.807862 68.880990 1.0000 52.0000 104.0000 156.0000 362.0000
op_setting1 20631.0 -0.000009 0.002187 -0.0087 -0.0015 0.0000 0.0015 0.0087
op_setting2 20631.0 0.000002 0.000293 -0.0006 -0.0002 0.0000 0.0003 0.0006
s2 20631.0 642.680934 0.500053 641.2100 642.3250 642.6400 643.0000 644.5300
s3 20631.0 1590.523119 6.131150 1571.0400 1586.2600 1590.1000 1594.3800 1616.9100
s4 20631.0 1408.933782 9.000605 1382.2500 1402.3600 1408.0400 1414.5550 1441.4900
s6 20631.0 21.609803 0.001389 21.6000 21.6100 21.6100 21.6100 21.6100
s7 20631.0 553.367711 0.885092 549.8500 552.8100 553.4400 554.0100 556.0600
s8 20631.0 2388.096652 0.070985 2387.9000 2388.0500 2388.0900 2388.1400 2388.5600
s9 20631.0 9065.242941 22.082880 9021.7300 9053.1000 9060.6600 9069.4200 9244.5900
s11 20631.0 47.541168 0.267087 46.8500 47.3500 47.5100 47.7000 48.5300
s12 20631.0 521.413470 0.737553 518.6900 520.9600 521.4800 521.9500 523.3800
s13 20631.0 2388.096152 0.071919 2387.8800 2388.0400 2388.0900 2388.1400 2388.5600
s14 20631.0 8143.752722 19.076176 8099.9400 8133.2450 8140.5400 8148.3100 8293.7200
s15 20631.0 8.442146 0.037505 8.3249 8.4149 8.4389 8.4656 8.5848
s17 20631.0 393.210654 1.548763 388.0000 392.0000 393.0000 394.0000 400.0000
s20 20631.0 38.816271 0.180746 38.1400 38.7000 38.8300 38.9500 39.4300
s21 20631.0 23.289705 0.108251 22.8942 23.2218 23.2979 23.3668 23.6184
maxcycles 20631.0 216.615724 50.028600 128.0000 185.0000 207.0000 240.0000 362.0000
TTF 20631.0 107.807862 68.880990 0.0000 51.0000 103.0000 155.0000 361.0000
In [16]:
train.head()
Out[16]:
unit cycles op_setting1 op_setting2 s2 s3 s4 s6 s7 s8 ... s11 s12 s13 s14 s15 s17 s20 s21 maxcycles TTF
0 1 1 -0.0007 -0.0004 641.82 1589.70 1400.60 21.61 554.36 2388.06 ... 47.47 521.66 2388.02 8138.62 8.4195 392 39.06 23.4190 192 191
1 1 2 0.0019 -0.0003 642.15 1591.82 1403.14 21.61 553.75 2388.04 ... 47.49 522.28 2388.07 8131.49 8.4318 392 39.00 23.4236 192 190
2 1 3 -0.0043 0.0003 642.35 1587.99 1404.20 21.61 554.26 2388.08 ... 47.27 522.42 2388.03 8133.23 8.4178 390 38.95 23.3442 192 189
3 1 4 0.0007 0.0000 642.35 1582.79 1401.87 21.61 554.45 2388.11 ... 47.13 522.86 2388.08 8133.83 8.3682 392 38.88 23.3739 192 188
4 1 5 -0.0019 -0.0002 642.37 1582.85 1406.22 21.61 554.00 2388.06 ... 47.28 522.19 2388.04 8133.80 8.4294 393 38.90 23.4044 192 187

5 rows × 21 columns

Next, we copy the data into a new frame named ntrain, so that we keep two versions, one unscaled and one scaled. We also have to choose which columns to scale: columns 2 through 18, i.e. the operational settings and the sensor readings, leaving unit, cycles, maxcycles and TTF untouched. After scaling, inspect the data again, paying particular attention to the min and max columns.

In [17]:
ntrain = train.copy()
# Fit the scaler on the training settings/sensors (columns 2:19, op_setting1 through s21) and scale them in place.
ntrain.iloc[:,2:19] = scaler.fit_transform(ntrain.iloc[:,2:19])
ntrain.describe().transpose()
Out[17]:
count mean std min 25% 50% 75% max
unit 20631.0 51.506568 29.227633 1.0 26.000000 52.000000 77.000000 100.0
cycles 20631.0 108.807862 68.880990 1.0 52.000000 104.000000 156.000000 362.0
op_setting1 20631.0 0.499490 0.125708 0.0 0.413793 0.500000 0.586207 1.0
op_setting2 20631.0 0.501959 0.244218 0.0 0.333333 0.500000 0.750000 1.0
s2 20631.0 0.443052 0.150618 0.0 0.335843 0.430723 0.539157 1.0
s3 20631.0 0.424746 0.133664 0.0 0.331807 0.415522 0.508829 1.0
s4 20631.0 0.450435 0.151935 0.0 0.339467 0.435348 0.545324 1.0
s6 20631.0 0.980321 0.138898 0.0 1.000000 1.000000 1.000000 1.0
s7 20631.0 0.566459 0.142527 0.0 0.476651 0.578100 0.669887 1.0
s8 20631.0 0.297957 0.107554 0.0 0.227273 0.287879 0.363636 1.0
s9 20631.0 0.195248 0.099089 0.0 0.140761 0.174684 0.213991 1.0
s11 20631.0 0.411410 0.158981 0.0 0.297619 0.392857 0.505952 1.0
s12 20631.0 0.580697 0.157261 0.0 0.484009 0.594883 0.695096 1.0
s13 20631.0 0.317871 0.105763 0.0 0.235294 0.308824 0.382353 1.0
s14 20631.0 0.226095 0.098442 0.0 0.171870 0.209516 0.249613 1.0
s15 20631.0 0.451118 0.144306 0.0 0.346287 0.438630 0.541362 1.0
s17 20631.0 0.434221 0.129064 0.0 0.333333 0.416667 0.500000 1.0
s20 20631.0 0.524241 0.140114 0.0 0.434109 0.534884 0.627907 1.0
s21 20631.0 0.546127 0.149476 0.0 0.452361 0.557443 0.652582 1.0
maxcycles 20631.0 216.615724 50.028600 128.0 185.000000 207.000000 240.000000 362.0
TTF 20631.0 107.807862 68.880990 0.0 51.000000 103.000000 155.000000 361.0
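
As a spot check (a sketch assuming numpy is available), the scaled columns can be reproduced by hand with the min-max formula given above:

import numpy as np

# Manually min-max scale s2 and compare against the scaler's output.
manual_s2 = (train['s2'] - train['s2'].min()) / (train['s2'].max() - train['s2'].min())
assert np.allclose(manual_s2, ntrain['s2'])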

Next, we apply the scaler fitted on the training data to the test set. Note that we call transform rather than fit_transform here: reusing the training set's min and max keeps both sets on the same scale and avoids leaking test statistics into the preprocessing.

In [18]:
ntest = test.copy()
# List the test columns: unlike train, the test set has no maxcycles or TTF columns yet.
pd.DataFrame(ntest.columns).transpose()
Out[18]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
0 unit cycles op_setting1 op_setting2 s2 s3 s4 s6 s7 s8 s9 s11 s12 s13 s14 s15 s17 s20 s21
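
Before transforming, it is worth confirming that the sliced columns line up between the two frames; otherwise scaler.transform would apply one feature's statistics to another. A minimal check (not in the original):

# The 17 scaled columns must appear in the same order in train and test.
assert list(ntest.columns[2:19]) == list(ntrain.columns[2:19])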
In [19]:
# transform, not fit_transform: reuse the min/max learned from the training set.
ntest.iloc[:,2:19] = scaler.transform(ntest.iloc[:,2:19])
ntest.describe().transpose()
Out[19]:
count mean std min 25% 50% 75% max
unit 13096.0 51.543907 28.289423 1.000000 28.000000 52.000000 76.000000 100.000000
cycles 13096.0 76.836515 53.057749 1.000000 33.000000 69.000000 113.000000 303.000000
op_setting1 13096.0 0.499358 0.126591 0.028736 0.413793 0.500000 0.586207 0.948276
op_setting2 13096.0 0.503532 0.245025 0.000000 0.333333 0.500000 0.750000 1.083333
s2 13096.0 0.381051 0.120753 -0.024096 0.297440 0.376506 0.460843 0.930723
s3 13096.0 0.371903 0.109075 -0.043601 0.295618 0.369523 0.443046 0.795945
s4 13096.0 0.379564 0.112902 0.036124 0.298785 0.374578 0.452397 0.862762
s6 13096.0 0.970067 0.170408 0.000000 1.000000 1.000000 1.000000 1.000000
s7 13096.0 0.629231 0.109708 0.165862 0.557166 0.636071 0.706924 0.964573
s8 13096.0 0.259037 0.087033 -0.015152 0.196970 0.257576 0.318182 0.606061
s9 13096.0 0.164576 0.051316 0.012564 0.131428 0.159697 0.190164 0.598133
s11 13096.0 0.337026 0.116617 -0.029762 0.250000 0.333333 0.410714 0.839286
s12 13096.0 0.651967 0.119323 0.147122 0.573561 0.658849 0.737740 1.081023
s13 13096.0 0.280919 0.083727 0.014706 0.220588 0.279412 0.338235 0.647059
s14 13096.0 0.201299 0.052578 0.044174 0.167045 0.198421 0.229229 0.622046
s15 13096.0 0.388395 0.111617 0.030396 0.310504 0.384763 0.459407 0.833013
s17 13096.0 0.380969 0.102798 0.083333 0.333333 0.416667 0.416667 0.750000
s20 13096.0 0.583335 0.109830 0.131783 0.511628 0.589147 0.658915 0.984496
s21 13096.0 0.609697 0.116156 0.056890 0.534935 0.614471 0.689589 1.032450
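
Notice that a few test values fall slightly outside [0, 1], for example op_setting2 (max 1.083) and s2 (min -0.024). This is expected: the test set contains readings beyond the extremes seen during training. If a downstream model required a strict [0, 1] range, one option (our assumption, not something the original notebook does) would be to clip:

# Optional: force the scaled test features back into [0, 1].
ntest.iloc[:,2:19] = ntest.iloc[:,2:19].clip(lower=0, upper=1)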

Finally, let us plot the data to get an intuitive feel for the effect of scaling.

In [20]:
fig = plt.figure(figsize = (8, 8))
# Left panel: raw s2 readings for unit 1, train vs. test.
fig.add_subplot(1,2,1)
plt.plot(train[train.unit==1].s2)
plt.plot(test[test.unit==1].s2)
plt.legend(['Train','Test'], bbox_to_anchor=(0., 1.02, 1., .102), loc=3, mode="expand", borderaxespad=0)
plt.ylabel('Original unit')
# Right panel: the same traces after min-max scaling.
fig.add_subplot(1,2,2)
plt.plot(ntrain[ntrain.unit==1].s2)
plt.plot(ntest[ntest.unit==1].s2)
plt.legend(['Scaled Train','Scaled Test'], bbox_to_anchor=(0., 1.02, 1., .102), loc=3, mode="expand", borderaxespad=0)
plt.ylabel('Scaled unit')
plt.show()