RUL - Data Preparation - First Cleaning Pass - Removing Null Values

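Continuing from the previous section: the Turbofan Engine Degradation Simulation Data Set can be downloaded from the NASA website. Three files are used here: train_FD001, test_FD001 and RUL_FD001. test_FD001 holds the sensor readings of the test engines, whose records stop some time before failure; our model will predict each test engine's remaining useful life (RUL) from these readings. RUL_FD001 holds the true remaining useful life of the same test engines, so comparing our predictions against it tells us how accurate they are.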
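The comparison step can be sketched as follows. This is only an illustration under stated assumptions: the five "true" values are the first five entries of RUL_FD001 printed later in this post, the predictions are made-up numbers, and RMSE is just one common error metric, not necessarily the one used later in this series.

```python
import numpy as np
import pandas as pd

# First five true RUL values from RUL_FD001 (in cycles).
true_rul = pd.Series([112, 98, 69, 82, 91], name="true_rul")
# Made-up predictions for the same five test engines (hypothetical).
pred_rul = pd.Series([105, 101, 75, 80, 88], name="pred_rul")

# Root-mean-square error: average the squared differences, then take the square root.
rmse = np.sqrt(((pred_rul - true_rul) ** 2).mean())
print(round(rmse, 3))
```

A lower RMSE means the predictions sit closer to the true remaining life; a metric that penalizes late predictions more than early ones could be swapped in without changing the structure.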
Let's first take a look at what the data in these three files looks like:

In [82]:
import pandas as pd
train = pd.read_csv('D:/RUL/CMAPSSData/train_FD001.txt', parse_dates=False, delimiter=" ", decimal=".", header=None)
train
Out[82]:
0 1 2 3 4 5 6 7 8 9 ... 18 19 20 21 22 23 24 25 26 27
0 1 1 -0.0007 -0.0004 100.0 518.67 641.82 1589.70 1400.60 14.62 ... 8138.62 8.4195 0.03 392 2388 100.0 39.06 23.4190 NaN NaN
1 1 2 0.0019 -0.0003 100.0 518.67 642.15 1591.82 1403.14 14.62 ... 8131.49 8.4318 0.03 392 2388 100.0 39.00 23.4236 NaN NaN
2 1 3 -0.0043 0.0003 100.0 518.67 642.35 1587.99 1404.20 14.62 ... 8133.23 8.4178 0.03 390 2388 100.0 38.95 23.3442 NaN NaN
3 1 4 0.0007 0.0000 100.0 518.67 642.35 1582.79 1401.87 14.62 ... 8133.83 8.3682 0.03 392 2388 100.0 38.88 23.3739 NaN NaN
4 1 5 -0.0019 -0.0002 100.0 518.67 642.37 1582.85 1406.22 14.62 ... 8133.80 8.4294 0.03 393 2388 100.0 38.90 23.4044 NaN NaN
5 1 6 -0.0043 -0.0001 100.0 518.67 642.10 1584.47 1398.37 14.62 ... 8132.85 8.4108 0.03 391 2388 100.0 38.98 23.3669 NaN NaN
6 1 7 0.0010 0.0001 100.0 518.67 642.48 1592.32 1397.77 14.62 ... 8132.32 8.3974 0.03 392 2388 100.0 39.10 23.3774 NaN NaN
7 1 8 -0.0034 0.0003 100.0 518.67 642.56 1582.96 1400.97 14.62 ... 8131.07 8.4076 0.03 391 2388 100.0 38.97 23.3106 NaN NaN
8 1 9 0.0008 0.0001 100.0 518.67 642.12 1590.98 1394.80 14.62 ... 8125.69 8.3728 0.03 392 2388 100.0 39.05 23.4066 NaN NaN
9 1 10 -0.0033 0.0001 100.0 518.67 641.71 1591.24 1400.46 14.62 ... 8129.38 8.4286 0.03 393 2388 100.0 38.95 23.4694 NaN NaN
10 1 11 0.0018 -0.0003 100.0 518.67 642.28 1581.75 1400.64 14.62 ... 8140.58 8.4340 0.03 392 2388 100.0 38.94 23.4787 NaN NaN
11 1 12 0.0016 0.0002 100.0 518.67 642.06 1583.41 1400.15 14.62 ... 8134.25 8.3938 0.03 391 2388 100.0 39.06 23.3660 NaN NaN
12 1 13 -0.0019 0.0004 100.0 518.67 643.07 1582.19 1400.83 14.62 ... 8128.10 8.4152 0.03 393 2388 100.0 38.93 23.2757 NaN NaN
13 1 14 0.0009 -0.0000 100.0 518.67 642.35 1592.95 1399.16 14.62 ... 8134.43 8.3964 0.03 393 2388 100.0 39.18 23.3826 NaN NaN
14 1 15 -0.0018 -0.0003 100.0 518.67 642.43 1583.82 1402.13 14.62 ... 8127.56 8.4199 0.03 391 2388 100.0 38.99 23.3500 NaN NaN
15 1 16 0.0006 0.0005 100.0 518.67 642.13 1587.98 1404.50 14.62 ... 8136.11 8.3936 0.03 392 2388 100.0 38.97 23.4550 NaN NaN
16 1 17 0.0002 0.0002 100.0 518.67 642.58 1584.96 1399.95 14.62 ... 8137.27 8.4542 0.03 392 2388 100.0 38.81 23.3319 NaN NaN
17 1 18 -0.0031 -0.0001 100.0 518.67 642.62 1591.04 1396.12 14.62 ... 8132.73 8.4028 0.03 392 2388 100.0 38.89 23.3987 NaN NaN
18 1 19 0.0032 -0.0003 100.0 518.67 641.79 1587.56 1400.35 14.62 ... 8129.13 8.4321 0.03 391 2388 100.0 38.80 23.3464 NaN NaN
19 1 20 -0.0037 0.0001 100.0 518.67 643.04 1581.11 1405.23 14.62 ... 8129.71 8.4210 0.03 392 2388 100.0 39.03 23.4220 NaN NaN
20 1 21 -0.0012 0.0001 100.0 518.67 642.37 1586.07 1398.13 14.62 ... 8134.02 8.4049 0.03 392 2388 100.0 39.09 23.3101 NaN NaN
21 1 22 0.0002 0.0000 100.0 518.67 642.77 1592.93 1400.57 14.62 ... 8130.41 8.4034 0.03 392 2388 100.0 38.92 23.3792 NaN NaN
22 1 23 0.0034 -0.0003 100.0 518.67 642.14 1588.19 1394.75 14.62 ... 8127.90 8.4240 0.03 392 2388 100.0 38.94 23.4562 NaN NaN
23 1 24 -0.0010 0.0003 100.0 518.67 642.38 1590.83 1398.81 14.62 ... 8133.88 8.3891 0.03 392 2388 100.0 39.00 23.3696 NaN NaN
24 1 25 0.0023 -0.0004 100.0 518.67 642.77 1594.10 1399.39 14.62 ... 8136.61 8.3917 0.03 393 2388 100.0 38.95 23.4288 NaN NaN
25 1 26 0.0000 0.0002 100.0 518.67 642.16 1589.08 1396.07 14.62 ... 8131.15 8.4260 0.03 394 2388 100.0 38.86 23.4149 NaN NaN
26 1 27 -0.0012 -0.0004 100.0 518.67 642.44 1590.47 1401.84 14.62 ... 8134.60 8.4046 0.03 393 2388 100.0 38.99 23.4472 NaN NaN
27 1 28 -0.0024 0.0005 100.0 518.67 642.35 1582.84 1399.13 14.62 ... 8127.30 8.4323 0.03 390 2388 100.0 39.01 23.2841 NaN NaN
28 1 29 0.0012 -0.0001 100.0 518.67 641.91 1584.83 1400.99 14.62 ... 8133.06 8.4189 0.03 393 2388 100.0 38.93 23.3597 NaN NaN
29 1 30 -0.0022 0.0000 100.0 518.67 642.20 1593.52 1396.08 14.62 ... 8137.86 8.4065 0.03 390 2388 100.0 39.05 23.4110 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
20601 100 171 -0.0005 -0.0004 100.0 518.67 643.05 1593.56 1420.48 14.62 ... 8144.03 8.5085 0.03 396 2388 100.0 38.81 23.1513 NaN NaN
20602 100 172 -0.0037 0.0001 100.0 518.67 642.97 1602.35 1424.93 14.62 ... 8142.08 8.4970 0.03 394 2388 100.0 38.49 23.1922 NaN NaN
20603 100 173 0.0006 0.0004 100.0 518.67 643.40 1595.53 1418.63 14.62 ... 8143.13 8.4856 0.03 393 2388 100.0 38.42 23.1793 NaN NaN
20604 100 174 0.0011 0.0002 100.0 518.67 642.91 1602.24 1425.52 14.62 ... 8143.61 8.4990 0.03 395 2388 100.0 38.80 23.1784 NaN NaN
20605 100 175 -0.0013 -0.0005 100.0 518.67 642.85 1602.03 1416.24 14.62 ... 8140.88 8.4851 0.03 394 2388 100.0 38.54 23.0713 NaN NaN
20606 100 176 -0.0017 -0.0003 100.0 518.67 643.33 1601.44 1421.40 14.62 ... 8139.27 8.4405 0.03 396 2388 100.0 38.46 23.1020 NaN NaN
20607 100 177 -0.0011 -0.0005 100.0 518.67 643.34 1593.22 1418.91 14.62 ... 8142.95 8.5133 0.03 395 2388 100.0 38.60 23.0352 NaN NaN
20608 100 178 0.0005 -0.0003 100.0 518.67 642.98 1594.80 1422.69 14.62 ... 8141.85 8.4876 0.03 395 2388 100.0 38.55 23.2252 NaN NaN
20609 100 179 0.0020 0.0004 100.0 518.67 643.22 1599.36 1423.94 14.62 ... 8138.58 8.5218 0.03 396 2388 100.0 38.62 23.1685 NaN NaN
20610 100 180 -0.0010 0.0001 100.0 518.67 643.64 1595.98 1416.45 14.62 ... 8138.98 8.5150 0.03 395 2388 100.0 38.54 23.2345 NaN NaN
20611 100 181 0.0024 -0.0005 100.0 518.67 643.25 1597.83 1414.63 14.62 ... 8139.30 8.5518 0.03 396 2388 100.0 38.52 23.1774 NaN NaN
20612 100 182 0.0007 -0.0001 100.0 518.67 643.52 1604.31 1417.73 14.62 ... 8140.87 8.4855 0.03 396 2388 100.0 38.41 23.1289 NaN NaN
20613 100 183 -0.0011 -0.0002 100.0 518.67 643.34 1594.60 1427.27 14.62 ... 8144.21 8.5006 0.03 395 2388 100.0 38.49 23.0709 NaN NaN
20614 100 184 0.0027 -0.0004 100.0 518.67 642.91 1598.88 1420.89 14.62 ... 8142.28 8.4989 0.03 396 2388 100.0 38.44 23.1229 NaN NaN
20615 100 185 -0.0014 0.0004 100.0 518.67 643.95 1600.81 1420.34 14.62 ... 8142.32 8.4804 0.03 395 2388 100.0 38.60 23.2127 NaN NaN
20616 100 186 0.0026 0.0004 100.0 518.67 643.61 1593.55 1425.32 14.62 ... 8138.08 8.4735 0.03 394 2388 100.0 38.51 23.1173 NaN NaN
20617 100 187 0.0015 0.0002 100.0 518.67 643.63 1596.96 1421.49 14.62 ... 8140.49 8.5087 0.03 396 2388 100.0 38.67 23.2308 NaN NaN
20618 100 188 -0.0008 -0.0002 100.0 518.67 643.19 1597.77 1426.57 14.62 ... 8139.94 8.4814 0.03 395 2388 100.0 38.36 23.0552 NaN NaN
20619 100 189 0.0015 0.0001 100.0 518.67 643.69 1599.85 1423.15 14.62 ... 8139.78 8.4870 0.03 397 2388 100.0 38.65 23.0591 NaN NaN
20620 100 190 -0.0001 0.0002 100.0 518.67 643.12 1594.45 1426.04 14.62 ... 8142.28 8.5162 0.03 395 2388 100.0 38.42 23.0603 NaN NaN
20621 100 191 -0.0005 -0.0000 100.0 518.67 643.69 1610.87 1427.19 14.62 ... 8143.56 8.5092 0.03 398 2388 100.0 38.39 23.1218 NaN NaN
20622 100 192 -0.0009 0.0001 100.0 518.67 643.53 1601.23 1419.48 14.62 ... 8143.46 8.4892 0.03 397 2388 100.0 38.56 23.0770 NaN NaN
20623 100 193 -0.0001 0.0002 100.0 518.67 643.09 1599.81 1428.93 14.62 ... 8142.02 8.5424 0.03 397 2388 100.0 38.47 23.0230 NaN NaN
20624 100 194 -0.0011 0.0003 100.0 518.67 643.72 1597.29 1427.41 14.62 ... 8139.67 8.5215 0.03 394 2388 100.0 38.38 23.1324 NaN NaN
20625 100 195 -0.0002 -0.0001 100.0 518.67 643.41 1600.04 1431.90 14.62 ... 8142.90 8.5519 0.03 394 2388 100.0 38.14 23.1923 NaN NaN
20626 100 196 -0.0004 -0.0003 100.0 518.67 643.49 1597.98 1428.63 14.62 ... 8137.60 8.4956 0.03 397 2388 100.0 38.49 22.9735 NaN NaN
20627 100 197 -0.0016 -0.0005 100.0 518.67 643.54 1604.50 1433.58 14.62 ... 8136.50 8.5139 0.03 395 2388 100.0 38.30 23.1594 NaN NaN
20628 100 198 0.0004 0.0000 100.0 518.67 643.42 1602.46 1428.18 14.62 ... 8141.05 8.5646 0.03 398 2388 100.0 38.44 22.9333 NaN NaN
20629 100 199 -0.0011 0.0003 100.0 518.67 643.23 1605.26 1426.53 14.62 ... 8139.29 8.5389 0.03 395 2388 100.0 38.29 23.0640 NaN NaN
20630 100 200 -0.0032 -0.0005 100.0 518.67 643.85 1600.38 1432.14 14.62 ... 8137.33 8.5036 0.03 396 2388 100.0 38.37 23.0522 NaN NaN

20631 rows × 28 columns

In [83]:
test = pd.read_csv('D:/RUL/CMAPSSData/test_FD001.txt', parse_dates=False, delimiter=" ", decimal=".", header=None)
test
Out[83]:
0 1 2 3 4 5 6 7 8 9 ... 18 19 20 21 22 23 24 25 26 27
0 1 1 0.0023 0.0003 100.0 518.67 643.02 1585.29 1398.21 14.62 ... 8125.55 8.4052 0.03 392 2388 100.0 38.86 23.3735 NaN NaN
1 1 2 -0.0027 -0.0003 100.0 518.67 641.71 1588.45 1395.42 14.62 ... 8139.62 8.3803 0.03 393 2388 100.0 39.02 23.3916 NaN NaN
2 1 3 0.0003 0.0001 100.0 518.67 642.46 1586.94 1401.34 14.62 ... 8130.10 8.4441 0.03 393 2388 100.0 39.08 23.4166 NaN NaN
3 1 4 0.0042 0.0000 100.0 518.67 642.44 1584.12 1406.42 14.62 ... 8132.90 8.3917 0.03 391 2388 100.0 39.00 23.3737 NaN NaN
4 1 5 0.0014 0.0000 100.0 518.67 642.51 1587.19 1401.92 14.62 ... 8129.54 8.4031 0.03 390 2388 100.0 38.99 23.4130 NaN NaN
5 1 6 0.0012 0.0003 100.0 518.67 642.11 1579.12 1395.13 14.62 ... 8127.46 8.4238 0.03 392 2388 100.0 38.91 23.3467 NaN NaN
6 1 7 -0.0000 0.0002 100.0 518.67 642.11 1583.34 1404.84 14.62 ... 8134.97 8.3914 0.03 391 2388 100.0 38.85 23.3952 NaN NaN
7 1 8 0.0006 -0.0000 100.0 518.67 642.54 1580.89 1400.89 14.62 ... 8125.93 8.4213 0.03 393 2388 100.0 39.05 23.3224 NaN NaN
8 1 9 -0.0036 0.0000 100.0 518.67 641.88 1593.29 1412.28 14.62 ... 8134.15 8.4353 0.03 391 2388 100.0 39.10 23.4521 NaN NaN
9 1 10 -0.0025 -0.0001 100.0 518.67 642.07 1585.25 1398.64 14.62 ... 8134.08 8.4093 0.03 391 2388 100.0 38.87 23.3820 NaN NaN
10 1 11 0.0007 -0.0004 100.0 518.67 642.04 1581.03 1403.83 14.62 ... 8132.38 8.3919 0.03 391 2388 100.0 39.06 23.3609 NaN NaN
11 1 12 0.0026 0.0003 100.0 518.67 642.54 1587.43 1397.82 14.62 ... 8132.33 8.3984 0.03 391 2388 100.0 39.11 23.3845 NaN NaN
12 1 13 -0.0056 0.0003 100.0 518.67 641.94 1589.09 1403.94 14.62 ... 8131.12 8.4166 0.03 392 2388 100.0 39.08 23.3677 NaN NaN
13 1 14 0.0017 -0.0004 100.0 518.67 642.23 1583.16 1402.88 14.62 ... 8130.30 8.4293 0.03 392 2388 100.0 39.03 23.4572 NaN NaN
14 1 15 -0.0003 -0.0003 100.0 518.67 642.50 1584.81 1398.79 14.62 ... 8133.62 8.4163 0.03 392 2388 100.0 39.04 23.3672 NaN NaN
15 1 16 -0.0018 0.0003 100.0 518.67 642.32 1584.51 1407.76 14.62 ... 8133.83 8.4300 0.03 390 2388 100.0 38.87 23.3484 NaN NaN
16 1 17 0.0014 0.0002 100.0 518.67 642.19 1582.70 1404.12 14.62 ... 8126.78 8.4577 0.03 391 2388 100.0 39.09 23.3409 NaN NaN
17 1 18 0.0035 0.0001 100.0 518.67 642.59 1586.53 1403.69 14.62 ... 8133.22 8.4323 0.03 391 2388 100.0 38.96 23.4481 NaN NaN
18 1 19 0.0029 0.0001 100.0 518.67 642.43 1585.58 1402.30 14.62 ... 8129.31 8.3892 0.03 391 2388 100.0 39.06 23.3809 NaN NaN
19 1 20 0.0011 -0.0001 100.0 518.67 642.61 1587.78 1400.70 14.62 ... 8128.59 8.4099 0.03 392 2388 100.0 39.00 23.3325 NaN NaN
20 1 21 0.0038 -0.0002 100.0 518.67 642.70 1583.30 1399.20 14.62 ... 8126.86 8.4174 0.03 392 2388 100.0 38.96 23.4025 NaN NaN
21 1 22 0.0012 0.0001 100.0 518.67 642.45 1582.78 1404.06 14.62 ... 8128.89 8.4557 0.03 392 2388 100.0 38.94 23.3770 NaN NaN
22 1 23 0.0009 -0.0000 100.0 518.67 642.12 1587.51 1395.09 14.62 ... 8130.97 8.4116 0.03 393 2388 100.0 39.10 23.3186 NaN NaN
23 1 24 -0.0006 -0.0001 100.0 518.67 642.32 1594.29 1400.15 14.62 ... 8130.70 8.4074 0.03 393 2388 100.0 38.94 23.3971 NaN NaN
24 1 25 0.0028 -0.0003 100.0 518.67 642.25 1582.43 1400.23 14.62 ... 8128.65 8.4007 0.03 393 2388 100.0 38.96 23.3785 NaN NaN
25 1 26 0.0047 -0.0005 100.0 518.67 642.48 1583.28 1408.07 14.62 ... 8129.12 8.3949 0.03 391 2388 100.0 38.77 23.3557 NaN NaN
26 1 27 -0.0007 0.0001 100.0 518.67 642.08 1586.65 1400.31 14.62 ... 8127.24 8.4494 0.03 392 2388 100.0 38.87 23.3931 NaN NaN
27 1 28 0.0022 0.0005 100.0 518.67 641.93 1594.25 1401.29 14.62 ... 8134.89 8.4470 0.03 392 2388 100.0 38.83 23.3502 NaN NaN
28 1 29 0.0014 0.0001 100.0 518.67 641.95 1587.15 1398.11 14.62 ... 8133.13 8.4212 0.03 392 2388 100.0 39.02 23.3621 NaN NaN
29 1 30 -0.0025 0.0004 100.0 518.67 642.79 1585.72 1400.97 14.62 ... 8134.79 8.4110 0.03 391 2388 100.0 39.09 23.4069 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
13066 100 169 0.0026 -0.0000 100.0 518.67 642.36 1591.25 1408.48 14.62 ... 8176.17 8.4480 0.03 392 2388 100.0 38.83 23.3286 NaN NaN
13067 100 170 -0.0006 -0.0004 100.0 518.67 642.70 1589.92 1407.93 14.62 ... 8178.23 8.4235 0.03 393 2388 100.0 38.88 23.2980 NaN NaN
13068 100 171 -0.0026 -0.0003 100.0 518.67 642.83 1586.64 1411.19 14.62 ... 8172.32 8.4667 0.03 392 2388 100.0 38.84 23.2103 NaN NaN
13069 100 172 0.0007 0.0003 100.0 518.67 642.61 1594.51 1403.93 14.62 ... 8179.29 8.4279 0.03 394 2388 100.0 38.95 23.3630 NaN NaN
13070 100 173 -0.0003 -0.0000 100.0 518.67 642.56 1591.49 1405.75 14.62 ... 8182.17 8.4615 0.03 393 2388 100.0 38.92 23.3491 NaN NaN
13071 100 174 -0.0005 0.0003 100.0 518.67 642.81 1589.97 1404.90 14.62 ... 8183.70 8.4234 0.03 392 2388 100.0 38.64 23.2858 NaN NaN
13072 100 175 0.0019 0.0000 100.0 518.67 642.25 1595.08 1398.46 14.62 ... 8182.50 8.4691 0.03 393 2388 100.0 39.00 23.2858 NaN NaN
13073 100 176 0.0002 0.0002 100.0 518.67 642.73 1590.96 1410.76 14.62 ... 8186.68 8.4922 0.03 395 2388 100.0 38.79 23.2757 NaN NaN
13074 100 177 -0.0041 -0.0003 100.0 518.67 642.80 1593.26 1411.29 14.62 ... 8188.45 8.4296 0.03 393 2388 100.0 39.02 23.2923 NaN NaN
13075 100 178 0.0001 0.0005 100.0 518.67 642.32 1591.78 1408.08 14.62 ... 8184.89 8.4574 0.03 394 2388 100.0 38.65 23.1937 NaN NaN
13076 100 179 0.0041 0.0001 100.0 518.67 642.30 1588.93 1412.82 14.62 ... 8185.87 8.4371 0.03 393 2388 100.0 38.78 23.2962 NaN NaN
13077 100 180 -0.0004 0.0001 100.0 518.67 642.65 1593.47 1415.79 14.62 ... 8193.02 8.5022 0.03 392 2388 100.0 38.75 23.3786 NaN NaN
13078 100 181 -0.0010 0.0000 100.0 518.67 642.94 1586.09 1408.95 14.62 ... 8193.03 8.4124 0.03 393 2388 100.0 38.88 23.1571 NaN NaN
13079 100 182 -0.0027 -0.0001 100.0 518.67 642.28 1598.05 1414.43 14.62 ... 8193.05 8.4333 0.03 394 2388 100.0 38.90 23.2921 NaN NaN
13080 100 183 0.0009 0.0001 100.0 518.67 642.78 1595.34 1406.21 14.62 ... 8197.42 8.4463 0.03 394 2388 100.0 38.66 23.3496 NaN NaN
13081 100 184 0.0001 -0.0004 100.0 518.67 642.60 1595.89 1416.17 14.62 ... 8197.66 8.4568 0.03 394 2388 100.0 38.80 23.1229 NaN NaN
13082 100 185 0.0032 0.0004 100.0 518.67 642.84 1592.18 1413.60 14.62 ... 8195.29 8.4424 0.03 394 2388 100.0 38.81 23.3007 NaN NaN
13083 100 186 -0.0025 0.0005 100.0 518.67 643.38 1592.04 1416.25 14.62 ... 8203.03 8.4802 0.03 393 2388 100.0 38.89 23.3233 NaN NaN
13084 100 187 0.0019 0.0002 100.0 518.67 642.40 1595.19 1416.95 14.62 ... 8197.56 8.4632 0.03 394 2388 100.0 38.87 23.3093 NaN NaN
13085 100 188 0.0010 0.0005 100.0 518.67 643.35 1589.88 1409.88 14.62 ... 8206.25 8.4457 0.03 394 2388 100.0 38.54 23.3504 NaN NaN
13086 100 189 -0.0003 0.0002 100.0 518.67 643.29 1592.33 1417.66 14.62 ... 8209.84 8.4423 0.03 393 2388 100.0 38.71 23.3188 NaN NaN
13087 100 190 -0.0038 0.0002 100.0 518.67 642.95 1598.97 1421.28 14.62 ... 8207.95 8.4765 0.03 395 2388 100.0 38.74 23.3551 NaN NaN
13088 100 191 -0.0031 -0.0001 100.0 518.67 642.92 1589.54 1413.65 14.62 ... 8201.94 8.4877 0.03 396 2388 100.0 38.89 23.2279 NaN NaN
13089 100 192 -0.0034 0.0001 100.0 518.67 643.05 1598.18 1418.58 14.62 ... 8210.24 8.4171 0.03 395 2388 100.0 38.77 23.2148 NaN NaN
13090 100 193 0.0018 0.0004 100.0 518.67 643.10 1595.60 1414.62 14.62 ... 8213.57 8.4429 0.03 395 2388 100.0 38.63 23.2952 NaN NaN
13091 100 194 0.0049 0.0000 100.0 518.67 643.24 1599.45 1415.79 14.62 ... 8213.28 8.4715 0.03 394 2388 100.0 38.65 23.1974 NaN NaN
13092 100 195 -0.0011 -0.0001 100.0 518.67 643.22 1595.69 1422.05 14.62 ... 8210.85 8.4512 0.03 395 2388 100.0 38.57 23.2771 NaN NaN
13093 100 196 -0.0006 -0.0003 100.0 518.67 643.44 1593.15 1406.82 14.62 ... 8217.24 8.4569 0.03 395 2388 100.0 38.62 23.2051 NaN NaN
13094 100 197 -0.0038 0.0001 100.0 518.67 643.26 1594.99 1419.36 14.62 ... 8220.48 8.4711 0.03 395 2388 100.0 38.66 23.2699 NaN NaN
13095 100 198 0.0013 0.0003 100.0 518.67 642.95 1601.62 1424.99 14.62 ... 8214.64 8.4903 0.03 396 2388 100.0 38.70 23.1855 NaN NaN

13096 rows × 28 columns

In [84]:
RUL = pd.read_csv('D:/RUL/CMAPSSData/RUL_FD001.txt', parse_dates=False, delimiter=" ", decimal=".", header=None)
RUL
Out[84]:
0 1
0 112 NaN
1 98 NaN
2 69 NaN
3 82 NaN
4 91 NaN
5 93 NaN
6 91 NaN
7 95 NaN
8 111 NaN
9 96 NaN
10 97 NaN
11 124 NaN
12 95 NaN
13 107 NaN
14 83 NaN
15 84 NaN
16 50 NaN
17 28 NaN
18 87 NaN
19 16 NaN
20 57 NaN
21 111 NaN
22 113 NaN
23 20 NaN
24 145 NaN
25 119 NaN
26 66 NaN
27 97 NaN
28 90 NaN
29 115 NaN
... ... ...
70 118 NaN
71 50 NaN
72 131 NaN
73 126 NaN
74 113 NaN
75 10 NaN
76 34 NaN
77 107 NaN
78 63 NaN
79 90 NaN
80 8 NaN
81 9 NaN
82 137 NaN
83 58 NaN
84 118 NaN
85 89 NaN
86 116 NaN
87 115 NaN
88 136 NaN
89 28 NaN
90 38 NaN
91 20 NaN
92 85 NaN
93 55 NaN
94 128 NaN
95 137 NaN
96 82 NaN
97 59 NaN
98 117 NaN
99 20 NaN

100 rows × 2 columns

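The output above shows that columns 26 and 27 of train and test are NaN (as is column 1 of RUL). This is most likely caused by trailing spaces at the end of each line rather than by tab characters: the files are read with a single space as the delimiter, so every trailing space produces an extra, empty field. The following code counts the missing values in each column of train and test: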
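The effect of trailing spaces can be reproduced on a tiny in-memory sample. This is a minimal sketch: the two rows are made up, and the two trailing spaces per line are an assumption consistent with the 28-column result above (26 real columns plus 2 empty ones). It also shows that `sep=r"\s+"`, which treats any run of whitespace as one separator, avoids the empty columns at read time.

```python
import io

import pandas as pd

# Two made-up rows: fields separated by single spaces, each line ending with two extra spaces.
raw = "1 1 -0.0007 518.67  \n1 2 0.0019 518.67  \n"

# A single-space delimiter turns each trailing space into an empty field, i.e. a NaN column.
single_space = pd.read_csv(io.StringIO(raw), sep=" ", header=None)
print(single_space.shape)  # (2, 6): four real columns plus two all-NaN columns

# A whitespace-run delimiter ignores the trailing spaces entirely.
whitespace = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(whitespace.shape)  # (2, 4)
```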

In [85]:
table_NaN = pd.concat([train.isnull().sum(), test.isnull().sum()], axis=1)
table_NaN.columns = ['train', 'test']
table_NaN
Out[85]:
train test
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
14 0 0
15 0 0
16 0 0
17 0 0
18 0 0
19 0 0
20 0 0
21 0 0
22 0 0
23 0 0
24 0 0
25 0 0
26 20631 13096
27 20631 13096
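If the proportion of missing values is wanted rather than the raw count, taking the mean of the boolean mask gives it directly. A minimal sketch on a made-up three-row frame; on the real data, `train.isnull().mean()` would be expected to show 1.0 for columns 26 and 27 and 0.0 elsewhere, given the counts above.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for train: column 2 is entirely empty.
df = pd.DataFrame({0: [1, 1, 1], 1: [1, 2, 3], 2: [np.nan, np.nan, np.nan]})

# isnull() gives True/False per cell; the column mean is the fraction of missing values.
missing_ratio = df.isnull().mean()
print(missing_ratio)
```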

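The counts show that only columns 26 and 27 contain missing values, and they are missing in every row (20631 of 20631 in train, 13096 of 13096 in test). These columns carry no information, so we drop them, along with the empty column 1 of RUL, and then give each remaining column a name: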

In [86]:
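# Drop the empty trailing columns
train.drop(train.columns[[-1, -2]], axis=1, inplace=True)
test.drop(test.columns[[-1, -2]], axis=1, inplace=True)
RUL.drop(RUL.columns[[-1]], axis=1, inplace=True)  # RUL has only one empty column
# Name the columns: unit ID, cycle number, 3 operational settings, 21 sensors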
cols = ['unit', 'cycles', 'op_setting1', 'op_setting2', 'op_setting3', 's1', 's2', 's3', 's4', 's5', 
        's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20', 's21']
train.columns = cols
test.columns = cols
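Dropping by position (`[-1, -2]`) assumes the empty columns are always the last two. A slightly more defensive alternative, sketched here on a made-up frame, drops whichever columns are entirely empty with `DataFrame.dropna(axis=1, how="all")`, so a column with only a few gaps is kept rather than silently removed. On the real data, `train.dropna(axis=1, how="all")` should leave the same 26 columns, given the counts above. The three column names below are just an illustrative subset of `cols`.

```python
import numpy as np
import pandas as pd

# Made-up frame with two all-NaN trailing columns, like train/test before cleaning.
raw = pd.DataFrame([[1, 1, 0.5, np.nan, np.nan],
                    [1, 2, 0.6, np.nan, np.nan]])

# Drop every column whose values are all missing, wherever it sits.
cleaned = raw.dropna(axis=1, how="all")

# Name the remaining columns; the list length must equal the remaining column count.
cleaned.columns = ["unit", "cycles", "op_setting1"]
print(cleaned)
```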

First, the train set after removing the empty columns:

In [87]:
train.head()
Out[87]:
unit cycles op_setting1 op_setting2 op_setting3 s1 s2 s3 s4 s5 ... s12 s13 s14 s15 s16 s17 s18 s19 s20 s21
0 1 1 -0.0007 -0.0004 100.0 518.67 641.82 1589.70 1400.60 14.62 ... 521.66 2388.02 8138.62 8.4195 0.03 392 2388 100.0 39.06 23.4190
1 1 2 0.0019 -0.0003 100.0 518.67 642.15 1591.82 1403.14 14.62 ... 522.28 2388.07 8131.49 8.4318 0.03 392 2388 100.0 39.00 23.4236
2 1 3 -0.0043 0.0003 100.0 518.67 642.35 1587.99 1404.20 14.62 ... 522.42 2388.03 8133.23 8.4178 0.03 390 2388 100.0 38.95 23.3442
3 1 4 0.0007 0.0000 100.0 518.67 642.35 1582.79 1401.87 14.62 ... 522.86 2388.08 8133.83 8.3682 0.03 392 2388 100.0 38.88 23.3739
4 1 5 -0.0019 -0.0002 100.0 518.67 642.37 1582.85 1406.22 14.62 ... 522.19 2388.04 8133.80 8.4294 0.03 393 2388 100.0 38.90 23.4044

5 rows × 26 columns

Then the cleaned test set:

In [88]:
test.head()
Out[88]:
unit cycles op_setting1 op_setting2 op_setting3 s1 s2 s3 s4 s5 ... s12 s13 s14 s15 s16 s17 s18 s19 s20 s21
0 1 1 0.0023 0.0003 100.0 518.67 643.02 1585.29 1398.21 14.62 ... 521.72 2388.03 8125.55 8.4052 0.03 392 2388 100.0 38.86 23.3735
1 1 2 -0.0027 -0.0003 100.0 518.67 641.71 1588.45 1395.42 14.62 ... 522.16 2388.06 8139.62 8.3803 0.03 393 2388 100.0 39.02 23.3916
2 1 3 0.0003 0.0001 100.0 518.67 642.46 1586.94 1401.34 14.62 ... 521.97 2388.03 8130.10 8.4441 0.03 393 2388 100.0 39.08 23.4166
3 1 4 0.0042 0.0000 100.0 518.67 642.44 1584.12 1406.42 14.62 ... 521.38 2388.05 8132.90 8.3917 0.03 391 2388 100.0 39.00 23.3737
4 1 5 0.0014 0.0000 100.0 518.67 642.51 1587.19 1401.92 14.62 ... 522.15 2388.03 8129.54 8.4031 0.03 390 2388 100.0 38.99 23.4130

5 rows × 26 columns

And the RUL set. Put simply, the first test engine (unit 1) has a remaining useful life of 112 cycles, the second 98, and so on.

In [89]:
RUL.head()
Out[89]:
0
0 112
1 98
2 69
3 82
4 91
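Because RUL_FD001 has one row per test engine in unit order, row i can be matched to unit i + 1. A minimal sketch on made-up data (only the two RUL values 112 and 98 come from the output above; the test cycles are invented) that attaches the unit number explicitly and, as an illustration, derives each engine's implied total life as last observed cycle plus remaining cycles. The names `test_df`, `RUL`, and `total_life` are illustrative choices.

```python
import pandas as pd

# Made-up test frame: unit 1 was observed for 3 cycles, unit 2 for 2 cycles.
test_df = pd.DataFrame({"unit": [1, 1, 1, 2, 2], "cycles": [1, 2, 3, 1, 2]})
# RUL frame after cleaning: a single column labeled 0, one row per test unit.
rul = pd.DataFrame({0: [112, 98]})

# Make the unit mapping explicit so later steps do not rely on row order.
rul["unit"] = rul.index + 1
rul = rul.rename(columns={0: "RUL"})

# Last observed cycle per unit, then implied total life = observed + remaining.
last_cycle = test_df.groupby("unit", as_index=False)["cycles"].max()
summary = last_cycle.merge(rul, on="unit")
summary["total_life"] = summary["cycles"] + summary["RUL"]
print(summary)
```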