d2l House Price Prediction Exercise

    import numpy as np

| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | … | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | … | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | … | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | … | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | … | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | … | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |

    # The training set contains 1460 samples

    all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))

| | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | … | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal |
1 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | FR2 | … | 0 | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal |
2 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | Inside | … | 0 | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal |
3 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | Corner | … | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml |
4 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | FR2 | … | 0 | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
1454 | 160 | RM | 21.0 | 1936 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | 0 | 0 | NaN | NaN | NaN | 0 | 6 | 2006 | WD | Normal |
1455 | 160 | RM | 21.0 | 1894 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | 0 | 0 | NaN | NaN | NaN | 0 | 4 | 2006 | WD | Abnorml |
1456 | 20 | RL | 160.0 | 20000 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | 0 | 0 | NaN | NaN | NaN | 0 | 9 | 2006 | WD | Abnorml |
1457 | 85 | RL | 62.0 | 10441 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | 0 | 0 | NaN | MnPrv | Shed | 700 | 7 | 2006 | WD | Normal |
1458 | 60 | RL | 74.0 | 9627 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | 0 | 0 | NaN | NaN | NaN | 0 | 11 | 2006 | WD | Normal |
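
To make the concatenation step concrete, here is a minimal sketch on toy data. The two small frames and their values are hypothetical stand-ins for the Kaggle CSVs; only the `pd.concat` line comes from the post. It drops `Id` from both sets and `SalePrice` from the training set, then stacks the rows. Because the index is not reset, each frame keeps its own row labels, which is why the table above ends at the last test-set label.

```python
import pandas as pd

# Toy stand-ins for the Kaggle CSVs (hypothetical values): Id comes first,
# features sit in the middle, and SalePrice is last in the training set only.
train_data = pd.DataFrame({
    "Id": [1, 2, 3],
    "MSSubClass": [60, 20, 60],
    "MSZoning": ["RL", "RL", "RM"],
    "SalePrice": [208500, 181500, 223500],
})
test_data = pd.DataFrame({
    "Id": [4, 5],
    "MSSubClass": [70, 60],
    "MSZoning": ["RL", None],
})

# Drop Id (column 0) from both sets and SalePrice (last column) from train,
# then stack the rows so later preprocessing sees train and test together.
all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))

print(all_features.shape)        # rows = len(train_data) + len(test_data)
print(list(all_features.index))  # row labels restart for the test rows
```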

    all_features.dtypes

    # Get the names of the columns whose dtype is not object, i.e. the numeric columns

    # Put all numeric features on a common scale, i.e. standardize them

| | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | … | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.067320 | RL | -0.184443 | -0.217841 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | -0.285886 | -0.063139 | NaN | NaN | NaN | -0.089577 | -1.551918 | 0.157619 | WD | Normal |
1 | -0.873466 | RL | 0.458096 | -0.072032 | Pave | NaN | Reg | Lvl | AllPub | FR2 | … | -0.285886 | -0.063139 | NaN | NaN | NaN | -0.089577 | -0.446848 | -0.602858 | WD | Normal |
2 | 0.067320 | RL | -0.055935 | 0.137173 | Pave | NaN | IR1 | Lvl | AllPub | Inside | … | -0.285886 | -0.063139 | NaN | NaN | NaN | -0.089577 | 1.026577 | 0.157619 | WD | Normal |
3 | 0.302516 | RL | -0.398622 | -0.078371 | Pave | NaN | IR1 | Lvl | AllPub | Corner | … | -0.285886 | -0.063139 | NaN | NaN | NaN | -0.089577 | -1.551918 | -1.363335 | WD | Abnorml |
4 | 0.067320 | RL | 0.629439 | 0.518814 | Pave | NaN | IR1 | Lvl | AllPub | FR2 | … | -0.285886 | -0.063139 | NaN | NaN | NaN | -0.089577 | 2.131647 | 0.157619 | WD | Normal |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
1454 | 2.419286 | RM | -2.069222 | -1.043758 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | -0.285886 | -0.063139 | NaN | NaN | NaN | -0.089577 | -0.078492 | -1.363335 | WD | Normal |
1455 | 2.419286 | RM | -2.069222 | -1.049083 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | -0.285886 | -0.063139 | NaN | NaN | NaN | -0.089577 | -0.815205 | -1.363335 | WD | Abnorml |
1456 | -0.873466 | RL | 3.884968 | 1.246594 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | -0.285886 | -0.063139 | NaN | NaN | NaN | -0.089577 | 1.026577 | -1.363335 | WD | Abnorml |
1457 | 0.655311 | RL | -0.312950 | 0.034599 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | -0.285886 | -0.063139 | NaN | MnPrv | Shed | 1.144116 | 0.289865 | -1.363335 | WD | Normal |
1458 | 0.067320 | RL | 0.201080 | -0.068608 | Pave | NaN | Reg | Lvl | AllPub | Inside | … | -0.285886 | -0.063139 | NaN | NaN | NaN | -0.089577 | 1.763290 | -1.363335 | WD | Normal |
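
The standardization step can be sketched as follows, on a hypothetical two-column frame. The post selects numeric columns by checking that the dtype is not `object`; this sketch uses `select_dtypes(include="number")` as a stand-in that picks the same columns and behaves the same across pandas versions. Filling missing values with 0 afterwards is an assumption about the elided code; since each standardized column has mean 0, it amounts to mean imputation.

```python
import pandas as pd

# Hypothetical toy frame: one numeric column with a missing value, one text column.
all_features = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, 11250.0, float("nan")],
    "MSZoning": ["RL", "RL", "RM", None],
})

# Names of the numeric columns (the text column is left untouched).
numeric_features = all_features.select_dtypes(include="number").columns

# Rescale each numeric column to zero mean and unit standard deviation.
all_features[numeric_features] = all_features[numeric_features].apply(
    lambda x: (x - x.mean()) / x.std())

# The column mean is now 0, so filling gaps with 0 is mean imputation.
all_features[numeric_features] = all_features[numeric_features].fillna(0)

print(all_features)
```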

    # pd.get_dummies is the pandas way to one-hot encode categorical columns

| | MSSubClass | LotFrontage | LotArea | OverallQual | OverallCond | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | … | SaleType_Oth | SaleType_WD | SaleType_nan | SaleCondition_Abnorml | SaleCondition_AdjLand | SaleCondition_Alloca | SaleCondition_Family | SaleCondition_Normal | SaleCondition_Partial | SaleCondition_nan |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.067320 | -0.184443 | -0.217841 | 0.646073 | -0.507197 | 1.046078 | 0.896679 | 0.523038 | 0.580708 | -0.29303 | … | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
1 | -0.873466 | 0.458096 | -0.072032 | -0.063174 | 2.187904 | 0.154737 | -0.395536 | -0.569893 | 1.177709 | -0.29303 | … | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2 | 0.067320 | -0.055935 | 0.137173 | 0.646073 | -0.507197 | 0.980053 | 0.848819 | 0.333448 | 0.097840 | -0.29303 | … | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
3 | 0.302516 | -0.398622 | -0.078371 | 0.646073 | -0.507197 | -1.859033 | -0.682695 | -0.569893 | -0.494771 | -0.29303 | … | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0.067320 | 0.629439 | 0.518814 | 1.355319 | -0.507197 | 0.947040 | 0.753100 | 1.381770 | 0.468770 | -0.29303 | … | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
1454 | 2.419286 | -2.069222 | -1.043758 | -1.481667 | 1.289537 | -0.043338 | -0.682695 | -0.569893 | -0.968860 | -0.29303 | … | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
1455 | 2.419286 | -2.069222 | -1.049083 | -1.481667 | -0.507197 | -0.043338 | -0.682695 | -0.569893 | -0.415757 | -0.29303 | … | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1456 | -0.873466 | 3.884968 | 1.246594 | -0.772420 | 1.289537 | -0.373465 | 0.561660 | -0.569893 | 1.717643 | -0.29303 | … | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1457 | 0.655311 | -0.312950 | 0.034599 | -0.772420 | -0.507197 | 0.682939 | 0.370221 | -0.569893 | -0.229194 | -0.29303 | … | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
1458 | 0.067320 | 0.201080 | -0.068608 | 0.646073 | -0.507197 | 0.715952 | 0.465941 | -0.045732 | 0.694840 | -0.29303 | … | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
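
A small sketch of the one-hot step, on hypothetical toy data. The `_nan` columns in the table above suggest the call used `dummy_na=True`, which adds an indicator column for missing values. Passing `dtype=float` is an assumption added here: recent pandas versions return boolean indicators by default, and numeric values are easier to hand to a model.

```python
import pandas as pd

# Hypothetical toy frame: one already-standardized numeric column and one
# categorical column that contains a missing value.
all_features = pd.DataFrame({
    "LotArea": [0.5, -0.5, 0.0],
    "MSZoning": ["RL", "RM", None],
})

# Each category becomes its own 0/1 column; dummy_na=True also creates
# a column that flags rows where the category was missing.
all_features = pd.get_dummies(all_features, dummy_na=True, dtype=float)

print(list(all_features.columns))
```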

    # Number of training samples
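
The number of training samples is what lets the combined feature table be split back into its training and test parts. A sketch under assumptions: the toy frames are hypothetical, and the arrays are kept as NumPy `float32` here, leaving out the conversion to framework tensors that the original code most likely performs.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: 3 training rows followed by 2 test rows.
train_data = pd.DataFrame({"SalePrice": [208500, 181500, 223500]})
all_features = pd.DataFrame(np.arange(10, dtype=np.float32).reshape(5, 2))

n_train = train_data.shape[0]  # number of training samples

# The first n_train rows came from the training set, the rest from the test set.
train_features = all_features[:n_train].values.astype(np.float32)
test_features = all_features[n_train:].values.astype(np.float32)

# Labels as a column vector so they line up with single-output predictions.
train_labels = train_data.SalePrice.values.reshape(-1, 1).astype(np.float32)

print(train_features.shape, test_features.shape, train_labels.shape)
```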
    # Use a linear model as a baseline to check the pipeline for errors

    def get_net():
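
The body of `get_net` did not survive above. As a rough sketch only, assuming PyTorch (the post does not say which framework it uses), a linear baseline can be a single fully connected layer mapping all features to one output. `in_features = 4` is a hypothetical placeholder for the number of feature columns.

```python
import torch
from torch import nn

in_features = 4  # hypothetical; would be the number of columns after one-hot encoding

def get_net():
    # A single linear layer with one output and no hidden layers,
    # i.e. plain linear regression.
    return nn.Sequential(nn.Linear(in_features, 1))

net = get_net()
out = net(torch.zeros(2, in_features))  # two dummy samples
print(out.shape)
```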
    def train(net, train_features, train_labels, test_features, test_labels,
    # K-fold cross-validation

    k, num_epochs, lr, weight_decay, batch_size = 5, 100, 5, 0, 64
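
The data-splitting half of K-fold cross-validation can be sketched in plain NumPy. The helper name `get_k_fold_data` and its signature are assumptions for illustration, not necessarily what the post defines. For fold `i`, the `i`-th slice is held out for validation and the remaining slices are concatenated for training; with `k = 5` every sample is used for validation exactly once.

```python
import numpy as np

def get_k_fold_data(k, i, X, y):
    """Return (X_train, y_train, X_valid, y_valid) with fold i held out."""
    assert k > 1
    fold_size = X.shape[0] // k  # rows beyond k * fold_size are dropped
    X_parts, y_parts = [], []
    X_valid = y_valid = None
    for j in range(k):
        idx = slice(j * fold_size, (j + 1) * fold_size)
        if j == i:
            X_valid, y_valid = X[idx], y[idx]
        else:
            X_parts.append(X[idx])
            y_parts.append(y[idx])
    return np.concatenate(X_parts), np.concatenate(y_parts), X_valid, y_valid

# Toy data: 10 samples with 2 features each.
X = np.arange(20, dtype=np.float32).reshape(10, 2)
y = np.arange(10, dtype=np.float32).reshape(-1, 1)

k = 5
for i in range(k):
    X_tr, y_tr, X_va, y_va = get_k_fold_data(k, i, X, y)
    print(i, X_tr.shape, X_va.shape)
```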
    def train_and_pred(train_features, test_features, train_labels, test_data,

    pd.read_csv('submission.csv').head()

Id | SalePrice |
---|---|
1461 | 119559.195 |
1462 | 154014.53 |
1463 | 198652.77 |
1464 | 217135.89 |
1465 | 177476.7 |
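
A file with this two-column layout can be produced roughly as follows. The IDs and predictions are hypothetical placeholders, and the file is written to a temporary directory to keep the sketch self-contained; the post writes `submission.csv`.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the test IDs and the model's predictions.
test_data = pd.DataFrame({"Id": [1461, 1462, 1463]})
preds = np.array([[100000.0], [150000.0], [200000.0]], dtype=np.float32)

# One row per test house: its Id and the predicted SalePrice.
submission = pd.DataFrame({
    "Id": test_data["Id"],
    "SalePrice": preds.reshape(-1),
})

path = os.path.join(tempfile.mkdtemp(), "submission.csv")
submission.to_csv(path, index=False)  # index=False keeps row labels out of the file

print(pd.read_csv(path).head())
```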
The final score is 0.16706.
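
Assuming this is the Kaggle leaderboard score, it is a root-mean-squared error between the logarithm of the predicted price and the logarithm of the actual price, so it measures relative rather than absolute error. A minimal NumPy sketch of that metric follows; clipping predictions to at least 1 is an added safeguard that keeps the logarithm finite. On this scale, a value near 0.167 corresponds to typical relative errors of very roughly 18%.

```python
import numpy as np

def log_rmse(preds, labels):
    # Clip predictions to at least 1 so the logarithm stays finite.
    clipped = np.clip(preds, 1.0, None)
    return float(np.sqrt(np.mean((np.log(clipped) - np.log(labels)) ** 2)))

labels = np.array([200000.0, 100000.0])

print(log_rmse(labels, labels))        # perfect predictions
print(log_rmse(labels * 1.1, labels))  # every price overestimated by 10%
```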
Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 4.0. Please credit GDPolar's Blog when reposting.