Machine Learning in Practice: Applying deep learning to real-world problems
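1. Pre-training method
When applying ML in the real world, obtaining large amounts of precisely labeled data is expensive.
If only a limited amount of precisely labeled data is available, pre-training can help improve the accuracy of the trained model [1].
First, pre-train on a large, cheap dataset from a related domain.
Then fine-tune on the expensive, well-labeled data.
Because the fine-tuning step uses precisely labeled data,
it is possible to pre-train on so-called weakly labeled data
(e.g. 90% of the labels might be correct and 10% wrong).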
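The two-stage recipe can be illustrated with a small sketch. This is not the method from [1], only a toy example under stated assumptions: a logistic-regression model stands in for a deep network, the data is synthetic, and the names `make_data`, `train_logreg`, and `true_w` are hypothetical. The weakly labeled set has 10% of its labels flipped, matching the example above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, w, lr, steps):
    """Full-batch gradient descent on the logistic loss, starting from w."""
    w = w.copy()
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def accuracy(X, y, w):
    """Fraction of samples whose predicted class matches the label."""
    return float(np.mean((X @ w > 0) == (y == 1)))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])  # assumed ground-truth boundary (toy data only)

def make_data(n, noise_rate):
    """Draw n synthetic points and flip a noise_rate fraction of the labels."""
    X = rng.normal(size=(n, 2))
    y = (X @ true_w > 0).astype(float)
    flip = rng.random(n) < noise_rate
    y[flip] = 1.0 - y[flip]
    return X, y

X_weak, y_weak = make_data(5000, noise_rate=0.10)   # cheap, weakly labeled
X_clean, y_clean = make_data(100, noise_rate=0.0)   # expensive, well labeled
X_test, y_test = make_data(2000, noise_rate=0.0)

# Stage 1: pre-train from scratch on the large weakly labeled set.
w_pre = train_logreg(X_weak, y_weak, np.zeros(2), lr=0.5, steps=200)
# Stage 2: fine-tune from the pre-trained weights with a smaller learning rate.
w_fine = train_logreg(X_clean, y_clean, w_pre, lr=0.05, steps=50)

acc_pre = accuracy(X_test, y_test, w_pre)
acc_fine = accuracy(X_test, y_test, w_fine)
print(f"after pre-training: {acc_pre:.3f}, after fine-tuning: {acc_fine:.3f}")
```

The smaller learning rate in the second stage is a common choice so that the small clean set adjusts the pre-trained weights instead of overwriting them; the specific values here are arbitrary.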
2. Caveats of real-world label distributions [1]
Compared with the balanced datasets common in academia, real-world data has the following characteristics:
- Unbalanced label distribution
- Unbalanced cost of misclassification
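Solution [1]: make the data distribution in the training set more balanced:
- Collect more data samples
- Change the labeling (merge some rare classes into more common ones)
- Use sampling and weighting strategies, including ignoring outlier samples, oversampling or undersampling, merging rare classes, and adjusting the loss-function weights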
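Two of the strategies above, oversampling and loss weighting, can be sketched in a few lines of NumPy. This is a minimal illustration, not the procedure from [1]: the helper names `class_weights`, `oversample`, and `weighted_cross_entropy` are hypothetical, and the inverse-frequency weighting formula is one common convention among several.

```python
import numpy as np

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def oversample(X, labels, rng):
    """Resample every class with replacement up to the largest class size."""
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=target, replace=True)
        for c in classes
    ])
    return X[idx], labels[idx]

def weighted_cross_entropy(probs, labels, weights):
    """Cross-entropy where each sample is scaled by its class weight."""
    w = np.array([weights[int(c)] for c in labels])
    picked = probs[np.arange(len(labels)), labels]  # probability of true class
    return float(np.sum(-w * np.log(picked)) / np.sum(w))

# Toy imbalanced dataset: 90 samples of class 0, 10 samples of class 1.
labels = np.array([0] * 90 + [1] * 10)
X = np.arange(100, dtype=float).reshape(-1, 1)

weights = class_weights(labels)          # the rare class gets the larger weight
X_bal, y_bal = oversample(X, labels, np.random.default_rng(0))
balanced_counts = np.bincount(y_bal)     # both classes now have 90 samples

uniform_probs = np.full((100, 2), 0.5)   # a model that always predicts 50/50
loss = weighted_cross_entropy(uniform_probs, labels, weights)
```

Oversampling changes what the model sees, while loss weighting changes how much each mistake counts; the latter is also a natural place to encode an unbalanced cost of misclassification, by choosing weights from costs instead of frequencies.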
3. Grasping the inner workings of complex machine learning models
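Applying machine learning in the real world depends on more than high model accuracy; the following key points must also be addressed:
- Understand why and where the model makes mistakes.
- Provide an intuitive understanding of why our model can outperform existing solutions.
- Make sure that the model cannot be tricked.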
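One simple starting point for understanding where a model makes mistakes is a confusion matrix with per-class recall. The sketch below is only an illustration with made-up predictions; the helper names `confusion_matrix` and `per_class_recall` are hypothetical, and it does not cover the other two points (intuition about the model and robustness to being tricked).

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples of true class i that were predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add handles repeated pairs
    return cm

def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    return np.diag(cm) / cm.sum(axis=1)

# Made-up labels and predictions for a 3-class problem with a dominant class 0.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 1, 0, 1, 2, 0])

cm = confusion_matrix(y_true, y_pred, n_classes=3)
recall = per_class_recall(cm)
overall_accuracy = float(np.mean(y_true == y_pred))

print(cm)
print("per-class recall:", recall)
print("overall accuracy:", overall_accuracy)
```

In this toy case the overall accuracy is 0.7, yet the two rare classes are recognized only half the time, which the single accuracy number hides.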
References:
[1] "Applying deep learning to real-world problems"
Applying deep learning to real-world problems also involves practical challenges such as data preprocessing, model selection, and hyperparameter tuning. Addressing them helps models learn effectively from diverse datasets, which leads to better performance and more accurate predictions.
