Improving BERT with Hybrid Pooling Network and Drop Mask

cs.CL

Statistics

Citations: 0
References: 28
Authors

Qian Chen, Wen Wang, Qinglin Zhang, Chong Deng, Yukun Ma, Siqi Zheng
Project Resources

Name                    Type   Source
ArXiv Paper             Paper  arXiv
Semantic Scholar Paper  Paper  Semantic Scholar
Abstract

Transformer-based pre-trained language models, such as BERT, achieve great success in various natural language understanding tasks. Prior research found that BERT captures a rich hierarchy of linguistic information at different layers. However, vanilla BERT uses the same self-attention mechanism in every layer to model different contextual features. In this paper, we propose HybridBERT, a model that combines self-attention and pooling networks to encode different contextual features in each layer. We also propose a simple DropMask method to address the mismatch between pre-training and fine-tuning caused by the excessive use of special mask tokens during Masked Language Modeling pre-training. Experiments show that HybridBERT outperforms BERT in pre-training, with lower loss, faster training (8% relative), and lower memory cost (13% relative), as well as in transfer learning, with 1.5% relatively higher accuracy on downstream tasks. In addition, DropMask improves the accuracy of BERT on downstream tasks across various masking rates.
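As a rough illustration of the two ideas in the abstract, the sketch below shows (a) a hypothetical encoder layer that mixes a self-attention branch with a lightweight pooling branch, and (b) one plausible reading of DropMask, in which [MASK] positions are hidden from other tokens' attention during MLM pre-training so that the context seen at pre-training looks more like fine-tuning, where no [MASK] tokens exist. The names HybridLayer and dropmask_key_padding_mask, the way the two branches are merged, and the default token ids are illustrative assumptions, not the paper's implementation.

```python
from typing import Optional

import torch
import torch.nn as nn


class HybridLayer(nn.Module):
    """Encoder layer combining a self-attention branch with a pooling branch
    (illustrative sketch; the paper's HybridBERT layer may differ)."""

    def __init__(self, hidden_size: int, num_heads: int, pool_kernel: int = 3):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Pooling branch: a cheap local token mixer (average over neighboring tokens).
        self.pool = nn.AvgPool1d(pool_kernel, stride=1, padding=pool_kernel // 2)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, x: torch.Tensor,
                key_padding_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # Global context from the self-attention branch.
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        # Local context from the pooling branch; AvgPool1d expects (batch, channels, seq_len).
        pool_out = self.pool(x.transpose(1, 2)).transpose(1, 2)
        # Merge both contextual views with a residual connection.
        return self.norm(x + attn_out + pool_out)


def dropmask_key_padding_mask(input_ids: torch.Tensor,
                              pad_id: int = 0,
                              mask_id: int = 103) -> torch.Tensor:
    """Boolean key-padding mask (True = ignore) that hides [PAD] and, under this
    reading of DropMask, also [MASK] tokens from other tokens' attention.
    Defaults assume the standard bert-base-uncased vocabulary ([PAD]=0, [MASK]=103)."""
    return (input_ids == pad_id) | (input_ids == mask_id)


if __name__ == "__main__":
    layer = HybridLayer(hidden_size=768, num_heads=12)
    input_ids = torch.randint(5, 1000, (2, 16))
    input_ids[0, 3] = 103  # pretend position 3 is a [MASK] token
    hidden = torch.randn(2, 16, 768)
    out = layer(hidden, key_padding_mask=dropmask_key_padding_mask(input_ids))
    print(out.shape)  # torch.Size([2, 16, 768])
```

In this sketch both branches feed a single shared residual; the actual model could instead weight, gate, or alternate the attention and pooling branches across layers, and DropMask could be applied only to a fraction of [MASK] positions.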
