Named Entity Recognition for Address Extraction in Speech-to-Text Transcriptions Using Synthetic Data

Created by MG96

External Public cs.CL I.2.7

Authors

Bibiána Lajčinová, Patrik Valábek, Michal Spišiak
Project Resources

ArXiv Paper (type: Paper, source: arXiv)
Abstract

This paper introduces an approach for building a Named Entity Recognition (NER) model on top of the Bidirectional Encoder Representations from Transformers (BERT) architecture, specifically the SlovakBERT model. The NER model extracts address parts from speech-to-text transcriptions. Because real data is scarce, a synthetic dataset was generated using the GPT API. The paper emphasizes the importance of mimicking the variability of spoken language in this artificial data. The performance of our NER model, trained solely on synthetic data, is evaluated on a small real test dataset.
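
To make the task concrete, the following is a minimal sketch of how token-level labels for address parts might be encoded before fine-tuning a BERT-style token classifier. It uses the common BIO tagging scheme, which the abstract does not confirm as the authors' choice. The label names (STREET, HOUSE_NUMBER, MUNICIPALITY), the helper `bio_tags`, and the example sentence are all hypothetical illustrations and are not taken from the paper or its dataset.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, label).
# Label names are hypothetical; the abstract only mentions "address parts".
Span = Tuple[int, int, str]


def bio_tags(tokens: List[str], spans: List[Span]) -> List[str]:
    """Assign BIO tags to tokens given token-level entity spans."""
    tags = ["O"] * len(tokens)  # every token starts outside any entity
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the entity
    return tags


# Invented English stand-in for a transcription: lowercase, no punctuation,
# house number spelled out as words, as speech-to-text output often is.
tokens = "i live at main street twenty five in kosice".split()
spans = [(3, 5, "STREET"), (5, 7, "HOUSE_NUMBER"), (8, 9, "MUNICIPALITY")]
tags = bio_tags(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Pairs of tokens and tags in this form are the usual input for a token classification head; in a synthetic dataset, the spans would be known from how each sentence was generated rather than annotated by hand.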
