SLT 2018 • Technical Program • 2018 IEEE Workshop on Spoken Language Technology (SLT) | 18-21 December 2018

Paper Detail

Presentation #

Session:

Natural Language Processing

Session Time:

Thursday, December 20, 13:30 - 15:30

Presentation Time:

Thursday, December 20, 13:30 - 15:30

Presentation:

Poster

Topic:

Natural language processing:

Paper Title:

EXTENSION OF CONVENTIONAL CO-TRAINING LEARNING STRATEGIES TO THREE-VIEW AND COMMITTEE-BASED LEARNING STRATEGIES FOR EFFECTIVE AUTOMATIC SENTENCE SEGMENTATION

Authors:

Dogan Dalva; F.M.V. ISIK University

Umit Guz; F.M.V. ISIK University

Hakan Gurkan; Bursa Technical University

Abstract:

The objective of this work is to develop effective multi-view semi-supervised machine learning strategies for the sentence boundary classification problem when only small sets of sentence-boundary-labeled data are available. We propose three-view and committee-based learning strategies that incorporate co-training algorithms with agreement, disagreement, and self-combined learning strategies, using prosodic, lexical, and morphological information. We compare the experimental results of the proposed three-view and committee-based learning strategies with those of other semi-supervised learning strategies in the literature, namely self-training and co-training with agreement, disagreement, and self-combined strategies. The experimental results show that sentence segmentation performance can be greatly improved using the proposed multi-view learning strategies, since the data sets can be represented by three redundantly sufficient and disjoint feature sets. We show that the proposed strategies substantially improve the average performance for both Turkish and English spoken language when only a small set of manually labeled data is available.
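
As a rough illustration of how a three-view committee might choose unlabeled examples to pseudo-label, the sketch below uses simple vote counting. It is not the authors' algorithm: the abstract gives no selection rule, so the vote-based definitions of "agreement" (all three views predict the same label) and "disagreement" (exactly two views agree) are simplifying assumptions, as are the names `VIEWS`, `committee_label`, and `select_examples` and the label strings "B" (boundary) and "N" (non-boundary).

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed view names, taken from the feature types listed in the abstract.
VIEWS = ("prosodic", "lexical", "morphological")


def committee_label(votes: Dict[str, str]) -> Tuple[str, int]:
    """Return the majority label and the number of views that voted for it."""
    label, count = Counter(votes[v] for v in VIEWS).most_common(1)[0]
    return label, count


def select_examples(pool: List[Dict[str, str]], strategy: str) -> List[Tuple[int, str]]:
    """Pick unlabeled examples (by index) together with a pseudo-label.

    Assumed, simplified strategies for binary labels:
      "agreement":    all three views predict the same label.
      "disagreement": exactly two views agree, so the outvoted view
                      could be retrained on the majority label.
    """
    if strategy not in ("agreement", "disagreement"):
        raise ValueError(f"unknown strategy: {strategy}")
    wanted = 3 if strategy == "agreement" else 2
    selected = []
    for idx, votes in enumerate(pool):
        label, count = committee_label(votes)
        if count == wanted:
            selected.append((idx, label))
    return selected


# Hypothetical per-view predictions for three unlabeled word boundaries.
pool = [
    {"prosodic": "B", "lexical": "B", "morphological": "B"},
    {"prosodic": "B", "lexical": "N", "morphological": "B"},
    {"prosodic": "N", "lexical": "N", "morphological": "N"},
]
agreed = select_examples(pool, "agreement")
disputed = select_examples(pool, "disagreement")
```

In a full co-training loop, the selected examples would be added to the labeled set and each view's classifier retrained. A practical system would likely also weigh classifier confidence rather than rely on raw votes alone.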