HKNSC: A dynamic bilingual narrative speech corpus from Hong Kong and its potential for linguistic research

Dr. YANG Yike

Author(s)

Dr. YANG Yike

Date Issued

2024

Conference

2024 Annual Research Forum of The Linguistic Society of Hong Kong

Citation

Yang, Y. (8 December 2024). HKNSC: A dynamic bilingual narrative speech corpus from Hong Kong and its potential for linguistic research. 2024 Annual Research Forum of the Linguistic Society of Hong Kong, Hong Kong Shue Yan University.

URI

https://lshk.org/wp-content/uploads/2024/12/ARF2024-abstract-book.pdf

http://hdl.handle.net/20.500.11861/24056

Type

Conference Paper

Abstract

Background: Until recently, language documentation has largely focused on collecting data from endangered languages, and the linguistic profiles of bilingual speakers in immigration settings remain poorly documented, although there have been some attempts to document the language development of bilingual children. The current project aims to collect narrative speech data from late bilingual speakers, whose language acquisition shows distinctive features compared with that of simultaneous or early bilinguals because a full-fledged first language (L1) is already in place when they start to learn their second language (L2). HKNSC stands for ‘Hong Kong Narrative Speech Corpus’. At the initial stage, only Mandarin-speaking immigrants were invited to contribute to the corpus. The team aims to invite immigrants with a wider range of L1s and will continue to update the corpus.

Participants: Three groups of participants were invited to take part in this study. The target bilingual group consisted of 73 Mandarin-speaking immigrants who were born and raised in Northern China and arrived in Hong Kong after puberty. The other two groups were native speakers of Cantonese (N=59) and Mandarin (N=40). The bilingual group and the Cantonese speakers told the story in both Cantonese and Mandarin, while the monolingual Mandarin speakers told the story in Mandarin only.

Materials: The wordless picture book Frog, where are you? (Mayer, 1969) was used to elicit spontaneous speech data. Participants were asked to tell the whole story at their own pace. They were allowed to refer to the book and were encouraged to tell the story page by page so as not to miss key information. In total, 132 Cantonese stories (73 bilinguals + 59 natives) and 155 Mandarin stories (73 bilinguals + 40 natives + 42 Cantonese speakers) were collected.

Corpus Construction: The recordings were first transcribed into text using automatic speech-to-text tools and subsequently verified by trained linguists to ensure transcription accuracy. Following word and sentence segmentation, the corpus was constructed using CLAN (MacWhinney, 1996). After a final review, the corpus will be released for public access to researchers and students of linguistics and language acquisition in early 2025.

Potential for Research: This corpus will be valuable for scholars in the fields of linguistics and language acquisition. For instance, the data can be used to investigate the following topics regarding L1 and L2 Cantonese: 1) the merging status of checked syllables; 2) the production of lexical tones; 3) the use of sentence-final particles; 4) lexical/syntactic complexity; and 5) the use of cohesive devices. Moreover, researchers may also use the corpus data to examine the cross-linguistic influence of the L1 and L2 at various linguistic levels.
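
As a rough illustration of how the lexical complexity topic might be approached once word-segmented transcripts are available, the sketch below computes a type-token ratio from the main speaker tier of a CHAT-style transcript (the format used by CLAN). It is not the project's own analysis. The sample transcript, the `PAR` speaker code, and the function names `main_tier_tokens` and `type_token_ratio` are all invented for this example, and the sketch assumes that words are already separated by spaces.

```python
import re

# Hypothetical CHAT-style transcript, invented for illustration only.
# It is NOT taken from HKNSC; the speaker code "PAR" is an assumption.
SAMPLE_CHAT = """\
@Begin
@Participants:\tPAR Participant
*PAR:\tthe boy looks for the frog .
*PAR:\tthe dog looks in the jar .
%com:\tdependent tier, ignored by this sketch
@End
"""


def main_tier_tokens(chat_text, speaker="PAR"):
    """Collect lower-cased word tokens from one speaker's main tiers.

    Assumes main tiers start with '*SPEAKER:' and that words are
    already segmented (separated by whitespace).
    """
    prefix = f"*{speaker}:"
    tokens = []
    for line in chat_text.splitlines():
        if line.startswith(prefix):
            utterance = line[len(prefix):]
            # \w+ keeps word characters and drops punctuation such as "."
            tokens.extend(re.findall(r"\w+", utterance.lower()))
    return tokens


def type_token_ratio(tokens):
    """Return distinct word types divided by total tokens (0.0 if empty)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


tokens = main_tier_tokens(SAMPLE_CHAT)
ttr = type_token_ratio(tokens)
print(f"{len(tokens)} tokens, {len(set(tokens))} types, TTR = {ttr:.3f}")
```

Because the raw type-token ratio is sensitive to text length, a comparison across stories of different lengths would likely call for a length-adjusted measure; that choice is left open here.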
