Research Repository

Book chapter

Building a dual dataset of text- and image-grounded conversations and summarisation in Gàidhlig (Scottish Gaelic)

Public Deposited

Creator

Howcroft, David M
Lamb, Will
Groundwater, Anna
Gkatzia, Dimitra

2023

Abstract

Gàidhlig (Scottish Gaelic; gd) is spoken by about 57k people in Scotland, but remains an under-resourced language with respect to natural language processing in general and natural language generation (NLG) in particular. To address this gap, we developed the first datasets for Scottish Gaelic NLG, collecting both conversational and summarisation data in a single setting. Our task setup involves dialogues between a pair of proficient speakers discussing museum exhibits, grounding the conversation in images and texts. Then, each interlocutor summarises the dialogue, resulting in a secondary dialogue summarisation dataset. This paper presents the dialogue and summarisation corpora, as well as the software used for data collection. The dialogue dataset consists of 43 conversations (13.7k words) and 61 summaries (2.0k words).

Items:

Thumbnail	File name	Date uploaded	Visibility	File Size	Actions
	Building_a_dual_dataset_of_text_and_image_grounded_conversations.pdf	2023-10-09	Public	1.1 MB	Download

Metadata

Resource Type: Book chapter
Creator: Howcroft, David M
Lamb, Will
Groundwater, Anna
Gkatzia, Dimitra
Date published: 2023
Institution: National Museums Scotland
Organisational unit: Scottish History & Archaeology
Project name: NLG for low-resource domains
Funder: Name: EPSRC Centre for Doctoral Training in Technology Enhanced Chemical Synthesis
Awards: EP/T024917/1
Book title: The 16th International Natural Language Generation Conference: Proceedings of the Conference, September 11-15, 2023
Pagination: 443-448
Publisher: The Association for Computational Linguistics
ISBN: 9798891760011
Official URL: https://sigdialinlg2023.github.io/static/papers/inlg/82_Paper.pdf
Related URL: https://www.nms.ac.uk/about-us/our-organisation/strategy/gaelic-language-plan-plana-gaidhlig/
https://www.nms.ac.uk/collections-research/collections-departments/scottish-history-and-archaeology/meet-the-team/dr-anna-groundwater/
https://www.nms.ac.uk/explore-our-collections/films/collecting-the-present/
https://www.nms.ac.uk/explore-our-collections/stories/scottish-history-and-archaeology/lewis-chess-pieces/
Licence: CC BY 4.0 Attribution
Rights statement: In Copyright
Keywords: conversational and summarisation data
Gàidhlig
images
Lewis Chessmen
museum exhibits
natural language generation
natural language processing
Scottish Gaelic
texts
Thàileisg Leòdhais
Additional Information: Full paper available via the official URL