MoRe NL: foundations of a Modular Realisation Engine for Nguni Languages
NRF CPRR grant (2020-2022), Grant number 120852
Project summary
A multitude of socio-economic and political factors cause language barriers to persist in healthcare and other areas, such as weather forecasts, for the vast majority of people in South Africa. Computer applications may alleviate these issues through translation or by generating the required contextually relevant text from structured input. The latter is addressed by Natural Language Generation (NLG). The current state of NLG for Nguni languages, one of the two main groups of languages indigenous to South Africa, is in the exploratory stage, which has led to a clear set of problems that need to be resolved. As templates are generally inapplicable, once-off patterns were defined, but there is no NLG pattern specification language. The algorithms for the few knowledge-to-text sentences supported are ad hoc, rather than systematic and modular for flexible reuse across application scenarios. Further, looking beyond isiZulu to related languages, there is no theory, tool, or even approach for easy reuse and adaptation (that is, bootstrapping) of the resources for those other languages that are also widely spoken.
The aim of this project is to carry out the research needed to build a generic framework for an NLG realisation engine for at least the Nguni language group, inclusive of an entirely novel NLG pattern specification language with annotation model, that will be modular and domain-independent so that one can 'mix and match' word fragments, clitics, and concords as needed for the task. The framework will be computationally tractable and usable with popular NLP tools and knowledge representation systems, such as NLTK, RDF, and OWL. This will enable designers to generate sentences in the Nguni languages and in related Bantu languages for a range of applications. Further, in aiming for generalisability of such a realisation engine, a solution will be found for devising computationally usable measures with predictive power for bootstrapping across related Bantu languages.
Outputs
- Mahlaza, Z., Keet, C.M. ToCT: A task ontology to manage complex templates. FOIS'21 Ontology Showcase, 13-16 September 2021, Bolzano, Italy. (in print)
- Keet, C.M. Natural Language Generation Requirements for Social Robots in Sub-Saharan Africa. IST-Africa 2021, 10-14 May 2021, online. Cunningham, M. and Cunningham, P. (Eds). IST-Africa Institute and IIMC Ireland.
- Mahlaza, Z., Keet, C.M. Formalisation and classification of grammar and template-mediated techniques to model and ontology verbalisation. International Journal of Metadata, Semantics and Ontologies, 2020, 14(3): 249-262.
- Mahlaza, Z., Keet, C.M. OWLSIZ: An isiZulu CNL for structured knowledge validation. 3rd Workshop on Natural Language Generation from the Semantic Web (WebNLG'20), ACL, pp15-25. 18 Dec 2020, Dublin, Ireland.
Members and collaborators
- Assoc. Prof. Maria Keet, UCT; PI
- Prof. Langa Khumalo, SADILAR; research associate
- Dr. Zubeida Khan, CSIR; research associate
- Mr. Zola Mahlaza, PhD student, UCT; research associate
- Mr. Leighton Dawson, MSc student, UCT
- Scientific programmers and research assistants: Blessed Chitamba, Kouthar Dollie, Gerald Ngumbulu