MoRe NL: foundations of a Modular Realisation Engine for Nguni Languages
NRF CPRR grant (2020-2022), Grant number 120852
Project summary
A multitude of socio-economic and political factors cause language barriers to persist in healthcare and in other areas, such as weather forecasts, for the vast majority of people in South Africa. Computer applications may alleviate these issues through translation or by generating the required contextually relevant text from structured input. The latter is addressed by Natural Language Generation (NLG). NLG for the Nguni languages, one of the two main groups of languages indigenous to South Africa, is still at an exploratory stage, and that exploration has exposed a clear set of problems that need to be resolved. As templates are generally inapplicable, once-off patterns were defined, but there is no NLG pattern specification language. The algorithms for the few knowledge-to-text sentences supported are ad hoc rather than systematic and modular for flexible reuse across application scenarios. Further, looking beyond isiZulu to related languages, there is no theory, tool, or even an approach for easily reusing and adapting (that is, bootstrapping) the resources for those other languages, which are also widely spoken.
The aim of this project is to carry out the research needed to build a generic framework for an NLG realisation engine for at least the Nguni language group, inclusive of an entirely novel NLG pattern specification language with an annotation model. The framework will be modular and domain-independent, so that one can 'mix and match' word fragments, clitics, and concords as needed for the task. It will be computationally tractable and usable with popular NLP tools and knowledge representation systems, such as NLTK, RDF, and OWL. This will enable designers to generate sentences in the Nguni languages and in related Bantu languages for a range of applications. Further, in aiming for generalisability of such a realisation engine, a solution will be sought for devising computationally usable measures with predictive power for bootstrapping across related Bantu languages.
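To make the 'mix and match' idea concrete, here is a minimal sketch in Python. It is not the project's pattern specification language or any of its software; the dictionaries, the function name `realise`, and the pattern itself are assumptions made purely for illustration. It shows why a fixed template falls short for isiZulu: the verb's subject concord has to be chosen from the noun class of the subject, so the word is assembled from fragments at generation time.

```python
# Illustrative sketch only: NOT the MoRe NL pattern language or engine.
# A small subset of isiZulu subject concords, keyed by noun class number.
SUBJECT_CONCORD = {1: "u", 2: "ba", 9: "i", 10: "zi"}

# Hypothetical mini-lexicon mapping a noun to its noun class.
NOUN_CLASS = {
    "umfana": 1,    # boy
    "abafana": 2,   # boys
    "inja": 9,      # dog
    "izinja": 10,   # dogs
}


def realise(noun: str, verb_root: str) -> str:
    """Fill the pattern: noun + [subject concord + 'ya' + verb root + 'a'].

    The concord slot is filled by looking up the noun class of the subject;
    'ya' marks the long form of the present tense and 'a' is the final vowel.
    Phonological conditioning is deliberately ignored in this sketch.
    """
    concord = SUBJECT_CONCORD[NOUN_CLASS[noun]]
    return f"{noun} {concord}ya{verb_root}a"


for subject in ("umfana", "abafana", "inja", "izinja"):
    print(realise(subject, "dl"))  # verb root -dl- 'eat'
```

The same pattern yields a different verb form for each subject because the concord is looked up rather than hard-coded. A reusable engine would additionally need the remaining noun classes, other tenses and polarities, and phonological rules, none of which are attempted here.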
Outputs
- Mahlaza, Z., Keet, C.M. Surface realisation architecture for low-resourced African languages. ACM Transactions on Asian and Low-Resource Language Information Processing, (in print).
- Gutman, A., Keet, C.M. Abstract Wikipedia/Template Language for Wikifunctions. Proposal. 27 July 2022.
- Keet, C.M., Khumalo, L., Mahlaza, Z. Considerations for a model for NCB noun classes in Wikidata. WikiWorkshop 2022, April 25, 2022, online. (abstract)
- Gillis-Webber, F., Keet, C.M. A Survey of Multilingual OWL Ontologies in BioPortal. 13th Semantic Web Applications and Tools for Healthcare and Life Sciences (SWAT4HCLS'22). Wolstencroft, K. et al. (Eds.). CEUR-WS Vol. 3127, 87-96. Leiden, the Netherlands, January 10-13 2022.
- Mahlaza, Z., Keet, C.M. ToCT: A task ontology to manage complex templates. FOIS'21 Ontology Showcase, 13-16 September 2021, Bolzano, Italy. Sanfilippo, E.M. et al. (Eds.). CEUR-WS vol. 2969. 9p.
- Keet, C.M. Natural Language Generation Requirements for Social Robots in Sub-Saharan Africa. IST-Africa 2021, 10-14 May 2021, online. Cunningham, M. and Cunningham, P. (Eds.). IST-Africa Institute and IIMC Ireland.
- Mahlaza, Z., Keet, C.M. Formalisation and classification of grammar and template-mediated techniques to model and ontology verbalisation. International Journal of Metadata, Semantics and Ontologies, 2020, 14(3): 249-262.
- Mahlaza, Z., Keet, C.M. OWLSIZ: An isiZulu CNL for structured knowledge validation. 3rd Workshop on Natural Language Generation from the Semantic Web (WebNLG'20), ACL, pp. 15-25. 18 Dec 2020, Dublin, Ireland.
- Keet, C.M., Khumalo, L. Parthood and Part--Whole Relations in Zulu Language and Culture. Applied Ontology, 2020, 15(3): 361-384.
- Digital Assistant for Financial Transactions by Junior Moraba and Amy Solomons, in 2021.
- Generating natural language text in isiZulu from mathematical expressions by Shan Smith (main supervisor: Zola Mahlaza), in 2020.
Talks and tutorials:
- Knowledge-to-text Natural Language Generation for Agglutinating African Languages. TechTalk at the Wikimedia Foundation google.org fellows offsite workshop, Google Zurich, Switzerland, 23-26 August 2022. video on Wikimedia
- JOWO 2022 tutorial: Generating text from ontologies in multiple languages. Jönköping, Sweden, 15-19 August.
- Encoding Biases' Influences on Development and Use of Ontologies in the Life Sciences. Keynote at Bio-Ontologies, part of Intelligent Systems for Molecular Biology 2022 (ISMB'22), 10-14 July 2022, Madison, USA.
- Natural Language Generation for Agglutinating African Languages -- A brief overview. Digital Humanities Colloquium, at SADiLaR, 18 May 2022 (online). screen recording on YouTube - slides
- Natural Language Generation Requirements for Social Robots in Sub-Saharan Africa. Conference presentation of the paper at the IST-Africa'21 conference. screen recording
Proof-of-concept programs and related software artefacts:
Members and collaborators
- Assoc. Prof. Maria Keet, UCT; PI
- Prof. Langa Khumalo, SADiLaR; research associate
- Dr. Zubeida Khan, CSIR; research associate
- Mr. Zola Mahlaza, PhD student, UCT; research associate
- Ms. Frances Gillis-Webber, PhD student, UCT
- Mr. Leighton Dawson, MSc student, UCT
- Scientific programmers and research assistants (since 2020): Blessed Chitamba, Kouthar Dollie, Sindiso Mkhatshwa, Junior Moraba, Gerald Ngumbulu, Toky Raboanary