My PhD Thesis:


A Formal Theory of Granularity

Toward enhancing biological and applied life sciences information systems with granularity


(btw, the front looks cooler in pdf)

Here it is, the PhD thesis, which was successfully defended in public on the 28th of April, 2008.
For connoisseurs of the Italian system: XXesimo ciclo, INF/01.

Keet, C.M. A Formal Theory of Granularity. PhD Thesis, KRDB Research Centre, Faculty of Computer Science, Free University of Bozen-Bolzano, Italy. 2008. PDF

On this page you can find the short backflap summary, the abstract (about two pages), the pdf file of the thesis (about 3MB), the slides of the defense (pdf, 1.8MB), and some funny test files (.zip), being the Mace4 input files and a few illustrative proofs computed with Prover9.




Backflap summary

Computationally managing different levels of detail in biological data, information, and knowledge--biological granularity--is indispensable both for dealing advantageously with the huge amounts of data that are being generated by scientists and for structuring the knowledge to analyse and vertically integrate biological data and information across levels of granularity. Managing such databases, knowledge bases, and ontologies effectively and efficiently requires new foundational methodologies to push implementations to the next phase of in silico biology.

To address these issues, we move from a data-centric and underspecified treatment of granularity to the conceptual and logical layers, where informally defined elements of granularity have become ontologically motivated modelling constructs proper. This is achieved as follows. First, the foundational semantics of granularity are disambiguated and structured in a taxonomy of types of granularity. This taxonomy makes explicit both the ways of granulation and representation, and how entities are organised within a level of granularity. Second, the static components of granularity--such as levels, indistinguishability, and granulation criteria, and how levels and entities relate--are subjected to an ontological analysis and formalised in a satisfiable logical theory, the theory of granularity (TOG), so that an unambiguous meaning is ensured, interesting properties are proven, and satisfiability is computationally demonstrated. Third, an extensible set of domain- and implementation-independent functions is defined, both for the TOG elements, to enable granular querying and reasoning over the theory, and for moving between entities residing in different levels through abstraction and expansion functions.
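As a rough illustration of the static components named above (levels, a granulation criterion, and how levels relate), a granular perspective can be pictured as an ordered list of levels. This is a minimal sketch, not the TOG formalisation itself: the class names `GranularLevel` and `GranularPerspective`, the method `is_finer`, the criterion string, and the choice to store levels from coarse to fine are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GranularLevel:
    """A named level of detail (hypothetical representation)."""
    name: str


@dataclass
class GranularPerspective:
    """Levels granulated by one criterion, stored coarse to fine (assumption)."""
    criterion: str
    levels: list = field(default_factory=list)

    def add_finer_level(self, name: str) -> GranularLevel:
        # Each new level is appended as the finest level so far.
        if any(lv.name == name for lv in self.levels):
            raise ValueError(f"duplicate level: {name}")
        level = GranularLevel(name)
        self.levels.append(level)
        return level

    def is_finer(self, a: str, b: str) -> bool:
        # True when level `a` is strictly finer-grained than level `b`.
        names = [lv.name for lv in self.levels]
        return names.index(a) > names.index(b)


# Illustrative perspective; the criterion label is an assumption.
anatomy = GranularPerspective(criterion="structural parthood")
for level_name in ("organ", "tissue", "cell"):
    anatomy.add_finer_level(level_name)

print(anatomy.is_finer("cell", "organ"))  # True
```

Keeping the levels in a single ordered list is the simplest choice that makes "finer than" a strict order within one perspective; a fuller model would need several perspectives and explicit relations between entities and levels.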

Effectively, granularity is lifted up to a higher layer of abstraction, as conceptual modelling languages do for software design and physical database schemas, thereby making the representation of granularity domain- and implementation-independent. Hence, reusability across implementations can be ensured, which in turn facilitates interoperability among information systems, such as granulation of ontologies, knowledge bases, databases, data warehouses, and biological and geographical information systems.


Abstract

Granularity--the ability to represent and operate on different levels of detail in data, information, and knowledge--is indispensable for managing the huge amounts of data generated by, among others, scientists and demands representations at widely ranging levels of analysis. In particular, managing biological databases, knowledge bases, and ontologies effectively and efficiently cannot be achieved by adding more one-off software applications, but requires new foundational methodologies to push implementations to the next phase of in silico biology. Recent requests from domain experts include adequately addressing vertical integration of information systems across levels of biological granularity and cross-granular querying and reasoning. However, these kinds of enhancements have been investigated in a fragmented manner and encompass both subject domain semantics ranging from molecules up to ecosystems and computational applications for qualitative and quantitative granularity irrespective of the subject domain. Coherently and consistently representing granularity and subsequently using and reusing it is a new foundational methodology that can enhance the management of information systems and exploit their data more effectively. To realise representation of and reasoning with granularity, there are multiple issues to solve along three main dimensions. Current informal approaches have inconsistent granulation hierarchies that are not, or only to a limited extent, usable and reusable for computation. Formal and ontological approaches for granularity offer partial, data-oriented theories that do not address precisely what granularity is, what its components are, how to use it, or whether the theory is proven to be consistent. Engineering solutions are limited to difficult-to-reuse, data-centric, implementation-level solutions; this holds for both the levels and granulation hierarchies and the lack of transparency of hard-coded functions.

To solve these issues, we move from a data-centric and underspecified treatment of granularity to the ontological and logical layers, where informally defined components of granularity have become ontologically-motivated modelling constructs proper. This is achieved through the analysis and formalisation of three key elements: the foundational semantics of granularity, structured in a taxonomy of types of granularity; the static components of granularity, formalised in a logical theory, the theory of granularity (TOG); and functions for granular querying and reasoning and for moving between levels through abstraction and expansion.

Effectively, granularity is both lifted up to a higher layer of abstraction, as conceptual modelling does for software design and database schemas, and made precise in a formal theory with model-theoretic semantics, thereby making the representation of granularity domain- and implementation-independent. Hence, reusability across implementations can be ensured, which in turn facilitates interoperability among information systems, such as granulation of ontologies, knowledge bases, databases, data warehouses, and biological and geographical information systems.

The modelling of a particular domain granularity framework--an instantiation of the TOG--serves not only as a novel methodology for structuring data, information, and knowledge, but also as an additional query and inferencing layer to recombine the source data by posing granular queries and to retrieve implicit or novel information hitherto hidden in the data source. A granular information system facilitates level selection and granular querying of the information source to retrieve desired facts more precisely, and facilitates zooming in and out on sections of large ontologies tailored to a user's interests. It thereby alleviates the Ontology Comprehension Problem by enabling granular information retrieval for a range of different domain experts whilst keeping the full resource available for reasoning across granular levels.

As an encompassing architecture, there are four connected components. First, the foundational semantics of granularity is structured in a taxonomy of types of granularity. Second, the domain- and implementation-independent theory of granularity formally characterises granularity components such as granular perspective, granulation criterion, granular level, and the relation between levels. The TOG also makes use of aspects intertwined with granularity, such as indistinguishability, part-whole relations, and abstraction. When applied to databases, the TOG is positioned orthogonally at the conceptual data modelling layer; when applied to conceptual data models or logic-based type-level ontologies, it resides at their meta-level. Third, the TOG is instantiated (at the instance or type level, respectively) to specify a domain granularity framework, such as a granular perspective for human structural anatomy with granular levels such as cell, tissue, and organ. Fourth, this domain granularity framework is then applied to the data source, which is, in the scope of the thesis, delimited to databases, knowledge bases, and ontologies. The combination of these four components results in a granulated information system that can be used for, among other scenarios, cross-granular querying and automated reasoning. Functions for the applied domain granularity are specified at the goal-oriented conceptual and logical layers and include selections of levels, of their contents (entity types or instances), combinations thereof, and abstraction and expansion operators to move from coarser to finer-grained levels and back. Possible use is demonstrated with the infectious diseases domain and other examples from the biology and applied life sciences domains. Feasibility of computational implementations was tested with the Gene Ontology and the Foundational Model of Anatomy, thereby substantiating advantages of automation when applying granularity.
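To give a feel for the kinds of functions mentioned above (selecting the contents of a level, and abstraction and expansion between levels), here is a small sketch over toy data for the cell, tissue, and organ levels. It assumes a much-simplified setting in which every entity sits in exactly one level and has at most one coarser whole; the dictionaries `LEVEL_OF` and `PART_OF`, the function names `get_content`, `abstract`, and `expand`, and the example entities are hypothetical and do not come from the thesis.

```python
# Hypothetical toy data: entity -> granular level it resides in.
LEVEL_OF = {
    "heart": "organ",
    "cardiac muscle tissue": "tissue",
    "cardiac connective tissue": "tissue",
    "cardiomyocyte": "cell",
    "fibroblast": "cell",
}

# Hypothetical part-whole links: finer-grained entity -> its coarser whole.
PART_OF = {
    "cardiac muscle tissue": "heart",
    "cardiac connective tissue": "heart",
    "cardiomyocyte": "cardiac muscle tissue",
    "fibroblast": "cardiac connective tissue",
}


def get_content(level):
    """Select the entities residing in one granular level."""
    return sorted(e for e, lv in LEVEL_OF.items() if lv == level)


def abstract(entity):
    """Move one level up: the coarser whole, or None at the coarsest level."""
    return PART_OF.get(entity)


def expand(entity):
    """Move one level down: the finer-grained parts of an entity."""
    return sorted(part for part, whole in PART_OF.items() if whole == entity)


print(get_content("tissue"))
print(abstract("cardiomyocyte"))
print(expand("heart"))
```

In this sketch `abstract` and `expand` are simple inverses over a single part-whole relation; other types of granularity would call for other relations between levels.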

Last, the theory and implementation scenarios are compared with other formal and informal models and usages of granularity. It is demonstrated that the TOG entails extant theories and hence serves as a more generic, unifying theory that is more comprehensive and can be more scalable and reusable than the extant partial theories and technology-dependent implementations. There is, however, ample room for further research, such as realising granularity-enabled linking of OWL-formalised ontologies and querying across the levels of biological granularity, rough/fuzzy extensions for level specification, and adding the granularity components as modelling constructs to conceptual data modelling languages so that the granularity may propagate automatically to the data in databases.




Given that you made it all the way to the bottom of the page and are possibly still interested in the topic: there are a few nicely bound hard copies (about B5 size) left, so if you would like to have one, contact me at keet at ukzn dot ac dot za.

Creative Commons License
A Formal Theory of Granularity by Catharina Maria Keet is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
Based on a work at www.meteck.org.