Abstracts - May 18th, 2009

  • Strings and Trees for Thumbnail Image Classification: overview and selected results from the SATTIC ANR Project (2007-2010) - (Christine Solnon, LIRIS, Université Lyon 1, France)
    SATTIC aims at defining a statistical characterization of a set of symbolic structures (such as strings, trees, or graphs) modelling images. In this talk, we shall first give an overview of this project, and then focus on a particular problem addressed in SATTIC, i.e., the problem of searching for patterns in images modelled by plane graphs (which are planar embeddings of planar graphs). To do that, we propose to model plane graphs with 2-dimensional combinatorial maps, which provide nice data structures for modelling the topology of a subdivision of a plane into nodes, edges and faces. We define submap isomorphism, and we give a polynomial algorithm for this problem. First experimental results show the validity of this approach to efficiently search for patterns in images.
  • Knowledge Discovery for and by Inductive Queries: overview and selected results from the Bingo2 ANR Project (2008-2010) - (Jean-François Boulicaut, LIRIS, INSA Lyon, France and Bruno Crémilleux, GREYC, Université de Caen Basse-Normandie, France)
    In this talk, our first objective is to motivate the workplan of the ANR project Bingo2 (Knowledge Discovery for and by Inductive Queries) and to survey the results obtained by the partners so far. We will also present, in more technical detail, some results and ongoing work on pattern discovery: the constraint-based paradigm, condensed representations, pattern discovery from n-ary relations (with applications to dynamic network analysis) and a couple of contributions on the so-called "from local to global" data mining perspective.
  • Krimp (Arno Siebes, Universiteit Utrecht, The Netherlands)
    The Minimum Description Length principle is an alternative to Statistics for, e.g., the model selection problem. In our experience MDL is also a good means to find interesting patterns. In this talk I will present some results to convince you of this.
  • The Robot Scientist Adam (Ross D. King, University of Wales, Aberystwyth, United Kingdom)
    Adam has autonomously generated and experimentally tested novel scientific hypotheses. We have confirmed Adam's conclusions through manual experiments. To describe Adam's research we have developed an ontology and logical language. The resulting formalization involves over 10,000 different research units in a nested tree-like structure, ten levels deep, that relates the 6.6 million biomass measurements to their logical description. This formalization describes how a machine discovered new scientific knowledge. We are using relational data-mining to examine this description and so ensure experimental reproducibility.
  • Mining string datasets under similarity and soft frequency constraints: application to promoter sequence analysis (Ieva Mitasiunaité, LIRIS, INSA Lyon, France)
    The Inductive Database (IDB) framework makes it possible to describe and execute Knowledge Discovery from Data (KDD) scenarios by means of sequences of queries. We consider string dataset mining and the design of generic algorithms to solve inductive queries that are Boolean compositions of primitive constraints (e.g., a string must be frequent enough in one dataset, infrequent in another one, and must not include a given substring). In such a context, state-of-the-art proposals (e.g., the FAVST solver) solve such queries for monotonic and anti-monotonic primitive constraints (e.g., the typical conjunction of maximal and minimal frequency constraints). It is far more complicated, and often impossible, to design generic algorithms for constraints that are not (anti-)monotonic. We consider these challenging issues, which emerge as soon as we want to support the search for fault-tolerant patterns, i.e., for real-life noisy data analysis and/or degenerate pattern discovery. Our methodological contribution is twofold. First, we have considered different ways to specify soft occurrences of substring patterns and soft frequency constraints that exploit similarity measures. Importantly, such constraints cannot be guaranteed to be (anti-)monotonic. Our proposal has been to design useful similarity constraints and related soft frequency constraints that can be specified as a conjunction of a monotonic and an anti-monotonic constraint and thus can be exploited efficiently. This has been implemented in our generic solver called Marguerite. Next, we have considered the problem of tuning extraction parameters (e.g., providing threshold values for the frequency constraints), which is one of the current open problems in supporting constraint-based mining. We have been considering an original technique to estimate the solution size thanks to a pattern sampling approach. We studied how to identify the most stringent constraints that still provide solutions, and whether one can trust the extracted patterns as not being false positives thanks to a statistical measure called the Twilight Zone Indicator (TZI). Last but not least, we have used our methods and tools in a challenging application domain, namely gene promoter sequence analysis. In tight collaboration with a group of biologists whose expertise concerns the molecular mechanisms of stem cell self-renewal, we successfully applied Marguerite and the TZI measure to identify putative binding sites of the transcription factors involved in the process of cell differentiation.
  • Bisociative Information Networks: Selected Thoughts on Work in Progress (Michael Berthold, University of Konstanz, Germany)
    In this presentation, I outline an approach for network-based information access and exploration. In contrast to existing methods, the presented framework allows for the integration of both semantically meaningful information and loosely coupled information fragments from heterogeneous information repositories. The resulting Bisociative Information Networks (BisoNets), together with explorative navigation methods, facilitate the discovery of links across diverse domains. In addition to such “chains of evidence”, they enable the user to go back to the original information repository and investigate the origin of each link, ultimately resulting in the discovery of previously unknown connections between information entities of different domains, subsequently triggering new insights and supporting creative discoveries.
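
As an illustration of the constraint vocabulary used in the string-mining abstract above (Ieva Mitasiunaité), the following is a minimal sketch of a query that conjoins a minimal frequency constraint (anti-monotonic with respect to the substring relation) with a maximal frequency constraint (monotonic). It is not the Marguerite or FAVST algorithm: the function names `support` and `mine`, the exact-match notion of occurrence, and the toy data are all assumptions made for the example, and the soft (similarity-based) occurrences discussed in the abstract are deliberately left out.

```python
def support(pattern, dataset):
    """Number of strings in `dataset` containing `pattern` as an exact substring."""
    return sum(pattern in s for s in dataset)


def mine(positive, negative, min_pos, max_neg, alphabet):
    """Level-wise search for patterns p such that
    support(p, positive) >= min_pos  (anti-monotonic constraint) and
    support(p, negative) <= max_neg  (monotonic constraint).

    Hypothetical helper for illustration only.
    """
    if min_pos < 1:
        # With min_pos < 1 every string would qualify and the search
        # space would be infinite.
        raise ValueError("min_pos must be at least 1")

    solutions = []
    frontier = [""]  # patterns of the previous length that are frequent enough
    while frontier:
        next_frontier = []
        for prefix in frontier:
            for symbol in alphabet:
                candidate = prefix + symbol
                if support(candidate, positive) < min_pos:
                    # Anti-monotonic pruning: every extension of `candidate`
                    # contains it, so none can be frequent enough either.
                    continue
                next_frontier.append(candidate)
                if support(candidate, negative) <= max_neg:
                    solutions.append(candidate)
        frontier = next_frontier
    return solutions


# Toy data (made up for this example).
positive = ["ACGTAC", "TACGA", "GACGT"]
negative = ["ACCA", "TTGA"]
print(mine(positive, negative, min_pos=3, max_neg=0, alphabet="ACGT"))
```

On this toy input the search keeps only "CG" and "ACG": both occur in all three positive strings and in no negative string. Pruning on the anti-monotonic side is what makes the level-wise enumeration terminate quickly; the monotonic side is only used here as a filter, whereas a dedicated solver could exploit it for pruning as well.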