9:15 Data_Peeler: constraint-based closed pattern mining in n-ary relations / Extraction sous contraintes de motifs fermés dans des relations n-aires (Loïc Cerf, LIRIS)
Set pattern discovery from binary relations has been extensively studied during the last decade. In particular, many complete and efficient algorithms which extract frequent closed sets are now available. Generalizing such a task to n-ary relations (n >= 2) appears as a timely challenge. It may be important for many applications, e.g., when adding the time dimension to the popular objects X features binary case. The generality of the task - no assumption being made on the relation arity or on the size of its attribute domains - makes it computationally challenging. We introduce an algorithm called Data-Peeler. From a n-ary relation, it extracts all closed patterns satisfying given piecewise (anti)-monotonic constraints. This new class of constraints generalizes both monotonic and anti-monotonic constraints. Considering the special case of ternary relations, Data-Peeler outperforms the state-of-the-art algorithms CubeMiner and Trias by orders of magnitude. These good performances must be granted to a new clever enumeration strategy allowing an efficient closeness checking. An original application on a real-life 4-ary relation is used to assess the relevancy of closed n-sets constraint-based mining.
9:55 Elagage efficace d'automates probabilistes (Baptiste Jeudy et Franck Thollard, LaHC)
Certaines applications de l'inférence grammaticale probabiliste sont limitées par des problèmes d'efficacité. En modélisation statistique de la langue par exemple, on dispose maintenant de grands corpus. L'inférence sur de tels ensembles d'apprentissage est compromise par la taille de l'arbre des prefixes probabiliste engendré puisqu'il peut faire plusieurs millions d'états. Nous proposons dans cet exposé une méthode pour élaguer des automates probabilistes (restreints à une structure d'arbre). La méthode proposée est non seulement efficace (sub-quadratique) mais elle permet de plus une diminution importante du nombre d'états avec un faible impact sur la qualité de l'inférence. Les résultats sont évalués sur une tâche de modélisation statistique de la langue.
10:35 break
11:00 A lower bound on the number of examples needed to perform a significant sequence mining task (Marc Sebban, LaHC)
During the past few years, the problem of assessing the statistical significance of frequent patterns extracted from a given set S of sequences has received much attention. Considering that S always consists of a sample drawn from an unknown underlying distribution, two types of risks can arise during a sequence mining process: accepting a false frequent pattern or rejecting a true one. Assuming that the number of sequences is an application-dependent parameter, almost all the approaches presented in the literature offer solutions to only control one risk to the detriment of the other one. We provide here a lower bound on the number of sequences required to control both risks and then to achieve a significant sequence mining task.
Venue:
GREYC CNRS UMR 6072
Room S3-305
UFR Sciences - Université de Caen
Boulevard du Maréchal Juin
14000 Caen
Tramway: line A, direction Campus 2, get off at the last stop
Maps are available at:
https://www.greyc.fr/?q=node/403