Optimal Subgroup Discovery in Purely Numerical Data

Alexandre Millot 1 Rémy Cazabet 1 Jean-François Boulicaut 1
1 DM2L - Data Mining and Machine Learning
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : Subgroup discovery in labeled data is the task of discovering patterns in the description space of objects to find subsets of objects whose labels show an interesting distribution, for example the disproportionate representation of a label value. Discovering interesting subgroups in purely numerical data (numerical attributes and a numerical target label) has received little attention so far. Existing methods rely on discretization, which leads to a loss of information and suboptimal results. This is the case for the reference algorithm SD-Map*. We consider here the discovery of optimal subgroups according to an interestingness measure in purely numerical data. We leverage the concept of closed interval patterns and advanced enumeration and pruning techniques. The performance of our algorithm is studied empirically, and its added value w.r.t. SD-Map* is illustrated.
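To make the vocabulary of the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes an interval pattern is one closed interval per numerical attribute, that its closure is the tightest bounding box of the objects it covers, and that the interestingness measure is a size-weighted mean shift of the target. That measure, the function names, and the toy data are all illustrative assumptions; the abstract does not specify them.

```python
# Illustrative sketch only: names, data layout and quality measure are assumptions.
# Each object is a pair (attribute_values, target), all values numerical.

def cover(data, pattern):
    """Objects whose every attribute value lies inside the matching interval."""
    return [row for row in data
            if all(lo <= v <= hi for v, (lo, hi) in zip(row[0], pattern))]

def closure(rows):
    """Tightest interval pattern (bounding box) covering exactly these objects."""
    columns = zip(*(attrs for attrs, _ in rows))
    return [(min(col), max(col)) for col in columns]

def quality(data, pattern, a=0.5):
    """Assumed measure: |subgroup|**a * (subgroup target mean - overall mean)."""
    rows = cover(data, pattern)
    if not rows:
        return 0.0
    overall = sum(t for _, t in data) / len(data)
    mean = sum(t for _, t in rows) / len(rows)
    return len(rows) ** a * (mean - overall)

data = [((1.0, 2.0), 10.0), ((2.0, 3.0), 20.0),
        ((3.0, 1.0), 30.0), ((4.0, 4.0), 40.0)]
pattern = [(1.5, 4.5), (0.0, 5.0)]
closed = closure(cover(data, pattern))
```

Under these assumptions, a pattern and its closure cover the same objects and therefore have the same quality, which is why an enumeration can restrict itself to closed patterns without losing the best subgroup.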
Document type :
Conference papers
Cited literature [22 references]

https://hal.archives-ouvertes.fr/hal-02483379
Contributor : Alexandre Millot
Submitted on : Tuesday, February 18, 2020 - 4:05:42 PM
Last modification on : Saturday, February 22, 2020 - 1:39:49 AM

File

PAKDD_HAL.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02483379, version 1

Citation

Alexandre Millot, Rémy Cazabet, Jean-François Boulicaut. Optimal Subgroup Discovery in Purely Numerical Data. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), May 2020, Singapore, Singapore. 12 p., In Press. ⟨hal-02483379⟩

Metrics

Record views : 33
File downloads : 41