Semantic based DCM models for text classification

Book chapter

Publication date:

2012

Abstract:

This contribution deals with the problem of text classification. The proposed approach is probabilistic and is based on a mixture of Dirichlet and Multinomial distributions. Our aim is to build a classifier able to take into account not only word frequencies but also the latent topics contained within the available corpora. This new model, called sbDCM, allows us to directly incorporate the number of topics (known or unknown) that compose the documents, without losing the ability to capture the 'burstiness' phenomenon and without degrading classification performance.
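
As an illustration of the Dirichlet compound Multinomial (DCM) building block named in the abstract, the following is a minimal sketch of the standard DCM log-likelihood and a maximum-likelihood class assignment. It is not the sbDCM model itself, whose topic-based parameterization is not given in this record. The function names, the toy parameter values, and the class labels are assumptions made for the example, and equal class priors are assumed.

```python
from math import lgamma


def dcm_log_likelihood(counts, alpha):
    """Log-probability of a word-count vector under a standard DCM (Polya) model.

    counts: non-negative integer word counts for one document.
    alpha:  positive Dirichlet parameters, one per vocabulary word.
    """
    n = sum(counts)   # document length
    a = sum(alpha)    # total concentration
    # Multinomial coefficient: n! / prod(x_w!)
    ll = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Gamma(a) / Gamma(a + n)
    ll += lgamma(a) - lgamma(a + n)
    # prod over words of Gamma(x_w + alpha_w) / Gamma(alpha_w)
    ll += sum(lgamma(x + aw) - lgamma(aw) for x, aw in zip(counts, alpha))
    return ll


def classify(counts, class_alphas):
    """Return the class whose DCM gives the document the highest likelihood.

    class_alphas: dict mapping a (hypothetical) class label to its alpha vector.
    Equal class priors are assumed for simplicity.
    """
    return max(class_alphas,
               key=lambda c: dcm_log_likelihood(counts, class_alphas[c]))


# Toy two-word vocabulary with made-up parameters.
toy_classes = {"sport": [5.0, 1.0], "finance": [1.0, 5.0]}
predicted = classify([3, 0], toy_classes)
```

With small alpha values the DCM assigns more probability to a word repeating than to two different words each appearing once, which is the 'burstiness' behaviour the abstract refers to.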

CRIS type:

2.1 Contribution in volume (Chapter or Essay)

Keywords:

Text classification; mixture models; Dirichlet compound Multinomial model

List of authors:

Cerchiello, Paola

University authors:

CERCHIELLO PAOLA

Link to the full record:

https://iris.unipv.it/handle/11571/452701

Book title:

Advanced Statistical Methods for the Analysis of Large Data-Sets