Chapter Post-stratification as a tool for enhancing the predictive power of classification methods

It is well known that, in classification problems, the predictive capacity of any decision-making model decreases rapidly with increasing asymmetry of the target variable (Sonquist et al., 1973; Fielding 1977). In particular, in segmentation analysis with a categorical target variable, very poor imp...
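The chapter's title refers to post-stratification as a way to offset this asymmetry, but this record does not say how the training sample was rebalanced. As a minimal sketch of one possible approach, and not the authors' procedure, the Python snippet below assigns each case a weight so that every class of the target variable carries the same total weight. The function name `symmetrizing_weights` and the 10% vs. 90% toy sample are illustrative assumptions.

```python
from collections import Counter


def symmetrizing_weights(labels):
    """Return one weight per case so that every class has equal total weight.

    Hypothetical helper for illustration; not taken from the chapter.
    """
    counts = Counter(labels)           # cases observed in each class
    n, k = len(labels), len(counts)    # sample size, number of classes
    # Each class should sum to n / k, shared equally among its cases.
    return [n / (k * counts[y]) for y in labels]


# Toy binary target with a 10% vs. 90% split (assumed data).
labels = [1] * 10 + [0] * 90
weights = symmetrizing_weights(labels)

minority_total = sum(w for w, y in zip(weights, labels) if y == 1)
majority_total = sum(w for w, y in zip(weights, labels) if y == 0)
print(minority_total, majority_total)
```

Weights of this kind could then be passed to any estimator that accepts case weights, so that the fitted model sees a balanced target while the total sample size stays unchanged.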

Bibliographic Details
Main Author: d'Ovidio, Francesco Domenico (auth)
Other Authors: D'Uggento, Angela Maria (auth), Mancarella, Rossana (auth), Toma, Ernesto (auth)
Format: Electronic Book Chapter
Language: English
Published: Florence: Firenze University Press, 2021
Series: Proceedings e report 132
Subjects: Classification; Asymmetry; Post-stratification; Predictive power
Online Access: OAPEN Library: download the publication
OAPEN Library: description of the publication

MARC

LEADER 00000naaaa2200000uu 4500
001 oapen_2024_20_500_12657_56371
005 20220601
003 oapen
006 m o d
007 cr|mn|---annan
008 20220601s2021 xx |||||o ||| 0|eng d
020 |a 978-88-5518-461-8.24 
020 |a 9788855184618 
040 |a oapen  |c oapen 
024 7 |a 10.36253/978-88-5518-461-8.24  |c doi 
041 0 |a eng 
042 |a dc 
100 1 |a d'Ovidio, Francesco Domenico  |4 auth 
700 1 |a D'Uggento, Angela Maria  |4 auth 
700 1 |a mancarella, rossana  |4 auth 
700 1 |a TOMA, Ernesto  |4 auth 
245 1 0 |a Chapter Post-stratification as a tool for enhancing the predictive power of classification methods 
260 |a Florence  |b Firenze University Press  |c 2021 
300 |a 1 electronic resource (6 p.) 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
490 1 |a Proceedings e report  |v 132 
506 0 |a Open Access  |2 star  |f Unrestricted online access 
520 |a It is well known that, in classification problems, the predictive capacity of any decision-making model decreases rapidly with increasing asymmetry of the target variable (Sonquist et al., 1973; Fielding 1977). In particular, in segmentation analysis with a categorical target variable, very poor improvements in purity are obtained when the least represented modality accounts for less than 1/4 of the cases of the most represented modality. The same problem arises with other (theoretically more exhaustive) techniques such as Artificial Neural Networks. In fact, the optimal situation for classification analyses is maximum uncertainty, that is, equidistribution of the target variable. Some classification techniques are more robust because they use, for example, the less sensitive logit transformation of the target variable (Fabbris & Martini 2002); however, the logit transformation is also strongly affected by the distributive asymmetry of the target variable. In this paper, starting from the results of a direct survey in which the (binary) target variable was extremely asymmetrical (10% vs. 90%, or greater asymmetry), we noted that even the logit model with the most significant parameters had very low fit measures and almost zero predictive power. To solve this predictive issue, we tested post-stratification techniques, artificially symmetrizing a training sample. In this way, a substantial increase in fit and predictive capacity was achieved, both in the symmetrized sample and, above all, in the original sample. The paper concludes by describing an application of the same technique to a dataset of very different nature and size, demonstrating that the method is stable even when the analysis is carried out on all the data of a population. 
540 |a Creative Commons  |f https://creativecommons.org/licenses/by/4.0/  |2 cc  |4 https://creativecommons.org/licenses/by/4.0/ 
546 |a English 
653 |a Classification 
653 |a Asymmetry 
653 |a Post-stratification 
653 |a Predictive power 
773 1 0 |7 nnaa 
856 4 0 |a www.oapen.org  |u https://library.oapen.org/bitstream/id/faabbd7d-2256-4901-bd39-1e9157a587fc/26242.pdf  |7 0  |z OAPEN Library: download the publication 
856 4 0 |a www.oapen.org  |u https://library.oapen.org/handle/20.500.12657/56371  |7 0  |z OAPEN Library: description of the publication