Graphical Models

Model selection is the task of selecting a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered.

There are two main objectives in inference and learning from data. One is scientific discovery: understanding the underlying data-generating mechanism and interpreting the nature of the data. The other is predicting future or unseen observations. For the second objective, the data scientist is not necessarily concerned with an accurate probabilistic description of the data. In line with these two objectives, model selection can also take two directions: model selection for inference and model selection for prediction.^[1] We will focus on the first direction, model selection for inference, which aims to identify a model that provides a reliable characterization of the sources of uncertainty for scientific interpretation.

The first type of method we explore for identifying candidate models is the graphical method. Graphical methods in model selection can be regarded as an exploratory data analysis approach that relies on statistical graphics and other data visualization methods. Among the graphical methods used in model selection, we will focus primarily on probability plots.

P–P Plot

Definition

A P–P plot graphs two cumulative distribution functions (cdfs) against each other:^[2] given two probability distributions with cdfs [math]F[/math] and [math]G[/math], it plots the points [math](F(z),G(z))[/math] as [math]z[/math] ranges from [math]-\infty[/math] to [math]\infty.[/math] Since a cdf takes values in [math][0,1][/math], the domain of this parametric graph is [math](-\infty,\infty)[/math] and its range lies in the unit square [math][0,1]\times [0,1].[/math]

The comparison line is the 45° line from (0,0) to (1,1). The distributions are equal if and only if the plot falls on this line; any deviation indicates a difference between the distributions.^[3]
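As a rough illustration of the definition, the following Python sketch evaluates the points [math](F(z),G(z))[/math] on a finite grid and measures how far they fall from the 45° line. The helper names pp_points and max_deviation, the grid, and the choice of a normal distribution with mean 0.5 as the second distribution are assumptions made for this example only; they do not come from the cited references.

```python
from statistics import NormalDist


def pp_points(cdf_f, cdf_g, z_values):
    """Return the P-P plot points (F(z), G(z)) for each z in z_values."""
    return [(cdf_f(z), cdf_g(z)) for z in z_values]


def max_deviation(points):
    """Largest vertical distance between the plotted points and the 45-degree line."""
    return max(abs(p - q) for p, q in points)


# A finite grid stands in for z ranging over the whole real line.
grid = [i / 10 for i in range(-50, 51)]

standard = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=0.5, sigma=1.0)  # illustrative choice, not from the text

same = pp_points(standard.cdf, standard.cdf, grid)
different = pp_points(standard.cdf, shifted.cdf, grid)

print(max_deviation(same))       # identical cdfs: every point lies on the line
print(max_deviation(different))  # a shifted mean pulls the curve off the line
```

Every point lies inside the unit square, and only the pair of identical distributions stays on the comparison line.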

Comparing the ECDF to a theoretical distribution

A classic use of P–P plots is comparing the empirical cumulative distribution function with a theoretical distribution:

P–P plot: ECDF vs Theoretical

Suppose we have a random sample [math]x_1,\ldots, x_n [/math] that is thought to be sampled from a theoretical cumulative distribution function [math]F(x)[/math].

The empirical distribution function, denoted by [math]F_n(x)[/math], evaluated at [math]x_i[/math] is taken here to be the number of data points less than or equal to [math]x_i[/math] divided by [math]n+1[/math] (dividing by [math]n+1[/math] rather than [math]n[/math] keeps every plotted value strictly below 1).
The P–P plot is constructed by graphing the points [math](F(x_i),F_n(x_i))[/math] for [math]i=1,\ldots,n[/math].

To illustrate, consider the following ordered data sample of size 15 that is thought to be sampled from a standard normal distribution:

-1.27 -1.20 -0.98 -0.91 -0.80 -0.64 -0.10  0.30  0.56  0.58  0.60  0.61 0.65  0.92  1.41

Following the procedure outlined above, we obtain the following table:

[math]x[/math]	[math]F_n(x)[/math]	[math]F(x)[/math]
-1.27	0.0625	0.102
-1.2	0.125	0.1151
-0.98	0.1875	0.1635
-0.91	0.25	0.1814
-0.8	0.3125	0.2119
-0.64	0.375	0.2611
-0.1	0.4375	0.4602
0.3	0.5	0.6179
0.56	0.5625	0.7123
0.58	0.625	0.719
0.6	0.6875	0.7257
0.61	0.75	0.7291
0.65	0.8125	0.7422
0.92	0.875	0.8212
1.41	0.9375	0.9207
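The table can be reproduced with a short Python sketch. It uses only the standard library and takes [math]F[/math] to be the standard normal cdf, as in the example; the function name pp_table is hypothetical, and the divisor [math]n+1[/math] follows the convention stated above.

```python
from statistics import NormalDist

# Ordered sample of size 15 from the text.
sample = [-1.27, -1.20, -0.98, -0.91, -0.80, -0.64, -0.10, 0.30,
          0.56, 0.58, 0.60, 0.61, 0.65, 0.92, 1.41]


def pp_table(data, cdf):
    """Return rows (x, F_n(x), F(x)), with F_n(x) = #{x_j <= x} / (n + 1)."""
    xs = sorted(data)
    n = len(xs)
    rows = []
    for x in xs:
        count = sum(1 for v in xs if v <= x)  # data points <= x
        rows.append((x, count / (n + 1), cdf(x)))
    return rows


rows = pp_table(sample, NormalDist().cdf)  # standard normal cdf
for x, fn, f in rows:
    print(f"{x:6.2f}  {fn:.4f}  {f:.4f}")
```

Plotting the third column against the second gives the P–P plot for this sample.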

Using the table above, we obtain the following P–P plot:

The Difference Plot

Given two probability distributions, with cdfs [math]F[/math] and [math]G[/math], the difference plot graphs [math]D(z)=|G(z)-F(z)|[/math] as [math]z[/math] ranges from [math]-\infty[/math] to [math]\infty.[/math]

Comparing the ECDF to a theoretical distribution

As with the P–P plot, we can use the difference plot to compare an empirical distribution function with a theoretical distribution:

Difference plot: ECDF vs Theoretical

Suppose we have a random sample [math]x_1,\ldots, x_n [/math] that is thought to be sampled from a theoretical cumulative distribution function [math]F(x)[/math].

The empirical distribution function, denoted by [math]F_n(x)[/math], evaluated at [math]x_i[/math] equals the number of data points less than or equal to [math]x_i[/math] divided by [math]n[/math].
The difference plot is constructed by graphing the points [math]D(x_i)=|F(x_i)-F_n(x_i)|[/math] for [math]i=1,\ldots,n[/math].

To illustrate the procedure, we graph the difference plot for the same random sample given for the P–P plot. We have:

[math]x[/math]	[math]F_n(x)[/math]	[math]F(x)[/math]	[math]D(x)[/math]
-1.27	0.0667	0.102	0.0353
-1.2	0.1333	0.1151	0.0182
-0.98	0.2	0.1635	0.0365
-0.91	0.2667	0.1814	0.0853
-0.8	0.3333	0.2119	0.1214
-0.64	0.4	0.2611	0.1389
-0.1	0.4667	0.4602	0.0065
0.3	0.5333	0.6179	0.0846
0.56	0.6	0.7123	0.1123
0.58	0.6667	0.719	0.0523
0.6	0.7333	0.7257	0.0076
0.61	0.8	0.7291	0.0709
0.65	0.8667	0.7422	0.1245
0.92	0.9333	0.8212	0.1121
1.41	1	0.9207	0.0793
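The difference table can be checked with a similar Python sketch. It again uses only the standard library and assumes the standard normal cdf for [math]F[/math]; the function name difference_table is hypothetical. Here [math]F_n[/math] uses the divisor [math]n[/math], as in this section.

```python
from statistics import NormalDist

# Same ordered sample of size 15 used for the P-P plot.
sample = [-1.27, -1.20, -0.98, -0.91, -0.80, -0.64, -0.10, 0.30,
          0.56, 0.58, 0.60, 0.61, 0.65, 0.92, 1.41]


def difference_table(data, cdf):
    """Return rows (x, F_n(x), F(x), D(x)), with F_n(x) = #{x_j <= x} / n."""
    xs = sorted(data)
    n = len(xs)
    rows = []
    for x in xs:
        fn = sum(1 for v in xs if v <= x) / n  # empirical cdf at x
        f = cdf(x)                             # theoretical cdf at x
        rows.append((x, fn, f, abs(f - fn)))
    return rows


diff_rows = difference_table(sample, NormalDist().cdf)
for x, fn, f, d in diff_rows:
    print(f"{x:6.2f}  {fn:.4f}  {f:.4f}  {d:.4f}")

# Sample point where the two cdfs disagree the most.
worst = max(diff_rows, key=lambda row: row[3])
print(worst[0], round(worst[3], 4))
```

For this sample the largest difference among the sample points occurs at [math]x=-0.64[/math], matching the largest entry in the [math]D(x)[/math] column.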

And, using the values in the table above, we obtain the following difference plot:

References

[1] "Model Selection Techniques: An Overview" (2018). IEEE Signal Processing Magazine 35 (6): 16–34. doi:10.1109/MSP.2018.2867638. ISSN 1053-5888.
[2] Jean Dickinson Gibbons, Subhabrata Chakraborti (2003). Nonparametric Statistical Inference, 4th Edition. CRC Press. ISBN 978-0-8247-4052-8. p. 145.
[3] "Why Welch's test is Type I error robust" (2016). The Quantitative Methods for Psychology 12 (1): 30–38. doi:10.20982/tqmp.12.1.p030.

Wikipedia References

Wikipedia contributors. "P–P plot". Wikipedia. Wikipedia. Retrieved 14 Nov 2021.
