.-
help for ^violin^                                                (STB-46: gr33)
.-

Violin plots
------------

        ^violin^ varlist [weight] [^if^ exp] [^in^ range]
            [^,^ {^bi^weight|^cos^ine|^ep^an|^gau^ss|^par^zen|^rec^tangle|^tri^angle}
            ^n(^#^) w^idth^(^#^) by(^byvar^) tru^ncat^(^#,#|*^) ro^und^(^#^)^
            graph_options ]

^fweights^ and ^aweights^ are allowed; see ^help^ @weights@.


Description
-----------

^violin^ produces violin plots, a graphical box plot--kernel density synergism.
The violin plot combines the basic summary statistics of a box plot with the
visual information provided by a local density estimator. The goal is to
reveal the distributional structure in a variable. Much like a traditional
box plot, the violin plot displays the median as a short horizontal line, the
interquartile range (first to third quartile) as a narrow shaded box, and the
lower-to-upper adjacent value range as a vertical line, but it does not plot
outside values. Instead, it "boxes" the data with mirrored density curves and
labels the y-axis at the minimum, median and maximum observed data values.

^violin^ also lists basic descriptive statistics about the data (i.e., the
lower and upper adjacent values, the 25th and 75th centiles, the minimum,
median and maximum of the data, and the sample size) and it provides
information about the density estimation (i.e., the kernel method used, the
number of points of estimation, and the resulting scale and width factors).
When ^by()^ is specified, descriptive statistics are displayed for the combined
group only. When multiple variables are included in varlist, statistics are
displayed for the last variable only.

^violin^ discards observations on a casewise basis as a function of 1) missing
data and 2) the ^if^ (or ^in^) specification (i.e., it ignores the entire
observation). This behavior may lead to unexpected results when multiple
variables are in the varlist: an observation with a missing value in any
one variable is dropped from the plots of all variables.
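A minimal sketch of the casewise behavior follows. It assumes that the auto
dataset shipped with Stata is in memory and that its rep78 variable contains
some missing values; neither assumption comes from this help file.

```stata
* Assumption: auto.dta is loaded and rep78 has some missing values.
* Count the complete cases that a two-variable call would retain.
count if price ~= . & rep78 ~= .

* Both plots are drawn from those complete cases only.
violin price rep78

* Plotting price alone uses every nonmissing value of price.
violin price
```

If the two price plots differ, the difference is due to the observations
dropped because rep78 is missing.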

Note: ^violin^ calls ^centile^ to compute the needed centiles, but ^centile^ does
      not respond to a ^[weight]^ specification. This conflicts with the
      ^kdensity^ code, which responds to that specification. The implications
      of this conflict have not been explored, but ^violin^ currently allows
      the ^[weight]^ specification to be passed through to ^kdensity^.

Note: ^violin^ uses a low-level ^gph^ command which is not supported in Stata's
      release 2 ^.gph^ format. As a result, neither ^Stage^ nor the ^gphdot^ or
      ^gphpen^ DOS-based graphics output programs can process a saved
      violin-plot graphics file. This limitation does not affect screen
      display or output using the ^Print Graph^ option of Stata's ^File^ menu.


Options
-------

^biweight^, ^cosine^, ..., ^triangle^ specify the kernel. By default, ^epan^, the
    Epanechnikov kernel, is used.

^n(^#^)^ specifies the number of points at which density estimates will be
    evaluated. The default is 50.

^width(^#^)^ specifies the halfwidth of the kernel, the width of the density
    window around each point. If ^width()^ is not specified, then the "optimal"
    width is used; see ^[R] kdensity^. For multimodal and highly skewed
    densities, the "optimal" width is usually too wide and oversmooths the
    density.
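One way to check for oversmoothing is to compare the default width with a
narrower one. The sketch below assumes the auto dataset is in memory and that
the saved values are held in global macros, so that S_3 is read as $S_3. The
value 150 is an arbitrary illustration, not a recommendation.

```stata
* Draw with the default width, then read back the band width that was used
* (S_3 is listed under Saved values).
violin weight
display "width used: $S_3"

* Redraw with a narrower window. The value 150 is an arbitrary choice for
* illustration; pick one suited to the scale of the variable.
violin weight, width(150)
```

A narrower width shows more local structure at the cost of a rougher density
trace.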

^by(^byvar^)^ produces separate plots for the groups of observations defined by
    byvar and displays them in a single graph with a common vertical scale.
    ^by()^ cannot be specified when there is more than one variable in the
    varlist.

^truncat(^#^,^#|^*)^ limits the range of the density trace, either to a range
    specified as ^(^#^,^#^)^, or to the observed data limits, specified as ^(*)^.
    Regardless of the actual ^(^#^,^#^)^ specification, the widest range honored
    is the observed data limits. The precise truncation points will be the
    most extreme points within the specified range where the density is
    calculated (the points of density calculation depend on ^n()^, ^width()^
    and the observed data).
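The two forms of the option can be sketched as follows, assuming the auto
dataset is in memory. The limits 2000 and 4000 are illustrative values only.

```stata
* Stop the density trace at the observed minimum and maximum.
violin weight, truncat(*)

* Restrict the trace to an explicit range (illustrative limits). The trace
* ends at the outermost estimation points that fall inside this range.
violin weight, truncat(2000,4000)
```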

^round(^#^)^ rounds the y-axis numeric labels in units of the value specified.
    As a result, the labels and their corresponding tick marks may not be
    placed at the true minimum, median, or maximum values; rather, they will
    be at the rounded values. ^round()^ has no effect if ^ylabel^ is specified
    without arguments, but is operative if ^ylabel^ is not specified or is
    specified with arguments. The ^round()^ option follows the rules of
    Stata's ^round(^x^,^y^)^ function, with # being the y argument and each label
    value being the x argument; see ^[U] 20.3.5 Special functions^.
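The rounding rule can be tried directly with Stata's round() function before
it is applied to a plot. The sketch assumes the auto dataset is in memory; the
number 3019.459 is a made-up label value.

```stata
* round(x,y) rounds x in units of y; this displays 3000.
display round(3019.459, 100)

* Label the y-axis of the violin plot in units of 100.
violin weight, round(100)
```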

graph_options are any of the options allowed by ^graph, twoway^ except ^b2title()^
    (which is ignored); see ^help^ @graph@. Some options are preset and, although
    changeable, usually should not be modified. These include ^symbol(i)^ and
    ^connect(l)^ for specifying the plotting symbol and point connection method
    for the density curve. In addition, ^ylabel()^ is preset to label only the
    minimum, median and maximum points. ^t1title(Violin Plot)^ is preset but can
    be changed--except when ^by()^ is specified; in this instance ^t1title^ is used
    for the variable name or label. When changeable, use of ^t1title(.)^ will
    result in a blank title. Other preset options, such as ^pen(2)^ for the
    plot pen color, are intended to be freely changed to suit user preference.
    A few options, such as the left and right titles, are set (or default to)
    blank. If specified, they appear beside each plot in a multi-variable
    graph. Lastly, the ^saving()^ option differs slightly from ^graph^'s in
    that the filename extension is always ^.gph^ and must not be specified.
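A short sketch of the changeable presets, assuming the auto dataset is in
memory. The pen number 3 and the file name myviolin are hypothetical choices.

```stata
* Blank the preset top title and change the plot pen color.
violin weight, t1title(.) pen(3)

* Save the graph. Give the name without an extension; the help file states
* that .gph is always used. "myviolin" is a hypothetical file name.
violin weight, saving(myviolin)
```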


Saved values
------------

    S_1     name of kernel used for density trace
    S_2     number of points of density estimation
    S_3     band width for density estimation
    S_4     scale factor of density plot
    S_5     minimum
    S_6     lower adjacent value
    S_7     first quartile
    S_8     median
    S_9     third quartile
    S_10    upper adjacent value
    S_11    maximum
    S_12    n

When ^by()^ is specified: S_3 and S_4 contain the averages of the band width and
scale factors used in the subgroup density estimations; S_5, S_7, S_8, S_9,
S_11 and S_12 are statistics for the combined group; and S_6 and S_10 are set
missing.

When multiple variables are specified, the saved values contain results for
the last variable in the varlist.
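The saved values can be used right after the command runs. The sketch below
assumes that they are held in global macros, so that S_8 is read as $S_8, and
that the auto dataset is in memory.

```stata
* Single variable, no by(): all twelve values refer to weight.
violin weight
display "kernel: $S_1   points: $S_2"
display "median: $S_8   n: $S_12"
display "adjacent values: $S_6 to $S_10"

* Interquartile range computed from the saved quartiles.
display "interquartile range: " $S_9 - $S_7
```

Copy any values that are needed later into other macros, since the next call
to ^violin^ replaces them.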


Examples
--------

    . ^violin length, t1(Auto data) l1(length of car)^

    . ^violin length weight, n(100) w(20)^

    . ^violin weight, by(foreign) parzen^


Author
------

    Thomas J. Steichen
    RJRT
    steicht@@rjrt.com


Reference
---------

Hintze, J. L. and R. D. Nelson (1998). "Violin plots: a box plot-density trace
    synergism." The American Statistician, 52(2):181-4.


Also see
--------

    STB:  gr33 (STB-46)
 Manual:  ^[R] kdensity^, ^[R] graph box^, ^[R] centile^
          ^[U] 20.3.5 Special functions^
On-line:  help for @kdensity@, @graph@, @centile@, @functions@