.-
help for ^violin^                                              (STB-46: gr33)
.-

Violin plots
------------

    ^violin^ varlist [weight] [^if^ exp] [^in^ range]
        [^,^ {^bi^weight|^cos^ine|^ep^an|^gau^ss|^par^zen|^rec^tangle|^tri^angle}
        ^n(^#^) w^idth^(^#^) by(^byvar^) tru^ncat^(^#,#|*^) ro^und^(^#^)^
        graph_options ]

^fweights^ and ^aweights^ are allowed; see ^help^ @weights@.


Description
-----------

^violin^ produces violin plots, a graphical box plot--kernel density synergism.
The violin plot combines the basic summary statistics of a box plot with the
visual information provided by a local density estimator. The goal is to
reveal the distributional structure in a variable.

Much like a traditional box plot, the violin plot displays the median as a
short horizontal line, the first-to-third interquartile range as a narrow
shaded box, and the lower-to-upper adjacent value range as a vertical line,
but it does not plot outside values. Instead, it "boxes" the data with
mirrored density curves and labels the y-axis at the minimum, median and
maximum observed data values.

^violin^ also lists basic descriptive statistics about the data (i.e., the
lower and upper adjacent values, the 25th and 75th centiles, the minimum,
median and maximum of the data, and the sample size) and it provides
information about the density estimation (i.e., the kernel method used, the
number of points of estimation, and the resulting scale and width factors).
When ^by()^ is specified, descriptive statistics are displayed for the combined
group only. When multiple variables are included in varlist, statistics are
displayed for the last variable only.

^violin^ discards observations on a casewise basis as a function of 1) missing
data and 2) the ^if^ (or ^in^) specification (i.e., it ignores the entire
observation). This behavior may lead to unexpected results when multiple
variables are in the varlist.

Note: ^violin^ calls ^centile^ to compute the needed centiles but ^centile^ does
not respond to a ^[weight]^ specification.
This conflicts with the ^kdensity^ code, which responds to that specification.
The implications of this conflict have not been explored, but ^violin^
currently allows the ^[weight]^ specification to be passed through to
^kdensity^.

Note: ^violin^ uses a low-level ^gph^ command which is not supported in Stata's
release 2 ^.gph^ format. As a result, neither ^Stage^ nor the ^gphdot^ or ^gphpen^
DOS-based graphics output programs can process a saved violin-plot graphics
file. This limitation does not affect screen display or output using the
^Print Graph^ option of Stata's ^File^ menu.


Options
-------

^biweight^, ^cosine^, ..., ^triangle^ specify the kernel. By default, ^epan^, the
    Epanechnikov kernel, is used.

^n(^#^)^ specifies the number of points at which density estimates will be
    evaluated. The default is 50.

^width(^#^)^ specifies the halfwidth of the kernel, the width of the density
    window around each point. If ^width()^ is not specified, then the
    "optimal" width is used; see ^[R] kdensity^. For multimodal and highly
    skewed densities, the "optimal" width is usually too wide and oversmooths
    the density.

^by(^byvar^)^ produces separate plots for the groups of observations defined by
    byvar and displays them in a single graph having common vertical scale.
    ^by()^ cannot be specified when there is more than one variable in the
    varlist.

^truncat(^#^,^#|^*)^ limits the range of the density trace, either to a range
    specified as ^(^#^,^#^)^, or to the observed data limits, specified as ^(*)^.
    Regardless of the actual ^(^#^,^#^)^ specification, the maximum range
    truncation honored is the observed data limits. The precise truncation
    points will be the most extreme points within the specified range where
    the density is calculated (the points of density calculation depend on
    ^n()^, ^width()^ and the observed data).

^round(^#^)^ rounds the y-axis numeric labels to the value specified.
    As a result, the labels and their corresponding tic marks may not be
    placed at the true minimum, median, or maximum values; rather, they will
    be at the rounded values. ^round()^ has no effect if ^ylabel^ is specified
    without arguments, but is operative if ^ylabel^ is not specified or is
    specified with arguments. The ^round()^ option follows the rules of
    Stata's ^round(^x^,^y^)^ function, with # being the y argument and each label
    value being the x argument; see ^[U] 20.3.5 Special functions^.

graph_options are any of the options allowed by ^graph, twoway^ except
    ^b2title()^ (which is ignored); see ^help^ @graph@.

    Some options are preset and, although changeable, usually should not be
    modified. These include ^symbol(i)^ and ^connect(l)^ for specifying the
    plotting symbol and point connection method for the density curve. In
    addition, ^ylabel()^ is preset to label only the minimum, median and
    maximum points. ^t1title(Violin Plot)^ is preset but can be changed--except
    when ^by()^ is specified; in this instance ^t1title^ is used for the
    variable name or label. When changeable, use of ^t1title(.)^ will result
    in a blank title.

    Other preset options, such as ^pen(2)^ for the plot pen color, are intended
    to be freely changed to suit user preference. A few options, such as the
    left and right titles, are set (or default to) blank. If specified, they
    appear beside each plot in a multi-variable graph.

    Lastly, the ^saving()^ option differs slightly from ^graph^'s in that the
    filename extension is always ^.gph^ and must not be specified.
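
As an illustration of how the options described above can be combined, the
following sketch assumes that the auto data used in the Examples section are
already in memory. The option values shown (100 points, a halfwidth of 10,
rounding to 5) and the label value 192.5 are arbitrary choices made for
illustration, not recommendations.

```stata
* Gaussian kernel, 100 evaluation points, halfwidth of 10; the density trace
* is truncated at the observed data limits and the y-axis labels are rounded
* to multiples of 5 (all values here are illustrative assumptions)
violin length, gauss n(100) width(10) truncat(*) round(5)

* round() applies Stata's round(x,y) function to each label value; this
* previews the label that an assumed value of 192.5 would receive
display round(192.5, 5)
```

Because ^truncat(*)^ is given, the density trace should not extend past the
observed minimum and maximum of the variable.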


Saved values
------------

    S_1     name of kernel used for density trace
    S_2     number of points of density estimation
    S_3     band width for density estimation
    S_4     scale factor of density plot
    S_5     minimum
    S_6     lower adjacent value
    S_7     first quartile
    S_8     median
    S_9     third quartile
    S_10    upper adjacent value
    S_11    maximum
    S_12    n

When ^by()^ is specified: S_3 and S_4 contain the averages of the band width
and scale factors used in the subgroup density estimations; S_5, S_7, S_8,
S_9, S_11 and S_12 are statistics for the combined group; and S_6 and S_10
are set missing.

When multiple variables are specified, the saved values contain results for
the last variable in the varlist.


Examples
--------

    . ^violin length, t1(Auto data) l1(length of car)^
    . ^violin length weight, n(100) w(20)^
    . ^violin weight, by(foreign) parzen^


Author
------

    Thomas J. Steichen
    RJRT
    steicht@@rjrt.com


Reference
---------

Hintze, J. L. and R. D. Nelson (1998). "Violin plots: a box plot-density
    trace synergism." The American Statistician, 52(2):181-4.


Also see
--------

    STB:  gr33 (STB-46)
 Manual:  ^[R] kdensity^, ^[R] graph box^, ^[R] centile^
          ^[U] 20.3.5 Special functions^
On-line:  help for @kdensity@, @graph@, @centile@, @functions@
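

As a supplement to the Saved values section, the sketch below shows one way
the S_# results might be inspected after a run. It assumes that the auto data
are in memory and that the S_# results are available as global macros (hence
the $ prefix); the interquartile-range calculation is only an illustration of
using the saved quartiles.

```stata
* Draw the plot; the saved values refer to this (single) variable
violin length, n(100)

* S_1 holds a string (the kernel name), so it is quoted for display
display "$S_1"

* S_7 and S_9 hold the first and third quartiles; their difference is the
* interquartile range
display $S_9 - $S_7
```

With ^by()^, S_6 and S_10 are set missing, so calculations based on the
adjacent values are meaningful only for plots drawn without ^by()^.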