
R boxplot

From WikiOD

Syntax

  • boxplot(x, ...) # generic function
  • boxplot(formula, data = NULL, ..., subset, na.action = NULL) ## S3 method for class 'formula'
  • boxplot(x, ..., range = 1.5, width = NULL, varwidth = FALSE, notch = FALSE, outline = TRUE, names, plot = TRUE, border = par("fg"), col = NULL, log = "", pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5), horizontal = FALSE, add = FALSE, at = NULL) ## Default S3 method

Parameters

Parameter - Details (source: R documentation)

  • formula - a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the grouping variable grp (usually a factor).
  • data - a data.frame (or list) from which the variables in formula should be taken.
  • subset - an optional vector specifying a subset of observations to be used for plotting.
  • na.action - a function which indicates what should happen when the data contain NAs. The default is to ignore missing values in either the response or the group.
  • boxwex - a scale factor to be applied to all boxes. When there are only a few groups, the appearance of the plot can be improved by making the boxes narrower.
  • plot - if TRUE (the default) then a boxplot is produced. If not, the summaries which the boxplots are based on are returned.
  • col - if col is non-null it is assumed to contain colors to be used to colour the bodies of the box plots. By default they are in the background colour.
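
As a minimal sketch of how subset and plot interact, the call below keeps only flowers with Sepal.Width > 3 (an arbitrary threshold chosen for illustration) and returns the summaries instead of drawing anything. The variable name bp is an assumption of this example.

```r
# Summaries only (plot = FALSE), restricted to rows where Sepal.Width > 3.
# The threshold is arbitrary and only serves to illustrate `subset`.
bp <- boxplot(Sepal.Length ~ Species, data = iris,
              subset = Sepal.Width > 3, plot = FALSE)

bp$n      # number of observations per group after subsetting
bp$names  # group labels, taken from the levels of Species
```

The subset expression is evaluated inside data, so column names can be used directly.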

Create a box-and-whisker plot with boxplot() {graphics}

This example uses the default boxplot() function and the iris data frame.

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Simple boxplot (Sepal.Length)

Create a box-and-whisker graph of a numerical variable

boxplot(iris[,1],xlab="Sepal.Length",ylab="Length (in centimeters)",
           main="Summary Characteristics of Sepal.Length (Iris Data)")
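
The numbers behind such a plot can be inspected without drawing it. The sketch below compares fivenum() with boxplot.stats() from grDevices, which is the helper the default boxplot method relies on; the variable names x and s are assumptions of this example.

```r
x <- iris$Sepal.Length

fivenum(x)             # minimum, lower hinge, median, upper hinge, maximum

s <- boxplot.stats(x)  # whisker ends, hinges, median and outliers
s$stats                # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out                  # points beyond the whiskers (empty for this variable)
```

Because Sepal.Length has no points beyond the whiskers, the whisker ends coincide with the minimum and maximum.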


Boxplot of sepal length grouped by species

Create a boxplot of a numerical variable grouped by a categorical variable

boxplot(Sepal.Length~Species,data = iris)
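
The formula interface is a convenience for splitting a vector by a grouping variable. As a rough sketch, the same groups can be built by hand with split() and passed to the default method as a list; by_formula and by_list are names invented for this comparison.

```r
# Formula interface: boxplot() splits Sepal.Length by Species itself
by_formula <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# Default method: pass a list of numeric vectors, one per group
by_list <- boxplot(split(iris$Sepal.Length, iris$Species), plot = FALSE)

# Both routes should yield the same summary statistics
all.equal(by_formula$stats, by_list$stats)
```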


Bring order

To change the order of the boxes in the plot, change the order of the levels of the categorical variable.

For example, to get the order virginica, versicolor, setosa:

newSpeciesOrder <- factor(iris$Species, levels=c("virginica","versicolor","setosa"))
boxplot(Sepal.Length~newSpeciesOrder,data = iris)
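
Instead of typing the levels by hand, the order can also be derived from the data. The sketch below uses reorder() to sort the species by decreasing median sepal length; the name byMedian and the choice of the median as the ordering statistic are assumptions of this example.

```r
# reorder() sorts levels by increasing score, so negate the median
# to obtain a decreasing order
byMedian <- reorder(iris$Species, iris$Sepal.Length,
                    FUN = function(v) -median(v))

levels(byMedian)  # level order that boxplot() will follow

boxplot(Sepal.Length ~ byMedian, data = iris)
```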


Change group names

To give the groups more descriptive labels, use the names parameter. It takes a character vector with one label per level of the categorical variable, in level order.

boxplot(Sepal.Length~newSpeciesOrder,data = iris,names= c("name1","name2","name3"))
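
Labels can also be generated from the factor itself so that they always match the level order. This sketch appends the group size to each species name; grp and groupLabels are hypothetical names used only here.

```r
grp <- factor(iris$Species, levels = c("virginica", "versicolor", "setosa"))

# table() counts follow the level order, so each label lines up with its box
groupLabels <- paste0(levels(grp), " (n=", as.vector(table(grp)), ")")
groupLabels

boxplot(Sepal.Length ~ grp, data = iris, names = groupLabels)
```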


Small improvements

Color

col: a vector of colors, typically one per level of the categorical variable, given in level order.

boxplot(Sepal.Length~Species,data = iris,col=c("green","yellow","orange"))
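
Since colors are matched to boxes by position, reordering the levels silently reassigns them. One possible safeguard, sketched here with a hypothetical named vector speciesCols, is to look colors up by level name.

```r
# Colors keyed by species name rather than by position
speciesCols <- c(setosa = "green", versicolor = "yellow", virginica = "orange")

# Index by the current level order so each species keeps its color
boxCols <- unname(speciesCols[levels(iris$Species)])

boxplot(Sepal.Length ~ Species, data = iris, col = boxCols)
```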


Proximity of the box

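boxwex: a scale factor applied to the width of every box. Smaller values give narrower boxes with more space between them; larger values give wider boxes that sit closer together.

Left: boxplot(Sepal.Length~Species,data = iris,boxwex = 0.1)

Right: boxplot(Sepal.Length~Species,data = iris,boxwex = 1)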
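
To view both settings next to each other on one device, the two calls can be wrapped in a par(mfrow = ...) layout. This is only a sketch; the names op, narrow and wide and the panel titles are assumptions of this example.

```r
op <- par(mfrow = c(1, 2))  # one row, two panels; keep the old settings

narrow <- boxplot(Sepal.Length ~ Species, data = iris,
                  boxwex = 0.1, main = "boxwex = 0.1")
wide   <- boxplot(Sepal.Length ~ Species, data = iris,
                  boxwex = 1, main = "boxwex = 1")

par(op)                     # restore the previous layout

# boxwex only affects drawing; the underlying statistics are unchanged
identical(narrow$stats, wide$stats)
```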


See the summaries on which the boxplots are based


To see the summaries, set the plot parameter to FALSE.

The function then returns a list with the following components:

> boxplot(Sepal.Length~newSpeciesOrder,data = iris,plot=FALSE)
$stats # summary of the numerical variable for the 3 groups
     [,1] [,2] [,3]
[1,]  5.6  4.9  4.3 # lower whisker end
[2,]  6.2  5.6  4.8 # lower hinge (approximately the first quartile)
[3,]  6.5  5.9  5.0 # median
[4,]  6.9  6.3  5.2 # upper hinge (approximately the third quartile)
[5,]  7.9  7.0  5.8 # upper whisker end

$n # number of observations in each group
[1] 50 50 50

$conf # lower and upper limits of the notches
         [,1]     [,2]     [,3]
[1,] 6.343588 5.743588 4.910622
[2,] 6.656412 6.056412 5.089378

$out # outliers (points beyond the whiskers)
[1] 4.9

$group # group to which each outlier belongs
[1] 1

$names # group names
[1] "virginica"  "versicolor" "setosa"
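
The returned list can be used like any other R list. The sketch below labels the rows and columns of the stats matrix and pairs each outlier with the name of its group; the row labels and the variable names bp, stats and outliers are assumptions of this example.

```r
newSpeciesOrder <- factor(iris$Species,
                          levels = c("virginica", "versicolor", "setosa"))
bp <- boxplot(Sepal.Length ~ newSpeciesOrder, data = iris, plot = FALSE)

# Label the summary matrix so rows and columns are self-describing
stats <- bp$stats
dimnames(stats) <- list(
  c("lower whisker", "lower hinge", "median", "upper hinge", "upper whisker"),
  bp$names
)
stats

# Pair each outlier with the name of the group it belongs to
outliers <- data.frame(group = bp$names[bp$group], value = bp$out)
outliers
```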

Additional boxplot style parameters

Box

  • boxlty - box line type
  • boxlwd - box line width
  • boxcol - box line color
  • boxfill - box fill colors

Median

  • medlty - median line type ("blank" for no line)
  • medlwd - median line width
  • medcol - median line color
  • medpch - median point (NA for no symbol)
  • medcex - median point size
  • medbg - median point background color

Whisker

  • whisklty - whisker line type
  • whisklwd - whisker line width
  • whiskcol - whisker line color

Staple

  • staplelty - staple line type
  • staplelwd - staple line width
  • staplecol - staple line color

Outliers

  • outlty - outlier line type ("blank" for no line)
  • outlwd - outlier line width
  • outcol - outlier line color
  • outpch - outlier point type (NA for no symbol)
  • outcex - outlier point size
  • outbg - outlier point background color
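
These parameters can be passed directly to boxplot() or collected in a list and handed over through the pars argument, which makes a style reusable across plots. The list name myStyle and the particular values are hypothetical choices for this sketch.

```r
# Hypothetical reusable style built from the parameter names listed above
myStyle <- list(boxfill = "cornflowerblue", boxcol = "darkblue",
                medcol = "red", medlwd = 2,
                whisklty = 1, whiskcol = "darkblue",
                staplecol = "darkblue",
                outpch = 19, outcol = "grey")

# The same style applied to two different variables
boxplot(Sepal.Length ~ Species, data = iris, pars = myStyle)
boxplot(Sepal.Width ~ Species, data = iris, pars = myStyle)
```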

Example

Default and heavily modified plots side by side

# Split the plotting device into one row and two columns
par(mfrow = c(1, 2))
# Default
boxplot(Sepal.Length ~ Species, data=iris)
# Modified
boxplot(Sepal.Length ~ Species, data=iris,
        boxlty=2, boxlwd=3, boxfill="cornflowerblue", boxcol="darkblue",
        medlty=2, medlwd=2, medcol="red", medpch=21, medcex=1, medbg="white",
        whisklty=2, whisklwd=3, whiskcol="darkblue",
        staplelty=2, staplelwd=2, staplecol="red",
        outlty=3, outlwd=3, outcol="grey", outpch=NA