

A lineplot is a good choice when you want to visualize the relationship between two continuous variables on the X and Y axes. You can add up to three more dimensions with color (hue), size, and style.

Here, we will see:

  1. How to plot a basic lineplot in seaborn
  2. What is the shaded region in seaborn lineplot?
  3. Calculate confidence interval example (shown as the shaded region in a seaborn lineplot)

Syntax:

seaborn.lineplot(data=data, x="column_name", y="column_name")

Pass your dataset (for example, a pandas DataFrame) to the data parameter, and the names of the columns to plot on the X and Y axes to x and y.

Example

1. Plot a Basic Lineplot

I am using the iris dataset for this demo. First, we import the libraries and then load the dataset.

import seaborn as sns
sns.set_theme(style="darkgrid")

set_theme is only available in seaborn 0.11 and above. If you are using an older version of seaborn, skip this line (or use sns.set(style="darkgrid") instead).

iris = sns.load_dataset("iris")
iris.head()
iris dataset sample rows

Next, I am plotting sepal length vs petal length as a line plot.

sns.lineplot(data=iris, x="sepal_length", y="petal_length")
seaborn basic lineplot

To plot without the shaded region, use ci=None

The ci parameter sets the size of the confidence interval. For more about this, refer to the next section (What is the shaded region in seaborn lineplot?).

Set ci=None to plot without the shaded region. Note that ci is deprecated in seaborn 0.12 and later; there, errorbar=None has the same effect.

sns.lineplot(data=iris, x="sepal_length", y="petal_length", ci=None)
seaborn lineplot without confidence interval shaded region
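If you are not sure which seaborn version your code will run on, a small helper can pick the right keyword. This is a minimal sketch: no_band_kwargs is a hypothetical helper (not part of seaborn), and it assumes the 0.12 release as the point where errorbar replaced ci.

```python
def no_band_kwargs(version: str) -> dict:
    """Return lineplot keyword arguments that hide the shaded band.

    Hypothetical helper: assumes seaborn >= 0.12 uses errorbar=None
    and older releases use ci=None.
    """
    # Compare only the major and minor parts, e.g. "0.13.2" -> (0, 13)
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) >= (0, 12):
        return {"errorbar": None}
    return {"ci": None}

print(no_band_kwargs("0.11.2"))  # {'ci': None}
print(no_band_kwargs("0.13.2"))  # {'errorbar': None}
```

Usage: sns.lineplot(data=iris, x="sepal_length", y="petal_length", **no_band_kwargs(sns.__version__))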

2. What is the shaded region in seaborn lineplot?

This is quoted from the seaborn documentation:

“By default, the plot aggregates over multiple y values at each value of x and shows an estimate of the central tendency and a confidence interval for that estimate.” (By default, the estimate is the mean and the confidence interval is 95%.)
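To make "aggregates over multiple y values at each value of x" concrete, here is a rough pure-Python sketch of the aggregation step: group the y values by x, then take the mean of each group (the default estimator). It is not seaborn's actual code; aggregate_by_x and the toy input are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def aggregate_by_x(xs, ys):
    """Group y values by their x value and return {x: mean of y}, sorted by x.

    Hypothetical helper that mimics the default aggregation (mean).
    """
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: mean(groups[x]) for x in sorted(groups)}

# Toy input: x = 1.0 appears twice, so its two y values are averaged
print(aggregate_by_x([1.0, 2.0, 1.0], [3.0, 5.0, 4.0]))  # {1.0: 3.5, 2.0: 5.0}
```

The line in the plot passes through these per-x means; the shaded band is the confidence interval around each of them.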

If this sounds confusing, don’t worry. Let’s work it out ourselves.

3. Calculate confidence interval example

For example, take the value 5.0 on the X-axis of the plot above. It occurs 10 times in the dataset. The 10 rows with sepal_length=5.0 are shown below.

iris dataset where sepal length is 5

Since we are plotting sepal length vs petal length, there are 10 values of petal_length for sepal_length=5.0: [1.4, 1.5, 1.6, 1.6, 1.2, 1.3, 1.6, 1.4, 3.5, 3.3].

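The formula for the confidence interval is

$$\bar{x} \pm z \frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the mean, $s$ the standard deviation, $n$ the number of values, and $z$ the z number for the chosen confidence level.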

Since we are using 95% confidence, we need the z number for 0.95. I found a nice explanation of the z number calculation in this video (watch from 1:00 to 2:40). For 95% confidence, the z number is 1.96, as shown in that video.
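Instead of looking 1.96 up, you can compute it. A minimal sketch using Python's standard-library statistics.NormalDist (Python 3.8+); z_for_confidence is a hypothetical helper name. For a two-sided interval, the leftover 5% is split between both tails, so we look up the 97.5th percentile of the standard normal distribution.

```python
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Two-sided z value for a confidence level such as 0.95."""
    # Split the remaining probability (1 - level) equally between both tails
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

print(round(z_for_confidence(0.95), 2))  # 1.96
print(round(z_for_confidence(0.99), 2))  # 2.58
```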

Let’s calculate the mean and standard deviation first.

import numpy as np

# The 10 petal_length values where sepal_length == 5.0
val = [1.4, 1.5, 1.6, 1.6, 1.2, 1.3, 1.6, 1.4, 3.5, 3.3]
m = np.mean(val)
sd = np.std(val)  # population SD (ddof=0), NumPy's default
print("Mean:", m)
print("SD:", sd)
Output:
Mean: 1.8399999999999999
SD: 0.7914543574963752

Now plug the mean, standard deviation, z value (1.96), and n (10) into the confidence interval formula above.
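Worked by hand first, with $\bar{x} = 1.84$, $s \approx 0.7915$ (from the output above), $z = 1.96$ and $n = 10$:

$$1.84 \pm 1.96 \times \frac{0.7915}{\sqrt{10}} \approx 1.84 \pm 0.49 = [1.35,\ 2.33]$$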

# CI = mean ± z * sd / sqrt(n), with z = 1.96 and n = 10
ci_low = m - (1.96 * sd / np.sqrt(10))
ci_high = m + (1.96 * sd / np.sqrt(10))
print(round(ci_low, 2), round(ci_high, 2))
Output:
1.35 2.33

So the confidence interval ranges from 1.35 to 2.33.
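The steps above can be wrapped into one reusable function. This is a sketch, not seaborn's method; normal_ci and its ddof argument are hypothetical names. ddof=0 reproduces the population SD that np.std uses by default (and therefore the numbers above), while ddof=1 uses the sample SD, which many statistics textbooks pair with this formula and which gives a slightly wider interval.

```python
import numpy as np

def normal_ci(values, z=1.96, ddof=0):
    """Return (low, high) for mean ± z * sd / sqrt(n)."""
    data = np.asarray(values, dtype=float)
    m = data.mean()
    sd = data.std(ddof=ddof)  # ddof=0: population SD, ddof=1: sample SD
    margin = z * sd / np.sqrt(len(data))
    return m - margin, m + margin

val = [1.4, 1.5, 1.6, 1.6, 1.2, 1.3, 1.6, 1.4, 3.5, 3.3]
low, high = normal_ci(val)
print(round(low, 2), round(high, 2))  # 1.35 2.33
low, high = normal_ci(val, ddof=1)
print(round(low, 2), round(high, 2))  # 1.32 2.36
```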

Now go back to the first plot (the one with the shaded region) and look at the shaded area at the value 5.0 on the X-axis.

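You can see that the shaded region ranges from approximately 1.35 to 2.33, matching our calculation. The match is only approximate because seaborn does not use the z formula: it estimates the interval by bootstrapping (by default, resampling the data 1000 times).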
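To see where the small difference comes from, here is a minimal percentile-bootstrap sketch with NumPy. It follows the same idea (resample with replacement 1000 times, take the mean of each resample, keep the middle 95%), but it is not seaborn's code, and bootstrap_ci is a hypothetical name. Because resampling is random, the interval shifts slightly with the seed, which is why seaborn's lineplot accepts a seed argument for reproducible bands.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap CI for the mean of `values`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(values, dtype=float)
    # Draw n_boot resamples (with replacement) and take the mean of each
    idx = rng.integers(0, len(data), size=(n_boot, len(data)))
    boot_means = data[idx].mean(axis=1)
    # Keep the middle `level` percent of the bootstrap means
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high

val = [1.4, 1.5, 1.6, 1.6, 1.2, 1.3, 1.6, 1.4, 3.5, 3.3]
low, high = bootstrap_ci(val)
print(round(low, 2), round(high, 2))
```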

If you have followed along this far, great job!

For the complete list of seaborn lineplot parameters, please refer to the seaborn documentation.


Thanks for reading my post.

(Image by David Mark from Pixabay)




