
How to Convert Continuous variables into Categorical by Creating Bins

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers].



A very common task in data processing is the transformation of numeric variables (continuous, discrete, etc.) into categorical ones by creating bins. For example, it is quite common to convert age to an age group. Let’s see how we can easily do that in R.

We will consider a random variable from the Poisson distribution with parameter λ=20

library(dplyr)

# Generate 1000 observations from the Poisson distribution
# with lambda equal to 20
df <- data.frame(MyContinuous = rpois(1000, 20))

Create specific Bins

Let’s say that you want to create the following bins:

  • Bin 1: (-inf, 15]
  • Bin 2: (15,25]
  • Bin 3: (25, inf)

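We can easily do that using the cut function. By default it builds intervals that are open on the left and closed on the right, which matches the bins above. Let’s start: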

# Assign each observation to one of the three bins
df <- df %>% mutate(MySpecificBins = cut(MyContinuous, breaks = c(-Inf, 15, 25, Inf)))
head(df, 10)
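To make the interval convention concrete, here is a minimal sketch on a few hand-picked values. The vector x is illustrative and not part of the original data. With the default right = TRUE, a value equal to a break falls into the bin on its left; setting right = FALSE flips this.

```r
# Illustrative values sitting on and next to the breaks (not from df)
x <- c(15, 16, 25, 26)
breaks <- c(-Inf, 15, 25, Inf)

# Default: intervals are closed on the right, e.g. (15,25]
right_closed <- cut(x, breaks = breaks)

# right = FALSE: intervals are closed on the left, e.g. [15,25)
left_closed <- cut(x, breaks = breaks, right = FALSE)

# Compare the bin index (1, 2 or 3) assigned to each value
data.frame(x,
           right_closed = as.integer(right_closed),
           left_closed = as.integer(left_closed))
```

With the default, 15 and 25 stay in the first and second bin respectively; with right = FALSE they move up to the second and third bin.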

Let’s have a look at the counts of each bin.

df%>%group_by(MySpecificBins)%>%count() 

Notice that you can also define your own labels within the cut function, using its labels argument.
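As a small sketch of custom labels, the example below uses a toy data frame and label names ("Low", "Medium", "High") that are made up for illustration. The labels vector needs one entry per bin, that is, one fewer than the number of breaks.

```r
library(dplyr)

# Small illustrative data frame; in the article this would be df
toy <- data.frame(MyContinuous = c(10, 15, 20, 25, 30))

# One label per bin, i.e. one fewer than the number of breaks.
# The label names are hypothetical.
toy <- toy %>%
  mutate(MyLabelledBins = cut(MyContinuous,
                              breaks = c(-Inf, 15, 25, Inf),
                              labels = c("Low", "Medium", "High")))

toy
```

The same labels argument can be added to the mutate call on df if readable group names are preferred over the default range labels.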


Create Bins based on Quantiles

Let’s say that you want each bin to have the same number of observations, for example 4 bins that each contain 25% of the observations. We can easily do it by using the quantiles of the variable as breaks:

numbers_of_bins <- 4

# Use the quantiles as breaks; unique() guards against repeated breaks
df <- df %>% mutate(MyQuantileBins = cut(MyContinuous, breaks = unique(quantile(MyContinuous, probs = seq.int(0, 1, by = 1 / numbers_of_bins))), include.lowest = TRUE))
head(df, 10)

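We can check whether the MyQuantileBins contain the same number of observations, and also look at their ranges. Because a Poisson variable takes integer values with many ties, the counts will typically be close to, but not exactly, 25% each: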

df%>%group_by(MyQuantileBins)%>%count() 
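The unique() call in the quantile code matters when the data are heavily tied. The sketch below uses a made-up vector (not the Poisson data) in which several quantiles coincide, so the raw breaks repeat; cut does not accept duplicated breaks, and unique() removes them at the cost of producing fewer bins than requested.

```r
# Illustrative heavily tied data: most values are 0 (not from df)
x <- c(rep(0, 6), 1, 2, 3, 4)

# The 0%, 25% and 50% quantiles all equal 0, so the raw breaks repeat
raw_breaks <- quantile(x, probs = seq.int(0, 1, by = 1 / 4))
anyDuplicated(raw_breaks) > 0

# cut() refuses duplicated breaks; unique() drops the repeats,
# leaving fewer than the 4 bins that were requested
bins <- cut(x, breaks = unique(raw_breaks), include.lowest = TRUE)
table(bins)
```

In a case like this the bins are also far from equal in size, so it is worth checking the counts whenever the variable has many repeated values.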

Notice that if you want to split your continuous variable into bins with an equal number of observations, you can also use the ntile function of the dplyr package. However, it returns only the bin number and does not create bin labels based on the ranges.
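A minimal sketch of ntile, using a toy data frame with made-up values rather than the Poisson data:

```r
library(dplyr)

# Illustrative data (not from the article): 8 distinct values
toy <- data.frame(MyContinuous = c(3, 9, 1, 7, 5, 2, 8, 4))

# ntile() ranks the values and splits the ranks into 4 groups;
# it returns integer group ids (1 to 4) rather than range labels
toy <- toy %>% mutate(MyNtileBins = ntile(MyContinuous, 4))

# Each group ends up with 2 of the 8 observations
toy %>% count(MyNtileBins)
```

One design difference to keep in mind: ntile splits by rank, so with tied values it may place equal observations in different bins, whereas the quantile-based cut approach always keeps equal values together.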
