How to Use Factors in Regression Analysis in R - Step by Step Examples
How to Use Factors in Regression Analysis in R?
Answer
To use factors in regression analysis in R, convert the categorical variables into factors and include them in the model formula. R then treats each such variable as categorical and creates one dummy (indicator) variable for every level except the first, which serves as the reference level. Each resulting coefficient is the estimated difference between that level and the reference level.
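The dummy variables mentioned above can be inspected directly with model.matrix(). The following is a minimal sketch; the vector group and its values are made up for illustration and are not part of the examples below.

```r
# Hypothetical factor with three levels, for illustration only
group <- factor(c("a", "b", "c", "a", "b"))

# model.matrix() builds the design matrix that lm() would use for this term.
# The first level ("a") is the reference level and gets no column of its own;
# the other levels each get a 0/1 indicator column.
X <- model.matrix(~ group)
print(X)
print(colnames(X))
```

The column names combine the variable name and the level name, which is how the coefficient names in a summary() table are formed.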
✐ Examples
1 Using a Factor Representing Gender in Regression Analysis
In this example,
- We start by creating a data frame named data that includes the variables income and gender. The income variable is numeric, while the gender variable is categorical with values 'Male' and 'Female'.
- We convert the gender variable to a factor using the factor() function. This ensures that R treats gender as a categorical variable in the regression analysis. Because factor() orders levels alphabetically by default, 'Female' becomes the reference level.
- We use the lm() function to fit a linear regression model with income as the dependent variable and gender as the independent variable. We assign the result to a variable named model.
- We use the summary() function to print the summary of the regression model. It reports the regression coefficients, including the estimated difference in mean income between the non-reference gender level and the reference level.
R Program
data <- data.frame(income = c(50000, 60000, 55000, 65000, 70000), gender = c('Male', 'Female', 'Female', 'Male', 'Female'))
data$gender <- factor(data$gender)
model <- lm(income ~ gender, data = data)
summary(model)
Output
Call:
lm(formula = income ~ gender, data = data)

Residuals:
    1     2     3     4     5
-7500 -1667 -6667  7500  8333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    61667       5046  12.221  0.00118 **
genderMale     -4167       7979  -0.522  0.63762

Residual standard error: 8740 on 3 degrees of freedom
Multiple R-squared:  0.08333, Adjusted R-squared:  -0.2222
F-statistic: 0.2727 on 1 and 3 DF,  p-value: 0.6376
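With the default alphabetical ordering, 'Female' is the reference level for gender in this data. If a different baseline is wanted, relevel() can change it before fitting. The sketch below reuses the data from this example; the name model_male_ref is introduced here only for illustration.

```r
data <- data.frame(
  income = c(50000, 60000, 55000, 65000, 70000),
  gender = factor(c("Male", "Female", "Female", "Male", "Female"))
)

# Default level order is alphabetical, so "Female" comes first
print(levels(data$gender))

# Make "Male" the reference level instead
data$gender <- relevel(data$gender, ref = "Male")

# The intercept is now the mean income for "Male", and the remaining
# coefficient is the difference between "Female" and "Male"
model_male_ref <- lm(income ~ gender, data = data)
print(coef(model_male_ref))
```

Changing the reference level does not change the fitted values; it only changes which group the coefficients are measured against.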
2 Using a Factor Representing Education Level in Regression Analysis
In this example,
- We start by creating a data frame named data that includes the variables salary and education. The salary variable is numeric, while the education variable is categorical with values 'High School', 'Bachelor', and 'Master'.
- We convert the education variable to a factor using the factor() function, passing the levels argument to set the level order explicitly. This ensures that R treats education as a categorical variable in the regression analysis, with 'High School' (the first level) as the reference level.
- We use the lm() function to fit a linear regression model with salary as the dependent variable and education as the independent variable. We assign the result to a variable named model.
- We use the summary() function to print the summary of the regression model. It reports the regression coefficients, including the estimated difference in mean salary between each education level and the reference level.
R Program
data <- data.frame(salary = c(40000, 50000, 60000, 70000, 80000), education = c('High School', 'Bachelor', 'Master', 'Bachelor', 'Master'))
data$education <- factor(data$education, levels = c('High School', 'Bachelor', 'Master'))
model <- lm(salary ~ education, data = data)
summary(model)
Output
Call:
lm(formula = salary ~ education, data = data)

Residuals:
     1      2      3      4      5
     0 -10000 -10000  10000  10000

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)          40000      14142   2.828    0.106
educationBachelor    20000      17321   1.155    0.368
educationMaster      30000      17321   1.732    0.225

Residual standard error: 14140 on 2 degrees of freedom
Multiple R-squared:  0.6, Adjusted R-squared:  0.2
F-statistic: 1.5 on 2 and 2 DF,  p-value: 0.4
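When a factor is the only predictor, the coefficients can be checked against the group means: the intercept equals the mean of the reference level, and each other coefficient equals that level's mean minus the reference mean. The sketch below reuses the data from this example; group_means and expected are names introduced here for illustration.

```r
data <- data.frame(
  salary = c(40000, 50000, 60000, 70000, 80000),
  education = factor(
    c("High School", "Bachelor", "Master", "Bachelor", "Master"),
    levels = c("High School", "Bachelor", "Master")
  )
)

# Mean salary within each education level
group_means <- tapply(data$salary, data$education, mean)
print(group_means)

model <- lm(salary ~ education, data = data)

# Reference mean, then differences of the other levels from the reference
expected <- c(
  group_means[["High School"]],
  group_means[["Bachelor"]] - group_means[["High School"]],
  group_means[["Master"]] - group_means[["High School"]]
)
print(expected)
print(coef(model))
```

This equivalence holds only for a model with a single factor and no other predictors; once more terms are added, the coefficients are adjusted differences rather than raw differences in means.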
3 Using a Factor Representing Department in Regression Analysis
In this example,
- We start by creating a data frame named data that includes the variables performance and department. The performance variable is numeric, while the department variable is categorical with values 'HR', 'Finance', and 'IT'.
- We convert the department variable to a factor using the factor() function, passing the levels argument to set the level order explicitly. This ensures that R treats department as a categorical variable in the regression analysis, with 'HR' (the first level) as the reference level.
- We use the lm() function to fit a linear regression model with performance as the dependent variable and department as the independent variable. We assign the result to a variable named model.
- We use the summary() function to print the summary of the regression model. It reports the regression coefficients, including the estimated difference in mean performance between each department and the reference level.
R Program
data <- data.frame(performance = c(75, 80, 85, 90, 95), department = c('HR', 'Finance', 'IT', 'Finance', 'IT'))
data$department <- factor(data$department, levels = c('HR', 'Finance', 'IT'))
model <- lm(performance ~ department, data = data)
summary(model)
Output
Call:
lm(formula = performance ~ department, data = data)

Residuals:
 1  2  3  4  5
 0 -5 -5  5  5

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         75.000      7.071  10.607  0.00877 **
departmentFinance   10.000      8.660   1.155  0.36754
departmentIT        15.000      8.660   1.732  0.22540

Residual standard error: 7.071 on 2 degrees of freedom
Multiple R-squared:  0.6, Adjusted R-squared:  0.2
F-statistic: 1.5 on 2 and 2 DF,  p-value: 0.4
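A fitted model with a factor predictor can also be used to predict the response for each level with predict(). The sketch below reuses the data from this example; new_data and pred are names introduced here for illustration. The new data frame should use the same factor levels as the data used for fitting.

```r
data <- data.frame(
  performance = c(75, 80, 85, 90, 95),
  department = factor(
    c("HR", "Finance", "IT", "Finance", "IT"),
    levels = c("HR", "Finance", "IT")
  )
)
model <- lm(performance ~ department, data = data)

# One row per department, with the same levels as in the fitted data
new_data <- data.frame(
  department = factor(c("HR", "Finance", "IT"), levels = levels(data$department))
)

# For a single-factor model, each prediction is that department's mean
pred <- predict(model, newdata = new_data)
names(pred) <- as.character(new_data$department)
print(pred)
```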
Summary
In this tutorial, we learned how to use factors in regression analysis in R, with examples covering a two-level factor and two three-level factors.