Welcome to the Bellabeat data analysis case study! Bellabeat is a high-tech manufacturer of health-focused products for women. As a junior data analyst working on the marketing analyst team at Bellabeat, I am asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices.
Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Analyzing smart device fitness data could help unlock new growth opportunities for the company.
In this project, we have considered a public data set, FitBit Fitness Tracker Data, made available through Mobius, to gain insights about how customers are using their smart devices. We then share our analysis through visualization to the Bellabeat executive team along with our high-level recommendations for Bellabeat’s marketing strategy.
Business task: To gain insights into how customers are using their smart devices.
FitBit Fitness Tracker Data (dataset made available through Mobius): This Kaggle data set contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
The data set consists of 18 tables. The table names and other relevant details are tabulated below:
Table Name | Number of Subjects | Fields | Remarks |
---|---|---|---|
1. dailyActivity_merged | 33 Subjects with 31 observations each | “Id”, “ActivityDate”, “TotalSteps”, “TotalDistance”, “TrackerDistance”, “LoggedActivitiesDistance”, “VeryActiveDistance”, “ModeratelyActiveDistance”, “LightActiveDistance”, “SedentaryActiveDistance”, “VeryActiveMinutes”, “FairlyActiveMinutes”, “LightlyActiveMinutes”, “SedentaryMinutes”, “Calories” | The field “ActivityDate” is not in standard date format. |
2. dailyCalories_merged | 33 Subjects with 31 observations each | “Id”, “ActivityDay”, “Calories” | The field “ActivityDay” is not in standard date format. |
3. dailyIntensities_merged | 33 Subjects with 31 observations each | “Id”, “ActivityDay”, “SedentaryMinutes”, “LightlyActiveMinutes”, “FairlyActiveMinutes”, “VeryActiveMinutes”, “SedentaryActiveDistance”, “LightActiveDistance”, “ModeratelyActiveDistance”, “VeryActiveDistance” | The field “ActivityDay” is not in standard date format. |
4. dailySteps_merged | 33 Subjects with 31 observations each | “Id”, “ActivityDay”, “StepTotal” | The field “ActivityDay” is not in standard date format. |
5. heartrate_seconds_merged | 14 Subjects with observations recorded every few seconds for 31 days | “Id”, “Time”, “Value” | The field “Time” is not in standard date-time format and there are many missing values for certain subjects. |
6. hourlyCalories_merged | 33 Subjects with one observation corresponding to each hour for 31 days | “Id”,“ActivityHour”,“Calories” | The field “ActivityHour” is not in standard date-time format and also there are missing values corresponding to certain subjects. |
7. hourlyIntensities_merged | 33 Subjects with one observation corresponding to each hour for 31 days | “Id”, “ActivityHour”, “TotalIntensity”, “AverageIntensity” | The field “ActivityHour” is not in standard date-time format and also there are missing values corresponding to certain subjects. |
8. hourlySteps_merged | 33 Subjects with one observation corresponding to each hour for 31 days | “Id”, “ActivityHour”, “StepTotal” | The field “ActivityHour” is not in standard date-time format and also there are missing values corresponding to certain subjects. |
9. minuteCaloriesNarrow_merged | 33 Subjects with one observation corresponding to each minute for 31 days | “Id”, “ActivityMinute”, “Calories” | The field “ActivityMinute” is not in standard date-time format and also there are missing values corresponding to certain subjects. |
10. minuteCaloriesWide_merged | Same as minuteCaloriesNarrow_merged, with each minute of an hour in a separate column | | |
11. minuteIntensitiesNarrow_merged | 33 Subjects with one observation corresponding to each minute for 31 days | “Id”, “ActivityMinute”, “Intensity” | The field “ActivityMinute” is not in standard date-time format and also there are missing values corresponding to certain subjects. |
12. minuteIntensitiesWide_merged | Same as minuteIntensitiesNarrow_merged, with each minute of an hour in a separate column | | |
13. minuteMETsNarrow_merged | 33 Subjects with one observation corresponding to each minute for 31 days | “Id”, “ActivityMinute”, “METs” | Metabolic Equivalents (METs) is the ratio of the working metabolic rate to the resting metabolic rate, but the criteria used to calculate METs are not clearly defined. The field “ActivityMinute” is not in standard date-time format and also there are missing values corresponding to certain subjects. |
14. minuteSleep_merged | 24 Subjects with one observation corresponding to each minute for 31 days | “Id”, “date”, “value”, “logId” | The field “value” is not clearly specified. Unable to infer anything from this table |
15. minuteStepsNarrow_merged | 33 Subjects with one observation corresponding to each minute for 31 days | “Id”, “ActivityMinute”, “Steps” | The field “ActivityMinute” is not in standard date-time format and also there are missing values corresponding to certain subjects. |
16. minuteStepsWide_merged | Same as minuteStepsNarrow_merged, with each minute of an hour in a separate column | | |
17. sleepDay_merged | 24 Subjects with 31 observations each | “Id”, “SleepDay”, “TotalSleepRecords”, “TotalMinutesAsleep”, “TotalTimeInBed” | The field “SleepDay” is not in standard date format and there are multiple sleep records for certain subjects. |
18. weightLogInfo_merged | 8 Subjects | “Id”, “Date”, “WeightKg”, “WeightPounds”, “Fat”, “BMI”, “IsManualReport”, “LogId” | Very few records |
The tables cover only 33 subjects (three more than the thirty users mentioned in the data set description), which is a small sample and a potential source of bias. It can be seen that the minute-level tables are aggregated to form the hourly tables, and the hourly tables are aggregated to form the daily tables. Some tables are also missing certain subjects.
The data set also lacks important information about the subjects, such as age and gender.
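The roll-up from hourly to daily tables can be checked directly. The following is a minimal sketch on made-up numbers: `hourly_toy` and `daily_toy` are hypothetical stand-ins for the hourly and daily step tables, not the actual Fitabase files.

```r
# Hypothetical toy data standing in for the hourly and daily step tables
hourly_toy <- data.frame(
  id        = c(1, 1, 1, 2, 2),
  date      = c("2016-04-12", "2016-04-12", "2016-04-13",
                "2016-04-12", "2016-04-12"),
  steptotal = c(100, 250, 400, 50, 75)
)
daily_toy <- data.frame(
  id        = c(1, 1, 2),
  date      = c("2016-04-12", "2016-04-13", "2016-04-12"),
  steptotal = c(350, 400, 125)
)

# Roll the hourly rows up to one row per subject and date
rolled_up <- aggregate(steptotal ~ id + date, data = hourly_toy, FUN = sum)

# Join on id and date, then flag any day where the two totals disagree
check <- merge(rolled_up, daily_toy, by = c("id", "date"),
               suffixes = c("_hourly", "_daily"))
check$matches <- check$steptotal_hourly == check$steptotal_daily
print(check)
```

On the real tables, any row with `matches` equal to `FALSE` would point to a day where the hourly and daily files disagree.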
Now we move on to the process phase in data analysis. Here, we clean the data and prepare it for further analysis.
I have selected 6 of the 18 tables for analysis.
1. dailyActivity_merged as daily_activity
2. sleepDay_merged as daily_sleep
3. hourlyCalories_merged as hourly_calories
4. hourlyIntensities_merged as hourly_intensity
5. hourlySteps_merged as hourly_steps
6. heartrate_seconds_merged as heartrate
rm(list=ls())
# Loading required libraries
library(tidyverse)
library(plotly)
library(timetk)
#Loading the selected tables
daily_activity <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//dailyActivity_merged.csv")
glimpse(daily_activity)
## Rows: 940
## Columns: 15
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityDate <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/~
## $ TotalSteps <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019~
## $ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5~
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3~
## $ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0~
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveMinutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4~
## $ FairlyActiveMinutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21~
## $ LightlyActiveMinutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, ~
## $ SedentaryMinutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818~
## $ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203~
daily_sleep <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//sleepDay_merged.csv")
glimpse(daily_sleep)
## Rows: 413
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150~
## $ SleepDay <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", "~
## $ TotalSleepRecords <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2~
## $ TotalTimeInBed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3~
hourly_calories <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//hourlyCalories_merged.csv")
glimpse(hourly_calories)
## Rows: 22,099
## Columns: 3
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityHour <chr> "4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/20~
## $ Calories <int> 81, 61, 59, 47, 48, 48, 48, 47, 68, 141, 99, 76, 73, 66, ~
hourly_intensity <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//hourlyIntensities_merged.csv")
glimpse(hourly_intensity)
## Rows: 22,099
## Columns: 4
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 15039~
## $ ActivityHour <chr> "4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/1~
## $ TotalIntensity <int> 20, 8, 7, 0, 0, 0, 0, 0, 13, 30, 29, 12, 11, 6, 36, 5~
## $ AverageIntensity <dbl> 0.333333, 0.133333, 0.116667, 0.000000, 0.000000, 0.0~
hourly_steps <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//hourlySteps_merged.csv")
glimpse(hourly_steps)
## Rows: 22,099
## Columns: 3
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityHour <chr> "4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/20~
## $ StepTotal <int> 373, 160, 151, 0, 0, 0, 0, 0, 250, 1864, 676, 360, 253, 2~
heartrate <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//heartrate_seconds_merged.csv")
glimpse(heartrate)
## Rows: 2,483,658
## Columns: 3
## $ Id <dbl> 2022484408, 2022484408, 2022484408, 2022484408, 2022484408, 2022~
## $ Time <chr> "4/12/2016 7:21:00 AM", "4/12/2016 7:21:05 AM", "4/12/2016 7:21:~
## $ Value <int> 97, 102, 105, 103, 101, 95, 91, 93, 94, 93, 92, 89, 83, 61, 60, ~
Let us check whether there are any duplicated rows in the tables.
sum(duplicated(daily_activity))
## [1] 0
sum(duplicated(daily_sleep))
## [1] 3
sum(duplicated(hourly_calories))
## [1] 0
sum(duplicated(hourly_intensity))
## [1] 0
sum(duplicated(hourly_steps))
## [1] 0
sum(duplicated(heartrate))
## [1] 0
We can see that the table daily_sleep has duplicate rows, which we need to remove. We also change the column names to lower case to maintain uniformity during further analysis.
#Cleaning the Data
daily_activity<-rename_with(daily_activity, tolower)
daily_sleep <- rename_with(daily_sleep, tolower) %>%
distinct() %>%
drop_na()
hourly_steps<-rename_with(hourly_steps, tolower)
hourly_intensity<-rename_with(hourly_intensity, tolower)
hourly_calories<-rename_with(hourly_calories, tolower)
heartrate<-rename_with(heartrate,tolower)
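To make the effect of the cleaning step concrete, here is a small base R sketch on a made-up table. `sleep_toy` is a hypothetical example, and `unique()` is used here as the base R counterpart of `distinct()`.

```r
# Hypothetical toy table with one exact duplicate row (rows 1 and 2)
sleep_toy <- data.frame(
  Id                 = c(1, 1, 1, 2),
  SleepDay           = c("4/12/2016", "4/12/2016", "4/13/2016", "4/12/2016"),
  TotalMinutesAsleep = c(327, 327, 384, 412)
)

# Count rows that repeat an earlier row exactly
n_dup <- sum(duplicated(sleep_toy))

# Lower-case the column names, then keep one copy of each row
names(sleep_toy) <- tolower(names(sleep_toy))
sleep_clean <- unique(sleep_toy)

print(n_dup)
print(sleep_clean)
```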
Now we convert the dates and times to a standard format, split the date and time into separate columns in the relevant tables, and give all the tables the common column names date and time. This helps us to maintain uniformity during further analysis. We also convert the heartrate table into an hourly_heartrate table by averaging the heart rate values within each hour.
daily_activity <- daily_activity %>%
rename(date = activitydate) %>%
mutate(date = as.Date(date, format = "%m/%d/%Y"))
head(daily_activity)
daily_sleep <- daily_sleep %>%
rename(date = sleepday) %>%
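# Every sleepday value is at midnight, so parse it directly as a Date;
# this matches daily_activity$date and avoids time-zone shifts
mutate(date = as.Date(date, format = "%m/%d/%Y"))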
head(daily_sleep)
# format() with an explicit pattern keeps the time part even for midnight values
hourly_calories <- hourly_calories %>%
mutate(activityhour = as.POSIXct(activityhour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()),
activityhour = format(activityhour, "%Y-%m-%d %H:%M:%S")) %>%
separate(activityhour, into = c("date", "time"), sep = " ")
head(hourly_calories)
hourly_intensity <- hourly_intensity %>%
mutate(activityhour = as.POSIXct(activityhour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()),
activityhour = format(activityhour, "%Y-%m-%d %H:%M:%S")) %>%
separate(activityhour, into = c("date", "time"), sep = " ")
head(hourly_intensity)
hourly_steps <- hourly_steps %>%
mutate(activityhour = as.POSIXct(activityhour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()),
activityhour = format(activityhour, "%Y-%m-%d %H:%M:%S")) %>%
separate(activityhour, into = c("date", "time"), sep = " ")
head(hourly_steps)
heartrate <- heartrate %>%
mutate(time = as.POSIXct(time,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone()))
# Changing the values to hourly heart rate
hourly_heartrate <- heartrate %>%
group_by(id) %>%
summarise_by_time(
.date_var = time,
.by = "hour",
value = mean(value)
)
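hourly_heartrate <- hourly_heartrate %>%
# Write the timestamp out with an explicit pattern before splitting it
mutate(time = format(time, "%Y-%m-%d %H:%M:%S")) %>%
separate(time, into = c("date", "time"), sep = " ")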
head(hourly_heartrate)
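One detail is worth guarding against when splitting timestamps into date and time: depending on the R version, converting a date-time to text without an explicit format can drop the time part for midnight values. The sketch below, on made-up timestamps, builds both columns with explicit `format()` patterns. The fixed time zone `"UTC"` is an assumption made only so that the result does not depend on the machine's settings.

```r
# Made-up raw timestamps in the same style as the ActivityHour field
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/2016 11:00:00 PM")

# Parse the 12-hour clock strings (tz = "UTC" is an assumption for reproducibility)
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Build the date and time columns with explicit patterns,
# so midnight is written as 00:00:00 rather than omitted
split_toy <- data.frame(
  date = format(parsed, "%Y-%m-%d"),
  time = format(parsed, "%H:%M:%S")
)
print(split_toy)
```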
In the analysis phase, we try to generate information from the data that can help us in solving our business task. Most of the graphs in this section are interactive graphs and they are plotted using the package plotly.
In order to track trends during each weekday, we need to insert a new field into the daily data tables showing the weekday corresponding to each date.
# To track trend on weekdays
daily_activity$weekday <- weekdays(as.Date(daily_activity$date))
daily_activity$weekday <- ordered(daily_activity$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday","Friday", "Saturday","Sunday"))
daily_sleep$weekday <- weekdays(as.Date(daily_sleep$date))
daily_sleep$weekday<- ordered(daily_sleep$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday","Friday", "Saturday","Sunday"))
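Note that `weekdays()` returns day names in the language of the current locale, so the English level names above only match on an English-locale machine. A locale-independent alternative, sketched here on made-up dates, derives the weekday from the numeric `wday` field of `POSIXlt` (the vector `day_names` is a helper introduced for this illustration).

```r
# Made-up dates: a Tuesday, a Saturday and a Sunday
dates <- as.Date(c("2016-04-12", "2016-04-16", "2016-04-17"))

# POSIXlt$wday runs from 0 (Sunday) to 6 (Saturday), whatever the locale
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
weekday_chr <- day_names[as.POSIXlt(dates)$wday + 1]

# Order the factor Monday to Sunday, as in the analysis
weekday <- factor(weekday_chr,
                  levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                             "Friday", "Saturday", "Sunday"),
                  ordered = TRUE)
print(weekday)
```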
To understand the distribution of total steps on each weekday, we create a box plot corresponding to each weekday.
fig_1 <- plot_ly(data=daily_activity,x=~weekday,y=~totalsteps,color=~weekday,colors = "Dark2",type="box")%>%
layout(title = "Total Steps on Weekdays",xaxis = list(title = "Day"),yaxis = list(title = "Total Steps"))
fig_1
Let us take 10,000 steps per day as the target value.
Every day, more than 50% of the observations lie below 10k. On Wednesday, Thursday, Friday and Sunday, around 75% of the observations are below 10k.
This shows that, most of the time, the subjects are not reaching the target number of daily steps.
Next we analyze the sleep pattern on each weekday.
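The shares read off the box plot can also be computed directly. The following is a minimal sketch on made-up step counts: `steps_toy` is hypothetical, and the 10,000-step target is the one assumed in the text.

```r
# Hypothetical daily step counts for two weekdays
steps_toy <- data.frame(
  weekday    = c("Monday", "Monday", "Monday", "Monday",
                 "Tuesday", "Tuesday", "Tuesday", "Tuesday"),
  totalsteps = c(4000, 8000, 9500, 12000, 3000, 11000, 15000, 7000)
)

# The mean of a logical vector is the proportion of TRUE values,
# so this gives the share of days below the target for each weekday
share_below <- tapply(steps_toy$totalsteps < 10000, steps_toy$weekday, mean)
print(share_below)
```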
fig_2 <- plot_ly(data=daily_sleep,x=~weekday,y=~totalminutesasleep,color=~weekday,colors = "Dark2",type="box")%>%
layout(title = "Total minutes Asleep on Weekdays",xaxis = list(title = "Day"),yaxis = list(title = "Total Minutes Asleep"))
fig_2
We take 7 hours (420 minutes) of sleep as the minimum amount required for maintaining good health.
Except on Sunday, around 50% of the observations lie below 420 minutes. This suggests that on days other than Sunday, the subjects sleep less than the recommended minimum about half of the time. Individuals who habitually sleep less than 7 hours a day may be at risk of serious health problems.
Next, we look into the correlation between Total minutes of Sleep and Sedentary (Inactive) minutes.
Since daily_activity and daily_sleep have different numbers of rows, we cannot calculate the correlation directly. We merge the two tables on id and date to form a new table, daily_activity_sleep, covering the 24 subjects present in both tables, and then calculate the correlation to understand whether there is any relation between the amount of sleep and the inactive minutes in a day for these 24 subjects.
daily_activity_sleep <- merge(daily_activity, daily_sleep, by=c("id","date")) %>%
drop_na()
head(daily_activity_sleep)
cor(daily_activity_sleep$sedentaryminutes,daily_activity_sleep$totalminutesasleep)
## [1] -0.2506668
The negative correlation (about -0.25) indicates a weak tendency for sedentary minutes to be higher on days with less sleep. This is consistent with our statement that sleeping less than 7 hours a day may be linked to health problems, although a correlation of this size does not by itself establish cause and effect.
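A single correlation coefficient says nothing about its own uncertainty. A minimal sketch of how that could be quantified with `cor.test()` from base R is given below. It runs on simulated data, not the Fitbit tables, and the simulation parameters are arbitrary assumptions chosen only to produce a weak negative relationship.

```r
# Simulated stand-ins for minutes asleep and sedentary minutes (not real data)
set.seed(42)
sleep_min     <- round(rnorm(200, mean = 420, sd = 60))
sedentary_min <- 900 - 0.4 * sleep_min + rnorm(200, sd = 90)

# Pearson correlation with a 95% confidence interval and p-value
ct <- cor.test(sedentary_min, sleep_min)

print(ct$estimate)   # the same value cor() would return
print(ct$conf.int)   # interval around the estimate
print(ct$p.value)
```

On the real data the equivalent call would be `cor.test(daily_activity_sleep$sedentaryminutes, daily_activity_sleep$totalminutesasleep)`.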
Let us look into the scatter plot between total minutes of sleep and sedentary minutes to further understand the distribution of observations.
fit <- lm(sedentaryminutes ~ totalminutesasleep, data = daily_activity_sleep)
fig_3 <- daily_activity_sleep %>%
plot_ly(x = ~totalminutesasleep,y = ~sedentaryminutes,type="scatter",mode="markers")%>%
add_lines(x = ~totalminutesasleep, y = fitted(fit))%>%
layout(title = "Comparison between minutes Asleep and Sedentary minutes",xaxis = list(title = "Total Minutes Asleep"),yaxis = list(title = "Sedentary Minutes"),showlegend = F)
fig_3
The orange line is a linear model fitted using total minutes of sleep as the independent variable and sedentary minutes as the dependent variable. The line helps us to understand the extent of the linear relationship between these two variables.
The number of steps taken is an important parameter in analyzing the daily activity of a person. The next plot shows how the average number of steps varies over the hours of the day among the 33 subjects.
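How much of the variation such a line accounts for follows from the correlation: in a simple linear regression with one predictor, R² equals the squared correlation. The sketch below checks this identity on simulated data (the simulated values are arbitrary assumptions) and then applies it to the correlation reported above.

```r
# Simulated data, used only to check the identity R^2 = r^2
set.seed(1)
x <- rnorm(100, mean = 420, sd = 60)
y <- 1000 - 0.5 * x + rnorm(100, sd = 100)

fit_toy   <- lm(y ~ x)
r_squared <- summary(fit_toy)$r.squared
r         <- cor(x, y)
print(all.equal(r_squared, r^2))

# Applied to the correlation reported in the analysis
reported_r <- -0.2506668
print(reported_r^2)   # roughly 0.063
```

With a correlation of about -0.25, the fitted line would account for only around 6% of the variation in sedentary minutes.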
average_hourly_steps<- hourly_steps %>%
group_by(time) %>%
summarize(average_steps = mean(steptotal))
glimpse(average_hourly_steps)
## Rows: 24
## Columns: 2
## $ time <chr> "00:00:00", "01:00:00", "02:00:00", "03:00:00", "04:00:0~
## $ average_steps <dbl> 42.188437, 23.102894, 17.110397, 6.426581, 12.699571, 43~
fig_4<- ggplot(data=average_hourly_steps) +
geom_col(mapping = aes(x=time, y = average_steps, fill = average_steps)) +
labs(title = "Hourly Steps", x="Time", y="Average Steps") +
scale_fill_gradient(low = "red", high = "blue")+
theme(axis.text.x = element_text(angle = 90))
fig_4
The subjects are mostly active from 7:00 AM to 10:00 PM. The most steps are taken between 6:00 PM and 7:00 PM, which may be the time when they return from their workplace. A large number of steps is also taken around the lunch break (12:00 PM to 2:00 PM). Most of the steps are taken during office hours, 9:00 AM to 6:00 PM or 10:00 AM to 7:00 PM.
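The busiest hours can also be read from the summary table rather than from the chart. A minimal sketch on made-up averages follows (`avg_toy` is hypothetical and has only four hours for brevity).

```r
# Hypothetical hourly averages, in the same layout as average_hourly_steps
avg_toy <- data.frame(
  time          = c("07:00:00", "12:00:00", "18:00:00", "23:00:00"),
  average_steps = c(300, 550, 600, 120)
)

# Hour with the highest average step count
peak_hour <- avg_toy$time[which.max(avg_toy$average_steps)]

# All hours ranked from most to least active
ranked <- avg_toy[order(-avg_toy$average_steps), ]

print(peak_hour)
print(ranked)
```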
Next, we look into the trend over the days for total calories burnt by each subject.
daily_activity$id<- as.factor(daily_activity$id)
fig_5 <- plot_ly(data=daily_activity,x=~date,y=~calories,color=~id,colors = "Dark2",type="scatter")%>%
layout(title = "Total Calories burnt by each User",
xaxis = list(title = "Date"),
yaxis = list(title = "Calories"))
fig_5
The plot shows heterogeneity both within individuals and between individuals. Also, on 12th May 2016, the trend is different from the rest of the dates: the calories burnt are markedly lower than on other days, and this requires further investigation.
hourly_calories %>%
filter(date=="2016-05-12") %>%
distinct(time)
It can be seen that the data is incomplete for 12th May 2016, which resulted in a lower recorded amount of calories burnt.
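The same check can be run for every date at once, so that incomplete days are flagged automatically rather than spotted in a plot. The sketch below uses a made-up table (`cal_toy`) with only three possible hours per day. On the real hourly tables the expected value would be 24.

```r
# Hypothetical hourly records; the second date is missing its last hour
cal_toy <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2),
  date = c("2016-05-11", "2016-05-11", "2016-05-11", "2016-05-12",
           "2016-05-12", "2016-05-11", "2016-05-11", "2016-05-12"),
  time = c("00:00:00", "01:00:00", "02:00:00", "00:00:00",
           "01:00:00", "00:00:00", "01:00:00", "00:00:00")
)

# Number of distinct hours that have at least one record on each date
hours_per_day <- tapply(cal_toy$time, cal_toy$date,
                        function(x) length(unique(x)))

# Toy expectation of 3 hours per day (24 for the real data)
expected_hours <- 3
incomplete <- names(hours_per_day)[hours_per_day < expected_hours]
print(incomplete)
```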
Let us look into the relation between the calories burnt and the corresponding intensity to analyze the general pattern among the subjects.
hourly_calories_intensity <- merge(hourly_calories, hourly_intensity, by = c("id", "date", "time"), all = TRUE)
fig_6<- hourly_calories_intensity %>%
ggplot(aes(x=totalintensity,y=calories))+
geom_point(color="blue")+geom_smooth(color="red")+
labs(title = "Total Intensity vs Calories Burnt", x="Total Intensity", y="Calories")
fig_6
We can see that the curve shifts around an intensity of 130 (the unit of intensity is not clearly specified), which suggests that an intensity of 130 or above is associated with relatively faster burning of calories.
Next, we analyze the general relation between total intensity and heart rate.
heartrate_intensity <- merge(hourly_heartrate,hourly_intensity, by = c("id","date","time"),all.x = TRUE)
head(heartrate_intensity)
heartrate_intensity$id<- as.factor(heartrate_intensity$id)
heartrate_intensity<-heartrate_intensity %>%
drop_na()
heartrate_intensity %>%
select(value,totalintensity,averageintensity) %>%
summary()
## value totalintensity averageintensity
## Min. : 43.35 Min. : 0.00 Min. :0.00000
## 1st Qu.: 64.31 1st Qu.: 2.00 1st Qu.:0.03333
## Median : 72.50 Median : 11.00 Median :0.18333
## Mean : 74.85 Mean : 19.17 Mean :0.31954
## 3rd Qu.: 83.29 3rd Qu.: 26.00 3rd Qu.:0.43333
## Max. :161.51 Max. :180.00 Max. :3.00000
From the summary of heart rate values, the middle 50% of the values (64.31 to 83.29) fall within the normal range of 60 to 100. Similarly, the middle 50% of the total intensity values lie between 2 and 26.
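The quartiles only bound the share of values in the normal range. The exact share can be computed directly, as in this sketch on made-up heart rates (`hr_toy` is hypothetical; the 60 to 100 range is the one used in the text).

```r
# Hypothetical hourly average heart rates
hr_toy <- c(55, 62, 70, 72, 85, 99, 101, 120)

# Proportion of values inside the 60 to 100 range (5 of the 8 values here)
share_normal <- mean(hr_toy >= 60 & hr_toy <= 100)
print(share_normal)
```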
fig_7<- plot_ly(data=heartrate_intensity,x=~totalintensity,y=~value,color=~id,colors = "Dark2",type="scatter")%>%
layout(title = "Total Intensity vs Heart rate",xaxis = list(title = "Total Intensity"),yaxis = list(title = "Heart rate"))
fig_7
The plot shows the relation between total intensity and heart rate. In general, heart rate tends to increase as intensity increases.
If we had the age and gender data for the 14 subjects in the heartrate table, we could identify abnormal heart rates at different intensities in the data.
For now we focus on zero intensity, that is, resting. The plot shows an outlier, a potentially high-risk heart rate, for the id 6775888955 at zero intensity.
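Outliers at rest can also be flagged programmatically rather than by eye. The sketch below applies the common 1.5 × IQR rule to made-up data. This rule is a generic statistical convention chosen for illustration, not a clinical threshold, and `rest_toy` and its ids are hypothetical.

```r
# Hypothetical hourly heart rates with their total intensity
rest_toy <- data.frame(
  id             = c("A", "A", "A", "B", "B", "C", "C", "D"),
  totalintensity = c(0, 0, 40, 0, 0, 0, 0, 0),
  value          = c(58, 61, 140, 63, 66, 60, 64, 132)
)

# Keep only the resting (zero intensity) hours
resting <- subset(rest_toy, totalintensity == 0)

# Upper fence of the 1.5 * IQR rule (a generic convention, not medical advice)
q     <- quantile(resting$value, c(0.25, 0.75))
upper <- unname(q[2] + 1.5 * (q[2] - q[1]))

# Resting hours whose heart rate lies above the fence
flagged <- resting[resting$value > upper, ]
print(flagged)
```

The high reading at intensity 40 is excluded by the filter, so only the unusually high resting value is flagged.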
Now we move on to analyze the usage of the smart device by each subject. We categorize the subjects into 3 categories namely, Low Usage, Moderate Usage and High Usage subjects, based on the number of days the smart device was used by the respective subject.
* Less than 10 days of usage \(\implies\) Low Usage
* Between 10 and 20 days of usage \(\implies\) Moderate Usage
* More than 20 days of usage \(\implies\) High Usage
usage <- daily_activity %>%
group_by(id) %>%
# n() already counts the rows (days) per subject, so sum() is not needed
summarize(days_used = n()) %>%
mutate(usage_level = case_when(
days_used < 10 ~ "Low Usage (Less than 10 days)",
days_used >= 10 & days_used <= 20 ~ "Moderate Usage (Between 10 and 20 days)",
days_used > 20 ~ "High Usage (More than 20 days)"
))
usage_frequency<- usage %>%
group_by(usage_level) %>%
summarise(frequency=n())
fig_8<- usage_frequency %>%
plot_ly(labels= ~usage_level, values= ~frequency) %>%
add_pie(hole=0.5) %>%
layout(title="Smart Device Usage", showlegend= F)
fig_8
From the doughnut chart, around 88% of the subjects used the smart device for more than 20 days.
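The percentages shown on the doughnut chart can be computed as numbers as well. A minimal base R sketch on made-up day counts is given below, with `cut()` standing in for the `case_when()` call above (the vector `days_used` is hypothetical).

```r
# Hypothetical number of days of use for ten subjects
days_used <- c(4, 18, 20, 26, 29, 30, 31, 31, 31, 31)

# Right-closed bins: up to 9 days, 10 to 20 days, more than 20 days
usage_level <- cut(days_used,
                   breaks = c(-Inf, 9, 20, Inf),
                   labels = c("Low Usage", "Moderate Usage", "High Usage"))

# Percentage of subjects in each category
share <- round(100 * prop.table(table(usage_level)), 1)
print(share)
```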
Let us now look into the usage of the smart device with respect to different time points in a day.
usage_by_time<- hourly_intensity %>%
group_by(time) %>%
summarize(no_of_usage = n())
glimpse(usage_by_time)
## Rows: 24
## Columns: 2
## $ time <chr> "00:00:00", "01:00:00", "02:00:00", "03:00:00", "04:00:00"~
## $ no_of_usage <int> 934, 933, 933, 933, 932, 932, 931, 931, 931, 931, 929, 927~
fig_9<- ggplot(data=usage_by_time) +
geom_col(mapping = aes(x=time, y = no_of_usage, fill = no_of_usage)) +
labs(title = "Hourly Usage of the Smart Device", x="Time", y="Frequency") +
scale_fill_gradient(low = "red", high = "blue")+
theme(axis.text.x = element_text(angle = 90))
fig_9
From the figure, the number of records is nearly the same for every hour of the day, so we can infer that most of the subjects use the smart device around the clock.
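Counting rows per hour mixes the number of subjects with the number of days recorded. A complementary view, sketched here on made-up records (`int_toy` is hypothetical), counts how many distinct subjects have at least one record in each hour.

```r
# Hypothetical hourly records for three subjects
int_toy <- data.frame(
  id   = c(1, 1, 2, 2, 3, 1, 2),
  time = c("00:00:00", "00:00:00", "00:00:00", "01:00:00",
           "01:00:00", "01:00:00", "02:00:00")
)

# Number of distinct subjects with at least one record in each hour
subjects_per_hour <- tapply(int_toy$id, int_toy$time,
                            function(x) length(unique(x)))
print(subjects_per_hour)
```

On the real data, an hour in which this count stays close to 33 would support the around-the-clock reading more directly than the raw record count.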