
Introduction
About the Project
Prepare the Data
3.1 Overview of the Dataset
Process the Data
Analyze the Data
5.1 Total Steps on Weekdays
5.2 Total minutes Asleep on Weekdays
5.3 Comparison between minutes Asleep and Sedentary minutes
5.4 Hourly Steps throughout the Day
5.5 Total Calories burnt by each User
5.6 Comparison between total Intensity and Calories burnt
5.7 Comparison between total Intensity and Heart rate
5.8 Daily Use of Smart device
5.9 Hourly Usage of the Smart device
Recommendation
References

1. Introduction

Welcome to the Bellabeat data analysis case study! Bellabeat is a high-tech manufacturer of health-focused products for women. As a junior data analyst working on the marketing analyst team at Bellabeat, I am asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices.

2. About the Project

Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Analyzing smart device fitness data could help unlock new growth opportunities for the company.
In this project, we have considered a public data set, FitBit Fitness Tracker Data, made available through Mobius, to gain insights about how customers are using their smart devices. We then share our analysis with the Bellabeat executive team through visualizations, along with our high-level recommendations for Bellabeat’s marketing strategy.

Business task: To gain insights into how customers are using their smart devices

3. Prepare the Data

FitBit Fitness Tracker Data (data set made available through Mobius): This Kaggle data set contains personal fitness tracker data from thirty Fitbit users. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.

3.1 Overview of the Dataset

The data set consists of 18 tables. The table names and other relevant details are tabulated below:

Table Name	Number of Subjects	Fields	Remarks
1. dailyActivity_merged	33 Subjects with 31 observations each	“Id”, “ActivityDate”, “TotalSteps”, “TotalDistance”, “TrackerDistance”, “LoggedActivitiesDistance”, “VeryActiveDistance”, “ModeratelyActiveDistance”, “LightActiveDistance”, “SedentaryActiveDistance”, “VeryActiveMinutes”, “FairlyActiveMinutes”, “LightlyActiveMinutes”, “SedentaryMinutes”, “Calories”	The field “ActivityDate” is not in standard date format.
2. dailyCalories_merged	33 Subjects with 31 observations each	“Id”, “ActivityDay”, “Calories”	The field “ActivityDay” is not in standard date format.
3. dailyIntensities_merged	33 Subjects with 31 observations each	“Id”, “ActivityDay”, “SedentaryMinutes”, “LightlyActiveMinutes”, “FairlyActiveMinutes”, “VeryActiveMinutes”, “SedentaryActiveDistance”, “LightActiveDistance”, “ModeratelyActiveDistance”, “VeryActiveDistance”	The field “ActivityDay” is not in standard date format.
4. dailySteps_merged	33 Subjects with 31 observations each	“Id”, “ActivityDay”, “StepTotal”	The field “ActivityDay” is not in standard date format.
5. heartrate_seconds_merged	14 Subjects with 5 observations corresponding to each minute for 31 days	“Id”, “Time”, “Value”	The field “Time” is not in standard date-time format and there are many missing values corresponding to certain subjects.
6. hourlyCalories_merged	33 Subjects with one observation corresponding to each hour for 31 days	“Id”,“ActivityHour”,“Calories”	The field “ActivityHour” is not in standard date-time format and also there are missing values corresponding to certain subjects.
7. hourlyIntensities_merged	33 Subjects with one observation corresponding to each hour for 31 days	“Id”, “ActivityHour”, “TotalIntensity”, “AverageIntensity”	The field “ActivityHour” is not in standard date-time format and also there are missing values corresponding to certain subjects.
8. hourlySteps_merged	33 Subjects with one observation corresponding to each hour for 31 days	“Id”, “ActivityHour”, “StepTotal”	The field “ActivityHour” is not in standard date-time format and also there are missing values corresponding to certain subjects.
9. minuteCaloriesNarrow_merged	33 Subjects with one observation corresponding to each minute for 31 days	“Id”, “ActivityMinute”, “Calories”	The field “ActivityMinute” is not in standard date-time format and also there are missing values corresponding to certain subjects.
10. minuteCaloriesWide_merged	Same as minuteCaloriesNarrow_merged with each minute of an hour in separate columns
11. minuteIntensitiesNarrow_merged	33 Subjects with one observation corresponding to each minute for 31 days	“Id”, “ActivityMinute”, “Intensity”	The field “ActivityMinute” is not in standard date-time format and also there are missing values corresponding to certain subjects.
12. minuteIntensitiesWide_merged	Same as minuteIntensitiesNarrow_merged with each minute of an hour in separate columns
13. minuteMETsNarrow_merged	33 Subjects with one observation corresponding to each minute for 31 days	“Id”, “ActivityMinute”, “METs”	Metabolic Equivalents (METs) is the ratio of working metabolic rate relative to resting metabolic rate. But the criteria under which METs is calculated is not clearly defined. The field “ActivityMinute” is not in standard date-time format and also there are missing values corresponding to certain subjects.
14. minuteSleep_merged	24 Subjects with one observation corresponding to each minute for 31 days	“Id”, “date”, “value”, “logId”	The field “value” is not clearly specified. Unable to infer anything from this table
15. minuteStepsNarrow_merged	33 Subjects with one observation corresponding to each minute for 31 days	“Id”, “ActivityMinute”, “Steps”	The field “ActivityMinute” is not in standard date-time format and also there are missing values corresponding to certain subjects.
16. minuteStepsWide_merged	Same as minuteStepsNarrow_merged with each minute of an hour in separate columns
17. sleepDay_merged	24 Subjects with 31 observations each	“Id”, “SleepDay”, “TotalSleepRecords”, “TotalMinutesAsleep”, “TotalTimeInBed”	The field “SleepDay” is not in standard date format and there are multiple sleep records for certain subjects.
18. weightLogInfo_merged	8 Subjects	“Id”, “Date”, “WeightKg”, “WeightPounds”, “Fat”, “BMI”, “IsManualReport”, “LogId”	Very few records

The data set consists of only 33 subjects, which may introduce bias into our sample. It can be seen that the minute-level tables are used to form the hourly tables, and the hourly tables are used to form the daily tables. Some tables are also missing certain subjects.
The data set also lacks important information about the subjects, such as age and gender.

4. Process the Data

Now we move on to the process phase in data analysis. Here, we clean the data and prepare it for further analysis.
I have selected 6 of the 18 tables for analysis.
1. dailyActivity_merged as daily_activity
2. sleepDay_merged as daily_sleep
3. hourlyCalories_merged as hourly_calories
4. hourlyIntensities_merged as hourly_intensity
5. hourlySteps_merged as hourly_steps
6. heartrate_seconds_merged as heartrate

rm(list=ls())

# Loading required libraries

library(tidyverse)
library(plotly)
library(timetk)

#Loading the selected tables

daily_activity <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//dailyActivity_merged.csv")
glimpse(daily_activity)

## Rows: 940
## Columns: 15
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityDate             <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/~
## $ TotalSteps               <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019~
## $ TotalDistance            <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ TrackerDistance          <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5~
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3~
## $ LightActiveDistance      <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0~
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveMinutes        <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4~
## $ FairlyActiveMinutes      <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21~
## $ LightlyActiveMinutes     <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, ~
## $ SedentaryMinutes         <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818~
## $ Calories                 <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203~

daily_sleep <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//sleepDay_merged.csv")
glimpse(daily_sleep)

## Rows: 413
## Columns: 5
## $ Id                 <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150~
## $ SleepDay           <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", "~
## $ TotalSleepRecords  <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2~
## $ TotalTimeInBed     <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3~

hourly_calories <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//hourlyCalories_merged.csv")
glimpse(hourly_calories)

## Rows: 22,099
## Columns: 3
## $ Id           <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityHour <chr> "4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/20~
## $ Calories     <int> 81, 61, 59, 47, 48, 48, 48, 47, 68, 141, 99, 76, 73, 66, ~

hourly_intensity <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//hourlyIntensities_merged.csv")
glimpse(hourly_intensity)

## Rows: 22,099
## Columns: 4
## $ Id               <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 15039~
## $ ActivityHour     <chr> "4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/1~
## $ TotalIntensity   <int> 20, 8, 7, 0, 0, 0, 0, 0, 13, 30, 29, 12, 11, 6, 36, 5~
## $ AverageIntensity <dbl> 0.333333, 0.133333, 0.116667, 0.000000, 0.000000, 0.0~

hourly_steps <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//hourlySteps_merged.csv")
glimpse(hourly_steps)

## Rows: 22,099
## Columns: 3
## $ Id           <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityHour <chr> "4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/20~
## $ StepTotal    <int> 373, 160, 151, 0, 0, 0, 0, 0, 250, 1864, 676, 360, 253, 2~

heartrate <- read.csv("C://Users//Annu T Poulose//Desktop//Data Analytics//Fitabase Data 4.12.16-5.12.16//heartrate_seconds_merged.csv")
glimpse(heartrate)

## Rows: 2,483,658
## Columns: 3
## $ Id    <dbl> 2022484408, 2022484408, 2022484408, 2022484408, 2022484408, 2022~
## $ Time  <chr> "4/12/2016 7:21:00 AM", "4/12/2016 7:21:05 AM", "4/12/2016 7:21:~
## $ Value <int> 97, 102, 105, 103, 101, 95, 91, 93, 94, 93, 92, 89, 83, 61, 60, ~

Let us check whether there are any duplicated rows in the tables.

sum(duplicated(daily_activity))

## [1] 0

sum(duplicated(daily_sleep))

## [1] 3

sum(duplicated(hourly_calories))

## [1] 0

sum(duplicated(hourly_intensity))

## [1] 0

sum(duplicated(hourly_steps))

## [1] 0

sum(duplicated(heartrate))

## [1] 0

We can see that the table daily_sleep has duplicate rows, which we need to remove. We also change the column names to lower case to maintain uniformity during further analysis.

#Cleaning the Data
daily_activity<-rename_with(daily_activity, tolower)
daily_sleep <- rename_with(daily_sleep, tolower) %>% 
  distinct() %>% 
  drop_na()
hourly_steps<-rename_with(hourly_steps, tolower)
hourly_intensity<-rename_with(hourly_intensity, tolower)
hourly_calories<-rename_with(hourly_calories, tolower)
heartrate<-rename_with(heartrate,tolower)
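The duplicate check and removal can be illustrated on a small example. The following is a minimal sketch in base R, assuming a made-up table toy_sleep (its values are not taken from the data set); unique() is used here as a base R stand-in for distinct().

```r
# Toy sleep log (made-up values) with one exact duplicate row
toy_sleep <- data.frame(
  id = c(1, 1, 1, 2),
  sleepday = c("4/12/2016", "4/13/2016", "4/13/2016", "4/12/2016"),
  totalminutesasleep = c(327, 384, 384, 410)
)

# duplicated() flags the second and later copies of an identical row
n_dup <- sum(duplicated(toy_sleep))

# unique() on a data frame drops those copies, similar to dplyr::distinct()
toy_clean <- unique(toy_sleep)

n_dup
nrow(toy_clean)
```

Only rows that match in every column are counted as duplicates, so two sleep records on the same day with different values would both be kept.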

Now, we convert the date and time fields to a standard format, split the date and time into separate columns in the relevant tables, and use the common column names date and time across all tables. This helps us maintain uniformity during further analysis. We also convert the heartrate table into an hourly_heartrate table by averaging the heart rate values within each hour.

daily_activity <- daily_activity %>%
  rename(date = activitydate) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))
head(daily_activity)

daily_sleep <- daily_sleep %>%
  rename(date = sleepday) %>% 
  # Note: date becomes a POSIXct date-time here, whereas daily_activity$date is a Date
  mutate(date = as.POSIXct(date,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone()))
head(daily_sleep)

hourly_calories <- hourly_calories %>% 
  mutate(activityhour = as.POSIXct(activityhour,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone())) %>% 
  separate(activityhour, into = c("date", "time"), sep= " ")
head(hourly_calories)

hourly_intensity <- hourly_intensity %>% 
  mutate(activityhour = as.POSIXct(activityhour,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone())) %>% 
  separate(activityhour, into = c("date", "time"), sep= " ")
head(hourly_intensity)

hourly_steps <- hourly_steps %>% 
  mutate(activityhour = as.POSIXct(activityhour,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone())) %>% 
  separate(activityhour, into = c("date", "time"), sep= " ")
head(hourly_steps)
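The parsing and splitting step used for the hourly tables can also be written so that the date and time parts are built explicitly. The following is a minimal sketch in base R, assuming a fixed time zone of "UTC" and an English locale for the AM/PM marker (both are assumptions of this sketch, not choices made in the analysis). Using format() keeps the time part for every row, including midnight, which some text conversions may print without a time.

```r
# Timestamps in the same layout as the ActivityHour field
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM")

# Parse the 12-hour clock layout; tz = "UTC" is an assumption of this sketch
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Build the date and time columns explicitly instead of splitting on a space
date_part <- format(parsed, "%Y-%m-%d")
time_part <- format(parsed, "%H:%M:%S")

data.frame(date = date_part, time = time_part)
```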

heartrate <- heartrate %>% 
  mutate(time = as.POSIXct(time,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone()))

# Changing the values to hourly heart rate 

hourly_heartrate <- heartrate %>% 
  group_by(id) %>%
  summarise_by_time(
    .date_var = time,
    .by = "hour",
    value = mean(value)
  )
hourly_heartrate<- hourly_heartrate %>% 
  separate(time, into = c("date", "time"), sep= " ")
head(hourly_heartrate)
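To make the hourly averaging concrete, here is a minimal sketch of the same idea in base R, without timetk. The table toy_hr and its readings are made up for illustration, and the "UTC" time zone is an assumption.

```r
# Toy heart-rate readings (made-up values) for two ids
toy_hr <- data.frame(
  id = c(1, 1, 1, 1, 2),
  time = as.POSIXct(
    c("2016-04-12 07:21:00", "2016-04-12 07:21:05", "2016-04-12 07:59:55",
      "2016-04-12 08:00:05", "2016-04-12 07:30:00"),
    tz = "UTC"
  ),
  value = c(97, 102, 101, 80, 60)
)

# Label each reading with the start of the hour it falls in
toy_hr$hour <- format(toy_hr$time, "%Y-%m-%d %H:00:00")

# Average the readings per id and hour
toy_hourly <- aggregate(value ~ id + hour, data = toy_hr, FUN = mean)
toy_hourly
```

Each id and hour pair ends up with a single averaged value, which is the shape needed to join the heart rate data with the hourly intensity table later on.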

5. Analyze the Data

In the analysis phase, we try to generate information from the data that can help us in solving our business task. Most of the graphs in this section are interactive graphs and they are plotted using the package plotly.

5.1 Total Steps on Weekdays

In order to track trends during each weekday, we need to insert a new field into the daily data tables showing the weekday corresponding to each date.

# To track trend on weekdays

daily_activity$weekday <- weekdays(as.Date(daily_activity$date))
daily_activity$weekday <- ordered(daily_activity$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday","Friday", "Saturday","Sunday"))

daily_sleep$weekday <- weekdays(as.Date(daily_sleep$date))
daily_sleep$weekday<- ordered(daily_sleep$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday","Friday", "Saturday","Sunday"))

To understand the distribution of total steps on each weekday, we create a box plot corresponding to each weekday.

fig_1 <- plot_ly(data=daily_activity,x=~weekday,y=~totalsteps,color=~weekday,colors = "Dark2",type="box")%>%
  layout(title = "Total Steps on Weekdays",xaxis = list(title = "Day"),yaxis = list(title = "Total Steps"))
fig_1

Let us assume 10,000 steps per day to be an ideal value.
On every weekday, more than 50% of the observations lie below 10k. On Wednesday, Thursday, Friday and Sunday, around 75% of the observations are below 10k.
This shows that, most of the time, the subjects are not taking the required number of daily steps.
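The shares read off the box plot can also be computed directly. The following is a minimal sketch in base R on a made-up table toy_steps (the values are illustrative, not from the data set), using the assumed 10,000-step target.

```r
# Toy daily step counts (made-up values)
toy_steps <- data.frame(
  weekday = c("Monday", "Monday", "Monday", "Monday",
              "Sunday", "Sunday", "Sunday", "Sunday"),
  totalsteps = c(4000, 8000, 9500, 12000, 3000, 5000, 7000, 9000)
)

# Proportion of days below the assumed 10,000-step target, per weekday
share_below <- tapply(toy_steps$totalsteps < 10000, toy_steps$weekday, mean)
share_below
```

Applied to daily_activity, the same call would give the exact percentage of observations below 10k for each weekday instead of an estimate from the plot.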

5.2 Total minutes Asleep on Weekdays

Next, we analyze the sleep pattern on each weekday.

fig_2 <- plot_ly(data=daily_sleep,x=~weekday,y=~totalminutesasleep,color=~weekday,colors = "Dark2",type="box")%>% 
  layout(title = "Total minutes Asleep on Weekdays",xaxis = list(title = "Day"),yaxis = list(title = "Total Minutes Asleep"))
fig_2

We take 7 hours (420 minutes) as the minimum amount of sleep required for maintaining good health.
Except on Sunday, around 50% of the observations lie below 420 minutes. This shows that, on days other than Sunday, the subjects are sleep deprived around 50% of the time. Individuals who habitually sleep less than 7 hours a day may be exhibiting signs or symptoms of serious health problems.
Next, we look into the correlation between total minutes of sleep and sedentary (inactive) minutes.

5.3 Comparison between minutes Asleep and Sedentary minutes

Since daily_activity and daily_sleep have different dimensions, we cannot calculate the correlation directly. We merge the two tables on id and date to form a new table, daily_activity_sleep, which keeps only the 24 subjects that appear in both tables, and then calculate the correlation to understand whether there is any relation between the amount of sleep and the inactive minutes in a day for these subjects.

daily_activity_sleep <- merge(daily_activity, daily_sleep, by=c("id","date")) %>% 
  drop_na()
head(daily_activity_sleep)
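The effect of merging on id and date can be seen on a small example. The following is a minimal sketch with two made-up tables, toy_activity and toy_sleep; in this sketch both date columns are stored as Date, so the keys have the same class on both sides.

```r
# Toy tables (made-up values): activity for ids 1-3, sleep for ids 1-2 only
toy_activity <- data.frame(
  id = c(1, 1, 2, 3),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12", "2016-04-12")),
  sedentaryminutes = c(728, 776, 900, 1100)
)
toy_sleep <- data.frame(
  id = c(1, 2, 2),
  date = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13")),
  totalminutesasleep = c(327, 410, 380)
)

# Inner join: keep only the id/date pairs present in both tables
toy_joined <- merge(toy_activity, toy_sleep, by = c("id", "date"))
toy_joined
```

Rows without a partner in the other table are dropped, which is why the merged table covers fewer subjects and days than daily_activity.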

cor(daily_activity_sleep$sedentaryminutes,daily_activity_sleep$totalminutesasleep)

## [1] -0.2506668

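The negative correlation (about -0.25) indicates that days with less sleep tend to have more sedentary minutes, although the relationship is weak and a correlation alone does not establish cause. It is consistent with our earlier statement that sleeping less than 7 hours a day may be associated with serious health problems.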
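One way to gauge the strength of this relationship is to square the correlation, which for a simple linear fit gives the share of variance in one variable explained by the other. A minimal sketch, using the correlation value reported above:

```r
# Correlation reported above
r <- -0.2506668

# Squared correlation: share of variance explained by a simple linear fit
r_squared <- r^2
round(r_squared, 3)
```

The result is roughly 0.06, so sleep duration accounts for only about 6% of the variation in sedentary minutes in a linear sense.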

Let us look into the scatter plot between total minutes of sleep and sedentary minutes to further understand the distribution of observations.

fit <- lm(sedentaryminutes ~ totalminutesasleep, data = daily_activity_sleep)

fig_3 <- daily_activity_sleep %>% 
  plot_ly(x = ~totalminutesasleep,y = ~sedentaryminutes,type="scatter",mode="markers")%>%
  add_lines(x = ~totalminutesasleep, y = fitted(fit))%>% 
  layout(title = "Comparison between minutes Asleep and Sedentary minutes",xaxis = list(title = "Total Minutes Asleep"),yaxis = list(title = "Sedentary Minutes"),showlegend = F)
fig_3

The orange line is a linear model fitted using total minutes of sleep as independent variable and sedentary minutes as dependent variable. The line helps us to understand the extent of linear relationship between these two variables.

5.4 Hourly Steps throughout the Day

The number of steps taken is an important parameter in analyzing the daily activity of a person. The next plot shows how the steps are distributed over the hours of the day, averaged across the 33 subjects.

average_hourly_steps<- hourly_steps %>% 
  group_by(time) %>% 
  summarize(average_steps = mean(steptotal))
glimpse(average_hourly_steps)

## Rows: 24
## Columns: 2
## $ time          <chr> "00:00:00", "01:00:00", "02:00:00", "03:00:00", "04:00:0~
## $ average_steps <dbl> 42.188437, 23.102894, 17.110397, 6.426581, 12.699571, 43~

fig_4<- ggplot(data=average_hourly_steps) +
  geom_col(mapping = aes(x=time, y = average_steps, fill = average_steps)) + 
  labs(title = "Hourly Steps", x="Time", y="Average Steps") + 
  scale_fill_gradient(low = "red", high = "blue")+
  theme(axis.text.x = element_text(angle = 90))

fig_4

The subjects are mostly active from 7:00 AM to 10:00 PM. The highest number of steps is taken between 6:00 PM and 7:00 PM, which is typically the time when people return from their workplace. A large number of steps is also taken during the lunch break (12:00 PM to 2:00 PM). Most of the steps are taken during office hours, 9:00 AM to 6:00 PM or 10:00 AM to 7:00 PM.

5.5 Total Calories burnt by each User

Next, we look into the trend over the days for total calories burnt by each subject.

daily_activity$id<- as.factor(daily_activity$id)
fig_5 <- plot_ly(data=daily_activity,x=~date,y=~calories,color=~id,colors = "Dark2",type="scatter")%>% 
  layout(title =  "Total Calories burnt by each User",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Calories"))
fig_5

We can see both within-individual and between-individual heterogeneity in the above plot. Also, on 12th May 2016, the trend differs from the rest of the dates. The calories burnt are markedly lower than on other days, and this requires further investigation.

hourly_calories %>% 
  filter(date=="2016-05-12") %>% 
  distinct(time)

It can be seen that the data is incomplete for 12th May 2016, which resulted in a lower amount of calories burnt being recorded.
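The same check can be generalized to find every incomplete day at once. The following is a minimal sketch in base R on a made-up table toy_hourly; the number of hours on the last day (15) is invented for illustration and is not the figure from the data set.

```r
# Toy hourly records (made-up values): the last day stops early
toy_hourly <- data.frame(
  date = c(rep("2016-05-11", 24), rep("2016-05-12", 15)),
  time = c(sprintf("%02d:00:00", 0:23), sprintf("%02d:00:00", 0:14))
)

# Number of distinct hours recorded on each date
hours_per_day <- tapply(toy_hourly$time, toy_hourly$date,
                        function(x) length(unique(x)))

# Dates with fewer than 24 recorded hours
incomplete <- names(hours_per_day)[hours_per_day < 24]
incomplete
```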

5.6 Comparison between total Intensity and Calories burnt

Let us look into the relation between the calories burnt and the corresponding intensity to analyze the general pattern among the subjects.

hourly_calories_intensity <- merge(hourly_calories,hourly_intensity, by=c("id","date","time"),all=T)

fig_6<- hourly_calories_intensity %>% 
  ggplot(aes(x=totalintensity,y=calories))+
  geom_point(color="blue")+geom_smooth(color="red")+
  labs(title = "Total Intensity vs Calories Burnt", x="Total Intensity", y="Calories")
fig_6

We can see that the curve changes slope around an intensity of 130 (the unit of intensity is not clearly specified), which suggests that calories are burnt at a relatively faster rate at intensities of 130 or above.

5.7 Comparison between total Intensity and Heart rate

Next, we analyze the general relation between total intensity and heart rate.

heartrate_intensity <- merge(hourly_heartrate,hourly_intensity, by = c("id","date","time"),all.x = TRUE)
head(heartrate_intensity)

heartrate_intensity$id<- as.factor(heartrate_intensity$id)

heartrate_intensity<-heartrate_intensity %>% 
  drop_na()

heartrate_intensity %>%
  select(value,totalintensity,averageintensity) %>% 
  summary()

##      value        totalintensity   averageintensity 
##  Min.   : 43.35   Min.   :  0.00   Min.   :0.00000  
##  1st Qu.: 64.31   1st Qu.:  2.00   1st Qu.:0.03333  
##  Median : 72.50   Median : 11.00   Median :0.18333  
##  Mean   : 74.85   Mean   : 19.17   Mean   :0.31954  
##  3rd Qu.: 83.29   3rd Qu.: 26.00   3rd Qu.:0.43333  
##  Max.   :161.51   Max.   :180.00   Max.   :3.00000

From the summary, the middle 50% of the hourly heart rate values (64.31 to 83.29) fall within the normal range of 60 to 100. Similarly, about 50% of the total intensity values lie between 2 and 26.

fig_7<- plot_ly(data=heartrate_intensity,x=~totalintensity,y=~value,color=~id,colors = "Dark2",type="scatter")%>% 
  layout(title = "Total Intensity vs Heart rate",xaxis = list(title = "Total Intensity"),yaxis = list(title = "Heart rate"))

fig_7

The plot shows the relation between total intensity and heart rate. In general, heart rate increases as intensity increases.

If we had the age and gender data for the 14 subjects in the heartrate table, we could identify abnormal heart rates at different intensities in the data.

For now, we focus on zero intensity, that is, resting. The plot shows an outlier, a potentially high-risk heart rate, for the id 6775888955 at zero intensity.

5.8 Daily Use of Smart device

Now we move on to analyze the usage of the smart device by each subject. We categorize the subjects into 3 categories namely, Low Usage, Moderate Usage and High Usage subjects, based on the number of days the smart device was used by the respective subject.
* Less than 10 days of usage \(\implies\) Low Usage
* Between 10 and 20 days of usage \(\implies\) Moderate Usage
* More than 20 days of usage \(\implies\) High Usage

usage <- daily_activity %>%
  group_by(id) %>%
  summarize(days_used = n()) %>%
  mutate(usage_level = case_when(
    days_used< 10 ~ "Low Usage (Less than 10 days)",
    days_used >= 10 & days_used <= 20 ~ "Moderate Usage (Between 10 to 20 days)", 
    days_used > 20 ~ "High Usage (More than 20 days)", 
  ))
usage_frequency<- usage %>% 
  group_by(usage_level) %>% 
  summarise(frequency=n())

fig_8<- usage_frequency %>% 
  plot_ly(labels= ~usage_level, values= ~frequency) %>% 
  add_pie(hole=0.5) %>% 
  layout(title="Smart Device Usage", showlegend= F)
fig_8

From the doughnut chart, around 88% of the subjects used the smart device for more than 20 days.
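The three usage levels can also be assigned with cut() in base R. The following is a minimal sketch with made-up values in days_used, assuming days are whole numbers so that the break at 9 corresponds to "less than 10 days".

```r
# Days of use per subject (made-up values)
days_used <- c(4, 10, 20, 21, 31)

# Intervals are (-Inf, 9], (9, 20], (20, Inf], matching the three levels
usage_level <- cut(
  days_used,
  breaks = c(-Inf, 9, 20, Inf),
  labels = c("Low Usage", "Moderate Usage", "High Usage")
)

table(usage_level)
```

Checking the boundary values 10 and 20 explicitly, as done here, helps confirm that both fall into Moderate Usage as intended.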
Let us now look into the usage of the smart device with respect to different time points in a day.

5.9 Hourly Usage of the Smart device

usage_by_time<- hourly_intensity %>% 
  group_by(time) %>% 
  summarize(no_of_usage = n())
glimpse(usage_by_time)

## Rows: 24
## Columns: 2
## $ time        <chr> "00:00:00", "01:00:00", "02:00:00", "03:00:00", "04:00:00"~
## $ no_of_usage <int> 934, 933, 933, 933, 932, 932, 931, 931, 931, 931, 929, 927~

fig_9<- ggplot(data=usage_by_time) +
  geom_col(mapping = aes(x=time, y = no_of_usage, fill = no_of_usage)) + 
  labs(title = "Hourly Usage of the Smart Device", x="Time", y="Frequency") + 
  scale_fill_gradient(low = "red", high = "blue")+
  theme(axis.text.x = element_text(angle = 90))

fig_9

From the figure we can infer that most of the subjects use the smart device around the clock.

6. Recommendation

A smart device with a long battery life can make an impact, as most users wear these devices 24x7.
Bellabeat can offer rewards to its users if they take more than 10,000 steps a day. This can attract more customers and also encourage a healthy lifestyle among its users.
The Bellabeat app can show users their weekly sleep pattern and average total minutes asleep, and suggest articles on the consequences of lack of sleep and tips for good sleep if the weekly average goes below 420 minutes.
If we can obtain data on the intensities corresponding to a set of workouts, we can integrate it with the Bellabeat app to suggest workouts for a user who aims to burn a certain amount of calories.
Using age and gender data obtained from the users, we can alert them when an abnormal heart rate is detected for a given intensity level.

7. References

How many Steps a Day?, Article by ‘Healthline’ (https://www.healthline.com/health/how-many-steps-a-day)
National Sleep Foundation’s guidelines (https://pubmed.ncbi.nlm.nih.gov/29073412/)

Google Data Analytics Capstone - How Can a Wellness Technology Company Play It Smart?
