Human Activity Recognition Using Accelerometer and Gyroscope Data in R

September 02, 2024
Eunice Dunbar
🇺🇸 United States
Machine Learning
Eunice Dunbar is a data scientist with over 8 years of experience in machine learning and data analysis. She specializes in implementing neural networks and tree-based models using R and Python.

Key Topics
• Step 1: Understand the Assignment Requirements
• Step 2: Load and Explore the Data
• Common Exploratory Data Analysis (EDA) Steps:
• Step 3: Feature Selection and Engineering (if needed)
• Step 4: Choose and Implement Machine Learning Algorithms
• Model 1: Neural Network for Classification
• Model 2: Tree-Based Models
• Step 5: Compare the Models
• Other Considerations:
• Conclusion

Machine learning has become an essential skill for students in computer science and data science programs. Many assignments require you to classify data, predict outcomes, and compare algorithms using real-world datasets. One common and exciting application is human activity recognition, often performed using data from smart health and fitness devices. These devices use sensors, such as accelerometers and gyroscopes, to track activities like walking, running, and sitting. In this blog post, we will discuss a systematic approach to solving machine learning assignments, focusing on human activity recognition from accelerometer and gyroscope data. Although we refer to activity recognition throughout, the principles discussed here apply to many other R programming assignments involving machine learning and data analysis.

Step 1: Understand the Assignment Requirements

The first step is to carefully understand what the assignment is asking for. Here's a quick breakdown of typical requirements, using the example provided:

• Objective: The main task is to classify human activities based on sensor data.
• Data: Preprocessed data has been provided, and no additional cleaning is required.
• Algorithms: You are asked to implement and compare two types of models:
• A neural network model for classification.
• Tree-based models, such as decision trees, random forests, or gradient boosting.

The key takeaway is to focus on solving the problem using machine learning techniques. You'll need to split the data into training and test sets, train the models, evaluate their performance, and compare the results.
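As a rough illustration of the splitting step, here is a minimal base R sketch of an 80-20 split. The `har` data frame and its column names are hypothetical stand-ins generated at random, not the assignment's actual data.

```r
# Hypothetical stand-in for the assignment data: 200 rows, 4 sensor features,
# and an Activity label with three classes.
set.seed(42)
har <- data.frame(
  acc_x  = rnorm(200),
  acc_y  = rnorm(200),
  gyro_x = rnorm(200),
  gyro_y = rnorm(200),
  Activity = factor(sample(c("WALKING", "SITTING", "STANDING"), 200, replace = TRUE))
)

# Draw 80% of the row indices at random for training; the rest form the test set.
train_idx <- sample(seq_len(nrow(har)), size = floor(0.8 * nrow(har)))
train_set <- har[train_idx, ]
test_set  <- har[-train_idx, ]

dim(train_set)
dim(test_set)
```

Setting a seed makes the split reproducible, which matters when you later compare two models on the same test rows.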

Step 2: Load and Explore the Data

Before jumping into model-building, it’s essential to load the data and explore it. In the context of the assignment, the dataset contains recordings from smartphone sensors, such as accelerometers and gyroscopes. The data has been preprocessed, which implies that it has already been cleaned and normalized.

Common Exploratory Data Analysis (EDA) Steps:

• Check Data Structure: Use functions like str(), summary(), and head() to inspect the data structure and get a sense of the features and labels.
• Visualize the Data: Plotting histograms, boxplots, or scatter plots can help you understand the distribution of the features. Visualizations are key to identifying potential outliers, correlations between features, and features that clearly separate one activity from another.
• Understand Class Imbalance: In classification problems, class imbalance (e.g., more samples for some activities than others) can negatively impact the model’s performance. Use a bar chart or table() function to check the distribution of classes.
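The checks above might look like the following sketch. The data frame is a randomly generated placeholder with hypothetical column names, with the classes deliberately drawn in unequal proportions so the imbalance check has something to show.

```r
# Hypothetical placeholder data with deliberately unequal class proportions.
set.seed(1)
har <- data.frame(
  acc_x  = rnorm(150),
  gyro_x = rnorm(150),
  Activity = factor(sample(c("WALKING", "SITTING", "LAYING"), 150,
                           replace = TRUE, prob = c(0.5, 0.3, 0.2)))
)

# Structure, per-column summaries, and the first few rows.
str(har)
summary(har)
head(har)

# Class distribution as counts and as proportions.
class_counts <- table(har$Activity)
class_share  <- round(prop.table(class_counts), 2)
print(class_counts)
print(class_share)
```

Passing `class_counts` to `barplot()` would draw the same distribution as a bar chart.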

By gaining an understanding of the data, you’ll be better equipped to build more accurate models.

Step 3: Feature Selection and Engineering (if needed)

Although the data in this assignment has been preprocessed, many machine learning assignments will require you to engineer or select features. Here are some general strategies:

• Dimensionality Reduction: If the dataset has a large number of features, techniques like Principal Component Analysis (PCA) can reduce the dimensionality and help the model generalize better.
• Feature Creation: Sometimes, creating new features based on domain knowledge can enhance model performance. For example, combining accelerometer and gyroscope data to calculate the overall movement intensity might help in recognizing activities like walking or running.
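Both strategies can be sketched in a few lines of base R. The axis column names and the 90% variance threshold below are assumptions chosen for illustration, and the data is random noise, so the number of components retained here says nothing about a real sensor dataset.

```r
# Hypothetical 3-axis accelerometer and gyroscope readings.
set.seed(7)
n <- 100
sensors <- data.frame(
  acc_x  = rnorm(n), acc_y  = rnorm(n), acc_z  = rnorm(n),
  gyro_x = rnorm(n), gyro_y = rnorm(n), gyro_z = rnorm(n)
)

# Feature creation: Euclidean magnitude of each sensor's 3-axis reading,
# a simple proxy for overall movement intensity.
sensors$acc_mag  <- sqrt(sensors$acc_x^2  + sensors$acc_y^2  + sensors$acc_z^2)
sensors$gyro_mag <- sqrt(sensors$gyro_x^2 + sensors$gyro_y^2 + sensors$gyro_z^2)

# Dimensionality reduction: PCA on centred, unit-variance features.
pca <- prcomp(sensors, center = TRUE, scale. = TRUE)

# Cumulative share of variance explained by the first 1, 2, ... components.
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Keep the smallest number of components that explains at least 90% of the variance.
k <- which(var_explained >= 0.90)[1]
reduced <- pca$x[, seq_len(k), drop = FALSE]
dim(reduced)
```

In a real workflow you would fit the PCA on the training set only and apply it to the test set with `predict(pca, newdata)`, so that no information from the test data leaks into the model.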

In our case, since the data is already cleaned, we can proceed directly to model-building.

Step 4: Choose and Implement Machine Learning Algorithms

Model 1: Neural Network for Classification

Neural networks are a powerful class of machine learning models, especially suited for complex tasks like image and activity recognition.

Key Components of a Neural Network:

• Input Layer: This layer receives the feature vector, with one input per sensor-derived feature.
• Hidden Layers: These layers perform computations on the input data to learn complex patterns. You can experiment with the number of hidden layers and the number of neurons in each layer.
• Output Layer: For this task, the output layer will have six neurons (one for each activity), with a softmax activation function to produce probabilities for each class.
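To make the softmax step concrete, the sketch below converts six raw output scores into class probabilities. The scores and the `activity_1` to `activity_6` names are made up for illustration.

```r
# Softmax: exponentiate each score and normalise so the results sum to 1.
softmax <- function(z) {
  # Subtracting the maximum keeps exp() from overflowing without changing the result.
  e <- exp(z - max(z))
  e / sum(e)
}

# Hypothetical raw scores (logits) from the six output neurons.
logits <- c(2.0, 0.5, -1.0, 0.1, 1.2, -0.3)
names(logits) <- paste0("activity_", 1:6)

probs <- softmax(logits)
round(probs, 3)

# The predicted class is the one with the highest probability.
names(which.max(probs))
```

Softmax preserves the ordering of the scores, so the largest logit always yields the largest probability.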

In R, you can use the keras or nnet package to implement a neural network. In Python, you might use TensorFlow or PyTorch.

Here’s a general structure of the neural network implementation:

1. Split Data into Training and Test Sets: Use an 80-20 or 70-30 split to divide the dataset.

2. Build the Network:

• Add layers using the keras_model_sequential() function.
• Use layer_dense() to add fully connected layers with ReLU activation.
• Use layer_dropout() for regularization (if necessary).

3. Compile the Model:

• Use categorical cross-entropy as the loss function for multi-class classification.
• Choose an optimizer such as Adam.

4. Train the Model:

• Train the model using the fit() function, with the training set as input.
• Monitor the loss and accuracy over epochs to ensure the model is learning properly.

5. Evaluate the Model:

• Use the evaluate() function to measure the loss and accuracy of the model on the test set.
• Report the accuracy in the assignment.
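Categorical cross-entropy compares predicted probabilities against one-hot encoded labels, so the class labels have to be converted before training. The base R sketch below shows the encoding and the loss calculation by hand; the labels and predicted probabilities are invented for illustration.

```r
# Hypothetical activity labels for four samples.
labels <- factor(c("walking", "sitting", "running", "walking"))

# One-hot encode: "- 1" drops the intercept, giving one indicator column per level.
one_hot <- model.matrix(~ labels - 1)
colnames(one_hot) <- levels(labels)
one_hot

# Hypothetical predicted probabilities; columns follow the factor level order
# (running, sitting, walking) and each row sums to 1.
pred <- matrix(c(
  0.1, 0.2, 0.7,
  0.2, 0.7, 0.1,
  0.6, 0.3, 0.1,
  0.3, 0.3, 0.4
), nrow = 4, byrow = TRUE, dimnames = list(NULL, levels(labels)))

# Categorical cross-entropy: mean of -log(probability assigned to the true class).
cross_entropy <- -mean(rowSums(one_hot * log(pred)))
cross_entropy
```

The loss shrinks toward zero as the model assigns more probability to the correct class in each row.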

Here is a simplified code snippet in R using keras:

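```r
library(keras)

# Assumes train_data/test_data are numeric feature matrices and
# train_labels/test_labels are one-hot encoded (e.g. via to_categorical()).
model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = 'relu', input_shape = ncol(train_data)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 64, activation = 'relu') %>%
  layer_dense(units = 6, activation = 'softmax')  # one unit per activity

model %>% compile(
  optimizer = 'adam',
  loss = 'categorical_crossentropy',
  metrics = c('accuracy')
)

# Hold out 20% of the training data to monitor validation loss and accuracy.
history <- model %>% fit(
  train_data, train_labels,
  epochs = 30,
  batch_size = 32,
  validation_split = 0.2
)

# evaluate() returns the test loss together with the accuracy metric.
scores <- model %>% evaluate(test_data, test_labels)
```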
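If keras is not available, the nnet package mentioned earlier offers a lighter alternative: a single-hidden-layer network that needs no external backend. The sketch below is only an illustration on synthetic data; the feature names, class means, and the `size`, `decay`, and `maxit` values are assumptions, not settings taken from the assignment.

```r
library(nnet)

# Synthetic stand-in data: three activities with different average intensities.
set.seed(123)
n_per_class <- 60
activities <- c("SITTING", "WALKING", "RUNNING")
har <- do.call(rbind, lapply(seq_along(activities), function(i) {
  data.frame(
    acc_mag  = rnorm(n_per_class, mean = i, sd = 0.3),
    gyro_mag = rnorm(n_per_class, mean = 2 * i, sd = 0.5),
    Activity = activities[i]
  )
}))
har$Activity <- factor(har$Activity)

# 80-20 train/test split.
idx <- sample(seq_len(nrow(har)), size = floor(0.8 * nrow(har)))
train_set <- har[idx, ]
test_set  <- har[-idx, ]

# One hidden layer with 5 units; decay adds weight regularization.
# With a factor response of more than two levels, nnet fits a softmax output.
nn_model <- nnet(Activity ~ ., data = train_set,
                 size = 5, decay = 0.01, maxit = 200, trace = FALSE)

# Predict class labels and compute test accuracy.
nn_pred <- predict(nn_model, test_set, type = "class")
nn_accuracy <- mean(nn_pred == test_set$Activity)
nn_accuracy
```

Because nnet starts from random weights, results can vary between runs unless you fix the seed.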

Model 2: Tree-Based Models

Tree-based models are intuitive and powerful for classification tasks. Some of the popular tree-based algorithms include:

• Decision Trees: A simple tree structure in which each internal node tests a feature, each branch represents an outcome of that test, and each leaf assigns a class.
• Random Forest: An ensemble method that creates multiple decision trees and aggregates their results to make predictions.
• Gradient Boosting (e.g., XGBoost): A powerful ensemble technique that builds trees sequentially, with each new tree correcting the errors of the previous ones.

You can use the rpart, randomForest, or xgboost packages in R for these models. In Python, you might use scikit-learn or xgboost.

Here’s a simplified implementation using Random Forest in R:

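```r
library(randomForest)

# Assumes train_data and test_data are data frames whose Activity column is a factor.
rf_model <- randomForest(Activity ~ ., data = train_data, ntree = 100)
predictions <- predict(rf_model, test_data)

# Share of test rows whose predicted activity matches the true one.
accuracy <- mean(predictions == test_data$Activity)
```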

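Because each split considers only a random subset of features, a random forest tolerates irrelevant or redundant features well, so it often works without extensive feature engineering. The model also reports feature importance (for example via importance() or varImpPlot()), which can be a helpful diagnostic tool.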
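For comparison, a single decision tree can be fitted with rpart, which also exposes a variable importance score. This is a minimal sketch on synthetic data with hypothetical feature names, not the assignment's dataset.

```r
library(rpart)

# Synthetic stand-in data: three activities with different average intensities.
set.seed(2024)
n_per_class <- 60
activities <- c("SITTING", "WALKING", "RUNNING")
har <- do.call(rbind, lapply(seq_along(activities), function(i) {
  data.frame(
    acc_mag  = rnorm(n_per_class, mean = i, sd = 0.3),
    gyro_mag = rnorm(n_per_class, mean = 2 * i, sd = 0.5),
    Activity = activities[i]
  )
}))
har$Activity <- factor(har$Activity)

# 80-20 train/test split.
idx <- sample(seq_len(nrow(har)), size = floor(0.8 * nrow(har)))
train_set <- har[idx, ]
test_set  <- har[-idx, ]

# Fit a classification tree and predict class labels on the test set.
tree_model <- rpart(Activity ~ ., data = train_set, method = "class")
tree_pred  <- predict(tree_model, test_set, type = "class")
tree_accuracy <- mean(tree_pred == test_set$Activity)

# Text rendering of the learned splits, followed by variable importance.
print(tree_model)
tree_model$variable.importance
tree_accuracy
```

Printing the tree shows the exact thresholds it learned, which is what makes a single tree easy to explain in a report.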

Step 5: Compare the Models

Once you’ve trained both models (neural network and tree-based), it’s time to compare their performance. The primary metric for comparison is accuracy, but you may also consider other metrics such as precision, recall, and F1-score, depending on the assignment’s requirements.

In this case, you would compare the accuracy of the neural network and the tree-based model on the test set and analyze which one performs better.
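Accuracy, precision, recall, and F1-score can all be derived from a confusion matrix. The sketch below computes them in base R from a small set of invented predictions; in practice you would substitute each model's test-set predictions.

```r
# Hypothetical true and predicted activities for ten test samples.
lv <- c("run", "sit", "walk")
actual    <- factor(c("walk", "walk", "walk", "sit", "sit",
                      "run", "run", "run", "run", "sit"), levels = lv)
predicted <- factor(c("walk", "walk", "sit", "sit", "sit",
                      "run", "run", "walk", "run", "sit"), levels = lv)

# Confusion matrix: rows are predictions, columns are true classes.
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Overall accuracy: correct predictions (the diagonal) over all predictions.
accuracy <- sum(diag(cm)) / sum(cm)

# Per-class precision: of the samples predicted as a class, the share that truly belong to it.
precision <- diag(cm) / rowSums(cm)

# Per-class recall: of the samples truly in a class, the share that were found.
recall <- diag(cm) / colSums(cm)

# F1-score: harmonic mean of precision and recall.
f1 <- 2 * precision * recall / (precision + recall)

round(rbind(precision, recall, f1), 3)
accuracy
```

Per-class metrics are especially useful when the activities are imbalanced, since a model can reach high accuracy while still missing a rare class.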

Other Considerations:

• Training Time: Neural networks often take longer to train than tree-based models, especially on larger datasets.
• Overfitting: Neural networks, particularly deep ones, are prone to overfitting. Dropout, weight regularization, and early stopping help counter it, and cross-validation helps you detect it.
• Interpretability: Tree-based models, especially decision trees, are more interpretable than neural networks, which can be beneficial when explaining your results.
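Cross-validation gives a more stable performance estimate than a single train/test split. The following is a minimal 5-fold sketch using rpart on synthetic data; the number of folds, the data, and the feature names are all assumptions made for illustration.

```r
library(rpart)

# Synthetic stand-in data: three activities with different average intensities.
set.seed(99)
n_per_class <- 60
activities <- c("SITTING", "WALKING", "RUNNING")
har <- do.call(rbind, lapply(seq_along(activities), function(i) {
  data.frame(
    acc_mag  = rnorm(n_per_class, mean = i, sd = 0.3),
    gyro_mag = rnorm(n_per_class, mean = 2 * i, sd = 0.5),
    Activity = activities[i]
  )
}))
har$Activity <- factor(har$Activity)

# Assign each row to one of k folds at random, keeping fold sizes equal.
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(har)))

# For each fold: train on the other k - 1 folds, test on the held-out fold.
fold_acc <- sapply(seq_len(k), function(i) {
  fit  <- rpart(Activity ~ ., data = har[folds != i, ], method = "class")
  pred <- predict(fit, har[folds == i, ], type = "class")
  mean(pred == har$Activity[folds == i])
})

# Average accuracy across folds, and how much it varies between them.
mean(fold_acc)
sd(fold_acc)
```

A large gap between training accuracy and the cross-validated accuracy is a sign that the model is overfitting.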

Conclusion

Solving machine learning assignments, especially in domains like activity recognition, involves understanding the problem, selecting appropriate models, and evaluating their performance. By following a systematic approach (understanding the requirements, exploring the data, building models, and comparing results), you can effectively solve programming assignments involving data analysis and model training. Although this blog used a specific example of human activity recognition, the steps outlined here can be applied to many other machine learning problems, enabling you to excel in your programming assignments.