Hand RPS Recognition – AAI / Computer Vision Project

Deep Learning for Detecting Hand Gestures in Rock, Paper, Scissors


Tech enthusiast | MATLAB & Python lover | Exploring AI, ML, and deep learning. Sharing free code & tutorials to help others learn faster

Introduction

Hand gesture recognition is a rapidly evolving field in computer vision and human–computer interaction. One of the classic examples is the “Rock–Paper–Scissors” (RPS) game: simple gestures, clear classes, but still rich enough for exploring data collection, augmentation, model training, and real-time inference.
In this project, I present a practical implementation of hand RPS recognition, available on GitHub at https://github.com/Riel0303ru/hand-rps-recognition. In this article you will find the motivation, dataset preparation, model architecture, training process, and deployment considerations.


Motivation

Why build a hand RPS recognition system?

  • It’s a manageable multi-class classification problem (rock vs paper vs scissors) that still involves real-world challenges: varying lighting, hand shapes, camera angle, background clutter.

  • It makes a good playground for exploring convolutional neural networks (CNNs), transfer learning, and data augmentation.

  • It has real practical use cases: gesture-based controls, human–robot interaction, game interfaces, accessibility tools for users with motor impairments.


Project Overview

Dataset & Preprocessing

  • The repository contains a dataset of hand gestures for rock, paper, and scissors categories (see link above).

  • Pre-processing steps typically include:

    • Resizing images to a fixed size (e.g., 128×128 or 224×224).

    • Normalizing pixel values (scaling to [0, 1], or subtracting the dataset mean).

    • Data augmentation: flips, rotations, brightness/contrast changes, cropping, random background noise to improve generalization.

    • Splitting into training, validation (and optionally test) sets.
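The preprocessing steps above can be sketched with NumPy alone. This is a minimal illustration, not the repository's actual pipeline: the function names, the 10% brightness range, the 80/20 split, and the random stand-in images are all assumptions made for the example.

```python
import numpy as np

def normalize(images):
    """Scale uint8 pixel values from [0, 255] to float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0

def augment(image, rng):
    """Randomly flip one [0, 1] image horizontally and shift its brightness slightly."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]                # flip along the width axis
    shift = float(rng.uniform(-0.1, 0.1))        # assumed brightness range of +/- 10%
    return np.clip(image + shift, 0.0, 1.0)

def split(images, labels, val_fraction=0.2, seed=0):
    """Shuffle the sample indices once and hold out val_fraction of them for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_val = int(len(images) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (images[train_idx], labels[train_idx]), (images[val_idx], labels[val_idx])

# Stand-in data: 10 random 128x128 RGB arrays (hypothetical, not the repository's dataset)
rng = np.random.default_rng(42)
raw = rng.integers(0, 256, size=(10, 128, 128, 3), dtype=np.uint8)
labels = rng.integers(0, 3, size=10)             # 0 = rock, 1 = paper, 2 = scissors

(train_x, train_y), (val_x, val_y) = split(normalize(raw), labels)
augmented = augment(train_x[0], rng)
print(train_x.shape, val_x.shape, augmented.shape)
```

Augmentation is applied only to training images here; validation images stay untouched so that the validation score reflects unmodified data.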

Model Architecture

A typical architecture for this kind of task might be:

  • Input: 224×224×3 image (RGB)

  • Convolution Block 1: conv(32 filters, 3×3) → ReLU → BatchNorm → MaxPool

  • Convolution Block 2: conv(64 filters, 3×3) → ReLU → BatchNorm → MaxPool

  • Convolution Block 3: conv(128 filters, 3×3) → ReLU → BatchNorm → MaxPool

  • Flatten → Dense(256) → ReLU → Dropout(0.5)

  • Output Layer: Dense(3) → Softmax (classes: rock, paper, scissors)

Or: use transfer learning with a pretrained backbone (e.g., MobileNetV2, EfficientNetB0) and fine-tune on your hand gesture dataset to achieve higher accuracy with less data.
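To get a feel for the size of the from-scratch CNN listed above, the following sketch traces tensor shapes and parameter counts through its layers. The article does not specify padding or pooling size, so the example assumes "same" padding with stride 1 for the convolutions and 2×2 max pooling with stride 2; it also counts four BatchNorm parameters per channel (scale, offset, and the two running statistics).

```python
def conv_block(height, width, in_ch, out_ch, kernel=3):
    """Shape and parameter count of conv -> ReLU -> BatchNorm -> 2x2 MaxPool.

    Assumes 'same' padding (spatial size kept by the conv) and 2x2 pooling.
    """
    conv_params = kernel * kernel * in_ch * out_ch + out_ch   # weights + biases
    bn_params = 4 * out_ch          # gamma, beta, running mean, running variance
    return (height // 2, width // 2, out_ch), conv_params + bn_params

shape = (224, 224, 3)               # input image: height, width, channels
total = 0
for filters in (32, 64, 128):
    shape, params = conv_block(*shape, filters)
    total += params
    print(f"conv block ({filters} filters): output {shape}, {params:,} parameters")

flat = shape[0] * shape[1] * shape[2]        # size of the flattened feature vector
dense_params = flat * 256 + 256              # Dense(256)
out_params = 256 * 3 + 3                     # Dense(3) output layer
total += dense_params + out_params

print(f"flatten: {flat:,} features")
print(f"Dense(256): {dense_params:,} parameters")
print(f"total: {total:,} parameters")
```

Under these assumptions almost all of the roughly 25.8 million parameters sit in the first dense layer, which is one reason a smaller input size or a pretrained backbone can be attractive for a three-class problem.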

Learning Algorithm & Training

The core learning algorithm is supervised classification with cross-entropy loss. Here’s the general training loop:

  1. Forward pass: input image → model → class probabilities.

  2. Compute the loss:

$$\text{Loss} = -\sum_{c=1}^{3} y_c \log(p_c)$$

     where \(y_c\) is the ground-truth one-hot label and \(p_c\) is the predicted probability for class \(c\).

  3. Back-propagation: compute the gradient of the loss with respect to every weight.

  4. Weight update using an optimizer (Adam, or SGD with momentum).

  5. Monitor validation accuracy and loss; apply early stopping or learning-rate scheduling.
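As a small worked example of the loss formula above, the sketch below turns raw model scores into probabilities with softmax and evaluates the cross-entropy for a single image. The logit values are made up for illustration, and the `eps` term is only there to avoid taking the logarithm of zero.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, probs, eps=1e-12):
    """Loss = -sum_c y_c * log(p_c) for one sample."""
    return -sum(y * math.log(p + eps) for y, p in zip(one_hot, probs))

logits = [2.0, 0.5, -1.0]        # hypothetical scores for rock, paper, scissors
probs = softmax(logits)

loss_if_rock = cross_entropy([1, 0, 0], probs)       # true class is rock
loss_if_scissors = cross_entropy([0, 0, 1], probs)   # true class is scissors

print("probabilities:", [round(p, 3) for p in probs])
print("loss when the label is rock:", round(loss_if_rock, 3))
print("loss when the label is scissors:", round(loss_if_scissors, 3))
```

Because the label is one-hot, only the term for the true class survives, so the loss is simply the negative log of the probability the model assigned to the correct gesture. A confident correct prediction gives a small loss, and a confident wrong one gives a large loss.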

Key hyper-parameters to tune: initial learning rate (e.g., 1e-3), batch size (e.g., 32), number of epochs (e.g., 50–100), image size, augmentation mix.
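Since the epoch count is usually set as an upper bound, the early stopping mentioned in the training steps decides when training actually ends. The helper below is a framework-independent sketch; the class name, the `patience` and `min_delta` parameters, and the validation losses are illustrative assumptions rather than values from the project.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss          # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1          # no improvement this epoch
        return self.bad_epochs >= self.patience

# Made-up validation losses for eight epochs
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63, 0.59]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break

print("stopped at epoch:", stopped_at, "best validation loss:", stopper.best)
```

In practice the weights from the best epoch would normally be saved and restored after stopping, so that the final model is the one with the lowest validation loss.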

Deployment & Real-Time Inference

Once the model is trained and validated, deploy it for real-time use:

  • Capture camera frames with OpenCV, optionally using the MediaPipe hand tracker to locate the hand in each frame.

  • Pre-process each frame: crop hand region, resize to model input, normalize.

  • Pass through model → get predicted class.

  • Overlay result on video feed (e.g., display “Rock”, “Paper”, or “Scissors”).

  • Optionally integrate into UI/UX, game interface, or robotics control.
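The per-frame steps above (crop, resize, normalize, predict) can be sketched as follows. To keep the example self-contained it uses only NumPy: the frame is a blank stand-in array, the hand box is a hard-coded assumption, the resize is a simple nearest-neighbor lookup, and `dummy_model` is a placeholder for the trained network.

```python
import numpy as np

CLASS_NAMES = ("Rock", "Paper", "Scissors")

def preprocess_frame(frame, box, size=224):
    """Crop the hand box (x, y, w, h), resize to size x size, and scale to [0, 1]."""
    x, y, w, h = box
    crop = frame[y:y + h, x:x + w]
    # Nearest-neighbor resize: pick the source row/column for each output pixel
    rows = np.arange(size) * crop.shape[0] // size
    cols = np.arange(size) * crop.shape[1] // size
    resized = crop[rows[:, None], cols[None, :]]
    return resized.astype(np.float32) / 255.0

def predict_label(model, frame, box):
    """Run one frame through the model and return (class name, confidence)."""
    batch = preprocess_frame(frame, box)[None, ...]    # add a batch dimension
    probs = model(batch)[0]
    return CLASS_NAMES[int(np.argmax(probs))], float(np.max(probs))

def dummy_model(batch):
    """Placeholder for the trained network: always returns the same probabilities."""
    fixed = np.array([0.1, 0.8, 0.1], dtype=np.float32)
    return np.tile(fixed, (batch.shape[0], 1))

frame = np.zeros((480, 640, 3), dtype=np.uint8)        # stand-in for a camera frame
label, confidence = predict_label(dummy_model, frame, box=(200, 120, 240, 240))
print(label, round(confidence, 2))
```

In a live setup the frame would typically come from OpenCV's `cv2.VideoCapture`, and the label could be drawn onto it with `cv2.putText`. Note that OpenCV delivers frames in BGR channel order, so a model trained on RGB images would need a channel conversion first.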


Why the Algorithms Matter

  • Data augmentation helps avoid over-fitting by simulating variations in hand appearance and environment.

  • Transfer learning leverages large pretrained models to extract robust features (edges, textures, shapes) and fine-tunes them for your specific gesture classes.

  • Softmax classification inherently gives you class probabilities, which allows thresholding or ensemble techniques for higher reliability.

  • Dropout and BatchNorm improve generalization and stability of training.
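The thresholding idea mentioned above can be illustrated in a few lines: if the highest softmax probability falls below a cutoff, the system reports no gesture instead of guessing. The function name, the 0.7 threshold, and the "uncertain" label are assumptions chosen for the example.

```python
def decide(probs, threshold=0.7, class_names=("rock", "paper", "scissors")):
    """Return the most likely class, or 'uncertain' if its probability is below threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return "uncertain"
    return class_names[best]

confident = decide([0.05, 0.90, 0.05])
ambiguous = decide([0.40, 0.35, 0.25])
print(confident, ambiguous)
```

A suitable threshold depends on the application and is best chosen on validation data, since softmax outputs are not guaranteed to be well-calibrated probabilities.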


Results & Metrics

[Figure: training metrics]

[Figure: confusion matrix]

You’ll want to report metrics like:

  • Final validation accuracy (%).

  • Precision, recall, and F1-score for each class (rock, paper, scissors).

  • Real-time inference latency (ms per frame).

  • Failure cases: e.g., ambiguous hand shape, blur, occlusion.
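The accuracy and per-class metrics listed above can all be derived from a confusion matrix. The sketch below does this in plain Python; the ten labels and predictions are invented for illustration and are not results from this project.

```python
CLASSES = ("rock", "paper", "scissors")

def confusion_matrix(y_true, y_pred, n_classes=3):
    """matrix[t][p] counts samples whose true class is t and predicted class is p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_metrics(matrix):
    """Return {class name: (precision, recall, f1)} computed from the confusion matrix."""
    metrics = {}
    for c, name in enumerate(CLASSES):
        tp = matrix[c][c]
        fp = sum(row[c] for row in matrix) - tp      # predicted c, but true class differs
        fn = sum(matrix[c]) - tp                     # true class c, but predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[name] = (precision, recall, f1)
    return metrics

# Invented example: 0 = rock, 1 = paper, 2 = scissors
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 1, 0]

matrix = confusion_matrix(y_true, y_pred)
metrics = per_class_metrics(matrix)
accuracy = sum(matrix[i][i] for i in range(len(CLASSES))) / len(y_true)

print("accuracy:", accuracy)
for name, (p, r, f1) in metrics.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this made-up example "scissors" has perfect precision but only 50% recall, which is the kind of imbalance a single accuracy number would hide.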


Lessons Learned & Challenges

  • Hand shape variation: different users, different skin tones, accessories (rings, watches) change appearance.

  • Lighting & background: strong shadows or clutter make segmentation harder.

  • Real-time constraints: keeping latency under roughly 30 ms per frame (about 30 frames per second) for a smooth UX.

  • Dataset bias: classes can be unevenly represented (for example, many “paper” samples but few “scissors”), so balancing classes matters.
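One common way to compensate for the class imbalance noted above is to weight the loss by inverse class frequency, so mistakes on rare gestures count more. The sketch below uses the same heuristic as scikit-learn's `class_weight="balanced"` option (`n_samples / (n_classes * class_count)`); the class counts are invented for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count), so rare classes weigh more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

# Invented, deliberately imbalanced label list
labels = ["paper"] * 60 + ["rock"] * 30 + ["scissors"] * 10
weights = balanced_class_weights(labels)

for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.3f}")
```

Collecting more samples of the rare class or oversampling it during training are alternatives; which works best depends on the dataset.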


Future Work

  • Expand classes: e.g., “Lizard”, “Spock” (extended RPS game) or dynamic gesture sequences.

  • Use MediaPipe hand landmarks to localize the hand and feed only the hand region to the model, which reduces background noise.

  • Deploy to mobile/web using TensorFlow.js, or ONNX Runtime Web with its WebAssembly backend, for browser-based recognition.

  • Real-world application: accessibility interface, virtual-reality gesture control, game-integration.


How to Use / Get Started

  • Clone the project: git clone https://github.com/Riel0303ru/hand-rps-recognition

  • Install dependencies (see README).

  • Capture dataset or use provided set.

  • Train model or use pretrained weights.

  • Run inference script or GUI to test live recognition.

  • Modify model/parameters as needed for custom data.


Conclusion

Hand RPS recognition is a fun but meaningful project bridging computer vision, deep learning, and real-time inference UX. With the code provided and the concepts outlined above, you’re well equipped to build, experiment, and extend this system. Feel free to reuse, modify, and share — that’s the power of open-source.

