Tennis Statistics and Data Capture Application

The project is a tennis performance analysis and profiling application. The application connects to a Bluetooth-enabled device that attaches to the tennis racket. This device sends real-time data, allowing the application to collect a wide range of information for performance analysis. It gives users insights into their consistency, ball speed, and the variety of shots they execute, and helps them identify their closest player profile. The application can also analyze the user's running activity.

Furthermore, the same device can be placed in the user's pocket, providing data to the application for collecting information about their running activity, such as step count, running versus walking detection, speed, and distance covered. It also offers insights into the user's running profile. The application is user-friendly and compatible with all Android smartphones, ensuring accessibility to a wide range of users.
Overall, the project aims to provide tennis players and runners with a comprehensive analysis of their performance and profiles. The combination of real-time data collection and user-friendly features makes the application a valuable tool for enhancing performance and understanding individual strengths and areas for improvement.

User Story and Goal

The application offers real benefits to users. Whether they are amateur, semi-professional, or professional tennis players, having access to game statistics provides a valuable opportunity for improvement. While sophisticated camera systems exist for analyzing the performance of top professional players, those without access to such expensive technologies can turn to simple performance analysis applications. Runners and tennis players can retrieve their data after a match, analyze their performance, view their profiles, and track their progress from one session to another. This allows them to monitor their development over time and identify areas where further improvement is needed.

The goal of the application is to collect data such as the number of strokes performed during the session, the number of forehands and backhands, the average and maximum number of strokes per rally, and the maximum and average speed of the ball. For the running part, we want to find the type of activity (running/walking), the speed, the distance, and the number of steps. Moreover, we want to use the data to establish a profile for the user.

Hardware

In this project, we were tasked with building the hardware device that attaches to the tennis racket, which is crucial for capturing real-time performance data during a session. The core components of the device include an ESP32 microcontroller, accelerometer, magnetometer, gyroscope, and battery. These materials, although not expensive, were carefully selected to ensure that the device is both functional and non-intrusive to the player.

The ESP32 microcontroller serves as the central unit, responsible for processing data from the sensors and transmitting it via Bluetooth to the mobile application. The accelerometer is essential for measuring the player’s movements, including walking and running, as well as detecting the impact of the racket with the ball. It helps to capture data on the force and frequency of each stroke, which is crucial for analyzing the consistency and power of the player’s performance. The gyroscope plays a vital role in determining the orientation of the racket. This allows the application to differentiate between various stroke types, such as forehand, backhand, and volleys, by tracking the angle and direction of the racket during each movement. The magnetometer, in combination with the gyroscope, provides additional orientation data to improve accuracy in stroke detection, particularly in distinguishing the racket’s positioning during different shots.

In addition to these components, the device includes a battery to power the sensors and ensure uninterrupted data collection. One important feature is the device’s modularity; it can be detached from the racket and placed in the user’s pocket to capture running-related data. In this setup, the accelerometer continues to track the user’s step count, detect the type of activity (running vs walking), and monitor running speed and distance. The device’s ability to switch between different uses — from racket performance to running analysis — offers versatility and comprehensive tracking.

While the required materials were straightforward and cost-effective, a challenge we faced was the physical size of the components. Ideally, smaller equipment would have been beneficial to make the device more discreet and easier to attach to the racket without interfering with the player's comfort or performance. Nonetheless, the current design provides a solid foundation for gathering valuable insights into both tennis and running activities.

Arduino

We wrote the Arduino code to enable the device to function properly and transmit the necessary data to the mobile application. The primary objective of the Arduino code was to collect data from the sensors (accelerometer, gyroscope, and magnetometer) and then process and transmit this data at a suitable frequency for real-time analysis.

The Arduino code begins by initializing the sensor modules and configuring them to collect data along the three axes (X, Y, and Z) for each of the sensors. The accelerometer measures the device's acceleration along these axes, which is useful for tracking movement, including the player's strokes and running activity. The gyroscope measures angular velocity around these axes, which provides information about the device's rotation and helps identify the orientation of the racket. The magnetometer is used to detect the device’s orientation relative to magnetic north, providing further information about the position of the racket during strokes.

The data collected from the sensors initially takes the form of raw accelerations, angular velocities, and magnetic field intensities along each axis. The Arduino code then processes this data by calculating the norms of the accelerometer, gyroscope, and magnetometer readings. This is done using the Euclidean norm, that is, the square root of the sum of the squared X, Y, and Z components.

This calculation simplifies the data and provides a single magnitude value for each sensor, which is often more useful for understanding the overall behavior of the system (e.g., total acceleration or angular velocity). These calculated norms are important for analyzing overall device movements and distinguishing between different types of actions.
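As an illustration of this calculation, here is a minimal Python sketch of the Euclidean norm of one three-axis reading. The device itself runs Arduino code; the function name `euclidean_norm` and the sample values are hypothetical and only show the arithmetic.

```python
import math


def euclidean_norm(x, y, z):
    """Return the magnitude of a three-axis sensor reading."""
    return math.sqrt(x * x + y * y + z * z)


# Hypothetical accelerometer sample; units depend on the sensor configuration.
sample = (3.0, 4.0, 12.0)
print(euclidean_norm(*sample))  # 13.0
```

The same function applies unchanged to gyroscope and magnetometer readings, giving one magnitude per sensor per sample.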

The Arduino code then sends this processed data over Bluetooth to the mobile application. To manage the flow of data and avoid overwhelming the Bluetooth module, the code transmits data every 45 milliseconds. This interval was carefully chosen because the Bluetooth module on the ESP32 is limited in the amount of data it can handle at once, especially when multiple sensors are sending large amounts of data. The 45-millisecond interval strikes a balance between data granularity and system performance, ensuring reliable and continuous data transmission without overloading the communication channel.

Once the data is transmitted, the mobile application receives and processes it for further analysis. The application uses the data to compute useful insights, such as stroke types, consistency, ball speed, and running metrics.

In summary, the Arduino code receives raw sensor data, processes it by calculating norms, and transmits the processed data at regular intervals, ensuring a continuous flow of information to the mobile application. This setup provides a robust foundation for collecting real-time performance data from both the tennis racket and the user's running activities.

Data Collection

For data collection, I developed a custom interface in Java to receive Bluetooth data from the device. This interface was designed to capture all sensor data sent by the device attached to the racket, including accelerometer, gyroscope, and magnetometer readings. I integrated Bluetooth communication into the Java application, ensuring that the incoming data was continuously received and processed in real-time. To streamline the data collection process, I created a system within the application to automatically store the incoming data in CSV files, which would later be used for analysis.

Additionally, I implemented a feature to record specific player actions, such as the type of stroke performed (forehand or backhand) and the moment of ball impact. This was done by adding a "strike" button, which the data collector pressed to mark the exact moment when the ball hit the racket. The inclusion of this button allowed for precise synchronization of the sensor data with the timing of the ball strike, significantly improving the accuracy of the analysis.

I gathered data from four tennis players of varying skill levels, and for each player, we conducted multiple sessions involving different types of shots, including forehands, backhands, powerful shots, normal shots, and random shots. In total, I collected nearly 1,500 shots, with 750 forehand shots and 750 backhand shots. Each session was carefully labeled, ensuring that the data was properly categorized. This extensive dataset formed the basis for training machine learning models to distinguish between walking and running activities, as well as to differentiate between various tennis strokes. The data collected through this process is now being used for performance analysis and to improve our ability to recognize different types of movements and strokes in real-time applications.

Running Part of the App

Data Analysis

For the running part of the project, we focused on analyzing the accelerometer data to distinguish between walking and running activities and to calculate the number of steps taken. After importing the data into Python, we observed several missing data points, particularly in the X-axis coordinates of the accelerometer, which suggested a potential hardware issue. To address this, we replaced the missing values by calculating the average of the data points immediately before and after the missing timestamp. This helped restore continuity in the data and allowed for further analysis.
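The gap-filling step can be sketched as follows. This is a minimal example, assuming missing readings are represented as `None`; the function name `fill_missing` and the handling of edge cases (a gap at the start or end, or several gaps in a row) are our assumptions, since the text only describes the simple case of one missing point between two valid ones.

```python
def fill_missing(values):
    """Replace None entries with the mean of the neighbouring values.

    Assumption: a gap at the start or end copies its single neighbour, and
    consecutive gaps are filled left to right using the value just filled.
    """
    filled = list(values)
    for i, value in enumerate(filled):
        if value is not None:
            continue
        before = filled[i - 1] if i > 0 else None
        # Next valid reading after the gap, if there is one.
        after = next((v for v in values[i + 1:] if v is not None), None)
        neighbours = [v for v in (before, after) if v is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


# Hypothetical X-axis readings with one missing point.
print(fill_missing([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```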

Next, we plotted the norm of the accelerometer data, which represented the magnitude of the accelerations across all three axes (X, Y, Z). This graph revealed numerous peaks, each corresponding to a step taken by the user. To identify each step accurately, we applied a function to detect all local maxima in the curve. We then normalized the data to remove noise and focused on identifying the most significant peaks that indicated actual steps. However, determining the optimal threshold for peak height and the appropriate distance between peaks posed a challenge. If the thresholds were set too low, the model became too sensitive, potentially detecting non-step movements as steps. On the other hand, setting the thresholds too high would mean that some steps were missed, affecting the accuracy of the step count.

To overcome this challenge, we developed a model that could distinguish between walking and running activities by dynamically adjusting the thresholds for each activity type. This ensured that the step counting algorithm would remain accurate across different physical activities. By fine-tuning these thresholds, we were able to create a reliable system for accurately counting steps and distinguishing between walking and running, providing valuable insights into the user’s activity and performance.

Machine Learning Model

To predict whether the user's activity is walking or running, we implemented several machine learning classifiers. Using the data collected from the participants, we labeled the signals accordingly and pooled them into a large shared dataset for training. We experimented with different machine learning models, including SVM (Support Vector Machine), hard-margin SVM, and Random Forest. The Random Forest classifier gave the best results, with an accuracy of 91%, and distinguished reliably between running and walking activities.

To train and evaluate the model, we divided the dataset into training and test sets. The model was trained using the training set, and its performance was evaluated using the test set. The confusion matrix, which shows the distribution of true and misclassified instances, was used to assess the model's accuracy. Misclassified datasets were highlighted in violet to identify potential issues in the predictions.

We chose machine learning for this task because it allows us to generalize the concepts of walking and running across various individuals. By utilizing a machine learning model, we were able to accommodate different running styles and paces, recognizing that everyone has unique movement patterns. This flexibility made the model adaptable to a range of users, improving its ability to classify activities accurately.

Next, we focused on step counting. We observed that peaks in the norm of the accelerometer's data corresponded to steps, but there were also many peaks of varying heights and intervals. Some of these peaks represented steps, while others were noise. To address this, we implemented an algorithm that tested different combinations of minimum peak height and minimum peak distance for walking and running activities. We then calculated the prediction error, comparing the true number of steps (labels) with the model's predictions. By adjusting the thresholds for peak height and distance, we minimized the prediction error.

Through this process, we determined that the optimal minimum peak height for both walking and running was 0.7 after normalizing the data. However, the minimum peak distance threshold differed between walking and running. For running, a threshold of 1 minimized the error, while for walking, a threshold of 3 worked better. The average prediction error (loss) was 0.2, which was very low and met our expectations. This low error suggests that the algorithm was effective at counting steps for both walking and running.
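The step counting described above can be sketched in plain Python. The thresholds (height 0.7, distance 1 for running and 3 for walking) come from the text; everything else is an assumption made for illustration: the min-max normalization, the interpretation of the distance threshold as a number of samples, and the names `normalise` and `count_steps`. It is not necessarily the peak-detection routine used in the project.

```python
MIN_HEIGHT = 0.7                                 # from the text, after normalization
MIN_DISTANCE = {"running": 1, "walking": 3}      # from the text; unit assumed to be samples


def normalise(signal):
    """Min-max scale a signal to [0, 1] (assumed normalization method)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0 for _ in signal]
    return [(v - lo) / (hi - lo) for v in signal]


def count_steps(norms, min_height, min_distance):
    """Count local maxima that are high enough and far enough apart."""
    scaled = normalise(norms)
    candidates = [
        i for i in range(1, len(scaled) - 1)
        if scaled[i] > scaled[i - 1]
        and scaled[i] >= scaled[i + 1]
        and scaled[i] >= min_height
    ]
    kept = []
    # Keep the highest peaks first and drop candidates too close to a kept peak.
    for i in sorted(candidates, key=lambda k: scaled[k], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return len(kept)


# Hypothetical accelerometer norms with three peaks.
norms = [1.0, 2.0, 1.0, 1.9, 1.0, 2.0, 1.0]
print(count_steps(norms, MIN_HEIGHT, MIN_DISTANCE["running"]))  # 3
print(count_steps(norms, MIN_HEIGHT, MIN_DISTANCE["walking"]))  # 2
```

The threshold search can then be expressed as a loop over candidate `(min_height, min_distance)` pairs that keeps the pair with the smallest gap between `count_steps` and the labeled step count.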

Finally, to estimate the user's speed and distance traveled, we multiplied the number of steps by an average step length to obtain the distance, and divided that distance by the elapsed time to obtain the speed. We calculated the average distance per step for each individual, finding that the average step length was 0.80 meters for walking and 1.10 meters for running. These estimates were useful for providing additional insights into the user’s activity, although it's important to note that individual height variations were not taken into account.
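A minimal sketch of this estimate, using the step lengths reported above (0.80 m for walking, 1.10 m for running). The function name `distance_and_speed` and the example session values are hypothetical.

```python
# Average step lengths reported in the text, in meters.
STEP_LENGTH_M = {"walking": 0.80, "running": 1.10}


def distance_and_speed(step_count, activity, duration_s):
    """Estimate distance (m) and average speed (m/s) from a step count."""
    distance_m = step_count * STEP_LENGTH_M[activity]
    speed_m_s = distance_m / duration_s if duration_s > 0 else 0.0
    return distance_m, speed_m_s


# Hypothetical session: 1000 walking steps in 10 minutes.
distance, speed = distance_and_speed(1000, "walking", 600)
print(f"{distance:.0f} m at {speed * 3.6:.1f} km/h")  # 800 m at 4.8 km/h
```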

By leveraging machine learning, we were able to accurately classify walking and running activities, count steps, and estimate speed and distance. This enhanced the overall functionality of our application, providing valuable data for users to track their physical performance and activity patterns.

Tennis Part of the App

Data Analysis

Our initial goal was to isolate the moment of impact between the racket and the ball. This moment marks the starting point for identifying each shot and conducting further analysis. Upon examining the accelerometer's data, we quickly noticed that the impact of the ball on the racket caused a distinct peak in the curve of the accelerometer's norm. This peak served as a clear indicator of the moment of impact.

Once we identified this peak, we could isolate each shot and analyze the corresponding curve. This allowed us to effectively detect the number of shots played during a session and estimate key performance metrics, such as the ball's speed and the number of shots in an exchange.

By leveraging the accelerometer's data, we were able to track and quantify each shot, providing valuable insights into a player's performance. Additionally, the timing and characteristics of the peaks allowed us to distinguish between individual shots, giving us the ability to assess various aspects of the tennis exchange, including shot frequency and intensity. This information can be used for further analysis of playing style, shot effectiveness, and overall performance.

By analyzing the time intervals between consecutive shots, we can determine when an exchange has ended and a new ball is played. This helps us count the number of shots in the exchange before it concludes, whether by the ball going out of bounds or the point being stopped. By looking at the time between the shots and identifying specific patterns, we can track the flow of the game and distinguish individual exchanges.
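The rally segmentation can be illustrated with a short sketch. It assumes impacts are given as sample indices at the 45 ms sampling period mentioned elsewhere in this write-up, and that a pause longer than a fixed gap ends the exchange. The 5-second gap and the name `split_into_rallies` are assumptions; the text does not state the value or the exact patterns used.

```python
SAMPLE_PERIOD_S = 0.045  # one sample every 45 ms


def split_into_rallies(impact_indices, max_gap_s=5.0):
    """Group consecutive impacts into rallies using the time between them.

    max_gap_s is an assumed threshold, not a value taken from the project.
    """
    rallies = []
    for idx in impact_indices:
        if rallies and (idx - rallies[-1][-1]) * SAMPLE_PERIOD_S <= max_gap_s:
            rallies[-1].append(idx)
        else:
            rallies.append([idx])
    return rallies


# Hypothetical impact positions (sample indices) for two exchanges.
impacts = [0, 60, 120, 400, 470]
rallies = split_into_rallies(impacts)
print([len(r) for r in rallies])  # [3, 2]
```

The per-rally shot counts produced this way feed directly into the average and maximum number of shots per rally.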

Furthermore, we were able to estimate the ball's speed based on the time intervals between shots. Since each timestamp represents 45 milliseconds, we converted the number of timestamps between two shots into a duration and combined it with the length of a tennis court, which we took as roughly 23.5 meters (the regulation length is 23.77 meters). We used the following method:

- Time interval adjustment: The time recorded between two consecutive shots reflects the round trip (the ball traveling to the opponent and then back). To calculate the speed for just the forward motion, we divided the time interval by 2.
- Player positioning assumptions: We assumed that the ball travels between 22 and 26 meters on each one-way trip, depending on how far the players stand from the net. These two values gave us a lower and an upper bound for the ball's speed, providing an approximation of how fast the ball travels during each shot.

This method offered a useful approximation of the ball's speed during play, helping us better understand the intensity of the exchange and the player's performance.
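The two steps above can be written as a short calculation. This sketch treats the 22 to 26 meter range as the one-way distance traveled by the ball, which is our reading of the positioning assumption; the function name `ball_speed_bounds` and the example interval are hypothetical.

```python
SAMPLE_PERIOD_S = 0.045  # one timestamp every 45 ms


def ball_speed_bounds(samples_between_shots, min_distance_m=22.0, max_distance_m=26.0):
    """Return (lower, upper) ball speed estimates in m/s.

    The interval between two of the player's shots covers a round trip,
    so half of it is taken as the one-way travel time.
    """
    if samples_between_shots <= 0:
        raise ValueError("samples_between_shots must be positive")
    one_way_time_s = samples_between_shots * SAMPLE_PERIOD_S / 2
    return min_distance_m / one_way_time_s, max_distance_m / one_way_time_s


# Hypothetical example: 60 timestamps (2.7 s) between two consecutive shots.
low, high = ball_speed_bounds(60)
print(f"{low * 3.6:.1f} to {high * 3.6:.1f} km/h")  # 58.7 to 69.3 km/h
```

Because the time resolution is 45 ms, short intervals give coarse estimates, and the result is an average over the round trip rather than the speed off the racket.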

Additionally, we sought to distinguish between forehand and backhand shots. After analyzing the signals from the accelerometer and magnetometer, we identified a significant difference in the curve of the magnetometer's norm during these two types of shots. This variation in the signal allowed us to develop an algorithm capable of classifying each shot as either a forehand or a backhand based on the pattern in the magnetometer data.

By observing the changes in the magnetometer's readings, we could discern which type of shot was played, providing valuable information for performance analysis and training purposes. This data can be used to assess a player's skill in executing forehand and backhand shots, enabling more detailed feedback on their technique and consistency during play.

We noticed that the curve for a backhand shot is increasing before the ball's impact, while the curve for a forehand shot is decreasing before the impact. However, this data is not generalizable to all shots, as different players have different techniques, and there are various types of forehand and backhand shots depending on the players' positions. Therefore, we decided to apply a machine learning algorithm to generalize the difference between forehand and backhand shots.

By training our machine learning model on labeled data, we were able to create a classifier that can predict whether a shot is a forehand or a backhand based on various features extracted from the signal curve. This approach allows us to account for the individual variations in players' techniques and achieve a more accurate classification.

Additionally, we extended this approach by developing a separate model to predict whether a shot is a topspin (lift), slice, or flat shot. Using similar features from the signal curve, we trained a machine learning classifier to distinguish between these different types of shots. This model further enhances our ability to analyze a player's playing style and shot selection, providing valuable insights into their technique and improving performance analysis.

Machine Learning Model

To generalize the distinction between forehand and backhand shots, we applied a machine learning model to classify each shot as either a forehand or a backhand. To achieve this, we extracted data from the 6 timestamps preceding the impact of the ball with the racket. We collected the 3 axes of the gyroscope and the 3 axes of the magnetometer for each of these 6 timestamps, resulting in a total of 36 features per shot (6 timestamps × 6 axes). Since our dataset was labeled based on our measurements, we were able to use several classification models such as SVM, hard-margin SVM, and Random Forest. Once again, the Random Forest model yielded the best results, with an accuracy of 92%.
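The feature extraction can be sketched as follows: for the 6 samples before the impact, the 3 gyroscope axes and 3 magnetometer axes are flattened into one vector of 36 values. The ordering of the values inside the vector and the name `shot_features` are assumptions; the text only fixes the window length and the sensors used.

```python
WINDOW = 6  # number of timestamps taken before the impact


def shot_features(gyro, mag, impact_index):
    """Build the 36-value feature vector for one shot.

    gyro and mag are lists of (x, y, z) tuples, one per timestamp.
    """
    start = impact_index - WINDOW
    if start < 0:
        raise ValueError("not enough samples before the impact")
    features = []
    for i in range(start, impact_index):
        features.extend(gyro[i])  # 3 gyroscope axes
        features.extend(mag[i])   # 3 magnetometer axes
    return features


# Hypothetical recordings: 10 timestamps of made-up readings.
gyro = [(i, i, i) for i in range(10)]
mag = [(-i, -i, -i) for i in range(10)]
print(len(shot_features(gyro, mag, impact_index=8)))  # 36
```

Each vector, paired with its forehand or backhand label, forms one training row for the classifiers compared in this section.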

This demonstrates that our model successfully distinguishes between forehand and backhand shots on our test set. The outcome is particularly interesting as it encompasses shots from different individuals and various types of backhand and forehand shots. Our model effectively generalizes the characteristics of a forehand and a backhand shot. As a result, we can provide additional data such as the number of forehand and backhand shots, as well as the average speed of a forehand and a backhand shot by following the same procedure outlined above.

Furthermore, we developed another model to predict the type of spin applied to the ball—whether it is a topspin, slice, or flat shot. Using the same feature extraction approach, we trained a Random Forest classifier, which achieved an accuracy of 80%. This model enables us to analyze not only the type of shot played but also the effect imparted on the ball, further enhancing our ability to evaluate a player's playing style and shot selection.

By utilizing machine learning techniques, our application can provide comprehensive insights into an individual's performance in terms of shot classification, counting, and average speeds for forehand and backhand shots. These findings contribute to a deeper understanding of players' playing styles and can aid in their training and skill development.

Profiling Model for Game Style Prediction

In addition, we implemented a player profiling method by calculating several additional features. These features include the average speed of forehand shots, the average speed of backhand shots, endurance (the average number of shots per rally), technique (the distribution between forehand and backhand shots), and consistency (the standard deviation of the number of shots per rally). We also incorporated the distribution of spin types—topspin, slice, and flat shots—into the profiling process. These metrics allowed us to establish detailed player profiles in tennis.
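As an illustration of the rally-based features, this sketch computes endurance, consistency, and the forehand share from per-rally shot counts. The use of the population standard deviation, the dictionary keys, and the name `rally_profile` are assumptions; the text does not say which variant of the standard deviation was used.

```python
from statistics import mean, pstdev


def rally_profile(shots_per_rally, forehands, backhands):
    """Compute a few of the profiling features described above."""
    total = forehands + backhands
    return {
        "endurance": mean(shots_per_rally),      # average shots per rally
        "consistency": pstdev(shots_per_rally),  # spread of shots per rally
        "forehand_share": forehands / total if total else 0.0,
    }


# Hypothetical session: three rallies, 30 forehands and 10 backhands.
profile = rally_profile([2, 4, 6], forehands=30, backhands=10)
print(profile["endurance"], profile["forehand_share"])  # 4 0.75
```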

To enhance the accuracy of our profiling system, we leveraged data from the TennisStat dataset, which contains statistics from 23 professional ATP players. For each player, we compiled average values for key metrics such as forehand and backhand shot distribution, shot speeds, endurance, consistency, and the proportion of spin types (topspin, slice, flat). This dataset served as a reference for categorizing playing styles.

When a new player uses our application, we calculate their specific values for these characteristics based on the data collected during their session. To determine their playing style, we compute the Euclidean distances between their statistical profile and those of the 23 ATP players in our reference dataset. The closest matches allow us to classify the user into a playing style that aligns with a professional player's tendencies. This method provides users with an insightful comparison to top players, helping them understand their strengths and weaknesses in relation to high-level performance.
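The matching step can be sketched as a nearest-neighbour search over profile vectors. The reference entries below are placeholders with made-up values, not figures from the TennisStat dataset, and the name `closest_profiles` is hypothetical.

```python
import math


def closest_profiles(user, references, k=1):
    """Return the names of the k reference profiles nearest to the user.

    user maps feature names to values; references maps a player name to a
    profile with the same feature names.
    """
    def distance(a, b):
        return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))

    ranked = sorted(references, key=lambda name: distance(user, references[name]))
    return ranked[:k]


# Placeholder reference profiles (illustrative values only).
references = {
    "Player A": {"endurance": 5.0, "forehand_share": 0.60},
    "Player B": {"endurance": 9.0, "forehand_share": 0.50},
}
user = {"endurance": 4.0, "forehand_share": 0.60}
print(closest_profiles(user, references))  # ['Player A']
```

Euclidean distance is sensitive to feature scale, so a feature measured in km/h would dominate one expressed as a ratio unless the features are scaled first; whether the project applied such scaling is not stated here.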

The player profiling feature enhances our application's ability to deliver personalized insights and recommendations tailored to each individual’s playing style. Players can use this information to adjust their training programs, refine their strategic approach during matches, and identify areas for improvement.

Additionally, we applied a similar profiling approach to running, focusing on average speed and endurance (total running time). This allows runners to classify their running style based on the limited data we collected, enabling them to track their progress and set performance goals.

By combining advanced profiling techniques with real-world player data, our system provides a comprehensive and detailed analysis of both tennis and running performance, offering users valuable insights to optimize their game and training strategies.

Application Design

Our application consists of seven different pages, each serving a specific purpose and providing various functionalities. Here, we will outline the organization of the application and describe the features offered by each page:

Bluetooth Page

The first page of the application is the Bluetooth connection page. Here, users can establish a connection with the Bluetooth device by selecting the appropriate device from the available options. The page contains three Java files that handle the Bluetooth connection and transmit the Bluetooth address of the device to other Java files for data collection purposes.

Main Menu Page

Once the user selects the Bluetooth device, they are taken to the main menu of the application. Here, they can see the status of the device connection and determine whether it is connected or disconnected. They have the option to choose their activity using a switch button, allowing them to select either tennis or running based on their desired activity. When they press the "Start" button, they can begin their activity. A timer starts on the page to track the duration of the session. Once the session is completed, the user presses the "Stop" button.

In the backend, when the user presses the "Start" button, a timer begins, and a StartFlag activates a receive function that collects data sent by the Bluetooth device. The application gathers 12 data points every 45 milliseconds and stores them in an ArrayList. When the user presses the "Stop" button, this ArrayList is saved in a CSV file called "Memory.csv", which is stored in the phone's memory. The file is overwritten each time a new session starts, ensuring that only the most recent session data is saved.

Finally, within the application, the user can navigate to either the statistics page or the profile page, based on their preference. This allows them to analyze their session statistics or review their profile once the session is completed, enabling them to track their performance and progress over time.

Statistics Pages

The statistics page varies depending on whether the user has selected the tennis or running option. When the user chooses to view their statistics, the relevant page is displayed.

For tennis, the user can see:
- The total number of shots
- The number of backhands and forehands
- The average number of shots per rally
- The maximum number of shots in a single rally during the session
- The average ball speed
- The maximum ball speed recorded during the session
- The percentage of flat shots, slice shots, and topspin shots
- The percentage of flat, slice, and topspin shots specifically for forehands
- The percentage of flat, slice, and topspin shots specifically for backhands

For running, the user can see:
- The type of activity they have selected
- The total number of steps
- The distance covered
- The average speed

When the user presses the statistics button, it automatically triggers a Python function that utilizes all the machine learning models we have built and performs the required analyses directly on the "Memory.csv" file stored in the phone's memory. This function processes the data and returns all relevant statistics to the Java code, which then displays them on the application.

This seamless Python-Java integration ensures efficient data processing and real-time retrieval, providing users with detailed insights into their performance.

Profile Pages

The profile page is accessible from the menu page after a session. It displays a graph that evaluates various statistics, including:

- Endurance (average number of shots per rally)
- Consistency (standard deviation of shots per rally)
- Forehand speed (average forehand shot speed)
- Backhand speed (average backhand shot speed)
- Ratio of backhands to forehands played
- Percentage of flat, slice, and topspin shots
- Percentage of flat, slice, and topspin shots for forehands
- Percentage of flat, slice, and topspin shots for backhands

These statistics help users visualize their player profile and identify their playing style based on predefined player categories.

Just like the statistics page, when the user accesses the profile page, a Python function is triggered. This function reads the session data directly from the "Memory.csv" file stored in the phone's memory. It then processes the data and returns the results to the Java code, which displays the graph and player profile on the application.

This seamless Python-Java integration allows users to analyze their strengths and weaknesses and gain deeper insights into their performance.

Additionally, a return button is available on each page, allowing users to navigate easily within the application.

Conclusion

Despite the limitations we encountered, we firmly believe that the project holds significant value for tennis players and runners, providing them with valuable performance feedback, analysis, and personalized profiles. Given more time, we could further enhance the application by incorporating a long-term performance tracking feature, allowing players to appreciate their progress over time. Additionally, with improved motion analysis, we could implement a 3D reproduction of racket movements, enabling a comprehensive assessment of the player's strengths and weaknesses and offering tailored progression advice.

Moreover, with a larger and more diverse database of collected data from a broader range of players, we could provide much more precise player profiles. This enhanced database could even support the development of a tennis partner matching feature, where players can find suitable partners based on playing style and skill level.

While our current implementation has its limitations, it serves as a solid foundation for future enhancements and advancements in sports performance analysis. By leveraging emerging technologies and expanding our data collection efforts, we can continue to refine and expand the application’s capabilities, ultimately providing players with a comprehensive and personalized sports performance solution.

Limitations

We encountered limitations due to the low power of our smartphone model, which had a limited Bluetooth capability and was quickly overwhelmed by the data received through Bluetooth. As a result, we were unable to receive a large amount of data at shorter intervals. Additionally, we faced challenges with the hardware components, such as the ESP32 and the battery. The size and weight of the setup made it difficult to attach to the racket without impacting the player's comfort and performance.

Furthermore, our ability to differentiate between different types of forehand and backhand strokes (topspin, slice, flat) remained limited, with the spin classifier reaching 80% accuracy compared with 92% for forehand/backhand classification, because we couldn't obtain precise measurements of the racket's inclination. This limitation prevented us from capturing detailed information about the specific techniques and variations in the players' strokes.

Despite these limitations, we were able to collect valuable data and develop features that provide insights into the player's performance and profile. Future improvements and advancements in hardware and technology may address these limitations and further enhance the capabilities of our application.