Conclusion

The main takeaway from the project is that the serve is the most important shot in tennis, regardless of surface. It is the shot that starts every point, and in each service game the server usually has the advantage. In the association rule mining, the serving table produced noticeably more associations with surface than the rallying table did. The linear regression supports this takeaway, since double faults and serve win percentage were significant variables. It was also interesting that non-serving statistics showed no relationship with surface. At the professional level, nearly every player can hit the ball well and sustain rallies over long periods, so one interpretation is that hitting the ball harder or more accurately gives no real surface-specific advantage. This claim is backed up by the association rule mining, which found few connections between surface and rallying, and by the linear regression, in which no rallying variables were significant.

In terms of surfaces, each surface does seem to affect how a player performs. Interestingly, the clustering and support vector machines showed an evident connection between the grass court and the clay court, despite the two being completely different surfaces. This was probably because their seasons are similar in length and both much shorter than the hard court season. The clay court, as the slowest court with the highest bounce, seems to value consistency the most: success on it is about making the fewest mistakes and remaining steady. The hard court, with medium bounce and speed, allows players to be more aggressive than clay does. With aces and unforced errors emerging as significant metrics, the surface encourages players to take more risks. The grass court is a wildcard. With the lowest bounce and fastest speed, it had very few metrics that provided a definitive advantage or linked to a play style, so it could be considered an unpredictable surface with no statistically significant metrics.

Play styles can also be matched to surfaces. Recall that the four main play styles in tennis are the offensive baseliner, the counterpuncher, the serve and volley player, and the all-court player. The counterpuncher is evidently best suited to the clay court. Clay values consistency, and counterpunchers are known not only for being consistent but also for choosing their moments of aggression well. By contrast, the offensive baseliner's preferred surface is the hard court. Its greater speed compared with clay gives this style more of an advantage, and because offensive baseliners like to play aggressive shots, the hard court fits their style best. Since the grass court is unpredictable, serve and volley players would probably fare best on it: on an erratic surface, a player can reduce volatility by finishing points quickly rather than playing long, drawn-out points. Finally, the all-court player benefits on every surface, since they are capable of doing a little bit of everything.

There were some clear limitations to the models built in this project. First, the data was somewhat skewed toward the hard court, because the majority of the tour is played on hard courts. The clay court season covers only two months of the calendar year and the grass court season just one, while hard courts are used for the other seven months of the tour. As a result, counting statistics that are not percentages (such as total aces) are much higher on hard courts than on grass or clay. In the future, any model that uses career statistics should normalize them by surface to account for this. Another option would be to study a single year instead of an entire career, so that the data can be evaluated season by season and give a more proportionate outcome. It was important to ensure that the data was balanced, and it was balanced in this study. However, a career study might not have been the most effective approach, because sports generally have eras, and certain eras may not reflect how the game is played today. That said, studying careers has shown what has remained consistent across the eras of tennis, which is mainly the serve.
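One possible way to normalize by surface, sketched here under stated assumptions: divide each counting statistic by the number of matches played on that surface, so that a hard court total accumulated over seven months becomes comparable with a one-month grass total. The per-match denominator, the function name `per_match_rates`, and every number below are hypothetical illustrations, not code or data from this study.

```python
# Hypothetical career totals for one player, per surface:
# (surface, matches played, aces, double faults).
# These are made-up numbers chosen only to show the arithmetic.
career_rows = [
    ("Hard", 400, 2800, 1200),
    ("Clay", 150, 600, 450),
    ("Grass", 50, 450, 120),
]


def per_match_rates(rows):
    """Convert raw counting stats into per-match rates for each surface.

    Assumption: matches played is the normalizing denominator; other
    choices (e.g. service games played) would work the same way.
    """
    rates = {}
    for surface, matches, aces, double_faults in rows:
        rates[surface] = {
            "aces_per_match": aces / matches,
            "double_faults_per_match": double_faults / matches,
        }
    return rates


if __name__ == "__main__":
    for surface, r in per_match_rates(career_rows).items():
        print(f"{surface}: {r['aces_per_match']:.1f} aces/match, "
              f"{r['double_faults_per_match']:.1f} double faults/match")
```

In these made-up totals the hard court has by far the most raw aces, yet after normalization grass has the highest ace rate, which is the kind of distortion the paragraph above describes. For serve-specific statistics, dividing by service games played would be a finer-grained alternative.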

Overall, this was an insightful study into how a surface affects not only play on the court but also which play styles benefit from it. Anyone could use this research to improve their tennis game or to strategize more effectively for success on a given surface, and it could also serve as a guide or as evidence for coaching purposes. The topic could be expanded to other racquet sports. One that comes to mind is pickleball, which has taken off in recent years thanks to its quick learning curve and accessibility to the general public. Pickleball is like a blend of tennis and ping pong: it is played on tennis court surfaces, but on a much smaller court with a paddle and ball. It would be interesting to see a pickleball analysis and whether there is any relationship at all to tennis surfaces.