DOI 10.14526/01_2017_199
TRIATHLETES PERFORMANCE PREDICTION MODEL
Domingos R. Pandelo Jr
Universidade Federal de Sao Paulo (Campus Baixada Santista), Brazil; Centro de Alta Performance (High Performance Center)
Andre K. Saito
Universidade Federal de Sao Paulo (Campus Baixada Santista), Brazil
Paulo Henrigue S.M. de Azevedo
Universidade Federal de Sao Paulo (Campus Baixada Santista), Brazil
E-mail: [email protected]
Abstract: The prediction of sport performance is relevant for identifying talent and establishing training strategies. The aim of this paper is to establish a model able to predict the performance of triathletes. Discriminant analysis, a multivariate analysis technique, was used. Twenty-one male volunteers (7 professionals and 14 amateurs) were selected. The anthropometric, physiological and training variables chosen are easy to measure and do not require a specialized laboratory. The study showed that a small set of variables can be used to infer the performance of triathletes. The correct prediction rate was above 92 %, which can be considered very good. Predicting performance is vital both for talent detection and for structuring training, which shows the importance of developing models of this kind.
Keywords: triathlon, performance, training, talent detection.
Introduction. The prediction of sports performance is important for detecting talent and establishing training strategies. Many studies have used discriminant analysis to classify athletes and predict their performance. Le Meur et al.1 used this technique to predict overreaching in endurance athletes, Saavedra et al.2 applied it to predict performance in young swimmers, and Opstoel et al.3 used discriminant analysis on anthropometric and performance data to classify young athletes across nine sports.
The purpose of the current study was to establish a model able to predict the performance of triathletes using discriminant analysis, a multivariate technique that seeks the linear equation best able to predict the outcome of interest. In this case, we looked for a model capable of measuring potential performance from a few predictive variables. We hypothesized that it was possible to establish a simple model suitable for use in clinical practice.
Methods
The study was approved by the ethics committee (approval no. 48748015.8.0000.5509). The sample size was calculated with the G*Power software; to reach α = 0.05 and β = 0.80, the minimum number of volunteers needed was 16. The inclusion criteria were: (1) practicing triathlon for at least one year; (2) no previous musculoskeletal injuries; (3) no use of doping substances. Subjects were excluded if they: (1) had a body mass index (BMI) equal to or higher than 30 kg/m2; (2) had a neuromuscular disease. They were divided into two groups: group 1 (professionals) and group 2 (amateurs). From January to December 2013, 21 healthy men (7 professionals, 14 amateurs) participated in this study. We chose male participants to keep the sample homogeneous and to make discrimination by the performance model easier. Including female subjects would have increased the variability of the results because of the difference in performance between genders, and would have made the classification more complex and subject to higher error rates. All participants signed the consent form.
Variables
Table 1 shows the variables measured and the data collected by questionnaire in the current study. They were selected based on previous studies4-9. Maximum oxygen uptake (VO2Max) was estimated indirectly from a 3000-meter run at maximum effort. The other variables included weekly swimming distance training (WSD), weekly cycling distance training (WCD), weekly running distance training (WRD) and the age of the subjects.
The technique utilized
Discriminant analysis is indicated for classifying a given element into one of k previously defined groups. The aim of the technique is to create a function that maximizes the variance between groups and, simultaneously, minimizes the variance within groups10. Given this aim, the choice of sample and regressors is clearly important. The sample must be as homogeneous as possible in everything except performance level, which is what we intend to discriminate.
The discriminant function gives the discriminant score Z, which is the sum of the regressors statistically selected by the model, each multiplied by its respective weight, plus a constant:

$$Z = c + p_1 v_1 + p_2 v_2 + \dots + p_n v_n$$

where Z is the discriminant score, c is the constant of the model, and p_i is the weight of each selected variable v_i.
As pointed out by Hair et al.10, discriminant analysis is relatively robust to some violations of the assumptions of multivariate models, such as normality, linearity and homoscedasticity. However, because the sample is relatively small (fewer than 30 subjects) and after carrying out normality and linearity tests, we decided to work with values transformed by the natural logarithm (nl), adjusting the data to minimize potential problems arising from violations of the classical assumptions of the model.
The statistical analysis was performed with SPSS v.21 (IBM SPSS Statistics v.21.0). Normality was tested with the Kolmogorov-Smirnov and Shapiro-Wilk tests, linearity was assessed with scatterplots, and homoscedasticity was analyzed with the Levene test.
Results
The coefficient of variation showed the highest variability in years of practice and weekly swimming distance, while body mass index showed the lowest variability (table 1). After entering the data and performing basic tests to verify that the sample met the classical assumptions of multivariate analysis, the analysis to build the model was started (table 2).
Variable | Mean | SD | Coefficient of variation | Reference
BMI (kg/m2) | 23.51 | ±1.72 | 0.07 | Gilinsky et al. (2014); Knechtle et al. (2012); Knechtle et al. (2010)
Age (years) | 38.71 | ±8.7 | 0.22 | Gilinsky et al. (2014); Millet et al. (2002); Knechtle et al. (2012); Knechtle et al. (2010)
HRR (bpm) | 50.14 | ±9.92 | 0.20 | Laursen & Rhodes (2001)
VO2Max (L/min) | 52.73 | ±8.00 | 0.15 | Hue (2003); Laursen & Rhodes (2001); Millet et al. (2002)
Years practicing triathlon | 11.29 | ±7.18 | 0.64 | Gilinsky et al. (2014)
WSD (Km) | 11.00 | ±7.14 | 0.65 | Gilinsky et al. (2014); Hue (2003); Millet et al. (2002)
WCD (Km) | 221.90 | ±93.95 | 0.42 | Gilinsky et al. (2014); Hue (2003); Millet et al. (2002)
Table 1. Standard deviation (SD) and coefficient of variation of the variables used to build the model for the whole sample (professionals and amateurs), with the studies that used the same indicators. BMI: body mass index; HRR: heart rate at rest; VO2Max: maximum oxygen uptake; WSD: weekly swimming distance training; WCD: weekly cycling distance training; WRD: weekly running distance training; Km: kilometers.
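The coefficient of variation reported in Table 1 is the standard deviation divided by the mean. A minimal Python sketch that recomputes it from the tabulated means and SDs is given below; the dictionary layout and the function name are illustrative assumptions, not part of the original analysis, which was run in SPSS.

```python
# Coefficient of variation (CV) = standard deviation / mean.
# Means and SDs are copied from Table 1; the data layout is illustrative.
table1 = {
    "BMI": (23.51, 1.72),
    "Age": (38.71, 8.70),
    "HRR": (50.14, 9.92),
    "VO2Max": (52.73, 8.00),
    "Years of practice": (11.29, 7.18),
    "WSD": (11.00, 7.14),
    "WCD": (221.90, 93.95),
}


def coefficient_of_variation(mean, sd):
    """Return the standard deviation expressed as a fraction of the mean."""
    return sd / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in table1.items()}

# List the variables from most to least variable.
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.2f}")
```

Sorting by the coefficient rather than by the raw SD makes variables with different units (km, bpm, years) comparable.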
Table 2. Significance level and effect size.
Variable | Wilks' Lambda | F | Sig. | Hedges' g
nlBMI | 0.741 | 6.647 | 0.018 | 1.19
nlAge | 0.838 | 3.674 | 0.070 | 0.89
nlHRR | 0.472 | 21.294 | 0.000 | 2.33
nlVO2Max | 0.572 | 14.243 | 0.001 | 1.75
nlYT | 0.980 | 0.392 | 0.539 | 0.29
nlWSD | 0.324 | 39.623 | 0.000 | 2.91
nlWCD | 0.694 | 8.366 | 0.009 | 1.34
nlWRD | 0.650 | 10.216 | 0.005 | 1.48
Table 2. nlBMI: natural logarithm of body mass index; nlAge: natural logarithm of age; nlHRR: natural logarithm of heart rate at rest; nlVO2Max: natural logarithm of maximum oxygen uptake; nlYT: natural logarithm of years of triathlon practice; nlWSD: natural logarithm of weekly swimming distance training; nlWCD: natural logarithm of weekly cycling distance training; nlWRD: natural logarithm of weekly running distance training.
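For a single variable tested across g groups with N cases, Wilks' lambda and the univariate F statistic are related by F = ((1 - Λ) / Λ) × (N - g) / (g - 1). The sketch below applies this relation to the Table 2 values with N = 21 and g = 2; the function name and data layout are illustrative assumptions. Small differences from the reported F values are expected because lambda is tabulated to three decimals.

```python
# Recompute the univariate F statistic implied by each Wilks' lambda in Table 2.
N_TOTAL = 21   # 7 professionals + 14 amateurs
N_GROUPS = 2


def f_from_wilks_lambda(wilks_lambda, n_total=N_TOTAL, n_groups=N_GROUPS):
    """F statistic for one variable, given its Wilks' lambda."""
    df_between = n_groups - 1
    df_within = n_total - n_groups
    return (1 - wilks_lambda) / wilks_lambda * df_within / df_between


# variable: (Wilks' lambda, F reported in Table 2)
table2 = {
    "nlBMI": (0.741, 6.647),
    "nlAge": (0.838, 3.674),
    "nlHRR": (0.472, 21.294),
    "nlVO2Max": (0.572, 14.243),
    "nlYT": (0.980, 0.392),
    "nlWSD": (0.324, 39.623),
    "nlWCD": (0.694, 8.366),
    "nlWRD": (0.650, 10.216),
}

recomputed = {name: f_from_wilks_lambda(lam) for name, (lam, _) in table2.items()}

for name, (lam, f_reported) in table2.items():
    print(f"{name}: lambda={lam:.3f} reported F={f_reported:.3f} "
          f"recomputed F={recomputed[name]:.3f}")
```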
Table 3 shows the classification coefficients of the variables selected by the discriminant analysis model for the professional and amateur groups.
Table 3. Classification coefficients (Fisher's linear discriminant functions)
Variable | Professionals | Amateurs
nlWSD | 29.253 | 20.573
nlWCD | 46.751 | 41.400
Constant | -176.098 | -125.819
Table 3. nlWSD: natural logarithm of weekly swimming distance training; nlWCD: natural logarithm of weekly cycling distance training.
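Fisher's classification functions are typically used by computing one score per group and assigning the case to the group with the highest score. The sketch below does this with the Table 3 coefficients. The function names and the two example athletes are hypothetical, and the weekly distances are assumed to be in kilometers as in Table 1.

```python
import math

# Coefficients copied from Table 3 (Fisher's linear discriminant functions).
COEFFICIENTS = {
    "professional": {"nlWSD": 29.253, "nlWCD": 46.751, "constant": -176.098},
    "amateur": {"nlWSD": 20.573, "nlWCD": 41.400, "constant": -125.819},
}


def classification_scores(wsd_km, wcd_km):
    """Return the classification score of each group for one athlete."""
    nl_wsd = math.log(wsd_km)  # natural logarithm, as used in the model
    nl_wcd = math.log(wcd_km)
    return {
        group: c["constant"] + c["nlWSD"] * nl_wsd + c["nlWCD"] * nl_wcd
        for group, c in COEFFICIENTS.items()
    }


def classify(wsd_km, wcd_km):
    """Assign the athlete to the group with the highest classification score."""
    scores = classification_scores(wsd_km, wcd_km)
    return max(scores, key=scores.get)


# Hypothetical athletes (not taken from the study sample).
print(classify(wsd_km=20, wcd_km=300))  # high weekly volumes
print(classify(wsd_km=6, wcd_km=150))   # low weekly volumes
```

With these coefficients the hypothetical high-volume athlete is assigned to the professional group and the low-volume athlete to the amateur group.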
Table 4 shows the centroid (group mean of the discriminant scores) of each group, and table 5 shows the results of the original model and the cross-validation model for both groups. Cross-validation is done only for the cases in the analysis. In cross-validation, each case is classified by the functions derived from all cases other than that case.
Group | Function
1 | 2.338
2 | -1.169
Table 4. Group centroids of the discriminant function. Group 1: Professionals; Group 2: Amateurs.
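Table 5. Original model and cross-validation model (a)
Model | Measure | Group | Predicted: Professionals | Predicted: Amateurs | Total
Original model | Count | 1 | 7 | 0 | 7
Original model | Count | 2 | 0 | 14 | 14
Original model | % | 1 | 100 | 0 | 100
Original model | % | 2 | 0 | 100 | 100
Cross-validation model | Count | 1 | 7 | 0 | 7
Cross-validation model | Count | 2 | 1 | 13 | 14
Cross-validation model | % | 1 | 100 | 0 | 100
Cross-validation model | % | 2 | 7.1 | 92.9 | 100
Table 5. Data from the original model and the cross-validation model. Group 1: Professionals; Group 2: Amateurs. a. 100% of the original grouped cases were correctly classified.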
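The percentages in Table 5 follow directly from the counts. The sketch below recomputes the per-group and overall rates of correct classification; the nested-dictionary layout and function names are illustrative assumptions. From these counts, the overall cross-validated rate is 20 of 21 cases, and the 92.9 % figure is the rate within the amateur group.

```python
# Counts copied from Table 5: actual group -> predicted group -> number of cases.
confusion = {
    "original": {
        "professional": {"professional": 7, "amateur": 0},
        "amateur": {"professional": 0, "amateur": 14},
    },
    "cross_validated": {
        "professional": {"professional": 7, "amateur": 0},
        "amateur": {"professional": 1, "amateur": 13},
    },
}


def overall_accuracy(matrix):
    """Share of all cases assigned to their actual group."""
    correct = sum(matrix[group][group] for group in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


def per_group_accuracy(matrix):
    """Share of each actual group that was assigned to that group."""
    return {group: row[group] / sum(row.values()) for group, row in matrix.items()}


overall = {model: overall_accuracy(m) for model, m in confusion.items()}
per_group = {model: per_group_accuracy(m) for model, m in confusion.items()}

for model in confusion:
    print(model, f"overall={overall[model]:.1%}",
          {g: f"{v:.1%}" for g, v in per_group[model].items()})
```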
Discussion
The aim of the present study was to establish a model able to predict the performance of triathletes using discriminant analysis (multivariate analysis) based on a few predictive variables. Considering the relevance of each variable in explaining the phenomenon studied, weekly swimming distance training, heart rate at rest and VO2Max were the most relevant according to the significance level.
Our hypothesis was confirmed: we were able to establish a simple prediction model suitable for use in a clinical setting. Measured by effect size, weekly swimming distance has a very large discriminating capacity, as do heart rate at rest and VO2Max. The p-values lead to a similar result, but the effect sizes also show that weekly running and cycling distances are relevant variables.
Including an effect size analysis is fundamental for a better practical assessment of the phenomenon under study11. In the present study, we chose Hedges' g instead of Cohen's d because of the small sample size and the difference in size between the groups12.
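As an illustration of the effect size discussed above, the sketch below computes a standardized mean difference with a pooled standard deviation and, optionally, the usual small-sample correction factor 1 - 3 / (4(n1 + n2) - 9). The text does not state whether this correction was applied to the Table 2 values, so the flag is an assumption, and the two samples are hypothetical numbers rather than study data.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b, small_sample_correction=True):
    """Standardized mean difference between two groups using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    if not small_sample_correction:
        return d
    # Approximate bias correction for small samples.
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return d * correction


# Hypothetical values for two groups of unequal size.
group_a = [10, 12, 14]
group_b = [7, 8, 9, 10, 11]

print(round(hedges_g(group_a, group_b, small_sample_correction=False), 3))
print(round(hedges_g(group_a, group_b), 3))
```

The corrected value is always smaller in magnitude than the uncorrected one, and the gap shrinks as the total sample size grows.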
In discriminant analysis, it is possible to choose between the enter method, in which all variables that meet the minimal prerequisites of the model are forced into the final discriminant function, and the stepwise method. In the stepwise method, variables stay in the model only if they improve its predictive capacity.
Therefore, the most relevant variables for building the discriminant model were weekly swimming distance and weekly cycling distance. This information seems very relevant because these are neither anthropometric nor physiological variables, which suggests that performance in triathlon depends on training strategies.
Obviously, when the intention is to distinguish athletes' performance and classify them into two groups (professionals and amateurs), many physiological and anthropometric variables, besides training, look important. The model tells us that when the variables are analyzed together, considering the correlation and interaction among them, it is possible to accurately infer a triathlete's performance from weekly swimming and cycling training.
There are two classification functions, one for professionals and one for amateurs. Both are linear functions with one constant and two explanatory variables (the natural logarithms of weekly swimming distance and weekly cycling distance). Based on the two equations and the values of the centroids, it was possible to classify each athlete into one of the groups.
The average of the group centroids, weighted by the sample size of each group, is zero in the present study. Given the centroids in table 4, values above zero are closer to professional performance and values below zero are closer to amateur performance. The group centroids thus made it easier to calculate the cut-off Z score for performance.
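The weighted average of the centroids can be checked from Table 4 and the group sizes. In the sketch below, the function names and the example scores are hypothetical; the side of the cut-off assigned to each group is read from the sign of its centroid in Table 4 (positive for professionals, negative for amateurs).

```python
# Centroids from Table 4 and group sizes from the Methods section.
centroids = {"professional": 2.338, "amateur": -1.169}
group_sizes = {"professional": 7, "amateur": 14}


def weighted_mean_centroid(centroids, group_sizes):
    """Average of the group centroids weighted by the size of each group."""
    total = sum(group_sizes.values())
    return sum(centroids[g] * group_sizes[g] for g in centroids) / total


cut_off = weighted_mean_centroid(centroids, group_sizes)


def group_on_same_side(z_score, cut_off=cut_off):
    """Return the group whose centroid lies on the same side of the cut-off."""
    above = z_score > cut_off
    for group, centroid in centroids.items():
        if (centroid > cut_off) == above:
            return group
    return None


print(f"cut-off = {cut_off:.6f}")
print(group_on_same_side(1.0))   # hypothetical discriminant score
print(group_on_same_side(-0.5))  # hypothetical discriminant score
```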
Because of the small sample size, we chose to work with both the original and the cross-validation models. Cross-validation builds n models, where n is the sample size, and it is important for testing the predictive capacity of the model13. Each athlete is tested on a model built without him in the database. The accuracy tends to be lower than in the original model, because in the original model the athletes used to build the model are tested and classified with that same model, but cross-validation is closer to reality and is extremely important when working with small samples.
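The leave-one-out procedure described above can be sketched in a few lines. To keep the example self-contained, it uses a simple nearest-centroid rule on a single hypothetical score rather than the discriminant model fitted in SPSS, and the sample values are invented for illustration. It shows why accuracy measured on the cases used to build the model (resubstitution) can exceed the cross-validated accuracy.

```python
from statistics import mean


def nearest_centroid_predict(train, value):
    """Classify a value by the closest group mean in the training cases."""
    labels = {label for _, label in train}
    group_means = {
        label: mean(x for x, lab in train if lab == label) for label in labels
    }
    return min(group_means, key=lambda label: abs(value - group_means[label]))


def resubstitution_accuracy(samples):
    """Test every case on a model built from all cases, itself included."""
    hits = sum(nearest_centroid_predict(samples, x) == label for x, label in samples)
    return hits / len(samples)


def leave_one_out_accuracy(samples):
    """Test every case on a model built from all the other cases."""
    hits = 0
    for i, (x, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_centroid_predict(train, x) == label:
            hits += 1
    return hits / len(samples)


# Hypothetical scores; the last amateur lies close to the professionals.
samples = [
    (2.9, "professional"), (2.5, "professional"), (2.1, "professional"),
    (0.2, "amateur"), (-0.8, "amateur"), (-1.2, "amateur"),
    (-1.5, "amateur"), (1.0, "amateur"),
]

print(resubstitution_accuracy(samples))
print(leave_one_out_accuracy(samples))
```

In this toy data the borderline amateur is classified correctly only when he helps define the amateur group mean, which mirrors the single misclassified amateur in Table 5.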
The predictive capacity of the model, based on the data, can be considered very good. Only one case was misclassified: an amateur athlete whose results are close to those of professionals. The present research showed that a few anthropometric, physiological and training variables make it possible to infer performance in triathletes. Predicting performance is vital, whether for talent detection or for training programs.
The limitation of the current study is that the model cannot reproduce reality exactly, as any model is a simplification of reality intended to facilitate analysis and decision making. The contribution of our research is to show a possibility to be explored: building models to predict performance and classify athletes based on multivariate analysis.
Conclusion
We conclude that it is possible to predict performance in triathletes with a simple prediction model suitable for use in a clinical setting. To our knowledge, no study to date has established models using simple variables. We suggest new studies with other, more complex variables to obtain, eventually, more accurate results.
Conflict of interest
None.
References
1. Le Meur Y, Hausswirth C, Natta F, Couturier A, Bignet F, Vidal PP. A multidisciplinary approach to overreaching detection in endurance trained athletes. J Appl Physiol (1985). 2013 Feb;114(3):411-20. PubMed PMID: 23195630.
2. Saavedra JM, Escalante Y, Rodriguez FA. A multivariate analysis of performance in young swimmers. Pediatr Exerc Sci. 2010 Feb;22(1):135-51. PubMed PMID: 20332546.
3. Opstoel K, Pion J, Elferink-Gemser M, Hartman E, Willemse B, Philippaerts R, et al. Anthropometric characteristics, physical fitness and motor coordination of 9 to 11 year old children participating in a wide range of sports. PLoS One. 2015;10(5):e0126282. PubMed PMID: 25978313. PubMed Central PMCID: PMC4433213.
4. Gilinsky N, Hawkins KR, Tokar TN, Cooper JA. Predictive variables for half-Ironman triathlon performance. J Sci Med Sport. 2014 May;17(3):300-5. PubMed PMID: 23707141.
5. Hue O. Prediction of drafted-triathlon race time from submaximal laboratory testing in elite triathletes. Can J Appl Physiol. 2003 Aug;28(4):547-60. PubMed PMID: 12904633.
6. Laursen PB, Rhodes EC. Factors affecting performance in an ultraendurance triathlon. Sports Med. 2001;31(3):195-209. PubMed PMID: 11286356.
7. Knechtle B, Knechtle P, Wirth A, Alexander Rust C, Rosemann T. A faster running speed is associated with a greater body weight loss in 100-km ultra-marathoners. J Sports Sci. 2012;30(11):1131-40. PubMed PMID: 22668199.
8. Millet GP, Candau RB, Barbier B, Busso T, Rouillon JD, Chatard JC. Modelling the transfers of training effects on performance in elite triathletes. Int J Sports Med. 2002 Jan;23(1):55-63. PubMed PMID: 11774068.
9. Knechtle B, Wirth A, Baumann B, Knechtle P, Rosemann T, Oliver S. Differential correlations between anthropometry, training volume, and performance in male and female Ironman triathletes. J Strength Cond Res. 2010 Oct;24(10):2785-93. PubMed PMID: 20571444.
10. Hair JF, Black WC, Babin BJ, Anderson RE, Tatham RL. Multivariate data analysis. Upper Saddle River, NJ: Pearson Prentice Hall; 2006.
11. Cumming G. Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge; 2013.
12. Ellis PD. The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press; 2010.
13. Johnson RA, Wichern DW. Applied multivariate statistical analysis. Englewood Cliffs, NJ: Prentice Hall; 1992.
14. Kuznetsova Z.M., Savosina M.N., Hramova N.A. The model of image formation of a future specialist in the context of health. Pedagogiko-psihologicheskie i mediko-biologicheskie problemy fizicheskoi kul'tury i sporta [Pedagogico-psychological and medico-biological problems of physical culture and sports], 2006, Vol. 1, no 1, pp. 25-30. Available at: http://www.journal-science.org/ru/magazine/49.html.
15. Kamalieva G.A., Kuznetsova Z.M. The model of training of volleyball players to overcome obstacles and difficulties that unexpectedly arise in competitive activities. Pedagogiko-psihologicheskie i mediko-biologicheskie problemy fizicheskoi kul'tury i sporta [Pedagogico-psychological and medico-biological problems of physical culture and sports], 2011, Vol. 6, no 2, pp. 38-44. Available at: http://www.journal-science.org/ru/magazine/30.html.
16. Adam Hewitt, Kevin Norton & Keith Lyons. Movement profiles of elite women soccer players during international matches and the effect of opposition's team ranking. Journal of Sports Sciences. 2014, Vol. 32, pp. 1874-1880 (SCOPUS)
17. Paul Glavin, Marisa Young. The influence of Regional Unemployment on Worker's Reactions to the Threat of Job loss. Journal of Health and Social Behavior. March 10, 2017. DOI: 10.1177/0022146517696148 (SCOPUS)
Submitted: 11.04.2017 Received: 14.04.2017
For citations: Domingos R. Pandelo Jr, Andre K. Saito, Paulo Henrigue S.M. de Azevedo Triathletes performance prediction model, The Russian journal of physical education and sport (pedagogico-psychological and medico-biological problems of physical culture and sports), 2017, Vol. 12, No.2, pp. 6-13.
APPENDIX Triathlete assessment questionnaire
General Data
Name:_
E-mail:_
Country:
Gender: ( ) Male ( ) Female
Age:_
Height (m):_
Weight (kg):_
Beats per minute (at rest):_
Years practicing triathlon:_
Average amount of competitions per year:_
Number of short triathlon:_
Number of Olympic triathlon:_
Number of 70.3 triathlon:_
Number of 140.6 triathlon:_
Has a coach? ( ) Yes ( ) No
Has a nutritionist? ( ) Yes ( ) No
Has a sports psychologist? ( ) Yes ( ) No
Has a specialist in sports biomechanics? ( ) Yes ( ) No
Complementary Activities
Do you do resistance training? ( ) Yes ( ) No
Do you do stretching training? ( ) Yes ( ) No
Do you do yoga? ( ) Yes ( ) No
Do you do pilates? ( ) Yes ( ) No
Do you do plyometric training? ( ) Yes ( ) No
Do you do crossfit training? ( ) Yes ( ) No
Do you do high intensity interval training (HIIT)? ( ) Yes ( ) No
Method for measuring training Intensity
( ) Heart Rate (BPM)
( ) Perceived exertion
( ) Pace
( ) Power meter
( ) Other(s)_
Supplementation
Supplements Before During After
Carbohydrate gel
Carbohydrate solution
Caffeine
Protein bar
Whey Protein
BCAA
Creatine
Recovery (solution, gel, bar)
Vitamins/Minerals
Dopamine
Other
Pre-Psychology Test
( ) No
( ) Positive Thinking ( ) Relaxation
Body composition (bioelectrical impedance test result)
Body mass index (BMI) (do not fill)
% Total fat:_
% Total lean body mass:_
Weight of lean mass:_
Body hydration:_
Muscle Hydration:_
Time - Maximum Effort
Swimming
700 metres:_
Cycling:
5 Km:_
Running:
3 km:_
Average distance by modality (weekly)
Swimming:_
Cycling:_
Running:_
Distance by intensity level
SWIMMING CYCLING RUNNING
Z1 (extremely light)
Z2 (light)
Z3 (moderate)
Z4 (slightly hard)
Z5 (strong)
Z6 (very strong)
Training hours (a week) by intensity level
INTENSITY ZONE SWIMMING CYCLING RUNNING
Z1
Z2
Z3
Z4
Z5
Z6
Results by competition (Best times)
Competition Swimming Cycling Running Transitions Total
Short
Olympic
70.3
140.6
Do you have sports biomechanics ( ) Yes ( ) No
Do you do specific ergospirometric assessment? ( ) Yes ( ) No
How many times a year?_
Performs lactatemia test? ( ) Yes ( ) No
Is there a periodization of training? ( ) Yes ( ) No
Which model?_
Performs fitbike? ( ) Yes ( ) No
During the competition, what limits your performance?
Short triathlon
Swimming: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (
Cycling: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (
Running: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (
Olympic triathlon
Swimming: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (
Cycling: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (
Running: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (
70.3 triathlon
Swimming: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (
Cycling: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (
Running: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (
140.6 triathlon
Swimming: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (
Cycling: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (
Running: ( ) Muscle fatigue ( ) Respiratory fatigue ( ) Mental fatigue (