Section 3. General biology
https://doi.org/10.29013/ELBLS-20-2.3-37-42
Yichi Xu, Our Lady of the Elms Colonnade Dr, Akron, Ohio E-mail: [email protected]
PREDICTING CHILDHOOD ADHD USING INDIVIDUAL CHARACTERISTICS AND ENVIRONMENTAL RISK FACTORS
Abstract. Attention-deficit/hyperactivity disorder (ADHD) is common in children, but its risk factors remain uncertain. The main aim of this study is to identify the main risk factors for ADHD. Family income, household and family sizes, demographic information about the household reference person, and other demographic information such as gender, age, race, education and country of birth were assessed in 1,576 youths examined in the NHANES National Youth Fitness Survey (NNYFS). Logistic regression analysis was performed to analyze the relationship between a set of potential risk factors and the outcome (ADHD). In addition, the Kolmogorov-Smirnov (KS) statistic and the receiver operating characteristic (ROC) curve were used to evaluate the predictive model. The logistic regression model indicates that male gender, older age and maternal smoking increase the risk of developing ADHD, while no significant influence was found for other factors such as race, family income, education and birth weight.
Keywords: ADHD, mental disorder, methylphenidate, dextroamphetamine, maternal smoking.
1. Background
Attention-deficit/hyperactivity disorder (ADHD) is a disorder marked by an ongoing pattern of inattention and/or hyperactivity-impulsivity that interferes with functioning or development [1]. Inattention and hyperactivity/impulsivity are the key behaviors of ADHD. Some people with ADHD only have problems with one of the behaviors, while others have both inattention and hyperactivity-impulsivity [1]. Most children have the combined type of ADHD. Researchers are not sure what causes ADHD. Like many other illnesses, several factors can contribute to ADHD, such as genes, cigarette smoking, alcohol use, or drug use during pregnancy, exposure to environmental toxins such as high levels of lead during pregnancy or at a young age, low birth weight, and brain injuries [1]. ADHD is more common in males than females, and females with ADHD are more likely to have problems primarily with inattention. Given the current uncertainty about the risk factors that may cause ADHD, this study uses data analysis to identify those risk factors more accurately [1].
2. Study methods
2.1 Data source
The NHANES National Youth Fitness Survey (NNYFS, website: https://www.cdc.gov/nchs/nnyfs/index.htm) was conducted in 2012 as a one-time survey by the Division of Health and Nutrition Examination Surveys, National Center for Health Statistics, part of the Centers for Disease Control and Prevention. It collected data on physical activity and fitness levels of children and teens in the U.S. ages 3 to 15 years. The NNYFS collected data through interviews and fitness tests. The fitness tests included standardized measurements of core, upper, and lower body muscle strength, as well as a measurement of cardiovascular fitness by walking and running on a treadmill. The family and sample person demographics questionnaires were administered in the home by trained interviewers. An adult family member, aged 18 years or older, was interviewed as a proxy for the survey participant.
The demographics file provides individual, family, and household level information on the following topics:
• Family income;
• Household and family sizes;
• Demographic information about the household reference person;
• Other selected demographic information, such as gender, age, race/Hispanic origin, education, and country of birth.
NNYFS 2012 collected interview data from a total of 1,640 youths, of whom 1,576 were examined.
2.2 Study variables
ADHD
Children with ADHD were identified from medication use information as those taking common ADHD medications. The most commonly used stimulant medications include methylphenidate (Ritalin, Concerta), mixed amphetamine salts (Adderall), dextroamphetamine (Dexedrine), and lisdexamfetamine (Vyvanse). Non-stimulant medications with a specific indication for ADHD include atomoxetine (Strattera) and guanfacine (Intuniv).
Variables of potential predictors
We included variables that the literature suggests are related to ADHD risk and for which data were available from the survey. The following participant variables are included:
• Gender;
• Age;
• Race;
• Birth weight;
• Maternal smoking;
• Ratio of family income to poverty.
Other information about the participant's mother was not directly available. Therefore, we used the reference person's information as a proxy, including the following:
• gender;
• education;
• marital status.
2.3 Analysis
To formally train and then test a predictive model, data was split randomly into two datasets, with one half for model development ("training" data), and the other half for model validation ("testing" data).
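The random half-and-half split described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `split_half`, the seed value, and the placeholder participant IDs are assumptions introduced here.

```python
import random


def split_half(records, seed=0):
    """Randomly split records into two halves: (training, testing)."""
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Placeholder IDs standing in for the 1,576 examined youths (hypothetical).
participant_ids = list(range(1576))
train, test = split_half(participant_ids, seed=42)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, so the same training and testing sets can be recovered when the analysis is rerun.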
Using the training data, we performed logistic regression analysis to build the predictive model. Logistic regression is a type of generalized linear model for analyzing the relationship between a set of explanatory variables and a binary outcome variable (i.e., one with a yes/no value). In this study, the outcome is whether a child has ADHD. The model is:
$$\ln(\text{odds of outcome event}) = \ln\left(\frac{\text{Prob}}{1-\text{Prob}}\right) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$$
"Prob" is the probability of the event and is convertible to and from the odds. X1, X2, ..., Xn are the explanatory variables, b0 is the intercept, and each b is the regression coefficient for a specific X.
• If the coefficient of a variable X is above 0, X is related to higher odds (probability) of the event. The corresponding Odds Ratio will be above 1.
• If the coefficient of a variable X is equal to 0, X is NOT related to the event. The corresponding Odds Ratio equals 1.
• If the coefficient of a variable X is below 0, X is related to lower odds (probability) of the event. The corresponding Odds Ratio will be below 1.
Lastly, the prediction model was tested using the testing data to examine whether the model provides good prediction of the outcome. The following measures are used to evaluate whether the model is a good fit:
• Kolmogorov-Smirnov (KS) statistic. KS is a commonly used evaluation metric for models predicting binary outcomes. It reflects the distance between the distributions of predicted scores for positive and negative outcomes. A higher KS means more separation of positive vs negative outcomes and therefore indicates better model fit. KS ranges from 0 to 1 (0% to 100%). A rule of thumb is that a KS of 0.4 suggests good discrimination of the outcome [1].
• Receiver operating characteristic curve (ROC curve) and the Area under the ROC Curve (AUC). ROC curve is a graphical plot that illustrates the diagnostic ability of a model [2]. For AUC, the higher the better.
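Both measures can be computed directly from the predicted scores and the observed outcomes. The sketch below is an illustration under stated assumptions: the function name `ks_and_auc` and the toy labels and scores are invented here and are not data from the study. KS is taken as the largest gap between the empirical score distributions of the two outcome groups, and AUC as the share of positive/negative pairs in which the positive case receives the higher score.

```python
def ks_and_auc(labels, scores):
    """Return (KS, AUC) for binary labels (1 = positive) and predicted scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    # KS: largest gap between the two empirical CDFs over all thresholds.
    ks = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        ks = max(ks, abs(cdf_neg - cdf_pos))

    # AUC: fraction of (positive, negative) pairs ranked correctly;
    # a tie counts as half a correct ranking.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return ks, auc


# Toy example (hypothetical values).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
ks, auc = ks_and_auc(labels, scores)
print(round(ks, 3), round(auc, 3))
```

The pairwise loop is quadratic in the sample size, which is acceptable for a sample of this scale; statistical packages typically use a sort-based computation instead.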
3. Results
The study sample included 50% boys and 50% girls. Average age was 9 years.
ADHD was found in 4% of the study sample. This is similar to the reported national average.
Figure 1. Prevalence of ADHD in study sample
Figure 2. Racial composition of study sample
3.1 Development of the prediction model
Table 1. Coefficients from training data
| Variable | Estimate | Standard error | z value | p-value |
|---|---|---|---|---|
| Intercept | -6.867 | 1.784 | -3.848 | <0.001 |
| Male | 1.39 | 0.527 | 2.638 | 0.008 |
| Age | 0.129 | 0.059 | 2.178 | 0.029 |
| Race: Black vs. White | -0.548 | 0.587 | -0.934 | 0.35 |
| Race: Other vs. White | -0.653 | 0.671 | -0.973 | 0.33 |
| Birth weight | 0 | 0 | 0.809 | 0.418 |
| Income poverty ratio | -0.059 | 0.18 | -0.333 | 0.739 |
| Maternal smoking | 1.354 | 0.533 | 2.54 | 0.011 |
| Reference person: male vs. female | 1.086 | 0.504 | 2.151 | 0.031 |
| Reference person: education | 0.158 | 0.239 | 0.662 | 0.507 |
| Reference person: married vs. other | -0.939 | 0.529 | -1.774 | 0.076 |
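The z value and p-value columns follow from the estimate and its standard error. As a hedged check, the sketch below assumes the usual Wald test (z = estimate / standard error, with a two-sided p-value from the standard normal distribution); the source does not state which test the software used, and the function name `wald_test` is introduced here for illustration.

```python
import math


def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) for a Wald test of a coefficient."""
    z = estimate / std_error
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Coefficient for "male" as reported in the coefficient table.
z, p = wald_test(1.39, 0.527)
print(round(z, 3), round(p, 3))
```

Under this assumption the result for the "male" coefficient is close to the tabulated z value and p-value; small differences for other rows can arise because the published estimates are rounded.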
Table 2. Odds ratios from training data
| Variable | Odds Ratio | Lower CI | Upper CI |
|---|---|---|---|
| Male | 4.018 | 1.536 | 12.633 |
| Age | 1.138 | 1.016 | 1.285 |
| Race: Black vs. White | 0.577 | 0.167 | 1.739 |
| Race: Other vs. White | 0.52 | 0.124 | 1.817 |
| Birth weight | 1 | 0.999 | 1.001 |
| Income poverty ratio | 0.941 | 0.656 | 1.337 |
| Maternal smoking | 3.876 | 1.317 | 10.896 |
| Reference person: male vs. female | 2.962 | 1.129 | 8.294 |
| Reference person: education | 1.171 | 0.743 | 1.909 |
| Reference person: married vs. other | 0.39 | 0.137 | 1.112 |
Figure 1. Factors predicting childhood ADHD (odds ratios for each predictor; Ref: reference person)
Figure 2. ROC curve (x-axis: false positive rate)
As introduced in the Methods section, an Odds Ratio above 1 means that the variable is related to a higher risk of the outcome, while an Odds Ratio below 1 means that the variable is related to a lower risk of the outcome. From the tables, we can see that three variables are related to a higher risk of ADHD: male gender, older age, and maternal smoking during pregnancy. The reference person's gender also seems to be related to ADHD, but in practice this finding is not very meaningful.
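The link between a coefficient and its odds ratio is the exponential function: the odds ratio equals exp(b). The sketch below illustrates this with the reported coefficient for male gender. The Wald-type 95% interval it computes is an assumption made here for illustration; the source does not state how the confidence intervals in the odds ratio table were obtained, and the function name `odds_ratio` is hypothetical.

```python
import math


def odds_ratio(estimate, std_error, z_crit=1.96):
    """Return (OR, lower, upper) using a Wald interval on the log-odds scale."""
    return (
        math.exp(estimate),
        math.exp(estimate - z_crit * std_error),
        math.exp(estimate + z_crit * std_error),
    )


# Coefficient and standard error for "male" as reported in the coefficient table.
or_male, lower, upper = odds_ratio(1.39, 0.527)
print(round(or_male, 2), round(lower, 2), round(upper, 2))
```

The point estimate agrees with the tabulated odds ratio up to rounding of the coefficient. The Wald interval is narrower than the tabulated one, which suggests the original analysis used a different interval method (for example, a profile-likelihood interval); this is an inference, not something the source confirms.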
3.2 Validation of the prediction model
When the above model was applied to the testing data, the KS statistic was 0.41 and the AUC of the resulting ROC curve was 73%.
4. Discussion
After the study sample data were analyzed, the prediction model was developed. In the table of coefficients, a coefficient greater than 0 means that the factor is associated with higher odds of ADHD; male gender, age, maternal smoking, and education meet this condition. A second criterion is the p-value: a value below 0.05 indicates that the association is statistically significant. Education (p = 0.507) does not meet this criterion and can therefore be eliminated from the list.
The table of odds ratios from the training data leads to the same conclusion as the table of coefficients. Gender, age, and maternal smoking all have an odds ratio greater than 1, which indicates that these factors are associated with higher odds of ADHD.
In addition, the prediction model was evaluated using the KS statistic and the ROC curve. When the model was applied to the testing data, the KS statistic was 0.41, just above the 0.4 rule of thumb for good discrimination. The AUC calculated from the ROC curve was 73%, indicating that the model discriminates reasonably well between children with and without ADHD.
Compared with previous research [3; 4], some results of this study for prenatal risk factors appear to be contradictory, while the results for genetic factors such as gender are consistent. The uncertainty about whether maternal smoking is a risk factor might be due to the difficulty of surveying mothers' smoking history, including its length and frequency.
Because ADHD has become a prevalent disorder, studying its risk factors is of great significance for reducing the rate at which people develop it, and that is the major objective of this study.
5. Conclusion
Male sex and older age are significant risk factors for ADHD. Maternal smoking during pregnancy also largely increases the risk of developing the disorder.
References:
1. The National Institute of Mental Health Information Resource Center, Attention-Deficit/Hyperactivity Disorder (ADHD): The Basics.
2. Richter, Natasha L. "A Second Look at Don John, Shakespeare's Most Passive Villain." Inquiries Journal, 1 Jan. 2010. URL: http://www.inquiriesjournal.com/articles/133/a-second-look-at-don-john-shakespeares-most-passive-villain.
3. Sagiv Sharon K., Epstein Jeff N., Bellinger David C., Korrick Susan A. "Pre- and Postnatal Risk Factors for ADHD in a Nonclinical Pediatric Population." SAGE Journals, 2013. URL: http://journals.sagepub.com/doi/abs/10.1177/1087054711427563.
4. Knopik Valerie S., et al. "Contributions of Parental Alcoholism, Prenatal Substance Exposure, and Genetic Transmission to Child ADHD Risk: a Female Twin Study." Psychological Medicine, Cambridge University Press, 6 Jan. 2005. URL: http://www.cambridge.org/core/journals/psychological-medicine/article/contributions-of-parental-alcoholism-prenatal-substance-exposure-and-genetic-transmission-to-child-adhd-risk-a-female-twin-study/A2EA7BF80712C3CFA09C71C78966AE03.
5. Tabachnick B., and Fidell L. Using Multivariate Statistics (4th Ed.). Needham Heights, MA: Allyn & Bacon, 2001.
6. StatSoft. Electronic Statistics Textbook. URL: http://www.statsoft.com/textbook/stathome.html.
7. Stokes M., Davis C. S. Categorical Data Analysis Using the SAS System, SAS Institute Inc., 1995.