Suppose there are a large number of predictors ‘p’. What is the best approach to find out if any of the p predictors are helpful in predicting the response ‘y’?

Updated: March 26, 2023

Suppose the null hypothesis (H₀) is:

β₁ = β₂ = … = β_p = 0

And the alternative hypothesis (H_a) is:

At least one of the β_i is not equal to 0

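Best approach: To find out whether any of the ‘p’ predictors is helpful in predicting ‘y’, use the F-statistic of the overall regression, which tests H₀ against H_a in a single test. (This approach works when p < n. When p > n there are more coefficients than observations, the least-squares model cannot be fit, and high-dimensional methods are needed instead.)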
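As a rough illustration, the F-statistic can be computed from the total sum of squares (TSS) and the residual sum of squares (RSS) as F = ((TSS − RSS) / p) / (RSS / (n − p − 1)). The sketch below assumes simulated Gaussian data and plain NumPy; the function name overall_f_statistic and the data sizes are hypothetical choices for the example, not part of the original post.

```python
import numpy as np

def overall_f_statistic(X, y):
    """F-statistic for H0: all slope coefficients are zero (needs n > p + 1)."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])          # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares fit
    rss = np.sum((y - X1 @ beta) ** 2)             # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)              # total sum of squares
    return ((tss - rss) / p) / (rss / (n - p - 1))

rng = np.random.default_rng(0)
n, p = 1000, 200                                   # assumed sizes, with p < n
X = rng.normal(size=(n, p))

y_null = rng.normal(size=n)                # no predictor is related to y
y_signal = X[:, 0] + rng.normal(size=n)    # only the first predictor matters

f_null = overall_f_statistic(X, y_null)
f_signal = overall_f_statistic(X, y_signal)
print(f"F under H0: {f_null:.2f}, F with one real predictor: {f_signal:.2f}")
```

When H₀ is true the F-statistic is expected to be close to 1; a value well above 1 is evidence that at least one predictor is related to the response.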

Side note: Individual t-statistics can be misleading in this scenario.

If p is large, say p = 200, and none of the predictors (X₁, …, X_p) is related to the response y (i.e. the null hypothesis above is true), about 5% of the individual p-values will still fall below 0.05 by chance. With p = 200, that is about 10 predictors that look significant. In reality these predictors have no predictive power; their low p-values are chance results. So if we rely on the individual t-statistics and p-values to decide whether any predictor matters, we are very likely to draw the wrong conclusion.
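The effect described above can be checked with a small simulation. This is a minimal sketch, assuming pure-noise Gaussian data with n = 1000 and p = 200 (the sample size is an assumption made for the example) and using |t| > 1.96 as a normal approximation to the two-sided 5% cutoff.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 1000, 200
X = rng.normal(size=(n, p))
y = rng.normal(size=n)                      # H0 is true: y ignores every predictor

X1 = np.column_stack([np.ones(n), X])       # add an intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid = y - X1 @ beta
sigma2 = resid @ resid / (n - p - 1)        # unbiased estimate of the noise variance
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X1.T @ X1)))  # coefficient standard errors
t_stats = (beta / se)[1:]                   # drop the intercept

# |t| > 1.96 approximates the two-sided 5% cutoff (reasonable for 799 residual df)
n_false = int(np.sum(np.abs(t_stats) > 1.96))
frac_false = n_false / p
print(f"{n_false} of {p} pure-noise predictors look 'significant'")
```

The count varies with the random seed, but it should land in the neighbourhood of 10 of the 200 predictors, even though none of them carries any signal.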

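The F-statistic adjusts for the number of predictors: if H₀ is true, there is only a 5% chance that it produces a p-value below 0.05, no matter how many predictors there are. It therefore does not suffer from the above problem.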
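To compare the two approaches, the sketch below repeats the experiment many times under H₀ and counts how often each one raises a false alarm. It assumes NumPy and SciPy are available and uses smaller, hypothetical sizes (n = 100, p = 20, 500 repetitions) to keep the run short. If the 20 t-tests were independent, at least one of them would reject in roughly 1 − 0.95²⁰ ≈ 64% of the repetitions, whereas the F-test should reject in about 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, n_sims = 100, 20, 500
df_resid = n - p - 1
f_crit = stats.f.ppf(0.95, p, df_resid)     # 5% critical value for the F-test
t_crit = stats.t.ppf(0.975, df_resid)       # two-sided 5% critical value for each t-test

f_rejects = 0
any_t_rejects = 0
for _ in range(n_sims):
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)                  # H0 is true in every repetition
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = np.sum((y - X1 @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    f_stat = ((tss - rss) / p) / (rss / df_resid)
    se = np.sqrt(rss / df_resid * np.diag(np.linalg.inv(X1.T @ X1)))
    t_stats = (beta / se)[1:]               # drop the intercept
    f_rejects += int(f_stat > f_crit)
    any_t_rejects += int(np.any(np.abs(t_stats) > t_crit))

f_rate = f_rejects / n_sims                 # false-alarm rate of the single F-test
any_t_rate = any_t_rejects / n_sims         # rate of "at least one significant t-test"
print(f"F-test rejects: {f_rate:.2f}, some t-test rejects: {any_t_rate:.2f}")
```

The false-alarm rate of the F-test stays near the nominal 5% regardless of p, while the chance that some individual t-test looks significant grows quickly as predictors are added.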
