Home
Interview Questions
Machine Learning Basics
Deep Learning
Supervised Learning
Unsupervised Learning
Natural Language Processing
Statistics
Data Preparation
Jobs
Explore Questions by Topics
Computer Vision (1)
Data Preparation (35)
  Feature Engineering (30)
  Sampling Techniques (5)
Deep Learning (52)
  DL Architectures (17)
    Feedforward Network / MLP (2)
    Sequence models (6)
    Transformers (9)
  DL Basics (16)
  DL Training and Optimization (17)
  Generative AI (2)
Machine Learning Basics (18)
Natural Language Processing (27)
  NLP Data Preparation (18)
Statistics (34)
Supervised Learning (115)
  Classification (70)
    Classification Evaluations (9)
    Ensemble Learning (24)
    Logistic Regression (10)
    Other Classification Models (9)
    Support Vector Machine (9)
  Regression (41)
    Generalized Linear Models (9)
    Linear Regression (26)
    Regularization (6)
Unsupervised Learning (55)
  Clustering (37)
    Clustering Evaluations (6)
    Distance Measures (9)
    Gaussian Mixture Models (5)
    Hierarchical Clustering (3)
    K-Means Clustering (9)
  Dimensionality Reduction (9)
Machine Learning Interview Questions
Q. What are some strategies to address Overfitting in Neural Networks?
Q. What are some options for making Backpropagation more efficient?
Q. What is Backpropagation?
Q. How are Regression and Classification performed using multilayer perceptrons (MLP)?
Q. What are some guidelines for choosing activation functions?
Q. Discuss the Softmax activation function
Q. What is the Rectified Linear Unit (ReLU) activation function? Discuss its advantages and disadvantages
Q. Discuss the TanH activation function
Q. What is the Sigmoid (logistic) activation function?
Q. What is an activation function, and what are some of the most common choices for activation functions?
Q. What is the difference between Deep and Shallow networks?
Q. Explain the basic architecture of a Neural Network, model training and key hyper-parameters
Q. What is a Multilayer Perceptron (MLP) or a Feedforward Neural Network (FNN)?
Q. What is a Perceptron? What is the role of bias in a perceptron (or neuron)?
Q. What are the advantages and disadvantages of Deep Learning?
Q. How do Deep Learning methods compare with traditional Machine Learning methods?
Q. What is bootstrapping, and why is it a useful technique?
Q. What are the main components of a Bayesian Model?
Q. How does Bayesian Statistics differ from the Frequentist paradigm?
Q. What is Local Outlier Factor?
Q. What is Isolation Forest?
Q. What are some automatic outlier detection mechanisms?
Q. What are some options for dealing with outliers?
Q. What is an Outlier?
Q. What are Skewness and Kurtosis?
Q. How do you choose between mean and median to summarize data?
Q. What is the difference between Mean, Median and Mode?
Q. What is a Confidence Interval?
Q. What is a p-value, and what is its significance?
Q. What is the difference between probability and likelihood?
Q. What is the Central Limit Theorem (CLT), and what are its implications for statistical inference?
Q. What are some desirable properties of estimators?
Q. What are the pros and cons of parametric vs. non-parametric models?
Q. What is the difference between parametric and non-parametric models?
Q. What is the relationship between independence and correlation?
Q. What is the difference between covariance and correlation?
Q. What is Chebyshev’s Theorem and its implications?
Q. What is the Empirical Rule?
Q. What is a Z Score?
Q. What is Cluster Sampling?
Q. What is Stratified Sampling?
Q. What is Simple Random Sampling?
Q. What is the difference between probability and non-probability sampling, and what are some example methodologies for each?
Q. What does it mean if observations are iid, and why is this a desirable property?
Q. What is the Kolmogorov–Smirnov statistic?
Q. What is the difference between a Probability Mass Function (PMF), Probability Density Function (PDF), and Cumulative Distribution Function (CDF)?
Q. What is a random variable?
Q. What is Bayes’ Rule?
Q. What is conditional probability?
Q. What does it mean for two events to be independent?
Q. What does it mean for two events to be mutually exclusive?
Q. What is a probability function, and what properties must it satisfy?
Q. What is the difference between a parameter and a statistic?
Q. How does t-SNE compare to PCA?
Q. How does t-distributed Stochastic Neighbor Embedding (t-SNE) work at a high level?
Q. What is Factor Analysis, and how does it differ from PCA?
Q. What is Independent Component Analysis (ICA), and how is it distinguished from PCA?
Q. What is Kernel PCA?
Q. What is Principal Component Analysis (PCA), and how does it differ from clustering?
Q. Pros and Cons of Gaussian Mixture Models (GMM) Clustering
Q. How does the EM algorithm (in the context of GMM) compare to K-Means?
Q. What are some options for identifying the number of components in a GMM?
Q. What is a Gaussian Mixture Model (GMM)?
Q. What is Expectation-Maximization (EM)?
Q. What is Spectral co-clustering?
Q. What is Bi-Clustering? What are possible use cases of it?
Q. What is Spectral Clustering?
Q. How does DBSCAN Clustering work, and in what cases is it useful?
Q. How is clustering affected by high-dimensional data, and how can the quality of clusters generated be improved in such cases?
Q. What are some options for clustering on categorical data? What if the dataset contains a combination of numeric and categorical features?
Q. What are some of the pros and cons of hierarchical clustering compared to K-Means?
Q. What is a dendrogram, and how is it used in hierarchical clustering?
Q. How does imposing connectivity constraints help with Agglomerative clustering?
Q. What are some of the possible linkage types to use in order to form successive clusters?
Q. What are the two ways in which Hierarchical clustering can proceed?
Q. What are the Pros and Cons of K-Means Clustering?
Q. How do outliers affect the clusters formed in K-Means?
Q. How does K-Means++ work?
Q. What is the effect of minimizing the within-cluster sum of squares on the shapes of clusters produced in K-Means?
Q. What loss function does K-Means seek to minimize?
Q. How does the initial choice of centroids affect the K-Means algorithm?
Q. How can you choose the optimal value for ‘k’ in K-Means?
Q. How does K-Means work?
Q. What is KL Divergence?
Q. What is Jaccard Index / Distance?
Q. What is Cosine Similarity?
Q. What is Minkowski Distance?
Q. What is Manhattan Distance?
Q. What is Mahalanobis Distance?
Q. What is Euclidean Distance?
Q. What are some common distance metrics that can be used in clustering?
Q. What is Mutual Information (MI)?
Q. What is Adjusted Rand Index (ARI)?
Q. What is Rand Index?
Q. What is Dunn Index?
Q. What is Silhouette Score?
Q. What is Within Cluster Sum of Squares (WCSS)?
Q. What are some common evaluation metrics in clustering?
Q. What is Model-based Clustering?
Q. What is Hierarchical Clustering?
Other Questions in Machine Learning Interview Questions
How does SVM adjust for classes that cannot be linearly separated?
What is Bi-Clustering? What are possible use cases of it?
How to perform Standardization in case of outliers?
What is a Multilayer Perceptron (MLP) or a Feedforward Neural Network (FNN)?
Distinguish between a Weak learner and a Strong Learner
What are the pros and cons of parametric vs. non-parametric models?