Feature Selection Techniques (Keep Original Features)
1. Filter Methods
Evaluate features using statistical tests, independent of any model.
- Correlation Coefficient: Drop one feature from each highly correlated pair, since the two carry redundant information.
- Chi-Square Test: Tests independence between categorical features and target variable.
- ANOVA: Compares group means to find significant features.
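The three filter methods above can be sketched with scikit-learn and NumPy. The dataset, the |r| > 0.9 cutoff, and k=2 are illustrative choices, not prescriptions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif

X, y = load_iris(return_X_y=True)  # 4 numeric features, 3 classes

# Correlation filter: flag one feature from any pair with |r| > 0.9
corr = np.abs(np.corrcoef(X, rowvar=False))
upper = np.triu(corr, k=1)  # look at each pair only once
to_drop = [j for j in range(X.shape[1]) if (upper[:, j] > 0.9).any()]

# Chi-square test: requires non-negative features (iris values are >= 0)
X_chi = SelectKBest(score_func=chi2, k=2).fit_transform(X, y)

# ANOVA F-test: scores features by how well class means separate
X_anova = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

print(to_drop, X_chi.shape, X_anova.shape)
```

Note that the correlation filter never looks at `y` at all, while chi-square and ANOVA score each feature against the target.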
2. Wrapper Methods
Use model performance to evaluate different subsets of features.
- Forward Selection: Start with none, add one feature at a time.
- Backward Elimination: Start with all, remove one at a time.
- Recursive Feature Elimination (RFE): Iteratively remove least important features.
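All three wrapper strategies are available in scikit-learn; a minimal sketch on synthetic data (the sample sizes, estimator, and target of 4 features are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)
est = LogisticRegression(max_iter=1000)

# Forward selection: start empty, greedily add the best feature each round
forward = SequentialFeatureSelector(est, n_features_to_select=4,
                                    direction="forward", cv=3).fit(X, y)

# Backward elimination: start full, greedily drop the worst feature each round
backward = SequentialFeatureSelector(est, n_features_to_select=4,
                                     direction="backward", cv=3).fit(X, y)

# RFE: fit, discard the lowest-weight feature, refit, repeat
rfe = RFE(est, n_features_to_select=4).fit(X, y)

print(forward.get_support(), backward.get_support(), rfe.support_)
```

The cost is visible here: every candidate subset triggers a model fit, which is why wrapper methods scale poorly to hundreds of features.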
3. Embedded Methods
Feature selection is performed during model training.
- LASSO (L1 Regularization): Shrinks some coefficients to zero.
- Tree-based Methods: Use feature importance from Random Forest, XGBoost, etc.
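Both embedded approaches can be sketched in a few lines of scikit-learn. The diabetes dataset and alpha=1.0 are illustrative; with this penalty strength the L1 term typically zeroes out the weaker coefficients:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)  # 10 features

# LASSO: the L1 penalty drives some coefficients exactly to zero,
# so selection falls out of the fit itself
lasso = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)

# Tree-based: rank features by impurity-based importance
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]

print(selected, ranking)
```

One caveat worth knowing: impurity-based importances are biased toward high-cardinality features, so permutation importance is often a sounder ranking.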
Feature Extraction Techniques (Transform Features)
1. Principal Component Analysis (PCA)
Projects data onto orthogonal components that maximize variance.
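A minimal PCA sketch (the dataset and the choice of 2 components are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # note: y is never used

# Project 4-dimensional data onto the 2 directions of maximum variance
pca = PCA(n_components=2).fit(X)
X_pca = pca.transform(X)

print(X_pca.shape, pca.explained_variance_ratio_)
```

`explained_variance_ratio_` tells you how much of the original variance each component retains, which is the usual way to decide how many components to keep.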
2. Linear Discriminant Analysis (LDA)
Supervised method that maximizes class separation for dimensionality reduction.
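LDA looks almost identical in code, but note that it consumes the labels, and the number of components is capped at (number of classes − 1); with 3 iris classes that means at most 2:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Project onto directions that maximize between-class vs. within-class scatter
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
X_lda = lda.transform(X)

print(X_lda.shape)
```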
3. t-Distributed Stochastic Neighbor Embedding (t-SNE)
Reduces dimensionality while preserving local structure. Great for visualization.
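A t-SNE sketch for a 2-D visualization (perplexity=30 is the common default; results vary with it and with the random seed):

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X, _ = load_iris(return_X_y=True)

# Embed into 2-D while preserving local neighborhoods
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(emb.shape)
```

Unlike PCA or LDA, `TSNE` has no `transform` method for unseen data, which is why it is treated as a visualization tool rather than a modeling step.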
4. Autoencoders
Neural networks trained to compress and reconstruct data, learning efficient representations.
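A full autoencoder usually means a deep learning framework, but the idea can be sketched with scikit-learn's `MLPRegressor` trained to reproduce its own input through a narrow hidden layer. The digits dataset and the 8-unit bottleneck are illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

X, _ = load_digits(return_X_y=True)          # 64-dimensional inputs
X = MinMaxScaler().fit_transform(X)

# Autoencoder sketch: input -> 8-unit bottleneck -> reconstructed input
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                  max_iter=500, random_state=0)
ae.fit(X, X)  # target equals input: learn to compress and reconstruct

# The bottleneck activations are the learned low-dimensional codes
codes = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])

print(codes.shape)  # 64 dimensions compressed to 8
```

With nonlinear activations the learned codes can capture structure a linear method like PCA would miss; with a linear activation this setup essentially recovers a PCA-like projection.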
✅ Summary Table
| Technique | Type | Supervised | Pros | Cons |
|---|---|---|---|---|
| Correlation | Filter | ❌ | Fast, interpretable | Ignores interactions |
| Chi-Square / ANOVA | Filter | ✅ | Simple, statistically sound | Assumptions may not hold |
| RFE | Wrapper | ✅ | Model-aware, accurate | Expensive |
| LASSO | Embedded | ✅ | Integrated, efficient | Model-specific |
| PCA | Extraction | ❌ | Captures variance | Uninterpretable components |
| LDA | Extraction | ✅ | Maximizes class separation | Assumes normality |
| t-SNE | Extraction | ❌ | Great for visualization | No transform for new data |
| Autoencoders | Extraction | ❌ (or ✅) | Captures complex patterns | Requires deep learning setup |
Final Thoughts
Whether you're aiming for interpretability or performance, choosing the right dimensionality reduction technique is essential. Feature selection is great for transparency, while feature extraction often delivers higher performance, especially with complex datasets.