Automated Feature Engineering Using Meta-Learning for Efficient and Generalizable Data Science Pipelines
DOI:
https://doi.org/10.61453/jods.v20260104
Keywords:
Automated Machine Learning, Feature Engineering, Meta-Learning, Data Pipelines, AutoML
Abstract
Feature engineering remains one of the most time-intensive and expertise-dependent stages in machine learning pipelines, often limiting scalability and reproducibility. Despite advances in automated machine learning, existing systems largely emphasize model and hyperparameter optimization while leaving feature construction partially manual and task-specific. This reveals a critical research gap: the absence of a transferable, experience-driven mechanism capable of generalizing feature engineering knowledge across heterogeneous datasets. To address this limitation, this study proposes a meta-learning–based automated feature engineering framework that models transformation selection as a learnable mapping between dataset meta-characteristics and transformation utility. The framework constructs a reusable meta-knowledge layer trained on historical task–transformation–performance relationships and applies ranked transformation strategies to unseen datasets under computational constraints. Experiments conducted on diverse classification and regression datasets demonstrate that the proposed approach achieves up to 4.2% improvement in F1-score and 8.3% reduction in RMSE compared to raw-feature baselines, while maintaining performance comparable to or exceeding manually engineered pipelines. In addition, development time is reduced by up to 55%, and search complexity decreases by approximately 60% through ranking-based pruning. These findings confirm that feature engineering can be formalized as a transferable meta-learning problem, enabling scalable, efficient, and generalizable data science workflows. The study advances the automation of representation construction and supports the integration of intelligent meta-knowledge reuse in next-generation AutoML systems.
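The mapping from dataset meta-characteristics to transformation utility, followed by ranking under a compute budget, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the meta-learner, the meta-features, or the transformation set, so the nearest-neighbor lookup, the three meta-features, the transformation names, the gains, and the costs below are all hypothetical choices made for illustration.

```python
import math
import statistics


def meta_features(columns):
    """Summarize a dataset (a list of numeric columns) as a small meta-feature vector.

    Assumed meta-features: log row count, log column count, mean absolute skewness.
    """
    n_rows = len(columns[0])
    n_cols = len(columns)
    skews = []
    for col in columns:
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col)
        if sd == 0:
            skews.append(0.0)  # constant column: treat skewness as zero
        else:
            skews.append(abs(sum(((x - mu) / sd) ** 3 for x in col) / len(col)))
    return (math.log10(n_rows), math.log10(n_cols + 1), statistics.fmean(skews))


def rank_transformations(query, meta_knowledge, k=2):
    """Rank transformations by their mean recorded gain on the k most similar past tasks."""
    nearest = sorted(meta_knowledge, key=lambda rec: math.dist(query, rec["meta"]))[:k]
    names = sorted({name for rec in nearest for name in rec["gains"]})
    scores = {
        # A transformation never tried on a neighbor contributes a gain of 0.0.
        name: statistics.fmean([rec["gains"].get(name, 0.0) for rec in nearest])
        for name in names
    }
    # Highest predicted gain first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def select_under_budget(ranked, costs, budget):
    """Greedily keep ranked transformations with positive predicted gain that fit the budget."""
    chosen, remaining = [], budget
    for name, gain in ranked:
        if gain <= 0:
            continue  # pruned: no expected benefit
        if costs[name] <= remaining:
            chosen.append(name)
            remaining -= costs[name]
    return chosen


# Hypothetical meta-knowledge: past tasks with their meta-features and observed gains.
meta_knowledge = [
    {"meta": (3.0, 1.0, 2.0), "gains": {"log": 0.04, "standardize": 0.01}},
    {"meta": (3.2, 1.1, 1.8), "gains": {"log": 0.03, "pairwise_product": 0.02}},
    {"meta": (5.0, 2.0, 0.1), "gains": {"standardize": 0.02, "log": -0.01}},
]
costs = {"log": 1, "standardize": 1, "pairwise_product": 5}  # assumed relative costs

ranked = rank_transformations((3.1, 1.0, 1.9), meta_knowledge, k=2)
selected = select_under_budget(ranked, costs, budget=3)
print(ranked)
print(selected)
```

In this toy setup the expensive `pairwise_product` transformation is ranked second but skipped because it exceeds the remaining budget, which mirrors, in a very simplified way, the ranking-based pruning the abstract describes.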
License
Copyright (c) 2026 Journal of Data Science

This work is licensed under a Creative Commons Attribution 4.0 International License.