Which firm characteristics truly add economic value in ML portfolios? Out-of-sample tests show microcaps distort results, some predictors hurt returns, and liquidity and risk signals matter most.
Abstract
We study which firm characteristics drive the economic value of machine learning portfolios. Three results stand out. First, in-sample variable importance overfits and provides little reliable guidance, which highlights the need for out-of-sample evaluation based on economic criteria. Second, the performance of conventional models is dominated by microcaps, which inflate returns and concentrate gains in stocks that are costly to trade; excluding microcaps is essential for meaningful inference. Third, some predictors have negative importance and consistently degrade performance; removing them improves risk-adjusted returns and clarifies which characteristics matter. These findings show that machine learning delivers robust asset pricing insights only when it is subject to economic restrictions.
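As an illustration of what "negative importance" under an economic criterion can mean, the following minimal sketch scores each characteristic by the drop in the annualized Sharpe ratio of an equal-weighted top-minus-bottom decile portfolio when that characteristic is neutralized. This is an assumption, not the authors' procedure: the function names (`long_short_returns`, `annualized_sharpe`, `economic_importance`), the rule of setting a standardized characteristic to zero, the decile cutoff, the linear "model", and the synthetic data are all hypothetical. In practice the scores would come from a model fitted on an earlier training window, so that importance is measured out of sample.

```python
import numpy as np


def long_short_returns(scores, returns, quantile=0.1):
    """Equal-weighted top-minus-bottom quantile return for each month.

    scores, returns: arrays of shape (T, N), one row per month.
    """
    T, N = scores.shape
    k = max(1, int(N * quantile))           # stocks in each leg
    order = np.argsort(scores, axis=1)      # ascending predicted return
    rows = np.arange(T)[:, None]
    short_leg = returns[rows, order[:, :k]].mean(axis=1)
    long_leg = returns[rows, order[:, -k:]].mean(axis=1)
    return long_leg - short_leg


def annualized_sharpe(monthly):
    """Annualized Sharpe ratio of a monthly return series."""
    return np.sqrt(12) * monthly.mean() / monthly.std(ddof=1)


def economic_importance(predict, X, returns):
    """Sharpe ratio lost when each characteristic is neutralized (set to 0).

    X: (T, N, K) standardized characteristics; predict maps X -> (T, N) scores.
    Negative values flag characteristics whose removal raises the Sharpe ratio.
    """
    base = annualized_sharpe(long_short_returns(predict(X), returns))
    importance = np.empty(X.shape[2])
    for j in range(X.shape[2]):
        X_drop = X.copy()
        X_drop[:, :, j] = 0.0               # zero = cross-sectional mean of a z-score
        sr = annualized_sharpe(long_short_returns(predict(X_drop), returns))
        importance[j] = base - sr
    return importance


# Hypothetical synthetic panel: characteristic 0 predicts returns positively,
# characteristic 1 negatively, characteristic 2 not at all.
rng = np.random.default_rng(0)
T, N, K = 120, 500, 3
X = rng.standard_normal((T, N, K))
R = 0.02 * X[:, :, 0] - 0.01 * X[:, :, 1] + 0.1 * rng.standard_normal((T, N))

w = np.array([1.0, 1.0, 0.0])               # deliberately mis-signed weight on char 1


def predict(Z):
    """Stand-in for a fitted model: a fixed linear score."""
    return Z @ w


imp = economic_importance(predict, X, R)
```

Here the mis-signed characteristic 1 receives negative importance: neutralizing it raises the portfolio's Sharpe ratio, which mirrors the pattern the abstract describes for predictors that degrade performance.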