I'll try to explain the intuition behind feature exposure and why it matters. I'll also discuss ways to reduce feature exposure (regularization and feature neutralization).

The idea behind feature exposure is as follows: any supervised ML model, viewed from a very high level, is a function that takes an input feature vector (X) and outputs a prediction (y). At training time, the model learns a mapping between the input features and the predictions. With the Numerai data, the underlying process is non-stationary, i.e. features that have great predictive power in one era might have no predictive power, or might even hurt the model's performance, in another era. A model that attributes too much importance to a small set of features might do well in the short run, but is unlikely to perform well in the long run. Feature exposure (more specifically, max feature exposure) is a measure of how well balanced a model's exposure to the features is. Models with lower feature exposures tend to have more consistent performance over the long run.

For a real-life example of this, I refer you to the massive burn in r223 on my primary account. The model I'd used for that round had been performing rather well on live data under another one of my accounts before I decided to flip it over to my primary account. In hindsight, that model was "overfit" on a limited set of features, and when the regime changed, it began burning heavily. To conclude the anecdote, I switched back to a more conservative model from the next round onwards and everything was fine (at least for the next round).

Bear in mind that it's possible to train models with extremely low max feature exposure that aren't very useful in practice. There's a trade-off between feature exposure and correlation: models with very low max feature exposure also tend to have low correlation, while models with high max feature exposure will likely have higher corr but are also more likely to burn in the long run.

The feature exposure metric has changed a bit since I last posted an implementation of it. We've gone from using the Pearson correlation coefficient to Spearman's rank correlation coefficient (the same metric used for CORR), and instead of aggregating the individual feature exposures with the standard deviation, we're now using the root mean square as the aggregation function.

Let's start with a code snippet in Python to calculate maximum feature exposure, the new way. I know there are a lot of people here who use R; I'd appreciate it if anyone proficient in R could post an R version of the snippet below in this thread.

import numpy as np
from scipy.stats import spearmanr

TOURNAMENT_NAME = "kazutsugi"
PREDICTION_NAME = f"prediction_{TOURNAMENT_NAME}"  # prediction column name, as in the example scripts

def feature_exposures(df):
    # Spearman correlation between the predictions and each individual feature
    feature_names = [f for f in df.columns if f.startswith("feature")]
    exposures = []
    for f in feature_names:
        fe = spearmanr(df[PREDICTION_NAME], df[f])[0]
        exposures.append(fe)
    return np.array(exposures)

def max_feature_exposure(df):
    return np.max(np.abs(feature_exposures(df)))

def feature_exposure(df):
    # Root mean square of the individual feature exposures
    return np.sqrt(np.mean(np.square(feature_exposures(df))))

Given the aforementioned changes in the feature exposure metrics, all previous heuristics we had about good feature exposures are no longer valid. The example model has a validation max feature exposure of 0.2905.
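For completeness, here's a small usage sketch of the functions above. It's my own illustration rather than part of the original snippet: validation_df, its era column, and the per-era averaging are assumptions about how you might summarize the numbers, and may not match exactly how the official diagnostics compute them.

# validation_df is assumed to be a pandas DataFrame with an "era" column,
# the usual feature_* columns, and a PREDICTION_NAME column filled with the
# model's predictions.
print("max feature exposure:", max_feature_exposure(validation_df))
print("rms feature exposure:", feature_exposure(validation_df))

# Per-era exposures, averaged across eras, as one way to gauge consistency.
per_era_max = validation_df.groupby("era").apply(max_feature_exposure)
print("mean per-era max feature exposure:", per_era_max.mean())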
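Since feature neutralization is mentioned above as one way to reduce feature exposure, here is a rough sketch of the idea. This is my own illustration, not code from the post: it fits the predictions on the features by least squares and subtracts a fraction of the explained component, over the whole dataset rather than era by era; the function name and the proportion parameter are assumptions.

import numpy as np
import pandas as pd

def neutralize(df, prediction_col=PREDICTION_NAME, proportion=1.0):
    # Remove `proportion` of the predictions' linear exposure to the features.
    feature_cols = [c for c in df.columns if c.startswith("feature")]
    scores = df[prediction_col].values.astype(np.float64)
    exposures = df[feature_cols].values.astype(np.float64)

    # Least-squares fit of the predictions on the features, then subtract
    # the fitted (explained) component.
    fitted = exposures @ np.linalg.lstsq(exposures, scores, rcond=None)[0]
    neutralized = scores - proportion * fitted

    # Rank back into (0, 1] so the result is still a valid set of predictions.
    return pd.Series(neutralized, index=df.index).rank(pct=True)

Lower values of proportion give a partial neutralization, trading a smaller reduction in feature exposure for more of the original correlation, which mirrors the trade-off described above.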