One final comment on "The Black Swan".
Taleb is correct that it is difficult to say much with confidence, based on empirical data, about events that occur too rarely to appear in your data set. However, I don't really agree that this means you shouldn't even try. There are techniques that help a bit and that may provide useful warnings. The following example comes from a talk I heard many years ago at IBM.
Suppose you are trying to predict the 200-year or 500-year flood (the maximum flow for some river that can be expected over the stated period), and you have only, say, 100 years of data. You can take the maximum flow in each year, model these annual maxima as samples from some underlying probability distribution, and then derive the expected n-year flood. But this is risky, because the real distribution may include occasional samples from a process that didn't happen to operate during the period covered by your empirical data. A real-world example is a hurricane occasionally passing over the watershed in question. If your data set does not include any hurricane years, you may get a completely misleading picture of how large the maximum flood is likely to be over a period long enough to include hurricane years.
But there is something you can do. As well as looking at historical data from the particular watershed you are forecasting, you can look at data from many similar watersheds. In this case, some of those watersheds would have experienced hurricanes, warning you that a process capable of generating extreme events probably also operates occasionally in your particular watershed. This makes your predictions more realistic and may encourage more prudent behavior.
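To make the flood example concrete, here is a minimal Python sketch. It rests on illustrative assumptions and is not necessarily what the IBM talk described. Annual maxima are modeled with a Gumbel (extreme value type I) distribution fitted by the method of moments. "Looking at similar watersheds" is implemented as a simplified index-flood method: each site's record is divided by its own mean annual flood, the normalized records are pooled to fit one regional growth curve, and the result is rescaled to the target site. All data are synthetic. The helper names, the 2% hurricane probability and the 4x hurricane multiplier are hypothetical. Hydrologists often use other distributions (for example GEV or log-Pearson type III) and other fitting methods.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(maxima):
    """Method-of-moments fit of a Gumbel (EV1) distribution.

    Returns (mu, beta), the location and scale parameters.
    """
    mean = statistics.fmean(maxima)
    sd = statistics.stdev(maxima)
    beta = sd * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def return_level(mu, beta, years):
    """Flow exceeded on average once every `years` years under the fitted Gumbel."""
    non_exceedance = 1.0 - 1.0 / years
    return mu - beta * math.log(-math.log(non_exceedance))


def index_flood_estimate(target, neighbors, years):
    """Simplified index-flood method (hypothetical helper).

    Each record is divided by its own mean annual flood (the "index flood"),
    the normalized records are pooled to fit one regional growth curve, and
    the growth factor is rescaled by the target site's mean.
    """
    pooled = []
    for record in [target, *neighbors]:
        index = statistics.fmean(record)
        pooled.extend(x / index for x in record)
    growth = return_level(*fit_gumbel(pooled), years)
    return statistics.fmean(target) * growth


def simulate_annual_maxima(rng, n_years, scale, hurricane_prob):
    """Synthetic annual peak flows (arbitrary units, hypothetical model).

    Ordinary years: scale * Gumbel(loc=1.0, scale=0.25), drawn as
    loc - scale * ln(E) with E ~ Exponential(1). With probability
    `hurricane_prob` a year is a "hurricane year" and its peak is multiplied by 4.
    """
    maxima = []
    for _ in range(n_years):
        flow = scale * (1.0 - 0.25 * math.log(rng.expovariate(1.0)))
        if rng.random() < hurricane_prob:
            flow *= 4.0
        maxima.append(flow)
    return maxima


rng = random.Random(42)
# Target river: 100 years of record that happen to contain no hurricane years.
target = simulate_annual_maxima(rng, 100, scale=1000.0, hurricane_prob=0.0)
# Ten similar watersheds of different sizes, each with a 2% chance of a hurricane year.
neighbors = [
    simulate_annual_maxima(rng, 100, scale=s, hurricane_prob=0.02)
    for s in (500, 600, 800, 900, 1100, 1200, 1300, 1500, 1700, 2000)
]

at_site = return_level(*fit_gumbel(target), 500)
regional = index_flood_estimate(target, neighbors, 500)
print(f"500-year flood, target record only: {at_site:,.0f}")
print(f"500-year flood, pooled index-flood: {regional:,.0f}")
```

The target river's own 100 years contain no hurricane years, so its at-site estimate reflects only ordinary floods. Pooling the normalized neighboring records, some of which do contain hurricane years, produces a heavier regional growth curve and a larger 500-year estimate. That is the kind of warning described above. The pooling step is only as good as the assumption that the watersheds really are similar.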
This technique generalizes. For example, for financial markets you can look at data from outside the United States. Of course, this only mitigates the underlying problem, and you may still be caught by surprise, but it isn't really practical to worry about everything. I agree with Taleb that you should expect the occasional surprise, but not that it is useless even to try to predict and avoid them.