It sometimes surprises me to learn that there are people who don't know that one of the first really big datasets used to train and evaluate computational models of language and social interaction was (and still is) a bunch of internal emails from Enron.
Yes, that Enron. Collected as part of the investigation into its collapse.
Enron Corpus - Wikipedia


