Fast, Consistent Tokenization of Natural Language Text