TY  - JOUR
T1  - Automated Theme Search in ICO Whitepapers
JF  - The Journal of Financial Data Science
SP  - 140
LP  - 158
DO  - 10.3905/jfds.2019.1.011
VL  - 1
IS  - 4
AU  - Fu Chuanjie
AU  - Andrew Koh
AU  - Paul Griffin
Y1  - 2019/10/31
UR  - https://pm-research.com/content/1/4/140.abstract
N2  - The authors explore how topic modeling can be used to automate the categorization of initial coin offerings (ICOs) into different topics (e.g., finance, media, information, professional services, health and social, natural resources) based solely on the content of their whitepapers. The tool was developed by fitting a latent Dirichlet allocation (LDA) model to the text extracted from the ICO whitepapers. After evaluating the automated categorization of whitepapers using statistical and human judgment methods, the authors determine that there is enough evidence to conclude that the LDA model appropriately categorizes the ICO whitepapers. The results from a two-population proportion test show a statistically significant difference between topics in whether an ICO is successfully funded, indicating that the topics are usefully differentiated and suggesting that the topic model could be used to help predict whether an ICO will be successful. TOPICS: Statistical methods, simulations, big data/machine learning. Key Findings: • Categorization of ICO whitepapers can be done via topic modeling with the latent Dirichlet allocation (LDA) model. • Statistical and human judgment methods confirm that there is enough evidence to conclude that the LDA model appropriately categorizes ICO whitepapers. • Statistical tests suggest that the categorization results from the LDA model provide useful information for predicting whether an ICO will be successfully funded.
ER  - 