The Value of Publicly Available, Textual and Non-textual Information for Startup Performance Prediction

Ulrich Kaiser*, Johan Moritz Kuhn

*Corresponding author for this work

Publication: Contribution to journal, Journal article, Research, peer-reviewed



We use administrative textual and non-textual data retrieved from publicly available archives to predict the performance of Danish startups at the time of foundation. The performance outcomes we consider are survival, high employment growth, a return on assets above 20 percent, new patent applications, and participation in an innovation subsidy program. All specifications include a base set of variables for legal form, region, ownership, and industry, to which we add variable sets representing firm names, business purpose statements (BPSs), and founder and startup characteristics. To forecast the two innovation-related performance outcomes well, we only need to add a set of variables derived from the BPS texts to the base variables, whereas an accurate prediction of startup survival requires combining the firm name and BPS variables with founder characteristics. An accurate forecast of high employment growth requires combining the BPS variables with the founder characteristics. All information our forecasts require is likely to be easily obtainable, since it must be reported upon business registration in many countries. The substantial accuracy of our predictions for survival, employment growth, new patents, and participation in innovation subsidy programs indicates ample scope for algorithmic scoring models as an additional pillar of funding and innovation support decisions.
Journal: Journal of Business Venturing Insights
Number of pages: 21
Status: Published - Nov. 2020


  • Algorithmic scoring
  • Performance
  • Startup
  • Text as data