The Value of Publicly Available, Textual and Non-textual Information for Startup Performance Prediction

Ulrich Kaiser*, Johan Moritz Kuhn

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

We use administrative textual and non-textual data retrieved from publicly available archives to predict the performance of Danish startups at the time of foundation. The performance outcomes we consider are survival, high employment growth, a return on assets above 20 percent, new patent applications and participation in an innovation subsidy program. All specifications include a base set of variables for legal form, region, ownership and industry, to which we add variable sets representing firm names, business purpose statements (BPSs), and founder and startup characteristics. To forecast the two innovation-related performance outcomes well, we only need to add a set of variables derived from the BPS texts to the base variables, while an accurate prediction of startup survival requires combining the firm name and BPS variables with founder characteristics. An accurate forecast of high employment growth requires combining the BPS variables with the founder characteristics. All information our forecasts require is likely to be easily obtainable, since reporting it is mandatory upon business registration in many countries. The substantial accuracy of our predictions for survival, employment growth, new patents and participation in innovation subsidy programs indicates ample scope for algorithmic scoring models as an additional pillar of funding and innovation support decisions.
Original language: English
Article number: e00179
Journal: Journal of Business Venturing Insights
Volume: 14
Number of pages: 21
ISSN: 2352-6734
Publication status: Published - Nov 2020

Keywords

  • Algorithmic scoring
  • Performance
  • Prediction
  • Startup
  • Text as data
