Unsupervised Domain-agnostic Identification of Product Names in Social Media Posts

Nicolai Pogrebnyakov

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

Product name recognition is a significant practical problem, spurred by the growing availability of platforms for discussing products, such as social media and the product review features of online marketplaces. Customers, product manufacturers and online marketplaces may want to identify product names in unstructured text to extract important insights, such as the sentiment surrounding a product. Much extant research on product name identification has been domain-specific (e.g., identifying mobile phone models) and has used supervised or semi-supervised methods. With massive numbers of new products released to the market every year, such methods may require retraining on updated labeled data to stay relevant, and may transfer poorly across domains. This research addresses this challenge and develops a domain-agnostic, unsupervised algorithm for identifying product names based on Facebook posts. The algorithm consists of two general steps: (a) candidate product name identification using an off-the-shelf pretrained conditional random fields (CRF) model, part-of-speech tagging and a set of simple patterns; and (b) filtering of candidate names to remove spurious entries using clustering and word embeddings generated from the data.
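The two-step pipeline summarized above can be illustrated with a minimal Python sketch. It is not the author's implementation, and every specific in it is a hypothetical stand-in: the CRF named-entity tags and part-of-speech tags are assumed to be already attached to each token (hard-coded toy values here rather than output from a real pretrained model), the "brand followed by model tokens" pattern is one invented example of a simple pattern, the 2-D word vectors are toy values rather than embeddings trained on posts, and the filtering rule (plain k-means with k = 2, keeping the largest cluster) is only one possible way to use clustering to discard spurious candidates.

```python
from collections import Counter

# Hypothetical token format: (word, POS tag, NER tag). The paper obtains NER tags from an
# off-the-shelf pretrained CRF model; here both tag columns are hard-coded toy values.
Token = tuple[str, str, str]

SENTENCES: list[list[Token]] = [
    [("I", "PRP", "O"), ("love", "VBP", "O"), ("my", "PRP$", "O"),
     ("Samsung", "NNP", "ORGANIZATION"), ("Galaxy", "NNP", "O"), ("S9", "NNP", "O")],
    [("Apple", "NNP", "ORGANIZATION"), ("iPhone", "NNP", "O"), ("X", "NNP", "O"),
     ("battery", "NN", "O"), ("is", "VBZ", "O"), ("great", "JJ", "O")],
    [("Google", "NNP", "ORGANIZATION"), ("Pixel", "NNP", "O"), ("2", "CD", "O"),
     ("camera", "NN", "O")],
    [("Thanks", "UH", "O"), ("Facebook", "NNP", "ORGANIZATION"), ("Team", "NNP", "O")],
]

# Toy 2-D "word embeddings"; real ones would be trained on the post corpus itself.
VECTORS: dict[str, list[float]] = {
    "Samsung": [0.9, 0.1], "Galaxy": [1.0, 0.0], "S9": [0.95, 0.05],
    "Apple": [0.9, 0.2], "iPhone": [1.0, 0.1], "X": [0.8, 0.1],
    "Google": [0.85, 0.2], "Pixel": [1.0, 0.0], "2": [0.7, 0.1],
    "Facebook": [0.2, 0.9], "Team": [0.0, 1.0],
}


def extract_candidates(sentences: list[list[Token]]) -> list[str]:
    """Step (a), hypothetical pattern: an ORGANIZATION token followed by NNP/CD tokens."""
    candidates = []
    for sent in sentences:
        i = 0
        while i < len(sent):
            if sent[i][2] == "ORGANIZATION":
                j = i + 1
                while j < len(sent) and sent[j][1] in ("NNP", "CD"):
                    j += 1
                if j > i + 1:  # brand plus at least one model token
                    candidates.append(" ".join(tok[0] for tok in sent[i:j]))
                    i = j
                    continue
            i += 1
    return candidates


def embed(name: str, vectors: dict[str, list[float]]) -> list[float]:
    """Average the word vectors of a candidate's tokens, skipping unknown words."""
    known = [vectors[w] for w in name.split() if w in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def sqdist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Plain Lloyd's k-means with deterministic initialization (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels: list[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def filter_candidates(candidates: list[str], vectors: dict[str, list[float]],
                      k: int = 2) -> list[str]:
    """Step (b): cluster candidate embeddings and keep the largest cluster
    (hypothetical rule: spurious candidates are assumed to be the minority)."""
    unique = sorted(set(candidates))
    if len(unique) <= k:
        return unique
    labels = kmeans([embed(c, vectors) for c in unique], k)
    largest = Counter(labels).most_common(1)[0][0]
    return [c for c, lab in zip(unique, labels) if lab == largest]


if __name__ == "__main__":
    cands = extract_candidates(SENTENCES)
    print(filter_candidates(cands, VECTORS))
    # prints ['Apple iPhone X', 'Google Pixel 2', 'Samsung Galaxy S9']
```

In this toy run the pattern step also picks up "Facebook Team", and the clustering step separates it from the three phone names because its averaged vector lies far from theirs. Keeping the largest cluster is a simplifying assumption; with real data the number of clusters and the rule for choosing which clusters count as product names would need to be tuned and validated.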
Original language: English
Title of host publication: Proceedings of the 2018 IEEE International Conference on Big Data
Editors: Naoki Abe, Huan Liu, Calton Pu, Xiaohua Hu, Nesreen Ahmed, Mu Qiao, Yang Song, Donald Kossmann, Bing Liu, Kisung Lee, Jiliang Tang, Jingrui He, Jeffrey Saltz
Number of pages: 6
Place of Publication: Los Alamitos, CA
Publisher: IEEE
Publication date: 2019
Pages: 3711-3716
Article number: 8622119
ISBN (Print): 9781538650363
ISBN (Electronic): 9781538650349, 9781538650356
Publication status: Published - 2019
Event: Sixth IEEE International Conference on Big Data. IEEE BigData 2018 - The Westin Seattle, Seattle, United States
Duration: 10 Dec 2018 to 13 Dec 2018
Conference number: 6
http://cci.drexel.edu/bigdata/bigdata2018/index.html

Conference

Conference: Sixth IEEE International Conference on Big Data. IEEE BigData 2018
Number: 6
Location: The Westin Seattle
Country/Territory: United States
City: Seattle
Period: 10/12/2018 to 13/12/2018

Keywords

  • Named entity recognition
  • Social media
  • Product names
  • Facebook
