Unsupervised Domain-agnostic Identification of Product Names in Social Media Posts

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › Peer-reviewed

Abstract

Product name recognition is a significant practical problem, spurred by the greater availability of platforms for discussing products, such as social media and the product review functionalities of online marketplaces. Customers, product manufacturers and online marketplaces may want to identify product names in unstructured text to extract important insights, such as sentiment, surrounding a product. Much extant research on product name identification has been domain-specific (e.g., identifying mobile phone models) and has used supervised or semi-supervised methods. With massive numbers of new products released to the market every year, such methods may require retraining on updated labeled data to stay relevant, and may transfer poorly across domains. This research addresses this challenge and develops a domain-agnostic, unsupervised algorithm for identifying product names based on Facebook posts. The algorithm consists of two general steps: (a) candidate product name identification using an off-the-shelf pretrained conditional random fields (CRF) model, part-of-speech tagging and a set of simple patterns; and (b) filtering of candidate names to remove spurious entries using clustering and word embeddings generated from the data.
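The two-step structure described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the paper's implementation: the pretrained CRF model and part-of-speech tagging are omitted, the regular expression is a hypothetical stand-in for the "simple patterns", and co-occurrence count vectors with mean cosine similarity stand in for the word embeddings and clustering. All function names, the window size and the threshold are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical pattern (not from the paper): one or more capitalized words,
# optionally followed by a model-like token containing a digit ("Galaxy S9").
CANDIDATE = re.compile(r"\b(?:[A-Z][a-z]+)(?: [A-Z][a-z]+)*(?: [A-Z]*\d\w*)?")


def find_candidates(posts):
    """Step (a) stand-in: collect every pattern match as a candidate name."""
    found = set()
    for post in posts:
        found.update(m.group(0) for m in CANDIDATE.finditer(post))
    return found


def context_vector(name, posts, window=2):
    """Count the words within `window` tokens of each mention of `name`.

    This is a crude, data-derived substitute for a learned word embedding.
    """
    counts = Counter()
    for post in posts:
        tokens = post.replace(name, "<NAME>").lower().split()
        for i, tok in enumerate(tokens):
            if tok.startswith("<name>"):
                lo = max(0, i - window)
                counts.update(tokens[lo:i] + tokens[i + 1:i + 1 + window])
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def filter_candidates(candidates, posts, threshold=0.3):
    """Step (b) stand-in: keep candidates whose contexts resemble the others'.

    A candidate is kept when its mean cosine similarity to all other
    candidates reaches `threshold` (an assumed value, not from the paper).
    """
    vectors = {c: context_vector(c, posts) for c in candidates}
    kept = set()
    for c in candidates:
        sims = [cosine(vectors[c], vectors[o]) for o in candidates if o != c]
        if sims and sum(sims) / len(sims) >= threshold:
            kept.add(c)
    return kept


posts = [
    "i love my Galaxy S9 so much",
    "i love my Pixel 3 so much",
    "Yesterday was rainy here",
]
candidates = find_candidates(posts)
print(sorted(filter_candidates(candidates, posts)))  # ['Galaxy S9', 'Pixel 3']
```

In this toy input the pattern also picks up "Yesterday" as a candidate; it is dropped in the second step because its surrounding words share nothing with the contexts of the other two candidates.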
Original language: English
Title: Proceedings of the 2018 IEEE International Conference on Big Data
Editors: Naoki Abe, Huan Liu, Calton Pu, Xiaohua Hu, Nesreen Ahmed, Mu Qiao, Yang Song, Donald Kossmann, Bing Liu, Kisung Lee, Jiliang Tang, Jingrui He, Jeffrey Saltz
Number of pages: 6
Place of publication: Los Alamos, CA
Publisher: IEEE
Publication date: 2019
Pages: 3711-3716
Article number: 8622119
ISBN (Print): 9781538650363
ISBN (Electronic): 9781538650349, 9781538650356
DOI: https://doi.org/10.1109/BigData.2018.8622119
Status: Published - 2019
Event: 2018 IEEE International Conference on Big Data - The Westin Seattle, Seattle, USA
Duration: 10 Dec 2018 to 13 Dec 2018
Conference number: 6
http://cci.drexel.edu/bigdata/bigdata2018/index.html

Conference

Conference: 2018 IEEE International Conference on Big Data
Number: 6
Location: The Westin Seattle
Country: USA
City: Seattle
Period: 10/12/2018 to 13/12/2018

Bibliographical note

CBS Library does not have access to the material

Keywords

  • Named entity recognition
  • Social media
  • Product names
  • Facebook

Cite this

Pogrebnyakov, N. (2019). Unsupervised Domain-agnostic Identification of Product Names in Social Media Posts. In N. Abe, H. Liu, C. Pu, X. Hu, N. Ahmed, M. Qiao, Y. Song, D. Kossmann, B. Liu, K. Lee, J. Tang, J. He, ... J. Saltz (Eds.), Proceedings of the 2018 IEEE International Conference on Big Data (pp. 3711-3716). [8622119] Los Alamos, CA: IEEE. https://doi.org/10.1109/BigData.2018.8622119
Pogrebnyakov, Nicolai. / Unsupervised Domain-agnostic Identification of Product Names in Social Media Posts. Proceedings of the 2018 IEEE International Conference on Big Data. eds. / Naoki Abe ; Huan Liu ; Calton Pu ; Xiaohua Hu ; Nesreen Ahmed ; Mu Qiao ; Yang Song ; Donald Kossmann ; Bing Liu ; Kisung Lee ; Jiliang Tang ; Jingrui He ; Jeffrey Saltz. Los Alamos, CA : IEEE, 2019. pp. 3711-3716
@inproceedings{30414e4b097a488fa8c1ad09b1c8c363,
title = "Unsupervised Domain-agnostic Identification of Product Names in Social Media Posts",
abstract = "Product name recognition is a significant practical problem, spurred by the greater availability of platforms for discussing products such as social media and product review functionalities of online marketplaces. Customers, product manufacturers and online marketplaces may want to identify product names in unstructured text to extract important insights, such as sentiment, surrounding a product. Much extant research on product name identification has been domain-specific (e.g., identifying mobile phone models) and used supervised or semi-supervised methods. With massive numbers of new products released to the market every year such methods may require retraining on updated labeled data to stay relevant, and may transfer poorly across domains. This research addresses this challenge and develops a domain-agnostic, unsupervised algorithm for identifying product names based on Facebook posts. The algorithm consists of two general steps: (a) candidate product name identification using an off-the-shelf pretrained conditional random fields (CRF) model, part-of-speech tagging and a set of simple patterns; and (b) filtering of candidate names to remove spurious entries using clustering and word embeddings generated from the data.",
keywords = "Named entity recognition, Social media, Product names, Facebook",
author = "Nicolai Pogrebnyakov",
note = "CBS Library does not have access to the material",
year = "2019",
doi = "10.1109/BigData.2018.8622119",
language = "English",
isbn = "9781538650363",
pages = "3711--3716",
editor = "Naoki Abe and Huan Liu and Calton Pu and Xiaohua Hu and Nesreen Ahmed and Mu Qiao and Yang Song and Donald Kossmann and Bing Liu and Kisung Lee and Jiliang Tang and Jingrui He and Jeffrey Saltz",
booktitle = "Proceedings of the 2018 IEEE International Conference on Big Data",
publisher = "IEEE",
address = "United States",

}

Pogrebnyakov, N 2019, Unsupervised Domain-agnostic Identification of Product Names in Social Media Posts. in N Abe, H Liu, C Pu, X Hu, N Ahmed, M Qiao, Y Song, D Kossmann, B Liu, K Lee, J Tang, J He & J Saltz (eds), Proceedings of the 2018 IEEE International Conference on Big Data., 8622119, IEEE, Los Alamos, CA, pp. 3711-3716, 2018 IEEE International Conference on Big Data, Seattle, USA, 10/12/2018. https://doi.org/10.1109/BigData.2018.8622119

Unsupervised Domain-agnostic Identification of Product Names in Social Media Posts. / Pogrebnyakov, Nicolai.

Proceedings of the 2018 IEEE International Conference on Big Data. eds. / Naoki Abe; Huan Liu; Calton Pu; Xiaohua Hu; Nesreen Ahmed; Mu Qiao; Yang Song; Donald Kossmann; Bing Liu; Kisung Lee; Jiliang Tang; Jingrui He; Jeffrey Saltz. Los Alamos, CA : IEEE, 2019. pp. 3711-3716 8622119.

TY - GEN

T1 - Unsupervised Domain-agnostic Identification of Product Names in Social Media Posts

AU - Pogrebnyakov, Nicolai

N1 - CBS Library does not have access to the material

PY - 2019

Y1 - 2019

AB - Product name recognition is a significant practical problem, spurred by the greater availability of platforms for discussing products such as social media and product review functionalities of online marketplaces. Customers, product manufacturers and online marketplaces may want to identify product names in unstructured text to extract important insights, such as sentiment, surrounding a product. Much extant research on product name identification has been domain-specific (e.g., identifying mobile phone models) and used supervised or semi-supervised methods. With massive numbers of new products released to the market every year such methods may require retraining on updated labeled data to stay relevant, and may transfer poorly across domains. This research addresses this challenge and develops a domain-agnostic, unsupervised algorithm for identifying product names based on Facebook posts. The algorithm consists of two general steps: (a) candidate product name identification using an off-the-shelf pretrained conditional random fields (CRF) model, part-of-speech tagging and a set of simple patterns; and (b) filtering of candidate names to remove spurious entries using clustering and word embeddings generated from the data.

KW - Named entity recognition

KW - Social media

KW - Product names

KW - Facebook

U2 - 10.1109/BigData.2018.8622119

DO - 10.1109/BigData.2018.8622119

M3 - Article in proceedings

SN - 9781538650363

SP - 3711

EP - 3716

BT - Proceedings of the 2018 IEEE International Conference on Big Data

A2 - Abe, Naoki

A2 - Liu, Huan

A2 - Pu, Calton

A2 - Hu, Xiaohua

A2 - Ahmed, Nesreen

A2 - Qiao, Mu

A2 - Song, Yang

A2 - Kossmann, Donald

A2 - Liu, Bing

A2 - Lee, Kisung

A2 - Tang, Jiliang

A2 - He, Jingrui

A2 - Saltz, Jeffrey

PB - IEEE

CY - Los Alamos, CA

ER -

Pogrebnyakov N. Unsupervised Domain-agnostic Identification of Product Names in Social Media Posts. In Abe N, Liu H, Pu C, Hu X, Ahmed N, Qiao M, Song Y, Kossmann D, Liu B, Lee K, Tang J, He J, Saltz J, eds., Proceedings of the 2018 IEEE International Conference on Big Data. Los Alamos, CA: IEEE. 2019. pp. 3711-3716. 8622119 https://doi.org/10.1109/BigData.2018.8622119