
Screening patents of ICT in construction using deep learning and NLP techniques

Hengqin Wu (Department of Building and Real Estate, The Hong Kong Polytechnic University, Kowloon, Hong Kong) (School of Management, Harbin Institute of Technology, Harbin, China)
Geoffrey Shen (Department of Building and Real Estate, The Hong Kong Polytechnic University, Kowloon, Hong Kong)
Xue Lin (School of Government, Nanjing University, Nanjing, China)
Minglei Li (Huawei Technologies Co Ltd, Shenzhen, China)
Boyu Zhang (Department of Building and Real Estate, The Hong Kong Polytechnic University, Kowloon, Hong Kong) (Department of Standards and Codes, China Academy of Building Research, Beijing, China)
Clyde Zhengdao Li (College of Civil and Transportation Engineering, Shenzhen University, Shenzhen, China)

Engineering, Construction and Architectural Management

ISSN: 0969-9988

Article publication date: 13 May 2020

Issue publication date: 21 September 2020


Abstract

Purpose

This study proposes an approach to solve a fundamental problem in using query-based methods (i.e. search engines and patent retrieval tools) to screen patents of information and communication technology in construction (ICTC). The problem is that ICTC incorporates a wide variety of techniques, so it cannot be adequately represented by man-made queries. To address this problem, this study develops a binary classifier that uses deep learning and natural language processing (NLP) techniques to automatically determine whether a patent is relevant to ICTC. The classifier can then be used to accurately screen a corpus of ICTC patents.

Design/methodology/approach

This study employs NLP techniques to convert the textual data of patents into numerical vectors. A supervised deep learning model is then developed to learn the mapping from these input vectors to the output labels (relevant or not relevant to ICTC).
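The abstract does not say which NLP representation turns patent text into numerical vectors. The sketch below assumes a plain TF-IDF bag-of-words representation, which is a swapped-in technique and not necessarily the authors' method. It shows how each patent text could become a numerical vector over a shared vocabulary. The helper names `tokenize` and `tfidf_vectors` are hypothetical, and the simple regex tokenizer does not reproduce the study's lemmatization or POS tagging.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, simplified tokenizer: lowercase and keep alphabetic runs.
    # It does NOT perform the lemmatization / POS tagging used in the study.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Convert each document into an L2-normalised TF-IDF vector (assumed scheme)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        # Term frequency (relative count) weighted by IDF.
        vec = [counts[t] / total * idf[t] for t in vocab]
        # L2-normalise so patents of different lengths are comparable.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


if __name__ == "__main__":
    patents = [
        "sensor network for construction site monitoring",
        "pharmaceutical composition for treating disease",
    ]
    vocab, vecs = tfidf_vectors(patents)
    for text, vec in zip(patents, vecs):
        print(text, [round(v, 3) for v in vec])
```

In this scheme, terms that occur in every document (such as "for") receive a lower weight than distinctive terms (such as "construction"). The resulting vectors are what a supervised classifier would take as input.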

Findings

The validation results indicate two things. First, the proposed approach performs better in screening ICTC patents than traditional machine learning methods. Second, the approach can accurately screen patents from two sources. One is the United States Patent and Trademark Office (USPTO), whose patents are structured and well written. The other is the Derwent Innovations Index (DIX), whose patents are written in different styles.

Practical implications

This study contributes a dedicated collection of ICTC patents, which the patent offices do not provide.

Social implications

The proposed approach offers an alternative way of gathering a corpus of patents for domains like ICTC. Such domains neither exist as a searchable classification in patent offices nor can be accurately represented by man-made queries.

Originality/value

A deep learning model with two layers of neurons is developed to learn the non-linear relations between the input features and the outputs. This model provides better performance than traditional machine learning models. This study uses advanced NLP techniques, namely lemmatization and part-of-speech (POS) tagging, to process the textual data of ICTC patents. It also contributes a dedicated collection of ICTC patents, which the patent offices do not provide.


Acknowledgements

We are grateful for the extremely helpful feedback we received from Xiao Li, Juan Huang, Bingxia Sun, and other members of the Sustainable Construction Lab, The Hong Kong Polytechnic University, Hong Kong.

Funding: This research was supported by the National Natural Science Foundation of China (NSFC) (No. 71771067, No. 71801159), the National Natural Science Foundation of Guangdong Province (No. 2018A030310534), and Youth Fund of Humanities and Social Sciences Research of the Ministry of Education (No. 18YJCZH090).

Citation

Wu, H., Shen, G., Lin, X., Li, M., Zhang, B. and Li, C.Z. (2020), "Screening patents of ICT in construction using deep learning and NLP techniques", Engineering, Construction and Architectural Management, Vol. 27 No. 8, pp. 1891-1912. https://doi.org/10.1108/ECAM-09-2019-0480

Publisher

Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited
