Web-enhanced unmanned aerial vehicle target search method combining imitation learning and reinforcement learning

Tao Pang (East China Institute of Computing Technology, Shanghai, China)
Wenwen Xiao (School of Computer Engineering and Science, Shanghai University, Shanghai, China)
Yilin Liu (School of Computer Engineering and Science, Shanghai University, Shanghai, China)
Tao Wang (School of Computer Engineering and Science, Shanghai University, Shanghai, China)
Jie Liu (School of Computer Engineering and Science, Shanghai University, Shanghai, China)
Mingke Gao (The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai, China)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 1 April 2024

Issue publication date: 30 April 2024

Abstract

Purpose

This paper studies how an agent can learn from expert demonstration data while also incorporating reinforcement learning (RL). Combining the two enables the agent to break through the limitations of the expert demonstration data and reduces the dimensionality of the agent’s exploration space, which speeds up training convergence.

Design/methodology/approach

First, a decaying weight function is introduced into the agent’s training objective to combine the two methods, so that both RL and imitation learning (IL) guide the agent’s behavior when the policy is updated. Second, this study designs a coupled utilization method for the demonstration trajectories and the training experience, so that samples from both sources can be combined during the agent’s learning process, improving both data utilization and the agent’s learning speed.
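The two design steps above can be illustrated with a minimal sketch. The abstract does not give the form of the decay weight function or the sampling rule, so everything here is an assumption made for illustration: an exponential schedule, a weight applied to the IL term only, and a batch whose demonstration share follows the same weight. The names `il_weight`, `combined_loss`, `sample_mixed_batch`, `w0` and `decay_rate` are hypothetical and are not taken from the paper.

```python
import math
import random


def il_weight(step, w0=1.0, decay_rate=1e-3):
    """Weight on the imitation term at a given training step.

    Assumed form: exponential decay from w0 toward zero.
    """
    return w0 * math.exp(-decay_rate * step)


def combined_loss(rl_loss, il_loss, step):
    """RL loss plus an imitation loss scaled by the decaying weight.

    Early in training the imitation term dominates; later the RL term
    takes over, so the agent is not bound by the demonstrations.
    """
    return rl_loss + il_weight(step) * il_loss


def sample_mixed_batch(demo_buffer, agent_buffer, batch_size, step, rng=random):
    """Draw one batch from both demonstration and agent transitions.

    Assumption: the demonstration share of the batch follows il_weight,
    so it shrinks as training proceeds.
    """
    n_demo = min(len(demo_buffer), round(batch_size * il_weight(step)))
    n_agent = min(len(agent_buffer), batch_size - n_demo)
    return rng.sample(demo_buffer, n_demo) + rng.sample(agent_buffer, n_agent)


if __name__ == "__main__":
    demos = [("demo", i) for i in range(100)]      # placeholder transitions
    experience = [("agent", i) for i in range(200)]
    for step in (0, 1000, 5000):
        batch = sample_mixed_batch(demos, experience, 32, step)
        n_demo = sum(1 for source, _ in batch if source == "demo")
        print(step, round(il_weight(step), 3), n_demo, len(batch) - n_demo)
```

Under these assumptions, a batch drawn at step 0 consists entirely of demonstration samples, while a batch drawn late in training consists almost entirely of the agent’s own experience. Other schedules (linear, step-wise) would fit the same structure.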

Findings

The method outperforms other algorithms in convergence speed and decision stability. It avoids having to learn reward values from scratch and breaks through the restrictions imposed by the demonstration data.

Originality/value

Building on the experience contained in the demonstration trajectories, the agent can adapt to dynamic scenes through exploration and trial-and-error mechanisms. The demonstration data set used in IL and the experience samples obtained during RL are coupled, which improves data utilization efficiency and the agent’s generalization ability.

Acknowledgements

The research reported in this paper was supported in part by the National Key Research and Development Program of China under Grant No. 2022YFB4500900.

Citation

Pang, T., Xiao, W., Liu, Y., Wang, T., Liu, J. and Gao, M. (2024), "Web-enhanced unmanned aerial vehicle target search method combining imitation learning and reinforcement learning", International Journal of Web Information Systems, Vol. 20 No. 3, pp. 324-337. https://doi.org/10.1108/IJWIS-10-2023-0186

Publisher

Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited
