Application of probabilistic methods to Chinese
Abstract
The use of text retrieval methods based on the probabilistic model with Chinese language material is discussed. Since Chinese text has no natural word boundaries, we must either apply a dictionary‐based word segmentation method to the text, or index and search in terms of single Chinese characters. In either case, it becomes important to have a good way of dealing with phrases or contiguous strings of characters; the probabilistic model does not at present have such a facility. Some ad hoc modifications of the probabilistic weighting function and matching method are proposed for this purpose.
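The two indexing options and the phrase problem can be illustrated with a small sketch. It assumes greedy forward maximum matching as the dictionary-based segmenter (one common choice, not necessarily the method used in the paper) and the standard Robertson/Sparck Jones weight with no relevance information, log((N − n + 0.5)/(n + 0.5)). The dictionary, document strings, and document-frequency counts are hypothetical. The sketch does not implement the authors' proposed modifications, which the abstract does not specify; it only shows why plain term weighting over single characters cannot tell a contiguous match from a scattered one.

```python
import math

def single_characters(text):
    """Index units are individual (non-whitespace) characters."""
    return [ch for ch in text if not ch.isspace()]

def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching against a word list.

    Assumption: one common dictionary-based segmenter; unknown
    characters fall back to single-character units.
    """
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

def rsj_weight(N, n):
    """Robertson/Sparck Jones weight with no relevance information."""
    return math.log((N - n + 0.5) / (n + 0.5))

def score(doc_units, query_units, df, N):
    """Sum the weights of query units present anywhere in the document.

    Word order and adjacency are ignored, as in a plain bag-of-terms match.
    """
    doc_set = set(doc_units)
    return sum(rsj_weight(N, df[u]) for u in set(query_units) if u in doc_set)

# Hypothetical data for illustration only.
dictionary = {"中国", "人民", "银行"}
df = {"银": 20, "行": 50}   # document frequencies of each character
N = 1000                    # collection size

query = "银行"              # "bank"
doc_a = "中国银行"          # contains the phrase contiguously
doc_b = "行人在银幕前"      # contains both characters, but not as a phrase

print(forward_max_match("中国人民银行", dictionary))
print(score(single_characters(doc_a), single_characters(query), df, N))
print(score(single_characters(doc_b), single_characters(query), df, N))
```

Both documents receive the same score under single-character indexing, even though only the first contains the query as a contiguous string. Some additional phrase- or adjacency-sensitive component is needed to separate them.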
Citation
Huang, X. and Robertson, S.E. (1997), "Application of probabilistic methods to Chinese", Journal of Documentation, Vol. 53 No. 1, pp. 74-79. https://doi.org/10.1108/EUM0000000007193
Publisher
MCB UP Ltd
Copyright © 1997, MCB UP Limited