By Jian-Yun Nie, Graeme Hirst
Search for information is no longer limited to the native language of the user, but is increasingly extended to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find relevant information written in a language different from that of the query. In addition to the problems of monolingual information retrieval (IR), translation is the key problem in CLIR: one must translate either the query or the documents from one language to another. However, this translation problem is not identical to full-text machine translation (MT): the goal is not to produce a human-readable translation, but a translation suitable for finding relevant documents. Specific translation methods are therefore required. The goal of this book is to provide a comprehensive description of the specific problems arising in CLIR, the solutions proposed in this area, and the remaining problems. The book starts with a general description of the monolingual IR and CLIR problems. Different classes of approaches to translation are then presented: approaches using an MT system, dictionary-based translation, and approaches based on parallel and comparable corpora. In addition, the typical retrieval effectiveness of the different approaches is compared. It is shown that translation approaches specifically designed for CLIR can rival and outperform high-quality MT systems. Finally, the book offers a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR, suggesting that many approaches developed in monolingual IR can be adapted to CLIR. The book can be used as an introduction to CLIR. Advanced readers will also find more technical details and discussions of the remaining research challenges. It is suitable for new researchers who intend to carry out research on CLIR.
Best AI & machine learning books
Describes scientists' attempts to find out how life began, including such topics as spontaneous generation and evolution.
This introductory text on statistical machine translation (SMT) provides all of the theories and methods needed to build a statistical machine translator, such as Google Language Tools and Babelfish. In general, statistical techniques allow automatic translation systems to be built quickly for any language pair using only translated texts and generic software.
Biomedical Natural Language Processing is a comprehensive tour through the classic and current work in the field. It discusses all subjects from both a rule-based and a machine learning approach, and also describes each subject from the perspective of both biological science and clinical medicine. The intended audience is readers who already have a background in natural language processing, but a clear introduction makes it accessible to readers from the fields of bioinformatics and computational biology as well.
- Neural Networks. Advances and Applications
- Machine Translation: Past, Present, Future
- Language Identification Using Spectral and Prosodic Features
- Complex-Valued Neural Networks with Multi-Valued Neurons
- Reviews of Nonlinear Dynamics and Complexity
Additional resources for Cross-language Information Retrieval (Synthesis Lectures on Human Language Technologies)
If the meaning of an ambiguous word depends on a distant word, SMT may fail to account for it. In addition, models used in SMT rely on a set of characteristics observed on the training examples. These characteristics may fail to capture the linguistic phenomena (especially the semantic information) that govern the translation in many cases. As a consequence, the trained translation model is not powerful enough to propose appropriate translations in these cases. Let us use a set of possible queries containing the ambiguous word “drug” to illustrate the possible problems with both rule-based and statistical MT systems.
There are reasons for this. Less strict syntax is required: the task of translating a query (or document) from one language to another in CLIR is not to make it readable by a human being; rather, its goal is to enable the system (computer) to match the query to documents (or the reverse). Therefore, the translation only has to be usable by the IR system, which is often based on keywords. This means that we do not have to obey the strict grammar of the target language when producing such a translation; what matters most is the selection of translation words.
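The point about keyword-based matching can be illustrated with a minimal Python sketch. It is not the book's method: the dictionary `TOY_DICTIONARY`, the helper names `translate_query` and `score`, and the two French documents are all hypothetical, chosen only to show that an ungrammatical bag of translation words is enough to rank documents.

```python
from collections import Counter

# Hypothetical toy English-to-French dictionary (an assumption for this sketch);
# a real system would use a large bilingual lexicon or a learned model.
TOY_DICTIONARY = {
    "major": ["majeur", "violent"],
    "earthquake": ["séisme", "tremblement"],
    "china": ["chine"],
}


def translate_query(query: str) -> Counter:
    """Collect every candidate translation of every query word.

    Word order and target-language grammar are deliberately ignored:
    the result is a bag of keywords, not a readable sentence.
    """
    bag = Counter()
    for word in query.lower().split():
        for translation in TOY_DICTIONARY.get(word, []):
            bag[translation] += 1
    return bag


def score(translated_query: Counter, document: str) -> int:
    """Count the document tokens that appear among the translated query terms."""
    tokens = document.lower().split()
    return sum(1 for token in tokens if token in translated_query)


# Hypothetical target-language collection.
french_docs = [
    "un tremblement de terre violent à wenchuan secoue la chine en 2008",
    "la chine ouvre les jeux olympiques en 2008",
]

query_bag = translate_query("major earthquake China")
ranked = sorted(french_docs, key=lambda doc: score(query_bag, doc), reverse=True)
print(ranked[0])
```

Keeping all candidate translations, rather than picking one, is a design choice of this sketch: a wrong candidate usually costs little in keyword matching, while a missing one can hide a relevant document.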
For example, how can we recognize that the following descriptions describe the same piece of information?
- There is a major earthquake in Wenchuan, China in 2008 (in English).
- Un tremblement de terre violent à Wenchuan secoue la Chine en 2008 (in French).
- 中国汶川08年发生强烈地震。 (in Chinese).
How can we succeed in finding the above information when, in 2010, we ask for “major earthquakes in recent years”? These examples illustrate the main problem in CLIR and MLIR: that of representing and matching the same piece of information or information need in a comparable manner, or within the same representation space, even when it is described in different languages.
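One way to picture a shared representation space is to map the terms of each language to language-independent concept labels and compare the resulting sets. The Python sketch below is only an illustration under strong assumptions: the lexicon `CONCEPT_LEXICON`, the concept labels, and the function `to_concepts` are hypothetical, and the crude substring matching stands in for real tokenization and translation resources.

```python
# Hypothetical per-language lexicons mapping surface terms to shared concept
# labels (an assumption for this sketch, not a real resource).
CONCEPT_LEXICON = {
    "en": {"earthquake": "EARTHQUAKE", "china": "CHINA",
           "wenchuan": "WENCHUAN", "2008": "Y2008"},
    "fr": {"tremblement de terre": "EARTHQUAKE", "chine": "CHINA",
           "wenchuan": "WENCHUAN", "2008": "Y2008"},
    "zh": {"地震": "EARTHQUAKE", "中国": "CHINA",
           "汶川": "WENCHUAN", "08年": "Y2008"},
}


def to_concepts(text: str, lang: str) -> frozenset:
    """Return the set of shared concepts whose terms occur in the text.

    Substring matching is used so that unsegmented Chinese text and
    multi-word French terms are handled without a tokenizer.
    """
    lowered = text.lower()
    return frozenset(
        concept
        for term, concept in CONCEPT_LEXICON[lang].items()
        if term in lowered
    )


english = to_concepts("There is a major earthquake in Wenchuan, China in 2008", "en")
french = to_concepts(
    "Un tremblement de terre violent à Wenchuan secoue la Chine en 2008", "fr"
)
chinese = to_concepts("中国汶川08年发生强烈地震。", "zh")

# The three descriptions collapse to the same concept set.
print(english == french == chinese)

# A query in one language can then be matched against any of them.
query = to_concepts("major earthquakes in recent years", "en")
print(query <= chinese)
```

The sketch leaves the harder part of the example untouched: nothing here resolves “recent years” relative to 2010, which would need temporal reasoning beyond term mapping.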