Artech House, 2006. — 301 pp.
This book introduces the field of text mining to those interested in organizing, searching, discovering, or communicating biological knowledge, and aims to arm them with a sound appreciation of its main techniques, concerns, challenges, results, and promising future directions. As seen in other areas involving the introduction of new technology in the shape of applied information systems, there is a danger of expectations exceeding reality, leading to disappointment and rejection. Thus, a further aim of this book is to critically examine the state of the art, and to make clear what can be expected of the field at present or in the near future. The reader will find extensive summarization and discussion of the research literature in text mining and of reported systems, geared towards informing and educating rather than addressed to other experts in text mining.
To this end, this book has been conceived as a number of complementary chapters, which target core topics. These chapters were specially commissioned from leading experts around the world, and have undergone a strict peer-reviewing procedure. Each chapter takes its own view of its subject matter. However, the reader will find, on occasion, the same topic being discussed from a different point of view in different chapters. This is due to a deliberate policy of encouraging informative discussion, rather than artificial compartmentalization. The reader will also find differences of opinion, of terminology, and of fundamental approach. Text mining is a complex, dynamic area, with many techniques and approaches being tried out. It would be foolhardy to attempt to gloss over the differences that naturally occur due to this dynamism and complexity, or to give the appearance of consensus where there may be none. Where there is consensus, this has been brought out, and where there are differing voices and views, these have been left untouched.
Thus, the reader will appreciate which areas are controversial, and which are considered mature and a good foundation to build on. For those wishing an approachable, concise explanation of the concerns, techniques, and information problems of molecular biology, viewed from the perspective of how people interact with information and technology, we recommend the article by MacMullen and Denn. Other overviews are referred to throughout the book.
Significantly, text mining does not just provide existing tools for application to the biology domain. A major reason why text miners have engaged so closely with this domain is that it presents a number of challenges, which have necessitated new and different approaches. Challenges range from having to deal with the particular language of the biologist, to building scalable and robust systems, to presenting the results of text mining in ways that are meaningful and informative to the biologist.
Biology also interacts closely with other disciplines (e.g., chemistry and medicine), and this interaction presents further challenges to text miners, who have to deal with interdisciplinary aspects and with user communities that have different views of the same knowledge space and different information needs. An example is that of a cell, which can be described by a bacteriologist, an immunologist, a neurologist, or a biochemist, each from his or her own point of view.
Levels of Natural Language Processing for Text Mining
Lexical, Terminological, and Ontological Resources for Biological Text Mining
Automatic Terminology Management in Biomedicine
Abbreviations in Biomedical Text
Named Entity Recognition
Information Extraction
Corpora and Their Annotation
Evaluation of Text Mining in Biology
Integrating Text Mining with Data Mining