ISTE Ltd., John Wiley & Sons, Inc, 2016. — 347 p. — ISBN: 1848219024.
This book is at the very heart of linguistics. It provides the theoretical and methodological framework needed to create a successful linguistic project.
Potential applications of descriptive linguistics include spell-checkers, intelligent search engines, information extractors and annotators, automatic summary producers, automatic translators, and more. These applications have considerable economic potential, and it is therefore important for linguists to make use of these technologies and to be able to contribute to them.
The author provides linguists with tools to help them formalize natural languages and aid in the building of software able to automatically process texts written in natural language (Natural Language Processing, or NLP).
Computers are a vital tool for this, as characterizing a phenomenon using mathematical rules leads to its formalization. NooJ, a linguistic development environment developed by the author, is described and applied in practice to examples of NLP.
Contents:
Acknowledgments.
1. Introduction: the Project
Characterizing a set of infinite size.
Computers and linguistics.
Levels of formalization.
Not applicable.
NLP applications.
Linguistic formalisms: NooJ.
Conclusion and structure of this book.
Exercises.
Internet links.
I. Linguistic Units
2. Formalizing the Alphabet
Bits and bytes.
Digitizing information.
Representing natural numbers.
Encoding characters.
Alphabetical order.
Classification of characters.
Conclusion.
Exercises.
Internet links.
3. Defining Vocabulary
Multiple vocabularies and the evolution of vocabulary.
Derivation.
Atomic linguistic units (ALUs).
Multiword units versus analyzable sequences of simple words.
Conclusion.
Exercises.
Internet links.
4. Electronic Dictionaries
Could editorial dictionaries be reused?
LADL electronic dictionaries.
Dubois and Dubois-Charlier electronic dictionaries.
Specifications for the construction of an electronic dictionary.
Conclusion.
Exercises.
Internet links.
II. Languages, Grammars and Machines
5. Languages, Grammars, and Machines
Definitions.
Generative grammars.
Chomsky-Schützenberger hierarchy.
The NooJ approach.
Conclusion.
Exercises.
Internet links.
6. Regular Grammars
Regular expressions.
Finite-state graphs.
Non-deterministic and deterministic graphs.
Minimal deterministic graphs.
Kleene’s theorem.
Regular expressions with outputs and finite-state transducers.
Extensions of regular grammars.
Conclusion.
Exercises.
Internet links.
7. Context-Free Grammars
Recursion.
Parse trees.
Conclusion.
Exercises.
Internet links.
8. Context-Sensitive Grammars
The NooJ approach.
NooJ contextual constraints.
NooJ variables.
Conclusion.
Exercises.
Internet links.
9. Unrestricted Grammars
Linguistic adequacy.
Conclusion.
Exercise.
Internet links.
III. Automatic Linguistic Parsing
10. Text Annotation Structure
Parsing a text.
Annotations.
Text annotation structure (TAS).
Exercise.
Internet links.
11. Lexical Analysis
Tokenization.
Word forms.
Morphological analyses.
Multiword unit recognition.
Recognizing expressions.
Conclusion.
Exercise.
12. Syntactic Analysis
Local grammars.
Structural grammars.
Conclusion.
Exercises.
Internet links.
13. Transformational Analysis
Implementing transformations.
Theoretical problems.
Transformational analysis with NooJ.
Question answering.
Semantic analysis.
Machine translation.
Conclusion.
Exercises.
Internet links.
Conclusion.
Bibliography.
Index