Lin J., Dyer C. Data-Intensive Text Processing with MapReduce

PDF file
Size: 1.70 MB

Added by user Shushimora on 16.05.2012 16:48
Description edited on 19.05.2012 09:59

Publisher: Morgan & Claypool, 2010. 176 pp.

MapReduce is a programming model for expressing distributed computations on massive amounts of data and an execution framework for large-scale data processing on clusters of commodity servers. It was originally developed by Google and built on well-known principles in parallel and distributed processing dating back several decades. MapReduce has since enjoyed widespread adoption via an open-source implementation called Hadoop, whose development was led by Yahoo (now an Apache project). Today, a vibrant software ecosystem has sprung up around Hadoop, with significant activity in both industry and academia.
This book is about scalable approaches to processing large amounts of text with MapReduce. Given this focus, it makes sense to start with the most basic question: Why? There are many answers to this question, but we focus on two. First, "big data" is a fact of the world, and therefore an issue that real-world systems must grapple with. Second, across a wide range of text processing applications, more data translates into more effective algorithms, and thus it makes sense to take advantage of the plentiful amounts of data that surround us.

MapReduce Basics
MapReduce Algorithm Design
Inverted Indexing for Text Retrieval
Graph Algorithms
EM Algorithms for Text Processing
Closing Remarks

See also

Cuesta H. Practical Data Analysis

Section: Artificial Intelligence → Data Mining

Packt Publishing, 2013. — 339 p. — ISBN: 978-1-78328-099-5. Transform, model, and visualize your data through hands-on projects, developed in open source tools. Overview 1) Explore how to analyze your data in various innovative ways and turn them into insight. 2) Learn to use the D3js visualization tool for exploratory data analysis. 3) Understand how to work with graphs and...

9.75 MB
added 12.03.2014 01:32
description edited 12.03.2014 07:08

Karau H., Konwinski A., Wendell P., Zaharia M. Learning Spark

Section: Distributed Computing and Systems → Apache Spark

O’Reilly Media, 2015. — 274 p. — e-ISBN: 978-1-4493-5904-1, ISBN10: 1-4493-5904-3. Data in all domains is getting bigger. How can you work with it efficiently? This book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java,...

7.82 MB
added 03.04.2015 14:04
description edited 16.06.2017 19:29

Miner D., Shook A. MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems

Section: Distributed Computing and Systems → Apache Hadoop

O’Reilly Media, 2013. 233 p. ISBN: 1449327176, 978-1449327170. In English. MapReduce is a distributed computing model introduced by Google and used for parallel computation over very large data sets (several petabytes) on computer clusters. MapReduce is a framework for computing certain sets of distributed tasks with...

9.05 MB
added 04.02.2014 03:32
description edited 10.09.2016 10:43

Nisbet R., Elder J., Miner G. Handbook of Statistical Analysis and Data Mining Applications

Section: Artificial Intelligence → Data Mining

Academic Press, 2009. 864 p. ISBN: 0123747651. Authors: Robert Nisbet (Pacific Capital Bank Corporation, Santa Barbara, CA, USA); John Elder (Elder Research, Inc. and the University of Virginia, Charlottesville, USA); Gary Miner (StatSoft, Inc., Tulsa, OK, USA). Description: The Handbook of Statistical Analysis and Data Mining Applications is a comprehensive professional reference book...

41.49 MB
date added unknown
description edited 08.05.2010 23:04

White T. Hadoop: The Definitive Guide

Section: Distributed Computing and Systems → Apache Hadoop

4th Edition. O’Reilly, 2015. 805 p. ISBN: 1491901632. In English. Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want...

11.10 MB
added 01.04.2015 20:10
description edited 10.09.2016 10:43
