Tuesday, March 18, 2008

A Field Worth Studying - Hadoop

Hadoop

.Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.

.Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.

.Hadoop is a Lucene sub-project that contains the distributed computing platform that was formerly a part of Nutch.
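
To make the "many small blocks of work" idea above more concrete, here is a minimal single-process sketch of the MapReduce pattern in plain Python. It does not use Hadoop's API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are made up for illustration, and the grouping step only imitates what a real framework would do across the nodes of a cluster.

```python
from collections import defaultdict


def map_phase(block):
    """Map step: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(blocks):
    """Run map over each block independently, then shuffle and reduce."""
    pairs = []
    for block in blocks:  # on a cluster, each block could be mapped on a different node
        pairs.extend(map_phase(block))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input: each string stands in for one data block.
    blocks = [
        "hadoop runs mapreduce",
        "mapreduce splits work",
        "hadoop stores blocks",
    ]
    print(word_count(blocks))
```

Each block is mapped without reference to any other block, which is what would let a real cluster run the map step on the node that already holds that block.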

Hadoop Resources

Hadoop Study Notes 1: A Brief Introduction

Hadoop Study Notes 2: Installation and Deployment

Getting Started with Hadoop, Part 1

Yahoo!'s bet on Hadoop

Open Source Distributed Computing: Yahoo's Hadoop Support

Yahoo! Launches World's Largest Hadoop Production Application

Hadoop Wikipedia

Google Code for Distributed Systems

Running Hadoop MapReduce on Amazon EC2 and Amazon S3

Building an Inverted Index for an Online E-Book Store

Running Hadoop On Ubuntu Linux (Single-Node Cluster)

Running Hadoop On Ubuntu Linux (Multi-Node Cluster)

MapReduce Resources

Why Should You Care About MapReduce?

Google: MapReduce in a Week

MapReduce

Can Your Programming Language Do This? [Chinese translation]


The University of Washington teaches Hadoop as a course; see the following URL: