Hadoop Installation Notes

Installation environment:
Host: Windows 7 64-bit Home edition, i5 CPU / 4 GB RAM
VMware version: 9.0.0-812388
Ubuntu version: ubuntu-12.04.1-server-i386
Java version: jdk-6u34-linux-i586
Hadoop: 0.20.2

1. Install the virtual machines and operating system

Use VMware to create three virtual machines: one as the NameNode and the other two as DataNodes. Set each VM's network adapter to bridged mode.

After installing one VM, use VMware's clone feature to copy it into the other two.

2. Change the hostnames

sudo vi /etc/hostname

NameNode server name: hnn; the two DataNodes are named hdn1 and hdn2.

Change the time zone:

sudo dpkg-reconfigure tzdata

3. Give all three hosts static IP addresses

Because a router shares the Internet connection, the IP addresses are on the 192.168.0.x segment.
The benefit of this setup is that the host's Windows 7 system and the three Linux VMs are on the same segment, so an SSH client on the host can connect to the VMs directly.

sudo vi /etc/network/interfaces

auto eth0
iface eth0 inet static

address 192.168.0.128
netmask 255.255.255.0
# the router's address on the 192.168.0.x segment
gateway 192.168.0.1

Set the DNS server:

sudo vi /etc/resolvconf/resolv.conf.d/tail

nameserver 202.102.224.68

sudo restart resolvconf

4. Edit the hosts file

sudo vi /etc/hosts

192.168.0.128 hnn
192.168.0.129 hdn1
192.168.0.130 hdn2

After saving and exiting, restart networking so the new configuration takes effect.

sudo /etc/init.d/networking restart

or

sudo ifconfig eth0 down
sudo ifconfig eth0 up

If the new network addresses do not take effect, reboot the Linux system.

5. Install the SSH service

sudo apt-get install openssh-server

6. Copy the Java installer over USB (needed because the VMs initially used NAT mode and were not on the same network segment as the host)
# Find which device the USB stick is

sudo fdisk -l
sudo mount -t vfat /dev/sdb1 /media/u

# Unmount the USB stick (note: do not run the umount command while your working directory is /media/u, or the system will refuse with "device busy")

sudo umount /media/u

7. Install Java

adam@hnn:~/java$ ./jdk-6u34-linux-i586.bin

After installation, create a symbolic link in the install directory:

adam@hnn:~/java$ ln -s jdk1.6.0_34 jdk

# Set the Java-related environment variables

adam@hnn:~/java$ sudo vi /etc/profile

Append the following at the end of the file:

#set java environment
export JAVA_HOME=/home/adam/java/jdk
export JRE_HOME=/home/adam/java/jdk/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Log out and back in (or run source /etc/profile), then check that the environment variable is set correctly:

echo $JAVA_HOME

/home/adam/java/jdk
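The order in the PATH line above matters: prepending $JAVA_HOME/bin makes this JDK shadow any system-installed Java. A minimal sketch of the effect:

```shell
# Prepend the JDK bin directory (as the /etc/profile lines above do) and show
# that it now comes first in the shell's search order.
JAVA_HOME=/home/adam/java/jdk
PATH=$JAVA_HOME/bin:$PATH
echo "$PATH" | cut -d: -f1
```

The first entry printed is /home/adam/java/jdk/bin, so once the JDK is installed, java resolves to this copy rather than any distribution package.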

8. Configure SSH so the NameNode can ssh to the two DataNodes without a password

ssh-keygen -t rsa
ssh-copy-id hnn
ssh-copy-id hdn1
ssh-copy-id hdn2
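The key generation can also be scripted non-interactively; a sketch (demo_id_rsa is a hypothetical file name for illustration — the run above used the default ~/.ssh/id_rsa):

```shell
# Generate a passphrase-less RSA key pair without any prompts.
# demo_id_rsa is a demo file name; the real setup used the default ~/.ssh/id_rsa.
ssh-keygen -t rsa -N '' -f demo_id_rsa -q
# The public half is what ssh-copy-id appends to ~/.ssh/authorized_keys
# on each target host.
cat demo_id_rsa.pub
```

After the keys are copied, "ssh hdn1" from the NameNode should log in without a password prompt.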

9. Install Hadoop 0.20.2 and edit the configuration files
Note that JAVA_HOME must be set in hadoop-env.sh; setting it only in /etc/profile is not enough.
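For reference, the configuration edits for a minimal 0.20.x cluster like this one could look as follows. This is a sketch, not the exact files from the original notes: the ports 9000/9001 and the replication factor of 2 are assumed conventional values. All files live under hadoop-0.20.2/conf.

```
# conf/hadoop-env.sh — set JAVA_HOME:
export JAVA_HOME=/home/adam/java/jdk

# conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hnn:9000</value>
  </property>
</configuration>

# conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

# conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hnn:9001</value>
  </property>
</configuration>

# conf/masters (node that runs the SecondaryNameNode):
hnn

# conf/slaves:
hdn1
hdn2
```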

After finishing the changes on the NameNode, copy the directory to the other two nodes:

scp -r ./hadoop-0.20.2 hdn1:/home/adam/
scp -r ./hadoop-0.20.2 hdn2:/home/adam/

10. Format the NameNode

hadoop namenode -format
12/10/10 07:09:48 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hnn/192.168.0.128
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
12/10/10 07:09:48 INFO namenode.FSNamesystem: fsOwner=adam,adam,adm,cdrom,sudo,dip,plugdev,lpadmin,sambashare
12/10/10 07:09:48 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/10 07:09:48 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/10 07:09:48 INFO common.Storage: Image file of size 94 saved in 0 seconds.
12/10/10 07:09:48 INFO common.Storage: Storage directory /tmp/hadoop-adam/dfs/name has been successfully formatted.
12/10/10 07:09:48 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hnn/192.168.0.128
************************************************************/

11. Start Hadoop

start-all.sh
starting namenode, logging to /home/adam/hadoop/bin/../logs/hadoop-adam-namenode-hnn.out
hdn1: starting datanode, logging to /home/adam/hadoop/bin/../logs/hadoop-adam-datanode-hdn1.out
hdn2: starting datanode, logging to /home/adam/hadoop/bin/../logs/hadoop-adam-datanode-hdn2.out
hnn: starting secondarynamenode, logging to /home/adam/hadoop/bin/../logs/hadoop-adam-secondarynamenode-hnn.out
starting jobtracker, logging to /home/adam/hadoop/bin/../logs/hadoop-adam-jobtracker-hnn.out
hdn1: starting tasktracker, logging to /home/adam/hadoop/bin/../logs/hadoop-adam-tasktracker-hdn1.out
hdn2: starting tasktracker, logging to /home/adam/hadoop/bin/../logs/hadoop-adam-tasktracker-hdn2.out

12. Check that Hadoop is running
Check the Hadoop version:

hadoop version
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010

jps output on the NameNode:

jps
5720 JobTracker
5652 SecondaryNameNode
5788 Jps
5519 NameNode

jps output on DataNode 1:

jps
3725 Jps
3487 DataNode
3563 TaskTracker

jps output on DataNode 2:

jps
3850 Jps
3487 DataNode
3563 TaskTracker

View the JobTracker web management page on hnn
URL: http://192.168.0.128:50030/jobtracker.jsp
JobTracker page screenshot: http://xingaiming.com/wp-content/uploads/2012/10/hadoop-jobtracker-webpage.png

View the DFS web management page on hnn
URL: http://192.168.0.128:50070/dfshealth.jsp
DFS page screenshot: http://xingaiming.com/wp-content/uploads/2012/10/hadoop-dfs-webpage.png

13. HDFS file operation test

echo "123" > test.txt
# Create a directory
hadoop fs -mkdir test
# List the home directory; the test directory should appear
hadoop fs -ls
Found 1 items
drwxr-xr-x   - adam supergroup          0 2012-10-10 07:54 /user/adam/test
# Upload a file
hadoop fs -put ~/test.txt /user/adam/test/test.txt
12/10/10 07:56:24 INFO datanode.DataNode: Receiving block blk_1839212720376014834_1100 src: /192.168.0.128:33301 dest: /192.168.0.128:50010
12/10/10 07:56:24 INFO DataNode.clienttrace: src: /192.168.0.128:33301, dest: /192.168.0.128:50010, bytes: 4, op: HDFS_WRITE, cliID: DFSClient_1267113820, srvID: DS-640018605-192.168.0.128-50010-1349879442516, blockid: blk_1839212720376014834_1100
12/10/10 07:56:24 INFO datanode.DataNode: PacketResponder 0 for block blk_1839212720376014834_1100 terminating
# View the uploaded file
hadoop fs -cat /user/adam/test/test.txt
123
12/10/10 07:57:30 INFO DataNode.clienttrace: src: /192.168.0.128:50010, dest: /192.168.0.128:33306, bytes: 8, op: HDFS_READ, cliID: DFSClient_326926642, srvID: DS-640018605-192.168.0.128-50010-1349879442516, blockid: blk_1839212720376014834_1100

14. Handling errors on Hadoop startup

After the NameNode server was rebooted, the NameNode process failed to start. The error in hadoop-adam-namenode-hnn.log:

2012-10-10 08:34:54,775 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2012-10-10 08:34:54,845 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /tmp/hadoop-adam/dfs/name does not exist.
2012-10-10 08:34:54,848 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-adam/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

Solution:
The default hadoop.tmp.dir lives under /tmp, which is cleared on reboot, so the NameNode's metadata was lost. Add the following property to core-site.xml and sync the file to both DataNodes,
then create the specified directory on the NameNode, with a dfs/name subdirectory under it, and run hadoop namenode -format again (re-formatting generates a new namespaceID, which triggers the DataNode error below):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/adam/hadoop/tmp/hadoop-adam</value>
</property>

The DataNode startup log shows this error:

2012-10-10 08:53:56,128 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/adam/hadoop-0.20.2/data: namenode namespaceID = 1688337869; datanode namespaceID = 812785141

Solution:
Edit /home/adam/hadoop/data/current/VERSION on each DataNode and change its namespaceID to match the NameNode's namespaceID.
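A sketch of the fix, demonstrated on a local copy of the file. On the real cluster the file is /home/adam/hadoop/data/current/VERSION on each DataNode, Hadoop must be stopped first, and 1688337869 is the NameNode's namespaceID taken from the log message above:

```shell
# Demo copy of a DataNode VERSION file with the stale namespaceID from the log.
printf 'namespaceID=812785141\nstorageType=DATA_NODE\n' > demo_VERSION
# Rewrite the namespaceID to match the NameNode's (1688337869 from the error above).
sed -i 's/^namespaceID=.*/namespaceID=1688337869/' demo_VERSION
cat demo_VERSION
```

Alternatively, deleting the DataNode's data directory and restarting it regenerates the metadata with the right namespaceID, at the cost of the blocks stored on that node.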
