Installing and Using Sqoop

Original post: Installing and Using Sqoop. Author: ericzhang

1. Building and installing

Download the source code from http://sqoop.apache.org/

a) Tools required to compile Sqoop:

* Apache ant (1.7.1)
* Java JDK 1.6

b) Additional tools required to build the documentation:

* asciidoc
* make
* python 2.5+
* xmlto
* tar
* gzip

c) Run the build

ant package

When the build finishes, the compiled distribution can be taken directly from the build directory.
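Before running the build, it can help to confirm that the tools listed above are actually on the PATH. The following is a minimal sketch and not part of the Sqoop build itself: the function name missing_tools and the variables BUILD_TOOLS and DOC_TOOLS are made up for illustration, and the executable names (for example python rather than python2) are assumptions that may differ on your system.

```shell
# Print the name of every given tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools for the build itself, then the extra tools for the documentation.
# (Executable names are assumptions; adjust them for your system.)
BUILD_TOOLS="ant java javac"
DOC_TOOLS="asciidoc make python xmlto tar gzip"
```

One possible use is to run missing_tools $BUILD_TOOLS $DOC_TOOLS and start ant package only when it prints nothing.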

2. Setting up the environment

a) Configure the environment variables

export JAVA_HOME=$HOME/java
export HADOOP_HOME=/home/hadoop/hadoop
export HBASE_HOME=$HOME/hbase
export SQOOP_HOME=/opt/sqoop
export PATH=$HBASE_HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HOME/bin:$PATH:$SQOOP_HOME/bin
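A wrong or missing home directory usually only shows up later as a confusing Sqoop or Hadoop error, so a quick check after setting the variables may be worthwhile. This is a small sketch under the assumption that each variable should point at an existing directory; the helper name check_home is hypothetical.

```shell
# Return 0 if the variable named by $1 is set and points at an existing
# directory; otherwise print a message to stderr and return 1.
check_home() {
    # Indirect lookup: copy the value of the variable whose name is in $1.
    eval "dir=\${$1:-}"
    if [ -z "$dir" ]; then
        echo "$1 is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "$1=$dir is not a directory" >&2
        return 1
    fi
    return 0
}
```

It could be called once per variable, for example check_home JAVA_HOME, check_home HADOOP_HOME, check_home HBASE_HOME and check_home SQOOP_HOME.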

b) Import into HBase

1) Generate the serialization class automatically from a MySQL table or query

sqoop codegen --connect jdbc:mysql://aaa/db --username XXX --password XXX --class-name com.sqoop.relation.QueryResult --outdir src/main/java --query "select id from relation_$I where \$CONDITIONS"
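When the query is wrapped in double quotes, the shell expands every $NAME inside it before Sqoop sees the text. That is wanted for $I, but $CONDITIONS is a placeholder that Sqoop replaces itself, so it has to reach Sqoop literally, which means writing it as \$CONDITIONS. The snippet below only demonstrates the shell behaviour and does not call Sqoop; the shard number 3 is an arbitrary value chosen for illustration.

```shell
I=3              # hypothetical shard number, for illustration only
CONDITIONS=""    # stands in for the usual case where no such shell variable exists

# Unescaped: the shell substitutes $CONDITIONS (empty), so the placeholder is lost.
unescaped="select id from relation_$I where $CONDITIONS"

# Escaped: the backslash keeps the literal text $CONDITIONS for Sqoop to fill in.
escaped="select id from relation_$I where \$CONDITIONS"

echo "$unescaped"   # the line ends right after "where "
echo "$escaped"     # select id from relation_3 where $CONDITIONS
```

Wrapping the whole query in single quotes would also keep $CONDITIONS literal, but then $I would not be expanded either.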

2) Run the import

sqoop import -Dmapred.job.name=$HBASE_TABLE.$PORT.$I -Dmapred.reduce.tasks.speculative.execution=false -Dmapred.map.tasks.speculative.execution=false -Dsqoop.inline.lob.length.max=1073741824 -m 10 --bulk-load-dir /user/hbase/tmp_sqoop_$I --connect jdbc:mysql://aaa:$PORT/fb --username sqoop --password XXX --boundary-query "select min(host), max(host) from relation_$I " --query "select host, guest, time from relation_$I where \$CONDITIONS and unix_timestamp(time) != 0 $OTHER_WHERE_CLAUSE " --hbase-table $HBASE_TABLE --column-family f --hbase-row-key host --split-by host --jar-file $JAR_FILE --class-name 'com.sqoop.relation.QueryResult' > nohup.out.$HBASE_TABLE.$PORT.$I 2>&1

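The parameter highlighted in yellow in the original post (the highlighting has been lost here; from context it is the --bulk-load-dir option) makes the import a bulk load, which can greatly improve import performance and shorten the import time.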
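The import command depends on several shell variables ($I, $PORT, $HBASE_TABLE and others) that the post does not define; they appear to identify one source table shard per run. As a rough sketch of how the per-shard values fit together, the hypothetical helper below prints the job name, bulk-load directory and log file name that the command above would use for a given shard. It only echoes strings and does not invoke Sqoop.

```shell
# Print the per-shard values used by the import command for shard number $1.
# Expects HBASE_TABLE and PORT to be set by the caller.
shard_settings() {
    i=$1
    echo "job_name=$HBASE_TABLE.$PORT.$i"
    echo "bulk_dir=/user/hbase/tmp_sqoop_$i"
    echo "log_file=nohup.out.$HBASE_TABLE.$PORT.$i"
}
```

Printing these values before launching a run is a cheap way to catch an unset variable, which would otherwise produce names such as nohup.out... with empty fields.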