Impala Usage Guide
Last updated: 2024-08-15
Impala
Impala is an MPP (massively parallel processing) SQL query engine for processing large volumes of data stored in a Hadoop cluster. It is open-source software written in C++ and Java. Compared with other SQL engines on Hadoop, it offers high performance and low latency.
Installation Steps
Install the metastore
Install and configure the metastore as described in the "S3-based Presto access" section of the Presto Usage Guide.
Install Impala
- Download the rpm packages (as a tarball) from http://archive.cloudera.com/cdh5/repo-as-tarball/5.14.0/cdh5.14.0-centos6.tar.gz
Run:
tar -zxvf cdh5.14.0-centos6.tar.gz
After extraction, cd into cdh/5.14.0 and start a local HTTP server by running:
python -m SimpleHTTPServer 8092 &
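Note that SimpleHTTPServer is Python 2 only; on hosts with Python 3 the equivalent module is http.server. A quick smoke test of such a local server can be sketched as follows (the /tmp path and test file are illustrative assumptions, not part of the guide):

```shell
# SimpleHTTPServer was renamed in Python 3; http.server is the equivalent.
mkdir -p /tmp/repo-test && echo ok > /tmp/repo-test/ping
cd /tmp/repo-test
python3 -m http.server 8092 &
SERVER_PID=$!
sleep 1
curl -s http://127.0.0.1:8092/ping    # prints: ok
kill "$SERVER_PID"
```

If the curl request prints the file content, yum on this host will be able to fetch packages from the same URL.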
- Configure a local yum repository
vim /etc/yum.repos.d/localimp.repo
[localimp]
name=localimp
baseurl=http://127.0.0.1:8092/
gpgcheck=0
enabled=1
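The same repo file can be written in one step with a heredoc (same content as above):

```shell
# Write the local yum repo definition in one shot.
cat > /etc/yum.repos.d/localimp.repo <<'EOF'
[localimp]
name=localimp
baseurl=http://127.0.0.1:8092/
gpgcheck=0
enabled=1
EOF
```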
- Install with the following command:
yum install -y impala impala-server impala-state-store impala-catalog impala-shell
- Copy Hive's configuration file (metastore-site.xml) to Impala's configuration path:
# Copy the prepared conf to /etc/impala/conf/
cp metastore/conf/metastore-site.xml /etc/impala/conf/hive-site.xml
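Impala reads the metastore address from this hive-site.xml. If the copied file does not already set it, a minimal fragment looks like the following (the thrift host and the default port 9083 are assumptions here; use the values from your metastore setup):

```xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://my-node:9083</value>
  </property>
</configuration>
```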
- Add the S3 settings with vim /etc/impala/conf/core-site.xml, using the following impala-s3 configuration as a reference:
<configuration>
<property>
<name>fs.s3a.block.size</name>
<value>134217728</value>
</property>
<property>
<name>fs.azure.user.agent.prefix</name>
<value>User-Agent: APN/1.0 Hortonworks/1.0 HDP/None</value>
</property>
<property>
<name>fs.s3a.connection.maximum</name>
<value>1500</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>s3a://${bucket}</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>s3.bj.bcebos.com</value>
<description>endpoint</description>
</property>
<property>
<name>fs.s3a.access.key</name>
<value>${AK}</value>
<description>AK</description>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>${SK}</value>
<description>SK</description>
</property>
</configuration>
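A malformed core-site.xml can keep the daemons from starting cleanly, so it is worth confirming that the edited file is well-formed XML before moving on (this check is an addition, not part of the original guide):

```shell
# Parse the file with Python's stdlib XML parser; any syntax error aborts.
python3 -c "import xml.dom.minidom; xml.dom.minidom.parse('/etc/impala/conf/core-site.xml'); print('core-site.xml is well-formed')"
```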
- Edit the bigtop configuration to set JAVA_HOME, and make sure the impala user also has permission to access it. Change the java_home path for bigtop (on all 3 machines):
vim /etc/default/bigtop-utils
export JAVA_HOME=/export/servers/jdk1.8.0_65
- Create a symlink for the MySQL driver:
ln -s mysql-connector-java-5.1.32.jar /usr/share/java/mysql-connector-java.jar
- Start Impala:
service impala-state-store start
service impala-catalog start
service impala-server start
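The daemons take a few seconds to come up, and running impala-shell immediately can fail to connect. A small helper (an addition to the guide; port 25000 is Impala's default coordinator web UI port, as seen in the transcript below) polls a URL until it answers:

```shell
# Poll a URL until it responds with success, or give up after N tries.
wait_for_url() {
  url=$1
  tries=${2:-30}
  for i in $(seq 1 "$tries"); do
    curl -sf "$url" >/dev/null && return 0
    sleep 1
  done
  return 1
}

# Example: wait for the impalad web UI before launching impala-shell.
# wait_for_url http://127.0.0.1:25000/ && impala-shell
```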
After startup, logs can be inspected under /var/log/impala. Run the impala-shell command:
[root@my-node impala]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to my-node:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan 6 13:27:16 PST 2018)
When pretty-printing is disabled, you can use the '--output_delimiter' flag to set
the delimiter for fields in the same row. The default is ','.
***********************************************************************************
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name | comment |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default | Default Hive database |
+------------------+----------------------------------------------+
Fetched 2 row(s) in 0.16s
[my-node:21000] > CREATE DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3';
Query: create DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3'
WARNINGS: Path 's3a://my-bigdata/impala' cannot be reached: Path does not exist.
Fetched 0 row(s) in 2.51s
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name | comment |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| db_on_s3 | |
| default | Default Hive database |
+------------------+----------------------------------------------+
Fetched 3 row(s) in 0.01s
[my-node:21000] > use db_on_s3;
Query: use db_on_s3
[my-node:21000] > create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Query: create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Fetched 0 row(s) in 2.11s
[my-node:21000] > insert into hive_test(a, b) values(1,'tom');
Query: insert into hive_test(a, b) values(1,'tom')
Query submitted at: 2023-09-13 19:20:26 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=ec4463f20d37dfe4:5192e94f00000000
Modified 1 row(s) in 7.57s
[my-node:21000] > insert into hive_test(a, b) values(2,'jerry');
Query: insert into hive_test(a, b) values(2,'jerry')
Query submitted at: 2023-09-13 19:20:42 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=694061adf492a154:4a24912d00000000
Modified 1 row(s) in 1.02s
The newly generated files can be seen under the corresponding path: