Impala Usage Guide

Impala

Impala is an MPP (massively parallel processing) SQL query engine for processing large volumes of data stored in Hadoop clusters. It is open-source software written in C++ and Java. Compared with other SQL engines for Hadoop, it offers high performance and low latency.

Installation Steps

Install the metastore

Install and configure the metastore by following the "S3-based Presto access" section of the Presto Usage Guide.
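For orientation only, a MySQL-backed metastore configuration (metastore-site.xml) typically contains entries along the lines of the sketch below. The connection URL, credentials, host name, and warehouse path are placeholders for values from your own environment, not values taken from this guide; follow the Presto Usage Guide for the authoritative steps.

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://${metastore_db_host}:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${db_user}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${db_password}</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://${metastore_host}:9083</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>s3a://${bucket}/warehouse</value>
  </property>
</configuration>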

Install Impala

  1. Download the RPM package from http://archive.cloudera.com/cdh5/repo-as-tarball/5.14.0/cdh5.14.0-centos6.tar.gz, extract it with tar -zxvf cdh5.14.0-centos6.tar.gz, then cd cdh/5.14.0 and start a local server by running:
python -m SimpleHTTPServer 8092 &
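As an optional check (not part of the original steps), you can confirm that the local server is reachable and serving the extracted files:
curl -s http://127.0.0.1:8092/ | head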
  2. Configure a local yum repository:
vim /etc/yum.repos.d/localimp.repo
[localimp]
name=localimp
baseurl=http://127.0.0.1:8092/ 
gpgcheck=0
enabled=1
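After creating the repo file, it may help to refresh the yum metadata and confirm the new repository is visible (a suggested verification step, not part of the original guide):
yum clean all
yum repolist | grep localimp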
  3. Install Impala with the following command:
yum install -y impala impala-server impala-state-store impala-catalog impala-shell
  4. Copy the Hive configuration file (metastore-site.xml) into Impala's configuration directory:
# Copy the prepared config into /etc/impala/conf/
cp metastore/conf/metastore-site.xml /etc/impala/conf/hive-site.xml
  5. Add the S3 configuration with vim /etc/impala/conf/core-site.xml, referring to the Impala S3 configuration documentation:
<configuration>
  <property>
    <name>fs.s3a.block.size</name>
    <value>134217728</value>
  </property>
  <property>
    <name>fs.azure.user.agent.prefix</name>
    <value>User-Agent: APN/1.0 Hortonworks/1.0 HDP/None</value>
  </property>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>1500</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://${bucket}</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.bj.bcebos.com</value>
    <description>endpoint</description>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>${AK}</value>
    <description>AK</description>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>${SK}</value>
    <description>SK</description>
  </property>
</configuration>
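Before starting Impala, it can save debugging time to confirm that the endpoint and credentials in core-site.xml actually work. One possible check, assuming a Hadoop client with the S3A libraries is available on the node (this step is not part of the original guide):
hadoop fs -ls s3a://${bucket}/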
  6. Modify the bigtop configuration to set JAVA_HOME, making sure the impala user also has access to it. Update the bigtop JAVA_HOME path (on all 3 machines):
vim /etc/default/bigtop-utils
export JAVA_HOME=/export/servers/jdk1.8.0_65
  7. Create a symlink for the MySQL driver:
ln -s mysql-connector-java-5.1.32.jar /usr/share/java/mysql-connector-java.jar
  8. Start Impala:
service impala-state-store start
service impala-catalog start
service impala-server start
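
Once started, you can verify that the three daemons are running, for example (a suggested check, not part of the original guide):
service impala-state-store status
service impala-catalog status
service impala-server status
The Impala daemon web UI on the coordinator (port 25000, as seen in the query output below) also exposes status and query pages.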

After startup, logs are written to the /var/log/impala directory. Run the impala-shell command:

[root@my-node impala]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to my-node:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)

When pretty-printing is disabled, you can use the '--output_delimiter' flag to set
the delimiter for fields in the same row. The default is ','.
***********************************************************************************
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 2 row(s) in 0.16s
[my-node:21000] > CREATE DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3';
Query: create DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3'
WARNINGS: Path 's3a://my-bigdata/impala' cannot be reached: Path does not exist.

Fetched 0 row(s) in 2.51s
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| db_on_s3         |                                              |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 3 row(s) in 0.01s
[my-node:21000] > use db_on_s3;
Query: use db_on_s3
[my-node:21000] > create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Query: create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Fetched 0 row(s) in 2.11s
[my-node:21000] > insert into hive_test(a, b) values(1,'tom');
Query: insert into hive_test(a, b) values(1,'tom')
Query submitted at: 2023-09-13 19:20:26 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=ec4463f20d37dfe4:5192e94f00000000
Modified 1 row(s) in 7.57s
[my-node:21000] > insert into hive_test(a, b) values(2,'jerry');
Query: insert into hive_test(a, b) values(2,'jerry')
Query submitted at: 2023-09-13 19:20:42 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=694061adf492a154:4a24912d00000000
Modified 1 row(s) in 1.02s
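
The inserted rows can be queried back to confirm the writes, for example (an illustrative check, not part of the original transcript):
select * from hive_test;
This should return the two rows (1, 'tom') and (2, 'jerry') inserted above.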

The newly generated files can now be seen under the corresponding path:
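For example, they can be listed with the command below; the path assumes the default table directory under the database LOCATION created above:
hadoop fs -ls s3a://my-bigdata/impala/s3/hive_test/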

