Using BOS as the Underlying Storage for HBase
Last updated: 2024-09-23
HBase
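HBase is a column-oriented distributed database designed to provide fast random access to large volumes of structured data. Its underlying storage is usually HDFS.
Prerequisites
First, follow the BOS HDFS guide to install and configure BOS HDFS. The Hadoop version installed on this machine is hadoop-3.3.2. Complete the basic trial described in that guide's "Getting Started" section, then set the environment variables: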
export HADOOP_HOME=/opt/hadoop-3.3.2
export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
Installation
1. Prepare the HBase environment
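# Download to a directory of your choice (closer.lua needs ?action=download to return the file rather than a mirror page)
wget "https://www.apache.org/dyn/closer.lua/hbase/2.6.0/hbase-2.6.0-bin.tar.gz?action=download" -O hbase-2.6.0-bin.tar.gz
# Extract, then enter the HBase directory; the remaining commands are run from here
tar zxvf hbase-2.6.0-bin.tar.gz
cd hbase-2.6.0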
2. Configuration
Set JAVA_HOME in conf/hbase-env.sh:
# Use a Java installation already present on the machine, version 1.8 or later
export JAVA_HOME=/usr/java/jdk1.8.0/
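Because the JDK at JAVA_HOME must be at least 1.8, it can help to check this before starting HBase. The script below is a minimal sketch, assuming `java -version` prints its usual `... version "X"` line; the helper name java_version_ok is hypothetical. It accepts both the legacy "1.8.0_292" numbering and the newer "11.0.2" / "17" style.

```shell
#!/bin/sh
# Sketch: check that the JDK at $JAVA_HOME is 1.8 or newer before starting HBase.

# Hypothetical helper: succeeds if a version string such as
# "1.8.0_292", "11.0.2" or "17" denotes Java 8 or later.
java_version_ok() {
  ver="$1"
  major="${ver%%.*}"            # part before the first dot: "1", "11", "17"
  if [ "$major" = "1" ]; then   # legacy scheme 1.<major>.x
    major="${ver#1.}"
    major="${major%%.*}"
  fi
  major="${major%%[!0-9]*}"     # drop suffixes such as "-ea"
  [ -n "$major" ] && [ "$major" -ge 8 ]
}

if [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME}/bin/java" ]; then
  # `java -version` writes e.g. openjdk version "1.8.0_292" to stderr
  ver=$("${JAVA_HOME}/bin/java" -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  if java_version_ok "$ver"; then
    echo "Java $ver at $JAVA_HOME is OK"
  else
    echo "Java $ver at $JAVA_HOME is older than 1.8" >&2
  fi
fi
```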
In conf/hbase-site.xml, configure HBase to store its data on BOS (the properties below go inside the file's <configuration> element):
<property>
<name>hbase.rootdir</name>
<value>bos://{bucket}/hbase</value>
<description>Path where HBase data is persisted. When using BOS object storage, set it to a path with the "bos://{bucket}/" prefix.</description>
</property>
<property>
<name>hbase.wal.dir</name>
<value></value>
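<description>Path for WAL (write-ahead log) data. WAL writes require low latency, so this is usually an HDFS path. If you use BOS instead, make sure the cluster's BOS-HDFS version supports the hflush/hsync interfaces.</description>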
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/var/zookeeper</value>
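<description>Directory where ZooKeeper stores its metadata. If not set, it defaults to a location under /tmp, and the data may be lost when the machine restarts.</description>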
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>false</value>
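<description>Cluster mode: false runs HBase in standalone mode (all daemons in a single JVM); true runs it in distributed mode, either pseudo-distributed or fully distributed.</description>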
</property>
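All hbase-site.xml properties must sit inside a single <configuration> root element. The script below is a minimal sketch, assuming it is run from the HBase directory, that writes the whole file in one go. The bucket name my-bucket and the WAL_DIR variable are hypothetical placeholders; when WAL_DIR is empty the script omits hbase.wal.dir rather than writing an empty value.

```shell
#!/bin/sh
# Sketch: generate a complete conf/hbase-site.xml with the properties above.
# BUCKET and WAL_DIR are hypothetical placeholders; set them before running.
BUCKET="${BUCKET:-my-bucket}"
WAL_DIR="${WAL_DIR:-}"    # e.g. an HDFS path; if empty, hbase.wal.dir is omitted
SITE="conf/hbase-site.xml"

# Print one <property> element with the given name and value.
prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

mkdir -p "$(dirname "$SITE")"
# Keep a backup rather than silently overwriting an existing file.
if [ -f "$SITE" ]; then cp "$SITE" "$SITE.bak"; fi

{
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  prop hbase.rootdir "bos://${BUCKET}/hbase"
  if [ -n "$WAL_DIR" ]; then prop hbase.wal.dir "$WAL_DIR"; fi
  prop hbase.zookeeper.property.dataDir /var/zookeeper
  prop hbase.cluster.distributed false
  echo '</configuration>'
} > "$SITE"
```

Any other settings in the previous file are not carried over; they remain in conf/hbase-site.xml.bak. If hbase.wal.dir is left unset, HBase 2.x falls back to writing WALs under hbase.rootdir, that is, on BOS, so the hflush/hsync requirement above then applies.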
Usage
1. Start HBase
./bin/start-hbase.sh
2. Create a table
./bin/hbase shell
>status # check the cluster status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
Took 0.7840 seconds
>create 'students','name','age' # create the students table with column families name and age
2024-09-02 19:23:25,153 INFO [main] client.HBaseAdmin (HBaseAdmin.java:postOperationResult(3746)) - Operation: CREATE, Table Name: default:students, procId: 9 completed
Created table students
Took 2.4410 seconds
=> Hbase::Table - students
3. Insert data
>put 'students', 'row1', 'name:lastname', 'zhang'
Took 0.0820 seconds
> put 'students', 'row1', 'name:firstname', 'san'
Took 0.0900 seconds
> put 'students', 'row1', 'age', '23'
Took 0.0990 seconds
> put 'students', 'row2', 'name:lastname', 'li'
Took 0.0710 seconds
> put 'students', 'row2', 'name:firstname', 'si'
Took 0.0520 seconds
> put 'students', 'row2', 'age', '30'
Took 0.0920 seconds
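Instead of typing each put interactively, the same commands can be collected in a command file and passed to the HBase shell. The script below is a minimal sketch, assuming it is run from the HBase directory and that the HBase 2.x shell's -n (non-interactive) option is available; the file name students_put.txt is hypothetical. The trailing exit makes sure the shell terminates after the file has run.

```shell
#!/bin/sh
# Sketch: write the six put commands above to a command file and run them in one go.
# The file name students_put.txt is hypothetical.
cat > students_put.txt <<'EOF'
put 'students', 'row1', 'name:lastname', 'zhang'
put 'students', 'row1', 'name:firstname', 'san'
put 'students', 'row1', 'age', '23'
put 'students', 'row2', 'name:lastname', 'li'
put 'students', 'row2', 'name:firstname', 'si'
put 'students', 'row2', 'age', '30'
exit
EOF

# Run only from the HBase directory; -n stops at the first failing command.
if [ -x ./bin/hbase ]; then
  ./bin/hbase shell -n students_put.txt
fi
```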
View the data stored on BOS
4. Scan the whole table
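HBase 2.x keeps each table under <hbase.rootdir>/data/<namespace>/<table>, with one subdirectory per region and, inside each region, one directory per column family holding the HFiles. Cells written with put first go to the WAL and the in-memory MemStore, so HFiles may not appear on BOS until the MemStore is flushed; running flush 'students' in the HBase shell forces a flush. The script below is a minimal sketch that lists the table directory with hadoop fs, assuming the rootdir configured above; the bucket name my-bucket is a hypothetical placeholder.

```shell
#!/bin/sh
# Sketch: list what HBase wrote to BOS for the students table.
# BUCKET is a hypothetical placeholder; HADOOP_HOME follows the prerequisites.
BUCKET="${BUCKET:-my-bucket}"
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop-3.3.2}"
# Tables in the default namespace live under <rootdir>/data/default/<table>.
TABLE_DIR="bos://${BUCKET}/hbase/data/default/students"

echo "Listing ${TABLE_DIR}"
if [ -x "${HADOOP_HOME}/bin/hadoop" ]; then
  # Recursive listing: region directories, column-family directories, HFiles
  "${HADOOP_HOME}/bin/hadoop" fs -ls -R "${TABLE_DIR}"
fi
```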
>scan 'students'
ROW COLUMN+CELL
row1 column=age:, timestamp=2024-09-02T19:37:56.571, value=23
row1 column=name:firstname, timestamp=2024-09-02T19:37:31.480, value=san
row1 column=name:lastname, timestamp=2024-09-02T19:36:09.318, value=zhang
row2 column=age:, timestamp=2024-09-02T19:38:50.066, value=30
row2 column=name:firstname, timestamp=2024-09-02T19:38:38.772, value=si
row2 column=name:lastname, timestamp=2024-09-02T19:38:24.245, value=li
2 row(s)
Took 0.0350 seconds
5. Exit
>quit