Storing Flume Data in BOS
Last updated: 2024-08-23
Flume
Flume is a distributed, reliable, and highly available system for aggregating massive volumes of log data. It lets you plug in custom data senders to collect data, performs simple processing on that data, and writes it to a variety of (customizable) data receivers.
Flume supports multiple sink types; the HDFS sink can be used to store collected data in BOS.
Getting Started
1. Download and install apache-flume
Omitted.
2. Configure the environment
If you already have a Hadoop environment configured for BOS access, skip this step. Otherwise:
- Download the bos-hdfs JAR into the /opt/apache-flume-1.xx.0-bin/lib directory;
- Add the BOS access settings to Hadoop's core-site.xml and copy the file into the /opt/apache-flume-1.xx.0-bin/conf directory.
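As a rough sketch, the BOS-related section of core-site.xml might look like the fragment below. The property names follow common bos-hdfs connector conventions but are assumptions here; check the documentation shipped with your bos-hdfs JAR, and replace the credential and endpoint placeholders with your own values.

```xml
<configuration>
  <!-- Assumed bos-hdfs property names; verify against your bos-hdfs version -->
  <property>
    <name>fs.bos.access.key</name>
    <value>{your access key}</value>
  </property>
  <property>
    <name>fs.bos.secret.access.key</name>
    <value>{your secret key}</value>
  </property>
  <property>
    <name>fs.bos.endpoint</name>
    <value>{your BOS endpoint}</value>
  </property>
</configuration>
```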
3. Create the Flume configuration file
Use Flume's StressSource as the source and a memory channel, and write to BOS over the HDFS protocol.
#ss2bos.properties
agent.sources = stress_source
agent.channels = mem_channel
agent.sinks = bos_hdfs_sink
agent.sources.stress_source.type = org.apache.flume.source.StressSource
agent.sources.stress_source.channels = mem_channel
agent.sources.stress_source.size = 1024
agent.sources.stress_source.maxTotalEvents = 1000
agent.sources.stress_source.maxEventsPerSecond = 10
agent.sources.stress_source.batchSize = 10
agent.channels.mem_channel.type = memory
agent.channels.mem_channel.capacity = 1000000
agent.channels.mem_channel.transactionCapacity = 100
agent.sinks.bos_hdfs_sink.channel = mem_channel
agent.sinks.bos_hdfs_sink.type = hdfs
agent.sinks.bos_hdfs_sink.hdfs.useLocalTimeStamp = true
# Include the host in the file prefix to distinguish writers and avoid concurrent-write conflicts.
# Note: in a properties file, a mid-line # is part of the value, so comments must be on their own lines.
agent.sinks.bos_hdfs_sink.hdfs.filePrefix = %{host}_bos_hdfs_sink
# Replace {your bucket} with your bucket name
agent.sinks.bos_hdfs_sink.hdfs.path = bos://{your bucket}/flume/%Y-%m-%d-%H-%M
agent.sinks.bos_hdfs_sink.hdfs.fileType = DataStream
agent.sinks.bos_hdfs_sink.hdfs.writeFormat = Text
agent.sinks.bos_hdfs_sink.hdfs.rollSize = 0
agent.sinks.bos_hdfs_sink.hdfs.rollCount = 100
agent.sinks.bos_hdfs_sink.hdfs.rollInterval = 0
agent.sinks.bos_hdfs_sink.hdfs.batchSize = 100
agent.sinks.bos_hdfs_sink.hdfs.round = true
agent.sinks.bos_hdfs_sink.hdfs.roundValue = 10
agent.sinks.bos_hdfs_sink.hdfs.roundUnit = minute
4. Start the Flume agent
./bin/flume-ng agent -n agent -c conf/ -f ss2bos.properties
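Because the configuration above caps the run at 1000 events (maxTotalEvents) and rolls a new file every 100 events (rollCount), the run should produce roughly ten files under the configured path. One way to verify the output, assuming the Hadoop CLI is on your PATH and core-site.xml is configured for BOS, is to list the destination directory; the bucket name is a placeholder you must replace:

```shell
# List the files Flume wrote to BOS (replace {your bucket} with your bucket name)
hadoop fs -ls bos://{your bucket}/flume/
```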