Storing Flume Data to BOS
Last updated: 2024-03-22
Flume
Flume is a distributed, reliable, and highly available system for aggregating large volumes of log data. It lets you plug in custom data senders to collect data, and it can apply simple processing to the data before writing it to a variety of (customizable) data receivers.
Flume supports many sink types; with the HDFS Sink, the collected data can be written to BOS.
Getting Started
1. Download and install apache-flume
Omitted here.
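For reference only, a minimal download-and-extract sketch (the release version and mirror URL below are assumptions; substitute the Flume release you actually use):

# Download a Flume binary release and unpack it under /opt (assumed version 1.11.0)
wget https://downloads.apache.org/flume/1.11.0/apache-flume-1.11.0-bin.tar.gz
tar -xzf apache-flume-1.11.0-bin.tar.gz -C /opt/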
2. Configure the environment
If you already have a Hadoop environment that is configured to access BOS, skip this step; otherwise:
- Download the bos-hdfs JAR into the /opt/apache-flume-1.xx.0-bin/lib directory;
- Add the BOS access settings to Hadoop's core-site.xml and copy the file into the /opt/apache-flume-1.xx.0-bin/conf directory (see the sketch after this list for the relevant properties).
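A sketch of the BOS-related core-site.xml properties is shown below. The property names, the filesystem implementation class, and the endpoint are assumptions based on typical bos-hdfs setups; check the documentation that accompanies your bos-hdfs JAR for the exact keys and values.

<configuration>
    <!-- Credentials of the account that owns the target bucket (assumed key names) -->
    <property>
        <name>fs.bos.access.key</name>
        <value>{your access key}</value>
    </property>
    <property>
        <name>fs.bos.secret.access.key</name>
        <value>{your secret key}</value>
    </property>
    <!-- BOS endpoint of the bucket's region (Beijing shown here as an example) -->
    <property>
        <name>fs.bos.endpoint</name>
        <value>http://bj.bcebos.com</value>
    </property>
    <!-- Filesystem implementation that handles the bos:// scheme (assumed class name) -->
    <property>
        <name>fs.bos.impl</name>
        <value>org.apache.hadoop.fs.bos.BaiduBosFileSystem</value>
    </property>
</configuration>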
3. Create the Flume configuration file
Use Flume's StressSource as the source and a memory channel, and write to BOS through the HDFS sink.
#ss2bos.properties
agent.sources = stress_source
agent.channels = mem_channel
agent.sinks = bos_hdfs_sink
agent.sources.stress_source.type = org.apache.flume.source.StressSource
agent.sources.stress_source.channels = mem_channel
agent.sources.stress_source.size = 1024
agent.sources.stress_source.maxTotalEvents = 1000
agent.sources.stress_source.maxEventsPerSecond = 10
agent.sources.stress_source.batchSize = 10
agent.channels.mem_channel.type = memory
agent.channels.mem_channel.capacity = 1000000
agent.channels.mem_channel.transactionCapacity = 100
agent.sinks.bos_hdfs_sink.channel = mem_channel
agent.sinks.bos_hdfs_sink.type = hdfs
agent.sinks.bos_hdfs_sink.hdfs.useLocalTimeStamp = true
# Include the host in the file prefix to distinguish files from different agents and avoid concurrent-write conflicts
agent.sinks.bos_hdfs_sink.hdfs.filePrefix = %{host}_bos_hdfs_sink
# Replace {your bucket} with the name of your BOS bucket
agent.sinks.bos_hdfs_sink.hdfs.path = bos://{your bucket}/flume/%Y-%m-%d-%H-%M
agent.sinks.bos_hdfs_sink.hdfs.fileType = DataStream
agent.sinks.bos_hdfs_sink.hdfs.writeFormat = Text
agent.sinks.bos_hdfs_sink.hdfs.rollSize = 0
agent.sinks.bos_hdfs_sink.hdfs.rollCount = 100
agent.sinks.bos_hdfs_sink.hdfs.rollInterval = 0
agent.sinks.bos_hdfs_sink.hdfs.batchSize = 100
agent.sinks.bos_hdfs_sink.hdfs.round = true
agent.sinks.bos_hdfs_sink.hdfs.roundValue = 10
agent.sinks.bos_hdfs_sink.hdfs.roundUnit = minute
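With these settings, size-based and time-based rolling are disabled (rollSize and rollInterval are 0), so the sink starts a new file after every 100 events (rollCount = 100). Because round is enabled with roundValue = 10 and roundUnit = minute, the timestamp used in hdfs.path is rounded down to 10-minute boundaries, so output is grouped into one directory per 10 minutes.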
4. Start the Flume agent
./bin/flume-ng agent -n agent -c conf/ -f ss2bos.properties
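The -n agent argument must match the component prefix (agent.) used in ss2bos.properties; to watch the agent's progress on the console, you can append -Dflume.root.logger=INFO,console to the command. After the agent has processed the events, you can check the output with any Hadoop client that uses the same bos-hdfs JAR and core-site.xml (the path below mirrors the hdfs.path configured above):

hadoop fs -ls bos://{your bucket}/flume/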