Introduction: In this article, we will guide you through the process of setting up and running Apache Flink on Apache Yarn. We will cover the installation, configuration, and deployment of Flink on Yarn, as well as demonstrate how to use Flink CDC (Change Data Capture) with Flink on Yarn. Let's get started!
Apache Flink is a distributed stream-processing engine that enables real-time analysis and processing of data streams. Apache Yarn is a cluster resource management system that provides a framework for scheduling and managing compute resources in a Hadoop cluster. Running Flink on Yarn lets you leverage the resources of an existing Hadoop cluster for Flink jobs, providing scalability and fault tolerance.
Step 1: Installing Apache Flink
To get started, you need to have Apache Flink installed on your system. You can download the latest version of Flink from the official Flink website or use your package manager to install it. Once you have downloaded the package, follow the installation instructions provided in the Flink documentation.
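Before moving on, it helps to verify that the environment Flink's Yarn client relies on is in place. The following is a small, hypothetical pre-flight check (not part of Flink itself), sketched under the assumption of a bash shell; `check_flink_yarn_env` is a name chosen here for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks that the environment variables Flink's YARN
# client typically relies on are set before you try to submit a job.
check_flink_yarn_env() {
  local status=0
  for var in JAVA_HOME HADOOP_CONF_DIR; do
    # ${!var} is bash indirect expansion: the value of the variable named $var
    if [ -n "${!var}" ]; then
      echo "OK: $var=${!var}"
    else
      echo "MISSING: $var"
      status=1
    fi
  done
  return $status
}
```

Running this before a submission gives a quick hint when a later "cannot find Hadoop configuration" style error would otherwise appear.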
Step 2: Configuring Apache Yarn
Next, you need to configure Apache Yarn to support Flink jobs. Open the Yarn configuration file (usually yarn-site.xml in the Hadoop configuration directory) and add the following properties:
Set yarn.resourcemanager.scheduler.class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler or org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, whichever scheduler suits your cluster. If the cluster also runs MapReduce jobs, make sure yarn.nodemanager.aux-services includes mapreduce_shuffle; Flink itself does not require a dedicated shuffle aux-service.
Step 3: Configuring Apache Flink
Open the Flink configuration file (flink-conf.yaml in the conf directory of the Flink installation) and adjust the following properties. In Yarn mode, jobmanager.rpc.address and jobmanager.rpc.port are assigned dynamically when the Yarn containers start, so you normally leave them untouched; instead, make sure the HADOOP_CONF_DIR (or YARN_CONF_DIR) environment variable points at your Hadoop configuration directory so that Flink's Yarn client can locate the ResourceManager. Set taskmanager.numberOfTaskSlots to the number of slots each TaskManager should offer, according to your cluster's resources, and set parallelism.default to the default parallelism for jobs that do not specify one.
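To make the flink-conf.yaml settings above concrete, a minimal fragment might look like the following; the slot count and parallelism are example values, not recommendations, and should be tuned to your cluster:

```yaml
# flink-conf.yaml — example values only; tune to your cluster
# Number of task slots each TaskManager offers
taskmanager.numberOfTaskSlots: 4
# Default parallelism for jobs that do not set one explicitly
parallelism.default: 2
```

Note that nothing Yarn-specific needs to appear here: the Yarn client picks up the cluster location from the Hadoop configuration referenced by HADOOP_CONF_DIR.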
Step 4: Submitting a Flink Job
With the configuration in place, submit your Flink job to Yarn:
flink run -t yarn-per-job your-flink-job.jar
This command submits the job to Yarn's ResourceManager, which allocates containers and runs it on the cluster. (On Flink versions before 1.11, the equivalent form is flink run -m yarn-cluster your-flink-job.jar.)
Step 5: Using Flink CDC with Flink on Yarn
Flink CDC (Change Data Capture) connectors let Flink jobs consume change streams from databases such as MySQL or PostgreSQL. To use Flink CDC in a job running on Yarn, add the CDC connector for your database as a dependency in your project's pom.xml file or include it in your job JAR file, then submit the job the same way as any other Flink job.
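As an illustration, assuming a Maven project and the MySQL CDC connector, the dependency might look like the following; the version shown is a placeholder that you should replace with one matching your Flink version:

```xml
<!-- Flink CDC connector for MySQL; version is a placeholder -->
<dependency>
    <groupId>com.ververica</groupId>
    <artifactId>flink-connector-mysql-cdc</artifactId>
    <version>2.4.0</version>
</dependency>
```

If the connector is not bundled into the job JAR, it must instead be available on the cluster's classpath, for example in Flink's lib directory.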