Flink on YARN: Submitting Jobs Using Java API and RESTful Interface

Author: da吃一鲸886 | 2024.03.11 16:37

Summary: Learn how to submit Flink jobs to a YARN cluster using the Java API and the RESTful interface. Explore the steps and best practices for deploying Flink applications on YARN.

Apache Flink is a distributed stream processing framework that enables real-time data analysis at scale. One of the deployment options for Flink is to run it on YARN (Yet Another Resource Negotiator), a resource management and scheduling platform widely used in Hadoop ecosystems. YARN provides elastic resource allocation and fault tolerance for distributed applications like Flink.

When submitting Flink jobs on YARN, you can choose between two methods: using the Java API or the RESTful interface.

Submitting Flink Jobs Using Java API

To submit a Flink job using the Java API, you need to follow these steps:

  1. Set up the Environment: Ensure that you have Flink and YARN properly installed and configured on your cluster.

  2. Define the Job: Build the job topology with the DataStream API. When the job is executed, Flink translates the program into a JobGraph, which represents the topology and configuration of your job.

  3. Configure YARN Deployment: Set the necessary deployment options, such as the execution target and the amount of memory to allocate for the JobManager and TaskManagers. The address of the YARN ResourceManager is not set in Flink itself; it is read from the Hadoop configuration referenced by HADOOP_CONF_DIR or YARN_CONF_DIR.

  4. Submit the Job: Create a StreamExecutionEnvironment (or an ExecutionEnvironment for the DataSet API) and call the executeAsync() method, which submits the job and returns a JobClient without waiting for the job to finish.

Here’s a simplified example of submitting a Flink job using the Java API:

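    import org.apache.flink.api.common.JobID;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.DeploymentOptions;
    import org.apache.flink.configuration.JobManagerOptions;
    import org.apache.flink.configuration.MemorySize;
    import org.apache.flink.core.execution.JobClient;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FlinkYarnJobSubmitter {
        public static void main(String[] args) throws Exception {
            // Configure YARN deployment with Flink options (not YARN scheduler settings).
            // Assumes Flink 1.12+ with flink-yarn and the Hadoop configuration available.
            // Per-job mode is deprecated in newer releases; adjust the target as needed.
            Configuration conf = new Configuration();
            conf.set(DeploymentOptions.TARGET, "yarn-per-job");
            conf.set(JobManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("1600m"));
            // Add other Flink configuration options as needed

            // Set up the execution environment
            final StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment(conf);

            // Build your Flink job topology here (placeholder pipeline)
            env.fromElements(1, 2, 3).print();

            // Submit the job to YARN; executeAsync() returns a JobClient immediately
            JobClient jobClient = env.executeAsync("Flink YARN Job");
            JobID jobId = jobClient.getJobID();
            System.out.println("Job submitted with ID: " + jobId);
        }
    }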

Submitting Flink Jobs Using RESTful Interface

Alternatively, you can submit Flink jobs through the Flink REST API, which is served by the JobManager of a Flink cluster that is already running on YARN (for example, a YARN session). This approach is useful when you want to integrate with external systems or submit jobs programmatically without writing Java submission code.

To use the RESTful interface, you need to:

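  1. Locate the REST Endpoint: The REST API is enabled by default. Its port is controlled by the rest.port parameter in the flink-conf.yaml file (default is 8081). On YARN the port is often assigned dynamically unless rest.bind-port is set, so use the JobManager address reported for the running cluster.

  2. Upload the Job JAR: Jobs are submitted as JAR files rather than as a JSON representation of the JobGraph. Package your job as a JAR and upload it with a POST request to /jars/upload. The response contains the stored file name, whose last path segment is the JAR ID.

  3. Send a POST Request: Use an HTTP client to send a POST request to /jars/<jar-id>/run with a JSON payload carrying run parameters such as entryClass and parallelism. The response contains the ID of the submitted job.

Here’s a high-level overview of how you can use curl to submit a Flink job via the REST API:

    curl -X POST -H "Expect:" -F "jarfile=@<path-to-job-jar>" \
      http://<flink-rest-endpoint>:<rest-port>/jars/upload

    curl -X POST http://<flink-rest-endpoint>:<rest-port>/jars/<jar-id>/run \
      -H "Content-Type: application/json" \
      -d '{"entryClass": "<main-class>", "parallelism": 1}'

In these commands, replace <flink-rest-endpoint> with the hostname or IP address of your Flink JobManager, <rest-port> with the REST API port (default is 8081), <path-to-job-jar> with the path to your job's JAR file, <jar-id> with the JAR ID returned by the upload request, and <main-class> with the fully qualified name of the job's entry class.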
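For callers that prefer Java over curl, the run request can be sketched with the JDK's built-in HTTP client (Java 11+). This is a minimal sketch, not a definitive client: the class name, the base URL, the JAR ID, and the entry class are hypothetical placeholders, and the JSON body carries only the entryClass and parallelism fields.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class JarRunRequestSketch {

    /** Builds the JSON body for POST /jars/<jar-id>/run.
     *  Assumes entryClass is a plain Java class name, so no JSON escaping is applied. */
    static String runBody(String entryClass, int parallelism) {
        return "{\"entryClass\":\"" + entryClass + "\",\"parallelism\":" + parallelism + "}";
    }

    /** Builds (but does not send) the request that starts an uploaded JAR. */
    static HttpRequest runRequest(String baseUrl, String jarId,
                                  String entryClass, int parallelism) {
        URI uri = URI.create(baseUrl + "/jars/" + jarId + "/run");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(runBody(entryClass, parallelism)))
                .build();
    }

    public static void main(String[] args) {
        // All values below are placeholders for illustration only.
        HttpRequest request = runRequest(
                "http://localhost:8081", "example-id_job.jar", "com.example.StreamingJob", 2);
        System.out.println(request.method() + " " + request.uri());
        // To send it against a live cluster:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Keeping request construction separate from sending makes the URL and payload easy to check without a running cluster.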

Best Practices

  • Error Handling: Ensure that your submission scripts or code handle errors gracefully, such as network issues or YARN resource allocation failures.
  • Job Monitoring: Implement job monitoring and alerting to track job status, failures, and resource utilization.
  • Security: Consider enabling security features like SSL/TLS encryption for REST API communication and configuring YARN security settings appropriately.
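As one way to handle transient submission errors, a submission call can be wrapped in a retry loop with exponential backoff. The sketch below is an illustration under stated assumptions rather than part of Flink: the class and method names are hypothetical, the submit action is any Callable you supply (for example, an HTTP call to the REST API), and the simulated failure in main stands in for a real network error.

```java
import java.util.concurrent.Callable;

final class SubmitRetrySketch {

    /** Runs the action, retrying on failure and doubling the delay after each failed attempt. */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread is asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        throw last; // every attempt failed; surface the last error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated submission: fails twice, then returns a placeholder job ID.
        String jobId = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("simulated network failure");
            }
            return "hypothetical-job-id";
        }, 5, 10L);
        System.out.println("Submitted " + jobId + " after " + calls[0] + " attempts");
    }
}
```

Retrying is only safe when a repeated submission cannot start the same job twice, so in practice it is worth checking whether the job already exists before resubmitting.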

By