public class JobMaster extends org.apache.flink.runtime.rpc.PermanentlyFencedRpcEndpoint<JobMasterId> implements JobMasterGateway, JobMasterService
The JobMaster is responsible for the execution of a single JobGraph.
It offers the following methods as part of its RPC interface to interact with the JobMaster remotely:
- updateTaskExecutionState(org.apache.flink.runtime.taskmanager.TaskExecutionState) updates the task execution state for a given task
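As a rough illustration of how a caller might react to such an RPC method, the sketch below uses a hypothetical stand-in interface (`TaskStateGateway`), a hypothetical `Ack` enum, and a plain `String` in place of Flink's real `JobMasterGateway`, `Acknowledge`, and `TaskExecutionState` types. Only the pattern of returning a `CompletableFuture` is taken from the signatures listed on this page; everything else is an assumption made for the example.

```java
import java.util.concurrent.CompletableFuture;

final class RpcCallSketch {

    /** Hypothetical stand-in for Flink's Acknowledge type. */
    enum Ack { OK }

    /** Hypothetical stand-in for the remote-facing part of a gateway. */
    interface TaskStateGateway {
        CompletableFuture<Ack> updateTaskExecutionState(String taskState);
    }

    /** Sends the update and maps the asynchronous outcome to a log-style message. */
    static CompletableFuture<String> report(TaskStateGateway gateway, String taskState) {
        return gateway.updateTaskExecutionState(taskState)
                .handle((ack, failure) -> failure == null
                        ? "acknowledged: " + taskState
                        : "failed: " + failure.getMessage());
    }

    public static void main(String[] args) {
        // A gateway that always acknowledges the update.
        TaskStateGateway accepting = state -> CompletableFuture.completedFuture(Ack.OK);
        // A gateway whose future completes exceptionally.
        TaskStateGateway rejecting = state -> {
            CompletableFuture<Ack> future = new CompletableFuture<>();
            future.completeExceptionally(new IllegalStateException("unknown task"));
            return future;
        };
        System.out.println(report(accepting, "RUNNING").join());
        System.out.println(report(rejecting, "RUNNING").join());
    }
}
```

Using `handle` keeps the success and failure paths in one place, so the caller never blocks while the remote side processes the update.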
| Modifier and Type | Field and Description |
|---|---|
| static String | JOB_MANAGER_NAME: Default names for Flink's distributed components. |
| Constructor and Description |
|---|
| JobMaster(org.apache.flink.runtime.rpc.RpcService rpcService, JobMasterId jobMasterId, JobMasterConfiguration jobMasterConfiguration, ResourceID resourceId, JobGraph jobGraph, HighAvailabilityServices highAvailabilityService, SlotPoolServiceSchedulerFactory slotPoolServiceSchedulerFactory, JobManagerSharedServices jobManagerSharedServices, HeartbeatServices heartbeatServices, JobManagerJobMetricGroupFactory jobMetricGroupFactory, OnCompletionActions jobCompletionActions, org.apache.flink.runtime.rpc.FatalErrorHandler fatalErrorHandler, ClassLoader userCodeLoader, ShuffleMaster<?> shuffleMaster, PartitionTrackerFactory partitionTrackerFactory, ExecutionDeploymentTracker executionDeploymentTracker, ExecutionDeploymentReconciler.Factory executionDeploymentReconcilerFactory, long initializationTimestamp) |
| Modifier and Type | Method and Description |
|---|---|
| void | acknowledgeCheckpoint(org.apache.flink.api.common.JobID jobID, ExecutionAttemptID executionAttemptID, long checkpointId, CheckpointMetrics checkpointMetrics, org.apache.flink.util.SerializedValue<TaskStateSnapshot> checkpointState) |
| CompletableFuture<Acknowledge> | cancel(org.apache.flink.api.common.time.Time timeout): Cancels the currently executed job. |
| void | declineCheckpoint(DeclineCheckpoint decline) |
| CompletableFuture<CoordinationResponse> | deliverCoordinationRequestToCoordinator(OperatorID operatorId, org.apache.flink.util.SerializedValue<CoordinationRequest> serializedRequest, org.apache.flink.api.common.time.Time timeout): Delivers a coordination request to a specified coordinator and returns the response. |
| void | disconnectResourceManager(ResourceManagerId resourceManagerId, Exception cause): Disconnects the resource manager from the job manager because of the given cause. |
| CompletableFuture<Acknowledge> | disconnectTaskManager(ResourceID resourceID, Exception cause): Disconnects the given TaskExecutor from the JobMaster. |
| void | failSlot(ResourceID taskManagerId, AllocationID allocationId, Exception cause): Fails the slot with the given allocation id and cause. |
| JobMasterGateway | getGateway(): Gets the JobMasterGateway belonging to this service. |
| CompletableFuture<Void> | heartbeatFromResourceManager(ResourceID resourceID): Sends a heartbeat request from the resource manager. |
| CompletableFuture<Void> | heartbeatFromTaskManager(ResourceID resourceID, TaskExecutorToJobManagerHeartbeatPayload payload): Sends the heartbeat from the task manager to the job manager. |
| void | notifyAllocationFailure(AllocationID allocationID, Exception cause): Notifies that the allocation has failed. |
| CompletableFuture<Acknowledge> | notifyKvStateRegistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName, org.apache.flink.queryablestate.KvStateID kvStateId, InetSocketAddress kvStateServerAddress): Notifies that queryable state has been registered. |
| CompletableFuture<Acknowledge> | notifyKvStateUnregistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName): Notifies that queryable state has been unregistered. |
| void | notifyNotEnoughResourcesAvailable(Collection<ResourceRequirement> acquiredResources): Notifies that not enough resources are available to fulfill the resource requirements of a job. |
| CompletableFuture<Acknowledge> | notifyPartitionDataAvailable(ResultPartitionID partitionID, org.apache.flink.api.common.time.Time timeout): Notifies the JobManager about available data for a produced partition. |
| CompletableFuture<Collection<SlotOffer>> | offerSlots(ResourceID taskManagerId, Collection<SlotOffer> slots, org.apache.flink.api.common.time.Time timeout): Offers the given slots to the job manager. |
| protected void | onStart() |
| CompletableFuture<Void> | onStop(): Suspends the job and shuts down all other services, including RPC. |
| CompletableFuture<RegistrationResponse> | registerTaskManager(org.apache.flink.api.common.JobID jobId, TaskManagerRegistrationInformation taskManagerRegistrationInformation, org.apache.flink.api.common.time.Time timeout): Registers the task manager at the job manager. |
| void | reportCheckpointMetrics(org.apache.flink.api.common.JobID jobID, ExecutionAttemptID executionAttemptID, long checkpointId, CheckpointMetrics checkpointMetrics) |
| CompletableFuture<ExecutionGraphInfo> | requestJob(org.apache.flink.api.common.time.Time timeout): Requests the ExecutionGraphInfo of the executed job. |
| CompletableFuture<JobDetails> | requestJobDetails(org.apache.flink.api.common.time.Time timeout): Requests the details of the executed job. |
| CompletableFuture<org.apache.flink.api.common.JobStatus> | requestJobStatus(org.apache.flink.api.common.time.Time timeout): Requests the current job status. |
| CompletableFuture<KvStateLocation> | requestKvStateLocation(org.apache.flink.api.common.JobID jobId, String registrationName): Requests a KvStateLocation for the specified InternalKvState registration name. |
| CompletableFuture<SerializedInputSplit> | requestNextInputSplit(JobVertexID vertexID, ExecutionAttemptID executionAttempt): Requests the next input split for the ExecutionJobVertex. |
| CompletableFuture<ExecutionState> | requestPartitionState(IntermediateDataSetID intermediateResultId, ResultPartitionID resultPartitionId): Requests the current state of the partition. |
| CompletableFuture<Acknowledge> | sendOperatorEventToCoordinator(ExecutionAttemptID task, OperatorID operatorID, org.apache.flink.util.SerializedValue<OperatorEvent> serializedEvent) |
| CompletableFuture<CoordinationResponse> | sendRequestToCoordinator(OperatorID operatorID, org.apache.flink.util.SerializedValue<CoordinationRequest> serializedRequest) |
| CompletableFuture<?> | stopTrackingAndReleasePartitions(Collection<ResultPartitionID> partitionIds): Notifies the JobMasterPartitionTracker to stop tracking the target result partitions and release the locally occupied resources on TaskExecutors if any. |
| CompletableFuture<String> | stopWithSavepoint(String targetDirectory, org.apache.flink.core.execution.SavepointFormatType formatType, boolean terminate, org.apache.flink.api.common.time.Time timeout): Stops the job with a savepoint. |
| CompletableFuture<String> | triggerCheckpoint(org.apache.flink.api.common.time.Time timeout): Triggers taking a checkpoint of the executed job. |
| CompletableFuture<String> | triggerSavepoint(String targetDirectory, boolean cancelJob, org.apache.flink.core.execution.SavepointFormatType formatType, org.apache.flink.api.common.time.Time timeout): Triggers taking a savepoint of the executed job. |
| CompletableFuture<Object> | updateGlobalAggregate(String aggregateName, Object aggregand, byte[] serializedAggregateFunction): Updates the aggregate and returns the new value. |
| CompletableFuture<Acknowledge> | updateTaskExecutionState(TaskExecutionState taskExecutionState): Updates the task execution state for a given task. |
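Many of the methods above take a timeout and return a `CompletableFuture`. The following is a minimal sketch of how a caller could bound its own wait on such a future. It assumes a hypothetical `StatusGateway` interface, uses `java.time.Duration` instead of Flink's `Time`, and uses a plain `String` instead of `JobStatus`. How the real RPC layer applies the timeout is not shown here.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class JobStatusQuerySketch {

    /** Hypothetical stand-in for a gateway method such as requestJobStatus(timeout). */
    interface StatusGateway {
        CompletableFuture<String> requestJobStatus(Duration timeout);
    }

    /** Waits for the status no longer than the timeout; returns the fallback otherwise. */
    static String statusOrFallback(StatusGateway gateway, Duration timeout, String fallback) {
        try {
            return gateway.requestJobStatus(timeout)
                    .get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return fallback; // no answer in time, or the call itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return fallback;
        }
    }

    public static void main(String[] args) {
        StatusGateway responsive = t -> CompletableFuture.completedFuture("RUNNING");
        StatusGateway silent = t -> new CompletableFuture<>(); // never completes
        System.out.println(statusOrFallback(responsive, Duration.ofMillis(50), "UNKNOWN"));
        System.out.println(statusOrFallback(silent, Duration.ofMillis(50), "UNKNOWN"));
    }
}
```

Blocking with `get` is used only to keep the sketch short; composing further stages on the returned future avoids blocking the calling thread.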
Inherited methods (grouped by declaring type):
callAsyncWithoutFencing, getFencingToken, getMainThreadExecutor, getUnfencedMainThreadExecutor, runAsyncWithoutFencing
callAsync, closeAsync, getAddress, getEndpointId, getHostname, getRpcService, getSelfGateway, getTerminationFuture, internalCallOnStart, internalCallOnStop, isRunning, runAsync, scheduleRunAsync, scheduleRunAsync, start, stop, validateRunsInMainThread
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getAddress, getTerminationFuture

public JobMaster(org.apache.flink.runtime.rpc.RpcService rpcService,
JobMasterId jobMasterId,
JobMasterConfiguration jobMasterConfiguration,
ResourceID resourceId,
JobGraph jobGraph,
HighAvailabilityServices highAvailabilityService,
SlotPoolServiceSchedulerFactory slotPoolServiceSchedulerFactory,
JobManagerSharedServices jobManagerSharedServices,
HeartbeatServices heartbeatServices,
JobManagerJobMetricGroupFactory jobMetricGroupFactory,
OnCompletionActions jobCompletionActions,
org.apache.flink.runtime.rpc.FatalErrorHandler fatalErrorHandler,
ClassLoader userCodeLoader,
ShuffleMaster<?> shuffleMaster,
PartitionTrackerFactory partitionTrackerFactory,
ExecutionDeploymentTracker executionDeploymentTracker,
ExecutionDeploymentReconciler.Factory executionDeploymentReconcilerFactory,
long initializationTimestamp)
throws Exception
Throws: Exception

protected void onStart()
throws JobMasterException
Overrides: onStart in class org.apache.flink.runtime.rpc.RpcEndpoint
Throws: JobMasterException

public CompletableFuture<Void> onStop()
Suspends the job and shuts down all other services, including RPC.
Overrides: onStop in class org.apache.flink.runtime.rpc.RpcEndpoint

public CompletableFuture<Acknowledge> cancel(org.apache.flink.api.common.time.Time timeout)
Description copied from interface: JobMasterGateway
Cancels the currently executed job.
Specified by: cancel in interface JobMasterGateway
Parameters:
timeout - of this operation

public CompletableFuture<Acknowledge> updateTaskExecutionState(TaskExecutionState taskExecutionState)
Updates the task execution state for a given task.
Specified by: updateTaskExecutionState in interface JobMasterGateway
Parameters:
taskExecutionState - New task execution state for a given task

public CompletableFuture<SerializedInputSplit> requestNextInputSplit(JobVertexID vertexID, ExecutionAttemptID executionAttempt)
Description copied from interface: JobMasterGateway
Requests the next input split for the ExecutionJobVertex. The next input split is sent back to the sender as a SerializedInputSplit message.
Specified by: requestNextInputSplit in interface JobMasterGateway
Parameters:
vertexID - The job vertex id
executionAttempt - The execution attempt id

public CompletableFuture<ExecutionState> requestPartitionState(IntermediateDataSetID intermediateResultId, ResultPartitionID resultPartitionId)
Description copied from interface: JobMasterGateway
Requests the current state of the partition.
Specified by: requestPartitionState in interface JobMasterGateway
Parameters:
intermediateResultId - The execution attempt ID of the task requesting the partition state.
resultPartitionId - The partition ID of the partition to request the state of.

public CompletableFuture<Acknowledge> notifyPartitionDataAvailable(ResultPartitionID partitionID, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: JobMasterGateway
Notifies the JobManager about available data for a produced partition.
There is a call to this method for each ExecutionVertex instance once per produced ResultPartition instance, either when first producing data (for pipelined executions) or when all data has been produced (for staged executions).
The JobManager can then decide when to schedule the partition consumers of the given session.
Specified by: notifyPartitionDataAvailable in interface JobMasterGateway
Parameters:
partitionID - The partition which has already produced data
timeout - before the rpc call fails

public CompletableFuture<Acknowledge> disconnectTaskManager(ResourceID resourceID, Exception cause)
Description copied from interface: JobMasterGateway
Disconnects the given TaskExecutor from the JobMaster.
Specified by: disconnectTaskManager in interface JobMasterGateway
Parameters:
resourceID - identifying the TaskManager to disconnect
cause - for the disconnection of the TaskManager

public void acknowledgeCheckpoint(org.apache.flink.api.common.JobID jobID,
ExecutionAttemptID executionAttemptID,
long checkpointId,
CheckpointMetrics checkpointMetrics,
@Nullable
org.apache.flink.util.SerializedValue<TaskStateSnapshot> checkpointState)
public void reportCheckpointMetrics(org.apache.flink.api.common.JobID jobID,
ExecutionAttemptID executionAttemptID,
long checkpointId,
CheckpointMetrics checkpointMetrics)
public void declineCheckpoint(DeclineCheckpoint decline)
public CompletableFuture<Acknowledge> sendOperatorEventToCoordinator(ExecutionAttemptID task, OperatorID operatorID, org.apache.flink.util.SerializedValue<OperatorEvent> serializedEvent)
public CompletableFuture<CoordinationResponse> sendRequestToCoordinator(OperatorID operatorID, org.apache.flink.util.SerializedValue<CoordinationRequest> serializedRequest)
public CompletableFuture<KvStateLocation> requestKvStateLocation(org.apache.flink.api.common.JobID jobId, String registrationName)
Description copied from interface: KvStateLocationOracle
Requests a KvStateLocation for the specified InternalKvState registration name.
Specified by: requestKvStateLocation in interface KvStateLocationOracle
Parameters:
jobId - identifying the job for which to request the KvStateLocation
registrationName - Name under which the KvState has been registered.
Returns: InternalKvState location

public CompletableFuture<Acknowledge> notifyKvStateRegistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName, org.apache.flink.queryablestate.KvStateID kvStateId, InetSocketAddress kvStateServerAddress)
Description copied from interface: KvStateRegistryGateway
Notifies that queryable state has been registered.
Specified by: notifyKvStateRegistered in interface KvStateRegistryGateway
Parameters:
jobId - identifying the job for which to register a key value state
jobVertexId - JobVertexID the KvState instance belongs to.
keyGroupRange - Key group range the KvState instance belongs to.
registrationName - Name under which the KvState has been registered.
kvStateId - ID of the registered KvState instance.
kvStateServerAddress - Server address where to find the KvState instance.

public CompletableFuture<Acknowledge> notifyKvStateUnregistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName)
Description copied from interface: KvStateRegistryGateway
Notifies that queryable state has been unregistered.
Specified by: notifyKvStateUnregistered in interface KvStateRegistryGateway
Parameters:
jobId - identifying the job for which to unregister a key value state
jobVertexId - JobVertexID the KvState instance belongs to.
keyGroupRange - Key group range the KvState instance belongs to.
registrationName - Name under which the KvState has been registered.

public CompletableFuture<Collection<SlotOffer>> offerSlots(ResourceID taskManagerId, Collection<SlotOffer> slots, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: JobMasterGateway
Offers the given slots to the job manager.
Specified by: offerSlots in interface JobMasterGateway
Parameters:
taskManagerId - identifying the task manager
slots - to offer to the job manager
timeout - for the rpc call

public void failSlot(ResourceID taskManagerId, AllocationID allocationId, Exception cause)
Description copied from interface: JobMasterGateway
Fails the slot with the given allocation id and cause.
Specified by: failSlot in interface JobMasterGateway
Parameters:
taskManagerId - identifying the task manager
allocationId - identifying the slot to fail
cause - of the failing

public CompletableFuture<RegistrationResponse> registerTaskManager(org.apache.flink.api.common.JobID jobId, TaskManagerRegistrationInformation taskManagerRegistrationInformation, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: JobMasterGateway
Registers the task manager at the job manager.
Specified by: registerTaskManager in interface JobMasterGateway
Parameters:
jobId - jobId specifying the job for which the JobMaster should be responsible
taskManagerRegistrationInformation - the information for registering a task manager at the job manager
timeout - for the rpc call

public void disconnectResourceManager(ResourceManagerId resourceManagerId, Exception cause)
Description copied from interface: JobMasterGateway
Disconnects the resource manager from the job manager because of the given cause.
Specified by: disconnectResourceManager in interface JobMasterGateway
Parameters:
resourceManagerId - identifying the resource manager leader id
cause - of the disconnect

public CompletableFuture<Void> heartbeatFromTaskManager(ResourceID resourceID, TaskExecutorToJobManagerHeartbeatPayload payload)
Description copied from interface: JobMasterGateway
Sends the heartbeat from the task manager to the job manager.
Specified by: heartbeatFromTaskManager in interface JobMasterGateway
Parameters:
resourceID - unique id of the task manager
payload - report payload

public CompletableFuture<Void> heartbeatFromResourceManager(ResourceID resourceID)
Description copied from interface: JobMasterGateway
Sends a heartbeat request from the resource manager.
Specified by: heartbeatFromResourceManager in interface JobMasterGateway
Parameters:
resourceID - unique id of the resource manager

public CompletableFuture<JobDetails> requestJobDetails(org.apache.flink.api.common.time.Time timeout)
Description copied from interface: JobMasterGateway
Requests the details of the executed job.
Specified by: requestJobDetails in interface JobMasterGateway
Parameters:
timeout - for the rpc call

public CompletableFuture<org.apache.flink.api.common.JobStatus> requestJobStatus(org.apache.flink.api.common.time.Time timeout)
Description copied from interface: JobMasterGateway
Requests the current job status.
Specified by: requestJobStatus in interface JobMasterGateway
Parameters:
timeout - for the rpc call

public CompletableFuture<ExecutionGraphInfo> requestJob(org.apache.flink.api.common.time.Time timeout)
Description copied from interface: JobMasterGateway
Requests the ExecutionGraphInfo of the executed job.
Specified by: requestJob in interface JobMasterGateway
Parameters:
timeout - for the rpc call
Returns: ExecutionGraphInfo of the executed job

public CompletableFuture<String> triggerSavepoint(@Nullable String targetDirectory, boolean cancelJob, org.apache.flink.core.execution.SavepointFormatType formatType, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: JobMasterGateway
Triggers taking a savepoint of the executed job.
Specified by: triggerSavepoint in interface JobMasterGateway
Parameters:
targetDirectory - to which to write the savepoint data or null if the default savepoint directory should be used
formatType - binary format for the savepoint
timeout - for the rpc call

public CompletableFuture<String> triggerCheckpoint(org.apache.flink.api.common.time.Time timeout)
Description copied from interface: JobMasterGateway
Triggers taking a checkpoint of the executed job.
Specified by: triggerCheckpoint in interface JobMasterGateway
Parameters:
timeout - for the rpc call

public CompletableFuture<String> stopWithSavepoint(@Nullable String targetDirectory, org.apache.flink.core.execution.SavepointFormatType formatType, boolean terminate, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: JobMasterGateway
Stops the job with a savepoint.
Specified by: stopWithSavepoint in interface JobMasterGateway
Parameters:
targetDirectory - to which to write the savepoint data or null if the default savepoint directory should be used
terminate - flag indicating if the job should terminate or just suspend
timeout - for the rpc call

public void notifyAllocationFailure(AllocationID allocationID, Exception cause)
Description copied from interface: JobMasterGateway
Notifies that the allocation has failed.
Specified by: notifyAllocationFailure in interface JobMasterGateway
Parameters:
allocationID - the failed allocation id.
cause - the reason that the allocation failed

public void notifyNotEnoughResourcesAvailable(Collection<ResourceRequirement> acquiredResources)
Description copied from interface: JobMasterGateway
Notifies that not enough resources are available to fulfill the resource requirements of a job.
Specified by: notifyNotEnoughResourcesAvailable in interface JobMasterGateway
Parameters:
acquiredResources - the resources that have been acquired for the job

public CompletableFuture<Object> updateGlobalAggregate(String aggregateName, Object aggregand, byte[] serializedAggregateFunction)
Description copied from interface: JobMasterGateway
Updates the aggregate and returns the new value.
Specified by: updateGlobalAggregate in interface JobMasterGateway
Parameters:
aggregateName - The name of the aggregate to update
aggregand - The value to add to the aggregate
serializedAggregateFunction - The function to apply to the current aggregate and aggregand to obtain the new aggregate value; this should be of type AggregateFunction

public CompletableFuture<CoordinationResponse> deliverCoordinationRequestToCoordinator(OperatorID operatorId, org.apache.flink.util.SerializedValue<CoordinationRequest> serializedRequest, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: JobMasterGateway
Delivers a coordination request to a specified coordinator and returns the response.
Specified by: deliverCoordinationRequestToCoordinator in interface JobMasterGateway
Parameters:
operatorId - identifying the coordinator to receive the request
serializedRequest - serialized request to deliver
FlinkException if the task is not running, or no operator/coordinator exists for the given ID, or the coordinator cannot handle client events.

public CompletableFuture<?> stopTrackingAndReleasePartitions(Collection<ResultPartitionID> partitionIds)
Description copied from interface: JobMasterGateway
Notifies the JobMasterPartitionTracker to stop tracking the target result partitions and release the locally occupied resources on TaskExecutors if any.

public JobMasterGateway getGateway()
Description copied from interface: JobMasterService
Gets the JobMasterGateway belonging to this service.
Specified by: getGateway in interface JobMasterService

Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.