public class AdaptiveScheduler extends Object implements SchedulerNG
A SchedulerNG implementation that uses declarative resource management and
automatically adapts the parallelism if not enough resources can be acquired to run at the
configured parallelism, as described in FLIP-160.
This scheduler only supports jobs with streaming semantics, i.e., all vertices are connected via pipelined data-exchanges.
The implementation is spread over multiple State classes that control which RPCs are
allowed in a given state and what state transitions are possible (see the FLIP for an overview).
This class can thus be roughly split into 2 parts:
1) RPCs, which must forward the call to the state via State.tryRun(Class,
ThrowingConsumer, String) or State.tryCall(Class, FunctionWithException, String).
2) Context methods, which are called by states, to either transition into another state or access functionality of some component in the scheduler.
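The split between forwarded RPCs and context methods can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the Created and Executing stand-in states, and the simplified tryRun/tryCall signatures (which use java.util.function types instead of Flink's ThrowingConsumer and FunctionWithException) are hypothetical and are not the Flink implementation.

```java
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Function;

/** Simplified, hypothetical sketch of state-based RPC forwarding; not Flink code. */
public class StateForwardingSketch {

    /** Stand-in for a scheduler state (assumed shape, simplified signatures). */
    interface State {
        /** Returns this state cast to the given type, if it is an instance of it. */
        default <T> Optional<T> as(Class<T> type) {
            return type.isInstance(this) ? Optional.of(type.cast(this)) : Optional.empty();
        }

        /** Runs the action only if this state is of the given type; otherwise logs and ignores. */
        default <T> void tryRun(Class<T> type, Consumer<T> action, String debugMessage) {
            Optional<T> typed = as(type);
            if (typed.isPresent()) {
                action.accept(typed.get());
            } else {
                System.out.println(
                        "Ignoring '" + debugMessage + "' in state " + getClass().getSimpleName());
            }
        }

        /** Calls the function only if this state is of the given type; otherwise returns empty. */
        default <T, V> Optional<V> tryCall(Class<T> type, Function<T, V> action, String debugMessage) {
            Optional<T> typed = as(type);
            if (!typed.isPresent()) {
                System.out.println(
                        "Cannot answer '" + debugMessage + "' in state " + getClass().getSimpleName());
            }
            return typed.map(action);
        }
    }

    /** Hypothetical state that accepts no checkpoint RPCs. */
    static final class Created implements State {}

    /** Hypothetical state that handles checkpoint acknowledgements. */
    static final class Executing implements State {
        private int acknowledgedCheckpoints;

        void acknowledgeCheckpoint(long checkpointId) {
            acknowledgedCheckpoints++;
        }

        int acknowledgedCheckpoints() {
            return acknowledgedCheckpoints;
        }
    }

    /** Hypothetical scheduler holding the current state. */
    static final class Scheduler {
        private State state = new Created();

        // Context method: called by a state to move the scheduler into another state.
        void goToExecuting() {
            state = new Executing();
        }

        // RPC: forwarded to the current state and ignored unless that state supports it.
        void acknowledgeCheckpoint(long checkpointId) {
            state.tryRun(
                    Executing.class,
                    s -> s.acknowledgeCheckpoint(checkpointId),
                    "acknowledgeCheckpoint");
        }

        // RPC with a result: falls back to a default when the state cannot answer.
        int requestAcknowledgedCheckpoints() {
            return state.tryCall(
                            Executing.class,
                            Executing::acknowledgedCheckpoints,
                            "requestAcknowledgedCheckpoints")
                    .orElse(0);
        }
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler();
        scheduler.acknowledgeCheckpoint(1L); // ignored: the scheduler is still in Created
        scheduler.goToExecuting();
        scheduler.acknowledgeCheckpoint(2L); // handled by Executing
        System.out.println(scheduler.requestAcknowledgedCheckpoints());
    }
}
```

In this sketch only the second acknowledgement is counted, because the first one arrives while the current state does not accept that RPC.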
| Constructor and Description |
|---|
| AdaptiveScheduler(JobGraph jobGraph, org.apache.flink.configuration.Configuration configuration, DeclarativeSlotPool declarativeSlotPool, SlotAllocator slotAllocator, Executor ioExecutor, ClassLoader userCodeClassLoader, CheckpointsCleaner checkpointsCleaner, CheckpointRecoveryFactory checkpointRecoveryFactory, java.time.Duration initialResourceAllocationTimeout, java.time.Duration resourceStabilizationTimeout, JobManagerJobMetricGroup jobManagerJobMetricGroup, RestartBackoffTimeStrategy restartBackoffTimeStrategy, long initializationTimestamp, org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor mainThreadExecutor, org.apache.flink.runtime.rpc.FatalErrorHandler fatalErrorHandler, JobStatusListener jobStatusListener, ExecutionGraphFactory executionGraphFactory) |
| Modifier and Type | Method | Description |
|---|---|---|
| void | acknowledgeCheckpoint(org.apache.flink.api.common.JobID jobID, ExecutionAttemptID executionAttemptID, long checkpointId, CheckpointMetrics checkpointMetrics, TaskStateSnapshot checkpointState) | |
| void | cancel() | |
| boolean | canScaleUp(ExecutionGraph executionGraph) | Asks if we can scale up the currently executing job. |
| CompletableFuture<Void> | closeAsync() | |
| void | declineCheckpoint(DeclineCheckpoint decline) | |
| CompletableFuture<CoordinationResponse> | deliverCoordinationRequestToCoordinator(OperatorID operator, CoordinationRequest request) | Delivers a coordination request to the OperatorCoordinator with the given OperatorID and returns the coordinator's response. |
| void | deliverOperatorEventToCoordinator(ExecutionAttemptID taskExecution, OperatorID operator, OperatorEvent evt) | Delivers the given OperatorEvent to the OperatorCoordinator with the given OperatorID. |
| ArchivedExecutionGraph | getArchivedExecutionGraph(org.apache.flink.api.common.JobStatus jobStatus, Throwable cause) | Creates an ArchivedExecutionGraph for the given jobStatus and failure cause. |
| CompletableFuture<org.apache.flink.api.common.JobStatus> | getJobTerminationFuture() | |
| Executor | getMainThreadExecutor() | Gets the main thread executor. |
| void | goToCanceling(ExecutionGraph executionGraph, ExecutionGraphHandler executionGraphHandler, OperatorCoordinatorHandler operatorCoordinatorHandler) | Transitions into the Canceling state. |
| void | goToCreatingExecutionGraph() | Transitions into the CreatingExecutionGraph state. |
| void | goToExecuting(ExecutionGraph executionGraph) | Transitions into the Executing state. |
| void | goToExecuting(ExecutionGraph executionGraph, ExecutionGraphHandler executionGraphHandler, OperatorCoordinatorHandler operatorCoordinatorHandler) | Transitions into the Executing state. |
| void | goToFailing(ExecutionGraph executionGraph, ExecutionGraphHandler executionGraphHandler, OperatorCoordinatorHandler operatorCoordinatorHandler, Throwable failureCause) | Transitions into the Failing state. |
| void | goToFinished(ArchivedExecutionGraph archivedExecutionGraph) | Transitions into the Finished state. |
| void | goToRestarting(ExecutionGraph executionGraph, ExecutionGraphHandler executionGraphHandler, OperatorCoordinatorHandler operatorCoordinatorHandler, java.time.Duration backoffTime) | Transitions into the Restarting state. |
| CompletableFuture<String> | goToStopWithSavepoint(ExecutionGraph executionGraph, ExecutionGraphHandler executionGraphHandler, OperatorCoordinatorHandler operatorCoordinatorHandler, CheckpointScheduling checkpointScheduling, CompletableFuture<String> savepointFuture) | Transitions into the StopWithSavepoint state. |
| void | goToWaitingForResources() | Transitions into the WaitingForResources state. |
| void | handleGlobalFailure(Throwable cause) | |
| boolean | hasDesiredResources(ResourceCounter desiredResources) | Checks whether we have the desired resources. |
| boolean | hasSufficientResources() | Checks if we currently have sufficient resources for executing the job. |
| org.apache.flink.runtime.scheduler.adaptive.Executing.FailureResult | howToHandleFailure(Throwable failure) | Asks how to handle the failure. |
| boolean | isState(org.apache.flink.runtime.scheduler.adaptive.State expectedState) | Checks whether the current state is the expected state. |
| void | notifyKvStateRegistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName, org.apache.flink.queryablestate.KvStateID kvStateId, InetSocketAddress kvStateServerAddress) | |
| void | notifyKvStateUnregistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName) | |
| void | notifyPartitionDataAvailable(ResultPartitionID partitionID) | |
| void | onFinished(ArchivedExecutionGraph archivedExecutionGraph) | Callback which is called when the execution reaches the Finished state. |
| void | reportCheckpointMetrics(org.apache.flink.api.common.JobID jobID, ExecutionAttemptID executionAttemptID, long checkpointId, CheckpointMetrics checkpointMetrics) | |
| ExecutionGraphInfo | requestJob() | |
| JobDetails | requestJobDetails() | |
| org.apache.flink.api.common.JobStatus | requestJobStatus() | |
| KvStateLocation | requestKvStateLocation(org.apache.flink.api.common.JobID jobId, String registrationName) | |
| SerializedInputSplit | requestNextInputSplit(JobVertexID vertexID, ExecutionAttemptID executionAttempt) | |
| ExecutionState | requestPartitionState(IntermediateDataSetID intermediateResultId, ResultPartitionID resultPartitionId) | |
| void | runIfState(org.apache.flink.runtime.scheduler.adaptive.State expectedState, Runnable action) | Run the given action if the current state equals the expected state. |
| ScheduledFuture<?> | runIfState(org.apache.flink.runtime.scheduler.adaptive.State expectedState, Runnable action, java.time.Duration delay) | Runs the given action after a delay if the state at this time equals the expected state. |
| void | startScheduling() | |
| CompletableFuture<String> | stopWithSavepoint(String targetDirectory, boolean terminate) | |
| CompletableFuture<String> | triggerSavepoint(String targetDirectory, boolean cancelJob) | |
| org.apache.flink.runtime.scheduler.adaptive.CreatingExecutionGraph.AssignmentResult | tryToAssignSlots(org.apache.flink.runtime.scheduler.adaptive.CreatingExecutionGraph.ExecutionGraphWithVertexParallelism executionGraphWithVertexParallelism) | Try to assign slots to the created ExecutionGraph. |
| void | updateAccumulators(AccumulatorSnapshot accumulatorSnapshot) | |
| boolean | updateTaskExecutionState(TaskExecutionStateTransition taskExecutionState) | |
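
The two runIfState methods listed in the summary guard an action by the current state, and the delayed variant checks the state when the delay elapses rather than when the call is made. The following is a minimal sketch of that behavior under stated assumptions: the class RunIfStateSketch, the use of plain Object instances as states, the identity comparison in isState, and the single-threaded ScheduledExecutorService standing in for the main thread executor are all hypothetical and not taken from the Flink source.

```java
import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of state-guarded actions; not Flink code. */
public class RunIfStateSketch {

    // Single thread standing in for the scheduler's main thread executor (assumption).
    private final ScheduledExecutorService mainThread =
            Executors.newSingleThreadScheduledExecutor();

    // volatile because this sketch changes the state from the caller's thread.
    private volatile Object state = new Object();

    Object currentState() {
        return state;
    }

    void transitionTo(Object newState) {
        state = newState;
    }

    /** Checks whether the current state is the expected state (identity comparison assumed). */
    boolean isState(Object expectedState) {
        return expectedState == state;
    }

    /** Runs the action immediately if the current state equals the expected state. */
    void runIfState(Object expectedState, Runnable action) {
        if (isState(expectedState)) {
            action.run();
        }
    }

    /** Schedules the guarded action; the state check happens when the delay has elapsed. */
    ScheduledFuture<?> runIfState(Object expectedState, Runnable action, Duration delay) {
        return mainThread.schedule(
                () -> runIfState(expectedState, action), delay.toMillis(), TimeUnit.MILLISECONDS);
    }

    /** Returns how many of the two scheduled actions actually ran. */
    static int demo() {
        RunIfStateSketch scheduler = new RunIfStateSketch();
        AtomicInteger runs = new AtomicInteger();
        Object initial = scheduler.currentState();
        try {
            // The state is unchanged when the delay elapses, so the action runs.
            scheduler.runIfState(initial, runs::incrementAndGet, Duration.ofMillis(10)).get();

            // The state changes before the delay elapses, so the action is skipped.
            ScheduledFuture<?> skipped =
                    scheduler.runIfState(initial, runs::incrementAndGet, Duration.ofMillis(200));
            scheduler.transitionTo(new Object());
            skipped.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.mainThread.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Guarding at execution time means a callback scheduled by one state becomes a no-op once the scheduler has left that state, so callers do not need to cancel the returned future on every transition.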
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface SchedulerNG: updateTaskExecutionState

public AdaptiveScheduler(JobGraph jobGraph, org.apache.flink.configuration.Configuration configuration, DeclarativeSlotPool declarativeSlotPool, SlotAllocator slotAllocator, Executor ioExecutor, ClassLoader userCodeClassLoader, CheckpointsCleaner checkpointsCleaner, CheckpointRecoveryFactory checkpointRecoveryFactory, java.time.Duration initialResourceAllocationTimeout, java.time.Duration resourceStabilizationTimeout, JobManagerJobMetricGroup jobManagerJobMetricGroup, RestartBackoffTimeStrategy restartBackoffTimeStrategy, long initializationTimestamp, org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor mainThreadExecutor, org.apache.flink.runtime.rpc.FatalErrorHandler fatalErrorHandler, JobStatusListener jobStatusListener, ExecutionGraphFactory executionGraphFactory) throws JobExecutionException
Throws: JobExecutionException

public void startScheduling()
Specified by: startScheduling in interface SchedulerNG

public CompletableFuture<Void> closeAsync()
Specified by: closeAsync in interface org.apache.flink.util.AutoCloseableAsync

public void cancel()
Specified by: cancel in interface SchedulerNG

public CompletableFuture<org.apache.flink.api.common.JobStatus> getJobTerminationFuture()
Specified by: getJobTerminationFuture in interface SchedulerNG

public void handleGlobalFailure(Throwable cause)
Specified by: handleGlobalFailure in interface SchedulerNG

public boolean updateTaskExecutionState(TaskExecutionStateTransition taskExecutionState)
Specified by: updateTaskExecutionState in interface SchedulerNG

public SerializedInputSplit requestNextInputSplit(JobVertexID vertexID, ExecutionAttemptID executionAttempt) throws IOException
Specified by: requestNextInputSplit in interface SchedulerNG
Throws: IOException

public ExecutionState requestPartitionState(IntermediateDataSetID intermediateResultId, ResultPartitionID resultPartitionId) throws PartitionProducerDisposedException
Specified by: requestPartitionState in interface SchedulerNG
Throws: PartitionProducerDisposedException

public void notifyPartitionDataAvailable(ResultPartitionID partitionID)
Specified by: notifyPartitionDataAvailable in interface SchedulerNG

public ExecutionGraphInfo requestJob()
Specified by: requestJob in interface SchedulerNG

public org.apache.flink.api.common.JobStatus requestJobStatus()
Specified by: requestJobStatus in interface SchedulerNG

public JobDetails requestJobDetails()
Specified by: requestJobDetails in interface SchedulerNG

public KvStateLocation requestKvStateLocation(org.apache.flink.api.common.JobID jobId, String registrationName) throws UnknownKvStateLocation, FlinkJobNotFoundException
Specified by: requestKvStateLocation in interface SchedulerNG
Throws: UnknownKvStateLocation, FlinkJobNotFoundException

public void notifyKvStateRegistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName, org.apache.flink.queryablestate.KvStateID kvStateId, InetSocketAddress kvStateServerAddress) throws FlinkJobNotFoundException
Specified by: notifyKvStateRegistered in interface SchedulerNG
Throws: FlinkJobNotFoundException

public void notifyKvStateUnregistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName) throws FlinkJobNotFoundException
Specified by: notifyKvStateUnregistered in interface SchedulerNG
Throws: FlinkJobNotFoundException

public void updateAccumulators(AccumulatorSnapshot accumulatorSnapshot)
Specified by: updateAccumulators in interface SchedulerNG

public CompletableFuture<String> triggerSavepoint(@Nullable String targetDirectory, boolean cancelJob)
Specified by: triggerSavepoint in interface SchedulerNG

public void acknowledgeCheckpoint(org.apache.flink.api.common.JobID jobID, ExecutionAttemptID executionAttemptID, long checkpointId, CheckpointMetrics checkpointMetrics, TaskStateSnapshot checkpointState)
Specified by: acknowledgeCheckpoint in interface SchedulerNG

public void reportCheckpointMetrics(org.apache.flink.api.common.JobID jobID, ExecutionAttemptID executionAttemptID, long checkpointId, CheckpointMetrics checkpointMetrics)
Specified by: reportCheckpointMetrics in interface SchedulerNG

public void declineCheckpoint(DeclineCheckpoint decline)
Specified by: declineCheckpoint in interface SchedulerNG

public CompletableFuture<String> stopWithSavepoint(@Nullable String targetDirectory, boolean terminate)
Specified by: stopWithSavepoint in interface SchedulerNG

public void deliverOperatorEventToCoordinator(ExecutionAttemptID taskExecution, OperatorID operator, OperatorEvent evt) throws org.apache.flink.util.FlinkException
Description copied from interface: SchedulerNG
Delivers the given OperatorEvent to the OperatorCoordinator with the given OperatorID.
Failure semantics: If the task manager sends an event for a non-running task or a non-existing operator coordinator, then respond with an exception to the call. If task and coordinator exist, then we assume that the call from the TaskManager was valid, and any bubbling exception needs to cause a job failure.
Specified by: deliverOperatorEventToCoordinator in interface SchedulerNG
Throws: org.apache.flink.util.FlinkException - Thrown, if the task is not running or no operator/coordinator exists for the given ID.

public CompletableFuture<CoordinationResponse> deliverCoordinationRequestToCoordinator(OperatorID operator, CoordinationRequest request) throws org.apache.flink.util.FlinkException
Description copied from interface: SchedulerNG
Delivers a coordination request to the OperatorCoordinator with the given OperatorID and returns the coordinator's response.
Specified by: deliverCoordinationRequestToCoordinator in interface SchedulerNG
Throws: org.apache.flink.util.FlinkException - Thrown, if the task is not running, or no operator/coordinator exists for the given ID, or the coordinator cannot handle client events.

public boolean hasDesiredResources(ResourceCounter desiredResources)
Checks whether we have the desired resources.
Parameters:
desiredResources - desiredResources describing the desired resources
Returns: true if we have enough resources; otherwise false

public boolean hasSufficientResources()
Checks if we currently have sufficient resources for executing the job.
Returns: true if we have sufficient resources; otherwise false

public ArchivedExecutionGraph getArchivedExecutionGraph(org.apache.flink.api.common.JobStatus jobStatus, @Nullable Throwable cause)
Creates an ArchivedExecutionGraph for the given jobStatus and failure cause.
Parameters:
jobStatus - jobStatus to create the ArchivedExecutionGraph with
cause - cause represents the failure cause for the ArchivedExecutionGraph; null if there is no failure cause
Returns: ArchivedExecutionGraph

public void goToWaitingForResources()
Transitions into the WaitingForResources state.

public void goToExecuting(ExecutionGraph executionGraph)
Transitions into the Executing state.
Parameters:
executionGraph - executionGraph which is passed to the Executing state

public void goToExecuting(ExecutionGraph executionGraph, ExecutionGraphHandler executionGraphHandler, OperatorCoordinatorHandler operatorCoordinatorHandler)
Transitions into the Executing state.
Parameters:
executionGraph - executionGraph to pass to the Executing state
executionGraphHandler - executionGraphHandler to pass to the Executing state
operatorCoordinatorHandler - operatorCoordinatorHandler to pass to the Executing state

public void goToCanceling(ExecutionGraph executionGraph, ExecutionGraphHandler executionGraphHandler, OperatorCoordinatorHandler operatorCoordinatorHandler)
Transitions into the Canceling state.
Parameters:
executionGraph - executionGraph to pass to the Canceling state
executionGraphHandler - executionGraphHandler to pass to the Canceling state
operatorCoordinatorHandler - operatorCoordinatorHandler to pass to the Canceling state

public void goToRestarting(ExecutionGraph executionGraph, ExecutionGraphHandler executionGraphHandler, OperatorCoordinatorHandler operatorCoordinatorHandler, java.time.Duration backoffTime)
Transitions into the Restarting state.
Parameters:
executionGraph - executionGraph to pass to the Restarting state
executionGraphHandler - executionGraphHandler to pass to the Restarting state
operatorCoordinatorHandler - operatorCoordinatorHandler to pass to the Restarting state
backoffTime - backoffTime to wait before transitioning to the Restarting state

public void goToFailing(ExecutionGraph executionGraph, ExecutionGraphHandler executionGraphHandler, OperatorCoordinatorHandler operatorCoordinatorHandler, Throwable failureCause)
Transitions into the Failing state.
Parameters:
executionGraph - executionGraph to pass to the Failing state
executionGraphHandler - executionGraphHandler to pass to the Failing state
operatorCoordinatorHandler - operatorCoordinatorHandler to pass to the Failing state
failureCause - failureCause describing why the job execution failed

public CompletableFuture<String> goToStopWithSavepoint(ExecutionGraph executionGraph, ExecutionGraphHandler executionGraphHandler, OperatorCoordinatorHandler operatorCoordinatorHandler, CheckpointScheduling checkpointScheduling, CompletableFuture<String> savepointFuture)
Transitions into the StopWithSavepoint state.
Parameters:
executionGraph - executionGraph to pass to the StopWithSavepoint state
executionGraphHandler - executionGraphHandler to pass to the StopWithSavepoint state
operatorCoordinatorHandler - operatorCoordinatorHandler to pass to the StopWithSavepoint state
savepointFuture - Future for the savepoint to complete.

public void goToFinished(ArchivedExecutionGraph archivedExecutionGraph)
Transitions into the Finished state.
Parameters:
archivedExecutionGraph - archivedExecutionGraph is passed to the Finished state

public void goToCreatingExecutionGraph()
Transitions into the CreatingExecutionGraph state.

public org.apache.flink.runtime.scheduler.adaptive.CreatingExecutionGraph.AssignmentResult tryToAssignSlots(org.apache.flink.runtime.scheduler.adaptive.CreatingExecutionGraph.ExecutionGraphWithVertexParallelism executionGraphWithVertexParallelism)
Try to assign slots to the created ExecutionGraph. If it is possible, then this method returns a successful AssignmentResult which contains the assigned ExecutionGraph. If not, then the assignment result is a failure.
Parameters:
executionGraphWithVertexParallelism - executionGraphWithVertexParallelism to assign slots to resources
Returns: AssignmentResult representing the result of the assignment

public boolean canScaleUp(ExecutionGraph executionGraph)
Asks if we can scale up the currently executing job.
Parameters:
executionGraph - executionGraph for making the scaling decision.

public void onFinished(ArchivedExecutionGraph archivedExecutionGraph)
Callback which is called when the execution reaches the Finished state.
Parameters:
archivedExecutionGraph - archivedExecutionGraph represents the final state of the job execution

public org.apache.flink.runtime.scheduler.adaptive.Executing.FailureResult howToHandleFailure(Throwable failure)
Asks how to handle the failure.
Parameters:
failure - failure describing the failure cause
Returns: FailureResult which describes how to handle the failure

public Executor getMainThreadExecutor()
Gets the main thread executor.

public boolean isState(org.apache.flink.runtime.scheduler.adaptive.State expectedState)
Checks whether the current state is the expected state.
Parameters:
expectedState - expectedState is the expected state
Returns: true if the current state equals the expected state; otherwise false

public void runIfState(org.apache.flink.runtime.scheduler.adaptive.State expectedState, Runnable action)
Run the given action if the current state equals the expected state.
Parameters:
expectedState - expectedState is the expected state
action - action to run if the current state equals the expected state

public ScheduledFuture<?> runIfState(org.apache.flink.runtime.scheduler.adaptive.State expectedState, Runnable action, java.time.Duration delay)
Runs the given action after a delay if the state at this time equals the expected state.
Parameters:
expectedState - expectedState describes the required state at the time of running the action
action - action to run if the expected state equals the actual state
delay - delay after which to run the action

Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.