@Deprecated @PublicEvolving public class FsStateBackend extends AbstractFileStateBackend implements ConfigurableStateBackend
FsStateBackend is deprecated in favor of HashMapStateBackend and FileSystemCheckpointStorage. This change does not affect
the runtime characteristics of your Jobs and is simply an API change to help better communicate
the ways Flink separates local state storage from fault tolerance. Jobs can be upgraded without
loss of state. If you configure your state backend via the StreamExecutionEnvironment,
make the following changes:
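import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(new HashMapStateBackend()); // working state stays on the JVM heap
env.getCheckpointConfig().setCheckpointStorage("hdfs:///checkpoints"); // checkpoints are written to the file system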
If you are configuring your state backend via flink-conf.yaml, set your state backend type to "hashmap":
state.backend: hashmap
This state backend holds the working state in the memory (JVM heap) of the TaskManagers. The state backend checkpoints state as files to a file system (hence the backend's name).
Each checkpoint individually will store all its files in a subdirectory that includes the
checkpoint number, such as hdfs://namenode:port/flink-checkpoints/chk-17/.
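The per-checkpoint directory naming described above can be sketched in plain Java. This is only an illustration: `CheckpointDirSketch` and `checkpointDir` are hypothetical names, not part of the Flink API, and a real deployment may place further path components between the base directory and the `chk-` subdirectory.

```java
public class CheckpointDirSketch {

    /**
     * Hypothetical helper: joins a base checkpoint directory and a checkpoint
     * number into "<base>/chk-<number>/", mirroring the example path in the text.
     */
    static String checkpointDir(String baseDir, long checkpointNumber) {
        // Normalize the base so that exactly one slash separates the parts.
        String base = baseDir.endsWith("/") ? baseDir : baseDir + "/";
        return base + "chk-" + checkpointNumber + "/";
    }

    public static void main(String[] args) {
        System.out.println(checkpointDir("hdfs://namenode:port/flink-checkpoints", 17));
    }
}
```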
Working state is kept on the TaskManager heap. If a TaskManager executes multiple tasks concurrently (if the TaskManager has multiple slots, or if slot-sharing is used) then the aggregate state of all tasks needs to fit into that TaskManager's memory.
This state backend stores small state chunks directly with the metadata, to avoid creating
many small files. The threshold for that is configurable. When increasing this threshold, the
size of the checkpoint metadata increases. The checkpoint metadata of all retained completed
checkpoints needs to fit into the JobManager's heap memory. This is typically not a problem,
unless the threshold getMinFileSizeThreshold() is increased significantly.
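To make the trade-off concrete, here is a small stand-alone model of the threshold decision. It is a sketch under stated assumptions, not Flink's implementation: the class, enum, and method names are hypothetical, and it assumes that chunks strictly below the threshold are inlined (the exact boundary behavior is not confirmed here).

```java
import java.util.List;

public class SmallStateThresholdSketch {

    /** Where a state chunk ends up in this simplified model. */
    enum Placement { INLINE_WITH_METADATA, SEPARATE_FILE }

    /** Assumption: chunks strictly below the threshold are stored with the metadata. */
    static Placement place(int chunkSizeBytes, int thresholdBytes) {
        return chunkSizeBytes < thresholdBytes
                ? Placement.INLINE_WITH_METADATA
                : Placement.SEPARATE_FILE;
    }

    /** Sums the bytes that would be carried inside the checkpoint metadata. */
    static long inlinedBytes(List<Integer> chunkSizes, int thresholdBytes) {
        long total = 0;
        for (int size : chunkSizes) {
            if (place(size, thresholdBytes) == Placement.INLINE_WITH_METADATA) {
                total += size;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> chunks = List.of(200, 900, 5_000, 50_000);
        // Raising the threshold moves more chunks into the metadata.
        System.out.println(inlinedBytes(chunks, 1_024));
        System.out.println(inlinedBytes(chunks, 10_240));
    }
}
```

In this model, raising the threshold tenfold moves the 5,000-byte chunk into the metadata as well, which is the growth in metadata size that the JobManager heap has to absorb for every retained checkpoint.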
Checkpoints from this state backend are as persistent and available as the file system that they are written to. If the file system is a persistent distributed file system, this state backend supports highly available setups. The backend additionally supports savepoints and externalized checkpoints.
As for all state backends, this backend can either be configured within the application (by creating the backend with the respective constructor parameters and setting it on the execution environment) or by specifying it in the Flink configuration.
If the state backend was specified in the application, it may pick up additional configuration
parameters from the Flink configuration. For example, if the backend is configured in the
application without a default savepoint directory, it will pick up a default savepoint directory
specified in the Flink configuration of the running job/cluster. That behavior is implemented via
the configure(ReadableConfig, ClassLoader) method.
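The "copy with fallback" behavior described above can be sketched without any Flink dependency. Everything in this block is hypothetical: the class name, the two configuration keys (`savepoint.dir`, `file.threshold`), and the use of a plain `Map` in place of `ReadableConfig` are illustrative stand-ins. The 1 KB default follows the default stated in the constructor documentation on this page.

```java
import java.util.Map;

public class ConfigureFallbackSketch {

    static final int UNDEFINED = -1;                 // sentinel for "not set in the application"
    static final int DEFAULT_THRESHOLD_BYTES = 1024; // default assumed from this page (1KB)

    final String savepointDir;        // null means "not set in the application"
    final int fileStateSizeThreshold; // UNDEFINED means "not set in the application"

    ConfigureFallbackSketch(String savepointDir, int fileStateSizeThreshold) {
        this.savepointDir = savepointDir;
        this.fileStateSizeThreshold = fileStateSizeThreshold;
    }

    /** Returns a copy in which only the unset fields are filled from the configuration. */
    ConfigureFallbackSketch configure(Map<String, String> config) {
        // Values set in the application win; the configuration is only a fallback.
        String dir = savepointDir != null ? savepointDir : config.get("savepoint.dir");

        int threshold = fileStateSizeThreshold;
        if (threshold == UNDEFINED) {
            String configured = config.get("file.threshold");
            threshold = configured != null ? Integer.parseInt(configured) : DEFAULT_THRESHOLD_BYTES;
        }
        return new ConfigureFallbackSketch(dir, threshold);
    }

    public static void main(String[] args) {
        ConfigureFallbackSketch fromApp = new ConfigureFallbackSketch(null, UNDEFINED);
        ConfigureFallbackSketch effective =
                fromApp.configure(Map.of("savepoint.dir", "hdfs:///savepoints"));
        System.out.println(effective.savepointDir + " " + effective.fileStateSizeThreshold);
    }
}
```

Returning a new object instead of mutating the existing one matches the wording of `configure`, which is documented as creating a copy of the state backend.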
Inherited fields: latencyTrackingConfigBuilder

| Constructor | Description |
|---|---|
| FsStateBackend(org.apache.flink.core.fs.Path checkpointDataUri) | Deprecated. Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI. |
| FsStateBackend(org.apache.flink.core.fs.Path checkpointDataUri, boolean asynchronousSnapshots) | Deprecated. Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI. |
| FsStateBackend(String checkpointDataUri) | Deprecated. Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI. |
| FsStateBackend(String checkpointDataUri, boolean asynchronousSnapshots) | Deprecated. Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI. |
| FsStateBackend(URI checkpointDataUri) | Deprecated. Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI. |
| FsStateBackend(URI checkpointDataUri, boolean asynchronousSnapshots) | Deprecated. Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI. |
| FsStateBackend(URI checkpointDataUri, int fileStateSizeThreshold) | Deprecated. Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI. |
| FsStateBackend(URI checkpointDataUri, int fileStateSizeThreshold, boolean asynchronousSnapshots) | Deprecated. Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI. |
| FsStateBackend(URI checkpointDataUri, URI defaultSavepointDirectory) | Deprecated. Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI. |
| FsStateBackend(URI checkpointDirectory, URI defaultSavepointDirectory, int fileStateSizeThreshold, int writeBufferSize, org.apache.flink.util.TernaryBoolean asynchronousSnapshots) | Deprecated. Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI. |

| Modifier and Type | Method | Description |
|---|---|---|
| FsStateBackend | configure(org.apache.flink.configuration.ReadableConfig config, ClassLoader classLoader) | Deprecated. Creates a copy of this state backend that uses the values defined in the configuration for fields that were not specified in this state backend. |
| CheckpointStorageAccess | createCheckpointStorage(org.apache.flink.api.common.JobID jobId) | Deprecated. Creates a storage for checkpoints for the given job. |
| <K> AbstractKeyedStateBackend<K> | createKeyedStateBackend(Environment env, org.apache.flink.api.common.JobID jobID, String operatorIdentifier, org.apache.flink.api.common.typeutils.TypeSerializer<K> keySerializer, int numberOfKeyGroups, KeyGroupRange keyGroupRange, TaskKvStateRegistry kvStateRegistry, TtlTimeProvider ttlTimeProvider, org.apache.flink.metrics.MetricGroup metricGroup, Collection<KeyedStateHandle> stateHandles, org.apache.flink.core.fs.CloseableRegistry cancelStreamRegistry) | Deprecated. Creates a new CheckpointableKeyedStateBackend that is responsible for holding keyed state and checkpointing it. |
| OperatorStateBackend | createOperatorStateBackend(Environment env, String operatorIdentifier, Collection<OperatorStateHandle> stateHandles, org.apache.flink.core.fs.CloseableRegistry cancelStreamRegistry) | Deprecated. Creates a new OperatorStateBackend that can be used for storing operator state. |
| org.apache.flink.core.fs.Path | getBasePath() | Deprecated. Deprecated in favor of getCheckpointPath(). |
| org.apache.flink.core.fs.Path | getCheckpointPath() | Deprecated. Gets the base directory where all the checkpoints are stored. |
| int | getMinFileSizeThreshold() | Deprecated. Gets the threshold below which state is stored as part of the metadata, rather than in files. |
| int | getWriteBufferSize() | Deprecated. Gets the write buffer size for created checkpoint stream. |
| boolean | isUsingAsynchronousSnapshots() | Deprecated. Gets whether the key/value data structures are asynchronously snapshotted, which is always true for this state backend. |
| boolean | supportsNoClaimRestoreMode() | Deprecated. Tells if a state backend supports the RestoreMode.NO_CLAIM mode. |
| boolean | supportsSavepointFormat(org.apache.flink.core.execution.SavepointFormatType formatType) | Deprecated. |
| String | toString() | Deprecated. |
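
Methods inherited from class AbstractFileStateBackend: getSavepointPath, resolveCheckpoint
Methods inherited from class AbstractStateBackend: getCompressionDecorator
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface StateBackend: createKeyedStateBackend, useManagedMemory

public FsStateBackend(String checkpointDataUri)
Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI.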
A file system for the file system scheme in the URI (e.g., 'file://', 'hdfs://', or
'S3://') must be accessible via FileSystem.get(URI).
For a state backend targeting HDFS, this means that the URI must either specify the authority (host and port), or that the Hadoop configuration that describes that information must be in the classpath.
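Parameters:
checkpointDataUri - The URI describing the filesystem (scheme and optionally authority), and the path to the checkpoint data directory.

public FsStateBackend(String checkpointDataUri, boolean asynchronousSnapshots)
Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI.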
A file system for the file system scheme in the URI (e.g., 'file://', 'hdfs://', or
'S3://') must be accessible via FileSystem.get(URI).
For a state backend targeting HDFS, this means that the URI must either specify the authority (host and port), or that the Hadoop configuration that describes that information must be in the classpath.
Parameters:
checkpointDataUri - The URI describing the filesystem (scheme and optionally authority), and the path to the checkpoint data directory.
asynchronousSnapshots - This parameter is only there for API compatibility. Checkpoints are always asynchronous now.

public FsStateBackend(org.apache.flink.core.fs.Path checkpointDataUri)
Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI.
A file system for the file system scheme in the URI (e.g., 'file://', 'hdfs://', or
'S3://') must be accessible via FileSystem.get(URI).
For a state backend targeting HDFS, this means that the URI must either specify the authority (host and port), or that the Hadoop configuration that describes that information must be in the classpath.
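Parameters:
checkpointDataUri - The URI describing the filesystem (scheme and optionally authority), and the path to the checkpoint data directory.

public FsStateBackend(org.apache.flink.core.fs.Path checkpointDataUri, boolean asynchronousSnapshots)
Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI.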
A file system for the file system scheme in the URI (e.g., 'file://', 'hdfs://', or
'S3://') must be accessible via FileSystem.get(URI).
For a state backend targeting HDFS, this means that the URI must either specify the authority (host and port), or that the Hadoop configuration that describes that information must be in the classpath.
Parameters:
checkpointDataUri - The URI describing the filesystem (scheme and optionally authority), and the path to the checkpoint data directory.
asynchronousSnapshots - This parameter is only there for API compatibility. Checkpoints are always asynchronous now.

public FsStateBackend(URI checkpointDataUri)
Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI.
A file system for the file system scheme in the URI (e.g., 'file://', 'hdfs://', or
'S3://') must be accessible via FileSystem.get(URI).
For a state backend targeting HDFS, this means that the URI must either specify the authority (host and port), or that the Hadoop configuration that describes that information must be in the classpath.
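Parameters:
checkpointDataUri - The URI describing the filesystem (scheme and optionally authority), and the path to the checkpoint data directory.

public FsStateBackend(URI checkpointDataUri, @Nullable URI defaultSavepointDirectory)
Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI.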
A file system for the file system scheme in the URI (e.g., 'file://', 'hdfs://', or
'S3://') must be accessible via FileSystem.get(URI).
For a state backend targeting HDFS, this means that the URI must either specify the authority (host and port), or that the Hadoop configuration that describes that information must be in the classpath.
Parameters:
checkpointDataUri - The URI describing the filesystem (scheme and optionally authority), and the path to the checkpoint data directory.
defaultSavepointDirectory - The default directory to store savepoints to. May be null.

public FsStateBackend(URI checkpointDataUri, boolean asynchronousSnapshots)
Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI.
A file system for the file system scheme in the URI (e.g., 'file://', 'hdfs://', or
'S3://') must be accessible via FileSystem.get(URI).
For a state backend targeting HDFS, this means that the URI must either specify the authority (host and port), or that the Hadoop configuration that describes that information must be in the classpath.
Parameters:
checkpointDataUri - The URI describing the filesystem (scheme and optionally authority), and the path to the checkpoint data directory.
asynchronousSnapshots - This parameter is only there for API compatibility. Checkpoints are always asynchronous now.

public FsStateBackend(URI checkpointDataUri, int fileStateSizeThreshold)
Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI.
A file system for the file system scheme in the URI (e.g., 'file://', 'hdfs://', or
'S3://') must be accessible via FileSystem.get(URI).
For a state backend targeting HDFS, this means that the URI must either specify the authority (host and port), or that the Hadoop configuration that describes that information must be in the classpath.
Parameters:
checkpointDataUri - The URI describing the filesystem (scheme and optionally authority), and the path to the checkpoint data directory.
fileStateSizeThreshold - State up to this size will be stored as part of the metadata, rather than in files.

public FsStateBackend(URI checkpointDataUri, int fileStateSizeThreshold, boolean asynchronousSnapshots)
Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI.
A file system for the file system scheme in the URI (e.g., 'file://', 'hdfs://', or
'S3://') must be accessible via FileSystem.get(URI).
For a state backend targeting HDFS, this means that the URI must either specify the authority (host and port), or that the Hadoop configuration that describes that information must be in the classpath.
Parameters:
checkpointDataUri - The URI describing the filesystem (scheme and optionally authority), and the path to the checkpoint data directory.
fileStateSizeThreshold - State up to this size will be stored as part of the metadata, rather than in files (-1 for default value).
asynchronousSnapshots - This parameter is only there for API compatibility. Checkpoints are always asynchronous now.

public FsStateBackend(URI checkpointDirectory, @Nullable URI defaultSavepointDirectory, int fileStateSizeThreshold, int writeBufferSize, org.apache.flink.util.TernaryBoolean asynchronousSnapshots)
Creates a new state backend that stores its checkpoint data in the file system and location defined by the given URI.
A file system for the file system scheme in the URI (e.g., 'file://', 'hdfs://', or
'S3://') must be accessible via FileSystem.get(URI).
For a state backend targeting HDFS, this means that the URI must either specify the authority (host and port), or that the Hadoop configuration that describes that information must be in the classpath.
Parameters:
checkpointDirectory - The path to write checkpoint metadata to.
defaultSavepointDirectory - The path to write savepoints to. If null, the value from the runtime configuration will be used, or savepoint target locations need to be passed when triggering a savepoint.
fileStateSizeThreshold - State below this size will be stored as part of the metadata, rather than in files. If -1, the value configured in the runtime configuration will be used, or the default value (1KB) if nothing is configured.
writeBufferSize - Write buffer size used to serialize state. If -1, the value configured in the runtime configuration will be used, or the default value (4KB) if nothing is configured.
asynchronousSnapshots - This parameter is only there for API compatibility. Checkpoints are always asynchronous now.

@Deprecated public org.apache.flink.core.fs.Path getBasePath()
Deprecated in favor of getCheckpointPath().

@Nonnull public org.apache.flink.core.fs.Path getCheckpointPath()
Gets the base directory where all the checkpoints are stored.
Overrides: getCheckpointPath in class AbstractFileStateBackend

public int getMinFileSizeThreshold()
Gets the threshold below which state is stored as part of the metadata, rather than in files.
If not explicitly configured, this is the default value of CheckpointingOptions.FS_SMALL_FILE_THRESHOLD.

public int getWriteBufferSize()
Gets the write buffer size for created checkpoint stream.
If not explicitly configured, this is the default value of CheckpointingOptions.FS_WRITE_BUFFER_SIZE.

public boolean isUsingAsynchronousSnapshots()
Gets whether the key/value data structures are asynchronously snapshotted, which is always true for this state backend.

public boolean supportsNoClaimRestoreMode()
Description copied from interface: StateBackend
Tells if a state backend supports the RestoreMode.NO_CLAIM mode.
If a state backend supports NO_CLAIM mode, it should create an independent snapshot when it receives CheckpointType.FULL_CHECKPOINT in Snapshotable.snapshot(long, long, CheckpointStreamFactory, CheckpointOptions).
Specified by: supportsNoClaimRestoreMode in interface StateBackend
Returns: whether the state backend supports the RestoreMode.NO_CLAIM mode.

public boolean supportsSavepointFormat(org.apache.flink.core.execution.SavepointFormatType formatType)
Specified by: supportsSavepointFormat in interface StateBackend

public FsStateBackend configure(org.apache.flink.configuration.ReadableConfig config, ClassLoader classLoader)
Creates a copy of this state backend that uses the values defined in the configuration for fields that were not specified in this state backend.
Specified by: configure in interface ConfigurableStateBackend
Parameters:
config - The configuration.
classLoader - The class loader that should be used to load the state backend.

public CheckpointStorageAccess createCheckpointStorage(org.apache.flink.api.common.JobID jobId) throws IOException
Description copied from interface: CheckpointStorage
Creates a storage for checkpoints for the given job.
Specified by: createCheckpointStorage in interface CheckpointStorage
Parameters:
jobId - The job to store checkpoint data for.
Throws:
IOException - Thrown if the checkpoint storage cannot be initialized.

public <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(Environment env, org.apache.flink.api.common.JobID jobID, String operatorIdentifier, org.apache.flink.api.common.typeutils.TypeSerializer<K> keySerializer, int numberOfKeyGroups, KeyGroupRange keyGroupRange, TaskKvStateRegistry kvStateRegistry, TtlTimeProvider ttlTimeProvider, org.apache.flink.metrics.MetricGroup metricGroup, @Nonnull Collection<KeyedStateHandle> stateHandles, org.apache.flink.core.fs.CloseableRegistry cancelStreamRegistry) throws BackendBuildingException
Description copied from interface: StateBackend
Creates a new CheckpointableKeyedStateBackend that is responsible for holding keyed state and checkpointing it.
Keyed State is state where each value is bound to a key.
Specified by: createKeyedStateBackend in interface StateBackend
Specified by: createKeyedStateBackend in class AbstractStateBackend
Type Parameters:
K - The type of the keys by which the state is organized.
Parameters:
env - The environment of the task.
jobID - The ID of the job that the task belongs to.
operatorIdentifier - The identifier text of the operator.
keySerializer - The key-serializer for the operator.
numberOfKeyGroups - The number of key-groups aka max parallelism.
keyGroupRange - Range of key-groups for which the to-be-created backend is responsible.
kvStateRegistry - KvStateRegistry helper for this task.
ttlTimeProvider - Provider for TTL logic to judge about state expiration.
metricGroup - The parent metric group for all state backend metrics.
stateHandles - The state handles for restore.
cancelStreamRegistry - The registry to which created closeable objects will be registered during restore.
Throws:
BackendBuildingException

public OperatorStateBackend createOperatorStateBackend(Environment env, String operatorIdentifier, @Nonnull Collection<OperatorStateHandle> stateHandles, org.apache.flink.core.fs.CloseableRegistry cancelStreamRegistry) throws BackendBuildingException
Description copied from interface: StateBackend
Creates a new OperatorStateBackend that can be used for storing operator state.
Operator state is state that is associated with parallel operator (or function) instances, rather than with keys.
Specified by: createOperatorStateBackend in interface StateBackend
Specified by: createOperatorStateBackend in class AbstractStateBackend
Parameters:
env - The runtime environment of the executing task.
operatorIdentifier - The identifier of the operator whose state should be stored.
stateHandles - The state handles for restore.
cancelStreamRegistry - The registry to register streams to close if the task is canceled.
Throws:
BackendBuildingException

Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.