Package org.nd4j.autodiff.samediff.ops
Class SDNN
- java.lang.Object
  - org.nd4j.autodiff.samediff.ops.SDOps
    - org.nd4j.autodiff.samediff.ops.SDNN

public class SDNN extends SDOps
Method Summary
All methods are concrete instance methods. Each operation comes in two overloads: one taking a leading String name for the output variable, and one without. All methods return SDVariable, except topK, which returns SDVariable[].

- batchNorm(input, mean, variance, gamma, beta, epsilon, axis...) - Batch normalization operation.
- biasAdd(input, bias, nchw) - Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector.
- cReLU(x) - Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation.
- dotProductAttention(queries, keys, values, mask, scaled) - Dot product attention on the given timeseries input with the given queries: out = sum(similarity(k_i, q) * v_i), where similarity(k, q) = softmax(k * q) and k * q is the dot product of k and q; optionally normalized as softmax(k * q / sqrt(size(q))). See "Attention Is All You Need" (https://arxiv.org/abs/1706.03762, p. 4, eq. 1).
- dropout(input, inputRetainProbability) - Dropout operation.
- dropoutInverted(input, p) - Dropout inverted operation.
- elu(x) - Element-wise exponential linear unit (ELU) function: out = x if x > 0, out = a * (exp(x) - 1) if x <= 0, with constant a = 1.0.
- gelu(x) - GELU (Gaussian Error Linear Unit) activation function, using the sigmoid approximation. See https://arxiv.org/abs/1606.08415.
- hardSigmoid(x) - Element-wise hard sigmoid function: out[i] = 0 if in[i] <= -2.5; out[i] = 0.2*in[i] + 0.5 if -2.5 < in[i] < 2.5; out[i] = 1 if in[i] >= 2.5.
- hardTanh(x) - Element-wise hard tanh function: out[i] = -1 if in[i] <= -1; out[i] = in[i] if -1 < in[i] < 1; out[i] = 1 if in[i] >= 1.
- hardTanhDerivative(x) - Derivative (dOut/dIn) of the element-wise hard tanh function.
- layerNorm(input, gain, [bias,] channelsFirst, dimensions...) - Apply layer normalization: y = gain * standardize(x) + bias.
- leakyRelu(x, alpha) - Element-wise leaky ReLU function: out = x if x >= 0, out = alpha * x if x < 0; alpha is most commonly set to 0.01.
- leakyReluDerivative(x, alpha) - Leaky ReLU derivative: dOut/dIn given input.
- linear(input, weights, bias) - Linear layer operation: out = mmul(in, w) + bias; the bias array is optional.
- logSigmoid(x) - Element-wise log sigmoid function: out[i] = log(sigmoid(in[i])).
- logSoftmax(x[, dimension]) - Log softmax activation.
- multiHeadDotProductAttention(queries, keys, values, Wq, Wk, Wv, Wo, mask, scaled) - Multi-headed dot product attention: out = concat(head_1, ..., head_n) * Wo, head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v); optionally normalized when calculating the attention for each head. See https://arxiv.org/abs/1706.03762, pp. 4-5, "3.2.2 Multi-Head Attention".
- pad(input, padding, [PadMode,] constant) - Padding operation.
- preciseGelu(x) - GELU (Gaussian Error Linear Unit) activation function, using the precise method. See https://arxiv.org/abs/1606.08415.
- prelu(input, alpha, sharedAxes...) - PReLU (Parameterized Rectified Linear Unit) operation.
- relu(x, cutoff) - Element-wise rectified linear function with specified cutoff: out[i] = in[i] if in[i] >= cutoff, out[i] = 0 otherwise.
- relu6(x, cutoff) - Element-wise "rectified linear 6" function with specified cutoff: out[i] = min(max(in, cutoff), 6).
- reluLayer(input, weights, bias) - ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in, w) + bias); the bias array is optional.
- selu(x) - Element-wise SELU (Scaled Exponential Linear Unit) function, see Self-Normalizing Neural Networks: out[i] = scale * in[i] if in[i] > 0, out[i] = scale * alpha * (exp(in[i]) - 1) if in[i] <= 0. Uses default scale and alpha values.
- sigmoid(x) - Element-wise sigmoid function: out[i] = 1.0/(1 + exp(-in[i])).
- sigmoidDerivative(x, wrt) - Element-wise sigmoid derivative: dL/dIn given input and dL/dOut.
- softmax(x[, dimension]) - Softmax activation, along the specified dimension.
- softmaxDerivative(x, wrt, dimension) - Softmax derivative function.
- softplus(x) - Element-wise softplus function: out = log(exp(x) + 1).
- softsign(x) - Element-wise softsign function: out = x / (abs(x) + 1).
- softsignDerivative(x) - Element-wise derivative (dOut/dIn) of the softsign function.
- swish(x) - Element-wise "swish" function: out = x * sigmoid(b*x) with b = 1.0. See https://arxiv.org/abs/1710.05941.
- tanh(x) - Element-wise tanh (hyperbolic tangent) operation: out = tanh(x).
- topK(input, k, sorted) - Find values and indices for the largest k entries along the last dimension.
Constructor Detail
-
SDNN
public SDNN(SameDiff sameDiff)
-
-
Method Detail
-
cReLU
public SDVariable cReLU(SDVariable x)
Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation. Note that as a result this non-linearity doubles the depth of the activations.
- Parameters:
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
cReLU
public SDVariable cReLU(String name, SDVariable x)
Concatenates a ReLU which selects only the positive part of the activation with a ReLU which selects only the negative part of the activation. Note that as a result this non-linearity doubles the depth of the activations.
- Parameters:
- name - Name for the output variable. May be null.
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
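The CReLU formula above can be sketched in plain Java (the class name `CReluSketch` is illustrative, not part of ND4J): concatenate relu(x) with relu(-x), so the output has twice the length of the input, matching the note that this non-linearity doubles the depth of the activations.

```java
// Hypothetical plain-Java sketch of CReLU: out = concat(relu(x), relu(-x)).
public class CReluSketch {
    static double[] crelu(double[] x) {
        double[] out = new double[2 * x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.max(0.0, x[i]);             // positive part
            out[x.length + i] = Math.max(0.0, -x[i]); // negative part
        }
        return out;
    }
}
```

For example, an input of [1, -2] yields [1, 0] for the positive part and [0, 2] for the negative part.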
batchNorm
public SDVariable batchNorm(SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis)
- Parameters:
- input - Input variable. (NUMERIC type)
- mean - Mean value. For 1d axis, this should match input.size(axis) (NUMERIC type)
- variance - Variance value. For 1d axis, this should match input.size(axis) (NUMERIC type)
- gamma - Gamma value. For 1d axis, this should match input.size(axis) (NUMERIC type)
- beta - Beta value. For 1d axis, this should match input.size(axis) (NUMERIC type)
- epsilon - Epsilon constant for numerical stability (to avoid division by 0)
- axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations. For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC. For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))
- Returns:
- output variable for batch normalization (NUMERIC type)
-
batchNorm
public SDVariable batchNorm(String name, SDVariable input, SDVariable mean, SDVariable variance, SDVariable gamma, SDVariable beta, double epsilon, int... axis)
- Parameters:
- name - Name for the output variable. May be null.
- input - Input variable. (NUMERIC type)
- mean - Mean value. For 1d axis, this should match input.size(axis) (NUMERIC type)
- variance - Variance value. For 1d axis, this should match input.size(axis) (NUMERIC type)
- gamma - Gamma value. For 1d axis, this should match input.size(axis) (NUMERIC type)
- beta - Beta value. For 1d axis, this should match input.size(axis) (NUMERIC type)
- epsilon - Epsilon constant for numerical stability (to avoid division by 0)
- axis - For 2d CNN activations: 1 for NCHW format activations, or 3 for NHWC format activations. For 3d CNN activations: 1 for NCDHW format, 4 for NDHWC. For 1d/RNN activations: 1 for NCW format, 2 for NWC (Size: AtLeast(min=1))
- Returns:
- output variable for batch normalization (NUMERIC type)
-
biasAdd
public SDVariable biasAdd(SDVariable input, SDVariable bias, boolean nchw)
Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector.
- Parameters:
- input - 4d input variable (NUMERIC type)
- bias - 1d bias (NUMERIC type)
- nchw - The format: nchw=true means [minibatch, channels, height, width]; nchw=false means [minibatch, height, width, channels]. Unused for 2d inputs
- Returns:
- output Output variable, after applying bias add operation (NUMERIC type)
-
biasAdd
public SDVariable biasAdd(String name, SDVariable input, SDVariable bias, boolean nchw)
Bias addition operation: a special case of addition, typically used with CNN 4D activations and a 1D bias vector.
- Parameters:
- name - Name for the output variable. May be null.
- input - 4d input variable (NUMERIC type)
- bias - 1d bias (NUMERIC type)
- nchw - The format: nchw=true means [minibatch, channels, height, width]; nchw=false means [minibatch, height, width, channels]. Unused for 2d inputs
- Returns:
- output Output variable, after applying bias add operation (NUMERIC type)
-
dotProductAttention
public SDVariable dotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)
This operation performs dot product attention on the given timeseries input with the given queries:
out = sum(similarity(k_i, q) * v_i)
similarity(k, q) = softmax(k * q) where k * q is the dot product of k and q
Optionally with a normalization step:
similarity(k, q) = softmax(k * q / sqrt(size(q)))
See also "Attention Is All You Need" (https://arxiv.org/abs/1706.03762, p. 4, eq. 1)
Note: This supports multiple queries at once; if only one query is available, the queries array still has to be 3D but can have queryCount = 1.
Note: keys and values are usually the same array. If you want to use the same array for both, simply pass it twice.
Note: Queries, keys and values must either be all rank 3 or all rank 4 arrays; mixing them doesn't work. The output rank will depend on the input rank.
- Parameters:
- queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] or 4D array of shape [batchSize, numHeads, featureKeys, queryCount] (NUMERIC type)
- keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] or 4D array of shape [batchSize, numHeads, featureKeys, timesteps] (NUMERIC type)
- values - input 3D array "values" of shape [batchSize, featureValues, timesteps] or 4D array of shape [batchSize, numHeads, featureValues, timesteps] (NUMERIC type)
- mask - OPTIONAL; array of shape [batchSize, timesteps] that defines which values should be skipped (NUMERIC type)
- scaled - normalization: false -> do not apply normalization, true -> apply normalization
- Returns:
- output Attention result arrays of shape [batchSize, featureValues, queryCount] or [batchSize, numHeads, featureValues, queryCount], (optionally) Attention Weights of shape [batchSize, timesteps, queryCount] or [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
-
dotProductAttention
public SDVariable dotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable mask, boolean scaled)
This operation performs dot product attention on the given timeseries input with the given queries:
out = sum(similarity(k_i, q) * v_i)
similarity(k, q) = softmax(k * q) where k * q is the dot product of k and q
Optionally with a normalization step:
similarity(k, q) = softmax(k * q / sqrt(size(q)))
See also "Attention Is All You Need" (https://arxiv.org/abs/1706.03762, p. 4, eq. 1)
Note: This supports multiple queries at once; if only one query is available, the queries array still has to be 3D but can have queryCount = 1.
Note: keys and values are usually the same array. If you want to use the same array for both, simply pass it twice.
Note: Queries, keys and values must either be all rank 3 or all rank 4 arrays; mixing them doesn't work. The output rank will depend on the input rank.
- Parameters:
- name - Name for the output variable. May be null.
- queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] or 4D array of shape [batchSize, numHeads, featureKeys, queryCount] (NUMERIC type)
- keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] or 4D array of shape [batchSize, numHeads, featureKeys, timesteps] (NUMERIC type)
- values - input 3D array "values" of shape [batchSize, featureValues, timesteps] or 4D array of shape [batchSize, numHeads, featureValues, timesteps] (NUMERIC type)
- mask - OPTIONAL; array of shape [batchSize, timesteps] that defines which values should be skipped (NUMERIC type)
- scaled - normalization: false -> do not apply normalization, true -> apply normalization
- Returns:
- output Attention result arrays of shape [batchSize, featureValues, queryCount] or [batchSize, numHeads, featureValues, queryCount], (optionally) Attention Weights of shape [batchSize, timesteps, queryCount] or [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
-
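To make the formulas concrete, here is a hedged plain-Java sketch of the attention computation for a single batch entry and a single query (queryCount = 1), ignoring the mask. The class and method names (`AttentionSketch`, `attend`) are illustrative and not part of ND4J; shapes follow the parameter docs above with keys as [featureKeys][timesteps] and values as [featureValues][timesteps].

```java
// Illustrative scaled dot product attention for one query vector.
// scores[t] = keys[:,t] . query, optionally divided by sqrt(size(query));
// weights = softmax(scores); out = values * weights.
public class AttentionSketch {
    static double[] attend(double[][] keys, double[][] values, double[] query, boolean scaled) {
        int timesteps = keys[0].length;
        double[] scores = new double[timesteps];
        for (int t = 0; t < timesteps; t++) {
            double dot = 0.0;
            for (int f = 0; f < query.length; f++) dot += keys[f][t] * query[f];
            scores[t] = scaled ? dot / Math.sqrt(query.length) : dot; // optional normalization
        }
        // Numerically stable softmax over timesteps
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double sum = 0.0;
        double[] w = new double[timesteps];
        for (int t = 0; t < timesteps; t++) { w[t] = Math.exp(scores[t] - max); sum += w[t]; }
        for (int t = 0; t < timesteps; t++) w[t] /= sum;
        // Output: weighted sum of value vectors over timesteps
        double[] out = new double[values.length];
        for (int f = 0; f < values.length; f++)
            for (int t = 0; t < timesteps; t++) out[f] += values[f][t] * w[t];
        return out;
    }
}
```

When all keys score equally against the query, the weights are uniform and the output is simply the average of the value vectors.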
dropout
public SDVariable dropout(SDVariable input, double inputRetainProbability)
Dropout operation
- Parameters:
- input - Input array (NUMERIC type)
- inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
- Returns:
- output Output (NUMERIC type)
-
dropout
public SDVariable dropout(String name, SDVariable input, double inputRetainProbability)
Dropout operation
- Parameters:
- name - Name for the output variable. May be null.
- input - Input array (NUMERIC type)
- inputRetainProbability - Probability of retaining an input (set to 0 with probability 1-p)
- Returns:
- output Output (NUMERIC type)
-
dropoutInverted
public SDVariable dropoutInverted(SDVariable input, double p)
Dropout inverted operation. The dropout probability p is the probability of dropping an input.
- Parameters:
- input - Input array (NUMERIC type)
- p - Probability of dropping an input (set to 0 with probability p)
- Returns:
- output Output (NUMERIC type)
-
dropoutInverted
public SDVariable dropoutInverted(String name, SDVariable input, double p)
Dropout inverted operation. The dropout probability p is the probability of dropping an input.
- Parameters:
- name - Name for the output variable. May be null.
- input - Input array (NUMERIC type)
- p - Probability of dropping an input (set to 0 with probability p)
- Returns:
- output Output (NUMERIC type)
-
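The usual interpretation of "inverted" dropout, sketched below in plain Java under that assumption (class name `InvertedDropoutSketch` is illustrative, not ND4J's implementation): dropped units become 0, and surviving units are scaled by 1/(1-p) at training time so that the expected activation is unchanged and no rescaling is needed at inference.

```java
import java.util.Random;

// Illustrative inverted dropout with drop probability p.
public class InvertedDropoutSketch {
    static double[] dropoutInverted(double[] x, double p, Random rng) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            // Drop with probability p; scale survivors by 1/(1-p)
            out[i] = rng.nextDouble() < p ? 0.0 : x[i] / (1.0 - p);
        }
        return out;
    }
}
```

With p = 0 the operation is the identity; with p close to 1 most units are zeroed and the survivors are scaled up sharply.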
elu
public SDVariable elu(SDVariable x)
Element-wise exponential linear unit (ELU) function:
out = x if x > 0
out = a * (exp(x) - 1) if x <= 0
with constant a = 1.0
- Parameters:
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
elu
public SDVariable elu(String name, SDVariable x)
Element-wise exponential linear unit (ELU) function:
out = x if x > 0
out = a * (exp(x) - 1) if x <= 0
with constant a = 1.0
- Parameters:
- name - Name for the output variable. May be null.
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
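The ELU piecewise definition above translates directly to plain Java (the class name `EluSketch` is illustrative, not part of ND4J):

```java
// Illustrative ELU: out = x if x > 0, a * (exp(x) - 1) if x <= 0, with a = 1.0.
public class EluSketch {
    static double elu(double x) {
        final double a = 1.0; // constant a = 1.0, as in the doc above
        return x > 0 ? x : a * (Math.exp(x) - 1.0);
    }
}
```

For large negative inputs the output saturates at -a, giving ELU a smooth lower bound of -1 rather than the hard zero of ReLU.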
gelu
public SDVariable gelu(SDVariable x)
GELU activation function - Gaussian Error Linear Units
For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
This method uses the sigmoid approximation.
- Parameters:
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
gelu
public SDVariable gelu(String name, SDVariable x)
GELU activation function - Gaussian Error Linear Units
For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
This method uses the sigmoid approximation.
- Parameters:
- name - Name for the output variable. May be null.
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
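One common sigmoid approximation of GELU, from the paper linked above, is x * sigmoid(1.702 * x); the exact constant used internally by this op may differ, so treat the sketch below (class name `GeluSketch` is illustrative) as an approximation of the approximation:

```java
// Illustrative sigmoid-approximated GELU: gelu(x) ~= x * sigmoid(1.702 * x).
// The constant 1.702 is from the GELU paper; ND4J's internal constant is assumed, not verified.
public class GeluSketch {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }
    static double gelu(double x) { return x * sigmoid(1.702 * x); }
}
```

Like ELU, GELU is smooth; unlike ELU it is non-monotonic near zero, dipping slightly below zero for small negative inputs before approaching zero for large negative ones.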
hardSigmoid
public SDVariable hardSigmoid(SDVariable x)
Element-wise hard sigmoid function:
out[i] = 0 if in[i] <= -2.5
out[i] = 0.2*in[i] + 0.5 if -2.5 < in[i] < 2.5
out[i] = 1 if in[i] >= 2.5
- Parameters:
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
hardSigmoid
public SDVariable hardSigmoid(String name, SDVariable x)
Element-wise hard sigmoid function:
out[i] = 0 if in[i] <= -2.5
out[i] = 0.2*in[i] + 0.5 if -2.5 < in[i] < 2.5
out[i] = 1 if in[i] >= 2.5
- Parameters:
- name - Name for the output variable. May be null.
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
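The three-branch definition above is a direct translation to plain Java (the class name `HardSigmoidSketch` is illustrative, not part of ND4J):

```java
// Illustrative hard sigmoid: clamp the line 0.2*x + 0.5 to [0, 1].
public class HardSigmoidSketch {
    static double hardSigmoid(double x) {
        if (x <= -2.5) return 0.0;
        if (x >= 2.5) return 1.0;
        return 0.2 * x + 0.5;
    }
}
```

The linear middle segment makes this a cheap piecewise approximation of the true sigmoid, agreeing with it exactly at x = 0 (both give 0.5).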
hardTanh
public SDVariable hardTanh(SDVariable x)
Element-wise hard tanh function:
out[i] = -1 if in[i] <= -1
out[i] = in[i] if -1 < in[i] < 1
out[i] = 1 if in[i] >= 1
- Parameters:
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
hardTanh
public SDVariable hardTanh(String name, SDVariable x)
Element-wise hard tanh function:
out[i] = -1 if in[i] <= -1
out[i] = in[i] if -1 < in[i] < 1
out[i] = 1 if in[i] >= 1
- Parameters:
- name - Name for the output variable. May be null.
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
hardTanhDerivative
public SDVariable hardTanhDerivative(SDVariable x)
Derivative (dOut/dIn) of the element-wise hard tanh function - hardTanh(INDArray)
- Parameters:
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
hardTanhDerivative
public SDVariable hardTanhDerivative(String name, SDVariable x)
Derivative (dOut/dIn) of the element-wise hard tanh function - hardTanh(INDArray)
- Parameters:
- name - Name for the output variable. May be null.
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
layerNorm
public SDVariable layerNorm(SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int... dimensions)
Apply Layer Normalization
y = gain * standardize(x) + bias
- Parameters:
- input - Input variable (NUMERIC type)
- gain - Gain (NUMERIC type)
- bias - Bias (NUMERIC type)
- channelsFirst - For 2D input: unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
- dimensions - Dimensions to perform layer norm over: dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
- Returns:
- output Output variable (NUMERIC type)
-
layerNorm
public SDVariable layerNorm(String name, SDVariable input, SDVariable gain, SDVariable bias, boolean channelsFirst, int... dimensions)
Apply Layer Normalization
y = gain * standardize(x) + bias
- Parameters:
- name - Name for the output variable. May be null.
- input - Input variable (NUMERIC type)
- gain - Gain (NUMERIC type)
- bias - Bias (NUMERIC type)
- channelsFirst - For 2D input: unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
- dimensions - Dimensions to perform layer norm over: dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
- Returns:
- output Output variable (NUMERIC type)
-
layerNorm
public SDVariable layerNorm(SDVariable input, SDVariable gain, boolean channelsFirst, int... dimensions)
Apply Layer Normalization
y = gain * standardize(x) + bias
- Parameters:
- input - Input variable (NUMERIC type)
- gain - Gain (NUMERIC type)
- channelsFirst - For 2D input: unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
- dimensions - Dimensions to perform layer norm over: dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
- Returns:
- output Output variable (NUMERIC type)
-
layerNorm
public SDVariable layerNorm(String name, SDVariable input, SDVariable gain, boolean channelsFirst, int... dimensions)
Apply Layer Normalization
y = gain * standardize(x) + bias
- Parameters:
- name - Name for the output variable. May be null.
- input - Input variable (NUMERIC type)
- gain - Gain (NUMERIC type)
- channelsFirst - For 2D input: unused. True for NCHW (minibatch, channels, height, width), false for NHWC data
- dimensions - Dimensions to perform layer norm over: dimension=1 for 2d/MLP data, dimension=1,2,3 for CNNs (Size: AtLeast(min=1))
- Returns:
- output Output variable (NUMERIC type)
-
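The formula y = gain * standardize(x) + bias can be sketched for a single 1-D activation vector in plain Java (the class name `LayerNormSketch` and the explicit `eps` argument are illustrative; ND4J's internal epsilon handling is assumed, not verified). standardize(x) here means subtracting the mean and dividing by the standard deviation over the normalized dimensions:

```java
// Illustrative layer norm over one 1-D vector: y[i] = gain[i] * (x[i] - mean) / sqrt(var + eps) + bias[i].
public class LayerNormSketch {
    static double[] layerNorm(double[] x, double[] gain, double[] bias, double eps) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double var = 0.0;
        for (double v : x) var += (v - mean) * (v - mean);
        var /= x.length;
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++)
            y[i] = gain[i] * (x[i] - mean) / Math.sqrt(var + eps) + bias[i];
        return y;
    }
}
```

With unit gain and zero bias, the output always has (approximately) zero mean and unit variance, regardless of the input scale.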
leakyRelu
public SDVariable leakyRelu(SDVariable x, double alpha)
Element-wise leaky ReLU function:
out = x if x >= 0.0
out = alpha * x if x < 0.0
Alpha is most commonly set to 0.01
- Parameters:
- x - Input variable (NUMERIC type)
- alpha - Slope for negative inputs - commonly 0.01
- Returns:
- output Output variable (NUMERIC type)
-
leakyRelu
public SDVariable leakyRelu(String name, SDVariable x, double alpha)
Element-wise leaky ReLU function:
out = x if x >= 0.0
out = alpha * x if x < 0.0
Alpha is most commonly set to 0.01
- Parameters:
- name - Name for the output variable. May be null.
- x - Input variable (NUMERIC type)
- alpha - Slope for negative inputs - commonly 0.01
- Returns:
- output Output variable (NUMERIC type)
-
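The two-branch leaky ReLU formula is a one-liner in plain Java (the class name `LeakyReluSketch` is illustrative, not part of ND4J):

```java
// Illustrative leaky ReLU: identity for x >= 0, slope alpha for x < 0.
public class LeakyReluSketch {
    static double leakyRelu(double x, double alpha) {
        return x >= 0.0 ? x : alpha * x; // alpha commonly 0.01
    }
}
```

The small negative slope keeps a non-zero gradient for negative inputs, avoiding the "dead ReLU" problem where units stop updating.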
leakyReluDerivative
public SDVariable leakyReluDerivative(SDVariable x, double alpha)
Leaky ReLU derivative: dOut/dIn given input.
- Parameters:
- x - Input variable (NUMERIC type)
- alpha - Slope for negative inputs - commonly 0.01
- Returns:
- output Output variable (NUMERIC type)
-
leakyReluDerivative
public SDVariable leakyReluDerivative(String name, SDVariable x, double alpha)
Leaky ReLU derivative: dOut/dIn given input.
- Parameters:
- name - Name for the output variable. May be null.
- x - Input variable (NUMERIC type)
- alpha - Slope for negative inputs - commonly 0.01
- Returns:
- output Output variable (NUMERIC type)
-
linear
public SDVariable linear(SDVariable input, SDVariable weights, SDVariable bias)
Linear layer operation: out = mmul(in,w) + bias
Note that the bias array is optional.
- Parameters:
- input - Input data (NUMERIC type)
- weights - Weights variable, shape [nIn, nOut] (NUMERIC type)
- bias - Optional bias variable (may be null) (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
linear
public SDVariable linear(String name, SDVariable input, SDVariable weights, SDVariable bias)
Linear layer operation: out = mmul(in,w) + bias
Note that the bias array is optional.
- Parameters:
- name - Name for the output variable. May be null.
- input - Input data (NUMERIC type)
- weights - Weights variable, shape [nIn, nOut] (NUMERIC type)
- bias - Optional bias variable (may be null) (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
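The formula out = mmul(in, w) + bias, with the optional (nullable) bias, can be sketched for a single input row in plain Java (the class name `LinearSketch` is illustrative, not part of ND4J); the weight matrix follows the documented [nIn, nOut] shape:

```java
// Illustrative linear layer for one input row: out[j] = sum_i in[i] * w[i][j] + bias[j].
// bias may be null, matching the doc's "Optional bias variable (may be null)".
public class LinearSketch {
    static double[] linear(double[] in, double[][] w, double[] bias) {
        int nOut = w[0].length;
        double[] out = new double[nOut];
        for (int j = 0; j < nOut; j++) {
            double sum = 0.0;
            for (int i = 0; i < in.length; i++) sum += in[i] * w[i][j];
            out[j] = sum + (bias == null ? 0.0 : bias[j]);
        }
        return out;
    }
}
```

The related reluLayer method listed in the summary is the same computation followed by an element-wise relu.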
logSigmoid
public SDVariable logSigmoid(SDVariable x)
Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
- Parameters:
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
logSigmoid
public SDVariable logSigmoid(String name, SDVariable x)
Element-wise log sigmoid function: out[i] = log(sigmoid(in[i]))
- Parameters:
- name - Name for the output variable. May be null.
- x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
logSoftmax
public SDVariable logSoftmax(SDVariable x)
Log softmax activation
- Parameters:
- x - (NUMERIC type)
- Returns:
- output (NUMERIC type)
-
logSoftmax
public SDVariable logSoftmax(String name, SDVariable x)
Log softmax activation
- Parameters:
- name - Name for the output variable. May be null.
- x - (NUMERIC type)
- Returns:
- output (NUMERIC type)
-
logSoftmax
public SDVariable logSoftmax(SDVariable x, int dimension)
Log softmax activation
- Parameters:
- x - Input (NUMERIC type)
- dimension - Dimension along which to apply log softmax
- Returns:
- output Output - log(softmax(input)) (NUMERIC type)
-
logSoftmax
public SDVariable logSoftmax(String name, SDVariable x, int dimension)
Log softmax activation
- Parameters:
- name - Name for the output variable. May be null.
- x - Input (NUMERIC type)
- dimension - Dimension along which to apply log softmax
- Returns:
- output Output - log(softmax(input)) (NUMERIC type)
-
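log(softmax(x)) is usually computed in the numerically stable form x_i - max(x) - log(sum_j exp(x_j - max(x))), which avoids overflow in exp for large inputs. A plain-Java sketch for a 1-D vector (the class name `LogSoftmaxSketch` is illustrative; whether ND4J uses exactly this formulation internally is an assumption):

```java
// Illustrative numerically stable log softmax over a 1-D vector:
// out[i] = x[i] - max(x) - log(sum_j exp(x[j] - max(x))).
public class LogSoftmaxSketch {
    static double[] logSoftmax(double[] x) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : x) max = Math.max(max, v);
        double sum = 0.0;
        for (double v : x) sum += Math.exp(v - max);
        double logSum = Math.log(sum);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = x[i] - max - logSum;
        return out;
    }
}
```

Exponentiating the outputs recovers the softmax probabilities, so they always sum to 1; for an all-equal input of length n, every output is -log(n).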
multiHeadDotProductAttention
public SDVariable multiHeadDotProductAttention(SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)
This performs multi-headed dot product attention on the given timeseries input
out = concat(head_1, head_2, ..., head_n) * Wo
head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v)
Optionally with normalization when calculating the attention for each head.
See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, pp. 4,5, "3.2.2 Multi-Head Attention")
This makes use of dot_product_attention OP support for rank 4 inputs.
See dotProductAttention(INDArray, INDArray, INDArray, INDArray, boolean, boolean)
- Parameters:
- queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] (NUMERIC type)
- keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] (NUMERIC type)
- values - input 3D array "values" of shape [batchSize, featureValues, timesteps] (NUMERIC type)
- Wq - input query projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
- Wk - input key projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
- Wv - input value projection weights of shape [numHeads, projectedValues, featureValues] (NUMERIC type)
- Wo - output projection weights of shape [numHeads * projectedValues, outSize] (NUMERIC type)
- mask - OPTIONAL; array of shape [batchSize, timesteps] that defines which values should be skipped (NUMERIC type)
- scaled - normalization: false -> do not apply normalization, true -> apply normalization
- Returns:
- output Attention result arrays of shape [batchSize, outSize, queryCount] (optionally) Attention Weights of shape [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
-
multiHeadDotProductAttention
public SDVariable multiHeadDotProductAttention(String name, SDVariable queries, SDVariable keys, SDVariable values, SDVariable Wq, SDVariable Wk, SDVariable Wv, SDVariable Wo, SDVariable mask, boolean scaled)
This performs multi-headed dot product attention on the given timeseries input
out = concat(head_1, head_2, ..., head_n) * Wo
head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v)
Optionally with normalization when calculating the attention for each head.
See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, pp. 4,5, "3.2.2 Multi-Head Attention")
This makes use of dot_product_attention OP support for rank 4 inputs.
See also dotProductAttention(INDArray, INDArray, INDArray, INDArray, boolean, boolean)
- Parameters:
name - May be null. Name for the output variable
queries - input 3D array "queries" of shape [batchSize, featureKeys, queryCount] (NUMERIC type)
keys - input 3D array "keys" of shape [batchSize, featureKeys, timesteps] (NUMERIC type)
values - input 3D array "values" of shape [batchSize, featureValues, timesteps] (NUMERIC type)
Wq - input query projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
Wk - input key projection weights of shape [numHeads, projectedKeys, featureKeys] (NUMERIC type)
Wv - input value projection weights of shape [numHeads, projectedValues, featureValues] (NUMERIC type)
Wo - output projection weights of shape [numHeads * projectedValues, outSize] (NUMERIC type)
mask - OPTIONAL; array that defines which values should be skipped, of shape [batchSize, timesteps] (NUMERIC type)
scaled - normalization: false -> do not apply normalization, true -> apply normalization
- Returns:
- output Attention result arrays of shape [batchSize, outSize, queryCount] (optionally) Attention Weights of shape [batchSize, numHeads, timesteps, queryCount] (NUMERIC type)
-
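The per-head attention rule above can be illustrated with a plain-Java sketch of scaled dot product attention for a single head and a single batch element. This is independent of the SameDiff API; all names here are illustrative, and the weight projections (Wq, Wk, Wv, Wo) are omitted for brevity.

```java
public class AttentionSketch {
    // keys, values: [features][timesteps]; query: [features]
    // Computes sum_j softmax(keys^T * query / sqrt(d))_j * values[:, j]
    static double[] attend(double[][] keys, double[][] values, double[] query, boolean scaled) {
        int d = query.length, t = keys[0].length;
        double[] scores = new double[t];
        for (int j = 0; j < t; j++) {
            for (int i = 0; i < d; i++) scores[j] += keys[i][j] * query[i];
            if (scaled) scores[j] /= Math.sqrt(d);   // optional normalization step
        }
        // softmax over timesteps (max-subtracted for numerical stability)
        double max = Double.NEGATIVE_INFINITY, sum = 0.0;
        for (double s : scores) max = Math.max(max, s);
        for (int j = 0; j < t; j++) { scores[j] = Math.exp(scores[j] - max); sum += scores[j]; }
        for (int j = 0; j < t; j++) scores[j] /= sum;
        // weighted sum of the value vectors
        double[] out = new double[values.length];
        for (int i = 0; i < out.length; i++)
            for (int j = 0; j < t; j++) out[i] += values[i][j] * scores[j];
        return out;
    }

    public static void main(String[] args) {
        double[] out = attend(new double[][]{{10, 0}, {0, 0}},
                              new double[][]{{1, 2}, {3, 4}},
                              new double[]{1, 0}, false);
        System.out.println(out[0] + ", " + out[1]);
    }
}
```

The multi-head op applies this per head after projecting with Wq/Wk/Wv, then concatenates the heads and projects with Wo.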
pad
public SDVariable pad(SDVariable input, SDVariable padding, PadMode PadMode, double constant)
Padding operation
- Parameters:
input - Input tensor (NUMERIC type)
padding - Padding value (NUMERIC type)
PadMode - Padding format
constant - Padding constant
- Returns:
- output Padded input (NUMERIC type)
-
pad
public SDVariable pad(String name, SDVariable input, SDVariable padding, PadMode PadMode, double constant)
Padding operation
- Parameters:
name - May be null. Name for the output variable
input - Input tensor (NUMERIC type)
padding - Padding value (NUMERIC type)
PadMode - Padding format
constant - Padding constant
- Returns:
- output Padded input (NUMERIC type)
-
pad
public SDVariable pad(SDVariable input, SDVariable padding, double constant)
Padding operation
- Parameters:
input - Input tensor (NUMERIC type)
padding - Padding value (NUMERIC type)
constant - Padding constant
- Returns:
- output Padded input (NUMERIC type)
-
pad
public SDVariable pad(String name, SDVariable input, SDVariable padding, double constant)
Padding operation
- Parameters:
name - May be null. Name for the output variable
input - Input tensor (NUMERIC type)
padding - Padding value (NUMERIC type)
constant - Padding constant
- Returns:
- output Padded input (NUMERIC type)
-
preciseGelu
public SDVariable preciseGelu(SDVariable x)
GELU activation function - Gaussian Error Linear Units
For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
This method uses the precise variant.
- Parameters:
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
preciseGelu
public SDVariable preciseGelu(String name, SDVariable x)
GELU activation function - Gaussian Error Linear Units
For more details, see Gaussian Error Linear Units (GELUs) - https://arxiv.org/abs/1606.08415
This method uses the precise variant.
- Parameters:
name - May be null. Name for the output variable
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
prelu
public SDVariable prelu(SDVariable input, SDVariable alpha, int... sharedAxes)
PReLU (Parameterized Rectified Linear Unit) operation. Like LeakyReLU with a learnable alpha:
out[i] = in[i] if in[i] >= 0
out[i] = in[i] * alpha[i] otherwise
sharedAxes allows you to share learnable parameters along axes.
For example, if the input has shape [batchSize, channels, height, width]
and you want each channel to have its own cutoff, use sharedAxes = [2, 3] and an
alpha with shape [channels].
- Parameters:
input - Input data (NUMERIC type)
alpha - The cutoff variable. Note that the batch dimension (the 0th, whether it is batch or not) should not be part of alpha. (NUMERIC type)
sharedAxes - Which axes to share cutoff parameters along. (Size: AtLeast(min=1))
- Returns:
- output Output (NUMERIC type)
-
prelu
public SDVariable prelu(String name, SDVariable input, SDVariable alpha, int... sharedAxes)
PReLU (Parameterized Rectified Linear Unit) operation. Like LeakyReLU with a learnable alpha:
out[i] = in[i] if in[i] >= 0
out[i] = in[i] * alpha[i] otherwise
sharedAxes allows you to share learnable parameters along axes.
For example, if the input has shape [batchSize, channels, height, width]
and you want each channel to have its own cutoff, use sharedAxes = [2, 3] and an
alpha with shape [channels].
- Parameters:
name - May be null. Name for the output variable
input - Input data (NUMERIC type)
alpha - The cutoff variable. Note that the batch dimension (the 0th, whether it is batch or not) should not be part of alpha. (NUMERIC type)
sharedAxes - Which axes to share cutoff parameters along. (Size: AtLeast(min=1))
- Returns:
- output Output (NUMERIC type)
-
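The per-channel case described above (sharedAxes = [2, 3], alpha of shape [channels]) can be sketched in plain Java for an input flattened to [channels][height*width]. This is illustrative only and is not the SameDiff implementation.

```java
public class PreluSketch {
    // One learnable alpha per channel: out = in if in >= 0, else in * alpha[channel]
    static double[][] prelu(double[][] in, double[] alpha) {
        double[][] out = new double[in.length][];
        for (int c = 0; c < in.length; c++) {
            out[c] = new double[in[c].length];
            for (int i = 0; i < in[c].length; i++)
                out[c][i] = in[c][i] >= 0 ? in[c][i] : in[c][i] * alpha[c];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] out = prelu(new double[][]{{-2, 3}}, new double[]{0.5});
        System.out.println(out[0][0] + ", " + out[0][1]);
    }
}
```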
relu
public SDVariable relu(SDVariable x, double cutoff)
Element-wise rectified linear function with specified cutoff:
out[i] = in[i] if in[i] >= cutoff
out[i] = 0 otherwise
- Parameters:
x - Input (NUMERIC type)
cutoff - Cutoff value for ReLU operation - x > cutoff ? x : 0. Usually 0
- Returns:
- output Output (NUMERIC type)
-
relu
public SDVariable relu(String name, SDVariable x, double cutoff)
Element-wise rectified linear function with specified cutoff:
out[i] = in[i] if in[i] >= cutoff
out[i] = 0 otherwise
- Parameters:
name - May be null. Name for the output variable
x - Input (NUMERIC type)
cutoff - Cutoff value for ReLU operation - x > cutoff ? x : 0. Usually 0
- Returns:
- output Output (NUMERIC type)
-
relu6
public SDVariable relu6(SDVariable x, double cutoff)
Element-wise "rectified linear 6" function with specified cutoff:
out[i] = min(max(in, cutoff), 6)
- Parameters:
x - Input (NUMERIC type)
cutoff - Cutoff value for ReLU operation. Usually 0
- Returns:
- output Output (NUMERIC type)
-
relu6
public SDVariable relu6(String name, SDVariable x, double cutoff)
Element-wise "rectified linear 6" function with specified cutoff:
out[i] = min(max(in, cutoff), 6)
- Parameters:
name - May be null. Name for the output variable
x - Input (NUMERIC type)
cutoff - Cutoff value for ReLU operation. Usually 0
- Returns:
- output Output (NUMERIC type)
-
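The two rectifier rules above, written out element-wise as a plain-Java sketch (not the SameDiff implementation):

```java
public class ReluSketch {
    // relu: out = x if x > cutoff, else 0
    static double relu(double x, double cutoff)  { return x > cutoff ? x : 0.0; }
    // relu6: out = min(max(x, cutoff), 6)
    static double relu6(double x, double cutoff) { return Math.min(Math.max(x, cutoff), 6.0); }

    public static void main(String[] args) {
        System.out.println(relu(-1, 0) + ", " + relu6(7, 0));
    }
}
```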
reluLayer
public SDVariable reluLayer(SDVariable input, SDVariable weights, SDVariable bias)
ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in,w) + bias)
Note that the bias array is optional.
- Parameters:
input - Input data (NUMERIC type)
weights - Weights variable (NUMERIC type)
bias - Optional bias variable (may be null) (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
reluLayer
public SDVariable reluLayer(String name, SDVariable input, SDVariable weights, SDVariable bias)
ReLU (Rectified Linear Unit) layer operation: out = relu(mmul(in,w) + bias)
Note that the bias array is optional.
- Parameters:
name - May be null. Name for the output variable
input - Input data (NUMERIC type)
weights - Weights variable (NUMERIC type)
bias - Optional bias variable (may be null) (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
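The formula out = relu(mmul(in, w) + bias), including the optional (nullable) bias, can be sketched in plain Java with small 2D arrays (illustrative only, not the SameDiff implementation):

```java
public class ReluLayerSketch {
    // in: [batch][nIn], w: [nIn][nOut], b: [nOut] or null (matching the optional bias)
    static double[][] reluLayer(double[][] in, double[][] w, double[] b) {
        int batch = in.length, nIn = w.length, nOut = w[0].length;
        double[][] out = new double[batch][nOut];
        for (int r = 0; r < batch; r++)
            for (int c = 0; c < nOut; c++) {
                double s = (b == null ? 0.0 : b[c]);
                for (int k = 0; k < nIn; k++) s += in[r][k] * w[k][c];
                out[r][c] = Math.max(s, 0.0);   // relu(mmul(in, w) + bias)
            }
        return out;
    }

    public static void main(String[] args) {
        double[][] out = reluLayer(new double[][]{{1, 2}},
                                   new double[][]{{1}, {-1}},
                                   new double[]{2});
        System.out.println(out[0][0]);
    }
}
```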
selu
public SDVariable selu(SDVariable x)
Element-wise SELU function - Scaled Exponential Linear Unit: see Self-Normalizing Neural Networks
out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0
Uses default scale and alpha values.
- Parameters:
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
selu
public SDVariable selu(String name, SDVariable x)
Element-wise SELU function - Scaled Exponential Linear Unit: see Self-Normalizing Neural Networks
out[i] = scale * in[i] if in[i] > 0, or scale * alpha * (exp(in[i]) - 1) if in[i] <= 0
Uses default scale and alpha values.
- Parameters:
name - May be null. Name for the output variable
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
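A plain-Java sketch of the SELU rule, using the scale and alpha constants from the Self-Normalizing Neural Networks paper (assumed here to be the defaults this op uses):

```java
public class SeluSketch {
    static final double SCALE = 1.0507009873554805;   // constants from the SNN paper
    static final double ALPHA = 1.6732632423543772;   // (assumed defaults)

    // out = scale * x if x > 0, else scale * alpha * (exp(x) - 1)
    static double selu(double x) {
        return x > 0 ? SCALE * x : SCALE * ALPHA * (Math.exp(x) - 1.0);
    }

    public static void main(String[] args) {
        System.out.println(selu(1.0) + ", " + selu(-1.0));
    }
}
```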
sigmoid
public SDVariable sigmoid(SDVariable x)
Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))
- Parameters:
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
sigmoid
public SDVariable sigmoid(String name, SDVariable x)
Element-wise sigmoid function: out[i] = 1.0/(1+exp(-in[i]))
- Parameters:
name - May be null. Name for the output variable
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
sigmoidDerivative
public SDVariable sigmoidDerivative(SDVariable x, SDVariable wrt)
Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
- Parameters:
x - Input Variable (NUMERIC type)
wrt - Gradient at the output - dL/dOut. Must have same shape as the input (NUMERIC type)
- Returns:
- output Output (gradient at input of sigmoid) (NUMERIC type)
-
sigmoidDerivative
public SDVariable sigmoidDerivative(String name, SDVariable x, SDVariable wrt)
Element-wise sigmoid function derivative: dL/dIn given input and dL/dOut
- Parameters:
name - May be null. Name for the output variable
x - Input Variable (NUMERIC type)
wrt - Gradient at the output - dL/dOut. Must have same shape as the input (NUMERIC type)
- Returns:
- output Output (gradient at input of sigmoid) (NUMERIC type)
-
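The chain rule this op applies can be written out in plain Java: since d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), the gradient at the input is dL/dOut times that factor (illustrative sketch, not the SameDiff implementation):

```java
public class SigmoidGradSketch {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Backprop through sigmoid: dL/dIn = dL/dOut * sigmoid(x) * (1 - sigmoid(x))
    static double sigmoidDerivative(double x, double wrt) {
        double s = sigmoid(x);
        return wrt * s * (1.0 - s);
    }

    public static void main(String[] args) {
        System.out.println(sigmoidDerivative(0.0, 1.0));
    }
}
```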
softmax
public SDVariable softmax(SDVariable x, int dimension)
Softmax activation, along the specified dimension
- Parameters:
x - Input (NUMERIC type)
dimension - Dimension along which to apply softmax
- Returns:
- output Output variable (NUMERIC type)
-
softmax
public SDVariable softmax(String name, SDVariable x, int dimension)
Softmax activation, along the specified dimension
- Parameters:
name - May be null. Name for the output variable
x - Input (NUMERIC type)
dimension - Dimension along which to apply softmax
- Returns:
- output Output variable (NUMERIC type)
-
softmax
public SDVariable softmax(SDVariable x)
Softmax activation, applied along the default dimension
- Parameters:
x - Input (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
softmax
public SDVariable softmax(String name, SDVariable x)
Softmax activation, applied along the default dimension
- Parameters:
name - May be null. Name for the output variable
x - Input (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
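A plain-Java sketch of softmax over a single 1D slice (the computation applied along the chosen dimension); subtracting the max does not change the result but keeps exp from overflowing:

```java
public class SoftmaxSketch {
    // out[i] = exp(x[i]) / sum_j exp(x[j]), computed in a numerically stable way
    static double[] softmax(double[] x) {
        double max = Double.NEGATIVE_INFINITY, sum = 0.0;
        for (double v : x) max = Math.max(max, v);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) { out[i] = Math.exp(x[i] - max); sum += out[i]; }
        for (int i = 0; i < x.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        for (double v : softmax(new double[]{1, 2, 3})) System.out.println(v);
    }
}
```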
softmaxDerivative
public SDVariable softmaxDerivative(SDVariable x, SDVariable wrt, int dimension)
Softmax derivative function
- Parameters:
x - Softmax input (NUMERIC type)
wrt - Gradient at the output, dL/dOut (NUMERIC type)
dimension - Softmax dimension
- Returns:
- output (NUMERIC type)
-
softmaxDerivative
public SDVariable softmaxDerivative(String name, SDVariable x, SDVariable wrt, int dimension)
Softmax derivative function
- Parameters:
name - May be null. Name for the output variable
x - Softmax input (NUMERIC type)
wrt - Gradient at the output, dL/dOut (NUMERIC type)
dimension - Softmax dimension
- Returns:
- output (NUMERIC type)
-
softplus
public SDVariable softplus(SDVariable x)
Element-wise softplus function: out = log(exp(x) + 1)
- Parameters:
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
softplus
public SDVariable softplus(String name, SDVariable x)
Element-wise softplus function: out = log(exp(x) + 1)
- Parameters:
name - May be null. Name for the output variable
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
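The softplus formula above can be sketched in plain Java; rearranging log(exp(x) + 1) as max(x, 0) + log1p(exp(-|x|)) is mathematically equivalent and avoids overflow for large |x| (illustrative sketch, not the SameDiff implementation):

```java
public class SoftplusSketch {
    // log(exp(x) + 1), written in an overflow-safe form
    static double softplus(double x) {
        return Math.max(x, 0.0) + Math.log1p(Math.exp(-Math.abs(x)));
    }

    public static void main(String[] args) {
        System.out.println(softplus(0.0) + ", " + softplus(1000.0));
    }
}
```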
softsign
public SDVariable softsign(SDVariable x)
Element-wise softsign function: out = x / (abs(x) + 1)
- Parameters:
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
softsign
public SDVariable softsign(String name, SDVariable x)
Element-wise softsign function: out = x / (abs(x) + 1)
- Parameters:
name - May be null. Name for the output variable
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
softsignDerivative
public SDVariable softsignDerivative(SDVariable x)
Element-wise derivative (dOut/dIn) of the softsign function, softsign(x) = x / (abs(x) + 1)
- Parameters:
x - Input variable (NUMERIC type)
- Returns:
- output Output (NUMERIC type)
-
softsignDerivative
public SDVariable softsignDerivative(String name, SDVariable x)
Element-wise derivative (dOut/dIn) of the softsign function, softsign(x) = x / (abs(x) + 1)
- Parameters:
name - May be null. Name for the output variable
x - Input variable (NUMERIC type)
- Returns:
- output Output (NUMERIC type)
-
swish
public SDVariable swish(SDVariable x)
Element-wise "swish" function: out = x * sigmoid(b*x) with b=1.0
See: https://arxiv.org/abs/1710.05941
- Parameters:
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
swish
public SDVariable swish(String name, SDVariable x)
Element-wise "swish" function: out = x * sigmoid(b*x) with b=1.0
See: https://arxiv.org/abs/1710.05941
- Parameters:
name - May be null. Name for the output variable
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
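With b = 1.0, the swish formula x * sigmoid(b*x) reduces to x / (1 + exp(-x)), which can be sketched in plain Java:

```java
public class SwishSketch {
    // x * sigmoid(1.0 * x) = x / (1 + exp(-x))
    static double swish(double x) { return x / (1.0 + Math.exp(-x)); }

    public static void main(String[] args) {
        System.out.println(swish(0.0) + ", " + swish(10.0));
    }
}
```

For large positive x swish approaches x, and for large negative x it approaches 0.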
tanh
public SDVariable tanh(SDVariable x)
Elementwise tanh (hyperbolic tangent) operation: out = tanh(x)
- Parameters:
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
tanh
public SDVariable tanh(String name, SDVariable x)
Elementwise tanh (hyperbolic tangent) operation: out = tanh(x)
- Parameters:
name - May be null. Name for the output variable
x - Input variable (NUMERIC type)
- Returns:
- output Output variable (NUMERIC type)
-
topK
public SDVariable[] topK(SDVariable input, double k, boolean sorted)
Find values and indices for the largest k entries along the last dimension.
- Parameters:
input - Input data (NUMERIC type)
k - The number of values to return
sorted - Whether to return the values sorted or not
-
topK
public SDVariable[] topK(String[] names, SDVariable input, double k, boolean sorted)
Find values and indices for the largest k entries along the last dimension.
- Parameters:
names - May be null. Array of names for the output variables.
input - Input data (NUMERIC type)
k - The number of values to return
sorted - Whether to return the values sorted or not
-
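The top-k selection along one 1D slice (the last dimension) can be sketched in plain Java; this illustrates the sorted=true behaviour, with ties broken toward the lower index. It is not the SameDiff implementation.

```java
import java.util.Arrays;

public class TopKSketch {
    // Indices of the k largest entries of a 1D array, largest first.
    static int[] topKIndices(double[] in, int k) {
        Integer[] idx = new Integer[in.length];
        for (int i = 0; i < in.length; i++) idx[i] = i;
        // Stable sort by value, descending, so equal values keep their original order
        Arrays.sort(idx, (a, b) -> Double.compare(in[b], in[a]));
        int[] out = new int[k];
        for (int i = 0; i < k; i++) out[i] = idx[i];
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(topKIndices(new double[]{1, 5, 3, 2}, 2)));
    }
}
```

The real op returns both the values and their indices as two output variables, which is why the method's return type is SDVariable[].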
-