Class Float16Utils

java.lang.Object
io.milvus.common.utils.Float16Utils

public class Float16Utils extends Object
  • Constructor Details

    • Float16Utils

      public Float16Utils()
  • Method Details

    • floatToBf16

      public static short floatToBf16(float input)
      Converts a float32 value to a bfloat16 (bf16) value stored in a short. May not produce correct values for subnormal floats. This method is copied from Microsoft ONNX Runtime: https://github.com/microsoft/onnxruntime/blob/main/java/src/main/jvm/ai/onnxruntime/platform/Fp16Conversions.java
      Parameters:
      input - a standard float32 value which will be converted to a bfloat16 value
      Returns:
      a short value to store the bfloat16 value
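      The narrowing step can be illustrated with a minimal, self-contained sketch. The class Bf16RoundingSketch and its method toBf16 are hypothetical names, not part of this API. The sketch assumes round-to-nearest-even on the upper 16 bits of the IEEE 754 float32 bit pattern; it may differ in detail from the library's own code.

```java
final class Bf16RoundingSketch {
    // Hypothetical helper (not part of Float16Utils): keeps the top 16 bits
    // of the float32 bit pattern, rounding to nearest with ties to even.
    static short toBf16(float value) {
        int bits = Float.floatToIntBits(value);
        int lsb = (bits >> 16) & 1;   // lowest bit that survives the cut
        bits += 0x7FFF + lsb;         // bias so that exact ties round to even
        return (short) (bits >> 16);
    }

    public static void main(String[] args) {
        // 1.0f has the bit pattern 0x3F800000, so its bf16 form is 0x3F80.
        System.out.println(Integer.toHexString(toBf16(1.0f) & 0xFFFF));
    }
}
```

      Because bf16 keeps the full 8-bit float32 exponent, only mantissa precision is lost in this conversion.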
    • bf16ToFloat

      public static float bf16ToFloat(short input)
      Upcasts a bf16 value stored in a short to a float32 value. This method is copied from Microsoft ONNX Runtime: https://github.com/microsoft/onnxruntime/blob/main/java/src/main/jvm/ai/onnxruntime/platform/Fp16Conversions.java
      Parameters:
      input - a bfloat16 value which will be converted to a float32 value
      Returns:
      a float32 value converted from a bfloat16
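      The upcast is exact, since every bf16 value is also a float32 value. A minimal sketch, assuming the bf16 bits are simply the upper half of the float32 bit pattern (Bf16UpcastSketch and fromBf16 are hypothetical names, not part of this API):

```java
final class Bf16UpcastSketch {
    // Hypothetical helper (not part of Float16Utils): places the 16 stored
    // bits in the upper half of a float32 pattern; the lower half is zero.
    static float fromBf16(short value) {
        return Float.intBitsToFloat(value << 16);
    }

    public static void main(String[] args) {
        // 0x4049 is the bf16 pattern closest to pi that fits in 8 mantissa bits.
        System.out.println(fromBf16((short) 0x4049));
    }
}
```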
    • floatToFp16

      public static short floatToFp16(float input)
      Rounds a float32 value to an fp16 value stored in a short. This method is copied from Microsoft ONNX Runtime: https://github.com/microsoft/onnxruntime/blob/main/java/src/main/jvm/ai/onnxruntime/platform/Fp16Conversions.java
      Parameters:
      input - a standard float32 value which will be converted to a float16 value
      Returns:
      a short value to store the float16 value
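      To show what rounding to fp16 involves, here is a self-contained sketch of an IEEE 754 binary16 conversion with round-to-nearest-even. The class Fp16RoundingSketch and its method floatToHalf are hypothetical and written for illustration; the library's copied ONNX Runtime code may handle edge cases differently.

```java
final class Fp16RoundingSketch {
    // Hypothetical helper (not part of Float16Utils).
    static short floatToHalf(float value) {
        int bits = Float.floatToIntBits(value);
        int sign = (bits >>> 16) & 0x8000;
        int exp = (bits >>> 23) & 0xFF;
        int mant = bits & 0x7FFFFF;
        if (exp == 0xFF) {
            // Infinity stays infinity; NaN becomes a quiet fp16 NaN.
            return (short) (sign | 0x7C00 | (mant != 0 ? 0x200 : 0));
        }
        int e = exp - 127 + 15;                 // rebias exponent for fp16
        if (e >= 0x1F) {
            return (short) (sign | 0x7C00);     // too large: infinity
        }
        if (e <= 0) {
            if (e < -10) {
                return (short) sign;            // too small: signed zero
            }
            mant |= 0x800000;                   // restore the implicit leading 1
            int shift = 14 - e;                 // 14..24 bits are dropped
            int half = mant >> shift;
            int rem = mant & ((1 << shift) - 1);
            int halfway = 1 << (shift - 1);
            if (rem > halfway || (rem == halfway && (half & 1) == 1)) {
                half++;                         // may carry into the smallest normal
            }
            return (short) (sign | half);
        }
        int half = (e << 10) | (mant >> 13);
        int rem = mant & 0x1FFF;
        if (rem > 0x1000 || (rem == 0x1000 && (half & 1) == 1)) {
            half++;                             // a carry may bump the exponent
        }
        return (short) (sign | half);
    }

    public static void main(String[] args) {
        // 65504 is the largest finite fp16 value, with bit pattern 0x7BFF.
        System.out.println(Integer.toHexString(floatToHalf(65504.0f) & 0xFFFF));
    }
}
```

      Unlike bf16, fp16 has only 5 exponent bits, so float32 values with magnitude of 65520 or more round to infinity.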
    • fp16ToFloat

      public static float fp16ToFloat(short input)
      Upcasts an fp16 value stored in a short to a float32 value. This method is copied from Microsoft ONNX Runtime: https://github.com/microsoft/onnxruntime/blob/main/java/src/main/jvm/ai/onnxruntime/platform/Fp16Conversions.java
      Parameters:
      input - a float16 value which will be converted to a float32 value
      Returns:
      a float32 value converted from a float16 value
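      The fp16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits) can be decoded as in the following sketch. Fp16UpcastSketch and halfToFloat are hypothetical names used only for illustration, and the code is an independent decoding of IEEE 754 binary16 rather than the library's implementation.

```java
final class Fp16UpcastSketch {
    // Hypothetical helper (not part of Float16Utils).
    static float halfToFloat(short value) {
        int bits = value & 0xFFFF;
        int sign = (bits & 0x8000) << 16;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        if (exp == 0x1F) {
            // Infinity or NaN: all-ones exponent in both formats.
            return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        }
        if (exp == 0) {
            // Zero or subnormal: the value is mant * 2^-24.
            float magnitude = mant * (1.0f / (1 << 24));
            return sign != 0 ? -magnitude : magnitude;
        }
        // Normal value: rebias the exponent from 15 to 127 (a difference of 112).
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (mant << 13));
    }

    public static void main(String[] args) {
        // 0x7BFF is the largest finite fp16 value, 65504.
        System.out.println(halfToFloat((short) 0x7BFF));
    }
}
```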
    • f32VectorToBf16Buffer

      public static ByteBuffer f32VectorToBf16Buffer(List<Float> vector)
      Rounds each element of a float32 vector to bf16 and stores the results in a ByteBuffer.
      Parameters:
      vector - a float32 vector
      Returns:
      a ByteBuffer holding the vector converted to bfloat16 values
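      As a rough illustration of the packing, the sketch below writes two bytes per element. Bf16PackSketch and pack are hypothetical names; the little-endian byte order and the rewind before returning are assumptions made for this example, not documented behavior of this method.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.List;

final class Bf16PackSketch {
    // Hypothetical helper (not part of Float16Utils).
    static ByteBuffer pack(List<Float> vector) {
        ByteBuffer buf = ByteBuffer.allocate(2 * vector.size());
        buf.order(ByteOrder.LITTLE_ENDIAN);          // assumed byte order
        for (float v : vector) {
            int bits = Float.floatToIntBits(v);
            bits += 0x7FFF + ((bits >> 16) & 1);     // round to nearest even
            buf.putShort((short) (bits >> 16));
        }
        buf.rewind();                                // ready for reading
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = pack(Arrays.asList(1.0f, -2.0f));
        System.out.println(buf.remaining() + " bytes"); // two bytes per element
    }
}
```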
    • fp16BufferToVector

      public static List<Float> fp16BufferToVector(ByteBuffer buf)
      Reads fp16 values from a ByteBuffer and upcasts them to a float32 vector.
      Parameters:
      buf - a buffer containing a float16 vector
      Returns:
      a List of Float holding the float32 values
    • f32VectorToFp16Buffer

      public static ByteBuffer f32VectorToFp16Buffer(List<Float> vector)
      Rounds each element of a float32 vector to fp16 and stores the results in a ByteBuffer.
      Parameters:
      vector - a float32 vector
      Returns:
      a ByteBuffer holding the vector converted to float16 values
    • bf16BufferToVector

      public static List<Float> bf16BufferToVector(ByteBuffer buf)
      Reads bf16 values from a ByteBuffer and upcasts them to a float32 vector.
      Parameters:
      buf - a buffer containing a bfloat16 vector
      Returns:
      a List of Float holding the float32 values
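      Decoding reverses the packing: each pair of bytes is read as a short and widened to float32. In the sketch below, Bf16UnpackSketch and unpack are hypothetical names, and reading from position 0 in little-endian order is an assumption of the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

final class Bf16UnpackSketch {
    // Hypothetical helper (not part of Float16Utils).
    static List<Float> unpack(ByteBuffer buf) {
        // Work on a duplicate so the caller's position is left untouched.
        ByteBuffer view = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN); // assumed order
        view.rewind();
        List<Float> out = new ArrayList<>(view.remaining() / 2);
        while (view.remaining() >= 2) {
            // The bf16 bits form the upper half of the float32 pattern.
            out.add(Float.intBitsToFloat(view.getShort() << 16));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0x80, 0x3F, 0x00, (byte) 0xC0};
        System.out.println(unpack(ByteBuffer.wrap(raw)));
    }
}
```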
    • f16VectorToBuffer

      public static ByteBuffer f16VectorToBuffer(List<Short> vector)
      Stores an fp16 or bf16 vector in a ByteBuffer without converting the values.
      Parameters:
      vector - an fp16 or bf16 vector stored as a list of Short
      Returns:
      a ByteBuffer holding the vector
    • bufferToF16Vector

      public static List<Short> bufferToF16Vector(ByteBuffer buf)
      Reads an fp16 or bf16 vector from a ByteBuffer into a list of Short without converting the values.
      Parameters:
      buf - a buffer containing an fp16 or bf16 vector
      Returns:
      a List of Short in which each Short holds the raw bits of one fp16 or bf16 value
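      The two raw-bits methods are inverses of each other. A minimal round-trip sketch follows, using the hypothetical class F16BufferSketch with methods toBuffer and toVector; the little-endian byte order is an assumption of the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class F16BufferSketch {
    // Hypothetical helper: writes each 16-bit pattern as two bytes.
    static ByteBuffer toBuffer(List<Short> vector) {
        ByteBuffer buf = ByteBuffer.allocate(2 * vector.size());
        buf.order(ByteOrder.LITTLE_ENDIAN);   // assumed byte order
        for (short s : vector) {
            buf.putShort(s);
        }
        buf.rewind();
        return buf;
    }

    // Hypothetical helper: reads the 16-bit patterns back, unchanged.
    static List<Short> toVector(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        view.rewind();
        List<Short> out = new ArrayList<>(view.remaining() / 2);
        while (view.remaining() >= 2) {
            out.add(view.getShort());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Short> bits = Arrays.asList((short) 0x3C00, (short) 0xC000);
        System.out.println(toVector(toBuffer(bits)).equals(bits));
    }
}
```

      No numeric conversion happens here, so the same code path serves fp16 and bf16; the caller is responsible for knowing which format the bits represent.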