Class Utf8

    • Method Detail

      • encodedLength

        public static int encodedLength(CharSequence sequence)
        Returns the number of bytes in the UTF-8-encoded form of sequence. For a string, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space.
        Throws:
        IllegalArgumentException - if sequence contains ill-formed UTF-16 (unpaired surrogates)
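        The counting rule behind this contract can be sketched with the JDK alone. The class below is a hypothetical illustration (the name Utf8LengthSketch is an assumption, and this is not the library's implementation): it walks the UTF-16 code units, adds 1, 2, 3, or 4 bytes per character, and rejects unpaired surrogates. It ignores int overflow for extremely long inputs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the library's code: counts UTF-8 bytes
// without allocating the encoded byte array.
final class Utf8LengthSketch {
    static int encodedLength(CharSequence sequence) {
        int bytes = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (c < 0x80) {
                bytes += 1;                      // ASCII
            } else if (c < 0x800) {
                bytes += 2;                      // U+0080..U+07FF
            } else if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 >= sequence.length()
                        || !Character.isLowSurrogate(sequence.charAt(i + 1))) {
                    throw new IllegalArgumentException("Unpaired surrogate at index " + i);
                }
                bytes += 4;                      // supplementary code point
                i++;                             // skip the low surrogate
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else {
                bytes += 3;                      // rest of the BMP
            }
        }
        return bytes;
    }

    // Reference value from the JDK encoder, for comparison on well-formed input.
    static int viaGetBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

        For well-formed input the two methods agree; for example, both return 6 for "h\u00e9llo". For an unpaired surrogate they differ: the sketch throws, while String.getBytes substitutes a replacement byte instead of failing.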

      • isWellFormed

        public static boolean isWellFormed(byte[] bytes)
        Returns true if bytes is a well-formed UTF-8 byte sequence according to Unicode 6.0. Note that this is a stronger criterion than simply whether the bytes can be decoded. For example, some versions of the JDK decoder will accept "non-shortest form" byte sequences, but encoding never reproduces these. Such byte sequences are not considered well-formed.

        This method returns true if and only if Arrays.equals(bytes, new String(bytes, UTF_8).getBytes(UTF_8)) does, but is more efficient in both time and space.
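        The round-trip equivalence stated above can be written out directly with JDK classes. This is a minimal sketch of the reference check only, under the assumption that the helper name roundTrips and the class name are hypothetical; it allocates a String and a second byte array, which is exactly the cost the documented method avoids.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the decode-then-encode check described in the text.
final class Utf8RoundTripSketch {
    static boolean roundTrips(byte[] bytes) {
        // Malformed input is replaced during decoding, so re-encoding
        // cannot reproduce the original bytes.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
    }
}
```

        With this check, the two-byte "non-shortest form" encoding of U+0000 (0xC0 0x80) is rejected whether the decoder replaces it or accepts it, because re-encoding never yields those two bytes.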

      • isWellFormed

        public static boolean isWellFormed(byte[] bytes,
                                           int off,
                                           int len)
        Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by isWellFormed(byte[]). Note that this can be false even when isWellFormed(bytes) is true, for example when the slice starts or ends in the middle of a multi-byte character.
        Parameters:
        bytes - the input buffer
        off - the offset in the buffer of the first byte to read
        len - the number of bytes to read from the buffer
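        To illustrate how a slice of a well-formed array can itself be ill-formed, here is a sketch that applies the same round-trip check to a sub-range. The class and method names are hypothetical, and the bounds behavior shown (an IndexOutOfBoundsException from the String constructor) belongs to this sketch; the text above does not specify what the documented method does for an invalid range.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: round-trip check restricted to bytes[off, off + len).
final class Utf8SliceSketch {
    static boolean sliceRoundTrips(byte[] bytes, int off, int len) {
        // The String constructor throws IndexOutOfBoundsException for a bad range.
        String decoded = new String(bytes, off, len, StandardCharsets.UTF_8);
        byte[] reencoded = decoded.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(Arrays.copyOfRange(bytes, off, off + len), reencoded);
    }
}
```

        Take the four bytes of "a\u20ac" (0x61 0xE2 0x82 0xAC). The whole array passes, and so does the slice off=1, len=3, which covers the complete euro sign. The slice off=1, len=2 stops after the second byte of that character and fails the check.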