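First, take a look at the program below, which tests how many bytes a character occupies for English and Chinese text under the Unicode, UTF-8, and UTF-16 encodings: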
- System.out.println("a(Unicode) :" + "a".getBytes("Unicode").length);
- System.out.println("aa(Unicode) :" + "aa".getBytes("Unicode").length);
- System.out.println("啊(Unicode) :" + "啊".getBytes("Unicode").length);
- System.out.println("啊啊(Unicode) :" + "啊啊".getBytes("Unicode").length);
- System.out.println("");
- System.out.println("a(UTF-8) :" + "a".getBytes("UTF-8").length);
- System.out.println("aa(UTF-8) :" + "aa".getBytes("UTF-8").length);
- System.out.println("啊(UTF-8) :" + "啊".getBytes("UTF-8").length);
- System.out.println("啊啊(UTF-8) :" + "啊啊".getBytes("UTF-8").length);
- System.out.println("");
- System.out.println("a(UTF-16) :" + "a".getBytes("UTF-16").length);
- System.out.println("aa(UTF-16) :" + "aa".getBytes("UTF-16").length);
- System.out.println("啊(UTF-16) :" + "啊".getBytes("UTF-16").length);
- System.out.println("啊啊(UTF-16) :" + "啊啊".getBytes("UTF-16").length);
The output is as follows:
a(Unicode) :4
aa(Unicode) :6
啊(Unicode) :4
啊啊(Unicode) :6
a(UTF-8) &nbs