Edit history
Correction by LightDiver (current version):
I experimented a bit in Rust. If you do the conversion by hand, the difference is practically unnoticeable:
--- Test: ASCII ---
Read UTF-8 directly: 2.136ms
Convert with encoding_rs: 2.622ms
Fast byte-level conversion: 6.482ms
Convert with iconv: 9.450ms
--- Test: Cyrillic ---
Read UTF-8 directly: 2.906ms
Convert with encoding_rs: 35.093ms
Fast byte-level conversion: 2.892ms
Convert with iconv: 7.831ms
--- Test: Mixed ---
Read UTF-8 directly: 1.368ms
Convert with encoding_rs: 17.427ms
Fast byte-level conversion: 2.782ms
iconv warning: "iconv: illegal input sequence at position 12240\n"
Convert with iconv: 819.765µs
--- Test: Emoji ---
Read UTF-8 directly: 2.512ms
Convert with encoding_rs: 71.829ms
Fast byte-level conversion: 7.765ms
iconv warning: "iconv: illegal input sequence at position 30192\n"
Convert with iconv: 976.654µs
--- Test: Digits ---
Read UTF-8 directly: 146.420µs
Convert with encoding_rs: 232.213µs
Fast byte-level conversion: 1.578ms
Convert with iconv: 4.277ms
--- Test: Punctuation ---
Read UTF-8 directly: 126.472µs
Convert with encoding_rs: 218.345µs
Fast byte-level conversion: 1.495ms
Convert with iconv: 4.230ms
--- Test: Long words ---
Read UTF-8 directly: 301.526µs
Convert with encoding_rs: 469.201µs
Fast byte-level conversion: 2.761ms
Convert with iconv: 7.348ms
--- Test: All together ---
Read UTF-8 directly: 2.972ms
Convert with encoding_rs: 43.406ms
Fast byte-level conversion: 6.322ms
iconv warning: "iconv: illegal input sequence at position 12528\n"
Convert with iconv: 891.437µs
--- Test: Empty ---
Read UTF-8 directly: 5.047µs
Convert with encoding_rs: 3.565µs
Fast byte-level conversion: 2.996µs
Convert with iconv: 566.670µs
--- Test: Very long line ---
Read UTF-8 directly: 99.324µs
Convert with encoding_rs: 210.339µs
Fast byte-level conversion: 1.384ms
Convert with iconv: 3.956ms
"Several times slower", yes, but no longer 260 times.
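For anyone wondering what a hand-written "fast byte-level conversion" might look like, here is a minimal sketch. The post shows neither the benchmark code nor the source encoding, so everything here is an assumption: the input is taken to be Windows-1251, the function name `cp1251_to_utf8` is made up for illustration, and only ASCII plus the Russian letters are mapped (everything else becomes U+FFFD).

```rust
/// Hypothetical hand-rolled decoder: Windows-1251 bytes -> UTF-8 String.
/// Assumption: the source encoding is Windows-1251; the post does not say.
fn cp1251_to_utf8(input: &[u8]) -> String {
    // Cyrillic letters need two bytes in UTF-8, so reserve twice the input length.
    let mut out = String::with_capacity(input.len() * 2);
    for &b in input {
        let ch = match b {
            // ASCII is identical in both encodings.
            0x00..=0x7F => b as char,
            // Ё and ё sit outside the contiguous letter block.
            0xA8 => 'Ё',
            0xB8 => 'ё',
            // 0xC0..=0xFF maps linearly onto U+0410 (А) ..= U+044F (я).
            0xC0..=0xFF => char::from_u32(0x0410 + u32::from(b - 0xC0)).unwrap_or('\u{FFFD}'),
            // Remaining 0x80..=0xBF code points are not handled in this sketch.
            _ => '\u{FFFD}',
        };
        out.push(ch);
    }
    out
}

fn main() {
    // "Привет!" encoded as Windows-1251.
    let bytes = [0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2, b'!'];
    println!("{}", cp1251_to_utf8(&bytes));
}
```

A per-byte `match` like this has no table lookups and no error-recovery state, which may be part of why the manual path stays close to the plain UTF-8 read on Cyrillic input, though a real benchmark would need the actual code to confirm that.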