unicode: reduce the size of utf8data[]

Remove the Hangul decompositions from the utf8data trie, and do algorithmic decomposition to calculate them on the fly. To store the decomposition the caller of utf8lookup()/utf8nlookup() must provide a 12-byte buffer, which is used to synthesize a leaf with the decomposition. This significantly reduces the size of the utf8data[] array.

Changes made by Gabriel:
  Rebase to mainline
  Fix checkpatch errors
  Extract robustness fixes and merge back to original mkutf8data.c patch
  Regenerate utf8data.h

Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
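
For readers unfamiliar with the "algorithmic decomposition" the commit message refers to, the following is a minimal sketch of the Hangul syllable decomposition arithmetic defined by the Unicode Standard. It is not the code from this patch: the helper names `hangul_decompose()` and `hangul_decompose_utf8()` and the 10-byte output buffer are assumptions made for illustration, and the sketch does not build the synthesized trie leaf that the commit describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode Standard's Hangul syllable decomposition. */
#define HANGUL_SBASE  0xAC00u	/* first precomposed syllable */
#define HANGUL_LBASE  0x1100u	/* first leading consonant jamo */
#define HANGUL_VBASE  0x1161u	/* first vowel jamo */
#define HANGUL_TBASE  0x11A7u	/* one before the first trailing jamo */
#define HANGUL_LCOUNT 19u
#define HANGUL_VCOUNT 21u
#define HANGUL_TCOUNT 28u
#define HANGUL_NCOUNT (HANGUL_VCOUNT * HANGUL_TCOUNT)	/* 588 */
#define HANGUL_SCOUNT (HANGUL_LCOUNT * HANGUL_NCOUNT)	/* 11172 */

/*
 * Hypothetical helper (not from the patch): decompose a precomposed
 * Hangul syllable into its jamo code points. Returns the number of
 * jamo written (2 or 3), or 0 if cp is not a precomposed syllable.
 */
static size_t hangul_decompose(uint32_t cp, uint32_t jamo[3])
{
	uint32_t si;

	if (cp < HANGUL_SBASE || cp >= HANGUL_SBASE + HANGUL_SCOUNT)
		return 0;

	si = cp - HANGUL_SBASE;
	jamo[0] = HANGUL_LBASE + si / HANGUL_NCOUNT;
	jamo[1] = HANGUL_VBASE + (si % HANGUL_NCOUNT) / HANGUL_TCOUNT;
	if (si % HANGUL_TCOUNT == 0)
		return 2;	/* no trailing consonant */
	jamo[2] = HANGUL_TBASE + si % HANGUL_TCOUNT;
	return 3;
}

/*
 * Hypothetical helper (not from the patch): write the decomposition as
 * NUL-terminated UTF-8. Every jamo lies in U+1100..U+11FF, so each one
 * encodes to exactly three bytes; at most 9 bytes plus the NUL are
 * written. Returns the number of bytes written, excluding the NUL.
 */
static size_t hangul_decompose_utf8(uint32_t cp, unsigned char buf[10])
{
	uint32_t jamo[3];
	size_t n = hangul_decompose(cp, jamo);
	size_t i, len = 0;

	for (i = 0; i < n; i++) {
		buf[len++] = (unsigned char)(0xE0 | (jamo[i] >> 12));
		buf[len++] = (unsigned char)(0x80 | ((jamo[i] >> 6) & 0x3F));
		buf[len++] = (unsigned char)(0x80 | (jamo[i] & 0x3F));
	}
	buf[len] = '\0';
	return len;
}
```

As a worked case, U+D55C has syllable index 0xD55C - 0xAC00 = 10588, which gives the jamo U+1112, U+1161 and U+11AB. Because all 11,172 syllables follow this arithmetic, none of their decompositions need to be stored in a table, which is consistent with the size reduction the commit reports.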
2025-07-23 12:43:29 +02:00 · 2019-04-25 13:49:18 -04:00
parent 44594c2fbf
commit a8384c6879
5 changed files with 3296 additions and 12685 deletions
--- a/fs/unicode/README.utf8data
+++ b/fs/unicode/README.utf8data
@@ -46,8 +46,8 @@ cd to this directory (fs/unicode) and run this command:
 	make C=../.. objdir=../.. utf8data.h.new

 After sanity checking the newly generated utf8data.h.new file (the
-version generated from the 11.0.0 UCD should be 13,834 lines long, and
-have a total size of 1104k) and/or comparing it with the older version
+version generated from the 11.0.0 UCD should be 4,061 lines long, and
+have a total size of 320k) and/or comparing it with the older version
 of utf8data.h, rename it to utf8data.h.

 If you are a kernel developer updating to a newer version of the