ColtM1911
Hello!
This time I want to tell you about how to make a machine translation for a game about a machine girl.
- Game's ULMF thread
So it seems this game was built on a custom engine for which no methods of extracting and replacing text were known. As a consequence, our usual translators (deserving the deepest respect) have given up on this game and decided not to waste their efforts.
One evening I decided to take a look at how this thing works and whether anything could be done with it, and by chance I found something interesting.
It turned out that the Japanese text was compiled right into the game's executable file.
I found this by running the GNU Strings program on the .exe and looking through the results.
The interesting thing here, though, is that the string literals I found are not raw text encoded in SJIS or Unicode, but ASCII strings containing the HEX representation of a UTF-8 string.
For example, the string "はい!" is stored in the game's binary as "E381AFE38184EFBC81" (18 bytes of ASCII for 9 bytes of UTF-8).
That's a peculiar and quite inefficient way to store text, if you ask me.
So I just grabbed a HEX editor and replaced this literal with "596573212020202020" ("Yes!" in UTF-8, padded with spaces and HEX-encoded), and it worked perfectly.
The further plan of action was clear.
I wrote a program that calls GNU Strings and parses its output, filters out everything that isn't HEX-encoded, decodes the remaining strings and stores them in a column of a spreadsheet file, one decoded string per cell.
Then I used Google Translate to automatically translate this text and made a spreadsheet with original and translated text side by side.
Then I added a mode to my tool that patches the game's executable file by replacing the HEX-encoded original Japanese text with encoded English text read from the translated spreadsheet file.
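In Python, the core of such a patch step might be sketched as follows. This is an illustration under assumptions, not the tool's actual code: patch_literal is a hypothetical name, it patches only the first occurrence of the literal, and it refuses to run unless both literals have the same length.

```python
def patch_literal(blob: bytes, original_hex: str, translated_hex: str) -> bytes:
    """Replace the first occurrence of original_hex in blob, keeping the size."""
    if len(original_hex) != len(translated_hex):
        raise ValueError("literals must have the same length")
    old = original_hex.encode("ascii")
    new = translated_hex.encode("ascii")
    offset = blob.find(old)
    if offset < 0:
        raise ValueError("original literal not found")
    # Same length in, same length out: nothing after the literal moves.
    return blob[:offset] + new + blob[offset + len(old):]


blob = b"\x00E381AFE38184EFBC81\x00"
patched = patch_literal(blob, "E381AFE38184EFBC81", "596573212020202020")
print(patched)  # b'\x00596573212020202020\x00'
```

Keeping the length identical is the whole point: every offset elsewhere in the executable stays valid.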
Thus I got myself a machine-translated version of the game. I played it, and it works very well, with some limitations.
1. The HEX-encoded literal of the translated text cannot be longer than the literal of the original text.
We're replacing parts of a big binary blob, after all. The tool checks for that automatically and truncates the translated text or pads it with spaces to make the literals the same size. It should never corrupt the game's executable.
2. Lines of text that are too long are drawn outside the game window and cannot be read.
This can be detected automatically by the tool in 'check' mode and then fixed manually. I also want to add an 'autofix' mode to the tool for this and other issues.
3. The newline special character ('\n') works reliably only if its zero-based index in the string is divisible by 3.
I think this is because in the source text each character (kana or kanji) is usually encoded in three bytes. The game's engine doesn't expect to encounter a newline inside a three-byte character block.
The tool produces warnings about this issue in 'check' mode and autofixes it in 'apply' mode.
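Limitations 1 and 3 can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the index is read as a byte offset into the UTF-8 text, and padding with spaces before the newline is just one possible way to autofix, not necessarily what the tool does.

```python
from typing import List


def fit_to_size(text: str, size: int) -> bytes:
    """Truncate or space-pad text so its UTF-8 form is exactly size bytes."""
    raw = text.encode("utf-8")[:size]
    # errors="ignore" drops a character that the cut split in half.
    raw = raw.decode("utf-8", errors="ignore").encode("utf-8")
    return raw.ljust(size, b" ")


def misaligned_newlines(text: str) -> List[int]:
    """Byte offsets of newlines whose offset is not divisible by 3."""
    raw = text.encode("utf-8")
    return [i for i, byte in enumerate(raw) if byte == 0x0A and i % 3 != 0]


def align_newlines(text: str) -> str:
    """Insert spaces before each newline until its byte offset is a multiple of 3."""
    out = bytearray()
    for byte in text.encode("utf-8"):
        if byte == 0x0A:
            while len(out) % 3:
                out += b" "
        out.append(byte)
    return out.decode("utf-8")


print(fit_to_size("Yes!", 9))          # b'Yes!     '
print(misaligned_newlines("ab\ncd"))   # [2]
print(repr(align_newlines("ab\ncd")))  # 'ab \ncd'
```

Aligning newlines makes the text longer, so it has to happen before the final fit to the original size.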
The tool is a relatively small program written in Kotlin/JVM.
It has a command-line interface.
It may well work only on Linux, as I haven't tested it on anything else.
In any case, the tool's archive contains platform-specific scripts to help you launch it.
You need a Java runtime of version 8 or later to run it (JRE 8).
You need GNU Strings installed and in your PATH to extract text and apply patches with this tool.
Great! That's why I spent many hours making the HexStrings tool and writing this post.
1. Install GNU Strings.
- just google how to do it on your OS
2. Install a Java Runtime Environment (JRE), version 8 or later.
- Java 8 is considered very old, so you should have no problem with it
- just google how to do it on your OS
- a JDK is fine too
- make sure that 'java' is in your PATH
3. Download HexStrings.zip and extract it anywhere.
4. Use the scripts HexStrings (Linux/macOS) or HexStrings.bat (Windows) to launch it.
5. Rename the game's original executable to Game.exe.
It's also a good idea to make a backup, just in case.
6. Extract the text with
Bash:
HexStrings extract -o translation_patch.ods Game.exe
7. Translate the .ods by putting the translated text in the second column of the spreadsheet.
You can use Google Translate to get an initial rough translation.
The tool only uses the first and second columns of the spreadsheet, so you can use the rest of the sheet for your own purposes.
You can also try specialized tools like Translator++. Write to me and I'll add a .csv export mode.
8. To check your translation for potential issues, use
Bash:
HexStrings check translation_patch.ods > issues.txt
9. To apply your translation patch, use
Bash:
HexStrings apply -p translation_patch.ods -o TranslatedGame.exe Game.exe
10. To remind yourself how to use the tool, pass the --help or -h flag in any mode, or without a mode at all:
Bash:
HexStrings -h
Ok.
1. Note that the translation is not finished.
Currently it's an unedited, low-quality MTL (overlong, unreadable lines everywhere).
2. Download "TranslatedGame.exe" (TranslatedGameExecutable.zip) and "text.ods" (unfinished MTL translation patch.zip).
3. Put the .exe in your game's folder and run it. There is no need to replace anything or delete the game's own executable.
4. Accept that it may not work if you and I have different versions of the game.
5. When you encounter a line that is too long and desperately want to know what is being said, open "text.ods" in MS Excel or the free and open-source LibreOffice. Use Ctrl+F to find the line by its visible part in the spreadsheet's second column. Do it quickly, as the game will show you the next line of text if you stay idle for several seconds. But don't panic: lines that belong to the same dialogue are usually grouped closely together in "text.ods" (and in the binary too).
Actually, that's how I played.
Note that the .ods file is not needed for the translation to work; I only posted it for your convenience.
Alright, I'm sorry that this is so dry. I probably missed a lot of details and a proper introduction, but I'm already too tired for that. I've wanted to post this for several weeks already.
Good luck, and ask me if you encounter a problem along the way.