Awesome, I'll definitely give the Windows version a shot, though I'm no programmer, so it might be a bit messy. Anyway, having more tools couldn't hurt.
This is roughly what I've got right now; I definitely need to clean it up a bit.
I'm definitely going to add a GUI later on. If any of y'all have suggestions, don't hesitate to tell me.
I'd suggest looking at what features Donocad's parser has, and how he implements them, for ideas. His included readme file goes into extensive detail on both how to use the parser and what features it includes.
One example is looking for "marks" to identify which lines contain dialogue. You may have noticed that every line of dialogue is preceded by a line similar to one of the following.
Example mark lines:
Object(root).kotoba = [[\
Object(root)\.otoko = [[\
Object(root)\.kotoba1 = [[\
Object(root)\.kotoba2 = [[\
Object(root)\.kotoba_m = [\
this\.mess_plt\.mess\.text = \
Object(root)\.main\.mess_plt\.mess\.text = \
Object(root)\.kotoba3 = [[\
Object(root)\.kotoba4 = [[\
Object(root)\.o_kotoba = [[\
Object(root)\.kaiwa_naiyou = [[\
Object(root)\.miru_kotoba = [[[\
Object(root)\.kihon_1 = [[[\
Object(root)\.kihon_2 = [[[\
Object(root)\.kihon_3 = [[[\
this\.riyuu\.kotoba\.htmlText = \
this\.odoshi\.kotoba\.htmlText = \
this\.kareshi\.kotoba\.htmlText = \
this\.kitori\.kotoba\.htmlText = \
this\.ending\.kotoba\.htmlText = \
Extracting only the lines that follow a mark will significantly reduce the amount of junk text you pull out.
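Here's a minimal Python sketch of mark-based extraction, just to show the idea. It is not how Donocad's parser does it. It assumes the dialogue sits on the line directly after its mark (which is what "preceding them" suggests), and it uses a hypothetical subset of the marks above written as plain literal prefixes, since the list in the post looks regex-escaped (`\.` written here as just `.`).

```python
# Hypothetical subset of the mark lines, as literal prefixes (not regexes).
MARKS = (
    "Object(root).kotoba = [[",
    "Object(root).otoko = [[",
    "this.mess_plt.mess.text = ",
)


def extract_marked(lines):
    """Return (line_index, text) for every line that directly follows a mark.

    Assumption: the dialogue is on the line immediately after its mark.
    Real game scripts may need a smarter rule for multi-line blocks.
    """
    found = []
    # Skip the last line: a mark there has no dialogue line after it.
    for i, line in enumerate(lines[:-1]):
        # str.startswith accepts a tuple and checks every prefix in it.
        if line.lstrip().startswith(MARKS):
            found.append((i + 1, lines[i + 1]))
    return found
```

Keeping the line index around makes it easier to put the translated text back in the same spot later.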
Another nice feature would be comparing all the extracted dialogue to identify which lines are clones of each other. You could then mark one line as the key line and blank out its clones, so translators don't have to translate the same line over and over. Donocad's parser does this at the level of the compound "mark" line; it may also be useful to do the clone detection at the level of individual line segments.
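A rough Python sketch of the key line / clone idea, assuming clones are exact matches once leading and trailing whitespace is stripped (no fuzzy matching). The first occurrence of each line becomes its key; later copies get blanked, and a mapping is kept so the key's translation can be copied back into every clone before merging. The function names are made up for illustration. The same functions work whether you feed them whole mark blocks or individual segments.

```python
def blank_clones(lines):
    """Blank out clone lines, keeping the first occurrence as the key line.

    Returns (blanked_lines, clone_of), where clone_of maps the index of each
    blanked clone to the index of its key line.
    """
    first_seen = {}  # stripped text -> index of its key line
    blanked = []
    clone_of = {}
    for i, line in enumerate(lines):
        key = line.strip()
        if key in first_seen:
            clone_of[i] = first_seen[key]
            blanked.append("")
        else:
            if key:  # never treat empty lines as keys
                first_seen[key] = i
            blanked.append(line)
    return blanked, clone_of


def restore_clones(translated, clone_of):
    """Copy each key line's translation into the slots of its clones."""
    out = list(translated)
    for clone_index, key_index in clone_of.items():
        out[clone_index] = out[key_index]
    return out
```

For example, `["Hi.", "Bye.", "Hi."]` becomes `["Hi.", "Bye.", ""]`, and after translation the third line gets whatever the first one was translated to.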
Donocad's parser also strips the non-dialogue parts out of the extracted dialogue (e.g. font colors and HTML tags) and reinserts them when the dialogue is merged back into the game files.
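One simple way to do the strip-and-reinsert step, sketched in Python. This is a placeholder technique picked for illustration, not necessarily what Donocad's parser does. Each HTML-style tag gets swapped for a numbered placeholder like `{0}`, and the tags are stored in a list. After translation, the placeholders are replaced with the original tags. It assumes the real dialogue never contains text like `{3}` and that translators leave the placeholders alone.

```python
import re

# Any HTML-style tag, e.g. <font color="#FF0000"> or </font>.
TAG_RE = re.compile(r"<[^>]+>")
# Numbered placeholder such as {0}; assumes dialogue never contains "{digits}".
PLACEHOLDER_RE = re.compile(r"\{(\d+)\}")


def strip_tags(text):
    """Replace each tag with {n} and return (plain_text, tags)."""
    tags = []

    def stash(match):
        tags.append(match.group(0))
        return "{%d}" % (len(tags) - 1)

    return TAG_RE.sub(stash, text), tags


def reinsert_tags(text, tags):
    """Swap every {n} placeholder back for the tag it stands in for."""
    return PLACEHOLDER_RE.sub(lambda m: tags[int(m.group(1))], text)
```

So `<font color="#FF0000">こんにちは</font>` turns into `{0}こんにちは{1}` for the translator, and a translation of `{0}Hello{1}` goes back in as `<font color="#FF0000">Hello</font>`.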
There is quite a bit more that Donocad's parser does, but those are the highlights to consider incorporating first.
I've also made a small program that lets you decompress and compress SWF files and make difference patches between them. It uses a simple GUI for selecting which game to work on, and then which features to use on that game. I wrote it in the ancient BAT (Windows batch) language, and I call it the JSK Codec. I intend it to be the place where other people's programs and scripts can be plugged in as added features in the future. JoSmiHnTh is currently working on a Python script that will look for Japanese sound effects in the dialogue and auto-translate them into English. It's based on my work doing the same thing with a program called PowerGREP, and it uses the linked resource as a guide. I've included my JSK Codec so you can see how it's structured and how easy it would be to add onto it.
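For anyone wondering what the decompress/compress step involves, here's a small Python sketch of the standard zlib-compressed SWF layout. This is not how the JSK Codec does it (that's a BAT script, and I don't know which tools it calls). A compressed SWF starts with `CWS`, an uncompressed one with `FWS`. The next byte is the version, then a 4-byte little-endian total uncompressed file length, and for `CWS` everything after those first 8 bytes is zlib data. LZMA-compressed (`ZWS`) files and diff patches aren't covered here.

```python
import struct
import zlib


def swf_decompress(data: bytes) -> bytes:
    """Turn a zlib-compressed SWF (CWS) into an uncompressed one (FWS)."""
    sig = data[:3]
    if sig == b"FWS":
        return data  # already uncompressed
    if sig != b"CWS":
        # e.g. b"ZWS" (LZMA), which this sketch does not handle
        raise ValueError("unsupported SWF signature: %r" % sig)
    # Bytes 3..7 (version + total uncompressed length) are copied unchanged.
    out = b"FWS" + data[3:8] + zlib.decompress(data[8:])
    (length,) = struct.unpack("<I", data[4:8])
    if length != len(out):
        raise ValueError("header says %d bytes, got %d" % (length, len(out)))
    return out


def swf_compress(data: bytes) -> bytes:
    """Turn an uncompressed SWF (FWS) into a zlib-compressed one (CWS)."""
    sig = data[:3]
    if sig == b"CWS":
        return data  # already compressed
    if sig != b"FWS":
        raise ValueError("unsupported SWF signature: %r" % sig)
    # The length field keeps the uncompressed size, so the header is reused.
    return b"CWS" + data[3:8] + zlib.compress(data[8:])
```

Decompressing first makes the file easy to search and edit; compressing it again at the end keeps the file size down.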
Here's a link to something we've put together that may help show what we've come up with for our translation pipeline. It's a bit outdated, but still useful as a template.