I can provide Translations for my Space
Summary: This is the guiding Epic for the Localization feature, requested by Greater Than Games.
Design
Note: the following describes how this will work internally. The UI is likely to look quite different; the main focus of the internal design is to make this work efficiently, with minimal impact on the normal performance of the system.
This design also explicitly supports crowdsourced translation. That is likely to be a common use case for Querki, so we should build it in from the start.
Data Model
The central notion here is that we introduce a new Translations Model, built into Querki. Each Instance represents the Translations of a single Thing into a single Language. It has two predefined Properties:
- The Thing being translated.
- The Language being translated into.
More importantly, though, a Translation Instance may also have any or all of the Properties of the Thing it translates. It is essentially a partial copy of that Thing, containing translations of whichever Properties have been translated so far.
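A minimal sketch of this data model, in Python purely for illustration (Querki's real types will differ): a Translation carries the two predefined Properties plus an arbitrary subset of the original Thing's Properties, and overlaying it onto the Thing yields the translated view. The `Thing`, `Translation` and `apply_translation` names, and representing Properties as a name-to-text dict, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Thing:
    # Hypothetical stand-in for a Thing: an ID plus Property name -> text.
    thing_id: str
    props: Dict[str, str] = field(default_factory=dict)


@dataclass
class Translation:
    # Hypothetical Instance of the Translations Model.
    thing_id: str  # predefined Property: the Thing being translated
    language: str  # predefined Property: the Language being translated into
    props: Dict[str, str] = field(default_factory=dict)  # any subset of the Thing's Properties


def apply_translation(thing: Thing, tr: Translation) -> Thing:
    """Return a new Thing with the translated Properties overlaid on the originals."""
    if tr.thing_id != thing.thing_id:
        raise ValueError("translation belongs to a different Thing")
    return Thing(thing.thing_id, {**thing.props, **tr.props})
```

Untranslated Properties simply show through from the original, which is what lets a translator work incrementally.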
Runtime
When we load a Space, we scan for all children of the Translations Model and build them into a table, keyed first by Language and then by Thing. Note that this table is built at the Space level; it might get trimmed when we trim the Things for Read purposes, but we might not bother.
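A sketch of that table build, assuming the Translation Instances have already been loaded as simple records (the `Translation` type and `build_translation_table` are hypothetical). The design does not say what happens if two Instances translate the same Thing into the same Language; here the later one simply wins.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Translation:
    thing_id: str
    language: str
    props: Dict[str, str] = field(default_factory=dict)


def build_translation_table(instances):
    """Index Translation Instances as {language: {thing_id: Translation}}."""
    table = defaultdict(dict)
    for tr in instances:
        # Assumption: a later Instance for the same (language, thing) replaces an earlier one.
        table[tr.language][tr.thing_id] = tr
    return dict(table)
```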
If the current User is not using the native Language of this Space, and translations into the User's preferred Language exist, then in the UserSpaceSession image of the State we maintain a punchlist of which Things have been "translated" so far. That is, when we access any QLText Property on a Thing that hasn't been translated yet, we check whether a translation into the User's preferred Language exists. If so, all Properties from that Translation get copied over into the Thing. Either way, we record that this Thing has been translated.
This approach is a compromise, but it is optimized for speed: we only translate each Thing once, and only if it is actually accessed in this session.
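The lazy, once-per-Thing translation can be sketched as a per-session view. This is a simplification under stated assumptions: the hypothetical `TranslatingView` stands in for the UserSpaceSession image of the State, the table maps each Language to `{thing_id: translated Properties}`, and for brevity it translates on any Property access rather than only QLText ones.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Thing:
    # Hypothetical stand-in for a Thing: an ID plus Property name -> text.
    thing_id: str
    props: Dict[str, str] = field(default_factory=dict)


class TranslatingView:
    """Hypothetical per-session view that translates each Thing at most once."""

    def __init__(self, things, table, language):
        self._things = dict(things)  # session-local copy; the Space's own Things stay untouched
        self._lang_table = table.get(language, {})  # {thing_id: translated props}
        self._translated = set()  # the "punchlist" of Things already handled

    @property
    def translated(self):
        return frozenset(self._translated)

    def get_text(self, thing_id, prop):
        if thing_id not in self._translated:
            tr_props = self._lang_table.get(thing_id)
            if tr_props:
                base = self._things[thing_id]
                # Copy all translated Properties over the base ones, just once.
                self._things[thing_id] = Thing(thing_id, {**base.props, **tr_props})
            # Record the Thing whether or not a translation existed.
            self._translated.add(thing_id)
        return self._things[thing_id].props.get(prop)
```

The set lookup on every access is the only cost for Things that have already been handled, which is the point of the punchlist.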
Languages
Querki will predefine a Language Model and a number of common Languages. We don't need to be complete at first, but we should cover the major ones.
The Language Model should be intentionally extensible: Spaces are explicitly allowed to define Languages outside the standard set. This lets them support more obscure Languages if they so choose, as well as constructed ones such as Klingon and Elvish.
The User's language should be chosen by the User. Eventually there should be a Querki-wide setting for preferred default language, but that is lower-priority. More importantly, the User can choose which language to use, from among the Translations that exist in this Space.
Note that all of this needs to work for Anonymous users! That is, as an Anonymous reader, I should be able to read in my preferred language. This implies that we cannot simply manage all of this in UserSpaceSession.
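A sketch of an extensible Language Model, using the fields proposed in the Action Plan (optional two-letter code, English name, native name). The `Language` type, the tiny `BUILT_IN` list, keying by English name, and the rule that a Space cannot replace a built-in Language are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Language:
    english_name: str
    native_name: str
    code: Optional[str] = None  # optional two-letter code; invented Languages may have none


# A deliberately short, hypothetical built-in list.
BUILT_IN = (
    Language("English", "English", "en"),
    Language("French", "Français", "fr"),
    Language("Spanish", "Español", "es"),
)


def languages_for_space(space_defined=()):
    """Built-in Languages plus any the Space defines, keyed by English name."""
    result = {lang.english_name: lang for lang in BUILT_IN}
    for lang in space_defined:
        # setdefault: a Space may add Languages but not replace a built-in one.
        result.setdefault(lang.english_name, lang)
    return result
```

For example, `languages_for_space([Language("Klingon", "tlhIngan Hol")])` yields the three built-ins plus a Klingon entry with no code.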
We might want to allow multi-level settings -- that I would prefer, say, French first, Spanish second, and English third. That's a bit more complex, and should be a later goal, but there's no reason we can't do it, and it would probably be valuable in some situations.
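If multi-level preferences are supported, one plausible reading is that the fallback happens per Thing: use the first preferred Language that has a Translation of this Thing, and stop as soon as we reach the Thing's base Language. A sketch with a hypothetical `pick_translation` function and a table of `{language: {thing_id: props}}`:

```python
def pick_translation(thing_id, base_language, preferences, table):
    """Return (language, translated_props) for the first satisfiable preference.

    Returns (base_language, None) when the Thing should be shown untranslated.
    """
    for lang in preferences:
        if lang == base_language:
            return base_language, None  # the original text is preferred at this point
        props = table.get(lang, {}).get(thing_id)
        if props is not None:
            return lang, props
    # No preferred Language is available: fall back to the original.
    return base_language, None
```

With preferences French, Spanish, English, a Thing that has only a Spanish Translation is shown in Spanish, while a Thing with no Translations falls back to its original English.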
The Space should be able to define which is the "base" Language for this Space. It's reasonable for English to be the default, but it must not be hard-coded. Also, as Eric points out, we should in principle allow for the possibility that different Things in the Space might have different original Languages -- this is probably an unusual use case, but it will come up in at least some crowdsourcing use cases. The implication is that Language is a Property that can be set and handled at the usual levels (Space, Model, Thing). Only if it is not set at any level does it fall back to English.
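Resolving the base Language then works like ordinary Property inheritance. A minimal sketch, assuming a hypothetical `Node` type in which each Thing or Model may point at the Model it derives from, with the Space's setting passed in separately:

```python
from dataclasses import dataclass
from typing import Optional

FALLBACK_LANGUAGE = "English"


@dataclass
class Node:
    # Hypothetical: a Thing or Model, with its own Base Language (if set)
    # and the Model it derives from (if any).
    base_language: Optional[str] = None
    model: Optional["Node"] = None


def resolve_base_language(thing, space_language=None):
    """Walk Thing -> Model -> ... then the Space, falling back to English."""
    node = thing
    while node is not None:
        if node.base_language is not None:
            return node.base_language
        node = node.model
    return space_language if space_language is not None else FALLBACK_LANGUAGE
```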
Translations and Permissions
We will add a new Can Translate permission, governing access to the Translation UI and the right to create and edit Translations. This is considered a moderately low permission, generally lower than Can Edit.
There might be a new standard Role for Translator, but that might want to be a "mix-in" Role, rather than being part of the standard hierarchy.
Probably everyone with Editor Role or above should get Can Translate automatically.
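A sketch of the permission rule described here. The `Role` ordering and its member names are hypothetical, as is modelling the Translator mix-in as a set of extra role names; the only rules taken from the text are that a Translator, or anyone at Editor level or above, gets Can Translate.

```python
from enum import IntEnum


class Role(IntEnum):
    # Hypothetical ordering of the standard Role hierarchy, lowest first.
    READER = 1
    CONTRIBUTOR = 2
    EDITOR = 3
    MANAGER = 4


def can_translate(role, mixins=frozenset()):
    """The Translator mix-in, or Editor and above, grants Can Translate."""
    return "Translator" in mixins or role >= Role.EDITOR
```

Keeping Translator as a mix-in means a Reader can be made a Translator without also gaining Contributor rights.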
Dealing with QL in Translations
Very important: Translations are dangerous!
Consider -- the translator is writing alternate QLText. We will probably need to allow this to include QL expressions, not just plain QText, if we want to allow this to work for more complex situations. But the Translator is not necessarily (or even usually) someone with Edit permissions!
This is probably one of the hardest parts of this project. We will need to at least syntactically analyze all QL in the "main" version of the text, and only permit matching QL in the translations. Ideally, attempts to add invalid QL should be logged and reported (since they are potential hacking attempts), although that's probably a later story.
If the Translator has Can Edit, then we probably don't need to worry about this, since they are allowed to create QL expressions anyway.
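A sketch of the QL-matching rule, under loud assumptions: it treats QL as the `[[...]]` spans in QText (nested spans stay inside their outer span), compares only top-level spans as text after collapsing whitespace, and lets them be reordered, since word order varies between languages. A real check would compare parsed QL rather than strings. `top_level_ql` and `translation_ql_ok` are hypothetical names.

```python
from collections import Counter


def top_level_ql(text):
    """Return the top-level [[...]] spans in text, whitespace-normalized.

    Nested [[...]] inside a span stays part of that span. Raises ValueError
    if the brackets are unbalanced.
    """
    spans, depth, start, i = [], 0, 0, 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "[[":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "]]" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                spans.append(" ".join(text[start:i].split()))
        else:
            i += 1
    if depth != 0:
        raise ValueError("unbalanced QL brackets")
    return spans


def translation_ql_ok(original, translation, has_can_edit=False):
    """True if the translation uses exactly the same QL as the original."""
    if has_can_edit:
        return True  # Can Edit already allows writing arbitrary QL
    try:
        return Counter(top_level_ql(original)) == Counter(top_level_ql(translation))
    except ValueError:
        return False  # unbalanced QL is rejected outright (a candidate for logging)
```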
Tags
Tags are going to be a nuisance. On the one hand, we clearly want to be able to translate them. On the other hand, there isn't much there there -- there isn't a Thing to be translated. So they will need a separate mechanism.
OTOH, Tags are pretty simple, so we can probably get away with a new Tag Translations Model, with one Instance per Language. Each Instance contains nothing more than a map from the real Tag name to the translated one for that Language.
When we are rendering a Tag, we should display it with the translation, if there is one, but the link should be via the real name. This is the tricky bit: it violates an assumption laced through the code, so there will be a lot of debugging to get this part entirely right.
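A sketch of the Tag approach: each Language's Tag Translations Instance reduces to a dict, and rendering translates only the visible text while the link keeps the real name. The `TAG_TRANSLATIONS` data, `render_tag`, and the bare `href` format are hypothetical; Querki's actual Tag URLs are not specified here.

```python
from html import escape
from urllib.parse import quote

# Hypothetical: one Tag Translations Instance per Language, reduced to its map.
TAG_TRANSLATIONS = {
    "French": {"Fire Magic": "Magie du feu", "Undead": "Morts-vivants"},
}


def render_tag(real_name, language):
    """Display the translated name (if any) but link via the real name."""
    display = TAG_TRANSLATIONS.get(language, {}).get(real_name, real_name)
    return f'<a href="{quote(real_name)}">{escape(display)}</a>'
```

Any code that assumes the link text equals the Tag name will break here, which is exactly the assumption the paragraph above warns about.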
Other Localizations
There is more to localization than just text: we will eventually need to deal with number formats, right-to-left languages, and lots of other fun like that. These should all become stories, but we won't tackle them right off the bat.
UI Considerations
To be designed, but workflow is paramount here. We ideally want a very compact workflow, that allows a translator to very efficiently blow through a lot of translations, in some sort of order that makes conceptual sense. They should be able to easily see the original Property value, the rendered result, a translation editor, and their rendered translation.
The editor for Tags should probably show simple tables, grouped by Property, listing the Tags that exist for that Property alongside editor fields for their translations.
Action Plan
This is a rough outline of the order of events, to be turned into stories and further refined. All of this should be getting tested as it is built.
- Create a Language Model. This should include at least the (optional) two-letter code, language name in English and language name in its own language.
- Fill in a few built-in Languages, but we don't need the full list immediately.
- Create the Base Language Property: if set on a Thing, Model or Space, it defines the base Language that is being used for its Properties.
- Introduce a bit of Client-to-Server metadata, giving the browser user's preferred Languages. (Note that this is a list, in order of preference.)
- Create the Translations Model.
- Lazily initialize the Translations Table on-demand: when someone wants to read in a Language other than the Base Language of the Thing, we look up all the Translations, and turn them into a quick-to-look-up Map.
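For the preferred-Languages metadata, browsers already expose an ordered list: `navigator.languages` in JavaScript, and the standard `Accept-Language` request header. Whether Querki uses either is an open choice. As one illustration, here is a simplified parser that turns an `Accept-Language` value into tags in preference order, ignoring the `*` wildcard and entries with `q=0`. `parse_accept_language` is a hypothetical helper, and mapping tags such as `fr-CA` onto Querki Languages is left out.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        tag = fields[0]
        if not tag or tag == "*":
            continue
        q = 1.0  # weight defaults to 1 when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            # Sort by descending weight, keeping the header's order for ties.
            entries.append((-q, position, tag))
    return [tag for _, _, tag in sorted(entries)]
```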