Neural Machine Translation Explained

Understanding machine translation of legal texts

Why use LexMachina?

Quickly translate emails or letters

Translate a draft statement of claim for your foreign client’s approval

Translate a contract or GTCs

Translate documents for a foreign authority

Understand a court decision

Machine translation is the optimal solution when time is of the essence.

The preferred way to use LexMachina will depend on the purpose of the translation:

Is a general understanding of the document sufficient or will the translation be legally binding? Does your arbitrator accept purely machine translated exhibits or does the court additionally require a certification for each translation? Does a light review suffice or is a full review essential? Or does the project complexity require the involvement of a specialized legal translation agency?

Don’t hesitate to ask us for personal advice on how to best use our services.

Why human review?

We have compiled a list of the limitations of pure machine translation, along with explanations of why these problems arise and a brief glimpse into how they can be resolved.

Terminology error: Creation of an unfortunate neologism

Explanation:

When terms absent from (or sparsely featured in) the training data are presented for translation to the machine, the machine translates with the most likely candidate sub-word(s) for a given context. For example, “Two unicorns are hiding in the forest” is translated as “Deux unicornes se cachent dans la forêt” because the word “licorne” did not appear in the legal texts used to train the machine translation engine.
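To make the sub-word mechanism more tangible, here is a minimal sketch of a greedy longest-match segmentation. The vocabulary and the `segment` function are invented for illustration; the segmentation method actually used by LexMachina is not described here and may differ.

```python
def segment(word, vocab):
    """Greedily split `word` into the longest pieces found in `vocab`."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is a known sub-word.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            end = start + 1  # unknown character: keep it as its own piece
        pieces.append(word[start:end])
        start = end
    return pieces


# Toy vocabulary (an assumption): "unicorns" as a whole word was never seen.
TOY_VOCAB = {"uni", "corn", "s", "forest", "contract"}

print(segment("contract", TOY_VOCAB))  # a known word stays in one piece
print(segment("unicorns", TOY_VOCAB))  # an unseen word is split into pieces
```

An engine that has only ever seen the pieces, and never the whole word, assembles the translation from those pieces, which is how a form such as “unicornes” can appear.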

Neologisms can also arise when the machine fails to recognize that a given term is a proper noun, i.e. a non-translatable. For more on this issue, see “Proper noun that should not have been translated” below.

Solution(s):

Whether it be an untranslatable proper noun or a term with context-dependent meaning, we can manually add any machine-unfamiliar terminology to the training data, so that the engine recognizes it after subsequent training runs. In the meantime, we can add it to our terminology list. Help us resolve these issues by submitting your suggestions via the “feedback” button in the machine translation interface, or send us feedback via email.

Very soon LexMachina users will be able to create their personal glossaries and thereby directly improve their NMT results. To stay informed on our terminology integration tool, sign up for our newsletter.

Terminology error: Wrong term in a given context and/or lack of consistency

Explanation:

Depending on the context, an “associé” can be a “partner”, a “member”, a “quotaholder”, etc. Despite NMT taking context into account to some extent, translations are ordinarily done sentence by sentence, meaning the machine can only infer the context from the sentence itself, rather than from the discourse (at the document-level). Accordingly, it chooses the term most likely to occur in a given sentential context, based on the frequency and co-occurrence statistics it has collected. This can sometimes lead to inconsistencies within the same document.

Solution(s):

Our specialized sub-engines seek to solve this problem: the more specialized the machine, the higher the quality of the translations. We are continuously developing sub-engines to suit the needs of Swiss lawyers. Furthermore, our terminology integration tool will allow each user to create their own glossary and to directly influence the NMT results. Sign up for our newsletter to stay informed on the launch of our newest sub-engines and our terminology integration tool.

Those who have opted for our full translator toolbox can easily filter the sentences containing the term in question, check the consistency of the translation, and make corrections by “searching and replacing” unwanted terms. Ask for our user guide for more information on this.
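The consistency check described above can be approximated outside the toolbox as well. The following is a minimal sketch, not the toolbox's actual feature: the `renderings` function, the sample segments and the candidate list are all assumptions made for illustration, and the simple substring matching will miss inflected forms.

```python
from collections import Counter


def renderings(segments, source_term, candidates):
    """Count which candidate translations appear for `source_term`.

    `segments` is a list of (source sentence, translated sentence) pairs.
    """
    counts = Counter()
    for source, target in segments:
        if source_term.lower() in source.lower():
            for candidate in candidates:
                if candidate.lower() in target.lower():
                    counts[candidate] += 1
    return counts


# Hypothetical French-to-English segments from a single document.
segments = [
    ("L'associé signe le contrat.", "The partner signs the contract."),
    ("Chaque associé dispose d'une voix.", "Each member has one vote."),
    ("L'associé peut céder sa part.", "The partner may transfer his share."),
]

found = renderings(segments, "associé", ["partner", "member", "quotaholder"])
print(found)
if len(found) > 1:
    print("Inconsistent renderings of 'associé' in this document.")
```

More than one rendering in the result signals that the document should be reviewed and harmonized.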

Terminology error: Proper noun that should not have been translated

Explanation:

Depending on grammatical conventions of the target or source language, machine translation engines are not always able to identify words that should not be translated (first and last names, company names, city names, trademarks, etc.). See “creation of an unfortunate neologism” and “wrong term in context” for more on this.

Solution(s):

Let us know if you come across this issue: Click on the “feedback” button in the translation interface or send us feedback via email. We will add the term to our list of non-translatables. The engine will then always retain the term in its original form.
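One common way to keep listed terms in their original form is to replace them with placeholders before translation and to restore them afterwards. The sketch below illustrates that general idea only; how LexMachina applies its list of non-translatables internally is not stated here, and `protect`, `restore`, the placeholder format and `fake_translate` are hypothetical.

```python
def protect(text, terms):
    """Replace each non-translatable term with a placeholder token."""
    mapping = {}
    # Longest terms first, so a short term never splits a longer one.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        token = f"__NT{i}__"
        if term in text:
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping


def restore(text, mapping):
    """Put the original terms back in place of the placeholders."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text


def fake_translate(text):
    # Stand-in for a real engine call (assumption, for demonstration only).
    return text.replace("hat ihren Sitz in", "has its registered office in")


masked, mapping = protect("Müller AG hat ihren Sitz in Biel.", ["Müller AG", "Biel"])
print(masked)
print(restore(fake_translate(masked), mapping))
```

Because the engine only ever sees the placeholder, it has no opportunity to translate the company or city name.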

Grammatical error: Wrong translation of a pronoun, use of the masculine instead of the feminine, inconsistency in bullet lists, etc.

Explanation:

For the time being, Neural Machine Translation (NMT) engines generally process text sentence by sentence. Without integration of proper coreference resolution tools, the algorithm therefore ignores the content of the previous sentence and, for example, translates pronouns without knowing their referent (“Der Salat ist frisch. Er ist sehr gesund.” will thus be translated as “The lettuce is fresh. He’s very healthy.”). This problem can also occur with bullet lists, since each bullet point will be translated separately.

This difficulty arises less often when the text to be translated is a common document (e.g. the articles of association of a Ltd) or typical of a particular field (e.g. tax law), as similar phrases will be familiar to the engine and the variability will indeed be less pronounced.

Solution(s):

The technology to solve the issue actually already exists. The main problem is that the professional software (Computer Assisted Translation or CAT tools) currently being used is not compatible with NMT systems translating at the paragraph or document level. This will certainly change over the next few years and LexMachina users can expect to be among the first to benefit from this upcoming technological advancement.

Meanwhile, there are several solutions to address the problem described above, including (i) avoiding bullet lists and ambiguities in the source text, and (ii) using our specialized sub-engines. Sign up for our newsletter to be informed about the launch of the next sub-engines.

Wrong meaning: Misinterpretation of a sentence

Explanation:

Neural Machine Translation (NMT) engines are not always able to correctly render the meaning of the source text. Such errors are often due to complex or ambiguous syntax structures in the source text, as well as particularly long sentences. The other cases are referred to as the “mysteries of deep learning”: if an adjective or other modifier is added in the source sentence, the translation can change considerably, sometimes for unknown reasons. Thankfully it is an issue that is being heavily researched in academia and other research institutions. If you would like to be informed of the latest discoveries in this area, subscribe to our newsletter.

Solution(s):

It is often sufficient to slightly alter the source sentence in order for the translation error to disappear. It is also recommended that the source text be made as simple as possible: the ideal sentence should be no longer than 128 characters and free of ambiguities.
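Before submitting a text, the length recommendation can be checked with a few lines of code. This is a minimal sketch: the `too_long` helper is hypothetical, and it assumes the text has already been split into sentences.

```python
MAX_CHARS = 128  # recommended upper bound per sentence, as stated above


def too_long(sentences, limit=MAX_CHARS):
    """Return the sentences that exceed the recommended length."""
    return [s for s in sentences if len(s) > limit]


sentences = [
    "The seller delivers the goods within ten days.",
    "The seller, " + "subject to the conditions set out below, " * 4 + "delivers the goods.",
]

for sentence in too_long(sentences):
    print(f"{len(sentence)} characters: consider splitting this sentence")
```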

This type of error also highlights the fact that, despite the considerable progress made by NMT in recent years (including in the field of legal translation), verification of the result by a lawyer or lawyer-linguist is still mandatory if a translation is legally binding or relevant in court proceedings. In such cases, we recommend you order our full review services.

Formatting error: Incorrect bolding, italicization, etc.

Explanation:

The machine translation engine tracks the text formatting (bold, italics, underlining, etc.) in the source text and applies it to terms in the target text. However, since the machine does not translate word by word, but rather sentence by sentence, formatting may sometimes be applied to the wrong term.

Solution(s):

Currently there is no concrete solution to prevent machine translation engines from making errors of this nature. Fortunately, these errors can be corrected very easily and quickly, in particular as part of a light or full review by a specialized translator.

Segmentation error: Unrecognized abbreviations and incorrect sentence segmentation

Explanation:

In order for NMT to work, the text to be translated must be properly separated into sentences (i.e. “segmented”) before being submitted to the engine for translation.

If sentences are incorrectly segmented during pre-processing, the translation will be poor, and at times even unintelligible. For example, if the phrase “Pursuant to art. 13, the creditor is granted a 30 day period to exercise his rights.” is cut after “art.” because of the stop used to abbreviate the word “article”, the machine will first translate “Pursuant to art.” and then, regardless of the first sentence, “13, the creditor is granted a 30 day period to exercise his rights.”, which will lead to an unusable result.
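The effect of a missing abbreviation rule can be reproduced with a small sketch. Both splitters below are simplified illustrations rather than LexMachina's actual pre-processing, and the `ABBREVIATIONS` set is a hypothetical stand-in for a list of segmentation rules.

```python
import re

# Hypothetical list of abbreviations that must not end a sentence.
ABBREVIATIONS = {"art.", "para.", "no."}


def naive_split(text):
    """Split after every full stop followed by whitespace."""
    return re.split(r"(?<=\.)\s+", text)


def split_with_rules(text, abbreviations=ABBREVIATIONS):
    """Split after a full stop unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(".") and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text without a final full stop
        sentences.append(" ".join(current))
    return sentences


text = ("Pursuant to art. 13, the creditor is granted a 30 day period "
        "to exercise his rights. The period starts on delivery.")

print(naive_split(text))       # cuts the first sentence after "art."
print(split_with_rules(text))  # keeps "art. 13" together
```

The naive splitter produces three segments, the first of which is the fragment “Pursuant to art.”, while the rule-based splitter returns the two intended sentences.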

Solution(s):

If you encounter this problem, please tell us about the problematic abbreviation by clicking on the “edit” button in the translation interface or by sending us feedback via email. We will add this abbreviation to LexMachina’s list of “segmentation rules” so that it won’t make the same mistake again. In the meantime, you can retranslate the sentence by simply putting it in brackets, as the machine is configured so as not to segment within brackets.

Omission or non-translation of a term or part of a sentence: Words forgotten or kept in their original language

Explanation:

Sometimes Neural Machine Translation (NMT) engines forget words or parts of sentences (or even whole sentences), and sometimes they leave a term untranslated. Such problems are usually due to the length of the sentences or the layout of the source document: superfluous tabs or breaks, optional hyphens, text fields in Word, etc.

Solution(s):

The only way to avoid this type of problem is either to reformulate the text with shorter sentences or to change the initial layout (in this respect, do not hesitate to ask for our “tips and tricks”). If you notice a recurring formatting problem, we can solve this by creating a corresponding pre-editing rule to be applied directly by the machine. Let us know by clicking on the “feedback” button in the online translator or send us feedback via email.

Alternatively, you can order our review services to have these errors corrected by specialized translators.

“Creative” retranslation of legislation: Non-compliance with the official translation of laws or ordinances

Explanation:

Independent of the corpus with which a machine translation engine has been trained, its intrinsic function requires that it always translate texts according to its own algorithms. That is to say, even if LexMachina has been “fed” with the entire Swiss code of law, it will not reproduce the official translation itself but rather automatically propose its own version.

Solution(s):

This problem can be overcome by feeding the machine a list of sentences that have a fixed translation. We are constantly working on improving this list. Please send us your suggestions by clicking on the “feedback” button in the online translator or via email.
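Conceptually, such a list works like a lookup that takes precedence over the engine. The sketch below illustrates the idea under stated assumptions: `translate_with_fixed`, the `FIXED` dictionary and `fake_engine` are hypothetical, the sample strings are placeholders rather than real legal texts, and LexMachina's actual mechanism may work differently.

```python
def normalize(sentence):
    """Collapse whitespace so that layout differences do not block a match."""
    return " ".join(sentence.split())


def translate_with_fixed(sentence, fixed, engine):
    """Return the fixed translation if one exists, else call the engine."""
    key = normalize(sentence)
    if key in fixed:
        return fixed[key]
    return engine(sentence)


# Placeholder entries (assumption): source sentence -> official translation.
FIXED = {
    "Source sentence of a statutory provision.": "Official translation of the provision.",
}


def fake_engine(sentence):
    # Stand-in for a real machine translation call.
    return f"[MT] {sentence}"


print(translate_with_fixed("Source sentence  of a statutory provision.", FIXED, fake_engine))
print(translate_with_fixed("Any other sentence.", FIXED, fake_engine))
```

Only sentences that match an entry exactly (after whitespace normalization) receive the fixed wording; everything else is still translated by the engine.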

Other (major) issues:

For cases that have escaped our notice:

Despite constant progress, machine translation is far from perfect and may produce errors that a human translator would never have made.

While we are able to address the above-mentioned issues directly, others may arise that require further technological breakthroughs before they can be solved. We are closely monitoring developments along these lines, and we will inform you of all new state-of-the-art solutions LexMachina is implementing. To be among the first to be informed, subscribe to our newsletter.

In the meantime, we are always very interested to hear about any major problem that may arise. Don’t hesitate to contact us: we will carefully analyze your issue and do our best to resolve it.

Review by a legal translator

All of this being said, if you need help to improve the results of the machine translation, our professional legal translators are at your disposal.

See our prices