Nojeok Hill: My View from the Top

Korean Translation Tip: Handle Korean Line Breaks Like a Pro

With the robust multilingual support in Adobe InDesign and recent versions of other design packages, many clients are opting to handle Korean layout in-house.

Unfortunately, people with absolutely no knowledge of Korean can really butcher a layout job.

My Korean Translation Tips have addressed some of the most egregious mistakes and easy-to-fix issues, including Tip #16 (Cardinal Rules of Layout), #26 (Korean Font Differences), #31 (PowerPoint Tips), #29 (Spacing Issues in Word) and #32 (More Font Handling).

But line breaks are also a point of concern.

Suppose you've got this source text:

2015-12-17_9-55-44

And your Korean translation team delivers this fill-in-the-blank translation of it in Word:

2015-12-17_10-20-33

Don't lay it out into your design program like this:

2015-12-17_9-57-23

Or like this:

2015-12-17_10-00-52

Or even like this:

2015-12-17_10-03-30

These are not uncommon issues; they happen all the time, especially when Korean text is mixed with punctuation and English.

Korean Translation Tip, Part I – Hire us to do the most professional layout for you, or at least have us do an in-context proof of the text after you do the layout.

Korean Translation Tip, Part II – If you ignore the first half of this tip, be sure after layout to check all lines that start or end with punctuation and/or English to verify that the text matches the way the translation was delivered to you.
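For anyone who wants to automate the first pass of that check, here is a minimal Python sketch. It assumes you can export the laid-out text as plain text with one layout line per text line; the function name `flag_risky_lines` is made up for illustration. It only points a reviewer at the lines worth comparing against the delivered translation and can't judge whether a break is correct.

```python
import string

# Characters that tend to get separated or reordered during layout:
# ASCII punctuation and English (Latin) letters.
RISKY_CHARS = set(string.punctuation) | set(string.ascii_letters)


def flag_risky_lines(laid_out_text):
    """Return (line_number, line) pairs for lines that start or end
    with punctuation or an English letter."""
    flagged = []
    for number, line in enumerate(laid_out_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped[0] in RISKY_CHARS or stripped[-1] in RISKY_CHARS:
            flagged.append((number, stripped))
    return flagged


if __name__ == "__main__":
    sample = "이름: ____\n한국어 문장입니다\n(서울) 방문\nABC 회사의 제품"
    for number, line in flag_risky_lines(sample):
        print(number, line)
```

Since most sentences end with a period, expect plenty of flagged lines; the point is to have a list to compare against the Word file, not a verdict.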

** BONUS – Do you see above that there are four fill-in-the-blank lines in both the source English and translated Korean? They aren't in the same sequence in the two languages! Want to know why? Check out this article and you'll understand: Tip #34 (Why You Can't Translate Phrase-by-Phrase Between English and Korean)

Korean Translation Tip: Why You Can’t Translate Phrase-by-Phrase Between English and Korean, Part II

Last month I posted a short video illustrating how you can't mix-and-match sentence fragments to make proper sentences between English and Korean.

A few people pointed out that this often works with English and Western languages, and so they weren't sure why I would trouble them with a video like this for Korean.

In response, I've put together another one here to emphasize how different Korean is from English and to put this matter to rest, once and for all.

Korean Translation Tip – Just because you can translate phrase-by-phrase between Western languages does not mean you can do it between Western and Asian languages.

Issues in Calculating Rates for KO>EN Translation Jobs, Revisited

Several years ago I posted an article about why I don't generally offer per-word rates for Korean>English translation. The following is from a recent email to a client, explaining things in a bit more detail.

======

Dear <Client>,
 
Here are the issues I can think of now which make it hard to use source word/character rates on Korean>English work.
  1. The majority of the work I get for KO>EN is scanned source files in PDF format, which can't be analyzed precisely until the translation is complete. On those jobs, fixed quotes in advance or target word billing are the most reasonable. Sometimes these PDFs can be converted to Word through OCR or the native Adobe Acrobat conversion. However, for various reasons, these word counts are extremely unreliable.
  2. Even if the files are editable, I find that it takes an extra measure of care to ensure everyone's talking about the same thing when referring to Korean words/characters. To make matters worse, if the language settings in Word aren't set right, the software will count Korean words as characters (or vice versa, I can't remember which right now) and that creates confusion. At least until a few years ago, Excel also didn't count Korean words and characters correctly.
  3. Korean does not have a long tradition of using words (or even writing left-to-right), and I find that Koreans are not as consistent in their use of spacing as we are in English. Therefore, what you find is that different writing styles yield different Korean word counts, even as the final English translated word count remains unchanged. Furthermore, when clients equate Korean with Chinese and Japanese, which don't use words, it adds another layer of confusion. Your colleague mentioned that internally you are assuming two Korean characters to be one word, but that is arbitrary. Korean words are calculated based on discrete units of meaning, and separated by spaces.
  4. Different types of content return different word count expansions. For example, Korean word lists will translate to English almost at one for one. However, because Korean grammar attaches tags to words and those tags are then translated to English as separate words, the expansion rate increases the more "prose-y" a text is. The expansions vary depending on subject matter, too.
  5. As with the current job, many Korean writers, especially on technical documents, mix a lot of English words into the text. These are embedded in the Korean grammar though and can't be excluded from the word count. However, if the letters of the English words are counted as characters (which is what happens if not analyzed separately), it runs the word count way up. On today's job, there were 1,500 English words mixed in with some 4,000 Korean words. That means rejigging the word counting formula to avoid overcharging. Counting source characters also means having to do something extra with numbers, since that also runs up the count. 
For all these reasons, it is so much easier to just use the English word counts, which are predictable and universally understood. Of course, this makes it hard to quote projects in advance. One solution is simply to ask me to quote projects first, if you have the time to wait. As I mentioned to your colleague today, I've also started offering a character rate to clients that just have to have a source-based billing structure. Since it's imprecise, though, it's still best if I can analyze, adjust and quote the work in advance to take account of the various issues mentioned above. Keep in mind that if I'm taking the risks of all these unknown factors with an advance quote, I'm also going to aim a bit high; generally, my most competitive pricing is available on English-word rates.
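
To make the counting issues above concrete, here is a rough Python sketch that reports several different "counts" for the same Korean source text. It is not the formula used for any actual quote; the function name `analyze_source` and the choice of categories are assumptions made for illustration. Hangul is matched using the Unicode range for precomposed syllables (U+AC00 to U+D7A3).

```python
import re

HANGUL = re.compile(r"[\uac00-\ud7a3]")   # precomposed Hangul syllables
LATIN_WORD = re.compile(r"[A-Za-z]+")     # English words embedded in the text
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")   # digit runs such as 95, 1,500 or 3.14


def analyze_source(text):
    """Return several competing counts for one piece of source text."""
    tokens = text.split()  # Korean "words" as delimited by spaces
    return {
        "space_delimited_words": len(tokens),
        "hangul_characters": len(HANGUL.findall(text)),
        "embedded_english_words": len(LATIN_WORD.findall(text)),
        "numbers": len(NUMBER.findall(text)),
        "non_space_characters": sum(len(token) for token in tokens),
    }


if __name__ == "__main__":
    for name, count in analyze_source("서버 CPU 사용률은 95% 입니다").items():
        print(name, count)
```

Comparing `non_space_characters` with `hangul_characters` shows how much of a raw character count comes from English letters, digits and symbols rather than Korean, which is the part that would inflate a naive source-character rate.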
 

Korean Translation Tip: Why You Can’t Translate Phrase-by-Phrase Between English and Korean

We frequently get translation requests for content where the source text has been chopped up into sentence fragments. This is especially common with captions for video, since the content needs to show up on-screen in bite-sized pieces. But sometimes clients even send such requests because they want to be able to rearrange words themselves later, or because they sent over bilingual files for translation in a CAT tool which were improperly translated.

In the first case, as long as the source text forms complete thoughts and the translation doesn't have to correspond 1-for-1 by sentence fragment, we can translate it. But the "mix-and-match" approach is a recipe for disaster. 

Here's a video I put together to illustrate how structurally different Korean and English are and to show why the translation of complete thoughts must be done at the sentence level.

 

 

Korean Translation Tip – If you're ever tempted to ask that English sentences and phrases be translated into Korean in the order the words appear in the English (or vice versa), please watch this video again to remind yourself that English and Korean can't be connected in such a linear way.

Korean Translation Tip: Spacing Around Parentheses in Korean Looks Funky and Inconsistent

This tip is based on a reader question about the following graphic in last month's message.

image from koreanconsulting.typepad.com

The reader asked why there isn't a space before or after the parentheses… Good question!

The answer may surprise you, but no, there should not be spaces there. The reason is hard to explain clearly without getting into complicated grammar, but the basic idea is that Korean is made up of character units functioning as standalone words, and also of character tags/markers attached to standalone words to indicate various grammatical meanings (such as subject, object, etc.).

In the case above, without the words in parentheses, the phrase would read "Study Hard" Campaign은, where 은 is a topic marker attached to the word Campaign. Therefore, the added parenthetical text is stuck right in between the word and its tag and no space is added on either side.

Keep in mind that spaces should be added around parentheses when additional text is not being stuffed between a word and its tag. This phenomenon seems to occur almost exclusively when English and/or numbers are inserted into Korean text. The spacing around parentheses within pure Korean text generally follows the same rules as we use in English (though Koreans get used to such usage and often go without spacing even when it should be there).

Korean Translation Tip – Correct spacing in Korean around parentheses often looks funky and inconsistent to English speakers. Feel free to bug your linguist for confirmation, but expect to get a response back saying it’s OK.

Microsoft Thanked Me for Renewing My Subscription to the Magazine “Office 365 Small Business Premium”

I have been using the Korean version of Office 365 Small Business Premium for a year and it's time to renew. A couple days ago, Microsoft sent me an email thanking me for renewing.

Only problem…

They used the word for "subscribe" that is only used when subscribing to things to read, such as newspapers and magazines.

And they didn't mess it up once… They used the wrong word four times in one email.

Check it out:

2015-07-26_20-18-43

In every case, the indicated Korean word should be changed to "사용권". In Korean, there is no straight translation for "subscription" in the English sense here. The correct word means "right to use", which, if you think about it, means exactly what it should.

Perhaps the linguists who worked on the job didn't know that they were translating for a software subscription, rather than a magazine subscription. Or perhaps they just reused old TM segments which had been translated for a magazine subscription situation. Or maybe they just weren't paying attention.

Whichever it was, the QA processes failed. 

For lots more of these, check out A Collection of Korean Translation Errors in the User Interfaces of Leading Software.

Explaining My Request for Feedback on Using Machine Translation and Cross-Project Translation Memory Leverage in High-Quality Translation Workflows

I recently asked my agency clients for their feedback on offering a rate discount in exchange for the right to use machine translation and termbase leveraging on certain projects. Machine translation brings up ideas of low-quality output, and cross-client TM sharing is generally off-limits due to confidentiality issues and rights to use content. Machine translation approaches are not without confidentiality and content rights complications, either. Here is a detailed explanation of my thoughts on the approach, as explained in an email to a client.

======

Dear [Client] – Thanks for the response and opportunity to explain what I'm thinking. (I'm afraid the following is longer than I expected when I started typing.)

You're right that MT has generally been considered just a cheap way to deliver low-quality output. And I agree that Google Translate and the others are currently useless as resources by themselves on high-quality translation projects. But I think there may be a way that MT can be utilized within a range of high-quality workflows. 
 
MT is now being used on high-quality workflows within certain VERY narrow sub-topics. One of my clients is apparently even making it work for Chinese and Japanese, so Korean is clearly not impossible. For example, the translation of a 500-page cutting robot manual can be used to train an MT engine to produce a very good job on another cutting robot manual. Apparently the process breaks down pretty quickly though. The text for a cutting robot manual may not be a good enough match even for training an MT engine to generate high quality output on an assembly robot manual.
 
If we bring the human translator into the process, along with a properly prepared TB, the MT may be able to bring in a few value-added suggestions right away that the professional can finalize efficiently.
 
In addition, it appears to me that the CAT tools are on the verge of getting MT, TMs and TBs to work together, so that if the software finds, say, an 80% match in the TM, it can identify what's different, replace words from the TB, and then machine translate ONLY the sub-segments that are different. I don't know why this couldn't easily start moving fuzzy matches up 10-20% right off the bat. And there are various algorithms for figuring out how good the MT match is likely to be. That means these high-quality segments could be removed from the translation step and only included for proofreading (i.e. post-MT edited, but to perfection, not the conventional "good enough" level). In this way, the translator is still doing his/her job on the segments where the MT/TM/TB combination doesn't get to the required threshold, and the entire document is still proofread and/or linguistically QAd, and the final product is possibly of a higher quality and consistency than otherwise. Over time, this approach should yield increasingly higher efficiency.
 
There can be no doubt that this is the direction things are moving in our industry and I would like to start experimenting with it now. However, to plug in the MT functionality to my CAT tool generally requires special client permission, and so I've never done it, not even once. And to use the TM from one job to leverage it to build up seed TMs in various fields to apply to other client projects is also out of the question without specific approval.
 
Therefore, my idea is to start by offering a penny discount on projects where the client gives me, in effect, an "indefinite, irrevocable right to use" their content for such an approach. It would never involve revealing full coherent documents in public, but would mean that the segments/TB entries would go into various reference TMs/TBs/corpora to be applied to other projects and/or that the translations we do with that content may be used to train MT engines which may also be used on other projects.
 
I would then watch where things go from there. If I'm just hitting dead-ends and this content isn't useful to my bottom line and doesn't look like it will in the future, I could stop offering the discount. On the other hand, if the approach works, I could even increase the discounts over time. 
 
I see it as a long-term approach. About ten years ago, I was worried that the technology would replace me eventually; I'm cautiously taking the position that the technology is creating more opportunities than it is killing. For example, I remember Google Adwords about 10 years ago… It was doable as a layperson. Today, Google has built in so much complexity that I can't even find an expert who can do it properly and affordably. You would have thought that by now the process would have been automated, but it's gone in the opposite direction, and changes so fast that one cannot stay up on it without investing huge amounts of time in continuously learning.
 
I suspect translation is also going to unfold this way. At memoQfest in May, I realized that some of the approaches I use in translation with my team are unique, but that the software is changing so fast that I can't hope to use all the functionality that's available. Not only learning the software, but also figuring out how to apply it and then continuously updating those approaches to the changing landscape is a process that creates barriers to entry which are likely greater than they've ever been. You may have noticed that late last year I updated my email footer greeting to say "translation technologist". That's still a bit more of a "hopeful" title than it is in reality, but I also see a new role opening up even in the freelance side of things, which is the role that bridges project managers with translators. Project managers rarely have the time or inclination to really extract all the value and ensure all the quality that exists in the project stages between end-client and translator, especially if translators aren't using CAT tools. And this is even before the MT/TM/TB combination hits its stride. With the right skills and tools, a translation technologist could help achieve all kinds of benefits in the production chain. This part of my thinking is still in development, but being able to use MT and leverage TMs on a cross-client basis is surely a place to start.
 
Let me know your thoughts on this.

Korean Translation Tip: Correct Font Handling in Korean Layout

I've posted several short articles on Korean layout in this set of tips, including Cardinal Rules of Korean-Language Layout, Korean Layout Rules for MS PowerPoint and Spacing Issues in MS Word.

In this post, I introduce some font handling advice to improve the way Korean-language layout looks to Korean readers. Of course, this won't get you the same font and layout sensitivity my team delivers, but it will help you avoid a couple no-nos.

So here's the scoop…

The Korean fonts come with a set of double-byte punctuation marks to which spaces are added before or after. These extra spaces aren't needed in Korean; indeed, they look bad! You should use single-byte punctuation (i.e. the same marks we use in English).

In addition, Korean fonts include a collection of English letters. However, don't use these, either! Proper font mapping avoids this issue, but if you find that the fonts still aren't right, switch the English text back to an English font (most likely the font of the source document).

Here's what a string of text might look like if punctuation and English fonts are handled incorrectly:

2015-07-15_15-26-28

And here's how it should look:

2015-07-15_15-29-47

Korean Translation Tip A – Don't use double-byte punctuation in a Korean translation.

Korean Translation Tip B – Don't use Korean fonts for text that remains in English.
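
If the text is available outside the layout program, Tip A can be roughed out in a few lines of Python. This sketch assumes "double-byte punctuation" means the Unicode fullwidth forms (U+FF01 to U+FF5E), which sit exactly 0xFEE0 above their ASCII counterparts, plus the ideographic space (U+3000). The function name `to_single_byte_punctuation` is made up for illustration. Fullwidth letters and digits are deliberately left alone, since Tip B is a font matter that a script like this can't address.

```python
import string

# Map each fullwidth punctuation mark to its single-byte (ASCII) equivalent.
# Assumption: fullwidth forms are offset from ASCII by 0xFEE0.
FULLWIDTH_TO_ASCII = {ord(mark) + 0xFEE0: mark for mark in string.punctuation}
FULLWIDTH_TO_ASCII[0x3000] = " "  # ideographic space -> ordinary space


def to_single_byte_punctuation(text):
    """Replace fullwidth punctuation with the marks used in English."""
    return text.translate(FULLWIDTH_TO_ASCII)


if __name__ == "__main__":
    print(to_single_byte_punctuation("안녕하세요（Hello），반갑습니다！"))
```

Because the double-byte marks carry their own built-in spacing, the converted text may need an ordinary space typed after commas and similar marks, so a human check of the result is still a good idea.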

Korean Translation Tip: Applying the Cardinal Rules of Korean-Language Layout to Microsoft PowerPoint Files

I've previously written about the Cardinal Rules of Korean-Language Layout. When clients handle layout on their end, I frequently send them the link to this article so that they or their layout expert can brush up quickly on the rules.

Of course, to implement those correctly, you've got to know how to use the software. For readers working in advanced design programs, I assume you know how.

This tip though is for people dealing with Korean in PowerPoint files who don't feel like paying somebody to fix things, but still want to deliver a good job.

-----

Occasionally a Korean PowerPoint slide will end up with text like the following excerpt from my master's thesis.

2014-04-06_4-50-54

See those ugly red lines under the text marked by the red arrow? That's PowerPoint's quality checker indicating non-existent language problems. And the blue arrow shows a Korean word split incorrectly at the end of the line.

How do you fix these two issues if you don't know Korean?

The answer is going to seem too easy, but it's amazing how hard it was for me to figure it out. (In fact, I didn't figure it out; I had to ask my super-smart layout guy Xiang for the answer!)

To fix things, select the text and then change the language to Korean as follows:

2014-04-06_4-52-00

Doing so produces this correctly formatted text:

2014-04-06_4-53-58

The above is still not my preferred style though. If you really want to make it look nice, right- AND left-justify the text to get this perfect specimen:

2014-04-06_4-54-15

Korean Translation Tip – Follow the above procedure on PowerPoint slides to make the text look like it should; otherwise, if the language settings aren't right, your great Korean translation may still look terrible.

-----

** By the way, setting the language correctly solves problems in PowerPoint files that contain other languages, too!