
Me and RegEx

 

What is RegEx?

Regex, or regular expressions, is not a programming language but a notation for describing patterns in text. In translation work, its main purpose is quality checking of translated or other texts and documents. According to Riccardo Schiaffino,

RegEx "is a search-and-replace function on steroids. Regular expressions can assist our translation work by allowing us to search, replace, and filter text in ways that would otherwise be impossible in our software tools." (https://www.ata-chronicle.online/highlights/regular-expressions-an-introduction-for-translators/)

If you are a linguist or have some affinity for languages, you will pick up regex quickly after some trial and error. :)

For us translators, regex is important because CAT tools use regular expressions for segmentation and auto-translation rules.

Below are my first attempts at some basic rules that can be used for Hungarian translations.

RegEx for English to Hungarian Translations

Example 1: Hungarian (or other) names with more than 1 space between them

Regular Expression: a-záéúőóüö.[A-ZÁÉÚŐÓÜÖ]

Explanation: This regex looks for more than one space between consecutive capitalized words, including those with Hungarian characters as well as common Latin ones. It is designed particularly for checking Hungarian and English proper names that have two or more components. Note that an extra space between regular (lowercase) words is not picked up.

The regular expression was first checked on regex101.com:


As you can see, it picked up all the extra spaces between the names, regardless of whether they had two or more elements or a period between them. (I just realized that this regex can also be used to check for an extra space between sentences, that is, after a period and before a capital letter, including Hungarian capitals. That is super helpful and definitely broadens its usage!)

I added the regex in Trados, and with the Verify option it gave me warnings for extra spaces. (Please note that I had some formatting issues with how the test lines were displayed and split into segments in Trados, but this is just further confirmation that the regular expression also picks up extra spaces between anything that ends with a character or a period and anything that starts with a capital letter.)
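For anyone who wants to try the Example 1 check outside regex101 or Trados, here is a minimal Python sketch. The pattern in it is a reconstruction written out from the explanation above (a lowercase letter or a period, then two or more spaces, then a capital letter). It is an assumption, not necessarily the exact string entered in Trados. The names EXTRA_SPACE and find_extra_spaces are made up for this illustration.

```python
import re

# Assumed reconstruction of the Example 1 rule:
#   [a-záéúőóüö.]   a lowercase letter (with the Hungarian letters listed
#                   in the post) or a period
#    {2,}           two or more spaces
#   [A-ZÁÉÚŐÓÜÖ]    a capital letter (same Hungarian set, upper case)
EXTRA_SPACE = re.compile(r"[a-záéúőóüö.] {2,}[A-ZÁÉÚŐÓÜÖ]")

def find_extra_spaces(text):
    """Return every matched snippet: last character, the spaces, the capital."""
    return [m.group(0) for m in EXTRA_SPACE.finditer(text)]

sample = "Kovács  Ádám met  with Mary Ann.  Újra találkoztak."
print(find_extra_spaces(sample))
# prints ['s  Á', '.  Ú']
# "met  with" is skipped because "with" does not start with a capital,
# and "Mary Ann" is skipped because it has only one space.
```

The character sets here are the ones listed in the post. The Hungarian alphabet also has í and ű (Í and Ű), so a production rule would probably want to add them to both sets.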

Example 2: English and other quotation marks replaced with Hungarian (lower and upper) quotation marks

Regular Expression: ("|'|<|>|‘|“)(.*)("|'|<|>|’|”)

Substitution: „$2”

Explanation: It's common to leave English upper quotation marks in translated texts simply because there is no direct way to type the Hungarian ones, but in Hungarian they are considered incorrect. This expression looks for segments that start or end with anything other than the Hungarian lower and upper quotation marks, including ", ', ‘, ’, “, ”, <, and >. The replacement makes the text start with a lower quotation mark and end with an upper quotation mark. Note: French quotation marks were not included because Hungarian uses them, too.


The regular expression was checked on regex101.com. It picked up all the wrong quotation marks and left the Hungarian and French ones alone. The substitution replaced them all with Hungarian quotation marks.

In Trados, I used the Replace option and entered the regex and the substitution. Again, it picked up all the wrong quotation marks, and with Replace or Replace All I could change them all to the Hungarian ones.
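The same replacement can be tried in a few lines of Python, as a rough sketch rather than a copy of the Trados setup. The pattern is the one from Example 2, unchanged. The only difference is that Python writes the second group as \2 where the substitution above uses $2. The function names and the "lazy" variant are my own additions for illustration.

```python
import re

# Pattern from Example 2, unchanged: opening mark, any text, closing mark.
QUOTES = re.compile(r"""("|'|<|>|‘|“)(.*)("|'|<|>|’|”)""")

# Hypothetical variant: (.*?) stops at the first closing mark instead of the
# last one, which matters when one line holds several quoted phrases.
QUOTES_LAZY = re.compile(r"""("|'|<|>|‘|“)(.*?)("|'|<|>|’|”)""")

def to_hungarian_quotes(text):
    """Wrap the quoted text (group 2) in Hungarian quotation marks."""
    return QUOTES.sub(r"„\2”", text)

def to_hungarian_quotes_lazy(text):
    """Same replacement, but each quoted phrase is handled separately."""
    return QUOTES_LAZY.sub(r"„\2”", text)

print(to_hungarian_quotes('Azt mondta: "Jó napot!"'))
# prints Azt mondta: „Jó napot!”

print(to_hungarian_quotes('"egy" és "kettő"'))
# prints „egy" és "kettő”   (only the outermost marks change)

print(to_hungarian_quotes_lazy('"egy" és "kettő"'))
# prints „egy” és „kettő”
```

One thing to watch with either version: the apostrophe is in the list of marks, so English words like "don't" can be caught as well. It is worth reviewing the matches before using Replace All.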

Have fun!
