Me and RegEx

 

What is RegEx?

Regex, or regular expressions, is not a programming language but a notation for identifying patterns in text. In translation work, its main use is quality checking of translated or other texts and documents. According to Riccardo Schiaffino,

RegEx "is a search-and-replace function on steroids. Regular expressions can assist our translation work by allowing us to search, replace, and filter text in ways that would otherwise be impossible in our software tools." (https://www.ata-chronicle.online/highlights/regular-expressions-an-introduction-for-translators/)

If you are a linguist or have some affinity for languages, you will pick up regex quickly, after some trial and error. :)

For us translators, regex is important because CAT tools use regular expressions to create segmentation and auto-translation rules.

Below are my first attempts at creating some basic rules that can be used for Hungarian translations.

RegEx for English to Hungarian Translations

Example 1: Hungarian (or other) names with more than one space between them

Regular Expression: a-záéúőóüö.[A-ZÁÉÚŐÓÜÖ]

Explanation: This regex looks for extra spaces (more than one) between consecutive words that start with capital letters, whether Hungarian accented characters or common Latin ones. It is designed particularly for checking Hungarian and English proper names that have two or more components. Note that an extra space between regular (lowercase) words is not picked up.

The regular expression was first checked on regex101.com.

In that test, it picked up all the extra spaces between the names, regardless of whether they had two or more elements or a period between them. (I just realized that this regex can also be used to check for an extra space between sentences, where one ends with a period and the next starts with a capital letter, Hungarian ones included, which is super helpful and definitely broadens its usage!)

I added the regex in Trados, and with the Verify option it gave me warnings for the extra spaces. (Please note that I had some formatting issues with how the names were displayed in Trados and placed in segments, but this is just another confirmation that the regular expression also picks up extra spaces between anything that ends with a lowercase letter or a period and anything that starts with a capital letter.)
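To try the idea behind Example 1 outside a CAT tool, here is a minimal Python sketch. It rests on assumptions: the pattern as published above appears to have lost some characters, so the bracket placement and the ` {2,}` (two or more spaces) part are my reconstruction from the explanation, not necessarily the original rule. The function name and the sample sentence are made up for illustration.

```python
import re

# ASSUMPTION: reconstructed pattern. The character classes come from the post;
# the brackets around the first class and the " {2,}" quantifier are guesses
# based on the explanation (extra spaces before a capitalized word).
NAME_GAP = re.compile(r"[a-záéúőóüö.] {2,}[A-ZÁÉÚŐÓÜÖ]")


def find_name_gaps(text):
    """Return each match: the preceding character, the spaces, the capital."""
    return [m.group(0) for m in NAME_GAP.finditer(text)]


# Hypothetical sample: two name gaps, plus one gap between lowercase words.
sample = "Kovács  Ádám met Dr.  Kiss and a  small  gap."
print(find_name_gaps(sample))  # ['s  Á', '.  K']
```

The double spaces in "a  small  gap" are not reported because the word after them starts with a lowercase letter, which is the behavior described in the explanation.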

Example 2: English and other quotation marks replaced with Hungarian (lower and upper) quotation marks

Regular Expression: ("|'|<|>|‘|“)(.*)("|'|<|>|’|”) Substitution: „$2”

Explanation: English upper quotation marks are often left in translated texts simply because translators have no direct way to type the Hungarian ones, but in Hungarian they are considered incorrect. This expression looks for text that starts or ends with anything other than the Hungarian lower and upper quotation marks, including ", ', ‘, ’, “, ”, <, and >. The substitution makes the text start with the lower quotation mark („) and end with the upper quotation mark (”). Note: The French quotation marks were not included because Hungarian uses them, too.


The regular expression was checked on regex101.com. It picked up all the wrong quotation marks and left the Hungarian and French ones alone. The substitution replaced them all with Hungarian quotation marks.

In Trados, I used the Replace option with the regex and the substitution. Again, it picked up all the wrong quotation marks, and with Replace or Replace All I could change them all to the Hungarian ones.
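The same search and substitution from Example 2 can be sketched in Python for a quick test outside Trados. This is only an illustration: the function name and the sample sentences are made up, and Python writes the second capture group as `\g<2>` where the substitution above uses `$2`.

```python
import re

# Pattern from Example 2: an opening mark, any text, a closing mark.
WRONG_QUOTES = re.compile(r"""("|'|<|>|‘|“)(.*)("|'|<|>|’|”)""")


def to_hungarian_quotes(segment):
    """Wrap the text between the outermost wrong quotation marks in „ and ”."""
    # \g<2> is Python's way of writing the $2 used in the substitution.
    return WRONG_QUOTES.sub(r"„\g<2>”", segment)


print(to_hungarian_quotes('Azt mondta: "Jó napot".'))  # Azt mondta: „Jó napot”.
print(to_hungarian_quotes("„Jó napot”"))               # unchanged
```

One thing to watch when testing: `(.*)` is greedy, so a segment with two separate quotations is treated as one long quotation from the first opening mark to the last closing mark, and an apostrophe inside a word also counts as a quotation mark.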

Have fun!
