
NMT Customization Pilot Project



This month-long pilot project aimed to train Microsoft Translator's NMT engine and develop a custom Neural Machine Translation (NMT) model for translating MOSTRA's 2021 editorial content and media press releases from English into Brazilian Portuguese.

The project validated the MT engine for the above-mentioned purpose.

My team and I prepared a Statement of Work for our client with the following details: Objectives, including goals related to quality, efficiency, and costs; Project Timeline; Processes and Workflows; Detailed Costs Table; Deliverables.

By the end of the month, we were able to give our client a clear, data-backed recommendation on whether it is worth investing in training the engine or hiring human translators instead. We also created a Lessons Learned Video Presentation about the process.

Scroll down to see the proposed timeline of the project, download the sample files, and/or watch the video presentation.

PROJECT TIMELINE

March 1 - March 28

March 1st

KICKOFF MEETING

The official start date of the project; Kickoff Meeting with the client; Proposal presentation and Q&A.

by March 5th

PREPARATION PHASE

Preparation of the project including data mining, data cleaning, data alignment, setting up the workspace, etc.
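To illustrate what the data cleaning step can look like, here is a minimal sketch in Python. It is not the script we used in the pilot: the function name `clean_parallel_pairs`, the `max_len_ratio` threshold, and the sample sentences are all assumptions made up for this example. It normalizes whitespace, drops pairs with an empty side, drops pairs whose word counts differ too much (a common sign of misalignment), and removes duplicates.

```python
def clean_parallel_pairs(pairs, max_len_ratio=3.0):
    """Filter (source, target) segment pairs before MT training.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # Collapse repeated spaces and trim leading/trailing whitespace.
        src = " ".join(src.split())
        tgt = " ".join(tgt.split())
        # Drop pairs where either side is empty.
        if not src or not tgt:
            continue
        # Drop pairs with a suspicious word-count ratio (likely misaligned).
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_len_ratio:
            continue
        # Drop case-insensitive duplicates.
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((src, tgt))
    return cleaned


# Invented sample data, not taken from the project files.
sample = [
    ("The festival opens in October.", "O festival começa em outubro."),
    ("The festival opens in October.", "O festival começa em outubro."),
    ("Press release", ""),
    ("Tickets", "Os ingressos estarão disponíveis para compra a partir de segunda-feira"),
    ("  Tickets go on  sale Monday. ", "Os ingressos estarão à venda na segunda-feira."),
]
cleaned = clean_parallel_pairs(sample)
print(len(cleaned))
```

In practice the length-ratio threshold would need tuning per language pair, since Brazilian Portuguese segments tend to run somewhat longer than their English sources.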

by March 19th

NMT TRAINING

10 MT training runs, 1 training/day.

by March 28th

ANALYSIS

MT output analysis, Post Editing, QA.
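One common way to put a number on post-editing effort is to count how many word-level edits separate the raw MT output from the post-edited version. The sketch below is an illustration only, not the metric used in this pilot; the function names `word_edit_distance` and `post_edit_rate` and the sample sentences are assumptions.

```python
def word_edit_distance(mt_output, post_edited):
    """Word-level Levenshtein distance between two segments."""
    a, b = mt_output.split(), post_edited.split()
    # prev[j] holds the distance between the words of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(
                prev[j] + 1,         # delete a word from the MT output
                curr[j - 1] + 1,     # insert a word from the post-edit
                prev[j - 1] + cost,  # keep or substitute a word
            ))
        prev = curr
    return prev[-1]


def post_edit_rate(mt_output, post_edited):
    """Edits per word of the post-edited segment (0.0 means untouched)."""
    reference_len = max(len(post_edited.split()), 1)
    return word_edit_distance(mt_output, post_edited) / reference_len


# Invented example: the post-editor fixes one preposition.
mt = "O festival começa no outubro"
pe = "O festival começa em outubro"
print(word_edit_distance(mt, pe), post_edit_rate(mt, pe))
```

Averaging such a rate over a test set after each training run gives a rough way to compare runs, alongside human review.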

March 28th

DELIVERY

Delivery of completed items, findings, conclusion, and updated proposal.

You can see and download all the project files here:

NMT Pilot Project Downloadable Files

  • Statement of Work Initial Proposal
  • Statement of Work Updated Proposal
  • Lessons Learned Video Slides Presentation

Lessons Learned Video Presentation

At the end of the project, we created a video presentation that walks through the elements and workflows of our month-long customization project and describes the challenges we faced and how we overcame them. Watch the video at this link:


