My research

My thesis was in the area of Machine Translation within the broad field of Natural Language Processing (NLP). I was working with Dr. Mark-Jan Nederhof in the School of Computer Science at the University of St Andrews, a small town on the east coast of Scotland. That's also where I did my Bachelor of Science, in Computer Science with French.

When many people hear NLP these days, they think immediately of machine learning, neural networks and, generally, data-driven translation systems. My thesis, however, focused not on building translation systems but on evaluating them, and not on data but on algorithms - analysed both formally and empirically. In other words, I wasn't crunching numbers to produce better translations, but rather thinking through and applying theory so we can better understand how good someone else's translator is.

Why do we care how good translation systems are? There are two reasons I think are relevant. One is technical: some algorithms for training translation systems use a "hill-climbing" method, in which they keep making small (automatic) changes to their translation mechanism to try to maximise the "score" they get from some external measure of quality. That score can be calculated by a number of existing tools, such as BLEU and Meteor.
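To make that concrete, here is a minimal sketch of metric-driven hill-climbing. Everything in it is illustrative rather than taken from any real system: the weight vector, the step size, and the `quality_score` function are all hypothetical stand-ins (a real tuner would call out to an external metric such as BLEU at that point).

```python
import random

def quality_score(weights):
    # Hypothetical stand-in for an external quality metric such as BLEU:
    # a smooth function with a known maximum at weights (0.5, 0.5).
    return -((weights[0] - 0.5) ** 2 + (weights[1] - 0.5) ** 2)

def hill_climb(weights, steps=1000, step_size=0.05):
    best = list(weights)
    best_score = quality_score(best)
    for _ in range(steps):
        # Make a small random change to the system's parameters...
        candidate = [w + random.uniform(-step_size, step_size) for w in best]
        score = quality_score(candidate)
        # ...and keep it only if the external score improves.
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

weights, score = hill_climb([0.0, 0.0])
print(f"tuned weights: {weights}, score: {score:.4f}")
```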

The other reason is practical. Imagine you need a translation system for your domain, and you have a number to choose from, but for budget or other reasons none are very good. It may be important not only to know which is the "best" translator, but also what the strengths and weaknesses of each are. Do you need to translate a user manual for a piece of hardware? Then precise translations of nouns can make or break the usability of your document, but getting the right tense in your verbs is almost entirely irrelevant. If you're telling an anecdote, though, it may not matter if the wording is slightly wrong, but the word order - who did what to whom? - might be critical to understanding.

The evaluation systems I produced focused on this last feature: word ordering. Far fewer systems evaluate individual error types than evaluate overall quality, but within word ordering a few alternatives exist. I produced two systems, both using a specific grammatical formalism - dependency grammar - to investigate and compare the grammatical structure of translated sentences. One system produced results comparable with existing algorithms, while the other outperformed all the alternatives for which I had data.
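As a generic illustration of the idea - not the actual algorithms from the thesis - the sketch below scores a translation's word order by comparing dependency structures. Each sentence is reduced to a set of (head, relation, dependent) triples, and the score is the F1 overlap between the reference's triples and the translation's; the toy parses and relation labels are invented for the example.

```python
def dependency_overlap(reference, hypothesis):
    """F1 score over (head, relation, dependent) triples."""
    ref, hyp = set(reference), set(hypothesis)
    if not ref or not hyp:
        return 0.0
    matched = len(ref & hyp)
    if matched == 0:
        return 0.0
    precision = matched / len(hyp)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)

# Toy parses: "the dog bit the man" vs. a translation that swaps
# subject and object ("the man bit the dog").
reference = {("bit", "subj", "dog"), ("bit", "obj", "man")}
hypothesis = {("bit", "subj", "man"), ("bit", "obj", "dog")}
print(dependency_overlap(reference, hypothesis))  # 0.0: the word-order error is caught
```

Note that the same two sentences share every word, so a purely surface-level comparison could score them as near-identical; comparing the dependency structures is what exposes the reordering.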

If you are interested in more information, you could try reading the thesis, or speaking to me in person.