After Caterina Balcells graduated from the University of Barcelona, she took a traditional career path as a language teacher. Today, she’s no longer teaching language to humans. She’s teaching language to computers.
Balcells is the chief linguistic officer at Inbenta, a cloud-based natural language processing and semantic search technology provider in San Mateo. In a recent interview, she said she took the position at Inbenta with no background or training in artificial intelligence. Now that she is immersed in the world of natural language processing as a computational linguist, Balcells has become an expert in training “chatbots,” the voice that’s often on the other end of the line when you call customer service. I asked her if some chatbots are easier to train than others, depending on what language or languages they speak, and she said some languages are inherently more difficult than others:
For example, we have been working for some years now with German, and these very long, compound words that they have can be challenging for us. Chinese and Japanese don’t have spaces between words, and that’s also a bit complicated for us, because we have to know where a word starts and ends. So there are different things in different languages that are easier than others.
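To make the word-boundary problem concrete, here is a minimal sketch of one classic approach, greedy "forward maximum matching" against a word list. It is an illustration only, not a description of how Inbenta's tools work: the function name `segment` and the tiny lexicons are hypothetical, and real systems rely on far larger dictionaries and statistical models.

```python
def segment(text, lexicon):
    """Split unspaced text into words by greedy forward maximum matching.

    At each position, take the longest lexicon entry that matches;
    if nothing matches, emit a single character and move on.
    """
    max_len = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                break
        else:
            # No lexicon entry starts here: fall back to one character.
            candidate = text[i]
        tokens.append(candidate)
        i += len(candidate)
    return tokens


# Hypothetical toy lexicons, for illustration only.
german_parts = {"donau", "dampf", "schiff"}
chinese_words = {"我", "爱", "北京"}

print(segment("donaudampfschiff", german_parts))
print(segment("我爱北京", chinese_words))
```

The same routine handles both cases Balcells mentions: splitting a long German compound into its parts, and finding word boundaries in text written without spaces. Its weakness is that a greedy choice can be wrong when a longer dictionary entry overlaps the intended words, which is part of why the task is hard.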
Another key factor, Balcells explained, is that there are far more open source tools available for some languages, like English, than for others:
Here at Inbenta we support over 20 different languages, so we’ve had to develop a lot of the tools ourselves. For example, there is nothing for Catalan, for Basque; now we are working with Norwegian, Swedish, and Asian languages. Most of the tools are available in English, but that doesn’t mean that English is easier than other languages — I don’t think English is easier than Chinese, for example, although they have many things in common. I just learned that recently, because we have our expert in Chinese here, and she said there are things that are very similar. For example, in both languages the same word groupings can mean different things — in English, “ship a book” and “book a ship” have entirely different meanings. The same word can be a noun or a verb, and the same thing happens in Chinese. This is complicated — it can be difficult [to teach a computer] when a word is a noun, and when it’s a verb.
Speaking with Balcells was especially interesting for me, in that I’m a graduate of Georgetown University’s School of Languages and Linguistics. My own experience in that realm was that a large percentage of students were females who went on to become language teachers. When I brought up the gender topic, Balcells said most of the linguists at Inbenta are indeed female:
Most of us have a linguistics background, so we have studied languages at the university level. Some of us have focused on linguistics, others come from the translation field. These are studies that attract a lot of females — I don’t know why. But we also have some male computational linguists here — some of them may have a more technical background. But we don’t notice a big difference here, because all of us have this linguistic background, and we all have studied something similar. That said, computational linguists can come from different fields — not all of them have a linguistic background. Some of them have a technical background — and in those cases, yeah, they are mainly male.
Given that the teaching profession is largely female-dominated, at least on some levels, I asked Balcells what it was like for her going from that profession to the very male-dominated technology sector. I mentioned that I couldn’t help but notice that she’s the only female among the 12 people shown on Inbenta’s website as being on the leadership team. She laughed and said that was “kind of weird” for her:
At the top level of companies, you notice that those roles are mainly filled by males, that’s true. But here in our company, the team as a whole is half male and half female, so we don’t really notice that. For me, changing from teaching to working with computers was kind of like changing from teaching human beings to teaching computers — my feeling is that I am teaching computers here. I used to teach languages to children, and now I’m teaching languages to computers. So it wasn’t such a big change, because I’m still teaching languages.
I wrapped up the conversation by asking Balcells what advice she has for young people who are interested in entering the field of computational linguistics and pursuing a career in artificial intelligence. She said first of all, don’t be afraid:
Sometimes people who apply for a job here say, “I know how to handle a computer, but I’m not an expert.” Don’t be scared — this is one of the things you can do if you study languages and linguistics, and you like computers and technology. This is a field that changes a lot, so you can learn and invent a lot of new things — there are many things that are there to be invented. Even if you don’t have great programming skills, you can find a company like ours where we work hand-in-hand together with developers and engineers. It’s a fascinating job — you’re building things, and then you see them online and see that they’re helping people, and helping companies sell more. It’s very satisfying to see that what you do is helping other people.
A contributing writer on IT management and career topics with IT Business Edge since 2009, Don Tennant began his technology journalism career in 1990 in Hong Kong, where he served as editor of the Hong Kong edition of Computerworld. After returning to the U.S. in 2000, he became Editor in Chief of the U.S. edition of Computerworld, and later assumed the editorial directorship of Computerworld and InfoWorld. Don was presented with the 2007 Timothy White Award for Editorial Integrity by American Business Media, and he is a recipient of the Jesse H. Neal National Business Journalism Award for editorial excellence in news coverage. Follow him on Twitter @dontennant.