If you have ever used Google Translate and wished you could do the same from the console, here is a Python script that does just that.
The script translates words and entire sentences between any language pair known to Google Translate. It accepts both text passed as shell arguments and data from standard input.
NOTE: This script stopped working after version 1 of the translation API was discontinued in December 2011. See the updated script below for a working version.
NOTE: According to http://code.google.com/apis/language/translate/v1/reference.html
#!/usr/bin/env python
from urllib2 import urlopen
from urllib import urlencode
import sys
import os

# The google translate API can be found here:
# http://code.google.com/apis/ajaxlanguage/documentation/#Examples
# Language codes are listed here:
# http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray

if len(sys.argv) < 3:
    name = os.path.basename(sys.argv[0])
    print '''
Usage:
%s en es lovely spam
%s es en < file.txt
Available language codes are listed here:
http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray
''' % (name,name)
    sys.exit(-1)

## hack to be able to display UTF-8 in Windows console
if sys.platform == "win32":
    ## set utf8 console
    if not sys.stdin.encoding == 'cp65001':
        os.system('chcp 65001 > nul')

    class UniStream(object):
        __slots__= "fileno", "softspace",

        def __init__(self, fileobject):
            self.fileno= fileobject.fileno()
            self.softspace = False

        def write(self, text):
            if isinstance(text, unicode):
                os.write(self.fileno, text.encode("utf_8"))
            else:
                os.write(self.fileno, text)

    sys.stdout= UniStream(sys.stdout)
    sys.stderr= UniStream(sys.stderr)

lang1=sys.argv[1]
lang2=sys.argv[2]
langpair='%s|%s'%(lang1,lang2)
if len(sys.argv) > 3:
    text=' '.join(sys.argv[3:])
else:
    text=sys.stdin.read()

base_url='http://ajax.googleapis.com/ajax/services/language/translate?'
params=urlencode( (('v',1.0),
                   ('q',text),
                   ('langpair',langpair),) )
url=base_url+params
content=urlopen(url).read()

# Cut the translation out of the response by string offsets
start_idx=content.find('"translatedText":"')+18
translation=content[start_idx:]
end_idx=translation.find('"}, "')
translation=translation[:end_idx]
sys.stdout.write(translation + '\n')
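The script above cuts the translation out of the response by string offsets, which breaks as soon as the surrounding punctuation changes. A minimal sketch of the same step using the json module follows. It is written for Python 3 (the scripts in this post are Python 2), the helper name extract_translation is made up for illustration, and the sample payload is an assumption reconstructed from the markers the slicing code searches for, not a captured response.

```python
import json

# Hypothetical sample payload, shaped the way the slicing code expects:
# a "translatedText" field followed by '"}, "'.
SAMPLE_RESPONSE = (
    '{"responseData": {"translatedText":"encantador spam"}, '
    '"responseDetails": null, "responseStatus": 200}'
)


def extract_translation(content):
    """Return the translated text from a JSON response body, or None."""
    try:
        payload = json.loads(content)
    except ValueError:
        # The body was not JSON at all (for example an HTML error page).
        return None
    data = payload.get("responseData") if isinstance(payload, dict) else None
    if not isinstance(data, dict):
        return None
    return data.get("translatedText")


if __name__ == "__main__":
    print(extract_translation(SAMPLE_RESPONSE))
```

Returning None instead of raising lets the caller decide how to report a failed request.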
This is the updated script, which uses the web API and should work after December 2011.
#!/usr/bin/env python
import sys
import os
import urllib2
from urllib import urlencode
import cookielib
import re

# The google translate API can be found here (***NOT OPERATIONAL SINCE DECEMBER 2011***):
# http://code.google.com/apis/ajaxlanguage/documentation/#Examples
# Language codes are listed here:
# http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray

if len(sys.argv) < 3:
    name = os.path.basename(sys.argv[0])
    print '''
Usage:
%s en es lovely spam
%s es en < file.txt
Available language codes are listed here:
http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray
''' % (name,name)
    sys.exit(-1)

## hack to be able to display UTF-8 in Windows console
def fix_win32_console():
    ## set utf8 console
    if not sys.stdin.encoding == 'cp65001':
        os.system('chcp 65001 > nul')

    class UniStream(object):
        __slots__= "fileno", "softspace",

        def __init__(self, fileobject):
            self.fileno= fileobject.fileno()
            self.softspace = False

        def write(self, text):
            if isinstance(text, unicode):
                os.write(self.fileno, text.encode("utf_8"))
            else:
                os.write(self.fileno, text)

    sys.stdout= UniStream(sys.stdout)
    sys.stderr= UniStream(sys.stderr)

if sys.platform == "win32":
    fix_win32_console()

lang1=sys.argv[1]
lang2=sys.argv[2]
if len(sys.argv) > 3:
    text=' '.join(sys.argv[3:])
else:
    text=sys.stdin.read()

base_url='http://translate.google.com.br/translate_a/t'
# sample browser request
# http://translate.google.com/translate_a/t?client=t&text=col&hl=en&sl=en&tl=es&multires=1&otf=2&ssel=4&tsel=0&sc=1
params=urlencode({'client':'t',
                  'text':text,
                  'hl':'en',
                  'sl':lang1,
                  'tl':lang2,
                  'otf':2,
                  'multires':1,
                  'ssel':0,
                  'tsel':0,
                  'sc':1,
                  })
url=base_url + '?' + params

# Send browser-like headers and keep cookies between requests
cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1018.0 Safari/535.19'),
                     ('Referer', 'http://translate.google.com/')
                     ]
response = opener.open(url)
translation=response.read()

# The translated text is the first quoted string after the opening '[[["'
matcher = re.search(r'\[\[\["(?P<human_readable_chunk>[^")]*)', translation)
sys.stdout.write(matcher.group('human_readable_chunk'))
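The last two lines of the script call matcher.group() without checking whether the pattern matched, so an unexpected response ends in an AttributeError. Here is a minimal sketch of the same extraction with a guard, written for Python 3. The helper name first_chunk is made up, and the sample response body is an assumption shaped to fit the pattern, not a captured reply.

```python
import re

# Same pattern as the script above.
PATTERN = re.compile(r'\[\[\["(?P<human_readable_chunk>[^")]*)')

# Hypothetical response body, for illustration only.
SAMPLE_RESPONSE = '[[["hola","hello","",""]],,"en"]'


def first_chunk(body):
    """Return the first quoted string after '[[["', or None if absent."""
    match = PATTERN.search(body)
    return match.group("human_readable_chunk") if match else None


if __name__ == "__main__":
    print(first_chunk(SAMPLE_RESPONSE))
    print(first_chunk("<html>error page</html>"))
```

Note that the character class in the pattern excludes the closing parenthesis as well as the double quote, so a translation that contains ")" is cut off at that point.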
Save the script to a file such as gtrans.py and run it as follows (assuming you have Python in your path):
python gtrans.py en es Nobody expects the Spanish Inquisition
The first two parameters are the language codes. A list of codes known to Google Translate is available here: http://code.google.com/apis/ajaxlanguage/documentation/reference.html. For some reason, not all of the listed codes are actually accepted; bo (Tibetan), for example, is rejected.
To pipe a text file through the script:
python gtrans.py en es < myfile.txt
It is also possible to enter multi-line text directly from the console. To do so, call the script with the language codes only, i.e.:
python gtrans.py en es
Enter your text and use the Enter key to start a new line. When you are done, press Ctrl+D (on Linux) or Ctrl+Z followed by Enter (on Windows).
Note: On Windows, input in languages other than English is not going to work. This is due to poor support for Unicode input in cmd.exe. On Linux, international input works fine, provided that the console is UTF-8.
Keep in mind, though, that Google limits the size of the text to be translated.
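The three ways of supplying text described above (words on the command line, a redirected file, or lines typed until end-of-file) come down to one decision in the script. A minimal Python 3 sketch of that decision follows. The function name read_text is made up for illustration, and io.StringIO stands in for piped or typed input.

```python
import io
import sys


def read_text(argv, stdin=None):
    """Return the text to translate.

    Words after the two language codes win; with no words given,
    everything on standard input (up to end-of-file) is used.
    """
    if stdin is None:
        stdin = sys.stdin
    if len(argv) > 3:
        return " ".join(argv[3:])
    return stdin.read()


if __name__ == "__main__":
    # Simulated invocations of "gtrans.py en es ..."
    print(read_text(["gtrans.py", "en", "es", "lovely", "spam"]))
    print(read_text(["gtrans.py", "en", "es"], io.StringIO("line one\nline two\n")))
```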
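One way to stay under the size limit is to split long input into smaller pieces and translate them one by one. The sketch below (Python 3) shows only the splitting step. The function name split_text and the default of 500 characters are placeholders, since the actual limit is not stated here.

```python
def split_text(text, max_chars=500):
    """Split text into whitespace-delimited chunks of at most max_chars.

    max_chars=500 is an arbitrary placeholder, not Google's real limit.
    Runs of whitespace, including newlines, collapse to single spaces.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size pieces.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("Nobody expects the Spanish Inquisition", max_chars=15):
        print(chunk)
```

Splitting on whitespace can separate words that belong to one sentence, so translation quality may suffer at chunk boundaries. Splitting on sentence punctuation would be a reasonable refinement.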
Console Google Translate — curl-based version
As an alternative, here is a bash script that uses curl and sed. It has been updated to work via the Google Translate web API.
#! /bin/bash
USAGE="Usage:
$0 en es Lovely spam!
Some codes: en|fr|de|ru|nl|it|es|ja|la|pl|bo
All language codes:
http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray"
# Both language codes are required before "shift 2" below
if [ "$#" -lt 2 ]; then
    echo "$USAGE"
    exit 1
fi
FROM_LNG=$1
TO_LNG=$2
shift 2
QUERY=$*
UA="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040803"
URL="http://translate.google.com.br/translate_a/t?client=t&hl=en&sl=$FROM_LNG&tl=$TO_LNG&otf=2&multires=1&ssel=0&tsel=0&sc=1"
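# Quote $UA and $URL: the user agent contains spaces and would otherwise be split into separate arguments
curl --data-urlencode "text=$QUERY" -A "$UA" -s -g -4 "$URL" | sed 's/","/\n/g' | sed 's/\]\|\[\|"//g' | sed 's/","/\n/g' | sed 's/,[0-9]*/ /g'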
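The curl call sends the text in the request body (via --data-urlencode) and leaves the other parameters in the URL. For comparison, here is a minimal Python 3 sketch that builds an equivalent request with urllib without sending it. The function name build_request and the shortened User-Agent string are illustrative choices, and whether the endpoint still answers is not something this sketch checks.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://translate.google.com.br/translate_a/t"


def build_request(text, source_lang, target_lang):
    """Build (but do not send) a POST request like the curl call."""
    query = urlencode({
        "client": "t", "hl": "en",
        "sl": source_lang, "tl": target_lang,
        "otf": 2, "multires": 1, "ssel": 0, "tsel": 0, "sc": 1,
    })
    # Supplying a body makes urllib issue a POST instead of a GET.
    body = urlencode({"text": text}).encode("utf-8")
    return Request(BASE_URL + "?" + query, data=body,
                   headers={"User-Agent": "Mozilla/5.0"})


if __name__ == "__main__":
    req = build_request("Lovely spam!", "en", "es")
    print(req.get_method(), req.full_url)
    print(req.data)
```

Passing the result to urllib.request.urlopen would perform the request.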
Thanks for this. I've posted a version of your script that uses the Python json and optparse libraries here: http://gist.github.com/561630
It should be reasonably compatible with Python 3, although I think optparse has been deprecated in favour of argparse.
Nice post! I have written a bash version that works pretty much like yours :-)
http://ur1.ca/1g5ak
Regards!
2ksaver: Thanks for sharing the script! I went back to update the post with a curl-based solution and saw your version of the script. I experimented with curl's `--data-urlencode`, and it looks like Google is happy to receive the params via POST instead of GET.
Very Very (./translate it en utile) useful :)