Merlyn is a Python program for computer control by speech. It uses the CMU PocketSphinx system, and is capable of supporting many tasks that you might want to do on a regular PC, using voice only. It is currently Linux only and is an alpha release for proof-of-concept.
In particular, you can:
- play local music files
- open specified websites
- check the weather and the calendar
- control the mouse and keyboard
- control tools such as the file manager
- switch windows, open menus, save files…
- launch apps, e.g. calculator, editors, browser…
- surf the web, e.g. Amazon, eBay, Google, Netflix, Reddit, Spotify, Twitter, Wikipedia, Youtube
It can operate online or offline. The default mode is online because the default voice is provided by Google’s gTTS, but this can easily be swapped for another speech engine, e.g. Festival. You may choose from male or female voices, which is why ‘Merlyn’ was chosen over ‘Merlin’ (more gender-neutral).
I created it because I wanted voice control available in AI Linux. Some of the commenters there seemed to be hoping that it was a J.A.R.V.I.S.-like OS. Well, it wasn’t designed for that purpose, but I thought it might be interesting to add some voice-control capability. I checked out a lot of existing voice control systems for Linux but experienced problems with all of them. The main problem was that they were either too old or too new! The old ones hadn’t been updated for years (many appear to have been abandoned) and no longer installed cleanly. The new ones look promising, but aren’t ready for reliable use.
Quick Usage Overview
| Say 'Merlyn' to make him/her listen.
| Merlyn will obey the next command. If that is 'keep listening'
| then Merlyn will continue to obey commands until you say 'stop
| listening'. Say 'help' to see this message again, and to get
| further help.
> Yes Alan
> Fri Mar 31 11:04:41 BST 2017
> Yes Alan
< keep listening
< youtube alan parsons
Created new window in existing browser session.
< close tab
< youtube emerson lake and palmer
Created new window in existing browser session.
< raise volume
< show general
> Didn't understand: one
< show websites
< stop listening
> Speak my name for your next command.
Some of Merlyn’s responses (indicated by the ‘>’) are also spoken, e.g. ‘Yes Alan’ (this can be configured to use your name). By default, Merlyn will only obey your next command, unless you tell him/her to ‘keep listening’. Later, you can command him/her to stop listening. In the example, I started listening to Alan Parsons, but changed my mind in favour of ELP. And I wanted it louder. Then I asked to see the list of general commands. There are also commands for keyboard and mouse control, browser control and websites, and editing. You can easily add your own commands by editing the existing command files, or creating a new one. Oh, the ‘one’ that Merlyn didn’t understand was due to some background noise (probably ELP is too loud…); ‘one’ isn’t a command in my configuration. Finally, I tell Merlyn to stop listening until I call his/her name again.
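The listening behaviour described above can be sketched as a small state machine. This is only an illustration, not Merlyn’s actual code: the class name Listener and the method hear are hypothetical, and it assumes each phrase arrives as already-recognised text.

```python
class Listener:
    """Toy model of the wake-word logic; all names here are hypothetical."""

    def __init__(self, wake_word="merlyn"):
        self.wake_word = wake_word
        self.armed = False       # True after the wake word: obey one command
        self.continuous = False  # True between 'keep listening' and 'stop listening'

    def hear(self, phrase):
        """Return the command to execute, or None if the phrase is ignored."""
        phrase = phrase.strip().lower()
        if phrase == self.wake_word:
            self.armed = True
            return None
        if not (self.armed or self.continuous):
            return None  # not listening: ignore everything but the wake word
        self.armed = False  # the single-shot command has now been used up
        if phrase == "keep listening":
            self.continuous = True
            return None
        if phrase == "stop listening":
            self.continuous = False
            return None
        return phrase
```

With this model, ‘merlyn’ followed by ‘date’ yields one command, while ‘merlyn’, ‘keep listening’ yields every following phrase as a command until ‘stop listening’ is heard.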
The following commands should get you started:
- Keep listening, stop listening
- Show general (or websites, or keyboard, or applications…)
- Amazon, Google, Reddit, Spotify, Twitter, Wikipedia, Youtube, …
- Youtube Beatles (or Alan Parsons, Deep Purple, Emerson Lake and Palmer, Moody Blues, …)
- Raise volume, lower volume, mute sound, silence, …
- Mouse north (or east or south or west) n (n = how many pixels, e.g. ‘one thousand and twenty four’)
- Mouse click (or click here)
I can almost guarantee some of the above won’t work for you immediately. Your setup is likely different from mine, and you’ll need to find the best setting for your microphone sensitivity. You may get false negatives (Merlyn didn’t hear your command) if your mic sensitivity is too low. You may get false positives (Merlyn heard commands that you didn’t say) if the background noise (e.g. music or conversation) is too high.
A good way to adjust the mic is by using an audio tool such as Audacity. Record your voice and adjust the sensitivity until the displayed waveform almost clips, then turn it down slightly.
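If you prefer a quick numeric check to reading the waveform by eye, something like the following could report how close a test recording comes to clipping. It is a sketch under assumptions: the recording is a 16-bit PCM WAV file, and the function name peak_level is made up for this example (it is not part of Merlyn).

```python
import struct
import wave


def peak_level(path):
    """Return the peak amplitude of a 16-bit PCM WAV as a fraction of full scale."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    count = len(frames) // 2
    if count == 0:
        return 0.0
    # WAV samples are little-endian signed 16-bit integers.
    samples = struct.unpack("<{}h".format(count), frames[:count * 2])
    return max(abs(s) for s in samples) / 32768.0
```

A result close to 1.0 means the recording is at or near clipping; a very small value suggests the sensitivity is set too low.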
You may also not have the required programs installed. Check the cmds/*.txt files to see the required programs and either change them for your own equivalents, or install them. Be sure to read the ‘Changing and Adding Commands’ section below.
Invoke the demo with ‘merlyn demo’. Place the microphone near a speaker. It’s very likely that not every command will be properly understood. Bear in mind that the demo is a pre-recorded set of commands, so it cannot notice when a command isn’t understood or repeat it differently, as you would.
Download & Installation
Add this to your .bashrc or .merlyn:
alias mln="cd ~/Merlyn"
and run (source) it.
Changing and Adding Commands
The Merlyn directory contains:
cmds data demo lang listen.py Merlyn.py spells x
cmds, data, lang, spells and x are subdirectories. cmds contains some .txt files which specify the available commands. A command consists of a ‘name : value’ pair, or ‘command : spell’. The command is what the user can say, and the spell is the instruction passed to the OS, e.g.:
CALENDAR : zenity --calendar --title="Merlyn" 2>/dev/null
DATE : echo $(date) | $MLN_SPELLS/speak.py 2>/dev/null
TIME : echo $(date) | $MLN_SPELLS/speak.py 2>/dev/null
NAME : echo "My name is Merlyn." | $MLN_SPELLS/speak.py 2>/dev/null
WEATHER : curl wttr.in/$MLN_LOCATION
Most commands fit on one line; if a larger amount of code is needed, it can be put in a script and placed in the spells directory. The $MLN_* variables were set in your .bashrc or .merlyn file.
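To make the ‘command : spell’ format concrete, here is one way such lines could be read into a lookup table. This is a sketch, not Merlyn’s own loader: the function name parse_commands is hypothetical, and it assumes that the first ‘:’ separates command from spell, that blank lines and lines starting with ‘#’ are skipped, and that command names are matched in lowercase.

```python
def parse_commands(lines):
    """Map each spoken command to its spell, splitting on the first ':'."""
    commands = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are skipped
        name, sep, spell = line.partition(":")
        if not sep:
            continue  # no separator, so not a command line
        # assumption: names are normalised to lowercase for lookup
        commands[name.strip().lower()] = spell.strip()
    return commands


sample = [
    'CALENDAR : zenity --calendar --title="Merlyn" 2>/dev/null',
    "WEATHER : curl wttr.in/$MLN_LOCATION",
]
table = parse_commands(sample)
print(table["weather"])  # curl wttr.in/$MLN_LOCATION
```

Splitting only on the first ‘:’ means a spell may itself contain colons (a URL, say) without confusing the parser.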
When you’re ready, run lmtool.py to generate all.txt (read by Merlyn), the data/*.shw files (shown to the user when they say ‘show general’ or whatever), and the lang/corpus.txt file. The corpus file can be submitted to CMU’s lmtool to generate your language files. Download the tar file (I like to use wget inside the lang subdir, after deleting all the current contents). Take the 4-digit number the files came with, and edit Merlyn/listen.py to replace the current number.
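As a rough illustration of the corpus step only (what lmtool.py really does may differ), the sketch below turns a list of command phrases into corpus text. It assumes the corpus is plain text with one phrase per line; the function name build_corpus is hypothetical.

```python
def build_corpus(phrases):
    """Return corpus text with one unique, lowercased phrase per line."""
    unique = sorted({p.strip().lower() for p in phrases if p.strip()})
    return "".join(phrase + "\n" for phrase in unique)


print(build_corpus(["KEEP LISTENING", "STOP LISTENING", "DATE"]), end="")
```

Removing duplicates matters here because the same phrase can appear in more than one command file.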
I’m adding more commands and cleaning up. There is a primitive syntax for allowing commands to have parameters, currently numbers only, for mouse movement and calculator commands. This might need some extension for other use cases…
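Numeric parameters are spoken in words (e.g. ‘one thousand and twenty four’), so they have to be converted to integers somewhere. The sketch below shows one way to do that; it is not taken from Merlyn’s source, the name words_to_number is hypothetical, and it only handles non-negative whole numbers.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1000, "million": 1000000}


def words_to_number(phrase):
    """Convert e.g. 'one thousand and twenty four' to 1024."""
    total = 0    # finished groups (thousands, millions)
    current = 0  # group currently being built (0..999)
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100  # bare 'hundred' counts as 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError("not a number word: " + word)
    return total + current


print(words_to_number("one thousand and twenty four"))  # 1024
```

Raising an error on an unknown word, rather than ignoring it, makes a misheard parameter fail loudly instead of moving the mouse by the wrong amount.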
I’d welcome any contributions, e.g. command files or well-commented code. I’ll acknowledge any such contributions, and be sure to put your name and optional link in the file comments (after a ‘# ‘).
A Selection of Commands
(There are many more)
address field, address, close tab, close window, dot com, dot co uck, dot net, dot org, find on page, go back, go forward, go to browser, new tab, new window, next tab, open browser, page back, page forward, previous tab, quit browser, quit firefox, refresh page, search field, tab eight, tab five, tab four, tab nine, tab one, tab seven, tab six, tab three, tab two,
music, music next, music pause, music play, music prev, music show, music silence, open rhythmbox, go to rhythmbox, quit rhythmbox, search music,
calculator, open calculator, quit calculator, number <number> (spoken in words, e.g. one thousand and twenty four), add <number>, subtract <number>, times <number>, divided by <number>, press enter (for equals).
open v l c, go to v l c, quit v l c, play faster, normal speed, half size, full size, open volume control, go to volume control,
lower volume, mute sound, next track, pause play, play pause, previous track, raise volume, restart track, un mute sound,
backgammon, checkers, chess, gnome-chess, pychess, draughts, go, maelstrom, mastermind, pinball, reversi, supertuxkart,
calendar, date, time, name, weather,
audacity, genie, image editor, image viewer, leaf pad, note pad, oscilloscope, screencast, settings, software centre, terminal, virtualbox, webcam,
merlyn, exit, help, keep listening, stop listening, show applications, show general, show keyboard, show websites,
GENERIC DESKTOP COMMANDS:
always on top, close dialog, next window, page down, page up, quit application, save file,
WINDOWS & MENUS:
document menu, edit menu, file menu, format menu, help menu, insert menu, maximize window, tools menu, view menu,
files, open file manager, go to file manager, quit file manager,
open desktop, open documents, open downloads, open merlyn, open music, open pictures,
click and hold, click here, mouse click, double click, middle click, mouse location, mouse west, mouse east, mouse north, mouse south (last 4 to be followed by a number (of pixels))
letter a, … letter z, capital a, … capital z,
number zero, … number nine,
eff one, … eff twelve,
backspace, cancel, colon, comma, control a, … control z, control shift c, control shift e, control shift n, control shift o, control shift v, control shift w, delete, end line, enter, equals, escape, home, minus, period, pipe, plus, press delete, press end, press enter, press escape, press home, press tab, question mark, restore window, semicolon, space bar, space, star, super key, switch back, tab, tilde,
go down, go left, go right, go up, down arrow, left arrow, right arrow, up arrow,
ai linux, amazon, dictation, ebay, facebook, fractal art gallery, google, google mail, google news, here be dragons, linkedin, map, netflix, python 3 codes, reddit, search, sphinx tool, twitter, wikipedia, x do tools, youtube, tuxar,
MY FAVOURITE MUSIC:
amazon alan parsons, spotify alan parsons, youtube alan parsons, … amazon yes, spotify yes, youtube yes,