Merlyn: Computer Control by Speech Recognition

12Apr - by Alan - 0 - In Linux

Merlyn is a Python program for computer control by speech. It uses the CMU PocketSphinx system, and is capable of supporting many tasks that you might want to do on a regular PC, using voice only. It is currently Linux only and is an alpha release for proof-of-concept.

In particular, you can:

  • play local music files
  • open specified websites
  • check the weather, the calendar
  • control the mouse and keyboard
  • control tools such as the file manager
  • switch windows, open menus, save files…
  • launch apps, e.g. calculator, editors, browser…
  • surf the web, e.g. Amazon, eBay, Google, Netflix, Reddit, Spotify, Twitter, Wikipedia, Youtube

It can operate online or offline. The default mode is online because the default voice is provided by Google’s gTTS, but this can easily be swapped for another, e.g. Festival. You can choose a male or female voice, which is why ‘Merlyn’ was chosen over ‘Merlin’ (it is more gender-neutral).

I created it because I wanted voice control available in AI Linux. Some of the commenters there seemed to be hoping that it was a J.A.R.V.I.S.-like OS. Well, it wasn’t designed for that purpose, but I thought it might be interesting to add some voice-control capability. I checked out a lot of existing voice control systems for Linux but experienced problems with all of them. The main problem was that they were either too old or too new! The old ones hadn’t been updated for years (many appear to have been abandoned) and no longer installed cleanly. The new ones look promising, but aren’t yet ready for reliable use.

Quick Usage Overview

$ merlyn
| Say 'Merlyn' to make him/her listen.
| Merlyn will obey the next command. If that is 'keep listening'
| then Merlyn will continue to obey commands until you say 'stop
| listening'. Say 'help' to see this message again, and to get
| further help.
< merlyn
> Yes Alan
< time
> Fri Mar 31 11:04:41 BST 2017
< merlyn
> Yes Alan
< keep listening
> Listening
< youtube alan parsons
Created new window in existing browser session.
< close tab
< youtube emerson lake and palmer
Created new window in existing browser session.
< raise volume
< show general
< one
> Didn't understand: one
< show websites
< stop listening
> Speak my name for your next command.

Some of Merlyn’s responses (indicated by ‘>’) are also spoken, e.g. ‘Yes Alan’ (the name can be configured to be yours). Lines starting with ‘<’ show what Merlyn heard. By default, Merlyn obeys only your next command, unless you tell her to ‘keep listening’; later, you can tell her to ‘stop listening’. In the example, I started listening to Alan Parsons, but changed my mind in favour of ELP, and I wanted it louder. Then I asked to see the list of general commands. There are also commands for keyboard and mouse control, browser control and websites, and editing. You can easily add your own commands by editing the existing command files, or by creating a new one. The ‘one’ that Merlyn didn’t understand was due to background noise (ELP was probably too loud…); ‘one’ isn’t a command in my configuration. Finally, I told Merlyn to stop listening until I call her name again.
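
The listening rules described above can be sketched as a small state machine. This is only an illustration of the behaviour seen in the transcript, not Merlyn’s actual code: the function name run_session is hypothetical, and how the wake word is treated while Merlyn is already listening is my assumption.

```python
def run_session(heard, wake_word="merlyn"):
    """Return the phrases that would be obeyed as commands.

    `heard` is a sequence of recognised phrases, in the order spoken.
    """
    obeyed = []
    armed = False       # True right after the wake word: obey one command
    continuous = False  # True between 'keep listening' and 'stop listening'

    for phrase in heard:
        phrase = phrase.strip().lower()
        if phrase == wake_word:
            # Assumption: the wake word always (re)arms the listener.
            armed = True
            continue
        if not (armed or continuous):
            continue  # idle: everything except the wake word is ignored
        armed = False  # a single-command window is used up by this phrase
        if phrase == "keep listening":
            continuous = True
        elif phrase == "stop listening":
            continuous = False
        else:
            obeyed.append(phrase)
    return obeyed


if __name__ == "__main__":
    spoken = ["merlyn", "time", "merlyn", "keep listening",
              "youtube alan parsons", "close tab", "stop listening",
              "raise volume"]
    print(run_session(spoken))
```

In this sketch the final ‘raise volume’ is ignored, because it was spoken after ‘stop listening’ and without the wake word.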

The following commands should get you started:

  • Merlyn
  • Help
  • Keep listening, stop listening
  • Show general (or websites, or keyboard, or applications…)
  • Amazon, Google, Reddit, Spotify, Twitter, Wikipedia, Youtube, …
  • Youtube Beatles (or Alan Parsons, Deep Purple, Emerson Lake and Palmer, Moody Blues, …)
  • Raise volume, lower volume, mute sound, silence, …
  • Mouse north (or east, south or west) n (n = how many pixels, spoken in words, e.g. ‘one thousand and twenty four’)
  • Mouse click (or click here)

I can almost guarantee that some of the above won’t work for you straight away. Your setup is likely to differ from mine, and you’ll need to find the best setting for your microphone sensitivity. You may get false negatives (Merlyn didn’t hear your command) if your mic sensitivity is too low. You may get false positives (Merlyn heard commands that you didn’t say) if the background noise (e.g. music or conversation) is too loud.

A good way to adjust the mic is by using an audio tool such as Audacity. Record your voice and adjust the sensitivity until the displayed waveform almost clips, then turn it down slightly.
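
If you prefer a number to eyeballing the waveform, the same check can be sketched in Python. This is a rough illustration and not part of Merlyn: peak_fraction is a hypothetical helper, and it assumes you have saved a short test recording as a 16-bit PCM WAV file.

```python
import array
import sys
import wave


def peak_fraction(path):
    """Return the loudest sample as a fraction of 16-bit full scale (0.0 to 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM recordings")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768.0
```

A result very close to 1.0 suggests the recording is clipping, so the sensitivity should come down slightly; a result far below 1.0 suggests the mic could be turned up.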

You may also not have the required programs installed. Check the cmds/*.txt files to see the required programs and either change them for your own equivalents, or install them. Be sure to read the ‘Changing and Adding Commands’ section below.

Invoke the demo with ‘merlyn demo’ and place the microphone near a speaker. It’s very likely that not every command will be properly understood. Bear in mind that the demo is a pre-recorded set of commands: it has no way to notice if a command wasn’t understood, and no way to repeat it differently, as you would.

Download & Installation

Download from GitHub.

Add this to your .bashrc or .merlyn:

# Merlyn
alias mln="cd ~/Merlyn"
alias merlyn="~/Merlyn/listen.py"
export MLN_BROWSER=google-chrome
export MLN_DATA=/home/c/Merlyn/data
export MLN_FM="pcmanfm"
export MLN_LOCATION=Neath
export MLN_MYNAME=Alan
export MLN_SPELLS=/home/c/Merlyn/spells

Then source the file so that the settings take effect.
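
A quick way to confirm that the variables reached your environment is a check like the one below. It is a minimal sketch, not something shipped with Merlyn: the variable names are the ones listed above, while the function name missing_settings is hypothetical.

```python
import os

# The MLN_* variables exported in .bashrc or .merlyn.
MLN_VARIABLES = (
    "MLN_BROWSER",
    "MLN_DATA",
    "MLN_FM",
    "MLN_LOCATION",
    "MLN_MYNAME",
    "MLN_SPELLS",
)


def missing_settings(environ=None):
    """Return the names of MLN_* variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in MLN_VARIABLES if not environ.get(name)]
```

Calling missing_settings() in a fresh terminal should return an empty list; any names it does return were not exported in that shell.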

Changing and Adding Commands

The Merlyn directory contains:

cmds  data  demo  lang  listen.py  Merlyn.py  spells  x

cmds, data, lang, spells and x are subdirectories. cmds contains .txt files which specify the available commands. Each command is a ‘name : value’ pair, or more specifically ‘command : spell’. The command is what the user can say, and the spell is the instruction passed to the OS, e.g.:

CALENDAR : zenity --calendar --title="Merlyn" 2>/dev/null
DATE : echo $(date) | $MLN_SPELLS/speak.py 2>/dev/null
TIME : echo $(date) | $MLN_SPELLS/speak.py 2>/dev/null
NAME : echo "My name is Merlyn." | $MLN_SPELLS/speak.py 2>/dev/null
WEATHER : curl wttr.in/$MLN_LOCATION

Most commands fit on one line; if a larger amount of code is needed, it can be put in a script and placed in the spells directory. The $MLN_* variables were set in your .bashrc or .merlyn file.
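
To make the ‘command : spell’ format concrete, here is a minimal sketch of how such lines could be read and run. It is not Merlyn’s own parser: parse_commands and run_spell are hypothetical names, and lowercasing the command names, skipping ‘#’ comment lines, and splitting on the first colon are my assumptions.

```python
import subprocess


def parse_commands(lines):
    """Build a {spoken command: spell} dictionary from 'command : spell' lines."""
    commands = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so colons inside the spell survive.
        name, sep, spell = line.partition(":")
        if not sep:
            continue  # not a 'command : spell' line
        commands[name.strip().lower()] = spell.strip()
    return commands


def run_spell(spell):
    """Hand the spell to the shell, which expands $MLN_* variables."""
    return subprocess.run(spell, shell=True).returncode


if __name__ == "__main__":
    sample = [
        "# information commands",
        'CALENDAR : zenity --calendar --title="Merlyn" 2>/dev/null',
        "WEATHER : curl wttr.in/$MLN_LOCATION",
    ]
    for name, spell in parse_commands(sample).items():
        print(name, "->", spell)
```

Running spells through the shell is what lets a one-line spell use pipes, redirection and environment variables, at the cost of trusting whatever is in the command files.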

When you’re ready, run lmtool.py to generate all.txt (read by Merlyn), the data/*.shw files (shown to the user when they say ‘show general’ or similar), and the lang/corpus.txt file. The corpus file can be submitted to CMU’s lmtool to generate your language files. Download the resulting tar file (I like to use wget inside the lang subdirectory, after deleting all of its current contents). Take the 4-digit number that the generated files came with, and edit Merlyn/listen.py to replace the current number with it.
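
Picking out the 4-digit number can be scripted. The sketch below is an illustration only: find_model_number is a hypothetical helper, and it assumes the tar file has been unpacked and that the number is the part of each generated file name before the first dot.

```python
import os
import re


def find_model_number(filenames):
    """Return the single 4-digit prefix shared by the generated language files."""
    numbers = set()
    for name in filenames:
        # Assumption: generated files are named '<4 digits>.<extension>'.
        match = re.fullmatch(r"(\d{4})\..+", name)
        if match:
            numbers.add(match.group(1))
    if len(numbers) != 1:
        raise ValueError("expected exactly one 4-digit number, found: %s"
                         % sorted(numbers))
    return numbers.pop()


def model_number_in(directory):
    """Look for the number among the files in a directory such as lang."""
    return find_model_number(os.listdir(directory))
```

The error on zero or several numbers is deliberate: it catches a lang directory that is empty or that still holds files from an earlier run.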

Ongoing Development

I’m adding more commands and cleaning up. There is a primitive syntax for allowing commands to have parameters, currently numbers only, for mouse movement and calculator commands. This might need some extension for other use cases…
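
To illustrate what a numeric parameter involves, here is a sketch that turns a spoken phrase such as ‘mouse north one thousand and twenty four’ into a command and a number. It is not Merlyn’s parameter syntax, just one way the idea could work; words_to_number and split_command are hypothetical names.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1000, "million": 1000000}
NUMBER_WORDS = set(UNITS) | set(TENS) | set(SCALES) | {"hundred", "and"}


def words_to_number(text):
    """Convert e.g. 'one thousand and twenty four' to 1024."""
    total = 0    # value of completed thousand/million groups
    current = 0  # value of the group being built
    seen = False
    for word in text.lower().split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError("not a number word: %r" % word)
        seen = True
    if not seen:
        raise ValueError("no number words found")
    return total + current


def split_command(phrase):
    """Split a phrase into (command, number); number is None if absent."""
    words = phrase.lower().split()
    start = len(words)
    # Walk back from the end over the trailing run of number words.
    while start > 0 and words[start - 1] in NUMBER_WORDS:
        start -= 1
    # A leading 'and' is treated as part of the command, not the number.
    while start < len(words) and words[start] == "and":
        start += 1
    if start == len(words):
        return " ".join(words), None
    return " ".join(words[:start]), words_to_number(" ".join(words[start:]))
```

One limitation of taking the trailing number words as the parameter is that a command which itself ends in a number word (such as ‘tab one’) would be split too, so a real implementation would need to know which commands accept a parameter.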

I’d welcome any contributions, e.g. command files or well-commented code. I’ll acknowledge any such contributions, and be sure to put your name and optional link in the file comments (after a ‘# ‘).

A Selection of Commands

(There are many more)

Applications

BROWSERS:
address field, address, close tab, close window, dot com, dot co uck, dot net, dot org, find on page, go back, go forward, go to browser, new tab, new window, next tab, open browser, page back, page forward, previous tab, quit browser, quit firefox, refresh page, search field, tab eight, tab five, tab four, tab nine, tab one, tab seven, tab six, tab three, tab two,
RHYTHMBOX:
music, music next, music pause, music play, music prev, music show, music silence, open rhythmbox, go to rhythmbox, quit rhythmbox, search music,
CALCULATOR:
calculator, open calculator, quit calculator, number <number> (spoken in words, e.g. one thousand and twenty four), add <number>, subtract <number>, times <number>, divided by <number>, press enter (for equals).
VLC:
open v l c, go to v l c, quit v l c, play faster, normal speed, half size, full size, open volume control, go to volume control,
MEDIA:
lower volume, mute sound, next track, pause play, play pause, previous track, raise volume, restart track, un mute sound,
GAMES:
backgammon, checkers, chess, gnome-chess, pychess, draughts, go, maelstrom, mastermind, pinball, reversi, supertuxkart,
INFORMATION:
calendar, date, time, name, weather,
MISCELLANEOUS APPS:
audacity, genie, image editor, image viewer, leaf pad, note pad, oscilloscope, screencast, settings, software centre, terminal, virtualbox, webcam,

General

MERLYN CONTROL:
merlyn, exit, help, keep listening, stop listening, show applications, show general, show keyboard, show websites,
GENERIC DESKTOP COMMANDS:
always on top, close dialog, next window, page down, page up, quit application, save file,
WINDOWS & MENUS:
document menu, edit menu, file menu, format menu, help menu, insert menu, maximize window, tools menu, view menu,
FILE MANAGER:
files, open file manager, go to file manager, quit file manager,
MAIN FOLDERS:
open desktop, open documents, open downloads, open merlyn, open music, open pictures,

Keyboard

MOUSE:
click and hold, click here, mouse click, double click, middle click, mouse location, mouse west, mouse east, mouse north, mouse south (last 4 to be followed by a number (of pixels))
ALPHABET:
letter a, … letter z, capital a, … capital z,
NUMBERS:
number zero, … number nine,
FUNCTION KEYS:
eff one, … eff twelve,
OTHER KEYSTROKES:
backspace, cancel, colon, comma, control a, … control z, control shift c, control shift e, control shift n, control shift o, control shift v, control shift w, delete, end line, enter, equals, escape, home, minus, period, pipe, plus, press delete, press end, press enter, press escape, press home, press tab, question mark, restore window, semicolon, space bar, space, star, super key, switch back, tab, tilde,
ARROW KEYS:
go down, go left, go right, go up, down arrow, left arrow, right arrow, up arrow,

Websites

ai linux, amazon, dictation, ebay, facebook, fractal art gallery, google, google mail, google news, here be dragons, linkedin, map, netflix, python 3 codes, reddit, search, sphinx tool, twitter, wikipedia, x do tools, youtube, tuxar,
MY FAVOURITE MUSIC:
amazon alan parsons, spotify alan parsons, youtube alan parsons, … amazon yes, spotify yes, youtube yes,
