Writing a command and control application with voice recognition

Have you ever dreamed about controlling your PC with voice commands? Well, now you can (for a specific set of actions, at least)!

What do I need?
- A computer with Ubuntu (you can still do the same on other distributions, but this post won’t cover that).
- A microphone (a cheap one will do).
- One or more applications that you want to control and that can be driven by commands from the terminal.

Installation
Go over to my PPA and install the julius and julius-voxforge packages from there.

Writing the command and control application
Follow the instructions in /usr/share/doc/julius-voxforge/examples/README to create your own grammar, and then edit the command.py file to suit your needs (the simplest approach is to just edit the dictionary near line 60). Finally, to run it:

julius -quiet -input mic -C julian.jconf 2>/dev/null | ./command.py

Problems
I don’t really have much experience with Julius, but if you have problems with the instructions here, leave a comment or ping me on IRC (RainCT@Freenode) and I’ll try to help you. But first, look at the examples below to make sure that you’ve done everything right :).

More?
I’m currently working on improving those packages further and getting them into Ubuntu. Also, I may write another post in the future explaining how to create your own speech corpora and acoustic models, but I can’t promise anything.

Example on how to control Rhythmbox:

. example.voca:
% NS_B
<s> sil

% NS_E
</s> sil

% ID
DO d uw
COMP k ax m p

% COMMAND
PLAY p l ey
NEXT n eh k s t
PREV p r iy v
SHOW sh ow
UP ah p
DOWN d aw n
SILENCE s ay l ax n s

. sample.grammar
S: NS_B ID COMMAND NS_E
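
To see which sentences this grammar accepts, here is a small Python sketch that expands the single rule against the word categories from example.voca. It is only an illustration and not part of Julius or the julius-voxforge package; the names VOCA, RULE and expand are made up for this example.

```python
from itertools import product

# Word categories copied from example.voca (pronunciations left out).
VOCA = {
    "NS_B": ["<s>"],
    "NS_E": ["</s>"],
    "ID": ["DO", "COMP"],
    "COMMAND": ["PLAY", "NEXT", "PREV", "SHOW", "UP", "DOWN", "SILENCE"],
}

# The single rule from sample.grammar: S: NS_B ID COMMAND NS_E
RULE = ["NS_B", "ID", "COMMAND", "NS_E"]


def expand(rule, voca):
    """Return every word sequence the rule can produce, one string each."""
    choices = [voca[category] for category in rule]
    return [" ".join(words) for words in product(*choices)]


sentences = expand(RULE, VOCA)
for sentence in sentences:
    print(sentence)
```

Since the ID category contains both DO and COMP, the grammar accepts "COMP PLAY" as well as "DO PLAY", even though the usage list only mentions the DO form.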

. command.py’s parse function
def parse(line):
    # Lowercase every word of the recognized sentence.
    params = [param.lower() for param in line.split() if param]
    # Spoken command -> shell command to run.
    commands = {
        'play': 'rhythmbox-client --play',
        'silence': 'rhythmbox-client --pause',
        'next': 'rhythmbox-client --next',
        'prev': 'rhythmbox-client --previous',
        'show': 'rhythmbox-client --notify',
        'up': 'rhythmbox-client --volume-up',
        'down': 'rhythmbox-client --volume-down',
    }
    # The command is expected as the second word (after the ID word, e.g. DO);
    # ignore lines that are too short. Needs "import os" at the top of the file.
    if len(params) > 1 and params[1] in commands:
        os.popen(commands[params[1]])
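
For context, here is a rough sketch of how a script at the end of the pipe could pick the recognized sentence out of Julius’s output before handing it to a function like parse. I am assuming that the final result arrives on a line of the form "sentence1: <s> DO PLAY </s>"; check the output of your Julius version before relying on that. The names extract_words and main are invented for this sketch, and the real command.py may well do this differently.

```python
import sys


def extract_words(line):
    """Return the recognized words from a result line, or None.

    Assumption: result lines look like 'sentence1: <s> DO PLAY </s>'.
    """
    prefix = "sentence1:"
    if not line.startswith(prefix):
        return None
    words = line[len(prefix):].split()
    # Drop the silence markers that the grammar puts around each sentence.
    return [word for word in words if word not in ("<s>", "</s>")]


def main(stream=sys.stdin, handler=print):
    """Read recognizer output line by line and pass each sentence to handler."""
    for line in stream:
        words = extract_words(line.strip())
        if words:
            handler(" ".join(words))
```

Calling main(handler=parse) at the bottom of the script would then feed strings such as "DO PLAY" to the parse function; with the default handler the sentences are just printed, which is handy for checking what the recognizer actually heard.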

. Usage: (Action - Verbal command)
Play - DO PLAY
Pause - DO SILENCE (I didn’t use “DO PAUSE” because it had a very high error rate)
Next song - DO NEXT
Previous song - DO PREV (“DO PREVIOUS” can’t be used because VoxForge’s acoustic models don’t support some of its phonemes)
Show the name of the current song - DO SHOW
Raise Rhythmbox’s volume - DO UP
Lower Rhythmbox’s volume - DO DOWN

Random tip:
You can have the computer respond to your commands using either espeak "text to say" or, if you have Festival installed (which sounds more natural), festival -b '(SayText "text to say")'.
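
If you want to do this from Python, something along these lines should work. It is a minimal sketch that only wraps the two command lines mentioned above; speak_command and speak are hypothetical helper names, and the chosen engine has to be installed for speak to do anything.

```python
import subprocess


def speak_command(text, engine="espeak"):
    """Build the argument list for saying text with espeak or Festival."""
    if engine == "espeak":
        return ["espeak", text]
    if engine == "festival":
        # Escape backslashes and double quotes so that the text stays
        # a single string inside the SayText expression.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return ["festival", "-b", '(SayText "%s")' % escaped]
    raise ValueError("unknown engine: %r" % engine)


def speak(text, engine="espeak"):
    """Say text out loud (requires the chosen engine to be installed)."""
    subprocess.call(speak_command(text, engine))
```

Passing the arguments as a list means no shell is involved, so you don’t have to worry about shell quoting of the text. For example, speak("Playing") could be called right after a command has been recognized.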

Happy hacking!

10 Responses to “Writing a command and control application with voice recognition”

  1. Arnau Says:

    Cool!!! I’ll try it when I have time!

  2. Loffe Says:

    Wow. It’s cool to control music by voice. Unfortunately it doesn’t work when I play songs very loud :P

    Do you know if there’s a project to make voice control easy for the user? Something to control apps with a simple GUI…

  3. Peng’s posts for Sunday, 27 July « I’m Just an Avatar Says:

    […] Gevatter: Writing a command and control application with voice command. Let’s admit it. Most of us Trek fans have long waited for the day that we can control our […]

  4. RainCT Says:

    [@ Loffe]

    Yeah, that’s true about the loud music, but I think it works quite well anyway. While I was writing the script I wondered if it would recognize anything while I had music playing, but as you see, up to a certain point it does :).

    About GUI, I know of the two following ones, but I haven’t tried them yet:

    - Simon (http://simon-listens.org/index.php?id=122&L=1), which uses Julius, but it doesn’t seem to build on Hardy.

    - The panel applet gnome-voice-control (http://live.gnome.org/GnomeVoiceControl), which uses a different engine, Sphinx (which from my previous experience with it seems to require a pretty good microphone - using the cheap one I have it only recognizes like 1 word out of 50). It is in the repositories, but the version in Hardy doesn’t work (although I just spoke with the maintainer and he is working on fixing it now).

  5. Gunni Says:

    Just in case anyone else stumbles upon the error I got:
    The .dict file was not built; I had to use “sudo mkdfa sample” to build the dict and dfa files. I don’t know why, because the directory I created was in my home folder, belonged to me, and I had run chmod 777 -R on it.

  6. VoxForge Says:

    Hi RainCT,

    thank you for creating julius and julius-voxforge in intrepid, that will make my project for school much simpler! Will it be updated when the new audio speech-files in VoxForge are processed? (Now there are around 50 new speech submissions.)

  7. RainCT Says:

    VoxForge: Hey. I’m happy that it’s useful for you :). Drop me a mail once the new version is released and I’ll see if I can get it in, but I can’t promise anything as Feature Freeze is about to start (which means that no new features can enter Intrepid).

  8. VoxForge Says:

    Hi RainCT,

    thank you for your quick reply!
    Is speech recognition accuracy really a feature? I thought that a feature is a new package with new options. It depends, I think, on the reviewer. Maybe it would be better not to release it now, but to wait a few months and then try to release it in intrepid-backports, as it might be a bigger improvement by then.
    I don’t know when Ken comes back from holiday and will process the submissions, but I’ll drop you a mail when it is ready.
    On the other hand, if it doesn’t get into Intrepid, is it possible to simply overwrite the acoustic model in order to improve the accuracy?

  9. RainCT Says:

    Voxforge: Well, “new upstream versions” are considered features, unless they are “bugfix only”. But as VoxForge is not an application it may well be possible to get an exception. As for your last question, you can of course use any acoustic model you want (in julian.jconf, just put the path to the one you want in place of /usr/share/julius-voxforge/acoustic/).

  10. VoxForge Says:

    Thank you!
