Voice control with Amazon Alexa
Voice control for a smart home is a real challenge. Amazon Alexa and Google Home advertise exactly that, but once you try it you soon realize how limited it is. It works quite well with a small number of lamps, sockets and maybe even the heating. But with more complex scenarios involving hundreds of controllable elements, you quickly reach the limits of the built-in skills.
With the Android app HABDroid and openHAB, rudimentary speech recognition is already possible by default. The app is limited to using Google speech recognition and sending the recognized text to openHAB via the REST API. There, the text is processed further in a rule, usually by searching it for keywords with regular expressions and then triggering the corresponding actions.
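To illustrate the keyword approach, here is a minimal sketch in Python. It is for illustration only: an actual openHAB rule would be written in openHAB's own rule language, and the patterns and item names (`Light_Livingroom`, `Shutter_Livingroom`) are hypothetical.

```python
import re

# Hypothetical table: regex pattern -> (item name, command).
# The item names are made up for this example.
RULES = [
    (re.compile(r"\blight\b.*\bon\b", re.IGNORECASE), ("Light_Livingroom", "ON")),
    (re.compile(r"\blight\b.*\boff\b", re.IGNORECASE), ("Light_Livingroom", "OFF")),
    (re.compile(r"\bshutters?\b.*\bdown\b", re.IGNORECASE), ("Shutter_Livingroom", "DOWN")),
]


def match_command(text):
    """Return the (item, command) pair of the first matching pattern, or None."""
    for pattern, action in RULES:
        if pattern.search(text):
            return action
    return None
```

Every new phrasing or device needs another pattern, which is why this approach stops scaling once many rooms and devices are involved.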
My first implementation was based on a separate app that basically works on the same principle as HABDroid. The key difference is that, after the text has been sent to openHAB, the app waits for a response and displays it as a toast message, i.e. you get feedback from openHAB about what was recognized and executed.
Since this variant always needs an app and thus inevitably a mobile phone, I also experimented with a second variant, which is now in permanent use at my home. It is based on a custom Alexa skill available at the following URL.
Sentence analysis and semantic detection
The text is not simply searched for words that then flow directly into an action. Instead, the sentence is decomposed in a structured way and analyzed for what, where and how.
In this way you can form more complex, compound sentences:
“All shutters down and in the living room the indirect light to 30%”.
“Light in the hallway and in the living room and in the bedroom the shutters down”
In the second example, the system recognizes that the living room is also about the light, and that the light in the hallway should be switched on as well.
The sentence decomposition follows this scheme:
- First, the sentence is split at words like “and”
- Next, each part is searched for a group (light, dimmer, socket, roller shutter)
- If more than one room is found in a part, that part is split again
- Now the group is filled in for the parts in which none was recognized. This happens first backward and then forward; backward first because the last-mentioned groups have a higher priority
- Now the action is recognized for each part. This depends on the group: “up” and “down” are only possible for shutters, “on” and “off” only for lights, and percentages only for dimmers
- Afterwards, the action is filled in for all parts in the same way as the group: first backward and then forward
- Finally, the rooms are filled in for the parts. This happens first forward and then backward. Again, the order follows from how people speak: a room named first also applies to the following part if that part does not mention a room explicitly
- If everything went well, you now have the following information for each part: the room, the group (light etc.) and the action (on / off, up / down)
- Finally, the group and the room are used to determine exactly which light, socket or roller shutter is meant
- If no room is detected, the location of the Echo Dot that picked up the sentence is used. Example: when I say “light on” in the living room, the system knows that the living room light is meant
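The steps above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the skill's actual code: the vocabulary is hypothetical, dimmers and percentages are left out, actions are only inherited if they are valid for the part's group, and a light or socket without a spoken action defaults to “on”. The pass order used here (inherit from the preceding part first, then from the following part) is also an assumption, chosen because it reproduces the second example sentence; it is a parameter, so it can be changed per field.

```python
import re

# Hypothetical vocabulary; a real setup would derive it from the openHAB items.
GROUPS = {"light": "light", "lamp": "light", "socket": "socket",
          "shutter": "shutter", "shutters": "shutter"}
ROOMS = ["living room", "hallway", "bedroom", "kitchen"]
# Actions that are valid per group, plus an assumed default if none is spoken.
ACTIONS = {"light": ["on", "off"], "socket": ["on", "off"], "shutter": ["up", "down"]}
DEFAULT_ACTION = {"light": "on", "socket": "on"}


def find_word(text, words):
    """Return the first entry of `words` that occurs as a whole word in `text`."""
    for word in words:
        if re.search(r"\b" + re.escape(word) + r"\b", text):
            return word
    return None


def split_on_rooms(part):
    """Split a part again if it mentions more than one room."""
    pattern = "|".join(r"\b" + re.escape(room) + r"\b" for room in ROOMS)
    starts = [m.start() for m in re.finditer(pattern, part)]
    if len(starts) < 2:
        return [part]
    cuts = [0] + starts[1:] + [len(part)]
    return [part[a:b].strip() for a, b in zip(cuts, cuts[1:])]


def fill(values, order=("previous", "next"), accept=lambda i, v: True):
    """Fill None entries from neighbouring parts, one pass per entry in `order`."""
    values = list(values)
    for direction in order:
        indices = list(range(len(values)))
        if direction == "next":
            indices.reverse()  # walk right to left: inherit from the following part
        carried = None
        for i in indices:
            if values[i] is not None:
                carried = values[i]
            elif carried is not None and accept(i, carried):
                values[i] = carried
    return values


def parse(sentence):
    """Return one (room, group, action) tuple per recognized part."""
    text = sentence.lower()
    parts = [p.strip() for p in re.split(r"\band\b", text) if p.strip()]
    parts = [piece for p in parts for piece in split_on_rooms(p)]

    groups = fill([GROUPS.get(find_word(p, GROUPS)) for p in parts])
    # Actions are group dependent, so they are detected after the groups are known.
    actions = [find_word(p, ACTIONS.get(g, [])) for p, g in zip(parts, groups)]
    actions = fill(actions, accept=lambda i, a: a in ACTIONS.get(groups[i], []))
    actions = [a or DEFAULT_ACTION.get(g) for a, g in zip(actions, groups)]
    rooms = fill([find_word(p, ROOMS) for p in parts])
    return list(zip(rooms, groups, actions))
```

Calling `parse("Light in the hallway and in the living room and in the bedroom the shutters down")` yields the light switched on in the hallway and the living room, and the shutter moved down in the bedroom.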
In this way you can form relatively flexible sentences.
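The final lookup, including the fallback to the location of the Alexa device, might look like the following sketch. The item registry, the item names and the device IDs are all hypothetical; how the skill actually stores this mapping is not described here.

```python
# Hypothetical registry: (room, group) -> openHAB item names.
ITEMS = {
    ("living room", "light"): ["Light_Livingroom_Ceiling", "Light_Livingroom_Indirect"],
    ("hallway", "light"): ["Light_Hallway"],
    ("bedroom", "shutter"): ["Shutter_Bedroom"],
}
# Hypothetical mapping from an Alexa device ID to the room the device stands in.
DEVICE_ROOMS = {"echo-dot-livingroom": "living room"}


def resolve_items(room, group, device_id):
    """Return the items for a room and group, falling back to the device's room."""
    if room is None:
        # No room was spoken: assume the room of the device that heard the sentence.
        room = DEVICE_ROOMS.get(device_id)
    return ITEMS.get((room, group), [])
```

With this table, “light on” spoken to the living room device resolves to both living room lights, while an unknown device or combination simply returns an empty list.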