Using IoT and Computer Vision to Build a Stand-in Smart Robot for Remote Workers
Guest Writer
Employees of many companies are used to joining meetings no matter where they are, using tools such as Slack, Google Hangouts, and Zoom. A sweet consequence of this is often a generous work-from-home policy, allowing them to work remotely whenever they want.
Trouble is, working remotely means missing out on all the fun that takes place outside of meetings, while they're not connected. Wouldn't it be cool to have a robot representing us at the office, showing us what goes on while we're not there?
A group of IoT, computer vision and full-stack specialists at Tryolabs got really enthusiastic about the idea and went on a mission to create a smart robot that could be remotely controlled from home and show us what takes place at the office.
They had three rules to follow:
[caption id="attachment_34236" align="aligncenter" width="1200"]
(Hackathon team: JoaquĂn, Braulio, Javier, and Lucas) Image Credit: Tryolabs[/caption]To have a robot that can easily move around the office, it must be mobile, stable, small enough to pass through doors, and big enough to not be overseen and trampled on by the team working at the office. We went through several iterations of its structural design before settling on this one:
[caption id="attachment_34237" align="aligncenter" width="1200"]
Image Credit: Tryolabs[/caption]We chose aluminum as the main material for the components since itâs light, robust, and cheap.
Once we defined the design and selected the materials, we cut the aluminum parts and put them together with small screws. Since we had to work with the tools available at the office and from the store around the corner, this was a rather improvised and humorous process. We used heavy books to shape the aluminum and sunglasses as safety glasses while drilling into the components, just to give you an idea.
The main hardware components we settled on were:
While evaluating various approaches to the communication between the robot and the remote worker, we came across WebRTC, which promised to be the tool we were looking for:
"WebRTC is ideal for telepresence, intercom, VoIP software in general as it has a very powerful standard and modern protocol which has a number of features and is compatible with various browsers, including Firefox, Chrome, Opera, etc. The WebRTC extension for the UV4L Streaming Server allows for streaming of multimedia content from audio, video, and data sources in real-time as defined by the WebRTC protocol." (WebRTC)

Specifically, we used the WebRTC extension included in UV4L. This tool allowed us to create bidirectional communication with extremely low latency between the robot and the remote worker's computer.
Running the UV4L server with the WebRTC extension enabled, we were able to serve a web app from the Raspberry Pi and simply access it from the remote worker's browser, establishing real-time bidirectional communication. Amazing!
This allowed us to set up a unidirectional channel for the video from the PiCamera to the browser, a bidirectional channel for the audio, and an extra unidirectional channel to send the commands from the browser to the robot.
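To make that setup concrete, here is a minimal sketch of the browser side using the standard WebRTC API. The signalling endpoint, element id, and message shape below are placeholders for illustration only; in practice, UV4L's own web app example handles the signalling exchange.

```javascript
// A minimal sketch, not UV4L's exact signalling flow. The WebSocket URL, element id,
// and message shape are assumptions for illustration.
const signaling = new WebSocket('ws://raspberrypi.local:8080/stream/webrtc');

const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
});

// Unidirectional video: the browser only receives the PiCamera stream.
pc.addTransceiver('video', { direction: 'recvonly' });

// Bidirectional audio: send the remote worker's microphone and receive the robot's.
navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  stream.getAudioTracks().forEach((track) => pc.addTrack(track, stream));
});

// Extra unidirectional channel for driving commands (browser -> robot).
const commands = pc.createDataChannel('commands');

// Show the incoming media in a <video autoplay> element on the page.
pc.ontrack = (event) => {
  document.querySelector('#robot-view').srcObject = event.streams[0];
};

// Hand ICE candidates to the signalling server so the peers can connect.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    signaling.send(JSON.stringify({ what: 'addIceCandidate', data: event.candidate }));
  }
};
```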
Inspired by the web app example from the UV4L project, we integrated the data channels mentioned above into a basic but functional front-end, including the following components:
https://gist.github.com/fa119fed3eb164c7d80930c7971591f4
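For the command channel itself, the control logic boils down to mapping key presses to messages sent over the data channel. The sketch below assumes the `commands` data channel from the previous snippet; the command names and JSON shape are placeholders, since the gist above contains the actual implementation.

```javascript
// Sketch of the keyboard controls; `commands` is the data channel created earlier.
// The command strings and JSON shape are placeholders for illustration.
const KEY_TO_COMMAND = {
  ArrowUp: 'forward',
  ArrowDown: 'backward',
  ArrowLeft: 'left',
  ArrowRight: 'right',
};

document.addEventListener('keydown', (event) => {
  const command = KEY_TO_COMMAND[event.key];
  if (command && commands.readyState === 'open') {
    commands.send(JSON.stringify({ command }));
  }
});

// Stop the motors as soon as the key is released.
document.addEventListener('keyup', () => {
  if (commands.readyState === 'open') {
    commands.send(JSON.stringify({ command: 'stop' }));
  }
});
```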
As a result, we were able to control the robot, "walk" it through the office, and enable remote workers to see their teammates and approach them via the robot.
However, it wasn't enough for our enthusiastic team, and we continued to pursue the ultimate goal: having an autonomous robot.
A recently released project called PoseNet quickly surfaced. It's presented as a "machine learning model, which allows for real-time human pose estimation in the browser," so we dug deeper into it.
The performance of the neural net was astounding, and it was especially attractive because we could run it with TensorFlow.js in the browser. This way, we got a higher accuracy and FPS rate than by running it on the Raspberry Pi, and lower latency than if we had run it on a third server instead.
Rushed by the hackathon's time constraints, we skimmed the project's documentation and demo web app source code. Once we identified which files we were going to need, we imported them and immediately jumped into integrating these functionalities into our web app.
We wrote a basic detectBody function to infer the pose estimation key points, which invoked net.estimateMultiplePoses with these params:
https://gist.github.com/4858b3d42c6d1cab16a3cd7dbcfc0c77
Said detectBody function was invoked three times per second to refresh the pose estimation key points.
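For reference, a minimal sketch of such a detectBody function could look like the following. The parameter values are illustrative defaults rather than the exact ones from our gist, and the call assumes PoseNet's original positional API (newer releases take a single options object instead).

```javascript
import * as posenet from '@tensorflow-models/posenet';

let net;

// Estimate the poses of everyone visible in the robot's video feed.
async function detectBody(video) {
  if (!net) {
    net = await posenet.load(0.75); // MobileNet multiplier: smaller is faster, less accurate
  }
  return net.estimateMultiplePoses(
    video, // the <video> element fed by the WebRTC stream
    0.5,   // imageScaleFactor: downscale the input before inference
    false, // flipHorizontal
    16,    // outputStride: larger is faster but coarser
    5,     // maxPoseDetections
    0.5,   // scoreThreshold for individual key points
    20     // nmsRadius in pixels for non-maximum suppression
  );
}

// Refresh the pose estimation key points roughly 3 times per second.
setInterval(async () => {
  const poses = await detectBody(document.querySelector('#robot-view'));
  console.log(poses); // each pose has a score and 17 named key points with positions
}, 333);
```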
Then, we adapted some util functions to draw the detected body key points and plot the skeleton over the video, arriving at a demo like this:
https://www.youtube.com/watch?v=3DhSR67Uj4Q
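The overlay itself can be approximated with plain canvas calls plus PoseNet's getAdjacentKeyPoints helper, roughly as sketched below. The element id and confidence threshold are placeholders; our actual helpers were adapted from the util functions in the PoseNet demo.

```javascript
import * as posenet from '@tensorflow-models/posenet';

const MIN_CONFIDENCE = 0.3; // placeholder threshold

// Draw the detected key points and their skeletons on a canvas laid over the video.
function drawPoses(poses) {
  const canvas = document.querySelector('#overlay');
  const ctx = canvas.getContext('2d');
  ctx.clearRect(0, 0, canvas.width, canvas.height);

  for (const pose of poses) {
    // One dot per confidently detected key point (nose, shoulders, elbows, ...).
    for (const keypoint of pose.keypoints) {
      if (keypoint.score < MIN_CONFIDENCE) continue;
      ctx.beginPath();
      ctx.arc(keypoint.position.x, keypoint.position.y, 4, 0, 2 * Math.PI);
      ctx.fillStyle = 'aqua';
      ctx.fill();
    }

    // Connect adjacent key points to plot the skeleton.
    for (const [from, to] of posenet.getAdjacentKeyPoints(pose.keypoints, MIN_CONFIDENCE)) {
      ctx.beginPath();
      ctx.moveTo(from.position.x, from.position.y);
      ctx.lineTo(to.position.x, to.position.y);
      ctx.strokeStyle = 'aqua';
      ctx.lineWidth = 2;
      ctx.stroke();
    }
  }
}
```

Calling drawPoses(poses) from the detectBody loop above, instead of just logging the poses, is enough to get the skeleton plotted over the live video.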
This was a very quick proof of concept which added a wonderful feature and hugely expanded the potential capabilities of our robot.
If you'd like to know how this model works under the hood, you can read more here.
https://www.youtube.com/watch?v=ujTBNP5BuRQ
We managed to build the hardware, implement the communication software, and put together a PoC for an additional computer vision feature that facilitates the robot's interaction with people.
https://www.youtube.com/watch?v=lKbGat8Bfus
Interested in building a mini-robot for remote collaboration yourself? Check out the repo here!