Using IoT and Computer Vision to Build a Stand-in Smart Robot for Remote Workers
Guest Writer
Employees of many companies are used to joining meetings no matter where they are, using tools such as Slack, Google Hangouts, and Zoom. A sweet consequence of this is often a generous work-from-home policy, allowing them to work remotely whenever they want.
Trouble is, working remotely means missing out on all the fun that takes place outside of meetings, while they're not connected. Wouldn't it be cool to have a robot representing us at the office, showing us what goes on while we're not there?
A group of IoT, computer vision and full-stack specialists at Tryolabs got really enthusiastic about the idea and went on a mission to create a smart robot that could be remotely controlled from home and show us what takes place at the office.
They had three rules to follow:
[caption id="attachment_34236" align="aligncenter" width="1200"]
(Hackathon team: JoaquĂn, Braulio, Javier, and Lucas) Image Credit: Tryolabs[/caption]To have a robot that can easily move around the office, it must be mobile, stable, small enough to pass through doors, and big enough to not be overseen and trampled on by the team working at the office. We went through several iterations of its structural design before settling on this one:
[caption id="attachment_34237" align="aligncenter" width="1200"]
Image Credit: Tryolabs[/caption]We chose aluminum as the main material for the components since itâs light, robust, and cheap.
Once we defined the design and selected the materials, we cut the aluminum parts and put them together with small screws. Since we had to work with the tools available at the office and from the store around the corner, this was a rather improvised and humorous process. We used heavy books to shape the aluminum and sunglasses as safety glasses while drilling into the components, just to give you an idea.
The main hardware components we settled on were:
While evaluating various approaches to the communication between the robot and the remote worker, we came across WebRTC, which promised to be the tool we were looking for:
"WebRTC is ideal for telepresence, intercom, VoIP software in general as it has a very powerful standard and modern protocol which has a number of features and is compatible with various browsers, including Firefox, Chrome, Opera, etc. The WebRTC extension for the UV4L Streaming Server allows for streaming of multimedia content from audio, video, and data sources in real-time as defined by the WebRTC protocol." (WebRTC)

Specifically, we used the WebRTC extension included in UV4L. This tool allowed us to create bidirectional communication with extremely low latency between the robot and the remote worker's computer.
Running the UV4L server with the WebRTC extension enabled, we were able to serve a web app from the Raspberry Pi and simply access it from the remote worker's browser, establishing real-time bidirectional communication. Amazing!
This allowed us to set up a unidirectional channel for the video from the PiCamera to the browser, a bidirectional channel for the audio, and an extra unidirectional channel to send the commands from the browser to the robot.
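To make that setup concrete, here is a minimal sketch of the browser side using the standard WebRTC API. The signalling endpoint, element id, and message shape below are placeholders for illustration only; in practice, UV4L's own web app example handles the signalling exchange.

```javascript
// A minimal sketch, not UV4L's exact signalling flow. The WebSocket URL, element id,
// and message shape are assumptions for illustration.
const signaling = new WebSocket('ws://raspberrypi.local:8080/stream/webrtc');

const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
});

// Unidirectional video: the browser only receives the PiCamera stream.
pc.addTransceiver('video', { direction: 'recvonly' });

// Bidirectional audio: send the remote worker's microphone and receive the robot's.
navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  stream.getAudioTracks().forEach((track) => pc.addTrack(track, stream));
});

// Extra unidirectional channel for driving commands (browser -> robot).
const commands = pc.createDataChannel('commands');

// Show the incoming media in a <video autoplay> element on the page.
pc.ontrack = (event) => {
  document.querySelector('#robot-view').srcObject = event.streams[0];
};

// Hand ICE candidates to the signalling server so the peers can connect.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    signaling.send(JSON.stringify({ what: 'addIceCandidate', data: event.candidate }));
  }
};
```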
Inspired by the web app example from the UV4L project, we integrated the data channels mentioned above into a basic but functional front-end, including the following components:
https://gist.github.com/fa119fed3eb164c7d80930c7971591f4
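For the command channel itself, the control logic boils down to mapping key presses to messages sent over the data channel. The sketch below assumes the `commands` data channel from the previous snippet; the command names and JSON shape are placeholders, since the gist above contains the actual implementation.

```javascript
// Sketch of the keyboard controls; `commands` is the data channel created earlier.
// The command strings and JSON shape are placeholders for illustration.
const KEY_TO_COMMAND = {
  ArrowUp: 'forward',
  ArrowDown: 'backward',
  ArrowLeft: 'left',
  ArrowRight: 'right',
};

document.addEventListener('keydown', (event) => {
  const command = KEY_TO_COMMAND[event.key];
  if (command && commands.readyState === 'open') {
    commands.send(JSON.stringify({ command }));
  }
});

// Stop the motors as soon as the key is released.
document.addEventListener('keyup', () => {
  if (commands.readyState === 'open') {
    commands.send(JSON.stringify({ command: 'stop' }));
  }
});
```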
As a result, we were able to control the robot, "walk" it through the office, and enable remote workers to see their teammates and approach them via the robot.
However, it wasn't enough for our enthusiastic team, and we continued to pursue the ultimate goal: having an autonomous robot.
A recently released project called PoseNet quickly surfaced. It's presented as a "machine learning model, which allows for real-time human pose estimation in the browser," so we dug deeper into it.
The performance of the neural net was astounding, and it was especially attractive because we could run it with TensorFlow.js in the browser. This way, we got a higher accuracy and FPS rate than by running it on the Raspberry Pi, and lower latency than if we had run it on a third server instead.
Rushed by the hackathon's time constraints, we skimmed the project's documentation and demo web app source code. Once we identified which files we were going to need, we imported them and immediately jumped into integrating these functionalities into our web app.
We wrote a basic detectBody function to infer the pose estimation key points, which invoked net.estimateMultiplePoses with these params:
https://gist.github.com/4858b3d42c6d1cab16a3cd7dbcfc0c77
Said detectBody function was invoked three times per second to refresh the pose estimation key points.
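For reference, a minimal sketch of such a detectBody function could look like the following. The parameter values are illustrative defaults rather than the exact ones from our gist, and the call assumes PoseNet's original positional API (newer releases take a single options object instead).

```javascript
import * as posenet from '@tensorflow-models/posenet';

let net;

// Estimate the poses of everyone visible in the robot's video feed.
async function detectBody(video) {
  if (!net) {
    net = await posenet.load(0.75); // MobileNet multiplier: smaller is faster, less accurate
  }
  return net.estimateMultiplePoses(
    video, // the <video> element fed by the WebRTC stream
    0.5,   // imageScaleFactor: downscale the input before inference
    false, // flipHorizontal
    16,    // outputStride: larger is faster but coarser
    5,     // maxPoseDetections
    0.5,   // scoreThreshold for individual key points
    20     // nmsRadius in pixels for non-maximum suppression
  );
}

// Refresh the pose estimation key points roughly 3 times per second.
setInterval(async () => {
  const poses = await detectBody(document.querySelector('#robot-view'));
  console.log(poses); // each pose has a score and 17 named key points with positions
}, 333);
```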
Then, we adapted some util functions to draw the detected body key points and plot the skeleton over the video, arriving at a demo like this:
https://www.youtube.com/watch?v=3DhSR67Uj4Q
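The overlay itself can be approximated with plain canvas calls plus PoseNet's getAdjacentKeyPoints helper, roughly as sketched below. The element id and confidence threshold are placeholders; our actual helpers were adapted from the util functions in the PoseNet demo.

```javascript
import * as posenet from '@tensorflow-models/posenet';

const MIN_CONFIDENCE = 0.3; // placeholder threshold

// Draw the detected key points and their skeletons on a canvas laid over the video.
function drawPoses(poses) {
  const canvas = document.querySelector('#overlay');
  const ctx = canvas.getContext('2d');
  ctx.clearRect(0, 0, canvas.width, canvas.height);

  for (const pose of poses) {
    // One dot per confidently detected key point (nose, shoulders, elbows, ...).
    for (const keypoint of pose.keypoints) {
      if (keypoint.score < MIN_CONFIDENCE) continue;
      ctx.beginPath();
      ctx.arc(keypoint.position.x, keypoint.position.y, 4, 0, 2 * Math.PI);
      ctx.fillStyle = 'aqua';
      ctx.fill();
    }

    // Connect adjacent key points to plot the skeleton.
    for (const [from, to] of posenet.getAdjacentKeyPoints(pose.keypoints, MIN_CONFIDENCE)) {
      ctx.beginPath();
      ctx.moveTo(from.position.x, from.position.y);
      ctx.lineTo(to.position.x, to.position.y);
      ctx.strokeStyle = 'aqua';
      ctx.lineWidth = 2;
      ctx.stroke();
    }
  }
}
```

Calling drawPoses(poses) from the detectBody loop above, instead of just logging the poses, is enough to get the skeleton plotted over the live video.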
This was a very quick proof of concept which added a wonderful feature and hugely expanded the potential capabilities of our robot.
If you'd like to know how this model works under the hood, you can read more here.
https://www.youtube.com/watch?v=ujTBNP5BuRQ
We managed to build the hardware, implement the communication software, and put together a PoC for an additional computer vision feature that facilitates the robot's interaction with people.
https://www.youtube.com/watch?v=lKbGat8Bfus
Interested in building a mini-robot for remote collaboration yourself? Check out the repo here!