I had a great summer this year. I had been longing to participate in GSoC for a long time, and I never thought I would get the chance to take part, let alone be accepted. It was extraordinary to have the opportunity to contribute to mlpack.
During these three summer months, I adapted mlpack for low-resource devices, and the overall results were very good. We carried out a major refactoring of the mlpack codebase, resulting in the following:
- A major reduction in the size of all mlpack binaries.
- Replacement of two major dependencies that require linking (boost program_options, etc…).
- Support for embedded systems.
- Groundwork for removing all boost dependencies.
- Groundwork for supporting new systems.
This work will lead to a new major version of mlpack. I hope users and developers will appreciate the considerable improvements it brings.
Here is what I did this summer:
The first step of this project was to add support for several embedded platforms to mlpack. This makes mlpack available to users working on microcontrollers, Internet of Things (IoT) projects, and older systems that are capable of running machine learning algorithms. As we started adding support for different systems, we noticed that the mlpack executables were large. They needed to be reduced so that they would fit on devices with little memory.
We therefore inspected these binaries with a profiler to understand where the megabytes were going. It was clear that external dependencies such as boost serialization added major overhead to the mlpack binaries. After some discussion with the community, we decided to remove all the dependencies responsible for this overhead. The work was divided into three objectives:
Replace boost program_options with CLI11
The first objective was to replace boost::program_options with CLI11. This gives mlpack a single-file, header-only library for handling the command-line interface, so no library has to be linked at the end. This work is complete and has been merged into mlpack master. The pull request can be found here:
Replace boost serialization with cereal
The second objective was to replace boost::serialization with cereal. This significantly reduced the size of each mlpack binary and gave mlpack a serialization library that does not require linking. This work is in its final stage and should be merged into mlpack master soon. The pull request can be found here:
Add support for different architectures
The third objective was to add support for several devices, so that users can build mlpack with a normal build system for different architectures and low-resource devices. The pull request can be found here:
Once the last two pull requests are merged, I look forward to creating a tutorial video that shows users how to install mlpack on devices with very limited resources. This will let new users, contributors, and developers use the work from this GSoC project directly. They will not have to spend time reading documentation, searching for an example, or asking a question in the mlpack GitHub issues.
I would also like to add examples to mlpack that are specific to low-resource devices, such as IoT devices.
None of the above would have been possible without my two mentors, who gave me great support and encouragement.
First, I would like to thank my mentor Ryan Curtin (rcurtin) for being so patient with me. He handled the debugging I could not figure out on my own. Thank you for walking me through what I did not understand; I learned a great deal from working by your side.
I would also like to thank Roberto Hueso Gomez for his support, ideas, reviews, and practical help whenever I did not know what to do.
I would like to thank the mlpack community. My pull requests were reviewed not only by my mentors but also by several mlpack maintainers, such as Marcus Edel and Ryan Birmingham, and by several community members and contributors, such as Jeffin Sam and Yashwant Singh Parih.
I would also like to thank Yashwant Singh Parih for estimating how many megabytes the second pull request saves; the estimate can be found at this link:
I would like to thank all the GSoC students in the community. I spent a lot of time reading their blog posts and following their contributions; we had very cool projects this year.
I cannot even quantify the skills I developed this summer at GSoC. I would like to thank Google for creating this program. It gives students from around the world the opportunity to learn from senior software engineers and other professionals, and to learn how to become part of an open-source community.
I greatly enjoyed learning through this project this summer. If I have the chance to do GSoC again, I absolutely will.