The day the results came out was a very, very long day.
With this small writeup, I intend to talk about everything before that day, my experiences, my journey, and the role of Matplotlib throughout!
About Me#
I am a third-year undergraduate student currently pursuing a Dual Degree (B.Tech + M.Tech) in Information Technology at the Indian Institute of Information Technology, Gwalior.
During my sophomore year, my interests started expanding into the domain of Machine Learning, where I learnt about various amazing open-source libraries like NumPy, SciPy, pandas, and Matplotlib! Gradually, in my third year, I explored the field of Computer Vision during my internship at a startup, where a big chunk of my work was to integrate their native C++ codebase with Android via JNI calls.
To put what I learnt during the internship into practice, I worked on my own research along with a friend from my university. The paper was accepted at CoDS-COMAD’21 and is published in the ACM Digital Library. (Link, if anyone’s interested)
During this period, I also developed a taste for open source and started looking at various issues (and pull requests) in libraries, including OpenCV [contributions] and NumPy [contributions].
I quickly got involved in Matplotlib’s community; it was very welcoming and beginner-friendly.
Fun fact: Matplotlib’s dev call was the very first call I attended with people from all around the world!
First Contributions#
We all mess up. My very first PR to an organisation like OpenCV went horribly; to this date, it looks like this:
In all honesty, I added a single commit with only a few lines of diff.
However, I pulled all the changes from upstream master into my working branch, whereas the PR was to be made against the 3.4 branch.
I’m sure I could’ve done tons of things to solve it, but at that time I couldn’t do anything - imagine the anxiety!
Looking back at those fumbled PRs now, I feel they were important to my learning process.
Fun Fact: Because of one of these initial contributions, I got a shiny little badge [Mars 2020 Helicopter Contributor] on GitHub!
Getting started with Matplotlib#
Around the first weeks of November last year, while scanning through the Good First Issue and New Feature labels, I noticed a pattern: most Mathtext-related issues were unattended.
To put it simply, Mathtext is the part of Matplotlib that parses mathematical expressions and renders TeX-like output, for example:
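Here is a minimal sketch of Mathtext in action. It assumes only that Matplotlib is installed; the expression and the output filename `mathtext_example.png` are picked purely for illustration. Any string wrapped in `$...$` is parsed by Mathtext, so no TeX installation is needed.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 1.5))
ax.axis("off")  # hide the axes; only the rendered expression is of interest

# The part between the dollar signs is handled by Mathtext.
expression = r"$\sum_{i=0}^{\infty} \frac{x^i}{i!} = e^x$"
text = ax.text(0.5, 0.5, expression, ha="center", va="center", fontsize=20)

# Illustrative filename, not something Matplotlib mandates.
fig.savefig("mathtext_example.png", dpi=150)
```

The same `$...$` syntax works in titles, axis labels, and legends.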
I scanned the related source code to try to figure out how to solve those Mathtext issues. Eventually, with the help of maintainers reviewing the PRs and a lot of lengthy discussions on GitHub issues/pull requests and on the Gitter channel, I was able to get my initial PRs merged!
Learning throughout the process#
Most of us use libraries without understanding their underlying structure, which can sometimes cause downstream bugs!
While I was studying Matplotlib’s architecture, I realised that I could use the same idea in one of my own projects!
Matplotlib uses a global dictionary-like object named rcParams. I used a smaller interface, similar to rcParams, in swi-ml, a small Python library I wrote that implements a subset of ML algorithms with a switchable backend.
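To give a rough idea of what such an interface can look like, here is a minimal sketch of a dictionary-like settings object that validates values on assignment. It is not the actual code of Matplotlib or swi-ml: the class name `Params`, the keys `backend` and `precision`, and the validator are all hypothetical and chosen only for illustration.

```python
from collections.abc import MutableMapping


class Params(MutableMapping):
    """Dictionary-like settings store that validates every assignment.

    Hypothetical sketch; not Matplotlib's or swi-ml's real implementation.
    """

    def __init__(self, validators, defaults):
        self._validators = dict(validators)
        self._data = {}
        for key, value in defaults.items():
            self[key] = value  # defaults go through validation too

    def __setitem__(self, key, value):
        if key not in self._validators:
            raise KeyError(f"{key!r} is not a valid parameter")
        # Each validator returns the (possibly converted) value or raises.
        self._data[key] = self._validators[key](value)

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        raise TypeError("parameters cannot be deleted")

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def validate_backend(value):
    """Accept only the (hypothetical) backend names listed here."""
    allowed = {"numpy", "cupy"}
    if value not in allowed:
        raise ValueError(f"backend must be one of {sorted(allowed)}")
    return value


# One global object, shared across the library, in the spirit of rcParams.
params = Params(
    validators={"backend": validate_backend, "precision": int},
    defaults={"backend": "numpy", "precision": 32},
)

params["backend"] = "cupy"  # switch the backend in one place
params["precision"] = "64"  # stored as the integer 64 after validation
```

The appeal of this design is that every setting lives in one place and a bad value fails loudly at assignment time rather than deep inside a computation.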
Where does GSoC fit?#
Around January, I had a conversation with one of the maintainers (hey Antony!) about the long list of issues with the way the library currently handles text and fonts.
After compiling them into an ordered list, and after a few tweaks from the maintainers, the GSoC idea list for Matplotlib was born, and so was my journey of building a strong proposal!
About the Project#
Proposal Link: Google Docs (will stay alive after GSoC), GSoC Website (not so sure)#
Revisiting Text/Font Handling#
The project is divided into 3 subgoals:
- Font-Fallback: A redesigned text-first font interface, essentially trying every font family in the list before rendering a “tofu” (the placeholder box drawn for a missing glyph).
  (similar to specifying font-family in CSS!)
- Font Subsetting: Every exported PS/PDF would contain only the embedded glyphs it needs, subsetted from the whole font.
  (imagine a plot with just a single letter “a”: would you want the PDF you exported from Matplotlib to embed the whole font file?)
- Most mpl backends would use the unified TeX exporting mechanism.
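The first two subgoals can be illustrated with a small, self-contained sketch. It does not use Matplotlib's font machinery at all: the font names and the `COVERAGE` table below are made up, and a real implementation would read glyph coverage from the font files themselves. The idea is simply to pick, per character, the first family that can draw it, and then to collect the characters each family actually ended up drawing.

```python
# Hypothetical coverage table: which characters each made-up font can draw.
COVERAGE = {
    "Latin Font": set("abcdefghijklmnopqrstuvwxyz "),
    "CJK Font": set("日本語"),
}

TOFU = None  # marker meaning "no family in the list has this glyph"


def resolve_fonts(text, families):
    """Pair every character with the first family in the list that covers it."""
    resolved = []
    for char in text:
        family = next(
            (name for name in families if char in COVERAGE.get(name, ())),
            TOFU,
        )
        resolved.append((char, family))
    return resolved


def glyph_subsets(resolved):
    """Group characters by family: the glyphs a subsetted export would embed."""
    subsets = {}
    for char, family in resolved:
        if family is not TOFU:
            subsets.setdefault(family, set()).add(char)
    return subsets


resolved = resolve_fonts("a 日 ☃", ["Latin Font", "CJK Font"])
subsets = glyph_subsets(resolved)
```

In this sketch the snowman ends up as a tofu because neither made-up font covers it, and each font contributes only the handful of glyphs that were actually used.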
Mentors: Thomas A Caswell, Antony Lee, Hannah.
Thanks a lot for spending time reading the blog! I’ll be back with my progress in subsequent posts.