Exploring New Map Layers
Social and Economic Aspects of Package Management
Introduction
This is the final post in a three-part series on the wondrous world of package management. Catch up on post #2 here if you missed it.
So far along our journey through the world of package management, we’ve stayed focused mostly on the major technical challenges. For most of us software engineers, that falls squarely within our comfort zone and we might be tempted to stay put. That would be a major mistake.
If you tried to identify what makes Go’s package ecosystem simpler and more usable, you’d probably start by listing some technical achievements. First, Go’s package manager has the benefit of hindsight after observing the consequences of building on an NP-complete package resolution algorithm. Second, Go’s alternative algorithm is a genuinely clever one that maximizes what’s possible with a linear-time algorithm.
But neither point comes close to explaining why Go’s approach works in practice and has been accepted by its community. To do that, you’d have to look at the social practices around maintaining and consuming software packages — which inform the design of Go’s package ecosystem throughout.
In this post, we’ll shift away from the mostly technical discussion and instead talk about two additional map layers in the world of package management: the social and economic considerations. Both layers cover the entire world of package management and hence are critical for understanding it. They differ in that social aspects serve as dragonbane, whereas economic aspects serve as dragonwort.
Dragonbane: Social Aspects of Package Management
In part two, you’ll recall that we built a dependency checker to identify version constraints in our repositories, and that we got pushback within the team against fixing every single affected repository. That contention deserves further exploration because, by focusing on what matters, we hopefully can draw a better map of package management. That way, we might just be able to follow the footsteps of Anaximander, who over 1,110 years before Isidore of Seville drew a map of the world that is significantly more accurate and detailed — as you can see in the contemporary reconstruction to the side. The center of that map happens to be Miletus, Anaximander’s home.
Package management became a thing during the early days of open source, between 1993 and 1994 CE, largely as a more humane alternative to building software yourself. It emerged for the FreeBSD, Linux, and OpenBSD operating systems as well as the Perl programming language, all of which are open source. Open source software eliminates some aspects of commercial software distribution, such as payment processing or copy protection, that significantly complicate the latter. But it also requires significant user expertise, involves many more stakeholders across many more organizations, and has attendant technical, operational, legal, and economic risks. Seen in that light, package managers primarily help with coordination and risk management. It’s only by doing so that they enable the fine-grained composition of software components (“packages”).
In other words: Package managers are tools for addressing social problems!
In his blog post on package management, Sam Boyer presents package managers as tools that seek to minimize harm in the presence of significant risks and uncertainties. He builds on a blog post by Julia Evans, which probably first applied the concept of harm reduction to software development practices. Harm reduction is the proven public health practice that seeks to make undesired outcomes from, say, recreational drug use less likely instead of rigidly rejecting anything but abstinence as acceptable behavior. Boyer argues that’s also the primary function of a package manager.
While I believe that Evans and Boyer make a convincing case, I would like to offer a second, complementary perspective: package management is an exercise in communicating and managing expectations. When compared to other engineering disciplines, software development stands out for its constant and far more rapid change. So a primary criterion for development tools is whether they help manage that change and isolate humans from negative consequences as much as possible.
Semantic versioning is a great example of a convention that does just that: a patch release implies a bug fix, a minor version release implies new features, and a major version release implies backwards-incompatible changes. But communicating the latter is not enough: backwards-incompatible changes are almost guaranteed to cause significant work and disruption. Go’s package manager builds on that insight and shifts some of the pain of major version releases back to where it belongs: to the package maintainers instead of the package consumers.
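To make the convention concrete, here is a minimal Python sketch. It classifies the step between two plain MAJOR.MINOR.PATCH versions and shows how Go makes a major release visible to maintainers: from v2 onward, the major version becomes part of the module path. The function names and the module path `example.com/widget` are hypothetical, and the parser deliberately ignores pre-release and build tags.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def classify_bump(old: str, new: str) -> str:
    """Return what a consumer may expect when moving from old to new."""
    a, b = parse(old), parse(new)
    if b <= a:
        raise ValueError(f"{new} is not newer than {old}")
    if b.major != a.major:
        return "major"  # backwards-incompatible changes
    if b.minor != a.minor:
        return "minor"  # new features, still compatible
    return "patch"      # bug fixes only


def go_style_import_path(module: str, version: str) -> str:
    """Append /vN to the module path for major versions 2 and up."""
    major = parse(version).major
    return module if major < 2 else f"{module}/v{major}"


print(classify_bump("1.4.2", "2.0.0"))
print(go_style_import_path("example.com/widget", "2.0.0"))
```

Because the path itself changes with a major release, the maintainer has to publish what is effectively a new package, and consumers opt in by changing their imports rather than being surprised by an upgrade.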
The contrast to Python’s package ecosystem is stark. Many of the standards for Python’s package ecosystem, including those for version numbers and constraints, really are piecemeal green-field exercises in specification writing that aren’t anchored in practical requirements but optimize for abstract notions such as extensibility and flexibility. That’s bad enough but, worse, the maintainers of the primary package manager, pip, do not even use their own tool.
The practice of using one’s own tools, so-called dog-fooding, is an important systems-building practice because it provides the developers with feedback early and often. I have certainly used it while building my parser generator featuring modular syntax. It is also used by cargo, the package manager for Rust; maven, the package manager for Java; as well as npm and yarn, the package managers for JavaScript. It is a tremendous missed opportunity for pip’s developers and it probably explains many of the peculiarities and pathologies of the Python package ecosystem.
Dragonwort: Economic Considerations
We are on a roll mapping out the non-technical challenges of package management. So in the spirit of Hecataeus — who significantly improved on Anaximander’s work in mapping the world around Miletus, again shown in a contemporary reconstruction — it’s time to explore economic aspects of package management.
Most package ecosystems don’t just trade in open source projects but also do so under the permissive MIT and Apache 2.0 licenses — reaching almost 63% of packages, according to one recent survey. That makes using these same open source packages an attractive proposition for corporations as well. As a direct result and not surprisingly, package managers have become a standard tool for software development in general. To leverage this critical infrastructure for commercial software development, enterprises such as Enigma interface with the commons through a private proxy that is backed by the public registry. Internal packages are published to the proxy only, which ensures that they remain invisible to the outside world.
For instance, our small business data processing pipeline integrates 220 external open source Python packages that way, 63 of which are direct dependencies of our internal tasks and libraries. The relatively small number of external dependencies probably is a result of our pipeline using a few huge packages, including NumPy, Pandas, and PySpark, that cover most needs. Furthermore, Python developers thankfully don’t tend to follow the micro-package dogma of, say, Node.js’ package ecosystem. For comparison, I wrote the custom static website generator for my personal website, a much simpler piece of software, in Node.js and very much sought to minimize external dependencies, preferring to code as much as I could myself. Yet my static website generator still requires 375 external packages.
That a gift economy would become foundational to the hyper-capitalist technology industry is nothing short of astonishing. It certainly helps that, thanks to the unprecedented rate of change in computing technology, source code by itself, when not maintained by engineers, rapidly diminishes in economic value. At the same time, the contrast between the collaborative open source commons and the highly competitive industry it enables has also resulted in contentious and even exploitative practices. One outcome is that relatively simple or low-level libraries serving as convenient but nonessential building blocks are shared liberally — hence the increasing fraction of packages using permissive open source licenses. Yet complete and sophisticated services such as databases or search engines are shared only under newly restrictive licenses.
Some developers of open source software are increasingly chafing at these economic disparities and have taken to sabotaging their own packages, for example, by removing all source code from the repository or by replacing a package’s functionality with something less useful if not outright dangerous. That has resulted in considerable disruption, especially within the Node.js ecosystem, with much anger directed at the developers responsible. Personally, I can empathize with the frustration felt by developers protesting against such exploitation and also by developers cleaning up the resulting messes. But therein lies the rub: the pain of disruption caused by developers who protest is largely felt by other package developers, not by the company executives and venture capitalists who are empowered to correct the disparities.
Thankfully, a well-designed package manager can help you mostly avoid confrontation with these particular dragons. npm, however, did not help: its centralized registry saw a tenfold increase in load from November 2012 to October 2013, which almost broke the ecosystem at that time. The unique solution taken by npm’s primary developer and copyright holder of the source code was to create a company backed by venture capital. That worked for a while but caused significant strife several years later. npm Inc’s former CEO tells one story and npm Inc’s employee #2 tells quite another. Both are very interesting. In contrast, the design of Go’s package ecosystem dispensed with having its registry distribute packages. That significantly reduces operating costs and eliminates one source of contention.
Alas, by now, 2022 CE, the npm registry and much of the code for the Go programming language are at least mirrored by Microsoft’s GitHub, the most popular open source commons. Microsoft is also responsible for the development of the most popular open source IDE, Visual Studio Code, as well as the most popular cross-platform application runtime, Electron. For good measure, it optionally ships the Linux open source operating system as a subsystem of Windows. If you knew the Microsoft of the late 1990s and early 2000s, that’s quite a turnaround. In any case, it means that all software development now critically depends on Microsoft.
Then again, when you consider that the Olympic mountains to the west of Microsoft’s headquarters in Redmond, WA and the Cascade mountains to the east make for ideal dragon breeding grounds, that development isn’t too surprising.
Conclusion
Across a series of three blog posts, I made four major points:
First, it is possible to rein in dependency hell, even in Python, and without a monorepo. It took us only a little more than 2,600 lines of code (not counting tests).
Second, building a package ecosystem on an NP-complete version satisfiability algorithm is madness, especially now that we know a much better alternative. If you are involved in maintaining a package manager, it’s time to switch to Go’s much saner design.
Third, while we software engineers love to engineer ourselves out of every challenge, we’d be well-advised to be more cognizant of social and economic factors first. They make a huge difference.
Finally, we address interesting engineering challenges here at Enigma. You might want to consider joining us.