Running Your Own Open Source Project - Part 2


As suggested in Part 1, running your own open source project involves wearing a lot of different hats: software architect, designer, domain expert, and so on. Here are some more, illustrated with lessons learned running the SOFA Statistics project since 2009.

Hat #4 – Promotions Manager/Copywriter

In many ways this hat is the hardest: many of us would rather make things than tell people about them. My results here were a mixed bag at best.

First, the failures. Although I tried to follow all the recommended best practices for dealing with the media, I never managed to get a single story published in the mainstream media in New Zealand or abroad. The notable exception was c't (Magazin für Computertechnik), a printed German computer magazine with a circulation of over 300,000, and that coverage was not the result of anything I did. So I'm probably not a great source of guidance on wearing this hat.

Within the open source media there were some successes, including a cover article in Full Circle Magazine, positive reviews in Linux Monthly, and an interview in FLOSS For Science. And because SOFA is an open source project, numerous people blogged about it and added it to lists of available open source statistics packages.

Another promotional activity was attending conferences and talking about aspects of SOFA with broader relevance, e.g. the quest for an open source business model.

[Image: Promoting SOFA]

Hat #5 – Domain Knowledge (Statistics)

If your application requires detailed domain knowledge, e.g. of electrical engineering or acoustics, the challenge is compounded. SOFA Statistics has the ambitious goal of making statistics open for all, and the world of statistics is vast and deeply technical. Initially it was thought that an 80:20 approach would work: that 80% of needs could be met with 20% of the tests. The problem is that there is no consensus on what that 20% should be. SOFA users are regularly surveyed about which missing features they most need, in the hope that their answers would converge enough to make a manageable list of features possible to target. As it turned out, needs were very diverse, with very little overlap. So it was decided to 1) focus on the areas of functionality already covered and make them easier to use, and 2) make it easier to export data for use in other, more specialised packages such as R.

Of course, it is sometimes possible to delegate much of the technical expertise to external libraries, but even then it may be necessary to modify them so they meet the application's needs. In SOFA's case, for example, numerous changes were made (very carefully) to the statlib library code. These included:

  • Exposing more fields in the results, e.g. adding n, medians, and minimum and maximum values to the Wilcoxon T test output.
  • Providing results as dictionaries so the calling code could display and otherwise process them in a fine-grained way. Previously, output was monolithic pre-formatted text.
  • Handling zero-division errors.
  • Adding the option of high-precision numbers (Python's Decimal) for ANOVA tests, at the cost of much slower calculation.
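To make those statlib changes more concrete, here is a minimal sketch of what a dictionary-returning, zero-division-safe, optionally high-precision one-way ANOVA might look like. It is not SOFA's or statlib's actual code: `anova_oneway`, `_anova`, the `high_precision` flag, the result keys, and the choice of 50 significant digits are all hypothetical, and the p-value is omitted to keep the sketch dependency-free.

```python
from decimal import Decimal, localcontext


def anova_oneway(*groups, high_precision=False):
    """Hypothetical statlib-style one-way ANOVA that returns a dict, not text.

    F is None (rather than raising ZeroDivisionError) when the
    within-group mean square is zero or undefined.
    """
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    if high_precision:
        # Slower, but avoids binary floating-point rounding error.
        # localcontext() keeps the extra precision local to this call.
        with localcontext() as ctx:
            ctx.prec = 50
            dec_groups = [[Decimal(str(x)) for x in g] for g in groups]
            return _anova(dec_groups, Decimal(0))
    return _anova([[float(x) for x in g] for g in groups], 0.0)


def _anova(groups, zero):
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g, zero) / len(g) for g in groups]
    grand_mean = sum((sum(g, zero) for g in groups), zero) / n_total
    ss_between = sum(
        (len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)), zero
    )
    ss_within = sum(
        (sum(((x - m) ** 2 for x in g), zero) for g, m in zip(groups, means)),
        zero,
    )
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    # Guard the divisions that would otherwise raise ZeroDivisionError.
    ms_within = ss_within / df_within if df_within > 0 else None
    f = ms_between / ms_within if ms_within else None
    # Expose fine-grained fields so callers can format them as they like.
    return {
        "n": [len(g) for g in groups],
        "means": means,
        "mins": [min(g) for g in groups],
        "maxs": [max(g) for g in groups],
        "df_between": df_between,
        "df_within": df_within,
        "F": f,
    }
```

Returning a dict lets the calling code lay out F, the group means, and the minimum and maximum values however it needs (HTML tables, charts) instead of parsing pre-formatted text.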

Hat #6 – Product Manager

What should your product include? What functionality should it leave out, even if requested by some enthusiastic and determined users? Can your product be explained concisely enough that potential users can quickly work out whether it is what they need? Is there a risk of ending up in No Man's Land, with something too complex for simple use cases but too simple and restricted for advanced ones? What workflow should your product cater to? Which market segments should you target and which should you ignore? Technical decisions are only some of the many that need to be made when running a project.

Although a project often starts with a basic intuition that the world needs your product, effort needs to go into discovering whether this is in fact the case. And what niche does your product occupy? What is its unique selling proposition? In SOFA's case, the key intended differences included the following:

  • Beautiful output – there is no shortage of products that produce accurate but ugly output. I also tried to make every other aspect of the project aesthetically pleasing, including the website and the GUI. Some people see such aspects as irrelevant; others see them as essential.
  • Ease of Use/Learn as you go – a priority for SOFA was to give beginners simple guidance so they could carry out basic analyses, e.g. One Way ANOVAs, with some confidence. In addition to the Statistics Wizard, all documentation aimed to explain things clearly for people with limited statistical knowledge (including people whose knowledge was “rusty”). And every explanation also had to be acceptable to the statistical community.

[Image: SOFA statistics wizard]

  • Cross-platform – the assumption being that people might use SOFA on multiple devices, e.g. on Windows in the university lab, and on a Mac and a Linux machine at home.
  • Scriptable – although SOFA is fully scriptable from within Python, this ability does not seem to have been a killer feature for the sorts of users SOFA attracted.
  • Multi-project – it was assumed that SOFA would be a useful Swiss army knife for connecting to multiple datasets, not all of them in the same database or even the same type of database. One dataset might live in a MySQL database (with SOFA connecting dynamically), another might be imported from a spreadsheet into SOFA's internal SQLite database, and yet another might be entered directly into SOFA. It is not clear that this functionality was valuable enough to justify the considerable effort invested.
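As an illustration of just one of those paths, importing spreadsheet data into an internal SQLite database, here is a minimal sketch using only Python's standard library. It assumes the spreadsheet has been saved as CSV with a header row; `import_csv` is a hypothetical helper, not SOFA's importer, and it stores every column as TEXT rather than attempting type detection. Dynamic connections to MySQL are not covered.

```python
import csv
import io
import re
import sqlite3

# Only plain identifiers are accepted as table/column names.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def import_csv(conn, table, csv_file):
    """Hypothetical helper: copy a CSV export into a new SQLite table.

    Every column is created with TEXT affinity; no type inference is done.
    """
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header is None:
        raise ValueError("empty CSV file")
    for name in [table, *header]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # Data values go through parameter binding, never string formatting.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


conn = sqlite3.connect(":memory:")
import_csv(conn, "survey", io.StringIO("age,score\n34,7\n51,9\n"))
rows = conn.execute("SELECT age, score FROM survey").fetchall()
```

Because SQL table and column names cannot be passed as `?` parameters, the sketch whitelists identifiers with a regular expression before quoting them.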

What made SOFA unique was not any one of these differences but their combination in a single product.

As discussed in the section on domain knowledge, how much your project can bite off depends on how much it can chew. The more contributors you have, and the greater their willingness to work towards project goals, the more you can potentially cover. It would have been nice to add more functionality to SOFA, but the level of community support needed for that never emerged. Having said that, SOFA has a large number of enthusiastic users, and correspondence suggests it has been of considerable help to a diverse range of people.

Misc Other Hats

  • Testing Manager – making a project robust requires tests, especially when your project is trusted to generate accurate results and bugs may not be obvious. As bugs are identified, additional tests need to be added to prevent regressions. Writing and running these tests is another of your responsibilities. And everything has to work cross-platform, on different screen sizes, and in multiple languages.
  • Documentation Manager – there are two compelling motivations for writing good documentation:
    • it substantially reduces the load on the “help desk”
    • it encourages widespread use of your application
      Additionally, the act of explaining how to use particular functionality often makes design warts glaringly obvious. If a feature or workflow is hard to explain or hard to justify, then perhaps a redesign is required.
  • Community Manager – I never really managed to grow much of a community, and the main contributions people made were ideas, help getting to the bottom of bugs, spreading the word, and a small amount of video documentation. I read “The Art of Community: Building the New Age of Participation” by Jono Bacon but never managed to get that side of things to take off. I suspect it is much harder when few of your users are developers. One big lesson was that people frequently deliver much less than they plan to.
  • Research & Development Team – Some of the technical solutions you need may push the envelope. An example from the SOFA project was displaying HTML with full Dojo support, cross-platform, inside the wxPython GUI itself.
  • Legal/Business Management – If you're preparing for the possibility of large-scale success, you need to cover some legal and business bases: licensing, trademarks, any agreements associated with applying an open-core business model, contributor agreements, and more.
  • Web Developer – Your project will obviously need a web presence, and if it is aimed at end users, it may not be enough to piggyback on tech-oriented deployment sites. Will there be a video gallery, an image gallery, a wiki, documentation, a discussion group?
  • Project Manager – There are benefits to separating the motivators from the doers, but in many open source projects both roles are held by the same person. Can you get strategically important things done even when they are unpleasant and not “fun”?
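The regression-test habit described under Testing Manager can be sketched with Python's built-in `unittest` module. The function under test, `sample_sd`, and the single-value bug it guards against are hypothetical; the pattern is simply to pin a hand-checked reference value and to add a test for each bug once it is fixed.

```python
import math
import unittest


def sample_sd(values):
    """Hypothetical function under test: sample standard deviation (n - 1)."""
    n = len(values)
    if n < 2:
        # Hypothetical earlier bug: this case raised ZeroDivisionError.
        return None
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


class SampleSdRegressionTests(unittest.TestCase):
    def test_known_value(self):
        # Checked by hand: mean 5, squared deviations sum to 32, n - 1 = 7.
        self.assertAlmostEqual(
            sample_sd([2, 4, 4, 4, 5, 5, 7, 9]), math.sqrt(32 / 7)
        )

    def test_single_value_does_not_crash(self):
        # Regression test added once the zero-division bug was fixed.
        self.assertIsNone(sample_sd([42]))
```

Saved as, say, `test_stats.py`, the tests run with `python -m unittest test_stats`.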

A Satisfying Challenge

Running an open source project is a challenge on many different levels, and if you're looking for a challenge it could be perfect. Nothing else is quite the same, and the best thing is that the barriers to entry are so low. If you have an idea itching in your mind, you can start creating something straight away with freely available open source tools. And the internet provides a ready-made distribution and promotion channel. In spite of all the challenges associated with running an end-user project, it is satisfying to make something that lots of people find useful. Although running SOFA is, on occasion, a hard grind, I always find it encouraging to reflect on the wide range of research projects SOFA has been involved in and their geographical spread. For example:

[Image: SOFA impact example]

If this is the sort of thing that appeals to you, there has probably never been a better time to get started than now. So what are you waiting for!?