Source Code Under Apache License; Data Under Creative Commons License

Ravi General Note: I would like to say that these licensing-related discussions are helping me arrive at licensing policy decisions for future software products in general, besides just the “Spoken English App.” The same issues would crop up for the “Rural Agricultural Portal” if that work is taken up in the future.

One of the reasons I am capturing all these discussions (in suitably edited form) as posts here is that they can serve as a model that other interested persons can use to start their own small social web “service to society” software activities. This applies to India at least, as legal matters vary from country to country, at least in implementation. The key word here is “small”. One single person should be able to start it off and manage it as a part-time volunteer activity. Perhaps, to keep the management effort limited, the number of contributors could also be restricted to a small number.

Significant numbers of such small “Software Seva Volunteer Social Networks” may unlock and productively channelize the great potential that India’s large & capable software professional community has to serve the nation through part-time volunteer Software Seva and derive joy (Ananda) from such happy & useful service activities.

The discussion below builds on the post “Are Text & Audio – Data/Content – covered by Apache License?”. A referenced page gives a picture of data licensing as of Nov. 2009, but it too does not clearly say whether the GPL or the Apache License covers the data part of released software.

Maybe a solution is to provide two releases:
a) The software with just test/limited data – under the Apache License

b) The database with full content – under CC0 or the Open Data Commons Open Database License (ODbL)

Setting up the database (b) to work with the software (a) should be a very simple process for the user, something like a click-and-run install script.
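To make the click-and-run install idea a little more concrete, here is a minimal sketch in Python. It assumes that the data release (b) ships as a CSV file of sentences and that the software release (a) reads from a SQLite database; the table name `sentences`, its columns, and the function `install_data` are hypothetical illustrations, not part of any actual Spoken English App. release.

```python
import csv
import io
import sqlite3

# Hypothetical schema for the separately downloaded data release (b).
# The software release (a) would only need to know this table layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sentences (
    id INTEGER PRIMARY KEY,
    telugu TEXT NOT NULL,
    english TEXT NOT NULL,
    audio_file TEXT
);
"""


def install_data(db_path, csv_text):
    """Create the schema (if needed), load sentence rows from CSV text,
    and return the number of rows now in the sentences table."""
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA)
        reader = csv.DictReader(io.StringIO(csv_text))
        rows = [
            (
                int(r["id"]),
                r["telugu"],
                r["english"],
                r.get("audio_file") or None,  # empty cell -> NULL
            )
            for r in reader
        ]
        # INSERT OR REPLACE keeps re-runs of the installer from duplicating rows.
        conn.executemany("INSERT OR REPLACE INTO sentences VALUES (?, ?, ?, ?)", rows)
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM sentences").fetchone()[0]
    finally:
        conn.close()
```

A user who has downloaded both releases could read a file such as `sentences.csv` as UTF-8 text and pass it to `install_data` together with a database path such as `app.db`. Keeping the schema and data loader with the data release means that people who want only the data (or only the code) can still take just one of the two downloads.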

This sounds like an interesting solution. We are not getting into a dual-license mess, as we do not have two licenses applied to the same entity.

Mail exchange with friends on the above is given below:

Friend 1 wrote: It does not seem uncommon for different parts of a distribution to be licensed differently. I do recall something along those lines from work, but not the details at the moment. I am unsure how good/reliable this is, but the page talks of more than one license:

Ravi responded: That’s a useful find. It uses two licenses, and since the areas the licenses cover are different – source code & multimedia content – I think “dual license” confusion does not come into play.

Friend 1 wrote: Perhaps  may help you a bit more? It has separate sections for Media & Software.

Lastly, seems to be a service very specific to multimedia content. I am not suggesting you pick this service; rather, it appears to indicate that different content can be, and is, licensed differently.

Ravi responded: Thanks for the links. Did a quick read. Will do a detailed read, if required. Useful links, for sure.

Friend 1 wrote: Don’t you know anyone in the legal field who could give you general advice? I mean, however careful one is, an expert with real-world experience is any day a safer bet. Ain’t it?

Ravi responded: The legal guys I knew back in my software industry days were not into this kind of stuff, at least at that time. Anyway, I am out of touch with them, and I don’t know any other legal guys.

A later mail exchange with friend 1:

Friend 1 wrote: When replying this morning, I did look at the very famous open-source, GPL-licensed LAME MP3 encoder. I am unable to find the exact links right now, and I don’t recall why I did not send out the link this morning, but here goes: they have just two MP3 files shipped with the source (and each is hardly a second or so long).

Ravi responded: Interesting. But as it ships with only two small sample MP3 files, it really does not have to bother about audio file licensing. Any “other files” clause in the license is enough to take care of it.

Friend 1 wrote: Coming to the present case, yes, it would be very sensible to set up the database and scripts, but if spoken English is the contribution, I guess the major contribution would be the audio files for the translation and speaking. I mean, the software to play them back and relate the Telugu sentence to the English part would not be as major a contribution, IMO, as the audio files (or equivalent). Am I right?

Ravi responded: Yes, you are right. Not only the audio files but the text sentences too will be the major contribution; the code, in comparison, would be quite minor, but not so minor that it need not be licensed properly.

Friend 1 wrote: If yes, then setting up the DB to work with the software etc. is fine, but the major benefit of the work (the audio files) would not be shipped with the open-source software. Is that correct?

Ravi responded: Well, the software and data (DB schema + DB content) have to be downloaded separately, and both will be under free & open source licenses. That opens up the following benefits for users:

a) Those who want to use “Spoken English App.” software + data can do so.
b) Those who want just the code of “Spoken English App.” but want to use it with their own or a different data set can do so.
c) Those who only want the data of “Spoken English App.” can download it, modify it and redistribute it, without bothering about the code at all.

Friend 1 wrote: Also, can you check with (a gentleman who has a Telugu/English “free” website) whether he has any thoughts/suggestions on this? I mean, IIRC, he hosts PDF/PageMaker files that contain translations. Those are not source code files in the sense of programming language sources (but could/may fall under the ‘Source’ definition of the Apache License).

Ravi responded: (The gentleman who has a Telugu/English “free” website) seems to be a very helpful, service-oriented person. However, he does not seem to be tech-savvy.

I don’t think he would know much about licensing. When I asked whether I could use his stuff, his response was a polite but surprised “but you can download it, so what’s the problem?” kind of one. At that time I did not know enough about copyright & licensing. Now I know that since his site does not have an open data license (e.g. CC0), he legally retains the copyright and all the rights that come with it, so others have no permission to reuse and redistribute his material even though he has provided a download link.

Legally, either he needs to put his stuff under a CC0 kind of license, OR I need an agreement from him stating that he has contributed his stuff to my app, a Contributor License Agreement kind of thing. Once I figure out the data licensing part, the next thing I will look at is the Contributor License Agreement. Starting a social web software activity is NOT EASY :(.

Friend 1 wrote: Another project just came to mind: the Khan Academy. They do work similar to what you intend: providing material/services for educational training. Their work is licensed under a Creative Commons license. Perhaps if you delve deeper, you may find something more with this project?

Ravi responded: I feel Khan Academy’s use of a CC license is a straightforward solution. I viewed their site again, but this time from a licensing point of view :), and confirmed they are using a CC license. They don’t seem to offer any downloads of the software used to run the site; the downloads are only content related.

A mail exchange with another friend is given below.

Friend 2 wrote: I spent some time looking for info. Chromium (the open-source version of the Chrome browser) does not seem to use a different license for artwork. See These are the PNG files used in the UI, and there is no license file in this directory or in its parent (src/ui). The nearest license is that of the top level (src), which is a BSD license (not Apache as I’d said earlier).

Ravi responded: I think the artwork (PNG files etc.) for Chromium would be just a few files, so licensing may not be an issue. In fact, they may not want the artwork to be freely copied and used by others! I mean, it would be odd for some other browser to use the Chrome logo (I presume it has one).

Friend 2 wrote: You can confirm by asking on the list chromium-discuss:

The similar mailing list for Android is:

But probably the best list is the Apache Legal list:

Ravi responded: I think it may be a good idea to seek advice about it from the Apache legal list. Let me think about it. Thanks for this tip.

A later mail exchange with friend 2 is given below.

Friend 2 wrote: I thought they treat the logo differently from other art, like icons. Logos are trademarked and not under an open license, even in the case of Firefox.

Ravi responded: Ah! So the logo is treated differently even though it must be part of the source download (as a GIF/PNG file, perhaps).

Start of extract from wiki Firefox#Licensing:

Firefox source code is free software. Most of it is tri-licensed under the Mozilla Public License (MPL), the GNU General Public License (GPL) and the GNU Lesser General Public License (LGPL). These licenses permit anyone to view, modify, and/or redistribute the source code, and several publicly released applications have been built on it; for example, Netscape, Flock, Miro, Iceweasel, and Songbird make use of code from Firefox.

Mozilla not only forbids creating derivative works from the Firefox logo (i.e. modifying it),[147] but also strongly discourages creating independent but similar logos.[148]

— end extract from wiki Firefox#Licensing —

And Firefox is tri-licensed! Yet the MPL, GPL & LGPL licenses do not cover the logo image file. Perhaps that is why these licenses are referred to as “(source) code licenses”.

Ravi Concluding Remarks: It is now clear to me that the “common sense” and “widely accepted” understanding of “code licenses” is that they are limited to sharing & modifying source code (& sharing object code) and distributing such derived works. They do not clearly cover non-code material, which includes artwork as well as data. So icons & sample data fall into a “grey area”. As long as the usage is trivial, nobody is likely to make an issue of it (even legally, raising such trivial usage matters in court may be treated as “frivolous” and a waste of the court’s time).

I am further quite convinced that the “common sense” understanding of this matter is that sharing a significant amount of data needs a different (data) license.

Now I feel I don’t need to write to the Apache legal mailing list.

To conclude, the right licensing decision seems to be “Source Code Under Apache License; Data Under Creative Commons License”.

The specific Creative Commons license (CC0, say) needs to be thought through; the other data license, ODbL, is nowhere near as well known as Creative Commons.

This entry was posted in FOSS Licensing.