Category: Websites

Jadoo and static website generators

Coming back from holidays, I fired up my RSS reader and, among many interesting posts, I found this one from Smashing Magazine about static website generators being the Next Big Thing on the web (and a follow-up deep-diving into four of them).

The first paper describes how the web started as something static, became all dynamic and is progressively coming back to something more static, at least for some specific tasks. The interesting thing is that the author also describes the pros and cons of each stage and why the web jumped to the next level.

While reading this, I couldn’t help thinking of Jadoo, a pet project I started in 2007. Its goal was to get rid of the complexity and the number of resources required to run a dynamic blog system. Following some notes from Alexandre Dulaunoy, it was written in Python and already used concepts now hidden under buzzwords 😉 like templating and a rudimentary meta-data organization. At that time, things like Markdown, asset management, caching or Github either didn’t exist or were nowhere near as widespread as today. There is an initial post and an update – then I gave up (reasons inside). Note that the drawbacks I listed at that time are still drawbacks of current static website generators (manual updates and local-only editing). All these ideas in 2007, one year before Jekyll … 😉

P.S. The irony is that posts about Jadoo were later transferred to WordPress – and this blog is also currently hosted on WordPress!

Privacy -vs- information conservation time

In my opinion, privacy issues are a by-product of information conservation times approaching infinity.

For centuries, humans relied on their own type of memory. When information reaches the brain, it is stored in short-term memory. When relevant and/or repeated, it is gradually consolidated into long-term memory (this is, roughly, the process).

Schematic memory consolidation process

The invention of oral transmission of knowledge, then written transmission (incl. Gutenberg) and, to a certain extent, the internet each successively increased how long information shared with others is retained. The switch from oral to written transmission of knowledge also sped up the dissemination of information and gave it a fixed, un- (or less-) interpreted form.

Duration of information over time

With the internet (“1.0”, to use a buzzword) the duration of information is also extended but somehow limited; it was merely a copy of printing (except for the speed of transmission). Take this blog, for instance: information stored here will stay as long as I maintain or keep the engine alive. The day I decide to delete it, the information is gone. And the goal of the internet was to be able to reach information where it is issued, even if there is trouble in the communication pipes.

However, on top of this internet came a series of tools like search engines (“Google”) and centralized social networks (“Facebook”). Now information is copied, duplicated, reproduced, partly because the digital nature of the medium allows that with ease, but also because these services deliberately concentrate information that was otherwise spread out. Google concentrates (part of) the information in its own datacenters in order to extract other types of information and serve searches faster. Facebook (and other centralized social networks) asks users to voluntarily keep their (private) information in its own data repository. And apparently the NSA is also building its own database about us at its premises.

In my opinion, privacy issues already existed whenever we shared information before (what do you share? with whom? in which context? …). But the duration of information is now becoming an issue in itself.

Is it so difficult to maintain a free RSS reader?

A few months ago Google decided to retire Google Reader (it stopped working on July 1st, 2013). As it was simple, effective and good-looking, a lot of people complained about its demise. A few days ago The Old Reader, one of the most successful replacements for Google Reader, also announced it would close its gates and keep only early registered users. And today Feedly, another successful alternative, announced it is introducing a pro version at 5.00 USD per month.

One of the reasons often evoked is the difficulty for these relatively small projects (small before the Google Reader demise, at least) to handle the many users who migrated to their platforms: difficulties in terms of hardware resources, but also human resources, finances, etc.

So, to answer my own question: yes, it looks like it’s difficult to maintain a free RSS reader with a large number of users. And free software alternatives like Tiny Tiny RSS, pyAggr3g470r or Owncloud can be difficult for users to install (and especially to maintain – the same types of difficulties: the need for a host, technical skills, time, money (even if at a different scale), …).

Two thoughts on this. First, people are used to free products on the internet (count me among them), and we take for granted that services on the web are and will remain free. RSS and its associated readers were a great invention to keep track of information coming from various sources. However, with the explosion in the number of these sources, is RSS still a valid tool? One solution is to restrict ourselves to a few carefully selected sources of information. The other is to imitate statistics: summary statistics exist for raw data, and datamining should become as easy to use for raw information (although I don’t think datamining is as easy as summary statistics yet).

Which leads me to my second thought: aren’t these just signs of the end of RSS as we know it? People started thinking about this because a giant web service provider removed its “support” for RSS. What if it is simply the end of RSS because it is no longer adapted to “modern” use?

Let me try a comparison. E-mail is an older system than RSS, and it is still there. It serves another purpose: one-to-one or one-to-few communication. But since its origin, e-mail clients have tried to innovate by adding features, among which is automated classification of e-mail. Spam filters have existed for a long time. Rules can be defined in most e-mail clients. GMail (again from Google) now classifies your e-mail into “Priority”, “Social”, etc. These tools help us de-clutter our inbox and keep only relevant e-mails in front of us when we need them. I think RSS would benefit from similar de-cluttering/summarizing tools. We just need to find / invent them.

Happy New Year 2012!

I wish you a very happy New Year 2012! Lots of things have happened in the six years since I started this blog, and lots of things happened in this last year too. I’m sure it is the same in your life. I hope you will make lots of new discoveries in 2012, and have a healthy and strong life, full of happiness!

If I look back, the top 5 posts this year were:

  1. Human Development Index 2011
  2. Adobe Flash Player update: qui fait le malin tombe dans le ravin
  3. Aaron Swartz versus JSTOR (btw there isn’t any news about this case since then)
  4. Yesterday was International Day of Older Persons
  5. Today is World Population Day

I missed a lot of things recently, like the closing ceremony of the International Year of Chemistry (December 1st), the UN World AIDS Day (December 1st), the World Day for Bacterial Resistance Awareness (November 18th) and the UN World Diabetes Day (November 14th). Maybe next year …

There is no point writing down the top 5 keywords that led to this blog: they are all related to the HDI (Human Development Index).

Although I like to read other people’s predictions for 2012 (and the coming years), I won’t make any: it’s up to you to act and make what you want part of 2012 🙂 Happy New Year!

No more Read More!

Just a little post to write about how much I hate those “Read More” sentences in blog posts!

Grrr, again a disguised "Read More"! This post has a very low information content as presented.

“Read More” is a way to cut your blog post in two: one part that will be shown in your blog’s RSS feed and on your front page, and another part that will only be read by those who click on “Read More”. A variant of this is the […] (as shown above).

Most of the time, I read the blogs I subscribe to via an RSS reader (or news aggregator), whenever I have time, and usually grouped by whatever interests me at that moment. A central, organized place like the RSS reader is like a personalized newspaper: on a certain topic, just when you need it.

Now, if there is a “Read More” after 2 or 3 lines of text, I usually don’t know anything about the quality of the remaining article. Either I click, launch my web browser (wait, wait, wait …) and finally read the rest of the article – but it’s very rare that I do that – or I just go to the next article. Most of the time, the next article is by someone else, on another blog (items being sorted by topic and time in my reader). After a few disappointments of finding a “Read More”, I just cancel my subscription to the RSS feed, unless the information given there is very, very interesting.

I know that “Read More” directs more traffic to your website. I know that more traffic makes you earn money (if you serve ads), can potentially make you earn money (if you sell something), or simply gives you more details about your audience. I know that “Read More” allows you to “shorten” long blog posts so your front page is not 10 pages long.

But in the end, the goal is to be read. If you write well, have a good product and/or don’t annoy people with unnecessary “clicks”, people will come to your website anyway and potentially buy your stuff. Imho, simplifying readers’ lives by giving them all the information they want, in the format they want, regardless of the channel (RSS, e-mail alert or website), matters more than any of those reasons. Don’t force page views, it’s annoying!

An update on JoVE

Three years ago, I wrote about JoVE, the Journal of Visualized Experiments. JoVE was a peer reviewed, open access, online journal devoted to the publication of biological research in a video format. I recently discovered that, since 2009, JoVE is now just a peer reviewed, online journal devoted to the publication of biological research in a video format. You can debate at length whether JoVE ever was Open Access (as I thought) or not. I just think it’s sad, although I understand their motives: in a recent exchange with them, they wrote that they “handle most production of our content [themselves] and it is a very very costly operation”.

The recent exchange I had with JoVE was about another previous post describing a way to store the videos locally, as anyone would do with Open Access articles in PDF format. I was unaware of two things:

  1. JoVE dropped the “Open Access” wording, as I wrote above (however, there is still an option to publish a video in free access for a higher fee, described as “Open access” in the About section for authors);
  2. the “trick” still worked (people at JoVE seemed aware of that, and I saw similar descriptions of the trick elsewhere).

Unfortunately, this trick will not work anymore in the coming weeks since they will “do token authentication with [their] CDN”. JoVE will remain for me a very interesting journal, with videos of quality and without any equivalent yet (SciVee doesn’t play in the same playground, and I wonder why Research Explainer missed the comparison in their 2010 interview).

I was then wondering what the impact of this decision could have been on the number of videos published in JoVE as free access. I didn’t find any statistics related to this on the JoVE website (unrelated thought: I like the way BioMed Central gives access to its whole corpus). I then relied on PubMed to find all the indexed articles from JoVE, and on its classification of “Free Full Text” (i.e. copied to the PubMed Central website, including the video). At the time of writing (August 2011), out of a total of 1191 indexed articles, 404 are “Free Full Text”. This is nearly 34% of all JoVE articles. When you split this by year since 2006 (when JoVE went online), you obtain the following table and chart:
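For what it’s worth, the per-year counts can be retrieved programmatically through NCBI’s E-utilities. A minimal sketch of how I would build the queries (the journal name and filter follow PubMed’s search syntax; actually fetching each URL and parsing the returned `<Count>` element, which needs network access, is only outlined in a comment):

```python
from urllib.parse import urlencode

# Base URL of NCBI's esearch E-utility (returns the number of matching records).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def jove_query(year, free_only=False):
    """Build an esearch URL counting JoVE articles published in a given year."""
    term = f'"J Vis Exp"[Journal] AND {year}[pdat]'
    if free_only:
        # PubMed's filter for articles whose full text is freely available.
        term += " AND free full text[Filter]"
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "rettype": "count"})

# Fetching each URL (e.g. with urllib.request.urlopen) and reading the
# <Count> element for 2006-2011, with and without free_only, would
# reproduce the table below.
url = jove_query(2008, free_only=True)
```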

Year  All articles  Free Full Text articles  Note
2006            18                       18  Full free access
2007           127                      127  Full free access
2008           115                       87
2009           217                      118  Introduction of Closed Access
2010           358                       42
2011           356                       12  So far (August 2011)
2011           534                       18  Extrapolation to the full year, keeping the same proportion

Total number of articles and free full texts in JoVE
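The extrapolated row in the table is simple proportional scaling from the first eight months of 2011 (January to August) to the full twelve; as a quick check of the arithmetic:

```python
# Counts observed in PubMed up to August 2011 (8 of 12 months).
articles_so_far, free_so_far, months = 356, 12, 8

# Proportional extrapolation to the full year, keeping the same rate.
articles_full_year = round(articles_so_far / months * 12)  # -> 534
free_full_year = round(free_so_far / months * 12)          # -> 18

print(articles_full_year, free_full_year)
```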

As we can see on the left chart, plotting the total number of articles in JoVE -vs- time, there has been a steady increase in the number of articles since 2006. This tends to show that more and more scientists enjoy publishing videos. It would be nice to have access to JoVE’s statistics in order to see whether there is the same increase in the overall number of views of all videos. With “web 2.0” and broadband access in universities, I guess we would see this increase.

However, as we can see on the right chart, plotting the percentage of JoVE “Free Full Texts” in PubMed -vs- time, there has been a dramatic decrease in the percentage of Free Full Texts in JoVE since 2008-2009. Fewer and fewer videos are published and available for free in PubMed Central. This is unfortunate for readers without a subscription. It may also be unfortunate for the publisher, since fewer and fewer authors pay the premium for free access over time. But since authors also pay for closed access, there is certainly a financial equilibrium.

Some methodological caveats … The PMC Free Full Texts are not necessarily in free access on the JoVE website (and vice-versa; all the ones I checked are, but I didn’t check all of them!). This might explain why there is already a reduction of Free Full Texts in PMC in 2008, while JoVE closed its journal in April 2009. I assumed the same proportion of free articles published until the end of 2011 as in the beginning of 2011; this might not be the case (let’s see in January 2012; this also leads to the question: “is there a seasonal trend in publishing in JoVE?”).

What I take as an (obvious) message is that if authors can pay less for the same publication, they will, regardless of how accessible and affordable the publication is for the reader. I don’t blame anyone. But I can’t help thinking the Open Access model is better for universal access to knowledge.

Photo credit: Sorry We’re Closed by bluecinderella on Flickr (CC-by-nc-sa)

Aaron Swartz versus JSTOR

Aaron Swartz, a 24-year old hacker, was recently indicted on data theft charges for downloading over 4 million documents from JSTOR, a US-based online system for archiving academic journals. Mainstream media (Reuters, Guardian, NYT, Time, …) reported this with a mix of facts and fiction. I guess that the recent attacks of hacking groups on well-known websites, and the release on the internet of the data they stole, gave this story some spice.

First, I really appreciate what Aaron Swartz did and is currently doing. From The Open Library, web.py and RSS to the Guerilla Open Access Manifesto and Demand Progress, he brought a lot to the computer world and to the awareness of knowledge distribution.

Other blogs around the world are already talking about this, sometimes standing up for him. I especially liked The Economics of JSTOR (John Levin), The difference between Google and Aaron Swartz (Kevin Webb) and Careless language and poor analogies (Kevin Smith). I also encourage you to show your support for Aaron, as I think he’s only the scapegoat for a bigger process …

I also think Aaron Swartz went too fast. If you do the maths (see the appendix below), the download speed was approximately 49Mb per second. Even on a crowded network like MIT’s, this continuous amount of traffic coming from a single computer (or a few, if you forge your addresses) is easily spotted. I understand he might have been in a hurry, given that his access was not fully legal (although I think it initially was). It was the best thing to do if he wanted to collect a maximum number of files in the shortest period of time.

This led me to wonder what the goal behind this act was.

People stated it was his second attempt at downloading large amounts of data (which is not exactly true), depicting him as a serial perpetrator. Others stated that his motives were purely academic (text-mining research, JSTOR Data For Research being somewhat limited). One can also think of an act similar to those of Anonymous or LulzSec, which were in the press recently. Or money, maybe (4*10^6 articles at an average of $15 per article makes $60 million), although this seems highly unlikely. The simple application of his Guerrilla Open Access Manifesto?

What is also puzzling me is JSTOR’s goal. It constantly repeats that it supports scholarly work and access to knowledge around the world. In its news statement, it says the decision to prosecute Aaron Swartz was not its own but the US Attorney’s Office’s. But at the same time, it assures it has secured “the content” and made sure it will not be distributed. And the indictment doesn’t contain anything related to intellectual property theft: the only portion related to the content is a fraudulent access to “things of value”.

I think one of the issues JSTOR has is that it doesn’t actually own the material it sells to scientists. The actual publishers dictate what JSTOR can digitize and what it can’t. And unfortunately, they only see these papers as “things of monetary value”.

However, these things are actual scientific knowledge, usually from a distant past and usually without any remaining copyright. Except for the cost of digitizing and building the search engine database (both provided for free by Google Books and Google Scholar, or by the Gutenberg project in another area), all the costs related to the dissemination of these papers were already covered, usually long ago. The irony is that some of the papers behind the JSTOR paywall are sometimes even freely available elsewhere (at institutions’ and societies’ repositories, e.g.).

It wouldn’t have cost much to put all these articles under an Open Access license while transferring them to JSTOR. JSTOR would then charge for the actual digitizing work but wouldn’t have to “secure the content” against redistribution, since redistribution would then be allowed. The not–for–profit service provided by JSTOR would then benefit knowledge instead of being one additional roadblock to it.

JSTOR, don’t become the RIAA or the MPAA of old scholar content!

Appendix. The maths

In “retaliation”, Gregory Maxwell posted 32Gb of data containing 18,592 JSTOR articles on the internet. This is an average of 1.762Mb per JSTOR article. Aaron Swartz downloaded 4*10^6 articles from JSTOR, which represents approximately 6.723Tb of data. That took him 4 days (September 25th and 26th, and October 8th and 9th, 2010), at an average of 1,721.17Gb per day. If we assume the computer was working 10 hours per day (he had to plug and unplug the computer during working hours), the average download speed is 172Gb per hour, or 2.869Gb per minute, or 48.958Mb per second.
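The whole chain of conversions can be checked in a few lines (assuming, as above, binary units where 1Gb = 1024Mb, and 10 active hours per download day):

```python
# Reproduce the appendix arithmetic step by step.
avg_article_mb = 32 * 1024 / 18_592        # ~1.762 Mb per article (32Gb / 18,592 articles)
total_mb = 4_000_000 * avg_article_mb      # ~6.723 Tb downloaded in total
per_day_gb = total_mb / 1024 / 4           # 4 download days -> ~1,721 Gb/day
per_hour_gb = per_day_gb / 10              # 10 hours per day -> ~172 Gb/hour
per_second_mb = per_hour_gb * 1024 / 3600  # ~48.958 Mb/second

print(round(per_second_mb, 3))
```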

Photo credit: Boston Wiki Meetup by Sage Ross on Flickr (CC-by-sa).