Category: Websites

Jadoo and static website generators

Coming back from holidays, I fired up my RSS reader and, among many interesting posts, I found this one from Smashing Magazine about static website generators being the Next Big Thing on the web (and a follow-up deep-diving into four of them).

The first paper describes how the web started as something static, became all dynamic and is progressively coming back to something more static, at least for some specific tasks. The interesting thing is that the author also describes the pros and cons of each stage and why the web jumped to the next level.

While reading this, I couldn’t help thinking of Jadoo, a pet project I started in 2007. Its goal was to get rid of the complexity and the number of resources required to run a dynamic blog system. Following some notes from Alexandre Dulaunoy, it was written in Python and already used concepts now hidden under buzzwords 😉 like templating and a rudimentary meta-data organization. At that time, things like Markdown, asset management, caching or Github either didn’t exist or were nowhere near as widespread as today. There is an initial post and an update – then I gave up (reasons inside). Note that the drawbacks I listed at that time are still drawbacks of current static website generators (manual updates and local-only editing). All these ideas in 2007, one year before Jekyll … 😉

P.S. The irony is that posts about Jadoo were later transferred to WordPress – and this blog is also currently hosted on WordPress!

Privacy -vs- information conservation time

In my opinion, privacy issues are a by-product of information conservation times approaching infinity.

For centuries, humans relied on their own type of memory. When information reaches the brain, it is stored in short-term memory. When relevant and/or repeated, it is gradually consolidated into long-term memory (this is, roughly, the process).

Schematic memory consolidation process

The invention of oral transmission of knowledge, then written transmission (incl. Gutenberg) and, to a certain extent, the internet each successively increased how long information shared with others is retained. The switch from oral to written transmission of knowledge also sped up the dissemination of information and gave it a fixed, un- (or less-) interpreted form.

Duration of information over time

With the internet (“1.0”, to use a buzzword) the duration of information is also extended but somehow limited; it was merely a copy of printing (except for the speed of transmission). Take this blog, for instance: information stored here will stay as long as I maintain or keep the engine alive. The day I decide to delete it, the information is gone. And the goal of the internet was to be able to reach information where it is issued, even if there is trouble in the communication pipes.

However, on top of this internet came a series of tools like search engines (“Google”) and centralized social networks (“Facebook”). Now information is copied, duplicated, reproduced, partly because the digital nature of the medium allows that with ease, but also because these services deliberately concentrate information that was otherwise spread out. Google concentrates (part of) the information in its own datacenters in order to extract other types of information and serve searches faster. Facebook (and other centralized social networks) asks users to voluntarily keep their (private) information in its own data repository. And apparently the NSA is also building its own database about us at its premises.

In my opinion, privacy issues already existed whenever we shared information before (what do you share? with whom? in which context? …). But the duration of information is now becoming an issue in itself.

Is it so difficult to maintain a free RSS reader?

A few months ago Google decided to retire Google Reader (it stopped working on July 1st, 2013). As it was simple, effective and good-looking, a lot of people complained about its demise. A few days ago The Old Reader, one of the most successful replacements for Google Reader, also announced it would close its gates and keep only early registered users. And today Feedly, another successful alternative, announced it is introducing a pro version at 5.00 USD per month.

One of the reasons often evoked is the difficulty for these relatively small projects (small before the Google Reader demise, at least) to handle the many users who migrated to their platforms: difficulties in terms of hardware resources, but also human resources, finances, etc.

So, to answer my own question: yes, it looks like it’s difficult to maintain a free RSS reader with a large number of users. And free software alternatives like Tiny Tiny RSS, pyAggr3g470r or Owncloud can be difficult for users to install (and especially to maintain – the same types of difficulties: the need for a host, technical skills, time, money (even if at a different scale), …).

Two thoughts on this. First, people are used to free products on the internet (count me among them), and we take for granted that services on the web are and will remain free. RSS and its associated readers were a great invention to keep track of information coming from various sources. However, with the explosion in the number of these sources, is RSS still a valid tool? One solution is to restrict ourselves to a few carefully selected sources of information. The other is to imitate statistics: summary statistics exist for raw data, and datamining should become as easy to use for raw information (although I don’t think datamining is as easy as summary statistics yet).

Which leads me to my second thought: aren’t these just signs of the end of RSS as we know it? People started thinking about this because a giant web service provider removed its “support” for RSS. What if it is simply the end of RSS because it is no longer adapted to “modern” use?

Let me try a comparison. E-mail is an older system than RSS, and it is still there. It serves another purpose: one-to-one or one-to-few communication. But since its origin, e-mail clients have tried to innovate by adding features, among which is automated classification of e-mail. Spam filters have existed for a long time. Rules can be defined in most e-mail clients. GMail (again from Google) now classifies your e-mail into “Priority”, “Social”, etc. These tools help us de-clutter our inbox and keep only relevant e-mails in front of us when we need them. I think RSS would benefit from similar de-cluttering/summarizing tools. We just need to find / invent them.

Happy New Year 2012!

I wish you a very happy New Year 2012! Lots of things have happened in the six years since I started this blog, and lots of things happened in this last year too. I’m sure it is the same in your life. I hope you will make lots of new discoveries in 2012, and have a healthy and strong life, full of happiness!

If I look back, the top 5 posts this year were:

  1. Human Development Index 2011
  2. Adobe Flash Player update: qui fait le malin tombe dans le ravin
  3. Aaron Swartz versus JSTOR (btw there isn’t any news about this case since then)
  4. Yesterday was International Day of Older Persons
  5. Today is World Population Day

I missed a lot of things recently, like the closing ceremony of the International Year of Chemistry (December 1st), the UN World AIDS Day (December 1st), the World Day for Bacterial Resistance Awareness (November 18th) and the UN World Diabetes Day (November 14th). Maybe next year …

There is no point writing down the top 5 keywords that led to this blog: they are all related to the HDI (Human Development Index).

Although I like to read other people’s predictions for 2012 (and the coming years), I won’t make any: it’s up to you to act and make what you want part of 2012 🙂 Happy New Year!

No more Read More!

Just a little post to write about how much I hate those “Read More” sentences in blog posts!

Grrr, again a disguised "Read More"! This post has a very low information content as presented.

“Read More” is a way to cut your blog post in two: one part that will be shown in your blog’s RSS feed and on your front page, and another part that will only be read by those who click on “Read More”. A variant of this is the […] (as shown above).

Most of the time, I read the blogs I subscribe to via an RSS reader (or news aggregator), whenever I have time, and usually grouped by whatever interests me at that moment. A central, organized place like the RSS reader is like a personalized newspaper: on a certain topic, just when you need it.

Now, if there is a “Read More” after 2 or 3 lines of text, I usually don’t know anything about the quality of the remaining article. Either I click, launch my web browser (wait, wait, wait …) and finally read the rest of the article – but it’s very rare that I do that – or I just go to the next article. Most of the time, the next article is by someone else, on another blog (items being sorted by topic and time in my reader). After a few disappointments of finding a “Read More”, I just cancel my subscription to the RSS feed, unless the information given there is very, very interesting.

I know that “Read More” directs more traffic to your website. I know that more traffic makes you earn money (if you serve ads), can potentially make you earn money (if you sell something), or simply gives you more details about your audience. I know that “Read More” allows you to “shorten” long blog posts so your front page is not 10 pages long.

But in the end, the goal is to be read. If you write well, have a good product and/or don’t annoy people with unnecessary “clicks”, people will come to your website anyway and potentially buy your stuff. Imho, simplifying readers’ lives by giving them all the information they want, in the format they want, regardless of the channel (RSS, e-mail alert or website), matters more than any of those reasons. Don’t force page views, it’s annoying!

An update on JoVE

Three years ago, I wrote about JoVE, the Journal of Visualized Experiments. JoVE was a peer reviewed, open access, online journal devoted to the publication of biological research in a video format. I recently discovered that, since 2009, JoVE is now just a peer reviewed, online journal devoted to the publication of biological research in a video format. You can debate at length whether JoVE ever was Open Access (as I thought) or not. I just think it’s sad, although I understand their motives: in a recent exchange with them, they wrote that they “handle most production of our content [themselves] and it is a very very costly operation”.

The recent exchange I had with JoVE was about another previous post describing a way to store the videos locally, as anyone would do with Open Access articles in PDF format. I was unaware of two things:

  1. JoVE dropped the “Open Access” wording, as I wrote above (however, there is still an option to publish a video in free access for a higher fee, described as “Open access” in the About section for authors);
  2. the “trick” still worked (people at JoVE seemed aware of that, and I saw similar descriptions of the trick elsewhere).

Unfortunately, this trick will not work anymore in the coming weeks since they will “do token authentication with [their] CDN”. JoVE will remain for me a very interesting journal, with videos of quality and without any equivalent yet (SciVee doesn’t play in the same playground, and I wonder why Research Explainer missed the comparison in their 2010 interview).

I was then wondering what the impact of this decision could have been on the number of videos published in JoVE as free access. I didn’t find any statistics related to this on the JoVE website (unrelated thought: I like the way BioMed Central gives access to its whole corpus). I then relied on PubMed to find all the indexed articles from JoVE, and on its classification of “Free Full Text” (i.e. copied to the PubMed Central website, including the video). At the time of writing (August 2011), out of a total of 1191 indexed articles, 404 are “Free Full Text”. This is nearly 34% of all JoVE articles. When you split this by year since 2006 (when JoVE went online), you obtain the following table and chart:
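For what it’s worth, the per-year counts can be retrieved programmatically through NCBI’s E-utilities. A minimal sketch of how I would build the queries (the journal name and filter follow PubMed’s search syntax; actually fetching each URL and parsing the returned `<Count>` element, which needs network access, is only outlined in a comment):

```python
from urllib.parse import urlencode

# Base URL of NCBI's esearch E-utility (returns the number of matching records).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def jove_query(year, free_only=False):
    """Build an esearch URL counting JoVE articles published in a given year."""
    term = f'"J Vis Exp"[Journal] AND {year}[pdat]'
    if free_only:
        # PubMed's filter for articles whose full text is freely available.
        term += " AND free full text[Filter]"
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "rettype": "count"})

# Fetching each URL (e.g. with urllib.request.urlopen) and reading the
# <Count> element for 2006-2011, with and without free_only, would
# reproduce the table below.
url = jove_query(2008, free_only=True)
```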

Year  All articles  Free Full Text articles  Note
2006            18                       18  Full free access
2007           127                      127  Full free access
2008           115                       87
2009           217                      118  Introduction of Closed Access
2010           358                       42
2011           356                       12  So far (August 2011)
2011           534                       18  Extrapolation to the full year, keeping the same proportion

Total number of articles and free full texts in JoVE
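The extrapolated row in the table is simple proportional scaling from the first eight months of 2011 (January to August) to the full twelve; as a quick check of the arithmetic:

```python
# Counts observed in PubMed up to August 2011 (8 of 12 months).
articles_so_far, free_so_far, months = 356, 12, 8

# Proportional extrapolation to the full year, keeping the same rate.
articles_full_year = round(articles_so_far / months * 12)  # -> 534
free_full_year = round(free_so_far / months * 12)          # -> 18

print(articles_full_year, free_full_year)
```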

As we can see on the left chart, plotting the total number of articles in JoVE -vs- time, there has been a steady increase in the number of articles since 2006. This tends to show that more and more scientists enjoy publishing videos. It would be nice to have access to JoVE’s statistics in order to see whether there is the same increase in the overall number of views of all videos. With “web 2.0” and broadband access in universities, I guess we would see this increase.

However, as we can see on the right chart, plotting the percentage of JoVE “Free Full Texts” in PubMed -vs- time, there has been a dramatic decrease in the percentage of Free Full Texts in JoVE since 2008-2009. Fewer and fewer videos are published and available for free in PubMed Central. This is unfortunate for readers without a subscription. It may also be unfortunate for the publisher, since fewer and fewer authors pay the premium for free access over time. But since authors also pay for closed access, there is certainly a financial equilibrium.

Some methodological caveats … The PMC Free Full Texts are not necessarily in free access on the JoVE website (and vice-versa; all the ones I checked are, but I didn’t check all of them!). This might explain why there is already a reduction of Free Full Texts in PMC in 2008, while JoVE closed its journal in April 2009. I assumed the same proportion of free articles published until the end of 2011 as in the beginning of 2011; this might not be the case (let’s see in January 2012; this also leads to the question: “is there a seasonal trend in publishing in JoVE?”).

What I take as an (obvious) message is that if authors can pay less for the same publication, they will, regardless of how accessible and affordable the publication is for the reader. I don’t blame anyone. But I can’t help thinking the Open Access model is better for universal access to knowledge.

Photo credit: Sorry We’re Closed by bluecinderella on Flickr (CC-by-nc-sa)

Aaron Swartz versus JSTOR

Aaron Swartz, a 24-year old hacker, was recently indicted on data theft charges for downloading over 4 million documents from JSTOR, a US-based online system for archiving academic journals. Mainstream media (Reuters, Guardian, NYT, Time, …) reported this with a mix of facts and fiction. I guess that the recent attacks of hacking groups on well-known websites, and the release on the internet of the data they stole, gave this story some spice.

First, I really appreciate what Aaron Swartz did and is currently doing. From The Open Library, web.py and RSS to the Guerilla Open Access Manifesto and Demand Progress, he brought a lot to the computer world and to the awareness of knowledge distribution.

Other blogs around the world are already talking about this, sometimes standing up for him. I especially liked The Economics of JSTOR (John Levin), The difference between Google and Aaron Swartz (Kevin Webb) and Careless language and poor analogies (Kevin Smith). I also encourage you to show your support for Aaron, as I think he’s only the scapegoat for a bigger process …

I also think Aaron Swartz went too fast. If you do the maths (see the appendix below), the download speed was approximately 49Mb per second. Even on a crowded network like MIT’s, this continuous amount of traffic coming from a single computer (or a few, if you forge your addresses) is easily spotted. I understand he might have been in a hurry, given that his access was not fully legal (although I think it initially was). It was the best thing to do if he wanted to collect a maximum number of files in the shortest period of time.

This led me to wonder what the goal behind this act was.

People stated it was his second attempt at downloading large amounts of data (which is not exactly true), depicting him as a serial perpetrator. Others stated that his motives were purely academic (text-mining research, JSTOR Data For Research being somewhat limited). One can also think of an act similar to those of Anonymous or LulzSec, which were in the press recently. Or money, maybe (4*10^6 articles at an average of $15 per article makes $60 million), although this seems highly unlikely. The simple application of his Guerrilla Open Access Manifesto?

What is also puzzling me is JSTOR’s goal. It constantly repeats that it supports scholarly work and access to knowledge around the world. In its news statement, it says the decision to prosecute Aaron Swartz was not its own but the US Attorney’s Office’s. But at the same time, it assures it has secured “the content” and made sure it will not be distributed. And the indictment doesn’t contain anything related to intellectual property theft: the only portion related to the content is a fraudulent access to “things of value”.

I think one of the issues JSTOR has is that it doesn’t actually own the material it sells to scientists. The actual publishers dictate what JSTOR can digitize and what it can’t. And unfortunately, they only see these papers as “things of monetary value”.

However, these things are actual scientific knowledge, usually from a distant past and usually without any remaining copyright. Except for the cost of digitizing and building the search engine database (both provided for free by Google Books and Google Scholar, or by the Gutenberg project in another area), all the costs related to the dissemination of these papers were already covered, usually long ago. The irony is that some of the papers behind the JSTOR paywall are sometimes even freely available elsewhere (at institutions’ and societies’ repositories, e.g.).

It wouldn’t have cost much to put all these articles under an Open Access license while transferring them to JSTOR. JSTOR would then charge for the actual digitizing work but wouldn’t have to “secure the content” against redistribution, since redistribution would then be allowed. The not–for–profit service provided by JSTOR would then benefit knowledge instead of being one additional roadblock to it.

JSTOR, don’t become the RIAA or the MPAA of old scholar content!

Appendix. The maths

In “retaliation”, Gregory Maxwell posted 32Gb of data containing 18,592 JSTOR articles on the internet. This is an average of 1.762Mb per JSTOR article. Aaron Swartz downloaded 4*10^6 articles from JSTOR, which represents approximately 6.723Tb of data. That took him 4 days (September 25th and 26th, and October 8th and 9th, 2010), at an average of 1,721.17Gb per day. If we assume the computer was working 10 hours per day (he had to plug and unplug the computer during working hours), the average download speed is 172Gb per hour, or 2.869Gb per minute, or 48.958Mb per second.
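The whole chain of conversions can be checked in a few lines (assuming, as above, binary units where 1Gb = 1024Mb, and 10 active hours per download day):

```python
# Reproduce the appendix arithmetic step by step.
avg_article_mb = 32 * 1024 / 18_592        # ~1.762 Mb per article (32Gb / 18,592 articles)
total_mb = 4_000_000 * avg_article_mb      # ~6.723 Tb downloaded in total
per_day_gb = total_mb / 1024 / 4           # 4 download days -> ~1,721 Gb/day
per_hour_gb = per_day_gb / 10              # 10 hours per day -> ~172 Gb/hour
per_second_mb = per_hour_gb * 1024 / 3600  # ~48.958 Mb/second

print(round(per_second_mb, 3))
```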

Photo credit: Boston Wiki Meetup by Sage Ross on Flickr (CC-by-sa).