Tag: Privacy

Time commuting in Belgium

DISO1 – Data I Sit On, episode 1. This post is the first in a short series exploring data that I collected in the past and found interesting to look at again … (I have already posted about data I collected; see the Quantified Self tag on this blog)

Life is short and full of different experiences. One of the experiences I don’t particularly enjoy, but which is an integral part of life, is commuting. Although I tried to minimize commuting (mainly by choosing a home close to the office) and benefit(ed) from good work conditions (flexible working hours, home working, etc.), a big change occurred when I took a new opportunity, in 2015, to work in the Belgian capital, Brussels.

Debian-lover-car-jepoirrier-on-flickr
Traffic jam in Brussels – one of my pictures on Flickr (CC-by-sa)

From where I lived at that time, using public transportation was unfortunately not a viable option: it meant roughly 2 hours each way and changing at least twice between bus, train and metro. Anyway, Belgium is known for having lots of cars and I benefited from a company car. For some time I have also been interested in the Quantified Self, so I started collecting data about my daily commute.

What I wanted to see is the seasonality of commuting (I initially expected shorter commute times during school breaks), the differences between leaving for work after driving the children to school and leaving without driving them, … There is also an extensive literature on the impact of commuting on quality of life …

So, how did I do that?

The route I usually took, between my home at the time (in Wavre) and my office at the time (in Brussels), both in Belgium, is 28 km long, and the fastest estimate I ever saw on Google Maps for this drive is about 20-25 minutes.

I took note of the following elements in whatever default note-taking app was on my phone at that moment (Keep on Android, Notes on iOS). The first field in each row is the date in a %y%m%d format, i.e. year, month and day of month as zero-padded decimal numbers, 2 digits each. The second field is the start time in a %H%M format, i.e. hours (24-hour clock) and minutes, also as zero-padded decimal numbers, 2 digits each. Start time is defined as the moment I get into my car at home, in the morning. The third field is the arrival time (same format as start time), defined as the moment I stop the engine at work. The fourth and fifth fields are the start and arrival times when I go back home, defined and formatted the same way, mutatis mutandis. Any missed start/arrival time is marked as “na” or “NA”. This happens, for instance, when I leave the office but stop to meet a client (or, more prosaically, to do grocery shopping) before coming back home. I may have missed one or two whole days at most. The data is on GitHub.
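As an illustration of the format described above, here is a minimal Python sketch that parses one row of such a log (the original analysis was done in R, so this is not the code actually used). The comma separator, the function names and the sample row are assumptions made for the example; the row is made up and does not come from the real dataset.

```python
from datetime import datetime

def parse_time(date_str, time_str):
    """Combine a %y%m%d date and a %H%M time; return None for 'na' / 'NA'."""
    time_str = time_str.strip()
    if time_str.lower() == "na":
        return None
    return datetime.strptime(date_str + time_str, "%y%m%d%H%M")

def parse_row(line, sep=","):
    """Parse one row: date, morning start, morning arrival, evening start, evening arrival.

    The separator is an assumption; adapt it to the actual file.
    """
    fields = [field.strip() for field in line.split(sep)]
    date_str, times = fields[0], fields[1:5]
    return [parse_time(date_str, t) for t in times]

def minutes_between(start, end):
    """Duration in whole minutes, or None if either time is missing."""
    if start is None or end is None:
        return None
    return int((end - start).total_seconds() // 60)

# Made-up row for illustration only (not from the real log)
row = parse_row("160105,0745,0810,1700,NA")
morning_commute = minutes_between(row[0], row[1])  # 25 minutes
evening_commute = minutes_between(row[2], row[3])  # None: arrival time is missing
```

Returning None for missing values keeps incomplete journeys out of the duration statistics without discarding the rest of the day.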

On a daily basis, the little game is to try to figure out which lane is the fastest, if there is a pattern in the journey that makes it faster (I think there is). However, there are so many little things to track in this game that I did not track these small differences. The journey is assumed to take more or less the same route.

In the end, the complete log is saved on my computer and analysed in R (version 3.3.2). The typical measures I’m interested in are departure/arrival times over time, commute duration over time, commute duration per month, per day of the week or per season, … for both the morning and afternoon journeys where applicable. Some fun measures should be the earliest I left for work, the latest I arrived at work, the earliest I left work, the latest I left work, the shortest journey ever (to compare with the Google estimate) and the longest journey ever …

An unintended measure here is the amount of time actually spent in the office (on a side note, this is different from productivity – but I haven’t found any unambiguous or flawless measure of productivity so far …). Some interesting variations could be the average and median duration of my work days, the shortest or longest day I had, … (I don’t know if my former employer would be happy or angry to see these results 😉 but note this doesn’t take into account the numerous times I worked from home, even in the evening after having worked the whole day in the office …).

In theory, the fastest I could go is an average of 84 km/h (28 km in 20 minutes, according to Google Maps, so this reflects traffic conditions, not maximum speed limits). In practice, this is a whole different story …
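The arithmetic behind that figure can be checked with a short Python sketch. Only the 28 km distance and the 20-25 minute estimates come from the post; the function name is an assumption made for the example.

```python
DISTANCE_KM = 28  # route length stated in the post

def average_speed_kmh(duration_minutes, distance_km=DISTANCE_KM):
    """Average speed in km/h for a trip of the given duration in minutes."""
    return distance_km * 60 / duration_minutes

print(average_speed_kmh(20))  # 84.0
print(average_speed_kmh(25))  # 67.2
```

The same function can be applied to any logged commute duration to see how far real trips fall below that theoretical average.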

In a bit more than a year of collected data:

  • the earliest I left home was 6.11 and the latest 10.11;
  • consequently, the earliest I arrived at work was 6.32 and the latest 10.36;
  • the shortest trip to work was 18 minutes and the longest one was 160 minutes (it was on March 22, 2016, the day of the Brussels airport bombing, and the office is close to the airport – I still remember);
  • the earliest I left work was 12.34 (I assume half-day of holidays) and the latest 21.24 (I assume lots of work then);
  • consequently, the earliest I arrived back home was 12.59 and the latest 21.43;
  • the shortest trip back home was 7 minutes (there must be some input error here!!!) and the longest trip was 128 minutes (nothing surprising here, with Brussels traffic jams).

Finally, the shortest stay in office was 242 minutes (4 hours and 2 minutes) – it was that half-day of holidays. And the longest stay in office was 754 minutes (12 hours and 34 minutes).
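The time spent in the office is simply the evening departure time minus the morning arrival time. A minimal Python sketch of that computation, and of the conversion from minutes to hours and minutes, could look like this; the function names and the sample times are assumptions made for the example, and only the 242 and 754 minute totals come from the post.

```python
def to_minutes(hhmm):
    """Convert a %H%M string such as '0832' to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])

def office_stay_minutes(arrival_hhmm, departure_hhmm):
    """Minutes between arriving at work and leaving work on the same day."""
    return to_minutes(departure_hhmm) - to_minutes(arrival_hhmm)

def hours_and_minutes(total_minutes):
    """Format a duration in minutes as hours and minutes."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} h {minutes:02d} min"

print(hours_and_minutes(242))  # 4 h 02 min
print(hours_and_minutes(754))  # 12 h 34 min

# Hypothetical arrival and departure times, for illustration only
print(office_stay_minutes("0830", "1700"))  # 510
```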

As always, these things are nice when rendered as graphs …

180215-BxlAllTrafficPoints

A first note is that none of these graphs show any seasonality in the data. At first, I thought I would go faster during school holidays – but it was more a feeling than anything else, as the data show. And although the time at work varied widely from day to day, the average time spent at work seems to be pretty constant over the year, which surprised me:

180215-BxlTimeSpentAtWork

Finally, the time spent in car depending on the departure time is interesting:

180215-BxlTimeSpentInCar

Going to work was clearly split into 2 periods: leaving home (“Start Time”) before 8.30 and after 8.30. That’s because either I went early (and avoided the morning rush hour) or I drove the kids to school and drove to work at the end of rush hour. But although I tried to minimize the journey, the journey after driving the kids to school was still taking more time.
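The split around 8.30 can be sketched in Python by grouping morning trips on their start time and comparing the typical duration of each group. The trip values below are placeholders for illustration only, not values from the real log, and the function names are assumptions.

```python
from statistics import median

THRESHOLD = 8 * 60 + 30  # 8.30 expressed in minutes since midnight

def to_minutes(hhmm):
    """Convert a %H%M string to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])

def split_by_departure(trips, threshold=THRESHOLD):
    """Split (start_hhmm, duration_minutes) pairs into early and late durations."""
    early = [duration for start, duration in trips if to_minutes(start) < threshold]
    late = [duration for start, duration in trips if to_minutes(start) >= threshold]
    return early, late

# Placeholder trips for illustration only (not from the real dataset)
trips = [("0640", 25), ("0715", 30), ("0845", 45), ("0900", 40)]
early, late = split_by_departure(trips)
print(median(early), median(late))  # 27.5 42.5
```

The median is used here rather than the mean so that a single exceptional day does not dominate the comparison.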

In the evening, going home became a shorter trip if I was able to delay it: the later I came back, the shorter the trip. (However, if I didn’t drive the kids to school in the morning, the deal was that I would pick them up in the afternoon – fortunately, afterschool care is cheap in Belgium).

All this to come to the quality of life … I didn’t measure anything related to quality of life. I just remember that the first few weeks were very tiring. However, this commuting factor should be added to other tiring factors: learning a new job, adjusting to a new environment, etc. But there is a body of scientific work looking at the effect of commuting on quality of life (I really like this paper as a starter [1], probably because it was published during that period): fatigue, stress, reduced sleep time, heart disease, absenteeism, BMI (weight), … are all linked – in a way – to commuting (either driving or just sitting in public transport).

[1] Künn‐Nelen, A. (2016) Does Commuting Affect Health? Health Econ., 25: 984–1004. doi: 10.1002/hec.3199

And a last point: privacy. This data is from 2015-2016. People who know me (even former colleagues!) know where I worked. And even without knowing me, you know when I leave home, when I leave the office, my pattern of organization, etc. Do I want that? Part of the answer is that I only post this data now, 2-3 years later. On the other hand, here is another free, small dataset!

Next steps? I’m continuing to track my journeys to work, even now that we have moved to the USA. For privacy reasons, I will not publish those data immediately. But it will be interesting, later, to compare the different patterns and try to understand at least some of the differences … It would also be interesting to give more time to this small experiment and, for instance, try to capture any impact on mood, productivity, … But this would become a whole different story!

2013 in review: how to use your users’ collected data

Within a few days of each other, I received two very different reviews of data collected by users of “activity trackers”.

Jawbone_20140117-075010b

The first one came from Jawbone (although I don’t own the UP, I might have subscribed to one of their mailing lists earlier) and is also publicly available here. Named “2013, the big sleep”, it is a kind of infographic showing how public (and mostly American) events influenced the sleep of the “UP Community”. Here data about all (or at least a lot of) UP users were aggregated and shown. This is Big Data! This is a wonderful, quantitative insight into the impact of public events on sleep! But this is also a public display of (aggregated) individual data (something that UP users most probably agreed to by default when accepting the policy, sometime around when they first used their device).

The second one came from Fitbit, also via e-mail. It stated how many steps I took in total as well as my most and least active periods / days of 2013. At the bottom there was a link to a public page comparing distances traveled in general with what they could mean in the animal kingdom (see below or here). This is not Big Data (although I am sure Fitbit has access to all these data). But at the same time (aggregated) individual data are not shared with the general public (although here again I am sure a similar policy applies to Fitbit users).

Different companies, different ways to handle the data … I hope people will realise the implications of sharing their data in an automated way with such centralized services.

Fitbit2_20140117-075745

Privacy -vs- information conservation time

In my opinion, privacy issues are a by-product of information conservation times becoming infinite.

For centuries and more humans were used to their own type of memory. When information reaches the brain, it is stored in short-term memory. When relevant and/or repeated, it is gradually consolidated into long-term memory (this is roughly the process).

Schematic memory consolidation process

The invention of oral transmission of knowledge, written transmission (incl. Gutenberg) and, to a certain extent, the internet all successively increased how long information shared with others is retained. The switch from oral to written transmission of knowledge also sped up the dissemination of information and reinforced its fixed, un-(or less-) interpreted nature.

Duration of information over time

With the internet (“1.0”, to use a buzzword) the duration of information is also extended but still somewhat limited; it was merely a copy of printing (except for the speed of transmission). Take this blog, for instance: information stored here will stay as long as I maintain it or keep the engine alive. The day I decide to delete it, the information is gone. And the goal of the internet was to be able to reach information where it is issued, even if there are troubles in the communication pipes.

However, on top of this internet came a series of tools like search engines (“Google”) and centralized social networks (“Facebook”). Now this information is copied, duplicated and reproduced, partly because the digital nature of the medium makes that easy, but also because these services deliberately concentrate information that was otherwise spread out. Google concentrates (part of) the information in its own datacenters in order to extract other types of information and serve searches faster. Facebook (and other centralized social networks) asks users to voluntarily keep their (private) information in its own data repository. And apparently the NSA is also building its own database about us at its premises.

In my opinion, whenever we were sharing information before, privacy issues were already there (what do you share? to whom? in which context? …). But the duration of information is now becoming an issue.

Google+ API started

Logo Google Plus

Google+ (G+) is a social networking and identity service operated by Google. It started a few months ago as a closed service from which you can’t get any data out and where interaction (read/write/play) is only possible via the official interfaces (i.e. the web and Android clients). Google promised to release a public API and it partly did so tonight, here.

As they stated, “this initial API release is focused on public data only — it lets you read information that people have shared publicly on Google+” (emphasis is mine). So you can already take most of your data out of G+ (note that it was already possible to download your G+ stream with Takeout from the Google Data Liberation Front). As usual, it’s a RESTful API with OAuth authorization. It comes with its own rules and terms (it could be interesting to add to GooDiff). The next step would be to be able to directly write something on Google+.

I have only tried the examples so far. But unfortunately I got an authorization error. I won’t go further tonight but their error screen is interesting 🙂

Error 400 screen - Bad request - Google+ API

A question of a few centimetres

It’s funny to see that in a short span of time, a few centimetres can make a difference. This month, Austria authorised Niko Alm to wear a pasta strainer as “religious headgear” on his driving-licence (BBC). This month too, Belgian law banned women from wearing the full Islamic veil in public (BBC).

Well, the Belgian law doesn’t formally forbid the Islamic veil as such, although it is often referred to as the “anti-burqa law”. The exact terms are:

Seront punis d’une amende de quinze euros à vingt-cinq euros et d’un emprisonnement d’un jour à sept jours ou d’une de ces peines seulement, ceux qui, sauf dispositions légales contraires, se présentent dans les lieux accessibles au public le visage masqué ou dissimulé en tout ou en partie, de manière telle qu’ils ne soient pas identifiables.
Toutefois, ne sont pas visés par l’alinéa 1er, ceux qui circulent dans les lieux accessibles au public le visage masqué ou dissimulé en tout ou en partie de manière telle qu’ils ne soient pas indentifiables et ce, en vertu de règlements de travail ou d’une ordonnance de police à l’occasion de manifestations festives.

The automated Google translation gives:

Shall be punished by a fine of fifteen to twenty-five euros euros and imprisonment from one day to seven days or one of these penalties, who, unless required by law, occur in places accessible to public masked or concealed in whole or in part, in such a way that they are not identifiable.
However, are not covered by paragraph 1, those that circulate in places accessible to the public masked or concealed in whole or in part in such a way that they are not identifiable and that, under regulations of work or Order of Police on the occasion of festivities.

This is even scarier: the law basically asks everyone to clearly show her/his face in public spaces, except for work (e.g. construction workers with dust protection) or when the police explicitly authorise it during events. If it’s too cold in winter and your hood is hiding part of your face, you may be arrested. On top of that, add the increasing number of CCTV cameras in operation in Belgium as well as some good face recognition software and you have a tightly controlled society. 😦

Photo credits. Left: Masked by Katayun on Flickr (CC-by-nc-sa). Right: Heiliger Führerschein (Episode #6 – Das Finale) on Niko Alm’s blog.

Facebook updates: nothing to fuss about

So Facebook, the current paramount social website, updated its website with the possibility to download all your data (among other updates). I don’t see why people need to fuss about this.

Although it may be useful, being able to retrieve your data is not the important point. After all, if your pictures are on Facebook, they were previously on your computer / camera / whatever. So you should already have them (and Facebook sends them to you in a zip file? what a feature!). Unless Facebook also allows you to download data about you that was uploaded by others; this is a bit more interesting from a sociological / academic point of view (what has been posted about you). And then? A “big” step towards interoperability between social websites? Are you joking? For interoperability, you need 2 partners and, to my knowledge, no other websites (social or not) currently offer the possibility to upload data from Facebook. Will it arrive? I’m sure of it. Is it secure? I doubt it: nothing is 100% secure in IT, and Facebook is no exception. But this is still not important!

The important thing would have been to have total control over your data. The ability to post data. The ability to effectively remove data (Facebook policy explicitly states that nothing is necessarily physically erased, not even your account if you decide to close it!). The ability to remove data about you posted by others. The ability to control data posted about your children. The ability to have real privacy.

So, why do I blog this? I don’t really get why people are so excited about this feature. Meanwhile, Oxford is building a new library [1, 2]; the why and how have nothing to do with the topic of this post, but this is news!

Bodleian Library: Divinity School
Photo credit: Bodleian Library: Divinity School by Beth Hoffmann on Flickr (CC-by-nc-sa)

Belgian State Security report 2008

When I first opened the Belgian State Security Report 2008 (PDF in French or in Dutch), I had a feeling of déjà vu: the cover picture is in fact a part of the Great Court of the British Museum in London, UK. Strange for a report on Belgian security and surveillance …

The British Museum as illustration for a Security report
Comparison between an actual photo of the British Museum Great Court (left, by Guillermo Viciano, under CC-by-sa) and the cover of the Belgian State Security Report 2008 (right)

Then I saw it’s only a light version for the web, not the full version. I had a look at the Justice website and the Security web page but I couldn’t find the original version (if you have the full version, I’m interested).

The report summarizes all the activities carried out by the Security in 2008, including the groups, countries and activities watched, a report on the cases where it was involved (Belliraj, Benali, Trabelsi cases, a.o.) and a broad view of what it did to check people’s backgrounds, protect others and check various accreditations.

The most interesting part for me, however, was a short description of a bill about data collection methods by the Security. This bill was submitted to the Belgian Senate in December 2008 and was recently adopted (the full text is here, in French). It’s now submitted to the Belgian king for signature.

Briefly, this bill modifies an existing law from 1998 and, among other things, distinguishes ordinary data collection methods from specific ones (articles 18/7 and 18/8) and exceptional ones (articles starting from 18/9). As expected, the bill allows the use of techniques to intercept and read private communications between persons. The bill also allows entering computer systems, removing protections, installing spyware, decrypting and collecting data (but it does not allow their destruction).

All these methods are controlled post hoc by two different bodies: an ad hoc administrative commission composed of magistrates (renewed each year by the king following a suggestion by the government) and a permanent “R” committee. Specific and exceptional methods need to be approved first by the administrative commission, but there is always the possibility for the Security hierarchy to bypass this and send a written notice to the commission later on. How many times can this last step be forgotten?

Although it’s nice to have the reference to the bill and be able to look for it on the internet, I would have liked to see some statistics about how many times these specific and exceptional measures were applied, how many times they were refused by the administrative commission, how many times the hierarchy allowed a mission and informed the commission later on, etc., in the same way they proudly show graphs of the number of hours spent protecting VIPs. I know the details are protected by secrecy but it would still have been nice to have an idea of how often these methods are used.