How do I handle my bibliographic data?

In science, you have to justify nearly all your assertions, and this is done by citing other scientific papers, called “references”. With practice and advice from several people, I arrived at a satisfactory reference management system, which I explain below. My “problem” is that, in the academic world where I work, nearly everyone uses EndNote or Reference Manager, two proprietary reference management programs for MS-Windows, whereas I want to use the simple yet powerful BibTeX system.

I have used both MS-Windows programs, and each has advantages and drawbacks. Reference Manager has the nicest user interface, but it lacks support for documents other than MS-Word, RTF parsing and BibTeX export. EndNote has an ugly and sometimes illogical UI, but at least it supports RTF documents (though neither OpenOffice nor WordPerfect documents are supported). With EndNote, you can find tricks on the web to import references from BibTeX and export them to it, but something is always missing after the conversion. The reason is that EndNote has fields that standard BibTeX does not support (this is not a problem: BibTeX simply ignores them without error, and they remain in your .bib file, so you don’t lose any information), and BibTeX has fields that EndNote does not support (EndNote simply discards them, so you lose information).

I tried many other programs, both free/open source and proprietary, but I stayed with my simple BibTeX file (simple, powerful, no additional markers, a file that is not easily corrupted, small size, …). Maybe I’ll try the bibliographic project once it is ready.

In biology and medicine, the reference database for articles is PubMed. 99% of my references are first found on PubMed (the remaining 1% are books, multimedia files and some articles not indexed in PubMed). Each reference is associated with a “PubMed ID” (PMID).
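For readers who want to script the PubMed lookup, here is a minimal Python sketch that fetches one record by PMID through NCBI’s E-utilities `efetch` service and pulls out a few citation fields. The XML element names follow the PubMed efetch format as I understand it; the helper names (`efetch_url`, `parse_pubmed_xml`, `fetch_pubmed`) are hypothetical, and NCBI’s usage policies apply to real queries.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid):
    """Build the efetch URL that returns one PubMed record as XML."""
    query = urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
    return f"{EFETCH_URL}?{query}"


def parse_pubmed_xml(xml_data):
    """Extract a few citation fields from efetch XML (str or bytes).

    Assumes a journal article record; fields that are absent come back as None
    (e.g. 'year' when the date is stored as MedlineDate instead of Year).
    """
    root = ET.fromstring(xml_data)
    citation = root.find(".//MedlineCitation")
    article = citation.find("Article")
    last_names = [a.findtext("LastName") for a in article.findall("AuthorList/Author")]
    return {
        "pmid": citation.findtext("PMID"),
        "title": article.findtext("ArticleTitle"),
        "journal": article.findtext("Journal/ISOAbbreviation"),
        "volume": article.findtext("Journal/JournalIssue/Volume"),
        "year": article.findtext("Journal/JournalIssue/PubDate/Year"),
        "pages": article.findtext("Pagination/MedlinePgn"),
        "authors": [name for name in last_names if name],  # skips collective authors
    }


def fetch_pubmed(pmid):
    """Download and parse one record (requires network access)."""
    with urllib.request.urlopen(efetch_url(pmid)) as response:
        return parse_pubmed_xml(response.read())
```

Keeping the parsing separate from the download makes it possible to test the parser on a saved XML file without touching the network.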

When I read the electronic version of an article, I rename its file following this convention: first author’s name immediately followed by the last two digits of the year, then the journal abbreviation, volume number, first page and some keywords, separated by hyphens, with no spaces or other punctuation marks (e.g. maquet01-science-294-1048-sleep-learning-memory.pdf). All the files are stored in the same directory. When I read a paper copy of an article, I write keywords/tags at the top of the first page and put the article in a folder (each folder covers one subject; it is sometimes difficult to choose which folder an article should go in).
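Because the naming convention is mechanical, the renaming step is easy to script. A minimal Python sketch with hypothetical function names; the sketch calls the number field `volume` because 294 is the volume of Science in the example, and it strips accents so that “Müller” becomes “muller”.

```python
import re
import unicodedata


def slug(text):
    """Lower-case, strip accents and drop everything but ASCII letters and digits."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_text.lower())


def pdf_filename(first_author, year, journal, volume, first_page, keywords):
    """Build e.g. maquet01-science-294-1048-sleep-learning-memory.pdf."""
    parts = [
        slug(first_author) + f"{int(year) % 100:02d}",  # name + last two digits of year
        slug(journal),
        slug(str(volume)),
        slug(str(first_page)),
    ] + [slug(k) for k in keywords]
    return "-".join(p for p in parts if p) + ".pdf"  # skip empty parts


if __name__ == "__main__":
    print(pdf_filename("Maquet", 2001, "Science", 294, 1048, ["sleep", "learning", "memory"]))
```

The rename itself could then be done with `pathlib.Path.rename`, keeping all files in the single directory described above.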

Then I go to PubMed, find the PMID and look up this ID in both EndNote and PyP2B. I add the output of PyP2B to my BibTeX file, together with the keywords/tags; this is simple (everything is done by the script, even the key generation: I use something like “firstauthorsnameXX”, where XX are the last two digits of the publication year). Since EndNote only understands reference numbers, I have to add the BibTeX key as a “label”. Fortunately, EndNote can sort references by label. I also add a link to the PDF if it exists. At first I thought it would take me a lot of time to maintain these two lists of references but, in practice, it doesn’t take much time.
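PyP2B’s own code is not reproduced here, so the following is only a sketch, under my own assumptions, of the two steps described: building a key of the form “firstauthorsnameXX” and writing an @article entry that carries the keywords/tags. The field names `pmid`, `keywords` and `pdf` are hypothetical, non-standard choices (standard BibTeX styles ignore unknown fields), and no escaping of BibTeX special characters such as braces, `&` or `%` is attempted.

```python
import unicodedata
from typing import List, Optional


def bibtex_key(last_name, year):
    """Key of the form firstauthorsnameXX, XX = last two digits of the year."""
    # Strip accents (Müller -> Muller), then keep only the letters.
    ascii_name = unicodedata.normalize("NFKD", last_name).encode("ascii", "ignore").decode("ascii")
    letters = "".join(ch for ch in ascii_name.lower() if ch.isalpha())
    return f"{letters}{int(year) % 100:02d}"


def bibtex_entry(record: dict, keywords: List[str], pdf_path: Optional[str] = None) -> str:
    """Format one article; record['authors'] holds 'Last, F.' strings."""
    key = bibtex_key(record["authors"][0].split(",")[0], record["year"])
    fields = [
        ("author", " and ".join(record["authors"])),
        ("title", record["title"]),
        ("journal", record["journal"]),
        ("year", str(record["year"])),
        ("volume", str(record["volume"])),
        ("pages", record["pages"]),
        ("pmid", str(record["pmid"])),       # non-standard field (hypothetical)
        ("keywords", ", ".join(keywords)),  # non-standard field (hypothetical)
    ]
    if pdf_path:
        fields.append(("pdf", pdf_path))     # hypothetical field for the PDF link
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@article{{{key},\n{body}\n}}\n"
```

Two papers by the same first author in the same year would get the same key; a real script would need a rule for that case (for example a letter suffix), which this sketch does not implement.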

With this system, I can:

  • look for a particular article in my server directory based on keywords/tags, first author, journal name and publication year
  • look for a particular reference in my BibTeX file
  • look for a particular reference in my EndNote files
  • use my BibTeX file in any LaTeX document
  • use my EndNote files with MS-Word documents if someone asks for this type of document
  • 🙂
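The first item in this list relies only on the file-naming convention, so a file name can be parsed back into its parts. A minimal sketch, assuming every PDF follows the convention exactly (author and two-digit year joined, then journal, volume, first page and keywords, separated by hyphens); `matches` and `find_pdfs` are hypothetical helper names. Note that a two-digit year cannot distinguish 1901 from 2001.

```python
import re
from pathlib import Path
from typing import Iterable, List, Optional


def _norm(text: str) -> str:
    """Lower-case and drop non-alphanumerics, mirroring how files are named."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def matches(filename: str, keywords: Iterable[str] = (), author: Optional[str] = None,
            journal: Optional[str] = None, year: Optional[int] = None) -> bool:
    """True if a file named authorYY-journal-volume-page-keywords fits every criterion given."""
    parts = Path(filename).stem.split("-")
    if len(parts) < 4 or len(parts[0]) < 3:
        return False  # does not follow the naming convention
    name, yy, jour, tags = parts[0][:-2], parts[0][-2:], parts[1], set(parts[4:])
    if author is not None and name != _norm(author):
        return False
    if journal is not None and jour != _norm(journal):
        return False
    if year is not None and yy != f"{year % 100:02d}":
        return False
    return all(_norm(k) in tags for k in keywords)


def find_pdfs(directory, **criteria) -> List[Path]:
    """List the PDFs in one directory whose names match all the criteria."""
    return [p for p in sorted(Path(directory).glob("*.pdf")) if matches(p.name, **criteria)]
```

For example, `find_pdfs("/path/to/articles", keywords=["sleep"], year=2001)` would list the 2001 papers tagged “sleep” (the directory path is a placeholder).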

I previously planned to write a central repository of all the references in a real database and then generate both EndNote and BibTeX files from it. But I lack the time to write it, and the EndNote file format is not open (so why bother reading unofficial specifications of this format?).
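For what it is worth, the BibTeX half of that central repository can be sketched in a few lines with Python’s built-in `sqlite3` module. The schema below is a hypothetical one of my own, not a finished design: one table keyed by the BibTeX key, from which the .bib file is regenerated. An EndNote exporter is deliberately left out, since its file format is not open.

```python
import sqlite3

# Hypothetical schema: one row per reference, keyed by its BibTeX key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS refs (
    key     TEXT PRIMARY KEY,  -- e.g. maquet01
    pmid    TEXT UNIQUE,
    authors TEXT NOT NULL,     -- already in BibTeX form: 'Last, F. and Last, F.'
    title   TEXT NOT NULL,
    journal TEXT,
    year    INTEGER,
    volume  TEXT,
    pages   TEXT
)
"""


def export_bibtex(conn: sqlite3.Connection) -> str:
    """Render every stored reference as a BibTeX @article entry, sorted by key."""
    rows = conn.execute(
        "SELECT key, authors, title, journal, year, volume, pages FROM refs ORDER BY key")
    entries = []
    for key, authors, title, journal, year, volume, pages in rows:
        fields = {"author": authors, "title": title, "journal": journal,
                  "year": year, "volume": volume, "pages": pages}
        # Empty (NULL) columns are simply left out of the entry.
        body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v is not None)
        entries.append(f"@article{{{key},\n{body}\n}}\n")
    return "\n".join(entries)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO refs (key, authors, title, journal, year) VALUES (?, ?, ?, ?, ?)",
                 ("maquet01", "Maquet, P.", "Example title", "Science", 2001))
    print(export_bibtex(conn))
```

With this design the .bib file becomes a generated artifact rather than the primary copy of the data.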