Gerardo Marx 527ca7e2a4 Bibtex basic guide added 10 months ago

Readme.md

Advisory note

If you are starting from scratch, we recommend using biblatex because that package provides localization in several languages, it's actively developed, and it makes bibliography management easier and more flexible.

Introduction

Many tutorials have been written about what BibTeX is and how to use it. However, based on our experience of providing support to Overleaf's users, it's still one of the topics that many newcomers find complicated, especially when things don't go quite right; for example: citations aren't appearing; problems with authors' names; references not sorted in a required order; URLs not displayed in the references list, and so forth.

In this article we'll pull together all the threads relating to citations, references and bibliographies, as well as how Overleaf and related tools can help users manage these.

We'll start with a quick recap of how BibTeX and bibliography database (.bib) files work and look at some ways to prepare .bib files. This is, of course, running the risk of repeating some of the material contained in many online tutorials, but future articles will expand our coverage to include bibliography styles and biblatex, the alternative package and bibliography processor.

Bibliography: just a list of \bibitems

Let's first take a quick look “under the hood” to see what a reference list is composed of. Please don't start coding your reference list like this, because later in this article we'll look at other, more convenient, ways to do this.

A reference list is really just a thebibliography list of \bibitems:

\begin{thebibliography}{9}
\bibitem{texbook}
Donald E. Knuth (1986) \emph{The \TeX{} Book}, Addison-Wesley Professional.

\bibitem{lamport94}
Leslie Lamport (1994) \emph{\LaTeX: a document preparation system}, Addison
Wesley, Massachusetts, 2nd ed.
\end{thebibliography}

By default, this thebibliography environment is a numbered list with labels [1], [2] and so forth. If the document class used is article, \begin{thebibliography} automatically inserts a numberless section heading with \refname (default value: References). If the document class is book or report, then a numberless chapter heading with \bibname (default value: Bibliography) is inserted instead. Each \bibitem takes a cite key as its parameter, which you can use with \cite commands, followed by information about the reference entry itself. So if you now write

\LaTeX{} \cite{lamport94} is a set of macros built atop \TeX{} \cite{texbook}.

together with the thebibliography block from before, this is what gets rendered into your PDF when you run a LaTeX processor (i.e. any of latex, pdflatex, xelatex or lualatex) on your source file:

Notice how each \bibitem is automatically numbered, and how \cite then inserts the corresponding numerical label.
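For reference, here is a minimal sketch of a complete source file that combines the two snippets above. The article document class is an assumption made only so that the example compiles on its own; any standard class should behave similarly.

```latex
\documentclass{article} % assumed class, chosen only to make the example self-contained
\begin{document}

% In-text citations refer to the cite keys given to \bibitem below.
\LaTeX{} \cite{lamport94} is a set of macros built atop \TeX{} \cite{texbook}.

% The argument {9} is sample text for the widest expected label.
\begin{thebibliography}{9}
\bibitem{texbook}
Donald E. Knuth (1986) \emph{The \TeX{} Book}, Addison-Wesley Professional.

\bibitem{lamport94}
Leslie Lamport (1994) \emph{\LaTeX: a document preparation system}, Addison
Wesley, Massachusetts, 2nd ed.
\end{thebibliography}

\end{document}
```

Because texbook is the first \bibitem and lamport94 the second, the sentence is typeset with lamport94 cited as [2] and texbook as [1] once all cross-references are resolved.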

\begin{thebibliography} takes a numerical argument: the widest label expected in the list. In this example we only have two entries, so 9 is enough. If you have ten or more entries, though, you may notice that the numerical labels in the list start to get misaligned:

We'll have to make it \begin{thebibliography}{99} instead, so that the space reserved for labels is wide enough to accommodate the longer ones, like this:
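The following is a small sketch for trying this out. The ten entries are placeholders invented purely for illustration (they are not real references), and the article class is again an assumption. The argument is only used as sample text whose width is measured, so any two-digit number reserves the same space as 99.

```latex
\documentclass{article} % assumed class
\begin{document}

% {99} is sample text: it reserves enough room for labels [10] to [99].
% Change it to {9} and recompile to see the last label fall out of alignment.
\begin{thebibliography}{99}
\bibitem{ref1}  Placeholder entry one.   % hypothetical entries, for illustration only
\bibitem{ref2}  Placeholder entry two.
\bibitem{ref3}  Placeholder entry three.
\bibitem{ref4}  Placeholder entry four.
\bibitem{ref5}  Placeholder entry five.
\bibitem{ref6}  Placeholder entry six.
\bibitem{ref7}  Placeholder entry seven.
\bibitem{ref8}  Placeholder entry eight.
\bibitem{ref9}  Placeholder entry nine.
\bibitem{ref10} Placeholder entry ten.
\end{thebibliography}

\end{document}
```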

If you compile this example code snippet on a local computer you may notice that after the first time you run pdflatex (or another processor), the reference list appears in the PDF as expected, but the \cite commands just show up as question marks [?].

This is because after the first run the cite keys from each \bibitem (texbook, lamport94) are written to the .aux file and are not yet available for reading by the \cite commands. Only on the second run of pdflatex are the \cite commands able to look up each cite key from the .aux file and insert the corresponding labels ([1], [2]) into the output.
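As a rough sketch of what this looks like on disk, the .aux file for the two-entry example would contain lines along these lines after the first run. The exact contents are an approximation: they vary with the LaTeX version and with any packages loaded (hyperref, for instance, changes them), and extra bookkeeping lines may appear.

```latex
\relax
\citation{lamport94}    % written by \cite{lamport94}
\citation{texbook}      % written by \cite{texbook}
\bibcite{texbook}{1}    % written by the first \bibitem: key texbook gets label 1
\bibcite{lamport94}{2}  % written by the second \bibitem: key lamport94 gets label 2
```

On the next run these \bibcite lines are read back in at the start of the document, which is what lets each \cite find its label.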

On Overleaf, though, you don't have to worry about re-running pdflatex yourself. This is because Overleaf uses the latexmk build tool, which automatically re-runs pdflatex (and some other processors) for the requisite number of times needed to resolve \cite outputs. This also accounts for other cross-referencing commands, such as \ref and \tableofcontents.

A note on compilation times

Processing reference lists or other forms of cross-referencing, such as indexes, requires multiple runs of software, including the engine (e.g., pdflatex) and associated programs such as BibTeX, makeindex, etc. As mentioned above, Overleaf handles all of these multiple runs automatically, so you don't have to worry about them. As a consequence, when the preview on Overleaf is refreshing for documents with bibliographies (or other cross-referencing), or for documents with large image files (as discussed separately here), these essential compilation steps may sometimes make the preview refresh appear to take longer than on your own machine. We do, of course, aim to keep it as short as possible! If you feel your document is taking longer to compile than you'd expect, here are some further tips that may help.

Enter BibTeX

There are, of course, some inconveniences with manually preparing the thebibliography list:

  • It's up to you to accurately format each \bibitem based on the reference style you're asked to use: which bits should be in bold or italic? Should the year come immediately after the authors, or at the end of the entry? Given names first, or last names first?
  • If you're writing for a reference style which requires the reference list to be sorted by the last names of first authors, you'll need to sort the \bibitems yourself.
  • For different manuscripts or documents that use different reference styles, you'll need to rewrite the \bibitem for each reference.

This is where BibTeX and bibliography database files (.bib files) are extremely useful, and this is the recommended approach to manage citations and references in most journals and theses. The biblatex approach, which is slightly different and gaining popularity, also requires a .bib file, but we'll talk about biblatex in a future post.

Instead of formatting cited reference entries in a thebibliography list, we maintain a bibliography database file (let's name it refs.bib for our example) which contains format-independent information about our references. So our refs.bib file may look like this:

@book{texbook,
  author = {Donald E. Knuth},
  year = {1986},
  title = {The {\TeX} Book},
  publisher = {Addison-Wesley Professional}
}

@book{latex:companion,
  author = {Frank Mittelbach and Michel Goossens
            and Johannes Braams and David Carlisle
            and Chris Rowley},
  year = {2004},
  title = {The {\LaTeX} Companion},
  publisher = {Addison-Wesley Professional},
  edition = {2}
}

@book{latex2e,
  author = {Leslie Lamport},
  year = {1994},
  title = {{\LaTeX}: a Document Preparation System},
  publisher = {Addison Wesley},
  address = {Massachusetts},
  edition = {2}
}

@article{knuth:1984,
  title={Literate Programming},
  author={Donald E. Knuth},
  journal={The Computer Journal},
  volume={27},
  number={2},
  pages={97--111},
  year={1984},
  publisher={Oxford University Press}
}

@inproceedings{lesk:1977,
  title={Computer Typesetting of Technical Journals on {UNIX}},
  author={Michael Lesk and Brian Kernighan},
  booktitle={Proceedings of American Federation of
             Information Processing Societies: 1977
             National Computer Conference},
  pages={879--888},
  year={1977},
  address={Dallas, Texas}
}

You can find more information about other reference entry types and fields here, where there's a huge table showing which fields are supported for which entry types. We'll talk more about how to prepare .bib files in a later section.

Now we can use \cite with the cite keys as before, but this time we replace thebibliography with a \bibliographystyle{...} command to choose the reference style, as well as a \bibliography{...} command to point at the .bib file where the cited references should be looked up.

\LaTeX{} \cite{latex2e} is a set of macros built atop \TeX{} \cite{texbook}.
\bibliographystyle{plain} % We choose the “plain” reference style
\bibliography{refs} % Entries are in the refs.bib file
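Put together, a minimal main.tex might look like the sketch below. The article class is an assumption made so the file compiles on its own, and refs.bib is expected to sit in the same folder as main.tex.

```latex
% main.tex
\documentclass{article} % assumed class, chosen only to make the example self-contained
\begin{document}

\LaTeX{} \cite{latex2e} is a set of macros built atop \TeX{} \cite{texbook}.

\bibliographystyle{plain} % choose the "plain" reference style
\bibliography{refs}       % entries are looked up in refs.bib (no extension needed)

\end{document}
```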

This is processed with the following sequence of commands, assuming our document is in a file named main.tex (and that we are using pdflatex):

pdflatex main
bibtex main
pdflatex main
pdflatex main

and we get the following output:

Whoa! What's going on here, and why are all those (repeated) processes required? Well, here's what happens.

  1. During the first pdflatex run, all pdflatex sees is a \bibliographystyle{...} and a \bibliography{...} from main.tex. It doesn't know what all the \cite{...} commands are about! Consequently, within the output PDF, all the \cite{...} commands are simply rendered as [?], and no reference list appears, for now. But pdflatex writes information about the bibliography style and .bib file, as well as all occurrences of \cite{...}, to the file main.aux.

  2. It's actually main.aux that BibTeX is interested in! It notes the .bib file indicated by \bibliography{...}, then looks up all the entries with keys that match the \cite{...} commands used in the .tex file. BibTeX then uses the style specified with \bibliographystyle{...} to format the cited entries, and writes a formatted thebibliography list into the file main.bbl. The production of the .bbl file is all that's achieved in this step; no changes are made to the output PDF.

  3. When pdflatex is run again, it now sees that a main.bbl file is available! So it inserts the contents of main.bbl, i.e. the \begin{thebibliography}...\end{thebibliography} block, into the source where \bibliography{...} is. After this step, the reference list appears in the output PDF formatted according to the chosen \bibliographystyle{...}, but the in-text citations are still [?].

  4. pdflatex is run again, and this time the \cite{...} commands are replaced with the corresponding numerical labels in the output PDF!

As before, the latexmk build tool takes care of triggering and re-running pdflatex and bibtex as necessary, so you don't have to worry about this bit.
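To make the four steps above more concrete, here is an approximate sketch of the two intermediate files for the main.tex and refs.bib example. Treat it as illustrative rather than exact: the precise contents depend on the LaTeX and BibTeX versions, the bibliography style and any packages loaded.

```latex
% --- main.aux (sketch) ---
\relax
\citation{latex2e}     % step 1: one line per \cite{...}
\citation{texbook}
\bibstyle{plain}       % step 1: from \bibliographystyle{plain}
\bibdata{refs}         % step 1: from \bibliography{refs}
\bibcite{texbook}{1}   % step 3: added once main.bbl is read
\bibcite{latex2e}{2}

% --- main.bbl (sketch of what BibTeX writes in step 2) ---
\begin{thebibliography}{1}

\bibitem{texbook}
Donald~E. Knuth.
\newblock {\em The {\TeX} Book}.
\newblock Addison-Wesley Professional, 1986.

\bibitem{latex2e}
Leslie Lamport.
\newblock {\em {\LaTeX}: a Document Preparation System}.
\newblock Addison Wesley, Massachusetts, 2 edition, 1994.

\end{thebibliography}
```

The .bbl file is just the kind of thebibliography list described at the start of this article, only generated automatically. The \bibcite lines written during step 3 are what the final run in step 4 reads to replace each [?] with its label.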

Some notes on using BibTeX and .bib files

A few further things to note about using BibTeX and .bib files:

  • You may have noticed that although refs.bib contained five reference entries, only two are included in the reference list in the output PDF. This is an important point about BibTeX: the .bib file's role is to store bibliographic records, and only entries that have been cited (via \cite{...}) in the .tex files will appear in the reference list. This is similar to how only cited items from an EndNote database will be displayed in the reference list in a Microsoft Word document. If you do want all entries to be displayed without actually citing all of them, you can write \nocite{*}. This also means you can reuse the same .bib file for all your projects: entries that are not cited in a particular manuscript or report will be excluded from the reference list in that document.
  • BibTeX requires one \bibliographystyle{...} and one \bibliography{...} to function correctly; in future posts we'll see how to create multiple bibliographies in the same document. If you keep getting “undefined citation” warnings, check that you have indeed included those two commands, and that the names are spelled correctly. File extensions are not usually required, but bear in mind that file names are case sensitive on some operating systems, including on Overleaf! Therefore, if you typed \bibliographystyle{IEEetran} (note the typo: a lowercase “e”) instead of \bibliographystyle{IEEEtran}, or wrote \bibliography{refs} when the actual file name is Refs.bib, you'll get the dreaded [?] as citations.
  • In the same vein, treat your cite keys as case-sensitive, always. Use the exact same case or spelling in your \cite{...} as in your .bib file.
  • The order of references in the .bib file does not have any effect on how the reference list is ordered in the output PDF: the sorting order of the reference list is determined by the \bibliographystyle{...}. For example, some readers might have noticed that, within our earlier example, the first citation in the text (latex2e) is numbered [2], while the second citation in the text (texbook) is numbered [1]! Have LaTeX and BibTeX lost the plot? Not at all: this is actually because the plain style sorts the reference list by alphabetical order of the first author's last name. If you prefer a scheme where the numerical citation labels are numbered sequentially throughout the text, you'll have to choose a bibliography style which implements this. For example, if instead we had used \bibliographystyle{IEEEtran} for that example, we'd get the following output. Notice also how the formatting of each cited item in the reference list has automatically updated to suit the IEEE's style:
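To experiment with the points in the list above, the following sketch of main.tex switches the earlier example to the IEEEtran style and uses \nocite{*} to list every entry in refs.bib. It assumes the article class and that the IEEEtran bibliography style is available in your TeX distribution; remove the \nocite{*} line to go back to listing only cited entries.

```latex
% main.tex (variant)
\documentclass{article} % assumed class
\begin{document}

% Cite keys must match refs.bib exactly, including case.
\LaTeX{} \cite{latex2e} is a set of macros built atop \TeX{} \cite{texbook}.

\nocite{*}                   % also list entries from refs.bib that are never cited
\bibliographystyle{IEEEtran} % labels follow order of citation, not author name
\bibliography{refs}          % the file is refs.bib; Refs.bib would not be found
                             % on a case-sensitive file system

\end{document}
```

Swapping IEEEtran back to plain, with no other change to main.tex or refs.bib, is enough to return to the alphabetically sorted list, which is the main benefit of keeping the reference data in a .bib file.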