Introduction

Perhaps the most important aspect of this project is the formatting of citations and bibliographies. Since the advent of personal computing and productivity applications, there have been two primary mechanisms for formatting them. BibTeX has long provided such support in LaTeX, and has a loyal following in the hard sciences in particular. Commercial applications like EndNote, ProCite, and Reference Manager have provided an equivalent for the world of GUI productivity applications like Microsoft Word.

All of these applications work on the same basic principle. The document contains references to bibliographic records stored in an external database. A processor extracts the citations and assembles the formatted bibliography according to the specifications in a separate configuration file: .bst files for BibTeX, and binary files for the commercial products. In all cases, formatting is tied directly to the application.
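The principle can be sketched in a few lines of Python. This is only an illustration, not code from any of the tools named above: the `[@key]` marker syntax, the record fields, and the `STYLE` dictionary are all invented for the example.

```python
import re

# Hypothetical external database: citation key -> bibliographic record.
DATABASE = {
    "doe99": {"author": "Doe, Jane", "year": 1999, "title": "Citing Things"},
    "roe05": {"author": "Roe, Richard", "year": 2005, "title": "Formatting Matters"},
}

# Hypothetical style configuration, kept apart from the processor code.
STYLE = {"entry": "{author}. {year}. {title}.", "sort_key": "author"}

# Invented marker syntax for this sketch: [@key]
CITATION = re.compile(r"\[@([A-Za-z0-9_-]+)\]")


def extract_keys(document):
    """Return cited keys in order of first appearance, without duplicates."""
    seen = []
    for key in CITATION.findall(document):
        if key not in seen:
            seen.append(key)
    return seen


def format_bibliography(document, database, style):
    """Look up each cited record and format it according to the style."""
    records = [database[key] for key in extract_keys(document)]
    records.sort(key=lambda record: record[style["sort_key"]])
    return [style["entry"].format(**record) for record in records]


document = "Styles vary [@roe05], as do tools [@doe99]; see again [@roe05]."
bibliography = format_bibliography(document, DATABASE, STYLE)
for line in bibliography:
    print(line)
```

Changing the bibliography format here means editing only `STYLE`; the document and the database stay untouched.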

Why CiteProc?

CiteProc learns from existing solutions, but improves on them in the following ways:

  1. It is designed around the Metadata Object Description Schema (MODS) from the Library of Congress, which provides a much more comprehensive data model than BibTeX or RIS.
  2. The citation style language (CSL) is XML-based, and is designed to be easy to read and write, yet powerful.
  3. The code is based on XSLT 2.0, with all the benefits of a standard XML-based processing model. Implementations in other languages are certainly possible, however, so long as they generate the same output from the CSL style files.
  4. Interaction between the XSLT processor and bibliographic database happens over standard HTTP. No additional code is required.
  5. The XSLT code is constructed in such a way that input drivers could be easily written for different document formats. The new unparsed-text() function in XSLT 2.0 even makes it feasible to write drivers for non-XML input formats such as RTF or TeX. There is also an output driver system to easily add new output formats.

See the stylesheet documentation for details.

Examples

The distribution includes sample citation styles, source documents, and stylesheets. As a scholar working at the borders of the social sciences and humanities, one of my consistent problems with both EndNote and BibTeX was that they treated footnote- and endnote-style citations differently from all others, such as author-year. If I needed to change a document from one to the other, I had to go through the document manually and change the citation coding throughout.

One design goal of this project was to fix this problem. Thus, here is an example of an author-year style, and here is one of a note-based style. Both are formatted from the same document source.
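To make the idea concrete, the following Python sketch renders one source string in both ways. It is not CiteProc's XSLT code or CSL syntax: the `[@key]` marker, the record fields, and both rendering functions are hypothetical, chosen only to show that the citation coding in the source does not change when the style does.

```python
import re

# Hypothetical record store and marker syntax, invented for this sketch.
RECORDS = {"doe99": {"family": "Doe", "year": 1999, "title": "Citing Things"}}
MARKER = re.compile(r"\[@([A-Za-z0-9_-]+)\]")


def render_author_year(source, records):
    """Replace each marker with an in-text (Author Year) citation."""
    def replace(match):
        record = records[match.group(1)]
        return "({family} {year})".format(**record)
    return MARKER.sub(replace, source), []


def render_note(source, records):
    """Replace each marker with a note number and collect the notes."""
    notes = []

    def replace(match):
        record = records[match.group(1)]
        notes.append("{family}, {title} ({year}).".format(**record))
        return "[{}]".format(len(notes))

    return MARKER.sub(replace, source), notes


source = "Citation formats differ widely [@doe99]."
print(render_author_year(source, RECORDS))
print(render_note(source, RECORDS))
```

The same `source` string feeds both functions; only the choice of renderer differs.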

CiteProc Compatibility

For a database project to be compatible with CiteProc, it needs primarily to be able to accept a query over HTTP (SRU and CQL are strongly recommended) and return a collection of bibliographic records in response. Those records must conform to the MODS XML Schema from the Library of Congress.
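As a rough illustration of what such an exchange might look like from the client side, the Python sketch below builds an SRU searchRetrieve URL carrying a CQL query and reads titles out of a MODS collection. The base URL, the `dc.creator` index, the `mods` record schema name, and the sample response are assumptions made for the example; a real server defines its own endpoint and supported indexes, and a real SRU response wraps the records in an SRU envelope that is omitted here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
import xml.etree.ElementTree as ET

# MODS version 3 namespace from the Library of Congress.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def build_sru_url(base_url, cql_query, max_records=10):
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "recordSchema": "mods",  # assumed schema name; servers may differ
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


def titles_from_mods(xml_text):
    """Return the top-level title of each MODS record in a collection."""
    root = ET.fromstring(xml_text)
    path = ".//mods:mods/mods:titleInfo/mods:title"
    return [title.text for title in root.findall(path, MODS_NS)]


# Hypothetical endpoint and query, for illustration only.
url = build_sru_url("http://localhost:8080/sru", 'dc.creator = "Doe"')
print(url)
print(parse_qs(urlsplit(url).query)["query"][0])

# Hand-written sample standing in for the records a server might return.
SAMPLE = """<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods><titleInfo><title>Citing Things</title></titleInfo></mods>
  <mods><titleInfo><title>Formatting Matters</title></titleInfo></mods>
</modsCollection>"""
print(titles_from_mods(SAMPLE))
```

No network request is made here; fetching `url` with any HTTP client and passing the extracted MODS records to `titles_from_mods` would complete the round trip.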

CiteProc is free software, licensed under the CC-GNU GPL.