CiteProc

Overview

CiteProc is a comprehensive solution for bibliographic and citation formatting. It consists of an easy-to-use XML citation style language (CSL) and the XSLT code that formats documents according to CSL styles. In essence, it is designed to serve as an XML-based analog to BibTeX, but with dramatic improvements in ease of use, metadata flexibility, and international support.

CiteProc scans the source document for citation references, collects the corresponding records from an external bibliographic data store, and then formats the bibliography and citations according to the specifications in the CSL file.

Figure: CiteProc overview

The data store can be either a flat XML file or a server that supports HTTP-based XQuery or SRU queries. SRU is a particularly promising new RESTful protocol from the library world. It is easy to implement and offers a standard around which a diversity of bibliographic solutions can interoperate.

CiteProc is but one example of the possibilities opened up with SRU, where a user in North America can format their documents using bibliographic data stored on the other side of the world. Indeed, CiteProc is bundled with just such an example! The RefBase project out of Germany recently added SRU support. The sample XML and XSLT files included here (samples/docbook-test-sru-refbase.xml and xslt/document/refbase-xhtml.xsl respectively) demonstrate how easy it is to add this support.
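
As a rough illustration of the SRU approach described above, the following XSLT 2.0 sketch builds a searchRetrieve URL and pulls MODS records from the response. It is not taken from the bundled RefBase sample. The endpoint URL, the CQL query, and the assumption that the server accepts the short schema name "mods" are all hypothetical placeholders.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical SRU endpoint; replace with a real server. -->
  <xsl:param name="sru-base" select="'http://example.org/sru'"/>

  <!-- Hypothetical CQL query; index names vary by server. -->
  <xsl:param name="cql" select="'dc.title = &quot;citation&quot;'"/>

  <!-- Standard SRU 1.1 request parameters; the query is URI-escaped. -->
  <xsl:variable name="sru-url" select="concat($sru-base,
      '?operation=searchRetrieve',
      '&amp;version=1.1',
      '&amp;recordSchema=mods',
      '&amp;maximumRecords=10',
      '&amp;query=', encode-for-uri($cql))"/>

  <xsl:template match="/">
    <records source="{$sru-url}">
      <!-- Copy every mods element in the response, whatever its namespace. -->
      <xsl:copy-of select="doc($sru-url)//*:mods"/>
    </records>
  </xsl:template>

</xsl:stylesheet>
```

Because the request is an ordinary HTTP GET, the same URL can be pasted into a browser to inspect the raw response before wiring it into a stylesheet.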

Design

The core of the logic is embedded in the design of CSL, which has the following features:

  • a tree-based design that mimics bibliographic metadata
  • organization around reference classes, rather than types
  • modularized formatting concerns

At the core, CSL and CiteProc are both organized around structural classes (monograph, serial, etc.) rather than reference types or genres. CSL thus does not have the “generic” reference type definitions one finds in other style languages. Instead, CSL mandates definitions for three of the most common “types”: article, book, and chapter. These then serve as the class-specific fallback definitions. Indeed, most formatting ends up being handled by these three definitions.

This design decision addresses problems with type-based formatting systems like BibTeX or those in commercial applications like EndNote, which tend to be fairly fragile and not very portable. In these systems, if a social scientist who frequently cites archival documents uses a style file created by a physical scientist, they will typically find they need to heavily edit the file in order to format their references. By moving to a class-based logic and modularizing formatting configuration as much as possible, styles become more portable. At the same time, retaining the familiar type-like interface keeps things easy for users creating style files.
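
The fallback idea can be sketched in XSLT 2.0 as follows. This is only an illustration of the logic, not CiteProc's actual code or CSL syntax. The function name, the class name "monograph-part", the extra "legal-case" type, and the mapping of classes to the three mandatory definitions are all assumptions made for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:fallback"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: names the definition used to format a reference.
       A type with its own definition wins; otherwise the reference's
       structural class selects one of the three mandatory definitions. -->
  <xsl:function name="f:definition" as="xs:string">
    <xsl:param name="type" as="xs:string"/>
    <xsl:param name="class" as="xs:string"/>
    <xsl:param name="defined" as="xs:string*"/>
    <xsl:sequence select="
      if ($type = $defined) then $type
      else if ($class = 'serial') then 'article'
      else if ($class = 'monograph-part') then 'chapter'
      else 'book'"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Types this imaginary style defines explicitly. -->
    <xsl:variable name="defined"
        select="('article', 'book', 'chapter', 'legal-case')"/>
    <result>
      <!-- No 'letter' definition exists, so the class decides: book. -->
      <xsl:value-of select="f:definition('letter', 'monograph', $defined)"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Under these assumptions, an archival letter with no definition of its own is still formatted sensibly by the book definition, which is the portability benefit described above.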

Organization

The CiteProc code consists of one main stylesheet, citeproc.xsl, which is imported into a standard document stylesheet.

Figure: CiteProc file structure
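
A document stylesheet that imports citeproc.xsl might look roughly like the sketch below. The relative path to citeproc.xsl, the default parameter value, and the XHTML wrapper are assumptions for illustration; the bundled document stylesheets may be organized differently.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed location of the main stylesheet relative to this file. -->
  <xsl:import href="../citeproc.xsl"/>

  <!-- Parameter name as used on the command line; the default is assumed. -->
  <xsl:param name="citation-style" select="'author-year'"/>

  <xsl:output method="xhtml" indent="yes"/>

  <!-- Wrap the transformed document in a minimal XHTML page. -->
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>Formatted document</title></head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Note that xsl:import must appear before any other top-level declaration, and that templates in the importing stylesheet take precedence over imported ones.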

Requirements

CiteProc is written in XSLT 2.0, and as such requires a compliant processor. At this point, this means Saxon 8. In addition, it requires a data store for MODS bibliographic data. CiteProc supports flat-file and server-based options, including both SRU and XQuery. The eXist XML DB is a good option. Bibutils is an excellent tool to convert from legacy formats like EndNote, RIS and BibTeX to MODS, and back.

Usage

There are a few example document stylesheets in the xsl/document directory. Choose one and run it from the samples directory, passing a citation-style parameter like so:

java net.sf.saxon.Transform -o test.html samples/test.xml \
          xsl/document/dbng-xhtml.xsl \
          citation-style="author-year"

CiteProc is free software, licensed under the CC-GNU GPL.