Human ecology
Classification and the librarian's problem

The librarian's problem

[Leibniz rejected Locke's tripartite division of knowledge.] ... "A truely memorable story might deserve a place in the annals of universal history; yet might equally deserve a place in the history of a particular country or even of a particular individual. A librarian is often undecided over the section in which a particular book needs to be catalogued (cf. Serres 1968: 22-3)"
(Eco 1993:1995, p.278)

"Theoretically, a classification system should be so organised that material on any one subject can be found in only one place. Some subjects, however, have so many aspects, so many phases, so many contributing factors that it may not be possible to place all material relating to such a subject in only one class. ... It is important to remember that even though books are classified according to the subject which is given the greatest emphasis, they may, to some extent, treat other subjects."
(Gates 1979, p.37)


Categories in Leibniz' private collection of books.
The librarian's problem in Cantorian set theory.

Consider a set of books, B. The librarian's problem is to order these books in such a way that all books about a similar subject are shelved together.

Two complementary approaches to this problem both involve the definition of a set of subjects, S.

Books as primitives
The first approach starts from the books. By analysing the contents of each book, a number of a posteriori subjects can be assigned to them.

The i-th book, b(i), may be concerned with m(i) a posteriori subjects, s'

S' AND b(i) = { s'(i,1), ..., s'(i,m(i)) }

(Let us assume that m >= 1)

The set of a posteriori subjects, S', then becomes a list of these assigned subjects. For j books, this is not simply,

{ s'(1,1), ..., s'(1,m(1)), ..., s'(i,1), ..., s'(i,m(i)), ..., s'(j,1), ..., s'(j,m(j)) }

as the interpretation of any given subject s' may be the same as the interpretation of another given subject.

To get around this problem, we must consider the set of unique interpretations, C

If we assume that this assignment of subjects is valid, then the number of unique interpretations, n(C), must be less than or equal to the number of assigned subjects, n(S').

n(C) <= n(S')

where n(S') = m(1) + m(2) + ... + m(j)

Subjects as primitives
The second approach starts with a set of subjects, S. The assignment of each book then takes the form

{ s(a(i,1)), ..., s(a(i,m(i))) }

where, if there are k subjects s(1), ..., s(k), then each assignment relation, a(x,y), is an integer between 1 and k.


Computational set theory approach
Note: I call this computational set theory because I first created this representation with dBase 3+ records. I converted records containing one or more keywords into a table of {record, keyword} pairs.

From the above definitions, it is clear that the relations between the set of books and the set of subjects can also be represented as a table of pairs, {b,s}:

{b(1),s(a(1,1))}
...
{b(1),s(a(1,m(1)))}
...
{b(i),s(a(i,1))}
...
{b(i),s(a(i,m(i)))}
...
{b(j),s(a(j,1))}
...
{b(j),s(a(j,m(j)))}

Table 1.1 - relations between books and subjects sorted in book order.

This table can also be sorted on subject:

{s(1),b()}
...
{s(1),b()}
...
{s(k),b()}

Table 1.2 - relations between books and subjects sorted in subject order.

Note: If we replace the interpretation 'book' with 'word or phrase' and the interpretation 'subject' with 'class', these two collections together form one representation of the structure of Roget's Thesaurus. Table 1.2 forms the main section of the thesaurus, table 1.1 forms the index.

The representation given in tables 1.1 and 1.2, has some limitations.

The advantage of this representation is that it retains the observed links between book and subject. However, it does not allow books to relate to other books, or books to be collections of books, i.e.

van Dieren, W., ed. (1995) "Taking Nature into Account", Springer-Verlag, New York.

In this case do we consider the individual essays that make up the book to books in their own right (as they have their own authors), or as another set of objects seperate to books?

To get around this problem we can consider a more general case of the two tables given above.


General case of object relations
Using the structural properties of tables 1.1 and 1.2, we can create a general table that can represent relationships between a set, V, of n objects.

{v(1),v(a(1,1))}
...
{v(1),v(a(1,m(1)))}
...
{v(i),v(a(i,1))}
...
{v(i),v(a(i,m(i)))}
...
{v(n),v(a(n,1))}
...
{v(n),v(a(n,m(n)))}

Table 2 - relations between general objects.

This is the general structure for a dictionary or an encyclopedia.

Using this general structure it is possible to produce a graph theory version of the librarian's problem.

Note: 12/11/99
An exploration of graph theory has found that this structure is similiar to an adjacency list.



Links to other sites...
Created 3/6/99
Last modified 12/11/99