Copyright Donovan Rebbechi 2001. This tutorial may be distributed freely if this copyright notice is preserved. Changes require authors permission. The code is no-strings attached free software, it may be redistributed, modified, and used.
A topic that is often addressed in introductory computer science courses is the design and implementation of a linked list. This seemingly simple task turns out to be surprisingly difficult. Unfortunately, most textbooks make a terrible mess of it. We examine some of the design issues that surface in the process of designing and implementing such a class, and offer some implementations.
A linked list is a data structure that consists of a
sequence of nodes, x1, ..., xn
where each node xi contains the following information:
A C++ implementation of a node is quite simple:
The main desirable property of the linked list is that the ordering
is a property of the
Notice that this operation did not require a change in the memory address of
other nodes in the list, and it did not require us to move data around (deleting
the second element of an array on the other hand would require this.)
In fact, even operations such as sorting a linked list do not invalidate
pointers to elements in the list, because the ordering is a
property of the values of the member variable
On the other hand, linked lists do have some disadvantages. For example, they
support sequential access but not random access. To
locate the nth element, one must dereference n
pointers. This is in contrast to the constant time element access in an
array.
Some good advice found in one textbook (which unfortunately does not appear
to heed its own advice) is to pay careful attention to how an interface
will be actually used, when designing it. It's instructive to write some
code that uses your class before you implement it. This may
seem perverse to the uninitiated, but it is in fact very useful. First,
it has the benefit of making sure that the class is usable, and secondly, it
serves as a kind of instant test suite.
The most difficult aspect of designing a list class is element access.
While this seems very simple, it's surprisingly subtle, and textbooks do
a surprisingly good job of botching it.
The following are key questions:
Addressing the former issue immediately addresses the latter, because if
we have a way to represent a position in the list, then we can use an operation
that moves from one position to the next, to iterate.
First, we look at the first consideration.
Clearly, the best way to flexibly address the former concern would be to
provide a way to address the latter. There are a few ways to access elements
in a list:
The first approach is not acceptable. It suffers from the substantial
problem that simply accessing data modifies the state of the list.
Here is an example of code that will break if we use the bad approach:
The problem with the above code is that the print() statement needs to
modify the "current" element. This has a few consequences: it means
that we cannot pass a list by const reference to any useful function
(since most useful functions will modify the "current" element), and
the deeper problem is that seemingly innocent statements, even debugging
statements, modify the state of the list. The prospect of debugging
statements changing program behaviour should send shivers down the
spine of any programmer. Such behaviour results in infuriating
Heisenbugs -- bugs whose visibility depends on whether or not
debugging is enabled.
This raises an issue -- change in the state of the "current" element
should not be considered a change in the list. In other words, there
is a need for a representation of "current element" in the list that is not
part of the list itself. Having found our first approach unworkable, we
move on to the other two alternatives.
An approach that appeals to beginners, and even some misguided textbook authors,
is to use array notation to access elements of linked lists.
This is tempting, because programmers become acquainted with arrays at
an early level, and from this stems a natural desire to treat any
sequence container as an array. Resist this temptation at all costs!!!
There
are several problems with this, but they all boil down to a very
simple fact:
Accessing the nth node takes n operations
where n is the size of the list,
so the loop takes 1 + 2 + 3 + ... + n = n(n+1)/2 operations.
This is disasterous for large values of n, considering that we
can do the same in linear time if we use the pointer based or iterator
based approach.
Another fundamental problem with integer offsets is that they are invalidated
by insertions and deletions.
For example, consider this code:
Suffice it to say that we will not seriously consider implementing a linked list
that uses this misguided approach as the primary means of element access.
The second approach is the
one that is usually used in C implementations. In a C++ implementation,
there are two different sub-categories of this approach.
The linked-list-is-not-a-node approach has the advantage that encapsulation
makes it
possible to ensure that client code does not corrupt the list. An example
of code using this approach:
This approach uses a special iterator class that encapsulates
a node pointer. One of the key advantages of this is that we can use this
to provide consistent semantics to different container types. We defer
further discussion of iterators.
In addition, we also need to be able to create lists. This gives rise
to the following creational operations:
There are other operations that may be required, and have consequences for
our design. For example, fast
removal at the back of the list, and concatenation of two lists. These
require us to keep track of the last node of the list. Depending on
how important these operations are, it may be useful for the list class
to hold a pointer to the last element of the list.
Another possible design consideration is using a sentinel node
at the beginning of the list. This has the drawback of adding overhead,
but is advantageous in that it makes it possible to avoid "special-casing".
In our examples, we will decline both the "sentinel" and the "tail pointer"
design enhancements, while observing that these are actually quite useful
in certain situations, and could be introduced to our class via a wrapper
class.
So we make a first attempt at writing an interface. For now, we omit
the destructor. Strictly speaking, it isn't part of the interface, it's
an implementation detail specific to the C++ language.
We get stuck on the last two items. To "insert anywhere in the list",
we need a way to represent a place in the list.
Fortunately, our previous discussion has prepared us for this.
Another problem with singly linked lists is that given a node
in the list, we can not erase that node in constant time, and we
can't insert a node before it in constant time. This is because these
operations require manipulation of the preceding node. This is a problem
with singly linked lists that detracts from their usability, and the
simplest solution is to use a doubly linked list instead if you want a
more flexible, general purpose class (one could write iterators that
contain ordered pairs of succesive node pointers, but given this overhead,
one might as well use a doubly linked list)
We address this problem by using
We first look into implementing a list, using the pointer based
approach, with the linked-list-is-not-a-node model.
We previously raised the importance of considering
how the class is actually going to be
used prior to implementing it.
In this example, we could iterate over the list by using node pointers.
For example, we could write a loop, and an insertion like this:
The notation is not that elegant, but it works.
There are a few issues that surface during implementation. For example,
it proves useful to implement a
Another implementation issue is that copying a singly linked list requires
a function to reverse the list, because the obvious strategy of iterating
over the nodes and inserting them gives us a list in the wrong order.
So a method that reverses the order of the list by pointer manipulation
proves desirable.
Another subtle issue is
Here is what the code would look like:,
and here is a simple driver program
The implementation of this is straightforward. It deletes the first node
in the list until the list is empty.
The end() function is not strictly necessary, it is included for the
benefit of programmers who are used to STL conventions (checking
This list is still somewhat incomplete. It would be nice to have methods
that perform splice and merge operations, as well as sorting and searching.
But the code we have is a good start.
The simple implementation has some positive traits that set it apart from
many textbook examples.
It has the following properties:
However, there is one key drawback:
There are classes that are designed especially for abstracting the notion
of position in a container. These classes are called iterators,
and not only are they very useful, they are the focus of the next section.
An iterator is an object that represents a position. It makes
it possible to move around in a container, without exposing the
structure of that container.
All container iterators support a "fetch" operation that retrieves the data at
a given position, as well as comparison and assignment operations.
But different iterators support
different operations. For example, an iterator for a singly linked list
supports "forward" movement, but not "backward" movement. An iterator
for an array class supports movement in both directions, and supports
constant time addition (random access).
There are different types of iterators, with different capabilities:
General operations for iterators:
Iterators make for containers that are very
consistent and easy to use. To get some idea of the power of
iterators, here is some example code that uses iterators for vastly
different STL containers:
As you can see, the iterator lets the user get at the data, and keeps
the details of the underlying data structure out of the way. While
it's useful to know that the set is implemented as a binary tree, and
has those performance characteristics (namely, it's maintained in
sorted order, and searching takes O(log(n)) ), the complexity of the
underlying data structure does not interfere with accessing data in
the container. There is a good reason that the STL set is called
Iterators also have the advantage that one can use them to write
generic functions
that operate on pairs of iterators (ranges), and are entirely
container independent (though they may be dependent on capaibilities of
the iterator).
For example, we could use the copy function to write some
code that prints the contents of the
above containers as follows:
An iterator that refers to a place in the list should support
the following operations:
Some things we can't do --
Now we can add operations that depend on position to the list:
We need to insert after the given iterator, because there is
no way to backtrack in a singly linked list. We have the same issue
with erase -- to erase a given node, we need to modify the node that
precedes it.
You may have observed that the methods
This is obviously a problem -- the function prototype claims to guarantee
that the state of the list will remain unchanged, but does not honor this
contract. To deal with this, we need a separate type for
constant iterators,
Since there is an implicit conversion from iterator to
const_iterator, there is no need to provide more comparison
operators to handle mixed comparisons.
So this results in the addition of the following methods:
The standard template library (STL) offers a number of algorithms
that we can use on our list for free, provided that we make our
iterators STL compatible. The main thing this requires is
several typedefs.
in particular, the following information is expected:
The simplest way to do this is derive
The reason we use const T is that this makes the pointer and
reference types const *T and const T&.
It should be noted that there is no requirement that iterators be
derived from std::iterator. However, it is a useful technique.
The iterator category tells algorithms about the capabilities of the
iterator. For example, this tells the compiler that our iterators cannot
be used with an algorithm that requires bidierctional or random access
iteration. The typedefs for
Finally, we have our list class
and driver program
There are several other possible variations on the linked list theme.
We discuss some of them:
The main possible advantage of exposing the circular structure to client
code is that it could make use ranges of iterators that spanned the
"end" of the list. But there is also an obvious problem -- if we use
the obvious implementation, we have
We implement a circular linked list, with the following properties:
The bottom line is that we trade a slight amount of overhead (a sentinel node)
for a fair degree of flexibility.
At a first glance, it doesn't appear obvious that this design helps us
write code that inserts before a given iterator.
The trick here is to use "off-by-one" iterators. In particular, dereferencing
an iterator produces the element at the next node:
The main change in this list with respect to the non-circular list is
the fact that we don't keep track of the head node, we keep track of a
pointer
We dicuss the important changes:
Without further ado, we present the full listing. The
source can be downloaded here, as
can the driver program
We present example code for
a circular list without a sentinel and a
driver program.
The main changes are that we need a lot of special-case code in methods
such as
Here, we compare three list implementations -- the circular, and basic
lists implemented above, and the SGI implementation of the STL
Introduction to Linked Lists
template <class T>
struct Node
{
Node (const T& x, Node* y):data(x),next(y) {}
T data;
Node* next;
};
Node*
data members of the list.
This means that a lot of algorithms only require pointer manipulations.
For example, to delete a node x2 from a linked list,
the following code suffices:
x1->next = x2->next;
delete x2;
next
in each node.
This is often a very useful property. For
example, it is possible to maintain a linked list in a sorted state, and
hold pointers to elements in the list, while adding data to the list. The
pointers do not need to be updated when new elements are added.
Design Considerations
find, insert
and erase
require a means of representing position.
Iterating over lists
Accessing list elements
A Bad Approach to Element Access
find (mylist,3); // sets current element
print (mylist); // print() modifies current element
cout << mylist.current();
A Naive (and bad) Approach to Element Access
a linked list is not an array, and is not a random access
data structure
And hence attempts to treat it as one inevitably lead to disaster.
Consider
the following code:
for (int i = 0; i < mylist.size(); ++i )
cout << mylist[i] << endl;
int pos = mylist.find ( "Thomas" );
mylist.insert ( "William" );
cout << mylist[pos]; // pos no longer refers to "Thomas"
This is a weakness that we are stuck with when we work with arrays, but
there is no reason why we should suffer the same for lists, because inserting
in a list does not change the mewmory address of the nodes.
The Pointer Approach to Element Access
These approaches may seem identical on the face of it, but there
are some key differences. For example, inserting an element before the
head node is problematic in the former case.
These are examples of ways one could write a function to insert
at the front, using the linkedlist-is-a-node approach:
void insert_front ( Node*& head, const element_type & x)
{
Node* tmp = new Node(head, x);
head = tmp;
}
// example client code
insert_front(mylist, x);
Node* insert_front ( Node* head, const element_type & x)
{
Node* tmp = new Node(head, x);
return tmp;
}
// example client code
head = insert_front(mylist, x);
The linked-list-is-a-node approach is problematic in that calling
them on a Node that is
not at the head of the list will result in disaster (it will "unlink" the
list, making access to elements beyond the operand inaccesible, and it will
leave the pointer to the operand pointing at garbage)
void insert_front ( const element_type & x)
{
Node* tmp = new Node(head, x);
head = tmp;
}
// example client code
mylist.insert_front(x);
Notice that client code does not have the opportunity to corrupt the list,
because the client code does not have to know which Node is at the front
of the list.
The Iterator Based Approach To Element Access
template <class T>
class List
{
Node* head;
public:
struct Node
{
Node(const T& x, Node* y):data(x),next(y) {}
T data;
Node* next;
};
void push_front(const T& x)
{
Node* tmp = new Node(x, head);
head = tmp;
}
List();
List(const List&);
List& operator=(const List&);
bool empty();
void pop_front();
T& front();
};
A Design Crisis
erase_after
and
insert_after
methods, that operate on the successor to the
function argument. The ugliness of this is noted.
A Simple Implementation
for ( Node* i = L.begin(); i!=L.end(); i=i-> next )
cout << i->data << endl;
Node* x = L.find (3);
L.insert_after(x,4);
Some Implementation Issues
swap()
function that swaps
the data of two lists. This function makes the assignment operator easier
to implement (it can reuse the copy constructor) and it also offers a
cheap destructive copy operation.
const
correctness. To declare a method
that gives out pointers to data in the list const
is dangerous,
because it would allow functions that accepted a const
reference
to a list to use these pointers to modify the list. So we provide
const
versions of the methods begin()
and front()
that return a const
pointer/reference.
template <class T> class List
{
public:
struct Node
{
Node(const T& data, Node* next=0):data(data),next(next) {}
Node* next;
T data;
};
List() : head(0) {}
List(const List& L) : head(0)
{
// copy in reverse order
for ( const Node* i = L.begin(); i!= L.end(); i=i->next )
push_front(i->data);
reverse();
}
void reverse()
{
Node* p = 0; Node* i = begin(); Node* n;
while (i)
{
n = i->next;
i->next = p;
p = i; i = n;
}
head = p;
}
void swap(List& x)
{
Node* tmp = head; head = x.head; x.head = tmp;
}
List& operator=(const List& x)
{
List tmp(x);
swap(tmp);
return *this;
}
~List() { clear(); }
void clear() { while (!empty()) pop_front(); }
bool empty() { return ! head; }
void push_front(const T& x) {
Node* tmp = new Node(x,head); head = tmp;
}
void pop_front() {
if (head) { Node* tmp = head; head=head->next; delete tmp; }
}
void insert_after(Node* x, const T& data)
{
Node* tmp = new Node(data, x->next);
x->next = tmp;
}
void erase_after(Node* x)
{
Node* tmp = x->next;
if (tmp)
{
x->next = x->next->next;
delete tmp;
}
}
T& front() { return head->data; }
const T& front() const { return head->data; }
Node* begin() { return head; }
Node* end() { return 0; }
const Node* begin() const { return head; }
const Node* end() const { return 0; }
private:
Node* head;
};
#include "simple_list.h"
#include <iostream>
int main()
{
List<int> X;
X.push_front(3);
X.push_front(2);
X.push_front(1);
for (List<int>::Node* it = X.begin(); it; it = it->next )
cout << it->data << endl;
X.reverse();
for (List<int>::Node* it = X.begin(); it; it = it->next )
cout << it->data << endl;
}
Commentary on the Code
Node
is straightforward, and has been
covered.
List()
The default constructor initialises head
to a
NULL
pointer. This is important, because of the class invariant that the element
one past the end of the list (or, in the case of an empty list, the pointer to
the "first Node") is signified by a NULL
pointer.
List(const List& L)
The copy constructor traverses its argument, but the problem is that
the data is inserted at the front. So the elements at the front of the
list are inserted first, and hence end up at the end of the list.
So we need to reverse the list to rectify the situation. This requires
a reverse() member function.
void reverse()
Implementing this function is delicate. We need to traverse the list,
and modify the next
pointer of each node, so it points to
the previous node. This requires that we maintain the following information
in our loop:
i
that we are changing.
next
pointer
in the current node to this addres, and it is not possible to
backtrack in a singly linked list.
next
pointer in the
current node, we have no way of accessing the next node.
So we need to save the address of the next node before
modifying the current node. Note the code:
while(i)
{
n = i->next;
This saves the address of the next node, so we
can continue moving through the list, after we modify the next
pointer:
i->next = p; // make the next pointer point to previous node.
p = i; // update previous node to point to "current"
i = n; // recall the address of the node after i
void swap(List& x)
The swap function is a very useful idiom in C++. It offers a means
to transfer ownership:
List tmp;
tmp.swap(x); // tmp "steals" the data from x
And in circumstances where it's desirable to swap data, it makes it possible
to do this in constant time (that is, the time does not depend on the
length of the list, since we only swap head pointers) We will later
see that this also provides us with a means to implement the
assignment operator in terms of the copy constructor.
List& operator=(const List& );
This is a very easy function to implement, with the help of a copy constructor.
~List()
The destructor removes all the nodes from the list. Since this functionality
is useful to client code, we implement this in a separate function,
clear()
.
void clear()
bool empty()
The implementation of this is straightforward. Note that there is an implicit
coversion from pointer to bool
, we needn't provide an explicit
conversion.
void push_front(const T& x)
This inserts a node. This is a very straightforward task -- create the
new node, and update the list so that the head node points to the
address of our new node.
void pop_front()
This removes the first node. Observant readers may notice that we don't
return the value of the deleted node in this function. There is an important
reason why we don't do this -- exception safety. A detailed
discussion of this topic is outside the scope of this discussion. For
further reading, I recommend Exceptional C++ by Herb Sutter.
void insert_after(Node* x, const T& data);
void erase_after(Node* x);
We previously discussed the reasons for inserting or erasing after
the current node -- it's because we can't backtrack in a singly linked list.
Note that we check the special case where erase_after
is called
on the last node of the list. In this case, erase_after
has
no effect.
T& front();
const T& front() const ;
We provide two versions of front()
.
The compiler automatically decides which version of the function gets called.
One of these is automatically
called when we have a const
list (for example, if a list is
passed to a function by const
reference), the other is used on
non-const lists. front()
can be used in conjunction with
pop_front
.
Foo x = mylist.front();
mylist.pop_front();
The exception safety benefit of this approach is that we know that we
have succesfully copied the element at the front of the list before it
is removed.
Node* begin();
Node* end();
const Node* begin() const;
const Node* end() const;
i != mylist.end()
as opposed to i!= NULL
)
Like the front()
member function, we need a const
and non-const
version. We discuss the reasoning behind
this further on.
A Critique of the Simple Implementation
Node
is an implementation detail, which
exposes the structure of the list. This violates encapsulation.
A more abstract and ideal representation would be to have a class
to represent position, and give the class similar semantics to the
familiar C++ pointer: the ++
operator to advance, and
the *
operator to get at the data in the node.
In the case of the list, it happens that Node
pointers
do not fare too badly as a means of representing position, because their
structure closely resembles the interface we want to expose to client code.
However, more advanced examples do not share this property. A good example
of this is a sorted list implemented as a binary search tree. A node in
such a class looks like this:
template <class T>
struct Node
{
Node* parent;
Node* left;
Node* right;
T data;
};
The structure of such a node is not very helpful to client code.
It is true that we could turn Node
into a full-fledged class,
with private data and public member functions. But we would still be stuck with
explicit pointers, and we'd also be in the awkward position of using a class
that serves two separate purposes (representing a node in a data structure,
and a bookmark for client code)
Introducing Iterators
*it; // fetch data referenced by an iterator;
++it; it++; // move a forward iterator forward.
--it; it--; // move bidirectional iterator backward.
it += n; it -=n; // increment/decrement random access iterator
i1 == i2; j1 == j2; // comparison
i1 = x.begin(); // assignment
// print out the elements of an STL set (implemented as a binary
// search tree)
for (set<double>::iterator it = x.begin(); it != x.end(); ++it )
cout << *it << endl;
// print out the elements of an STL list (doubly linked list)
for (list<double>::iterator it = y.begin(); it != y.end(); ++it )
cout << *it << endl;
// print out the elements of an STL vector (dynamic array)
for (vector<double>::iterator it = z.begin(); it != z.end(); ++it )
cout << *it << endl;
set
and not binary_search_tree
-- the STL
has diffused the complexity to the point that it really does behave
like a "set", and iterators deserve a substantial amount of the credit
for the success of this abstraction.
// print out the elements of an STL set (implemented as a binary
// search tree)
copy (x.begin(),x.end(),ostream_iterator<double>(cout,"\n"));
// print out the elements of an STL list (doubly linked list)
copy (y.begin(),y.end(),ostream_iterator<double>(cout,"\n"));
// print out the elements of an STL vector (dynamic array)
copy (z.begin(),z.end(),ostream_iterator<double>(cout,"\n"));
Implementing Iterators
T& operator*();
overloading the operator gives us pointer-like semantics for lists.
it++; ++it
it1 == it2; it1 != it2
iterator i; iterator j(it); i = j;
so we have:
( it +5 )
in constant time, because
this is a linear time operation in a linked list.
( it1 < it2 )
in constant time.
T& operator*();
iterator& operator++(); // ++it
iterator operator++(int); // it++
bool operator==(const iterator&); // i == j
bool operator!=(const iterator&); // i != j
iterator(); // iterator i;
iterator(Node*);
iterator(const iterator&); // iterator j(i)
iterator& operator=(const iterator&); // iterator j; j = i;
iterator begin(); // return iterator at front of list
iterator end(); // return iterator at end of list
insert_after(iterator,const T&); // insert after iterator
erase_after(iterator); // erase after iterator
const Correctness
begin()
and
end()
were not declared const
. There
is a reason for this -- suppose we pass our list to a function by
const
reference. If we declared begin()
const, then this code would be legal:
void do_something(const list<int> & a )
{
*a.begin() = 3; // modify list
}
const_iterator
. This supports the same
operations as iterator, with the following differences:
const_iterator
from an
iterator
.
operator*()
returns a const reference.
const_iterator begin() const;
const_iterator end() const;
const T& front() const;
One Last Thing: STL Compatibility
typedef T value_type; // type of element
typedef T& reference; // return type of operator*()
typedef T* pointer; // return type of operator->()
difference_type; // type used to measure distance between iterators
typedef std::forward_iterator_tag iterator_category;
iterator
from
std::iterator
. The standard iterator class is a template
that takes two parameters -- an iterator tag, and a type.
For example, for a list iterator
, we want to derive from
std::iterator<std::forward_iterator_tag, T >
and for a const_iterator
, we derive from
std::iterator<std::forward_iterator_tag, const T >
const_iterator
are similar,
just replace references and pointers with const
versions.
An STL Compatible Singly Linked List
#include <iterator>
template <class T>
class List
{
struct Node
{
Node(const T& x,Node* y = 0):m_data(x),m_next(y){}
T m_data;
Node* m_next;
};
Node* m_head;
public:
class iterator
: public std::iterator<std::forward_iterator_tag, T>
{
Node* m_rep;
public:
friend class const_iterator;
friend class List;
inline iterator(Node* x=0):m_rep(x){}
inline iterator(const iterator& x):m_rep(x.m_rep) {}
inline iterator& operator=(const iterator& x)
{
m_rep=x.m_rep; return *this;
}
inline iterator& operator++()
{
m_rep = m_rep->m_next; return *this;
}
inline iterator operator++(int)
{
iterator tmp(*this); m_rep = m_rep->m_next; return tmp;
}
inline reference operator*() const { return m_rep->m_data; }
inline pointer operator->() const { return m_rep; }
inline bool operator==(const iterator& x) const
{
return m_rep == x.m_rep;
}
inline bool operator!=(const iterator& x) const
{
return m_rep != x.m_rep;
}
};
class const_iterator
: public std::iterator<std::forward_iterator_tag, const T>
{
const Node* m_rep;
public:
friend class iterator;
friend class List;
inline const_iterator(const Node* x=0):m_rep(x){}
inline const_iterator(const const_iterator& x):m_rep(x.m_rep) {}
inline const_iterator(const iterator& x):m_rep(x.m_rep){}
inline const_iterator& operator=(const const_iterator& x)
{
m_rep=x.m_rep; return *this;
}
inline const_iterator& operator=(const iterator& x)
{
m_rep=x.m_rep; return *this;
}
inline const_iterator& operator++()
{
m_rep = m_rep->m_next; return *this;
}
inline const_iterator operator++(int)
{
const_iterator tmp(*this); m_rep = m_rep->m_next; return tmp;
}
inline reference operator*() const { return m_rep->m_data; }
inline pointer operator->() const { return m_rep; }
inline bool operator==(const const_iterator& x) const
{
return m_rep == x.m_rep;
}
inline bool operator!=(const const_iterator& x) const
{
return m_rep != x.m_rep;
}
};
List() : m_head(0) {}
List(const List& L) : m_head(0)
{
for ( const_iterator i = L.begin(); i!=L.end(); ++i )
push_front(*i);
reverse();
}
void reverse()
{
Node* p = 0; Node* i = m_head; Node* n;
while (i)
{
n = i->m_next;
i->m_next = p;
p = i; i = n;
}
m_head = p;
}
void swap(List& x)
{
Node* tmp = m_head; m_head = x.m_head; x.m_head = tmp;
}
List& operator=(const List& x)
{
List tmp(x);
swap(tmp);
return *this;
}
~List() { clear(); }
void clear() { while (!empty()) pop_front(); }
inline void push_front(const T&x)
{
Node* tmp = new Node(x);
tmp->m_next = m_head;
m_head = tmp;
}
inline void pop_front()
{
if (m_head)
{
Node* newhead = m_head->m_next;
delete m_head;
m_head = newhead;
}
}
inline bool empty() { return m_head; }
inline T& front() { return *begin(); }
inline const T& front() const { return *begin(); }
inline iterator begin() { return iterator(m_head); }
inline iterator end() { return iterator(); }
inline const_iterator begin() const { return m_head; }
inline const_iterator end() const { return const_iterator(); }
void erase_after (iterator& x)
{
Node* tmp = x.m_rep->m_next;
if (x.m_rep->m_next)
x.m_rep->m_next = x.m_rep->m_next->m_next;
delete tmp;
}
void insert_after (iterator& x, const T& y)
{
Node* tmp = new Node(y,x.m_rep->m_next);
x.m_rep->m_next = tmp;
}
};
// linked list driver program
#include "linked_list.h"
#include <algorithm>
#include <numeric>
#include <iostream>
template <class T>
void print(const List<T> & x)
{
std::copy(x.begin(), x.end(), std::ostream_iterator<T>(std::cout,"\n"));
}
int main()
{
List<int> x;
x.push_front(3);
x.push_front(4);
x.push_front(9);
print(x);
List<int>::iterator it = std::find(x.begin(),x.end(),9);
if (it != x.end())
std::cout << *it << std::endl;
x.insert_after (it,11);
std::cout << "The list is" << std::endl;
print (x);
int sum = std::accumulate ( x.begin(),x.end(),0);
int sumsq = std::inner_product( x.begin(),x.end(),x.begin(),0);
std::cout << "sum " << sum << std::endl;
std::cout << "sum sq " << sumsq << std::endl;
std::cout << "Copying ... " << std::endl;
List<int> y(x);
print (y);
std::cout << "Done with copy" << std::endl;
List<int>::const_iterator cit;
std::cout << "** ";
for (cit = y.begin(); cit != y.end(); ++cit)
std::cout << *cit << ' '; std::cout << std::endl;
}
Variations on the Theme
template < class T>
struct Node
{
Node( const T& data, Node* prev, Node* next):data(data),prev(prev),next(next) {}
T data;
Node* next;
Node* prev;
};
An iterator in a doubly linked list is bidirectional -- we can move
backwards or forwards in a doubly linked list.
Doubly linked lists do not suffer from the
awkward special-casing issues associated with singly linked lists.
As such, they are useful as general purpose classes, which is why
the STL includes a class with similar performance characteristics
(the C++ standards document tends to beat around the bush -- it doesn't
specifically mandate a linked list, but rather, enumerates the properties of
a linked list, and mandates a class with those properties)
Note that it may not be necessary to expose the circular
structure to client code. If the client wants to traverse the list in
"rotated" order, they can call a constant-time rotate, and traverse it
like an ordinary list. The only disadvantage of this is that is inconvenient
in that it involves modifying the data in the list.
It's interesting to note that the SGI implementation of the
(x0 , x2 , ... , xn ) ->
(xi mod n , x(i+2) mod n , ... , x(i-1+n) mode n )
This becomes a
constant time operation.
std::list
class uses a sentinel node and a circular list.
Note that they do not use the same trick for their singly linked
slist
class (which is very similar to the previous example), because
while list
is a general use class, slist
is intended
to provide maximum speed, and minimum size overhead.
begin() == end()
.
We can circumvent this problem by using a sentinel, but using a sentinel
is a problem, in that the sentinel will get in the way if we want to use
ranges that span the "end" of the list. Another alternative is to
linearise iterators by storing state information -- an initial node, and
a count of how many "laps" the iterator has done. This results in iterators
that are larger, and more fragile (since the state depends on two node pointers,
deleting either pointer will invalidate the iterator)
But it offers the following benefits:
Implementing A Circular Linked List with a Sentinel
inline T& operator*() { return m_rep->m_next->m_data; }
The iterator has the effect of abstracting an "always look ahead by one
position" strategy. Instead of providing messy operations such as
find the iterator before the element whose value is 9, or
erase the iterator after this one, which amounts to forcing
the client code to look ahead, the iterators automatically do this work.
This is possible because of the fact that we have a sentinel node, hence
every node in the list has a predecessor. In fact, even the
sentinel node has a predecessor, because the list is circular.
m_node
that points to the last element in the list.
Hence m_node->next
points to the dummy sentinel node,
and m_node->next->next
points to the first node
in the list.
void reverse()
This function is tricky to implement, because looping in a circular list
is not that easy. The problem is that when iterating over a circular list,
the termination condition is the same as the initial condition. One
gets around this problem by terminating on an empty list, then
using a do-while loop to perform the first step without a check.
There is another subtle issue -- recall that the data member m_node
points to the last member of the list. So we need to reassign this to
a pointer to the first node, which is
m_node->m_next>m_next
. (Note that m_node->m_next
is the dummy sentinel node).
inline void push_front(const T&x); { insert (begin(),x); }
inline void push_back(const T& x) { insert (end(),x); }
inline void pop_front() { erase(begin()); }
inline bool empty() { return m_node == m_node->m_next; }
Push and pop functions admit trivial implementations, because of the lack of
a need for special-casing in our new setting. Note that even though
end()
is "one past the end of the list", it is still a valid
iterator, and we can use it in a call to insert()
.
Note also that empty()
checks that the sentinel is its own
successor.
void erase (iterator x)
The implementation of the erase() method suffers some awkwardness, but
it's not that much worse than the dreaded "erase_after" method.
First, it checks that we're not trying to erase the sentinel node.
if (x==end())
return;
Then
it checks to see if the last node on the list is being erased.
if (x.m_rep->m_next == m_node)
m_node = x.m_rep;
Since
m_node
needs to point to the last node on the list,
this variable needs to be reassigned if the last element of the list
is removed. If the iterator does reference the last element on the
list, its corresponding pointer, m_rep
points to the
element before that, thanks to our off-by-one iterator implementation.
So in this case, we assign the value of x.m_rep
to
m_node
.
The last few lines simply cut out the node referenced by the iterator :
Node* tmp = x.m_rep->m_next;
x.m_rep->m_next = x.m_rep->m_next->m_next;
delete tmp;
void insert (iterator x, const T& y)
This is similar to our old insert_after
function, except it
requires a check for the case where we insert before end()
.
In the case where we insert before end()
, the
newly inserted node becomes the last node in the list, so it's necessary to
assign the value m_node
accordingly.
void rotate(iterator)
This rotates the list so that the iterator argument is at the front
of the list. Rotation may seem like an operation that is "natural"
for a circular list, but the only requirement for a constant
time implementation is that both ends of the list be quickly accesible.
#include <iterator>
template <class T>
class List
{
struct Node
{
Node(const T& x,Node* y = 0):m_data(x),m_next(y){}
T m_data;
Node* m_next;
};
Node* m_node;
public:
class iterator
: public std::iterator<std::forward_iterator_tag, T>
{
Node* m_rep;
public:
friend class List;
friend class const_iterator;
typedef T value_type;
typedef T& reference;
typedef T* pointer;
typedef int difference_type;
typedef std::forward_iterator_tag iterator_category;
inline iterator(Node* x=0):m_rep(x){}
inline iterator(const iterator& x):m_rep(x.m_rep) {}
inline iterator& operator=(const iterator& x)
{
m_rep=x.m_rep; return *this;
}
inline iterator& operator++()
{
m_rep = m_rep->m_next; return *this;
}
inline iterator operator++(int)
{
iterator tmp(*this); m_rep = m_rep->m_next; return tmp;
}
inline reference operator*() const { return m_rep->m_next->m_data; }
inline pointer operator->() const { return m_rep->m_next; }
inline bool operator==(const iterator& x) const
{
return m_rep == x.m_rep;
}
inline bool operator!=(const iterator& x) const
{
return m_rep != x.m_rep;
}
};
class const_iterator
: public std::iterator<std::forward_iterator_tag, const T>
{
const Node* m_rep;
public:
friend class List;
friend class iterator;
inline const_iterator(const Node* x=0):m_rep(x){}
inline const_iterator(const const_iterator& x):m_rep(x.m_rep) {}
inline const_iterator(const iterator& x):m_rep(x.m_rep){}
inline const_iterator& operator=(const const_iterator& x)
{
m_rep=x.m_rep; return *this;
}
inline const_iterator& operator=(const iterator& x)
{
m_rep=x.m_rep; return *this;
}
inline const_iterator& operator++()
{
m_rep = m_rep->m_next; return *this;
}
inline const_iterator operator++(int)
{
const_iterator tmp(*this); m_rep = m_rep->m_next; return tmp;
}
inline reference operator*() const { return m_rep->m_next->m_data; }
inline pointer operator->() const { return m_rep->m_next; }
inline bool operator==(const const_iterator& x) const
{
return m_rep == x.m_rep;
}
inline bool operator!=(const const_iterator& x) const
{
return m_rep != x.m_rep;
}
};
List() : m_node(new Node(T())) { m_node->m_next = m_node; }
List(const List& L) : m_node(new Node(T()))
{
m_node->m_next=m_node;
for ( const_iterator i = L.begin(); i!= L.end(); ++i )
push_front(*i);
reverse();
}
void reverse()
{
if (empty())
return;
Node* new_m_node = m_node->m_next->m_next;
Node* p = m_node; Node* i = m_node->m_next; Node* n;
do
{
n = i->m_next;
i->m_next = p;
p = i; i = n;
}
while (p!=m_node);
m_node = new_m_node;
}
void swap(List& x)
{
Node* tmp = m_node; m_node = x.m_node; x.m_node = tmp;
}
List& operator=(const List& x)
{
List tmp(x);
swap(tmp);
return *this;
}
~List() { clear(); delete m_node; }
void clear() { while (!empty()) pop_front(); }
inline void push_front(const T&x)
{
insert (begin(),x);
}
inline void push_back(const T& x)
{
insert (end(),x);
}
inline void pop_front()
{
erase(begin());
}
inline bool empty() { return m_node == m_node->m_next; }
inline T& front() { return *begin(); }
inline const T& front() const { return *begin(); }
inline T& back() { return m_node->data; }
inline const T& back() const { return m_node->data; }
inline iterator begin() { return iterator(m_node->m_next); }
inline iterator end() { return iterator(m_node); }
inline const_iterator begin() const
{
return const_iterator(m_node->m_next);
}
inline const_iterator end() const
{
return const_iterator(m_node);
}
void erase (iterator x)
{
if (x==end())
return;
if (x.m_rep->m_next == m_node)
m_node = x.m_rep;
Node* tmp = x.m_rep->m_next;
x.m_rep->m_next = x.m_rep->m_next->m_next;
delete tmp;
}
void insert (iterator x, const T& y)
{
Node* tmp = new Node(y,x.m_rep->m_next);
if (x.m_rep == m_node)
m_node = tmp;
x.m_rep->m_next = tmp;
}
// rotate x to beginning
void rotate (iterator x)
{
if (x == end())
return;
Node* sentinel = m_node->m_next;
m_node->m_next = m_node->m_next->m_next;
sentinel->m_next = x.m_rep->m_next;
x.m_rep->m_next = sentinel;
m_node = x.m_rep;
}
};
Implementing A Circular Linked List without a Sentinel
push_front(), push_back(),pop_front()
, and some
extra baggage for the iterators. The up()
method is provided
to offer a means of handing the entire list as a range to an STL function.
For example, the range it, it.up()
will include the entire
list, while it, it
would obviously be an empty list.
On the other hand, the rotate()
function is somewhat simpler
in this version. The driver
code for this list can transparently handle ranges that straddle the end.
The main penalty is not so much in the increased iterator sizes (we
don't need to construct iterators very often anyway), but in the
increased bookkeeping overhead. The relative slowness of comparing and
incrementing iterators is a liability in cases where a lot of simple operations
need to be performed on large lists. On the other hand, if iteration speed
is not a potential performance bottleneck, these lists could prove useful.
They have the versatility of the sentinel based linked list, and lessmemory overhead.
Comparison Between Different Lists
list
class.
The SGI list is a doubly linked list. This list has a characteristic
that sets it apart from singly linked lists -- it supports bidirectional
iteration.
Support for bidirectional iteration is an STL requirement, so a linked
list that meets the requirements for the STL list must be
doubly linked. As we will see, there is a pattern emerging in the
comparison -- there is an escalating tradeoff between minimising overhead
and maximising flexibility. The STL class aims to offer as much flexibility
as possible.
Operation/Feature | Simple list | Our Circular list | SGI STL list |
---|---|---|---|
Circular | No | Yes | Yes |
Doubly linked | No | No | Yes |
Sentinel | No | Yes | Yes |
Empty list overhead | One pointer | One pointer and one node | One pointer and one node |
Overhead per node | One pointer and data | One pointer and data | Two pointers and data |
Constant time insert at front | Yes | Yes | Yes |
Constant time append at back | No | Yes | Yes |
Constant time remove at front | Yes | Yes | Yes |
Constant time remove at back | No | No | Yes |
Access first element | Constant time | Constant time | Constant time |
Access last element | Linear time | Constant time | Constant time |
insert() | insert_after only | Yes | Yes |
erase() | erase_after only | Yes | Yes |
forward iteration | Yes | Yes | Yes |
reverse iteration | No | No | Yes |
Constant time rotate | No | Yes | Yes (using splice()) |
While we've covered a lot of issues, there are more issues that the reader should be aware of. Think of this section as an invitation to further exploration.
is an important consideration in library design, especially where templates are involved. There are three possible guarantees a class can make for a given method:
Exception safety is one of the most essential topics for library designers, especially when template code is involved. Further reading is highly recommended.
is a classical problem with implementing template classes. The problem
is that templates generate a certain amount of object code per template instance. So for example, if you create a
list<int>
,
list<float>
,
list<string>
, and a
list<double>
, then the amount of object code generated
is approximately 4 times as much as what you'd have if you only created
one of the types of lists.
The solution to this problem is to pull as much code as possible out of
the templates. It is possible to do this for containers of pointers, by
using specialisations:
template <class T> class List<T*> : private List<void*> { public: typedef List<void*> super; inline void push_back(const T* & x) { super::push_back(x); } inline T* front() { return (T*) super::front(); } ... };
This way, the overhead per class amounts to a small type-safety wrapper,
but the bulk of the implementation need only be compiled once.
The approach I would recommend to this is to use an STL list as a basis,
and use private inheritance.
One of the nice things about this approach is that you can choose to
only re-implement the member functions that you need, to minimise
the overhead. In this example, I only implement push_front
,
pop_front
and non-const iterators (conventional, as
well as back/front inserters).
Most of the member functions are very simple to implement, one simply
forwards the call to the base class.
This section is not intended to be a tutorial so much as it is an
introduction to this template bloat issue (even so, it's as detailed
as anything you're going to find in a book).
However, in this instance, an example is in itself very instructive.
Here is some example code (download) that
illustrates the basic approach.
One can test the effectiveness of the example, by comparing the
size of object
files generated by creating several instances of PtrList
parametrised by different types, and do the same for the
STL list class. One can explicitly request instances with the
following syntax:
template class list<int>
The only compiler I know of that uses the
derive-from-void*
idiom presented here is Metrowerks Code
Warrior.
template <class T> class PtrList : private std::list<void*> { typedef std::list<void*> base_class; public: inline void push_back (T* x) { base_class::push_back(x); } inline void push_front (T* x) { base_class::push_front(x); } friend class back_insert_iterator<PtrList>; friend class front_insert_iterator<PtrList>; class iterator : private base_class::iterator { public: friend class PtrList; inline iterator& operator++() { base_class::iterator::operator++(); return *this; } inline iterator operator++(int x) { return base_class::iterator::operator++(x); } inline iterator& operator--() { base_class::iterator::operator--(); return *this; } inline iterator operator--(int x) { return base_class::iterator::operator--(x); } inline bool operator==(const iterator& right) const { return base_class::iterator::operator==(right); } inline T* operator*() { return (T*)base_class::iterator::operator*(); } private: inline iterator ( const base_class::iterator & x) : base_class::iterator(x) {} }; iterator begin() { return iterator(base_class::begin()); } iterator end() { return iterator(base_class::end()); } }; template<class T> class back_insert_iterator<PtrList<T> > : private back_insert_iterator<std::list<void*> > { typedef back_insert_iterator<std::list<void*> > back_inserter_base; public: back_insert_iterator(PtrList<T> & x) : back_inserter_base(x) {} back_insert_iterator& operator=(T* x) { back_inserter_base::operator=(x); return *this; } back_insert_iterator& operator*(){ return *this; } back_insert_iterator& operator++(){ return *this; } back_insert_iterator operator++(int){ return *this; } }; template<class T> class front_insert_iterator<PtrList<T> > : private front_insert_iterator<std::list<void*> > { typedef front_insert_iterator<std::list<void*> > front_inserter_base; public: front_insert_iterator(PtrList<T> & x) : front_inserter_base(x) {} front_insert_iterator& operator=(T* x) { front_inserter_base::operator=(x); return *this; } front_insert_iterator& operator*(){ return *this; } front_insert_iterator& operator++(){ return *this; } front_insert_iterator operator++(int){ return *this; } };
There are many textbooks that cover linked lists, and they reflect different approaches. Unfortunately, the vast majority of them do a fairly poor job at it. Of these titles, the only one that provides much insight to beginners on the topic of implementing data structures is the Weiss book. The Koenig/Moo book is also good, but it addresses a more advanced audience.
Of these books, I'd enthusiastically recommend Cormen et al as a pure theory book for Data Structures. I believe theory should be studied separately from implementation, it is complex and rich enough in its own right to deserve special attention. On the other hand, sticky implementation details are important, and this is where the books by Weiss prove valuable. The Koenig and Moo book also does a good job at addressing many library implementation and generic programming issues, but it is not a data structures book (the main theme is the related topic of library design)
std::cout