A perceptron network (F. Rosenblatt, 1957) has two layers, as does the Hebb network considered previously, but its learning law is different: the perceptron is trained by a supervised learning algorithm, based on prior knowledge of the desired outputs.

We first consider the case in which there is only one output node in the network. Presented with a pattern vector $\mathbf{x}=[x_1,\cdots,x_n]^T$, the output is computed to be

$$y=\left\{\begin{array}{ll}1 & \mbox{if }\;\mathbf{w}^T\mathbf{x}=\sum_{i=1}^n w_ix_i>b\\ 0 & \mbox{otherwise}\end{array}\right.$$

where $\mathbf{w}=[w_1,\cdots,w_n]^T$ is the weight vector and $b$ is the bias. The inequality can also be written as:

$$\mathbf{w}^T\mathbf{x}-b>0$$
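The thresholded output can be sketched in a few lines of Python (a minimal illustration; the names `perceptron_output`, `w`, and `b` are ours, not from the text):

```python
import numpy as np

def perceptron_output(w, b, x):
    """Binary perceptron output: 1 if the net input w.x exceeds the bias b, else 0."""
    return 1 if np.dot(w, x) > b else 0

# Example: a hyperplane with normal direction w = (1, 1) and bias b = 1.5
w = np.array([1.0, 1.0])
b = 1.5
print(perceptron_output(w, b, np.array([1.0, 1.0])))  # net input 2.0 > 1.5, output 1
print(perceptron_output(w, b, np.array([0.0, 1.0])))  # net input 1.0 <= 1.5, output 0
```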

As shown in the figure, this network is a 2-class classifier. The weight vector $\mathbf{w}$ represents the normal direction of a hyperplane in the n-D space, which partitions the space into two halves, one for each of the two classes. The inner product

$$\mathbf{w}^T\mathbf{x}=\sum_{i=1}^n w_ix_i$$

is proportional to the projection of $\mathbf{x}$ onto the normal direction $\mathbf{w}$, and is either greater or smaller than the bias $b$, the distance from the origin to the hyperplane with normal direction $\mathbf{w}$, depending on which of the two classes the pattern vector $\mathbf{x}$ belongs to.

The binary output is either $y=1$, indicating $\mathbf{x}$ belongs to class 1 ($C_1$), or $y=0$, indicating $\mathbf{x}$ belongs to class 2 ($C_2$), i.e.,

$$y=\left\{\begin{array}{ll}1 & \mbox{if }\;\mathbf{x}\in C_1\\ 0 & \mbox{if }\;\mathbf{x}\in C_2\end{array}\right.$$

The weight vector $\mathbf{w}$ is obtained in the training process based on a set of $K$ training samples with known classes (labeled): $\{(\mathbf{x}_k,\,t_k),\;k=1,\cdots,K\}$, where $\mathbf{x}_k$ is an n-D vector representing a pattern and the binary label $t_k\in\{1,\,0\}$ indicates to which one of the two classes, $C_1$ or $C_2$, the input $\mathbf{x}_k$ belongs.

Specifically, in each step of the training process, one of the $K$ patterns, $\mathbf{x}_k$, randomly chosen, is presented to the input nodes for the network to generate an output $y_k$. This output is then compared with the desired output $t_k$ corresponding to this input to obtain their difference $\delta_k=t_k-y_k$, based on which the weight vector is then updated by the following learning law:

$$\mathbf{w}^{new}=\mathbf{w}^{old}+\eta\,\delta_k\,\mathbf{x}_k$$

where $\eta>0$ is the learning rate and

$$\delta_k=t_k-y_k=\left\{\begin{array}{rl}0 & \mbox{if }\;y_k=t_k\\ 1 & \mbox{if }\;t_k=1,\;y_k=0\\ -1 & \mbox{if }\;t_k=0,\;y_k=1\end{array}\right.$$

Here $\delta_k$ represents the error, the difference between the actual output and the desired output, and the learning law is called the $\delta$-rule. How the learning law works can be shown by the following three cases:

- If $y_k=t_k$, then $\delta_k=0$ and the weights are not modified.
- If the desired output is $t_k=1$ but $\mathbf{w}^T\mathbf{x}_k\le b$ and therefore $y_k=0$, then $\delta_k=1$ and
  $$\mathbf{w}^{new}=\mathbf{w}^{old}+\eta\,\mathbf{x}_k$$
  Next time when the same $\mathbf{x}_k$ is presented, the net input is
  $$(\mathbf{w}^{new})^T\mathbf{x}_k=(\mathbf{w}^{old}+\eta\,\mathbf{x}_k)^T\mathbf{x}_k=(\mathbf{w}^{old})^T\mathbf{x}_k+\eta\,\|\mathbf{x}_k\|^2>(\mathbf{w}^{old})^T\mathbf{x}_k$$
  i.e., the output will be more likely to be the same as the desired $t_k=1$.
- If the desired output is $t_k=0$ but $\mathbf{w}^T\mathbf{x}_k>b$ and therefore $y_k=1$, then $\delta_k=-1$ and
  $$\mathbf{w}^{new}=\mathbf{w}^{old}-\eta\,\mathbf{x}_k$$
  Next time when the same $\mathbf{x}_k$ is presented, the net input is
  $$(\mathbf{w}^{new})^T\mathbf{x}_k=(\mathbf{w}^{old}-\eta\,\mathbf{x}_k)^T\mathbf{x}_k=(\mathbf{w}^{old})^T\mathbf{x}_k-\eta\,\|\mathbf{x}_k\|^2<(\mathbf{w}^{old})^T\mathbf{x}_k$$
  i.e., the output will be more likely to be the same as the desired $t_k=0$.
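All three cases reduce to the single update rule: weight vector plus learning rate times error times input. One training step can be sketched as follows (the function and variable names are our own):

```python
import numpy as np

def delta_rule_step(w, b, x, t, eta=0.1):
    """One delta-rule step: compute the output y, then apply w += eta*(t - y)*x.

    Returns the updated weight vector. The bias b is kept fixed here; it could
    equally be learned by treating it as a weight on a constant input."""
    y = 1 if np.dot(w, x) > b else 0
    return w + eta * (t - y) * x

w = np.array([0.0, 0.0])
x = np.array([1.0, 2.0])
w = delta_rule_step(w, 0.5, x, t=1)  # y=0 but t=1, so the weights move toward x
print(w)                             # -> [0.1 0.2]
```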

**Perceptron convergence theorem:**

If $C_1$ and $C_2$ are two linearly separable clusters of patterns, then a perceptron will develop a separating hyperplane in a finite number of training trials to classify them.
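The theorem can be illustrated by a plain training loop on a linearly separable problem such as logical AND, which reaches zero classification errors after a few epochs (a sketch under our own naming; the bias is learned as an extra weight on a constant input of 1):

```python
import numpy as np

# Logical AND: linearly separable in 2-D
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 0, 0, 1])

# Augment each pattern with a constant 1 so the bias is learned as w[-1]
Xa = np.hstack([X, np.ones((4, 1))])
w = np.zeros(3)
eta = 1.0  # integer-valued updates avoid floating-point edge cases at the boundary

for epoch in range(100):
    errors = 0
    for x, t in zip(Xa, T):
        y = 1 if np.dot(w, x) > 0 else 0
        if y != t:
            w += eta * (t - y) * x  # delta rule
            errors += 1
    if errors == 0:  # every pattern classified correctly: converged
        break

outputs = [1 if np.dot(w, x) > 0 else 0 for x in Xa]
print(outputs)  # matches T: [0, 0, 0, 1]
```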

As the perceptron network is only capable of linear operations, the corresponding classification is limited to classes that are linearly separable by a hyperplane in the n-D space. A typical example of two classes not linearly separable is the XOR problem, where the 2-D space is divided into four regions for the two classes 0 and 1, just like the Karnaugh map for the exclusive-OR of two logical variables $x_1$ and $x_2$.
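The XOR case can be checked directly: no single hyperplane $w_1x_1+w_2x_2=b$ classifies all four patterns. A brute-force search over a grid of candidate weights and biases finds none (a sketch with hypothetical names; a finite grid is only an illustration of the impossibility, not a proof):

```python
import numpy as np
from itertools import product

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 1, 1, 0]  # exclusive OR of the two inputs

def separates(w1, w2, b):
    """True if the line w1*x1 + w2*x2 = b classifies all four XOR patterns."""
    return all((w1*x1 + w2*x2 > b) == bool(t) for (x1, x2), t in zip(X, T))

# Exhaustive search over a grid of weights and biases
grid = np.linspace(-2, 2, 41)
found = any(separates(w1, w2, b) for w1, w2, b in product(grid, grid, grid))
print(found)  # False: no single hyperplane separates XOR
```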

When there are $m$ output nodes in the network, the network becomes a multiple-class classifier. The weight vectors $\mathbf{w}_i$ ($i=1,\cdots,m$) for these $m$ output nodes define $m$ hyperplanes that partition the n-D space into multiple regions, each corresponding to one of the classes. Specifically, if $m\le n$, then the n-D space can be partitioned into as many as $2^m$ regions.
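In this multi-node case each output node applies its own weight vector and bias, and the resulting m-bit output pattern identifies a region. A minimal sketch (names are our own) with $m=2$ hyperplanes in 2-D, yielding $2^2=4$ regions:

```python
import numpy as np

def multi_output(W, b, x):
    """Outputs of m perceptron nodes: W is m x n, b is a length-m bias vector.
    Each node fires iff its hyperplane places x on the positive side."""
    return (W @ x > b).astype(int)

W = np.array([[1.0, 0.0],   # node 1 tests x1 > 0
              [0.0, 1.0]])  # node 2 tests x2 > 0
b = np.array([0.0, 0.0])

# The four quadrants of the plane get four distinct output words
for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x, multi_output(W, b, np.array(x, dtype=float)))
```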

The limitation of linear separability could be overcome by having more than one learning layer in the network. However, the learning law above for the single-layer perceptron network no longer applies; a new training method is needed.