# Simple classifiers from data dependent hypothesis classes (page 4 of 9)

This **article** is part of the collection Office of Scientific & Technical Information Technical Reports and was provided to the Digital Library by the UNT Libraries Government Documents Department.


one. Then $N_n(z^{2n}, \mathcal{F})$ is the number of different sets in

$$\left\{ z^{2n} \cap f : f \in \mathcal{F}_{z^n},\ z^n \subset z^{2n} \right\}.$$

We define the shatter coefficients as

$$S_{n,2n}(\mathcal{F}) = \sup_{z^{2n}} N_n(z^{2n}, \mathcal{F}).$$

Theorem 2.1 (Cannon et al., 2002) below provides a bound on estimation error when using data-dependent hypothesis classes. The proof of this theorem relies heavily on the main result from Cannon et al. (2002),

$$P\left( \sup_{f \in \mathcal{F}_{z^n}} |\hat{e}_n(f) - e(f)| > \epsilon \right) \le 2 S_{n,2n}(\mathcal{F}_n)\, e^{-n\epsilon^2/2}, \quad (2)$$

which bounds the error deviance.

**Theorem 2.1.** Let $Y = \{0, 1\}$ and let $\mathcal{F}$ be a data-dependent classifier space. Given an i.i.d. $n$-sample $z^n$, let

$$e_n^* = \inf_{f \in \mathcal{F}_{z^n}} e(f)$$

denote the optimal generalization error in the data-dependent class $\mathcal{F}_{z^n}$, and choose $\hat{f}$ to solve the learning strategy

$$\min_{f \in \mathcal{F}_{z^n}} \hat{e}_n(f) \quad (3)$$

of minimizing the empirical error over the class $\mathcal{F}_{z^n}$. Then for any $n$ and $\epsilon$,

$$P\left( e(\hat{f}) - e_n^* > \epsilon \right) \le 2 S_{n,2n}(\mathcal{F}_n)\, e^{-n\epsilon^2/8}.$$
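As a quick numerical illustration (the values below are my own, not from the paper), the right-hand side of Theorem 2.1 can be evaluated directly once a bound on the shatter coefficient is available:

```python
import math

def estimation_error_bound(n, eps, shatter_bound):
    """Right-hand side of Theorem 2.1:
    P(e(f_hat) - e_n* > eps) <= 2 * S_{n,2n}(F_n) * exp(-n * eps**2 / 8).
    `shatter_bound` is any upper bound on S_{n,2n}(F_n)."""
    return 2.0 * shatter_bound * math.exp(-n * eps**2 / 8.0)

# Example with the dimension-free LPP shatter bound 8n^3 - 2n
# (introduced later in the text):
n, eps = 100_000, 0.1
bound = estimation_error_bound(n, eps, 8 * n**3 - 2 * n)
```

Note the trade-off visible in the formula: the shatter bound grows only polynomially in $n$ while the exponential term decays like $e^{-n\epsilon^2/8}$, so the bound becomes nontrivial once $n$ is large relative to $\epsilon^{-2}$.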

In the next section we focus on a data-dependent class we call linear point-to-point (LPP). The class LPP is a subset of the class of linear classifiers. When a data-dependent class $\mathcal{F}$ is obtained by restricting a traditional class $\mathcal{G}$, then

$$S_{n,2n}(\mathcal{F}) \le (2n)^{\mathrm{VC}(\mathcal{G})} + 1.$$

Since the VC dimension for the class of linear classifiers in $\mathbb{R}^d$ is given by $d + 1$, we have

$$S_{n,2n}(\mathcal{F}_n) \le (2n)^{d+1} + 1.$$

These bounds may be very loose. Indeed, Cannon et al. (2002) showed that the shatter coefficients for LPP are in fact independent of dimension and satisfy $S_{n,2n}(\mathcal{F}_n) \le 8n^3 - 2n$. These bounds may be plugged directly into Theorem 2.1 to obtain probabilistic bounds on estimation error similar to (1).
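To see how much tighter the dimension-free result is, the two shatter-coefficient bounds above can be compared directly (a small check using my own function names):

```python
def generic_bound(n, d):
    # (2n)^(d+1) + 1: bound from restricting the traditional class of
    # linear classifiers in R^d, whose VC dimension is d + 1.
    return (2 * n) ** (d + 1) + 1

def lpp_bound(n):
    # 8n^3 - 2n: dimension-independent bound for LPP (Cannon et al., 2002).
    return 8 * n**3 - 2 * n

# For any dimension d >= 2 the LPP bound is smaller, and the gap grows
# polynomially with both n and d:
ratio = generic_bound(1000, 10) / lpp_bound(1000)
```

The comparison makes the point concrete: for $d \ge 2$ the generic bound $(2n)^{d+1} + 1$ already exceeds $8n^3$, so the LPP-specific count is the one worth plugging into Theorem 2.1 in high dimensions.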

It is interesting to note that for LPP the shatter coefficients satisfy (see Cannon et al., 2002, p. 348)

$$S_{n,2n}(\mathcal{F}_n) \le (2n)^{\mathrm{VC}_{2n}(\mathcal{F})} + 1,$$

where $\mathrm{VC}_{2n}(\mathcal{F})$ is the data-dependent VC dimension of order $2n$, defined as

$$\mathrm{VC}_{2n}(\mathcal{F}) = \max_{z^{2n}} \max\left\{ |v| : v \subseteq z^{2n},\ \exists\, z^n \subset z^{2n} \text{ such that } \mathcal{F}_{z^n} \text{ shatters } v \right\}.$$

In words, $\mathrm{VC}_{2n}(\mathcal{F})$ is the size of the largest subset of some $2n$ points which is shattered by the data-dependent class on those $2n$ points. When the data-dependent class $\mathcal{F}$ is obtained by restricting a traditional class $\mathcal{G}$, then $\mathrm{VC}_{2n}(\mathcal{F})$ will be less than or equal to the VC dimension $\mathrm{VC}(\mathcal{G})$.

## 3. A Simple Linear Classifier

We present a representative example of a simple classifier derived from a data-dependent hypothesis class. The class we consider was introduced by Cannon et al. (2002) as a restriction of the traditional class of linear classifiers. For computational considerations we let $X = \mathbb{R}^d$ with the usual inner product and metric. The linear point-to-point (LPP) data-dependent hypothesis class is the subset of linear classifiers whose orientations are determined by the pairwise differences between samples, $\Delta_{ij} = x_n(i) - x_n(j)$, $i \ne j$. For a fixed sample pair $(x_n(i), x_n(j))$, consider the family of linear classifiers defined by all hyperplanes orthogonal to $\Delta_{ij}$. The class of indicator sets on this family is linearly ordered by subset inclusion. The LPP class is the (polynomial) union of these families over all sample pairs in the training data. More formally, LPP is the function class

$$\left\{ f : f(x) = \mathcal{H}(\Delta_{ij} \cdot x - b),\ i, j \le n,\ i \ne j,\ b \in \mathbb{R} \right\} \quad (4)$$

where $\mathcal{H}$ is the Heaviside function.
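A single member of the LPP class in (4) is straightforward to realize in code. The sketch below is illustrative (the function name and the $\mathcal{H}(0) = 1$ tie convention are my own choices) and builds $f(x) = \mathcal{H}(\Delta_{ij} \cdot x - b)$ from a chosen sample pair and threshold:

```python
import numpy as np

def lpp_classifier(x_i, x_j, b):
    """Return the LPP classifier f(x) = H(delta . x - b), where
    delta = x_i - x_j is the orientation fixed by a training-sample pair
    and H is the Heaviside step function (here with H(0) = 1)."""
    delta = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)

    def f(x):
        return 1 if float(np.dot(delta, x)) - b >= 0.0 else 0

    return f

# The decision hyperplane is orthogonal to delta; varying b slides it
# along delta, producing the linearly ordered family of indicator sets
# described in the text.
f = lpp_classifier([2.0, 0.0], [0.0, 0.0], 1.0)
```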

The ordering structure on the indicator sets is the defining characteristic of a simple class. Indeed, exploiting this structure makes it an easy matter to design a polynomial-time learning algorithm that optimizes the empirical risk (3) over the class. Consider the following brute-force approach, where we are given an $n$-sample $z^n$ as input. For a fixed $\Delta_{ij}$ we compute and sort the dot products $v_k = \Delta_{ij} \cdot x_n(k)$ for $k = 1$ to $n$ and order the elements of the $n$-sample $z^n$ with respect to the sorted list. Now the dot products $\{v_k\}_{k=1}^{n}$ are used to determine thresholds $b_k = (v_k + v_{k+1})/2$ for a sequence of $n - 1$ linear classifiers whose orientations are determined by $\Delta_{ij}$. Since the related indicator sets are ordered by subset inclusion and change incrementally at each new threshold, they are trivial to compute using the newly sorted $z^n$.² Similarly

²The extreme classifiers that label all points the same
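The brute-force search described above can be sketched as follows. This is a minimal implementation under my own naming, not the authors' code: it scans all sample pairs, sorts the projections once per orientation, and updates the empirical error in O(1) per threshold because the nested indicator sets flip exactly one prediction at each step.

```python
import numpy as np

def lpp_erm(X, y):
    """Minimize empirical error over the LPP class by brute force.

    For each orientation delta = X[i] - X[j] (i != j), sort the
    projections v_k = delta . x_k, then sweep the midpoint thresholds
    b_k = (v_k + v_{k+1}) / 2.  Because the indicator sets are nested,
    moving the threshold past one point flips exactly one prediction,
    so the error count is updated incrementally."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
    n = len(X)
    best = (None, None, n + 1)  # (delta, b, error count)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            delta = X[i] - X[j]
            v = X @ delta
            order = np.argsort(v)
            # Threshold below all points: f predicts 1 everywhere.
            err = int(np.sum(y == 0))
            if err < best[2]:
                best = (delta, v[order[0]] - 1.0, err)
            for k in range(n - 1):
                # Move threshold past point order[k]: its prediction 1 -> 0.
                err += 1 if y[order[k]] == 1 else -1
                b = (v[order[k]] + v[order[k + 1]]) / 2.0
                if err < best[2]:
                    best = (delta, b, err)
            # Extreme classifier: threshold above all points, predicting 0.
            err_all0 = int(np.sum(y == 1))
            if err_all0 < best[2]:
                best = (delta, v[order[-1]] + 1.0, err_all0)
    return best  # delta, b, minimal empirical error count
```

The total cost is $O(n^2)$ orientations times $O(n \log n)$ for sorting, i.e. $O(n^3 \log n)$, which matches the "polynomial-time" claim in the text; ties in the projections would need a tie-breaking convention in a production version.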


### Reference

Cannon, A.; Howse, J. W.; Hush, D. R. & Scovel, J. C. Simple classifiers from data dependent hypothesis classes, article, January 1, 2003; United States. University of North Texas Libraries, Digital Library (https://digital.library.unt.edu/ark:/67531/metadc932925/m1/4/; accessed April 20, 2019); crediting UNT Libraries Government Documents Department.