Problem Set 4 Partial Solutions
Problem 4.3: Here is a short Scheme program to find the answer to 4.3.
(define (log2 n)
(/ (log n) (log 2)))
(define (info p q)
(if (or (= p 0) (= q 0))
0
(* -1 (+ (* p (log2 p)) (* q (log2 q))))))
(define (info-gain ys1 ns1 ys2 ns2 ys3 ns3)
(let* ((tot1 (+ ys1 ns1))
(tot2 (+ ys2 ns2))
(tot3 (+ ys3 ns3)))
(- (info (/ ys1 tot1) (/ ns1 tot1))
(+
(* (/ tot2 tot1)
(info (/ ys2 tot2) (/ ns2 tot2)))
(* (/ tot3 tot1)
(info (/ ys3 tot3) (/ ns3 tot3)))))))
We find:
>>> (info-gain 100 100 70 40 30 60)
0.06665370714512742
>>> (info-gain 100 100 12 0 88 100)
0.06276448651079403
>>> (info-gain 100 100 13 0 87 100)
0.06826219558661883
So N = 13 is the smallest value of N that would make the second attribute more informative than the first.