Andrew Ng Machine Learning Programming Assignment Code Analysis 1: Linear Regression

yalewoo, last modified 2019-05-12, published 2016-04-02

Category: Machine Learning. Tags: matlab, machine learning, Ng notes

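Andrew Ng's Machine Learning course on Coursera has 8 programming assignments. This post records what I learned while working through them, and I hope it is useful to others.

Before the first assignment, I will go over some MATLAB basics.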

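MATLAB basics

The MATLAB working directory

You can change the current working directory from the graphical Current Folder panel in MATLAB.

By default, MATLAB finds files only in the current working directory (or elsewhere on its search path), so switch to the assignment directory first.

You can also change the working directory with commands:

pwd shows the current working directory

cd changes into a directory

ls lists the contents of the current directory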
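A short session using the three commands together might look like the sketch below. The choice of tempdir is only an assumption for illustration (it is a directory that exists on every installation); in practice you would cd into the unpacked assignment folder.

```matlab
start_dir = pwd;      % remember the current working directory
cd(tempdir);          % switch to the system temporary directory
disp(pwd);            % show where we are now
ls;                   % list the contents of that directory
cd(start_dir);        % return to where we started
```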

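m script files

MATLAB is an interpreted language, so you can type commands at the command line and run them immediately. A script file simply collects several commands; calling the script from the command line runs the commands in the file one by one.

For example, enter two commands at the command line, such as the two assignments used in the script below.

After they run, the Workspace window shows the variables that now exist.

A script file can do the same thing. Create a new file with the following content: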

c = 7
d = c*7

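Save it in the current working directory as script1.m. Typing script1 at the command line is then equivalent to running the two statements in the file.

Afterwards the variables c and d appear in the Workspace window.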

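m function files

A function file defines a MATLAB function that other code can call. Save the file as <function name>.m so the function can be called by that name. In my tests, when the file name and the function name inside the file differ, the file name is the one that counts.

The file has the following format:

function return_value = function_name(input_arguments)
    % YOUR CODE HERE
end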

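A function can have several return values and several input arguments, separated by commas. When there is more than one return value, enclose them in square brackets.

function [output1, output2] = function_name(input1, input2, input3)
    % YOUR CODE HERE
end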

Example:

Create a new file f1.m with the following content:

function s = f1(a)
    s = a+8;
end

Once the file is saved in the working directory, the function can be called by name.
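A call might look like the sketch below. So that the snippet runs without f1.m on disk, it uses an anonymous function named f1_demo, a hypothetical stand-in with the same body as f1.

```matlab
% Stand-in for f1.m so the snippet runs on its own (assumption: same body as f1)
f1_demo = @(a) a + 8;

r = f1_demo(2);         % with the real file this would be: r = f1(2)
disp(r);                % r is 10

v = f1_demo([1 2 3]);   % the addition also applies element-wise to a vector
disp(v);                % v is [9 10 11]
```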

Semicolons in statements

A statement without a trailing semicolon prints its result; a statement that ends with a semicolon does not.
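A minimal illustration of the difference (the variable names are arbitrary):

```matlab
a = 3 + 4      % no semicolon: MATLAB echoes the value of a (7) in the command window
b = a * 2;     % semicolon: the assignment runs silently
disp(b);       % print b explicitly when it is needed
```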

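ex1: univariate linear regression

The first programming assignment consists of the files shown in the figure below.

[Figure: list of the assignment files]

The script ex1 runs univariate linear regression, and ex1_multi.m runs multivariate linear regression. submit.m submits your work to the server; this post does not analyze that code.

Let's look at the code of ex1 first.

0. Initialization

The initialization part contains three statements: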

 %% Initialization
clear ; close all; clc

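clear removes all variables from the workspace. Followed by a variable name, it removes only that variable.

close all closes all windows (the figure windows that display plots).

clc clears the Command Window (the commands and output entered earlier).

Two percent signs %% start a code-section comment in MATLAB. Everything from one %% to the next %% is treated as one section, and the MATLAB editor sets sections apart visually, highlighting the current one in yellow.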

1. warmUpExercise.m

The script code that runs this part is:

 %% ==================== Part 1: Basic Function ====================
 % Complete warmUpExercise.m 
fprintf('Running warmUpExercise ... \n');
fprintf('5x5 Identity Matrix: \n');
warmUpExercise()

fprintf('Program paused. Press enter to continue.\n');
pause;

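fprintf works much like printf in C: it supports placeholders such as %d, it can print a plain string directly, and \n is the newline character. Strings in MATLAB are enclosed in single quotes.

pause suspends execution until a key is pressed.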
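A few typical fprintf calls are sketched below. The values are made up for illustration; sprintf is the companion function that returns the formatted text instead of printing it.

```matlab
n = 5;
cost = 4.5;                                        % illustrative value only
fprintf('Identity matrix size: %d x %d\n', n, n);  % integer placeholders
fprintf('Cost: %.2f\n', cost);                     % two digits after the decimal point
msg = sprintf('%d examples', 97);                  % sprintf returns the text instead of printing it
disp(msg);
```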

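In between, the script calls warmUpExercise, the function defined in warmUpExercise.m. The function is required to return a 5-by-5 identity matrix, which the eye function produces directly.

The content of warmUpExercise.m is: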

function A = warmUpExercise()
    A = eye(5);
end

It takes no input arguments and returns A.

The function can then be called from the command line (tab completion is supported).

2. plotData.m

The script code that runs this part is:

 %% ======================= Part 2: Plotting =======================
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y); % number of training examples

 % Plot Data
 % Note: You have to complete the code in plotData.m
plotData(X, y);

fprintf('Program paused. Press enter to continue.\n');
pause;

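This part first reads the data from a file and then calls plotData to draw the plot.

The file ex1data1.txt in the assignment directory is the input data. Its format is as follows (only a small part of the file is shown):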

6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483

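load reads the file and returns the resulting matrix.

The Workspace window shows the type of data: a 97-by-2 matrix, so ex1data1.txt contains 97 rows of data.

Matrix and vector indices in MATLAB start at 1, not at 0 as in C.

The following line assigns the first column of data to X and the second column to y. Both X and y are column vectors (m-by-1 matrices, here with m = 97).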

X = data(:, 1); y = data(:, 2);

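Matrix elements are referenced with parentheses. Suppose a is an m-by-n matrix.

a(3,4) is the element in row 3, column 4 of a.

A colon means "all".

a(:,4) is the fourth column of a; the result is a column vector (an m-by-1 matrix).

a(3,:) is the third row of a; the result is a row vector (a 1-by-n matrix).

To reference an element of a vector, put a single number in the parentheses. For example, if b is a vector,

b(3) is the third element of b.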
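The indexing rules above can be tried on a small matrix. The values below are made up so that each element shows its own row and column.

```matlab
a = [11 12 13 14;
     21 22 23 24;
     31 32 33 34];      % a 3-by-4 example matrix (illustrative values)

e   = a(3, 4);          % element in row 3, column 4 -> 34
col = a(:, 4);          % fourth column as a 3-by-1 column vector
row = a(3, :);          % third row as a 1-by-4 row vector

b = [5 6 7 8];
third = b(3);           % third element of the vector -> 7
first = b(1);           % indices start at 1; b(0) would raise an error
```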

m = length(y);

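length returns the length of the vector y, which is also the number of examples in our training set.

Finally, plotData(X, y) is called to draw the plot. The content of plotData.m is: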

function plotData(x, y)
    figure; % open a new figure window

    plot(x, y, 'rx', 'MarkerSize', 10); % Plot the data
    ylabel('Profit in $10,000s'); % Set the yaxis label
    xlabel('Population of City in 10,000s'); % Set the xaxis label
end

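plot draws the points. 'rx' means red, x-shaped markers. 'MarkerSize', 10 sets the marker size to 10. plot is very flexible; see the official documentation. The form used here is plot(X1, Y1, LineSpec, 'PropertyName', PropertyValue, ...)

xlabel and ylabel set the axis labels.

In our script ex1, X holds the first column of the data and y holds the second, so calling the plotting function produces the scatter plot.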

3. Gradient descent

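The main task of this part is to compute the cost and run gradient descent. For background, see "Andrew Ng Machine Learning Course Notes 2: Linear Regression" on 雅乐网.

First, the script code for this part: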

 %% =================== Part 3: Gradient descent ===================
fprintf('Running Gradient Descent ...\n')
X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1); % initialize fitting parameters
 % Some gradient descent settings
iterations = 1500;
alpha = 0.01;
 % compute and display initial cost
computeCost(X, y, theta)
 % run gradient descent
theta = gradientDescent(X, y, theta, alpha, iterations);
 % print theta to screen
fprintf('Theta found by gradient descent: ');
fprintf('%f %f \n', theta(1), theta(2));
 % Plot the linear fit
hold on; % keep previous plot visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off % don't overlay any more plots on this figure
 % Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] *theta;
fprintf('For population = 35,000, we predict a profit of %f\n',...
    predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n',...
    predict2*10000);
fprintf('Program paused. Press enter to continue.\n');
pause;

The line X = [ones(m, 1), data(:,1)] prepends a column of ones to X, representing $x_0$.

computeCost(X, y, theta) computes the cost. Here is the starter computeCost.m file:

function J = computeCost(X, y, theta)
 %COMPUTECOST Compute cost for linear regression
 %   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
 %   parameter for linear regression to fit the data points in X and y

 % Initialize some useful values
m = length(y); % number of training examples

 % You need to return the following variables correctly 
J = 0;
 % ====================== YOUR CODE HERE ======================
 % Instructions: Compute the cost of a particular choice of theta
 %               You should set J to the cost.

 % =========================================================================
end

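The cost is computed as

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h(x^{(i)}) - y^{(i)})^2$$

where

$$h(x) = \theta^{T} x$$

At this point X is a 97-by-2 matrix (a column of ones plus the feature column), y is a 97-by-1 column vector, and theta is a 2-by-1 column vector.

The sum could be computed with a loop, but MATLAB's strength is matrices, so we should exploit vectors and matrices to compute many values at once. This is called vectorization, and it is very important.

A word about MATLAB operators: the basic ones are the four arithmetic operators + - * / and the power operator ^.

Between two matrices they follow the rules of matrix algebra, for example matrix multiplication and matrix division (A / B is multiplication by the inverse of B).

To operate on each element of a matrix instead, prefix the operator with a dot. For example, to multiply corresponding elements of matrices A and B, use A .* B

In addition, an operation between a matrix and a scalar is applied to every element of the matrix. For example, A + 2 adds 2 to every element of A.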
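The difference between the matrix operators and their dotted, element-wise counterparts is easiest to see on two small matrices (the values are arbitrary):

```matlab
A = [1 2; 3 4];
B = [5 6; 7 8];

P = A * B;      % matrix product:                 [19 22; 43 50]
E = A .* B;     % element-wise product:           [5 12; 21 32]
S = A ^ 2;      % matrix power, same as A * A:    [7 10; 15 22]
Q = A .^ 2;     % element-wise square:            [1 4; 9 16]
T = A + 2;      % scalar added to every element:  [3 4; 5 6]
```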

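Start with the i-th term inside the sum, which corresponds to the i-th training example.

First compute h(x) into a variable:

hx = X * theta

Note that this single matrix product contains the results for every i from 1 to m.

Then form each $(h(x^{(i)}) - y^{(i)})^2$:

(hx - y).^2

Here .^ squares each element of the vector rather than taking a matrix power.

Adding up all the elements of this vector gives the sum:

sum((hx - y).^2)

Finally, J is

J = sum((hx - y).^2) / (2*m)

Final code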

function J = computeCost(X, y, theta)
    m = length(y); % number of training examples
    J =  sum((X * theta - y).^2) / (2*m);
end
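To check that the vectorized expression really matches the formula, it can be compared with an explicit loop on a tiny data set. The three training examples below are made up for illustration; with theta = [0; 0] the errors are -1, -2 and -3, so the cost is 14/6.

```matlab
% Tiny made-up training set: 3 examples, one feature plus the column of ones
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0; 0];
m = length(y);

% Loop version: add up the squared error one example at a time
J_loop = 0;
for i = 1:m
    h_i = X(i, :) * theta;              % prediction for example i
    J_loop = J_loop + (h_i - y(i))^2;
end
J_loop = J_loop / (2 * m);

% Vectorized version, same expression as in computeCost
J_vec = sum((X * theta - y) .^ 2) / (2 * m);

fprintf('loop: %f, vectorized: %f\n', J_loop, J_vec);
```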

Next comes gradient descent, theta = gradientDescent(X, y, theta, alpha, iterations); the starter code is:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
 %GRADIENTDESCENT Performs gradient descent to learn theta
 %   theta = GRADIENTDESENT(X, y, theta, alpha, num_iters) updates theta by 
 %   taking num_iters gradient steps with learning rate alpha

 % Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    % ============================================================

    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);
end
end

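J_history records the cost at every iteration.

The gradient descent update rule is

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( (h(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} \right)$$

I had not found a vectorized form when writing this, so the code below loops over the parameters one at a time. (One does exist: all the sums can be computed together as X' * (X * theta - y).)

First express the terms inside the sum. For the j-th parameter they are

(X * theta - y) .* X(:, j)

The update for the j-th parameter is then

theta_new(j) = theta(j) - (alpha / m) * sum((X * theta - y) .* X(:, j));

Final code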

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    theta_new = theta;
    n = length(theta);

    for iter = 1:num_iters
        for j = 1 : n
            theta_new(j) = theta(j) - (alpha / m) * sum((X * theta - y) .* X(:, j));
        end
        theta = theta_new;

        % Save the cost J in every iteration    
        J_history(iter) = computeCost(X, y, theta);
    end
end

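theta_new is used so that all n parameters are updated simultaneously: an already-updated parameter must not affect the updates of the parameters that follow it.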
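The per-parameter loop can also be written as one matrix expression. The j-th sum is the j-th entry of X' * (X * theta - y), so the whole update becomes theta := theta - (alpha / m) * X' * (X * theta - y), which is simultaneous by construction. The sketch below runs both versions side by side on made-up data lying exactly on y = 1 + 2x; the data, alpha and num_iters are assumptions chosen for illustration.

```matlab
% Made-up data lying exactly on y = 1 + 2*x
X = [ones(5, 1), (1:5)'];
y = 1 + 2 * (1:5)';
m = length(y);
alpha = 0.05;
num_iters = 2000;

theta_loop = zeros(2, 1);
theta_vec = zeros(2, 1);

for iter = 1:num_iters
    % Per-parameter update with a temporary copy, as in the code above
    theta_new = theta_loop;
    for j = 1:length(theta_loop)
        theta_new(j) = theta_loop(j) - (alpha / m) * sum((X * theta_loop - y) .* X(:, j));
    end
    theta_loop = theta_new;

    % Vectorized update: X' * (X*theta - y) holds all n sums at once
    theta_vec = theta_vec - (alpha / m) * (X' * (X * theta_vec - y));
end

disp([theta_loop, theta_vec]);   % the two columns should agree and be close to [1; 2]
```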

4. Visualizing J

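Omitted.

ex1_multi: multivariate linear regression

1. Feature Normalization

The feature scaling formula is

$$x_n = \frac{x_n - \mu_n}{s_n}$$

After scaling, any new input used for prediction must be scaled in the same way to get a correct result, so this function also returns the mean and standard deviation used for the scaling.

It uses the mean and std functions; run help mean and help std in MATLAB to see how they work.

These give the mean and standard deviation of each column, after which each row can be scaled: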

mu = mean(X);
sigma = std(X);
for i=1:size(X,1)
    X_norm(i, :) = (X(i, :) - mu) ./ sigma;
end

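Alternatively, trade space for time: replicate mu and sigma into matrices with as many rows as X, and then apply element-wise operations directly.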

mu = mean(X);
sigma = std(X);
mu2 = repmat(mu, size(X,1), 1);
sigma2 = repmat(sigma, size(X,1), 1);

X_norm = (X - mu2) ./ sigma2;
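A third option avoids the explicit copies: bsxfun expands mu and sigma along the rows on the fly. In MATLAB R2016b and later, and in Octave, plain (X - mu) ./ sigma should give the same result through implicit expansion. The feature matrix below is made up for illustration.

```matlab
% Made-up feature matrix: 4 examples, 2 features on very different scales
X = [2000 3; 1600 2; 2400 4; 1400 3];

mu = mean(X);          % 1-by-2 row of column means
sigma = std(X);        % 1-by-2 row of column standard deviations

% repmat version from the text
X_rep = (X - repmat(mu, size(X, 1), 1)) ./ repmat(sigma, size(X, 1), 1);

% bsxfun version: expands mu and sigma along the rows without explicit copies
X_bsx = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);

disp(mean(X_bsx));     % each column mean is numerically 0
disp(std(X_bsx));      % each column standard deviation is 1
```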

2. Gradient Descent

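The computation used for the single-variable case works unchanged for multiple variables, so no code changes are needed here; reuse the ex1 code directly.

3. Normal Equations

Formula

$$\theta = (X^T X)^{-1} X^T y$$

Code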

theta = pinv(X' * X) * X' * y;
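As a quick sanity check, the formula can be run on made-up data generated exactly by y = 1 + 2x, where the answer is known in advance. The backslash operator is shown next to it as an alternative that solves the same least-squares problem without forming X' * X; it is not part of the assignment code.

```matlab
% Made-up data generated exactly by y = 1 + 2*x
x = (1:5)';
y = 1 + 2 * x;
X = [ones(length(x), 1), x];     % add the column of ones for the intercept

theta = pinv(X' * X) * X' * y;   % normal equations, as in the text
theta_bs = X \ y;                % backslash solves the same least-squares problem

disp([theta, theta_bs]);         % both columns should be close to [1; 2]
```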

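4. Selecting the learning rate

The learning rate can be changed on line 85 of ex1_multi.m. In my experiments, after suitably increasing the learning rate and the number of iterations, gradient descent gives the same result as the normal equations.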
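The effect of the learning rate can be seen in a small experiment. The sketch below uses made-up data on the line y = 1 + 2x with a normalized feature, runs 50 iterations for three candidate rates, and prints the remaining cost. The data, the rates and the iteration count are assumptions for illustration, and the update is written in matrix form, which computes the same sums as the per-parameter loop. A rate that is too large for a given data set makes the cost grow instead of shrink, so larger is not always better.

```matlab
% Made-up data on the line y = 1 + 2*x, with the feature normalized
x = (1:5)';
y = 1 + 2 * x;
X = [ones(5, 1), (x - mean(x)) / std(x)];
m = length(y);

alphas = [0.01, 0.1, 1];          % candidate learning rates (illustrative)
num_iters = 50;
J_final = zeros(size(alphas));

for k = 1:numel(alphas)
    theta = zeros(2, 1);
    for iter = 1:num_iters
        % simultaneous update of all parameters in matrix form
        theta = theta - (alphas(k) / m) * (X' * (X * theta - y));
    end
    J_final(k) = sum((X * theta - y) .^ 2) / (2 * m);
    fprintf('alpha = %.2f -> cost after %d iterations: %g\n', alphas(k), num_iters, J_final(k));
end
```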

5. Predicting the house price

This also requires editing ex1_multi.m. For the theta obtained by gradient descent, the code is:

 % Estimate the price of a 1650 sq-ft, 3 br house
 % ====================== YOUR CODE HERE ======================
 % Recall that the first column of X is all-ones. Thus, it does
 % not need to be normalized.
te = [1650 3];
te = te - mu;
te = te ./ sigma;

price = [1 te] * theta; % You should change this

For the theta obtained from the normal equations:

 % Estimate the price of a 1650 sq-ft, 3 br house
 % ====================== YOUR CODE HERE ======================
price = [1 1650 3] * theta; % You should change this
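Why the query is normalized in one case and not in the other can be checked on a small example. The sketch below uses made-up housing-style numbers and fits both models with the backslash operator as a stand-in for training (gradient descent on normalized features, the normal equations on raw features). A model trained on normalized features needs the query normalized with the same mu and sigma, a model trained on raw features takes the query as-is, and the two should then agree.

```matlab
% Made-up data: [size in sq-ft, bedrooms] -> price (illustrative numbers only)
X_raw = [1000 2; 1500 3; 2000 3; 2500 4; 3000 5];
y = [200; 290; 360; 450; 540];
m = size(X_raw, 1);
query = [1650 3];                          % house to price, as in the text

% Model trained on normalized features: normalize the query with the same mu and sigma
mu = mean(X_raw);
sigma = std(X_raw);
Xn = [ones(m, 1), (X_raw - repmat(mu, m, 1)) ./ repmat(sigma, m, 1)];
theta_norm = Xn \ y;                       % least-squares fit, stand-in for gradient descent
price_norm = [1, (query - mu) ./ sigma] * theta_norm;

% Model trained on raw features: use the query as-is
Xr = [ones(m, 1), X_raw];
theta_raw = Xr \ y;                        % least-squares fit, stand-in for the normal equations
price_raw = [1, query] * theta_raw;

% Forgetting to normalize the query gives a meaningless number
price_wrong = [1, query] * theta_norm;

fprintf('normalized: %.4f, raw: %.4f, unnormalized query: %.4f\n', ...
    price_norm, price_raw, price_wrong);
```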
