likes
comments
collection
share

如何向 Git 官方社区提交代码(二)

作者站长头像
站长
· 阅读数 2

背景

第一次向 Git 社区提交代码起源于 Gitee 的一个需求。Gitee 创建新仓库有多种方式,其中一种是根据用户提供的仓库 URL 导入新仓库,Gitee 首先会对用户提供的 URL 进行校验,只有合格的仓库地址才会生成新仓库。但是有种情况比较特别,若用户确实提供了一个合法的 Git 仓库地址,但是这个仓库不是完整的,常见的情况就是浅仓库(shallow repo),这类仓库缺乏足够的历史提交信息,因此会带来一些问题。为了避免这种情况,在服务端就需要对用户导入的仓库进行特别处理。在服务端,导入仓库的动作本质上就是执行 git clone --mirror 命令,那么有没有可能在执行这个克隆命令的过程中就对仓库进行判断呢?如果能,这将避免克隆完整个仓库再进行判断仓库是否为浅仓库,这样无疑更有效。 于是我开始研究 Git 源码,琢磨如何给 git-clone 命令赋予一项新的能力。

很快收到 Junio C Hamano 反馈,主要问题有:

  • commit 信息中说明不清,英文语法错误:
> This patch add a new option that reject to clone a shallow repository.

一条规范的 commit 信息格式应该是在开始就解释更改需求,然后在结尾提供解决方案。
A canonical form of our log message starts by explaining the need,
and then presents the solution at the end.


> Clients don't know it's a shallow repository until they download it
> locally, in some scenariors, clients just don't want to clone this kind

这里的需求场景描述地不够有说服力。
"scenarios".  "in some scenarios" would have to be clarified a bit
more to justify why it is a good idea to have such a feature.


> of repository, and want to exit the process immediately without creating
> any unnecessary files.

这里表述不够清晰,Junio 顺便提供了他的建议
"clients don't know it's a shallow repository until they download"
leading to "so let's reject immediately upon finding out that they
are shallow" does make sense as a flow of thought, though.

> +--no-shallow::
> +	Don't clone a shallow source repository. In some scenariors, clients

"scenarios" (no 'r').
  • 新增选项 --no-shallow 不对,应该考虑到布尔类选项存在反选情况,即 --no-no-shallow ,因此不应该这么写,建议改为 --reject-shallow, 它的反选即为 --no-reject-shallow,这样语义上也更符合直觉:
> diff --git a/builtin/clone.c b/builtin/clone.c

> @@ -90,6 +91,7 @@ static struct option builtin_clone_options[] = {
>  	OPT__VERBOSITY(&option_verbosity),
>  	OPT_BOOL(0, "progress", &option_progress,
>  		 N_("force progress reporting")),
> +	OPT_BOOL(0, "no-shallow", &option_no_shallow, N_("don't clone shallow repository")),
>  	OPT_BOOL('n', "no-checkout", &option_no_checkout,
>  		 N_("don't create a checkout")),
>  	OPT_BOOL(0, "bare", &option_bare, N_("create a bare repository")),

It is a bad idea to give a name that begins with "no-" to an option
whose default can be tweaked by a configuration variable [*].  If
the configuration is named "rejectshallow", perhaps it is better to
call it "--reject-shallow" instead.

This is because configured default must be overridable from the
command line.  I.e. even if you have in your ~/.gitconfig this:

    [clone]
        rejectshallow = true

you should be able to say "allow it only this time", with

    $ git clone --no-reject-shallow http://github.com/git/git/ git

and you do not want to have to say "--no-no-shallow", which sounds
just silly.

	Side note. it is a bad idea in general, even if the option
	does not have corresponding configuration variable.  The
	existing "no-checkout" is a historical accident that
	happened long time ago and cannot be removed due to
	compatibility.  Let's not introduce a new option that
	follows such a bad pattern.

	Junio 进一步解释:现存的选项 `no-checkout` 是一个很久之前的历史错误,
	由于兼容性,已经无法移除这个错误了,现在新的选项不应该按照这种模式命名。
	P.S. 刚好我就是按照`no-checkout`选项的模式创造出`no-shllow`选项的-:)

更重要的是,命令行优先于配置,即命令行选项要能够覆盖配置文件选项。当配置文件从全局上不允许克隆 shllow 仓库时,而用户想要允许单次克隆 shallow 仓库时,自然地,他会使用 --no-shallow 选项的反选项 --no-no-shallow 来覆盖掉配置中的选项 。但这样 --no-no-shallow 听来去就很傻。

  • 由于 Git 存在多种传输协议,目前的修改只是解决了本地克隆问题,因此仍需改进:
> @@ -963,6 +968,7 @@ static int path_exists(const char *path)
>  int cmd_clone(int argc, const char **argv, const char *prefix)
>  {
>  	int is_bundle = 0, is_local;
> +	int is_shallow = 0;
>  	const char *repo_name, *repo, *work_tree, *git_dir;
>  	char *path, *dir, *display_repo = NULL;
>  	int dest_exists, real_dest_exists = 0;
> @@ -1215,6 +1221,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  		if (filter_options.choice)
>  			warning(_("--filter is ignored in local clones; use file:// instead."));
>  		if (!access(mkpath("%s/shallow", path), F_OK)) {
> +			is_shallow = 1;
>  			if (option_local > 0)
>  				warning(_("source repository is shallow, ignoring --local"));
>  			is_local = 0;

This change is to the local clone codepath.  Cloning over the wire
would not go through this part.  And throughout the patch, this is
the only place that sets is_shallow to 1.

Also let's note that this is after we called parse_options(), so the
value of option_no_shallow is known at this point.

So, this patch does not even *need* to introduce a new "is_shallow"
variable at all.  It only needs to add

                        if (option_no_shallow)
                                die(...);

instead of adding "is_shallow = 1" to the above hunk.

I somehow think that this is only half a feature---wouldn't it be
more useful if we also rejected a non-local clone from a shallow
repository?

  • 测试代码中的问题:
And for that ...

> diff --git a/t/t5606-clone-options.sh b/t/t5606-clone-options.sh
> index 7f082fb23b6a..9d310dbb158a 100755
> --- a/t/t5606-clone-options.sh
> +++ b/t/t5606-clone-options.sh
> @@ -42,6 +42,13 @@ test_expect_success 'disallows --bare with --separate-git-dir' '
>  
>  '
>  
> +test_expect_success 'reject clone shallow repository' '
> +	git clone --depth=1 --no-local parent shallow-repo &&
> +	test_must_fail git clone --no-shallow shallow-repo out 2>err &&
> +	test_i18ngrep -e "source repository is shallow, reject to clone." err
> +
> +'
> +

... in addition to the test for a local clone above, you'd also want
to test a non-local clone, perhaps like so:

test_expect_success 'reject clone shallow repository' '
	rm -fr shallow-repo &&
	git clone --depth=1 --no-local parent shallow-repo &&
	test_must_fail git clone --no-shallow --no-local shallow-repo out 2>err &&
	test_i18ngrep -e "source repository is shallow, reject to clone." err

'

Ditto for the other test script.

Also, you would want to make sure that the command line overrides
the configured default.  I.e.

	git -c clone.rejectshallow=false clone --reject-shallow

should refuse to clone from a shallow one, while there should be a
way to countermand a configured "I always refuse to clone from a
shallow repository" with "but let's allow it only this time", i.e.

	git -c clone.rejectshallow=true clone --no-reject-shallow

or something along the line.


> diff --git a/t/t5611-clone-config.sh b/t/t5611-clone-config.sh
> index 8e0fd398236b..3aab86ad4def 100755
> --- a/t/t5611-clone-config.sh
> +++ b/t/t5611-clone-config.sh
> @@ -92,6 +92,13 @@ test_expect_success 'clone -c remote.<remote>.fetch=<refspec> --origin=<name>' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'clone -c clone.rejectshallow' '
> +	rm -rf child &&
> +	git clone --depth=1 --no-local . child &&
> +	test_must_fail git clone -c clone.rejectshallow child out 2>err &&

This is not quite right, even though it may happen to work.  The
"clone.rejectshallow" variable is a configuration about what should
happen when creating a new repository by cloning, so letting "git
clone -c var[=val]" to set the variable _in_ the resulting repository
would not make much sense.  Even if the clone succeeded, nobody would
look at that particular configuration variable that is set in the
resulting repository.

I think it would communicate to the readers better what we are
trying to do, if we write

	test_must_fail git -c clone.rejectshallow=true clone child out

instead.

Thanks.

通过测试代码 Junio 指出: clone.rejectshallow 配置和命令行选项 --reject-shallow 存在逻辑上的交叉重叠问题,因此测试时应该体现出这一点。

以上就是我第一次提交的代码,没想到能很快收到那么多宝贵的检视建议。 但是,这只是开始,后面还有更多的检视回合,收到了更加细致的检视意见。

比如 Johannes Schindelin 给了一些意见:

我觉得这个补丁大部分都很好,但我还有一点点改进建议
I like most of the patch, and will only point out a couple of things that
I think can be improved even further.

> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index 02d9c19cec75..0adc98fa7eee 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -149,6 +149,11 @@ objects from the source repository into a pack in t=
he cloned repository.
>  --no-checkout::
>  	No checkout of HEAD is performed after the clone is complete.
>
> +--[no-]reject-shallow::
> +	Fail if the source repository is a shallow repository.
> +	The 'clone.rejectShallow' configuration variable can be used to
> +	give the default.

使用 `to specify the default` 说起来更顺口一些
I am not a native speaker, either, but I believe that it would "roll off
the tongue" a bit better to say "to specify the default".

变量命名问题:

> diff --git a/builtin/clone.c b/builtin/clone.c
> index 51e844a2de0a..eeddd68a51f4 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -50,6 +50,8 @@ static int option_no_checkout, option_bare, option_mir=
ror, option_single_branch
>  static int option_local =3D -1, option_no_hardlinks, option_shared;
>  static int option_no_tags;
>  static int option_shallow_submodules;
> +static int option_shallow = -1;    /* unspecified */
> +static int config_shallow = -1;    /* unspecified */

I would much prefer those variable names to include an indicator that this
is about _rejecting_ shallow clones. I.e. `option_reject_shallow`.

Also, I think that we can do with just a single `option_reject_shallow`
(we do not even need that `reject_shallow` variable in `cmd_clone()`):

- in `git_clone_config()`, only override it if it is still unspecified:

	if (!strcmp(k, "clone.rejectshallow") && option_reject_shallow < 0)
		option_reject_shallow =3D git_config_bool(k,v);

- in `cmd_clone()`, test for a _positive_ value:

	if (option_reject_shallow > 0)
		die(_("source repository is shallow, reject to clone."));

  and

	if (option_reject_shallow > 0)
 		transport_set_option(transport, TRANS_OPT_REJECT_SHALLOW, "1");

One thing to note (in the commit message, would be my preference) is that
`cmd_clone()` is _particular_ in that it runs `git_config()` _twice_. Once
before the command-line options are parsed, and once after the new Git
repository has been initialized. Note that my suggestion still works with
that: if either the original config, or the new config set
`clone.rejectShallow`, it is picked up correctly, with the latter
overriding the former if both configs want to set it.

使用 option_reject_shallowconfig_shallow 更好一些,它能更直接地表明这个选项是要拒绝 shallow clone 的。同时他提醒,在 cmd_clone()函数中 git_config 会被调用两次,使用 option_reject_shallow 能避免在 cmd_clone 中使用 reject_shallow

P.S. 划线这段话是错误的,见 Johannes 本人回复:Johannes'reply,以及 Junio 的回复:Junio's reply。最后用了两个变量: option_reject_shallowconfig_reject_shallow,在 cmd_clone() 中它们共同决定另一个变量:reject_shallow

后面同样变量命名:

> diff --git a/fetch-pack.c b/fetch-pack.c
> index fb04a76ca263..34d0c2896e2e 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1129,9 +1129,11 @@ static struct ref *do_fetch_pack(struct fetch_pac=
k_args *args,
>  	if (args->deepen)
>  		setup_alternate_shallow(&shallow_lock, &alternate_shallow_file,
>  					NULL);
> -	else if (si->nr_ours || si->nr_theirs)
> +	else if (si->nr_ours || si->nr_theirs) {
> +		if (args->remote_shallow)

Even as a non-casual reader, this name `remote_shallow` leads me to assume
incorrect things. This option is not about wanting a remote shallow
repository, it is about rejecting a remote shallow repository.
用 `reject_shallow` 代替 `remote_shllow`
Please name this attribute `reject_shallow` instead of `remote_shallow`.
That will prevent future puzzlement.

Johannes 还对测试用例提出了一些检视意见,篇幅有限,这里省略。

在最后几个回合,Junio 又给出了很多有意义的检视意见:

> In some scenarios, users may want more history than the repository
> offered for cloning, which happens to be a shallow repository, can
> give them. But because users don't know it is a shallow repository
> until they download it to local, users should have the option to

'should' 在这里感觉语气太重了
I find the "should" too strong, but let's keep reading.

> refuse to clone this kind of repository, and may want to exit the
> process immediately without creating any unnecessary files.
确认是语气重了。同时存在冗余。
Yes, that is too strong and also redundant.

> Althought there is an option '--depth=x' for users to decide how
> deep history they can fetch, but as the unshallow cloning's depth
句子若以 'although' 开头,则后面不应该用 `but`做转折。
"Although"; if you begin with "although", you shouldn't write "but".

> is INFINITY, we can't know exactly the minimun 'x' value that can
> satisfy the minimum integrity,
> so we can't pass 'x' value to --depth,
> and expect this can obtain a complete history of a repository.

If the argument were "we might start with depth x, and the source
may be deep enough to give us x right now, but we may want to deepen
more than they can offer, and such a user would want to be able to
say 'I want depth=x now, but make sure they are not shallow'", I
would understand it, but I do not see where the "minimum integrity"
comes from---there doesn't appear to be anything related to
integrity here.

> In other scenarios, if we have an API that allow us to import external

"allows"

> repository, and then perform various operations on the repo.
> But if the imported is a shallow one(which is actually possible), it
> will affect the subsequent operations. So we can choose to refuse to
> clone, and let's just import a normal repository.
建议丢掉这一整段,因为它跟前面讲的场景差不多,并没有提供新信息。
I'd suggest dropping this entire paragraph.  That is not any new
scenario at all.  API or not, you essentially just said "you can do
various things on your clone once you have it, but some things you
may want to do you would want a full history".  That is what you
started the whole discussion above and does not add any new
information.


> @@ -858,6 +861,9 @@ static int git_clone_config(const char *k, const char *v, void *cb)
>  		free(remote_name);
>  		remote_name = xstrdup(v);
>  	}
> +	if (!strcmp(k, "clone.rejectshallow") && option_reject_shallow < 0)
> +		option_reject_shallow = git_config_bool(k, v);

Does this "single variable is enough" really work?

Imagine that the first time around we'd read from $HOME/.gitconfig
that says true (flips the variable from "unspecified").  Further
imagine that we are running "git clone -c config.rejectShallow=no"
to countermand the global config.  We call write_config() to write
the extra configuration value out, and then call git_config() to
read from the repository configuration again.

Because of the value taken from $HOME/.gitconfig, however, the
attempt to override is silently ignored, isn't it?

Other than that, the changes to the code from the previous round
looked sensible.

虽然比上一版的更新好了很多,但Junio并没有放弃任何可能的问题,仍对config配置
和option配置有疑问,他的顾虑是正确的,这为下一版的更新提供了正确的思路。

这就是我第一次向 Git 社区提交代码的情况:问题多多,反馈多多。各种严谨细致又有启发性的检视意见让我感受到了 Git 社区的技术氛围。这个过程中让我不断思考如何让代码写得更好:更有逻辑,更简洁,更严谨,而不仅仅是实现功能。

这个经验让我想到,英语中两个单词:polishcooking 可以很贴切的形容他们对待代码的态度。 polish 意为润色,修改,抛光,打磨。用在代码上, commit 信息上,甚至写文档上,意味着这些过程是不断改进,不断打磨,才能变得更好。 cooking 意为烹饪,比喻写代码就像烹饪,所谓心急吃不了热豆腐,即使做简单的菜都需要耐心,细心。

回到主题,前面的每次修改,直接强推到原来的 PR 中即可, GitGitGadget 会自动将每次的更新转换为不同的版本再提交到上游社区。那如何确定提交的版本是最终版本呢?前面提到过 Git 的几个主要分支,当我提交到第十版后,很快就被合入到 seen分支,意味着被初步接受,然后又马上进入 next 分支,意味着各项测试也没问题,然后又马上进入 master 分支。特别地,这个过程会有 状态更新 提醒, Git 社区有个 What's cooking in git.git 栏目,是维护者 Junio 用来管理各种提交的状态更新的,它会表明目前社区正在 cooking 哪些人的哪些代码(patch),以及各个代码的目前状态。 当你的代码在 What's cooking in git.git 中进入 Graduated to 'master' 或者 Will merge to 'master',那就表明马上会合入主线啦。

下一篇文章,将会继续写我在后续的提交代码过程中更多的经历。