Building a Local PyPI Mirror

From the series "Building an Open-Source Mirror Site from Scratch": PyPI

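1. Background

1.1 Why build a PyPI mirror

For well-known reasons, PyPI's servers are hosted overseas and direct access is slow. To speed up pip downloads, it is well worth setting up a local mirror, which can serve every user in the surrounding region.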

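1.2 How to build an open-source PyPI mirror

Overview: building a mirror site means synchronizing the content of a remote server to a local machine in full (a full sync). You may also sync only part of the content, but you must never modify it!

So how is a mirror built? Generally, the upstream project anticipates download needs in different regions and publishes tools (software or scripts) for synchronization. For example, the Python project provides bandersnatch to make syncing PyPI easy; pip2pi is another tool of this kind. Each has its strengths and weaknesses, which are not covered here; interested readers can search for the details. In this article we use bandersnatch to build a PyPI mirror.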

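2. Building the mirror

2.1 Preparing the environment

2.1.1 Hardware

Network, disks, CPU, memory, motherboard... In practice, disk capacity is the main thing to plan for. As of this writing (July 19, 2017), PyPI takes up nearly 700 GB.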
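Since disk capacity is the main constraint, it may help to check free space before starting. The following is a minimal sketch using only the Python standard library; the function names and the 700 GB default (taken from the figure above) are illustrative assumptions, and a real deployment should leave headroom because PyPI keeps growing.

```python
import os
import shutil


def nearest_existing_dir(path: str) -> str:
    """Walk up from `path` until a directory that already exists is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        path = os.path.dirname(path)
    return path


def free_space_gb(path: str) -> float:
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_room_for_mirror(path: str, required_gb: float = 700.0) -> bool:
    """True if the filesystem holding `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    # /srv/pypi is the mirror directory used later in this article; it may not
    # exist yet, so the check runs against its nearest existing parent.
    target = nearest_existing_dir("/srv/pypi")
    print(f"{target}: {free_space_gb(target):.1f} GiB free, "
          f"enough for a full mirror: {has_room_for_mirror(target)}")
```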

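2.1.2 Installing bandersnatch

This tutorial was tested on Debian 9.

The official bandersnatch documentation offers two installation methods, both very simple. Only the first is covered here.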

Install virtualenv

apt install virtualenv

Create the virtual environment that bandersnatch will use

virtualenv --python=python3.5 bandersnatch

Install bandersnatch

cd bandersnatch && bin/pip install bandersnatch

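Note
* If the installation fails, check whether pip and setuptools need upgrading. Upgrade command (use bin/pip when working from the virtual environment directory): pip install -U pip && pip install -U setuptools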

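2.1.3 Editing the configuration file

Normally no configuration file exists right after installation. Running bin/bandersnatch mirror (from the virtual environment directory) generates the file bandersnatch.conf under /etc. At this point we need to edit that file.
Once generated, the file contains default content, as follows: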

[mirror]
; The directory where the mirror data will be stored.
directory = /srv/pypi

; The PyPI server which will be mirrored.
; master = https://testpypi.python.org
; scheme for PyPI server MUST be https
master = https://pypi.python.org
;master = https://pypi.tuna.tsinghua.edu.cn/simple
;master = https://mirrors.ustc.edu.cn
;master = https://mirrors.neusoft.edu.cn

; The network socket timeout to use for all connections. This is set to a
; somewhat aggressively low value: rather fail quickly temporarily and re-run
; the client soon instead of having a process hang infinitely and have TCP not
; catching up for ages.
timeout = 10

; Number of worker threads to use for parallel downloads.
; Recommendations for worker thread setting:
; - leave the default of 3 to avoid overloading the pypi master
; - official servers located in data centers could run 10 workers
; - anything beyond 10 is probably unreasonable and avoided by bandersnatch
workers = 3

; Whether to hash package indexes
; Note that package index directory hashing is incompatible with pip, and so
; this should only be used in an environment where it is behind an application
; that can translate URIs to filesystem locations.  For example, with the
; following Apache RewriteRule:
;     RewriteRule ^([^/])([^/]*)/$ /mirror/pypi/web/simple/$1/$1$2/
;     RewriteRule ^([^/])([^/]*)/([^/]+)$/ /mirror/pypi/web/simple/$1/$1$2/$3
; Setting this to true would put the package 'abc' index in simple/a/abc.
; Recommended setting: the default of false for full pip/pypi compatability.
hash-index = false

; Whether to stop a sync quickly after an error is found or whether to continue
; syncing but not marking the sync as successful. Value should be "true" or
; "false".
stop-on-error = false

; Whether or not files that have been deleted on the master should be deleted
; on the mirror, too.
; IMPORTANT: if you are running an official mirror than you *need* to leave
; this on.
delete-packages = true

; Advanced logging configuration. Uncomment and set to the location of a
; python logging format logging config file.
; log-config = /etc/bandersnatch-log.conf

; vim: set ft=cfg:

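The directory parameter is the directory where the synced files are stored; this is the main setting you need to change.
The other parameters can be left at their defaults, or you can look up what each one means and tune them to suit your own setup.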
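Before starting a long sync, a quick sanity check of the edited file can catch obvious mistakes. The sketch below reads the [mirror] section with Python's standard configparser and applies two rules stated in the file's own comments (the master URL must use https, and more than 10 workers is unreasonable). The function name, the SAMPLE text, and the choice of checks are illustrative assumptions, not part of bandersnatch itself.

```python
import configparser

# A trimmed copy of the settings shown above, used here as sample input.
SAMPLE = """\
[mirror]
directory = /srv/pypi
master = https://pypi.python.org
timeout = 10
workers = 3
hash-index = false
stop-on-error = false
delete-packages = true
"""


def check_mirror_config(text: str) -> list:
    """Return a list of human-readable problems found in the [mirror] section."""
    parser = configparser.ConfigParser()  # lines starting with ';' are comments
    parser.read_string(text)
    mirror = parser["mirror"]

    problems = []
    if not mirror.get("directory", "").strip():
        problems.append("directory is not set")
    if not mirror.get("master", "").startswith("https://"):
        problems.append("master must use the https scheme")
    if not 1 <= mirror.getint("workers", fallback=3) <= 10:
        problems.append("workers should be between 1 and 10")
    return problems


if __name__ == "__main__":
    issues = check_mirror_config(SAMPLE)
    print("config looks fine" if not issues else "\n".join(issues))
```

To check the real file, read /etc/bandersnatch.conf and pass its text to check_mirror_config. The sketch only reads the file; writing it back through configparser would drop the comments, so editing by hand is the safer route.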

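2.1.4 Syncing the mirror

Change to the bandersnatch virtual environment directory and run bin/bandersnatch mirror again to start syncing PyPI.
How long the sync takes depends heavily on local bandwidth and disk I/O. Because PyPI is very large, it is recommended to run the sync inside screen so that it keeps going in the background.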
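A full sync can take a long time and may be interrupted, so it can be convenient to wrap the command in a small retry loop and start that wrapper inside screen. The following is a minimal sketch; the function name and its parameters are hypothetical, and it assumes the sync command exits with a non-zero status when it fails, which should be verified against the installed bandersnatch version.

```python
import subprocess
import sys
import time


def run_until_success(cmd, max_attempts=5, pause_seconds=60.0):
    """Run `cmd` repeatedly until it exits with status 0.

    Returns the attempt number that succeeded, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            returncode = subprocess.run(cmd).returncode
        except OSError as exc:  # e.g. the executable does not exist
            print(f"could not start {cmd[0]}: {exc}", file=sys.stderr)
            returncode = 1
        if returncode == 0:
            return attempt
        print(f"attempt {attempt} failed (exit status {returncode})",
              file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(pause_seconds)  # give the network a moment to recover
    return 0
```

From the virtual environment directory, the sync would then be started with run_until_success(["bin/bandersnatch", "mirror"]).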