一个node小脚本爬取公开的地图文件
项目中有用到地图,需要将地图本地化,本文将一步步从需求分析到实现,教你一步步用node脚本下载网络上的地图文件到本地。
分析
我们打开网址DataV。这是阿里一个免费的地图网址,可以通过引入JSON API,可以动态的加载相应的地图JSON文件到echarts或者leaflet中使用。
第一步:分析请求
打开网站,打开F12,然后刷新一下,看下请求:
它首先请求了all.json。它是一个对象数组。点开里面的内容,主要是每个省市区的对象。
第二步:下钻
随机点击一个省份,比如福建省:
它通过地址https://geo.datav.aliyun.com/areas_v3/bound/${350000}.json
请求了福建省的bound信息。然后调用了https://geo.datav.aliyun.com/areas_v3/bound/${350000}_full.json
获取了福建省下的下级,通过childrenNum可以看到福建省下级行政区有9个。
继续点击福州市:
它通过地址https://geo.datav.aliyun.com/areas_v3/bound/${350100}.json
请求了福州省的bound信息。然后调用了https://geo.datav.aliyun.com/areas_v3/bound/${350100}_full.json
获取了福州市下的下级,通过childrenNum可以看到福州市下级行政区有13个。
继续点击福州市下的闽侯县:
它通过地址https://geo.datav.aliyun.com/areas_v3/bound/${350121}.json
请求了闽侯县的bound信息。但是没有请求https://geo.datav.aliyun.com/areas_v3/bound/${350121}_full.json
。原来是它的childrenNum为0。说明已经没有下级的行政区了。
核心流程
-
需要有一个方法(firstStep)用于请求all.json文件。
-
遍历firstStep返回的数组对象,这个方法(secondStep)下载每个adcode对应的json文件。
- 这个方法里需要节流,当下载完一个文件后才下载下一个文件,使用async await
- 需要判断是否有dist目录(用于保存下载的文件),需要稍微懂点fs文件操作的几个API
- 需要保存下载失败的文件(failedDownloads,保存adcode就行)使用try catch捕获失败
- 如果dist目录下已下载的文件,就不要重新下载了(下载文件之前判断一下)
-
通过地址
https://geo.datav.aliyun.com/areas_v3/bound/${350100}.json
下载的文件需判断返回数据里的childrenNum字段,如果childrenNum > 0
还需要下载https://geo.datav.aliyun.com/areas_v3/bound/${350100}_full.json
文件。 -
all.json文件全部下载完后需要检查
failedDownloads
的length是否大于0。如果大于0则需要调用第二步重新下载。
编码
1. 初始化项目
找个目录比如echarts-map-download
使用npm init -y
初始化项目。
2. 下载对应一些依赖
npm i axios chalk ora
3. 编码download.js
download.js是核心文件全部功能都在此实现
下载all.json文件
// 第一步:下载并解析all.json
const firstStep = async () => {
const spinner = ora('Downloading all.json').start();
try {
const response = await axios.get('https://geo.datav.aliyun.com/areas_v3/bound/all.json');
spinner.succeed('Downloaded all.json successfully');
return response.data;
} catch (error) {
spinner.fail('Failed to download all.json');
process.exit(1);
}
};
判断是否有dist目录
// 确保dist目录存在
const distPath = path.resolve('dist');
if (!fs.existsSync(distPath)) {
fs.mkdirSync(distPath);
}
下载文件
// 逐个下载
for (let i = 0; i < jsonArray.length; i++) {
const item = jsonArray[i];
const adcode = item.adcode;
// 这里我偷懒了,没有判断childrenNum为0的情况,而是我发现adcode结尾不是‘00’,说明还有下级
// 实测发现有点小问题,比如东莞市下居然没有下级行政区了,会有报错的情况,不过无关紧要
const isSpecialCode = adcode.toString().endsWith('00');
// 构建下载链接和文件名
const normalFileName = `${adcode}.json`;
const fullFileName = `${adcode}_full.json`;
const normalFilePath = path.resolve('dist', normalFileName);
const fullFilePath = path.resolve('dist', fullFileName);
// 检查并下载普通文件
if (!fs.existsSync(normalFilePath)) {
const spinner = ora(`Downloading ${normalFileName}`).start();
try {
const response = await axios.get(`https://geo.datav.aliyun.com/areas_v3/bound/${adcode}.json`);
fs.writeFileSync(normalFilePath, JSON.stringify(response.data));
spinner.succeed(chalk.green(`Downloaded ${normalFileName} successfully`));
} catch (error) {
spinner.fail(chalk.red(`Failed to download ${normalFileName}`));
failedDownloads.push(adcode);
}
} else {
skippedDownloads.push(normalFileName);
}
// 如果是特殊代码,检查并下载_full文件
if (isSpecialCode) {
if (!fs.existsSync(fullFilePath)) {
const spinner = ora(`Downloading ${fullFileName}`).start();
try {
const response = await axios.get(`https://geo.datav.aliyun.com/areas_v3/bound/${adcode}_full.json`);
fs.writeFileSync(fullFilePath, JSON.stringify(response.data));
spinner.succeed(chalk.green(`Downloaded ${fullFileName} successfully`));
} catch (error) {
spinner.fail(chalk.red(`Failed to download ${fullFileName}`));
failedDownloads.push(adcode);
}
} else {
skippedDownloads.push(fullFileName);
}
}
}
主函数
// 主函数
const main = async () => {
const jsonArray = await firstStep();
const { failedDownloads, skippedDownloads } = await secondStep(jsonArray);
// 输出跳过的下载
if (skippedDownloads.length > 0) {
console.log(chalk.blue(`Skipped downloads: ${skippedDownloads.join(', ')}`));
}
// 如果有失败的下载,尝试重新下载
if (failedDownloads.length > 0) {
console.log(chalk.yellow(`Retrying failed downloads: ${failedDownloads.join(', ')}`));
const retryResult = await secondStep(failedDownloads.map(adcode => ({ adcode })));
if (retryResult.failedDownloads.length > 0) {
console.log(chalk.red(`Failed downloads after retry: ${retryResult.failedDownloads.join(', ')}`));
} else {
console.log(chalk.green('All failed downloads were successful on retry'));
}
} else {
console.log(chalk.green('All downloads completed successfully'));
}
};
download.js全部代码
const axios = require('axios');
const fs = require('fs');
const path = require('path');
// 动态导入ora、chalk
(async () => {
const { default: ora } = await import('ora');
const { default: chalk } = await import('chalk');
// 第一步:下载并解析all.json
const firstStep = async () => {
const spinner = ora('Downloading all.json').start();
try {
const response = await axios.get('https://geo.datav.aliyun.com/areas_v3/bound/all.json');
spinner.succeed('Downloaded all.json successfully');
return response.data;
} catch (error) {
spinner.fail('Failed to download all.json');
process.exit(1);
}
};
// 第二步:遍历JSON数组,下载每个adcode对应的json
const secondStep = async (jsonArray) => {
let failedDownloads = [];
let skippedDownloads = [];
// 确保dist目录存在
const distPath = path.resolve('dist');
if (!fs.existsSync(distPath)) {
fs.mkdirSync(distPath);
}
// 逐个下载
for (let i = 0; i < jsonArray.length; i++) {
const item = jsonArray[i];
const adcode = item.adcode;
const isSpecialCode = adcode.toString().endsWith('00');
// 构建下载链接和文件名
const normalFileName = `${adcode}.json`;
const fullFileName = `${adcode}_full.json`;
const normalFilePath = path.resolve('dist', normalFileName);
const fullFilePath = path.resolve('dist', fullFileName);
// 检查并下载普通文件
if (!fs.existsSync(normalFilePath)) {
const spinner = ora(`Downloading ${normalFileName}`).start();
try {
const response = await axios.get(`https://geo.datav.aliyun.com/areas_v3/bound/${adcode}.json`);
fs.writeFileSync(normalFilePath, JSON.stringify(response.data));
spinner.succeed(chalk.green(`Downloaded ${normalFileName} successfully`));
} catch (error) {
spinner.fail(chalk.red(`Failed to download ${normalFileName}`));
failedDownloads.push(adcode);
}
} else {
skippedDownloads.push(normalFileName);
}
// 如果是特殊代码,检查并下载_full文件
if (isSpecialCode) {
if (!fs.existsSync(fullFilePath)) {
const spinner = ora(`Downloading ${fullFileName}`).start();
try {
const response = await axios.get(`https://geo.datav.aliyun.com/areas_v3/bound/${adcode}_full.json`);
fs.writeFileSync(fullFilePath, JSON.stringify(response.data));
spinner.succeed(chalk.green(`Downloaded ${fullFileName} successfully`));
} catch (error) {
spinner.fail(chalk.red(`Failed to download ${fullFileName}`));
failedDownloads.push(adcode);
}
} else {
skippedDownloads.push(fullFileName);
}
}
}
// 返回失败的下载列表和跳过的下载列表
return { failedDownloads, skippedDownloads };
};
// 主函数
const main = async () => {
const jsonArray = await firstStep();
const { failedDownloads, skippedDownloads } = await secondStep(jsonArray);
// 输出跳过的下载
if (skippedDownloads.length > 0) {
console.log(chalk.blue(`Skipped downloads: ${skippedDownloads.join(', ')}`));
}
// 如果有失败的下载,尝试重新下载
if (failedDownloads.length > 0) {
console.log(chalk.yellow(`Retrying failed downloads: ${failedDownloads.join(', ')}`));
const retryResult = await secondStep(failedDownloads.map(adcode => ({ adcode })));
if (retryResult.failedDownloads.length > 0) {
console.log(chalk.red(`Failed downloads after retry: ${retryResult.failedDownloads.join(', ')}`));
} else {
console.log(chalk.green('All failed downloads were successful on retry'));
}
} else {
console.log(chalk.green('All downloads completed successfully'));
}
};
main();
})();
这里使用了立即执行函数,ora、chalk是ESM,在node环境可以这样使用。相关资料点击这里
执行下载
npm run download
志哥我想说
仅仅是最简单的脚本,有些复杂的场景有接口保护的则需要构造请求参数等情况。
小脚本已上传到gitHub欢迎食用。
转载自:https://juejin.cn/post/7373482464720781349