How to Implement Large File Upload from Front End to Back End?
Preface
As network technology evolves and application scenarios expand, file upload has become an essential part of web applications. For large files, however, such as high-quality video, audio, images, or documents, uploading the traditional way often runs into problems: slow upload speed, excessive bandwidth use, and outright upload failures. This article therefore walks through implementing large file upload from front end to back end, using chunked upload to improve efficiency and reliability.
1. What is chunked upload?
Chunked upload splits a large file into a number of small pieces (the "chunks"), uploads those chunks one by one, and finally has the server stitch them back together into the complete file. This avoids upload failures caused by unstable networks or excessively long upload times.
2. How to implement chunked upload?
Since this is chunked upload, someone is bound to ask: how do we cut a complete file into chunks? This relies on the Blob.prototype.slice method. Because the File interface inherits from Blob, slice is available on File's prototype chain as well.
2.1 Getting the File object
// HTML
<input type="file" id="fileUpload" />
// JS
const fileUpload = document.querySelector("#fileUpload");
fileUpload.addEventListener('change', function(e){
const file = e.target.files[0];
}, false);
2.2 Generating a unique hash identifier
const getHashByFile = (file) => {
return new Promise((resolve, reject) => {
let fileReader = new FileReader();
// 1. Read the file into an ArrayBuffer
fileReader.readAsArrayBuffer(file);
// 2. Callback fired when reading completes
fileReader.onload = (e) => {
const buffer = e.target.result;
/**
* @desc Compute the file's hash from its content
* @url https://github.com/satazor/js-spark-md5
*/
const spark = new SparkMD5.ArrayBuffer();
spark.append(buffer);
// Unique identifier
const HASH = spark.end();
// File extension
const suffix = /\.([a-zA-Z0-9]+)$/.exec(file.name)[1];
resolve({
buffer,
HASH,
suffix,
filename: `${HASH}.${suffix}`
});
}
// 3. Read-progress callback
fileReader.onprogress = (e) => {
const progress = (e.loaded / file.size) * 100;
console.log('=== file read progress ===>', progress);
}
})
}
2.3 Getting the file chunks
const getChunks = (file, HASH, suffix) => {
const { size } = file;
let chunkCount = Math.ceil(size / chunkSize);
if (chunkCount > max) {
chunkSize = Math.ceil(size / max);
chunkCount = max;
}
let index = 0;
const chunks = [];
while (index < chunkCount) {
chunks.push({
file: file.slice(index * chunkSize, (index + 1) * chunkSize),
filename: `${HASH}_${index + 1}.${suffix}`
})
index++;
}
return chunks;
}
2.4 Uploading each chunk with FormData
const getFormData = (data = {}) => {
const { file, filename } = data;
const formData = new FormData();
formData.append('file', file);
formData.append('filename', filename);
return formData;
}
2.5 The server receives and writes each chunk
app.post('/post_upload_chunk', async (req, resp) => {
try {
let { fields, files } = await multiparty_upload(req);
let file = (files.file && files.file[0]) || {},
filename = (fields.filename && fields.filename[0]) || "",
path = '',
isExists = false;
// Create a temporary directory for this file's chunks
let [, HASH] = /^([^_]+)_(\d+)/.exec(filename);
path = `${UPLOAD_DIR}/${HASH}`;
if (!fs.existsSync(path)) fs.mkdirSync(path);
// Store the chunk in the temporary directory
path = `${UPLOAD_DIR}/${HASH}/${filename}`;
// Skip writing if this chunk already exists
isExists = await exists(path);
if (isExists) {
resp.send({
code: 0,
msg: 'success',
})
return;
}
// Write the chunk to disk
writeFile(resp, path, file);
} catch (err) {
console.log('chunk upload error:', err);
resp.send({
code: -1,
msg: 'fail'
})
}
})
2.6 Requesting a merge once all chunks are uploaded
const onFileChangeHandler = async (e) => {
const file = e.target.files[0];
// 1. Get the file's unique hash
const { HASH, suffix } = await getHashByFile(file);
try {
const chunks = getChunks(file, HASH, suffix);
/**
* 1. Build a FormData payload for each chunk
* 2. Fire one upload request per chunk
*/
const requests = chunks
.map(chunk => getFormData(chunk))
.map(fd => apiPostUploadChunk(fd));
// Upload all chunks
await Promise.all(requests);
// Once every chunk is up, ask the server to merge them
await apiPostUploadMerge({ HASH, count: chunks.length });
} catch (err) {
console.log('onFileChangeHandler error:', err);
}
}
2.7 The server merges the chunks and returns the resource URL
app.post('/post_upload_merge', async (req, resp) => {
let { HASH, count } = req.body;
try {
let { filename } = await merge(HASH, count);
resp.send({
code: 0,
msg: 'success',
servicePath: `${HOSTNAME}/${filename}`
})
} catch (err) {
console.log('chunk merge failed:', err);
resp.send({
code: -1,
msg: 'fail'
})
}
})
3. What is resumable upload?
An upload can fail midway for all sorts of reasons, after some chunks have already been uploaded successfully. When the user uploads the same file again, the chunks already on the server are not uploaded a second time; this is resumable upload. It reduces network traffic, shortens upload time, and improves the upload experience.
4. How to implement resumable upload?
const onFileChangeHandler = async (e) => {
const file = e.target.files[0];
// 1. Get the file's unique hash
const { HASH, suffix } = await getHashByFile(file);
try {
const { fileList = [] } = await apiGetUploadedChunks({ HASH, suffix });
uploaded_chunks = fileList;
const chunks = getChunks(file, HASH, suffix);
/**
* Resumable upload
* 1. Fetch the already-uploaded chunks, keyed by the file's hash
* 2. Filter those chunks out before uploading
*/
const requests = chunks
.filter(v => !uploaded_chunks.includes(v.filename))
.map(chunk => getFormData(chunk))
.map(fd => apiPostUploadChunk(fd));
// Upload the remaining chunks
await Promise.all(requests);
// Ask the server to merge them
await apiPostUploadMerge({ HASH, count: chunks.length });
} catch (err) {
console.log('onFileChangeHandler error:', err);
}
}
5. What is instant upload?
The upload flow above keeps referring to one thing: the unique hash generated from the file's content. It runs through the entire process and never changes, no matter how the file is renamed or moved; it is the file's permanent, end-to-end ID. When a user starts an upload and the server finds that this hash already exists, it immediately returns the corresponding resource URL; if not, the normal chunked upload flow continues. That is instant upload.
6. How to implement instant upload?
app.get('/get_uploaded_chunks', async (req, resp) => {
try {
const { HASH, suffix } = req.query;
// Directory holding this file's chunks
const foldPath = `${UPLOAD_DIR}/${HASH}`;
// Path of the fully merged file
const filePath = `${foldPath}.${suffix}`;
// Final file name
const fileName = `${HASH}.${suffix}`;
let fileList = [];
const isFileExist = await exists(filePath);
// If the merged file already exists, respond with instant upload
if (isFileExist) {
resp.send({
code: 1,
msg: 'success',
servicePath: `${HOSTNAME}/${fileName}`
});
return;
}
const isFoldExist = await exists(foldPath);
if (isFoldExist) {
// Read the chunk files
fileList = fs.readdirSync(foldPath);
// Sort the chunks by their numeric index
fileList = fileList.sort((a, b) => {
let reg = /_(\d+)/;
return reg.exec(a)[1] - reg.exec(b)[1];
})
resp.send({
code: 0,
msg: 'success',
fileList: fileList
});
return;
}
resp.send({
code: 0,
msg: 'success',
fileList: fileList
});
} catch (err) {
console.log('failed to list uploaded chunks:', err);
resp.send({
code: -1,
msg: 'fail',
fileList: []
})
}
})
7. Where chunked upload is applied
We live in an information age and an era of exploding media resources. Massive amounts of video and audio are produced every day, and distributing them over the network inevitably involves file upload. Chunked upload, resumable upload, and instant upload are widely used in video-heavy scenarios such as online education, live streaming, and short video.
Summary
The sections above cover the entire chunked-upload flow. In short, a large file upload boils down to four steps:
- Generate a unique hash identifier from the File object
- Split the large file into a number of small chunks
- Upload the chunks from the front end, one request per chunk
- Once every chunk is uploaded, notify the server to merge them
Client source code
import axios from 'axios';
import SparkMD5 from 'spark-md5';
// 1. Server address
const BASE_URL = 'http://localhost:5000';
// 2. API endpoints
const API = {
GET_UPLOADED_CHUNKS: BASE_URL + '/get_uploaded_chunks',
POST_UPLOAD_CHUNK: BASE_URL + '/post_upload_chunk',
POST_UPLOAD_MERGE: BASE_URL + '/post_upload_merge'
}
// Size of a single chunk
let chunkSize = 50 * 1024 * 1024;
// Upper bound on the number of chunks
let max = 100;
// Chunks that are already on the server
let uploaded_chunks = [];
/**
* @desc Split the file into chunks
* @param file The File object
*/
const getChunks = (file, HASH, suffix) => {
const { size } = file;
let chunkCount = Math.ceil(size / chunkSize);
if (chunkCount > max) {
chunkSize = Math.ceil(size / max);
chunkCount = max;
}
let index = 0;
const chunks = [];
while (index < chunkCount) {
const item = file.slice(index * chunkSize, (index + 1) * chunkSize);
chunks.push({
file: item,
filename: `${HASH}_${index + 1}.${suffix}`
})
index++;
}
return chunks;
}
/**
* @desc Compute the file's unique hash
* @param file The File object
*/
const getHashByFile = (file) => {
return new Promise((resolve, reject) => {
let fileReader = new FileReader();
// 1. Read the file into an ArrayBuffer
fileReader.readAsArrayBuffer(file);
// 2. Callback fired when reading completes
fileReader.onload = (e) => {
const buffer = e.target.result;
/**
* @desc Compute the file's hash from its content
* @url https://github.com/satazor/js-spark-md5
*/
const spark = new SparkMD5.ArrayBuffer();
spark.append(buffer);
// Unique identifier
const HASH = spark.end();
// File extension
const suffix = /\.([a-zA-Z0-9]+)$/.exec(file.name)[1];
resolve({
buffer,
HASH,
suffix,
filename: `${HASH}.${suffix}`
});
}
// 3. Read-progress callback
fileReader.onprogress = (e) => {
const progress = (e.loaded / file.size) * 100;
console.log('=== file read progress ===>', progress);
}
})
}
/**
* @desc Build a FormData payload
*/
const getFormData = (data = {}) => {
const { file, filename } = data;
const formData = new FormData();
formData.append('file', file);
formData.append('filename', filename);
return formData;
}
/**
* @desc Fetch the chunks that were already uploaded
*/
const apiGetUploadedChunks = async (params = {}) => {
try {
const res = await axios.get(API.GET_UPLOADED_CHUNKS, {
params
});
return res.data;
} catch (err) {
console.log('failed to fetch uploaded chunks:', err);
return {};
}
}
/**
* @desc Upload a single chunk
*/
const apiPostUploadChunk = async (formData = {}) => {
try {
const res = await axios.post(API.POST_UPLOAD_CHUNK, formData, {
headers: {
'Content-Type': 'multipart/form-data'
}
});
return res;
} catch (err) {
console.log('failed to upload chunk:', err);
return {}
}
}
/**
* @desc Ask the server to merge the chunks
*/
const apiPostUploadMerge = async (params = {}) => {
try {
const res = await axios.post(API.POST_UPLOAD_MERGE, {
HASH: '',
count: 0,
...params
}, {
headers: {
'Content-Type': 'application/x-www-form-urlencoded'
}
});
return res;
} catch (err) {
console.log('merge request failed:', err);
}
}
/**
* @desc Handle the user's file selection
*/
const onFileChangeHandler = async (e) => {
const file = e.target.files[0];
// 1. Get the file's unique hash
const { HASH, suffix } = await getHashByFile(file);
try {
const { code, fileList, servicePath } = await apiGetUploadedChunks({ HASH, suffix });
// Instant upload: the file is already on the server
if (+code === 1) {
alert(`上传成功: ${servicePath}`);
return;
}
uploaded_chunks = fileList || [];
const chunks = getChunks(file, HASH, suffix);
/**
* 1. Filter out the chunks that were already uploaded
* 2. Build a FormData payload for each remaining chunk
* 3. Fire one upload request per chunk
*/
const requests = chunks
.filter(v => !uploaded_chunks.includes(v.filename))
.map(chunk => getFormData(chunk))
.map(fd => apiPostUploadChunk(fd));
// Upload the remaining chunks
await Promise.all(requests);
// Ask the server to merge the chunks
await apiPostUploadMerge({ HASH, count: chunks.length });
} catch (err) {
console.log('onFileChangeHandler error:', err);
}
}
/**
* @desc Entry point for large file upload
*/
const init = () => {
const uploadFile = document.getElementById("fileUpload");
uploadFile.addEventListener('change', onFileChangeHandler, false)
}
init(); // Run the entry point
Server source code
let express = require('express');
let fs = require('fs');
let path = require('path');
let bodyParser = require('body-parser');
let multiparty = require('multiparty');
let app = express(),
PORT = 5000,
HOST = 'http://localhost',
HOSTNAME = `${HOST}:${PORT}`;
// Base directory for uploads
const UPLOAD_DIR = `${__dirname}/upload`;
// Serve uploaded files as static assets
app.use('/', express.static('upload'));
// Middleware
app.use(bodyParser.urlencoded({
extended: true,
}))
// Allow CORS
app.all('*', (req, resp, next) => {
resp.header('Access-Control-Allow-Origin', '*');
resp.header('Access-Control-Allow-Methods', 'POST, GET');
next();
})
/** @desc Parse multipart form data */
const multiparty_upload = (req, auto) => {
if (typeof auto !== 'boolean') auto = false;
let config = {
maxFieldsSize: 200 * 1024 * 1024
}
if (auto) config.uploadDir = UPLOAD_DIR;
return new Promise((resolve, reject) => {
new multiparty.Form(config).parse(req, (err, fields, files) => {
if (err) {
reject(err);
return;
}
resolve({
fields,
files
})
})
})
}
/** @desc Check whether a path exists */
const exists = (path) => {
return new Promise((resolve) => {
// fs.exists is deprecated; fs.access signals absence through err
fs.access(path, (err) => {
resolve(!err);
})
})
}
/** @desc Write a chunk into its final location */
const writeFile = (resp, path, file) => {
try {
// 1. Read the uploaded temp file
const buffer = fs.readFileSync(file.path);
// 2. Write it to the chunk path and answer the client
fs.writeFile(path, buffer, function (err) {
if (err) {
resp.send({
code: -1,
msg: 'fail'
})
return;
}
resp.send({
code: 0,
msg: 'success'
})
});
} catch (err) {
console.log('failed to write chunk:', err);
resp.send({
code: -1,
msg: 'fail'
})
}
}
/** @desc Merge the chunks into the final file */
const merge = async (HASH, count) => {
let path = `${UPLOAD_DIR}/${HASH}`,
fileList = [],
suffix;
// Make sure the chunk directory exists
const isExists = await exists(path);
if (!isExists) {
throw new Error('HASH path is not found');
}
fileList = fs.readdirSync(path);
if (fileList.length < count) {
throw new Error('some chunks have not been uploaded yet');
}
// Sort the chunks by their numeric index before concatenating
const newFileList = fileList.sort((a, b) => {
let reg = /_(\d+)/;
return reg.exec(a)[1] - reg.exec(b)[1];
})
newFileList.forEach(item => {
if (!suffix) suffix = /\.([0-9a-zA-Z]+)$/.exec(item)[1];
// Append each chunk to the final file, then remove the chunk
fs.appendFileSync(`${UPLOAD_DIR}/${HASH}.${suffix}`, fs.readFileSync(`${path}/${item}`));
fs.unlinkSync(`${path}/${item}`);
})
// Remove the now-empty chunk directory
fs.rmdirSync(path);
return {
path: `${UPLOAD_DIR}/${HASH}.${suffix}`,
filename: `${HASH}.${suffix}`
}
}
/**
* @desc List the chunks that have already been uploaded
*/
app.get('/get_uploaded_chunks', async (req, resp) => {
try {
const { HASH, suffix } = req.query;
// Directory holding this file's chunks
const foldPath = `${UPLOAD_DIR}/${HASH}`;
// Path of the fully merged file
const filePath = `${foldPath}.${suffix}`;
// Final file name
const fileName = `${HASH}.${suffix}`;
let fileList = [];
const isFileExist = await exists(filePath);
// If the merged file already exists, respond with instant upload
if (isFileExist) {
resp.send({
code: 1,
msg: 'success',
fileList: fileList,
servicePath: `${HOSTNAME}/${fileName}`
});
return;
}
const isFoldExist = await exists(foldPath);
if (isFoldExist) {
// Read the chunk files
fileList = fs.readdirSync(foldPath);
// Sort the chunks by their numeric index
fileList = fileList.sort((a, b) => {
let reg = /_(\d+)/;
return reg.exec(a)[1] - reg.exec(b)[1];
})
resp.send({
code: 0,
msg: 'success',
fileList: fileList
});
return;
}
resp.send({
code: 0,
msg: 'success',
fileList: fileList
});
} catch (err) {
console.log('failed to list uploaded chunks:', err);
resp.send({
code: -1,
msg: 'fail',
fileList: []
})
}
})
/**
* @desc Upload a chunk
*/
app.post('/post_upload_chunk', async (req, resp) => {
try {
let { fields, files } = await multiparty_upload(req);
let file = (files.file && files.file[0]) || {},
filename = (fields.filename && fields.filename[0]) || "",
path = '',
isExists = false;
// Create a temporary directory for this file's chunks
let [, HASH] = /^([^_]+)_(\d+)/.exec(filename);
path = `${UPLOAD_DIR}/${HASH}`;
if (!fs.existsSync(path)) fs.mkdirSync(path);
// Store the chunk in the temporary directory
path = `${UPLOAD_DIR}/${HASH}/${filename}`;
// Skip writing if this chunk already exists
isExists = await exists(path);
if (isExists) {
resp.send({
code: 0,
msg: 'success',
})
return;
}
// Write the chunk to disk
writeFile(resp, path, file);
} catch (err) {
console.log('chunk upload error:', err);
resp.send({
code: -1,
msg: 'fail'
})
}
})
/**
* @desc Merge the chunks
*/
app.post('/post_upload_merge', async (req, resp) => {
let { HASH, count } = req.body;
try {
let { filename } = await merge(HASH, count);
resp.send({
code: 0,
msg: 'success',
servicePath: `${HOSTNAME}/${filename}`
})
} catch (err) {
console.log('chunk merge failed:', err);
resp.send({
code: -1,
msg: 'fail'
})
}
})
app.listen(PORT, () => {
console.log(`Server is started at port: ${PORT}~`)
})
Reposted from: https://juejin.cn/post/7232700464403398716