PANDAS & glob - Excel file format cannot be determined, you must specify an engine manually?

Webmaster

2024-07-15 09:54 · 29 views

The code had been running fine, but it now raises the following error:

Excel file format cannot be determined, you must specify an engine manually.

Here is my code and what I did:

1- A list of the possible names for the customer ID column:

 customer_id = ["ID","customer_id","consumer_number","cus_id","client_ID"]

2- Code that finds all xlsx files in the folder and reads them:

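import glob
import pandas as pd

l = []  # collect frames in a list and concat once; faster than appending in the loop
for f in glob.glob("./*.xlsx"):
    # keep only the candidate ID columns, then drop the ones that are entirely empty
    df = pd.read_excel(f).reindex(columns=customer_id).dropna(how='all', axis=1)
    df.columns = ["ID"]  # rename so every frame has the same single column for concat
    l.append(df)
all_data = pd.concat(l, ignore_index=True)  # concat all data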

I then added the openpyxl engine:

df = pd.read_excel(f, engine="openpyxl").reindex(columns = customer_id).dropna(how='all', axis=1)

This produced a new error:

 BadZipFile: File is not a zip file

pandas version: 1.3.0, Python version: 3.9, OS: macOS

Is there a better way to read all xlsx files from a folder?

Originally posted by MTALY; translation licensed under CC BY-SA 4.0

1 Answer

test

2024-07-15

When an Excel file is open in MS Excel, a hidden temporary file is created in the same directory:

 ~$datasheet.xlsx

Because the glob pattern also matches this temporary file, running the code to read every file in the folder raises the error:

 Excel file format cannot be determined, you must specify an engine manually.

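Once all files are closed and no hidden temporary ~$filename.xlsx file remains in the directory, the code works perfectly.

Originally posted by MTALY; translation licensed under CC BY-SA 4.0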
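
If closing every workbook first is not practical, one possible workaround is to skip the temporary files when listing the folder. The following is only a minimal sketch: the helper name `excel_paths` is hypothetical and not part of pandas or glob, and it assumes the temporary files always start with `~$`, as in the example above.

```python
import glob
import os


def excel_paths(folder="."):
    """Return the .xlsx paths in `folder`, skipping '~$' temporary files.

    `excel_paths` is a hypothetical helper written for this example.
    """
    pattern = os.path.join(folder, "*.xlsx")
    return sorted(
        path
        for path in glob.glob(pattern)
        # Excel's temporary files are named like '~$datasheet.xlsx'
        if not os.path.basename(path).startswith("~$")
    )
```

The loop from the question could then iterate over `excel_paths(".")` instead of `glob.glob("./*.xlsx")`, so `pd.read_excel` only ever sees real workbooks.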
