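Legacy projects tend to come with plenty of problems, and encoding is usually the first one to bite. GBK and UTF-8 encode Chinese characters differently, so IDEA shows garbled text when it opens GBK files with its default settings. Converting files one at a time is slow and tedious, so here is a small tool that batch-converts everything to UTF-8, the encoding used internationally.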
```python
import os
import codecs

import chardet  # third-party: pip install chardet


def write_file(file_path, content, encoding="utf-8"):
    # Overwrite the file in place with the decoded text, as UTF-8 by default.
    with codecs.open(file_path, "w", encoding) as f:
        f.write(content)


def convert_to_utf8(src_path):
    # Read the raw bytes so chardet can guess the original encoding.
    with open(src_path, "rb") as f:
        raw_data = f.read()
    detected = chardet.detect(raw_data)
    original_encoding = detected["encoding"]
    if original_encoding is None:
        print(f"[SKIP] {src_path}: encoding not detected")
        return
    if original_encoding.lower() != "utf-8":
        try:
            # Decode with the detected encoding, then rewrite the same file as UTF-8.
            with codecs.open(src_path, "r", original_encoding) as f:
                content = f.read()
            write_file(src_path, content, encoding="utf-8")
            print(f"[OK] {src_path}: {original_encoding} → utf-8")
        except Exception as e:
            print(f"[ERROR] {src_path}: failed to convert ({original_encoding}) - {e}")
    else:
        print(f"[SKIP] {src_path}: already utf-8")


def process_directory(root_dir):
    # Walk the tree recursively and convert every .java and .jsp file.
    for parent, dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.endswith((".java", ".jsp")):
                full_path = os.path.join(parent, filename)
                convert_to_utf8(full_path)


if __name__ == "__main__":
    src_path = "C:/Users/File"
    process_directory(src_path)
```
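One thing worth knowing: chardet tends to report GBK files as `GB2312`, and Python's `gb2312` codec is narrower than GBK, so decoding can fail on less common characters. The sketch below is one possible way to deal with that, not part of the original script. It takes the dict returned by `chardet.detect()` and decides which codec to decode with. The helper name `pick_source_encoding` and the `min_confidence` threshold of 0.8 are my own assumptions for illustration.

```python
from typing import Optional

# Encodings whose bytes are already valid UTF-8, so no rewrite is needed.
UTF8_COMPATIBLE = {"utf-8", "ascii"}


def pick_source_encoding(detected: dict, min_confidence: float = 0.8) -> Optional[str]:
    """Return the codec name to decode with, or None if the file should be skipped.

    `detected` is expected to look like the result of chardet.detect(),
    e.g. {"encoding": "GB2312", "confidence": 0.99, "language": "Chinese"}.
    The function name and the threshold are hypothetical.
    """
    encoding = detected.get("encoding")
    confidence = detected.get("confidence") or 0.0
    if encoding is None or confidence < min_confidence:
        return None  # unknown or too uncertain: leave the file alone
    name = encoding.lower()
    if name in UTF8_COMPATIBLE:
        return None  # nothing to convert
    if name in {"gb2312", "gbk"}:
        return "gb18030"  # superset codec, decodes GB2312 and GBK bytes alike
    return name
```

If you want to try it, `convert_to_utf8` could call this helper right after `chardet.detect(raw_data)` and return early on `None`. Skipping low-confidence files is a conservative choice: a wrong guess would overwrite the file with garbage.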
The script mainly converts java and jsp files. If you need other file types, add their extensions to the tuple in the line `if filename.endswith((".java", ".jsp"))`.
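If the list of extensions grows, it may be easier to pull it out into one place. Here is a minimal sketch of that idea. The names `TARGET_EXTENSIONS` and `should_convert` are hypothetical, and the extra `.js` and `.css` entries are only examples. It also matches case-insensitively, which the original check does not do.

```python
# Hypothetical replacement for the inline endswith() check.
# Extend this tuple with whatever text file types your project has.
TARGET_EXTENSIONS = (".java", ".jsp", ".js", ".css")


def should_convert(filename, extensions=TARGET_EXTENSIONS):
    # str.endswith accepts a tuple of suffixes; lowercase both sides so
    # names like INDEX.JSP are matched as well.
    return filename.lower().endswith(tuple(ext.lower() for ext in extensions))
```

In `process_directory`, the condition would then become `if should_convert(filename):`.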
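Prerequisite: a Python environment (Python 3.6 or later, since the script uses f-strings).

- First, copy the code and save it as a .py file. Any name works, e.g. convert.py
- Replace the path with the directory you want to convert. The root directory is enough, since the script recurses into subdirectories.

```python
# Note: use forward slashes / as the path separator
src_path = "C:/Users/File"
```

- Open a terminal in the folder where you saved the file and run

```plaintext
pip install chardet
```

- Finally, run

```plaintext
python convert.py
```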
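A line starting with [OK] means that file was converted successfully. Also, don't forget to change the pageEncoding at the top of each jsp file to UTF-8, otherwise the deployed pages will be full of mojibake like 锟斤拷.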
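Editing pageEncoding by hand gets tedious with many JSP files, so here is a rough sketch of how it could be automated. It is not part of the original script. It assumes the directive looks like the common `<%@ page ... contentType="text/html; charset=GBK" pageEncoding="GBK"%>` form, and the function name `fix_jsp_encoding` is hypothetical. Note that the charset pattern is applied to the whole text, so check the result if your pages mention `charset=` elsewhere.

```python
import re

# pageEncoding="GBK" or pageEncoding='GBK', with optional spaces around "=".
PAGE_ENCODING_RE = re.compile(r'(pageEncoding\s*=\s*)(["\'])[^"\']*\2', re.IGNORECASE)
# charset=GBK inside contentType="text/html; charset=GBK" (unquoted value).
CHARSET_RE = re.compile(r'(charset\s*=\s*)[\w-]+', re.IGNORECASE)


def fix_jsp_encoding(text):
    """Return the JSP source with pageEncoding and charset set to UTF-8."""
    # Keep the original quote style, replace only the value.
    text = PAGE_ENCODING_RE.sub(r'\g<1>\g<2>UTF-8\g<2>', text)
    text = CHARSET_RE.sub(r'\g<1>UTF-8', text)
    return text


if __name__ == "__main__":
    before = '<%@ page language="java" contentType="text/html; charset=GBK" pageEncoding="GBK"%>'
    print(fix_jsp_encoding(before))
```

Run it on the text after the file has been converted to UTF-8, for example on `content` just before the script writes a `.jsp` file back.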
Finally, here is a video for anyone who is not familiar with encodings. It is the same video on both platforms.
YouTube:
BiliBili:
