I was in troubles to convert big xlsx files in PHP because there is no such a library that could make it as good as possible. What I’ve found out were libraries that could manage xls / xlsx files but it took a lot of time and performance was horrible.
So, not spending too much time to think about it I’ve figured out to leave PHP and try to make it in Python. What I wanted to achive was converting xls files into csv and handle it in PHP.
In Python we got xlrd library that enables us to make this “crazy” thing. Here it is: https://pypi.python.org/pypi/xlrd. You can install it using pip, such like:
1 |
$ pip install xlrd |
But for my case it was better to use VirtualENV which is better solution when someday you’d like to move your software somewhere and you forgot about installing all those stuff. However, each way you chose would be fine till it’s working.
After installing required items all you need is to run the script that converts xls into csv. I have already written simple script that can make it. Please find code below and save it to your local system setting the proper permissions right (chmod +x filename).
convert_xls.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
import logging import time import traceback import xlrd import csv import sys import re logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s') def csv_from_excel(): xls = sys.argv[1] target = sys.argv[2] logging.info("Start converting: From '" + xls + "' to '" + target + "'. ") try: start_time = time.time() wb = xlrd.open_workbook(xls) sh = wb.sheet_by_index(0) csvFile = open(target, 'wb') wr = csv.writer(csvFile, quoting=csv.QUOTE_ALL) for row in xrange(sh.nrows): rowValues = sh.row_values(row) newValues = [] for s in rowValues: if isinstance(s, unicode): strValue = (str(s.encode("utf-8"))) else: strValue = (str(s)) isInt = bool(re.match("^([0-9]+)\.0$", strValue)) if isInt: strValue = int(float(strValue)) else: isFloat = bool(re.match("^([0-9]+)\.([0-9]+)$", strValue)) isLong = bool(re.match("^([0-9]+)\.([0-9]+)e\+([0-9]+)$", strValue)) if isFloat: strValue = float(strValue) if isLong: strValue = int(float(strValue)) newValues.append(strValue) wr.writerow(newValues) csvFile.close() logging.info("Finished in %s seconds", time.time() - start_time) except Exception as e: print (str(e) + " " + traceback.format_exc()) csv_from_excel() |
If you have a python script created in your local filesystem the next step would be manage it from PHP. And this is most simplest solution. Using exec, system, or shell_exec solves the problem.
1 2 3 |
<?php exec(‘/usr/bin/python convert_xls.py input.xls out.csv’); ?> |
OR in command line.
1 |
$ /usr/bin/python convert_xls.py input.xls out.csv |
I know that solution is not really clean but it works. To have it more elegant I would suggest to create some kind of service in Python that handles it in a good way. I will think about it …
Note: I have also fixed issue with Long numbers in XLS.
Update
If you need to parse big XLS, XLSX files I would suggest not to make in in PHP. Just parse this using JAVA where PHP would be used only for UI.
Any questions, let me know. Please find comment form below.
Jacklyn
/ March 6, 2015This article is very interesting, but is hard to find in search engine.
I found it on 16 spot. You can reach google top ten easily using one handy wordpress plugin and
increase targeted traffic many times. Just search in google for:
Akalatoru’s Rank Plugin
Someone
/ February 22, 2016Dude, xrange is undefined!
Shehel
/ August 23, 2016undefined because xrange is for python2. You probably have python3 which uses range
hossein1221
/ September 10, 2016NameError: name ‘unicodes’ is not defined
when use str in python3
=>
TypeError: ‘str’ does not support the buffer interface
willingness
/ September 25, 2016Thx for this great script!
After converting a lot xls files to csv, I now have encoding trouble with a particular file. The output is:
*** No CODEPAGE record, no encoding_override: will use ‘ascii’
‘ascii’ codec can’t decode byte 0xc3 in position 16: ordinal not in range(128) Traceback (most recent call last):
File “xls2csv.py”, line 20, in csv_from_excel
wb = xlrd.open_workbook(xls)
File “/Library/Python/2.7/site-packages/xlrd/__init__.py”, line 441, in open_workbook
ragged_rows=ragged_rows,
File “/Library/Python/2.7/site-packages/xlrd/book.py”, line 107, in open_workbook_xls
bk.fake_globals_get_sheet()
File “/Library/Python/2.7/site-packages/xlrd/book.py”, line 687, in fake_globals_get_sheet
self.get_sheets()
File “/Library/Python/2.7/site-packages/xlrd/book.py”, line 678, in get_sheets
self.get_sheet(sheetno)
File “/Library/Python/2.7/site-packages/xlrd/book.py”, line 669, in get_sheet
sh.read(self)
File “/Library/Python/2.7/site-packages/xlrd/sheet.py”, line 1356, in read
strg = unpack_string(data, 7, bk.encoding or bk.derive_encoding(), lenlen=1)
File “/Library/Python/2.7/site-packages/xlrd/biffh.py”, line 269, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0xc3 in position 16: ordinal not in range(128)
Anyone with an idea?
FirstLemuel
/ December 2, 2017I see you don’t monetize your page, don’t waste your traffic, you can earn extra bucks
every month because you’ve got hi quality content. If you want to know how to make extra money, search for:
Boorfe’s tips best adsense alternative
Shaul
/ May 1, 2018Thank you
It help me very much
I improved the script to support multi sheets too:
shaul@sm ~12:43:17$
shaul@sm ~12:43:17$ cat Desktop/convert_xls.py
#!/usr/bin/python
# from http://strife.pl/2014/12/converting-large-xls-xlsx-files-to-csv-using-python/
import logging
import time
import traceback
import xlrd
import csv
import sys
import re
logging.basicConfig(level=logging.INFO, format=’%(asctime)s %(message)s’)
def csv_from_excel(sheet_nu=0):
global wr, wb
start_time = time.time()
sh = wb.sheet_by_index(sheet_nu)
for row in xrange(sh.nrows):
rowValues = sh.row_values(row)
newValues = []
for s in rowValues:
if isinstance(s, unicode):
strValue = (str(s.encode(“utf-8″)))
else:
strValue = (str(s))
isInt = bool(re.match(“^([0-9]+)\.0$”, strValue))
if isInt:
strValue = int(float(strValue))
else:
isFloat = bool(re.match(“^([0-9]+)\.([0-9]+)$”, strValue))
isLong = bool(re.match(“^([0-9]+)\.([0-9]+)e\+([0-9]+)$”, strValue))
if isFloat:
strValue = float(strValue)
if isLong:
strValue = int(float(strValue))
newValues.append(strValue)
wr.writerow(newValues)
logging.info(“Finished in %s seconds, Sheet:%s”, time.time() – start_time, sheet_nu)
if len(sys.argv) < 3:
print "Usage: "+ sys.argv[0] +" XLS_file CSV_file"
exit (1)
xls = sys.argv[1]
target = sys.argv[2]
logging.info("Start converting: From '" + xls + "' to '" + target + "'. ")
try:
wb = xlrd.open_workbook(xls)
csvFile = open(target, 'wb')
wr = csv.writer(csvFile, quoting=csv.QUOTE_ALL)
for idx in range(wb.nsheets):
csv_from_excel(idx)
csvFile.close()
except Exception as e:
print (str(e) + " " + traceback.format_exc())
exit (0)
shaul@sm ~12:43:19$
shaul@sm ~12:43:19$
MirkaBeni
/ August 1, 2018Peels help to eliminate from the greater part old amendments dermatological cover. Procedures effective at any time, recommended for skin of any species. After superficial exposure disappear minor wrinkles, skin becomes smooth, without wrinkles. Regeneration – 3 days. deeper peeling well struggles wrinkles (expression, old, smoothing minor scars, eliminates age spots.
Recovery – 7 days. Deep peeling conducting in the beauty medical facilities. This is most effective procedure, although requires long period regeneration – about thirty days.
chemical skin peel before and after
Arnottzxj
/ December 26, 2020and 12 thousand Georgian manuscripts
Airbladevid
/ January 5, 2021XVII century was Nicholas Jarry .
WILDKATmnj
/ January 9, 2021text carrier and protective
Keypadadix
/ January 24, 2021Of his works, he is especially famous
Artisanldn
/ February 14, 2021bride, Julie d’Angenne.
Nespressowbw
/ February 17, 2021book about the chess of love “, created by
Dysonmra
/ April 3, 2021from lat. manus – “hand” and scribo – “I write”) ]
Flukeryd
/ July 7, 2021Since the era of Charlemagne
Flukeero
/ September 6, 2021“Julia’s Garland” (fr. Guirlande de Julie)
Businesstbp
/ September 18, 2021new texts were rewritten
AllenTug
/ October 1, 2021Amritsar is a city in the northwestern Indian state of Punjab, 28 kilometers from the border with Pakistan. At the center of its walled old town, the gilded Golden Temple (Harmandir Sahib) is the holiest gurdwara (religious complex) of the Sikh religion. It’s at the end of a causeway, surrounded by the sacred Amrit Sarovar tank (lake), where pilgrims bathe.
Amritsar
Amritsar is a city in the northwestern Indian state of Punjab, 28 kilometers from the border with Pakistan. At the center of its walled old town, the gilded Golden Temple (Harmandir Sahib) is the holiest gurdwara (religious complex) of the Sikh religion. It’s at the end of a causeway, surrounded by the sacred Amrit Sarovar tank (lake), where pilgrims bathe.
Feederoyo
/ December 19, 2021Century to a kind of destruction:
TerryJiz
/ January 9, 2022Магазин «Автогурман» ориентирован на активную реализацию всех требующихся
комплектующих для вашего транспортного средства.
https://xn--80aafe9bhdrpm.su/katalogTO/katalogTO/models/?id=12 Только здесь у вас имеется
уникальный шанс заказать сертифицированные запчасти Kia, полностью соответствующие самым
скрупулезным требованиям завода-производителя.