I am using Intel DevCloud, and I have created a LightGBM model for binary classification using this code:
import lightgbm as lgb
import daal4py as d4p  # needed for get_gbt_model_from_lightgbm

bst = lgb.train(params, train_data, num_boost_round=10000,
                valid_sets=[valid_data], callbacks=[callback])
daal_model = d4p.get_gbt_model_from_lightgbm(bst)
After this, I tried to save the "daal_model" object to disk using pickle:
import pickle

with open('model.pkl', 'wb') as out:
    pickle.dump(daal_model, out)
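To check the size on disk, a quick standard-library sketch like this works (file name as above):

import os
size_gb = os.path.getsize('model.pkl') / 1024**3  # bytes -> GiB
print(f'{size_gb:.2f} GB')  # ~8.02 GB here, versus ~426 KB for the pickled bst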
Issue -
The model.pkl file is 8.02 GB. How is this possible, and why is it happening? When I save the original "bst" model with pickle, it is only 426 KB. Why is the daal4py model so big?
How to Reproduce -
- Here my code and data - https://colab.research.google.com/drive/1X12eIMqZZ_IzY2QsTBGF1CIa8kMAXOgs?usp=sharing
- Don't run it on Google Colab or Intel DevCloud; both will crash.
- Run it on a real computer instead.
Hi @Adarsh2,
By design, daal4py trees use a dense node layout and allocate memory accordingly: each tree is stored with room for every possible node up to its maximum depth, so memory grows exponentially with depth. The provided example creates extremely deep, sparse trees and is therefore unsuitable for running in daal4py as it stands.
We will add support for these scenarios. For the time being, you can train the model with a reasonable maximum depth, for example params['max_depth'] = 8; it provides similar accuracy, and the resulting model dump is only 2.9 MB.
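As a sketch of that workaround, reusing the params, train_data, valid_data, and callback objects from the original post (assumed here, not redefined):

# Cap tree depth: under a dense layout, a tree of depth d holds up to
# 2**(d+1) - 1 nodes, so depth 8 means at most 511 nodes per tree,
# while unbounded leaf-wise trees can be dozens of levels deep, and
# that 2**d growth is what produces a multi-gigabyte pickle.
params['max_depth'] = 8
bst = lgb.train(params, train_data, num_boost_round=10000,
                valid_sets=[valid_data], callbacks=[callback])
daal_model = d4p.get_gbt_model_from_lightgbm(bst)

The node counts in the comments assume the dense complete-binary-tree layout described above; the exact per-node size is an implementation detail, but the exponential dependence on depth is what drives the file size.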
Hi,
Thank you for posting in Intel Communities.
We have observed the same issue with daal4py (version 2023.2.1) when running your code in DevCloud for oneAPI.
We are checking on this internally and will get back to you with an update.
Meanwhile, could you share the version of daal4py you are using?
Thanks
I am using 2023.1.1
d4p._get__version__()
>> '(2023, 1, 1, \'"P"\')'
Thanks for reporting this issue; we are looking into it.