I'm using the Simba Spark ODBC driver provided by Databricks on Windows to connect to a Databricks instance running in Azure.
Most SQL statements run fine, but sometimes the result contains invalid non-UTF-8 bytes that shouldn't be there. This seems to become more likely as the returned text gets longer, and I was able to reproduce the error with a simple query.
Here is the example:
SELECT 'aslkdjh alskdjh lsakjdh lksajhd lksajh dlkajh sdlkjash ldkjh aslkdjh salkdjh laksjhd lkasjh dlksajh dlkajh dlkjah sldkh saldkjh aslkdaslkdjh alskdjh lsakjdh lksajhd lksajh dlkajh sdlkjash ldkjh aslkdjh salkdjh laksjhd lkasjh dlksajh dlkajh dlkjah sldkh saldkjh aslkdaslkdjh alskdjh lsakjdh lksajhd lksajh dlkajh sdlkjash ldkjh aslkdjh salkdjh laksjhd lkasjh dlksajh dlkajh dlkjah sldkh saldkjh aslkdaslkdjh alskdjh lsakjdh lksajhd lksajh dlkajh sdlkjash ldkjh aslkdjh salkdjh laksjhd lkasjh dlksajh dlkajh dlkjah sldkh saldkjh aslkdaslkdjh alskdjh lsakjdh lksajhd lksajh dlkajh sdlkjash ldkjh aslkdjh salkdjh laksjhd lkasjh dlksajh dlkajh dlkjah sldkh saldkjh aslkdaslkdjh alskdjh lsakjdh lksajhd lksajh dlkajh sdlkjash ldkjh aslkdjh salkdjh laksjhd lkasjh dlksajh dlkajh dlkjah sldkh saldkjh aslkd' AS test

The returned result was one row with one column containing this value:
aslkdjh alskdjh lsakjdh lksajhd lksajh dlkajh sdlkjash ldkjh aslkdjh salkdjh laksjhd lkasjh dlksajh dlkajh dlkjah sldkh saldkjh aslkdaslkdjh alskdjh lsakjdh lksajhd lksajh dlkajh sdlkjash ldkjh aslkdjh salkdjh laksjhd lkasjh dlksajh dlkajh dlkjah sldkh sa�\�Simba Spark ODBC Driver};UseProxy=1;ThriftTransport=2;SSL=1;ProxyPort=xxxxx;ProxyHost=xxxxxxxxxxxxxx;Port=443;HTTPPath=/sql/1.0/warehouses/xxxxxxxxxxx;Host=adb-xxxxxxxxxxxxx.azuredatabricks.net;AuthMech=3;UID=token;PWD=\���\� 6�8���Q�h��Q���7i*V�a6�TyC��lined

The fact that parts of the connection string were in the returned data is really confusing.
I'm using the PDO classes, but I've also tested the odbc_connect() functions with the same result.
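To narrow down where the returned value goes wrong, the literal sent in the query can be compared byte by byte with what the driver hands back. The following is only a minimal diagnostic sketch, not a fix: `describeMismatch()` is a hypothetical helper (it is not part of PDO or the driver), and `$connectionString` and `$literal` in the commented usage are placeholders for the real ODBC connection string and the test text.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: compares the literal sent in the query with the
 * value the driver returned and reports where the two start to differ.
 */
function describeMismatch(string $expected, string $actual): array
{
    // Walk both strings byte by byte until the first difference.
    $limit = min(strlen($expected), strlen($actual));
    $offset = 0;
    while ($offset < $limit && $expected[$offset] === $actual[$offset]) {
        $offset++;
    }
    $identical = ($expected === $actual);

    return [
        'identical'         => $identical,
        // preg_match() with the /u modifier fails on invalid UTF-8 input.
        'valid_utf8'        => preg_match('//u', $actual) === 1,
        'expected_length'   => strlen($expected),
        'actual_length'     => strlen($actual),
        // Byte offset of the first difference, or null if the strings match.
        'first_diff_offset' => $identical ? null : $offset,
    ];
}

// Usage against the real connection (placeholders, left commented out):
// $pdo    = new PDO('odbc:' . $connectionString);
// $actual = $pdo->query("SELECT '" . $literal . "' AS test")->fetchColumn();
// var_dump(describeMismatch($literal, (string) $actual));
```

If the first differing offset is the same on every run, or moves when the literal is lengthened, that would suggest a fixed buffer boundary rather than random corruption, which may help when comparing the PDO and `odbc_connect()` paths.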
To confirm it isn't a driver problem, I ran the same query from a C# application, and everything works as expected.
Do you have any idea what is happening here and how to avoid it?
Tested with:
PHP 8.1.3 and 8.5.0
The newest Simba Spark ODBC driver