What must be done if a Blob is not a valid UTF-8 string?

Asked by DipikaAgarwal in Salesforce, asked on Feb 23, 2023

I am reading in a document that the user uploads from the visualFlow, and it's accessed in Apex in this manner:

String nameFile = contentFile.toString();

This works like a charm: I am able to parse the document and extract all the information needed, but only for English users. For Spanish users it fails.

Those files contain special characters and cause a "BLOB is not a valid UTF-8 string" error.

I've tried to Base64-encode the file contents, but the result comes out illegible.

String nameFile = EncodingUtil.base64Encode(contentFile);

Answered by Clare Matthews

If the Blob is not a valid UTF-8 string, try this code to convert a Blob in a known charset to a String:

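/**
 * Converts a Blob holding text in a known charset to a String.
 * @param input Blob data representing a correct string in the inCharset encoding
 * @param inCharset encoding of the Blob data (for example 'ISO-8859-2')
 */
public static String blobToString(Blob input, String inCharset) {
    String hex = EncodingUtil.convertToHex(input);
    // Two hex digits per byte, so the length must be even.
    System.assertEquals(0, hex.length() & 1);
    final Integer bytesCount = hex.length() >> 1;
    String[] bytes = new String[bytesCount];
    // Split the hex string into one two-digit pair per byte.
    for (Integer i = 0; i < bytesCount; ++i) {
        bytes[i] = hex.mid(i << 1, 2);
    }
    // Percent-encode every byte and let urlDecode interpret them in inCharset.
    return EncodingUtil.urlDecode('%' + String.join(bytes, '%'), inCharset);
}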

Note: this code works correctly, but it wastes a lot of CPU time.
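For readers who want to see the idea outside Apex, here is a rough Python translation of the same trick. It is only an illustration, not part of the original answer: every byte is written as %XX and a URL decoder is asked to interpret those bytes in the given charset. The function name blob_to_string and the sample text are assumptions made for the example.

```python
from urllib.parse import unquote


def blob_to_string(blob: bytes, in_charset: str) -> str:
    """Decode bytes in a known charset by percent-encoding every byte."""
    hex_str = blob.hex()
    # One two-digit hex pair per byte.
    pairs = [hex_str[i:i + 2] for i in range(0, len(hex_str), 2)]
    if not pairs:
        return ""
    percent_encoded = "%" + "%".join(pairs)
    # unquote turns the %XX sequences back into bytes and decodes them
    # with the requested charset.
    return unquote(percent_encoded, encoding=in_charset, errors="strict")


# Hypothetical sample: Spanish text saved as ISO 8859-1, not UTF-8.
sample = "Señor Muñoz".encode("iso-8859-1")
print(blob_to_string(sample, "iso-8859-1"))
```

In Python a plain sample.decode("iso-8859-1") does the same job directly; the detour through hex exists in Apex because Blob.toString() only accepts UTF-8.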



Answers (2)

The error "Blob is not a valid UTF-8 string" occurs when a binary large object (BLOB) contains data that cannot be interpreted as a UTF-8 string. This often happens in databases, APIs, or file processing where text encoding is expected. Here’s how to fix it:

1. Verify the Encoding

Check if the BLOB is actually text data. Some BLOBs store binary files (images, PDFs, etc.), which are not meant to be UTF-8.

If using MySQL, check encoding with:

SELECT COLUMN_NAME, CHARACTER_SET_NAME 
FROM information_schema.COLUMNS
WHERE TABLE_NAME = 'your_table';

2. Convert the BLOB to UTF-8

If the BLOB is encoded differently (e.g., Latin-1, Windows-1252), convert it to UTF-8:

  SELECT CONVERT(column_name USING utf8) FROM your_table;

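In Python, decode using the encoding the data was actually written in:

  data = blob_data.decode('cp1252')  # Windows-1252; use 'latin-1' for ISO 8859-1

Decoding as UTF-8 with errors='ignore' only drops the invalid bytes, so accented characters would be lost.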
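When it is not known in advance whether a file is UTF-8, one option is to try UTF-8 first and fall back to a legacy charset. The sketch below assumes the non-UTF-8 files are Windows-1252, which is only a guess for Western European uploads; the function name decode_blob and the sample text are hypothetical.

```python
def decode_blob(blob_data: bytes, fallback: str = "cp1252") -> str:
    """Try strict UTF-8 first, then fall back to an assumed legacy charset."""
    try:
        return blob_data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 was saved in `fallback`.
        # cp1252 leaves a few byte values undefined, so this can still raise.
        return blob_data.decode(fallback)


legacy_upload = "Señor Muñoz".encode("cp1252")  # hypothetical Spanish file
utf8_upload = "Señor Muñoz".encode("utf-8")

print(decode_blob(legacy_upload))
print(decode_blob(utf8_upload))
```

Trying UTF-8 first is the safer order, because text in a legacy single-byte charset rarely happens to form valid UTF-8, while almost any byte sequence decodes "successfully" as cp1252 or Latin-1.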

3. Check for Corrupt Data

  • Sometimes, incorrect storage or retrieval methods corrupt encoding.
  • Retrieve the data as raw bytes and inspect it using a hex editor or bytearray in Python.
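As a minimal sketch of that inspection step, the helper below reports where UTF-8 decoding first fails and prints the surrounding bytes in hex. The name first_invalid_utf8 and the sample bytes are made up for illustration, and the separator argument to bytes.hex assumes Python 3.8 or later.

```python
from typing import Optional, Tuple


def first_invalid_utf8(blob_data: bytes, context: int = 4) -> Optional[Tuple[int, str]]:
    """Return (offset, hex dump around it) for the first invalid byte, or None."""
    try:
        blob_data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        start = max(err.start - context, 0)
        window = blob_data[start:err.end + context]
        return err.start, window.hex(" ")


sample = "año".encode("latin-1")  # hypothetical Latin-1 data: 61 f1 6f
result = first_invalid_utf8(sample)
print(result)
```

A lone byte in the range 0xC0 to 0xFF followed by an ASCII letter, as here, is a typical sign of Latin-1 or Windows-1252 text rather than corrupted UTF-8.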

4. Use Base64 Encoding for Storage

If storing binary data in a text field, use Base64 encoding to prevent encoding issues:

import base64
encoded = base64.b64encode(blob_data).decode('utf-8')
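A small round-trip sketch may help here (the sample bytes are invented for illustration). Base64 makes arbitrary bytes safe to store in a text field, but it does not make them readable: the Base64 string looks like gibberish, and the restored bytes still have to be decoded with the correct charset.

```python
import base64

# Hypothetical upload that is Windows-1252 text, not UTF-8.
blob_data = "Señor Muñoz".encode("cp1252")

# Base64 output is pure ASCII, so it is safe in any text column.
encoded = base64.b64encode(blob_data).decode("ascii")

# Decoding Base64 gives back exactly the original bytes...
restored = base64.b64decode(encoded)

# ...which still need the right charset to become readable text.
text = restored.decode("cp1252")

print(encoded)
print(text)
```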

5. Modify Database Schema (If Needed)

If the column is wrongly set to TEXT instead of BLOB, change it:

  ALTER TABLE your_table MODIFY column_name BLOB;

If the BLOB contains actual binary data, treat it as such instead of forcing UTF-8 conversion.


1 Month

If a blob of data is not a valid UTF-8 string, you typically have a few options depending on your specific requirements and context:


Error Handling: If the invalid UTF-8 data is an anomaly or unexpected, you might handle it as an error condition. This could involve logging the issue, notifying the user, or taking some other appropriate action to address the problem.
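A minimal sketch of this error-handling option in Python might look like the following. The logger name, the InvalidEncodingError class, and the read_text function are all hypothetical names chosen for the example.

```python
import logging

logger = logging.getLogger("upload")  # hypothetical logger name


class InvalidEncodingError(ValueError):
    """Raised when an uploaded file is not valid UTF-8."""


def read_text(blob_data: bytes, name: str) -> str:
    """Decode strictly as UTF-8; log and raise a user-facing error otherwise."""
    try:
        return blob_data.decode("utf-8")
    except UnicodeDecodeError as err:
        logger.warning("%s is not valid UTF-8 at byte %d", name, err.start)
        raise InvalidEncodingError(
            f"{name} must be saved as UTF-8 (problem at byte {err.start})"
        ) from err


try:
    read_text(b"caf\xe9", "menu.csv")  # hypothetical Latin-1 upload
except InvalidEncodingError as exc:
    message = str(exc)
    print(message)
```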

Data Transformation or Cleanup: If the data can be salvaged or if there's a chance it contains valid UTF-8 characters with some corruption, you could attempt to clean up or transform the data. This might involve replacing or removing invalid characters, or applying some other transformation to make the data valid.
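To make the trade-off concrete, here is a short Python sketch comparing the two common cleanup modes on made-up sample bytes. Replacing keeps a visible marker where data was lost, while ignoring removes the bytes silently.

```python
# Hypothetical data: "café ok" saved as Latin-1, so 0xE9 is not valid UTF-8.
raw = b"caf\xe9 ok"

# Replace each undecodable byte with U+FFFD, the Unicode replacement character.
replaced = raw.decode("utf-8", errors="replace")

# Drop undecodable bytes entirely; the loss leaves no trace in the output.
ignored = raw.decode("utf-8", errors="ignore")

print(repr(replaced))
print(repr(ignored))
```

Both modes lose the original character, so they are best kept for data that really is damaged; text in a known legacy charset is better decoded with that charset.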

Ultimately, the best approach will depend on the specific requirements and constraints of your application, as well as the nature of the data you're dealing with.

11 Months
