PG
PRO
01

PostgreSQL Binary String (bytea) Functions

Complete reference for PostgreSQL binary string functions for the bytea data type, covering encoding, decoding, byte-level access, length measurement, and binary manipulation. Includes encode/decode with hex and base64, get_byte, set_byte, and substr on binary data. Updated for PostgreSQL 16.

15 functions

What are PostgreSQL Binary String (bytea) Functions?

PostgreSQL binary string functions operate on the bytea data type, which stores arbitrary binary data such as images, file contents, and cryptographic hashes. encode() and decode() convert between bytea and text representations (hex, base64, escape). get_byte() and set_byte() provide direct byte-level access by position. Binary strings differ from text strings — length() on bytea returns the byte count, not the character count.

|| (bytea concatenation)

PG 8.0+bytea

Concatenates two binary strings.

DeveloperData EngSecurity

Signature

bytea || bytea → bytea

Parameters

ParameterTypeDescription
bytes1byteaFirst binary string
bytes2byteaSecond binary string

Examples

sql
SELECT '\x123456'::bytea || '\x789a00bcde'::bytea;
\x123456789a00bcde
sql
SELECT '\x00'::bytea || '\xFF'::bytea;
\x00ff
sql
SELECT '\xDEAD'::bytea || '\xBEEF'::bytea;
\xdeadbeef
sql
SELECT header || payload || trailer FROM binary_frames;
Assembled binary frame
Anti-PatternConcatenating binary data as text strings

Using text `||` concatenation on hex strings or casting binary to text to assemble binary data is error-prone and encoding-fragile. PostgreSQL's bytea `||` operator handles binary concatenation natively and correctly.

✓ Instead: Use `'\\xDEAD'::bytea || '\\xBEEF'::bytea` for binary concatenation rather than joining hex text strings and re-casting.

Use `||` to assemble binary protocol frames from component byte sequences in SQL. Combine with `decode('...', 'hex')` or `set_byte` for dynamic byte construction.

example
SELECT '\x01'::bytea || length(payload)::int2::bytea || payload FROM messages;
Simple binary frame: type byte, 2-byte length, payload

bit_length

PG 8.0+integer

Returns the number of bits in the binary string (8 times the number of bytes).

DeveloperData EngSecurity

Signature

bit_length ( bytea ) → integer

Parameters

ParameterTypeDescription
bytesbyteaBinary string to measure

Examples

sql
SELECT bit_length('\x123456'::bytea);
24
sql
SELECT bit_length('\xFF'::bytea);
8
sql
SELECT bit_length('\x0000'::bytea);
16
sql
SELECT bit_length(data) / 8 AS byte_count FROM binary_records;
Equivalent to octet_length
Anti-PatternUsing bit_length to compute byte counts

`bit_length(b) = 8 * octet_length(b)`. When you need the byte count for size checks, calling `bit_length` and dividing by 8 is needlessly indirect and invites off-by-8 errors.

✓ Instead: Use `octet_length(col)` directly to get byte count instead of `bit_length(col) / 8`.

`bit_length(b) = 8 * octet_length(b)`. Prefer `octet_length` for most byte-oriented use cases — `bit_length` is useful when working with bit-level protocols.

example
SELECT octet_length(data) AS bytes, bit_length(data) AS bits FROM binary_records;
Both measures side by side

btrim

PG 8.0+bytea

Removes the longest string containing only bytes appearing in bytesremoved from the start and end of bytes.

DeveloperData EngSecurity

Signature

btrim ( bytes bytea, bytesremoved bytea ) → bytea

Parameters

ParameterTypeDescription
bytesbyteaBinary string to trim
bytesremovedbyteaSet of bytes to trim from both ends

Examples

sql
SELECT btrim('\x1234567890'::bytea, '\x9012'::bytea);
\x345678
sql
SELECT btrim('\x00112200'::bytea, '\x00'::bytea);
\x1122
sql
SELECT btrim('\xFF0A0BFF'::bytea, '\xFF'::bytea);
\x0a0b
sql
SELECT btrim(raw_frame, '\x00FF'::bytea) AS stripped FROM sensor_data;
Frame with null and 0xFF padding removed from both ends
Anti-PatternTreating bytesremoved as a sequence prefix/suffix instead of a byte set

`btrim(data, '\\x0102'::bytea)` removes any combination of bytes 0x01 and 0x02 from both ends — not the literal sequence `\x0102`. This over-trims when those byte values appear legitimately at the payload boundary.

✓ Instead: Use `substring` with known fixed offsets to strip exact-length headers/trailers rather than `btrim` when the boundary bytes also appear inside valid payload.

Binary data from external systems often includes null byte `\x00` padding. Use `btrim(data, '\x00'::bytea)` to strip them from both ends before processing.

example
SELECT btrim(raw_data, '\x00'::bytea) AS clean_data FROM sensor_readings;
Binary data with null padding removed

decode

PG 7.4+bytea

Decodes a text string in the specified encoding ('base64', 'hex', or 'escape') into binary data.

DeveloperData EngSecurity

Signature

decode ( string text, format text ) → bytea

Parameters

ParameterTypeDescription
stringtextEncoded text to decode
formattext'base64', 'hex', or 'escape'

Examples

sql
SELECT decode('SGVsbG8=', 'base64');
\x48656c6c6f
sql
SELECT decode('deadbeef', 'hex');
\xdeadbeef
sql
SELECT convert_from(decode(b64_col, 'base64'), 'UTF8') AS text FROM encoded_text;
Decoded UTF-8 text from base64
sql
INSERT INTO blobs (data) VALUES (decode(trim($1), 'base64'));
Safely inserts base64-encoded upload after trimming whitespace
Anti-PatternCalling decode() on unvalidated user input without error handling

Passing malformed base64 or hex input to `decode()` raises an exception that aborts the transaction. In bulk import pipelines this can silently roll back many rows. User-supplied encoded strings must be validated before decoding.

✓ Instead: Wrap `decode()` in a `CASE WHEN col ~ '^[0-9a-fA-F]+

#x27; THEN decode(col, 'hex') END` guard for hex, or handle via application-level validation before the SQL call.

When receiving binary content as base64 strings (common in REST APIs), use `decode(base64_str, 'base64')` to convert to bytea for storage or further processing.

example
INSERT INTO files (name, data) VALUES ($1, decode($2, 'base64'));
Stores binary file from base64-encoded input

encode

PG 7.4+text

Encodes binary data as text using the specified encoding: 'base64', 'hex', or 'escape'.

DeveloperData EngSecurity

Signature

encode ( bytes bytea, format text ) → text

Parameters

ParameterTypeDescription
bytesbyteaBinary data to encode
formattext'base64', 'hex', or 'escape'

Examples

sql
SELECT encode('\x48656c6c6f'::bytea, 'hex');
48656c6c6f
sql
SELECT encode('Hello'::bytea, 'base64');
SGVsbG8=
sql
SELECT encode(digest_value, 'hex') AS sha256_hex FROM hashes;
Hexadecimal hash string
sql
SELECT replace(encode(avatar, 'base64'), E'\n', '') AS compact_b64 FROM user_profiles;
Base64 without embedded newlines for JSON embedding
Anti-PatternStoring binary data as text-encoded hex or base64 strings in the database

Encoding binary content to hex or base64 before storing in a text column wastes space (hex doubles size, base64 adds ~33%), loses binary operators, and forces decode on every read. The native `bytea` type stores binary efficiently and supports direct operations.

✓ Instead: Store binary data in a `bytea` column and use `encode(col, 'hex')` only at the API boundary for display or transmission.

Use `encode(data, 'hex')` for human-readable hex output. Use `encode(data, 'base64')` for compact transmission (web APIs, JSON). PostgreSQL's base64 output includes newlines every 76 chars — remove them with `replace(encode(b, 'base64'), E'\n', '')`.

example
SELECT replace(encode(image_data, 'base64'), E'\n', '') AS base64_image FROM user_avatars;
Compact base64 without newlines

get_byte

PG 7.4+integer

Returns the value of the n-th byte in the binary string (0-based indexing). Returns an integer from 0 to 255.

DeveloperData EngSecurity

Signature

get_byte ( bytes bytea, n integer ) → integer

Parameters

ParameterTypeDescription
bytesbyteaBinary string
ninteger0-based byte index

Examples

0x12 = 18 decimal

sql
SELECT get_byte('\x1234567890'::bytea, 0);
18
sql
SELECT get_byte('\xFF00'::bytea, 1);
0
sql
SELECT get_byte(data, 0) AS first_byte FROM packets;
First byte value (0-255)
sql
SELECT get_byte(header, 0) AS version, get_byte(header, 1) AS msg_type FROM frames;
Parsed header fields from binary frame
Anti-PatternUsing get_byte index 1 expecting the first byte (it's 0-indexed)

`get_byte(data, 0)` returns the first byte. Using index 1 returns the second byte — a systematic off-by-one error when parsing protocol frames. The confusion worsens because `substring(data FROM 1 ...)` is 1-based.

✓ Instead: Always use `get_byte(data, 0)` for the first byte. Add a comment clarifying 0-based indexing, especially alongside `substring` calls which use 1-based FROM.

`get_byte` uses 0-based indexing. Byte 0 is the first byte. `substring` uses 1-based FROM. Don't mix them up when parsing binary frames.

example
SELECT get_byte(header, 0) AS version, get_byte(header, 1) AS flags FROM packets;
Individual byte field values from binary header

ltrim

PG 8.0+bytea

Removes the longest string containing only bytes from bytesremoved from the start (left) of bytes.

DeveloperData EngSecurity

Signature

ltrim ( bytes bytea, bytesremoved bytea ) → bytea

Parameters

ParameterTypeDescription
bytesbyteaBinary string to trim
bytesremovedbyteaSet of bytes to remove from the left

Examples

sql
SELECT ltrim('\x1234567890'::bytea, '\x9012'::bytea);
\x34567890
sql
SELECT ltrim('\x00001234'::bytea, '\x00'::bytea);
\x1234
sql
SELECT ltrim(raw_packet, '\x00'::bytea) AS no_leading_nulls FROM packets;
Packets with leading null bytes stripped
sql
SELECT ltrim(encoded_val, '\x00'::bytea) AS compact FROM ber_fields;
BER-encoded integer with leading zero bytes stripped
Anti-PatternExpecting ltrim to strip a fixed-length prefix sequence

`ltrim(data, mask)` greedily removes any byte in the mask set from the left — not a literal prefix sequence. If the leading valid payload bytes happen to match the mask, they get silently stripped too.

✓ Instead: Use `substring(data FROM N+1)` to skip exactly N leading bytes rather than `ltrim` when the prefix bytes also occur at the start of valid content.

Some binary formats embed a known header. Use `ltrim` or `substring` to skip past it and access the payload portion of binary data.

example
SELECT ltrim(file_data, '\xFF\xD8'::bytea) AS jpeg_body FROM images;
JPEG data with SOI marker removed

md5 (bytea)

PG 8.0+text

Computes the MD5 hash of binary data, returning a 32-character lowercase hexadecimal string.

DeveloperData EngSecurity

Signature

md5 ( bytea ) → text

Parameters

ParameterTypeDescription
bytesbyteaBinary data to hash

Examples

sql
SELECT md5('\x48656c6c6f'::bytea);
8b1a9953c4611296a827abf8c47804d7
sql
SELECT md5(file_data) FROM stored_files;
MD5 hex digest of file content
sql
SELECT id, md5(content) AS etag FROM assets WHERE md5(content) != stored_etag;
Assets whose content has changed since etag was recorded
sql
SELECT md5(large_blob) = expected_md5 AS integrity_ok FROM uploads;
Quick integrity check for non-security use case
Anti-PatternUsing md5() for security-sensitive hashing

MD5 is cryptographically broken — collision and preimage attacks are feasible. Using md5 for password hashing, integrity checks on security-critical data, or digital signatures is unsafe regardless of how it is encoded.

✓ Instead: Use `sha256(data::bytea)` (built-in since PG 11) or pgcrypto's `digest(data, 'sha256')` for any security-sensitive hash. Reserve md5 only for non-security use cases like ETags or deduplication keys.

MD5 is cryptographically broken — don't use it for passwords or security-sensitive hashes. Use SHA-256 (`sha256`) or pgcrypto's `digest(data, 'sha256')`. MD5 is still fine for non-security checksums (data deduplication, ETags).

example
-- For ETags/deduplication (non-security):
SELECT md5(data) AS etag FROM content;
-- For security, use sha256 instead
Fast non-cryptographic checksum

octet_length

PG 8.0+integer

Returns the number of bytes in the binary string or text string (counting bytes, not characters).

DeveloperData EngSecurity

Signatures

octet_length ( bytea ) → integer
octet_length ( text ) → integer

Parameters

ParameterTypeDescription
bytes_or_textbytea or textValue to measure in bytes

Examples

sql
SELECT octet_length('\x123456'::bytea);
3
sql
SELECT octet_length('hello'::text);
5

Euro sign is 3 bytes in UTF-8

sql
SELECT octet_length('€'::text);
3
sql
SELECT id, octet_length(payload) AS size_bytes FROM messages WHERE octet_length(payload) > 65535;
Messages exceeding 64 KB payload limit
Anti-PatternUsing length() instead of octet_length() on text for byte-size checks

`length(str)` on text counts characters (code points), not bytes. For multibyte UTF-8 strings like emoji or CJK characters, `length` silently underestimates the byte size, letting oversized values past byte-limit checks.

✓ Instead: Use `octet_length(col) <= N` for all byte-budget validations (network buffers, wire protocols). Reserve `char_length(col)` for UI display-character limits.

For multibyte text (UTF-8), `length(str)` counts characters while `octet_length(str)` counts bytes. Use `octet_length` to check if a value fits a byte-limited field or network buffer.

example
SELECT name, length(name) AS chars, octet_length(name) AS bytes FROM users WHERE octet_length(name) > 100;
Names that exceed 100 bytes in UTF-8

overlay

PG 8.0+bytea

Replaces a portion of a binary string starting at a given byte position.

DeveloperData EngSecurity

Signature

overlay ( bytes bytea PLACING newsubstring bytea FROM start integer [ FOR count integer ] ) → bytea

Parameters

ParameterTypeDescription
bytesbyteaOriginal binary string
newsubstringbyteaBinary string to insert
startintegerStarting byte position (1-based)
countintegerNumber of bytes to replace (defaults to length of newsubstring)

Examples

sql
SELECT overlay('\x1234567890'::bytea placing '\x002003'::bytea from 2 for 3);
\x12002003090a
sql
SELECT overlay(template placing new_checksum from octet_length(template) - 1 for 2) AS patched FROM frames;
Frame with last 2 bytes replaced by new checksum

Replace single byte at position 3

sql
SELECT overlay('\xAABBCCDDEE'::bytea placing '\xFF'::bytea from 3 for 1);
\xaabbffddee
sql
SELECT overlay(record placing decode(lpad(to_hex(new_seq), 4, '0'), 'hex') from 1 for 2) FROM records;
First 2 bytes updated with new sequence number
Anti-PatternNot specifying FOR count when replacement length differs from target region

If you omit `FOR count`, `overlay` replaces exactly as many bytes as the replacement string contains. If the replacement is longer or shorter than the target region, the total length of the bytea result changes silently, breaking fixed-frame binary protocols.

✓ Instead: Always specify `FOR count` explicitly when patching binary frames: `overlay(data PLACING new_val FROM offset FOR length_of_region)`.

Use `overlay` to patch known-offset fields in binary data structures without decoding the entire record. Combine with `set_byte` for single-byte changes.

example
SELECT overlay(firmware placing new_header FROM 1 FOR 16) AS patched FROM firmware_images;
Firmware with header bytes replaced

position

PG 8.0+integer

Returns the first starting byte index of the specified substring within the binary string, or zero if not present.

DeveloperData EngSecurity

Signature

position ( substring bytea IN bytes bytea ) → integer

Parameters

ParameterTypeDescription
substringbyteaBinary string to search for
bytesbyteaBinary string to search within

Examples

sql
SELECT position('\x5678'::bytea in '\x1234567890'::bytea);
3
sql
SELECT position('\xFF'::bytea in data) AS ff_offset FROM binary_data;
Position of first 0xFF byte
sql
SELECT position('\xDEAD'::bytea in '\xBEEFDEADCAFE'::bytea);
3
sql
SELECT id FROM packets WHERE position('\x0D0A0D0A'::bytea IN raw_data) > 0;
Packets containing HTTP double-CRLF separator
Anti-PatternChecking position() IS NOT NULL instead of > 0

`position(sub IN bytes)` always returns an integer — 0 when the subsequence is not found, never NULL. Testing `IS NOT NULL` is always true and finds nothing useful. This silently passes all rows through when filtering for presence.

✓ Instead: Use `WHERE position('\\xFF'::bytea IN data) > 0` to test for presence, not `IS NOT NULL`.

Use `position` to locate delimiters or magic sequences in binary data. A return value of 0 means not found — check with `> 0` rather than `IS NOT NULL`.

example
SELECT id, position('\x0D0A'::bytea IN raw_response) AS crlf_pos FROM http_captures;
Position of CRLF (\r\n) line ending

rtrim

PG 8.0+bytea

Removes the longest string containing only bytes from bytesremoved from the end (right) of bytes.

DeveloperData EngSecurity

Signature

rtrim ( bytes bytea, bytesremoved bytea ) → bytea

Parameters

ParameterTypeDescription
bytesbyteaBinary string to trim
bytesremovedbyteaSet of bytes to remove from the right

Examples

sql
SELECT rtrim('\x1234567890'::bytea, '\x9012'::bytea);
\x12345678
sql
SELECT rtrim('\x12340000'::bytea, '\x00'::bytea);
\x1234
sql
SELECT rtrim(padded_blob, '\xFF'::bytea) AS unpadded FROM fixed_records;
Blob with trailing 0xFF padding removed
sql
SELECT rtrim(fixed_field, '\x20'::bytea) AS trimmed FROM char_fields;
Fixed-width field with trailing space bytes (0x20) removed
Anti-PatternAssuming rtrim removes a fixed-length suffix

`rtrim(data, mask)` greedily removes any byte in the mask set from the right. If valid payload bytes at the end happen to be in the mask, they are also stripped, silently corrupting the binary value.

✓ Instead: Use `substring(data FOR octet_length(data) - N)` to strip exactly N trailing bytes rather than `rtrim` when those byte values appear in the content.

Fixed-length binary fields often have trailing null or 0xFF padding bytes. Use `rtrim(data, '\x00'::bytea)` or `rtrim(data, '\xFF'::bytea)` to remove them.

example
SELECT rtrim(padded_field, '\x00'::bytea) AS unpadded FROM fixed_width_records;
Binary field with trailing zeros removed

set_byte

PG 7.4+bytea

Sets the value of the n-th byte in the binary string to newvalue (0-255). Returns the modified bytea.

DeveloperData EngSecurity

Signature

set_byte ( bytes bytea, n integer, newvalue integer ) → bytea

Parameters

ParameterTypeDescription
bytesbyteaBinary string to modify
ninteger0-based byte index to set
newvalueintegerNew byte value (0–255)

Examples

sql
SELECT set_byte('\x1234567890'::bytea, 2, 255);
\x1234ff7890
sql
SELECT set_byte(data, 0, 0) FROM packets;
Sets first byte to 0x00
sql
SELECT set_byte(set_byte(header, 0, 2), 1, 0) AS patched_header FROM frames;
Version set to 2, second byte zeroed
sql
UPDATE packets SET data = set_byte(data, 3, get_byte(data, 3) | 128) WHERE id = $1;
Sets the high bit of byte 3 in-place
Anti-PatternUsing set_byte with a value outside 0–255

`set_byte` requires a newvalue in [0, 255]. Passing a value outside this range raises an error at runtime. There is no silent wrapping or truncation — validation must happen in application code before the call.

✓ Instead: Validate the byte value before calling: `WHERE new_val BETWEEN 0 AND 255`. Also remember set_byte is 0-indexed; `set_byte(data, 0, v)` modifies the first byte, not the second.

To flip a specific bit in byte n: `set_byte(data, n, get_byte(data, n) | (1 << bit_pos))` to set it, or `& ~(1 << bit_pos)` to clear it.

example
SELECT set_byte(flags, 0, get_byte(flags, 0) | 4) AS flags_with_bit2_set FROM config;
Byte with bit 2 set

sha256

PG 11+bytea

Computes the SHA-256 hash of the binary string, returning a 32-byte (256-bit) bytea.

DeveloperData EngSecurity

Signature

sha256 ( bytea ) → bytea

Parameters

ParameterTypeDescription
bytesbyteaBinary data to hash

Examples

sql
SELECT encode(sha256('hello'::bytea), 'hex');
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
sql
SELECT encode(sha256(file_data), 'hex') AS hash FROM stored_files;
SHA-256 hash of each file
sql
SELECT id FROM documents WHERE sha256(content) = decode($1, 'hex');
Find document matching a known SHA-256 hash
sql
SELECT id, encode(sha256(payload), 'hex') AS integrity_hash FROM messages ORDER BY id;
Per-row integrity fingerprint for audit log
Anti-PatternStoring SHA-256 hashes as text hex strings instead of bytea

Storing `encode(sha256(data), 'hex')` as text uses 64 bytes per hash. Storing the raw `sha256(data)` bytea uses 32 bytes. The text form also requires a `decode` on lookup, making index comparisons more complex.

✓ Instead: Store `sha256(data)` directly in a `bytea` column. Use `encode(sha256_col, 'hex')` only for display: `WHERE sha256_col = decode($1, 'hex')`.

Store SHA-256 hashes as `bytea` (32 bytes, compact) rather than `text`. Use `encode(sha256(data), 'hex')` only for display. Index comparisons on `bytea` columns are fast.

example
SELECT id FROM documents WHERE sha256(content) = decode($1, 'hex');
Find document by SHA-256 content hash

substring

PG 8.0+bytea

Extracts a substring from a binary string by byte position.

DeveloperData EngSecurity

Signature

substring ( bytes bytea [ FROM start integer ] [ FOR count integer ] ) → bytea

Parameters

ParameterTypeDescription
bytesbyteaBinary string to extract from
startintegerStarting byte position (1-based)
countintegerNumber of bytes to extract

Examples

sql
SELECT substring('\x1234567890'::bytea from 2 for 3);
\x345678
sql
SELECT substring(file_header from 1 for 4) AS magic_bytes FROM files;
First 4 bytes (magic number)

Omitting FOR returns bytes to end of string

sql
SELECT substring('\xDEADBEEF'::bytea from 3);
\xbeef
sql
SELECT encode(substring(frame from 5 for 2), 'hex') AS frame_type FROM packets;
2-byte type field from byte offset 5
Anti-PatternMixing 1-based substring indexing with 0-based get_byte indexing

`substring(data FROM 1 ...)` uses 1-based positioning. `get_byte(data, 0)` uses 0-based positioning. When parsing the same binary frame with both functions, mixing conventions causes persistent off-by-one errors that are hard to debug.

✓ Instead: Use `substring` consistently for multi-byte extractions (1-based) and `get_byte` for single bytes (0-based). Add comments at call sites to make the indexing convention explicit.

Binary file formats (PNG, PDF, ZIP) have magic bytes at known offsets. Use `substring(data FROM offset FOR length)` to read and verify them without external tools.

example
SELECT id, encode(substring(data FROM 1 FOR 4), 'hex') AS magic FROM stored_files;
Magic bytes: 89504e47=PNG, 25504446=PDF