LAS Reading and Writing with PDAL#
- Author:
Howard Butler
- Contact:
- Date:
3/27/2017
This tutorial will describe reading and writing ASPRS LAS data with PDAL, discuss the capabilities that PDAL readers.las and writers.las can provide for this format.
Introduction#
ASPRS LAS is probably the most commonly used LiDAR format, and PDAL’s support of LAS is important for many users of the library. This tutorial describes and demonstrates some of the capabilities the drivers provide, points out items to be aware of when using the drivers, and hopefully provides some examples you can use to get what you need out of the LAS drivers.
LAS Versions#
There are five LAS versions – 1.0 to 1.4. Each iteration added some complexity to the format in terms of capabilities it supports, possible data types it stores, and metadata. Users of LAS must balance the features they need with the use of the data by downstream applications. While LAS support in some form is quite widespread throughout the industry, most applications do not support every feature of each version. PDAL works to provide many of these features, but it is also incomplete. Specifically, PDAL doesn’t support point formats that store waveform data.
Version Example#
We can use the minor_version
option of writers.las to set the
version PDAL should output. The following example will write a 1.1 version LAS
file. Depending on the features you need, this may or may not be what you want.
1[
2 {
3 "type" : "readers.las",
4 "filename" : "input.las"
5 },
6 {
7 "type" : "writers.las",
8 "minor_version": 1,
9 "filename" : "output.las"
10 }
11]
Note
PDAL defaults to writing a LAS 1.2 version if no minor_version
is
specified
or the forward
option of writers.las is not used to carry along
a version from a previously read file.
Spatial Reference System#
LAS 1.0 to 1.3 use GeoTIFF keys for storing coordinate system information, while LAS 1.4 uses Well Known Text. GeoTIFF is well-supported by most software that read LAS, but it is not possible to express some coordinate system specifics with GeoTIFF. WKT is supports more coordinate systems than GeoTIFF, but vendor-specific and later versions (WKT 2) may not be handled well.
Assignment Example#
The PDAL writers.las allows you to override or assign the coordinate system to an explicit value if you need. Often the coordinate system defined by a file might be incorrect or non-existent, and you can set this with PDAL.
The following example sets the a_srs
option of the writers.las to
EPSG:4326
.
1[
2 {
3 "type" : "readers.las",
4 "filename" : "input.las"
5 },
6 {
7 "type" : "writers.las",
8 "a_srs": "EPSG:4326",
9 "filename" : "output.las"
10 }
11]
Note
Remember to set offset_x
, offset_y
, scale_x
, and
scale_y
values to something appropriate if your are storing decimal
degree data in LAS files. The special value auto
can be used for the
offset values, but you should set an explicit value for the scale values
to prevent overdriving the precision of the data and disrupting
[Compression] with LASzip.
Vertical Datum Example#
Vertical coordinate control is important in LiDAR and PDAL supports assignment and reprojection/transform of vertical coordinates using and GDAL. The coordinate system description magic happens in GDAL, and you assign a compound coordinate system (both vertical and horizontal definitions) using the following syntax:
EPSG:4326+3855
This assignment states typical 4326 horizontal coordinate system plus a vertical one that represents EGM08. In Well Known Text, this coordinate system is described by:
$ gdalsrsinfo "EPSG:4326+3855"
COMPD_CS["WGS 84 + EGM2008 geoid height",
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4326"]],
VERT_CS["EGM2008 geoid height",
VERT_DATUM["EGM2008 geoid",2005,
AUTHORITY["EPSG","1027"],
EXTENSION["PROJ4_GRIDS","egm08_25.gtx"]],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
AXIS["Up",UP],
AUTHORITY["EPSG","3855"]]
As in [Assignment Example], it is common to need to reassign the coordinate system. The following example defines both the horizontal and vertical coordinate system for a file to UTM Zone 15N NAD83 for horizontal and NAVD88 for the vertical.
1[
2 {
3 "type" : "readers.las",
4 "filename" : "input.las"
5 },
6 {
7 "type" : "writers.las",
8 "a_srs": "EPSG:26915+5703",
9 "filename" : "output.las"
10 }
11]
Note
Any coordinate system description format supported by GDAL’s SetFromUserInput method can be used to assign or set the coordinate system in PDAL. This includes WKT, definitions, or OGC URNs. It is your responsibility, however, to escape or massage any input data to make it be valid JSON.
Reprojection Example#
A common desire is to transform the coordinates of an ASPRS LAS file from one coordinate system to another. The mechanism to do that with PDAL is filters.reprojection.
1[
2 {
3 "type" : "readers.las",
4 "filename" : "input.las"
5 },
6 {
7 "type":"filters.reprojection",
8 "out_srs":"EPSG:26915"
9 },
10 {
11 "type" : "writers.las",
12 "filename" : "output.las"
13 }
14]
Note
If the input data doesn’t specify a projection, you must specify the
in_srs
option of filters.reprojection. in_srs
can also
be used to override an existing spatial reference attached to the input
point set.
Point Formats#
As each revision of LAS was released, more point formats were added. A point format is the fixed set of dimensions that a LAS file stores for each point in the file. For any point format, the size and composition of dimensions is consistent across versions, but users should be aware of some minor interpretation changes based on LAS file version. For example, a classification value of 11 in version 1.4 indicates “Road Surface”, while that value is reserved in version 1.1.
Point Format Example#
Point format or dataformat_id
is an integer that defines the set of fixed
dimensions stored for each point in a LAS file.
All point formats
specify the following dimensions as part of a point record:
X |
Y |
Z |
Intensity |
ReturnNumber |
NumberOfReturns |
ScanDirectionFlag |
EdgeOfFlightLine |
Classification |
ScanAngleRank |
UserData |
PointSourceId |
Because LAS files have no built-in compression, it’s important to use a point format that stores the fewest fields possible that store the desired data. For example, point format 10 uses 45 more bytes per point than point format zero.
If one wanted remove the Red/Green/Blue fields from a LAS file
(one using point format 2), one could simply set the dataformat_id
option to 0. The forward
option can also be set to carry forward
all possible header values from the source file to the new, smaller file.
1[
2 {
3 "type" : "readers.las",
4 "filename" : "input.las"
5 },
6 {
7 "type" : "writers.las",
8 "forward": "all",
9 "dataformat_id": 0,
10 "filename" : "output.las"
11 }
12]
Note
The LASzip storage of GPSTime and Red/Green/Blue fields with no data is perfectly efficient.
Extra Dimensions#
A LAS Point Format ID defines the fixed set of dimensions a file must store, but programs are allowed to store extra data beyond that fixed set. This feature of the format was regularized in LAS 1.4 as something called “extra bytes” or “extra dims”, but previous versions can also store these extra per-point attributes.
Extra Dimension Example#
LAS 1.4 provides for the storage of dimensions not part of the chosen point format by appending them to each point record. PDAL supports this feature when writing files with the “extra_dims” option. The following example will store all source dimensions in the output file and place a description of the dimensions that aren’t part of the point format in an “extra bytes” VLR:
1[
2 "some_non_las_file",
3 {
4 "type" : "writers.las",
5 "extra_dims": "all",
6 "minor_version" : "4",
7 "filename" : "output.las"
8 }
9]
Required Header Fields#
Readers of the ASPRS LAS Specification will see there are many fields that softwares are required to write, with their content mandated by various options and configurations in the format. PDAL does not assume responsibility for writing these fields and coercing meaning from the content to fit the specification. It is the PDAL users’ responsibility to do so. Fields where this might matter include:
project_id
global_encoding
system_id
software_id
filesource_id
Header Fields Example#
The “forward” option of writers.las is the easiest way to get most of what you might want in terms of header settings copied from an input to an output file upon processing. Imagine the scenario of zero’ing out the classification values for an LAS file in preparation for using filters.pmf to reassign them. During this scenario, we’d like to keep all of the other LAS header information, such as [Variable Length Records], extent information, and format settings.
1[
2 {
3 "type" : "readers.las",
4 "filename" : "input.las"
5 },
6 {
7 "type" : "filters.assign",
8 "assignment" : "Classification[0:32]=0"
9 },
10 {
11 "type" : "filters.pmf",
12 "cell_size" : 2.5,
13 "approximate" : false,
14 "max_distance" : 25
15 },
16 {
17 "type" : "writers.las",
18 "forward": "all",
19 "filename" : "output.las"
20 }
21]
Note
If multiple input LAS files are being written to an output file, the
forward
option can only preserve values when they are the same in
all input files. If the values differ, a default will be used (as it
would if the forward
option weren’t supplied). You can specify
specific option values for output that will also override any forwarded
data.
Coordinate Scaling#
LAS stores coordinates as 32 bit integers. It is the user’s responsibility to ensure that the coordinate domain required by the data in the file fits within the 32 bit integer domain. Most coordinate values have digits to the right of the decimal point that must be preserved for sufficient accuracy. Using the scale factor allows for integers to be interpreted as floating point values when read by software.
When writing data to LAS, choosing an appropriate scale factor should take into account not just the maximum precision that can be accommodated by the format, but the actual precision of the data. Using a precision greater than the resolution of the data collection can mislead users as to the actual measurement precision of the data. In addition, it can lead to larger files when writing compressed data with LASzip.
Auto Offset Example#
Users can allow PDAL select scale and offset values for data with the auto
option. This can have some detrimental effects on downstream processing.
auto
for scale values will use the entire 32-bit integer domain.
This maximizes the precision available to
store the data, but this will have a detrimental effect on LASzip storage
efficiency. auto
for offset calculation is just fine, however. When given
the option, choose to store ASPRS LAS data with an explicit scale for the X,
Y, and Z dimensions that represents actual expected data precision, not
artificial storage precision or maximal storage precision.
1[
2 {
3 "type" : "readers.las",
4 "filename" : "input.las"
5 },
6 {
7 "type" : "writers.las",
8 "scale_x":"0.0000001",
9 "scale_y":"0.0000001",
10 "scale_z":"0.01",
11 "offset_x":"auto",
12 "offset_y":"auto",
13 "offset_z":"auto",
14 "filename" : "output.las"
15 }
16]
Compression#
LASzip is an open source, lossless compression technique for ASPRS LAS data.
It is supported by two different software libraries, and it can be used in both
the C/C++ and the JavaScript execution environments. LAZ support is provided
by both readers.las and writers.las. It can be enabled by
setting the compression
option to laszip
.
Compression Example#
Providing a filename with a .laz
extension will write compressed data.
Compression can be turned on explicitly as well:
1[
2 {
3 "type" : "readers.las",
4 "filename" : "input.las"
5 },
6 {
7 "type" : "writers.las",
8 "compression":"laszip",
9 "filename" : "output.laz"
10 }
11]
Variable Length Records#
Variable Length Records, or VLRs, are binary data that the LAS format supports to allow applications to store their own data. Coordinate system information is one type of data stored in VLRs, and many different LAS-using applications store data and metadata with this format capability. PDAL allows users to access VLR information, forward it along to newly written files, and create VLRs that store processing history information.
Common VLR data include:
Coordinate system
Metadata
Processing history
Indexing
Note
There are VLRs that are defined by the specification, and they
have the VLR user_id
of LASF_Spec
or LASF_Projection
.
LASF_Spec
VLRs provide a description of the data beyond that
available in the header. LASF_Projection
VLRs store the spatial
coordinate system of the data.
For LAS 1.0-1.3, the VLR length could be no larger than 65535 bytes. Version 1.4 introduced extended VLRs, stored at the end of the file, which could be up to 4gb in size.
VLR Example#
You can add your own VLRs to files to store processing information or
whatever you
want by providing a JSON block via writers.las vlrs
option that
defines the user_id
and data
items for the VLR. The data
option
must be base64-encoded string output. The data will be converted to binary
information and stored in the VLR when the file is written.
[
"input.las",
{
"type":"writers.las",
"filename":"output.las",
"vlrs": [ {
"description": "A description under 32 bytes",
"record_id": 42,
"user_id": "hobu",
"data": "dGhpcyBpcyBzb21lIHRleHQ="
},
{
"description": "A description under 32 bytes",
"record_id": 43,
"user_id": "hobu",
"data": "dGhpcyBpcyBzb21lIG1vcmUgdGV4dA=="
}
]
}
]
PDAL Metadata#
The writers.las driver supports an option, pdal_metadata
, that writes
two PDAL
VLRs to LAS files. The first is the equivalent of
info’s
--metadata
output. The second is a copy of the output of the
--pipeline
serialization
option that describes all stages and options of the pipeline that created
the file. These two VLRs may
be useful in tracking down processing history of data, allow you to determine
which versions of PDAL may have written a file and what filter options were set
when it was written, and give you the ability to store metadata and other
information via pipeline user_data
from your own applications.
Metadata Example#
The pipeline used to construct the file and all of its Metadata
can be written into VLRs in ASPRS LAS files under the PDAL
VLR key.
1[
2 {
3 "type" : "readers.las",
4 "filename" : "input.las"
5 },
6 {
7 "type" : "writers.las",
8 "pdal_metadata":"true",
9 "filename" : "output.laz"
10 }
11]
Warning
LAS versions prior to 1.4 only support VLRs of at most 64K of information. It is possible, though improbable, that the metadata or pipeline stored in the VLRs will not fit in that space.