Data Import
Sanson supports importing GeoJSON, CSV, and Shapefile data to create geographic collections. Imports are processed asynchronously with real-time progress tracking.
Supported formats
| Format | Extensions | Notes |
|---|---|---|
| GeoJSON | .geojson, .json | Must be a FeatureCollection |
| GeoJSON (gzip) | .geojson.gz, .gz | Auto-detected by extension or magic bytes |
| CSV | .csv | Requires longitude/latitude columns |
| Shapefile | .zip | ZIP containing .shp + .dbf + .prj + .shx |
Import via the Admin UI
- Navigate to the Import page in the admin UI
- Select a GeoJSON, CSV, or Shapefile (.zip) from your computer
- A preview panel appears alongside the form with:
  - Metadata badges (format, feature count, geometry type, SRID)
  - Mini-map showing sample features (first 100)
  - Sample data table (first 5 rows)
  - Form fields auto-filled from the preview (SRID, CSV separator, geo columns)
- Choose the target workspace (defaults to default)
- Adjust the collection name (auto-generated from the filename)
- Adjust the SRID if needed (defaults to 4326 / WGS84, auto-detected for Shapefiles)
- For CSV files, verify the auto-detected separator, longitude column, and latitude column
- Click Import
The UI transitions to a progress view showing:
- Status badge (queued, importing, complete, or failed)
- Progress bar (0-100%)
- Feature count (imported / total)
- Live structured logs
On completion, a View collection button links directly to the collection detail page with map and table views.
Import via the API
GeoJSON:
curl -X POST http://localhost:3000/api/admin/import \
  -F "file=@data/regions-1000m.geojson" \
  -F "workspace_id=<workspace-uuid>" \
  -F "collection_name=regions" \
  -F "srid=4326"

CSV:
curl -X POST http://localhost:3000/api/admin/import \
  -F "file=@data/gares.csv" \
  -F "workspace_id=<workspace-uuid>" \
  -F "collection_name=gares" \
  -F "srid=4326" \
  -F "separator=;" \
  -F "longitude=X_WGS84" \
  -F "latitude=Y_WGS84"

The separator, longitude, and latitude fields are optional; they are auto-detected when not provided.
Shapefile:
curl -X POST http://localhost:3000/api/admin/import \
  -F "file=@data/AleaRG_2025_86_L93.zip" \
  -F "workspace_id=<workspace-uuid>" \
  -F "collection_name=alea_argiles" \
  -F "srid=4326"

For Shapefiles, srid is the target storage SRID. The source SRID is auto-detected from the .prj file and the data is reprojected automatically.
The API returns 202 Accepted:
{
  "import_id": "abc123-...",
  "status": "pending",
  "message": "Import job queued"
}

Monitor progress by polling the job status:
curl http://localhost:3000/api/admin/jobs/<import_id>

See Admin Endpoints for full API details.
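The polling step can be scripted. The following is a minimal Python sketch, not Sanson's own client. It assumes the job endpoint returns JSON with a status field and that the terminal values match the UI badges (complete, failed); check Admin Endpoints for the actual response shape. The names fetch_job and wait_for_import are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:3000"  # host used in the curl examples


def fetch_job(import_id: str) -> dict:
    """GET /api/admin/jobs/<import_id> and decode the JSON body."""
    url = f"{BASE_URL}/api/admin/jobs/{import_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_import(
    import_id: str,
    fetch: Callable[[str], dict] = fetch_job,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict:
    """Poll the job until it reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(import_id)
        # Assumption: terminal states mirror the UI badges "complete" and "failed".
        if job.get("status") in ("complete", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"import {import_id} still running after {timeout}s")
        time.sleep(interval)
```

The fetch parameter is injectable so the loop can be exercised without a running server.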
How it works
The import pipeline is fully asynchronous:
- Upload: the API receives the file and saves it to disk (UPLOAD_DIR). Shapefiles are streamed directly to disk to handle large files efficiently.
- Queue: a pg-boss job is created with the import parameters
- Process: the worker picks up the job and processes the file (format-specific, see below)
- Index: a GIST spatial index is created on the geometry column
- Metadata: collection metadata is updated (bounding box, feature count, geometry type)
- Cleanup: the uploaded file is deleted
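To make the Index and Metadata steps concrete, here is a rough Python sketch that builds the kind of PostGIS statements those steps imply. The exact SQL Sanson runs is not documented here, so treat the index name, the geom column name, and both helper functions as assumptions. ST_Extent and GeometryType are standard PostGIS functions.

```python
def _check_identifier(name: str) -> str:
    # Identifiers are interpolated into SQL below, so reject anything unusual.
    if not name.isidentifier():
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def gist_index_sql(table: str, geom_column: str = "geom") -> str:
    """Statement creating a GIST spatial index on the geometry column."""
    t, g = _check_identifier(table), _check_identifier(geom_column)
    return f'CREATE INDEX IF NOT EXISTS "{t}_{g}_gist" ON "{t}" USING GIST ("{g}");'


def metadata_sql(table: str, geom_column: str = "geom") -> str:
    """Query returning feature count, bounding box, and one geometry type."""
    t, g = _check_identifier(table), _check_identifier(geom_column)
    return (
        f'SELECT count(*) AS feature_count, ST_Extent("{g}") AS bbox, '
        f'min(GeometryType("{g}")) AS geometry_type FROM "{t}";'
    )
```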
GeoJSON specifics
Geometry type handling
GeoJSON files can contain mixed geometry types (e.g., both Polygon and MultiPolygon). Sanson handles this by automatically promoting all geometries to their Multi variant:
- Point becomes MultiPoint
- LineString becomes MultiLineString
- Polygon becomes MultiPolygon
Every geometry is wrapped with ST_Multi() on insert, ensuring consistency within the PostGIS table.
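Sanson performs the promotion in PostGIS with ST_Multi(). The Python sketch below shows the equivalent transformation on a GeoJSON geometry object, which can help when predicting what a stored feature will look like. The function name is hypothetical.

```python
PROMOTIONS = {
    "Point": "MultiPoint",
    "LineString": "MultiLineString",
    "Polygon": "MultiPolygon",
}


def promote_to_multi(geometry: dict) -> dict:
    """Return the Multi variant of a single-part GeoJSON geometry."""
    multi_type = PROMOTIONS.get(geometry["type"])
    if multi_type is None:
        # Already a Multi* geometry (or another type): leave unchanged.
        return geometry
    # A Multi geometry is a list of its single-part coordinate arrays.
    return {"type": multi_type, "coordinates": [geometry["coordinates"]]}
```

For example, a Point at [2.35, 48.85] becomes a MultiPoint with coordinates [[2.35, 48.85]].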
Processing
The worker parses the GeoJSON, creates a PostGIS table with columns inferred from feature properties, and inserts features in batches of 500 with progress updates after each batch.
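The batching behaviour can be illustrated with a short Python sketch. It is not the worker's actual code: insert_batch and report_progress are hypothetical callbacks standing in for the database insert and the job progress update.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

BATCH_SIZE = 500  # batch size stated in the documentation


def batches(items: Iterable, size: int = BATCH_SIZE) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def import_features(
    features: list,
    insert_batch: Callable[[list], None],
    report_progress: Callable[[int, int], None],
) -> int:
    """Insert features batch by batch, reporting progress after each batch."""
    total = len(features)
    done = 0
    for chunk in batches(features):
        insert_batch(chunk)
        done += len(chunk)
        report_progress(done, total)
    return done
```

With 1,200 features this reports progress three times: after 500, 1,000, and 1,200 features.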
CSV specifics
CSV import creates MultiPoint geometry from longitude/latitude columns.
Separator auto-detection
When no separator is specified, Sanson counts occurrences of ;, ,, and tab characters in the header line and picks the most frequent. You can override this with the separator parameter.
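As a rough illustration, the detection described above fits in a few lines of Python. The tie-breaking rule is not documented; this sketch assumes the first candidate in the order ;, comma, tab wins when counts are equal.

```python
CANDIDATES = (";", ",", "\t")


def detect_separator(header_line: str) -> str:
    """Pick the candidate separator that occurs most often in the header line."""
    # max() returns the first candidate on ties (an assumption, see above).
    return max(CANDIDATES, key=header_line.count)
```

A header such as `id;nom;X_WGS84;Y_WGS84` yields `;`.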
Geo column auto-detection
If longitude/latitude column names are not provided, Sanson matches header names against common conventions:
| Longitude | Latitude |
|---|---|
| longitude, lon, lng, long | latitude, lat |
| x_wgs84 | y_wgs84 |
| x, centroid_x | y, centroid_y |
Names are matched by priority: explicit names (longitude, lat...) first, then WGS84-specific (x_wgs84), then generic (x, y) last. Matching is case-insensitive. If no match is found, the import fails with an error asking you to specify columns explicitly.
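The priority rules can be sketched in Python as follows. This is an illustration under assumptions, not the actual implementation: in particular, it matches the longitude and latitude columns independently, which the documentation does not specify.

```python
from typing import Optional, Sequence

# Tiers in priority order: explicit names, WGS84-specific, then generic.
LON_TIERS = (("longitude", "lon", "lng", "long"), ("x_wgs84",), ("x", "centroid_x"))
LAT_TIERS = (("latitude", "lat"), ("y_wgs84",), ("y", "centroid_y"))


def _first_match(headers: Sequence[str], tiers) -> Optional[str]:
    # Case-insensitive lookup that returns the header as written in the file.
    by_lower = {h.strip().lower(): h for h in headers}
    for tier in tiers:
        for name in tier:
            if name in by_lower:
                return by_lower[name]
    return None


def detect_geo_columns(headers: Sequence[str]) -> tuple:
    """Return (longitude_column, latitude_column) or raise if none match."""
    lon = _first_match(headers, LON_TIERS)
    lat = _first_match(headers, LAT_TIERS)
    if lon is None or lat is None:
        raise ValueError("Could not auto-detect longitude/latitude columns")
    return lon, lat
```

Given the headers id, X_WGS84, Y_WGS84, x, y, the WGS84-specific names win over the generic x and y.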
Projected coordinates
Column names like x_l93/y_l93 (Lambert 93) are not auto-detected because they contain projected coordinates in meters, not WGS84 degrees. To import such data, specify the columns and SRID explicitly.
Column types
Column types are inferred from the first 100 rows:
- If all non-empty values are numeric integers → INTEGER
- If all non-empty values are numeric with decimals → DOUBLE PRECISION
- Otherwise → TEXT
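A minimal Python sketch of these inference rules is shown below. Two behaviours are assumptions because the documentation does not cover them: a column that mixes integers and decimals is treated as DOUBLE PRECISION, and a column with no non-empty values falls back to TEXT.

```python
import re
from typing import Sequence

_INTEGER = re.compile(r"[+-]?\d+")
_DECIMAL = re.compile(r"[+-]?\d+(\.\d+)?")


def infer_column_type(values: Sequence[str], sample_size: int = 100) -> str:
    """Infer a PostgreSQL column type from the first `sample_size` rows."""
    sample = [v.strip() for v in values[:sample_size] if v.strip()]
    if not sample:
        return "TEXT"  # assumption: nothing to infer from
    if all(_INTEGER.fullmatch(v) for v in sample):
        return "INTEGER"
    if all(_DECIMAL.fullmatch(v) for v in sample):
        return "DOUBLE PRECISION"
    return "TEXT"
```

Because only the first 100 rows are sampled, a column whose later rows contain text can still be inferred as numeric.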
Skipped rows
Rows with missing or non-numeric longitude/latitude values are skipped without failing the import. The final import summary reports the number of skipped rows.
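The skip rule, together with the MultiPoint geometry that CSV import produces, can be sketched in Python like this. The function name and the returned feature shape are illustrative assumptions.

```python
import csv
import io
import math


def read_points(text: str, separator: str, lon_col: str, lat_col: str):
    """Parse CSV text into MultiPoint features and count skipped rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter=separator)
    features, skipped = [], 0
    for row in reader:
        try:
            lon, lat = float(row[lon_col]), float(row[lat_col])
        except (KeyError, TypeError, ValueError):
            skipped += 1  # missing or non-numeric coordinate
            continue
        if not (math.isfinite(lon) and math.isfinite(lat)):
            skipped += 1  # float() accepts "nan" and "inf"; treat them as invalid
            continue
        features.append({
            "geometry": {"type": "MultiPoint", "coordinates": [[lon, lat]]},
            "properties": dict(row),
        })
    return features, skipped
```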
Shapefile specifics
Shapefile import uses ogr2ogr (GDAL CLI) for direct-to-PostGIS import. This requires ogr2ogr to be installed on the system (gdal-bin on Debian/Ubuntu, gdal on macOS via Homebrew).
How it works
- Inspect: ogrinfo -json -so reads metadata from the ZIP (feature count, geometry type, source SRID from .prj)
- Import: ogr2ogr -f PostgreSQL bulk-inserts the data via COPY, with automatic reprojection to the target SRID
- Multi layers: if the ZIP contains multiple Shapefile layers, only the first is imported (with a warning in the logs)
SRID handling
The source SRID is auto-detected from the .prj file. If no SRID can be detected, the target SRID is used as the source SRID (with a warning). All data is reprojected to the target SRID on import.
Geometry promotion
Like GeoJSON, geometries are promoted to their Multi variant (-nlt PROMOTE_TO_MULTI) for consistency.
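For orientation, the Python sketch below assembles an ogr2ogr command line consistent with the behaviour described in this section. The exact arguments Sanson passes are not documented, so this is an assumption built from standard GDAL options (-nln, -nlt, -s_srs, -t_srs, PG_USE_COPY, and the /vsizip/ prefix). The function name and the connection string are hypothetical.

```python
from typing import List, Optional


def build_ogr2ogr_args(
    zip_path: str,
    pg_dsn: str,
    table: str,
    target_srid: int = 4326,
    source_srid: Optional[int] = None,
) -> List[str]:
    """Build an ogr2ogr argument list for a zipped Shapefile import."""
    # Fallback described above: with no detected source SRID, use the target.
    src = source_srid if source_srid is not None else target_srid
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",
        "-nln", table,                   # destination table name
        "-nlt", "PROMOTE_TO_MULTI",      # promote to Multi variants
        "-s_srs", f"EPSG:{src}",
        "-t_srs", f"EPSG:{target_srid}",  # reproject to the storage SRID
        "--config", "PG_USE_COPY", "YES",  # bulk insert via COPY
        f"PG:{pg_dsn}",                  # destination datasource
        f"/vsizip/{zip_path}",           # read the Shapefile inside the ZIP
    ]
```

The list can be passed to subprocess.run; a FileNotFoundError at that point corresponds to ogr2ogr not being installed or not being on PATH.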
Re-import
When importing into an existing collection (same workspace + name), the data table is dropped and recreated. This ensures that no stale data remains from previous imports.
File size
The maximum upload size is 1 GB. Shapefile uploads are streamed directly to disk without buffering in memory, so large files (hundreds of MB) are handled efficiently. For GeoJSON and CSV, consider using gzip compression for large files.
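A large GeoJSON or CSV file can be compressed before upload with a few lines of Python. The sketch below also shows a magic-byte check: gzip streams always begin with the bytes 0x1f 0x8b, which is presumably what the magic-byte auto-detection looks for. Both helper names are hypothetical.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path) -> bool:
    """Detect gzip content by magic bytes rather than by file extension."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def gzip_file(src) -> Path:
    """Write a compressed copy next to `src` (e.g. regions.geojson.gz)."""
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    # Stream in chunks so large files are never loaded fully into memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```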
Troubleshooting
| Error | Cause |
|---|---|
| Invalid GeoJSON | The file is not valid GeoJSON or is missing the features array |
| Empty FeatureCollection | The GeoJSON file contains no features |
| Workspace not found | The specified workspace_id does not exist |
| Geometry type mismatch | Should not happen with automatic Multi promotion — report as a bug |
| Could not auto-detect longitude/latitude columns | CSV file headers don't match known column names — specify them explicitly |
| CSV must have a header row | CSV file is empty or has only a header with no data rows |
| No layers found in ZIP file | The ZIP doesn't contain a valid Shapefile (.shp) |
| Failed to spawn ogr2ogr | GDAL/ogr2ogr is not installed or not on PATH |
