Platform: DJI Tello + Nooploop LinkTrack UWB + Workstation GPU
Stack: Python 3.8 · ROS Noetic · ZoeDepth · PyTorch · tellopy · PyBullet
Table of Contents
- Introduction
- Related Work and Design Rationale
- System Architecture
- Hardware Setup
- Sensor Preprocessing
- Monocular Depth Estimation
- Obstacle Segmentation and 3D Reconstruction
- UWB Localisation
- Artificial Potential Field Controller
- Software Architecture
- Simulation Environment
- Coordinate Frame Conventions
- Safety Architecture
- Known Limitations and Failure Modes
- Discussion
- References
1. Introduction
Autonomous indoor navigation for small unmanned aerial vehicles (UAVs) remains a challenging open problem. Unlike outdoor flight where GPS provides reliable absolute localisation and obstacle density is low, indoor environments impose a distinct set of constraints: reflective and textureless surfaces degrade vision-based depth estimation, GPS is unavailable, ceiling and wall clutter confounds obstacle detectors, and limited flight volume demands precise localisation and tight control margins.
Commercial micro-UAVs such as the DJI Tello compound these difficulties further. The Tello carries a single forward-facing monocular camera, no rangefinders, and no onboard compute beyond basic flight stabilisation. All perception and control must run on a connected laptop over a Wi-Fi link that is shared with the video stream, introducing non-trivial latency and the possibility of frame loss.
This project develops and evaluates a lightweight perception-control pipeline that addresses these constraints directly. The approach combines:
- Monocular depth estimation (ZoeDepth) to produce per-frame depth maps from the Tello’s camera without stereo hardware,
- UWB absolute localisation (Nooploop LinkTrack) to provide a stable, GPS-independent position estimate,
- Artificial Potential Fields (APF) as the control policy, chosen for its simplicity, real-time computational cost, and natural handling of dynamic obstacle avoidance in continuous space.
The result is an end-to-end system that takes live sensor inputs and produces drone velocity commands at approximately 5 Hz, capable of navigating from a start position to a goal while avoiding detected obstacles within a bounded indoor flight volume.
2. Related Work and Design Rationale
2.1 Monocular Depth for UAV Obstacle Avoidance
Monocular depth estimation from deep neural networks has matured significantly since the seminal work of Eigen et al. (2014). Recent models such as MiDaS (Ranftl et al., 2020), DPT (Ranftl et al., 2021), and ZoeDepth (Bhat et al., 2023) achieve competitive metric depth accuracy on indoor benchmarks without stereo calibration.
For UAV applications, monocular depth has been explored primarily in simulation or with post-processed video. Deploying it in a real-time control loop introduces a fundamental tension: inference latency versus control frequency. On a mid-range GPU, ZoeDepth processes a 960×720 frame in approximately 0.5–2 seconds, making per-frame inference impractical for 30 Hz control. This project resolves the tension by decoupling the depth update rate from the control loop rate: depth estimation runs every 250 frames (~10 s at 24 fps) while the APF controller runs at the UWB callback rate (~5–10 Hz).
2.2 UWB Localisation
Ultra-Wideband (UWB) time-difference-of-arrival (TDoA) and two-way ranging (TWR) systems have emerged as a practical alternative to motion capture for indoor localisation. The Nooploop LinkTrack system achieves sub-decimetre position accuracy at ~100 Hz update rates, well within the requirements of a 5 Hz control loop. Unlike visual-inertial odometry, UWB is robust to lighting changes and motion blur: precisely the conditions that degrade monocular depth estimation.
The combination of UWB (for position) and monocular depth (for obstacle detection) is therefore complementary: UWB provides a stable world-frame anchor while depth provides local obstacle geometry.
2.3 Artificial Potential Fields
APF, introduced by Khatib (1986), represents the environment as a scalar potential function whose negative gradient gives the robot’s desired velocity. The potential has an attractive component toward the goal and repulsive components around obstacles. APF has well-known limitations: most notably local minima and oscillation in narrow corridors: but it is computationally trivial (linear in the number of obstacles), inherently continuous, and easy to extend with boundary walls. For a prototype system with a small, bounded flight space and a single primary goal, these properties are well matched to the task.
3. System Architecture
The system is structured as a reactive control loop driven by two asynchronous event streams:
UWB callback (ROS, ~100 Hz)
└─ update cur_pos
└─ check bounds (emergency land if violated)
└─ run active_task()
└─ follow_path()
└─ [if depth_model_run] compute APF forces
└─ send velocity commands to drone
Video thread (PyAV, ~24 fps)
└─ decode H.264 frame
└─ write to raw AVI file
└─ [every 250 frames] spawn depth thread:
└─ ZoeDepth inference
└─ K-means obstacle segmentation
└─ 3D reconstruction
└─ update obstacle list
└─ save annotated images
The UWB callback is the primary control tick. The video thread runs independently and updates the obstacle map at a lower rate. This separation ensures that video decode errors or slow depth inference do not stall the control loop.
All state: current position, obstacle list, target position, model handles: is held on the TelloDrone object. Thread safety is achieved implicitly: the obstacle list is only written by the depth thread and only read by the control thread, with no locking. This is acceptable given the low write frequency (once per 250 frames) and the tolerance of the APF controller to stale obstacle data.
4. Hardware Setup
4.1 DJI Tello
| Parameter | Value |
|---|---|
| Camera | 5 MP, fixed focus, 82.6° FOV |
| Video output | 960 × 720, H.264, ~24 fps over Wi-Fi |
| SDK | tellopy (unofficial, lower-latency than DJI official SDK) |
| Battery life | ~13 min (reduced under load) |
| Max speed | ~8 m/s (constrained to ~0.3 m/s in this system) |
| Communication | 802.11n Wi-Fi, 2.4 GHz |
The Tello is controlled via the tellopy library, which provides an event-driven interface for flight data (altitude, speed, battery, Wi-Fi signal) and direct velocity commands. Commands take the form drone.forward(speed), drone.right(speed), etc., where speed is an integer in [0, 100] representing a percentage of the maximum speed for that axis.
4.2 Nooploop LinkTrack UWB
The Nooploop LinkTrack system uses two-way ranging between anchors and the tag mounted on the drone. Three or more anchors are required for full 3D localisation; the experimental setup used four anchors placed at known positions in the flight space.
Position data is published to the ROS topic /nlink_linktrack_nodeframe1 as a LinktrackNodeframe1 message. The relevant field is nodes[0].pos_3d, a three-element float array in metres.
| Parameter | Value |
|---|---|
| Ranging method | Two-way ranging (TWR) |
| Reported accuracy | ~10 cm (manufacturer specification) |
| Update rate | ~100 Hz |
| Communication | USB serial to laptop → nlink_parser ROS node |
4.3 Workstation
All perception and control runs on a connected laptop. A discrete GPU is strongly recommended for ZoeDepth inference speed, though CPU-only operation is functional at reduced inference rate.
5. Sensor Preprocessing
5.1 Camera Calibration
The Tello camera is calibrated using a 9×7 chessboard pattern (square size 20 mm). Calibration images are collected from the live drone feed via task/test_video.py and stored in calibrate/. The calibration routine in task/camera_calibrate.py uses OpenCV’s calibrateCamera, producing the camera matrix and distortion coefficients .
where are the focal lengths in pixels and is the principal point. These are saved to calibration_data.npz and loaded at runtime by map_obstacle.py.
5.2 Image Undistortion
Point-level undistortion is applied in undistort_point() using cv2.initUndistortRectifyMap before projecting obstacle centroids to 3D. This corrects for the Tello’s notable barrel distortion at the edges of the 82.6° FOV.
5.3 Video Decoding
The Tello’s H.264 stream is accessed via tellopy’s get_video_stream() method and decoded with PyAV (av.open). Frames are decoded in a dedicated video thread to avoid blocking the control loop. Each decoded frame is written to a raw AVI file (XVID codec, 24 fps) for post-flight analysis.
6. Monocular Depth Estimation
6.1 Model: ZoeDepth
ZoeDepth (Zero-shot transfer of metric depth) is a transformer-based monocular depth estimator that extends MiDaS-style relative depth to produce metric (absolute) depth estimates. The ZoeDepthNK variant is trained jointly on NYU Depth v2 (indoor) and KITTI (outdoor), making it suitable for indoor drone footage without domain-specific fine-tuning.
The model is loaded from a local directory via the HuggingFace Transformers API:
image_processor = ZoeDepthImageProcessor.from_pretrained("model/zoedepth-nyu-kitti")
depth_model = ZoeDepthForDepthEstimation.from_pretrained("model/zoedepth-nyu-kitti")
Model weights are approximately 1.5 GB and are excluded from the repository.
6.2 Inference Pipeline
For each depth cycle:
- The current frame (
ndarray, BGR) is converted to a PIL RGB image. ZoeDepthImageProcessor.preprocessresizes and normalises the image to the model’s expected input format.ZoeDepthForDepthEstimation.forwardis called undertorch.no_grad().post_process_depth_estimationupsamples the output depth map to the original frame resolution.
The result is a float32 tensor of shape representing metric depth in metres per pixel. A normalised uint8 version is also computed for visualisation:
6.3 Depth Update Rate
Depth inference runs every 250 video frames. At 24 fps, this is approximately one depth update every 10.4 seconds. The design trade-off is:
- Too frequent: ZoeDepth inference on CPU can take 2–10 s per frame, blocking the depth thread and starving the video writer.
- Too infrequent: The obstacle map becomes stale as the drone moves.
At the commanded speeds (maximum 30/100 scale ≈ ~1–2 m/s), the drone travels at most 20 m between depth updates, which would be catastrophic in a 7.5 × 5 m space. In practice, the approach speed toward obstacles is slow enough (and APF forces are reactive) that the 10 s update rate is tolerable for the prototype, but is a clear area for improvement (see §15.1).
6.4 Position Anchoring
Each depth snapshot is anchored to the drone’s current world-frame position. Rather than using the instantaneous UWB reading (which may be noisy), a running average over the last 100 position readings is computed:
where is the -th UWB position reading and is the current reading count. This averaged position is used to offset obstacle coordinates from the camera frame to the world frame (§7.4).
7. Obstacle Segmentation and 3D Reconstruction
7.1 Overview
The obstacle detection pipeline in task/tellodrone/map_obstacle.py converts a ZoeDepth absolute depth map into a list of 3D obstacles, each represented as a sphere in world coordinates. The pipeline proceeds in five stages: depth clustering, segment extraction, morphological cleaning, contour detection, and 3D projection.
7.2 Depth Clustering via K-Means
The depth map is flattened to a vector of scalar depth values. -means clustering (OpenCV cv2.kmeans) partitions these values into clusters with centres , sorted by depth value. The output is a label map assigning each pixel to its nearest cluster.
The rationale for depth-space clustering (rather than spatial clustering) is that nearby objects form coherent regions in depth: a wall at 2 m occupies a different cluster from a chair at 0.8 m: making the clusters a natural proxy for distinct surfaces in the scene.
7.3 Near-Cluster Extraction and Filtering
The clusters with the smallest centre values (i.e., the nearest surfaces) are extracted as binary maps:
Two filtering steps then remove scene structure that is not an obstacle:
Row density filter. Rows in which more than 85% of pixels are marked as foreground are zeroed. This removes the floor, ceiling, and back walls, which appear as near-continuous horizontal bands when the drone is flying parallel to a surface.
Formally, row is zeroed if:
Morphological opening and component filtering. An 11×11 rectangular structuring element is used for morphological opening to remove thin noise filaments. Connected components with area below px² are then discarded:
At 960×720 resolution, this minimum area corresponds to roughly a 141×141 px region: approximately an object subtending 15% of the frame width, which at 1 m depth corresponds to a real-world object of ~0.3 m width.
7.4 Contour Detection and Minimum Enclosing Circle
Obstacle regions are detected using cv2.findContours on each cleaned binary segment. Each contour yields:
- Centroid: from image moments
- Pixel radius: from
cv2.minEnclosingCircle
7.5 3D Reconstruction via Camera Intrinsics
Each obstacle centroid is first undistorted using the calibration map:
The depth at the centroid is estimated as the mean over a 5×5 pixel neighbourhood to reduce noise:
The 3D position in the camera frame is then:
The offset on each axis accounts for the approximate displacement between the camera optical centre and the drone’s UWB tag mounting point. This is a fixed calibration offset and not dynamically estimated.
The physical obstacle radius is computed from the pixel radius via the pinhole model:
Finally, the camera-frame position is translated to world frame by adding the drone’s averaged UWB position :
7.6 Obstacle List Maintenance
The obstacle list is maintained as a set of spheres . When new obstacles are detected, they are merged with the existing list using a distance threshold m: if a new obstacle’s centre lies within of an existing obstacle’s centre, the existing obstacle is replaced. This prevents unbounded list growth during repeated depth cycles.
Two validity filters are applied before adding new obstacles:
- Bounds check: must lie within the configured flight envelope.
- Radius sanity: m. Larger radii typically correspond to walls or floors that escaped the row filter.
8. UWB Localisation
8.1 Nooploop LinkTrack Integration
Position data flows from the LinkTrack hardware via USB serial to the nlink_parser ROS node, which publishes LinktrackNodeframe1 messages on /nlink_linktrack_nodeframe1 at approximately 100 Hz. The main.py ROS subscriber extracts nodes[0].pos_3d and passes it directly to task_handler() on the TelloDrone object.
The position is stored as a Vector3D and appended to a line-delimited log file (logs/log-*/log-pos.log) on every callback:
<timestamp> <elapsed_seconds> <x> <y> <z>
8.2 Coordinate Frame
The UWB coordinate frame is determined by anchor placement. In the experimental setup:
| Axis | Direction | Range |
|---|---|---|
| Forward (into room) | m | |
| Lateral (rightward) | m | |
| Vertical (negative = up) | m |
The drone faces in the direction (toward the room entrance), which is why the follow_path controller maps negative -force to drone.forward().
8.3 Position Noise and Averaging
UWB systems exhibit multipath noise, particularly near metal surfaces and in corners. No explicit filtering is applied to the raw position readings; the 100-sample running average used in depth anchoring (§6.4) provides partial noise reduction for that specific use case. The control loop uses the raw, unfiltered current position.
9. Artificial Potential Field Controller
9.1 Potential Function
The total scalar potential field is:
Attractive potential:
Repulsive potential (per obstacle ):
where is the surface-to-surface distance from the drone to obstacle , and m is the influence radius.
Boundary repulsive potential (per axis, per wall):
The same formula is applied to the signed distance from each of the six boundary planes, yielding six additional repulsive components.
9.2 Force Computation
The control force is the negative gradient of the potential:
Attractive force:
This is linear distance attraction (gradient of a quadratic potential).
Repulsive force (per obstacle ):
Boundary repulsive force (min -wall example):
where m is the boundary influence distance and is the unit vector in the direction. Analogous formulas apply to the other five walls.
9.3 Velocity Mapping
The computed force vector is converted to drone velocity commands:
where is the current distance to the goal. Dividing by normalises the command magnitude relative to distance, preventing very large forces at long range from sending maximum-speed commands. Each component is then clamped:
The sign of determines the axis direction (forward/backward, left/right, up/down).
9.4 Close-Range Gain Reduction
When m, the attractive gain is reduced to prevent overshoot:
This linearly scales the attractive coefficient from its nominal value down to 10 as the drone approaches within 1 m of the goal.
9.5 Parameter Summary
| Parameter | Symbol | Value |
|---|---|---|
| Attractive gain | 30 (10 near goal) | |
| Repulsive gain | 10 | |
| Obstacle influence radius | 0.50 m | |
| Boundary influence distance | 0.50 m | |
| Max velocity command | 30 (of 100 scale) | |
| Arrival threshold | 0.30 m | |
| Control loop sleep | 0.20 s |
9.6 Local Minima
APF controllers are susceptible to local minima where but . This occurs most commonly when the goal lies directly behind a large obstacle. The current implementation does not include any escape mechanism; the drone will stall at such a point. Future work should add a stall detector and perturbation strategy (§15).
10. Software Architecture
10.1 TelloDrone Class Design
The TelloDrone class in task/tellodrone/core.py aggregates all drone state and imports methods from separate module files via class-body from ... import ... statements:
class TelloDrone:
from tellodrone.log import setup_logging, save_log_config
from tellodrone.flight_control import flight_data_callback, check_bounds
from tellodrone.video import setup_display, process_image, process_frame, ...
from tellodrone.task import task_handler
from tellodrone.follow_path import set_target_pos, add_obstacle, follow_path
from tellodrone.depth_model import load_depth_model, run_depth_model, estimate_depth
This pattern allows the class to be split across multiple files without inheritance, keeping each file focused on a single concern while sharing self state naturally.
10.2 Threading Model
| Thread | Purpose | Lifecycle |
|---|---|---|
| Main thread | ROS spin (UWB callbacks → control) | Runs until shutdown |
| Video thread | PyAV frame decode + write | Started at startup() |
| Depth thread | ZoeDepth inference (spawned per-cycle) | Short-lived, per inference |
| Image task thread | Manual depth trigger from UI | Short-lived, on button press |
| Display thread | Pygame event loop + video display | Started at setup_display() |
A threading.Event (stop_video_thread_event) coordinates clean shutdown of the video thread. The display thread terminates the process via shutdown() when the Pygame window is closed.
10.3 Logging
Three log artefacts are written per flight:
logs/log-<run>/log-info.log: human-readable INFO/DEBUG/ERROR log with timestampslogs/log-<run>/log-pos.log: line-delimited UWB position readings (timestamp, elapsed seconds, x, y, z)logs/log-<run>/log-config.json: JSON snapshot of takeoff position, start position, end position, target, and full obstacle list at shutdown
Image artefacts:
img/original/<run>/frame-<N>.png: raw camera frame at depth cycleimg/depth/<run>/frame-<N>.png: normalised depth mapimg/annotated/<run>/frame-<N>.png: frame with obstacle bounding circles and 3D coordinates overlaid
11. Simulation Environment
11.1 PyBullet Simulator
task/sim.py and task/tellodrone_sim/core.py implement a physics simulation using PyBullet. The simulation uses the same apf_with_bounds function as the live drone, so controller parameters can be tuned offline before flight.
The environment is populated with:
- Drone model:
duck_vhacd.urdf(PyBullet built-in), scaled ×5 - Target:
soccerball.urdf, fixed, scaled ×0.5 - Obstacles:
sphere2.urdfinstances at randomised positions - Boundary walls: Translucent box visual shapes (no collision) marking the flight envelope
Obstacle positions are generated by generate_obstacles(num, radius) using uniform random sampling within the flight bounds.
11.2 A* Path Planner
The simulation additionally implements a 3D grid-based A* planner (a_star_waypoints) for comparison against APF. The planner discretises the flight volume at a configurable resolution (default 0.5 m) and finds a collision-free waypoint sequence, which the APF controller then follows as local goals.
11.3 Path Planning Modes
The simulator supports three navigation modes (configured by commenting/uncommenting in sim.py):
- Direct APF: Single waypoint = target; APF navigates directly.
- A* + APF: A* generates waypoints; APF follows each in sequence.
- Sine path + APF: Sinusoidal waypoints for testing oscillation response.
12. Coordinate Frame Conventions
Three coordinate frames are relevant:
12.1 Camera Frame
Standard pinhole camera convention: rightward, downward, forward (into scene). Depth is the -component of the 3D point at pixel .
12.2 World Frame (UWB)
Determined by anchor placement. In the experimental setup:
The drone’s hover altitude of approximately to m in reflects the negative-up convention.
12.3 Camera-to-World Transform
The camera-to-world transform is implemented as a fixed offset rather than a full rigid-body transform:
where m is the hardcoded camera-to-tag offset. A full extrinsic calibration (rotation + translation between camera and UWB tag) would improve obstacle localisation accuracy.
12.4 Drone Command Frame
The Tello faces in the direction. The command mapping is:
| Force direction | tellopy command |
|---|---|
drone.forward() | |
drone.backward() | |
drone.right() | |
drone.left() | |
drone.down() | |
drone.up() |
13. Safety Architecture
Safety is implemented as a layered set of shutdown triggers, all of which call TelloDrone.shutdown(error=True), which in turn calls drone.land() before exiting:
| Trigger | Threshold | Source |
|---|---|---|
| Battery critical | Flight data callback | |
| Battery warning | Flight data callback (logs only) | |
| Out of bounds | Outside configured flight envelope | UWB callback, every step |
| Keyboard interrupt | Ctrl+C | signal.SIGINT handler |
| Drone not connected | Connection timeout | Startup check |
| Pygame window closed | : | Display thread exit |
The shutdown() method attempts a clean land regardless of error state:
def shutdown(self, error=False, reason=None):
self.drone.backward(0) # cancel any active velocity command
self.drone.land()
self.stop_video_thread()
pygame.quit()
self.drone.quit()
rospy.signal_shutdown(...)
sys.exit(0)
14. Known Limitations and Failure Modes
14.1 Depth Estimation Latency
A ZoeDepth inference pass on CPU takes approximately 2–10 seconds per frame depending on hardware. During this time, the depth thread blocks the image write pipeline. The obstacle map is not updated, meaning the drone may move significantly between depth cycles. On GPU (e.g. NVIDIA RTX 3060), inference takes approximately 0.3–0.8 s, reducing this window substantially.
Consequence: The drone may navigate through a region before its obstacles are detected.
Mitigation (partial): The 250-frame interval ensures the depth thread finishes before the next cycle begins. The APF controller is conservative (max speed 30/100) to limit exposure during stale periods.
14.2 K-Means Non-Determinism
cv2.kmeans with KMEANS_RANDOM_CENTERS is non-deterministic. On frames with similar depth distributions (e.g. nearly empty corridor), cluster boundaries may shift significantly between runs, causing obstacle detections to appear and disappear erratically.
Consequence: Obstacle list instability between depth cycles.
Mitigation (partial): The merge threshold of 0.5 m and the minimum area filter of 20,000 px² reduce spurious detections, but do not eliminate them.
14.3 UWB Multipath and Dropout
In environments with metal surfaces or dynamic occlusion of UWB anchors, position jumps of 0.5–1 m can occur. The current system applies no filtering to the raw UWB stream for control purposes.
Consequence: A position jump can trigger a false out-of-bounds detection and an immediate emergency landing.
Mitigation: The flight bounds are set conservatively (0.5 m inset from physical walls), but this does not prevent brief out-of-bounds readings.
14.4 APF Local Minima
As noted in §9.6, the APF controller has no escape mechanism. In environments with obstacles between the start and goal positions, the drone may become trapped at a saddle point in the potential field.
Consequence: The drone hovers motionless until the battery depletes, or a user intervention is needed.
14.5 Fixed Camera-to-Tag Offset
The m camera-to-tag offset in compute_3d() is hardcoded and does not account for drone attitude (pitch/roll), which changes during flight. Obstacle positions will be incorrectly anchored when the drone is not level.
14.6 No Temporal Fusion of Depth
Each ZoeDepth output is processed independently. There is no tracking of obstacles across frames: an obstacle detected in frame 250 is simply merged (by proximity) with the list from the prior cycle. This means:
- Moving obstacles are not tracked (only their last detected position is stored).
- A stationary obstacle that the drone has passed will remain in the list until its position falls outside the flight bounds.
15. Discussion
15.1 Inference Speed vs. Update Rate
The central design tension in this system is that ZoeDepth inference is slow relative to the UAV’s control bandwidth. The gap between obstacle map update rate (~0.1 Hz) and control rate (~5 Hz) means the drone is operating primarily on a frozen obstacle map. For low-speed navigation in structured environments (slow-moving drone, static obstacles), this is acceptable. For higher-speed or dynamic scenarios, it is not.
Lighter models (MiDaS small, Depth Anything small) achieve inference rates of 10–30 Hz on a mid-range GPU, which would enable near-per-frame depth updates. The trade-off is reduced metric accuracy on indoor scenes.
15.2 UWB vs. Visual Odometry
UWB provides accurate absolute position but requires physical infrastructure (anchor installation and surveying). Visual-inertial odometry (VIO) can achieve similar accuracy in textured environments without fixed infrastructure, at the cost of drift over time. A hybrid approach: VIO for relative motion, UWB for periodic drift correction: would be more deployable across environments.
15.3 APF vs. Sampling-Based Planning
APF is computationally trivial and well-suited to real-time reactive control. Its failure cases (local minima, oscillation in narrow passages) are well-understood. Sampling-based planners (RRT*, informed RRT*) are more complete but require a more accurate environment model and are computationally heavier. The A* planner in the simulation is an intermediate option when a grid-resolution environment map is available.
15.4 Prototype Status
This system is explicitly a research prototype. It demonstrates the viability of the perception-control architecture and provides a foundation for more capable systems, but several components would need to be hardened before deployment outside a controlled lab:
- Extrinsic calibration between camera and UWB tag
- Position filtering (Kalman filter or moving average on raw UWB)
- Temporal depth fusion (EKF on obstacle positions)
- APF local minima escape
- Real-time depth model (GPU or lighter model)
- Formal safety certification for any human-occupied space
16. References
- Bhat, S. F., Birkl, R., Wofk, D., Wonka, P., & Müller, M. (2023). ZoeDepth: Zero-shot transfer by combining relative and metric depth. arXiv:2302.12288.
- Eigen, D., Puhrsch, C., & Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. NeurIPS 2014.
- Khatib, O. (1986). Real-time obstacle avoidance for manipulators and mobile robots. The International Journal of Robotics Research, 5(1), 90–98.
- Ranftl, R., Bochkovskiy, A., & Koltun, V. (2021). Vision Transformers for Dense Prediction. ICCV 2021. (DPT/MiDaS v3)
- Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., & Koltun, V. (2020). Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer. TPAMI.
- Nooploop. (2024). LinkTrack User Manual. https://www.nooploop.com/en/linktrack/
- DJI. (2020). Tello SDK 2.0 User Guide.
- Quigley, M., et al. (2009). ROS: an open-source Robot Operating System. ICRA Workshop on Open Source Software.