Multimodal interface for UAV control


Mentor: TBD

Overview:
Note that this project has many different components. Teams may choose to address any of the different problems/needs, as outlined below.

The Navy wishes to use unmanned helicopters to replace road convoys, and potentially manned helicopters performing the same mission. Today, the initial supply request is typically made by Marines over the radio, but it could also conceivably be made by data communication over the Global Information Grid. Either way, the request is interpreted by a human. Marines load the supplies onto the convoy trucks (or helicopters), which must then navigate to the requesting Marines, avoiding threats and obstacles along the way, drop off the supplies and/or pick up casualties, and return to base. We assume that humans will still receive the supply request, pick and pack the supplies, and manually load them onto the Cargo UAS. We assume that all information about the drop-off location, etc. is passed directly through to the Cargo UAS, which then interprets the information and automatically creates its own flight plan (which may be reviewed and modified by a human mission planner). However, relaxing this assumption to allow initial human-based mission planning does not change the interface requirements.

The user interaction approach built to direct and interact with the UAS must be scalable on several dimensions. It must be usable by both low- and high-skill operators, without requiring extensive training or considerable expertise. It must be effective whether the user gives the UAS little, intermittent, or dedicated attention. The interaction methodology should make the information coming from the UAS easy for the operator to visualize. It must be able to rapidly convey small to large amounts of relevant information. It must be usable in both benign and harsh environmental conditions. And it must be implementable on a wide range of hardware, from a small mobile device to a laptop or workstation.

The interface should be easy to understand and should bring the operator's mental model into alignment with how the system actually works, so that the operator's expectations and the system's behavior match. The interface should also build an appropriate level of trust in the operator, without undue distrust or automation bias, and it should convey the UAS's situational awareness.

Additional related requirements can also be identified. Some depend on the capabilities of the on-board autonomy, which are a major factor driving the scalability of the user interaction approach:

1. Minimize the amount of input required from the Marines

For example, Marines can indicate where hostile forces are located and what weapon types they have, but they should not need to tell the UAS how much distance to keep from those locations: the UAS has an on-board knowledge base of weapon types and their effective ranges. Similarly, if the Marines ask the UAS not to disclose their location to enemies in a particular area, they do not have to tell it exactly how to do so (e.g., stay out of visual line of sight, out of earshot, and potentially below radar detection altitude). The UAS can deduce a route satisfying those constraints from the flight path, terrain topology and line of sight, current visibility conditions, an on-board model of its own noise signature, etc., as sketched below.
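
A minimal sketch, in Python, of how such constraint-based filtering of candidate waypoints might look. The weapon-range table, the terrain-backed line-of-sight callable, and the flat (x, y) coordinates are illustrative assumptions, not part of any existing system:

    import math

    # Illustrative effective-range table (meters); the values are placeholders only.
    WEAPON_RANGES_M = {"small_arms": 500, "machine_gun": 1800, "manpads": 4500}

    def distance_m(a, b):
        """Distance between two (x, y) points in meters (flat-earth approximation)."""
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def waypoint_is_safe(waypoint, threats, has_line_of_sight):
        """True if the waypoint stays outside every threat's effective range and out of
        its line of sight.  threats: list of dicts like {"pos": (x, y), "weapon": "machine_gun"}.
        has_line_of_sight(threat_pos, waypoint) is assumed to be backed by a terrain model."""
        for threat in threats:
            standoff = WEAPON_RANGES_M.get(threat["weapon"], 0)
            if distance_m(threat["pos"], waypoint) <= standoff:
                return False
            if has_line_of_sight(threat["pos"], waypoint):
                return False
        return True

    def filter_route(route, threats, has_line_of_sight):
        """Keep only waypoints satisfying the standoff and line-of-sight constraints;
        a real planner would re-route around violations rather than simply drop points."""
        return [wp for wp in route if waypoint_is_safe(wp, threats, has_line_of_sight)]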

2. Be efficient in capturing and transmitting information with minimal input actions

For example, a simple sketch of an approach path drawn in a few seconds on a map or video feed might save minutes of verbal instructions.

3. Permit a loosely coupled, asynchronous type of communication between the UAS and the Marines

Though the UAS is designed to be highly autonomous and intelligent, ambiguous situations can arise in which human expertise is needed to determine the correct way forward. A bi-directional information channel is therefore required for missions to succeed at a high rate. In combat or disaster situations, the recipient of the information may be occupied with life-critical activities; thus the channel must be asynchronous and must let the recipient review information after the initial broadcast. It should also provide an alerting capability that does not require the recipient's prior attention in order for the alert to be perceived. Whenever possible, information should be displayed in a manner that imposes minimal cognitive load [Leo 1999]. Except in critical operations, the UAS should not have to wait for the answer to a question in order to proceed. The channel should also allow the UAS to unobtrusively report its status and its rationale for major decisions, to be reviewed by Marines when they wish (e.g., a data log or a sketch of the UAS path). This promotes trust and a common understanding of the situation between the UAS and the Marine, e.g., a text log entry: "ETA delayed 10 minutes, deviated course due to hostile fire en route", possibly accompanied by an image or map marking the location of the UAS and where the fire originated.
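
One way to picture this loosely coupled channel is a persistent, timestamped message log that the UAS appends to without blocking and that the Marine can review at any time, with an urgency flag driving an alert. The sketch below assumes hypothetical field names and a caller-supplied alert callback (e.g., a vibration or tone):

    import time
    from dataclasses import dataclass, field

    @dataclass
    class StatusMessage:
        text: str                 # e.g. "ETA delayed 10 minutes, deviated course due to hostile fire en route"
        urgent: bool = False      # urgent messages trigger an alert; the rest simply wait to be reviewed
        attachment: str = ""      # e.g. path to an image or sketch of the UAS route
        timestamp: float = field(default_factory=time.time)

    class MessageLog:
        """Persistent log: the UAS posts without waiting; the Marine reviews when able."""

        def __init__(self, alert_fn):
            self._messages = []
            self._alert_fn = alert_fn  # alert that does not require the recipient's prior attention

        def post(self, msg: StatusMessage):
            self._messages.append(msg)
            if msg.urgent:
                self._alert_fn(msg)

        def review(self, since: float = 0.0):
            """Return all messages after `since`, so earlier traffic can be re-read."""
            return [m for m in self._messages if m.timestamp >= since]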

4. Allow UAS to request more information and alert Marines when it doesn’t know how to proceed

This helps to address the limits of the autonomy, engenders trust because the UAS is being "transparent" about its limitations, and limits automation bias because it implicitly tells Marines when not to trust it. Every request for more information may be accompanied by a summary of the current situation, highlighting the issue at hand. For example, if the UAS notices fire up ahead, it might send the Marine a text message accompanied by an image marking the dangerous area and a sketch of the currently planned route, requesting that the Marine sketch an alternative route for the UAS.
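
As a sketch of what such a request might carry, the structure below bundles the summary, an attachment highlighting the issue, and a fallback so that non-critical requests never stall the mission; the field names and the wait_for_reply helper are hypothetical:

    from dataclasses import dataclass

    @dataclass
    class InfoRequest:
        """A UAS-initiated request for help, packaged with the context needed to answer it."""
        summary: str          # e.g. "Fire detected 2 km ahead on planned route"
        attachment: str       # e.g. image with the danger area and current route overlaid
        requested_input: str  # e.g. "sketch an alternate route segment"
        timeout_s: float      # how long the UAS will wait before proceeding on its own
        fallback: str         # e.g. "hold at current position" or "fly autonomously re-planned route"

    def handle_request(req: InfoRequest, wait_for_reply):
        """wait_for_reply(timeout_s) is assumed to return the Marine's reply or None.
        On timeout the UAS proceeds with its stated fallback and logs that it did so."""
        reply = wait_for_reply(req.timeout_s)
        return reply if reply is not None else req.fallback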

5. Structure questions to facilitate the communication of needed information

For example, asking Marines whether a suitable landing zone is available presumes knowledge about helicopter flight operations that they do not have, such as the weather and landing requirements of a helicopter. But they can easily answer a question asking whether the landing zone is clear of obstructions, such as poles and wires, within a 100 ft. x 100 ft. area, along with a request to sketch the location on a map or image for reference.

----

Given the projected high level of autonomy for the UAS, it is reasonable to assume that it will be able to initiate interaction with the Marines in order to accomplish its mission: request clarification of previous communications, request additional information if not enough was provided to compute a flight route, progressively request updated information while en route, etc. It will also be useful for the UAS to be able to analyze conflicting instructions, or instructions that might put it at risk, and seek clarification from the Marines. It might also need to ignore certain Marine instructions if they appear to be erroneous. In addition, Marines are used to interacting with convoys and helicopter pilots via (voice) dialogue, and they cannot devote sustained attention to the Cargo UAS because they are busy doing other things. Therefore, although much depends on the architecture and design of the core autonomy of the Cargo UAS, which is outside the scope of this proposal, these factors point to a dialogue framework in which the UAS and the human "converse" with each other (through whatever mode) rather than a more traditional supervisory control framework (see [Fong 2001, 2006]). This places the Cargo UAS in a peer-to-peer teaming relationship rather than a master-slave relationship. Nonetheless, we will conduct our research and design so that it can fit whatever control architecture is necessary for optimum mission success. We will measure the effectiveness of our dialogue framework through a series of metrics capturing how well commands and information transmissions were understood (one such metric is sketched below).
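
As one illustration of the kind of metric intended, the sketch below records dialogue turns and computes how often a command was acted on correctly without a clarification request; the actual metrics would need to be defined and validated by the team:

    from dataclasses import dataclass

    @dataclass
    class DialogueTurn:
        speaker: str                # "marine" or "uas"
        utterance: str
        understood: bool            # was the turn acted on correctly without follow-up?
        clarifications_needed: int  # follow-up questions triggered by this turn

    def understanding_rate(turns):
        """Fraction of turns acted on correctly without clarification (higher is better)."""
        return sum(t.understood for t in turns) / len(turns) if turns else 0.0

    def mean_clarifications(turns):
        """Average number of clarification requests per turn (lower is better)."""
        return sum(t.clarifications_needed for t in turns) / len(turns) if turns else 0.0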

----

Possible modes of interaction include text, voice, sketch, and gesture.

Examples of different modes for different tasks:

mission specification
- supplies requested, # of casualties to be evacuated, location of casualties, location of enemy and other location information

Sketch
Human Input: marking the locations of casualties, enemy forces, possible landing locations, and other information on a map or video displayed on a phone or Tablet PC; other information can be entered through stylus-friendly methods such as pie menus or Cross-Y type inputs
UAS Output: The UAS will mark its current location on the shared map and display other information about itself, such as a picture of the UAS depicting the aircraft size, cargo size, and cargo loading area, which can be conveyed much more quickly through an image than through text

Speech
Human Input: Marine can dictate supply request including a verbal specification of the MGRS location and/or distance and direction (azimuth) information of the casualties and/or enemy forces.
UAS Output: The UAS (or the human loading the supplies) can read back the list of supplies

Text
Human Input: A keyboard version of current radio practice, for situations when speech and/or gesture is not practical.
UAS Output: The dimensions and size of the aircraft and cargo area are best displayed using text.

Gesture
Human Input: The Marine can point their gesture device (such as an Android phone) toward the casualties or enemies to obtain their direction (azimuth). This works well in combination with speech used to express the distance.
UAS Output: The UAS could provide continuous feedback on its direction and distance through vibration: the device vibrates while pointed in the direction of the UAS, with the intensity of the vibration varying with the UAS's distance (see the sketch below).
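
A rough sketch of how the pointed azimuth and a spoken distance could be fused into a location, and how vibration intensity could encode the UAS's range; the flat-earth math, beam width, and range cap are simplifying assumptions:

    import math

    def offset_position(marine_lat, marine_lon, azimuth_deg, distance_m):
        """Approximate target position from the Marine's position, a pointed azimuth
        (degrees from true north), and a spoken distance in meters.  Flat-earth
        approximation, adequate only for short ranges."""
        meters_per_deg_lat = 111_320.0
        meters_per_deg_lon = 111_320.0 * math.cos(math.radians(marine_lat))
        north = distance_m * math.cos(math.radians(azimuth_deg))
        east = distance_m * math.sin(math.radians(azimuth_deg))
        return (marine_lat + north / meters_per_deg_lat,
                marine_lon + east / meters_per_deg_lon)

    def vibration_intensity(device_azimuth_deg, uas_azimuth_deg, uas_distance_m,
                            beam_width_deg=15.0, max_range_m=5000.0):
        """Vibrate only while the device points roughly at the UAS; intensity (0..1)
        grows as the UAS gets closer.  All thresholds are illustrative."""
        error = abs((device_azimuth_deg - uas_azimuth_deg + 180.0) % 360.0 - 180.0)
        if error > beam_width_deg / 2.0:
            return 0.0
        return max(0.0, 1.0 - uas_distance_m / max_range_m)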

navigation planning
user communicates to UAS suggested routes, approach paths, areas to avoid, LZ location

Sketch
Human Input: A Marine can sketch a suggested path, marking items to avoid, on a shared digital map or video feed image.
UAS Output: The UAS can display its planned path, as well as its range of allowable variability, on the map. By displaying not only the ideal path but also the allowable variability around it, the Marine can better understand how the UAS will react if information about enemy forces changes or it needs to adjust its path for some other reason (one way to represent such a corridor is sketched at the end of this subsection).

Speech
Human Input: Speech is a powerful accompaniment to sketched input. A possible dialogue accompanying sketched actions could be: "Casualties to be retrieved from here. Enemy forces located here and here. Thus an advisable route could be this path."
UAS Output: When calculating the ideal path, the UAS may ask for additional information, such as: "Your drawn path goes to the east of the enemy forces; the red path to its left keeps the same distance from the enemy forces to the west but is expected to be 20 minutes faster. Please advise."

Text
Human Input: Text input could describe the MGRS locations of various enemy forces as well as waypoints the UAS is suggested to fly through.
UAS Output: The UAS can add textual descriptions to the drawn map.

Gesture
Human Input: The Marine could use hand signals or gestures to convey avoidance information and to draw a path. The drawn path could be used in conjunction with an augmented- or virtual-reality device, such as goggles or an augmented-reality view on an Android phone. Gesture interaction would be advantageous when the only output device is a small Android phone that is difficult to draw on; gesture could be used to expand the input space.
UAS Output: The UAS can provide vibration feedback reflecting its understanding of the enemy force locations relative to its own position, with different vibration patterns for different types of information.
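
One simple way to represent the allowable-variability corridor mentioned under Sketch above is a lateral tolerance around the planned waypoint sequence; the sketch below checks whether a position falls inside that band. The corridor model is an assumption for illustration only:

    import math

    def point_segment_distance(p, a, b):
        """Distance in meters from point p to the segment a-b, all given as (x, y)."""
        ax, ay = a
        bx, by = b
        px, py = p
        dx, dy = bx - ax, by - ay
        if dx == 0 and dy == 0:
            return math.hypot(px - ax, py - ay)
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
        return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

    def within_corridor(position, waypoints, corridor_halfwidth_m):
        """True if the position lies inside the allowable-variability band drawn
        around the ideal path (the shaded corridor the Marine would see on the map)."""
        return any(point_segment_distance(position, waypoints[i], waypoints[i + 1])
                   <= corridor_halfwidth_m
                   for i in range(len(waypoints) - 1))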

aviate
user gives commands to UAS, e.g. “hover”, “move 10 ft. to your right”, “land”, “stop engine”

Sketch
Human Input: The Marine can alter the flight path at any time, mark new areas to avoid, and mark landing areas; by rotating the map view 90 degrees, the Marine can even sketch a desired flight altitude.
UAS Output: The UAS's flight path and altitude are displayed in real time on the Marine's digital map; as the UAS detects new threats, etc., it can update the map.

Speech
Human Input: Voice could be used for status requests (e.g., "What's your ETA?") and short commands, so the user can keep their eyes on the UAS or on more pressing events around them.
UAS Output: The UAS can easily confirm verbal instructions via speech, e.g., "Moving 100 meters to the west because of detected fire."

Text
Human Input: Text could be used to specify similar information as speech.
UAS Output: A continuous log of past actions and the reasons for those decisions keeps the Marine aware of the UAS's situation.

Gesture
Human Input: The UAS flight path could be controlled directly through carrier-deck-style gestures similar to those used by commercial landing crews, or through controller gestures, using the Android device as a joystick-type control to steer the aircraft. By using accelerometers or gloves to control the aircraft, the Marine can stay out of sight and still directly manipulate its path (see the sketch below).
UAS Output: When gesture is used to fly the aircraft, vibration can be used when the operator tries to perform an action that conflicts with the UAS's own information (e.g., when the UAS senses that the operator may be flying it into a mountain).
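
A minimal sketch of the joystick-style mapping from handheld tilt to a velocity command, assuming the device reports pitch and roll in degrees; the dead zone, saturation angle, and speed limit are placeholders:

    import math

    def tilt_to_velocity(pitch_deg, roll_deg, max_speed_mps=5.0,
                         dead_zone_deg=5.0, full_tilt_deg=30.0):
        """Map device tilt to a (forward, right) velocity command in m/s.
        A small dead zone keeps hand tremor from moving the UAS; tilt beyond
        full_tilt_deg saturates at max_speed_mps."""
        def axis(angle_deg):
            if abs(angle_deg) < dead_zone_deg:
                return 0.0
            scale = min(1.0, (abs(angle_deg) - dead_zone_deg) / (full_tilt_deg - dead_zone_deg))
            return math.copysign(scale * max_speed_mps, angle_deg)
        return axis(pitch_deg), axis(roll_deg)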

payload control
orient EO camera during en-route flight

Sketch
Human Input: The Marine could circle locations of high interest on the map where the camera should point as the UAS passes them, or draw a path of where the camera should point during the flight (note that this may differ from where the UAS should fly, due to flight restrictions and enemy forces). Additionally, the Marine could draw on the video to mark items of interest. The sketch marks will have GPS locations attached to them, so these sketches, as well as items circled on the video itself, can be overlaid on any location-aware map or video.
UAS Output: The UAS can overlay previous sketch or video information on the current camera view (either the pilot's directional camera or the reconnaissance camera).

Speech
Human Input: Marines can use speech in combination with sketch to specify camera movement (e.g., “Move camera to focus here”)
UAS Output: The UAS can speak status updates about the camera, such as "Area of interest named Enemy-A now out of field of view"

Text
Human Input: Marines can use text to name areas of interest.
UAS Output: The UAS can print a log of status updates similar to that of speech.

Gesture
Human Input: The gesture device itself can operate as both camera viewfinder and controller. An Android-type device with an accelerometer can be held in the Marine's hands and tilted to adjust where the camera points while the Marine looks at the imagery on the Android screen.
UAS Output: The UAS can provide vibration feedback when the operator tries to tilt the camera further than it can go (see the sketch below).
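
A small sketch of the tilt-to-camera mapping with haptic feedback at the gimbal limits; the limit values and the vibrate callback are illustrative assumptions:

    def command_camera(device_pitch_deg, device_yaw_deg, vibrate,
                       tilt_limits=(-90.0, 10.0), pan_limits=(-170.0, 170.0)):
        """Map device orientation directly to camera pointing.  If the requested angle
        exceeds the gimbal envelope, clamp it and vibrate so the operator knows the
        camera cannot tilt or pan any further."""
        def clamp(value, lo, hi):
            return max(lo, min(hi, value))
        tilt = clamp(device_pitch_deg, *tilt_limits)
        pan = clamp(device_yaw_deg, *pan_limits)
        if tilt != device_pitch_deg or pan != device_yaw_deg:
            vibrate()  # haptic cue: the request went past the limit
        return tilt, pan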

health monitoring
UAS running low on fuel or damaged en route and has limited functionality

Sketch
Human Input: The Marine can specify alternative landing locations in times of emergency.
UAS Output: The UAS can alert the user via on-screen alerts. When the UAS is damaged by enemy forces, it can also use the map to indicate possible enemy locations.

Speech
Human Input: User can verbally query UAS status.
UAS Output: UAS can alert user verbally.

Text
Human Input: Similar to speech.
UAS Output: Similar to speech.

Gesture
Human Input: The UAS could communicate with the Marine via maneuvers, e.g., rolling left/right for distress, pitching up/down for "yes", yawing left/right for "no". This is useful if the UAS's radios fail.
UAS Output: The UAS could communicate answers to the Marine through vibration patterns. This is useful when silence is crucial (see the sketch below).
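
An illustrative encoding of these radio-silent signals, pairing vibration patterns on the handheld with the corresponding airframe maneuvers; the specific patterns are placeholders, not an established convention:

    # Buzz durations in milliseconds for the handheld, and the visible maneuver
    # to fly instead if the radios (and thus the haptic channel) are down.
    VIBRATION_PATTERNS_MS = {
        "yes":      [200],
        "no":       [200, 100, 200],
        "distress": [600, 200, 600, 200, 600],
    }
    MANEUVER_SIGNALS = {
        "yes":      "pitch up/down",
        "no":       "yaw left/right",
        "distress": "roll left/right",
    }

    def signal(answer, radios_ok, vibrate_pattern, fly_maneuver):
        """Prefer the quiet haptic channel; fall back to maneuvers if the radios fail."""
        if radios_ok:
            vibrate_pattern(VIBRATION_PATTERNS_MS[answer])
        else:
            fly_maneuver(MANEUVER_SIGNALS[answer])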

The team proposal should address:
  • Which of the above primary issues/needs are you seeking to address?
  • What type of multimodal interface will you use for UAV control? Justify your selection.

Note that this could be broken down into multiple projects.

Note that this could possibly lead to a funded REU position.