March 17th, 2025

AI represents a powerful accessibility tool for both virtual and physical environments, but many questions remain about how best to use it. Join us for an engaging discussion on the role of AI in enhancing accessibility. First, our featured speakers Yuhang Zhao, Andy Slater, and Thomas Logan will showcase a few AI-powered systems and their applications in real-world and virtual environments; then you, our community, will have the chance to brainstorm new uses and best practices for the future.

Open Questions:

  • How can we prepare AIs to act as guides in physical spaces?
  • What are the best form factors for AI guides?
  • How do we address privacy concerns with AI systems?
  • How can we effectively communicate an AI guide’s capabilities?
  • What role should AI agents play in moderating and ensuring safety in virtual spaces?

If you require accessibility accommodations such as American Sign Language interpretation, please email info@xraccess.org no fewer than 72 hours before the event.

Event Details

Date: Monday, March 17th

Time: 12:30 – 2pm PT / 3:30 – 5pm ET

Location: Zoom

About the Speakers

Headshot of Yuhang Zhao, an Asian woman with medium black and brown hair, wearing thin gold octagonal glasses and a diamond-shaped earring.

Yuhang Zhao

Assistant Professor, University of Wisconsin-Madison

I am an Assistant Professor in Computer Sciences at the University of Wisconsin-Madison. My research interests include Human-Computer Interaction (HCI), accessibility, augmented and virtual reality (AR/VR), and AI-powered interactive systems. I design and build intelligent interactive systems to enhance human abilities. Through my research, I seek to understand the challenges and needs of people with diverse abilities, and design systems and interaction techniques that empower them in real-life activities as well as in emerging virtual worlds.

Headshot of Andy Slater, a middle-aged white man smiling as wide as he can, shutting his eyes as if someone off camera is trying very hard to make him laugh. He has a full beard that is mostly gray with streaks and spots of red. His hair is short and reddish brown, with a forehead a mile wide. He's wearing a teal cardigan over a comprehensive patterned shirt. According to AI, the background is a soft, solid green color, enhancing the overall cheerful and pleasant vibe of the photo.

Andy Slater

Artist and researcher, Virtual Access Lab

Andy Slater is a blind media artist and educator. He holds a Master's in Sound Arts and Industries from Northwestern University and a BFA from the School of the Art Institute of Chicago. Andy is access lead and sound designer at Fair Worlds and a 2022–23 Leonardo CripTech fellow.

Headshot of Thomas Logan, a white man with stubble and a short mustache wearing a grey suit jacket over a buttoned blue shirt.

Thomas Logan

Owner, Equal Entry

For 20 years, Thomas Logan has helped organizations create technology solutions that work for people with disabilities. Over his career, he has worked with startups to Fortune 500s including Microsoft, Google, Meta, Bank of America, New York Times, Princeton University, and Disney. He has also done projects for many federal, state, and local government agencies in the U.S.

Event Notes & Chatlog

Yuhang Zhao

Summary: Real-world scene-aware AR aims to support BLV users in cooking by identifying tools and providing visual cues, but faces challenges with safety and varied tools. In virtual worlds, AI solutions address VR accessibility issues, as traditional methods rely on developer effort, which is often not prioritized. VRSight uses vision and language models to provide post-hoc accessibility to any VR app, including affective scene overviews, scene sweeps, aim assist, and guardian alerts, demonstrated in a pilot evaluation with BLV users.

  • Yuhang Zhao’s project homepage: https://www.yuhangz.com/projects/
  • Real World: Scene-aware AR for cooking to support BLV users
    • Important for independence; can be risky for BLV people
    • Identified unique challenges:
      • Using and interacting with different cooking tools, e.g. knives and hot pans
    • CookAR (Lee et al., UIST '24)
      • Wearable passthrough AR system
      • Recognizes tool affordances (interactive components), e.g. knife blade vs handle
      • Render visual cues to distinguish affordances, e.g. overlay handles in green
  • Virtual world – AI solution to VR accessibility
    • Prior VR a11y techniques rely on developer effort; however, a11y is not a priority in the VR industry
    • Interviewed 24 XR developers; found that accessibility integration is not a priority
      • Effort, cost, small population, immature industry
    • VRSight
      • Recognize visual info on VR screen via vision foundation models & LLMs
      • Provides spatial audio and affective descriptions for immersion
      • Tones map to the scene's atmosphere
      • Applies to any VR app post-hoc without developer effort (see the illustrative sketch after these notes)
      • Features:
        • Affective scene overview
          • LLM-based (GPT-4o) scene description
          • Tone detection – neutral, cheerful, fearful, urgent
          • Audio based overview with corresponding emotion via Azure Neural TTS
        • Scene sweep
          • Fine-tuned YOLOv8 for virtual object detection
          • VR Object Dataset
            • Focus on social VR
            • 15 Social VR apps
            • 16,905 images
            • 33 object classes in 6 categories, e.g. avatars, seating areas, controls
          • Spatial audio via depth – DepthAnythingV2 model
        • Aim assist
          • Reads aloud visual elements pointed at by a laser or near the virtual hands
          • Laser detection: via line extraction
          • Virtual hand detection: YOLOv8
        • Guardian alert
          • Guardian boundary detection
          • Spatial audio alert
      • Pilot evaluation – 3 BLV users
      • Could help address key challenges in Social VR
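
To make the scene sweep and spatial audio ideas above more concrete, here is a minimal, hypothetical sketch of that kind of post-hoc loop: detect objects in a captured VR frame with a fine-tuned YOLOv8 model, estimate relative depth, and derive a left/right pan for a spatial audio announcement. This is not the VRSight implementation; the checkpoint name and the estimate_depth and announce helpers are placeholders.

    # Illustrative sketch only (not the VRSight code): one pass over a captured
    # VR frame that (1) detects virtual objects with a fine-tuned YOLOv8 model,
    # (2) estimates per-object depth, and (3) maps each detection to a pan value
    # for a spatial audio announcement.
    import numpy as np
    from ultralytics import YOLO  # real package; the checkpoint name below is assumed

    detector = YOLO("vr_objects_yolov8.pt")  # hypothetical fine-tuned checkpoint

    def estimate_depth(frame: np.ndarray) -> np.ndarray:
        """Stand-in for a monocular depth model such as Depth Anything V2.
        Returns a per-pixel relative depth map matching the frame's height/width."""
        raise NotImplementedError

    def announce(label: str, pan: float, distance: float) -> None:
        """Stand-in for a panned text-to-speech call announcing the object."""
        print(f"{label}: pan={pan:+.2f}, relative distance={distance:.2f}")

    def scene_sweep(frame: np.ndarray) -> None:
        depth = estimate_depth(frame)
        height, width = frame.shape[:2]
        result = detector(frame)[0]
        for box, cls_id in zip(result.boxes.xyxy, result.boxes.cls):
            x1, y1, x2, y2 = map(int, box.tolist())
            label = detector.names[int(cls_id)]
            pan = ((x1 + x2) / 2) / width * 2 - 1   # -1 = far left, +1 = far right
            distance = float(depth[y1:y2, x1:x2].mean())
            announce(label, pan, distance)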

Andy Slater

Summary: Andy, working with Fair Worlds, focuses on blind navigation and autonomy in museums and art spaces but faces VR access barriers. Museum access, especially in art museums, poses challenges for blind individuals due to the ocular-centric nature of displays. Image descriptions are often objective and lack creativity; subjective, artist- or curator-led descriptions would be preferable, alongside customizable and personalized experiences. Integrating LLMs into apps to provide context-aware and efficient descriptions, avoiding juggling multiple tools, is a goal. An AI assistant could also aid in games, adapting to user needs for quick or detailed information.

  • Works with the group Fair Worlds
  • Blind navigation & autonomy in museum & art spaces
  • Hitting brick walls trying to get access to VR spaces
  • Museum access
    • Specifically for art museums
    • Blind folks often don’t feel welcome in art museums & art galleries – places that are very ocular-centric, focused on visual art
    • Image description is really important for art; often these are boring & objective, coming from an art historian or inexperienced describer
    • Nice to have subjective, creative descriptions alongside the objective
    • How can we have opportunities where you can hear descriptions from the curator or artist themselves? Something that scanning a QR code or bumbling through the gallery doesn’t offer
    • Being able to hear different descriptions, customize it to your preference, walk around a space, get details
    • Storytelling opportunities in different spaces? E.g. walking tours like SpaceTime Adventure Tour
    • Thinking about implementing LLMs, ideally embedding them into apps so you don’t have to go to external tools like ChatGPT or PiccyBot (see the sketch after these notes)
    • Often feel like juggling between one thing and the next – close an app to open another, download a video to upload to PiccyBot (video description tool)
      • Can customize it to sound like a slang-slinging Gen Z that makes you feel like an old man, or something more neutral
      • Hard to download, upload, listen, switch back on the fly
      • One of those exhausting things we have to deal with as blind people
  • Could be great to have an AI assistant for e.g. games as well
    • Could get accustomed to user, context; e.g. when to get something quick and to the point, or more detailed
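
As a loose illustration of the "embed the model in the app" idea above, the sketch below shows one way a museum or tour app might offer switchable description voices (objective, poetic, casual) from a single built-in model call instead of sending users out to separate tools. It uses the OpenAI Python client as an example; the persona prompts, function names, and model choice are assumptions, not any product's actual setup.

    # Illustrative sketch: switchable description "voices" from one embedded
    # vision-language model call. Persona prompts and model name are assumed.
    import base64
    from openai import OpenAI

    PERSONAS = {
        "objective": "Describe the artwork factually and concisely for a blind visitor.",
        "poetic": "Describe the artwork subjectively and poetically, conveying mood and texture.",
        "casual": "Describe the artwork in a relaxed, conversational voice.",
    }

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def describe_image(image_path: str, persona: str = "objective") -> str:
        with open(image_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode("utf-8")
        response = client.chat.completions.create(
            model="gpt-4o",  # any vision-capable model could be substituted
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": PERSONAS[persona]},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        )
        return response.choices[0].message.content

Because the persona is just a prompt, an app could expose it as a user setting and remember each visitor's preference, which is the kind of customization described above.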

Thomas Logan

Summary: A discussion about AI’s role in accessibility, emphasizing the importance of human involvement. It covers topics like the necessity of human oversight in AI translations and transcriptions, the limitations of ASR versus CART, the need for accurate captions and translations, and tools like Polly for real-time translation. The key takeaway is the pursuit of 100% accuracy and true inclusion by combining AI efficiency with human refinement.

  • Why Humans Always Matter with AI
  • Captions (Live) success criterion from W3C (WCAG 1.2.4): Captions are provided for all live audio content in synchronized media
  • “CART is dead. Long live ASR.” -Kate Kalcevich
    • CART = Communication Access Realtime Translation
    • ASR = Automatic Speech Recognition
  • AI also useful for realtime translation
    • E.g. Altspace let you set your spoken language and your caption language separately
  • How can we use AI but still have a human be involved to clean up the AI or improve the experience?
    • Hire a translator
      • Logistics are challenging
      • Have to speak, then wait for the translator to speak again
  • True inclusion is making it possible for everyone to participate and understand
  • In speech to text transcription, we’re always aiming for 100% accuracy
    • Japanese captions got 90% accuracy, but accuracy for someone reading the captions in English was very low
      • I.e. the translation from Japanese to English was poor
    • Put a human in the loop to edit the translation (a rough sketch of this flow appears after these notes)
    • E.g. Thomas Logan presenting in English, Mirabai Knight transcribing, translator editing
  • Discussion: How do we insert humans to always be a part of overriding or improving decisions made by AI to have a human touch?
  • Created Polly, a meeting plugin for realtime translation
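
As a rough, conceptual sketch of the human-in-the-loop flow described above (not how Polly is actually built), the snippet below keeps a rolling buffer of ASR or machine-translated caption segments that a human editor can correct after the fact; showing more lines than usual gives the audience time to see the correction land on screen.

    # Conceptual sketch: rolling caption buffer with late human corrections.
    # The display window is longer than typical (10-12 lines instead of 2-3)
    # so a segment is still visible when its correction arrives seconds later.
    from dataclasses import dataclass, field

    @dataclass
    class CaptionBuffer:
        display_lines: int = 12                      # larger window leaves room for late edits
        segments: list = field(default_factory=list) # (segment_id, text) pairs

        def add_asr_segment(self, segment_id: int, text: str) -> None:
            """Append raw ASR (or machine-translated) output as it arrives."""
            self.segments.append((segment_id, text))

        def apply_human_correction(self, segment_id: int, corrected: str) -> None:
            """Replace a segment in place once the human editor fixes it."""
            self.segments = [(sid, corrected if sid == segment_id else text)
                             for sid, text in self.segments]

        def render(self) -> str:
            """Return the last N segments for the caption display."""
            return "\n".join(text for _, text in self.segments[-self.display_lines:])

    # Example: ASR mishears a term, a human corrects it, and the fix still
    # lands on screen because the window is long enough.
    buf = CaptionBuffer()
    buf.add_asr_segment(1, "cart is dead long live A S R")
    buf.add_asr_segment(2, "we translate the captions into Japanese")
    buf.apply_human_correction(1, "CART is dead. Long live ASR.")
    print(buf.render())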

Discussion

Summary: The discussion revolves around AI and accessibility, specifically focusing on VR, AI tools, privacy concerns, and accommodating users with various disabilities. Participants discussed existing AI tools like Piccybot, Seeing AI, Envision Companion, and ChatGPT for audio descriptions and note-taking. Concerns were raised about the affordability of VR, privacy issues with data being used by AI, and the need for transparency from companies. There was also a focus on accessibility for people with visual, cognitive, and multiple disabilities, emphasizing the importance of customization, adaptability, and ongoing assessment of user needs. Legal aspects and grant funding challenges related to accessibility were also mentioned.

  • Kevin Cao: Is there a website to access these resources? Do you need a VR headset? Not everyone can afford VR equipment
    • Yuhang: prototypes are not ready for public use, but you can see ongoing research projects on her website; code will be open-sourced there as well
    • Dylan: will add to XR Access resources
  • Jesse Anderson: Has used PiccyBot, Be My Eyes, Seeing AI, Envision Companion
    • Work with Envision Glasses as well as app on your phone
    • Envision Glasses use the old Google Glass hardware – bulky, uncomfortable
    • Some apps come with onboarding questions – what would you like me to do? How would you like for me to behave?
      • Can customize different “allies” for different needs, e.g. describing something in a museum vs just reading something
      • Doesn’t have to all involve the camera; works as a live conversational chatbot
      • Assistive Technology/AI Spotlight – Ally
      • Can use it to ask facts – “what was that movie I watched 20 years ago with XYZ?” Generate trivia, study for tests, etc.
      • Could tell one Ally the songs he likes, ask for recommendations for drumlines to practice
    • Wants to get to a live camera model, instead of just a snapshot (a rough sketch of such a monitoring loop appears after these notes)
      • “Let me know when the gray Ford arrives, that’s my Uber”
      • “I’m going to pan my camera around, tell me when you see my keys”
      • “Let me know when my health gets below 50%”
  • Andy: ChatGPT – voice mode with live camera
    • Had it give play-by-play audio descriptions of Mary Poppins and it worked well; tried it with Uncut Gems and it didn’t
    • Introduced my dog when he ran in; it remembered him 10 minutes later
    • Jesse: Won’t remember in between sessions; can put info in bio to help it remember longstanding information
    • Andy: was using ChatGPT for note taking, brainstorming
      • It called something “lazy and ableist” – it was taking after him
  • Dylan: Privacy – how can we be sure that our info isn’t being used to feed the AI or even expose private details to owners directly?
    • Andy: ChatGPT wouldn’t say what a check was for or who it was made out to; wouldn’t give information about the cover of a book because of IP/copyright laws
      • Need to balance access and privacy
      • Don’t trust Meta reading his mail
      • Would like more transparency beyond regular privacy statements
    • Yuhang: Always a trade-off between functionality & privacy
      • Application level may limit access to sensors, e.g. front camera
    • Dylan: there’s research on applications that will e.g. prevent background face recognition
    • Thomas: It’s an old problem. Used to e.g. use a plugin that would generate false search history
      • Hard to trust any company at this point
    • Jesse: Try to do as much as I can for privacy-related things, but if you’re not paying for the product, you are the product
      • Resigned myself to giving up at least some information to get access to services
    • Andy: Grocery stores know what you’re buying, where you are – instacart/rewards cards are reporting it
      • That’s a place where I’m willing to make concessions – beats wandering around a grocery store when you’re blind
    • Dylan: If Black Mirror has taught us anything…
    • Jesse: Envision posted a video a week ago showing a woman being guided around a store
  • Terry Fullerton: From New Zealand, working with a PhD student at University of Canterbury to help people like me with macular degeneration
    • Subtitles are the worst for me – can’t read them, or text on e.g. a phone screen in the movie
    • Interested in cooking – hard to deal with digital stoves
    • Afraid of e.g. hitting the wrong button on registering for a flight
  • Larry Goldberg: NY bar committee looking into effect of AI on people with disabilities
    • Group of lawyers that have proposed successful legislation 
    • Will be working on tangible actions to promote benefits, avoid hazards in AI & Accessibility
  • Karim Merchant: A lot of people have multiple disabilities – challenges for this tech for e.g. users with cognitive & visual disabilities?
    • Dylan: There are groups like the Equitable Learning Technology Lab at the University of Florida focusing on neurodiverse accessibility for XR: https://education.ufl.edu/eltl/
    • Andy: Definitely gets harder – need to incorporate multiple perspectives
    • Yuhang: Also looking at supporting ADHD users by e.g. letting them customize video content by removing distractions, zooming in on key content, etc.
      • Customization is really important, but can’t be too complex
    • Thomas: Getting people to make changes to tech, defining what people need to do
    • Andy: Can’t even use a lot of key words in grant proposals / public-facing content
      • Be careful of AIs programmed to recognize terms like accessibility
      • Friends have lost state department grants because they’re focused on accessibility & equity
  • Paul Jackson: PhD in Computer Engineering, research at Boeing in AR, diagnosed with multiple sclerosis in ’93; has been working with MXTReality to create an application for those affected by MS
    • Studio, 3D games to improve mobility
    • MIT had discussion called Able 20 years ago; interviewed disabled people, creators, etc.
  • Akila Gamage: from New Zealand, work with Terry; regarding customizability, doing PhD research and trying to identify requirements of BLV users
    • Vision loss can get progressively worse – a device that works now may not in 5 years
    • Has followed Professor Yuhang Zhao’s work
    • Continuously assess users’ visual conditions as time goes by to provide more adaptable solutions
    • Subtitles are difficult – is there a way to get audio descriptions or subtitle translations?
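
The “live camera” requests above (wait for the gray Ford, watch for the keys) boil down to a monitoring loop. The sketch below is a rough, hypothetical outline of that behavior: poll camera frames and ask a vision-language model whether a user-stated condition is visible, alerting when it is. check_condition is a placeholder for whatever model call a real assistant would make.

    # Rough sketch of a "let me know when..." monitoring loop. check_condition()
    # stands in for a vision-language model query; everything else is just the
    # polling and alerting scaffolding around it.
    import time
    import cv2  # OpenCV, used here only to grab webcam frames

    def check_condition(frame, condition: str) -> bool:
        """Placeholder: send the frame plus the condition (e.g. 'a gray Ford has
        pulled up') to a vision-language model and parse a yes/no answer."""
        raise NotImplementedError

    def watch_for(condition: str, poll_seconds: float = 2.0) -> None:
        camera = cv2.VideoCapture(0)
        try:
            while True:
                ok, frame = camera.read()
                if ok and check_condition(frame, condition):
                    print(f"Alert: {condition}")
                    break
                time.sleep(poll_seconds)  # throttle to limit cost and battery use
        finally:
            camera.release()

    # watch_for("the gray Ford that is my Uber has arrived")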

Event Chat

Andy Vogel  to  Everyone 12:29 PM

Lack of ease of use

Kevin Cao  to  Everyone 12:32 PM

NYC

Thomas Logan  to  Everyone 12:32 PM

NYC x 2

Ricardo Gonzalez  to  Everyone 12:32 PM

NYC, Phd Candidate at Cornell Tech 🙂

Peirce Clark  to  Everyone 12:32 PM

DC!

Karim Merchant  to  Everyone 12:32 PM

Los Angeles! 🌴

Andy Vogel  to  Everyone 12:32 PM

Instructional Designer from The Ohio State University

Allison Cassels  to  Everyone 12:32 PM

Richmond, VA

Jonathan Martin  to  Everyone 12:32 PM

Los Angeles!

Kristin Zobel  to  Everyone 12:32 PM

Chicago

Benjamin Hagen  to  Everyone 12:33 PM

KY!

Maddy Boecker  to  Everyone 12:33 PM

Joining from Fargo, North Dakota from CareerViewXR!

Hank De Leo  to  Everyone 12:33 PM

Hank De Leo, Ithaca

Yuhang Zhao  to  Everyone 12:33 PM

Madison

Suzanne David  to  Everyone 12:33 PM

Los Angeles area

Sandra Oshiro  to  Everyone 12:33 PM

Honolulu PhD student at University of Hawaii

Akila Gamage  to  Everyone 12:33 PM

Christchurch, New Zealand

Jesse Anderson  to  Everyone 12:33 PM

Jesse from MN and IllegallySighted onYoutube

Matthew Macomber  to  Everyone 12:33 PM

Urbana, IL

Larry Goldberg  to  Everyone 12:33 PM

NYC!

Perry Voulgaris  to  Everyone 12:35 PM

Toronto

Kevin Cao  to  Everyone 12:35 PM

https://rsgames.org

Kevin Cao  to  Everyone 12:36 PM

https://www.qcsalon.net/en/

Jesse Anderson  to  Everyone 12:38 PM

https://youtu.be/nLzNJH_Xw8c?si=jGffafAru8KQ-TJb

You  to  Everyone 12:39 PM

Hi all. Please mute your microphones for now, thanks!

Jesse Anderson  to  Everyone 12:58 PM

I so have a few thoughts and ideas about this.

Karim Merchant  to  Everyone 12:59 PM

This sounds cool! Writing audio descriptions of art sounds similar to translating poems in other languages – striving to match the tone and pacing of the original such as specific word choices or expressions.

You  to  Everyone 1:02 PM

Sighted Guides for VR, both human and AI, is definitely something we’re working on at XR Access. https://xraccess.org/sighted-guides-to-enhance-accessibility-for-blind-and-low-vision-people-in-vr/

Suzanne David  to  Everyone 1:03 PM

Excellent information

You  to  Everyone 1:07 PM

“CART is dead. Long live ASR.”

https://www.linkedin.com/pulse/cart-dead-long-live-asr-kate-kalcevich-vfo9c/?trackingId=j%2Bh3yqnzRH6cpvguA9gXxA%3D%3D

You  to  Everyone 1:08 PM

Rest in peace, Altspace. 😭

Andy Vogel  to  Everyone 1:12 PM

Human in the loop 🙂

Jesse Anderson  to  Everyone 1:14 PM

This would also be useful for blind users with tts. Prior to a session, a human could enter key subject matter terms and abbreviations used to give the AI a better shot at being accurate.

You  to  Everyone 1:15 PM

Polly: https://equalentry.com/polly-meeting-plugin-realtime-translation/

Thomas Logan  to  Everyone 1:16 PM

Links:

Thomas Logan  to  Everyone 1:16 PM

https://www.w3.org/WAI/WCAG21/Understanding/captions-live.html  https://www.linkedin.com/pulse/cart-dead-long-live-asr-kate-kalcevich-vfo9c/  https://polly.equalentry.com/

Yuhang Zhao  to  You (direct message) 1:17 PM

https://www.yuhangz.com/projects/

You  to  Everyone 1:18 PM

XR Access resources

https://xraccess.org/resources

Yuhang Zhao  to  Everyone 1:18 PM

Yuhang Zhao’s project homepage: https://www.yuhangz.com/projects/

You  to  Everyone 1:18 PM

XR Access GitHub:

https://xraccess.org/github

Email me: Dylan@xraccess.org

Envision Companion: https://www.letsenvision.com/companion

Yuhang Zhao  to  Everyone 1:19 PM

dataset and model open source for CookAR: https://github.com/makeabilitylab/CookAR?tab=readme-ov-file

paul jackson  to  Everyone 1:24 PM

https://www.mxtreality.com/

You  to  Everyone 1:26 PM

I’m pretty sure I saw a google prototype that does constant monitoring, but can’t find it at the moment

paul jackson  to  Everyone 1:26 PM

I’ve worked with Jeff from mxtreality who created an app for the seattle swedish ms group

They’ve also worked on the differently abled community

Karim Merchant  to  Everyone 1:29 PM

Those seem increasingly critical – historical context and mental models

paul jackson  to  Everyone 1:30 PM

Medical XR think tank https://www.mxtreality.com/medicalxrthinktank

Karim Merchant  to  Everyone 1:32 PM (Edited)

Question for @Thomas Logan – wondering what the average delay is using Polly when text is updated with edits?

Thomas Logan 1:46 PM

This process requires displaying 10 to 12 lines of text in the meeting room instead of the standard two or three lines of text. This is because it takes time for the human translator to make the correction to the generated text. We need the audience to have enough time to see the update, which improves their comprehension of the content.

Karim Merchant 1:47 PM

Thanks! Sorry if that was at the link and I missed it.

Thomas Logan 1:47 PM

That’s why we had to display so many lines and it really only worked either in the room with a big display or asking participants to use that on their mobile phones

Thomas Logan 1:47 PM

No its a good question, its a hard part of it because you need that time to make the fix

paul jackson  to  Everyone 1:36 PM

https://www.codedbias.com/

Larry Goldberg  to  Everyone 1:42 PM

Not dead, it just smells funny. 🙂

You  to  Everyone 1:48 PM

MSF Accessibility: https://metaverse-standards.org/domain-groups/accessibility-in-the-metaverse/

Miranda McCarthy  to  Everyone 1:48 PM

That was a great session! Thanks so much 👍🏻

Jesse Anderson  to  Everyone 1:49 PM

I’m interested

You  to  Everyone 1:50 PM

There are groups like the Equitable Learning Technology Lab at University of Florida focusing on neurodiverse accessibility for XR:

https://education.ufl.edu/eltl/

Andy Vogel  to  Everyone 1:18 PM

I would love to use these tools to help generate Alt text but my institution want governance of the data policy and security. Is there way to work around this?

Peirce Clark 1:23 PM

If AI-generated alt-text is a concern for governance, human intervention or review could be an acceptable compromise. On premise or private servers where AI can do the computing is more secure than cloud based too. We’re currently working on data policy and security with reps in DC too

Jesse Anderson 1:52 PM

I don’t know about AI for altogether text, because it doesn’t have awareness about context. The same image needs different altogether text depending on the situation. Is the image a link, is it background art, or does the user need another type of description?

Jesse Anderson 1:53 PM

Alt text. Stupid autocorrect…

Ricardo Gonzalez 1:54 PM

Hello Jessie, there is actually current research looking into augmenting and generating alt-text that is context-aware to the specific web setting. Here is a link:

https://dl.acm.org/doi/pdf/10.1145/3663548.3675658

Thomas Logan  to  Everyone 1:52 PM

Benchmarking Usability and the Cognitive A11y Frontier – Abid Virani and Alwar Pillai

https://www.youtube.com/watch?v=Wmx-zayhFUE

Karim Merchant  to  Everyone 1:54 PM

Is there a link to that talk, @Thomas Logan?

Thomas Logan 1:54 PM

Yes this is live stream https://www.youtube.com/watch?v=Wmx-zayhFUE  but we will do recap on our website as well at equalentry.com

Karim Merchant 1:55 PM

Thanks again!

paul jackson  to  Everyone 2:00 PM

Conversation I produced about MS and medical cannabis

https://vimeo.com/showcase/10488966

Thomas Logan  to  Everyone 2:01 PM

I have to hop off, nice to speak with everyone!

You  to  Everyone 2:01 PM

Helping Hands: https://xraccess.org/helping-hands/

Karim Merchant  to  Everyone 2:01 PM

Thank you Yuhang, Andy, Thomas, Dylan and everyone!

Suzanne David  to  Everyone 2:02 PM

Thank you! This was very interesting!

Transcript

WEBVTT

150
00:14:59.850 –> 00:15:04.909
Dylan Fox: Alright. Well, I think we could probably go ahead and get started here.

151
00:15:07.480 –> 00:15:17.770
Dylan Fox: welcome, everybody thanks for coming. I’m Dylan Fox. I’m the director of Operations for XR Access at Cornell Tech. I’m here with Peirce. Peirce, you want to introduce yourself

152
00:15:18.545 –> 00:15:31.960
Peirce Clark: Yeah, sure, thanks, Dylan. My name is Peirce Clark. I’m the director of Research and Best Practices at the XR Association. We’re a DC-based trade association, and we kind of advocate for responsible development of XR at large.

153
00:15:35.532 –> 00:15:40.369
Peirce Clark: Can we also mute to everybody else in the crowd, maybe other than the speakers.

154
00:15:40.490 –> 00:16:05.309
Peirce Clark: or if there’s like questions or the discussion pops up. I’d say, too, we’re very excited to kick off this, now our 5th community discussion we have had. We’ve held topics on audio cues in XR and closed captioning and speech-to-text and text-to-speech, the summaries of all of which are found on the XR Access website and YouTube channel, and with that I’ll pass it back over to Dylan

155
00:16:05.980 –> 00:16:29.249
Dylan Fox: Yeah. So for those who are unfamiliar with our community discussions, the format we go with is, it’s a it’s an hour and a half event to give us a little bit of wiggle room and we’re gonna have our 3 featured speakers talk each for about 10 min just a little bit about the their research. And some of the things they see is kind of pressing issues in this area, which today is AI accessibility assistance.

156
00:16:29.905 –> 00:16:59.089
Dylan Fox: And then the rest of the time should be 45 min to an hour is gonna be for us to talk as a community. Figure out best practices. Talk about the the current kind of big, unanswered, unanswered questions and open needs. And we really enjoy doing these, because, as I was saying before, like we get folks from all over that that come in all kinds of different expertise. And it’s this big interdisciplinary melting pot.

157
00:16:59.636 –> 00:17:28.619
Dylan Fox: So with that said, I’m gonna go ahead and introduce our speakers real quick, and then why don’t we go in that order as well. So Yuhang Zhao is an assistant professor at the University of Wisconsin-Madison, with focuses on human-computer interaction, accessibility, augmented and virtual reality, and AI-powered interactive systems

158
00:17:28.850 –> 00:17:48.539
Dylan Fox: following that will be Andy Slater, who is an artist and researcher with Virtual Access Lab, who has a Master’s in Sound Arts and Industries from Northwestern University, a BFA from the School of the Art Institute of Chicago, and has done a lot of really awesome work on blind-focused VR exhibits.

159
00:17:48.800 –> 00:17:58.910
Dylan Fox: And then, finally, Thomas Logan, who is the owner of equal entry and for 20 years has helped organizations create technology solutions that work for folks with disabilities.

160
00:17:59.828 –> 00:18:04.250
Dylan Fox: So, Yuhang, you’ve got 10 min. Go ahead and take it away.

161
00:18:05.710 –> 00:18:09.994
Yuhang Zhao: Oh, thank you, Dylan. I’ll share my screen then.

162
00:18:18.250 –> 00:18:21.697
Yuhang Zhao: yeah, let me know if you can see it.

163
00:18:24.030 –> 00:18:53.069
Yuhang Zhao: great. So hi, everyone, very glad to be here talking about my research. So I’m an assistant professor at the University of Wisconsin-Madison and I lead the madAbility Research Lab, where we investigate how to design and develop effective assistive technologies for people with disabilities. And one important line of our research is to leverage the state of the art AI, and also extended reality technologies to support blind and low vision users.

164
00:18:55.610 –> 00:19:22.780
Yuhang Zhao: so AI technology has been advancing really quickly, with powerful capability of visual interpretation, reasoning, and then combined with XR technology that incorporates sensing, displaying techniques to retrieve information from the surrounding environment as well as presenting multimodal feedback, such a technology, specifically AI-powered XR, has presented great potential

165
00:19:22.780 –> 00:19:23.110
Larry Goldberg: To, the.

166
00:19:23.110 –> 00:19:36.160
Yuhang Zhao: Enable blind and low vision users to access visual information, but also immersively, so that they can complete different tasks, achieving different experiences that they may not be able to before.

167
00:19:36.350 –> 00:19:49.020
Yuhang Zhao: So my research seeks to see this opportunity and explore how we can best leverage AI technology and develop the AI powered Xr systems to support blind and low vision users.

168
00:19:49.740 –> 00:20:09.920
Yuhang Zhao: So our research specifically include systems that addresses both users, challenges in the real world as well as how we can design technologies to make the emerging virtual world accessible. And I’m going to give one example for each, but with more focus on our work for VR accessibility.

169
00:20:10.620 –> 00:20:37.650
Yuhang Zhao: So I’ll start with the design and the development of an AR powered, augmented reality system to support low vision people to facilitate cooking tasks. So when saying low vision people, we refer to people who experience visual impairments, but they still have some remaining vision, and many times they prefer using their vision in their daily life. And the reason we focus on cooking is because it’s very important.

170
00:20:37.650 –> 00:21:02.520
Yuhang Zhao: dynamic and also complex tasks, daily activities of daily living. It’s very important for people’s independence. But it’s also very challenging and risky for people with low vision and via context, inquiry study with 10 blind and low vision people. We sort of do a comparison between 4 completely blind users and 6 low vision people, and we identified some very unique challenges

171
00:21:02.520 –> 00:21:11.409
Yuhang Zhao: faced by people with visual impairments, and one of them is to use and interact with different cooking tools

172
00:21:11.430 –> 00:21:29.789
Yuhang Zhao: in the kitchen environment, for example, how you use a knife to cut different ingredients, how you use a hot pans, and all of these processes very challenging, and sometimes pose risks to people with low vision, and there is unfortunately no effective low vision aids to support people.

173
00:21:30.230 –> 00:21:41.529
Yuhang Zhao: So to address this problem, we designed an AI-based, augmented reality system called CookAR. It is a wearable, head-mounted pass-through AR system.

174
00:21:41.530 –> 00:22:05.529
Yuhang Zhao: and, unlike the common AR system that would recognize and enhance a whole object, our system recognizes the affordance of different kitchen tools, meaning the interactive components of the different tools. For example, if we are referring to a knife, then we can distinguish the knife blade versus the knife handle so that we can then generate visual augmentations to support low vision people

175
00:22:05.530 –> 00:22:13.070
Yuhang Zhao: to better distinguish these different interactive area and grasp and use these tools more efficiently and also safely.

176
00:22:13.860 –> 00:22:38.470
Yuhang Zhao: So here is the image showing our system. So it’s a head-mounted system where we combine a ZED Mini stereo camera to capture the surrounding environment, both the RGB information and the depth information, and we combine it with the Oculus Quest 2 to display both the surrounding environment and our visual augmentations. And this headset is connected to a very powerful GPU laptop

177
00:22:38.470 –> 00:22:44.110
Yuhang Zhao: to conduct the recognitions of different affordance area of the different tools

178
00:22:44.160 –> 00:22:58.753
Yuhang Zhao: and our augmentations distinguish these different areas. And specifically, we’re using a green AR overlay to represent the graspable area and the pinkish red color overlay on top of the

179
00:22:59.750 –> 00:23:03.430
Yuhang Zhao: the risky area that people’s hand should avoid.

180
00:23:03.600 –> 00:23:27.469
Yuhang Zhao: So this is our system design, and our evaluation with 10 low vision users has demonstrated the effectiveness of our CookAR system in enabling low vision people to quickly and safely grasp different kitchen tools, and also indicates that people do prefer the affordance augmentation over the conventional whole-object augmentations when they’re using AR systems.

181
00:23:27.890 –> 00:23:44.560
Yuhang Zhao: Well, I won’t go into too much details of how we implement such a system. But we did open source our kitchen tool affordance data set and also our fine-tuned recognition model on Github. Later, I’m going to put this in the chat, but you can also find the link on my slides here

182
00:23:46.540 –> 00:23:59.660
Yuhang Zhao: and beyond our systems to support real world tasks. We also look into how we can better leverage AI to make the virtual world virtual reality more accessible to blind users.

183
00:23:59.960 –> 00:24:28.380
Yuhang Zhao: So VR has been challenging to blind people due to its visually dominant nature. Well, there has been many prior research where interaction techniques created to provide audio and haptic feedback to enable blind users to explore and navigate a VR space. But the issue is they really focus on the interaction design in itself and mostly relying on VR developers to incorporate these technologies in their development process.

184
00:24:28.630 –> 00:24:51.419
Yuhang Zhao: However, unfortunately, our interview with 25 XR developers in industry has revealed that accessibility integration is not really a priority because of many different reasons, for example effort, cost, relatively small population, and so on, especially in this relatively immature VR industry.

185
00:24:52.310 –> 00:25:08.800
Yuhang Zhao: So our research tried to fill this gap by thinking about how we can come up with the solution. That is the post hoc, so that we do not really need to rely on developers. Effort to make all these VR applications accessible.

186
00:25:09.060 –> 00:25:33.440
Yuhang Zhao: So we present VRSight, which is an AI-driven system. Or you can see this as a pipeline. We can automatically stream the video content on any VR head-mounted display to a server, automatically recognize all these visual contents via state-of-the-art vision foundation models and large language models, and then we can provide spatial audio

187
00:25:33.440 –> 00:25:56.060
Yuhang Zhao: and affective descriptions that are spoken in different emotions and tones that map to the VR scene’s atmosphere, to create an immersive experience, and via this pipeline, our system can be applied to any VR application running on a VR headset, that is, post hoc, without any developer effort.

188
00:25:56.940 –> 00:26:20.760
Yuhang Zhao: So our system design consider not only providing blind users sufficient information from the VR environment, but also consider how we can better enable them in interactions also maintain immersion. Immersive experience in the VR environment as well as how we can protect their safety when interacting in the VR scene.

189
00:26:21.010 –> 00:26:44.899
Yuhang Zhao: So as a result, we support 4 different features in VRSight to demonstrate the capability of our system, including affective scene overview, scene sweep, aim assist, and guardian alert. So these are the names of the 4 features. But I’ll just go through them quickly, one by one, to tell you more about how they work and what are the technologies behind them.

190
00:26:45.890 –> 00:26:57.420
Yuhang Zhao: So the 1st feature is affective scene overview, where we leverage GPT to generate a scene description of the user’s current view based on their request.

191
00:26:57.640 –> 00:27:10.500
Yuhang Zhao: But beyond these basic descriptions we also seek to enhance users immersion by incorporating emotions into the the verbal description, so that we can better reflect the current scene atmosphere.

192
00:27:10.500 –> 00:27:33.909
Yuhang Zhao: So to do that, besides the content recognition, we also use GPT to detect the tone of the scene. For example, is it neutral, cheerful, fearful, urgent, and so on. And then we can leverage the Azure neural text-to-speech engine to generate audio feedback with the corresponding emotion, to describe the scene to the user.

193
00:27:35.429 –> 00:27:55.540
Yuhang Zhao: So that’s affective scene overview, but of course, only providing generic descriptions of the scene wouldn’t be sufficient for users to better interact with the different objects, so we also support this scene sweep feature to enable users to better understand what are the different objects or important objects in the scene.

194
00:27:55.600 –> 00:28:17.870
Yuhang Zhao: So, to implement such a feature, we tend to leverage some computer vision technologies to recognize the different virtual elements in the scene. But the issue here is, it could be very difficult to do so because many different virtual elements in VR look different from the real world environment.

195
00:28:17.870 –> 00:28:41.700
Yuhang Zhao: And there could also be some unique virtual elements that do not really exist in the real world which makes the pre-trained AI models to be very difficult to function well in the VR environment. Consider the lack of existing data set in the VR context, then what objects should we recognize? And how should we collect the data set to support recognition in VR environment

196
00:28:42.730 –> 00:29:02.750
Yuhang Zhao: and to address this problem. This is another contribution from our project is to construct a data set for the VR context, and we specifically focused on the social VR scenarios be considering the very complex or diverse type of VR applications.

197
00:29:02.830 –> 00:29:27.810
Yuhang Zhao: So we collect different visual scenes, images from 15 commonly used social VR. Applications, for example, VR. Chat, rec room, and so on, and manually annotate the different major objects on these different images, and our data set resulted in 16,905 images with 33

198
00:29:27.810 –> 00:29:52.760
Yuhang Zhao: classes that are categorized into 6 different categories. For example, avatars in terms of human versus non-human avatars, seating area, which is also important in social VR, for example, campfire single seats, multiple seats, and some unique elements. In VR, for example, the controls, including virtual hands, controllers, dashboard and so on, so that our data set would include all of

199
00:29:52.760 –> 00:29:57.330
Yuhang Zhao: these unique elements in social VR to support our

200
00:29:57.340 –> 00:30:03.560
Yuhang Zhao: recognition model. And we fine-tuned a YOLOv8 model for VR object detection

201
00:30:03.860 –> 00:30:27.549
Yuhang Zhao: and also combined with the depth detection in the environment. We’re using this Depth Anything V2 deep learning model to infer the depth of each of the recognized objects, so that we are able to render spatial audio to announce different objects from their corresponding positions, and all of these are post hoc, based on any currently running application on the headset

202
00:30:29.570 –> 00:30:52.660
Yuhang Zhao: and beyond. So we also seek to further support people’s interactions so that people can better understand the details of certain objects. And we designed this aim, assist feature that is based on either laser pointing or free form hand exploration in the VR environment also depending on what the original app would support.

203
00:30:52.660 –> 00:31:21.020
Yuhang Zhao: For example, if the original app provides, this laser pointing function extruded from the user controller or their virtual hand, and then we can use line extraction to detect the laser, and then verbally describe the object that is collided with the laser. But if this app, only support hand interaction directly, and we would detect the position of the virtual hand and then describe the different objects near the user’s hands. Position

204
00:31:22.354 –> 00:31:29.770
Dylan Fox: Yuhang, this is really awesome work. But I just wanna make sure we have time for community discussion. See if you could wrap it up in the next minute or 2

205
00:31:29.770 –> 00:31:52.569
Yuhang Zhao: Yes, this is the last feature I’m going to talk about, which is guardian alert, where we care about people’s safety as well, and if you have experience with VR, they usually generate this guardian system that’s showing the boundary of the system, if people get too far away, hitting a real-world object or getting out of the virtual boundary you have set up.

206
00:31:52.570 –> 00:32:14.509
Yuhang Zhao: That’s why our system can also detect the visual guardian boundary in the system. And when they’re visible enough, meaning that the user is close enough. And then we can also render a spatial audio alert. So this is a video showing that you can see when the virtual hands is to the boundary and then close to the boundary, and then we hear the audio feedback.

207
00:32:14.670 –> 00:32:43.450
Yuhang Zhao: and this is an ongoing project that we have run a pilot study with 3 low vision blind users and find out our system can help address some key challenges in social VR, for example, identifying open space to sit and so on. So that would wrap up my talk today that I talk about 2 projects, one for real world, the other for the virtual world, and I’d be happy to discuss more about them later in the discussion. Thank you so much.

208
00:32:44.840 –> 00:32:46.310
Peirce Clark: Great. Thank

209
00:32:46.520 –> 00:33:08.340
Peirce Clark: thanks, Yuhang, this is awesome research and work. And it’s so good to hear about the positive stuff you guys are doing over there. So a big thanks. I’m sure that there are some ideas percolating around everybody. And we’d like to ask to hold on to questions or post them in the chat, and we’ll make a log of them and dive into them after the next 2 speakers have gone

210
00:33:08.877 –> 00:33:16.309
Peirce Clark: and with that I’d like to quickly pass the baton next to Andy Slater Andy, feel free to take it away

211
00:33:16.310 –> 00:33:19.859
andy slater: Hi! Hey, everybody! I’m Andy Slater. I am

212
00:33:20.010 –> 00:33:34.440
andy slater: blind, and I think that’s probably important to the discussion. I am a sound designer and access lead for an immersive design company called Fair Worlds. I also do a lot of

213
00:33:34.660 –> 00:33:42.360
andy slater: consultation and advising with art museums and galleries, and that sort of thing, not only with describing

214
00:33:42.470 –> 00:33:48.050
andy slater: the work, visual work or doing live audio description for performance and stuff, but also

215
00:33:48.290 –> 00:33:51.920
andy slater: like blind navigation and autonomy in those spaces.

216
00:33:52.240 –> 00:34:02.360
andy slater: I’m also working artists, installation artists, media artists. I do a bunch of of different stuff. And a lot of it is in the Xr space.

217
00:34:02.670 –> 00:34:04.569
andy slater: And so, you know, being

218
00:34:04.960 –> 00:34:11.789
andy slater: being blind, working in the industry, and having my own practice, and wanting to use these tools, and often coming up with

219
00:34:12.020 –> 00:34:17.709
andy slater: hitting, hitting really hard brick walls, sometimes trying to get ahead and figure out solutions.

220
00:34:17.980 –> 00:34:27.539
andy slater: We all know that that’s frustrating. But at the same time it also gives me this sort of perspective to think speculative speculatively of what

221
00:34:27.830 –> 00:34:30.620
andy slater: what the future might hold, and what kind of ideas

222
00:34:30.800 –> 00:34:44.680
andy slater: and techniques and stuff need to be implemented in certain things, especially when we’re working with Xr or any kind of digital accessibility. And so that’s that’s some of what I’ll be talking about is kind of

223
00:34:44.880 –> 00:34:52.310
andy slater: you know what? What could work, and me not being a developer necessarily, or not knowing a whole lot about how

224
00:34:52.760 –> 00:35:02.159
andy slater: a lot of things on that end work. What’s what’s a heavy lift? And what’s kind of an easy solution. So I might just be proposing these these ideas that might take

225
00:35:02.270 –> 00:35:07.960
andy slater: 40 years to complete. Or maybe it might just be something that somebody in the community could be like, Oh, yeah, you just need to

226
00:35:08.320 –> 00:35:13.400
andy slater: throw in this code or something like that. So I invite any kind of

227
00:35:14.200 –> 00:35:18.349
andy slater: any kind of add-ons to any of the things that that I’m talking about today.

228
00:35:18.960 –> 00:35:20.700
andy slater: I’d like to talk 1st about

229
00:35:21.611 –> 00:35:23.858
andy slater: museum access, you know.

230
00:35:25.200 –> 00:35:32.409
andy slater: blind people. And this is going to be, I guess, specifically for art museums. But it really could be used in any kind of case, I guess.

231
00:35:32.670 –> 00:35:40.780
andy slater: but, like historically blind folks aren’t always either invited or feel comfortable or feel invited in

232
00:35:41.100 –> 00:35:47.749
andy slater: an art museum or or an art gallery space or something that is very ocular centric, and carries a lot of

233
00:35:48.150 –> 00:35:56.459
andy slater: of, you know, visual visual art and that sort of thing. And so I’ve always felt that

234
00:35:57.840 –> 00:36:00.290
andy slater: I think we all do that image description.

235
00:36:00.540 –> 00:36:05.070
andy slater: It’s really important. That’s kind of a way that we can certainly experience art

236
00:36:05.260 –> 00:36:14.989
andy slater: and a lot of the times. These descriptions and stuff are really kind of boring and objective, either come from like an art historian sort of background, or

237
00:36:15.210 –> 00:36:23.859
andy slater: through somebody that felt they needed to throw everything together last minute, and may not have a whole lot of experience or training in doing any kind of description.

238
00:36:25.670 –> 00:36:32.859
andy slater: I personally like subjective and weird and poetic descriptions, also mixed alongside with something maybe more objective or

239
00:36:33.130 –> 00:36:35.489
andy slater: historic. And I think that

240
00:36:35.910 –> 00:36:41.610
andy slater: if we’re thinking about using any kind of Xr or AR in a museum space.

241
00:36:41.940 –> 00:36:47.970
andy slater: giving those kind of opportunities where you could have, you can hear descriptions.

242
00:36:48.250 –> 00:37:02.019
andy slater: and even like from the curator or the artist themselves, just giving extra bits of detail that scanning a QR code and like bumbling around through the gallery itself doesn’t offer

243
00:37:02.210 –> 00:37:07.879
andy slater: right? So with with AR, let’s say you have your your glasses, or something like that.

244
00:37:08.300 –> 00:37:09.660
andy slater: you’d be able to.

245
00:37:09.810 –> 00:37:16.070
andy slater: you know, safely navigate, maybe, and then even get information on whatever artwork is in the room.

246
00:37:16.310 –> 00:37:21.079
andy slater: or that you’re standing in front of using, either, you know, like a

247
00:37:21.610 –> 00:37:27.140
andy slater: object recognition, or even like visual positioning system, or something like that.

248
00:37:29.720 –> 00:37:38.080
andy slater: being able to to hear different descriptions, different voices, different kinds of things, being able to customize it yourself.

249
00:37:39.000 –> 00:37:44.440
andy slater: being able to walk around a space, and even

250
00:37:44.660 –> 00:37:45.870
andy slater: I’m sorry. Hold on a second.

251
00:37:51.960 –> 00:37:53.250
andy slater: I spilled my coffee

252
00:37:56.890 –> 00:38:05.509
andy slater: even getting any other kind of augmented details and that sort of thing, I think, is really important, and I think that these sort of ideas can work over into

253
00:38:05.780 –> 00:38:16.409
andy slater: story storytelling opportunities in different spaces at fair worlds. We have developed these AR walking tours.

254
00:38:16.660 –> 00:38:20.210
andy slater: so to speak. We have one at the Seattle Center.

255
00:38:20.839 –> 00:38:27.279
andy slater: and one currently in Development Central Park, and it’s called Spacetime Adventure Tour. And it’s really fun. And it’s

256
00:38:27.620 –> 00:38:35.950
andy slater: playful. And there’s a lot of image descriptions and and things like that written into the script that the tour guide gives and

257
00:38:36.140 –> 00:38:40.570
andy slater: the narrative. But one thing that we’re thinking about for

258
00:38:40.760 –> 00:38:46.690
andy slater: the future is trying to implement. You know, these these like Gpts, or these sort of

259
00:38:48.490 –> 00:38:56.259
andy slater: these these extensions within the app. So you don’t have to go externally to go to, maybe be my eyes, or

260
00:38:56.910 –> 00:39:02.510
andy slater: pickybot, or something like that, to to get descriptions like being able to embed it into the app

261
00:39:03.110 –> 00:39:10.120
andy slater: itself, and having access to any of these sort of like open source things, or or gaining some kind of license, or something like that.

262
00:39:10.460 –> 00:39:14.069
andy slater: could be really important. And I think that not only for, like

263
00:39:14.370 –> 00:39:20.210
andy slater: you know, our own projects, but for giving other people the opportunity to create their own thing. Right?

264
00:39:20.380 –> 00:39:26.780
andy slater: I feel like myself. I’m always juggling between one thing or the next.

265
00:39:27.060 –> 00:39:30.979
andy slater: you know, having to close out an app to go into something else, having to download

266
00:39:31.340 –> 00:39:40.339
andy slater: a video and bringing it into PiccyBot. If you’re not sure about PiccyBot, it’s P-I-C-C-Y-B-O-T. And it’s an image and video description

267
00:39:40.630 –> 00:40:06.809
andy slater: tool that uses a bunch of different Gpt models like it’s got openai. It’s got deepseek. It’s got gemini. It’s got a bunch of different ones you can use that will all provide like different types of descriptions, and you can customize whether you want it to be sound like a Gen. Z. Person using a whole bunch of slang that I don’t understand, like riz or giving, or whatever

268
00:40:07.290 –> 00:40:12.769
andy slater: whatever makes me feel like an old man, or then something that that is really more straight to the point.

269
00:40:12.890 –> 00:40:15.870
andy slater: which again, you know, gives different

270
00:40:16.690 –> 00:40:21.910
andy slater: different versions of descriptions from different sort of voices, and that sort of thing.

271
00:40:22.150 –> 00:40:26.550
andy slater: Having to download the video, get all of that, and then go back into something else

272
00:40:27.310 –> 00:40:30.660
andy slater: on the fly while I’m out somewhere, is really.

273
00:40:30.970 –> 00:40:37.300
andy slater: it’s really distracting. And I think that a lot of the times, you know, it’s just one of those

274
00:40:37.470 –> 00:40:54.369
andy slater: exhausting things that we have to deal with as blind people trying to juggle all these tools, hopping over to seeing AI in order to read a sign, or I guess if you’re using your metaglasses, you know, doing that sort of thing. But but juggling. And so I think that integrating these things.

275
00:40:54.700 –> 00:40:57.940
andy slater: having the ability to do that into. You know.

276
00:40:58.830 –> 00:41:03.979
andy slater: one app is going to be the most important thing, especially.

277
00:41:04.370 –> 00:41:10.199
andy slater: you know, moving forward in these spaces like museums or in public spaces, with some kind of like storytelling or

278
00:41:10.340 –> 00:41:12.780
andy slater: other things that you might be using.

279
00:41:13.820 –> 00:41:19.250
andy slater: And you know this could all work into games as well right, if you had

280
00:41:19.460 –> 00:41:28.160
andy slater: an AI assistant, that when you walk into a 3D. Space or a virtual space, and it gives you an idea of what’s in the room, and you somehow can interact with it.

281
00:41:30.340 –> 00:41:44.190
andy slater: just there on the fly without having to, you know, get any other kind of assistance, like. If if something is telling you, the the room is this big, these objects are here, this Npc is moving towards you, and that sort of thing which I believe we’ve seen some

282
00:41:44.800 –> 00:41:49.629
andy slater: in some examples of games and that sort of thing, but

283
00:41:50.060 –> 00:41:55.390
andy slater: none of it necessarily seems to be intuitive or practical.

284
00:41:55.780 –> 00:42:01.500
andy slater: and I think that if you were able to incorporate some kind of thing like this using machine learning that

285
00:42:01.770 –> 00:42:23.459
andy slater: gets accustomed to the user, to the player, to yourself. And and you know, maybe you want a description of a space. But you just want a really quick one. It’s it’s used to you asking for those sort of things just like if you’ve had conversations with Chatgpt, you know, it gets to know you. And I think you know, giving these opportunities to.

286
00:42:23.780 –> 00:42:26.380
andy slater: you know, if you’re a user.

287
00:42:27.310 –> 00:42:33.360
andy slater: Sorry I spilled coffee again, giving these opportunities and these options

288
00:42:35.160 –> 00:42:42.699
andy slater: we’ll make, we’ll make things just seem easier. And I think that’s really kind of what I want to get at is like what is going to make

289
00:42:42.860 –> 00:42:48.909
andy slater: these these experiences and opportunities easier for blind users

290
00:42:49.530 –> 00:42:55.190
andy slater: and just more customized and personable, so that

291
00:42:55.310 –> 00:43:02.479
andy slater: you’re not fumbling around with something that might even be based in ableism or medical model. And those sort of things.

292
00:43:02.750 –> 00:43:05.930
andy slater: My time is up, and I feel like I rambled. But

293
00:43:06.510 –> 00:43:11.330
andy slater: we have time in the Q. And A, so yeah, thanks.

294
00:43:11.490 –> 00:43:13.069
andy slater: I got to clean this coffee up

295
00:43:13.590 –> 00:43:27.179
Dylan Fox: Thank you, Andy. No, that is some fantastic fuel for the the conversation, I think. Yeah, drink. Drink the rest of your coffee before you spill it. Then we’ll I’m a hundred percent sure. We’ll come back to some of these during the the main discussion.

296
00:43:27.763 –> 00:43:33.759
Dylan Fox: But 1st I want to give our last featured speaker a chance. Thomas. Take it away

297
00:43:35.690 –> 00:43:42.569
Thomas Logan: Great. Hello! Everyone won’t take any sips of any liquids and no spilling here, I hope.

298
00:43:42.810 –> 00:43:51.550
Thomas Logan: Alright, I’m gonna show you all the presentation today. I think it’s 1 very good for conversation, and excited to see so many people

299
00:43:51.730 –> 00:44:00.390
Thomas Logan: here in the group, either that I’ve met or not. I’m the owner of a company equal entry. And why humans always matter with AI,

300
00:44:00.950 –> 00:44:26.960
Thomas Logan: I’m gonna always try to be very topical. So today I wanted to talk about. There’s a success criteria and getting geeky for one second for the accessibility standards about having captions for live presentations like me talking right now, and the idea is that we should have captions provided synchronized text for all audio content in real time videos.

301
00:44:27.110 –> 00:44:37.660
Thomas Logan: So this is a kind of hot topic on social media. Right now there is a topic. Cart is dead. Long live ASR.

302
00:44:37.950 –> 00:44:49.229
Thomas Logan: CART stands for Communication Access Realtime Translation, and ASR standing for Automatic Speech Recognition, and automatic speech recognition being an application

303
00:44:49.390 –> 00:45:05.160
Thomas Logan: of artificial intelligence. This person, Kate Kalcevich from Fable, talked about. She’s a person who is deaf or hard of hearing. I recently attended 11 sessions at a conference, each with a different stenographer.

304
00:45:05.520 –> 00:45:27.969
Thomas Logan: basically a human person who is listening to the presentation and typing the communication real time onto the screen, she said. The quality varied between the different individuals who are providing the cart and the 11 sessions. But even the most accurate and fastest stenographer couldn’t match my experience with ASR automatic speech recognition.

305
00:45:28.420 –> 00:45:37.699
Thomas Logan: So in this particular discussion topic, the heading is very sensational, but I think it's a good one to get the discussion going, a good one for us to have a discussion about.

306
00:45:37.850 –> 00:45:40.050
Thomas Logan: This is basically saying

307
00:45:40.260 –> 00:45:49.789
Thomas Logan: at this point, for providing live translation or meeting the Web Content Accessibility Guidelines criterion 1.2.4 for

308
00:45:50.961 –> 00:45:55.950
Thomas Logan: live captions, it's better to use AI than to use a human.

309
00:45:56.470 –> 00:46:02.980
Thomas Logan: And honestly, if you read the full article which I’ll I’ll paste into our chat after I finish this 10 min, so you all can see the whole

310
00:46:03.100 –> 00:46:19.449
Thomas Logan: conversation, maybe converse there as well. There's a lot of other nuance in this article beyond just the headline, but I think it was interesting to talk about, because this has been, and still is, very interesting to me.

311
00:46:19.904 –> 00:46:44.619
Thomas Logan: I had done work on this as well, thinking about it in a multilingual context, with the idea of: why don't we make this more complicated? We want live captions, but we also want live captions in multiple languages, because we may have people attending from different parts of the world where English is not the first language. And one of the cool things

312
00:46:45.545 –> 00:46:49.899
Thomas Logan: with AI technologies is the ability to also live, translate

313
00:46:50.250 –> 00:46:52.660
Thomas Logan: on the fly into any language.

314
00:46:53.133 –> 00:47:06.789
Thomas Logan: And so this was a presentation I did in the past. On the screen I have some text in English that says, "Provide subtitles for dialogue and sound effects," and then a human translated that into Japanese kanji characters

315
00:47:06.950 –> 00:47:08.030
Thomas Logan: on the screen.

316
00:47:11.260 –> 00:47:14.320
Thomas Logan: Oh, I didn’t share. Let me share my screen again with sound.

317
00:47:21.080 –> 00:47:22.919
Thomas Logan: Bear with sound

318
00:47:23.780 –> 00:47:31.460
Thomas Logan: one of the cool VR experiences that is no longer alive today was AltspaceVR,

319
00:47:32.200 –> 00:47:33.080
Thomas Logan: but

320
00:47:33.230 –> 00:47:40.890
Thomas Logan: which was purchased by Microsoft; they don't have this technology anymore. But one of the ideas in Altspace was, when we meet up as virtual avatars

321
00:47:41.150 –> 00:47:48.480
Thomas Logan: in a shared space such as, like the meeting we are in today, we could turn on a feature where we would have captions shown

322
00:47:49.640 –> 00:48:08.699
Thomas Logan: in our chosen language, and we could be translating from whatever language that person was speaking (not really every language). But we could say, my spoken language is English, but I want to view the captions in Japanese; or, for example, my spoken language is English, and I want to view the captions in English.
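A minimal sketch of the per-viewer caption idea described here, assuming hypothetical `transcribe_chunk` (ASR) and `translate` (machine translation) helpers passed in as parameters rather than any specific vendor API:

```python
from dataclasses import dataclass

@dataclass
class Viewer:
    name: str
    caption_language: str  # e.g. "en", "ja"

def caption_for_viewers(audio_chunk, spoken_language, viewers, transcribe_chunk, translate):
    """Return one caption string per viewer, each in that viewer's chosen language.

    transcribe_chunk and translate are assumed helpers (ASR and machine
    translation); they are not a specific vendor API.
    """
    source_text = transcribe_chunk(audio_chunk, language=spoken_language)
    captions = {}
    for viewer in viewers:
        if viewer.caption_language == spoken_language:
            captions[viewer.name] = source_text  # no translation needed
        else:
            captions[viewer.name] = translate(
                source_text, source=spoken_language, target=viewer.caption_language
            )
    return captions
```

The point is simply that the speaker's language and each viewer's caption language are independent choices.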

323
00:48:08.930 –> 00:48:16.050
Thomas Logan: But we made a screen demo of.

324
00:48:18.600 –> 00:48:21.320
Thomas Logan: Yeah, yeah, that makes us good.

325
00:48:22.020 –> 00:48:26.689
Thomas Logan: Yeah. And this is correct. This is in mine and Eichmann text funding.

326
00:48:29.670 –> 00:48:37.739
Thomas Logan: So basically on the screen, we had a person, Roland Dubois, speaking in German and on the screen. It was being displayed visually in Japanese.

327
00:48:39.900 –> 00:48:44.949
Thomas Logan: And then this one. We’ll do Japanese to English. It’s working, it’s working.

328
00:48:49.080 –> 00:48:53.300
Thomas Logan: But I said, There you go.

329
00:48:53.820 –> 00:48:54.620
Thomas Logan: It’s more

330
00:48:58.690 –> 00:49:05.670
Thomas Logan: kanari reopening communication.

331
00:49:05.980 –> 00:49:06.690
Thomas Logan: But the more

332
00:49:07.740 –> 00:49:32.439
Thomas Logan: So basically, what was interesting in this whole presentation, what I found (and I'm putting two ideas together here in 10 minutes) is: AI translation is never 100% accurate, and as I was saying in the beginning, neither were human stenographers. But one of the ideas that I want to discuss is, how do we have

333
00:49:33.350 –> 00:49:57.709
Thomas Logan: AI to use, but still enable a human to be involved, to either clean up the AI or improve the AI experience. So I had the opportunity to build something called Polly, P-O-L-L-Y. I was living in Tokyo and was attending a meetup that was being translated live into both English and Japanese.

334
00:49:58.440 –> 00:50:05.780
Thomas Logan: and I think, listening to someone speak a language you don’t understand is boring and confusing. This is also.

335
00:50:05.920 –> 00:50:12.636
Thomas Logan: if I’m deaf and I’m at your meetup or your event, and you have no live captions. It’s boring

336
00:50:12.990 –> 00:50:14.250
terry Fullerton: Like my sample

337
00:50:15.640 –> 00:50:19.769
Thomas Logan: So you’ll probably fall asleep or you’re definitely not gonna be engaged if you can’t participate.

338
00:50:20.450 –> 00:50:29.440
Thomas Logan: So how can we improve the experience? One? You could hire a translator in this experience. And so this would oftentimes be very cumbersome

339
00:50:29.460 –> 00:50:53.170
Thomas Logan: When I was living in Tokyo, if you had a translator, you basically had to first speak in English, then wait for the translator to process the English, and then have that person say the same thing in Japanese. So a 10-minute presentation like what I'm doing right now would take at least 20 minutes, because you have to account for speaking the same amount of words in another language.

340
00:50:54.569 –> 00:50:55.229
Thomas Logan: So

341
00:50:55.840 –> 00:51:01.199
Thomas Logan: really true inclusion. And this is going, you know, just so broadly is we would want everyone to be able to understand

342
00:51:01.700 –> 00:51:06.019
Thomas Logan: and participate in any event, and in whatever language they need to do.

343
00:51:06.740 –> 00:51:13.475
Thomas Logan: So. This is my 2 Tokyo Meetup. We we would do this. And what we thought about was,

344
00:51:14.150 –> 00:51:20.810
Thomas Logan: we’re always aiming for a hundred percent accuracy. Whether we were just translating my speech right now from English.

345
00:51:20.970 –> 00:51:39.729
Thomas Logan: and we're using AI, we want a hundred percent accuracy. If we're translating from English to Japanese, we still want 100% accuracy. And basically, what I found living in Tokyo was, when people would speak in Japanese, I think the Japanese captions were probably getting 90% accuracy.

346
00:51:39.860 –> 00:51:54.080
Thomas Logan: Still, not a hundred percent accuracy even for a Japanese person who is deaf. But the accuracy for someone reading those captions in English was really really low. Basically unusable. You you couldn’t follow the conversation.

347
00:51:55.010 –> 00:52:09.920
Thomas Logan: So Japanese to English was very flawed. So when we made Polly, the idea (I'm not going to show the demo, because we don't have a lot of time) was: we basically made a Google document that would take the live translation, the auto-translation, from

348
00:52:10.380 –> 00:52:22.720
Thomas Logan: Japanese to English and a person a human would go in and be able to benefit from the AI, but then also be able to edit what the AI was doing, so how that would work

349
00:52:22.840 –> 00:52:25.339
Thomas Logan: for my meet up, for example.

350
00:52:25.460 –> 00:52:30.779
Thomas Logan: in the Us. I would be the presenter in English. I had a human captioner

351
00:52:31.333 –> 00:52:51.419
Thomas Logan: Mirabai Knight, typing in English; we worked with her for 11 years at our meetup. And then we had Kenji Yanagawa, who was a translator and could convert from English to Japanese. He's not a stenographer, so he couldn't type at the speed he would need to produce the captions, but he could clean up the captions that were getting automatically translated.

352
00:52:51.790 –> 00:52:59.660
Thomas Logan: And here on the screen, basically, I think one of the goals we should always have is enable humans and AI to work together to make a great experience for everyone.

353
00:53:00.548 –> 00:53:11.580
Thomas Logan: And I think in the discussion I want to ask: how do we insert humans to always be a part of overriding or improving decisions made by AI, to have a human touch? Because I think,

354
00:53:11.720 –> 00:53:25.570
Thomas Logan: no matter what domain we talk about, or any of the topics we've talked about, I fully support having AI to enable new possibilities. But I think, as a design mechanism or as a discussion point,

355
00:53:25.730 –> 00:53:27.869
Thomas Logan: how do we have

356
00:53:28.100 –> 00:53:34.959
Thomas Logan: humans involved in that process? And since I have about one minute, I'm going to show you the Polly site.

357
00:53:35.582 –> 00:53:38.960
Thomas Logan: basically in our process. What would happen?

358
00:53:39.526 –> 00:53:44.000
Thomas Logan: I’m just showing visually here. But the Google document would show

359
00:53:44.230 –> 00:53:59.730
Thomas Logan: the two languages. Actually, on the screen I'm showing English and Spanish, because it's probably more familiar to the English-speaking world. But we had an example of when I said, "Hello, everyone. Thank you for being here with us at A11yNYC today."

360
00:54:00.130 –> 00:54:12.700
Thomas Logan: A11yNYC, that might be pronounced as "Ally NYC," Accessibility NYC; this would be a typical thing for an automatic speech recognition tool to have a problem with, because we're using an acronym.

361
00:54:12.940 –> 00:54:23.999
Thomas Logan: But our human captioner does a good job. You know. We work with her. She knows this acronym. So she types that perfectly. But when we get into the Google document.

362
00:54:24.140 –> 00:54:46.059
Thomas Logan: there is a mistake in the Spanish. And so what happens is, once we correct or change the tense of the language, we can have that fixed in the Google document and then displayed visually on the screen for someone reading it in Spanish, as just one simple idea of humans being involved in the AI process.
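A minimal sketch of the human-in-the-loop arrangement Thomas describes, assuming a generic shared buffer standing in for the Google Doc and a hypothetical `translate` helper; corrections from the bilingual editor win over the machine draft:

```python
class SharedCaptionDoc:
    """Stand-in for the shared, editable document: the machine writes draft
    lines, a human editor may overwrite any of them, and the display always
    shows the latest (possibly corrected) version."""

    def __init__(self):
        self.lines = []  # each entry: {"draft": str, "edited": str or None}

    def add_machine_draft(self, text):
        self.lines.append({"draft": text, "edited": None})
        return len(self.lines) - 1

    def human_edit(self, index, corrected_text):
        self.lines[index]["edited"] = corrected_text

    def display_text(self):
        # A human correction wins whenever it exists; otherwise show the AI draft.
        return [line["edited"] or line["draft"] for line in self.lines]

def run_pipeline(caption_stream, translate, doc, target_language="es"):
    """caption_stream yields the human captioner's English lines; translate is
    an assumed machine-translation helper. A bilingual editor fixes lines in
    `doc` as they appear, before they are shown on screen."""
    for english_line in caption_stream:
        machine_draft = translate(english_line, source="en", target=target_language)
        doc.add_machine_draft(machine_draft)
```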

363
00:54:46.640 –> 00:54:48.250
Thomas Logan: Okay, it’s about 10 min

364
00:54:52.570 –> 00:54:54.628
Dylan Fox: Awesome. Thanks, Thomas, this is

365
00:54:55.180 –> 00:54:58.331
Dylan Fox: Let me PIN. There we go.

366
00:54:59.429 –> 00:55:09.109
Dylan Fox: Awesome presentation here. It's very, very true that we need to keep humans in the loop and try for the best of both worlds, not just purely one or purely the other.

367
00:55:10.157 –> 00:55:32.393
Dylan Fox: Okay, so we've now got 45 minutes for the rest of the discussion here. I want to open it up to people. I've certainly got a couple of open questions that could keep us going, but if people have burning questions, either for the panelists or just something they wanted to pitch to the group, please go ahead and

368
00:55:33.080 –> 00:55:39.789
Dylan Fox: either just unmute and speak out, or, if you want, raise your hand, and then we'll call on you and kind of moderate.

369
00:55:40.140 –> 00:55:42.099
Dylan Fox: But yeah, please go ahead

370
00:55:44.760 –> 00:55:45.890
Kevin Cao: So

371
00:55:46.320 –> 00:55:58.429
Kevin Cao: I liked the first presentation, because it benefits a lot of us that are totally blind, partially blind, and whatever it is. But

372
00:55:59.100 –> 00:56:16.950
Kevin Cao: is there like a site that you're gonna be posting, so that if some of us are interested, we can go in there and get some of this stuff? Or do you have to have a VR headset in order to do this? Or can you

373
00:56:17.340 –> 00:56:25.939
Kevin Cao: use another device besides a VR headset? Because not a lot of people can afford all this expensive VR equipment.

374
00:56:28.630 –> 00:56:35.014
Yuhang Zhao: Yeah, thank you, Kevin, for the question. So a lot of our prototypes are in

375
00:56:35.610 –> 00:57:00.170
Yuhang Zhao: the prototype phase, because we are a research team, not an industry group. But first of all, as a professor at the university, I do have a homepage where we organize all of our completed and ongoing research projects, so that you can see the live demos and links, and eventually, for things that we can open source, for example, both

376
00:57:00.170 –> 00:57:12.360
Yuhang Zhao: CookAR and also VRSight, after we completely finish them, we're going to open source them on GitHub, and there will also be a link attached to my website as well. And I'm going to just

377
00:57:12.360 –> 00:57:35.330
Yuhang Zhao: put it here quickly so that, if you or other folks are interested, you can take a look. But for now VRSight is not really on it, because we only finished the development and we didn't completely finish our evaluation yet. So we want to wait until everything is more mature so that we can announce it.

378
00:57:35.630 –> 00:57:38.549
Yuhang Zhao: But we are trying our best to open source things

379
00:57:39.440 –> 00:57:58.360
Dylan Fox: And Kevin, I would say as well that we have our website, xraccess.org. We have a resources page, so I'm sharing that as well as our GitHub that has some of these things. Now, those are right now mostly focused on developers.

380
00:57:58.814 –> 00:58:22.240
Dylan Fox: But I think if there's interest in having kind of a collection of, for example, these different AI tools, like PiccyBot and, I think, Envision's companion we were just talking about before the event started, we can also add those there so that we have a repository of tools for blind users as well, because I think that would also be really useful.

381
00:58:23.377 –> 00:58:35.719
Dylan Fox: So we'll be sure to add those. And if people have suggestions, definitely post them in chat now, or you can email them to me at dylan@xraccess.org, and I'll make sure we get those on there as well.

382
00:58:40.590 –> 00:58:41.450
Jesse Anderson: Okay, thanks.

383
00:58:41.450 –> 00:58:42.050
Jesse Anderson: This is

384
00:58:45.410 –> 00:58:54.459
Jesse Anderson: Hello, this is Jesse. Just real quick: yeah, these were really all good presentations. And for the first session, I'm

385
00:58:54.930 –> 00:59:00.609
Jesse Anderson: especially intrigued about the sort of live VR portion of that.

386
00:59:01.267 –> 00:59:11.552
Jesse Anderson: For the second presentation, I was thinking a lot about, you know, I've used PiccyBot, I've used Be My Eyes, Seeing AI,

387
00:59:12.230 –> 00:59:16.320
Jesse Anderson: and what Dylan had just mentioned was that

388
00:59:16.937 –> 00:59:27.270
Jesse Anderson: I think, like I said, I thought they still called it Ally, but it's from the Envision group, and it's a live conversational

389
00:59:27.700 –> 00:59:36.099
Jesse Anderson: AI, and I’ve been in a few of their sort of. They kind of have these monthly meetings, Zoom Meetings, town hall type things, and

390
00:59:36.820 –> 00:59:45.660
Jesse Anderson: like what I’m like, they currently work with the envision glasses right now as well as just an app on your phone.

391
00:59:46.000 –> 01:00:00.850
Jesse Anderson: And I really wish, like, I've seen the Envision glasses, and they're using, I think, old Google Glass, which is really many years old now, and they are bulky, and to me not very comfortable at all.

392
01:00:03.000 –> 01:00:06.990
Jesse Anderson: But you know, as someone who has the meta glasses as well.

393
01:00:08.780 –> 01:00:31.710
Jesse Anderson: you know, you're looking at, whether it's a museum or just something in general. Right now, this live conversational thing with AI for Ally, the thing that I think would be intriguing about it, for those who are unfamiliar with it: it's a live conversational AI,

394
01:00:31.750 –> 01:00:39.300
Jesse Anderson: and they call it Ally, because when you start up the app, it asks you a few different onboarding questions like.

395
01:00:39.450 –> 01:00:47.189
Jesse Anderson: you know, what what would you like me to know about you? What do you like? What do you not like? How do you want the AI to behave?

396
01:00:47.540 –> 01:00:53.770
Jesse Anderson: But then you. Actually, it comes with 4 sample AI

397
01:00:54.410 –> 01:00:59.010
Jesse Anderson: things that you can either further customize or you can create your own AI

398
01:00:59.120 –> 01:01:04.089
Jesse Anderson: and or your own ally. And what’s nice about that is

399
01:01:04.410 –> 01:01:15.089
Jesse Anderson: for different modes, you know, for different needs. Within the same app, you could have different live conversational allies, essentially, maybe one that is geared toward

400
01:01:15.470 –> 01:01:19.789
Jesse Anderson: how I want, how I prefer to have something described in a museum

401
01:01:19.910 –> 01:01:29.279
Jesse Anderson: versus how I might want them to read something versus how I just want to converse about something, and it doesn’t have to all involve the camera

402
01:01:29.570 –> 01:01:30.770
Jesse Anderson: because

403
01:01:30.960 –> 01:01:38.409
Jesse Anderson: it works as a live conversational chatbot, too. So I posted a link to a video in the chat

404
01:01:38.630 –> 01:01:43.469
Jesse Anderson: where I kind of demoed some of these things a few weeks ago.
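A rough sketch of how onboarding answers and per-persona preferences like the ones Jesse describes could be folded into a system prompt; the class and field names are illustrative assumptions, not Envision's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class AssistantPersona:
    name: str                      # e.g. "Museum guide", "Drum coach"
    description_style: str         # how this persona should describe things
    extra_instructions: list = field(default_factory=list)

@dataclass
class UserProfile:
    about_me: str                  # answers to the onboarding questions
    likes: list
    dislikes: list

def build_system_prompt(profile, persona):
    """Fold the user's onboarding answers and the chosen persona into one
    system prompt for whatever chat model sits underneath (illustrative only)."""
    parts = [
        f"You are '{persona.name}', an assistant for a blind or low-vision user.",
        f"Describe things in this style: {persona.description_style}.",
        f"About the user: {profile.about_me}",
        f"The user likes: {', '.join(profile.likes)}.",
        f"The user dislikes: {', '.join(profile.dislikes)}.",
    ]
    parts.extend(persona.extra_instructions)
    return "\n".join(parts)

# Different "allies" are just different personas layered over the same profile:
museum_guide = AssistantPersona("Museum guide", "rich visual detail, one object at a time")
document_reader = AssistantPersona("Document reader", "verbatim text first, layout second")
```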

405
01:01:43.590 –> 01:01:52.419
Jesse Anderson: And you can use it, like, it's really open-ended, because you can have it use the camera and describe, you know, it'll take a picture of something.

406
01:01:52.860 –> 01:02:00.579
Jesse Anderson: But you can also just, you know whether you want to create like a you know, you want to ask it a fact. I’ve used it to

407
01:02:01.203 –> 01:02:23.779
Jesse Anderson: like, oh, what was that movie that I watched 20 years ago? I remember little vague bits of it, and it helped me remember the movie. Or I can do a simple, almost like a D&D, choose-your-own-adventure sort of campaign. I could have it generate trivia. You could have it help you study for tests. I've used it for,

408
01:02:24.210 –> 01:02:43.469
Jesse Anderson: like, I'm playing and practicing drums, so I could have one ally where I tell it the songs and the type of music I like and what I've learned, and then it can give me better suggestions at my skill level for what it could recommend for me.

409
01:02:43.580 –> 01:02:48.269
Jesse Anderson: But I know at these town halls. What they were talking about was

410
01:02:48.540 –> 01:03:03.969
Jesse Anderson: eventually they want to get toward a live camera model. Because right now you can say, hey, what am I looking at, or what am I holding, or what does this say, or can you read me the directions? And you don't have to say it in a

411
01:03:04.080 –> 01:03:08.160
Jesse Anderson: really specific syntax way, which is what I like about it.

412
01:03:08.480 –> 01:03:17.560
Jesse Anderson: But they are working toward a live camera thing. So like you right now, it can only look at what’s there when you take a picture.

413
01:03:18.249 –> 01:03:31.919
Jesse Anderson: But eventually, hopefully, we'll get to the point where, you know, oh, I dropped my keys on the floor, and I could say, hey, I'm gonna pan my phone around, let me know when you see my keys. Or let me know when the gray

414
01:03:32.370 –> 01:03:38.440
Jesse Anderson: Ford or whatever arrives from my Uber or my Lyft, that kind of functionality.

415
01:03:38.600 –> 01:03:49.787
Jesse Anderson: So if you could combine live conversation and these custom allies along with live video,

416
01:03:50.570 –> 01:03:59.120
Jesse Anderson: when that finally, hopefully, happens, I would love to have them put it on a more modern pair of glasses,

417
01:03:59.320 –> 01:04:12.879
Jesse Anderson: like the Meta glasses or something like it. And even having a computer version that just monitors your screen; I'm waiting for the day when that happens. Ally is also on the web as well,

418
01:04:13.150 –> 01:04:15.819
Jesse Anderson: and if it could just monitor my screen

419
01:04:16.070 –> 01:04:29.350
Jesse Anderson: and my microphone, and I could just be playing a game or looking at a map or a diagram, and just live chat with an ally to say, Describe this image to me, or, Oh, hey! Let me know when my health gets below 50%, or

420
01:04:29.770 –> 01:04:32.534
Jesse Anderson: what does that thing say above the door?
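A hedged sketch of the "let me know when you see my keys" pattern Jesse is describing: poll camera frames and ask a vision-language model a yes/no question about each one. `grab_frame`, `ask_vision_model`, and `announce` are stand-ins for a camera API, a vision-language model call, and text-to-speech, not real library calls:

```python
import time

def watch_for(condition, grab_frame, ask_vision_model, announce,
              interval_s=2.0, timeout_s=300.0):
    """Poll camera frames and announce once `condition` is visible, e.g.
    "my keys are on the floor" or "a gray car has pulled up".

    grab_frame, ask_vision_model, and announce are assumed stand-ins for a
    camera API, a vision-language model call, and text-to-speech.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        frame = grab_frame()
        answer = ask_vision_model(frame, question=f"Answer yes or no: {condition}")
        if answer.strip().lower().startswith("yes"):
            announce(f"I can see it now: {condition}")
            return True
        time.sleep(interval_s)  # avoid sending every single frame to the model
    announce("I didn't spot it before the time ran out.")
    return False
```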

421
01:04:33.540 –> 01:04:40.649
Jesse Anderson: I think that's where, like, Ally is one of the closest things I've seen. And once they get

422
01:04:41.300 –> 01:04:49.079
Jesse Anderson: to the point where you can live video, monitor or camera monitor. And if you combine that with glasses.

423
01:04:50.240 –> 01:04:56.000
Jesse Anderson: that’s sort of where I’m hoping things are gonna go. And I’m hoping it happens sooner rather than later.

424
01:04:56.670 –> 01:04:58.940
Jesse Anderson: So just a couple ideas out there.

425
01:04:59.660 –> 01:05:02.929
Jesse Anderson: And I put a little item in the chat for the third,

426
01:05:04.440 –> 01:05:07.970
Jesse Anderson: A couple thoughts on that on that 3rd presentation, too.

427
01:05:08.480 –> 01:05:10.110
Jesse Anderson: So I’ll be quiet now.

428
01:05:11.301 –> 01:05:15.379
andy slater: Hey? It’s Andy, if I if I can hop on to some of that

429
01:05:15.380 –> 01:05:16.110
Jesse Anderson: Yeah.

430
01:05:16.110 –> 01:05:22.180
andy slater: Yeah, no, I haven’t used Ally myself, but I have been reading about it, and it

431
01:05:22.180 –> 01:05:24.820
Jesse Anderson: Okay, I’ve been Beta testing it for like a few months

432
01:05:24.820 –> 01:05:34.360
andy slater: Cool. No, no, that’s awesome. And I think that if it’s something that you know you’re talking about screen sharing with a game or something, and it being able to

433
01:05:34.670 –> 01:05:43.570
andy slater: monitor all of that, wondering, you know, if you can integrate it in in a particular app, or have it as an extension or an add-on for things that already exist

434
01:05:43.570 –> 01:05:44.210
Jesse Anderson: Right.

435
01:05:44.210 –> 01:05:46.259
andy slater: Would be, would be wonderful and and

436
01:05:46.260 –> 01:05:55.060
Jesse Anderson: The app did just come out for Android, iOS, and the web as of last Monday. So you can go play with it now.

437
01:05:55.230 –> 01:05:56.560
andy slater: It’s public.

438
01:05:56.560 –> 01:05:57.690
andy slater: Do you know I will?

439
01:05:58.460 –> 01:05:58.980
andy slater: I

440
01:05:58.980 –> 01:05:59.570
Jesse Anderson: And you

441
01:05:59.850 –> 01:06:04.309
andy slater: Have you or anybody else in in the meeting here?

442
01:06:05.580 –> 01:06:11.330
andy slater: used ChatGPT's voice mode with the live camera?

443
01:06:11.790 –> 01:06:20.289
andy slater: It’s conversational. It’s pretty pretty phenomenal. I sat down the 1st night I figured out what it was.

444
01:06:20.500 –> 01:06:29.350
andy slater: and had it give me like play-by-play, live audio description of Mary Poppins, and it was so on point. It knew the characters, names, and knew all of this stuff.

445
01:06:29.530 –> 01:06:38.280
andy slater: And then I tried it with Uncut Gems, and it was like, I don't know how to do this. So I think it was the fact that maybe it knew something about Mary Poppins, or was trained on that.

446
01:06:39.170 –> 01:06:53.220
Jesse Anderson: This is Jesse. To my knowledge, I have played with the Google one a little bit, and currently, I think it has the same limitation that Ally does, which is that it's only looking at the image when you tell it to.

447
01:06:53.470 –> 01:07:02.999
Jesse Anderson: It can't just live-monitor and let you know when things change, like when somebody enters the room. Or if you're watching a movie, it's just telling you

448
01:07:03.690 –> 01:07:07.390
Jesse Anderson: one moment in time, that’s all it can do right now

449
01:07:07.390 –> 01:07:08.660
andy slater: Yeah, you need to

450
01:07:08.660 –> 01:07:10.700
Jesse Anderson: And that's right where Ally's at.

451
01:07:10.700 –> 01:07:19.039
andy slater: Yeah, and I like that. And with the GPT, you know, I was having it describe my room; I wanted to get a rug that matched my

452
01:07:19.240 –> 01:07:32.909
andy slater: my couch, and it was describing it to me. And then my dog ran into frame, and it said, "Who's that?" And I said, "Oh, that's Ozzy. Hi, Ozzy!" And then, 10 minutes later, still doing it, the dog came in, and it was like, "Oh, it's Ozzy! He's everywhere you want to be." So it's that sort of fun

453
01:07:33.170 –> 01:07:40.060
andy slater: and very personable, personable interaction that, you know. If that’s something that we’re talking

454
01:07:40.960 –> 01:08:02.099
andy slater: With Ally, what was really intriguing about that, talking about having the different question prompts in the beginning when you launch the app for the first time, is setting those preferences: who are you, what do you like, are you an impatient person, that sort of thing. I don't know what the questions are, I'm just, you know.

455
01:08:02.100 –> 01:08:06.619
Jesse Anderson: Yeah, and the other thing I will say as a limitation.

456
01:08:07.450 –> 01:08:11.560
Jesse Anderson: it’ll remember everything during a current session that you do.

457
01:08:11.670 –> 01:08:15.990
Jesse Anderson: But they haven’t built in like overall, like, if you know

458
01:08:16.130 –> 01:08:24.600
Jesse Anderson: like, let’s say I was using a version of the ally to do music tutoring, or, like, you know, drum drum lessons or something

459
01:08:25.040 –> 01:08:30.919
Jesse Anderson: it wouldn’t like if I started the app tomorrow. It wouldn’t remember our conversation today.

460
01:08:31.060 –> 01:08:34.479
Jesse Anderson: but that’s something that they are working toward.

461
01:08:34.830 –> 01:08:35.420
andy slater: Yeah.

462
01:08:35.700 –> 01:08:43.120
Jesse Anderson: And you'd have to put it in your bio, basically, in the settings, to give it long-standing information.
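A minimal sketch of the memory split Jesse describes: per-session conversation history that is thrown away afterwards, plus a small persistent bio the user edits in settings and that gets re-injected each session. The file name and message format are illustrative assumptions:

```python
import json
from pathlib import Path

BIO_PATH = Path("user_bio.json")  # illustrative location for the persistent bio

def load_bio():
    """Long-standing facts the user chose to save via the settings screen."""
    if BIO_PATH.exists():
        return json.loads(BIO_PATH.read_text()).get("bio", "")
    return ""

def start_session():
    """A new session starts empty except for the persistent bio; everything
    said during the session lives only in `history` and is dropped afterwards."""
    return {"bio": load_bio(), "history": []}

def build_context(session, new_user_message):
    messages = [{"role": "system",
                 "content": f"Things to remember about the user: {session['bio']}"}]
    messages += session["history"]
    messages.append({"role": "user", "content": new_user_message})
    return messages

def record_turn(session, user_msg, assistant_msg):
    session["history"].append({"role": "user", "content": user_msg})
    session["history"].append({"role": "assistant", "content": assistant_msg})
```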

463
01:08:43.740 –> 01:08:50.110
andy slater: I think that you know the the more like machine learning like that incorporated into things that we will use

464
01:08:50.380 –> 01:08:52.790
andy slater: a lot is important, like I’ve

465
01:08:52.979 –> 01:08:59.299
andy slater: going back to Chat Gpt. I have enough conversations with it, and and use it for a lot of things, especially

466
01:08:59.580 –> 01:09:08.500
andy slater: note taking and brainstorming and things. And and just, for example, I was asking it, there’s an an inaccessible audio software

467
01:09:08.600 –> 01:09:16.509
andy slater: platform called Max/MSP. And just as an aside, it's something I've been bugging them to make accessible for screen readers for a while,

468
01:09:16.740 –> 01:09:24.879
andy slater: and I was talking to it, asking if they had any solutions, and saying that I'd talked to the company and they didn't want to prioritize it.

469
01:09:24.990 –> 01:09:26.699
andy slater: And the way that it answered is that

470
01:09:26.899 –> 01:09:51.419
andy slater: not making this accessible is lazy and ableist, and I was just like, this thing's listening to me way too much, it's picking up my own character. But that's what I like, and I really think that would be a fun and cool way of customizing something, like I had said, like the example of maybe you're in a first-person game or something like that and you want a description of the space, and

471
01:09:51.640 –> 01:10:08.520
andy slater: you know, if it knew you well enough, it could automatically just assume and predict what you might be asking, and that sort of thing. And I don't know how close we are to that being open source, or easy or cheap to obtain. You know, we've talked a million times before about how

472
01:10:08.950 –> 01:10:17.440
andy slater: expensive you know some of the apps and and the hardware that we we would need to use to gain more access. And so

473
01:10:19.550 –> 01:10:27.960
andy slater: yeah, I don’t know. I’m I’m excited to to dig into that, and you know anything else that anybody else would recommend to mess around with. I’m

474
01:10:28.390 –> 01:10:29.640
andy slater: I’m all about that

475
01:10:30.720 –> 01:10:31.190
Dylan Fox: Yeah, I mean, I think.

476
01:10:31.190 –> 01:10:39.089
Jesse Anderson: The one feature of ally that I’m waiting for is the the more memory from session to session. So it does learn you your your

477
01:10:39.300 –> 01:10:41.059
Jesse Anderson: subtleties like you were talking about

478
01:10:41.060 –> 01:10:41.610
andy slater: Yeah.

479
01:10:42.210 –> 01:10:53.227
Dylan Fox: But that does lead, this is Dylan, that leads to one of the questions I did want to discuss here, which is privacy. Right? When you tell the chat, yes, this is my dog so-and-so,

480
01:10:53.880 –> 01:11:04.113
Dylan Fox: who else then eventually kind of knows that information, right? How can we have confidence in the groups that run these models?

481
01:11:04.670 –> 01:11:07.479
Dylan Fox: you know, at the base level? Are they

482
01:11:07.550 –> 01:11:29.140
Dylan Fox: feeding the information into the AI in kind of a big amalgamated way? But then, are they also taking our information and specifically attributing it to us and knowing, okay, you've asked us to look at your bank statement, and now that's somehow gonna end up in the hands of the developers, right? Which,

483
01:11:29.150 –> 01:11:38.470
Dylan Fox: as far as I know, is not necessarily happening. But for me and a lot of people, it's hard to have confidence that that isn't happening

484
01:11:38.630 –> 01:11:53.549
Dylan Fox: without, you know, for example, a lot of government oversight. And so I wonder if people have ideas about that, or opinions about that resources for people who are interested in that kind of thing, because I know that’s been one of the sticking points for us in a lot of conversations about it.

485
01:11:54.960 –> 01:11:58.229
andy slater: I mean I do, but I don’t want to steal all the time.

486
01:11:59.190 –> 01:12:00.200
Jesse Anderson: Yeah, same.

487
01:12:01.230 –> 01:12:07.130
Dylan Fox: Yeah, definitely encourage folks who haven’t chatted yet to to unmute and and speak up. It’s community discussion

488
01:12:08.740 –> 01:12:10.440
Jesse Anderson: I could blab about this all day

489
01:12:11.610 –> 01:12:24.789
Thomas Logan: I wanted to add, I was curious about the Max/MSP one, because I'm a sound person. So, Andy, I was wondering what your prompt was, because I want to see if I can get my ChatGPT to say "ableist."

490
01:12:24.790 –> 01:12:35.330
andy slater: My prompt, I was going down this thing, wondering how I can make, like, Wwise or FMOD, which is the audio middleware sound designers use to interact with

491
01:12:35.590 –> 01:12:47.669
andy slater: making a game or something in Unity or Unreal, and that stuff's not accessible. And then I got on, I was asking, and then I said, well, what about Max/MSP? And, you know, Cycling '74, which is the company,

492
01:12:48.291 –> 01:12:56.150
andy slater: Doesn’t even necessarily want to even talk further with me about this. And and I’m a sound designer. That’s like one thing that I’ve always wanted. And

493
01:12:56.310 –> 01:13:15.410
andy slater: and I was like, can this be made accessible, from your knowledge? Because they're not going to be helpful. And that's kind of how it responded: not prioritizing accessibility is lazy and ableist, they should know better, blind people should be in charge of sound, and that sort of thing. So I don't know if it's just mimicking what I say.

494
01:13:15.660 –> 01:13:19.590
andy slater: Try it out and and and see how it feels. But

495
01:13:21.000 –> 01:13:24.029
andy slater: yeah, and it’s and it’s weird. So it’ll give me that kind of

496
01:13:24.170 –> 01:13:27.919
andy slater: back and forth, but what it won’t do, and this is going back to what Dylan just said.

497
01:13:28.370 –> 01:13:42.409
andy slater: It'll tell me where the back of the check is for me to sign when I do my mobile deposit, but then it wouldn't tell me what the amount was, because I needed to go through two checks and see which one was which. So it's like, I can't tell you how much this is for or who it's made out to.

498
01:13:42.540 –> 01:13:47.820
andy slater: I’m talking about Chat Gpt when I was using it. Now, seeing AI would give me all of that information.

499
01:13:48.030 –> 01:13:53.970
andy slater: But Openai won’t. And and also I had. I tried to get it to read me the cover of a book

500
01:13:54.130 –> 01:14:00.900
andy slater: and Chatgpt was like this. Looks like it’s a cover of a book. I can’t give you any information. And I think that’s because of

501
01:14:01.290 –> 01:14:06.089
andy slater: IP or copyright laws and that sort of thing. And so it’s like one thing to keep

502
01:14:06.200 –> 01:14:10.169
andy slater: privacy of my bank statement, or my mail, or something like that. But

503
01:14:10.720 –> 01:14:14.980
andy slater: to have this like barrier to a book

504
01:14:16.740 –> 01:14:34.370
andy slater: is like, that's all because I think they're afraid of regulations or something like that. And so some of these regulations are taking away our access, but we also want to keep our privacy. And I will tell you, one of the reasons that's held me back from getting the Meta glasses is because I don't trust Meta with

505
01:14:34.490 –> 01:14:40.840
andy slater: a lot of my stuff. I don’t want them reading my mail, but somehow I have no problem with Microsoft knowing it, you know. So there’s really

506
01:14:41.310 –> 01:14:48.389
andy slater: I would love to have more transparency from these companies. As to where these things go beyond, like what the regular privacy statements are, you know.

507
01:14:53.690 –> 01:15:03.209
Yuhang Zhao: Yeah, I also want to chime in about this privacy issue of AR/VR technologies and accessibility technologies overall. I feel like there is always a

508
01:15:03.210 –> 01:15:26.479
Yuhang Zhao: trade-off between what functionalities we can get and how we can better protect people's privacy and sensitive data, including, first, the user's own sensitive data. For example, a blind user: if they're trying to take photos and get more information from GPT and so on, where do those data go, and who will be seeing them? There is also another layer, which I think people have discussed a lot in

509
01:15:26.480 –> 01:15:50.050
Yuhang Zhao: the research domain and in industry as well, in terms of bystanders' privacy: if by chance we capture someone else in that image, how do we deal with those data, and so on. So I think there's definitely a trade-off in terms of how we should handle this. And I also find it interesting in terms of the data access we have

510
01:15:50.050 –> 01:16:18.519
Yuhang Zhao: from the current AR/VR devices. This is also a difficulty our lab has been experiencing now: if we want to develop any AR/VR applications, especially based on, for example, Quest and so on, we want to design more features that access the front camera so that we can better support people's safety, enhance mixed reality accessibility, and so on. But because of the privacy

511
01:16:18.660 –> 01:16:43.459
Yuhang Zhao: policy, I think a lot of these companies, for example Meta, disable developer access to all of these cameras and sensors and so on, probably for privacy reasons. But I also see there is this inconsistency in terms of when we are able to have access to this data and when not, and why not. For example, as Jesse and Andy mentioned, this AI

512
01:16:43.460 –> 01:16:57.780
Yuhang Zhao: application, right? So if you are trying to develop some specialized devices, maybe you're able to access the camera and do a bunch of things. But on a commercial device, I think there are more restrictions on sensor access

513
01:16:57.780 –> 01:17:21.209
Yuhang Zhao: for developers, and so on. So I think all of these could be interesting problems to discuss. I don't have any answers in terms of how we should move forward on these, like what data we should access and what not. But I think sometimes we also need those data for accessibility purposes, for aggregation purposes, and yet because of the privacy concerns we are somehow

514
01:17:21.210 –> 01:17:26.780
Yuhang Zhao: barred from accessing these data. So there are definitely some barriers there, I guess.

515
01:17:28.400 –> 01:17:38.189
Dylan Fox: Yeah, and there's definitely a question about where the platforms and things come into this, right? Because I was reading a paper, and I'll see if I can find it and put it in chat,

516
01:17:38.400 –> 01:17:45.939
Dylan Fox: that was about a system that could, theoretically, let's say you're wearing an AR headset in a hospital,

517
01:17:46.408 –> 01:17:51.199
Dylan Fox: it's a system that would automatically blur the faces of everyone except

518
01:17:51.360 –> 01:18:13.460
Dylan Fox: the person you were talking to, just at the application level; it wouldn't blur your vision, but it would use an understanding of who you are talking to, in terms of proximity and lingering focus on a person. And once it had met that criterion, that yes, you're actively talking to this person, it would let other applications access the facial data of that person,

519
01:18:13.941 –> 01:18:22.018
Dylan Fox: which I thought was really interesting. Because, you know, you could imagine going around a hospital wanting to be able to have

520
01:18:22.850 –> 01:18:43.039
Dylan Fox: the doctors and the nurses maybe offer up and say proactively, you can use my face for face recognition, but protect the privacy of the other people that are just patients, right? And applications like that require a certain amount of interplay and deep tech access.
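A rough sketch of the engagement-gated face access idea as Dylan retells it, with illustrative proximity and dwell-time thresholds (the actual paper's criteria may differ):

```python
from dataclasses import dataclass

@dataclass
class DetectedFace:
    person_id: str
    distance_m: float    # estimated distance from the wearer
    gaze_dwell_s: float  # how long the wearer has been attending to this face
    opted_in: bool       # e.g. staff who consented to face recognition

def faces_released_to_apps(faces, max_distance_m=1.5, min_dwell_s=3.0):
    """Return IDs of faces whose data may be passed on to applications;
    everyone else stays blurred at the application layer."""
    released = []
    for face in faces:
        actively_engaged = (face.distance_m <= max_distance_m
                            and face.gaze_dwell_s >= min_dwell_s)
        if actively_engaged and face.opted_in:
            released.append(face.person_id)
    return released
```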

521
01:18:43.330 –> 01:18:55.809
Dylan Fox: And right now, I think there’s a lot of kind of people in academia that have some interesting ideas about this. There’s the companies that are kind of doing mostly what they want, and we need to to take their word on it.

522
01:18:56.231 –> 01:19:05.268
Dylan Fox: I think in EU there’s more kind of strict regulation and and enforcement to a certain extent. But it’s a it’s a complicated subject.

523
01:19:06.980 –> 01:19:18.969
Thomas Logan: I'm just gonna throw out there, too, and this is more from back in simpler times of just querying our Google search results, but I honestly feel like, to trust

524
01:19:20.133 –> 01:19:46.789
Thomas Logan: almost any company, it's almost at the point where, like, one thing that people used to do was, when they performed a search, use a plugin that would also do 10 other searches, right? So then you're mixing so much false data in with it. And (I also watch a lot of true crime) if you picture someone getting access to that information, there's not just real information but fake information in there too. I mean,

525
01:19:46.920 –> 01:19:57.100
Thomas Logan: for someone to say they're protecting your data and/or keeping it private, I feel like we live in a world where, do you really trust any of these companies to properly

526
01:19:57.650 –> 01:20:09.939
Thomas Logan: do that? I don't trust that, you know. So, other than obfuscating via fake data being mixed in with real data, I personally don't have trust in any company,

527
01:20:10.458 –> 01:20:10.969
Thomas Logan: Think at this point

528
01:20:10.970 –> 01:20:16.579
Jesse Anderson: Yeah, this is Jesse again, and I’m kind of in the mind of like

529
01:20:17.120 –> 01:20:23.220
Jesse Anderson: I try to do as much as I can, you know, for privacy related things. But you know, it’s like.

530
01:20:23.750 –> 01:20:44.200
Jesse Anderson: you know, with a lot of the free software or free websites, Outlook or Gmail or whatever, it's like, if you're not paying for the product, you are the product. And so I don't necessarily trust, you know, as much as I use Microsoft, as much as I use Google, I'm really not a big Meta,

531
01:20:44.640 –> 01:20:51.830
Jesse Anderson: or especially Facebook fan. But I do have the glasses, and I kind of just resigned myself to the fact that well.

532
01:20:52.110 –> 01:20:59.270
Jesse Anderson: that's where we are. I'm not going to overshare, but if I really want access to

533
01:20:59.800 –> 01:21:15.550
Jesse Anderson: these types of services, I have to give up at least some of my information, which is unfortunate. But, I don't wanna say privacy is dead, cheekily, but you know.

534
01:21:17.540 –> 01:21:20.320
andy slater: But there’s there’s some places that

535
01:21:20.580 –> 01:21:31.229
andy slater: we give our information, like a grocery store, right? They know what you're buying, they know where you are, all that sort of thing, if you're using

536
01:21:31.460 –> 01:21:38.620
andy slater: Instacart or your rewards card or whatever. Let's say they were to have

537
01:21:38.870 –> 01:21:47.679
andy slater: an app or something where you go into Aldi or, like, H-E-B or something, and you have your shopping list ahead of time, and it could tell you where to go,

538
01:21:47.960 –> 01:22:03.119
andy slater: either using, like, VPS or whatever, and even say, you bought this last time, it's in this aisle now, go down this way, and that sort of thing, either using your glasses or the camera on your phone or whatever. That's something where I'm willing to

539
01:22:03.590 –> 01:22:14.750
andy slater: allow that transparency of my life because of its convenience. And you know, unless you’re paying cash, and you’re not giving your Id or your like rewards card, or whatever

540
01:22:15.080 –> 01:22:22.919
andy slater: they kind of already know it. And so there are certain environments and worlds where I'm totally cool with making those concessions.

541
01:22:22.920 –> 01:22:23.670
Jesse Anderson: Yeah.

542
01:22:24.790 –> 01:22:27.180
andy slater: Because, honestly, like I use Instacart

543
01:22:27.370 –> 01:22:33.830
andy slater: all the time. Maybe I’ll go to Cvs or the gas station to pick something up. But I just get the delivery, because.

544
01:22:34.230 –> 01:22:38.250
andy slater: you know, wandering around grocery store when you’re blind.

545
01:22:38.692 –> 01:22:44.960
andy slater: It’s it’s it’s a big time waster, and it’s frustrating. And I always feel like I’m in other people’s way and stuff. But

546
01:22:45.240 –> 01:22:51.510
andy slater: you know, that autonomy, which is something that, Dylan, you had written about

547
01:22:51.830 –> 01:23:00.259
andy slater: being something that could be part of the blind future. And I feel like we're actually really pretty close to that.

548
01:23:00.650 –> 01:23:01.520
Jesse Anderson: But

549
01:23:01.750 –> 01:23:06.039
andy slater: That, you know, developing something like that, that tool

550
01:23:06.340 –> 01:23:14.570
andy slater: who knows if, like, Lockheed Martin or whatever, you know, somebody that shouldn't have that technology, is going to take that and, you know, exploit it.

551
01:23:14.880 –> 01:23:15.480
Jesse Anderson: Okay.

552
01:23:15.480 –> 01:23:19.700
andy slater: I might sound paranoid, but I mean, this is really kind of just the whole conversation.

553
01:23:19.700 –> 01:23:20.180
Dylan Fox: And black

554
01:23:20.670 –> 01:23:22.660
Jesse Anderson: Not us anything right?

555
01:23:22.660 –> 01:23:33.449
Jesse Anderson: Totally, totally. I would recommend following Envision on social media, E-N-V-I-S-I-O-N. Literally, about a week or two ago

556
01:23:33.630 –> 01:23:51.899
Jesse Anderson: they posted a video for a product that I think is building on Ally or something. And it's not with the Meta glasses right now, but I think it's with their upcoming glasses. They did an exact video demo, and I don't know how realistic or how far out it is,

557
01:23:52.060 –> 01:24:09.249
Jesse Anderson: but they did exactly that type of thing, where they had a woman shopping in a grocery store, and it literally guided her audibly to where the apples were, and then she would hold up an apple, like, oh, you're holding a Granny Smith apple. Oh, I want this one. And then, I want to go check out.

558
01:24:10.890 –> 01:24:17.470
Jesse Anderson: Yeah, follow them on social media. Cause I’m curious they might be coming up with some neat stuff pretty soon.

559
01:24:20.860 –> 01:24:28.229
Dylan Fox: Awesome. Well, we've got about 15 minutes left here. I want to open it up to somebody that hasn't had a chance to speak yet, and see what else

560
01:24:28.410 –> 01:24:34.270
Dylan Fox: is on folks' minds and what needs discussion from the brain trust we've got here.

561
01:24:46.870 –> 01:24:49.580
Dylan Fox: Terry, were you trying to say something? It looks like you

562
01:24:50.230 –> 01:24:54.534
terry Fullerton: Yeah, yeah, I am. It’s good.

563
01:24:55.260 –> 01:24:58.440
terry Fullerton: I was. Wasn’t sure if my speaker was on. Can you hear me? There

564
01:24:58.870 –> 01:24:59.340
Dylan Fox: Yes.

565
01:24:59.340 –> 01:25:13.449
terry Fullerton: And so, I'm Terry from New Zealand, and I'm working with a PhD student at the University of Canterbury here, looking at augmented reality to help people like me with macular degeneration.

566
01:25:15.140 –> 01:25:31.209
terry Fullerton: The point I was interested in was the talk about subtitles. Subtitles to me are the worst thing: I can watch a movie, but as soon as it turns to subtitles, I can't read them. And also in the movies these days, often

567
01:25:31.370 –> 01:25:35.969
terry Fullerton: they’ll do a screenshot of a phone or something, and you’re supposed to be able to read

568
01:25:36.210 –> 01:25:40.390
terry Fullerton: what’s on the phone in the middle of the movie to get the plot, and that so

569
01:25:40.500 –> 01:26:10.010
terry Fullerton: that was one comment. But I was really interested, too, in the session about cooking and that. And of course, the hardest thing there is digital stoves, because anything digital is the hard one. I was trying to get on a flight yesterday, and I checked in fine until I got to the page where it said, have you got a gun or not, and I didn't want to click the wrong button. So that's a couple of comments from me.

570
01:26:12.890 –> 01:26:13.969
Dylan Fox: Awesome. Thanks, Terry.

571
01:26:16.980 –> 01:26:20.259
Dylan Fox: Anybody else that that has been been waiting to chime in.

572
01:26:20.370 –> 01:26:21.510
Dylan Fox: Larry. Go ahead.

573
01:26:21.880 –> 01:26:22.650
Larry Goldberg: Hey there!

574
01:26:23.432 –> 01:26:24.730
Larry Goldberg: Larry Goldberg! Here

575
01:26:24.940 –> 01:26:35.640
Larry Goldberg: people in New York should be interested in the fact that the New York City Bar Association just put together a subcommittee on AI and the impact on people with disabilities.

576
01:26:35.870 –> 01:26:39.919
Larry Goldberg: And we’re going to be recording a podcast tomorrow.

577
01:26:40.060 –> 01:26:52.550
Larry Goldberg: it’ll probably start streaming in a couple of weeks, and this is a group of powerful lawyers in the city who are kind of far behind in understanding what AI’s impact is on the legal profession.

578
01:26:52.770 –> 01:26:57.259
Larry Goldberg: but they’ve also, in a number of cases, proposed legislation

579
01:26:57.430 –> 01:27:00.150
Larry Goldberg: before the New York State Legislature that has passed.

580
01:27:00.520 –> 01:27:03.580
Larry Goldberg: So we’re going to be getting together with some of our

581
01:27:03.720 –> 01:27:08.970
Larry Goldberg: well-known friends in the area of AI, like Jutta Treviranus and Ariana Aboulafia from

582
01:27:09.120 –> 01:27:18.729
Larry Goldberg: Cdt, and start trying to come up with some ideas on actual tangible actions, that we can take that promote the benefits

583
01:27:18.910 –> 01:27:21.900
Larry Goldberg: as well as help avoid some of the hazards

584
01:27:22.270 –> 01:27:30.549
Larry Goldberg: on AI, and accessibility. So stay tuned, for that. All the information will be public, so you’ll have a chance to see it all yourself.

585
01:27:34.400 –> 01:27:41.583
Dylan Fox: Awesome. Yeah, and I’ll I’ll take the the opportunity to to boost as well that we are working on the

586
01:27:42.270 –> 01:28:03.394
Dylan Fox: Accessibility in the Metaverse working group of the Metaverse Standards Forum. We are working on trying to create a kind of W3C WCAG-level accessibility standard for XR, and we meet on that every other Tuesday at 8 a.m. Pacific; tomorrow is the next one. So if you're interested in that, I'll put the

587
01:28:04.100 –> 01:28:22.360
Dylan Fox: group here in the chat and feel free to to shoot me an email, because there’s there’s a lot that goes into trying to determine whether or not any of these systems is accessible, especially once AI enters the picture. So if you’re interested, definitely feel free to reach out

588
01:28:29.460 –> 01:28:30.240
paul jackson: Sorry man.

589
01:28:30.910 –> 01:28:31.530
Karim Merchant: Okay.

590
01:28:32.000 –> 01:28:33.909
paul jackson: No, go ahead, Karim! Karim, go ahead!

591
01:28:34.440 –> 01:28:37.107
Karim Merchant: I’m sorry I couldn’t see a put up your hand button

592
01:28:37.630 –> 01:28:58.060
Karim Merchant: so we could take turns. But really briefly, my name is Karim Merchant. Really quick question, which is really more of an open question, maybe not suitable for today given the breadth of stuff we've already talked about. But acknowledging the fact that a lot of people have multiple disabilities, how do you see the potential

593
01:28:58.470 –> 01:29:19.979
Karim Merchant: challenges for this, for the sort of technology that's been discussed today, with, say, cognitive disabilities and the need to be consistent or to contextualize more, and things like that? Are there specific things that need to be brought into the conversation? Is that being tackled? Are there initiatives in that? Are you looking at it? I'm just curious.

594
01:29:22.327 –> 01:29:26.070
andy slater: Hey? It’s Andy I could say. One thing is is, you know, be

595
01:29:26.240 –> 01:29:29.609
andy slater: being blind with Adhd, and then working a lot with sound. I’ve

596
01:29:29.870 –> 01:29:49.900
andy slater: come to understand that having options for people that maybe need audio processing accommodations and volume control, that sort of thing, going off the premise of being able to customize the kind of access that you want, and giving options

597
01:29:50.750 –> 01:29:59.709
andy slater: for people to kind of control those sort of things, either in like, you know, in a in a VR experience, or in immersive like even like an escape room, or something like that.

598
01:30:00.090 –> 01:30:02.860
andy slater: It in, you know, employing.

599
01:30:03.110 –> 01:30:11.340
andy slater: Let's face it, accommodations conflict, and sometimes the result is kind of fun and cool, and then other times it's

600
01:30:11.650 –> 01:30:20.650
andy slater: just going to be a challenge 1st off developing the accommodations for certain different disabilities, but then, like combining them in a way that

601
01:30:21.214 –> 01:30:28.500
andy slater: you know, other people might need multiple accommodations at once. I’d love to hear if anybody else has anything, because

602
01:30:29.190 –> 01:30:34.329
andy slater: seems to be, you know, a real challenge, and I could. I could see it being even harder

603
01:30:37.530 –> 01:30:59.910
Yuhang Zhao: Yeah, just want to echo what Andy has been saying; this is Yuhang here. So we haven't worked too much in my research lab on multiple types of disabilities, but we did start thinking about how we can better support people with ADHD via AI technology. So one of our ongoing projects is, how can we better

604
01:30:59.910 –> 01:31:22.099
Yuhang Zhao: enable people with ADHD to customize video content so that we can remove different distractors in the video? For example, can you simplify the background? Can you enable them to adjust the captions? Can you zoom into certain content that they are interested in, and so on? But I think it's a great point to also consider: what if

605
01:31:22.100 –> 01:31:47.210
Yuhang Zhao: people have multiple disabilities, and then to what extent can people customize things? And there is another thing we've been thinking about: when we talk about customization, there is a wide range of flexibility in what people can customize, and for people with ADHD, the feedback we get is that if the customization is too complex, it also overloads people. And then what would be the

606
01:31:47.280 –> 01:31:51.799
Yuhang Zhao: good balance we can do there? Should we provide some optimal

607
01:31:52.290 –> 01:32:05.210
Yuhang Zhao: combinations to recommend to people and then provide further customization? Or should we open up full agency to people so that they can customize everything? So that's also something we are looking into.
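A small sketch of the customization-versus-overload balance Yuhang describes: a recommended preset keeps the choice space manageable, while individual fields can still be overridden. The preference fields and the video-processing helpers are illustrative assumptions, not the lab's actual system:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ViewingPreferences:
    simplify_background: bool = False
    caption_size: str = "medium"           # "small" | "medium" | "large"
    zoom_region: Optional[str] = None      # e.g. "speaker", "slides", or None
    mute_background_music: bool = False

# A recommended preset keeps choices small for people who find full
# customization overwhelming; any field can still be overridden individually.
RECOMMENDED_PRESET = ViewingPreferences(
    simplify_background=True,
    caption_size="large",
    zoom_region="speaker",
    mute_background_music=True,
)

def apply_preferences(frame, prefs, blur_background, crop_to, scale_captions):
    """blur_background, crop_to, and scale_captions are assumed video-processing
    helpers; only the order of the adjustments is illustrated here."""
    if prefs.simplify_background:
        frame = blur_background(frame)
    if prefs.zoom_region:
        frame = crop_to(frame, prefs.zoom_region)
    frame = scale_captions(frame, prefs.caption_size)
    return frame
```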

608
01:32:06.950 –> 01:32:10.150
Thomas Logan: I just wanna chime in that. I think we

609
01:32:10.730 –> 01:32:24.849
Thomas Logan: I was the only one here that made us look at a WCAG success criterion, unfortunately, but it does dovetail into, I think, what Larry is talking about, too, of working with lawyers and getting into the reality of

610
01:32:25.060 –> 01:32:26.779
Thomas Logan: getting people to

611
01:32:27.080 –> 01:32:40.540
Thomas Logan: make changes to technology, or defining what people need to do. We had a really interesting discussion, I mentioned the Fable person in my 10 minutes, but we had a presentation at our meetup this month,

612
01:32:40.720 –> 01:32:56.130
Thomas Logan: benchmarking usability and the cognitive accessibility frontier, from the founders of Fable. And I did find it very interesting; I know that cognitive has been a really hard area to quantify, and it's really been the one

613
01:32:56.290 –> 01:33:03.510
Thomas Logan: you know, pointed out in the web content accessibility guidelines. You know. They’ve done work on that, and they continue to do work on that. But it is

614
01:33:04.117 –> 01:33:26.250
Thomas Logan: you know, it can be more difficult. So I found it interesting in their presentation, talking about usability studies. And yeah, just wanted to throw that out there as a resource, too: there are certain items which I feel are easier to state, sort of technically, you have to do this, and then there are items where, obviously, as we've been talking about, it's testing with real people and getting feedback somehow,

615
01:33:26.400 –> 01:33:35.782
Thomas Logan: making sure the research and the sort of corporate work all comes together, and we we learn what are the right design patterns.

616
01:33:36.300 –> 01:33:47.320
Thomas Logan: I found their talk really interesting just because they are trying to make that into more of a quantifiable scale, on, not on the research side, but more on the corporate side, so might be of interest as well for cognitive

617
01:33:50.700 –> 01:33:56.130
andy slater: There’s. There’s also something that I know. This could be a 20 day conversation, but

618
01:33:56.420 –> 01:33:59.439
andy slater: a lot of the words and terms that

619
01:33:59.630 –> 01:34:09.090
andy slater: you know, especially for those of us that do research, maybe for institutions or public funding, we can’t use anymore in grant proposals, or maybe even in

620
01:34:09.640 –> 01:34:13.040
andy slater: public-facing content.

621
01:34:13.340 –> 01:34:19.160
andy slater: And that’s something that I think we’re all going to need to just come to terms with or figure out alternative

622
01:34:19.270 –> 01:34:22.540
andy slater: ways of describing things. I know that on the whole

623
01:34:22.640 –> 01:34:30.039
andy slater: forbidden words list, the term accommodation isn’t on there, so we could at least be using that. But there’s just

624
01:34:30.200 –> 01:34:32.080
andy slater: so many things now, where

625
01:34:32.610 –> 01:34:39.930
andy slater: some AIs have been programmed to acknowledge the words disability and inclusion and access, and that sort of thing. And

626
01:34:40.050 –> 01:34:46.009
andy slater: if, you know, we want to use any of those tools to help us

627
01:34:46.150 –> 01:34:53.870
andy slater: navigate those things, especially if we’re trying to do research, or even

628
01:34:54.020 –> 01:34:56.790
andy slater: in school, talk about those sorts of things, like,

629
01:34:57.350 –> 01:34:59.904
andy slater: we’re really in bad shape.

630
01:35:00.770 –> 01:35:08.889
andy slater: I have friends that have lost State Department grants because their whole projects were about disability and inclusion and access, and that sort of thing. So it’s like,

631
01:35:09.240 –> 01:35:15.849
andy slater: I don’t know, Larry, if that’s something that the Bar Association wants to talk about with you, but I think that it’s something that

632
01:35:16.330 –> 01:35:19.910
Larry Goldberg: Yes, it absolutely came up

633
01:35:20.310 –> 01:35:26.210
Larry Goldberg: immediately, in those forbidden words. What’s the first one, alphabetically? Accessibility.

634
01:35:26.210 –> 01:35:26.760
andy slater: Yeah.

635
01:35:27.530 –> 01:35:29.229
Larry Goldberg: So yeah, we’re looking into that

636
01:35:29.230 –> 01:35:29.770
Dylan Fox: Yep.

637
01:35:31.860 –> 01:35:34.559
andy slater: But accommodation isn’t on there, folks, so

638
01:35:37.210 –> 01:35:38.739
andy slater: That’s positive, at least

639
01:35:40.880 –> 01:35:41.350
paul jackson: Well, yeah.

640
01:35:41.350 –> 01:35:42.880
Dylan Fox: Oh, go ahead!

641
01:35:43.520 –> 01:36:02.709
paul jackson: I guess I should introduce myself. My name is Paul Jackson. I have a PhD in computer engineering that I got in 1998. I did about 20 years of research at Boeing in augmented and virtual reality. I was diagnosed with multiple sclerosis, though, in ’93. And now I’m living in Tacoma instead of Seattle, and I’m part of the Seattle Swedish

642
01:36:03.630 –> 01:36:09.520
paul jackson: multiple sclerosis talk, and I’ve been interjecting some things in the chat. And so there’s work with this

643
01:36:09.850 –> 01:36:19.170
paul jackson: company, Mixed Reality, and Jeff Raynar is the CEO of that, and he’s teamed up with the Swedish group to create

644
01:36:19.330 –> 01:36:30.080
paul jackson: a Meta application for those that are involved with MS. And it gives you like a simulation of a studio. There are 3D kinds of games that you can do to give you

645
01:36:30.250 –> 01:36:48.419
paul jackson: like reaching exercises to increase your mobility and things like that. But I’ve been following the disability side for a long time. I remember, like 20 years ago, MIT had a discussion called Abel, and there was a person on the panel, the guy who was like the CEO of

646
01:36:49.190 –> 01:36:51.649
paul jackson: oh, I’ve forgotten the name of the company.

647
01:36:52.860 –> 01:36:59.157
paul jackson: It was a company that had a wheelchair that could go up stairs. And another gentleman, who was a disabled person and a

648
01:36:59.480 –> 01:37:10.140
paul jackson: commentator, and I think he wrote for the New York Times, and he’s in a wheelchair, and he put lights on his wheelchair, in the sense that he wanted to make sure he could customize and make the disability his own.

649
01:37:10.160 –> 01:37:34.292
paul jackson: And so I thought that was fascinating. So I’ve been trying to find that link; I know it’s like 20 years ago, but I did find it on YouTube. That MIT Abel discussion is something I think is very powerful. But as far as disabilities, I hate to think that with the new administration they’re going to cut that funding; it doesn’t make any sense. But that’s unfortunately what the world, or what the US at least, has devolved into. With that I’ll just

650
01:37:35.910 –> 01:37:46.589
paul jackson: give. I’ll give you. I’ll give my time. Okay, God bless you guys, as a great, great discussion, I’m wondering. The chat. Can somebody email me in the chat because I’m on my phone right now. So

651
01:37:47.200 –> 01:37:48.999
paul jackson: I’m going to the members

652
01:37:49.190 –> 01:37:58.139
Dylan Fox: We’ll be putting the recording of this talk, as well as the chat and some notes and things, on the event page, probably sometime later this week.

653
01:37:58.480 –> 01:37:59.520
paul jackson: Okay. Thanks.

654
01:37:59.660 –> 01:38:00.200
Dylan Fox: Yeah.

655
01:38:04.630 –> 01:38:05.669
Dylan Fox: Oh, go ahead, I think.

656
01:38:05.670 –> 01:38:06.140
Dylan Fox: Oh, Hi!

657
01:38:06.140 –> 01:38:07.970
Dylan Fox: Last comments before we should probably wrap up?

658
01:38:08.290 –> 01:38:33.519
Akila Gamage: Alright, cool. Hi, I’m Akila, and I’m from New Zealand. Actually, I work with Terry. And yeah, just to add a couple of things regarding customizability when it comes to all the accessibility stuff. I’m doing my PhD research, and in my first phase, when I’m looking to identify the requirements of

659
01:38:33.620 –> 01:38:45.190
Akila Gamage: individuals that have blindness and low vision, one thing that came up is

660
01:38:45.370 –> 01:39:00.280
Akila Gamage: how vision loss gets progressively worse for people. Even if they use one device at a particular time, it usually doesn’t suit them in maybe five years’ time. So developing a specific

661
01:39:00.827 –> 01:39:11.470
Akila Gamage: device for a specific task at one point isn’t going to be helpful for them in a later period. So I actually follow the work from

662
01:39:11.600 –> 01:39:32.909
Akila Gamage: Professor Yuhang also quite closely, regarding the flexibility that those kinds of solutions provide. And I think one interesting thing would be to continuously assess the visual condition, maybe through a VR or AR device,

663
01:39:33.000 –> 01:39:51.589
Akila Gamage: as time goes by, to maybe provide a more adaptable solution for people. Because VR devices are actually expensive, there’s no point in switching devices as they go along. So I think that could again be something useful for people.
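
Akila’s suggestion here, re-assessing a user’s visual condition on the headset over time and adapting the experience rather than shipping a fixed-purpose device, could look something like the sketch below. It is a hypothetical illustration in Python: the acuity score, thresholds, and profile fields are assumptions, not part of any existing product or of the research discussed in this session.

```python
# Hypothetical sketch of the idea raised here: periodically re-assess the
# user's visual condition on the headset and adapt rendering settings as
# vision changes, instead of shipping a fixed-purpose device. The acuity
# score, thresholds, and profile fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class RenderProfile:
    text_scale: float        # magnification applied to UI text
    contrast_boost: float    # multiplier on scene contrast
    edge_highlighting: bool  # outline objects for easier detection

def profile_from_acuity(acuity_score: float) -> RenderProfile:
    """Map an in-headset acuity estimate (0.0 = no usable vision,
    1.0 = typical vision) to progressively stronger accommodations."""
    if acuity_score > 0.7:
        return RenderProfile(text_scale=1.0, contrast_boost=1.0, edge_highlighting=False)
    if acuity_score > 0.4:
        return RenderProfile(text_scale=1.5, contrast_boost=1.3, edge_highlighting=True)
    return RenderProfile(text_scale=2.0, contrast_boost=1.6, edge_highlighting=True)

# Re-run whenever a new assessment comes in (say, a periodic in-app check),
# so the same headset keeps up with progressive vision loss over the years.
for year, score in [(2025, 0.8), (2027, 0.55), (2030, 0.3)]:
    print(year, profile_from_acuity(score))
```

Because the mapping lives in software, the same (expensive) headset can keep pace with progressive vision loss instead of needing to be replaced.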

664
01:39:51.790 –> 01:40:05.379
Akila Gamage: And yeah, one more thing about the audio subtitles stuff. Again, when I talk with people, one person said to me that when she goes to cinemas, reading subtitles

665
01:40:05.480 –> 01:40:35.169
Akila Gamage: just doesn’t work for her in foreign movies, and she has asked the people in the theater whether it’s possible to get an audio version of the subtitles. I’m not sure whether there is anything like that; maybe like in a cricket match, where you can wear a small device in your ear and hear the radio commentary, something like that. I’m not sure whether there’s something available like that that can directly read the subtitles aloud for the person.

666
01:40:35.270 –> 01:40:41.229
Akila Gamage: Yeah, that could again be interesting, I’m not sure. So yeah, that’s all; those are the comments from me.

667
01:40:42.270 –> 01:40:43.429
Akila Gamage: Yeah, thank you.

668
01:40:43.430 –> 01:40:53.149
Dylan Fox: Awesome, thank you, Akila. Yeah, I know there are certain places in the US where you can go to a movie theater and get audio description headsets that can work with it.

669
01:40:53.786 –> 01:40:56.463
Dylan Fox: But certainly a good spot for

670
01:40:57.360 –> 01:41:01.410
Dylan Fox: you know, wearable devices and AI to potentially play a part.
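
The “audio subtitles” idea Akila describes, hearing subtitle text spoken through an earpiece much like radio commentary at a cricket match, is straightforward to sketch. The example below is a hypothetical illustration assuming the open-source pyttsx3 text-to-speech library and a hand-written list of cues; a real deployment would need to sync to the film’s actual playback clock and route audio privately to the listener’s earpiece.

```python
# Hypothetical sketch of "audio subtitles": speaking subtitle cues aloud
# through an earpiece, similar to the radio-commentary earpiece described
# above. Assumes the open-source pyttsx3 text-to-speech library and a
# pre-parsed list of cues; a real system would sync to the film's clock.
import time
import pyttsx3

# (start time in seconds, subtitle text) -- illustrative sample cues
cues = [
    (1.0, "Where are you going?"),
    (4.5, "To the harbour, before the storm arrives."),
    (9.0, "Then we leave tonight."),
]

engine = pyttsx3.init()
engine.setProperty("rate", 190)  # slightly faster speech to fit between cues

start = time.monotonic()
for cue_time, text in cues:
    # Wait until the cue's timestamp relative to playback start.
    delay = cue_time - (time.monotonic() - start)
    if delay > 0:
        time.sleep(delay)
    engine.say(text)
    engine.runAndWait()  # blocks until the line has been spoken
```

Spoken subtitles already exist in some broadcast settings, so the harder problems in a cinema are likely synchronization and private delivery to the listener rather than the speech synthesis itself.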

671
01:41:02.773 –> 01:41:09.129
Dylan Fox: But I think for now we are just about at time. I wanted to

672
01:41:09.506 –> 01:41:38.469
Dylan Fox: invite anybody who will be around at the end of the month. We’re actually holding a special event on a Sunday, which is unusual for us. But this is gonna be with Helping Hands, which is a very large (I think 25,000 people, last I heard) deaf community that is based out of the VR platform VRChat, and so we’ll be doing a panel with them. So I posted that link

673
01:41:38.470 –> 01:41:38.820
andy slater: Awesome.

674
01:41:38.820 –> 01:41:39.590
Dylan Fox: There.

675
01:41:40.800 –> 01:41:41.850
Dylan Fox: And

676
01:41:41.960 –> 01:41:55.120
Dylan Fox: yeah, thank everybody so much for coming out. We will be, as I said, posting the video of this talk, as well as the notes and the chat, to our event page for this event.

677
01:41:56.460 –> 01:41:57.570
Dylan Fox: And

678
01:41:57.740 –> 01:42:02.869
Dylan Fox: yeah, I think we’ll definitely look forward to seeing you at the next one. Peirce, any closing words for us?

679
01:42:03.260 –> 01:42:04.889
Peirce Clark: No, that was great. Thanks, everyone.

680
01:42:06.470 –> 01:42:08.240
Dylan Fox: Alright! Thanks, everybody!

681
01:42:09.610 –> 01:42:10.480
paul jackson: Thank you.

682
01:42:11.550 –> 01:42:12.280
terry Fullerton: Thank you.