Extensive software architecture experience designing highly available, scalable, and durable systems that routinely and reliably process large amounts of data for startups and high-growth organizations
Experience implementing AI, NLP, and information retrieval platforms that analyze, present, and enrich business critical information across disparate data sources
Expertise in capturing and analyzing data to drive insights using large-scale data storage and analysis systems and toolkits
CTO – Passionately working with my team on the first artificially intelligent medical scribe. We are focused on providing physicians with the best products in the industry, specifically developed to minimize the effort associated with medical documentation.
About – Listen.MD™ is the first artificially intelligent medical scribing application. It listens to physicians as they talk with their patients, automatically drafting the visit note before the physician exits the exam room.
Principal & Founder – Successfully designed and implemented data and analytical systems for Fortune 50 healthcare, energy, SEO, finance, and retail intelligence customers using AWS, Google Cloud Platform, Redshift, Spark, Hadoop, Lucene/Solr, Java, Ruby on Rails, NodeJS, Python, React, Redux, and D3.js. Developed data science, natural language processing, and machine learning systems with a variety of Python and Java frameworks.
About – Opaque Dot provides consulting services for data-intensive software architectures, machine learning, and data engineering and science. The firm specializes in highly available, scalable, and durable software systems that routinely and reliably process large and disparate data sets for high-growth organizations and startups.
CTO & Co-Founder – Founded, grew, and led the engineering team that developed and delivered the flagship product to market. Participated in successfully raising $3.5 million from angel and institutional investors across seed and Series A rounds. Designed and implemented the architecture for data processing and search across a variety of disparate data sources. Product and engineering team acquired by Wellspring Worldwide in 2016.
About – Collective IP is a business intelligence software provider that utilizes advanced information retrieval, machine learning, and natural language processing techniques to identify global licensing opportunities from technology transfer, patent, clinical trial, research grant, SEC, and scientific publication data.
Graduate Research Assistant – Developed applications and tools that enable non-technical colleagues to collect and analyze billions of social media messages to aid in the discovery of and response to mass emergencies. Over 50 publications are associated with data collected by the system.
About – Project EPIC, led by the University of Colorado, is a $2.8 million multi-disciplinary, multi-university NSF grant supporting the information needs of the general public during times of mass emergency.
Principal & Co-Founder – Principal consultant working with startup and Fortune 500 enterprises in the areas of software architecture, information retrieval, and data science.
About – 287 Development provides services in project management, information retrieval, distributed systems, web applications and machine learning using Agile/Scrum, Lucene/Solr, Hadoop, Cassandra, JPA, Spring and Spring MVC, Ruby on Rails, Mahout, and OpenNLP.
Software Engineer – Lead engineer for consumer-facing applications, including platform design, implementation, deployment, and user experience for both web and mobile channels.
About – Mocapay is a provider of mobile payments software. Mocapay enables merchants and consumers to interact through payments, loyalty, and gift transactions using their SaaS or mobile-based platform.
Software Engineer – Joined among the first 25 employees and grew to be a senior member of the core engineering team, leading and serving as the scrum master of a seven-person development team.
About – Leading provider of SaaS-based solutions for Agile project management within large software enterprises. The company has 170,000+ users and conducted a successful IPO in 2013.
Software Engineer Intern – Member of the WebLogic Portal applications team. Designed and implemented the Groupspace Analytics framework, from components to user interface, using J2EE Portal-specific APIs (including page flow and Beehive). The resulting framework was deployed to end users and submitted by BEA for United States and international patents.
Information Security Co-Op – Member of the Information Systems Security team. Designed and implemented an automated auditing environment to satisfy government security requirements using distributed system techniques to parse and interpret large amounts of audit data. Developed software deployed throughout the Western region.
Doctoral dissertation, "Software Architectures and Patterns for Persistence in Heterogeneous Data-Intensive Systems," describes novel software engineering architectures for polyglot persistence: the use of many purpose-built data stores (e.g., Cassandra, Solr, Hadoop, Titan) within a single large-scale, data-intensive application. The resulting application architecture can adapt to an unknown or varying number of specialized data storage technologies, eliminating the need for an application to rely solely on a "one-size-fits-all" relational database management system (RDBMS).
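The polyglot-persistence idea above can be sketched in a few lines. This is a hypothetical illustration, not code from the dissertation: a repository facade hides which purpose-built store serves each operation, with in-memory stubs standing in for a wide-column store (e.g., Cassandra) and a full-text index (e.g., Solr).

```python
class KeyValueStore:
    """Stub standing in for a wide-column store such as Cassandra."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class SearchIndex:
    """Stub standing in for a full-text index such as Solr."""
    def __init__(self):
        self._docs = []
    def index(self, doc):
        self._docs.append(doc)
    def search(self, term):
        return [d for d in self._docs if term in d.get("text", "")]

class DocumentRepository:
    """Facade: callers never know which specialized store answers a query."""
    def __init__(self, kv, idx):
        self._kv, self._idx = kv, idx
    def save(self, doc_id, doc):
        self._kv.put(doc_id, doc)   # durable primary copy
        self._idx.index(doc)        # secondary, query-optimized copy
    def by_id(self, doc_id):
        return self._kv.get(doc_id)
    def by_text(self, term):
        return self._idx.search(term)

repo = DocumentRepository(KeyValueStore(), SearchIndex())
repo.save("42", {"text": "flood warning issued"})
print(repo.by_id("42")["text"])   # -> flood warning issued
print(len(repo.by_text("flood"))) # -> 1
```

Because the facade owns the routing, a new specialized store can be added behind `DocumentRepository` without touching application code, which is the adaptability the architecture targets.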
Course work and projects focused on Software Engineering, Distributed Systems, Natural Language Processing, Information Retrieval, Human Computer Interaction, and Neuroscience.
Worked as an undergraduate researcher for the Department of Computer Science, focusing solely on the EventTrails research project, part of the Insider Threat grant sponsored by the Advanced Research and Development Activity (ARDA). Primary tasks included developing lightweight event sensors in Perl and a full graphical user interface in Java Swing.
Software engineers are faced with a variety of difficult choices when selecting appropriate technologies on which to base a software system. As the typical software user has become accustomed to systems being “on-demand” and “always-available,” the software engineer is more concerned than ever before about the issues of system scalability, availability, and durability. In the absence of expertise in distributed systems, architectural decisions become complex, slowing feature development and introducing error. Software engineering is in need of robust patterns and tools that increase the accessibility of specialized technologies developed for the completion of specialized tasks. This dissertation describes my existing work related to the challenges of domain modeling and data access in large-scale, heterogeneous data-intensive systems and extends this work to include novel architectures for utilizing multiple large-scale data stores effectively. This research focuses on increasing the accessibility and flexibility of these data stores, which typically afford scalability, availability, and durability at the cost of added complexity for the application developer. The resulting architecture and associated implementation alleviate common challenges faced by small and medium software enterprises during the development of heterogeneous data-intensive software applications.
Crisis informatics is a field of research that investigates the use of computer-mediated communication—including social media—by members of the public and other entities during times of mass emergency. Supporting this type of research is challenging because large amounts of ephemeral event data can be generated very quickly and so must then be just as rapidly captured. Such data sets are challenging to analyze because of their heterogeneity and size. We have been designing, developing, and deploying software infrastructure to enable the large-scale collection and analysis of social media data during crisis events. We report on the challenges encountered when working in this space, the desired characteristics of such infrastructure, and the techniques, technology, and architectures that have been most useful in providing both scalability and flexibility. We also discuss the types of analytics this infrastructure supports and implications for future crisis informatics research.
Software systems today seldom reside as isolated systems confined to generating and consuming their own data. Collecting, integrating and storing large amounts of data from disparate sources has become a need for many software engineers, as well as for scientists in research settings. This paper presents the lessons learned when transitioning a large-scale data collection infrastructure from a relational database to a hybrid persistence architecture that makes use of both relational and NoSQL technologies. Our examples are drawn from the software infrastructure we built to collect, store, and analyze vast numbers of status updates from the Twitter micro-blogging service in support of a large interdisciplinary group performing research in the area of crisis informatics. We present both the software architecture and data modeling challenges that we encountered during the transition as well as the benefits we gained having migrated to the hybrid persistence architecture.
Crisis informatics is an emerging research area that studies how information and communication technology (ICT) is used in emergency response. An important branch of this area includes investigations of how members of the public make use of ICT to aid them during mass emergencies. Data collection and analytics during crisis events is a critical prerequisite for performing such research, as the data generated during these events on social media networks are ephemeral and easily lost. We report on the current state of a crisis informatics data analytics infrastructure that we are developing in support of a broader, interdisciplinary research program. We also comment on the role that software engineering research plays in these increasingly common, highly interdisciplinary research efforts.
In times of mass emergency, vast amounts of data are generated via computer-mediated communication (CMC) that are difficult to manually cull and organize into a coherent picture. Yet valuable information is broadcast, and can provide useful insight into time- and safety-critical situations if captured and analyzed properly and rapidly. We describe an approach for automatically identifying messages communicated via Twitter that contribute to situational awareness, and explain why it is beneficial for those seeking information during mass emergencies. We collected Twitter messages from four different crisis events of varying nature and magnitude and built a classifier to automatically detect messages that may contribute to situational awareness, utilizing a combination of hand-annotated and automatically-extracted linguistic features. Our system was able to achieve over 80% accuracy on categorizing tweets that contribute to situational awareness. Additionally, we show that a classifier developed for a specific emergency event performs well on similar events. The results are promising, and have the potential to aid the general public in culling and analyzing information communicated during times of mass emergency.
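A toy illustration of the classification task described above (not the published system, which used hand-annotated and automatically extracted linguistic features): a bag-of-words Naive Bayes model over a handful of invented tweets, labeled 1 for situational awareness and 0 otherwise.

```python
from collections import Counter
import math

# Invented training examples: 1 = contributes to situational awareness.
train = [
    ("bridge closed due to flooding avoid downtown", 1),
    ("evacuation ordered for riverside neighborhoods", 1),
    ("shelter open at the high school gym", 1),
    ("thoughts and prayers to everyone affected", 0),
    ("cant believe this weather lol", 0),
    ("stay strong everyone we love you", 0),
]

def train_nb(data):
    """Count word occurrences per class for a Naive Bayes model."""
    counts = {0: Counter(), 1: Counter()}
    totals = {0: 0, 1: 0}
    for text, label in data:
        for w in text.split():
            counts[label][w] += 1
            totals[label] += 1
    vocab = {w for c in counts.values() for w in c}
    return counts, totals, vocab

def classify(text, counts, totals, vocab):
    """Pick the class with the higher log-probability (Laplace smoothing)."""
    scores = {}
    for label in (0, 1):
        score = math.log(0.5)  # uniform class prior
        for w in text.split():
            score += math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train_nb(train)
print(classify("road closed flooding near the bridge", *model))  # -> 1
```

The real system's linguistic features (e.g., part-of-speech and register cues) go well beyond raw word counts, but the pipeline shape, featurize then score per class, is the same.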
I have a wide variety of research interests, including software engineering, distributed systems, information retrieval, machine learning, graph databases, and natural language processing. I'm always excited to meet new people who share my interests. Please feel free to contact me at email@example.com