Aaron Schram

A Software Engineer from Boulder, Colorado

Expertise

Software Architecture


Extensive software architecture experience designing highly available, scalable, and durable systems that routinely and reliably process large amounts of data for startups and high-growth organizations

Machine Learning


Experience implementing AI, NLP, and information retrieval platforms that analyze, present, and enrich business-critical information across disparate data sources

Data Engineering and Science


Expertise in capturing and analyzing data to drive insights using large-scale data storage and analysis systems and toolkits

Experience

Listen.MD

CTO – Passionately working with my team on the first artificially intelligent medical scribe. We are focused on providing physicians the best products in the industry specifically developed to minimize the effort associated with medical documentation.

About – Listen.MD™ is the first artificial intelligence medical scribing application that listens to physicians as they talk with their patients, automatically drafting the visit note for the physician as they exit the exam room.

Opaque Dot

Principal & Founder – Successfully designed and implemented data and analytical systems for Fortune 50 healthcare, energy, SEO, finance, and retail intelligence customers utilizing AWS, Google Cloud Platform, Redshift, Spark, Hadoop, Lucene/Solr, Java, Ruby on Rails, NodeJS, Python, React, Redux, and D3.js. Developed data science, natural language processing, and machine learning systems with a variety of Python and Java frameworks.

Notable Accomplishments
  • Designed and implemented data capture and analysis system for weather and energy data that led to client acquisition in 2016
  • Senior member of data science at Fortune 50 healthcare company routinely providing data-driven insights to C-level stakeholders
  • Completed numerous large-scale data capture and machine learning engagements across healthcare, energy, finance, and retail
  • Deployed large-scale production applications across both AWS and Google Cloud Platforms

About – Opaque Dot provides consulting services for data-intensive software architectures, machine learning, and data engineering and science. The firm specializes in highly available, scalable, and durable software systems that routinely and reliably process large and disparate data sets for high-growth organizations and startups.

Collective IP

CTO & Co-Founder – Founded, grew, and led the engineering team that developed and delivered the flagship product to market. Helped raise $3.5 million from angel and institutional investors through seed and Series A rounds. Designed and implemented architecture for data processing and search across a variety of disparate data sources. Product and engineering team acquired by Wellspring Worldwide in 2016.

Notable Accomplishments
  • Architecture implemented using Java, Spring, Nutch, Hadoop, and SolrCloud
  • Deployed a 45+ node processing architecture in a HIPAA- and PCI-compliant facility
  • Architecture actively processes over 100,000,000 documents
  • SolrCloud-based search infrastructure deployed to AWS using EC2, S3, and VPC
  • Developed, trained, and deployed machine learning classifiers (stochastic gradient descent) and natural language processing systems (named entity recognition, part-of-speech tagging, and sentence parsing) to enable opportunity discovery and data normalization across unstructured, disparate data sources
  • Developed data visualization using D3.js and Ruby on Rails hosted on Heroku
  • Served as product manager for the business and engineering teams. Responsible for planning, requirements generation, task management, and board of directors updates

About – Collective IP is a business intelligence software provider that utilizes advanced information retrieval, machine learning, and natural language processing techniques to identify global licensing opportunities from technology transfer, patent, clinical trial, research grant, SEC, and scientific publication data.

University of Colorado

Graduate Research Assistant – Developed applications and tools for non-technical colleagues to collect and analyze large amounts of social media data (billions of tweets) to aid in the discovery of and response to mass emergencies. Over 50 publications are associated with the data collected by the system.

Notable Accomplishments
  • Designed and implemented data collection architecture using Java, Spring, and Cassandra
  • Architecture has collected 2.0+ billion tweets to date
  • System has maintained 99%+ uptime since deployment in early 2012

About – The University of Colorado-led Project EPIC is a $2.8 million, multi-disciplinary, multi-university, NSF-funded project that supports the information needs of the general public during times of mass emergency.

287 Development

Principal & Co-Founder – Principal consultant working with startups and Fortune 500 enterprises in the areas of software architecture, information retrieval, and data science.

About – 287 Development provides services in project management, information retrieval, distributed systems, web applications and machine learning using Agile/Scrum, Lucene/Solr, Hadoop, Cassandra, JPA, Spring and Spring MVC, Ruby on Rails, Mahout, and OpenNLP.

Mocapay

Software Engineer – Lead engineer for consumer-facing applications, including platform design, implementation, deployment, and user experience for both web and mobile channels.

About – Mocapay is a provider of mobile payments software. Mocapay enables merchants and consumers to interact through payments, loyalty, and gift transactions using its SaaS or mobile-based platform.

Rally Software

Software Engineer – Joined as one of the first 25 employees and grew to be a senior member of the core engineering team, leading a 7-person development team and serving as its scrum master.

About – Leading provider of SaaS-based solutions for Agile project management within large software enterprises. The company has 170,000+ users and conducted a successful IPO in 2013.

BEA Systems

Software Engineer Intern – Member of the WebLogic Portal applications team. Designed and implemented the Groupspace Analytics framework, from components to user interface, using J2EE Portal-specific APIs (including Page Flow and Beehive). The resulting framework was deployed to end users and submitted by BEA for international and United States patents.

Lockheed Martin

Information Security Co-Op – Member of the Information Systems Security team. Designed and implemented an automated auditing environment to satisfy government security requirements using distributed system techniques to parse and interpret large amounts of audit data. Developed software deployed throughout the Western region.

Education

Ph.D., Computer Science

University of Colorado at Boulder

Doctoral dissertation, "Software Architectures and Patterns for Persistence in Heterogeneous Data-Intensive Systems", describes novel software engineering architectures for polyglot persistence, which is the utilization of many purpose-built data stores (e.g., Cassandra, Solr, Hadoop, Titan) within a single large-scale, data-intensive application. The resulting application architecture is capable of adapting to an unknown or varying number of specialized data storage technologies, eliminating the need for an application to rely solely on a "one-size-fits-all" relational database management system (RDBMS).

M.S., Computer Science

University of Colorado at Boulder

Coursework and projects focused on Software Engineering, Distributed Systems, Natural Language Processing, Information Retrieval, Human-Computer Interaction, and Neuroscience.

B.S., Computer Science

University of Colorado at Boulder

Worked as an undergraduate researcher for the Department of Computer Science. Worked exclusively on the EventTrails research project, which is a part of the Insider Threat grant sponsored by the Advanced Research and Development Activity (ARDA). Primary tasks included developing lightweight event sensors using Perl and a full graphical user interface using Java Swing.

Skills

Frameworks

  • Spring Core
  • NodeJS
  • Flask
  • Ruby on Rails
  • React and Redux

Data Stores

  • Hadoop Ecosystem
  • Lucene/Solr/ES
  • Cassandra
  • Neo4j and Titan
  • Postgres/RDS

Analytics

  • Spark
  • Python Data Science and ML
  • D3.js
  • R

Miscellaneous

  • AWS
  • Google Cloud Platform
  • Nutch
  • Certified Scrum Master

Publications

Software Architectures and Patterns for Persistence in Heterogeneous Data-Intensive Systems

Dissertation: University of Colorado at Boulder, May 2015

Aaron Schram

Software engineers are faced with a variety of difficult choices when selecting appropriate technologies on which to base a software system. As the typical software user has become accustomed to systems being “on-demand” and “always-available,” the software engineer is more concerned than ever before about the issues of system scalability, availability, and durability. In the absence of expertise in distributed systems, architectural decisions become complex, slowing feature development and introducing error. Software engineering is in need of robust patterns and tools that increase the accessibility of specialized technologies developed for the completion of specialized tasks. This dissertation describes my existing work related to the challenges of domain modeling and data-access in large-scale, heterogeneous data-intensive systems and extends this work to include novel architectures for utilizing multiple large-scale data stores effectively. This research focuses on increasing the accessibility and flexibility of these data stores, which typically afford scalability, availability, and durability at the cost of added complexity for the application developer. The resulting architecture and associated implementation alleviates common challenges faced by small and medium software enterprises during the development of heterogeneous data-intensive software applications.

Architectural Implications of Social Media Analytics in Support of Crisis Informatics Research

IEEE Bulletin of the Technical Committee on Data Engineering, September 2013

Kenneth M. Anderson, Aaron Schram, Ali Alzabarah, Leysia Palen

Crisis informatics is a field of research that investigates the use of computer-mediated communication—including social media—by members of the public and other entities during times of mass emergency. Supporting this type of research is challenging because large amounts of ephemeral event data can be generated very quickly and so must then be just as rapidly captured. Such data sets are challenging to analyze because of their heterogeneity and size. We have been designing, developing, and deploying software infrastructure to enable the large-scale collection and analysis of social media data during crisis events. We report on the challenges encountered when working in this space, the desired characteristics of such infrastructure, and the techniques, technology, and architectures that have been most useful in providing both scalability and flexibility. We also discuss the types of analytics this infrastructure supports and implications for future crisis informatics research.

MySQL to NoSQL: Data Modeling Challenges in Supporting Scalability

ACM conference on Systems, Programming, Languages and Applications: Software for Humanity, October 2012

Aaron Schram, Kenneth M. Anderson

Software systems today seldom reside as isolated systems confined to generating and consuming their own data. Collecting, integrating and storing large amounts of data from disparate sources has become a need for many software engineers, as well as for scientists in research settings. This paper presents the lessons learned when transitioning a large-scale data collection infrastructure from a relational database to a hybrid persistence architecture that makes use of both relational and NoSQL technologies. Our examples are drawn from the software infrastructure we built to collect, store, and analyze vast numbers of status updates from the Twitter micro-blogging service in support of a large interdisciplinary group performing research in the area of crisis informatics. We present both the software architecture and data modeling challenges that we encountered during the transition as well as the benefits we gained having migrated to the hybrid persistence architecture.

Design and Implementation of a Data Analytics Infrastructure in Support of Crisis Informatics Research (NIER Track)

International Conference on Software Engineering, May 2011

Kenneth M. Anderson, Aaron Schram

Crisis informatics is an emerging research area that studies how information and communication technology (ICT) is used in emergency response. An important branch of this area includes investigations of how members of the public make use of ICT to aid them during mass emergencies. Data collection and analytics during crisis events is a critical prerequisite for performing such research, as the data generated during these events on social media networks are ephemeral and easily lost. We report on the current state of a crisis informatics data analytics infrastructure that we are developing in support of a broader, interdisciplinary research program. We also comment on the role that software engineering research plays in these increasingly common, highly interdisciplinary research efforts.

NLP to the Rescue? Extracting "Situational Awareness" Tweets During Mass Emergency

Association for the Advancement of Artificial Intelligence Conference on Weblogs and Social Media, July 2011

Sudha Verma, Sarah Vieweg, William J. Corvey, Leysia Palen, James H. Martin, Martha Palmer, Aaron Schram, Kenneth M. Anderson

In times of mass emergency, vast amounts of data are generated via computer-mediated communication (CMC) that are difficult to manually cull and organize into a coherent picture. Yet valuable information is broadcast, and can provide useful insight into time- and safety-critical situations if captured and analyzed properly and rapidly. We describe an approach for automatically identifying messages communicated via Twitter that contribute to situational awareness, and explain why it is beneficial for those seeking information during mass emergencies. We collected Twitter messages from four different crisis events of varying nature and magnitude and built a classifier to automatically detect messages that may contribute to situational awareness, utilizing a combination of hand-annotated and automatically-extracted linguistic features. Our system was able to achieve over 80% accuracy on categorizing tweets that contribute to situational awareness. Additionally, we show that a classifier developed for a specific emergency event performs well on similar events. The results are promising, and have the potential to aid the general public in culling and analyzing information communicated during times of mass emergency.

Contact

Say hello.

I have a wide variety of research interests, including software engineering, distributed systems, information retrieval, machine learning, graph databases, and natural language processing. I'm always excited to meet new people who share my interests. Please feel free to contact me at aaron@aaronschram.com.