\PassOptionsToPackage{hyphens}{url}
\documentclass[superscriptaddress, nofootinbib, amsmath, amssymb, twocolumn]{revtex4-2} % twocolumn causing issues with neverending spillover into invisible area past first page
% \usepackage{balance}
% \usepackage[margin=1.5cm]{geometry}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage[]{graphicx}
\usepackage{xspace}
\usepackage{siunitx}
\usepackage{mhchem}
\DeclareSIUnit\angstrom{\text {Å}}
\usepackage{orcidlink}
\usepackage{hyperref}
\usepackage{fontawesome5}
\usepackage{natmove}
\usepackage{placeins}
\usepackage{ifthen} % For conditional statements
\usepackage{longtable}
% % Initialize a macro to keep track of seen project names
% \newcommand{\projectnamesseen}{}
% % Define a macro to select the appropriate social media icon
% \newcommand{\socialmediaicon}[1]{%
% \IfSubStr{#1}{twitter.com}{\faTwitter}{%
% \IfSubStr{#1}{linkedin.com}{\faLinkedin}{%
% \IfSubStr{#1}{facebook.com}{\faFacebook}{%
% \IfSubStr{#1}{instagram.com}{\faInstagram}{\faLink}}}}%
% }
% \DTLloadrawdb[keys={ProjectNum,ProjectLink,ProjectName,TeamName,FirstName,LastName,Email,Organization,Industry/Academia/Gov't,City,Address,GithubRepo,SocialMediaLink,VideoLink}]{stores}{test.csv}
\usepackage{xparse}
\usepackage[a-1b]{pdfx}
% \usepackage[allfiguresdraft]{draftfigure}
\usepackage[disable]{todonotes}
% \usepackage{todonotes}
\newcommand{\githublink}[2]{
\href{https://github.com/#1/#2}{\faGithub\ \url{#1/#2}}
}
\newcommand{\twitterlink}[1]{
\href{#1}{\faTwitter}
}
\newcommand{\zenodolink}[1]{
\href{https://doi.org/#1}{\faArchive\ \url{#1}}
}
%https://twitter.com/SamCox822/status/1641484192566460416?s=20
\newcommand{\formatlinks}[3]{%
\href{\ifx\empty#1\empty N/A\else#1\fi}{\faGithub}\,%
\href{\ifx\empty#2\empty N/A\else#2\fi}{\faVideo}\,%
\href{\ifx\empty#3\empty N/A\else#3\fi}{\faTwitter} % Adjust the icon based on social media type
}
\newcommand{\hflogo}{%
\includegraphics[height=.9em]{figures/huggingface.png}
}
\newcommand{\huggingfacelink}[2]{
\href{https://huggingface.co/spaces/#1/#2}{\hflogo \url{#1/#2}}
}
\newcommand{\huggingfacehublink}[2]{
\href{https://huggingface.co/#1/#2}{\hflogo \url{#1/#2}}
}
% Adjusted Hyperref Setup for Automatically Colored Text Links
\hypersetup{
colorlinks=true, % Enables colored links
breaklinks=true,
urlcolor=blue, % Sets the color of URL links
linkcolor=blue, % Sets the color of internal links
citecolor=blue, % Sets the color of citation links
filecolor=blue, % Sets the color of file links
allcolors=blue, % Ensures all link types are blue by default
pdftitle={Title}, % PDF Title
pdfauthor={Author} % PDF Author
}
\usepackage[nameinlink,capitalise]{cleveref} %needs to appear after hyperref, https://tex.stackexchange.com/questions/396728/my-equations-referencing-not-working
\Crefname{figure}{Figure}{Figures} %needs to appear after hyperref and cleveref
\crefname{appsec}{Appendix}{Appendices}
\newcommand\crefrangeconjunction{--} % modify the reference style
% =====================================================
% packages for creating code listings
\usepackage{listings, xcolor}
\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{tqblue}{HTML}{08293d}
\definecolor{backcolour}{HTML}{fefdf5}
\lstdefinestyle{pythonstyle}{
backgroundcolor=\color{backcolour},
commentstyle=\color{codegreen},
keywordstyle=\color{magenta},
numberstyle=\tiny\color{codegray},
stringstyle=\color{codepurple},
basicstyle=\ttfamily\footnotesize\color{tqblue},
breakatwhitespace=false,
breaklines=true,
postbreak=\mbox{\textcolor{magenta}{$\hookrightarrow$}\space},
captionpos=b,
keepspaces=true,
numbers=left,
numbersep=5pt,
showspaces=false,
showstringspaces=false,
showtabs=false,
tabsize=2
}
\lstset{style=pythonstyle}
\hbadness=99999
\newcolumntype{C}{>{$}c<{$}}
\AtBeginDocument{%
\heavyrulewidth=.08em
\lightrulewidth=.05em
\cmidrulewidth=.03em
\belowrulesep=.65ex
\belowbottomsep=0pt
\aboverulesep=.4ex
\abovetopsep=0pt
\cmidrulesep=\doublerulesep
\cmidrulekern=.5em
\defaultaddspace=.5em
}
\usepackage[most]{tcolorbox}
\tcbset {
base/.style={
arc=0mm,
bottomtitle=0.5mm,
boxrule=0mm,
colbacktitle=black!10!white,
coltitle=black,
fonttitle=\bfseries,
left=2.5mm,
leftrule=1mm,
right=8.5mm,
title={#1},
toptitle=0.75mm,
width=\textwidth,
breakable
}
}
\definecolor{brandblue}{rgb}{0, 0.27843137254902, 0.466666666666667}
\newtcolorbox{agentinteraction}[1]{
colframe=brandblue,
base={#1}
}
\definecolor{brandbred}{rgb}{0.63921568627451, 0, 0}
\newtcolorbox{agentinteraction2}[1]{
colframe=brandbred,
base={#1}
}
\newtcolorbox{subbox}[1]{
colframe=black!30!white,
base={#1}
}
\usepackage [autostyle, english = american]{csquotes}
\MakeOuterQuote{"}
\usepackage[acronym, nonumberlist]{glossaries}
\glsdisablehyper
\makeglossaries
\input{latex/glossary}
\usepackage{tabularx} % For flexible tables with adjustable column widths
\usepackage{booktabs} % For better table lines (\toprule, \midrule, \bottomrule)
\usepackage{cleveref}
\let\originalcite\cite
\renewcommand{\cite}[1]{\unskip~\originalcite{#1}}
\usepackage{setspace}
% \clubpenalty=10000
% \widowpenalty=10000
% \displaywidowpenalty=10000
\usepackage{titlesec}
\titlespacing{\subsection}
{0pt}{9pt}{6pt}
\usepackage{array}
\usepackage{ragged2e}
\usepackage{nicefrac}
\usepackage[caption=false]{subfig}
\newcolumntype{P}[1]{>{\raggedright\arraybackslash}p{#1}}
\begin{document}
\title{Bayesian Optimization Hackathon for Chemistry and Materials}
% \input{latex/authors}
\input{latex/authors-hardcoded}
\begin{abstract}
The Acceleration Consortium and Merck KGaA hosted a two-day virtual hackathon on March 27--28, 2024, bringing together scientists to explore, collaborate, and innovate in the field of Bayesian optimization for the physical sciences. Participants were encouraged to select or develop Bayesian optimization algorithms, apply them to benchmarking tasks, design new benchmarks, create instructional tutorials, and describe real-world applications. With over 100 participants across 60 academic, industry, and government organizations located in 38 cities, 14 countries, and 4 continents, this was a global event. % https://chatgpt.com/share/f6cd733f-1126-4151-86c5-d4b59d158dc3
The outputs from this event, including developed algorithms, benchmarks, and tutorials, will serve as valuable resources for the research community, in addition to the new skills learned and connections formed. Released projects and general information are available at \url{https://ac-bo-hackathon.github.io/} and other locations linked from individual project pages. This event demonstrates the potential of community-driven research efforts to accelerate advances in Bayesian optimization in chemistry and materials science.
\end{abstract}
\maketitle
% ToDo:
% emphasize tooling/constrained prompting/guidance
\todo[inline]{automate authors.tex based on the google sheet (maybe some info needed from credit form)}
\section{Introduction}
\Gls{bo} has emerged as a powerful tool for optimizing complex, expensive-to-evaluate functions, and it often outperforms traditional search methods across a variety of scientific domains. Examples include optimizing composition and processing parameters to maximize alloy yield strength and identifying synthesis pathways that maximize the efficacy of HIV drugs (\cref{fig:intro-bo}). Hackathons help people connect, gain skills, and flesh out new ideas. In the words of Michelle Duke, the "Hackathon Queen":
\begin{quote}
A hackathon is a short competition where people work together in teams to solve problems and challenges by coming up with solutions and ideas.
\end{quote}
\begin{figure}
\centering
\includegraphics[width=1\linewidth]{latex/figures/intro-bo.png}
\caption{Optimization traces for traditional design of experiments (DoE) methods compared with \gls{bo}, which typically outperforms them. \Gls{bo} uses a model of the objective to predict where to look next, so that the best results are found with few experiments \cite{baird_building_2023}.}
\label{fig:intro-bo}
\end{figure}
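To make the phrase "predict where to look next" concrete, the following is a sketch of one common acquisition function, expected improvement, under the assumption of a Gaussian posterior. It is an illustration only and is not necessarily the criterion used in \cref{fig:intro-bo} or by any particular hackathon project; all symbols are introduced here for illustration. For a maximization problem, let $\mu(x)$ and $\sigma(x)$ denote the model's posterior mean and standard deviation at a candidate $x$, and let $f^{*}$ be the best objective value observed so far. With $z(x) = (\mu(x) - f^{*})/\sigma(x)$,
\begin{align}
\mathrm{EI}(x) &= \mathbb{E}\left[\max\left(f(x) - f^{*}, 0\right)\right] \nonumber \\
&= \left(\mu(x) - f^{*}\right)\Phi\left(z(x)\right) + \sigma(x)\,\phi\left(z(x)\right),
\end{align}
where $\Phi$ and $\phi$ are the standard normal cumulative distribution and density functions. The next experiment is placed at the $x$ that maximizes $\mathrm{EI}(x)$: the first term rewards candidates predicted to beat $f^{*}$, and the second rewards candidates where the model is still uncertain.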
The goal of the AC BO Hackathon was to leverage the expertise of a diverse, global community to advance the development and application of \gls{bo} techniques for solving critical challenges in the physical sciences. The hackathon also aimed to foster collaboration and knowledge sharing among participants from different backgrounds, including academia, national laboratories, government agencies, and private industry. The event attracted 120 active participants from 44 teams, representing 41 academic institutions, 12 national labs, and 9 companies. The participants were located in 38 cities, 14 countries, and 4 continents (\cref{fig:map}). A full list of projects, including links to the corresponding GitHub repositories, submission videos, and social media posts, is provided in \cref{tab:projects}.
% \onecolumngrid % (https://tex.stackexchange.com/a/214834/224861)
% % don't remove this linebreak
\begin{figure*}
\centering
% \captionsetup{justification=centering}
\includegraphics[width=0.95\linewidth]{latex/figures/world_map.png}
\caption{Demographic distributions of the participating teams and their affiliations. The hackathon supported 129 participants across 60 academic, industry, and government organizations located in 59 cities, 19 countries, and 4 continents. \label{fig:map} }
\end{figure*}
% \begin{table*}[]
% \caption{List of projects and project types, with links to corresponding website project pages, repositories, videos, and social media posts.} \label{tab:projects} \setlength{\extrarowheight}{0.4em} \begin{tabularx}{\textwidth}{>{\centering\arraybackslash}p{1cm} X >{\centering\arraybackslash}X} \toprule \# & Project Name & Links \\ \midrule \href{https://example.com}{\#1} & Project A & \href{https://github.com/example}{\faGithub} \, \href{https://youtube.com}{\faVideo} \, \href{https://twitter.com}{\faTwitter} \tabularnewline \href{https://example.com}{\#2} & Project B & \href{https://github.com/example}{\faGithub} \, \href{https://youtube.com}{\faVideo} \, \href{https://linkedin.com}{\faLinkedin} \tabularnewline \href{https://example.com}{\#3} & Project C & \href{https://github.com/example}{\faGithub} \, \href{https://youtube.com}{\faVideo} \, \href{https://twitter.com}{\faTwitter} \tabularnewline \bottomrule \end{tabularx}
% \end{table*}
% \section{Prior Materials-focused Hackathons}
\section{Hackathon Details and Setup}
This hackathon was inspired in large part by \citet{jablonka_14_2023} and follows several earlier materials-focused hackathons \cite{mulholland_hackathon_2015, sparks_insights_2024}. % l.ferguson_conference_2019
Participants were provided with various resources to prepare for the hackathon, including GitHub Classroom assignments with automated feedback, application- and theory-focused videos and tutorials, Python refresher materials, and a list of tools to consider using during the hackathon (\cref{fig:preparation}).
\begin{figure*}
\centering
\includegraphics[width=0.98\linewidth]{latex/figures/preparation.png}
\caption{A snapshot of \href{https://ac-bo-hackathon.github.io/resources/}{resources listed on the hackathon webpage} such as hackathon orientation, intro to \gls{bo}, and a Python refresher assignment. These resources prepared participants to maximize their time during the two-day synchronous portion of the hackathon and helped level the playing field for participants with varied backgrounds and skill levels.}
\label{fig:preparation}
\end{figure*}
One unique aspect of this event was that it was hosted in Gather Town, a cross between traditional video conferencing software and retro arcade-style avatars and virtual spaces (\cref{fig:gathertown}). Participants create a custom avatar and move around a two-dimensional space. The video and audio of other participants appear and become audible when they are nearby and fade out when they are far away, simulating an in-person experience. At the beginning of the hackathon, all participants gathered to listen to keynotes in real time, which were broadcast via YouTube Live and embedded into the Gather Town space. The videos were then \href{https://ac-bo-hackathon.github.io/videos-slides/}{made available on the hackathon website} and collectively garnered approximately 1600 views within two months. After the keynotes, teams were assigned tables in breakout rooms, each with a whiteboard. Individual tables were designated as "private spaces," which isolated the shared audio and video within each space. This let each team talk without interference from neighboring tables while still allowing facilitators and other teams to walk over and check in.
\begin{figure*}
\centering
\includegraphics[width=0.95\linewidth]{latex/figures/gathertown.png}
\caption{Gather Town \href{https://ac-bo-hackathon.github.io/videos-slides/}{keynote} room (left), custom avatars (top-right), and an example of a breakout room for teams (bottom-right). Keynotes were broadcast in real time to participants via an embedded YouTube livestream. Use of Gather Town helped level the playing field for teams who were in physically separate locations and made it easier for facilitators and other teams to have more natural "check-ins" with other projects.}
\label{fig:gathertown}
\end{figure*}
The hackathon concluded with a project showcase accompanied by crowdsourced judging within a "poster room" (\cref{fig:poster}).
\begin{figure*}
\centering
\includegraphics[width=0.95\linewidth]{latex/figures/posters.png}
\caption{The synchronous portion of the hackathon concluded with a poster session and community judging. One participant noted that "it almost felt like a real poster session."}
\label{fig:poster}
\end{figure*}
Community judging occurred via \href{https://github.com/anishathalye/gavel}{Gavel}, an automated pairwise comparison judging system. Use of this system helped to improve fairness, scalability, and accuracy by having judges compare projects relative to each other rather than assigning subjective numerical scores (for example, "rate your pain on a scale from 1 to 10, where 10 is the worst possible pain you can imagine"). The approach of pairwise judging reduces bias, allows for handling large competitions efficiently, and produces high-quality rankings using statistical models with dynamic assignments to judges to maximize information gain. It has been successfully used at HackMIT and other events to streamline judging and enhance transparency and credibility.
The corresponding Gavel web app was hosted on Heroku according to directions in the Gavel repository. Individualized links were distributed to judges via email using Gavel's SendGrid integration. Collectively, 35 judges cast 319 votes.
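As a hedged illustration of the kind of statistical model that pairwise judging relies on, consider the Bradley--Terry model. This is a sketch of the general idea and is not claimed to be the exact model that Gavel implements; the symbols $s_i$, $w_{ij}$, and $N$ are introduced here for illustration only. Each project $i$ is assigned a latent quality score $s_i$, and the probability that a judge prefers project $i$ over project $j$ is
\begin{equation}
P(i \succ j) = \frac{e^{s_i}}{e^{s_i} + e^{s_j}} = \frac{1}{1 + e^{-(s_i - s_j)}}.
\end{equation}
Given $w_{ij}$ recorded votes for $i$ over $j$ among $N$ projects, the scores can be estimated by maximizing the log-likelihood
\begin{equation}
\ell(s_1, \ldots, s_N) = \sum_{i \neq j} w_{ij} \log P(i \succ j),
\end{equation}
and sorting the projects by their fitted $s_i$ yields a ranking. For example, a score difference of $s_i - s_j = 1$ corresponds to $P(i \succ j) = 1/(1 + e^{-1}) \approx 0.73$. Because only score differences enter the model, judges never need to agree on an absolute scale.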
% \clearpage
\begin{longtable*}{>{\centering\arraybackslash}p{1.5cm} @{\hspace{0.4cm}} >{\raggedright\arraybackslash}p{11cm} @{\hspace{1.5cm}} >{\raggedright\arraybackslash}p{2.5cm}}
\caption{List of projects with links to GitHub, Social Media, and Video. Project pages on the hackathon website are available at \url{https://ac-bo-hackathon.github.io/projects/}.} \label{tab:projects} \\
% \toprule
% \textbf{Proj. \#} & \textbf{Project Name} & \textbf{Links} \\
% \midrule
% \endfirsthead
\toprule
\textbf{Proj. \#} & \textbf{Project Name} & \textbf{Links} \\
\midrule
\endhead
\midrule \multicolumn{3}{r}{\textit{Continued on the next page}} \\
\midrule
\endfoot
\bottomrule
\endlastfoot
\input{|python3 python_scripts/process_spreadsheet.py}
\end{longtable*}
\begin{table*}[]
\caption{Ranked Projects with Team Names and Prize Distribution. To avoid incentivizing single-person teams and very large teams, both per-person and per-team limits were imposed (e.g., teams of 4 or more would have the max per-team amount divided equally rather than receive the max per-person amount).}
\label{tab:winners}
\setlength{\extrarowheight}{0.8em}
\begin{tabularx}{\textwidth}{>{\centering\arraybackslash}p{1.0cm} >{\centering\arraybackslash}p{1.5cm} >{\centering\arraybackslash}p{3cm} X >{\centering\arraybackslash}p{3cm}}
\toprule
\textbf{Rank} & \textbf{Proj. \#} & \textbf{Team Name} & \textbf{Project Name} & \textbf{Prize* (CAD)} \\ \midrule
1st & \#23 & Noisy Nerds & Reliable Surrogate Models of Noisy Data & 300 (1000 max) \tabularnewline
2nd & \#34 & BOMS Prob & Streamlining Material Discovery - Bayesian Optimization in Thermal Fluid Mixtures & 150 (500 max) \tabularnewline
3rd & \#7 & Surface Science Syndicate & BayBE One More Time - Exploring Corrosion Inhibitors for Materials Design & 75 (250 max) \tabularnewline
4th & \#5 & KLM & Comparing Bayesian Optimization Methods \ldots Against Simulated "Human" Decision-making & 40 (125 max) \tabularnewline
5th & \#8 & Molecular Representation & BO for Drug Discovery - What is the role of molecular representation? & 40 (125 max) \tabularnewline
6th & \#9 & PME No Hikari & Optimizing The \ce{CO2} Adsorption Capacity of Metal-Organic Frameworks Using Thompson Sampling & 40 (125 max) \tabularnewline
7th & \#11 & BlenDS & BlendDS - An intuitive specification of the design space for blends of components & 40 (125 max) \tabularnewline
8th & \#30 & SERO Opt & Active learning for voltammetry waveform design & 40 (125 max) \tabularnewline
9th & \#43 & General Optimizers & Bayesian Optimization for Generality & 40 (125 max) \tabularnewline
10th & \#3 & Sparks Group & Take Your Time - Measuring Optimization Performance as a Function of ACQF Optimizer Runtime & 40 (125 max) \tabularnewline
\bottomrule
\end{tabularx}
\end{table*}
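One reading of the prize rule in \cref{tab:winners} can be written compactly; this is an interpretive sketch rather than the organizers' official formula. It assumes that the first amount in the prize column is the per-person cap $p_{\max}$, that the parenthesized amount is the per-team cap $T_{\max}$, and that $n$ denotes the team size (all three symbols are introduced here for illustration). Under these assumptions, each member of a team of size $n$ receives
\begin{equation}
a(n) = \min\left(p_{\max}, \frac{T_{\max}}{n}\right).
\end{equation}
For the first-place entry, $p_{\max} = 300$ and $T_{\max} = 1000$ (CAD), so a team of three would receive $\min(300, 333.33) = 300$ per person, whereas a team of four would receive $\min(300, 250) = 250$ per person. This matches the caption's statement that teams of four or more split the per-team maximum equally.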
% Preparation for the hackathon - 111 GitHub Classroom assignments accepted
% The hackathon was designed with tips, trick, and resources from various sources, such as https://github.com/github/hackathons.
% Hosts: Acceleration Consortium, Merck KGaA
\begin{table*}[]
\caption{Project Topics for the Hackathon. See \href{https://ac-bo-hackathon.github.io/submission/}{the submission page} for more details.}
\label{tab:project_topics}
\setlength{\extrarowheight}{0.4em}
\begin{tabularx}{\textwidth}{>{\centering\arraybackslash}p{0.5cm} p{4.5cm} X}
\toprule
& \textbf{Topic} & \textbf{Description} \\ \midrule
1 & \textbf{Apply Algorithms} & Choose an algorithm and apply it to a \href{https://huggingface.co/collections/AccelerationConsortium/optimization-benchmarks-66a44daf10de1a0335f28826}{hackathon benchmark task} \\
2 & \textbf{Develop Benchmarks} & Develop a new benchmark and add it to a suite of benchmarks \\
3 & \textbf{Create Tutorials} & Create "gentle introduction" tutorials for \href{https://ac-microcourses.readthedocs.io/en/latest/courses/data-science/overview.html}{advanced optimization topics} \\
4 & \textbf{Propose Tasks} & Propose materials tasks that \textit{can} and \textit{should} be tackled with \gls{bo} \\
5 & \textbf{General} & Other projects related to \gls{bo} for the physical sciences \\
\bottomrule
\end{tabularx}
\end{table*}
% \clearpage
\section{Projects' Key Findings}
This section summarizes the key findings from all project submissions.
To streamline the process, all YouTube video submissions were transcribed and analyzed using an AI agent powered by Anthropic's \emph{claude-3-5-sonnet-20240620} with a temperature of 0.3, a low setting chosen to keep the output consistent.
Each resulting text was then manually edited to ensure a high-quality manuscript compilation.
This semi-automated approach improves efficiency while keeping the treatment of the submissions structured and uniform.
\input{|python3 python_scripts/process_summaries.py}
\clearpage
\section*{Acknowledgements}
We express huge thanks to Kevin Jablonka, Ben Blaiszik (the latter a co-author of this paper), and others, who did phenomenal work paving the way with the LLMs for Chemistry and Materials Hackathon (2023). Much of the setup and many of the patterns of the Bayesian optimization hackathon were either directly replicated from or inspired by the LLM hackathon.
This work was undertaken thanks in part to funding provided to the University of Toronto’s Acceleration Consortium from the Canada First Research Excellence Fund (CFREF-2022-00042). Tim Würger acknowledges Tiago L. P. Galvão for providing Al alloy data from the CORDATA database. Can Özkan acknowledges the VIPCOAT project (Virtual Open Innovation Platform for Active Protective Coatings Guided by Modelling and Optimisation) funded by the Horizon 2020 research and innovation programme of the European Union under grant agreement no. 952903. Ankur Kumar Gupta acknowledges support by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division under Contract No. DE-AC02-05CH11231, FWP No. DAC-LBL-Long.
\section*{References}
%\printglossaries
% \bibliographystyle{achemso}
\bibliography{latex/references, latex/summaries-ref}
\end{document}