Leo McKee-Reid, Christoph Sträter, Maria Angelica Martinez, Joe Needham, Mikita Balesni
Previous work has shown that training "helpful-only" LLMs with reinforcement learning on a curriculum of gameable environments can lead models to generalize to egregious specification gaming, such as editing their own reward function or modifying task checklists to appear more successful. We show that gpt-4o, gpt-4o-mini, o1-preview, and o1-mini - frontier models trained to be helpful, harmless, and honest - can engage in specification gaming without training on a curriculum of tasks, purely from in-context iterative reflection (which we call in-context reinforcement learning, "ICRL"). We also show that using ICRL to generate highly-rewarded outputs for expert iteration (compared to the standard expert iteration reinforcement learning algorithm) may increase gpt-4o-mini's propensity to learn specification-gaming policies, generalizing (in very rare cases) to the most egregious strategy where gpt-4o-mini edits its own reward function. Our results point toward the strong ability of in-context reflection to discover rare specification-gaming strategies that models might not exhibit zero-shot or with normal training, highlighting the need for caution when relying on alignment of LLMs in zero-shot settings.
This project was part of the LASR Labs AI safety program, in which I participated in summer 2024. The paper was accepted to the SafeGenAI workshop at NeurIPS 2024.
https://arxiv.org/abs/2410.06491
Martin Reitter, Jakob Näger, Karen Wintersperger, Christoph Sträter, Immanuel Bloch, André Eckardt, Ulrich Schneider
Periodic driving of optical lattices has enabled the creation of novel bandstructures not realizable in static lattice systems, such as topological bands for neutral particles. However, especially driven systems of interacting bosonic particles often suffer from strong heating. We have systematically studied heating in an interacting Bose-Einstein condensate in a driven one-dimensional optical lattice. We find interaction-dependent heating rates that depend both on the scattering length and the driving strength and identify the underlying resonant intra- and interband scattering processes. By comparing experimental data and theory, we find that for driving frequencies well above the trap depth, the heating rate is dramatically reduced by the fact that resonantly scattered atoms leave the trap before dissipating their energy into the system. This mechanism of Floquet evaporative cooling offers a powerful strategy to minimize heating in Floquet engineered quantum gases.
https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.119.200402
Christoph Sträter, André Eckardt
We investigate multi-"photon" interband excitation processes in an optical lattice that is driven periodically in time by a modulation of the lattice depth. Assuming the system to be prepared in the lowest band, we compute the excitation spectrum numerically. Moreover, we estimate the effective coupling parameters for resonant interband excitation processes analytically, employing degenerate perturbation theory in Floquet space. We find that below a threshold driving strength, interband excitations are suppressed exponentially with respect to the inverse driving frequency. For sufficiently low frequencies, this leads to a rather sudden onset of interband heating, once the driving strength reaches the threshold. We argue that this behavior is rather generic and should also be found in lattice systems that are driven by other forms of periodic forcing. Our results are relevant for Floquet engineering, where a lattice system is driven periodically in time in order to endow it with novel properties like the emergence of a strong artificial magnetic field or a topological band structure. In this context, interband excitation processes correspond to detrimental heating.
https://arxiv.org/abs/1604.00850
https://www.degruyter.com/document/doi/10.1515/zna-2016-0129/html
Egidijus Anisimovas, Mantas Račiūnas, Christoph Sträter, André Eckardt, I. B. Spielman, Gediminas Juzeliūnas
We propose a cold-atom realization of a zigzag ladder. The two legs of the ladder correspond to a “synthetic” dimension given by two internal (spin) states of the atoms, so that tunneling between them can be realized as a laser-assisted process. The zigzag geometry is achieved by employing a spin-dependent optical lattice with the site position depending on the internal atomic state, i.e., on the ladder's leg. The lattice offers a possibility to tune the single-particle dispersion from a double-well to a single-minimum configuration. In contrast to previously considered semisynthetic lattices with a square geometry, the tunneling in the synthetic dimension is accompanied by spatial displacements of atoms. Therefore, the atom-atom interactions are nonlocal and act along the diagonal (semisynthetic) direction. We investigate the ground-state properties of the system for the case of strongly interacting bosons. In particular, we find that the interplay between the frustration induced by the magnetic field and the interactions gives rise to an interesting gapped phase at fractional filling factors corresponding to one particle per magnetic unit cell.
https://arxiv.org/abs/1610.00709
https://journals.aps.org/pra/abstract/10.1103/PhysRevA.94.063632
Christoph Sträter, Shashi C. L. Srivastava, André Eckardt
We propose a simple scheme for mimicking the physics of one-dimensional anyons in an optical-lattice experiment. It relies on a bosonic representation of the anyonic Hubbard model to be realized via lattice-shaking-induced resonant tunneling against potential offsets, which are created by a combination of a lattice tilt and strong on-site interactions. No lasers additional to those used for the creation of the optical lattice are required. We also discuss experimental signatures of the continuous interpolation between bosons and fermions when the statistical angle 𝜃 is varied from 0 to 𝜋. Whereas the real-space density of the bosonic atoms corresponds directly to that of the simulated anyonic model, this is not the case for the momentum distribution. Therefore, we propose to use Friedel oscillations in the density as a probe for continuous fermionization of the bosonic atoms.
https://arxiv.org/abs/1602.08384
https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.117.205303
M. Weinberg, C. Ölschläger, C. Sträter, S. Prelle, A. Eckardt, K. Sengstock, J. Simonet
We report on the observation of multiphoton interband absorption processes for quantum gases in shaken light crystals. Periodic inertial forcing, induced by a spatial motion of the lattice potential, drives multiphoton interband excitations of up to the ninth order. The occurrence of such excitation features is systematically investigated with respect to the potential depth and the driving amplitude. Ab initio calculations of resonance positions as well as numerical evaluation of their strengths exhibit good agreement with experimental data. In addition, our findings could make it possible to reach novel phases of quantum matter by tailoring appropriate driving schemes.
https://arxiv.org/abs/1505.02657
https://journals.aps.org/pra/abstract/10.1103/PhysRevA.92.043621
Christoph Sträter, André Eckardt
In order to study the interesting interplay between localized and dispersive orbital states in a system of strongly interacting ultracold atoms in an optical lattice, we investigate the possibility to coherently couple the lowest two Bloch bands by means of resonant periodic forcing. For bosons in one dimension we show that a strongly interacting Floquet system can be realized, where at every lattice site two (and only two) near-degenerate orbital states are relevant, whose tunneling matrix elements differ in sign and magnitude. By smoothly tuning both states into resonance, the system is predicted to undergo an orbital-driven Mott-insulator-to-superfluid transition. As a consequence of kinetic frustration, this transition can be either continuous or first order, depending on parameters such as lattice depth and filling.
https://arxiv.org/abs/1407.7421
https://journals.aps.org/pra/abstract/10.1103/PhysRevA.91.053602
Alexandre Faribault, Omar El Araby, Christoph Sträter, Vladimir Gritsev
We present a numerical approach which allows the solving of Bethe equations whose solutions define the eigenstates of Gaudin models. By focusing on a different set of variables, the canceling divergences which occur for certain values of the coupling strength no longer appear explicitly. The problem is thus reduced to a set of quadratic algebraic equations. The required inverse transformation can then be realized using only linear operations and a standard polynomial root-finding algorithm. The method is applied to Richardson’s fermionic pairing model, the central spin model, and the generalized Dicke model.
https://arxiv.org/abs/1103.0472
https://journals.aps.org/prb/abstract/10.1103/PhysRevB.83.235124
Christoph Sträter, Oleksandr Tsyplyatyev, Alexandre Faribault
Using the exact eigenstates of the inhomogeneous Dicke model obtained by numerically solving the Bethe equations, we study the decay of bosonic excitations due to the coupling of the mode to an ensemble of two-level (spin 1/2) systems. We compare the quantum time evolution of the bosonic mode population with the mean-field description; for a few bosons, the two agree up to a relatively long Ehrenfest time. We demonstrate that additional excitations lead to a dramatic shortening of the period of validity of the mean-field analysis. However, even in the limit where the number of bosons equals the number of spins, the initial instability remains adequately described by the mean-field approach, leading to a finite, albeit short, Ehrenfest time. Through finite-size analysis we also present indications that the mean-field approach could still provide an adequate description for thermodynamically large systems even at long times. However, for mesoscopic systems one cannot expect it to capture the behavior beyond the initial decay stage in the limit of an extremely large number of excitations.
https://arxiv.org/abs/1209.0292
https://journals.aps.org/prb/abstract/10.1103/PhysRevB.86.195101